Introduction

Schizophrenia (SZ) is a complex and debilitating neuropsychiatric illness that has approximately 1% average lifetime prevalence globally, although rates vary regionally up to fivefold. 1- 3 The disease is characterized by “positive” symptoms, consisting of recurrent psychosis, hallucinations, and disorganized speech; “negative” symptoms, including anhedonia, social withdrawal, and flattened affect; and broad cognitive dysfunction. 1 Devastatingly, those diagnosed with SZ have a staggeringly high unemployment rate of 80% to 90% 4 , 5 and a life expectancy that is reduced by 10 to 20 years, with mortality often caused by suicide or cardiovascular conditions. 2 , 6 An incredibly isolating disease on a personal level, SZ also poses a great economic burden in terms of health and social care. 7

SZ is thought to result from an intricate choreography of genetic and environmental factors throughout early brain development that predispose an individual to the disease. Heritability estimates range up to 80%, underlining the importance of genetic contributions to the calculation of SZ disease risk. 7- 9 The field has benefited greatly from the advent of next-generation sequencing and large-scale genome-wide association studies (GWAS) that have identified a combination of rare and common variants that are associated with SZ risk, spurring on the investigation of potential etiological mechanisms. Evidence from both genetic and epidemiological studies point to a disease of impaired early, potentially prenatal, brain development. 10

A disease with heterogeneous clinical presentation, SZ’s non-Mendelian genetic risk architecture is equally heterogeneous and nuanced, with contributions from rare penetrant variants, such as copy number variants (ie, deletions and duplications of chromosomal regions) to common variants of low to modest effect size. 11 The most recent SZ genome-wide association study (GWAS), performing a meta-analysis of approximately 41 000 cases and 65 000 controls, identified 145 risk loci, composed of a collection of single nucleotide polymorphisms (SNPs) in linkage disequilibrium (LD), adding 50 more loci to those from an earlier study. 12 , 13 However, despite advances in mapping the complex genetic risk architecture of SZ, successful drug development has still been elusive due to lack of a unifying neuropathology and difficulties in identifying the likely casual genes and vulnerable cellular contexts. In this regard, neuroepigenetics—broadly defined as the study of chromatin configuration and function from development to adulthood in the nervous system—could provide an important perspective in prioritizing risk variants and their target genes, paving the way for novel diagnostic and therapeutic approaches. 14

The beginnings of epigenetic mechanisms in schizophrenia

Early work in SZ epigenetics included investigating DNA cytosine methylation of candidate gene promoters such as RELN (glycoprotein for neurodevelopment and migration), 15 , 16 GAD1 (glutamic acid decarboxylase 1), 17 COMT (catechyl-O-methyltransferase), 18 , 19 and SOX10 (developmental transcription factor) 19 , 20 to name a few examples. Specifically, the promoters of RELN and GAD1 , both involved in the γ-aminobutyric acid (GABA) pathway important for inhibitory neurotransmission, are hypermethylated in SZ, 15 - 17 which is consistent with reduced expression of the two genes in postmortem brain of SZ patients. 21 - 23 Likewise, the promoter of oligodendrocyte-specific transcription factor SOX10 is hypermethylated in patients with SZ, in concordance with its downregulation in the diseased brain as well. 20 On the other hand, the promoter of COMT , a dopaminergic pathway component, is hypomethylated in postmortem SZ brains. 19 While the early studies were illustrative in tying together potential epigenetic mechanisms and expression patterns for important neurobiological genes, they did not allow for scalable exploration of the whole genome and required knowledge of candidate genes to design the study. More recent studies using techniques that allow genome-wide interrogation, through high-throughput next-generation sequencing, are able to substantially increase sample sizes, finding many SZ-associated differentially methylated regions distributed throughout the genome in blood (689 cases, 645 controls) 24 and across four different brain regions (41 cases, 46 controls). 25 To dissect the association between methylation and SZ risk further, one study combined high-density methylation profiling with genome-wide SNP genotyping in 166 human fetal brain samples (56 to 166 days post-conception) to find >16 000 methylation quantitative trait loci (mQTLs), instances of DNA methylation that can be influenced by variation in the sequence. 25 Moreover, these fetal brain mQTLs, many of which are remarkably stable even in the adult brain, were significantly enriched amongst SZ risk loci, allowing for the homing in on discrete sites of methylation in the fetal brain that also harbor SZ risk variants. 25

Histone post-translational modifications (PTMs) are another layer of epigenetic control that contribute to higher-order chromatin regulation, such that combinatorial PTMs establish various chromatin states, including active or silenced transcription, regulatory sequences, etc. 26 Like in early methylation studies of SZ, histone PTMs were assayed in single-gene fashion. 27 For instance at the GAD1 locus, Huang et al discovered decreased histone 3 lysine 4 trimethylation (H3K4me3), associated with the transcriptional process, in the prefrontal cortex (PFC) of predominantly female patients with SZ when compared with controls at the 5’ end of the GAD1 gene, overlapping with a SZ risk locus. 28 A later study detected decreased H3K9K14 acetylation, also marking transcriptionally active chromatin, at several candidate genes, including GAD1 . 29 Eventually, the chromatin immunoprecipitation (ChIP) assay used in these two studies was refined and made scalable to high-throughput library generation in postmortem brain, taking on the form of ChIP with next-generation sequencing (ChIP-Seq) to query the whole genome, and not just at predetermined sites, in a cell type-specific fashion. 30 Only recently has this approach been applied to the study of psychiatric disorders, specifically SZ. A study of 157 open chromatin-associated histone profile (H3K4me3 and H3K27ac) reference maps from PFC and anterior cingulate cortex (ACC) of sorted neuronal (NeuN+) and non-neuronal (NeuN-) cell populations revealed differing landscapes at loci based on cell type and brain region. 31 Notably, the authors found a striking overrepresentation of risk variants for SZ that was highly specific to neuronal chromatin, as determined by partitioning heritability, highlighting the need to study epigenetic contributions to disease in a cell type-specific fashion. 31 This is especially important when considering the fact that around half of the noncoding regions could have regulatory functions specific to a given tissue, cell type, or developmental stage. 32 , 33 In the brain, this nuance is compounded by the enormous cellular heterogeneity and long developmental trajectories that occurs to a lesser degree in other tissues. 34

Given that the majority of SZ risk variants occur in these noncoding (ie, intronic or intergenic) regions of the genome, it is difficult to identify the associated causal gene, when considering that the variants could be in regulatory elements, such as enhancers or repressors, impacting the expression of distally located genes. 13, 35 Indeed, using heritability partitioning, the authors found that 16% of imputed SNPs overlapping DNAse hypersensitivity sites—indicators of open chromatin and regulatory regions—explained an average of 79% of the SNP heritability spanning 11 diseases, including SZ. 36 One approach is to leverage RNA-sequencing (RNA-seq) data with SNPs to identify genetic variants that are correlated with expression of genes, even if they are not immediately adjacent, called expression quantitative trait loci (eQTL). This is precisely what the CommonMind Consortium did, using expression data from dorsolateral PFC of people with SZ (N=258) and control subjects (N =279) 37 and imputed SNP genotypes. They found that, of the 108 loci previously reported, 13 20 of them (~20%) had eQTL SNP-gene pairs within 1 megabase (Mb) of linear genomic distance (ie, cis eQTLs) that might contribute to altered gene expression and SZ liability. 37 A recent study was able to expand the number of GWAS loci colocalized with eQTL signal from 20 to 40 by including conditional eQTLs (ie, not just one primary eQTL for each gene), beginning to consider epigenetic and cell states within which different SNPs might function. 38

Assay for transposase-accessible chromatin with sequencing (ATAC-seq) also offers glimpses into open chromatin regions (OCRs) along with footprints of DNA-binding factors, such as transcription factors. 39 One study of sorted neuronal and non-neuronal cells from postmortem PFC identified cell type-specific OCRs, with neuronal OCRs more enriched for distal regulatory elements and evolutionarily conserved regions, as well as a subset of transcription factors highly enriched in SZ loci specific to neurons, providing a potential functional role. 40 An expanded study assayed the same two cell types and 14 different brain regions in five individuals, reporting higher regional variability in neuronal chromatin and significant enrichment only for neuropsychiatric traits, including SZ, in cell- and region-specific fashion, 41 again highlighting the need for approaches that take into account the many nuances of the brain. A case-control study of chromatin accessibility (142 SZ, 143 control) only identified three differentially accessible regions, suggesting that differences could be subtle between diseases and healthy PFC, 42 although looking at cell type-specific maps may have improved the results. Taken together, such GWAS-guided ChIP-seq, eQTL, and ATAC-seq studies have begun to shed light on possible mechanisms in which SNPs located in noncoding portions of the human genome may exert a regulatory influence on gene expression, even distally, if they happen to coincide with functional elements such as enhancers or repressors.

Recently, the field has enjoyed a wave of “post-GWAS” analyses aimed at increasing the power to prioritize causal genes by performing gene-based, as opposed to single-variant-based, associations, with the underlying assumption that many genetic variants influence traits, like SZ, via transcriptional regulation. 43 By focusing on the genetic component of expression, environmental factors are excluded from consideration, thereby increasing statistical power. 44 One of these methods, called the transcriptome-wide association study (TWAS), 45 has revealed 157 significant genes, 35 of which did not overlap a known GWAS locus. 46 Of 157 genes, 42 genes were associated with specific chromatin phenotypes (ie, histone modification ChIP-seq peaks), suggesting putative regulatory mechanisms. Moreover, 105/157 TWAS-predicted risk genes overlapped with genes linked to index SNPs through chromatin interaction data from fetal brain. 46 The method also revealed that the TWAS genes were more significantly associated with the trait in question than the nearest gene and had stronger eQTL effects at the index SNP, 47 emphasizing the need to go beyond the linear genome. A similar method called PrediXcan 48 has been applied to the largest transcriptomic imputation study in SZ to date, identifying 67 genic associations across 13 brain regions, 19 of which were novel. 49 These genes were involved in new biological pathways such as hexaminidase-A and porphyrin metabolism, both of which have historic connections to SZ without much prior genetic evidence supporting them. 49 Recently, this technique has again confirmed that most trait associations are tissue-specific, underscoring the necessity of studying homogeneous populations of specific cell types. 50

Large-scale functional genomic and epigenetic studies

Looking ahead, the most promising approaches could involve integration of multiple orthogonal datasets taking into account cell type contexts, particularly with larger sample sizes, in order to triangulate more functionally implicated genetic variants associated with SZ ( Figure 1). Leveraging both RNA-seq and ATAC-seq data, one study identified 118 differentially transcribed enhancer RNAs (eRNAs), important for activity-dependent gene regulation in the human brain, in SZ compared with controls. The same study also found genetic variants that affect expression of approximately 1000 enhancers. 51 A recent multiomics study of tissue and single- cell RNA-seq, histone modification landscapes, CTCF transcription factor binding sites, DNA methylation, and genotypes (1230 samples from 48 brains; 18 288 single cells/nuclei from 12 brains) aimed to comprehensively chart the human neurodevelopmental trajectory, from the embryonic stage to adulthood, of gene expression programs and their regulation in various brain regions and cellular contexts. 52 With this integrative approach, the authors were able to identify a global transition in the transcriptomic profile during late fetal development where there was a sharp reduction in differences across brain regions coincident with increased signatures of expression of genes important for neuronal structure and activity. 52 Importantly, using partitioned linkage disequilibrium score regression analysis, they found that SZ SNP heritability was enriched in the dorsolateral PFC-specific regulatory elements as identified by H3K27ac peaks. With regard to cell type context, cortical excitatory neurons, embryonic/fetal progenitor cells, and adult cortical interneurons were enriched for expression of high-stringency genes associated with SZ, 52 echoing another single-cell study that found similar cellular patterns. 53 Analysis of gene coexpression modules revealed one particular module enriched for genes enriched in fetal and adult excitatory neurons associated with SZ, fetal enhancers, neuronal (as opposed to neural progenitor or glial) expression, and neuronal undermethylated sites. 52 This module, consisting of 145 total genes, included MEF2C and SATB2 , which have previously been associated with neurodevelopmental disorders in general and specifically SZ. 13 , 54 - 56

Figure 1.
Figure 1. Advancing genetic findings with functional epigenomic information and integration. Genetic analyses identify common variation associated with schizophrenia (SZ) risk using genome-wide association studies (GWAS), as indicated by the Manhattan plot (left). Most variants do not alter protein structure and may have diverse regulatory functions (eg, enhancer, repressor, regulator of splicing, etc). Quantitative mapping approaches (middle) can measure transcript abundance and splicing (RNA-seq); chromatin state through histone modifications (ChIP-seq), DNA methylation, identification of active/open chromatin (ATAC-seq, DNase-seq); and higher-order chromatin conformations (Hi-C, HiChIP, Capture-C). Integration of these functional datasets with the genetic architecture from GWAS can identify putative causal variants that impact any one (or more) of the epigenetic features listed above, providing a mechanism for potential gene dysregulation converging on important neural and developmental pathways.

Another large study (over 2000 postmortem brain samples from individuals with SZ, bipolar disorder, and autism spectrum disorder as well as controls) looking at the transcript isoform level across SZ, bipolar disorder, and autism spectrum disorders, uncovered disease-specific differential splicing and expression and that isoform-level changes, as opposed to gene-level, showed the largest effect sizes and greatest disease specificity. 57 Of note, approximately 700 noncoding RNAs (including 208 long intergenic noncoding RNAs, or lincRNAs), which are thought to have a transcriptional regulatory role, were differentially expressed in SZ, specifically in excitatory neurons. 57 A TWAS performed on these data identified a stringent list of 64 significant genes, consistently prioritized by multiple methods, including downregulated lysine methyltransferases SETD6 and SETD8 along with brain-enriched lincRNA LINC00634 , 57 highlighting transcriptional and epigenetic regulation as important molecular mechanisms to consider in SZ risk. Taken together, these two studies testify to the power gained by leveraging large brain datasets across multiple modalities in order to identify the contributions of cell type-specific pathways and their regulation over time. Such “four-dimensional (4D) mapping,” including spatial and temporal components, will be vital to distinguish between the effects of direct genetic insult and its resulting indirect cascade of molecular events, as well as to identify when specifically in development intervention would be most ameliorative.

Schizophrenia risk-associated genome in 3D

Even with the identification of transcriptomic patterns and gene coexpression modules or the annotation of noncoding variants in the context of SZ risk, led in large part by the generation of brain-specific functional genomic datasets by the PsychENCODE consortium, 58 there is still a part of the story that is missing. We cannot simply assume that noncoding SZ variants, which make up most of the genetic risk architecture, act on the genes that are closest to them on the linear genome 59 , 60 or are in linkage disequilibrium. 61 Therefore, it is necessary to consider the genome in three dimensions and map chromatin interactions that can bring distal regions into close physical proximity, in many cases for fine-tuning gene expression through regulatory elements (eg, promoter-enhancer interactions ( Figure 1 )). 62

The three dimensional conformation of chromatin and entire chromosomes is referred to as the “3D genome” or 3DG. Chromosome conformation capture methods combined with deep sequencing, such as in situ Hi-C, are used to study the 3DG. 63 Briefly, the technique involves crosslinking cells or tissue, isolating intact nuclei, digesting the chromatin within the nuclei with restriction enzymes, biotinylating cut ends, ligating fragments in close proximity, shearing the DNA, pulling down biotinylated chimeric fragments, and generating libraries for deep paired-end sequencing. 63 Variants of the technique include capture Hi-C to isolate fragments of interest with baits, 64 , 65 single cell Hi-C, 66 - 68 Hi-C paired with ChIP (HiChIP), 69 and low input Hi-C, 70 - 72 among others. This cadre of methods has enumerated various principles of 3DG organization (see Figure 2 for more details). Perhaps the biggest layer of the 3DG involves the clustering of euchromatic (A) and heterochromatic (B) sequences into regions of approximately ~5 megabases (Mb) called “compartments” 73 ( Figure 2 ), although recent reports say that they may be at a smaller scale than previously envisioned (15-300kb), often segregated by shared transcriptional states and histone modification landscapes. 63 , 74 , 75

Figure 2.
Figure 2. The genome in 3D. Chromatin, organized as arrays of the nucleosome (146 bp of DNA wrapped in 2.5 loops around the core histone octamer), is arranged into A (open, euchromatic) or B (closed, heterochromatic) compartments that can be megabases wide. Superimposed upon these structures are topologically associating domains (TADs), which are on average (median size) 185 kb, and can be within either A or B compartments. Sequences within TADs are much more likely to come into contact with each other than with loci from outside domains. TAD boundaries and chromatin loop formations (eg, promoter-enhancer loops) are often demarcated by CTCF (and additional proteins not shown here). Chromatin loops, or interactions, allow distal regulatory elements, like enhancers, to come into contact with gene promoters in order to regulate gene expression, often in cell type-specific fashion, with modulation of the RNA Polymerase II transcriptional machinery.

The next component of the 3DG is topologically associating domains (TADs), which are genomic sequences that preferentially interact with themselves as opposed to regions outside of their boundaries. 76, 77 They act as insulators, preventing inappropriate interactions between elements such as promoters and enhancers, which may lead to abnormal levels of gene expression. 76 , 78 Furthermore, TADs are thought to be more conserved across cell types, and even species, than are compartments. 77 , 79 - 81 However, their insulation capacity and interconnections with other regions have been shown to be subject to cell type-specific changes, at least during lineage specification when primed perhaps by transcription factor binding. 82 Additionally, TAD insulation is highly correlated with transcription, such that new borders are able to form at developmentally regulated gene promoters, suggesting that novel TAD landscapes may arise with certain contexts. 83

The spatial genome also includes chromatin loops or interactions, commonly defined as distinct pairwise contacts that, in Hi-C maps, sharply stand out from the surrounding “linear” genome background, often linking regulatory elements with distal gene promoters in a cell type-specific fashion. 63 The anchors of many chromatin loops (65% to 92%) often have CTCF, which in this case acts as an architectural scaffolding protein, in an inward/convergent orientation at binding sites 63 , 77 , 81 , 84 that, when experimentally inverted, could in some cases affect normal patterns of gene expression. 85

Because the majority of functional elements in noncoding portions of the human genome, such as enhancers and repressors, are not contacting the nearest TSS but instead are interacting with genes located elsewhere on the chromosome, 86 it is unsurprising therefore that non-3DG based approaches relying on purely linear genomic distance, had only limited success in assigning specific target genes to risk loci. 34 The inability to catalog reliably gene targets of SZ risk variants understandably hindered the downstream efforts to identify networks contributory to disease etiology. eQTLs, as discussed earlier, began to consider the impacts of common variants on distal genes; however, these are still mostly restricted to SNP-gene associations within 1 Mb and are computationally derived without using actual measurements of the 3DG space. 37 , 87 In a pioneering study, Won and colleagues generated Hi-C chromosomal conformations from fetal brain and mapped gene targets for schizophrenia GWAS noncoding variants; many of these genes were involved in disease-relevant pathways such as neurogenesis or cholinergic signaling. 60 Another Hi-C study generating 3DG maps from fetal and adult human cortex to explore GWAS loci for multiple neuropsychiatric disorders and traits found that chromatin interactions explained ~73% of genes implicated by eQTLs (2,292/3,121); strikingly, there are 4101 genes discovered only through Hi-C that are otherwise unaccounted for when considering only location or eQTLs. 88 For SZ specifically, the authors found that genes implicated by location were enriched for neuronal and synaptic processes, which are certainly important in disease etiology, whereas as those genes implicated by functional chromatin interaction data were enriched for regulatory chromatin biology, perhaps pointing to more upstream dysregulation. 88 While such datasets have advanced our understanding of the genetic risk architecture of psychiatric disease, 10 , 60 3DG mapping from postmortem tissue lacks cell type-specific resolution and may not capture higher-order chromatin structures sensitive to the autolytic process. 89

Two recent studies addressed some of these technical shortcomings. First, Hi-C mapping of human induced pluripotent stem cell (hiPSC)-derived neural progenitor cells (NPCs) and their isogenically differentiated excitatory neurons and glia increased the number of transcribed genes associated with SZ GWAS loci by approximately 2- to 3-fold (total N expressed genes connected to or located within an SZ GWAS locus ranged from 201-386, depending on cell type). 90 Since neurons, together with NPCs, had the greatest number of cell type-specific interactions anchored in a risk locus (as compared with glia), one could conclude that the 3DG space corresponding to SZ risk disproportionately affects neurons, 90 echoing other studies that report a neuronal burden in the SZ genetic architecture. 31 , 52 , 53 , 57 Remarkably, the SZ-associated chromosomal connectome (ie, GWAS loci and their connected genes) specific to NPC or excitatory neurons was associated with coordinated gene expression “clusters” and protein-protein interactions, where one cluster strongly enriched for regulators of neuronal connectivity and synaptic plasticity, and another cluster for chromatin-associated proteins, including transcriptional regulators, 90 similar to the results of Giusti-Rodriguez et al. 88 Many interactions of interest between noncoding SZ variants and neurally relevant genes, such as pro-neuronal transcription factor ASCL1 or members of the clustered Protocadherin family, were functionally validated in NPCs with CRISPR epigenomic and genomic editing. 91 Likewise, an expanded genome space involving higher-order chromatin interactions anchored in schizophrenia risk loci, prioritized with orthogonal CTCF and histone modification ChIP-seq datasets, has also been described for cultured primary sensory neurons from the olfactory neuroepithelium, also pointing to neurogenesis as an important gene set. 92

Furthermore, a massive multiomics study by the PsychENCODE consortium analyzed the open chromatin landscape, transcriptional histone marks, and the transcriptome from altogether 2000 postmortem brains, including hundreds of cases diagnosed with schizophrenia, and combined these “linear genome” profiles with Hi-C data from fetal and adult reference brains. 93 The study mapped ~79 000 brain-active enhancers with their associated chromosomal contacts and TAD landscapes and identified a vast number of eQTLs and gene regulatory networks; perhaps most importantly, the investigators applied deep machine learning algorithms that, for the first time, were able to predict presence or absence of disease (ie, SZ) based on a subject’s brain transcriptome and chromatin profiles. 93 The study approached disease prediction at a probability level of ~75%, reflecting a significant advancement over more conventional genomic approaches predicting disease only marginally above chance (50%). 93 Very recently, another study presented an integrative risk gene selector (iRIGS), a computational framework to integrate multiomics data, that predicted high-confidence risk genes that account for significantly enriched heritability, are expressed in brain tissues (especially prenatal), and are enriched for targets of already approved drugs, providing new opportunities to repurpose existing drugs for SZ. 94

Conclusions

The field of SZ epigenetics has experienced a recent surge of discovery, in part facilitated by innovation in the technical capacities of epigenomic architecture mapping and high-throughput sequencing. As a field, we have been able to progress from single candidate gene studies to probing the entire human genome, both the protein-coding and the hitherto enigmatic noncoding regions, in unbiased fashion across numerous modalities and cell types. However, the polygenic nature of SZ with its nuanced network of common variant influence requires more investigation to decipher, through both large-scale consortium-led efforts as well as incisive functional and mechanistic studies on discrete, contributory pathways. The many studies enumerated here in the new wave of SZ epigenetics research utilized high-resolution genomic and epigenomic datasets from relevant tissue and cell types, taking into account the 3DG space and leveraging orthogonal approaches, in order to arrive at higher confidence gene sets and predictive capacity. In this vein, the polygenic risk score (PRS), a metric that summarizes the inherited common variant burden in an individual, is interesting from a clinical precision-medicine diagnostic perspective. While being in the top 10% of PRSs carries a greater than 10-fold increased risk of SZ, 13 there is still a substantial overlap in the distribution of PRS between cases and controls, such that many controls are in the top decile and many cases in the lowest. 34 Deep-learning algorithms such as those employed by Wang et al pave the way forward, combining different slices of information to construct a more holistic functional genomic picture that is missed when considering simply the genetic information, thereby enhancing disease prediction. Another benefit of compiling and integrating many brain datasets is the identification of high-fidelity risk variant-gene interactions with regulatory epigenomic landscapes. What genes are revealed as important in multiple approaches? Do they converge on pathways with pathogenic potential? Are there druggable targets for major regulators of these pathways or hubs? Questions like these can help focus the expeditions ahead as we try to understand the genetic, functional, and cellular/network architecture of SZ culminating in disease.