Introduction

Unlike other areas of medicine, psychiatry lacks diagnostic criteria based on validated biomarkers: there is no blood test for depression; there is no brain scan for psychosis. Prognosis is equally, if not more, important, yet it is often even harder to establish. This knowledge gap hinders efforts to target at-risk individuals for early interventions that could attenuate or prevent illness, or, once an illness manifests, to predict its trajectory and determine the optimal treatment strategy.

The advent of functional neuroimaging—especially functional magnetic resonance imaging (fMRI), which noninvasively measures blood oxygenation level as a proxy for neural activity in awake, behaving humans—seemed promising as a means to fill this gap. Yet after more than 2 decades of research, this promise remains unfulfilled; fMRI has had essentially no impact on day-to-day decisions in the psychiatry clinic.

However, recent advances have reignited the hope of developing useful fMRI-based tools for psychiatry.

Here, we review some of these advances and urge a move away from traditional patient-control contrasts toward analyses of individual subjects, with the ultimate goal of discovering brain-based biomarkers of mental illness. This article is organized into four sections: first, we outline the technique of functional connectivity (FC) and discuss state- versus trait-related variance in the FC signal; second, we summarize recent evidence for intrinsic FC profiles that are both reliable within individuals and unique across individuals; third, we demonstrate how these FC profiles predict behavioral phenotypes in individual subjects; and finally, we discuss the implications of this work for personalized approaches to psychiatric illness.

Functional connectivity: the method

Traditional fMRI studies rely on evoked-activity paradigms, in which experimenters give subjects a task to perform while in the scanner and identify brain regions whose activity fluctuates in a time-locked manner with respect to the task events. However, task-related fluctuations in the blood-oxygen-level-dependent (BOLD) signal are usually small relative to the baseline against which they are measured: in cognitive tasks, the change can be less than 2%. To get enough statistical power, researchers typically average many trials of the same task from many different subjects. Thus, most task paradigms, by their very nature, detect state-related brain activity that is consistent both within and across individuals. Although numerous studies have compared task-evoked activity between healthy controls and patients with various psychiatric illnesses, the differences between groups are almost always quantitative rather than qualitative: in other words, population means are significantly different, but with a good deal of overlap between the two distributions. This overlap effectively precludes using a measurement from a given individual as an indicator of diagnostic status.

Considering this, it is not surprising that evoked-activity paradigms have not led to reliable biomarkers for psychiatric illness. A second type of fMRI paradigm that has exploded in popularity in recent years is FC. Rather than measuring the magnitude of activity in single brain regions, FC measures the synchrony of activity across two or more regions. One way to characterize FC on a whole-brain level is to divide the brain into a set of nodes and calculate the Pearson correlation between the activity timecourses of each pair of nodes, producing a connectivity matrix; a schematic of this approach is provided in Figure 1. More comprehensive primers on FC methods are available elsewhere.1,2
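To make the matrix construction concrete, here is a minimal Python/NumPy sketch. It assumes the node timecourses have already been extracted and preprocessed (for example, by averaging the voxel signals within each atlas node). The random array and the 1200-volume length are placeholders, and `fc_profile` is a hypothetical helper name, not part of any FC toolbox.

```python
import numpy as np

def fc_profile(timecourses: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Compute a node-by-node FC matrix and its vector of unique edges.

    timecourses: shape (n_timepoints, n_nodes), one BOLD timecourse per node.
    Returns the symmetric Pearson correlation matrix and its upper-triangle edges.
    """
    # np.corrcoef treats rows as variables, so transpose to (n_nodes, n_timepoints)
    fc = np.corrcoef(timecourses.T)
    # Keep each node pair once and drop the diagonal (self-correlations of 1)
    iu = np.triu_indices(fc.shape[0], k=1)
    return fc, fc[iu]

# Placeholder data (not real BOLD signals): 1200 volumes x 268 nodes
rng = np.random.default_rng(0)
ts = rng.standard_normal((1200, 268))
fc, edges = fc_profile(ts)
print(fc.shape, edges.size)  # (268, 268) 35778
```

Keeping only the upper triangle avoids counting each edge twice and discards the trivial diagonal. This is why a 268-node atlas yields 268 × 267 / 2 = 35 778 edges.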

Figure 1. Schematic of functional connectivity analysis. (A) 268-node functional brain atlas covering cortical, subcortical, and cerebellar structures. This atlas was defined using a groupwise clustering algorithm on resting-state data from healthy adults.26 The algorithm groups voxels into nodes with maximally coherent timecourses. (B) An example of two blood-oxygen-level-dependent (BOLD) signal timecourses from a pair of nodes “i” (red) and “j” (green). The similarity of these two signals is measured using Pearson correlation (r); a high correlation coefficient implies a strong functional connection. (C) Correlating the timecourses of all possible pairs of nodes produces a symmetric 268 × 268 connectivity matrix. Connectivity matrices can be calculated using data from a single subject and a single scan session, such that each individual has a unique matrix associated with a particular scan condition. A 268-node atlas produces a matrix with 35 778 unique elements; this set of correlation strengths is what is referred to here as a “functional connectivity (FC) profile.”

FC analyses have several advantages over evoked-activity paradigms: rather than studying small task-related magnitude changes occurring on top of ongoing fluctuations, FC studies treat this baseline, considered “noise” in evoked-activity paradigms, as the signal of interest. This results in an increased signal-to-noise ratio3 across subjects, but crucially, within subjects as well. In contrast to many evoked-activity paradigms, FC analyses can be performed on data from single subjects with reasonable statistical rigor.

Another advantage is that FC can be measured either during task performance or while subjects are simply at rest, not performing any explicit task. From the patient's perspective, then, these “resting-state” acquisitions are no different from a clinical anatomic magnetic resonance scan. Measurement at rest is free of confounds associated with task performance, more practical for certain populations such as the very young or very old, more amenable to longitudinal designs (since practice effects are minimized), and easier to standardize across sites to facilitate data sharing. FC measured at rest also demonstrates good test-retest reliability.4,5 For these reasons, among others, resting-state FC is a popular approach for investigating how brain activity is disrupted in psychiatric illness.

There is now a wealth of literature reporting FC differences between healthy controls and patients with various psychiatric illnesses, including—but not limited to—autism,6 schizophrenia,7 depression,8,9 bipolar disorder,10 attention-deficit/hyperactivity disorder (ADHD),11 addiction,12 anxiety disorders,13 and others (citations refer to review papers or meta-analyses and are by no means exhaustive). However, this breadth of work has sometimes produced findings that are difficult to replicate, inconsistent, or, at worst, contradictory.14 What is more, to date, none of this basic research has resulted in practical FC-based clinical tools.

One major limitation of these studies at least partially accounts for both the inconsistencies and the failure to translate findings into practical tools: simply contrasting population means between patients and controls ignores the considerable heterogeneity in neural and behavioral phenotypes within each group. Here, we make the case for replacing group contrasts with individual FC profiles in the search for biomarkers that could ultimately guide personalized approaches to psychiatric illness.

Functional connectivity: state or trait?

Despite the fact that FC—especially when measured at rest—is touted as a reflection of “intrinsic” brain organization, there is no doubt that FC signals contain state-related information. Sometimes this state-related information is of interest, such as in studies examining how different tasks modulate connectivity both at the group level15 and within individuals.16 But connectivity also contains information about states that may or may not be interesting from a cognitive perspective, such as mood,17 arousal,18,19 or how much caffeine a subject has consumed that day.20

What does this mean for discovering biomarkers in resting-state fMRI? Despite its advantages, rest is a task in and of itself—just an ill-defined one.21,22 Nearly all resting-state studies implicitly assume that mental state at the time of scan varies randomly across a sample, and thus does not pose a systematic confound. However, particularly in psychiatry, this may be a risky assumption. Although it is generally supposed that rest involves mind-wandering or introspective processes, the MRI scanner is not a neutral environment, and subjects' reactions to this environment could in theory produce systematic differences in brain activity between patients and controls. For example, certain stimuli—such as loud noises and feelings of claustrophobia—could be more salient to particular types of patients and thus command more of their attention during resting-state scans. (As an aside, it has been noted that in MRI studies of healthy subjects given ketamine, some subjects experience auditory hallucinations, which are notably absent from the typical ketamine-induced symptoms outside the scanner; this may be due to the altered perceptual environment of the scanner23). Thus, resting-state studies contrasting patients and controls cannot rule out the possibility that the observed connectivity differences between groups are due to state rather than trait variables.

State differences are interesting from a cognitive psychology perspective and crucial for understanding symptoms that fluctuate in presence and intensity within an individual, such as auditory hallucinations.24 However, trait variables that reflect endophenotypes and other, potentially causal, factors in pathophysiology are more likely to serve as useful biomarkers of disease. So one key question is, how much of the variance in FC is accounted for by state-related variables, as opposed to more stable interindividual differences? Is there a reliable, trait-level signature to be found in individual connectivity profiles?

The individual functional connectivity “fingerprint”

To address this question, our group investigated the reliability of individual differences in FC across different cognitive states. Briefly, we demonstrated that FC profiles could identify individuals from a large group, regardless of the task conditions (ie, cognitive state) in which the data were acquired.

We used data from 126 healthy adults obtained by the Human Connectome Project (HCP).25 Each subject was scanned over a period of 2 days. We included data from six sessions: two resting-state sessions (one on each day) as well as four task sessions involving distinct cognitive systems (working memory, emotion, motor, and language). For each subject for each session, we computed an FC matrix consisting of the pairwise correlations between each pair of nodes in a 268-node, functionally defined whole-brain atlas.26 Each matrix contains roughly 35 000 unique values representing the strength of the functional connection, or “edge,” between two specific nodes (Figure 1). Thus, the full data set contained six sets of 126 FC profiles.

The identification analysis was performed as follows. First, we selected one of the six sessions to serve as the “target” session and a second to serve as the “database” session. The database and target sessions were always acquired on different days in order to minimize confounds associated with scan session, such as arousal or satiety levels. Next, in an iterative analysis, we selected one matrix from the target set and compared it with each of the database matrices in turn to find the one that was maximally similar. Similarity was defined as the Pearson correlation between the edge values in the target matrix and the edge values in each of the database matrices. The predicted identity was the subject in the database whose matrix had the highest correlation coefficient with the target. We then selected a second matrix from the target set and repeated the above steps. After obtaining predicted identities for each matrix in the target set, we compared them with the true identities and computed an overall accuracy (the number of correctly predicted identities divided by the total number of subjects). Finally, the roles of database and target session were reversed. There were nine session pairs in total, representing various combinations of rest-rest, rest-task, and task-task pairs; reversing the roles within each pair yielded 18 database-target configurations.
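The identification procedure can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the original analysis code. Each session is assumed to be stored as an array with one row per subject, with rows in the same subject order across sessions. `identify` and `id_accuracy` are hypothetical helper names. The synthetic data simply model each profile as a stable subject-specific component plus session-specific noise.

```python
import numpy as np

def identify(target: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Predict identities by maximal edge-vector correlation.

    target, database: shape (n_subjects, n_edges); row k holds subject k's FC profile
    from one session. Returns, for each target row, the index of the best-matching
    database row.
    """
    def zscore(x):
        # Standardize each profile across its edges (population SD)
        return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    # With z-scored rows, dot product / n_edges equals the Pearson r
    r = zscore(target) @ zscore(database).T / target.shape[1]
    return r.argmax(axis=1)

def id_accuracy(target: np.ndarray, database: np.ndarray) -> float:
    """Fraction of target profiles whose predicted identity is correct."""
    predicted = identify(target, database)
    return float(np.mean(predicted == np.arange(len(target))))

# Synthetic data (assumption): trait component shared across days + session noise
rng = np.random.default_rng(1)
n_subjects, n_edges = 126, 5000
trait = rng.standard_normal((n_subjects, n_edges))
day1 = trait + rng.standard_normal((n_subjects, n_edges))
day2 = trait + rng.standard_normal((n_subjects, n_edges))

acc_12 = id_accuracy(target=day2, database=day1)
acc_21 = id_accuracy(target=day1, database=day2)  # roles reversed
chance = 1 / n_subjects                           # about 0.008 for 126 subjects
```

Z-scoring each profile across its edges turns the scaled dot product into a Pearson correlation. As a result, all target-by-database similarities come from a single matrix product. With 126 subjects, chance accuracy is 1/126, or about 0.8%.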

In a first-pass analysis using the whole-brain connectivity matrix, we achieved an average identification accuracy of 93% between the pair of resting-state scans (a highly significant result, as chance is approximately 0.8%). For rest-task and task-task comparisons, accuracy ranged from 54% to 87% (Figure 2). This drop relative to the rest-rest pair is unsurprising: tasks impose the same external stimuli upon all subjects, presumably evoking similar time-locked activity and blurring some of the subject-specific spontaneous activity. Still, good identification accuracy even across rest-task and task-task comparisons indicates that a high ratio of inter- to intraindividual variability is preserved regardless of cognitive state. We also performed several control analyses to confirm that identification power came from true differences in FC above and beyond idiosyncrasies in anatomy, head motion, or other confounding variables.27

Figure 2. Identification accuracies across pairs of rest and task conditions. Color-coded matrix displaying identification accuracy between all 18 possible database-target pairs of rest and task sessions, expressed as the fraction of correctly predicted identities (number of successful trials out of a total of n = 126 subjects). While identification was most successful in the rest-rest condition pair, accuracy remained quite high even across changes in cognitive state induced by different task demands. Note that chance in all cases is approximately 0.008 (1/126, or 0.8%). Em, emotion; ID, identification; Lg, language; Mt, motor; R1, first rest session (day 1); R2, second rest session (day 2); WM, working memory. Adapted from results described in reference 27.

In follow-up analyses, we found that specific networks composed of nodes in the frontal, parietal, and temporal association cortices were the most discriminative: in fact, restricting identification to these features resulted in even higher accuracy (up to 99%) than the whole-brain connectivity matrix. Since much individual variation in both structure and function occurs at the level of these high-order association cortices,28,29 this result is consistent with what we might have predicted a priori. This was our first clue that individual differences in connectivity may relate in meaningful ways to individual differences in cognitive phenotypes.

The implication of this result is that the majority of the variance in FC is accounted for by who you are and not what you are doing while being scanned. That individuals generally look most similar to themselves, regardless of how the brain is engaged during imaging, should at least partially allay concerns about the unconstrained nature of rest and whether rest represents a fundamentally different state for healthy versus psychiatric populations.

A note on task versus rest and effect of mental state

This is not to say that there are no important differences between rest and task. In fact, the combination of rest and task-based connectivity may be more powerful than either on its own for characterizing interindividual differences. Indeed, we found that when the database was expanded to include two entries per subject (one resting-state matrix and one task-based matrix), identification was more successful than with a single-entry database consisting of either rest or task alone,27 reaching 100% accuracy in some cases. Other studies have found that interindividual FC differences are shaped to some extent by the cognitive state in which these differences are measured.30 In some cases, introducing a task manipulation may enhance interindividual variability in connections of interest; we may be able to exploit this to increase sensitivity in the development of biomarkers.31 For an analogy from another field of medicine, think of a glucose tolerance test as a screen for diabetes: administering a glucose challenge under controlled laboratory settings and monitoring the resulting blood glucose levels can often identify abnormalities even before the fasting blood glucose level becomes abnormal.
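One way to implement a multi-entry database is sketched below in Python with NumPy. The text does not specify how the two entries per subject were combined. This sketch assumes the simplest rule: each target takes the identity of whichever single database entry it matches best. The helper names (`pearson_matrix`, `identify_multi`) are hypothetical, and the synthetic data, including the noisier task entries, are illustrative only.

```python
import numpy as np

def pearson_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pearson r between every row of a and every row of b (rows are FC profiles)."""
    za = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    zb = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    return za @ zb.T / a.shape[1]

def identify_multi(target: np.ndarray, database: np.ndarray, owner: np.ndarray) -> np.ndarray:
    """Assumed rule: each target takes the identity of its best-matching database entry.

    owner[m] is the subject index of database row m, so one subject may own several rows.
    """
    best_row = pearson_matrix(target, database).argmax(axis=1)
    return owner[best_row]

rng = np.random.default_rng(2)
n_sub, n_edges = 50, 2000
trait = rng.standard_normal((n_sub, n_edges))                  # stable, subject-specific part
rest_db = trait + rng.standard_normal((n_sub, n_edges))        # resting-state entry
task_db = trait + 1.5 * rng.standard_normal((n_sub, n_edges))  # task entry (noisier by assumption)
target = trait + rng.standard_normal((n_sub, n_edges))

database = np.vstack([rest_db, task_db])                       # two entries per subject
owner = np.concatenate([np.arange(n_sub), np.arange(n_sub)])
predicted = identify_multi(target, database, owner)
accuracy = float(np.mean(predicted == np.arange(n_sub)))
```

Taking the maximum over entries lets whichever of a subject's stored states happens to resemble the target most drive the match. Averaging a subject's entries before matching would be an equally plausible alternative rule.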

Rather than suggest that task effects are irrelevant to FC, this result should put such effects into perspective: tasks seem to induce interesting but ultimately small modulations atop a large bedrock of variance accounted for by the intrinsic FC signature of a given individual. Ultimately, this result is promising for the eventual use of FC-fMRI in personalized approaches to psychiatric illness.

Outstanding questions

Despite the impressive identification power of FC profiles in healthy subjects over a period of days, there are a number of outstanding questions about the reliability of FC profiles under expanded circumstances. For example, how stable are FC profiles over longer time frames—ie, weeks, months, or years? When do they emerge in the course of development? How do they change with normal processes, such as aging, or pathological ones, such as illness onset and trajectory? Answering these questions will require large longitudinal data sets, which are challenging to acquire but represent important next steps in this line of work.

Relating functional connectivity to behavior

Establishing that individuals have unique patterns of FC is important, but to use this result as a springboard in the search for useful biomarkers, these individual differences in FC must be relevant to individual differences in behavior. To explore this, we tested whether FC profiles could be used to predict levels of fluid intelligence (Gf), which is the general ability to think abstractly, discern patterns, and solve new problems independent of learned knowledge.32 This trait is of particular interest because Gf levels vary widely even in the healthy population, and individual differences in Gf are generally stable in rank order over the lifespan,33 suggesting a strong intrinsic component. Furthermore, Gf is significantly correlated with other indicators of cognitive ability34 and quality of life, including health outcomes.35,36

Gf scores from an out-of-scanner behavioral assessment were available for all HCP participants whose data were used in the identification experiments described above. In a cross-validated analysis, we showed that a model based on FC features (ie, the strength of specific functional connections) could predict Gf score in a previously unseen individual solely on the basis of his or her FC profile.27
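The model is not spelled out here. The Python/NumPy sketch below illustrates the general logic of leave-one-out, edge-based prediction under stated assumptions. In the training subjects, it selects edges whose strength correlates with behavior and sums them into a single score. It then fits a line and applies it to the held-out subject. The correlation threshold of 0.2, the positive-minus-negative summary, the helper name `loo_predict`, and the synthetic data with behavior planted in 10 edges are all illustrative choices, not the published parameters.

```python
import numpy as np

def loo_predict(edges: np.ndarray, score: np.ndarray, r_thresh: float = 0.2) -> np.ndarray:
    """Leave-one-out prediction of a behavioral score from FC edges (illustrative).

    edges: (n_subjects, n_edges) FC profiles; score: (n_subjects,) behavioral measure.
    Edge selection and model fitting use only the training subjects in each fold.
    """
    n = len(score)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        x, y = edges[train], score[train]
        # Pearson r between every edge and behavior across training subjects
        xz = (x - x.mean(axis=0)) / x.std(axis=0)
        yz = (y - y.mean()) / y.std()
        r = xz.T @ yz / len(y)
        pos, neg = r > r_thresh, r < -r_thresh
        # One summary feature per subject: positive-edge minus negative-edge strength
        feat = x[:, pos].sum(axis=1) - x[:, neg].sum(axis=1)
        slope, intercept = np.polyfit(feat, y, 1)
        held_out_feat = edges[i, pos].sum() - edges[i, neg].sum()
        preds[i] = slope * held_out_feat + intercept
    return preds

# Synthetic data (assumption): behavior depends on the first 10 edges plus noise
rng = np.random.default_rng(3)
n_sub, n_edges = 200, 1000
edges = rng.standard_normal((n_sub, n_edges))
score = edges[:, :10].sum(axis=1) + rng.standard_normal(n_sub)
pred = loo_predict(edges, score)
r_pred = np.corrcoef(pred, score)[0, 1]  # agreement between predicted and observed scores
```

The key design point is that edge selection happens inside the loop. Selecting edges on the full sample, including the held-out subject, would leak information about that subject's score into its own prediction and inflate accuracy.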

In a second paper,37 we showed that FC profiles also predict individual differences in the ability to sustain attention. Using a set of data from 25 healthy adults scanned as they were doing an attention-taxing continuous performance task, we built a model to predict task performance from either task-based or resting-state FC. After validating the model's performance on unseen subjects within this original data set, we extended the model to an independent data set consisting of resting-state data from children and adolescents scanned in China, provided along with ratings of ADHD symptom severity as part of the ADHD-200 project.38 Subjects for whom the model predicted better hypothetical performance on our sustained attention task tended to have fewer symptoms of attention dysfunction, regardless of whether they had received an official ADHD diagnosis; conversely, those for whom the model predicted worse performance showed more severe attention-deficit symptoms. This indicates that an FC-based model captures variance in attentional abilities that spans populations (healthy adults in New Haven versus children and adolescents in China) and even distinct behavioral measures of attention (performance on a sustained attention task versus clinician-rated ADHD symptom scores). The impressive generalizability of this particular model is an encouraging example in the search for reliable FC-based biomarkers.

The case for dimensional approaches

One important feature of the two analyses described above is that the phenotype of interest was a continuous variable (performance on a cognitive task) rather than a discrete one (ie, diagnosis). Traditional studies contrasting patients and controls suffer from at least two fundamental limitations. First, experimenters often try to maximize homogeneity within each group via careful recruiting practices; although this increases the likelihood of finding a statistically significant difference, it limits the real-world applicability of the results.39 As Kapur et al point out, “clinically, one is rarely taxed with distinguishing a textbook patient from a perfectly healthy individual.”14 Rather, the situations in which psychiatric biomarkers would be most useful are those in which clinicians need to make nuanced distinctions between individuals who appear superficially similar in their clinical presentation. Second, psychiatric diagnosis is a perennially contentious issue, and psychiatry has always lagged behind other fields of medicine in achieving diagnostic consensus based on objective criteria. This lack of a gold standard creates a chicken-and-egg problem for developing brain-based biomarkers in psychiatry.40

Taking a dimensional, rather than categorical, approach to studying brain-behavior relationships can at least partially overcome these limitations (Figure 3). This framework is especially appealing in psychiatry, given that for many experiences and behaviors traditionally considered indicators of psychiatric illness, there is no clear dividing line from normal, “healthy” experiences.41 Of note, conceptualizing neural and behavioral phenotypes as a continuum rather than a dichotomy is at the heart of the National Institute of Mental Health's Research Domain Criteria (RDoC) framework, which eschews traditional diagnoses in favor of understanding the full range of mental functioning.42

Figure 3. Group contrasts versus dimensional approaches. (A) An example of a traditional contrast in an observed brain measurement (eg, strength of a functional connection or network) between patients and controls (n=20 in each group). The difference between group means is significant at α<0.05 according to a two-tailed t-test, but individual data points are highly overlapping. If a new subject is brought in (red circle) with a known brain measurement, this overlap makes it difficult to predict diagnostic status. (B) An example of a dimensional approach, in which a phenotype is objectively measured in subjects both with and without a diagnosis and all subjects are placed on the same axis, revealing a clear association between the brain measurement and phenotypic measurement. The phenotypic variable could be performance on a task, score on self-report or clinician-rated scale, future illness status, response to an intervention, or any other continuous measurement. In contrast to (A), if a new subject is brought in with a known brain measurement (red circle), it is straightforward to generate a phenotype prediction for this subject using the regression model built on the original data set.
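The prediction step in panel B amounts to fitting a regression line and reading off a value for the new subject. A minimal Python/NumPy sketch follows. The data are synthetic, and the slope of 2, the noise level, and the new subject's brain measurement of 0.8 are arbitrary illustrative values rather than anything taken from the figure.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40  # subjects with and without a diagnosis, all on one axis
# Synthetic data: brain measure (e.g., strength of one connection) and a continuous phenotype
brain = rng.normal(0.0, 1.0, n)
phenotype = 2.0 * brain + rng.normal(0.0, 0.5, n)

# Fit the regression model on the original data set
slope, intercept = np.polyfit(brain, phenotype, 1)

# A new subject with a known brain measurement (the "red circle" in panel B)
new_brain = 0.8
predicted_phenotype = slope * new_brain + intercept
```

Unlike the group contrast in panel A, this model returns a graded prediction for any individual, with no diagnostic label required as input.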

Our finding that the same sustained attention network model predicted the severity of attention-deficit symptoms in individuals both with and without a diagnosis of ADHD supports this characterization, suggesting that the same connections that go awry in ADHD are disrupted to a lesser degree in those with subclinical attention problems.37 As another example, consider paranoid delusions, which are a hallmark symptom of schizophrenia and other psychotic illnesses. Among the general population, up to 30% of people report experiencing certain types of paranoid thoughts (eg, “I need to be on my guard against others”) on a regular basis,43 and the degree of paranoia in the population (as measured by number of items endorsed on a paranoia questionnaire) follows an exponential, rather than bimodal, distribution.44 At the neural level, empirical evidence supports trait-level paranoia as a continuum between normality and pathology: neuroimaging studies of subclinical populations have identified patterns of brain activity that vary parametrically with tendency toward paranoid or delusional ideation.45-48 Other dimensional traits that are relevant to psychiatric illness include impulsivity as an index of risk for addiction,49 or rumination for depression.50

Using scales or behaviors appropriate for both clinical and subclinical populations will help place all subjects on the same axis, facilitating analyses that cut across diagnostic categories. In the case of paranoid delusions, for example, rather than classic schizophrenia scales such as the Positive and Negative Syndrome Scale (PANSS),51 experimenters may consider using scales such as the Peters et al Delusions Inventory (PDI),52 which measures general tendency toward delusional ideation on the basis of items that can be meaningfully answered by patients and controls alike (eg, “Do you ever feel as if people are reading your mind?”; “Do you ever feel as if you have been chosen by God in some way?”).

Note that this dimensional approach is not at odds with the possibility that a binary marker of pathology exists. Rather, it is a framework with which to approach the study of individual differences in both health and disease that can afford improved sensitivity and predictive power.42 In fact, understanding the full continuum of mental experience may ultimately help us revisit decisions that are necessarily dichotomous—such as whether or not to treat a given individual—with a more informed cutoff point, or “gold standard,” based at least partially on neuroimaging biomarkers.

Applications to psychiatry: toward a personalized approach

Establishing that FC profiles are both reliable within subjects and unique across subjects, and that features of these profiles relate to behavioral phenotypes, provides a foundation for exploring their potential in personalized approaches to mental illness. Where should we focus our efforts? In this final section, we review potential targets of FC-based prediction that could eventually lead to real-world clinical tools for psychiatry.

Disease status

To date, the majority of fMRI-based psychiatric “prediction” studies have focused on classifying disease status at the time of scan; these are reviewed extensively elsewhere.53 Briefly, these studies use FC-derived features from individual subjects as input to machine-learning algorithms that are trained on a subset of the data and applied to either a held-out sample within the same data set, or occasionally, a separate replication data set, to decode disease status in unseen subjects. Successful classification—with accuracies often in the 70% to 100% range—has been reported in several psychiatric illnesses, including depression,54,55 schizophrenia,31,56 ADHD,57,58 and autism.59,60

While these studies are an important proof of concept, the reported statistics for sensitivity and specificity often exaggerate a study's translational utility, since the data sets usually contain similar numbers of patients and controls, and analyses do not take into account illness prevalence in the real world. For illnesses that are relatively rare on a population level, positive and negative predictive values may be substantially lower than what these high accuracies suggest.53 More fundamentally, classification of current disease state is not true prediction, because the diagnosis is always contemporaneous with the data acquisition. These paradigms therefore suffer from an inevitable circularity, with the fMRI simply serving as a noisier measure of current diagnostic status.
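To make the prevalence point concrete, the following Python sketch applies Bayes' rule to compute positive and negative predictive values from sensitivity, specificity, and prevalence. The 90% sensitivity and specificity and the 1% prevalence are hypothetical numbers chosen for illustration, not values from any study cited here.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given illness prevalence."""
    tp = sensitivity * prevalence              # true positives per person screened
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical classifier evaluated on a balanced sample (prevalence 0.5)
print([round(v, 3) for v in predictive_values(0.9, 0.9, 0.5)])   # [0.9, 0.9]
# The same classifier applied where only 1% of people have the illness
print([round(v, 3) for v in predictive_values(0.9, 0.9, 0.01)])  # [0.083, 0.999]
```

In the low-prevalence case, false positives from the large healthy majority swamp the true positives. Roughly 11 of every 12 positive calls would be wrong, even though sensitivity and specificity are unchanged.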

Risk for illness/illness trajectory

Another potentially more fruitful approach is to collect data before the onset of illness, follow subjects longitudinally, then retrospectively assess baseline differences between those who go on to develop illness and those who do not. This approach is more logistically challenging, because it requires recruiting a large initial cohort under the assumption that only a subset—often a small minority—will go on to become ill. However, these prospective studies are critical to discovering biomarkers with true predictive power.

Several large-scale initiatives of this nature are underway. One example is the North American Prodromal Longitudinal Study (NAPLS),61 which is a US-based multisite study recruiting individuals in the prodromal (earliest) phase of psychosis, as well as those otherwise determined to be at clinical high risk for schizophrenia, along with demographically matched healthy controls. The study involves both resting-state and task-based fMRI, as well as a suite of clinical and behavioral measures, and participants have now been followed up for several years. Initial reports have found some baseline FC differences associated with risk of conversion from the prodrome to full-blown psychosis62; whether these results generalize to previously unseen prodromal individuals remains to be seen.

Another such initiative is IMAGEN (not an acronym), a Europe-based consortium collecting neuroimaging along with genetic, behavioral, and neuropsychological data from a large group of 14-year-old adolescents, with plans for a longitudinal follow-up at 16 years.63 The project is primarily designed to study neural correlates of risk-taking and reinforcement behavior, with an eye toward uncovering risk factors for addiction, as well as other psychiatric illnesses, such as affective and anxiety disorders.

Understanding risk factors and early symptom trajectories could help develop interventions that prevent or slow illness onset and target them to the individuals who would benefit most. The hope is that this early identification and treatment may ultimately reduce the lifetime burden of mental illness for some patients.

Prognosis and response to intervention

Given mounting consensus that traditional psychiatric diagnoses do not reflect valid natural boundaries,64 many contend that we should bypass—or at least substantially de-emphasize—the diagnostic step and skip straight to prognosis based on a patient's current biological, social, and clinical profile.65 Until our understanding of psychopathology permits a restructuring of the diagnostic system based on validated biological factors, one can argue that there is little reason to diagnose for its own sake when the variables of practical concern are course of illness and response to potential interventions, both variables that have classically resisted neat associations with existing diagnostic labels.66,67

This shift in focus affects how we might approach the search for FC-based biomarkers, at least in the short to medium term. Longitudinal studies such as the ones mentioned above may help answer questions about course of illness. However, it is important to remember that for purposes of selecting effective treatments, the relevant interindividual differences may not be in brain networks directly affected by the illness, but rather those networks whose plasticity (or lack thereof) underlies the success or failure of an intervention.68 Given this perspective, searching for biomarkers in baseline FC—meaning FC measured before the intervention begins—could prove useful.

Existing reports give reason to be optimistic about this approach. For example, it has been reported that baseline connectivity between hippocampus and other cortical and subcortical regions predicts response to math tutoring in children,69 and baseline connectivity between the orbitofrontal cortex and the rest of the brain predicts response to neurofeedback for treatment of symptoms of obsessive-compulsive disorder.70 These small studies require replication before firm conclusions can be drawn, and we must tread carefully given the ethical implications of stratifying patients in a health care system constrained by limited resources. Still, any additional information to help guide treatment choices will benefit psychiatric practice, which currently relies largely on trial and error.

Conclusion

The discovery that individual FC profiles are both unique and reliable compels a move away from group-level contrasts between classic diagnostic groups in favor of studies that leverage the considerable heterogeneity in both groups to study individual differences, with an eye toward developing biomarkers that will be useful at the single-subject level. Given large enough, well-characterized databases with longitudinal information, one might imagine a future in which a new patient presents and their FC profile, along with other demographic, behavioral, and clinical variables, is compared with similar existing profiles to help predict the likelihood of various health outcomes and guide treatment decisions. Establishing the validity of individual FC profiles, as well as their meaningful relationship to behavior, provides a crucial foundation for future studies to continue exploring the potential of FC-based tools in personalized medicine.