Introduction

Over the past 50 years, clinical psychiatry has been revolutionized by the development of a broad range of psychopharmaceutical agents that can effectively ameliorate symptoms of many major mental illnesses, including schizophrenia, mood disorders, and anxiety disorders. Despite these advances, no curative treatments have yet been developed, so patients must generally take medications for a long period of time—often years—creating a risk for long-term adverse events.1 Although most currently used psychopharmaceutical agents are safe and well-tolerated, side effects can be problematic even with acute use, leading to reduced treatment adherence and poorer outcomes.2 Moreover, response rates for most psychopharmaceutical agents are not optimal, and it can take months for clinicians to find the best medication or combination treatment for individual patients. Even less is known about predictors of response to vital nonpharmacological treatments such as psychotherapy.

Improved prediction of treatment response could have many benefits for patients and reduce the health care costs that accrue from the use of inappropriate or suboptimal treatments. This is a key goal of the personalized medicine movement.3 However, prediction is a challenging endeavor, since many different variables can affect individual treatment outcomes. Among these, diagnosis, comorbidity, and treatment adherence are clearly important. In recent years, considerable effort has gone into developing biomarkers of treatment response that might supplement clinical factors. Genetic markers have received the most attention thanks to the exciting advances in pharmacogenomic research.4

Pharmacogenomics aims to reduce acute and long-term adverse events and optimize response rates by using genetic information to match medications to individual patients.5 Pharmacogenomic research has advanced greatly in recent years, owing to the wealth of data and new technologies that have arisen from the Human Genome Project and related initiatives. However, firm genetic associations have proven to be the exception rather than the rule, perhaps owing to limited sample sizes. A few robust research findings have emerged, but the translation of these findings into better clinical care remains challenging. Other biomarkers are being sought by a variety of methods, including neuroimaging, proteomics, and metabolomics. These areas are still relatively unexplored in psychiatry as sources of information that could inform prediction of treatment outcomes. New advances are expected in the near future that could alter the impact of these kinds of biomarkers in psychiatry.

In this review, we summarize the current state of treatment prediction research in psychiatry, highlighting areas where research findings have the best potential for near-term translation to clinical practice. This is not a systematic review of the literature, but rather an expert review that summarizes key findings for a nonspecialist audience.

We begin with an overview of the science of prediction, highlighting some recent developments in multivariate and actuarial methods of relevance to clinical medicine. Next, we discuss some of the typical end points that clinicians seek to predict, along with optimal study designs. We then address the ways in which treatment outcomes can be measured in psychiatry, where improvement can seem subjective and difficult to operationalize. After a review of some well-studied predictive factors, including diagnosis, comorbidity, and genetic and other biomarkers, we conclude with a view to future directions for research and practice.

Prediction science

Prediction of treatment outcomes is a specific case of the broader and very vibrant field of prediction science.6 Fundamentally, a prediction is a statement about the way things will be in the future. Good predictions are based on experience and data, but these can be noisy sources of information. The best predictions arise from statistical methods that fit precise data to valid models describing how the relevant variables contribute to the outcome. Experience tells us that an apple dropped from a tree will fall to the ground; Newtonian mechanics offers a model that uses the precise dropping height, the gravitational acceleration, and the apple's mass to predict the speed and energy with which the apple will strike the ground.
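
The apple example can be made concrete with the textbook free-fall model. The sketch below is a minimal illustration that assumes no air resistance and a standard gravitational acceleration of 9.81 m/s²; the function name `impact_speed_and_energy` is hypothetical. Given the same inputs, this model returns the same answer every time, which is precisely what treatment outcome models cannot do.

```python
import math

# Standard gravitational acceleration near Earth's surface (m/s^2); an assumed constant.
G = 9.81


def impact_speed_and_energy(height_m: float, mass_kg: float, g: float = G) -> tuple[float, float]:
    """Deterministic free-fall model with air resistance ignored.

    Speed at impact:          v = sqrt(2 * g * h)
    Kinetic energy at impact: E = m * g * h  (equal to 0.5 * m * v**2)
    """
    if height_m < 0 or mass_kg <= 0:
        raise ValueError("height must be non-negative and mass positive")
    speed = math.sqrt(2 * g * height_m)
    energy = mass_kg * g * height_m
    return speed, energy


# A 0.1 kg apple dropped from 5 m strikes the ground at roughly 9.9 m/s with about 4.9 J.
speed, energy = impact_speed_and_energy(5.0, 0.1)
print(f"speed = {speed:.1f} m/s, energy = {energy:.1f} J")
```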

This example highlights several challenges that make the prediction of treatment outcomes more complex than the prediction of an apple's descent: What are the relevant variables? How can they be measured with precision? What model best fits the data? The impact of stochastic (random) variation must also be considered.7 Stochastic variation makes it more difficult to predict the outcome for a specific individual than to predict the average outcome of a group.

Consider the case of life insurance. Life insurance works because insurers use large sets of actuarial data to predict, with considerable accuracy, how long groups of individuals with particular lifestyles, health histories, and demographic characteristics will live, and to set premium rates accordingly. In theory, similar principles apply to prediction in medicine, but amassing the required data in sufficient sample sizes can be challenging. The advent of electronic medical records may improve this situation,8 especially in societies where medical care is organized in such a way that all outcomes can be tracked over time without significant attrition. On the other hand, prediction algorithms based on large samples do not help much in predicting an individual outcome, which is what matters most to an individual patient. Insurance companies cannot predict exactly who will survive and who will not, only group outcomes. Similarly, standard medical approaches to predicting mortality generally perform modestly, at best.9
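
A small simulation illustrates why actuarial prediction works for groups but not for individuals. In this sketch, the 2% risk and the cohort size of 10,000 are hypothetical, and every simulated person is assumed to share the same risk. The observed event rate in each cohort lands close to 2%, with a standard error of about 0.14 percentage points, yet each individual outcome is still either 0 or 1, and the model cannot say which person will have the event.

```python
import math
import random


def simulate_cohort(n: int, p_event: float, seed: int) -> list[bool]:
    """Simulate n individuals who each experience the event with probability p_event."""
    rng = random.Random(seed)
    return [rng.random() < p_event for _ in range(n)]


def group_standard_error(n: int, p_event: float) -> float:
    """Standard error of the observed event rate in a group of n individuals."""
    return math.sqrt(p_event * (1 - p_event) / n)


def individual_standard_deviation(p_event: float) -> float:
    """Standard deviation of a single individual's 0/1 outcome."""
    return math.sqrt(p_event * (1 - p_event))


p = 0.02    # hypothetical risk shared by everyone in the risk class
n = 10_000  # hypothetical cohort size
rates = [sum(simulate_cohort(n, p, seed)) / n for seed in range(5)]
print("observed group rates:", [round(r, 4) for r in rates])
print("group SE:", round(group_standard_error(n, p), 4))            # about 0.0014
print("individual SD:", round(individual_standard_deviation(p), 2))  # 0.14
```

The spread of a single person's outcome (SD 0.14) is 100 times larger than the uncertainty in the group rate (SE 0.0014), because the group standard error shrinks with the square root of the cohort size while individual variability does not.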

In the best of worlds, predictions for a specific individual could be based on the right combination of clinical findings, biomarker measures, and genetic information. These data could be used to place a given patient in a subclass with well-studied outcomes. Breast cancer offers a good example, where we can already tailor treatment and make a good prognosis on the basis of family history, clinical staging, and expression of estrogen receptors by tumor cells.10 Of course, the brain is a more complex organ than breast, liver, or kidney, but the same principles of prediction can be applied to psychiatric treatment outcomes.

Measurement of prediction in medicine

In medicine, the predictive value of a test depends on three main factors.11 The first, and most important, is the likelihood of the outcome of interest in the population from which the tested individual is drawn, also known as the prior probability. If the prior probability is low, even the best tests will return many false positives. The second factor is test sensitivity, which expresses how good the test is at detecting the outcome, that is, the probability of a positive test in someone who will show the outcome of interest. The third is test specificity, which expresses how good the test is at correctly excluding the outcome, that is, the probability of a negative test in someone who will not show the outcome of interest.
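
The effect of a low prior probability is easiest to see by counting expected test results in a population. The sketch below applies the same hypothetical test (sensitivity and specificity both 90%) to 10,000 people at two hypothetical prior probabilities; the function names are illustrative only.

```python
def expected_counts(population: int, prior: float, sensitivity: float, specificity: float) -> dict[str, float]:
    """Expected true/false positive and negative counts when a test is applied to a population."""
    with_outcome = population * prior
    without_outcome = population - with_outcome
    tp = sensitivity * with_outcome        # outcome present, test positive
    fn = with_outcome - tp                 # outcome present, test negative
    fp = (1 - specificity) * without_outcome  # outcome absent, test positive
    tn = without_outcome - fp              # outcome absent, test negative
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


def positive_predictive_value(counts: dict[str, float]) -> float:
    """Share of positive tests that are true positives."""
    return counts["tp"] / (counts["tp"] + counts["fp"])


# Same test, two different prior probabilities.
for prior in (0.01, 0.30):
    counts = expected_counts(10_000, prior, 0.90, 0.90)
    print(f"prior={prior:.2f}  TP={counts['tp']:.0f}  FP={counts['fp']:.0f}  "
          f"PPV={positive_predictive_value(counts):.2f}")
```

With a 1% prior, about 990 of the 1,080 positive results are false (a PPV of about 0.08); with a 30% prior, the identical test yields a PPV of about 0.79.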

Sensitivity and specificity tend to vary inversely as the test threshold is changed, so the goal is often to find the threshold that best balances the two. Performance across all thresholds can be summarized as the area under the curve (AUC) of a receiver operating characteristic (ROC) curve, in which sensitivity is plotted against 1 minus specificity (Figure 1). Clinically useful tests usually have AUC values over 80%. In practice, however, it is often more useful to know the positive predictive value (PPV) of a test, which expresses the probability that a person with a positive test will actually develop the outcome of interest:

PPV = true positives/(true positives + false positives)
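
The ROC quantities above can be computed directly from test values and observed outcomes. The sketch below uses made-up scores for eight hypothetical patients. It computes the AUC through its rank interpretation (the probability that a randomly chosen person with the outcome scores higher than a randomly chosen person without it, with ties counted as one half), picks a threshold using Youden's J (sensitivity + specificity - 1), which is one common rule among several and is not prescribed by the text, and then applies the PPV formula at that threshold.

```python
def confusion(scores, labels, threshold):
    """Count (TP, FP, TN, FN) when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        if s >= threshold:
            if y:
                tp += 1
            else:
                fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1; ties keep the lowest threshold."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]  # hypothetical test values
labels = [0, 0, 1, 0, 1, 0, 1, 1]                          # 1 = developed the outcome
t, j = youden_threshold(scores, labels)
tp, fp, tn, fn = confusion(scores, labels, t)
print(f"AUC={auc(scores, labels):.3f}  threshold={t}  J={j:.2f}  PPV={tp / (tp + fp):.2f}")
```

Here the AUC is 13/16 (about 0.81), and the chosen threshold of 0.35 gives a sensitivity of 100%, a specificity of 50%, and a PPV of 4/6. Several thresholds tie on J in this toy example; other threshold rules, such as requiring a minimum sensitivity, may suit particular clinical uses better.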

Related measures attempt to address the clinical utility of a test: does the test provide unique information of sufficient impact to affect clinical decision-making?14 One of the best-known such measures is the number needed to screen (NNS), which estimates the number of patients who would need to be tested for every true positive detected.15 There is no hard rule for an acceptable NNS: it may make sense to screen a large number of people for a rare but serious outcome, whereas a less serious outcome might call for a smaller NNS.
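
Under the definition given here (patients tested per true positive detected), NNS is the reciprocal of the proportion of tested patients who turn out to be true positives, that is, prior probability × sensitivity. The sketch below assumes that definition; NNS definitions used in screening trials, which are based on absolute risk reduction, would give different numbers. The prevalence values are hypothetical.

```python
import math


def number_needed_to_screen(prior: float, sensitivity: float) -> int:
    """Patients tested per true positive detected: 1 / (prior * sensitivity), rounded up."""
    if not (0 < prior <= 1 and 0 < sensitivity <= 1):
        raise ValueError("prior and sensitivity must be in (0, 1]")
    return math.ceil(1 / (prior * sensitivity))


# A rare outcome (1 in 100) versus a common one (1 in 5), both with 90% sensitivity.
print(number_needed_to_screen(0.01, 0.90))  # 112
print(number_needed_to_screen(0.20, 0.90))  # 6
```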

Figure 1. Left: The receiver operating characteristic (ROC) curve.12 Right: Example of a good diagnostic test, serum troponin T as a predictor of acute myocardial infarction.13 Se, sensitivity; Sp, specificity. Reproduced from references 12 and 13: Receiver Operating Characteristic Curve. Available at: http://www.adscience.eu/uploads/ckfiles/files/html_files/StatEL/statel_ROC_curve.htm. Accessed October 2014. Copyright © Adscience 2014; Aldous SJ, Richards M, Cullen L, Troughton R, Than M. Diagnostic and prognostic utility of early measurement with high-sensitivity troponin T assay in patients presenting with chest pain. Can Med Assoc J. 2012;184:e260-e268. Copyright © Canadian Medical Association 2012.

Typical end points

Treatment outcomes can cover a range of end points, including symptom reduction, recurrence, relapse, and adverse events. Symptom reduction is a key goal, but as an end point in psychiatry it is often complicated by spontaneous remissions, placebo effects, and other factors beyond simple response to treatment. Major depression is a good example, where placebo effects are often very prominent.16 Recurrence and relapse are usually distinguished by the duration (and completeness) of symptom reduction, both of which can be difficult to rate reliably, especially in retrospect. Adverse events are, by definition, undesirable outcomes, but they can be important end points when they affect adherence (such as sexual dysfunction) or pose a serious health risk (such as agranulocytosis). Distinctive adverse events are often good targets for biomarker studies, although ascertainment of rare or delayed adverse events can be a challenge.

In designing a treatment outcomes study, it is important to consider the expected frequency of the outcome(s) of interest. Acute symptom reduction is best studied with a prospective, randomized, double-blind, repeated measures design.17 Uncommon or rare adverse events cannot usually be studied in this way, since it would be impractical to prospectively follow the large group of patients needed to accumulate a sufficient sample of adverse outcomes. Such end points are better studied with retrospective case-control designs, in which patients who have experienced the adverse outcome of interest are compared with those who have not.18 The best case-control designs carefully match each case with one or more controls chosen to minimize bias and potential confounding. Variables that cannot be matched in this way can often be handled through statistical tools such as regression, but the protection against confounding is not as secure.19
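
The arithmetic behind this design choice is simple: an adverse event affecting 1 in 1,000 treated patients would require following roughly 100,000 patients prospectively to accumulate 100 cases. A case-control study instead samples cases and controls directly and estimates the odds ratio for an exposure (for example, carrying a candidate marker). The sketch below uses hypothetical counts and a Woolf (log-scale) 95% confidence interval; it ignores the matching, whereas matched designs are usually analyzed with conditional methods.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio from an unmatched case-control 2x2 table with a Woolf confidence interval."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 100 patients with the adverse event (30 exposed),
# 200 controls without it (15 exposed).
or_, lo, hi = odds_ratio_with_ci(30, 70, 15, 185)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```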

Replication samples are always important, especially when a retrospective case-control design is employed18 or when data-mining approaches are used to identify potential predictors. Failure to build in a sufficient replication sample can lead to biases such as overfitting, the Winner's Curse, and false negative replication. Overfitting refers to the tendency to accumulate spurious predictors when multivariate models are fit to a single dataset.20 The Winner's Curse, well known in genetic studies, describes the overestimation of effect sizes in an initial sample, especially when many hypotheses are tested, as in a genome-wide association study.21 Several good methods can correct for the Winner's Curse, but study of an independent sample is usually needed.22,23 False negative replication is the risk of failing to confirm a true finding because the replication sample was too small or too different from the initial sample to support a valid test of the null hypothesis.24 To reduce this risk, replication samples should be larger than initial samples and should ideally be ascertained using identical methods.
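
A short simulation shows the Winner's Curse and why an independent sample corrects it. In this illustrative sketch (all parameter values hypothetical), 1,000 markers share the same small true effect of 0.10, and each estimate is the true effect plus sampling noise. Picking the 10 largest estimates in the discovery sample selects markers whose noise happened to be favorable, so their discovery estimates are inflated; re-estimated in a fresh sample, the same markers fall back toward the true value. Selecting the "best" of many candidate predictors within one dataset is also the mechanism behind overfitting.

```python
import random
import statistics


def winners_curse_demo(n_markers=1000, true_effect=0.1, se=0.1, top_k=10, seed=1):
    """Select top markers in a discovery sample and re-estimate them in a replication sample.

    Every marker has the same true effect; each estimate adds independent Gaussian noise.
    Returns (mean discovery estimate, mean replication estimate) for the selected markers.
    """
    rng = random.Random(seed)
    discovery = [true_effect + rng.gauss(0, se) for _ in range(n_markers)]
    replication = [true_effect + rng.gauss(0, se) for _ in range(n_markers)]
    top = sorted(range(n_markers), key=lambda i: discovery[i], reverse=True)[:top_k]
    discovery_mean = statistics.mean([discovery[i] for i in top])
    replication_mean = statistics.mean([replication[i] for i in top])
    return discovery_mean, replication_mean


disc, rep = winners_curse_demo()
print(f"true effect 0.10 | top-10 discovery estimate {disc:.2f} | same markers in replication {rep:.2f}")
```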

Measurement of treatment outcomes

Treatment outcome research in psychiatry has a long history rooted in observational studies and randomized clinical trials. These kinds of studies have influenced the ways in which treatment outcomes are measured by developing widely used and highly reliable measures of symptoms and impairment.25 Table I lists some examples of commonly used instruments. These instruments are often best-suited for acute treatment outcomes. Fewer well-validated instruments exist for the measurement of longer-term outcomes, especially for episodic disorders or those whose primary symptoms change over time. Rare adverse events that emerge after years of treatment are particularly difficult to detect and characterize.26

Despite these limitations, it is now possible to measure treatment outcomes for most major psychiatric illnesses with a high degree of precision and reliability. There is a rich literature on measurement of mood symptoms,27 psychotic symptoms,28 anxiety,29 and cognitive impairment.30,31 There are also good instruments for measuring symptom exacerbation,25 episode recurrence, and relapse over time and their relationship to treatment interventions.32-34 Global measures of social or occupational impairment have also been developed that offer a perspective on illness outcomes that crosses diagnostic boundaries.35

Table I. Examples of commonly used instruments for measuring treatment outcomes
Mood symptoms: Hamilton Depression Rating Scale (HDRS); Beck Depression Inventory (BDI); Montgomery-Åsberg Depression Rating Scale (MADRS); Young Mania Rating Scale (YMRS)
Psychotic symptoms: Scales for the Assessment of Negative and Positive Symptoms (SANS-SAPS); Positive and Negative Syndrome Scale (PANSS)
Anxiety: Beck Anxiety Inventory; Hamilton Anxiety Scale
Cognitive impairment: Mini-Mental State Examination (MMSE); Trail Making Test; Wechsler Adult Intelligence Scale (WAIS)
General impairment and disability: Global Assessment of Functioning (GAF); Hopkins Symptom Checklist (HSCL-90); Brief Psychiatric Rating Scale (BPRS); Present State Examination (PSE)
Longer-term outcomes: Retrospective Assessment of Lithium Response (Alda Scale)

The best studied predictive factors

Diagnosis and clinical features

Considerable research, reaching back into the last century, has been focused on using diagnosis as a predictor of treatment outcome. Attempts have been made to delineate clinical forms of schizophrenia that are more responsive to antipsychotics,36,37 lithium-responsive forms of bipolar disorder,38,39 and treatment-resistant forms of depression.40-43

There has been some progress in this approach. For example, early age at onset is a consistent predictor of greater symptom severity in mood disorders, although not necessarily of treatment outcome.43-46 Symptom severity is another well-studied predictor, but its predictive value varies by diagnosis. Major depression is more responsive to medications when symptoms are severe.47 In contrast, severe schizophrenia is less responsive to treatment.48

The major limitation of predictors based on diagnosis and clinical features is their dependence on clinical assessments that may be inaccurate, imprecise, or unstable over time. Diagnosis may also be too distal from the biological processes that presumably underlie various outcomes. New diagnostic systems such as the Research Domain Criteria (RDoC) seek to address this problem by focusing on dimensions of observable behavior and neurobiological measures.49

Based on this idea, much recent research has focused on cognitive symptoms in schizophrenia as prognostic indicators and as treatment targets in themselves.50-52 Widely used antipsychotics that control positive symptoms such as hallucinations do not treat—and may actually exacerbate—many of the deficits in working memory, social cognition, and executive functioning that contribute to the disability associated with schizophrenia.53 Psychotherapeutic and pharmacologic interventions that effectively target these cognitive symptoms could thus contribute to better functional outcomes in patients suffering from schizophrenia and other psychotic disorders.

Treatment adherence

Treatment adherence is an obvious but sometimes overlooked factor that can play a major role in treatment outcome.2 In research settings, treatment adherence can be monitored with blood levels, pill counts, or direct supervision. In clinical settings these measures may be difficult or impossible to implement. One consistent predictor of good treatment compliance is a subjective sense of positive regard for the treating clinician,36,54 which emphasizes the importance of the doctor-patient relationship in predicting treatment outcomes.

Treatment adherence is also related to adverse drug events. For example, in the STAR*D study of outpatients with major depression, reported adherence was lowest in those with the highest perceived side-effect burden.54 A similar phenomenon was observed in the CATIE study of outpatients with schizophrenia.28 These data suggest that efforts aimed at reducing perceived side effects could lead to better outcomes mediated by better treatment adherence.

Comorbidity

Among the general factors that affect treatment outcomes, comorbidity looms large, particularly in psychiatry. As a general rule, the more comorbidity, the poorer the outcome. This rule applies not only to psychiatric disorders, which are typically comorbid, but also to nonpsychiatric illnesses, especially chronic conditions such as heart disease,55 kidney disease,56 and diabetes.57 Comorbid anxiety disorders are one well-known risk factor for poor treatment outcomes in major depression.54,58 Comorbid substance-use disorders (SUDs) are another major predictor of poor treatment outcomes. SUDs directly interfere with treatment response to antidepressants and anti-anxiety agents. SUDs are also associated with poorer treatment adherence, greater complaints of treatment-emergent adverse events, and more medical complications, all of which correlate with poorer outcomes.59,60

Genetic markers

All biomarkers represent correlations between an observation (test value) and an outcome. Correlation does not necessarily mean causation, since the observation may be a consequence of the outcome rather than its cause (reverse causation), or may be correlated with some unobserved causal factor. Genetic markers can be especially useful as biomarkers because they are not subject to the reverse causation problem: the value of a genetic marker (ie, a base pair or set of base pairs in the inherited DNA sequence) cannot be altered by a later health outcome.

Genetic markers may still represent correlated, noncausal events, however. Genetic markers detected in a genome-wide association study (GWAS) are a good example. In most cases, the markers do not themselves represent a genetic variation with a direct biological impact, but are instead associated with one or more functional changes in the coding or regulatory sequence of a nearby gene. This fact also means that genetic markers emerging from GWAS are often difficult to follow up biologically, even when the markers themselves may reflect a valid association with a clinical outcome.

Occasionally, individual genetic markers may have predictive value for outcomes that reflect a major impact of a single gene, such as a Mendelian disease or a rare adverse event heavily influenced by only a few genes. Stevens-Johnson syndrome in patients exposed to carbamazepine is a good example.61 More often, the causal architecture of treatment outcomes is the result of a complex mixture of nongenetic factors and several different genes. Recent methods that take into account large sets of common genetic markers (hundreds to millions) show some promise for increased predictive value, but even in the best-case scenarios AUC values have tended to top out around 65%, too low for clinical utility.62 More complex models that consider large sets of genetic markers along with clinical predictors could have greater predictive value,63 but such models have so far been little studied.
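
Methods that combine many common markers are often implemented as a weighted sum of allele counts, commonly called a polygenic score. The sketch below is a generic illustration, not the method of the cited studies: marker names, weights, and genotypes are hypothetical, and it assumes complete genotypes coded as 0, 1, or 2 copies of the effect allele, with per-allele weights (for example, log odds ratios) taken from an independent discovery study. The resulting score would then need to be evaluated as a predictor, for instance by its AUC, in a separate sample.

```python
def polygenic_score(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of effect-allele counts across markers.

    dosages: marker ID -> number of effect alleles carried (0, 1, or 2)
    weights: marker ID -> per-allele effect size from a discovery study
    """
    missing = set(weights) - set(dosages)
    if missing:
        # This sketch assumes complete genotypes; real pipelines impute or otherwise handle gaps.
        raise KeyError(f"no genotype for markers: {sorted(missing)}")
    return sum(beta * dosages[marker] for marker, beta in weights.items())


# Hypothetical weights and two hypothetical patients' genotypes.
weights = {"marker_1": 0.20, "marker_2": -0.10, "marker_3": 0.05}
patient_a = {"marker_1": 2, "marker_2": 1, "marker_3": 0}
patient_b = {"marker_1": 0, "marker_2": 2, "marker_3": 2}
print(polygenic_score(patient_a, weights), polygenic_score(patient_b, weights))
```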

There has been considerable interest recently in approaches that go beyond common genetic variation to encompass rare variants that may have a large impact on health in individuals. The personal genome movement represents a fresh opportunity to use genetic information in the prediction of individual treatment outcomes. Personal genome analysis uses extensive bioinformatic annotation of an individual's genome to generate probabilistic statements about disease risks and treatment outcomes.64,65 For this kind of analysis, sequencing may focus on the protein-coding portion of the genome (the “exome”), but whole-genome sequencing is preferable, since it also captures noncoding and regulatory variation that may have an impact on disease.66

From the “personal genomics” perspective, individual health outcomes are influenced uniquely by each person's total genetic endowment, most of which represents rare genetic variants that can only be discovered by high-throughput (whole exome or whole genome) sequencing.64 While it is undeniable that each person is genetically unique, personal genomics needs to tackle some problems with inference before it can establish itself as a useful tool in medicine. If each person is unique, then how can we make predictions that are statistically valid?67 How can we compare disease and treatment outcomes between groups? Can we practically amass the very large sample sizes needed to assess the impact of rare variation?68 While there are still many methodological and ethical issues that will need to be addressed,1 personal genome analysis will grow more and more powerful as the worldwide database of human genome sequence data and health outcomes grows.69
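
The sample-size question can be made concrete with simple arithmetic. The sketch below assumes Hardy-Weinberg proportions and a hypothetical target of 30 carriers; it only counts how many people must be sequenced to observe that many carriers on average. Detecting an association with adequate statistical power would require considerably more.

```python
import math


def carrier_probability(allele_freq: float) -> float:
    """Probability that a person carries at least one copy of the allele (Hardy-Weinberg assumed)."""
    return 1 - (1 - allele_freq) ** 2


def sample_size_for_carriers(target_carriers: int, allele_freq: float) -> int:
    """People to sequence so that, on average, target_carriers carriers are observed."""
    return math.ceil(target_carriers / carrier_probability(allele_freq))


for freq in (0.01, 0.001, 0.0001):
    print(f"allele frequency {freq}: sequence ~{sample_size_for_carriers(30, freq):,} people for 30 carriers")
```

For rare alleles the carrier probability is close to twice the allele frequency, so each tenfold drop in frequency requires roughly ten times as many sequenced individuals.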

Other biomarkers

Other biomarkers that are the subject of ongoing research include measures of gene expression,70 neuroimaging measures,71-74 circulating inflammatory factors,75 electroencephalographic measures,76,77 and metabolomics. Some interesting findings have also emerged in the field of proteomics, which go beyond measures of individual proteins to assess widespread patterns of protein expression.78

Most of these approaches are severely limited by the inaccessibility of the brain in living patients. Even this barrier may fall in the future. New technologies such as induced pluripotent stem cells (iPSc) allow scientists to “reprogram” peripheral tissues (such as skin) to differentiate into neurons and glia. The cells carry the same genome as the patient from whom they were collected, although epigenetic marks such as DNA methylation are largely lost. While challenging and expensive, iPSc methodology has already shown promise as a means of developing cellular models of mental illness that exhibit defects in neuronal development and synapse formation in cells derived from patients with schizophrenia.79,80 In the future, such models may be one way to design drug treatment regimens specifically tailored to an individual patient's genetics and cellular pathology.

Although promising, none of these biomarkers have been studied in sufficiently large samples to support firm conclusions. Integration of predictors across clinical and biological domains is another area that has so far been little explored,40 but is of growing interest as the multifactorial nature of treatment outcome becomes increasingly clear.

Other predictive factors

Other factors that have been shown to influence treatment outcomes include socioeconomic status, race, and adverse life events.81,82 Since these variables tend to be correlated in many populations, it is difficult to disentangle causes from consequences. One approach uses genome-wide data to separate genetic ancestry from race, which can be especially useful in populations of mixed ancestry, where genetic ancestry and race are not tightly correlated.83 One major disadvantage of these variables is that they are often not considered modifiable risk factors for poor treatment outcomes, so they will always have limited clinical utility.

Recent literature on response to a variety of psychotropic treatments has highlighted the importance of early symptomatic improvement. Although published studies vary in their definition of early response, several have suggested that a reduction in symptoms within the first week of drug exposure strongly predicts response after many weeks of therapy. In patients with unipolar major depression, sensitivities and specificities in the range of 80% have been reported.84 For patients with bipolar depression, specificity is low: lack of early improvement predicts poor response, but the presence of early improvement does not reliably predict good response.85 Interestingly, this finding seems to hold for both pharmacological and psychotherapeutic (cognitive-behavioral) modalities86 and has been linked to changes in brain-derived neurotrophic factor (BDNF) in one study.87 Similar results have been reported for mania, schizophrenia, and ADHD.88-90 These findings may challenge the established view that response to psychiatric treatments cannot be accurately gauged until several weeks have passed. If supported in future studies, early improvement could prove to be a valuable clinical predictor of response: patients who fail to show any response within the first week of treatment may benefit from a change.
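
The early-improvement rule, and the asymmetry described for bipolar depression, can be sketched as follows. The 20% reduction threshold and the 2×2 counts are hypothetical placeholders (published definitions of early response vary); the counts are chosen to mimic high sensitivity with low specificity, a pattern in which the absence of early improvement is informative but its presence is not.

```python
def early_improvement(baseline_score: float, week1_score: float, min_reduction: float = 0.20) -> bool:
    """True if the rating-scale score fell by at least min_reduction (a fraction) by week 1."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline_score - week1_score) / baseline_score >= min_reduction


def predictive_summary(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Treat early improvement as the 'test' and eventual response as the 'outcome'."""
    return {
        "sensitivity": tp / (tp + fn),  # responders who improved early
        "specificity": tn / (tn + fp),  # nonresponders who did not improve early
        "ppv": tp / (tp + fp),          # early improvers who go on to respond
        "npv": tn / (tn + fn),          # non-improvers who go on not to respond
    }


print(early_improvement(24, 18), early_improvement(20, 17))  # True False
# Hypothetical: 50 responders (45 improved early), 50 nonresponders (30 improved early).
print(predictive_summary(tp=45, fp=30, fn=5, tn=20))
```

In this example sensitivity is 0.90 but specificity only 0.40: a patient without early improvement has an 80% chance of not responding (the negative predictive value), whereas early improvement raises the chance of response only to 60% (the PPV).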

The recent focus on genetics has also revived interest in the family history as a predictive tool.91 Families can be uniquely informative since they provide a “natural laboratory” in which genetics and environment come together in a manner that is not seen in other groupings of people.92 While everyone in a family is genetically distinct, relatives share substantial fractions of rare and common genetic variation in a context of shared environment and life experiences. Thus we should not be surprised that family history is still a much better predictor of common health and treatment outcomes than any available genetic tests.93,94 Just one example: family history is still the best predictor of response to lithium in bipolar disorder.38,95 This renewed interest in family history has also led to proposals for better, more efficient collection of family history data in research and clinical contexts.96,97

Summary and future directions

Prediction of treatment outcomes remains a significant challenge for psychiatry, as it does for the rest of medicine. While there has been considerable progress in recent years, major barriers to progress remain. Inadequate sample sizes are a significant problem, since psychiatric diagnostic categories remain quite heterogeneous, with little assurance that similarly diagnosed patients share similar etiological and pathophysiological factors. We still have a poor understanding of how events that can be described at the level of genes or neurons emerge as complex mental states characterized by disturbances in emotions, cognitions, and behaviors. And as in other fields of medicine, treatment outcomes will continue to be influenced by individual choices and disparities in social, economic, and educational opportunities that are largely beyond the control of clinicians.

We now have a solid foundation in prediction science, outcomes measurement, and genomics on which future research can build. The best predictions will need to take into account clinical as well as biomarker data. Genetic markers show special promise in this regard, but we need more research into the clinical utility of genetic information in individual treatment decisions. Integrative approaches that, like the family history, can jointly model genetic and environmental influences stand the best chance of generating clinically relevant predictions.

How far can we go in predicting treatment outcomes in psychiatry? As data accumulate, we should be able to move well beyond the largely trial-and-error approach on which most psychiatrists still rely. Even the best predictions are ultimately limited by the great extent of human variation, without which, to paraphrase William Osler, medicine would be more of a science and less of an art.