|dc.description.abstract||Measuring gene expression, or the genome-wide levels of mRNA molecules in a human cell, provides a rich source of information about the functional pathways and processes active or dysregulated in the cell. Gene expression is context-specific: expression levels of the approximately 20,000 human protein-coding genes can vary widely between cells from different tissue types and can also change under disease conditions or in response to experimental conditions, such as drug treatment. However, while quantifying per-gene expression changes can be useful for identifying individual genes associated with the biological condition being studied, the resulting large gene list is limited in its ability to provide meaningful insights into the underlying biology. One of the most widely used bioinformatics approaches for interpreting coordinate, pathway-level changes in differentially expressed gene lists is gene set enrichment analysis (GSEA). GSEA aggregates individual gene-level statistics up to the broader scope of a gene set, or group of coordinately regulated and typically functionally similar genes. For an experiment where fewer than seven samples per condition are being compared, the GSEA “Preranked” algorithm generates a gene set enrichment score representing the degree to which genes in a given pathway are overrepresented at either end of a user-supplied ranked list. The enrichment score is tested for significance using a null distribution of enrichment scores generated from permuted gene sets, wherein genes are randomly selected from the input experiment. Thus, the GSEA Preranked statistical test assumes that the null distribution of ranks for every gene is uniform. In pharmacological studies, GSEA Preranked is frequently used to identify biological pathways and processes that change with drug treatment, for example, by comparing drug-treated cells and vehicle-treated control cells. 
To investigate gene rank distributions in the context of different tissues and drugs, we curated a compendium of 5,718 publicly available gene expression experiments employing cell lines from 23 tissues treated with 231 small molecule compounds. We modeled individual gene ranks across experiments and also analyzed all experiments with GSEA Preranked. We found that the gene rank distribution was not uniform: ranks were influenced by the drug and tissue type being studied, and experiments using the same drug or tissue type showed similar enriched gene sets reported by GSEA. We also found that some genes and gene sets showed consistent patterns of up- or down-regulation across tissues and drug treatments. For example, comparing GSEA Preranked results for the entire compendium of experiments, gene sets related to inflammation, cellular stress response, and the cell cycle were consistently differentially regulated, regardless of the cell type or experimental condition. Taken together, these results show that the GSEA Preranked model of a uniform gene rank distribution does not fully represent the true biology. Instead, creating an informed null distribution using data from relevant, existing experiments would enable a researcher to identify which results are unique to their in vitro drug treatment experiment and which results are not unique to that context but, rather, are observed under a wide variety of experimental conditions. This is important because, while numerous studies have published post-treatment gene expression data, the complete feature space of all possible drug and cell line combinations remains largely unexplored. Wet laboratory experiments screening drugs and profiling gene expression are time-intensive and costly. Thus, a computational method capable of leveraging information from existing tissue-drug combinations would help generate hypotheses and prioritize new experiments rapidly and at lower overall cost. 
Several groups have proposed methodological variations on GSEA Preranked seeking to address different statistical challenges, for example, by increasing its statistical power, modifying the test to account for gene correlations, or extending its applicability to single-sample analysis. Other tools have been developed recently to enable meta-analysis on GSEA results by combining p-values across multiple studies. Nevertheless, none of these methods uses the wealth of information contained in publicly available data sets to inform biologically relevant null distributions for statistical significance testing. To this end, we developed an algorithm called GSEA-InContext to perform gene set enrichment analysis while incorporating selected data from previously published studies into the statistical testing procedure. GSEA-InContext enables a researcher to address two novel and important questions about their experiment: 1) which gene sets enriched in my experiment have also been observed in other, published experiments, and 2) which gene sets are uniquely enriched under my experimental conditions as compared to other conditions and experiments? Utilizing simulations and real biological applications, we demonstrate that GSEA-InContext is able to complement and enrich the results generated by GSEA Preranked, allowing researchers to test multiple hypotheses in silico and thus form a more complete picture of their own experiment. Inspired by the large number of potential applications of GSEA-InContext, we developed a web application called GSEA-InContext Explorer whose graphical interface makes the GSEA-InContext algorithm widely accessible to researchers from all fields and backgrounds. GSEA-InContext Explorer enables a researcher to interactively select from thousands of published experiments to build a custom background set of experiments relevant to their hypothesis. 
For example, a researcher may be interested in comparing their experiment to others that used a similar drug. Subsequently, the researcher can upload their experiment and run the GSEA-InContext algorithm using their previously selected background experiments. Finally, GSEA-InContext Explorer offers several visualization tools, which show a researcher how their experiment of interest compares against the compendium of publicly available experiments. Overall, we have developed a novel method for gene set enrichment analysis that uses a researcher-defined set of background experiments to inform the null distribution used in significance testing of gene sets. Conceptually, GSEA-InContext lets researchers put their experiments into the context of potentially thousands of previously published results so that they can more fully explore their scientific findings. Freely available to a wide audience, GSEA-InContext will help address the many gaps in our current understanding of tissue- and drug-specific effects on gene expression, knowledge that can ultimately improve how we treat and manage disease.
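
As an illustration of the baseline GSEA Preranked test described in this abstract, the following is a minimal Python sketch, not the authors' implementation. It computes a weighted running-sum enrichment score over a ranked gene list and tests it against a null built from randomly permuted gene sets, which is the uniform-rank assumption that GSEA-InContext replaces with a background-informed null. The function names (`enrichment_score`, `permutation_p_value`), the `weight` parameter, and the simplified same-sign p-value are assumptions made for this example; the sketch also assumes the gene set contains at least one gene with a nonzero score and does not cover the whole list.

```python
import numpy as np


def enrichment_score(scores, in_set, weight=1.0):
    """Weighted running-sum enrichment score (illustrative sketch).

    scores: ranking metric for each gene, sorted in descending order.
    in_set: boolean mask, True where the gene belongs to the gene set.
    """
    scores = np.asarray(scores, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    # Each hit steps up in proportion to |score|^weight; each miss steps down equally.
    hit_weights = np.where(in_set, np.abs(scores) ** weight, 0.0)
    step_hit = hit_weights / hit_weights.sum()
    step_miss = np.where(in_set, 0.0, 1.0 / (~in_set).sum())
    running = np.cumsum(step_hit - step_miss)
    # The score is the running-sum value with the largest absolute deviation from zero.
    return running[np.argmax(np.abs(running))]


def permutation_p_value(scores, in_set, n_perm=1000, seed=0):
    """Nominal p-value from gene-set permutations (uniform null, simplified)."""
    rng = np.random.default_rng(seed)
    in_set = np.asarray(in_set, dtype=bool)
    observed = enrichment_score(scores, in_set)
    # Shuffling the membership mask draws a random gene set of the same size.
    null = np.array(
        [enrichment_score(scores, rng.permutation(in_set)) for _ in range(n_perm)]
    )
    # Compare only against null scores with the same sign as the observed score.
    same_sign = null[np.sign(null) == np.sign(observed)]
    extreme = np.sum(np.abs(same_sign) >= np.abs(observed))
    return observed, (extreme + 1) / (same_sign.size + 1)


# Hypothetical example: 100 ranked genes, with the gene set occupying the top 10 ranks.
scores = np.linspace(3.0, -3.0, 100)
in_set = np.zeros(100, dtype=bool)
in_set[:10] = True
es, p = permutation_p_value(scores, in_set, n_perm=200)
```

In this sketch every gene is equally likely to land in a permuted gene set. A background-informed null would instead draw on gene ranks observed in the researcher-selected experiments; that step is not shown here because the abstract does not specify its details.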