E Saberski, AK Bock, R Goodridge, V Agarwal, T Lorimer, SA Rifkin, G Sugihara. (2021). PLoS Computational Biology. 17(9): e1009329
Behavioral phenotyping of model organisms has played an important role in unravelling the complexities of animal behavior. Techniques for classifying behavior often rely on easily identified changes in posture and motion. However, such approaches are likely to miss complex behaviors that cannot be readily distinguished by eye (e.g., behaviors produced by high dimensional dynamics). To explore this issue, we focus on the model organism Caenorhabditis elegans, where behaviors have been extensively recorded and classified. Using a dynamical systems lens, we identify high dimensional, nonlinear causal relationships between four basic shapes that describe worm motion (eigenmodes, also called “eigenworms”). We find relationships between all pairs of eigenmodes, but the timescales of the interactions vary between pairs and across individuals. Using these varying timescales, we create “interaction profiles” to represent an individual’s behavioral dynamics. As desired, these profiles are able to distinguish well-known behavioral states: i.e., the profiles for foraging individuals are distinct from those of individuals exhibiting an escape response. More importantly, we find that interaction profiles can distinguish high dimensional behaviors among divergent mutant strains that were previously classified as phenotypically similar. Specifically, we find it is able to detect phenotypic behavioral differences not previously identified in strains related to dysfunction of hermaphrodite-specific neurons.
T Lorimer, R Goodridge, AK Bock, V Agarwal, E Saberski, G Sugihara, SA Rifkin. (2021). PLoS One. 16(5): e0251023
Automated analysis of video can now generate extensive time series of pose and motion in freely-moving organisms. This requires new quantitative tools to characterise behavioural dynamics. For the model roundworm Caenorhabditis elegans, body pose can be accurately quantified from video as coordinates in a single low-dimensional space. We focus on this well-established case as an illustrative example and propose a method to reveal subtle variations in behaviour at high time resolution. Our data-driven method, based on empirical dynamic modeling, quantifies behavioural change as prediction error with respect to a time-delay-embedded ‘attractor’ of behavioural dynamics. Because this attractor is constructed from a user-specified reference data set, the approach can be tailored to specific behaviours of interest at the individual or group level. We validate the approach by detecting small changes in the movement dynamics of C. elegans at the initiation and completion of delta turns. We then examine an escape response initiated by an aversive stimulus and find that the method can track return to baseline behaviour in individual worms and reveal variations in the escape response between worms. We suggest that this general approach—defining dynamic behaviours using reference attractors and quantifying dynamic changes using prediction error—may be of broad interest and relevance to behavioural researchers working with video-derived time series.
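The prediction-error idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the unweighted nearest-neighbour average, and the default parameters (`dim`, `tau`, `k`) are assumptions chosen for brevity. A reference time series is time-delay embedded, and each point of a test series is scored by how badly its next value is predicted from its nearest neighbours on the reference "attractor".

```python
import math


def delay_embed(series, dim, tau):
    """Return (t, [x(t), x(t - tau), ..., x(t - (dim - 1) * tau)]) for every valid t."""
    start = (dim - 1) * tau
    return [(t, [series[t - k * tau] for k in range(dim)])
            for t in range(start, len(series))]


def prediction_error(reference, test, dim=3, tau=1, k=4):
    """Absolute one-step prediction error of `test` against the reference embedding.

    Assumption: the forecast is the plain mean of the successors of the k
    nearest reference delay vectors (a simplification of simplex projection).
    """
    # Library of reference delay vectors that have a known next value.
    library = [(vec, reference[t + 1])
               for t, vec in delay_embed(reference, dim, tau)
               if t + 1 < len(reference)]
    errors = []
    for t, vec in delay_embed(test, dim, tau):
        if t + 1 >= len(test):
            break
        nearest = sorted(library, key=lambda item: math.dist(item[0], vec))[:k]
        predicted = sum(nxt for _, nxt in nearest) / k
        errors.append(abs(test[t + 1] - predicted))
    return errors


# Hypothetical data: a reference oscillation, a test segment with the same
# dynamics, and a test segment with a changed frequency.
reference = [math.sin(0.2 * i) for i in range(300)]
same = [math.sin(0.2 * i) for i in range(300, 360)]
changed = [math.sin(0.5 * i) for i in range(60)]

mean_same = sum(prediction_error(reference, same)) / 57
mean_changed = sum(prediction_error(reference, changed)) / 57
```

In this toy case the segment that follows the reference dynamics should give a lower mean error than the segment whose frequency has changed, which is the sense in which prediction error can flag a change in behaviour.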
B Yang, SA Rifkin. (2020). eLife. 9:e62689.
The speed at which a cell fate decision in nematode worms evolves is due to the number of genes that control the decision, rather than to a high mutation rate.
A Taton, C Erikson, Y Yang, BE Rubin, SA Rifkin, JW Golden, SS Golden. (2020). Nature Communications. 11:1668.
Natural genetic competence-based transformation contributed to the evolution of prokaryotes, including the cyanobacterial phylum that established oxygenic photosynthesis. The cyanobacterium Synechococcus elongatus is noted both as a model system for analyzing a prokaryotic circadian clock and for its facile, but poorly understood, natural competence. Here a genome-wide screen aimed at determining the genetic basis of competence in cyanobacteria identified all genes required for natural transformation in S. elongatus, including conserved Type IV pilus, competence-associated, and newly described genes, and revealed that the circadian clock controls the process. The findings uncover a daily program that determines the state of competence in S. elongatus and adapts to seasonal changes of day-length. Pilus biogenesis occurs daily in the morning, but competence is maximal upon the coincidence of circadian dusk and the onset of darkness. As in heterotrophic bacteria, where natural competence is conditionally regulated by nutritional or other stress, cyanobacterial competence is conditional and is tied to the daily cycle set by the cell’s most critical nutritional source, the Sun.
S Kuo, JD Egertson, GE Merrihew, MJ MacCoss, DA Pollard, SA Rifkin. (bioRxiv preprint)
Although mRNA is a necessary precursor to protein, several studies have argued that the relationship between mRNA and protein levels is often weak. This claim undermines the functional relevance of conclusions based on quantitative analyses of mRNA levels, which are ubiquitous in modern biology from the single gene to the whole genome scale. Furthermore, if post-translational processes vary between strains and species, then comparative studies based on mRNA alone would miss an important driver of diversity. However, gene expression is dynamic, and most studies examining the relationship between mRNA and protein levels at the genome scale have analyzed single timepoints. We measure yeast gene expression after pheromone exposure and show that, for most genes, protein timecourses can be predicted from mRNA timecourses through a simple, gene-specific, generative model. By comparing model parameters and predictions between strains, we find that while mRNA variation often leads to protein differences, evolution also manipulates protein-specific processes to amplify or buffer transcriptional regulation.
DG Welkie, BE Rubin, Y-G Chang, S Diamond, SA Rifkin, A LiWang, SS Golden. (2018). PNAS. 201802940
The recurrent pattern of light and darkness generated by Earth’s axial rotation has profoundly influenced the evolution of organisms, selecting for both biological mechanisms that respond acutely to environmental changes and circadian clocks that program physiology in anticipation of daily variations. The necessity to integrate environmental responsiveness and circadian programming is exemplified in photosynthetic organisms such as cyanobacteria, which depend on light-driven photochemical processes. The cyanobacterium Synechococcus elongatus PCC 7942 is an excellent model system for dissecting these entwined mechanisms. Its core circadian oscillator, consisting of three proteins, KaiA, KaiB, and KaiC, transmits time-of-day signals to clock-output proteins, which reciprocally regulate global transcription. Research performed under constant light facilitates analysis of intrinsic cycles separately from direct environmental responses but does not provide insight into how these regulatory systems are integrated during light–dark cycles. Thus, we sought to identify genes that are specifically necessary in a day–night environment. We screened a dense bar-coded transposon library in both continuous light and daily cycling conditions and compared the fitness consequences of loss of each nonessential gene in the genome. Although the clock itself is not essential for viability in light–dark cycles, the most detrimental mutations revealed by the screen were those that disrupt KaiA. The screen broadened our understanding of light–dark survival in photosynthetic organisms, identified unforeseen clock–protein interaction dynamics, and reinforced the role of the clock as a negative regulator of a nighttime metabolic program that is essential for S. elongatus to survive in the dark.

Understanding how photosynthetic bacteria respond to and anticipate natural light–dark cycles is necessary for predictive modeling, bioengineering, and elucidating metabolic strategies for diurnal growth. Here, we identify the genetic components that are important specifically under light–dark cycling conditions and determine how a properly functioning circadian clock prepares metabolism for darkness, a starvation period for photoautotrophs. This study establishes that the core circadian clock protein KaiA is necessary to enable rhythmic derepression of a nighttime circadian program.
BE Rubin, TN Huynh, DG Welkie, S Diamond, R Simkovsky, EC Pierce, A Taton, LC Lowe, JJ Lee, SA Rifkin, JJ Woodward, SS Golden. (2018). PLoS Genetics. 14:e1007301.
The broadly conserved signaling nucleotide cyclic di-adenosine monophosphate (c-di-AMP) is essential for viability in most bacteria where it has been studied. However, characterization of the cellular functions and metabolism of c-di-AMP has largely been confined to the class Bacilli, limiting our functional understanding of the molecule among diverse phyla. We identified the cyclase responsible for c-di-AMP synthesis and characterized the molecule’s role in survival of darkness in the model photosynthetic cyanobacterium Synechococcus elongatus PCC 7942. In addition to the use of traditional genetic, biochemical, and proteomic approaches, we developed a high-throughput genetic interaction screen (IRB-Seq) to determine pathways where the signaling nucleotide is active. We found that in S. elongatus c-di-AMP is produced by an enzyme of the diadenylate cyclase family, CdaA, which was previously unexplored experimentally. A cdaA-null mutant experiences increased oxidative stress and death during the nighttime portion of day-night cycles, in which potassium transport is implicated. These findings suggest that c-di-AMP is biologically active in cyanobacteria and has non-canonical roles in the phylum including oxidative stress management and day-night survival. The pipeline and analysis tools for IRB-Seq developed for this study constitute a quantitative high-throughput approach for studying genetic interactions.

Author Summary
Cyclic di-adenosine monophosphate (c-di-AMP) is a molecule that has significant roles in many microorganisms. This work shows the existence of c-di-AMP for the first time in photosynthetic microorganisms, cyanobacteria, and demonstrates its role in survival during the light-to-dark shifts that occur in day-night cycles. Despite the obvious importance of adaptation to these daily cycles for organisms that are fundamentally reliant on light, such as cyanobacteria, understanding of diurnal physiology is lacking because most cyanobacterial research is conducted during growth in constant light. To identify other players in c-di-AMP’s function we developed a low-cost and efficient method for finding interactions between genes. The technique combines one mutation, in this case for the gene that encodes the enzyme for synthesis of c-di-AMP, with thousands of other individual mutations to find pairwise interactions that affect fitness of the resulting mutants. Mutants are tagged with DNA barcodes to allow their survival to be easily tracked in a population of cells. The method enables us to place the function of c-di-AMP within the context of pathways previously known to be involved in day-night survival. Taken together, this work expands the known roles of c-di-AMP, improves our understanding of cyanobacterial survival in day-night cycles, and presents an improved approach for determining genetic interactions.
DA Pollard, CK Asamoto, H Rahnamoun, AS Abendroth, SR Lee, SA Rifkin (bioRxiv preprint).
Heritable variation in gene expression patterns plays a fundamental role in trait variation and evolution, making understanding the mechanisms by which genetic variation acts on gene expression patterns a major goal for biology. Both theoretical and empirical work have largely focused on variation in steady-state mRNA levels and mRNA synthesis rates, particularly of protein-coding genes. Yet in order for this variation to affect higher order traits it must lead to differences at the protein level. Variation in protein-specific processes including protein synthesis rates and protein decay rates could amplify, mask, or even reverse effects transmitted from the transcript level, but the extent to which this happens is unclear. Moreover, mechanisms that underlie protein expression variation under dynamic conditions have not been examined. To address this challenge, we analyzed how mRNA and protein expression dynamics covary between two strains of Saccharomyces cerevisiae during mating pheromone response. Although divergent steady-state mRNA expression levels explained divergent steady-state protein levels for four out of five genes in our study, the same was true for only one out of five genes for expression dynamics. By integrating decay rate and allele-specific protein expression analyses, we resolved that expression divergence for Fig1p was caused by genetic variation acting in trans on protein synthesis rate, expression divergence for Ina1p was caused by cis-by-trans epistatic effects on transcript level and protein synthesis rate, and expression divergence for Fus3p and Tos6p was caused by divergence in protein synthesis rates. Our study demonstrates that steady-state analysis of gene expression is insufficient to understand the impact of genetic variation on gene expression variation.
An integrated and dynamic approach to gene expression analysis - comparing mRNA levels, protein levels, protein decay rates, and allele-specific protein expression - allows for a detailed analysis of the genetic mechanisms underlying protein expression divergences.
SR Stockwell, SA Rifkin (2017). Molecular Systems Biology. 13:908.
When a cell encounters a new environment, its transcriptional response can be constrained by its history. For example, yeast cells in galactose induce GAL genes with a speed and unanimity that depends on previous nutrient conditions. Cellular memory of long‐term glucose exposure delays GAL induction and makes it highly variable within a cell population, while other nutrient histories lead to rapid, uniform responses. To investigate how cell‐level gene expression dynamics produce population‐level phenotypes, we built living vector fields from thousands of single‐cell time courses of the proteins Gal3p and Gal1p as cells switched to galactose from various nutrient histories. We show that, after sustained glucose exposure, the lack of these GAL transducers leads to induction delays that are long but also variable; that cellular resources constrain induction; and that bimodally distributed expression levels arise from lineage selection—a subpopulation of cells induces more quickly and outcompetes the rest. Our results illuminate cellular memory in this important model system and illustrate how resources and randomness interact to shape the response of a population to a new environment.
L Du, S Tracy, SA Rifkin (2016). Developmental Biology. 412: 160-170.
Cis-regulatory elements (CREs) are crucial links in developmental gene regulatory networks, but in many cases, it can be difficult to discern whether similar CREs are functionally equivalent. We found that despite similar conservation and binding capability to upstream activators, different GATA cis-regulatory motifs within the promoter of the C. elegans endoderm regulator elt-2 play distinctive roles in activating and modulating gene expression throughout development. We fused wild-type and mutant versions of the elt-2 promoter to a gfp reporter and inserted these constructs as single copies into the C. elegans genome. We then counted early embryonic gfp transcripts using single-molecule RNA FISH (smFISH) and quantified gut GFP fluorescence. We determined that a single primary dominant GATA motif located 527 bp upstream of the elt-2 start codon was necessary for both embryonic activation and later maintenance of transcription, while nearby secondary GATA motifs played largely subtle roles in modulating postembryonic levels of elt-2. Mutation of the primary activating site increased low-level spatiotemporally ectopic stochastic transcription, indicating that this site acts repressively in non-endoderm cells. Our results reveal that CREs with similar GATA factor binding affinities in close proximity can play very divergent context-dependent roles in regulating the expression of a developmentally critical gene in vivo.
M Maduro, G Broitman-Maduro, H Choi, F Carranza, AC-Y Wu, SA Rifkin (2015). Developmental Biology. 404: 66-79.
The MED-1,2 GATA factors contribute to specification of E, the progenitor of the C. elegans endoderm, through the genes end-1 and end-3, and in parallel with the maternal factors SKN-1, POP-1 and PAL-1. END-1,3 activate elt-2 and elt-7 to initiate a program of intestinal development, which is maintained by positive autoregulation. Here, we advance the understanding of MED-1,2 in E specification. We find that expression of end-1 and end-3 is greatly reduced in med-1,2(-) embryos. We generated strains in which MED sites have been mutated in end-1 and end-3. Without MED input, gut specification relies primarily on POP-1 and PAL-1. 25% of embryos fail to make intestine, while those that do display abnormal numbers of gut cells due to a delayed and stochastic acquisition of intestine fate. Surviving adults exhibit phenotypes consistent with a primary defect in the intestine. Our results establish that MED-1,2 provide robustness to endoderm specification through end-1 and end-3, and reveal that gut differentiation may be more directly linked to specification than previously appreciated. The results argue against an "all-or-none" description of cell specification, and suggest that activation of tissue-specific master regulators, even when expression of these is maintained by positive autoregulation, does not guarantee proper function of differentiated cells.
AC-Y Wu, SA Rifkin (2015). BMC Bioinformatics. 16: 102.
Background. Recent techniques for tagging and visualizing single molecules in fixed or living organisms and cell lines have been revolutionizing our understanding of the spatial and temporal dynamics of fundamental biological processes. However, fluorescence microscopy images are often noisy, and it can be difficult to distinguish a fluorescently labeled single molecule from background speckle.
Results. We present a computational pipeline to distinguish the true signal of fluorescently labeled molecules from background fluorescence and noise. We test our technique using the challenging case of wide-field, epifluorescence microscope image stacks from single molecule fluorescence in situ experiments on nematode embryos where there can be substantial out-of-focus light and structured noise. The software recognizes and classifies individual mRNA spots by measuring several features of local intensity maxima and classifying them with a supervised random forest classifier. A key innovation of this software is that, by estimating the probability that each local maximum is a true spot in a statistically principled way, it makes it possible to estimate the error introduced by image classification. This can be used to assess the quality of the data and to estimate a prediction interval for the molecule count estimate, all of which are important for quantitative interpretations of the results of single-molecule experiments.
Conclusions. The software classifies spots in these images well, with >95% AUROC on realistic artificial data, and outperforms other commonly used techniques on challenging real data. Its interval estimates provide a unique measure of the quality of an image and confidence in the classification.
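As a rough illustration of how per-spot probabilities can yield an interval on the molecule count, the sketch below treats each local maximum as an independent Bernoulli trial with the classifier's probability of being a true spot. This is an assumption made for illustration, not a description of the published software: the function name and the normal approximation to the resulting Poisson-binomial distribution are hypothetical choices.

```python
import math


def count_estimate(probabilities, z=1.96):
    """Expected spot count and an approximate interval from per-candidate probabilities.

    Assumption: candidates are independent, so the count has mean sum(p) and
    variance sum(p * (1 - p)); the interval uses a normal approximation.
    """
    expected = sum(probabilities)
    variance = sum(p * (1.0 - p) for p in probabilities)
    half_width = z * math.sqrt(variance)
    lower = max(0.0, expected - half_width)
    upper = min(float(len(probabilities)), expected + half_width)
    return expected, lower, upper


# Hypothetical classifier outputs for six local intensity maxima.
expected, lower, upper = count_estimate([0.99, 0.95, 0.9, 0.5, 0.1, 0.02])
```

Candidates with probabilities near 0 or 1 contribute almost nothing to the variance, so an image whose spots are all classified confidently gets a narrow interval, while ambiguous candidates widen it.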
SR Stockwell, CR Landry, SA Rifkin (2015). Molecular Biosystems. 11: 28-37.
Recent experiments have revealed surprising behavior in the yeast galactose (GAL) pathway, one of the preeminent systems for studying gene regulation. Under certain circumstances, yeast cells display memory of their prior nutrient environments. We distinguish two kinds of cellular memory discovered by quantitative investigations of the GAL network and present a conceptual framework for interpreting new experiments and current ideas on GAL memory. Reinduction memory occurs when cells respond transcriptionally to one environment, shut down the response during several generations in a second environment, then respond faster and with less cell-to-cell variation when returned to the first environment. Persistent memory describes a long-term, arguably stable response in which cells adopt a bimodal or unimodal distribution of induction levels depending on their preceding environment. Deep knowledge of how the yeast GAL pathway responds to different sugar environments has enabled rapid progress in uncovering the mechanisms behind GAL memory, which include cytoplasmic inheritance of inducer proteins and positive feedback loops among regulatory genes. This network of genes, long used to study gene regulation, is now emerging as a model system for cellular memory.
MA Bakowski, CA Desjardins, MG Smelkinson, TA Dunbar, IF Lopez-Moyado, SA Rifkin, CA Cuomo, ER Troemel (2014). PLoS Pathogens. 10: e1004200.
Microsporidia comprise a phylum of over 1400 species of obligate intracellular pathogens that can infect almost all animals, but little is known about the host response to these parasites. Here we use the whole-animal host C. elegans to show an in vivo role for ubiquitin-mediated response to the microsporidian species Nematocida parisii, as well as to the Orsay virus, another natural intracellular pathogen of C. elegans. We analyze gene expression of C. elegans in response to N. parisii, and find that it is similar to response to viral infection. Notably, we find an upregulation of SCF ubiquitin ligase components, such as the cullin ortholog cul-6, which we show is important for ubiquitin targeting of N. parisii cells in the intestine. We show that ubiquitylation components, the proteasome, and the autophagy pathway are all important for defense against N. parisii infection. We also find that SCF ligase components like cul-6 promote defense against viral infection, where they have a more robust role than against N. parisii infection. This difference may be due to suppression of the host ubiquitylation system by N. parisii: when N. parisii is crippled by anti-microsporidia drugs, the host can more effectively target pathogen cells for ubiquitylation. Intriguingly, inhibition of the ubiquitin-proteasome system (UPS) increases expression of infection-upregulated SCF ligase components, indicating that a trigger for transcriptional response to intracellular infection by N. parisii and virus may be perturbation of the UPS. Altogether, our results demonstrate an in vivo role for ubiquitin-mediated defense against microsporidian and viral infections in C. elegans.
CR Landry, SA Rifkin (2012). Evolutionary Systems Biology. (Soyer, ed.). Advances in Experimental Medicine and Biology. 751: 371-398.
The processes by which genetic variation in complex traits is generated and maintained in populations have for a long time been treated in abstract and statistical terms. As a consequence, quantitative genetics has provided limited insights into our understanding of the molecular bases of quantitative trait variation. With the developing technological and conceptual tools of systems biology, cellular and molecular processes are being described in greater detail. While we have a good description of how signaling and other molecular networks are organized in the cell, we still do not know how genetic variation affects these pathways, because systems and molecular biology usually ignore the type and extent of genetic variation found in natural populations. Here we discuss the quantitative genetics and systems biology approaches for the study of complex trait architecture and discuss why these two disciplines would synergize with each other to answer questions that neither of the two could answer alone.
SA Rifkin (editor) (2012). Quantitative Trait Loci: Methods and Protocols. Methods in Molecular Biology. 871
For over a century, biologists have searched for the genetic bases of phenotypic variation. While this program has been quite successful for simple Mendelian traits, most traits are complex, shaped by context-dependent interactions between multiple loci and the environment. Over the last 2 decades, leaps in genotyping technology, coupled with the development of sophisticated quantitative genetic analytical techniques, have made it possible to dissect complex traits and link quantitative variation in traits to allelic variation on chromosomes or quantitative trait loci (QTLs). Propelled by the genome projects and their spinoff technologies, QTL analyses have pervaded all fields of biology and form the backbone for the recent explosion of studies tying specific alleles to human disease. As sequencing becomes ever cheaper and easier, QTL studies will make it possible to relatively quickly identify key genes underlying traits even in non-model organisms, paving the way for discovering new biology. As with any expanding field, the original QTL methodologies have been elaborated into a host of alternative and complementary techniques. A QTL experiment has many components—preparing the experimental mapping population, genotyping, measuring traits, analyzing the data and identifying QTLs, and feeding this information to downstream analyses—and its success depends upon each part fitting together and being appropriate for answering the motivating question. This volume contains chapters that focus on specific components of the entire process and also a set of case studies at the end where these individual components are linked together into an entire study. This book is intended to serve as a practical resource for researchers interested in links between phenotypic and genotypic variation in fields from medicine to agriculture and from molecular biology to evolution to ecology. Many of the methods are similar between fields.
QTL studies often involve multiple authors with complementary expertise, and the case studies in particular are intended to facilitate communication between scientists working on different parts of a project and to give a broader perspective on how each piece fits into the whole. QTL techniques will continue to be developed and further refined and extended. As phenotyping technology improves and as genotyping technology continues to accelerate, statistical approaches to dissecting the genotype–phenotype map will become increasingly important and powerful tools for biological research.
SA Rifkin (2011). Molecular Methods for Evolutionary Genetics. (Orgogozo & Rockman, eds.). Methods in Molecular Biology. 772: 329-348.
In the past several years, a host of new technologies have made it possible to visualize single molecules within cells and organisms (Raj et al., Nat Methods 5:877–879, 2008; Paré et al., Curr Biol 19:2037–2042, 2009; Lu and Tsourkas, Nucleic Acids Res 37:e100, 2009; Femino et al., Science 280:585–590, 1998; Rodriguez et al., Semin Cell Dev Biol 18:202–208, 2007; Betzig et al., Science 313:1642–1645, 2006; Rust et al., Nat Methods 3:793–796, 2006; Fusco et al., Curr Biol 13:161–167, 2003). Many of these are based on fluorescence, either fluorescent proteins or fluorescent dyes coupled to a molecule of interest. In many applications, the fluorescent signal is limited to a few pixels, which poses a classic signal processing problem: how can actual signal be distinguished from background noise? In this chapter, I present a MATLAB (MathWorks, 2010) software suite designed to work with these single-molecule visualization technologies (Rifkin, 2010, spotFinding Suite). It takes images or image stacks from a fluorescence microscope as input and outputs locations of the molecules. Although the software was developed for the specific application of identifying single mRNA transcripts in fixed specimens, it is more general than this and can be used and/or customized for other applications that produce localized signals embedded in a potentially noisy background. The analysis pipeline consists of the following steps: (a) create a gold-standard dataset, (b) train a machine-learning algorithm to classify image features as signal or noise depending upon user-defined statistics, (c) run the machine-learning algorithm on a new dataset to identify mRNA locations, and (d) visually inspect and correct the results.
CR Landry, SA Rifkin (2010). Molecular Systems Biology. 6: 434.
Whereas mechanistic developmental biology and evolutionary genetics largely proceeded independently from each other throughout most of the twentieth century, new discoveries and technologies have made it possible to revisit longstanding questions of how molecular mechanisms generate the phenotypic effects of alternative alleles. Pioneers such as Schmalhausen (1949) emphasized that phenotypic variation can often be surprisingly limited both within and between species and proposed that the process of development and its genetic underpinnings are organized to allow a ‘reserve of hereditary variability’ to accumulate within a species that can then be mobilized when conditions change. We are now in a position to dissect the molecular mechanisms that generate the apparent mismatch between extensive genetic and limited phenotypic variation. One important milestone was the discovery that knocking out the activity of the molecular chaperone Hsp90 results in an efflorescence of phenotypic variation due to the exposure of underlying genetic variation. The effects of new mutations are context dependent, and functional Hsp90 dramatically reduces these effects under normal conditions (Rutherford and Lindquist, 1998). Such genes that allow variation to accumulate without having an effect have been dubbed capacitors (Figure 1). In a recent article published in Molecular Systems Biology, Tirosh et al (2010) provide new evidence that chromatin regulators may also act as capacitors for gene expression.
The phenotypic differences between individual organisms can often be ascribed to underlying genetic and environmental variation. However, even genetically identical organisms in homogeneous environments vary, indicating that randomness in developmental processes such as gene expression may also generate diversity. To examine the consequences of gene expression variability in multicellular organisms, we studied intestinal specification in the nematode Caenorhabditis elegans in which wild-type cell fate is invariant and controlled by a small transcriptional network. Mutations in elements of this network can have indeterminate effects: some mutant embryos fail to develop intestinal cells, whereas others produce intestinal precursors. By counting transcripts of the genes in this network in individual embryos, we show that the expression of an otherwise redundant gene becomes highly variable in the mutants and that this variation is subjected to a threshold, producing an ON/OFF expression pattern of the master regulatory gene of intestinal differentiation. Our results demonstrate that mutations in developmental networks can expose otherwise buffered stochastic variability in gene expression, leading to pronounced phenotypic variation.
A Raj, P van den Bogaard, SA Rifkin, A van Oudenaarden, S Tyagi (2008). Nature Methods. 5:877-879
We describe a method for imaging individual mRNA molecules in fixed cells by probing each mRNA species with 48 or more short, singly labeled oligonucleotide probes. This makes each mRNA molecule visible as a computationally identifiable fluorescent spot by fluorescence microscopy. We demonstrate simultaneous detection of three mRNA species in single cells and mRNA detection in yeast, nematodes, fruit fly wing discs, and mammalian cell lines and neurons.
Y Gilad, SA Rifkin, JK Pritchard (2008). Trends in Genetics. 24:408-413
Expression quantitative trait loci (eQTL) mapping studies have become a widely used tool for identifying genetic variants that affect gene regulation. In these studies, expression levels are viewed as quantitative traits, and gene expression phenotypes are mapped to particular genomic loci by combining studies of variation in gene expression patterns with genome-wide genotyping. Results from recent eQTL mapping studies have revealed substantial heritable variation in gene expression within and between populations. In many cases, genetic factors that influence gene expression levels can be mapped to proximal (putatively cis) eQTLs and, less often, to distal (putatively trans) eQTLs. Beyond providing great insight into the biology of gene regulation, a combination of eQTL studies with results from traditional linkage or association studies of human disease may help predict a specific regulatory role for polymorphic sites previously associated with disease.
CR Landry, B Lemos, SA Rifkin, WJ Dickinson, DL Hartl (2007). Science. 317:118-121
Identifying the properties of gene networks that influence their evolution is a fundamental research goal. However, modes of evolution cannot be inferred solely from the distribution of natural variation, because selection interacts with demography and mutation rates to shape polymorphism and divergence. We estimated the effects of naturally occurring mutations on gene expression while minimizing the effect of natural selection. We demonstrate that sensitivity of gene expression to mutations increases with both increasing trans-mutational target size and the presence of a TATA box. Genes with greater sensitivity to mutations are also more sensitive to systematic environmental perturbations and stochastic noise. These results provide a mechanistic basis for gene expression evolvability that can serve as a foundation for realistic models of regulatory evolution.
Y Gilad, A Oshlack, SA Rifkin (2006). Trends in Genetics. 22:456-461
Changes in genetic regulation contribute to adaptations in natural populations and influence susceptibility to human diseases. Despite their potential phenotypic importance, the selective pressures acting on regulatory processes in general and gene expression levels in particular are largely unknown. Studies in model organisms suggest that the expression levels of most genes evolve under stabilizing selection, although a few are consistent with adaptive evolution. However, it has been proposed that gene expression levels in primates evolve largely in the absence of selective constraints. In this article, we discuss the microarray-based observations that led to these disparate interpretations. We conclude that in both primates and model organisms, stabilizing selection is likely to be the dominant mode of gene expression evolution. An important implication is that mutations affecting gene expression will often be deleterious and might underlie many human diseases.
SA Rifkin, D Houle, J Kim, KP White (2005). Nature. 438:220-223.
Mutation is the ultimate source of biological diversity because it generates the variation that fuels evolution. Gene expression is the first step by which an organism translates genetic information into developmental change. Here we estimate the rate at which mutation produces new variation in gene expression by measuring transcript abundances across the genome during the onset of metamorphosis in 12 initially identical Drosophila melanogaster lines that independently accumulated mutations for 200 generations. We find statistically significant mutational variation for 39% of the genome and a wide range of variability across corresponding genes. As genes are upregulated in development their variability decreases, and as they are downregulated it increases, indicating that developmental context affects the evolution of gene expression. A strong correlation between mutational variance and environmental variance shows that there is the potential for widespread canalization. By comparing the evolutionary rates that we report here with differences between species, we conclude that gene expression does not evolve according to strictly neutral models. Although spontaneous mutations have the potential to generate abundant variation in gene expression, natural variation is relatively constrained.
Y Gilad, SA Rifkin, P Bertone, M Gerstein, KP White (2005). Genome Research. 15:674-680.
Interspecies comparisons of gene expression levels will increase our understanding of the evolution of transcriptional mechanisms and help to identify targets of natural selection. This approach holds particular promise for apes, as many human-specific adaptations are thought to result from differences in gene expression rather than in coding sequence. To date, however, all studies directly comparing interspecies gene expression have been performed on single-species arrays, so that it has been impossible to distinguish differential hybridization due to sequence mismatches from underlying expression differences. To evaluate the severity of this potential problem, we constructed a new multiprimate cDNA array using probes from human, chimpanzee, orangutan, and rhesus. We find a large effect of sequence divergence on hybridization signal, even in the closest pair of species, human and chimpanzee. By comparing single-species array analyses with results from multispecies arrays, we examine how estimates of differential gene expression are affected by sequence divergence. Our results indicate that naive use of single-species arrays in direct interspecies comparisons can yield spurious results.
N Carriero, MV Osier, KH Cheung, PL Miller, M Gerstein, H Zhao, B Wu, S Rifkin, J Chang, H Zhang, K White, K Williams, MY Schultz (2005). J. American Medical Informatics Association. 12:90-98.
The rapid advances in high-throughput biotechnologies such as DNA microarrays and mass spectrometry have generated vast amounts of data ranging from gene expression to proteomics data. The large size and complexity involved in analyzing such data demand a significant amount of computing power. High-performance computation (HPC) is an attractive and increasingly affordable approach to help meet this challenge. There is a spectrum of techniques that can be used to achieve computational speedup with varying degrees of impact in terms of how drastic a change is required to allow the software to run on an HPC platform. This paper describes a high-productivity/low-maintenance (HP/LM) approach to HPC that is based on establishing a collaborative relationship between the bioinformaticist and HPC expert that respects the former's codes and minimizes the latter's efforts. The goal of this approach is to make it easy for bioinformatics researchers to continue to make iterative refinements to their programs, while still being able to take advantage of HPC. The paper describes our experience applying these HP/LM techniques in four bioinformatics case studies: (1) genome-wide sequence comparison using Blast, (2) identification of biomarkers based on statistical analysis of large mass spectrometry data sets, (3) complex genetic analysis involving ordinal phenotypes, and (4) large-scale assessment of the effect of possible errors in analyzing microarray data. The case studies illustrate how the HP/LM approach can be applied to a range of representative bioinformatics applications and how the approach can lead to significant speedup of computationally intensive bioinformatics applications, while making only modest modifications to the programs themselves.
V Stolc, Z Gauhar, C Mason, G Halasz, MF van Batenburg, SA Rifkin, S Hua, T Herreman, W Tongprasit, PE Barbano, HJ Bussemaker, KP White (2004). Science. 306:655-660.
We used a maskless photolithography method to produce DNA oligonucleotide microarrays with unique probe sequences tiled throughout the genome of Drosophila melanogaster and across predicted splice junctions. RNA expression of protein coding and nonprotein coding sequences was determined for each major stage of the life cycle, including adult males and females. We detected transcriptional activity for 93% of annotated genes and RNA expression for 41% of the probes in intronic and intergenic sequences. Comparison to genome-wide RNA interference data and to gene annotations revealed distinguishable levels of expression for different classes of genes and higher levels of expression for genes with essential cellular functions. Differential splicing was observed in about 40% of predicted genes, and 5440 previously unknown splice forms were detected. Genes within conserved regions of synteny with D. pseudoobscura had highly correlated expression; these regions ranged in length from 10 to 900 kilobase pairs. The expressed intergenic and intronic sequences are more likely to be evolutionarily conserved than nonexpressed ones, and about 15% of them appear to be developmentally regulated. Our results provide a draft expression map for the entire nonrepetitive genome, which reveals a much more extensive and diverse set of expressed sequences than was previously predicted.
Z Gu, SA Rifkin, KP White, WH Li (2004). Nature Genetics. 36:577-579
Using microarray gene expression data from several Drosophila species and strains, we show that duplicated genes, compared with single-copy genes, significantly increase gene expression diversity during development. We show further that duplicate genes tend to cause expression divergences between Drosophila species (or strains) to evolve faster than do single-copy genes. This conclusion is also supported by data from different yeast strains.
Little is known about broad patterns of variation and evolution of gene expression during any developmental process. Here we investigate variation in genome-wide gene expression among Drosophila simulans, Drosophila yakuba and four strains of Drosophila melanogaster during a major developmental transition, the start of metamorphosis. Differences in gene activity between these lineages follow a phylogenetic pattern, and 27% of all of the genes in these genomes differ in their developmental gene expression between at least two strains or species. We identify, on a gene-by-gene basis, the evolutionary forces that shape this variation and show that, both within the transcriptional network that controls metamorphosis and across the whole genome, the expression changes of transcription factor genes are relatively stable, whereas those of their downstream targets are more likely to have evolved. Our results demonstrate extensive evolution of developmental gene expression among closely related species.
SA Rifkin, J Kim (2002). Bioinformatics. 18:1176-1183.
A gene expression trajectory moves through a high dimensional space where each axis represents the mRNA abundance of a different gene. Genome wide gene expression has a dynamic structure, especially in studies of development and temporal response. Both visualization and analyses of such data require an explicit attention to the temporal structure. Using three cell cycle trajectories from Saccharomyces cerevisiae to illustrate, we present several techniques which reveal the geometry of the data. We import phase-delay time plots from chaotic systems theory as a dynamic data visualization device and show how these plots capture important aspects of the trajectories. We construct an objective function to find an optimal two-dimensional projection of the cell cycle, demonstrate that the system returns to this plane after three different initial perturbations, and explore the conditions under which this geometric approach outperforms standard approaches such as singular value decomposition and Fourier analysis. Finally, we show how a geometric analysis can isolate distinct parts of the trajectories, in this case the initial perturbation versus the cell cycle.
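For readers unfamiliar with delay plots, the following is a minimal sketch of a generic time-delay embedding, in which each point pairs a value with its own later values. It is not the authors' code: the function name `delay_embed`, its parameters `dim` and `lag`, and the toy periodic profile are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], dim: int = 2, lag: int = 1) -> List[Tuple[float, ...]]:
    """Return delay vectors (x[t], x[t+lag], ..., x[t+(dim-1)*lag]).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    span = (dim - 1) * lag            # samples needed beyond the first coordinate
    n_vectors = len(series) - span    # number of complete delay vectors
    if n_vectors <= 0:
        return []
    return [
        tuple(series[t + k * lag] for k in range(dim))
        for t in range(n_vectors)
    ]


# Toy periodic profile (period 4), standing in for one gene's abundance over time.
profile = [0.0, 1.0, 0.0, -1.0] * 3
points = delay_embed(profile, dim=2, lag=1)
```

For a strictly periodic series such as this toy profile, the delay vectors revisit the same few points, so the plotted trajectory forms a closed loop. Real expression data would be noisier, and the choice of `lag` would need care.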
SA Rifkin, K Atteson, J Kim (2000). Functional and Integrative Genomics. 1:174-185.
A microarray experiment gives a snapshot of the state of an organism in terms of the relative abundances of its mRNA transcripts, locating the organism at a point in a high dimensional state space where each axis represents the relative expression level of a single gene. Multiple experiments generate a cloud of points in this gene expression space. We present a geometric approach to analyzing the covariational properties of such a cloud and use a dataset from Saccharomyces cerevisiae as an illustration. In particular, we use singular value decomposition to identify significant linear sub-structures in the data and analyze the contributions of both individual genes and functional classes of genes to these major directions of variation. Analyzing the publicly available yeast expression data, we show that under all experimental conditions the variation in expression is limited to a small number of linear dimensions. Projections of individual gene axes onto the significant dimensions can order the contribution of individual genes to variation in expression within an experiment. We show that no particular groups of genes characterize particular experimental conditions. Instead, the particular structure of the coordinated expression of the entire genome characterizes a particular experiment.
KP White, SA Rifkin, P Hurban, DS Hogness (1999). Science. 286:2179-2184.
Metamorphosis is an integrated set of developmental processes controlled by a transcriptional hierarchy that coordinates the action of hundreds of genes. In order to identify and analyze the expression of these genes, high-density DNA microarrays containing several thousand Drosophila melanogaster gene sequences were constructed. Many differentially expressed genes can be assigned to developmental pathways known to be active during metamorphosis, whereas others can be assigned to pathways not previously associated with metamorphosis. Additionally, many genes of unknown function were identified that may be involved in the control and execution of metamorphosis. The utility of this genome-based approach is demonstrated for studying a set of complex biological processes in a multicellular organism.