Even genetically identical cells in identical environments exhibit wildly different phenotypical behaviors due to cellular fluctuations known as gene expression "noise". Previously, such noise was considered a nuisance that compromised cellular responses, complicated modeling, and made predictive understanding all but impossible. Many studies focused on how cellular processes remove or exploit noise to a cell's advantage. However, different cellular mechanisms affect these cellular fluctuations in different ways, and it is now clear that these fluctuations contain valuable information about underlying cellular mechanisms. Finding and exploiting this information requires a strong integration of single-cell/single-molecule measurements with discrete stochastic analyses. My focus is to utilize this information to gain predictive understanding of new biological phenomena. Along these lines, we have studied natural and synthetic transcriptional regulation pathways in bacteria, yeast and mammalian cells.
Emerging techniques now allow for precise quantification of distributions of biological molecules in single cells. These rapidly advancing experimental methods have created a need for more rigorous and efficient modeling tools. Many of the tools we use are extensions of the Finite State Projection approach, which allows us to compute the precise time-varying probability distributions of single-cell responses, even in fluctuating environments.
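As a rough illustration of the Finite State Projection idea, the sketch below truncates the chemical master equation of a one-species birth-death model to a finite set of states and solves it with a matrix exponential. The model, the rate values, and the function name `fsp_birth_death` are illustrative assumptions and are not taken from the studies described here.

```python
import numpy as np
from scipy.linalg import expm


def fsp_birth_death(k_prod, k_deg, n_states, t):
    """Solve a truncated CME for a birth-death process (hypothetical toy model).

    Returns the probabilities of having 0..n_states-1 molecules at time t and
    the truncation error, i.e. the probability that has leaked out of the
    retained state set.
    """
    n = np.arange(n_states)
    A = np.zeros((n_states, n_states))   # A[i, j] = rate of jumping from j to i
    A[n, n] = -(k_prod + k_deg * n)      # total outflow, including the leak from the last state
    A[n[1:], n[:-1]] = k_prod            # production: n -> n + 1
    A[n[:-1], n[1:]] = k_deg * n[1:]     # degradation: n -> n - 1
    p0 = np.zeros(n_states)
    p0[0] = 1.0                          # assumed initial condition: zero molecules
    p = expm(A * t) @ p0
    return p, 1.0 - p.sum()


# Assumed rate values, chosen only for illustration.
p, err = fsp_birth_death(k_prod=10.0, k_deg=1.0, n_states=40, t=2.0)
mean_count = float(np.arange(p.size) @ p)
print(f"mean = {mean_count:.4f}, truncation error = {err:.2e}")
```

The truncation error returned alongside the distribution is what makes the projection checkable: if it is too large, the retained state set can be expanded and the calculation repeated.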
New Bounds on Likelihoods of Single-Cell Data. We have recently derived new bounds on the likelihood that observations of single-cell, single-molecule responses come from a discrete stochastic model, which we pose in the form of the chemical master equation (CME). These strict upper and lower bounds are based on a technique known as the Finite State Projection (FSP), and they converge monotonically to the exact likelihood value. By calculating these bounds, we can rigorously discriminate between models with a minimum level of computational effort. In practice, we have incorporated these FSP-derived likelihood bounds into stochastic model identification and parameter inference routines, which improve the accuracy and efficiency of efforts to analyze and predict single-cell behavior. We have demonstrated the applicability of our approach using simulated data for multiple models as well as experimental measurements of a time-varying stochastic transcriptional response in yeast.
Z. Fox, G. Neuert and B. Munsky, Finite state projection based bounds to compare chemical master equation models using single-cell data, Journal of Chemical Physics, 145:7, 074101 (2016), online here
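One simple way to see how such bounds can arise is sketched below, under stated assumptions. It relies on the standard FSP guarantee that the truncated solution underestimates each state probability by no more than the total truncation error, so each observed count has a probability between p_FSP(x) and p_FSP(x) + error. The birth-death model, the rate values, and the observed counts are hypothetical, and this is not necessarily the exact construction used in the paper cited above.

```python
import numpy as np
from scipy.linalg import expm


def fsp_solution(k_prod, k_deg, n_states, t):
    """Truncated CME solution for a toy birth-death model, with its truncation error."""
    n = np.arange(n_states)
    A = np.zeros((n_states, n_states))
    A[n, n] = -(k_prod + k_deg * n)
    A[n[1:], n[:-1]] = k_prod
    A[n[:-1], n[1:]] = k_deg * n[1:]
    p0 = np.zeros(n_states)
    p0[0] = 1.0
    p = expm(A * t) @ p0
    err = max(1.0 - p.sum(), 0.0)        # clip tiny negative round-off
    return p, err


def log_likelihood_bounds(counts, p, err):
    """Lower and upper bounds on the log-likelihood of independent single-cell counts."""
    probs = p[counts]
    lower = float(np.sum(np.log(probs)))         # uses p_FSP(x) <= p(x)
    upper = float(np.sum(np.log(probs + err)))   # uses p(x) <= p_FSP(x) + err
    return lower, upper


counts = np.array([5, 8, 9, 12, 7, 10])  # hypothetical molecule counts, one per cell
bounds = []
for n_states in (15, 20, 40):            # progressively larger projections
    p, err = fsp_solution(10.0, 1.0, n_states, 2.0)
    bounds.append(log_likelihood_bounds(counts, p, err))
    print(n_states, bounds[-1])
```

As the projection grows, the lower bound rises and the upper bound falls. A model comparison can therefore stop as soon as one model's lower bound exceeds the other model's upper bound, without solving either model to full precision.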
When biological models under-perform expectations, it is tempting to attribute failure to “bad models” or “insufficient data”. However, predictions from good models and sufficient data may fail due to poor integration of the two. Unlike fluctuations in most engineered systems, biological fluctuations are dominated by discrete changes in DNA, RNA and protein. Integrating stochastic models with single-cell experiments can provide a wealth of information about gene regulatory dynamics, but for discrete, positive fluctuations, standard data-model integration analyses (e.g., assuming normal distributions or making central limit theorem (CLT) arguments) can produce nearly perfect fits to old data yet fail dramatically to predict new phenomena. When these analyses fail, approaches that dispense with CLT assumptions can still yield extremely accurate quantitative predictions, even for the exact same data and exact same models.
We are demonstrating these crucial model-data integration concerns using single-cell-single-molecule data collected on an evolutionarily conserved Mitogen-Activated Protein Kinase (MAPK) pathway and its downstream induction of mRNA transcription in yeast. We discuss how different modeling assumptions affect parameter uncertainties or bias and how these errors affect predictive understanding.
We examine the stress response High Osmolarity Glycerol (Hog1) pathway and its control of transcription mechanisms (polymerase initiation and elongation, mRNA export, and accumulation and degradation) during transient adaptation to hyper-osmotic shock. Our collaborators in the Neuert Lab at Vanderbilt University have quantified individual mRNA at the site of transcription, in the nucleus, and in the cytoplasm for multiple genes using single-molecule fluorescence in situ hybridization (smFISH) for more than 65,000 cells at many points in time, under different environmental conditions, and in multiple replica experiments. These measured distributions are demonstrably non-normal and non-symmetric, which has important implications for the results of model-data integration.
We extend a multi-state gene expression model [1,2] to account for transcriptional regulation and spatial localization of mRNA. We solve this model with three exact computational methods: (1) a deterministic ODE analysis, (2) a linear noise analysis, and (3) a chemical master equation (CME) solution. We invoke the CLT to approximate the likelihood of all data given the model for analyses (1) and (2), and we compute the exact likelihood using analysis (3). For each case, we use Metropolis-Hastings sampling to find the maximum likelihood and posterior distribution of the parameters, given the data.
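The sampling step can be sketched with a minimal random-walk Metropolis-Hastings loop. The example below is a toy under explicit assumptions: a birth-death model at steady state (whose exact CME solution is a Poisson distribution with mean k_prod / k_deg), a known degradation rate, synthetic data, and a flat prior on the logarithm of the production rate. It stands in for the far richer spatial, multi-state model described above, and all names and values are hypothetical.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
k_deg = 1.0                              # assumed known degradation rate
data = rng.poisson(10.0, size=200)       # synthetic single-cell counts (not real data)


def log_likelihood(log_k):
    """Exact log-likelihood: steady-state counts are Poisson with mean k_prod / k_deg."""
    return poisson.logpmf(data, np.exp(log_k) / k_deg).sum()


def metropolis_hastings(log_k0, n_steps, step):
    """Random-walk sampler over log(k_prod) with a flat prior on log(k_prod)."""
    chain = np.empty(n_steps)
    current, current_ll = log_k0, log_likelihood(log_k0)
    for i in range(n_steps):
        proposal = current + step * rng.normal()   # symmetric Gaussian proposal
        proposal_ll = log_likelihood(proposal)
        # Accept with probability min(1, likelihood ratio).
        if np.log(rng.uniform()) < proposal_ll - current_ll:
            current, current_ll = proposal, proposal_ll
        chain[i] = current
    return chain


chain = metropolis_hastings(log_k0=np.log(5.0), n_steps=5000, step=0.05)
posterior_k = np.exp(chain[1000:])       # discard the first 1000 steps as burn-in
print(f"posterior mean of k_prod = {posterior_k.mean():.2f}")
```

Swapping `log_likelihood` for a Gaussian (CLT-based) approximation or for an FSP-based CME likelihood leaves the sampler unchanged, which is what allows the same data and model to be compared under different likelihood assumptions.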
Despite excellent fits to training data, the CLT-methods fail to predict the full statistics, and parameter uncertainty and bias errors in the CLT-approaches are orders of magnitude larger than for the CME-approach. Use of second moments (i.e., (co)variances) modestly reduces uncertainty, but exacerbates the bias and yields even worse predictions. We trace this effect to asymmetry in the RNA distributions, which causes systematic under-estimation of the moments and leads CLT-approaches to overestimate RNA degradation rates by multiple orders of magnitude compared to results in different yeast strains. In contrast, the CME-approach recovers these rates within 5-8%, indicating strong repeatability of both experiments and analyses.
We used the identified models to predict the elongation dynamics of nascent mRNA at transcription sites (TS). Using TS images for endogenous mRNA for the CTT1 gene, we estimated the Pol II elongation rate in excellent agreement with published rates. Using no additional free parameters, we correctly predicted and then measured (i) the average full-length STL1 mRNA per active TS, (ii) the quantitative fraction of cells that have active TSs versus time, and (iii) the full distributions of nascent mRNA (or equivalently the number of associated elongating Pol II) per TS.
One of the many ways that bacteria evade antibiotic treatments is bacterial persistence. In this phenomenon, rare cells in a large population transiently enter a dormant state in which they do not grow, but they are also not responsive to antibiotics in their environment. These cells can later escape their dormant state to replenish the population after antibiotics are cleared from the environment. We are interested in developing quantitative models for the epigenetic heterogeneity of persistence in the context of time-varying environments. If we can predict what circumstances lead to persistence, we can use these predictions to design more effective control strategies that maximize the effectiveness of antibiotics while minimizing the duration and amount of treatment.
New Computational Approaches to Model Heterogeneous Populations. Population modeling aims to capture and predict the dynamics of cell populations in constant or fluctuating environments. At the elementary level, population growth proceeds through sequential divisions of individual cells. Due to stochastic effects, populations of cells are inherently heterogeneous in phenotype, and some phenotypic variables have an effect on division or survival rates, as can be seen in partial drug resistance. Therefore, when modeling population dynamics where the control of growth and division is phenotype dependent, the corresponding model must take account of the underlying cellular heterogeneity. The finite state projection (FSP) approach has often been used to analyze the statistics of independent cells. Here, we extend the FSP analysis to explore the coupling of cell dynamics and biomolecule dynamics within a population. This extension allows a general framework with which to model the state occupations of a heterogeneous, isogenic population of dividing and expiring cells. The method is demonstrated with a simple model of cell-cycle progression, which we use to explore possible dynamics of drug resistance phenotypes in dividing cells. We use this method to show how stochastic single-cell behaviors affect population level efficacy of drug treatments, and we illustrate how slight modifications to treatment regimens may have dramatic effects on drug efficacy.
R. Johnson and B. Munsky, The finite state projection approach to analyze dynamics of heterogeneous populations, Physical Biology, 14:3, 035002 (2017), online here