Below are a few of the active software projects being undertaken by the Munsky group. For full information and most recent versions, please visit our GitHub page at: https://github.com/MunskyGroup

Stochastic System Identification Toolkit (Matlab)

MATLAB Stochastic System Identification Toolkit (SSIT) for modeling single-cell fluorescence microscopy data

Authors: Huy Vo, Joshua Cook, Brian Munsky

The SSIT allows users to specify and solve the chemical master equation for discrete stochastic models, especially those used for the analysis of single-cell gene regulation.

To learn more about the Finite State Projection (FSP) theory that underlies the SSIT, please see the slides from our Nov. 3, 2022 BPPB Seminar.

The SSIT includes command line tools and a graphical user interface to:

Build, save, and load models
Generate synthetic data from models using Stochastic Simulations
Solve models using the Finite State Projection algorithm
Compute sensitivity of FSP solutions to parameter variations
Load experimental smFISH data
Compute/Maximize the likelihood of data given a model
Run the Metropolis-Hastings algorithm to estimate parameter uncertainties given single-cell data
Compute the Fisher Information Matrix for CME models
Search experiment design space to find optimally informative experiments
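
As a rough illustration of the Finite State Projection idea listed above, the sketch below truncates the chemical master equation of a simple birth-death process to a finite set of states and integrates it in time. Probability that flows out of the truncated set is discarded, and the missing probability mass bounds the truncation error. This is a standalone Python sketch written for illustration, not SSIT code; the model, the rate values, and the function name fsp_birth_death are assumptions.

```python
def fsp_birth_death(k, g, n_states, t_final, dt=0.01):
    """Integrate the truncated CME of a birth-death process with RK4.

    States are n = 0 .. n_states-1. Births out of the last state leave
    the truncated set, so the total probability can only decrease.
    """
    def rhs(p):
        dp = [0.0] * n_states
        for n in range(n_states):
            outflow = (k + g * n) * p[n]          # birth and death out of state n
            inflow = 0.0
            if n > 0:
                inflow += k * p[n - 1]            # birth from state n-1
            if n + 1 < n_states:
                inflow += g * (n + 1) * p[n + 1]  # death from state n+1
            dp[n] = inflow - outflow
        return dp

    p = [0.0] * n_states
    p[0] = 1.0                                    # start with zero molecules
    for _ in range(int(round(t_final / dt))):
        k1 = rhs(p)
        k2 = rhs([x + 0.5 * dt * d for x, d in zip(p, k1)])
        k3 = rhs([x + 0.5 * dt * d for x, d in zip(p, k2)])
        k4 = rhs([x + dt * d for x, d in zip(p, k3)])
        p = [x + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


p = fsp_birth_death(k=10.0, g=1.0, n_states=40, t_final=5.0)
fsp_error = 1.0 - sum(p)                          # probability lost to the truncation
mean = sum(n * pn for n, pn in enumerate(p))
print(f"FSP error bound: {fsp_error:.2e}, mean copy number: {mean:.3f}")
```

If the reported error bound is too large for a given application, the usual remedy is to enlarge the truncated state set and solve again.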

Dependencies

For all basic functionalities:

MATLAB R2021b or later.
Symbolic Math Toolbox.
Global Optimization Toolbox.
Parallel Computing Toolbox.
Tensor Toolbox for MATLAB (TTB). Make sure to add the TTB to the MATLAB path before running the SSIT.

Installation

Clone this package to a local folder on your computer. Then add the path to that folder (with subfolders) into MATLAB’s search path. You can then call all functions from MATLAB.

Getting Started

The SSIT provides two basic interaction options: (1) command line tools and (2) a graphical user interface.

GUI Version

To get started with the GUI, compile and launch the toolkit with the following commands:

src2app;

A = SSIT_GUI;

You should then see the model loading and building page of the graphical interface, and you are off to the races…

Command Line Version

To get started with the command line tools, navigate to the directory “CommandLine” and open one of the tutorial scripts “example_XXX.m”. Or you can start creating and solving models as follows.

Example for generating an FSP model and fitting it to smFISH data for Dusp1 activation following glucocorticoid stimulation:

Define SSIT Model

Model = SSIT;

Model.species = {'x1';'x2'};

Model.initialCondition = [0; 0];

Model.propensityFunctions = {'kon * IGR * (2-x1)'; 'koff * x1'; 'kr * x1'; 'gr * x2'};

Model.stoichiometry = [1,-1,0,0; 0,0,1,-1];

Model.inputExpressions = {'IGR','a0 + a1 * exp(-r1 * t) * (1-exp(-r2 * t)) * (t>0)'};

Model.parameters = ({'koff',0.14; 'kon',0.14; 'kr',25; 'gr',0.01; 'a0',0.006; 'a1',0.4; 'r1',0.04; 'r2',0.1});

Model.initialTime = -120; % large negative time to simulate steady state at t=0
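
To get a feel for the time-varying input IGR defined in Model.inputExpressions above, the expression can be evaluated directly. The following is a small Python sketch (not part of the SSIT) that reuses the parameter values from Model.parameters; the function name igr and the chosen time points are illustrative assumptions. For t <= 0 the input stays at the baseline a0, which is why starting the model at a large negative time lets it relax toward steady state before the stimulus.

```python
import math

# Parameter values copied from Model.parameters above.
a0, a1, r1, r2 = 0.006, 0.4, 0.04, 0.1


def igr(t):
    """Input signal: baseline a0 plus a pulse that switches on at t = 0."""
    pulse = a1 * math.exp(-r1 * t) * (1.0 - math.exp(-r2 * t))
    return a0 + pulse * (t > 0)   # (t > 0) acts as 0 or 1, as in the MATLAB expression


for t in (-120, 0, 10, 60, 180):
    print(f"t = {t:5d}   IGR = {igr(t):.4f}")
```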

Load and Fit smFISH Data

Model = Model.loadData('../ExampleData/DUSP1_Dex_100nM_Rep1_Rep2.csv',{'x2','RNA_nuc'});

Model.tSpan = unique([Model.initialTime,Model.dataSet.times]);

fitOptions = optimset('Display','iter','MaxIter',100);

[pars,likelihood] = Model.maximizeLikelihood([],fitOptions);
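
For readers unfamiliar with likelihood-based fitting of count data, the general form of the quantity being maximized can be sketched as follows: for independent cells, the log-likelihood is the sum over cells of the log-probability that the model assigns to each observed RNA count. The Python sketch below is an illustration under stated assumptions, not the SSIT implementation: a Poisson distribution stands in for the FSP solution, and the data values and function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(observed_counts, model_pmf, floor=1e-12):
    """Sum of log P(n) over all observed cells; the floor avoids log(0)."""
    histogram = Counter(observed_counts)          # number of cells with each count
    total = 0.0
    for n, cells in histogram.items():
        p = model_pmf[n] if n < len(model_pmf) else 0.0
        total += cells * math.log(max(p, floor))
    return total


def poisson_pmf(mean, n_max):
    """Stand-in model distribution (an FSP solution would be used in practice)."""
    return [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(n_max + 1)]


data = [3, 5, 4, 6, 5, 4, 5, 7, 3, 5]             # hypothetical RNA counts, one per cell
candidates = [2.0, 4.7, 9.0]                      # hypothetical parameter guesses
best = max(candidates, key=lambda m: log_likelihood(data, poisson_pmf(m, 30)))
print("best candidate mean:", best)
```

An optimizer replaces the small candidate list in a real fit, but the objective has the same structure.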

Update Model and Make Plots of Results

Model.parameters(:,2) = num2cell(pars);

Model.makeFitPlot

You should arrive at a fit of the model to the experimentally measured Dusp1 mRNA distributions looking something like this:

smFISH Processing Toolkit (Python)

Fluorescence In Situ Hybridization (FISH) – automated image processing

Authors: Luis U. Aguilera, Linda Forero-Quintero, Eric Ron, Joshua Cook, Brian Munsky

Description

Repository to automatically process Fluorescence In Situ Hybridization (FISH) images. This repository uses PySMB to allow the user to transfer data between Network-attached storage (NAS) and a remote or local server. Then it uses Cellpose to detect and segment cells on microscope images. Big-FISH is used to quantify the number of spots per cell. Data is processed using Pandas data frames for single-cell and cell population statistics.

Code architecture

Code overview

Cell segmentation

* The code can achieve accurate cell segmentation for the nucleus and cytosol in the images. The segmentation is performed using cellpose and complex optimization routines that ensure the maximum number of cells detected in the image.

Spot detection

* Spot detection is achieved using Big-FISH. Customization is added in this code to detect spots in multiple color channels. Additionally, this repository contains algorithms to measure spots that are co-detected in different color channels.

Spot counting

* The code quantifies the number of spots per cell and allows the visualization of these numbers as a function of cell size.

Spot intensity quantification

* The code quantifies the intensity of each spot using the disk-and-ring mask method developed by Morisaki and Stasevich, Methods Mol Biol. 2022.
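
One simple reading of a disk-and-ring measurement is sketched below: the mean intensity inside a small disk centered on the spot, minus the mean intensity in a surrounding ring that estimates the local background. This Python sketch is an assumption made for illustration; the radii, the function name disk_ring_intensity, and the synthetic image are hypothetical, and the cited paper should be consulted for the exact procedure used in the repository.

```python
def disk_ring_intensity(image, row, col, disk_radius=2, ring_inner=3, ring_outer=5):
    """Mean intensity in a disk around (row, col) minus the mean in a surrounding ring."""
    disk, ring = [], []
    n_rows, n_cols = len(image), len(image[0])
    for r in range(max(0, row - ring_outer), min(n_rows, row + ring_outer + 1)):
        for c in range(max(0, col - ring_outer), min(n_cols, col + ring_outer + 1)):
            d2 = (r - row) ** 2 + (c - col) ** 2
            if d2 <= disk_radius ** 2:
                disk.append(image[r][c])          # pixel belongs to the spot disk
            elif ring_inner ** 2 <= d2 <= ring_outer ** 2:
                ring.append(image[r][c])          # pixel belongs to the background ring
    return sum(disk) / len(disk) - sum(ring) / len(ring)


# Synthetic 15 x 15 image: background of 10 with a bright spot of 50 at the center.
size, center = 15, 7
image = [[50.0 if (r - center) ** 2 + (c - center) ** 2 <= 4 else 10.0
          for c in range(size)] for r in range(size)]
spot_intensity = disk_ring_intensity(image, center, center)
print("background-corrected spot intensity:", spot_intensity)
```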

Data management

* A complete data-frame for all processed images and cells is generated. This data-frame contains information about the location and intensity of each detected spot.
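
To show how a per-spot table of this kind can be reduced to single-cell statistics, the sketch below groups spots by image and cell and reports the spot count and mean intensity for each cell. It uses plain Python so that it is self-contained; the field names (image_id, cell_id, x, y, intensity) and the values are hypothetical and do not necessarily match the columns produced by the repository. With Pandas, the same grouping would typically be written with groupby.

```python
from collections import defaultdict

# Hypothetical spot table: one row per detected spot.
spots = [
    {"image_id": 0, "cell_id": 0, "x": 12, "y": 40, "intensity": 300.0},
    {"image_id": 0, "cell_id": 0, "x": 15, "y": 44, "intensity": 500.0},
    {"image_id": 0, "cell_id": 1, "x": 80, "y": 21, "intensity": 250.0},
    {"image_id": 1, "cell_id": 0, "x": 33, "y": 67, "intensity": 410.0},
]


def summarize_by_cell(spot_rows):
    """Return {(image_id, cell_id): {'n_spots': ..., 'mean_intensity': ...}}."""
    per_cell = defaultdict(list)
    for row in spot_rows:
        per_cell[(row["image_id"], row["cell_id"])].append(row["intensity"])
    return {
        key: {"n_spots": len(values), "mean_intensity": sum(values) / len(values)}
        for key, values in per_cell.items()
    }


summary = summarize_by_cell(spots)
for key, stats in sorted(summary.items()):
    print(key, stats)
```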

Data reproducibility report

* To increase reproducibility a metadata report is generated. This report contains information about the list of images processed, the specific parameters used to process the data, the user that processed the data, and the version of the modules and packages used.

Data visualization and publication quality images.

* Plotting a complete field of view

* Plotting the detected spots and transcription sites in a selected cell.

* Plotting all color channels for a selected cell.

* Plotting all z-slices for a selected cell.

Installation

Installation on a local computer

To install this repository and all its dependencies, we recommend installing Anaconda.

Clone the repository.

git clone --depth 1 https://github.com/MunskyGroup/FISH_Processing.git

To create a virtual environment, navigate to the location of the requirements file, and use:

 conda create -n FISH_processing python=3.8 -y
 source activate FISH_processing

To install PyTorch for GPU usage in Cellpose (optional step, Linux and Windows users only), check the specific version for your computer on this link:

 conda install pytorch cudatoolkit=10.2 -c pytorch -y

To install PyTorch for CPU usage in Cellpose (optional step, Mac users only), check the specific version for your computer on this link:

 conda install pytorch -c pytorch

To include the rest of the requirements use:

 pip install -r requirements.txt

Installation on the Keck-Cluster (Rocky Linux 8)

The following instructions are for using the code on the Keck Cluster.

Clone the repository to the cluster.

git clone --depth 1 https://github.com/MunskyGroup/FISH_Processing.git

Move to the directory

cd FISH_Processing

Create an environment from this YAML file.

conda env create -f FISH_env.yml

Using this repository

Most codes are accessible as notebook scripts or executables.

To use the codes locally with an interactive environment, use the notebooks folder

To process images use the notebook FISH pipeline
After processing the images use the notebook FISH pipeline to analyze multiple datasets

Executable codes are located in the cluster folder

A Bash script is used to execute a python script containing the image processing pipeline. Please adapt these scripts to your specific configuration and target folders.

Miscellaneous instructions:

To log in to the NAS, you must provide a configuration YAML file with the following format:

    user:
        username: user_name
        password: user_password
        remote_address : remote_ip_address
        domain: remote_domain

To create an environment file (YAML), use:

conda env export > FISH_env.yml

Additional steps to deactivate or remove the environment from the computer:

To deactivate the environment, use

 conda deactivate

To remove the environment use:

 conda env remove -n FISH_processing

To build the documentation, install the following modules:

pip install sphinx
pip install sphinx_rtd_theme
pip install Pygments

Licenses for dependencies

Please check this file with the licenses for BIG-FISH, Cellpose, and PySMB.

Citation

If you use this repository, make sure to cite BIG-FISH and Cellpose:

Big-FISH: Imbert, Arthur, et al. “FISH-quant v2: a scalable and modular tool for smFISH image analysis.” RNA (2022): rna-079073.
Cellpose: Stringer, Carsen, et al. “Cellpose: a generalist algorithm for cellular segmentation.” Nature Methods 18.1 (2021): 100-106.

RNA Sequence to NAscent Protein Experiment Simulator (rSNAPsim)

rSNAPsim – RNA Sequence to NAscent Protein Simulation

ssa c++ library

Project Goal

Provide a Python module that takes nucleotide sequence as an input and does the following:

Chooses a file or pulls a file from GenBank
Analyzes the sequence and identifies proteins
Detects or adds fluorescent tags
Simulates translation trajectories and converts them to intensity vectors (in arbitrary units, A.U.) under various conditions
- Constructs with rare codons only or common codons, FRAP or Harringtonine assays
Provides analyses of the trajectories
Allows the user to save or export the data
Offers command-line and GUI implementations
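
The conversion from simulated ribosome positions to an intensity vector can be pictured with the following Python sketch. It assumes that each ribosome contributes one arbitrary unit of intensity for every tag epitope it has already translated, so the spot intensity at a given time is the sum of these contributions over all ribosomes on the mRNA. The function name, the epitope positions, and the ribosome positions are hypothetical and are not taken from the rSNAPsim API.

```python
from bisect import bisect_right


def intensity_from_ribosomes(ribosome_positions, probe_positions):
    """Total intensity (A.U.): for each ribosome, count the epitopes at or before its codon."""
    probes = sorted(probe_positions)
    return sum(bisect_right(probes, pos) for pos in ribosome_positions)


# Hypothetical tag epitopes at codons 10, 20 and 30.
probe_sites = [10, 20, 30]

# Hypothetical ribosome positions (codon index) at four successive time points.
frames = [[5], [25, 5], [100, 25, 5], [100, 25]]

intensity_vector = [intensity_from_ribosomes(frame, probe_sites) for frame in frames]
print(intensity_vector)   # [0, 2, 5, 5]
```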

Documentation

Tutorials, Module Documentation, Installation and more [LINK TO MUNSKY GROUP WEBSITE]

Dependencies:

Installation

Within a conda environment:

conda install eigen 
pip install rsnapsim-ssa-cpp 
pip install rsnapsim

Within a Google Colab:

!apt install libeigen3-dev
!ln -sf /usr/include/eigen3/Eigen /usr/include/Eigen
!pip install rsnapsim-ssa-cpp
!pip install rsnapsim
!pip install --upgrade rsnapsim

Compilation of the C++

The C++ model should compile automatically when you pip install the ssa-cpp module. In the event that it does not, here are some common errors:

cannot include eigen3/Eigen/Dense
- This means Eigen was not installed correctly by the conda installation. You may have to download Eigen manually and pass its location to the setup.py command: python setup.py build_ext --inplace -I[PATH TO EIGEN FOLDER]
gcc not found

Example Colab Notebooks

Simulating Translation
Simulating Constructs with Different codon usages
Intensity Analyses
Harringtonine / FRAP simulations
Model Maker/ Designer
MW/Diffusion Calculations

Future work

Example notebooks of all functions

RNA Sequence to NAscent Protein Experiment Designer (rSNAPed)

rSNAPed: RNA Sequence to NAscent Protein Experiment Designer.

Authors: Luis U. Aguilera, William Raymond, Tatsuya Morisaki, Brooke Silagy, Timothy J. Stasevich, and Brian Munsky.

Description

rSNAPed is a library to simulate single-molecule gene expression experiments to test machine learning and computational pipelines. The code generates simulated intensity translation spots using rSNAPsim. Cell segmentation is performed using Cellpose. Spot detection and tracking are achieved using Trackpy. If you use rSNAPed, please make sure you properly cite Cellpose, Trackpy, and rSNAPsim.

Summary of uses

Simulating the single-molecule translation for any gene.
Design of single-molecule gene expression experiments.
Tracking for single-molecule translation (RNA + nascent protein) spots.
Tracking for single-molecule RNA spots.

Ethical Considerations and Content Policy

You must accept our Content Policy when using this library:

All simulated images generated with this software are intended to be used to test Machine learning or computational algorithms.
All images generated with this software should always be labeled with the specific terms “simulated data” or “simulated images”.
All datasets resulting from a simulated image should explicitly be reported with the term “simulated data”.
Under any circumstance, a simulated image or dataset generated with rSNAPed should not be used to misrepresent real data.
For public or private use, you must disclose that the generated images are simulated data and give proper credit to rSNAPed.

Test the codes in Google Colab

Description
How to simulate your cell
Harringtonine experiment
Manual particle tracking
Automated cell segmentation and particle tracking
Multiplexing experiments

Simulating single-molecule translation

The code generates videos of the simulated cell and a data frame containing spot positions and intensities. This simulation can be used to train new algorithms.

Local installation using PIP

Create a virtual environment using:

    conda create -n rsnaped_env python=3.8.5 -y
    source activate rsnaped_env

Open the terminal and use pip for the installation:

    pip install rsnaped

Local installation from the Github repository

To create a virtual environment, navigate to the location of the requirements file and use:

    conda create -n rsnaped_env python=3.8.5 -y
    source activate rsnaped_env

To install PyTorch with GPU support for Cellpose (optional step, for Linux and Windows users), check the specific version for your computer on this link:

    conda install pytorch cudatoolkit=10.2 -c pytorch -y

To install PyTorch for CPU usage in Cellpose (optional step, for Mac users), check the specific version for your computer on this link:

    conda install pytorch -c pytorch

To include the rest of the requirements use:

    pip install -r requirements.txt

Additional steps to deactivate or remove the environment from the computer:

To deactivate the environment use

    conda deactivate

To remove the environment use:

    conda env remove -n rsnaped_env

References for main dependencies

rSNAPsim: Aguilera, Luis U., et al. “Computational design and interpretation of single-RNA translation experiments.” PLoS computational biology 15.10 (2019): e1007425.
Trackpy: Dan Allan, et al. (2019, October 16). soft-matter/trackpy: Trackpy v0.4.2 (Version v0.4.2). Zenodo. http://doi.org/10.5281/zenodo.3492186
Cellpose: Stringer, Carsen, et al. “Cellpose: a generalist algorithm for cellular segmentation.” Nature Methods 18.1 (2021): 100-106.

Licenses for dependencies

For the complete licenses of the dependencies, see the file Licenses_Dependencies.md.

Cite as

Luis Aguilera, William Raymond, Tatsuya Morisaki, Brooke Silagy, Timothy J. Stasevich, & Brian Munsky. (2022). rSNAPed. RNA Sequence to NAscent Protein Experiment Designer. (v0.1-beta.2). Zenodo. https://doi.org/10.5281/zenodo.6967555

UQ-Bio Summer School (2023 Tutorials and Software)

Welcome to the 3rd Annual UQ-Bio Summer School!

Below is the GitHub repository holding all the links to the Colab notebooks and files needed during the course.

Authors

Brian Munsky, Luis Aguilera, William Raymond, 
Joshua Cook, Michael May, Zachary Fox, 
Eric Ron, Keisha Cook, Kaan Ocal, 
Ania Baetica, and Ana Carolina Padua.

uqbio.summer.school@gmail.com • 2023 Undergraduate Summer School Schedule • UQ-Bio • Munsky Group

Modules

Module 0 (Online) : Getting Started with Basic Scientific Computing in Python.

Date (MST)	Location	Description
May 22	Online	Intro to Python: Hello (Python) World, Types, Arithmetic Operations, Iterables, and Containers (Instructor: Zach Fox)
May 22	Online	Intro to Python: Loops, Ranges, Functions, Lambdas, List Comprehension (Instructor: Luis Aguilera)
May 22	Online	Intro to Python: importing packages, classes/modules, os navigation, reading files (Instructor: Will Raymond)
May 22	Online	Intro to Python: Google Colab environment setup and navigation (Instructor: Will Raymond)
May 22	Online	Intro to Python: Matplotlib visualization (Instructor: Will Raymond)
May 22	Online	Intro to Python: NumPy and Linear Algebra Review (Instructor: Michael May)
May 22	Online	Intro to Python: Using OpenAI’s GPT-3.5 in Python (Instructor: Brian Munsky)

Module 1 : Optical Microscopy Experiments and Image Processing .

Date (MST)	Location	Description
May 24	Online	Python Preliminary Image Loading and Processing (Instructor: Luis Aguilera)
May 31	in person	Tutorial 1.1 — Basics of Image Processing, Dr. Luis Aguilera (Colorado State University)
June 1	in person	Tutorial 1.2 — Image Segmentation and Tracking, Dr. Carolina Padua (Champalimaud Center for the Unknown, Lisbon, Portugal)

Module 2 : Multivariable Statistics and Machine Learning for Biological Data.

Date (MST)	Location	Description	Link
June 2	in person	Tutorial 2.1 — Basic statistics, Prof. Ania Baetica (Drexel University)
June 5	in person	Tutorial 2.2 – Basics of Regression, Classification and Machine Learning, Dr. Zachary Fox (Oak Ridge National Laboratory)

Module 3 : Stochastic Simulations of Biological Processes.

Date (MST)	Location	Description	Link
June 6	in person	Tutorial 3.1 – Formulating and solving models for gene regulation dynamics, Joshua Cook (Colorado State University)
June 7	in person	Tutorial 3.2 – Sampling from Probability Distributions and Generating Stochastic Simulations, Prof. Keisha Cook (Clemson University)

Module 4 : Master Equation Analyses of Biological Processes.

Date (MST)	Location	Description	Link
June 8	in person	Tutorial 4.1 – Chemical Master Equation, Michael May (Colorado State University)
June 9	in person	Tutorial 4.2 – Markov Chain Monte Carlo and Model Inference, Kaan Ocal (Edinburgh University)

UQ-Bio23 Drug Discovery Challenge

Drug Discovery Challenge Presentation

Date (MST)	Location	Description
May 31	NT135	Stage 1: Experimental Quantification (Instructor: Luis Aguilera)
June 2	NT135	Stage 2: Statistical Analysis (Instructors: Ania Baetica and Brian Munsky)
June 5	NT135	Stage 3: Regression Analysis (Instructors: Zach Fox and Brian Munsky)
June 7	NT135	Stage 4: Models and Stochastic Simulation (Instructors: Michael May, Joshua Cook, and Keisha Cook)
June 13	BSB 107	Final Predictions and Presentations