BayesPrism Documentation
1
Login
The user needs to log in by clicking the 'Log in' link at the top-right corner of the page. Having an account provides a number of benefits, and is free and easy.

2
Create a new experiment
Select the BayesPrism application on the dashboard panel to create a new analysis for your data, as shown in the following screenshot (Figure 2).

3
Set experiment name
Rename "Experiment Name", and click "Add a description" to comment on the experimental setup (optional). Choose the project that the experiment belongs to. By default, the "Default Project" is created and used.

4
Upload count matrix files
BayesPrism needs two types of count matrix file: the bulk RNA-seq count matrix and the reference count matrix. The gateway currently supports data import from several formats, such as tsv, xls, rds/dataframe, rds/seurat, and h5ad. For details, please check the input tab of this page.
The gateway provides two ways to upload count matrix files for users. (1) Click "Select files from storage" to choose existing files submitted for previous tasks, or (2) click "Drop files here or browse" to upload new files from user's storage.
Note:
(1) Each row of a count matrix corresponds to one unique gene ID, so the bulk and reference count matrices should share the same gene set.
(2) Count matrices should not be normalized.
(3) At least 50 reads for each cell type are suggested.
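The notes above can be checked locally before uploading. The following is a minimal sketch, not part of the gateway: it assumes each matrix has already been loaded as a dict mapping gene ID to a list of counts, and the function names `shared_genes` and `looks_like_raw_counts` are hypothetical.

```python
def shared_genes(bulk, reference):
    """Return the gene IDs present in both matrices, sorted."""
    return sorted(set(bulk) & set(reference))


def looks_like_raw_counts(matrix):
    """True if every value is a non-negative whole number (i.e. not normalized)."""
    return all(
        value >= 0 and float(value).is_integer()
        for counts in matrix.values()
        for value in counts
    )


# Toy data: gene ID -> counts per bulk sample, or per cell in the reference.
bulk = {"GeneA": [10, 0], "GeneB": [3, 7], "GeneC": [1, 1]}
reference = {"GeneA": [2, 5, 0], "GeneB": [0, 1, 4], "GeneD": [9, 9, 9]}

common = shared_genes(bulk, reference)
missing_from_reference = sorted(set(bulk) - set(reference))
```

A non-empty `missing_from_reference` list suggests that the two matrices do not use the same gene set and should be reconciled before submission.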

5
Set computing parameters
(1) Specify the species so that ribosomal, mitochondrial, chrX, and chrY genes can be removed. For other species, users need to remove these genes manually.
(2) Specify the cell type and tumor state for each cell in the reference count matrix using a CSV file. Four columns are defined: cell_id, cell_type, cell_subtype, and tumor_state. The tumor state should be 0 (non-tumor) or 1 (tumor).
(3) Specify the prefix of the output files. This can help distinguish results from multiple experiments.
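The cell metadata file described in item (2) could be produced as follows. This is a minimal sketch under stated assumptions: the cell IDs and labels are made up, the helper `write_cell_metadata` is hypothetical, and whether the gateway expects a header row with exactly these column names is an assumption based on the column list above.

```python
import csv
import io

COLUMNS = ["cell_id", "cell_type", "cell_subtype", "tumor_state"]


def write_cell_metadata(rows, handle):
    """Write rows (dicts keyed by COLUMNS) as CSV, rejecting invalid tumor states."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        if row["tumor_state"] not in (0, 1):
            raise ValueError(f"tumor_state must be 0 or 1 for {row['cell_id']}")
        writer.writerow(row)


# Hypothetical cells; the IDs are assumed to match the cells in the reference matrix.
cells = [
    {"cell_id": "cell_001", "cell_type": "T cell", "cell_subtype": "CD8", "tumor_state": 0},
    {"cell_id": "cell_002", "cell_type": "tumor", "cell_subtype": "tumor_1", "tumor_state": 1},
]

buffer = io.StringIO()
write_cell_metadata(cells, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of a string buffer, pass a file opened with `open(path, "w", newline="")` as `handle`.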

6
Submit the job
Once steps 1-5 are finished, proceed to "save and launch". Input data and parameters will be submitted to the computing node of the ACCESS cluster via the BayesPrism gateway server. Click the checkbox next to "Receive email notification of experiment status" if needed. Upon launching, users will be directed to the "Experiments" page, shown in Fig. 4. A typical experiment usually finishes within 4 hours.

7
Check the status
Users may view the progress by logging in and clicking the "Experiments" button on the left control panel at the dashboard. All experiments submitted are listed on this page.

8
Check the results
Once a job is completed, the user can click the selected BayesPrism experiment, and the website will open the Experiment Summary page. All parameters used to set up the experiment are listed on this page. The user can also access the BayesPrism output files stored in the ARCHIVE. Just click ARCHIVE to check any single result file. A compressed file, including the input count matrix file set, two task log files, and all result files, is also provided. Click the Download Zip button to download it. A downloaded file with the 'tar.gz' extension can be decompressed with the 'tar' command, and a file with the 'gz' extension can be decompressed with the 'gunzip' command on Linux.
In Safari, the download can be problematic because Safari tries to unzip the compressed results automatically using an incompatible decompression method. Please disable this feature.
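For users who prefer to unpack the downloaded 'tar.gz' file from Python rather than with the 'tar' command, the following is a minimal sketch using only the standard library. The helper name `extract_results` and the file names in the usage comment are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_results(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the extracted file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe paths inside the archive.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return names


# Usage (hypothetical file names):
# extract_results("myprefix.tar.gz", "bayesprism_results")
```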

9
Tutorial: Bulk RNA-seq deconvolution using BayesPrism
The tutorial covers loading the BayesPrism package, loading the dataset, QC of cell type and state labels, filtering outlier genes, constructing a prism object, running BayesPrism, extracting results, and downstream analysis.
The input to BayesPrism consists of two input matrices which represent the raw read count in bulk samples and the single-cell RNA-seq reference which can be supplied as either cell-by-gene raw count matrix (Reference Data Type=count matrix) or user-generated cell state-by-gene expression profile (Reference Data Type=GEP). Our gateway allows the count matrix or GEP describing the scRNA-seq reference to be exported from other single cell packages, such as Seurat and CellRanger. The details of the data format for the input matrices are described below.
1 Input Matrices

In addition, the users should note that:
(1) The bulk matrix and the reference matrix should use the same gene annotation. BayesPrism will perform deconvolution over the genes shared between these two matrices.
(2) Raw read count is always preferred, as BayesPrism models the count directly. If raw count is not obtainable, BayesPrism is also robust to linearly transformed data, such as CPM, RPKM, TPM. Log-transformed data should be avoided.
(3) We recommend representing each cell state using at least 20 to 50 cells (depending on the library size of the data).
(4) GEP can be generated by summing raw counts for each cell state.
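Note (4) can be illustrated with a small sketch. It assumes the raw counts are available as a dict of per-cell gene counts and that each cell has a known cell state label; the function name `build_gep` and the toy data are hypothetical, and this is not the gateway's own implementation.

```python
from collections import defaultdict


def build_gep(counts_by_cell, state_by_cell):
    """Sum raw counts over the cells of each state: state -> gene -> total count."""
    gep = defaultdict(lambda: defaultdict(int))
    for cell_id, gene_counts in counts_by_cell.items():
        state = state_by_cell[cell_id]
        for gene, count in gene_counts.items():
            gep[state][gene] += count
    return {state: dict(genes) for state, genes in gep.items()}


# Toy raw counts for three cells and two genes.
counts_by_cell = {
    "cell_001": {"GeneA": 2, "GeneB": 0},
    "cell_002": {"GeneA": 5, "GeneB": 1},
    "cell_003": {"GeneA": 0, "GeneB": 4},
}
state_by_cell = {"cell_001": "CD8", "cell_002": "CD8", "cell_003": "tumor_1"}

gep = build_gep(counts_by_cell, state_by_cell)
# gep["CD8"] == {"GeneA": 7, "GeneB": 1}
```

The result has one row per cell state, which corresponds to the cell state-by-gene layout described for the GEP reference type.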
Data Format | Used For | Description |
---|---|---|
TSV | Bulk,Reference | A tab-separated values file containing read counts for each gene (rows) in each bulk sample / single cell (columns). BayesPrism takes the TSV header as column names and takes the first column as row names. |
XLS | Bulk,Reference | An Excel file containing read counts for each gene (rows) in each bulk sample / single cell (columns). BayesPrism takes the first row as column names and takes the first column as row names. |
RDS/dataframe | Bulk,Reference | An RDS file of an R data frame containing read counts for each gene (rows) in each bulk sample / single cell (columns). BayesPrism requires the data frame to have rownames and colnames. |
RDS/sce | Reference | An RDS file of a SingleCellExperiment object representing read counts for each gene (rows) in each single cell (columns). |
RDS/seurat | Reference | An RDS file of a Seurat object representing single-cell expression data. Each Seurat object revolves around a set of cells and consists of one or more Assay objects. |
h5ad | Reference | Hierarchical Data Format version 5 (HDF5) is used to store both the expression values and associated annotations on the genes and cells in Python. H5AD format can be read into R as a SingleCellExperiment. |
* GEP only supports TSV, XLS, and RDS/dataframe.
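To preview how a TSV file in the layout above would be interpreted (header as column names, first column as row names), the following sketch parses one with the standard library. It is an illustration only, and the function name `read_count_tsv` is hypothetical. It also tolerates the header style produced by R's `write.table`, which omits the corner cell, and it rejects duplicate gene IDs and non-integer values.

```python
import csv
import io


def read_count_tsv(handle):
    """Parse a gene-by-sample TSV into (column_names, {gene_id: [counts]})."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    # If the header has a corner cell (same width as the data rows), drop it.
    if body and len(header) == len(body[0]):
        header = header[1:]
    matrix = {}
    for row in body:
        gene_id, values = row[0], row[1:]
        if gene_id in matrix:
            raise ValueError(f"duplicate gene ID: {gene_id}")
        # int() raises ValueError on non-integer text such as "1.5".
        matrix[gene_id] = [int(value) for value in values]
    return header, matrix


demo = "gene\tsample1\tsample2\nGeneA\t10\t0\nGeneB\t3\t7\n"
columns, counts = read_count_tsv(io.StringIO(demo))
```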
2 Cell Metadata
If the reference matrix does not contain the cell type and the tumor state for each cell, users must provide a CSV file giving the cell type and tumor state for each cell. The CSV illustrated in Figure 1 (cellprofile) has 4 columns: cell id, cell type, cell subtype, and tumor state (0 for normal or 1 for tumor).
3 Species
BayesPrism removes ribosomal, mitochondrial, chrX, and chrY genes before deconvolution. For deconvolution using bulk and reference data from unmatched sexes, we recommend that users exclude genes from chrX and chrY.
If the gene annotation is not human or mouse, users need to remove these genes manually.
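For species other than human or mouse, manual removal could look like the sketch below. It is an assumption-laden illustration: the helper `drop_genes` is hypothetical, and the gene IDs and name prefixes shown are placeholders that must be replaced with values from the annotation actually in use.

```python
def drop_genes(matrix, excluded_ids=(), excluded_prefixes=()):
    """Return a copy of matrix without genes in excluded_ids or matching a prefix."""
    excluded = set(excluded_ids)
    prefixes = tuple(excluded_prefixes)
    return {
        gene: counts
        for gene, counts in matrix.items()
        if gene not in excluded and not gene.startswith(prefixes)
    }


# Placeholder gene IDs: gene ID -> counts per sample.
matrix = {
    "rpl3": [120, 98],
    "mt-co1": [300, 410],
    "chrX_gene1": [4, 0],
    "actb": [55, 61],
}

filtered = drop_genes(
    matrix,
    excluded_ids={"chrX_gene1"},              # e.g. IDs looked up by chromosome
    excluded_prefixes=("rpl", "rps", "mt-"),  # placeholder naming patterns
)
```

The same exclusion should be applied to both the bulk and the reference matrix so that their gene sets stay aligned.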
1 BayesPrism output files
BayesPrism generates an RData file ($PREFIX.rdata) for R users and a compressed file ($PREFIX.tar.gz) for Python users.
R users can open the RData file with the "load" command. Python users need to extract multiple RDS files (see the following table) using the decompression command "tar -xvzf" on Linux.
Note: All files below are stored in the "ARCHIVE" directory.
File name | Description |
---|---|
$PREFIX.rdata | This Rdata file contains the 'bp.res' object which can be explored by the 'str' command. The following table shows all contents. |
$PREFIX.tar.gz | The compressed file contains multiple RDS data which represent the items of the 'bp.res' object. |
$plot.tar.gz | The compressed file contains all the plots that you may need. All plots are shown in the third part. |
$out.bk.vs.sc.pdf | The plot indicates the concordance of gene expression for different types of genes. Note that this only works for human data. For other species, you are advised to make the plots yourself. |
2 Contents in $PREFIX.rdata
Name | Description |
---|---|
bp@prism | The input prism. |
bp@prism@phi_cellState@phi | The expression matrix in the format of cell states (rows) by genes (columns). |
bp@prism@phi_cellType@phi | The expression matrix in the format of cell types (rows) by genes (columns). |
bp@prism@map | The information of all the cell types and cell states. |
bp@prism@mixture | The mean count of gene expression in each bulk sample. |
bp@posterior.initial.cellState | The results of step 2. |
bp@posterior.initial.cellState@Z | The estimated mean of the posterior read count for each cell state in each bulk sample. |
bp@posterior.initial.cellState@theta | The initial estimate of the fraction of each cell state in each bulk sample. |
bp@posterior.initial.cellState@theta.cv | The coefficient of variation (CV) of the cell state fraction. |
bp@posterior.initial.cellType | The results of step 3. |
bp@posterior.initial.cellType@Z | The estimated mean of the posterior read count for each cell type in each bulk sample. |
bp@posterior.initial.cellType@theta | The initial estimate of the fraction of each cell type in each bulk sample. |
bp@posterior.initial.cellType@theta.cv | The coefficient of variation (CV) of the cell type fraction. |
bp@reference.update | The updated reference ψ. |
bp@reference.update@psi_mal | The gene expression profile of each tumor sample. |
bp@reference.update@psi_env | The gene expression profile of each non-tumor sample. |
bp@posterior.theta_f | The results of step 4. |
bp@posterior.theta_f@theta | The final estimate of the fraction of each cell type. |
bp@posterior.theta_f@theta.cv | The coefficient of variation (CV) of cell type fraction. |
bp@control_param | The parameters to run BayesPrism. |
3 Read RDS results in Python.
Python users can use 'pyreadr' to read RDS files (https://stackoverflow.com/questions/40996175/loading-a-rds-file-in-pandas).
The following brief example shows how to read one in Python.
import pyreadr
result = pyreadr.read_r('bp.posterior.initial.cellType.theta.rds')
# Extract the pandas data frame. In the case of Rds there is only one object with None as key
df = result[None]
4 Tutorial and downstream analysis, please click this link.