Cell Ranger 1.2, printed on 11/21/2024
Cell Ranger's pipelines analyze sequencing data produced from Chromium single cell 3’ RNA-seq libraries. This involves the following steps:
1. Run cellranger mkfastq on the Illumina BCL output folder to generate FASTQ files.
2. Run cellranger count on each library that was demultiplexed by cellranger mkfastq.
3. Optionally, run cellranger aggr to aggregate multiple libraries from a single experiment that were analyzed by cellranger count.
4. Optionally, run cellranger reanalyze to re-run the secondary analysis (i.e., PCA, t-SNE, and clustering) on a library or aggregated set of libraries.
For the following example, assume that the Illumina BCL output is in a folder named /sequencing/140101_D00123_0111_AHAWT7ADXX.
First, follow the instructions on running cellranger mkfastq to generate FASTQ files. For example, if the flowcell serial number was HAWT7ADXX, then cellranger mkfastq will output FASTQ files in HAWT7ADXX/outs/fastq_path.
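The relationship between the run folder name, the flowcell serial number, and the FASTQ output path can be sketched in Python. This is only an illustration, not part of Cell Ranger: the helper names `flowcell_serial` and `mkfastq_fastq_path` are hypothetical, and the parsing assumes the usual Illumina run-folder naming, in which the last underscore-separated field is a one-letter flowcell side (A or B) followed by the serial number.

```python
from pathlib import PurePosixPath


def flowcell_serial(run_folder: str) -> str:
    """Extract the flowcell serial from an Illumina run folder name.

    Assumption: the last underscore-separated field is a side letter
    (A or B) followed by the serial, e.g. 'AHAWT7ADXX' -> 'HAWT7ADXX'.
    """
    last_field = PurePosixPath(run_folder).name.split("_")[-1]
    if len(last_field) < 2 or last_field[0] not in "AB":
        raise ValueError(f"unexpected run folder name: {run_folder!r}")
    return last_field[1:]


def mkfastq_fastq_path(working_dir: str, serial: str) -> PurePosixPath:
    """Build <working_dir>/<serial>/outs/fastq_path, as described above."""
    return PurePosixPath(working_dir) / serial / "outs" / "fastq_path"


serial = flowcell_serial("/sequencing/140101_D00123_0111_AHAWT7ADXX")
print(serial)
print(mkfastq_fastq_path("/home/jdoe/runs", serial))
```

The resulting path is the value you would later pass to cellranger count as --fastqs.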
To generate single-cell gene counts for a single library, run cellranger count with the following arguments. For a complete list of command-line arguments, run cellranger count --help.
Argument | Description |
---|---|
--id | A unique run ID string, e.g. sample345 |
--fastqs | Path of the FASTQ folder generated by cellranger mkfastq, e.g. /home/jdoe/runs/HAWT7ADXX/outs/fastq_path |
--sample | (optional) Sample name as specified in the sample sheet supplied to mkfastq. |
--indices | (optional) Sample indices associated with this sample, given as a comma-separated list. |
--transcriptome | Path to the Cell Ranger compatible transcriptome reference. |
--cells | (optional) Expected number of recovered cells |
--lanes | (optional) Lanes associated with this sample |
--localcores | Restricts cellranger to use specified number of cores to execute pipeline stages. By default, cellranger will use all of the cores available on your system. |
--localmem | Restricts cellranger to use specified amount of memory (in GB) to execute pipeline stages. By default, cellranger will use 90% of the memory available on your system. Please note that cellranger requires at least 16 GB of memory to run all pipeline stages. |
After determining these input arguments, run cellranger:
$ cd /home/jdoe/runs
$ cellranger count --id=sample345 \
                   --transcriptome=/opt/refdata-cellranger-GRCh38-1.2.0 \
                   --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
                   --indices=SI-3A-A1 \
                   --cells=1000
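If you launch many libraries from a script, it can help to assemble the command from its arguments rather than editing the shell line by hand. The following is a minimal sketch, assuming only the flags listed in the table above; `build_count_command` is a hypothetical helper, not something shipped with Cell Ranger, and it only builds the argument list without running anything.

```python
import shlex


def build_count_command(run_id, transcriptome, fastqs, sample=None,
                        indices=None, cells=None, lanes=None,
                        localcores=None, localmem=None):
    """Return the argv list for a cellranger count invocation."""
    argv = [
        "cellranger", "count",
        f"--id={run_id}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={fastqs}",
    ]
    if indices is not None and not isinstance(indices, str):
        # The table describes --indices as a comma-separated list.
        indices = ",".join(indices)
    optional = {
        "sample": sample,
        "indices": indices,
        "cells": cells,
        "lanes": lanes,
        "localcores": localcores,
        "localmem": localmem,
    }
    # Only emit the optional flags that were actually supplied.
    for name, value in optional.items():
        if value is not None:
            argv.append(f"--{name}={value}")
    return argv


argv = build_count_command(
    run_id="sample345",
    transcriptome="/opt/refdata-cellranger-GRCh38-1.2.0",
    fastqs="/home/jdoe/runs/HAWT7ADXX/outs/fastq_path",
    indices=["SI-3A-A1"],
    cells=1000,
)
print(shlex.join(argv))
```

To actually start the run, the list could be handed to `subprocess.run(argv, cwd="/home/jdoe/runs", check=True)`; passing a list avoids shell quoting problems with unusual paths.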
Following a set of preflight checks to validate input arguments, cellranger count pipeline stages will begin to run:
Martian Runtime - 2.1.2
Running preflight checks (please wait)...
2016-11-10 14:23:52 [runtime] (ready) ID.sample345.SC_RNA_COUNTER_CS.SC_RNA_COUNTER.SETUP_CHUNKS
2016-11-10 14:23:55 [runtime] (split_complete) ID.sample345.SC_RNA_COUNTER_CS.SC_RNA_COUNTER.SETUP_CHUNKS
2016-11-10 14:23:55 [runtime] (run:local) ID.sample345.SC_RNA_COUNTER_CS.SC_RNA_COUNTER.SETUP_CHUNKS.fork0.chnk0.main
...
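When monitoring a run from a script, the runtime messages can be split into their parts. The sketch below assumes that every event line has exactly the shape shown in the excerpt above (timestamp, bracketed source, parenthesized state, stage name); the pattern is inferred from that excerpt only, and `LogEvent` and `parse_log_line` are hypothetical names.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Pattern inferred from the log excerpt above; other message shapes
# (banner lines, "..." placeholders) simply do not match.
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<source>\w+)\] \((?P<state>[\w:]+)\) (?P<stage>\S+)$"
)


class LogEvent(NamedTuple):
    time: datetime
    source: str
    state: str
    stage: str


def parse_log_line(line: str) -> Optional[LogEvent]:
    """Return a LogEvent for an event line, or None for any other line."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return LogEvent(
        time=datetime.strptime(match["time"], "%Y-%m-%d %H:%M:%S"),
        source=match["source"],
        state=match["state"],
        stage=match["stage"],
    )


event = parse_log_line(
    "2016-11-10 14:23:55 [runtime] (split_complete) "
    "ID.sample345.SC_RNA_COUNTER_CS.SC_RNA_COUNTER.SETUP_CHUNKS"
)
print(event.state, event.stage.split(".")[-1])
```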
By default, cellranger will use all of the cores available on your system to execute pipeline stages. You can specify a different number of cores to use with the --localcores option; for example, --localcores=16 will limit cellranger to using up to sixteen cores at once. Similarly, --localmem will restrict the amount of memory (in GB) used by cellranger.
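A wrapper script might validate these limits before launching a run. The sketch below is one possible approach, not Cell Ranger behavior: `resource_flags` is a hypothetical helper, and rejecting a memory limit below 16 GB is a policy chosen here based on the minimum stated in the arguments table.

```python
MIN_MEMORY_GB = 16  # minimum stated in the arguments table above


def resource_flags(localcores=None, localmem_gb=None):
    """Return --localcores/--localmem flags for the limits that were given.

    Omitting a limit leaves cellranger at its default for that resource.
    """
    flags = []
    if localcores is not None:
        if localcores < 1:
            raise ValueError("localcores must be at least 1")
        flags.append(f"--localcores={localcores}")
    if localmem_gb is not None:
        if localmem_gb < MIN_MEMORY_GB:
            raise ValueError(
                f"localmem must be at least {MIN_MEMORY_GB} GB "
                "to run all pipeline stages"
            )
        flags.append(f"--localmem={localmem_gb}")
    return flags


print(resource_flags(localcores=16, localmem_gb=64))
```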
The pipeline will create a new folder named with the sample ID you specified (e.g. /home/jdoe/runs/sample345) for its output. If this folder already exists, cellranger will assume it is an existing pipestance and attempt to resume running it.
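Because an existing folder changes what cellranger does, a launch script may want to check for it first. This is a minimal sketch under the behavior described above; `pipestance_status` is a hypothetical helper, and it only inspects the filesystem without contacting cellranger.

```python
import tempfile
from pathlib import Path


def pipestance_status(run_dir, run_id):
    """Return 'resume' if <run_dir>/<run_id> already exists, else 'new'."""
    output_dir = Path(run_dir) / run_id
    return "resume" if output_dir.exists() else "new"


# Demonstration in a throwaway folder standing in for /home/jdoe/runs.
with tempfile.TemporaryDirectory() as run_dir:
    print(pipestance_status(run_dir, "sample345"))  # folder not created yet
    (Path(run_dir) / "sample345").mkdir()
    print(pipestance_status(run_dir, "sample345"))  # folder now exists
```

If the status is 'resume' and you intended a fresh run, choosing a different --id is the simplest way to avoid picking up the earlier pipestance.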
A successful cellranger count run should conclude with a message similar to this:
2016-11-10 16:10:09 [runtime] (join_complete) ID.sample345.SC_RNA_COUNTER_CS.SC_RNA_COUNTER.SUMMARIZE_REPORTS

Outputs:
- Run summary HTML: /opt/sample345/outs/web_summary.html
- Run summary CSV: /opt/sample345/outs/metrics_summary.csv
- BAM: /opt/sample345/outs/possorted_genome_bam.bam
- BAM index: /opt/sample345/outs/possorted_genome_bam.bam.bai
- Filtered gene-barcode matrices MEX: /opt/sample345/outs/filtered_gene_bc_matrices
- Filtered gene-barcode matrices HDF5: /opt/sample345/outs/filtered_gene_bc_matrices_h5.h5
- Unfiltered gene-barcode matrices MEX: /opt/sample345/outs/raw_gene_bc_matrices
- Unfiltered gene-barcode matrices HDF5: /opt/sample345/outs/raw_gene_bc_matrices_h5.h5
- Secondary analysis output CSV: /opt/sample345/outs/analysis
- Per-molecule read information: /opt/sample345/outs/molecule_info.h5

Pipestance completed successfully!
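A script that captures this terminal output can check for the success message and collect the listed paths. The sketch below assumes the captured text has one output entry per line in the form `- Label: path`, as in the message above; `completed_successfully` and `parse_outputs` are hypothetical names.

```python
SUCCESS_MARKER = "Pipestance completed successfully!"


def completed_successfully(log_text):
    """True if the captured output contains the success message."""
    return SUCCESS_MARKER in log_text


def parse_outputs(log_text):
    """Map each label in the 'Outputs:' list to its path."""
    outputs = {}
    in_outputs = False
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line == "Outputs:":
            in_outputs = True
        elif in_outputs and line.startswith("- "):
            # Split on the first ': ' so the path is kept intact.
            label, _, path = line[2:].partition(": ")
            outputs[label] = path
    return outputs


# Two entries copied from the message above, for illustration.
captured = """Outputs:
- Run summary HTML: /opt/sample345/outs/web_summary.html
- BAM: /opt/sample345/outs/possorted_genome_bam.bam

Pipestance completed successfully!
"""
print(completed_successfully(captured))
print(parse_outputs(captured)["BAM"])
```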
The output of the pipeline will be contained in a folder named with the sample ID you specified (e.g. sample345). The subfolder named outs will contain the main pipeline output files:
File Name | Description |
---|---|
web_summary.html | Run summary metrics and charts in HTML format |
metrics_summary.csv | Run summary metrics in CSV format |
possorted_genome_bam.bam | Reads aligned to the genome and transcriptome annotated with barcode information |
possorted_genome_bam.bam.bai | Index for possorted_genome_bam.bam |
filtered_gene_bc_matrices | Filtered gene-barcode matrices containing only cellular barcodes in MEX format |
filtered_gene_bc_matrices_h5.h5 | Filtered gene-barcode matrices containing only cellular barcodes in HDF5 format |
raw_gene_bc_matrices | Unfiltered gene-barcode matrices containing all barcodes in MEX format |
raw_gene_bc_matrices_h5.h5 | Unfiltered gene-barcode matrices containing all barcodes in HDF5 format |
analysis | Secondary analysis data including dimensionality reduction, cell clustering, and differential expression |
molecule_info.h5 | Molecule-level information used by cellranger aggr to aggregate samples into larger datasets. |
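Before passing results to downstream tools, you may want to confirm that every entry in the table is present. The following sketch checks a pipestance folder against the names listed above; `missing_outputs` is a hypothetical helper, and it tests only for existence, not for file contents.

```python
import tempfile
from pathlib import Path

# File and folder names copied from the output table above.
EXPECTED_OUTPUTS = [
    "web_summary.html",
    "metrics_summary.csv",
    "possorted_genome_bam.bam",
    "possorted_genome_bam.bam.bai",
    "filtered_gene_bc_matrices",
    "filtered_gene_bc_matrices_h5.h5",
    "raw_gene_bc_matrices",
    "raw_gene_bc_matrices_h5.h5",
    "analysis",
    "molecule_info.h5",
]


def missing_outputs(pipestance_dir):
    """List expected entries that are absent from <pipestance_dir>/outs."""
    outs = Path(pipestance_dir) / "outs"
    return [name for name in EXPECTED_OUTPUTS if not (outs / name).exists()]


# Demonstration on a throwaway folder that holds only two of the files.
with tempfile.TemporaryDirectory() as tmp:
    outs_dir = Path(tmp) / "sample345" / "outs"
    outs_dir.mkdir(parents=True)
    (outs_dir / "web_summary.html").touch()
    (outs_dir / "metrics_summary.csv").touch()
    print(missing_outputs(Path(tmp) / "sample345"))
```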
Once cellranger count has successfully completed, you can browse the resulting summary HTML file in any supported web browser, or refer to the Understanding Output section to explore the data by hand.
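As one way to explore the data by hand, the run summary CSV can be read with Python's standard library. This sketch assumes that metrics_summary.csv consists of one header row followed by one row of values, and that values may carry thousands separators or a percent sign; check this against your own file. The two-column sample below uses placeholder values for illustration only, not results from a real run, and `read_metrics` and `to_number` are hypothetical helpers.

```python
import csv
import io


def read_metrics(csv_file):
    """Return the first data row of a metrics CSV as a dict."""
    reader = csv.DictReader(csv_file)
    row = next(reader, None)
    if row is None:
        raise ValueError("metrics file has no data row")
    return dict(row)


def to_number(text):
    """Convert '1,234' or '56.7%' to a float; a percentage becomes a fraction."""
    cleaned = text.replace(",", "").strip()
    if cleaned.endswith("%"):
        return float(cleaned[:-1]) / 100
    return float(cleaned)


# Placeholder content standing in for a real metrics_summary.csv.
sample = io.StringIO(
    "Estimated Number of Cells,Mean Reads per Cell\n"
    '"1,000","50,000"\n'
)
metrics = read_metrics(sample)
print(to_number(metrics["Estimated Number of Cells"]))
```

For a real run, replace the in-memory sample with `open("sample345/outs/metrics_summary.csv", newline="")`.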