Cell Ranger 4.0, printed on 11/14/2024
In this tutorial, you will learn how to run cellranger count on a publicly available data set and find its outputs.
The cellranger count pipeline aligns sequencing reads in FASTQ files to a reference transcriptome and generates a .cloupe file for visualization and analysis in Loupe Browser, along with a number of other outputs compatible with other publicly-available tools for further analysis.
Start by making a directory to run the analysis in.
mkdir ~/yard/run_cellranger_count
cd ~/yard/run_cellranger_count
Next, download FASTQ files from one of the publicly available data sets on the 10x Genomics support site. This example uses the 1,000 PBMC data set from human peripheral blood mononuclear cells (PBMCs), which consist of lymphocytes (T cells, B cells, and NK cells) and monocytes.
wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_fastqs.tar
This dataset is 5.17 GB and takes a few minutes to download.
Since this is a tar file and not a tar.gz file, you don't need the -z argument used in previous tutorials to extract it.
tar -xvf pbmc_1k_v3_fastqs.tar
The output is similar to the following:
pbmc_1k_v3_fastqs/
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_I1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_I1_001.fastq.gz
Now you have a directory containing two sets of FASTQ files, one per lane, named according to the bcl2fastq naming convention: Sample_S1_L00X_R1_001.fastq.gz. The file names indicate that all of the files come from the same sample, pbmc_1k_v3, and that the library was run on two lanes: lane 1 (L001) and lane 2 (L002).
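To make the naming convention concrete, the following sketch splits a FASTQ file name into its parts. The parse_fastq_name helper is hypothetical (it is not a Cell Ranger command), and the pattern assumes the simple Sample_S#_L00#_R#_001.fastq.gz layout shown above.

```shell
# Split a bcl2fastq-style FASTQ file name into sample, sample number, lane, and read.
# parse_fastq_name is a hypothetical helper written for this tutorial.
parse_fastq_name() {
    name=$(basename "$1")
    # Layout assumed: <sample>_S<number>_L<3-digit lane>_<R1|R2|I1|I2>_001.fastq.gz
    pattern='^\(..*\)_S\([0-9][0-9]*\)_L\([0-9][0-9][0-9]\)_\([RI][12]\)_001\.fastq\.gz$'
    if printf '%s\n' "$name" | grep -q "$pattern"; then
        printf '%s\n' "$name" |
            sed "s/$pattern/sample=\1 sample_number=\2 lane=\3 read=\4/"
    else
        echo "not a bcl2fastq-style name: $name" >&2
        return 1
    fi
}

parse_fastq_name pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz
# prints: sample=pbmc_1k_v3 sample_number=1 lane=002 read=R1
```

The sample field printed here is the prefix that the --sample argument of cellranger count matches against.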
Next, you need a reference transcriptome. The download page for the FASTQ files shows that these are human cells. There are several prebuilt human reference transcriptome packages on the 10x Genomics support site. This example downloads the GRCh38 3.0.0 package and decompresses it.
wget https://cf.10xgenomics.com/supp/cell-exp/refdata-cellranger-GRCh38-3.0.0.tar.gz
tar -zxvf refdata-cellranger-GRCh38-3.0.0.tar.gz
The reference package is 10.6 GB and takes less than five minutes to download.
Once you have downloaded and extracted the reference transcriptome files, you can keep them for future runs. However, if you need to delete them to save space on your server between runs, the pre-compiled reference files are publicly available and can be re-downloaded when needed.
Your raw FASTQ files, however, cannot be replaced. We strongly recommend backing them up and archiving them in case something happens to the disk they are stored on.
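One way to make a backup verifiable is to record a checksum for each FASTQ file before copying it. The sketch below assumes GNU sha256sum is available (as on most Linux servers); the write_fastq_manifest helper and the manifest file name are hypothetical.

```shell
# Record a SHA-256 checksum for every FASTQ file in a directory so that a
# backup copy can be verified later.
# write_fastq_manifest is a hypothetical helper; sha256sum is assumed to be
# the GNU coreutils tool.
write_fastq_manifest() {
    fastq_dir=$1
    manifest=$2
    # Run inside the directory so the manifest stores bare file names.
    ( cd "$fastq_dir" && sha256sum -- *.fastq.gz ) > "$manifest"
}

# Usage sketch, with the FASTQ directory extracted earlier in this tutorial:
#   write_fastq_manifest pbmc_1k_v3_fastqs pbmc_1k_v3_fastqs.sha256
# After copying the files elsewhere, verify the copy from inside that directory:
#   sha256sum -c /path/to/pbmc_1k_v3_fastqs.sha256
```

Storing the manifest alongside the archive lets you confirm, at any later time, that the archived files still match the originals.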
Once you have FASTQ files and a reference transcriptome, you are ready to run cellranger count.
Print the usage statement to see what is needed to build the command.
cellranger count --help
The output is similar to the following:
/mnt/home/user.name/yard/apps/cellranger-3.1.0/cellranger-cs/3.1.0/bin
cellranger count (3.1.0)
Copyright (c) 2019 10x Genomics, Inc. All rights reserved.
-------------------------------------------------------------------------------
'cellranger count' quantifies single-cell gene expression.
The commands below should be preceded by 'cellranger':
Usage:
    count --id=ID [--fastqs=PATH] [--sample=PREFIX] --transcriptome=DIR [options]
    count [options]
    count -h | --help | --version
To run cellranger count, you need to specify an --id. This can be any string of alphanumeric characters, underscores, or dashes, with no spaces, that is less than 64 characters long. Cell Ranger creates an output directory named after this id. This directory is called a "pipeline instance", or pipestance for short.
The --fastqs argument should be the path to the directory containing the FASTQ files. If you demultiplexed your data using cellranger mkfastq, you can use the path to the fastq_path directory in the outs folder of that pipeline.
If there is more than one sample in the FASTQ directory, use the --sample argument to specify which samples to use. The --sample argument matches the sample id at the beginning of the FASTQ file names. It is unnecessary for this run because all of the FASTQ files are from the same sample, but it is included as an example.
The last argument needed is --transcriptome, the path to the transcriptome reference package.
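The rules for the --id string can be checked before launching a long run. The following is a minimal sketch of such a check; is_valid_run_id is a hypothetical helper, not part of Cell Ranger, and it encodes only the rules stated above.

```shell
# Check a candidate --id: letters, digits, underscores, or dashes only,
# no spaces, and fewer than 64 characters.
# is_valid_run_id is a hypothetical helper written for this tutorial.
is_valid_run_id() {
    id=$1
    [ -n "$id" ] || return 1            # reject an empty string
    [ "${#id}" -lt 64 ] || return 1     # reject 64 characters or more
    case $id in
        *[!A-Za-z0-9_-]*) return 1 ;;   # reject any other character
    esac
    return 0
}

if is_valid_run_id "run_count_1kpbmcs"; then
    echo "run_count_1kpbmcs: ok"
fi
```

Cell Ranger performs its own validation when it starts; a check like this only catches a bad id earlier, for example in a script that launches many runs.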
cellranger count --id=run_count_1kpbmcs \
--fastqs=/mnt/home/user.name/yard/run_cellranger_count/pbmc_1k_v3_fastqs \
--sample=pbmc_1k_v3 \
--transcriptome=/mnt/home/user.name/yard/run_cellranger_count/refdata-cellranger-GRCh38-3.0.0
Since this is a full-sized dataset, the run can take several hours to complete.
The output is similar to the following:
/mnt/yard/user.name/yard/apps/cellranger-3.1.0/cellranger-cs/3.1.0/bin
cellranger count (3.1.0)
Copyright (c) 2019 10x Genomics, Inc. All rights reserved.
-------------------------------------------------------------------------------
Martian Runtime - '3.1.0-v3.2.3'
...
Pipestance completed successfully!
2019-09-12 15:39:08 Shutting down.
Saving pipestance info to run_count_1kpbmcs/run_count_1kpbmcs.mri.tgz
When the output of the cellranger count command includes “Pipestance completed successfully!”, the job is done.
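Because the run can take hours, it may be convenient to save the terminal output to a file and check it for the completion message afterwards. The sketch below assumes you captured the output yourself (for example with tee); pipestance_succeeded and the log file name are hypothetical.

```shell
# Succeed only if a saved log contains the completion message.
# pipestance_succeeded is a hypothetical helper; the log is a file you
# created yourself by capturing the terminal output.
pipestance_succeeded() {
    grep -qF 'Pipestance completed successfully!' "$1"
}

# Usage sketch: capture the output while the pipeline runs, then test the log.
#   cellranger count ... 2>&1 | tee run_count_1kpbmcs.log
#   pipestance_succeeded run_count_1kpbmcs.log && echo "job is done"
```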
The cellranger count pipeline outputs are in the pipestance directory in the outs folder. List the contents of this directory with ls -1.
ls -1 run_count_1kpbmcs/outs
The output is similar to the following:
analysis
cloupe.cloupe
filtered_feature_bc_matrix
filtered_feature_bc_matrix.h5
metrics_summary.csv
molecule_info.h5
possorted_genome_bam.bam
possorted_genome_bam.bam.bai
raw_feature_bc_matrix
raw_feature_bc_matrix.h5
web_summary.html
Open web_summary.html to see the results of the experiment. You can also load the cloupe.cloupe file into Loupe Browser and start an analysis. The outs directory also contains a number of outputs that can be used as input for software tools developed outside of 10x Genomics, such as the Seurat R package.
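Before handing the results to downstream tools, you may want to confirm that the outs directory is complete. The following sketch checks for the files and directories listed in the ls -1 output above; check_outs is a hypothetical helper, and the list reflects this tutorial's run rather than every Cell Ranger version or configuration.

```shell
# Report any expected file or directory that is missing from an outs directory.
# check_outs is a hypothetical helper; the names are taken from the ls -1
# listing shown in this tutorial.
check_outs() {
    outs_dir=$1
    missing=0
    for name in analysis cloupe.cloupe filtered_feature_bc_matrix \
        filtered_feature_bc_matrix.h5 metrics_summary.csv molecule_info.h5 \
        possorted_genome_bam.bam possorted_genome_bam.bam.bai \
        raw_feature_bc_matrix raw_feature_bc_matrix.h5 web_summary.html; do
        if [ ! -e "$outs_dir/$name" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    # Exit status is 0 when everything is present, 1 otherwise.
    return "$missing"
}

# Usage sketch:
#   check_outs run_count_1kpbmcs/outs && echo "all outputs present"
```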