Cell Ranger DNA 1.1 (latest), printed on 11/13/2024
Analysis software for the 10x Genomics single cell DNA product is no longer supported. Raw data processing pipelines and visualization tools are available for download and can be used for analyzing legacy data from 10x Genomics kits in accordance with our end user licensing agreement without support.
When doing large studies involving multiple GEM wells, run cellranger-dna cnv on FASTQ data from each of the GEM wells individually, and then pool the results using cellranger-dna aggr, as described here.
cellranger-dna aggr is not designed for combining multiple sequencing runs of the same GEM well. For that, pass a list of FASTQ files from the multiple sequencing runs of that GEM well to the --fastqs argument of cellranger-dna cnv.
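A minimal sketch of assembling such a list in the shell. It assumes that --fastqs accepts a comma-separated list of paths, which is the usual Cell Ranger convention but should be confirmed with cellranger-dna cnv --help. The helper join_fastq_dirs and the flowcell paths are hypothetical.

```shell
# join_fastq_dirs is a hypothetical helper, not part of cellranger-dna.
# It joins its arguments with commas, e.g. for use as a --fastqs value
# (assumption: --fastqs takes a comma-separated list of paths).
join_fastq_dirs() {
    joined=
    for dir in "$@"; do
        # Prefix a comma only when something has already been collected.
        joined=${joined:+$joined,}$dir
    done
    printf '%s\n' "$joined"
}

# Hypothetical FASTQ folders from two sequencing runs of the same GEM well.
fastqs=$(join_fastq_dirs /home/jdoe/fastqs/flowcell1 /home/jdoe/fastqs/flowcell2)
echo "--fastqs=$fastqs"
```

The resulting string could then be passed to a single cellranger-dna cnv invocation for that GEM well, rather than running cnv once per sequencing run and aggregating.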
The cellranger-dna aggr command takes a CSV file specifying a list of cellranger-dna cnv output files (specifically the cnv_data.h5 from each run), and produces a single cnv_data.h5 containing the aggregated data. Please note that only cnv_data.h5 files from cellranger-dna version 1.1 can be aggregated.
When combining data from multiple GEM wells, the barcode sequences for each channel are distinguished by a GEM well suffix appended to the barcode sequence (see GEM wells).
The first step is to run cellranger-dna cnv on each individual GEM well prepared using the 10x Chromium™ platform, as described in Copy Number Variation Analysis.
For example, suppose you ran three cnv pipelines as follows:
    $ cd /home/jdoe/runs
    $ cellranger-dna cnv --id=normal ...
    ... wait for pipeline to finish ...
    $ cellranger-dna cnv --id=tumor_primary ...
    ... wait for pipeline to finish ...
    $ cellranger-dna cnv --id=tumor_metastases ...
    ... wait for pipeline to finish ...
These three runs can now be aggregated into a single analysis. In order to do so, you must create an Aggregation CSV.
Create a CSV file with a header line containing the following columns:
library_id
: Unique identifier for this input GEM well. This will be used for labeling purposes only; it doesn't need to match any previous ID you've assigned to the GEM well.
cnv_data
: Path to the cnv_data.h5 file produced by cellranger-dna cnv. For example, if you processed your GEM well by calling cellranger-dna cnv --id=ID in some directory /DIR, the cnv_data.h5 would be /DIR/ID/outs/cnv_data.h5.
You can either make the CSV file in a text editor or create it in Excel and export to CSV. Continuing the example from the previous section, your Excel spreadsheet would look like this:
Row | A | B |
---|---|---|
1 | library_id | cnv_data |
2 | normal | /home/jdoe/runs/normal/outs/cnv_data.h5 |
3 | tumor_primary | /home/jdoe/runs/tumor_primary/outs/cnv_data.h5 |
4 | tumor_metastases | /home/jdoe/runs/tumor_metastases/outs/cnv_data.h5 |
When you save it as a CSV, the result would look like this:
    library_id,cnv_data
    normal,/home/jdoe/runs/normal/outs/cnv_data.h5
    tumor_primary,/home/jdoe/runs/tumor_primary/outs/cnv_data.h5
    tumor_metastases,/home/jdoe/runs/tumor_metastases/outs/cnv_data.h5
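For studies with many GEM wells, the same file can also be generated from the list of run IDs instead of being typed by hand. The sketch below assumes every run lives under one directory and that each run ID doubles as the library_id; make_aggr_csv is a hypothetical helper, not part of cellranger-dna.

```shell
# make_aggr_csv is a hypothetical helper, not part of cellranger-dna.
# Usage: make_aggr_csv RUNS_DIR ID [ID ...]
# Assumes each run was produced with `cellranger-dna cnv --id=ID` inside
# RUNS_DIR, so its output sits at RUNS_DIR/ID/outs/cnv_data.h5.
make_aggr_csv() {
    runs_dir=$1
    shift
    printf 'library_id,cnv_data\n'
    for id in "$@"; do
        printf '%s,%s/%s/outs/cnv_data.h5\n' "$id" "$runs_dir" "$id"
    done
}

# Prints the CSV for the three example runs to standard output.
make_aggr_csv /home/jdoe/runs normal tumor_primary tumor_metastases
```

Redirecting the output to a file (for example `> AGG123_libraries.csv`) gives a CSV with one row per GEM well, in the order the IDs were listed.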
These are the most common command line arguments (run cellranger-dna aggr --help for a full list):
Argument | Description |
---|---|
--id=ID | A unique run ID string: e.g. AGG123 |
--csv=CSV | Path of a CSV file containing a list of cellranger-dna cnv outputs (see Setting up a CSV). |
--reference=PATH | Path to a Cell Ranger DNA reference. |
--description=TEXT | (optional) More detailed sample description. |
--soft-min-avg-ploidy=FLOAT | (optional) Use a known lower limit on the average ploidy of the sample. |
--soft-max-avg-ploidy=FLOAT | (optional) Use a known upper limit on the average ploidy of the sample. |
After specifying these input arguments, run cellranger-dna aggr:
    $ cd /home/jdoe/runs
    $ cellranger-dna aggr --id=AGG123 \
                          --csv=AGG123_libraries.csv \
                          --reference=/home/jdoe/refs/GRCh37
The pipeline will begin to run, creating a new folder named with the aggregation ID you specified (e.g. /home/jdoe/runs/AGG123) for its output. If this folder already exists, cellranger-dna aggr will assume it is an existing pipestance and attempt to resume running it.
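Because a mistyped path in the Aggregation CSV only surfaces once the pipeline starts, a quick pre-flight check can be useful. The sketch below is one possible approach, not something cellranger-dna provides: check_aggr_csv is a hypothetical helper that verifies the header line and confirms that each listed cnv_data.h5 exists. It assumes paths contain no commas and tolerates the CRLF line endings that an Excel export may produce.

```shell
# check_aggr_csv is a hypothetical pre-flight helper, not part of cellranger-dna.
# Returns non-zero if the header is unexpected or any listed file is missing.
check_aggr_csv() {
    csv=$1
    cr=$(printf '\r')
    status=0
    first=1
    # The `|| [ -n ... ]` clause also handles a final line without a newline.
    while IFS=, read -r library_id cnv_data || [ -n "$library_id" ]; do
        library_id=${library_id%"$cr"}   # tolerate CRLF line endings
        cnv_data=${cnv_data%"$cr"}
        if [ "$first" -eq 1 ]; then
            first=0
            if [ "$library_id,$cnv_data" != "library_id,cnv_data" ]; then
                echo "unexpected header: $library_id,$cnv_data" >&2
                return 1
            fi
            continue
        fi
        [ -n "$library_id" ] || continue   # skip blank lines
        if [ ! -f "$cnv_data" ]; then
            echo "missing cnv_data.h5 for $library_id: $cnv_data" >&2
            status=1
        fi
    done < "$csv"
    return "$status"
}
```

A typical use would be `check_aggr_csv AGG123_libraries.csv && cellranger-dna aggr --id=AGG123 ...`, so that the aggregation is only launched when every input file is present.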
The cellranger-dna aggr pipeline generates output files that contain all of the data from the individual input runs for convenient multi-sample analysis. The GEM well suffix of each barcode is updated to prevent barcode collisions, as described below.
Each output file produced by cellranger-dna aggr follows the format described in the Understanding Output section of the documentation, but includes the union of all the relevant barcodes from each input run.
cellranger-dna aggr does not perform a cell-calling step; it simply aggregates the cell calls encoded in the cnv_data.h5 from each input run into a final set of cell calls.
A successful run should conclude with a message similar to this:
    2019-05-06 20:35:47 [runtime] (run:local)       ID.AGGR123.CNV_AGGREGATOR_CS.DLOUPE_PREPROCESS.fork0.join
    2019-05-06 20:35:48 [runtime] (chunks_complete) ID.AGGR123.CNV_AGGREGATOR_CS._POSTPROCESSING.MAKE_WEBSUMMARY
    2019-05-06 20:35:54 [runtime] (join_complete)   ID.AGGR123.CNV_AGGREGATOR_CS.DLOUPE_PREPROCESS
    Outputs:
    - Aggregation specification:    /home/jdoe/runs/AGGR123/outs/aggregate.csv
    - Run alerts:                   /home/jdoe/runs/AGGR123/outs/alarms_summary.txt
    - HDF5 file with CNV data:      /home/jdoe/runs/AGGR123/outs/cnv_data.h5
    - Loupe visualization file:     /home/jdoe/runs/AGGR123/outs/dloupe.dloupe
    - CNV calls with imputation:    /home/jdoe/runs/AGGR123/outs/node_cnv_calls.bed
    - CNV calls without imputation: /home/jdoe/runs/AGGR123/outs/node_unmerged_cnv_calls.bed
    - Per-cell summary metrics:     /home/jdoe/runs/AGGR123/outs/per_cell_summary_metrics.csv
    - Analysis summary metrics:     /home/jdoe/runs/AGGR123/outs/summary.csv
    - Run summary HTML:             /home/jdoe/runs/AGGR123/outs/web_summary.html
    Pipestance completed successfully!
Once cellranger-dna aggr has successfully completed, you can browse the resulting summary HTML file in any supported web browser, open the .dloupe file in Loupe scDNA Browser, or refer to the Understanding Output section to explore the data by hand. For machine-readable versions of the summary metrics, refer to the CSV page of the Understanding Output section.
Each GEM well is a physically distinct set of GEM partitions, but draws barcode sequences randomly from the pool of valid barcodes, known as the barcode whitelist. To keep the barcodes unique when aggregating multiple libraries, we append a small integer (called a GEM well suffix) identifying the GEM well to the barcode nucleotide sequence, and use that nucleotide sequence plus ID as the unique identifier. For example, AGACCATTGAGACTTA-1 and AGACCATTGAGACTTA-2 are distinct cell barcodes from different GEM wells, despite having the same barcode nucleotide sequence. The numbering of the GEM wells will reflect the order in which the GEM wells were provided in the Aggregation CSV.
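When working with aggregated outputs by hand, it can help to split a barcode into its nucleotide sequence and GEM well suffix, and to map the suffix back to a library_id. The sketch below assumes that suffixes start at 1 and follow the row order of the Aggregation CSV, as the example barcodes above suggest. All three helper functions are hypothetical and not part of cellranger-dna.

```shell
# Hypothetical helpers, not part of cellranger-dna.
barcode_sequence() { printf '%s\n' "${1%-*}"; }    # text before the last '-'
barcode_gem_well() { printf '%s\n' "${1##*-}"; }   # text after the last '-'

# Usage: library_for_barcode AGGREGATION_CSV BARCODE
# Prints the library_id for a barcode, assuming suffix N refers to the
# Nth data row (header excluded) of the Aggregation CSV.
library_for_barcode() {
    awk -F, -v n="$(barcode_gem_well "$2")" 'NR == n + 1 { print $1 }' "$1"
}

barcode_sequence AGACCATTGAGACTTA-2   # nucleotide sequence
barcode_gem_well AGACCATTGAGACTTA-2   # GEM well suffix
```

With the example CSV from earlier, `library_for_barcode AGG123_libraries.csv AGACCATTGAGACTTA-2` would look up the second data row under this assumption.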