
10x Genomics
Chromium Single Cell ATAC

Single-Library Analysis with cellranger-atac count

Cell Ranger ATAC's pipelines analyze sequencing data produced from Chromium Single Cell ATAC libraries. This involves the following steps:

  1. Run cellranger-atac mkfastq on the Illumina® BCL output folder to generate FASTQ files.

  2. Run cellranger-atac count on each library that was demultiplexed by cellranger-atac mkfastq.

For the following example, assume that the Illumina® BCL output is in a folder named /sequencing/140101_D00123_0111_AHAWT7ADXX.

Run cellranger-atac mkfastq

First, follow the instructions on running cellranger-atac mkfastq to generate FASTQ files. For example, if the flowcell serial number is HAWT7ADXX, then cellranger-atac mkfastq will output FASTQ files in HAWT7ADXX/outs/fastq_path.
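Before running count, it can help to confirm which FASTQ files belong to your sample. The helper below is a minimal, hypothetical sketch (list_sample_fastqs is not part of Cell Ranger ATAC). It assumes bcl2fastq-style file names such as mysample_S1_L001_R1_001.fastq.gz and searches the whole directory tree, since the fastq_path folder may contain nested project and sample folders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the FASTQ files for one sample under a directory tree.
# Assumes bcl2fastq-style names: <sample>_S<n>_L<3-digit lane>_<read>_001.fastq.gz
list_sample_fastqs() {
  local fastq_dir="$1" sample="$2"
  # -name matches the file name only, so files in nested subfolders are found too
  find "$fastq_dir" -type f -name "${sample}_S*_L[0-9][0-9][0-9]_*_001.fastq.gz" | sort
}

# Example:
# list_sample_fastqs /home/jdoe/runs/HAWT7ADXX/outs/fastq_path mysample
```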

Run cellranger-atac count

To generate single-cell accessibility counts for a single library, run cellranger-atac count with the following arguments. For a complete list of command-line arguments, run cellranger-atac count --help.

Argument         Description
--id             A unique run ID string, e.g. sample345.
--fastqs         Either:
                   • the path of the fastq_path folder generated by cellranger-atac mkfastq, e.g. /home/jdoe/runs/HAWT7ADXX/outs/fastq_path. This contains a directory hierarchy that cellranger-atac count will automatically traverse.
                   - OR -
                   • any folder containing FASTQ files, for example if the FASTQ files were generated by a service provider and delivered outside the context of the mkfastq output directory structure.
                 Can take multiple comma-separated paths, which is helpful if the same library was sequenced on multiple flowcells. Doing this will treat all reads from the library, across flowcells, as one sample.
                 We do not support combining analyses from multiple libraries in this version.
--sample         Sample name as specified in the sample sheet supplied to cellranger-atac mkfastq.
                 Can take multiple comma-separated values, which is helpful if the same library was sequenced on multiple flowcells and the sample name used (and therefore the FASTQ file prefix) is not identical between them. Doing this will treat all reads from the library, across flowcells, as one sample.
                 Allowable characters in sample names are letters, numbers, hyphens, and underscores.
--reference      Path to the Cell Ranger ATAC compatible genome reference. For example:
                   • For a human-only sample, use /opt/refdata-cellranger-atac-GRCh38-1.2.0
                   • For a human and mouse mixture sample, use /opt/refdata-cellranger-atac-GRCh38-and-mm10-1.2.0
--force-cells    (optional) Force the pipeline to use this number of cells, bypassing the cell detection algorithm. Use this if the number of cells estimated by Cell Ranger ATAC is not consistent with the barcode rank plot.
--dim-reduce     (optional) Choose the algorithm for dimensionality reduction prior to clustering and t-SNE: 'lsa' (default), 'plsa', or 'pca'.
--downsample     (optional) Force the pipeline to downsample sequencing data to this number of gigabases.
--lanes          (optional) Lanes associated with this sample.
--localcores     (optional) Restricts cellranger-atac to use the specified number of cores to execute pipeline stages. By default, cellranger-atac will use all of the cores available on your system.
--localmem       (optional) Restricts cellranger-atac to use the specified amount of memory (in GB) to execute pipeline stages. By default, cellranger-atac will use 90% of the memory available on your system.
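The sample-name rules and the comma-separated forms of --fastqs and --sample can be checked and assembled in the shell before a run is launched. This is a hypothetical sketch: the function names are made up, it only checks the allowed characters listed above (letters, numbers, hyphens, underscores), and the second flowcell path in the example comment is invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every comma-separated sample name is non-empty
# and uses only letters, numbers, hyphens, and underscores.
valid_sample_names() {
  local -a names
  local name
  [[ -n "$1" ]] || return 1
  # Split on commas without glob expansion
  IFS=',' read -ra names <<< "$1"
  for name in "${names[@]}"; do
    [[ "$name" =~ ^[A-Za-z0-9_-]+$ ]] || return 1
  done
}

# Hypothetical helper: join the arguments with commas (e.g. one fastq_path per flowcell).
join_by_comma() {
  local IFS=','
  echo "$*"
}

# Example (second flowcell path is hypothetical):
# fastqs=$(join_by_comma /home/jdoe/runs/HAWT7ADXX/outs/fastq_path /home/jdoe/runs/OTHERFCXX/outs/fastq_path)
# valid_sample_names "mysample" && echo "--fastqs=$fastqs --sample=mysample"
```

Splitting with read -ra rather than an unquoted expansion keeps a name such as sample* from being expanded against files in the current directory.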
After determining these input arguments, run cellranger-atac:
$ cd /home/jdoe/runs
$ cellranger-atac count --id=sample345 \
                   --reference=/opt/refdata-cellranger-atac-GRCh38-1.2.0 \
                   --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
                   --sample=mysample \
                   --localcores=8 \
                   --localmem=64 
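Because cellranger-atac count is run once per demultiplexed library, several libraries from the same flowcell can be processed in a loop. The wrapper below is a minimal, hypothetical sketch that reuses only the flags from the example command; using each sample name as its run ID and the DRY_RUN switch are assumptions of this sketch. With DRY_RUN=1 it prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run cellranger-atac count once per library (sample name).
# Usage: run_count_per_library <reference> <fastq_path> <sample>...
# Set DRY_RUN=1 to print the commands instead of executing them.
run_count_per_library() {
  local reference="$1" fastqs="$2"
  shift 2
  local sample
  for sample in "$@"; do
    # The sample name doubles as the (unique) run ID here
    local -a cmd=(cellranger-atac count
      --id="$sample"
      --reference="$reference"
      --fastqs="$fastqs"
      --sample="$sample"
      --localcores=8
      --localmem=64)
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return 1   # stop at the first failed run
    fi
  done
}

# Example (sample names are hypothetical):
# DRY_RUN=1 run_count_per_library /opt/refdata-cellranger-atac-GRCh38-1.2.0 \
#   /home/jdoe/runs/HAWT7ADXX/outs/fastq_path sampleA sampleB
```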

Following a set of preflight checks to validate input arguments, cellranger-atac count pipeline stages will begin to run:

Martian Runtime - 3.2.4
 
Running preflight checks (please wait)...
2018-09-17 21:33:47 [runtime] (ready)           ID.sample345.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.SETUP_CHUNKS
2018-09-17 21:33:47 [runtime] (run:local)       ID.sample345.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.SETUP_CHUNKS.fork0.chnk0.main
2018-09-17 21:33:56 [runtime] (chunks_complete) ID.sample345.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER._BASIC_SC_ATAC_COUNTER._ALIGNER.SETUP_CHUNKS
...

By default, cellranger-atac will use all of the cores available on your system to execute pipeline stages. You can specify a different number of cores to use with the --localcores option; for example, --localcores=16 will limit cellranger-atac to using up to sixteen cores at once. Similarly, --localmem will restrict the amount of memory (in GB) used by cellranger-atac.
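On a shared machine you may prefer --localmem to be a fixed fraction of total memory rather than the 90% default. The sketch below is hypothetical and Linux-specific: it reads MemTotal (reported in kB) from /proc/meminfo, the function names are made up, and it rounds down to whole GiB, which it treats as the GB unit --localmem expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: <percent>% of a memory size given in kB, rounded down to whole GiB.
percent_of_kb_in_gb() {
  local total_kb="$1" percent="$2"
  echo $(( total_kb * percent / 100 / 1024 / 1024 ))
}

# Linux only: total system memory in kB, taken from /proc/meminfo.
mem_total_kb() {
  awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Example (Linux): limit cellranger-atac to 16 cores and half of the system's memory.
# cellranger-atac count ... --localcores=16 --localmem="$(percent_of_kb_in_gb "$(mem_total_kb)" 50)"
```

For a machine reporting 67108864 kB (64 GiB), percent_of_kb_in_gb 67108864 50 prints 32.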

The pipeline will create a new folder named with the sample ID you specified (e.g. /home/jdoe/runs/sample345) for its output. If this folder already exists, cellranger-atac will assume it is an existing pipestance and attempt to resume running it.
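Since an existing output folder is treated as a pipestance to resume, a wrapper script may want to check for it before launching a run. A minimal, hypothetical sketch (the function name is made up; it only tests whether a directory with the run ID exists in the working directory):

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether cellranger-atac count, launched from <workdir>
# with --id=<run_id>, would start a new pipestance or resume an existing one.
pipestance_action() {
  local workdir="$1" run_id="$2"
  if [[ -d "$workdir/$run_id" ]]; then
    echo "resume"   # folder exists: treated as an existing pipestance
  else
    echo "new"      # folder absent: the pipeline creates it
  fi
}

# Example:
# pipestance_action /home/jdoe/runs sample345
```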

Output Files

A successful cellranger-atac count run should conclude with a message similar to this:

2018-09-17 22:26:56 [runtime] (join_complete)   ID.sample345.SC_ATAC_COUNTER_CS.SC_ATAC_COUNTER.CLOUPE_PREPROCESS
 
Outputs:
- Per-barcode fragment counts & metrics:        /home/jdoe/runs/sample345/outs/singlecell.csv
- Position sorted BAM file:                     /home/jdoe/runs/sample345/outs/possorted_bam.bam
- Position sorted BAM index:                    /home/jdoe/runs/sample345/outs/possorted_bam.bam.bai
- Summary of all data metrics:                  /home/jdoe/runs/sample345/outs/summary.json
- HTML file summarizing data & analysis:        /home/jdoe/runs/sample345/outs/web_summary.html
- Bed file of all called peak locations:        /home/jdoe/runs/sample345/outs/peaks.bed
- Raw peak barcode matrix in hdf5 format:       /home/jdoe/runs/sample345/outs/raw_peak_bc_matrix.h5
- Raw peak barcode matrix in mex format:        /home/jdoe/runs/sample345/outs/raw_peak_bc_matrix
- Directory of analysis files:                  /home/jdoe/runs/sample345/outs/analysis
- Filtered peak barcode matrix in hdf5 format:  /home/jdoe/runs/sample345/outs/filtered_peak_bc_matrix.h5
- Filtered peak barcode matrix:                 /home/jdoe/runs/sample345/outs/filtered_peak_bc_matrix
- Barcoded and aligned fragment file:           /home/jdoe/runs/sample345/outs/fragments.tsv.gz
- Fragment file index:                          /home/jdoe/runs/sample345/outs/fragments.tsv.gz.tbi
- Filtered tf barcode matrix in hdf5 format:    /home/jdoe/runs/sample345/outs/filtered_tf_bc_matrix.h5
- Filtered tf barcode matrix in mex format:     /home/jdoe/runs/sample345/outs/filtered_tf_bc_matrix
- Loupe Browser input file:                     /home/jdoe/runs/sample345/outs/cloupe.cloupe
- csv summarizing important metrics and values: /home/jdoe/runs/sample345/outs/summary.csv
 
Pipestance completed successfully!

The output of the pipeline will be contained in a folder named with the sample ID you specified (e.g. sample345). The subfolder named outs will contain the main pipeline output files:

File Name                     Description
singlecell.csv                Per-barcode fragment counts & metrics
possorted_bam.bam             Position-sorted BAM file
possorted_bam.bam.bai         Position-sorted BAM index
summary.json                  Summary of all data metrics
web_summary.html              HTML file summarizing data & analysis
peaks.bed                     BED file of all called peak locations
raw_peak_bc_matrix.h5         Raw peak barcode matrix in HDF5 format
raw_peak_bc_matrix            Raw peak barcode matrix in MEX format
analysis                      Directory of analysis files
filtered_peak_bc_matrix.h5    Filtered peak barcode matrix in HDF5 format
filtered_peak_bc_matrix       Filtered peak barcode matrix in MEX format
fragments.tsv.gz              Barcoded and aligned fragment file
fragments.tsv.gz.tbi          Fragment file index
filtered_tf_bc_matrix.h5      Filtered transcription factor (TF) barcode matrix in HDF5 format
filtered_tf_bc_matrix         Filtered transcription factor (TF) barcode matrix in MEX format
cloupe.cloupe                 Loupe Browser input file
summary.csv                   CSV summarizing important metrics and values
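To confirm that a finished run produced every output named in the completion message above, each name can be tested under outs. This is a hypothetical sketch (missing_outputs is made up); it prints any missing names and returns a non-zero status if at least one is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every expected output missing from an outs folder.
# Returns 1 if anything is missing, 0 otherwise.
missing_outputs() {
  local outs="$1" name status=0
  local -a expected=(
    singlecell.csv possorted_bam.bam possorted_bam.bam.bai summary.json
    web_summary.html peaks.bed raw_peak_bc_matrix.h5 raw_peak_bc_matrix
    analysis filtered_peak_bc_matrix.h5 filtered_peak_bc_matrix
    fragments.tsv.gz fragments.tsv.gz.tbi filtered_tf_bc_matrix.h5
    filtered_tf_bc_matrix cloupe.cloupe summary.csv
  )
  for name in "${expected[@]}"; do
    # -e accepts both files and directories (the MEX matrices and analysis are folders)
    if [[ ! -e "$outs/$name" ]]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Example:
# missing_outputs /home/jdoe/runs/sample345/outs || echo "some outputs are missing"
```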

Once cellranger-atac count has successfully completed, you can browse the resulting summary HTML file in any supported web browser, open the .cloupe file in Loupe Browser, or refer to the Understanding Output section to explore the data by hand.
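For a first look at the plain-text outputs by hand, standard command-line tools are enough. This is a hedged sketch with hypothetical function names. It assumes peaks.bed has one peak per line, and that fragments.tsv.gz is a gzipped, tab-separated file whose fourth column holds the cell barcode; lines starting with # are skipped in case a header is present. Fragments are counted as lines, so any per-fragment duplicate count column is ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count peaks (lines that are neither comments nor empty) in a BED file.
count_peaks() {
  grep -v -c -e '^#' -e '^$' "$1"
}

# Hypothetical helper: print "barcode<TAB>fragment_lines" for the N barcodes with the
# most fragment lines (column 4 assumed to be the barcode); ties are sorted by barcode.
top_barcodes() {
  local fragments="$1" n="${2:-5}"
  gzip -dc "$fragments" \
    | awk -F'\t' '!/^#/ { count[$4]++ } END { for (bc in count) print bc "\t" count[bc] }' \
    | sort -t$'\t' -k2,2nr -k1,1 \
    | head -n "$n"
}

# Example:
# count_peaks /home/jdoe/runs/sample345/outs/peaks.bed
# top_barcodes /home/jdoe/runs/sample345/outs/fragments.tsv.gz 10
```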