10x Genomics
Chromium Single Cell ATAC

Per barcode QC, ATAC signal, and cell calling

The cellranger-atac pipeline performs cell calling, in which it determines whether each barcode is a cell of any species included in the reference. Based on mapping information, the pipeline also provides QC information about the fragments associated with each barcode. Additionally, the pipeline computes the ATAC signal per barcode, captured by targeting metrics such as the number of fragments overlapping transcription start sites (TSS) annotated in the reference package. All of this per-barcode information is collated in a single output table: singlecell.csv.

Structure

The structure and contents of singlecell.csv from a single species analysis are shown below:

$ cd /home/jdoe/runs/sample345/outs
$ head -5 singlecell.csv
barcode,total,duplicate,chimeric,unmapped,lowmapq,mitochondrial,nonprimary,passed_filters,is_cell_barcode,excluded_reason,TSS_fragments,DNase_sensitive_region_fragments,enhancer_region_fragments,promoter_region_fragments,on_target_fragments,blacklist_region_fragments,peak_region_fragments,peak_region_cutsites
NO_BARCODE,986507,102223,401,118334,63547,1882,324,699796,0,0,0,0,0,0,0,0,0,0
AAACGAAAGAAAGGGT-1,8,0,0,5,0,0,0,3,0,0,1,0,0,0,1,0,1,2
AAACGAAAGAAATACC-1,7,2,0,3,0,0,0,2,0,2,0,0,0,0,0,0,0,0
AAACGAAAGAAATGGG-1,10,4,0,1,1,0,0,4,0,0,0,0,0,0,0,0,1,2

The table contains many columns, including the primary barcode column. All the barcodes in the dataset are listed in this column. The NO_BARCODE row contains a summary of fragments that are not associated with any whitelisted barcodes. It usually forms a small fraction of all reads.
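
To check how large the NO_BARCODE row is relative to the whole dataset, one option is to compare its total against the sum of the total column. The sketch below is a minimal illustration that inlines only the barcode and total columns of the example rows shown above. Because it uses just those first few rows, the fraction it prints is close to 1 and is not representative of a full singlecell.csv, where the NO_BARCODE share is expected to be small.

```python
import io
import pandas as pd

# barcode and total columns of the example rows shown above,
# inlined so that the sketch runs without the real file
SAMPLE_CSV = """barcode,total
NO_BARCODE,986507
AAACGAAAGAAAGGGT-1,8
AAACGAAAGAAATACC-1,7
AAACGAAAGAAATGGG-1,10
"""

scdf = pd.read_csv(io.StringIO(SAMPLE_CSV))

# read-pairs that could not be assigned to a whitelisted barcode
is_no_barcode = scdf["barcode"] == "NO_BARCODE"
no_barcode_total = int(scdf.loc[is_no_barcode, "total"].sum())

# share of all read-pairs in the table
no_barcode_fraction = no_barcode_total / scdf["total"].sum()
print(f"NO_BARCODE read-pairs: {no_barcode_total} "
      f"({no_barcode_fraction:.2%} of all read-pairs)")
```

On a real run, replace the inlined text with the path to your own singlecell.csv.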

Column Definitions

| Column | Type | Description | Pipeline specific changes | Reference specific changes |
|---|---|---|---|---|
| barcode | key | barcodes present in input data | | |
| total | sequencing | total read-pairs | absent in aggr, reanalyze | |
| duplicate | mapping | number of duplicate read-pairs | | |
| chimeric | mapping | number of chimerically mapped read-pairs | absent in aggr, reanalyze | |
| unmapped | mapping | number of read-pairs with at least one end not mapped | absent in aggr, reanalyze | |
| lowmapq | mapping | number of read-pairs with <30 mapq on at least one end | absent in aggr, reanalyze | |
| mitochondrial | mapping | number of read-pairs mapping to mitochondria and non-nuclear contigs | absent in aggr, reanalyze | |
| nonprimary | mapping | number of reads that map to non-primary contigs | | |
| passed_filters | mapping | number of non-duplicate, usable read-pairs, i.e. "fragments" | absent in aggr, reanalyze | for multi species, for example hg19 and mm10, expect additional columns: passed_filters_hg19 and passed_filters_mm10 |
| is_cell_barcode | cell calling | binary indicator of whether barcode is associated with a cell | | for multi species, for example hg19 and mm10, expect columns is_hg19_cell_barcode and is_mm10_cell_barcode instead |
| excluded_reason | cell calling | 0: barcode was not excluded; 1: barcode was excluded because it is a gel bead doublet; 2: barcode was excluded because it is low-targeting; 3: barcode was excluded because it is a barcode multiplet | | |
| TSS_fragments | targeting | number of fragments overlapping with TSS regions | | |
| DNase_sensitive_region_fragments | targeting | number of fragments overlapping with DNase sensitive regions | | for custom references or references missing the dnase.bed file, this count is 0 |
| enhancer_region_fragments | targeting | number of fragments overlapping enhancer regions | | for custom references or references missing the enhancer.bed file, this count is 0 |
| promoter_region_fragments | targeting | number of fragments overlapping promoter regions | | for custom references or references missing the promoter.bed file, this count is 0 |
| on_target_fragments | targeting | number of fragments overlapping any of TSS, enhancer, promoter and DNase hypersensitivity sites (counted with multiplicity) | | for custom references or references having only the tss.bed file, this count is simply equal to TSS_fragments |
| blacklist_region_fragments | targeting | number of fragments overlapping blacklisted regions | | |
| peak_region_fragments | denovo targeting | number of fragments overlapping peaks | | for multi species, for example hg19 and mm10, expect additional columns: peak_region_fragments_hg19 and peak_region_fragments_mm10 |
| peak_region_cutsites | denovo targeting | number of ends of fragments in peak regions | | |
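
The numeric codes in excluded_reason can be turned into readable labels when summarizing why barcodes were excluded. The following is a minimal sketch: the label strings are shortened from the descriptions in the table, the excluded_label column name is a hypothetical choice for this illustration, and the data are the barcode and excluded_reason values from the example rows shown earlier.

```python
import pandas as pd

# Code-to-label mapping shortened from the excluded_reason description above
EXCLUDED_REASON_LABELS = {
    0: "not excluded",
    1: "gel bead doublet",
    2: "low-targeting",
    3: "barcode multiplet",
}

# barcode and excluded_reason values copied from the example rows
scdf = pd.DataFrame({
    "barcode": ["NO_BARCODE", "AAACGAAAGAAAGGGT-1",
                "AAACGAAAGAAATACC-1", "AAACGAAAGAAATGGG-1"],
    "excluded_reason": [0, 0, 2, 0],
})

# "excluded_label" is a hypothetical helper column, not part of singlecell.csv
scdf["excluded_label"] = scdf["excluded_reason"].map(EXCLUDED_REASON_LABELS)

# number of barcodes per exclusion reason
reason_counts = scdf["excluded_label"].value_counts()
print(reason_counts)
```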

Note that the number of columns and the column names depend on which pipeline and which reference were used to generate the output file, as described in the last two columns of the table.
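
Because the cell calling column is named differently for single species and multi species references, code that reads singlecell.csv may want to discover the column names instead of hard-coding them. The sketch below is one possible approach, assuming the naming patterns given in the table (is_cell_barcode, or is_hg19_cell_barcode and is_mm10_cell_barcode). The helper name cell_barcode_columns and the regular expression are illustrative and not part of Cell Ranger ATAC.

```python
import re
import pandas as pd

# Matches "is_cell_barcode" as well as "is_<species>_cell_barcode"
CELL_COLUMN_PATTERN = re.compile(r"^is_(?:(?P<species>.+)_)?cell_barcode$")


def cell_barcode_columns(df):
    """Map species name (None for a single species file) to its cell calling column."""
    found = {}
    for column in df.columns:
        match = CELL_COLUMN_PATTERN.match(column)
        if match:
            found[match.group("species")] = column
    return found


# Empty frames that carry only the column names, for illustration
single = pd.DataFrame(columns=["barcode", "passed_filters", "is_cell_barcode"])
multi = pd.DataFrame(columns=["barcode", "passed_filters_hg19", "passed_filters_mm10",
                              "is_hg19_cell_barcode", "is_mm10_cell_barcode"])

print(cell_barcode_columns(single))
print(cell_barcode_columns(multi))
```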

Loading and using in Python

singlecell.csv can be loaded easily in Python as a pandas dataframe:

import pandas as pd
 
singlecell_file  = "/home/jdoe/runs/sample345/outs/singlecell.csv"
# load without index
scdf = pd.read_csv(singlecell_file, sep=",")
 
# load with barcode as index
scdf2 = pd.read_csv(singlecell_file, sep=",", index_col="barcode" )

You can use this file in many ways. Below are some examples:

Regenerate the targeting plot in web summary

Assume you are analyzing data from a single species library, such as hg19. To reproduce the plot on the right side of the Targeting section of the web summary, you can do the following:

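import matplotlib.pyplot as plt

# scdf is the dataframe loaded above without an index, so 'barcode' is a regular column
cell_mask = (scdf['is_cell_barcode'] == 1)
noncell_mask = (scdf['is_cell_barcode'] != 1) & (scdf['barcode'] != 'NO_BARCODE')

# fraction of fragments in peaks against fragments per barcode; '.' draws points only
# cell barcodes in blue
plt.plot(scdf[cell_mask]['passed_filters'],
     scdf[cell_mask]['peak_region_fragments'] / scdf[cell_mask]['passed_filters'],
     '.', c='b')
# non-cell barcodes in red
plt.plot(scdf[noncell_mask]['passed_filters'],
     scdf[noncell_mask]['peak_region_fragments'] / scdf[noncell_mask]['passed_filters'],
     '.', c='r')
plt.show()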

Edit cell calling for use in aggr and reanalyze

The singlecell.csv file captures the cell calling information in the is_{species}_cell_barcode field. The Cell Ranger ATAC aggr pipeline requires you to specify the singlecell.csv as part of the aggr_csv argument.
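
As an illustration of editing the cell calls, the sketch below overwrites is_cell_barcode from a user-supplied set of barcodes and leaves every other column untouched. It is a minimal sketch under stated assumptions: the helper name set_cell_calls and the output file name are hypothetical, the chosen barcode is taken from the example rows only for demonstration, and whether other columns such as excluded_reason should also be updated is not covered here.

```python
import pandas as pd


def set_cell_calls(scdf, cell_barcodes, column="is_cell_barcode"):
    """Return a copy of scdf with `column` set to 1 for the given barcodes, 0 otherwise."""
    # The NO_BARCODE summary row is never treated as a cell
    keep = set(cell_barcodes) - {"NO_BARCODE"}
    edited = scdf.copy()
    edited[column] = edited["barcode"].isin(keep).astype(int)
    return edited


# barcode and is_cell_barcode values copied from the example rows
scdf = pd.DataFrame({
    "barcode": ["NO_BARCODE", "AAACGAAAGAAAGGGT-1",
                "AAACGAAAGAAATACC-1", "AAACGAAAGAAATGGG-1"],
    "is_cell_barcode": [0, 0, 0, 0],
})

# Hypothetical custom cell list, for demonstration only
edited = set_cell_calls(scdf, {"AAACGAAAGAAAGGGT-1"})
print(edited)

# Write the edited table to a new file (hypothetical path) rather than
# overwriting the pipeline output:
# edited.to_csv("singlecell_edited.csv", index=False)
```

For a multi species file, pass the relevant column name, for example column="is_hg19_cell_barcode".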