Cell Ranger DNA 1.0, printed on 11/12/2024
The pipeline produces three tab-separated text files containing information about the mappability of the genome and copy number calls for all cells and groups of cells defined by the hierarchical clustering. The first three lines of these files are comment lines that begin with the # character and specify the software version, the path to the reference genome, and the column headers.
The mappable_regions.bed file is a BED file containing the mappable regions of the genome, defined as regions with mappability greater than 90%. This is the subset of the genome where the copy number is estimated. For regions shorter than 500 kb with lower mappability, we impute the copy number values based on neighboring mappable bins. For more details, refer to the algorithms section. In the example below, chr1:860,000-1,620,000 is a region with high mappability.

    #cellranger-dna 1.0.0
    #reference genome: /opt/refdata-GRCh38-1.0.0
    #chrom start end
    chr1 860000 1620000
    chr1 1760000 2640000
    ...

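As a rough illustration of how the mappable regions relate to imputation, the sketch below reads a file of this shape and lists the gaps between consecutive mappable regions that are shorter than 500 kb. The helper names `read_bed` and `short_gaps` are hypothetical, the input is assumed to be sorted by position, and treating every short gap as imputed is an assumption made for illustration; the algorithms section remains the authority on the actual rule.

```python
import io

def read_bed(handle):
    """Split a tab-separated BED-like file into comment lines and data rows."""
    comments, rows = [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            comments.append(line[1:])       # drop the leading '#'
        else:
            rows.append(line.split("\t"))
    return comments, rows

def short_gaps(regions, max_gap=500_000):
    """Return gaps between consecutive regions on the same chromosome.

    Only gaps shorter than max_gap are kept. Assumes `regions` is sorted
    by chromosome and start position (assumption, not checked here).
    """
    gaps = []
    for (c1, _s1, e1), (c2, s2, _e2) in zip(regions, regions[1:]):
        if c1 == c2 and 0 < s2 - e1 < max_gap:
            gaps.append((c1, e1, s2))
    return gaps

# The two example rows shown above, with tabs as field separators.
example = (
    "#cellranger-dna 1.0.0\n"
    "#reference genome: /opt/refdata-GRCh38-1.0.0\n"
    "#chrom\tstart\tend\n"
    "chr1\t860000\t1620000\n"
    "chr1\t1760000\t2640000\n"
)
comments, rows = read_bed(io.StringIO(example))
regions = [(chrom, int(start), int(end)) for chrom, start, end in rows]
gaps = short_gaps(regions)
print(gaps)
```

For the two example rows, the only gap found is chr1:1,620,000-1,760,000, which is 140 kb long and therefore below the 500 kb threshold.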
The node_cnv_calls.bed file contains copy number calls over the reference genome for each cell and each group of cells defined by the tree, in a six-column BED-like format. These calls impute copy number values over bins with low mappability. For an experiment with N cells, there are N-1 groups of cells defined by the tree. We index the cells as 0, 1, ..., N-1, and the groups of cells as N, N+1, ..., 2N-2. The index 2N-2 corresponds to the root node of the hierarchical clustering tree, that is, the group containing all cells in the sample. This file reports normal diploid regions as well as regions with aberrant copy number. The calls are sorted by genome position. Each non-comment line in the file consists of six fields:
Column | Description |
---|---|
chrom | primary contig/chromosome containing event. |
start | start position of event in 0-based coordinates, inclusive. |
end | end position of event in 0-based coordinates, exclusive. |
id | index of cell/group of cells. Indices 0, 1, ..., N-1 refer to cells and N, ..., 2N-2 refer to groups. |
copy_number | copy number of genomic region in cell/group of cells. |
event_confidence | Confidence score assigned to event. For details on how the event confidence is computed refer to the section on CNV calling. |
In the example below, the region chr1:860,000-2,640,000 of cell 0 has copy number 4 with an event confidence score of 18. Note that this call overlaps the low-mappability region chr1:1,620,000-1,760,000, which is absent from the mappable_regions.bed file above. This means that we estimated a copy number of 4 from read coverage over the regions chr1:860,000-1,620,000 and chr1:1,760,000-2,640,000, and imputed a value of 4 for the intervening low-mappability region chr1:1,620,000-1,760,000.

    #cellranger-dna 1.0.0
    #reference genome: /opt/refdata-GRCh38-1.0.0
    #chrom start end id copy_number event_confidence
    chr1 860000 2640000 0 4 18
    ...

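The six fields and the node indexing scheme described above can be sketched in a few lines of Python. The names `CnvCall`, `parse_call`, and `node_kind` are hypothetical, and the cell count of 100 in the usage lines is made up for illustration. The event confidence is parsed as a float to be tolerant of non-integer values, which is an assumption; the example file only shows integers.

```python
from typing import NamedTuple

class CnvCall(NamedTuple):
    chrom: str
    start: int               # 0-based, inclusive
    end: int                 # 0-based, exclusive
    node_id: int             # cell index or group index
    copy_number: int
    event_confidence: float  # assumption: may be non-integer

def parse_call(line):
    """Parse one non-comment, tab-separated line of a six-column calls file."""
    chrom, start, end, node_id, copy_number, confidence = (
        line.rstrip("\n").split("\t")
    )
    return CnvCall(chrom, int(start), int(end), int(node_id),
                   int(copy_number), float(confidence))

def node_kind(node_id, num_cells):
    """Classify an index as 'cell', 'group', or 'root' for N = num_cells.

    Cells are 0..N-1, groups are N..2N-2, and 2N-2 is the root group.
    """
    if not 0 <= node_id <= 2 * num_cells - 2:
        raise ValueError(f"index {node_id} is out of range for {num_cells} cells")
    if node_id < num_cells:
        return "cell"
    if node_id == 2 * num_cells - 2:
        return "root"
    return "group"

# The example row above, for a hypothetical experiment with 100 cells.
call = parse_call("chr1\t860000\t2640000\t0\t4\t18")
print(call.end - call.start)         # length of the called region in bp
print(node_kind(call.node_id, 100))  # index 0 is a single cell
print(node_kind(198, 100))           # 2N-2 = 198 is the root group
```

Because the coordinates are 0-based with an exclusive end, the length of a call is simply `end - start`, which is 1,780,000 bp for the example row.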
The third file contains copy number calls in which the imputed segments are reported separately with an event confidence of 0. Its format is identical to that of the node_cnv_calls.bed file. In the example below, the copy number 4 region chr1:860,000-2,640,000 from the node_cnv_calls.bed file is split into three events, and the imputed segment chr1:1,620,000-1,760,000 is reported explicitly with an event confidence score of 0.

    #cellranger-dna 1.0.0
    #reference genome: /opt/refdata-GRCh38-1.0.0
    #chrom start end id copy_number event_confidence
    chr1 860000 1620000 0 4 18
    ...
    chr1 1620000 1760000 0 4 0
    ...
    chr1 1760000 2640000 0 4 18
    ...

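To see how the split events relate to the merged call, the following sketch joins abutting segments that share the same index and copy number, and counts how many bases in each joined segment were imputed. The function `merge_unmerged_calls` is hypothetical and is not how the pipeline itself produces the merged file. It treats an event confidence of 0 as the marker for an imputed segment, as described above, and it does not attempt to reproduce the confidence score of the merged event.

```python
def merge_unmerged_calls(calls):
    """Join abutting segments with the same index and copy number.

    `calls` holds (chrom, start, end, node_id, copy_number, confidence)
    tuples. Segments with confidence 0 are counted as imputed.
    """
    merged = []
    # Group by index first, since the file itself is sorted by position
    # and rows for different cells or groups may be interleaved.
    ordered = sorted(calls, key=lambda c: (c[3], c[0], c[1]))
    for chrom, start, end, node_id, copy_number, confidence in ordered:
        imputed_bp = end - start if confidence == 0 else 0
        last = merged[-1] if merged else None
        if (
            last is not None
            and last["node_id"] == node_id
            and last["chrom"] == chrom
            and last["end"] == start
            and last["copy_number"] == copy_number
        ):
            # Extend the previous segment instead of starting a new one.
            last["end"] = end
            last["imputed_bp"] += imputed_bp
        else:
            merged.append({
                "chrom": chrom, "start": start, "end": end,
                "node_id": node_id, "copy_number": copy_number,
                "imputed_bp": imputed_bp,
            })
    return merged

# The three example rows above for cell 0.
unmerged = [
    ("chr1", 860000, 1620000, 0, 4, 18),
    ("chr1", 1620000, 1760000, 0, 4, 0),
    ("chr1", 1760000, 2640000, 0, 4, 18),
]
merged = merge_unmerged_calls(unmerged)
print(merged)
```

For the three example rows, this yields a single segment chr1:860,000-2,640,000 for cell 0 with copy number 4, of which 140,000 bp are imputed.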