10x Genomics
Chromium Single Cell CNV
Cell Ranger DNA 1.0, printed on 11/09/2024
Per cell summary metrics
Each row in the per_cell_summary_metrics.csv file corresponds to a cell and each column a corresponding metric. The columns are as follows:
- barcode
- 16-base 10x barcode labeling the partition containing the cell, followed by the GEM group -1.
- cell_id
- Numerical index for cells in the experiment ranging from 0 to N-1, where N is the number of cells.
- total_num_reads
- Total number of sequencing reads associated with the cell.
- num_unmapped_reads
- Total number of sequencing reads associated with the cell that cannot be mapped to the reference genome.
- num_lowmapq_reads
- Total number of sequencing reads that map to the genome with mapping quality less than 30.
- num_duplicate_reads
- Total number of sequencing reads with mapping quality at least 30 that are duplicates.
- num_mapped_dedup_reads
- Total number of sequencing reads that have mapping quality at least 30 and are not duplicates. These reads are used for CNV calling.
- frac_mapped_duplicates
- num_duplicate_reads divided by total_num_reads.
- effective_depth_of_coverage
- Fraction of the genome covered by non-duplicate reads with mapping quality at least 30. Equals num_mapped_dedup_reads multiplied by the average read length divided by the genome size.
- effective_reads_per_1Mbp
- num_mapped_dedup_reads divided by the genome size in megabases.
- raw_mapd
- MAPD of the number of non-duplicate read-pairs with mapping quality at least 30 per 500 kb bin. MAPD is a measure of the unevenness of the coverage per bin distribution that is robust to the presence of copy number events. It includes unevenness caused by low sequencing depth. See the interpreting metrics page for more information about MAPD.
- normalized_mapd
- MAPD of the GC-corrected number of non-duplicate read-pairs with mapping quality at least 30 per 500 kb bin. See the interpreting metrics page for more information about MAPD.
- raw_dimapd
- DIMAPD of the number of non-duplicate read-pairs per 500 kb bin. DIMAPD is a measure of residual unevenness of the coverage per bin distribution after subtracting out unevenness caused by random fluctuations due to finite sequencing depth effects. See the interpreting metrics page for more information about DIMAPD.
- normalized_dimapd
- DIMAPD of the GC-corrected number of non-duplicate read-pairs per 500 kb bin. See the interpreting metrics page for more information about DIMAPD.
- mean_ploidy
- average ploidy or average copy number of the cell, approximately 2 for a diploid genome.
- ploidy_confidence
- a score measuring the overall confidence of the copy number estimation algorithm. The copy number is determined by minimizing an objective function as described in the CNV calling section. The ploidy confidence is calculated as the difference in objective function values between the second-lowest minimum and the absolute minimum. Scores greater than 2 are considered confident. Negative values are special signal values:
  - -1: CNV calling was performed with the options --soft-min-avg-ploidy or --soft-max-avg-ploidy and the confidence score is not estimated in these cases.
  - -2: copy number was determined by picking the solution with average ploidy closest to 2. This is the case when most of the genome has the same copy number.
  - -3: only a single minimum to the objective function was found and therefore the score cannot be calculated.
  - -4: the ploidy confidence score of this cell was <= 2 and the average ploidy of the best-fit solution was significantly different from other cells in the sample with highly similar read count profiles. This occurs when the cell is degraded or the DNA was inaccessible due to other reasons. In this case, we override the solution chosen by minimizing the objective function and instead pick a solution that is closer in average ploidy to other similar cells.
- is_high_dimapd
- is 1 when the cell has a DIMAPD value that is an outlier relative to the other cells in the sample, and 0 otherwise. We fit a Gaussian distribution to the DIMAPD per cell distribution and define outliers as cells whose DIMAPD deviates from the Gaussian with a significance threshold of 0.01.
- is_noisy
- is 1 if a cell is noisy and 0 otherwise. A cell is deemed noisy if is_high_dimapd is 1 or if ploidy_confidence is -4 or if the ploidy_confidence is between 0 and 2.
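As a rough illustration of how the derived columns and flags above relate to the raw counts, the following Python sketch recomputes them from a small CSV excerpt. It is not Cell Ranger DNA code. The sample rows, the genome size, the average read length, and the helper names (`describe_confidence`, `is_noisy`, `derive`) are all hypothetical, and treating "between 0 and 2" as an inclusive range is an assumption.

```python
import csv
import io

# Invented two-cell excerpt; only the column names come from the list above.
SAMPLE_CSV = """\
barcode,cell_id,total_num_reads,num_duplicate_reads,num_mapped_dedup_reads,ploidy_confidence,is_high_dimapd
AAACCTGAGAAACCAT-1,0,1000000,100000,750000,12,0
AAACCTGAGAAACGCC-1,1,800000,200000,450000,-4,0
"""

GENOME_SIZE_BP = 3_000_000_000  # assumption: round figure, not a real reference
AVG_READ_LENGTH_BP = 100        # assumption: illustrative read length

# Short paraphrases of the special ploidy_confidence signal values.
SIGNAL_VALUES = {
    -1: "soft ploidy bounds used; confidence not estimated",
    -2: "solution with average ploidy closest to 2 was picked",
    -3: "single minimum found; score cannot be calculated",
    -4: "solution overridden to match similar cells",
}


def describe_confidence(score):
    """Map a ploidy_confidence value to a short label."""
    if score in SIGNAL_VALUES:
        return SIGNAL_VALUES[score]
    return "confident" if score > 2 else "not confident"


def is_noisy(is_high_dimapd, ploidy_confidence):
    """Noisy-cell rule; the inclusive 0..2 range is an assumption."""
    return int(
        is_high_dimapd == 1
        or ploidy_confidence == -4
        or 0 <= ploidy_confidence <= 2
    )


def derive(row):
    """Recompute derived per-cell metrics from the raw read counts."""
    total = int(row["total_num_reads"])
    dups = int(row["num_duplicate_reads"])
    dedup = int(row["num_mapped_dedup_reads"])
    conf = float(row["ploidy_confidence"])
    return {
        "barcode": row["barcode"],
        "frac_mapped_duplicates": dups / total,
        "effective_depth_of_coverage": dedup * AVG_READ_LENGTH_BP / GENOME_SIZE_BP,
        "effective_reads_per_1Mbp": dedup / (GENOME_SIZE_BP / 1_000_000),
        "confidence_label": describe_confidence(conf),
        "is_noisy": is_noisy(int(row["is_high_dimapd"]), conf),
    }


cells = [derive(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for cell in cells:
    print(cell)
```

To try this on real output, replace `io.StringIO(SAMPLE_CSV)` with an open handle to per_cell_summary_metrics.csv and set the two constants to match the reference genome and sequencing run.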
Analysis summary metrics
The summary.csv file contains, in CSV format, sample-level metrics aggregated over all the reads or cells.
- total_num_reads
- total number of sequencing reads.
- frac_bases_R1_Q30
- fraction of read 1 bases with base quality at least 30.
- frac_bases_R2_Q30
- fraction of read 2 bases with base quality at least 30.
- correct_bc_rate
- fraction of total sequencing reads that can be associated with a valid 10x barcode.
- frac_non_cell_barcode
- fraction of total sequencing reads that are associated with barcodes that do not correspond to cells, i.e., they label empty partitions.
- shortest_primary_contig
- the shortest primary contig in the reference genome on which CNV calling was performed.
- frac_mappable_bins
- the fraction of 20 kb bins in the reference genome that have high mappability. See the preprocessing section for more details.
- num_cells
- the number of barcodes that label partitions containing cells. See the preprocessing section for more details.
- total_num_reads_in_cells
- total number of sequencing reads with barcodes associated with cells.
- total_num_mapped_dedup_reads_in_cells
- total number of sequencing reads associated with cells that are not duplicates and have mapping quality at least 30.
- median_frac_mapped_duplicates_per_cell
- median over cells of the fraction of total sequencing reads per cell that have mapping quality at least 30 and are duplicates.
- mean_mapped_dedup_reads_per_cell
- mean over cells of the number of sequencing reads per cell that are not duplicates and have mapping quality at least 30.
- median_effective_reads_per_1Mbp
- median over cells of the number of sequencing reads per cell that are not duplicates and have mapping quality at least 30, divided by the genome size in megabases.
- median_unmapped_frac
- median over the cells of the fraction of total reads per cell that cannot be mapped to the genome.
- mean_ploidy_p25, mean_ploidy_p50, mean_ploidy_p75
- quartiles of the average ploidy per cell distribution.
- raw_mapd_p25, raw_mapd_p50, raw_mapd_p75
- quartiles of the MAPD of the read counts per 500 kb bin per cell distribution. See the interpreting metrics page for more information about MAPD.
- normalized_mapd_p25, normalized_mapd_p50, normalized_mapd_p75
- quartiles of the MAPD of the GC-corrected read counts per 500 kb bin per cell distribution. See the interpreting metrics page for more information about MAPD.
- normalized_dimapd_p25, normalized_dimapd_p50, normalized_dimapd_p75
- quartiles of the DIMAPD of the GC-corrected read counts per 500 kb bin per cell distribution. See the interpreting metrics page for more information about DIMAPD.
- raw_dimapd_p25, raw_dimapd_p50, raw_dimapd_p75
- quartiles of the DIMAPD of the read counts per 500 kb bin per cell distribution. See the interpreting metrics page for more information about DIMAPD.
- frac_noisy_cells
- fraction of cells in the sample that are considered noisy as described in the previous section. See the interpreting data page for more information.
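Several of the sample-level metrics above are medians, means, or quartiles of per-cell values. The Python sketch below shows one way such aggregates could be computed from per-cell rows. It is illustrative only: the five cells are invented, `quartiles` and `summarize` are hypothetical helper names, and the quartile interpolation method (`method="inclusive"` in the standard library) is an assumption, since the method used by Cell Ranger DNA is not stated here.

```python
import statistics

# Invented per-cell values; field names follow per_cell_summary_metrics.csv.
cells = [
    {"mean_ploidy": 1.9, "frac_mapped_duplicates": 0.10,
     "num_mapped_dedup_reads": 700000, "is_noisy": 0},
    {"mean_ploidy": 2.0, "frac_mapped_duplicates": 0.12,
     "num_mapped_dedup_reads": 800000, "is_noisy": 0},
    {"mean_ploidy": 2.0, "frac_mapped_duplicates": 0.15,
     "num_mapped_dedup_reads": 750000, "is_noisy": 0},
    {"mean_ploidy": 2.1, "frac_mapped_duplicates": 0.11,
     "num_mapped_dedup_reads": 900000, "is_noisy": 0},
    {"mean_ploidy": 3.9, "frac_mapped_duplicates": 0.30,
     "num_mapped_dedup_reads": 350000, "is_noisy": 1},
]


def quartiles(values):
    """Return [p25, p50, p75]; the interpolation method is an assumption."""
    return statistics.quantiles(values, n=4, method="inclusive")


def summarize(cells):
    """Aggregate per-cell rows into a few of the sample-level metrics."""
    p25, p50, p75 = quartiles([c["mean_ploidy"] for c in cells])
    dedup = [c["num_mapped_dedup_reads"] for c in cells]
    return {
        "num_cells": len(cells),
        "total_num_mapped_dedup_reads_in_cells": sum(dedup),
        "mean_mapped_dedup_reads_per_cell": statistics.mean(dedup),
        "median_frac_mapped_duplicates_per_cell": statistics.median(
            c["frac_mapped_duplicates"] for c in cells
        ),
        "mean_ploidy_p25": p25,
        "mean_ploidy_p50": p50,
        "mean_ploidy_p75": p75,
        "frac_noisy_cells": sum(c["is_noisy"] for c in cells) / len(cells),
    }


summary = summarize(cells)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Using medians and quartiles rather than means for most per-cell distributions keeps a single outlier cell, such as the high-ploidy cell in this example, from dominating the sample-level value.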