10x Genomics
Chromium Single Cell CNV

Per cell summary metrics

Each row in the per_cell_summary_metrics.csv file corresponds to a cell and each column a corresponding metric. The columns are as follows:

barcode
16-base 10x barcode labeling the partition containing the cell, followed by the GEM group suffix -1.

cell_id
Numerical index for cells in the experiment ranging from 0 to N-1, where N is the number of cells.

total_num_reads
Total number of sequencing reads associated with the cell.

num_unmapped_reads
Total number of sequencing reads associated with the cell that cannot be mapped to the reference genome.

num_lowmapq_reads
Total number of sequencing reads that map to the genome with mapping quality less than 30.

num_duplicate_reads
Total number of sequencing reads with mapping quality at least 30 that are duplicates.

num_mapped_dedup_reads
Total number of sequencing reads that have mapping quality at least 30 and are not duplicates. These reads are used for CNV calling.

frac_mapped_duplicates
num_duplicate_reads divided by total_num_reads.

effective_depth_of_coverage
Average depth at which the genome is covered by non-duplicate reads with mapping quality at least 30. Equals num_mapped_dedup_reads multiplied by the average read length, divided by the genome size.

effective_reads_per_1Mbp
num_mapped_dedup_reads divided by the genome size in megabases.
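
The three ratio metrics above can be recomputed from the raw counts. The following is a minimal sketch of that arithmetic, not the pipeline's code. The function name, `avg_read_length`, and `genome_size_bp` are illustrative assumptions (the latter two are not columns in the file), and the example counts are made up.

```python
def derived_read_metrics(total_num_reads, num_duplicate_reads,
                         num_mapped_dedup_reads, avg_read_length,
                         genome_size_bp):
    """Recompute the ratio metrics from raw counts, following the definitions above."""
    return {
        # Duplicates (mapping quality >= 30) as a fraction of all reads of the cell.
        "frac_mapped_duplicates": num_duplicate_reads / total_num_reads,
        # Usable bases divided by genome size.
        "effective_depth_of_coverage":
            num_mapped_dedup_reads * avg_read_length / genome_size_bp,
        # Usable reads per megabase of genome.
        "effective_reads_per_1Mbp":
            num_mapped_dedup_reads / (genome_size_bp / 1e6),
    }


# Hypothetical cell: 1.5 M reads, 150 k duplicates, 1 M usable reads,
# 100-base reads, 3.0 Gb genome (all values invented for illustration).
metrics = derived_read_metrics(1_500_000, 150_000, 1_000_000, 100, 3_000_000_000)
```

With these inputs the effective depth is well below 1, which is typical of the shallow per-cell sequencing that this kind of metric is meant to describe.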

raw_mapd
MAPD of the number of read-pairs per 500 kb bin that have mapping quality at least 30 and are not duplicates. MAPD is a measure of the unevenness of the coverage per bin distribution that is robust to the presence of copy number events. This includes unevenness caused by low sequencing depth. See the interpreting metrics page for more information about MAPD.

normalized_mapd
MAPD of the GC-corrected number of read-pairs per 500 kb bin that have mapping quality at least 30 and are not duplicates. See the interpreting metrics page for more information about MAPD.
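
To illustrate why a statistic built from adjacent-bin differences is robust to copy number events, here is a minimal sketch of one common formulation: the median absolute deviation of differences between neighboring bins, scaled by the mean count. This is an assumption for illustration only; the exact definition used by the pipeline is the one given on the interpreting metrics page, and the bin counts below are made up.

```python
from statistics import mean, median


def mapd(bin_counts):
    """Median absolute deviation of adjacent-bin differences, scaled by the mean.

    ASSUMPTION: one common formulation, not necessarily the pipeline's.
    """
    mu = mean(bin_counts)
    # Differences between neighboring bins, in units of the mean count.
    diffs = [(b - a) / mu for a, b in zip(bin_counts, bin_counts[1:])]
    center = median(diffs)
    return median(abs(d - center) for d in diffs)


# Hypothetical per-bin counts with a single copy number step in the middle.
counts_with_step = [100, 102, 98, 101, 99, 150, 152, 148, 151, 149]
value = mapd(counts_with_step)
```

A copy number breakpoint contributes only one large adjacent difference, so the median is dominated by the bin-to-bin noise rather than by the step itself.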

raw_dimapd
DIMAPD of the number of non-duplicate read-pairs per 500 kb bin. DIMAPD is a measure of residual unevenness of the coverage per bin distribution after subtracting out unevenness caused by random fluctuations due to finite sequencing depth effects. See the interpreting metrics page for more information about DIMAPD.

normalized_dimapd
DIMAPD of the GC-corrected number of non-duplicate read-pairs per 500 kb bin. See the interpreting metrics page for more information about DIMAPD.

mean_ploidy
Average ploidy, or average copy number, of the cell; approximately 2 for a diploid genome.

ploidy_confidence
A score measuring the overall confidence of the copy number estimation algorithm. The copy number is determined by minimizing an objective function as described in the CNV calling section. The ploidy confidence is calculated as the difference in objective function values between the second-lowest minimum and the absolute minimum. Scores greater than 2 are considered confident. Negative values are special signal values:
  • -1 : CNV calling was performed with the options --soft-min-avg-ploidy or --soft-max-avg-ploidy and the confidence score is not estimated in these cases.
  • -2 : copy number was determined by picking the solution with average ploidy closest to 2. This is the case when most of the genome has the same copy number.
  • -3 : only a single minimum to the objective function was found and therefore the score cannot be calculated.
  • -4 : the ploidy confidence score of this cell was <= 2 and the average ploidy of the best-fit solution was significantly different from other cells in the sample with highly similar read count profiles. This occurs when the cell is degraded or the DNA was inaccessible due to other reasons. In this case, we override the solution chosen by minimizing the objective function and instead pick a solution that is closer in average ploidy to other similar cells.
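
When filtering cells downstream, it can help to translate the score and its signal values into labels. The sketch below follows the list above; the function name and label strings are illustrative, not part of the pipeline output.

```python
def describe_ploidy_confidence(score):
    """Map a ploidy_confidence value to a short label, per the list above."""
    signals = {
        -1: "not estimated (soft average ploidy bounds were used)",
        -2: "solution chosen closest to average ploidy 2",
        -3: "single minimum found; score cannot be calculated",
        -4: "solution overridden using similar cells",
    }
    if score in signals:
        return signals[score]
    if score < 0:
        # Any other negative value is not documented above.
        return "unrecognized signal value"
    return "confident" if score > 2 else "low confidence"


labels = [describe_ploidy_confidence(s) for s in (7.3, 1.2, -2, -4)]
```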

is_high_dimapd
Is 1 when the cell has a DIMAPD value that is an outlier relative to the other cells in the sample, and 0 otherwise. We fit a Gaussian distribution to the DIMAPD per cell distribution and define outliers as cells whose DIMAPD deviates from the Gaussian with a significance threshold of 0.01.
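
The outlier test can be sketched as follows. This is a simplified illustration under stated assumptions: the Gaussian is fitted with the plain sample mean and standard deviation, and only the upper tail is tested. The pipeline's actual fitting procedure is not described here and may differ. The DIMAPD values are made up.

```python
from statistics import NormalDist


def flag_high_dimapd(dimapd_values, alpha=0.01):
    """Return 1 for cells whose DIMAPD lies in the upper tail of a fitted Gaussian.

    ASSUMPTIONS: simple sample mean/stdev fit, one-sided (upper tail) test.
    """
    fitted = NormalDist.from_samples(dimapd_values)
    # Upper-tail probability of observing a value at least this large.
    return [1 if (1.0 - fitted.cdf(v)) < alpha else 0 for v in dimapd_values]


# Hypothetical sample: eight well-behaved cells and one with very uneven coverage.
dimapd = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 3.0]
flags = flag_high_dimapd(dimapd)
```

A fit based on the plain mean and standard deviation is itself pulled toward extreme values, so a more robust fit may be preferable on real data.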

is_noisy
Is 1 if a cell is noisy and 0 otherwise. A cell is deemed noisy if is_high_dimapd is 1, if ploidy_confidence is -4, or if ploidy_confidence is between 0 and 2.
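
The rule above is a simple disjunction and can be written out directly. In this sketch the function name is illustrative, and treating both endpoints of the 0 to 2 range as inclusive is an assumption, since the text does not say how the boundaries are handled.

```python
def is_noisy(is_high_dimapd, ploidy_confidence):
    """Apply the noisy-cell rule described above; returns 1 or 0."""
    # ASSUMPTION: "between 0 and 2" is taken to include both endpoints.
    low_confidence = 0 <= ploidy_confidence <= 2
    return int(is_high_dimapd == 1 or ploidy_confidence == -4 or low_confidence)


# Signal values -1, -2 and -3 do not by themselves mark a cell as noisy.
examples = [is_noisy(0, 5.0), is_noisy(1, 5.0), is_noisy(0, -4),
            is_noisy(0, 1.5), is_noisy(0, -1)]
```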

est_cnv_resolution_mb
Defined as the size in megabases of the smallest change from copy number 2 to 3 that can be detected with 90% sensitivity and 90% PPV, given the sequencing depth and the properties of the single cell DNA library, including but not restricted to the evenness of amplification and the library complexity. While this number is calculated based on 2 to 3 events, it serves as a ballpark estimate of the overall event resolution.

Analysis summary metrics

The summary.csv file contains, in CSV format, sample metrics that are aggregated over all the reads or cells.

total_num_reads
total number of sequencing reads.

frac_bases_R1_Q30
fraction of read 1 bases with base quality at least 30.

frac_bases_R2_Q30
fraction of read 2 bases with base quality at least 30.

correct_bc_rate
fraction of total sequencing reads that can be associated with a valid 10x barcode.

frac_non_cell_barcode
fraction of total sequencing reads that are associated with barcodes that do not correspond to cells, i.e., they label empty partitions.

shortest_primary_contig
the shortest primary contig in the reference genome on which CNV calling was performed.

frac_mappable_bins
the fraction of 20 kb bins in the reference genome that have high mappability. See the preprocessing section for more details.

num_cells
the number of barcodes that label partitions containing cells. See the preprocessing section for more details.

total_num_reads_in_cells
total number of sequencing reads with barcodes associated with cells.

total_num_mapped_dedup_reads_in_cells
total number of sequencing reads associated with cells that are not duplicates and have mapping quality at least 30.

median_frac_mapped_duplicates_per_cell
median over cells of the fraction of total sequencing reads per cell that have mapping quality at least 30 and are duplicates.

mean_mapped_dedup_reads_per_cell
mean over cells of the number of sequencing reads per cell that are not duplicates with mapping quality at least 30.

median_effective_reads_per_1Mbp
median over cells of the number of sequencing reads per cell that are not duplicates with mapping quality at least 30 divided by the genome size in megabases.

median_unmapped_frac
median over the cells of the fraction of total reads per cell that cannot be mapped to the genome.

mean_ploidy_p25, mean_ploidy_p50, mean_ploidy_p75
quartiles of the average ploidy per cell distribution.

raw_mapd_p25, raw_mapd_p50, raw_mapd_p75
quartiles of the MAPD of the read counts per 500 kb bin per cell distribution. See the interpreting metrics page for more information about MAPD.

normalized_mapd_p25, normalized_mapd_p50, normalized_mapd_p75
quartiles of the MAPD of the GC-corrected read counts per 500 kb bin per cell distribution. See the interpreting metrics page for more information about MAPD.

normalized_dimapd_p25, normalized_dimapd_p50, normalized_dimapd_p75
quartiles of the DIMAPD of the GC-corrected read counts per 500 kb bin per cell distribution. See the interpreting metrics page for more information about DIMAPD.

raw_dimapd_p25, raw_dimapd_p50, raw_dimapd_p75
quartiles of the DIMAPD of the read counts per 500 kb bin per cell distribution. See the interpreting metrics page for more information about DIMAPD.
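
The _p25, _p50 and _p75 columns above are quartiles of a per-cell metric. A minimal sketch of computing them from per-cell values is shown below, using mean_ploidy as the example. The interpolation method is an assumption; the pipeline may use a different quantile convention, so small differences from summary.csv are possible. The ploidy values are made up.

```python
from statistics import quantiles


def quartiles(per_cell_values):
    """Return (p25, p50, p75) of a per-cell metric.

    ASSUMPTION: 'inclusive' interpolation; the pipeline's convention may differ.
    """
    p25, p50, p75 = quantiles(per_cell_values, n=4, method="inclusive")
    return p25, p50, p75


# Hypothetical mean_ploidy values: mostly diploid cells plus one near-tetraploid cell.
mean_ploidy = [1.8, 2.0, 2.0, 2.1, 3.9]
mean_ploidy_p25, mean_ploidy_p50, mean_ploidy_p75 = quartiles(mean_ploidy)
```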

frac_noisy_cells
fraction of cells in the sample that are considered noisy as described in the previous section. See the interpreting data page for more information.

median_est_cnv_resolution_mb
median over all cells of the estimated CNV detection resolution per cell as computed in the per_cell_summary_metrics.csv described above.
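
The last two sample metrics can be cross-checked against the per-cell file. The sketch below reads a small in-memory stand-in for per_cell_summary_metrics.csv that contains only the columns it needs. The rows are invented for illustration, and a real file has all the columns listed in the per cell section.

```python
import csv
import io
from statistics import median

# Hypothetical excerpt of per_cell_summary_metrics.csv (made-up values).
PER_CELL_CSV = """barcode,cell_id,is_noisy,est_cnv_resolution_mb
AAACCTGAGAAACCAT-1,0,0,1.5
AAACGGGTCTTGACGA-1,1,1,4.0
AAAGATGCATTGGTAC-1,2,0,2.0
AAAGCAAGTCGCTTCT-1,3,0,2.5
"""


def aggregate_per_cell(csv_text):
    """Recompute frac_noisy_cells and median_est_cnv_resolution_mb from per-cell rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    noisy = sum(int(row["is_noisy"]) for row in rows)
    return {
        "num_cells": len(rows),
        "frac_noisy_cells": noisy / len(rows),
        "median_est_cnv_resolution_mb":
            median(float(row["est_cnv_resolution_mb"]) for row in rows),
    }


summary = aggregate_per_cell(PER_CELL_CSV)
```

To run this on real output, replace `io.StringIO(csv_text)` with an open file handle for the per-cell CSV.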