Cell Ranger DNA1.0, printed on 11/12/2024
We create a matrix of copy numbers in which the rows are cells and the columns are 20 kb mappable bins of the genome. To reduce the effect of sampling noise, we compute a coarse-grained copy number matrix by taking the integer-rounded average over windows of adjacent bins. The window size is determined from the read count matrix as the median, across cells, of the number of bins required to give an aggregated-bin read count of 200. We define the pairwise distance between two cells as the L1-norm distance between their coarse-grained copy number profiles. From these pairwise distances we compute a hierarchical clustering using the complete linkage method.
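The steps in this paragraph can be sketched with NumPy and SciPy. This is an illustrative sketch only, not the Cell Ranger DNA implementation. The function names are hypothetical, and two details are assumptions not stated in the text: the number of bins a cell needs to reach 200 reads is estimated from its mean reads per bin, and a trailing partial window is dropped.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist


def window_size(read_counts, target=200):
    """Median over cells of the number of bins needed to reach `target` reads.

    Assumption: each cell's requirement is estimated as
    target / (mean reads per bin), rounded up.
    """
    mean_per_bin = read_counts.mean(axis=1)
    bins_needed = np.ceil(target / np.maximum(mean_per_bin, 1e-9))
    return max(1, int(np.median(bins_needed)))


def coarse_grain(copy_numbers, window):
    """Integer-rounded average copy number over consecutive windows of bins."""
    n_cells, n_bins = copy_numbers.shape
    n_windows = n_bins // window  # assumption: drop the trailing partial window
    trimmed = copy_numbers[:, : n_windows * window]
    means = trimmed.reshape(n_cells, n_windows, window).mean(axis=2)
    # Note: np.rint rounds exact halves to the nearest even integer.
    return np.rint(means).astype(int)


def cluster_cells(copy_numbers, read_counts, target=200):
    """Return (linkage matrix, coarse-grained matrix, window size)."""
    window = window_size(read_counts, target)
    coarse = coarse_grain(copy_numbers, window)
    distances = pdist(coarse, metric="cityblock")  # pairwise L1 distances
    tree = linkage(distances, method="complete")   # complete-linkage clustering
    return tree, coarse, window
```

Both inputs are arrays of shape (cells, bins). The returned linkage matrix follows the SciPy convention, in which row i records the two nodes merged at step i and the distance at which they were merged.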
Each internal node of the tree defines a group of cells, namely the cells that descend from that node. This allows us to aggregate the reads from the descendant cells and calculate the read-pair coverage for each group defined by the tree.
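As a minimal sketch of this aggregation, the function below walks the merges of a tree from the leaves upward and sums per-bin read counts for every internal node. The function name and the input layout are hypothetical; the node numbering is assumed to follow the SciPy linkage convention, where ids 0 to n-1 are cells and merge i creates node n + i.

```python
def aggregate_by_node(merges, cell_reads):
    """Sum per-bin read counts over the cells under each internal node.

    merges: list of (left, right) child ids; ids 0..n-1 are cells and
            merge i creates node n + i (assumed SciPy linkage convention).
    cell_reads: list of per-cell read-count lists, one value per bin.
    Returns {node_id: (member_cells, summed_reads)} for internal nodes.
    """
    n = len(cell_reads)
    members = {i: [i] for i in range(n)}
    reads = {i: list(cell_reads[i]) for i in range(n)}
    groups = {}
    for i, (left, right) in enumerate(merges):
        node = n + i
        # A node's cells are the union of its children's cells.
        members[node] = members[left] + members[right]
        # Its read counts are the bin-wise sum of its children's counts.
        reads[node] = [a + b for a, b in zip(reads[left], reads[right])]
        groups[node] = (members[node], reads[node])
    return groups
```

Because children are always created before their parent, a single pass over the merges is enough, and each group's counts are built from two already-computed sums rather than from all of its cells.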
We run the copy number detection algorithm described above on each group of cells, effectively treating each group as a single cell. The breakpoint detection algorithm is unchanged, but the scale factor for a group is calculated as the coverage-weighted mean of the scale factors of its single cells. The higher read coverage of a group allows more accurate detection of small copy number events (see Sequencing Depth for more information).
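The coverage-weighted mean can be written out as a short sketch. The function name is hypothetical, and "coverage" is assumed here to be a single non-negative number per cell, such as its total read count; the text does not specify the exact measure.

```python
def group_scale_factor(scale_factors, coverages):
    """Coverage-weighted mean of single-cell scale factors.

    scale_factors: one scale factor per cell in the group.
    coverages: one non-negative coverage value per cell (assumed to be
               something like the cell's total read count).
    """
    if len(scale_factors) != len(coverages):
        raise ValueError("scale_factors and coverages must have equal length")
    total = sum(coverages)
    if total <= 0:
        raise ValueError("total coverage must be positive")
    weighted = sum(s * c for s, c in zip(scale_factors, coverages))
    return weighted / total
```

For example, two cells with scale factors 1.0 and 2.0 and coverages 1 and 3 give (1.0 * 1 + 2.0 * 3) / 4 = 1.75, so the better-covered cell dominates the group value.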