Cell Ranger 6.0, printed on 11/22/2024
Cell Ranger 6.0 now supports analysis of Cell Multiplexing data for the 3' Gene Expression, Targeted Gene Expression, and Feature Barcode solutions. Instructions for running the cellranger multi subcommand are described in the running multi page. A new Getting Started Tutorial is also available. The Cell Multiplexing algorithms include a new method to call singlets, multiplets, and empty drops. The output file structure has also changed to accommodate multiple samples multiplexed in a single GEM well.
The aggr subcommand now supports analysis of cellranger multi outputs for the 3' Gene Expression, Targeted Gene Expression, and Feature Barcode solutions. Further details are described in the running aggr page.
The column names for the Aggregation CSV file required by the aggr subcommand have changed: library_id has been changed to sample_id and library_outs has been changed to sample_outs. Further details are described in the running aggr page.
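An existing Aggregation CSV can be brought in line with the renamed columns by rewriting only its header row. The following is a minimal sketch, not part of Cell Ranger: the function name migrate_aggr_csv is hypothetical, and the sketch assumes the file is a plain comma-separated file whose first row is the header. Any additional columns are passed through unchanged.

```python
import csv
import io

# Old-to-new column names, taken from the release note above.
RENAMED_COLUMNS = {"library_id": "sample_id", "library_outs": "sample_outs"}


def migrate_aggr_csv(text: str) -> str:
    """Return Aggregation CSV text with the pre-6.0 column names renamed.

    Hypothetical helper for illustration; only the header row is changed.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    # Rename known old columns; leave every other column name as it is.
    rows[0] = [RENAMED_COLUMNS.get(name.strip(), name.strip()) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Working on text rather than on file paths keeps the sketch easy to check on a copy before overwriting a real CSV.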
The molecule_info.h5 and unfiltered feature-barcode matrix files (raw_feature_bc_matrix in H5 and MEX formats) will only contain barcodes with at least one read, rather than all barcodes in the whitelist.
The change to the unfiltered feature-barcode matrix described in the previous item results in a subtle change to the distribution of UMI counts among background (i.e. non-cell) barcodes, which in turn causes minor changes to the results of the cell calling algorithm. The difference arises in the second step of the algorithm, which identifies non-ambient cell barcodes, as described in the algorithms page.
Cell Ranger 6.0 is the first Cell Ranger release to use Python 3.
A bug has been fixed in the graph-based clustering output: previously, in a sample with K clusters, the first K cell-associated barcodes (ordered as in the filtered feature-barcode matrix) may have been assigned incorrect cluster labels. This change does not affect the number of clusters output.
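To audit results produced before this fix, the potentially affected barcodes can be listed directly from the description above: they are the first K barcodes in filtered-matrix order, where K is the number of clusters. The sketch below is illustrative only. The function name is hypothetical, and it assumes the caller has already loaded the barcodes (in filtered feature-barcode matrix order) and their graph-based cluster labels into two parallel sequences. It derives K from the distinct labels, relying on the statement that the number of clusters is unaffected.

```python
def possibly_mislabeled_barcodes(barcodes, cluster_labels):
    """List barcodes whose pre-6.0 graph-based cluster label may be wrong.

    barcodes: cell-associated barcodes, ordered as in the filtered matrix.
    cluster_labels: one graph-based cluster label per barcode, same order.
    Hypothetical helper for illustration; not part of Cell Ranger.
    """
    if len(barcodes) != len(cluster_labels):
        raise ValueError("barcodes and cluster_labels must have equal length")
    # K is the number of clusters; the release note says it is unchanged.
    k = len(set(cluster_labels))
    # Per the note, only the first K barcodes could carry incorrect labels.
    return list(barcodes[:k])
```

The returned barcodes are candidates only; the note says their labels "may have been" incorrect, not that they were.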
A bug has been fixed for multi-genome experiments, wherein the species annotation may have been incorrect for cell-associated barcodes identified by the second step of the cell-calling algorithm, as described in the algorithms page. Changes in metrics are expected to be minor, unless the proportion of such cells is large.
The --qc option has been deprecated from cellranger mkfastq.
In Cell Ranger 6.0, the following changes apply to joint analysis of Immune Profiling, Gene Expression, and Feature Barcode data with the multi subcommand:
The structure of the outs folder has been updated, as described in running cellranger multi.
When running the cellranger aggr subcommand on samples that have immune profiling, gene expression, and/or feature barcode data analyzed with multi, the sample_outs field now contains the path to the outputs for that sample (e.g. outs/per_sample_outs/sample_x). Further details are described in running aggr.
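As an illustration of the sample_outs convention, the sketch below builds Aggregation CSV text for samples from a single multi run. It is a sketch under stated assumptions rather than a documented workflow: the function name and the run directory are hypothetical, only the sample_id and sample_outs columns are written, and each sample's output folder is assumed to be named after its sample ID under outs/per_sample_outs, following the example path above.

```python
import csv
import io
from pathlib import PurePosixPath


def aggr_csv_for_multi(multi_run_dir, sample_names):
    """Build Aggregation CSV text pointing at per-sample multi outputs.

    Hypothetical helper: assumes <run>/outs/per_sample_outs/<sample> exists
    for every sample and that the folder name doubles as the sample_id.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample_id", "sample_outs"])
    for name in sample_names:
        # Path layout follows the example in the note above.
        path = PurePosixPath(multi_run_dir) / "outs" / "per_sample_outs" / name
        writer.writerow([name, str(path)])
    return out.getvalue()
```

The sketch does not check that the paths exist; the running aggr page remains the reference for any further required or optional columns.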
Cell Ranger 6.0 also introduces some improvements and bug fixes related to the clonotype inference algorithms:
There are subtle changes to clonotyping heuristics that have little effect on overall behavior, but recover a small number of joins that were previously missed and might be critical for a particular experiment. These changes are described in terms of technical parameters to the algorithm, specifically raising the default for MAX_DIFFS from 50 to 55 and raising the default for MAX_CDR3_DIFFS from 10 to 15. There were also compensatory changes to prevent the rate of false positive joins from increasing: the default for MAX_DEGRADATION was lowered from 3 to 2, and the default for MAX_SCORE was lowered from 1,000,000 to 500,000. For more details, visit enclone help.
Single-chain clonotypes are now more likely to be merged with two-chain and three-chain clonotypes. This causes significantly more clonotypes to have single-chain exact subclonotypes.
Fixed a bug that caused failures on some very short (defective) V gene reference sequences.
The algorithm for deciding to use a donor reference allele now checks all donor reference alleles for all V genes having the same name as the one originally assigned to a contig. For more details, visit enclone help.
A doublet test has been added. This removes some exact subclonotypes that appear to represent doublets. Details are documented on the enclone pages. The typical effect is to remove some three-chain and four-chain clonotypes, with the fraction removed depending on the empirical doublet rate. In some cases, large, complex clonotypes are accurately split into multiple smaller clonotypes by this change.
There is no longer a restriction on the length of CDR3 sequences (previously maximum 27).
The immune profiling output file all_contig_annotations.csv contains new fields fwr1, ..., fwr4 and cdr1, cdr2, providing the amino acid sequences of framework and complementarity-determining regions (in addition to cdr3, which was already present). The definitions of these regions are provided in the enclone features page. The corresponding nucleotide sequences are also provided (e.g. fwr1_nt). These fields are also provided in the file consensus_annotations.csv, as are nucleotide start and end positions (e.g. fwr1_start).
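One use of the new region fields is to reassemble the annotated amino acid sequence of each contig by joining the regions in their usual order along the chain (FWR1, CDR1, FWR2, CDR2, FWR3, CDR3, FWR4). The sketch below is illustrative: the function name is hypothetical, it reads the contig_id column to key the result, and it assumes a missing region appears as an empty field, which the note above does not specify.

```python
import csv
import io

# Framework and CDR columns in their order along the chain.
REGION_ORDER = ["fwr1", "cdr1", "fwr2", "cdr2", "fwr3", "cdr3", "fwr4"]


def joined_regions(csv_text):
    """Map contig_id to the concatenated FWR/CDR amino acid sequence.

    Hypothetical helper. A contig with any empty or absent region maps to
    None; treating empty as "missing" is an assumption of this sketch.
    """
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        parts = [row.get(name) or "" for name in REGION_ORDER]
        result[row["contig_id"]] = "".join(parts) if all(parts) else None
    return result
```

The same approach should apply to consensus_annotations.csv, keyed on whichever identifier column that file provides.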
The immune profiling output file all_contig_annotations.csv contains a new field, exact_subclonotype_id, providing the exact subclonotype ID to which the cell barcode was assigned. Details about exact subclonotypes can be found on the clonotype grouping page.
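The new field makes it straightforward to collect the cell barcodes belonging to each exact subclonotype. The sketch below is a minimal illustration with a hypothetical function name. Because the note above does not say whether exact subclonotype IDs are unique across clonotypes, the sketch keys each group on the pair (raw_clonotype_id, exact_subclonotype_id); it also assumes the barcode and raw_clonotype_id columns are present and that unassigned contigs have an empty value in either ID column.

```python
import csv
import io
from collections import defaultdict


def barcodes_by_exact_subclonotype(csv_text):
    """Group barcodes by (raw_clonotype_id, exact_subclonotype_id).

    Hypothetical helper. Rows with an empty clonotype or subclonotype ID
    are skipped; a barcode with several contigs is counted once.
    """
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        clonotype = row.get("raw_clonotype_id") or ""
        subclonotype = row.get("exact_subclonotype_id") or ""
        if clonotype and subclonotype:
            groups[(clonotype, subclonotype)].add(row["barcode"])
    # Sort barcodes so the result is deterministic.
    return {key: sorted(members) for key, members in groups.items()}
```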