Cell Ranger 4.0, printed on 11/14/2024
The cellranger vdj pipeline produces V(D)J annotations on the assembled contigs and on the clonotype consensus sequences in multiple formats.
File type | Description
---|---
CSV | High-level annotations with one contig, consensus, or clonotype per row.
JSON | Detailed annotations, including alignment coordinates and amino acid translations.
BED | Germline V(D)J segments as features, for use with a tool like IGV.
TSV | Used for the AIRR rearrangement format of VDJ contigs and consensus sequences.
File | Description
---|---
clonotypes.csv | High-level descriptions of each clonotype.
consensus_annotations.{csv,json} | High-level and detailed annotations of each clonotype consensus sequence.
filtered_contig_annotations.csv | High-level annotations of each high-confidence, cellular contig. This is a subset of all_contig_annotations.csv.
all_contig_annotations.{csv,bed,json} | High-level and detailed annotations of each contig.
airr_rearrangement.tsv | Annotated contigs and consensus sequences of VDJ rearrangements in the AIRR format.
Column | Description |
---|---|
clonotype_id | The ID of the clonotype to which this consensus sequence was assigned. |
frequency | The observed number of cell barcodes with this clonotype. |
proportion | The observed fraction of cell barcodes with this clonotype. |
cdr3s_aa | A semicolon-delimited list of chain:sequence pairs, where chain is, for example, TRA, TRB, IGK, IGL, or IGH, and sequence is the CDR3 amino acid sequence for that chain. |
cdr3s_nt | A semicolon-delimited list of chain:sequence pairs, where chain is, for example, TRA, TRB, IGK, IGL, or IGH, and sequence is the CDR3 nucleotide sequence for that chain. |
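
The cdr3s_aa and cdr3s_nt values described above can be split into per-chain entries with ordinary string handling. The following is a minimal sketch, not part of Cell Ranger; the function name `parse_cdr3s` and the example value are hypothetical and chosen only for illustration.

```python
def parse_cdr3s(field):
    """Split a cdr3s_aa or cdr3s_nt value into (chain, sequence) pairs.

    Assumes the layout described in the table: semicolon-delimited
    entries, each of the form chain:sequence.
    """
    pairs = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty fields or a trailing semicolon
        chain, _, sequence = item.partition(":")
        pairs.append((chain, sequence))
    return pairs


# Hypothetical example value, not taken from real data.
example = "TRA:CAVSDLEPNSSASKIIF;TRB:CASSLGQAYEQYF"
for chain, sequence in parse_cdr3s(example):
    print(chain, sequence)
```

A list of pairs is returned rather than a dictionary because a clonotype may carry more than one sequence for the same chain.
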
Column | Description |
---|---|
barcode | Cell barcode for this contig. |
is_cell | True or False value indicating whether the barcode was called as a cell. |
contig_id | Unique identifier for this contig. |
high_confidence | True or False value indicating whether the contig was called as high-confidence (unlikely to be a chimeric sequence or some other artifact). |
length | The contig sequence length in nucleotides. |
chain | The chain associated with this contig; for example, TRA, TRB, IGK, IGL, or IGH. A value of "Multi" indicates that segments from multiple chains were present. |
v_gene | The highest-scoring V segment, for example, TRAV1-1. |
d_gene | The highest-scoring D segment, for example, TRBD1. |
j_gene | The highest-scoring J segment, for example, TRAJ1-1. |
c_gene | The highest-scoring C segment, for example, TRAC. |
full_length | Whether the contig was declared full-length. |
productive | Whether the contig was declared productive. |
cdr3 | The predicted CDR3 amino acid sequence. |
cdr3_nt | The predicted CDR3 nucleotide sequence. |
reads | The number of reads aligned to this contig. |
umis | The number of distinct UMIs aligned to this contig. |
raw_clonotype_id | The ID of the clonotype to which this cell barcode was assigned. |
raw_consensus_id | The ID of the consensus sequence to which this contig was assigned. |
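
As an illustration of how the columns above can be combined, the sketch below keeps only contigs that are cell-associated, high-confidence, and productive, and groups them by barcode. It is a sketch under stated assumptions rather than a reference implementation: the helper names `is_true` and `productive_chains_by_barcode` are hypothetical, the sample rows are invented and contain only a subset of the columns, and boolean fields are compared case-insensitively because their exact capitalization may vary.

```python
import csv
import io
from collections import defaultdict


def is_true(value):
    """Interpret a True/False column value, ignoring capitalization."""
    return value.strip().lower() == "true"


def productive_chains_by_barcode(handle):
    """Group (chain, cdr3, umis) by barcode for contigs passing all filters."""
    chains = defaultdict(list)
    for row in csv.DictReader(handle):
        keep = (
            is_true(row["is_cell"])
            and is_true(row["high_confidence"])
            and is_true(row["productive"])
        )
        if keep:
            chains[row["barcode"]].append(
                (row["chain"], row["cdr3"], int(row["umis"]))
            )
    return dict(chains)


# Invented sample rows with a subset of the documented columns.
SAMPLE = """barcode,is_cell,contig_id,high_confidence,chain,productive,cdr3,umis
AAAC-1,True,AAAC-1_contig_1,True,TRA,True,CAVF,5
AAAC-1,True,AAAC-1_contig_2,True,TRB,True,CASF,9
AAAC-1,True,AAAC-1_contig_3,False,TRB,True,CASW,1
TTTG-1,False,TTTG-1_contig_1,True,TRA,True,CAAF,2
"""

result = productive_chains_by_barcode(io.StringIO(SAMPLE))
for barcode, chains in result.items():
    print(barcode, chains)
```

For a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open("all_contig_annotations.csv", newline="")`.
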
Column | Description |
---|---|
clonotype_id | The ID of the clonotype to which this consensus sequence was assigned. |
consensus_id | The ID of this consensus sequence. |
The remaining columns are the same as those described for the contig annotation CSV files.
Column | Description |
---|---|
cell_id | Cell barcode defining the cell for the query sequence. |
clone_id | Clonotype ID/clonotype assignment. |
rev_comp | Set to false by default (10x Genomics VDJ sequences are not reverse complemented). |
sequence_id | The name of the contig associated with the rearrangement. |
sequence | The nucleotide sequence of the rearrangement. |
sequence_aa | The amino acid sequence of the rearrangement. |
productive | Whether or not the rearrangement is productive. |
v_call | The name of the aligned V gene for the rearrangement. |
v_cigar | The CIGAR string of the V gene alignment. |
v_sequence_start | 1-based index on the contig of the V region start position. |
v_sequence_end | 1-based index on the contig of the V region end position. |
d_call | The name of the aligned D gene for the rearrangement. |
d_cigar | The CIGAR string of the D gene alignment. |
d_sequence_start | 1-based index on the contig of the D region start position. |
d_sequence_end | 1-based index on the contig of the D region end position. |
j_call | The name of the aligned J gene for the rearrangement. |
j_cigar | The CIGAR string of the J gene alignment. |
j_sequence_start | 1-based index on the contig of the J region start position. |
j_sequence_end | 1-based index on the contig of the J region end position. |
c_call | The name of the aligned C gene for the rearrangement. |
c_cigar | The CIGAR string of the C gene alignment. |
c_sequence_start | 1-based index on the contig of the C region start position. |
c_sequence_end | 1-based index on the contig of the C region end position. |
sequence_alignment | The aligned sequence of the VDJ rearrangement. |
germline_alignment | The assembled, aligned, full-length inferred germline sequence of the aligned sequence. |
junction | The nucleotide sequence of the rearrangement's junction (CDR3). |
junction_aa | The amino acid sequence of the rearrangement's junction (CDR3). |
duplicate_count | The number of unique molecular identifiers associated with this rearrangement. |
consensus_count | The number of reads associated with this rearrangement. |
junction_length | The length of the rearrangement's junction nucleotide sequence. |
junction_aa_length | The length of the rearrangement's junction amino acid sequence. |
is_cell | Whether this rearrangement is cell-associated. |
The AIRR rearrangement file includes all mandatory AIRR fields and several optional variables to enhance reproducibility and guide analyses.
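
Because the start and end columns are 1-based positions on the contig, they need a small adjustment before they can be used as Python slice indices. The sketch below reads tab-separated rows and extracts the V and J regions from the sequence column. It assumes that the end position is inclusive, as in the AIRR convention of 1-based closed intervals; the helper name `region` and the sample row are hypothetical.

```python
import csv
import io


def region(sequence, start, end):
    """Return the subsequence for 1-based start and end positions.

    Assumes the end position is inclusive. Returns an empty string when
    either coordinate is missing (for example, no D gene was called).
    """
    if start in ("", None) or end in ("", None):
        return ""
    return sequence[int(start) - 1:int(end)]


# Invented sample row with a subset of the documented columns.
SAMPLE = (
    "sequence_id\tsequence\tv_sequence_start\tv_sequence_end\t"
    "d_sequence_start\td_sequence_end\tj_sequence_start\tj_sequence_end\n"
    "contig_1\tAACCGGTTAACC\t3\t6\t\t\t9\t12\n"
)

for row in csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"):
    v = region(row["sequence"], row["v_sequence_start"], row["v_sequence_end"])
    d = region(row["sequence"], row["d_sequence_start"], row["d_sequence_end"])
    j = region(row["sequence"], row["j_sequence_start"], row["j_sequence_end"])
    print(row["sequence_id"], v, d, j)
```

For a real file, the `io.StringIO` object would be replaced by an open handle on airr_rearrangement.tsv.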