Long Ranger 1.3, printed on 11/14/2024
The longranger run pipeline outputs an indexed BAM file containing position-sorted, aligned reads. Each read in this BAM file has GemCode barcode and phasing information attached. The following assumes basic familiarity with the BAM format. More details on the SAM/BAM standard are available online.
GemCode barcode information for each read is stored as TAG fields:
Tag | Type | Description |
---|---|---|
BX | Z | GemCode barcode sequence that is error-corrected and confirmed against a list of known-good barcode sequences. Use this for analysis. |
BC | Z | Sample index (I7) read. |
QT | Z | Sample index (I7) read quality. Phred scores as reported by sequencer. |
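As a rough illustration of how these tags appear on a read, the sketch below parses the optional fields of a single SAM-format text line (for example, one line of a BAM file converted to SAM text). It relies only on the SAM convention that optional fields follow the 11 mandatory columns and are written as `TAG:TYPE:VALUE`. The function name `parse_sam_tags` and the example record, including its barcode, index, and quality values, are made up for this example and are not part of Long Ranger.

```python
def parse_sam_tags(sam_line: str) -> dict:
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns are mandatory SAM fields
        # Split on the first two colons only, since values (e.g. qualities)
        # may themselves contain ':' characters.
        tag, type_code, value = field.split(":", 2)
        tags[tag] = int(value) if type_code == "i" else value
    return tags


# Hypothetical record, for illustration only.
record = "\t".join([
    "read1", "99", "chr1", "1000", "60", "4M", "=", "1200", "204",
    "ACGT", "IIII",
    "BX:Z:AGAATGGTCTGCAT-1", "BC:Z:ACGTACGT", "QT:Z:IIIIIIII",
])
tags = parse_sam_tags(record)
print(tags["BX"])  # AGAATGGTCTGCAT-1
```

Reads without a valid barcode may simply lack the `BX` key, so downstream code should probably use `tags.get("BX")` rather than assume it is present.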
The BX tag includes a suffix with a dash separator followed by a number:
AGAATGGTCTGCAT-1
This number denotes what we call a GEM group, and is used to virtualize barcodes in order to achieve a higher effective barcode diversity when combining samples generated from separate GEM chip channel runs. Normally, this number will be "1" across all barcodes when analyzing a sample generated from a single GEM chip channel. It can either be left in place and treated as part of a unique barcode identifier, or explicitly parsed out to leave only the barcode sequence itself.
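If the GEM group needs to be parsed out, a minimal sketch could look like the following. It assumes the BX value always has the form shown above (barcode sequence, a dash, then an integer GEM group); the helper name `split_barcode` is made up for this example.

```python
from typing import Tuple


def split_barcode(bx: str) -> Tuple[str, int]:
    """Split a BX value such as 'AGAATGGTCTGCAT-1' into (sequence, gem_group)."""
    # rpartition splits on the last dash, so the suffix is always the GEM group.
    sequence, sep, gem_group = bx.rpartition("-")
    if not sep or not sequence or not gem_group.isdigit():
        raise ValueError(f"unexpected BX format: {bx!r}")
    return sequence, int(gem_group)


print(split_barcode("AGAATGGTCTGCAT-1"))  # ('AGAATGGTCTGCAT', 1)
```

When samples from several GEM chip channels are combined, keeping the full string (sequence plus suffix) as the key is the safer default, since the same barcode sequence can occur in different GEM groups.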
The following tags will also be present on reads that were confidently assigned to a haplotype.
Tag | Type | Description |
---|---|---|
PS | Z | Phase set containing this read |
HP | i | Haplotype of the molecule that generated the read |
MI | i | Global molecule identifier for molecule that generated this read |
Phase sets, defined in the VCF standard, are regions within which identified haplotypes are mutually consistent. As a result, HP tags are only comparable between reads that share a common PS. By definition, adjacent phase sets lack sufficient Linked-Reads to determine the relationship between their haplotypes.
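The comparability rule above can be sketched as a small helper that refuses to compare haplotypes across phase sets. It assumes each read's tags have already been loaded into a dictionary keyed by tag name; the function name `same_haplotype` and the example tag values are hypothetical.

```python
from typing import Optional


def same_haplotype(tags_a: dict, tags_b: dict) -> Optional[bool]:
    """Compare the haplotypes of two reads.

    Returns True or False when both reads are phased within the same
    phase set, and None when the comparison is not meaningful.
    """
    for tags in (tags_a, tags_b):
        # Reads not confidently assigned to a haplotype lack these tags.
        if "PS" not in tags or "HP" not in tags:
            return None
    if tags_a["PS"] != tags_b["PS"]:
        # HP values from different phase sets are not comparable.
        return None
    return tags_a["HP"] == tags_b["HP"]


# Hypothetical tag values, for illustration only.
read_1 = {"PS": "1000", "HP": 1}
read_2 = {"PS": "1000", "HP": 2}
read_3 = {"PS": "52000", "HP": 1}

print(same_haplotype(read_1, read_2))  # False
print(same_haplotype(read_1, read_3))  # None
```

Returning `None` rather than `False` for reads in different phase sets keeps "different haplotype" distinct from "relationship unknown", which mirrors the definition above.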