Long Ranger 2.2 (latest), printed on 11/23/2024
Analysis software for 10x Genomics linked read products is no longer supported. Raw data processing pipelines and visualization tools are available for download and can be used for analyzing legacy data from 10x Genomics kits in accordance with our end user licensing agreement without support.
Long Ranger's Whole Genome Mode analyzes sequencing data from a Chromium-prepared library. This involves the following steps:
Run longranger mkfastq on the Illumina BCL output folder to generate FASTQ files.
Run longranger wgs for each sample that was demultiplexed by longranger mkfastq.
For the following example, assume that the Illumina BCL output is in a folder named /sequencing/140101_D00123_0111_AHAWT7ADXX.
First, follow the instructions on running longranger mkfastq to generate FASTQ files. For example, if the flowcell serial number was HAWT7ADXX, then longranger mkfastq will output FASTQ files in HAWT7ADXX/outs/fastq_path.
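The relationship between the run folder name and the FASTQ output path can be sketched in shell. This is only an illustration: the helper names flowcell_from_run_dir and fastq_path_for_run are hypothetical, and the sketch assumes the run folder name ends in one flowcell-position character followed by the serial number, as in the example folder above.

```shell
# Derive the flowcell serial number from an Illumina run folder name.
# Assumption: the last underscore-separated field is one flowcell-position
# character followed by the serial (AHAWT7ADXX -> HAWT7ADXX).
flowcell_from_run_dir() {
    run_dir="${1%/}"                 # drop a trailing slash, if any
    last_field="${run_dir##*_}"      # e.g. AHAWT7ADXX
    printf '%s\n' "${last_field#?}"  # drop the leading position character
}

# FASTQ folder written by longranger mkfastq, relative to where it was run.
fastq_path_for_run() {
    printf '%s/outs/fastq_path\n' "$(flowcell_from_run_dir "$1")"
}

fastq_path_for_run /sequencing/140101_D00123_0111_AHAWT7ADXX
# prints: HAWT7ADXX/outs/fastq_path
```

If your run folders follow a different naming scheme, pass the FASTQ path to --fastqs explicitly rather than deriving it.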
To run Long Ranger in whole genome mode, use the longranger wgs command with the following common parameters. For a complete list of command-line options, run longranger wgs --help.
For help on which arguments to use to target a particular set of FASTQs, consult Running 10x Pipelines on FASTQ Files.
Parameter | Description |
---|---|
--id | A unique run ID string: e.g. sample345 |
--fastqs | Path of the FASTQ folder generated by longranger mkfastq e.g. /home/jdoe/runs/HAWT7ADXX/outs/fastq_path |
--vcmode | (required, except when specifying --precalled) Must be one of: freebayes, gatk:/path/to/GenomeAnalysisTK.jar, or disable |
--sample | (optional) Sample name as specified in the sample sheet supplied to mkfastq. |
--downsample | (optional) Specify the maximum amount of sequencing data to be used by the pipeline, in gigabases. If more data is available than this request, reads will be randomly downsampled. If less data is available, this option will have no effect. |
--reference | Path to a 10x compatible reference, e.g. /opt/refdata-hg19-2.1.0. See Installation for how to download and install the default reference. |
--precalled | (optional) Path to a "pre-called" VCF file. Variants in this file will be phased. When setting --precalled, do not specify a --vcmode. |
--sex | (optional) Sex of the sample: male or female. Sex will be detected based on coverage if not supplied. |
--somatic | (optional) Supply this flag for somatic samples. This will increase the sensitivity of the large-scale SV caller for somatic SVs, by allowing the detection of sub-haplotype events. Note: this option currently does not affect small-scale variant calling. The small-scale variant caller is not currently optimized for somatic variants. |
Use longranger mkvcf to extract samples and standardize your VCF files. Run longranger mkvcf --help for details.
After determining these input arguments, call longranger wgs:
$ cd /home/jdoe/runs
$ longranger wgs --id=sample345 \
                 --reference=/opt/refdata-hg19-2.1.0 \
                 --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
                 --vcmode=freebayes
Following a set of preflight checks to validate input arguments, Long Ranger pipeline stages will begin to run:
longranger wgs 2.2.2
Copyright (c) 2016 10x Genomics, Inc. All rights reserved.
-----------------------------------------------------------------------------
Martian Runtime - 2.3.2
Running preflight checks (please wait)...
2016-05-01 12:00:00 [runtime] (ready) ID.sample345.PHASER_SVCALLER_CS.PHASER_SVCALLER._ALIGNER.SETUP_CHUNKS
2016-05-01 12:00:00 [runtime] (run:local) ID.sample345.PHASER_SVCALLER_CS.PHASER_SVCALLER._SNPINDEL_PHASER.SORT_GROUND_TRUTH
2016-05-01 12:00:00 [runtime] (run:local) ID.sample345.PHASER_SVCALLER_CS.PHASER_SVCALLER._SNPINDEL_PHASER.SORT_GROUND_TRUTH.fork0.chnk0.main
...
By default, longranger wgs will use all of the cores available on your system to execute pipeline stages. You can specify a different number of cores to use with the --localcores option; for example, --localcores=16 will limit the pipeline to using up to sixteen cores at once. Similarly, --localmem will restrict the amount of memory (in GB) used by longranger wgs.
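As a rough sketch of how these two limits might be assembled for a shared machine, the hypothetical helper resource_flags below validates a core count and a memory size and prints the corresponding flags. The helper names and the value 64 are illustrative, and treating the memory limit as a whole number of gigabytes is an assumption.

```shell
# Accept only positive whole numbers without leading zeros.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*|0*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print --localcores/--localmem flags that cap longranger wgs resources.
# Usage: resource_flags <max-cores> <max-mem-gb>
resource_flags() {
    if ! is_positive_int "$1" || ! is_positive_int "$2"; then
        echo "error: cores and memory (GB) must be positive integers" >&2
        return 1
    fi
    printf '%s\n' "--localcores=$1 --localmem=$2"
}

resource_flags 16 64
# prints: --localcores=16 --localmem=64
```

The printed flags can then be appended to the longranger wgs command line alongside --id, --reference, and --fastqs.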
The pipeline will create a new folder named with the sample ID you specified (e.g. /home/jdoe/runs/sample345) for its output. If this folder already exists, Long Ranger will assume it is an existing pipestance and attempt to resume running it.
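Because an existing folder changes what the pipeline does, it can help to check for one before launching. The following is a minimal sketch; run_state is a hypothetical helper, not part of Long Ranger, and it only tests whether the folder exists.

```shell
# Report whether a run ID would start a new run or resume an existing one.
# Usage: run_state <parent-dir> <run-id>
run_state() {
    if [ -d "$1/$2" ]; then
        echo "resume"   # folder exists: treated as an existing pipestance
    else
        echo "new"      # no folder yet: the pipeline will create it
    fi
}

run_state /home/jdoe/runs sample345
```

If the result is "resume" and you intended a fresh run, choose a different --id or move the old folder aside first.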
longranger wgs calls SNPs and indels with GATK or FreeBayes, phases those variants, and adds structural variant calls.
Alternatively, you may provide an existing VCF file as an input to the pipeline. In this "pre-called VCF" mode, longranger wgs does not call SNPs and indels itself, but phases the variants in the supplied VCF file, and also outputs structural variant calls.
As of Long Ranger 2.0, longranger wgs no longer supports setting the --vcmode argument when supplying a pre-called VCF file.
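The rule that --vcmode and --precalled are mutually exclusive, with exactly one of them supplied, can be checked in a wrapper script before the pipeline starts. The sketch below is an assumption-laden illustration: variant_args is a hypothetical helper, /data/calls.vcf.gz is a made-up path, and the accepted --vcmode values are taken from the parameter table above.

```shell
# Print the variant-calling argument for longranger wgs, or fail.
# Usage: variant_args <vcmode-or-empty> <precalled-vcf-or-empty>
variant_args() {
    if [ -n "$1" ] && [ -n "$2" ]; then
        echo "error: --vcmode and --precalled cannot be combined" >&2
        return 1
    fi
    if [ -n "$2" ]; then
        printf '%s\n' "--precalled=$2"
        return 0
    fi
    case "$1" in
        freebayes|disable|gatk:?*)
            printf '%s\n' "--vcmode=$1" ;;
        *)
            echo "error: --vcmode must be freebayes, gatk:/path/to/GenomeAnalysisTK.jar, or disable" >&2
            return 1 ;;
    esac
}

variant_args freebayes ""            # prints: --vcmode=freebayes
variant_args "" /data/calls.vcf.gz   # prints: --precalled=/data/calls.vcf.gz
```

Long Ranger's own preflight checks remain the authority on what is accepted; a wrapper like this only catches the mistake earlier.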
A successful longranger wgs execution should conclude with a message similar to this:
2016-05-02 15:46:41 [runtime] (run:local) ID.sample345.PHASER_SVCALLER_CS.PHASER_SVCALLER.LOUPE_PREPROCESS.fork0.join
2016-05-02 15:46:44 [runtime] (join_complete) ID.sample345.PHASER_SVCALLER_CS.PHASER_SVCALLER.LOUPE_PREPROCESS
2016-05-02 15:46:55 [runtime] VDR killed 4738 files, 223GB.
Outputs:
- Run summary: /home/jdoe/runs/sample345/outs/summary.csv
- BAM barcoded: /home/jdoe/runs/sample345/outs/phased_possorted_bam.bam
- BAM index: /home/jdoe/runs/sample345/outs/phased_possorted_bam.bam.bai
- VCF phased: /home/jdoe/runs/sample345/outs/phased_variants.vcf.gz
- VCF index: /home/jdoe/runs/sample345/outs/phased_variants.vcf.gz.tbi
- Large-scale SV calls: /home/jdoe/runs/sample345/outs/large_sv_calls.bedpe
- Large-scale SV candidates: /home/jdoe/runs/sample345/outs/large_sv_candidates.bedpe
- Large-scale SVs: /home/jdoe/runs/sample345/outs/large_svs.vcf.gz
- Large-scale SVs index: /home/jdoe/runs/sample345/outs/large_svs.vcf.gz.tbi
- Mid-scale deletions: /home/jdoe/runs/sample345/outs/dels.vcf.gz
- Mid-scale deletions index: /home/jdoe/runs/sample345/outs/dels.vcf.gz.tbi
- Loupe file: /home/jdoe/runs/sample345/outs/loupe.loupe
Pipestance completed successfully!
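When the pipeline runs unattended, one simple way to confirm completion is to save its terminal output to a file and search for the final message. The sketch below assumes you captured that output yourself (for example by piping the command through tee); pipestance_succeeded is a hypothetical helper, and the demonstration uses a stand-in file rather than real pipeline output.

```shell
# Succeed only if the captured output contains the completion message.
# Usage: pipestance_succeeded <captured-output-file>
pipestance_succeeded() {
    grep -q 'Pipestance completed successfully!' "$1"
}

# Demonstration with a stand-in file; a real file would hold the text
# printed by longranger wgs.
log="$(mktemp)"
printf '%s\n' 'Pipestance completed successfully!' > "$log"
if pipestance_succeeded "$log"; then
    echo "run finished"
fi
rm -f "$log"
```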
The output of the pipeline will be contained in a folder named with the sample ID you specified (e.g. sample345). The subfolder named outs will contain the main pipeline output files:
File Name | Description |
---|---|
summary.csv | Run summary metrics in CSV format |
phased_possorted_bam.bam | Aligned reads annotated with barcode information |
phased_possorted_bam.bam.bai | Index for phased_possorted_bam.bam |
phased_variants.vcf.gz | VCF annotated with barcode and phasing information |
phased_variants.vcf.gz.tbi | Index for phased_variants.vcf.gz |
large_sv_calls.bedpe | Large-scale (≥30Kbp or inter-chromosomal) structural variant and CNV calls, excluding low confidence candidates, in BEDPE format |
large_sv_candidates.bedpe | Large-scale (≥30Kbp or inter-chromosomal) structural variant and CNV calls, including low confidence candidates, in BEDPE format |
large_svs.vcf.gz | Large-scale (≥30Kbp or inter-chromosomal) structural variant and CNV calls, including low confidence candidates, in VCF format |
large_svs.vcf.gz.tbi | Index for large_svs.vcf.gz |
dels.vcf.gz | Mid-scale deletion structural variant calls (50bp-30Kbp) |
dels.vcf.gz.tbi | Index for dels.vcf.gz |
loupe.loupe | File that can be opened in the Loupe genome browser |
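The file names in the table above lend themselves to a quick completeness check before downstream analysis. This is a minimal sketch: missing_outputs is a hypothetical helper that only tests for the presence of each file and does not validate its contents.

```shell
# List expected files that are missing from a longranger wgs outs folder.
# Usage: missing_outputs <outs-dir>   (prints nothing if all are present)
missing_outputs() {
    for name in \
        summary.csv \
        phased_possorted_bam.bam phased_possorted_bam.bam.bai \
        phased_variants.vcf.gz phased_variants.vcf.gz.tbi \
        large_sv_calls.bedpe large_sv_candidates.bedpe \
        large_svs.vcf.gz large_svs.vcf.gz.tbi \
        dels.vcf.gz dels.vcf.gz.tbi \
        loupe.loupe
    do
        [ -e "$1/$name" ] || printf '%s\n' "$name"
    done
}

missing_outputs /home/jdoe/runs/sample345/outs
```

An empty result means every file in the table exists; any printed name points to an output that was not produced or has been moved.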
Once longranger wgs has successfully completed, you can browse the resulting .loupe file in the Loupe genome browser, or refer to the Understanding Output section to explore the data by hand.