Cell Ranger ARC 2.0 (latest), printed on 11/14/2024
There are three options for generating FASTQ files from BCL files, all of which work for 10x Genomics Chromium libraries: the 10x Genomics mkfastq pipeline, Illumina's bcl2fastq, and Illumina's BCL Convert.
Illumina's software may provide greater control over demultiplexing parameters.
Both bcl2fastq and BCL Convert are currently available. BCL Convert is a newer demultiplexing software. For more information, please check Illumina's website.
Demultiplexing Chromium data with Illumina's BCL Convert or bcl2fastq software requires the correct specification of the sample sheet and command line options. This guide walks you through generating Cell Ranger-compatible FASTQs with BCL Convert and bcl2fastq.
The Multiome GEX library is dual-indexed. This section describes how to configure bcl2fastq or bcl-convert for GEX libraries created with the Dual Index Plate TT, Set A.
The Dual Index Plate TT, Set A indexes are 'unique dual-indexing' sample indexes. This means that there is a unique sample index barcode in both the i7 and i5 index reads (also known as I1 and I2, respectively). When demultiplexing flow cells where both index reads have been sequenced, bcl2fastq and bcl-convert require that both index sequences match the expected sequences for a read to be assigned to that sample. This addresses the 'index hopping' issue present on Illumina patterned flow cell sequencers.
You can download the Sample Index Reference files for the Gene Expression dual indexing kits here: Dual Index Plate TT, Set A CSV or Dual Index Plate TT, Set A JSON.
The Multiome ATAC library is single-indexed. You can download the Sample Index Kit N Set A as a CSV or JSON.
There is a key difference to keep in mind when creating single index sample sheets for a Chromium run. Each Chromium sample index set is a blend of four different sequence oligos, and each oligo must be represented as a separate row in the sample sheet. This means that for every sample being demultiplexed from the flow cell, there should be four lines in the sample sheet.
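The four-rows-per-sample rule can be sketched in Python as follows. This is a minimal illustration, not part of any 10x Genomics or Illumina tool: the function name single_index_rows is hypothetical, and the four oligo sequences are assumed to have been copied from the Sample Index Reference file for the index set in use.

```python
def single_index_rows(sample_id, oligos, lanes):
    """Return one (lane, sample_id, oligo) row per oligo per lane.

    Hypothetical helper: a single index set is a blend of four oligos,
    so every lane a sample occupies needs four sample sheet rows.
    """
    if len(oligos) != 4:
        raise ValueError(f"expected 4 oligos per sample index set, got {len(oligos)}")
    return [(lane, sample_id, oligo) for lane in lanes for oligo in oligos]


# Example: one sample index set sequenced on lanes 1 and 2.
rows = single_index_rows(
    "sample1",
    ["GGTTTACT", "CTAAACGG", "TCGGCGTC", "AACCGTAA"],
    lanes=[1, 2],
)
for row in rows:
    print(",".join(str(field) for field in row))
```

Each tuple corresponds to one line of the sample sheet, so a sample on two lanes produces eight lines.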
The sections below cover running bcl2fastq first, then bcl-convert.
The bcl2fastq software is available for download and installation on the Illumina support website as an RPM package. An Illumina account is required for download. Please contact Illumina Support if you have questions about bcl2fastq versions, or for help troubleshooting its download and installation. See the 10x Genomics Knowledge Base article "How to troubleshoot installing bcl2fastq or bcl-convert?"
You must create a sample sheet for bcl2fastq to correctly embed the names of samples into output FASTQ files. This section has an example dual index GEX sample sheet and an example single index ATAC sample sheet that can be customized for your experiment.
The Illumina Experiment Manager can also be used to create sample sheets for bcl2fastq.
Please note that the index sequence in the sample index reference file should be entered into the index column of the bcl2fastq sample sheet. Either the index2_workflow_a or index2_workflow_b sequence should be entered into the index2 column of the bcl2fastq sample sheet, depending on the sequencing instrument in use.
index2_workflow_a: NovaSeq™ 6000 v1, MiSeq™, HiSeq™ 2500, and HiSeq™ 2000.

index2_workflow_b: NovaSeq™ 6000 v1.5, iSeq™ 100, MiniSeq™, NextSeq™, HiSeq™ X, and HiSeq™ 3000/4000.

More information about dual-indexing is available in the Illumina Indexed Sequencing Overview Guide.
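As a rough sketch of the instrument-dependent choice above, the following Python snippet picks which reference-file column to copy into index2. The function names and the lowercase instrument keys are assumptions made for illustration, and the sequences in the example row are placeholders, not real index sequences.

```python
# Instrument lists taken from the text above; keys are lowercased for lookup.
WORKFLOW_A = {"novaseq 6000 v1", "miseq", "hiseq 2500", "hiseq 2000"}
WORKFLOW_B = {
    "novaseq 6000 v1.5", "iseq 100", "miniseq",
    "nextseq", "hiseq x", "hiseq 3000/4000",
}


def index2_column(instrument):
    """Return the reference-file column that holds the i5 sequence to use."""
    name = instrument.strip().lower()
    if name in WORKFLOW_A:
        return "index2_workflow_a"
    if name in WORKFLOW_B:
        return "index2_workflow_b"
    raise ValueError(f"unknown instrument: {instrument!r}")


def sample_sheet_indexes(reference_row, instrument):
    """Return the (index, index2) pair to enter in the sample sheet."""
    return reference_row["index"], reference_row[index2_column(instrument)]


# Placeholder sequences only; copy real values from the reference CSV.
example_row = {
    "index": "AAAAAAAAAA",
    "index2_workflow_a": "CCCCCCCCCC",
    "index2_workflow_b": "GGGGGGGGGG",
}
print(sample_sheet_indexes(example_row, "MiSeq"))
```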
Do not trim adapters during demultiplexing. Leave these settings blank. Trimming adapters from reads can potentially damage the 10x barcodes and the UMIs, resulting in pipeline failure or data loss.
If you are using an Illumina sample sheet for demultiplexing with bcl2fastq, BCL Convert, or our mkfastq pipeline, please remove these lines under the [Settings] section: Adapter, AdapterRead1, or AdapterRead2.
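One way to remove the adapter lines programmatically is sketched below. It is an illustration only: the function name is hypothetical, the adapter value ACGT is a placeholder, and treating [BCLConvert_Settings] the same way as [Settings] is an assumption.

```python
ADAPTER_KEYS = {"Adapter", "AdapterRead1", "AdapterRead2"}
# Assumption: handle the V2 settings section name as well as [Settings].
SETTINGS_SECTIONS = {"settings", "bclconvert_settings"}


def strip_adapter_settings(sample_sheet_text):
    """Drop adapter lines from the settings section, keep everything else."""
    kept = []
    in_settings = False
    for line in sample_sheet_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Section header such as [Settings] or [Data].
            section = stripped[1:].split("]", 1)[0]
            in_settings = section.lower() in SETTINGS_SECTIONS
        elif in_settings and stripped.split(",", 1)[0].strip() in ADAPTER_KEYS:
            continue  # skip the adapter setting
        kept.append(line)
    return "\n".join(kept) + "\n"


example = (
    "[Settings]\n"
    "Adapter,ACGT\n"
    "AdapterRead2,ACGT\n"
    "[Data]\n"
    "Lane,Sample_ID,index\n"
    "1,sample1,GGTTTACT\n"
)
cleaned = strip_adapter_settings(example)
print(cleaned)
```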
When you plan an experiment, you should know the name of the sample index set used for each sample, which comes from the reagent kit (such as "SI-TT-A2"). For a dual index GEX library, enter each sample's lane, sample name, and sample index sequences into the Illumina bcl2fastq sample sheet. Here is a bcl2fastq sample sheet for a HiSeq 2500. This sample sheet shows two samples: sample1 is split across two lanes (1 and 2), and sample2 is found only on lane 1:
    [Data]
    Lane,Sample_ID,index,index2
    1,sample1,GTAACATGCG,AGGTAACACT
    2,sample1,GTAACATGCG,AGGTAACACT
    1,sample2,GTGGATCAAA,GCCAACCCTG
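If you generate sample sheets from a script, the dual index [Data] section above could be produced along these lines. This is a sketch using Python's standard csv module; the function name is hypothetical, and the records are the ones from the example above.

```python
import csv
import io


def dual_index_data_section(samples):
    """Render (lane, sample_id, i7, i5) records as a [Data] section."""
    buf = io.StringIO()
    buf.write("[Data]\n")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Lane", "Sample_ID", "index", "index2"])
    for lane, sample_id, i7, i5 in samples:
        writer.writerow([lane, sample_id, i7, i5])
    return buf.getvalue()


section = dual_index_data_section([
    (1, "sample1", "GTAACATGCG", "AGGTAACACT"),
    (2, "sample1", "GTAACATGCG", "AGGTAACACT"),
    (1, "sample2", "GTGGATCAAA", "GCCAACCCTG"),
])
print(section)
```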
For a single index ATAC library, enter each sample's lane, sample name, and set of four sample indices into the Illumina bcl2fastq sample sheet. Here is an example using "SI-GA-A1" indices. This sample sheet shows two samples: sample1 is split across two lanes (1 and 2), and sample2 is found only on lane 1:
    [Data]
    Lane,Sample_ID,index
    1,sample1,GGTTTACT
    1,sample1,CTAAACGG
    1,sample1,TCGGCGTC
    1,sample1,AACCGTAA
    2,sample1,GGTTTACT
    2,sample1,CTAAACGG
    2,sample1,TCGGCGTC
    2,sample1,AACCGTAA
    1,sample2,AAACGGCG
    1,sample2,CCTACCAT
    1,sample2,GGCGTTTC
    1,sample2,TTGTAAGA
Illumina bcl2fastq must be called with the correct --use-bases-mask argument and other arguments to properly demultiplex and output FASTQs for all the reads in a Chromium library. Learn more about the --use-bases-mask argument in this Knowledge Base article.
In the examples below, ${FLOWCELL_DIR} is the directory that contains a flow cell's Data folder, ${OUTPUT_DIR} is the directory that you want to output FASTQs to, ${INTEROP_DIR} is the directory given to the --interop-dir argument, and ${SAMPLE_SHEET_PATH} is the path to the sample sheet CSV you created.
bcl2fastq v2.20 or later
    bcl2fastq --use-bases-mask=Y50,I8,Y24,Y50 \
      --create-fastq-for-index-reads \
      --minimum-trimmed-read-length=8 \
      --mask-short-adapter-reads=8 \
      --ignore-missing-positions \
      --ignore-missing-controls \
      --ignore-missing-filter \
      --ignore-missing-bcls \
      -r 6 -w 6 \
      -R ${FLOWCELL_DIR} \
      --output-dir=${OUTPUT_DIR} \
      --interop-dir=${INTEROP_DIR} \
      --sample-sheet=${SAMPLE_SHEET_PATH}
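If you drive bcl2fastq from a script, the same command can be assembled as an argument list. The sketch below only builds and prints the command and does not run it. It assumes Python 3.8 or later for shlex.join; the function name and the example paths are hypothetical, while the flags are the ones shown above.

```python
import shlex


def bcl2fastq_command(flowcell_dir, output_dir, interop_dir, sample_sheet,
                      bases_mask="Y50,I8,Y24,Y50"):
    """Build the bcl2fastq argument list shown above (does not execute it)."""
    return [
        "bcl2fastq",
        f"--use-bases-mask={bases_mask}",
        "--create-fastq-for-index-reads",
        "--minimum-trimmed-read-length=8",
        "--mask-short-adapter-reads=8",
        "--ignore-missing-positions",
        "--ignore-missing-controls",
        "--ignore-missing-filter",
        "--ignore-missing-bcls",
        "-r", "6", "-w", "6",
        "-R", str(flowcell_dir),
        f"--output-dir={output_dir}",
        f"--interop-dir={interop_dir}",
        f"--sample-sheet={sample_sheet}",
    ]


# Hypothetical paths for illustration only.
cmd = bcl2fastq_command(
    "/path/to/flowcell", "/path/to/fastqs",
    "/path/to/interop", "/path/to/SampleSheet.csv",
)
print(shlex.join(cmd))
```

Passing a list like this to a process launcher avoids shell quoting problems when paths contain spaces.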
To limit bcl2fastq to a subset of lanes, supply values to the --tiles argument.
If you add extra bases to a sample index read, you will need to account for this in the --use-bases-mask argument. For example, if you ran a sample index read with 9 bases, you must truncate the last base for Cell Ranger ARC to run correctly.

You can exclude a single base by adding a single n character to the read argument, or adding n* to exclude all bases after a certain position. See below:
Read | Desired length (bases) | Actual length (bases) | Argument |
---|---|---|---|
i7 Index Read (I1) | 8 | 9 | I8n |
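The truncation rule in the table can be written as a small helper. This is a sketch under the conventions stated above (a single n masks one base, n* masks all remaining bases); the function name is hypothetical.

```python
def read_mask(desired, actual, read_type="I"):
    """Return the --use-bases-mask entry for one read.

    desired: number of bases to keep; actual: number of cycles sequenced.
    """
    if desired > actual:
        raise ValueError("cannot keep more bases than were sequenced")
    extra = actual - desired
    if extra == 0:
        return f"{read_type}{desired}"
    if extra == 1:
        return f"{read_type}{desired}n"   # mask exactly one trailing base
    return f"{read_type}{desired}n*"      # mask every base after `desired`


# Full mask for a run whose i7 read was sequenced with 9 cycles.
mask = ",".join(["Y50", read_mask(8, 9), "Y24", "Y50"])
print(mask)
```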
Learn more about the --use-bases-mask argument in this Knowledge Base article.
Illumina's BCL Convert is another software application that converts BCL files into FASTQ files. This page explains how to use BCL Convert for Chromium Single Cell Multiome ATAC + Gene Expression libraries and provides example sample sheets to use as inputs. In addition, there is a step-by-step guide with an example BCL dataset for generating FASTQs compatible with Cell Ranger ARC.
BCL Convert is available for download and installation on the Illumina support website as an RPM package. An Illumina account is required for download. Please contact Illumina Support if you have questions about BCL Convert versions, or need help troubleshooting its download and installation. See the 10x Genomics Knowledge Base article "How to troubleshoot installing bcl2fastq or bcl-convert?"
BCL Convert uses a sample sheet CSV file to specify sample information and parameters for a run, instead of command line options. For a full description of the sample sheet and list of settings, please see the Illumina documentation.
Do not trim adapters during demultiplexing. Leave these settings blank. Trimming adapters from reads can potentially damage 10x Barcodes and UMIs, resulting in pipeline failure or data loss.
If you are using an Illumina sample sheet for demultiplexing with bcl2fastq, bcl-convert, or our mkfastq pipeline, please remove these lines under the [Settings] section: Adapter, AdapterRead1, or AdapterRead2.
The basic sample sheet has three sections. Each section is described here and example sample sheets are provided for both single and dual indexed samples.
[Header]: can be used to specify the BCL sample sheet version.

[BCLConvert_Settings]: in a V2 sample sheet, this section is used to specify several FASTQ conversion settings, including whether or not to create FASTQ files for indices. Use [Settings] in a V1 sample sheet. Leave adapter trim settings blank in this section.

[BCLConvert_Data]: in a V2 sample sheet, this section is used to sort samples and index adapters based on the following column headers. The [BCLConvert_Data] section must be renamed [Data] or [data] for a V1 sample sheet:
Column name | Description |
---|---|
Lane | Optional. Generates FASTQ files only for the samples with the specified lane number. Allows only one valid integer. If the same sample has been run on multiple lanes of the flow cell, add a new row for each lane. If the lane is not specified, indices are searched in all lanes. |
Sample_ID | The sample ID |
index | I7 index sequence |
index2 | I5 index sequence |
Sample_Project | Optional. Used when --bcl-sampleproject-subdirectories is specified in BCL Convert run. Only alphanumeric characters, dashes, and underscores are allowed. Logs or Reports should not be used as directory names for this flag, as they are already default output directories. Learn more. |
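The naming constraints on Sample_Project can be checked before running BCL Convert. The sketch below is a hypothetical validator, not something BCL Convert provides; treating the reserved names as an exact, case-sensitive match is an assumption.

```python
import re

# Only alphanumeric characters, dashes, and underscores are allowed.
_ALLOWED = re.compile(r"[A-Za-z0-9_-]+")
# Default output directories that must not be reused as project names.
RESERVED = {"Logs", "Reports"}


def is_valid_sample_project(name):
    """Return True if `name` satisfies the Sample_Project constraints."""
    return bool(_ALLOWED.fullmatch(name)) and name not in RESERVED


for candidate in ["multiome_run-1", "Logs", "my project"]:
    print(candidate, is_valid_sample_project(candidate))
```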
This section shows an example sample sheet for GEX libraries created with the Dual Index Kit TT, Set A. The parameter CreateFastqForIndexReads,0 under [BCLConvert_Settings] tells BCL Convert not to generate FASTQ files for indices. Cell Ranger does not require FASTQ files for indices. This sample sheet shows two samples: sample1 is split across two lanes (1 and 2), and sample2 is found only on lane 2:
    [Header]
    FileFormatVersion,2

    [BCLConvert_Settings]
    CreateFastqForIndexReads,0

    [BCLConvert_Data]
    Lane,Sample_ID,index,index2
    1,sample1,GTAACATGCG,AGGTAACACT
    2,sample1,GTAACATGCG,AGGTAACACT
    2,sample2,GTGGATCAAA,GCCAACCCTG
This section shows an example sample sheet for ATAC libraries created with the Single Index Kit N Set A. Please note that some settings under the [BCLConvert_Settings] section are different because ATAC libraries have a different read structure. The parameter CreateFastqForIndexReads,1 in combination with TrimUMI,0 tells BCL Convert to output UMI cycles to FASTQ files. The OverrideCycles parameter specifies the sequencing and indexing cycles that should be used when processing the data. It must have the same number of semicolon-delimited fields as there are sequencing and index reads specified in your RunInfo.xml.
This sample sheet shows two samples, Sample1_SI and Sample2_SI. Since the Lane column is not specified in this example, indices are searched in all lanes.
    [Header]
    FileFormatVersion,2

    [BCLConvert_Settings]
    CreateFastqForIndexReads,1
    TrimUMI,0
    OverrideCycles,Y50;I8;U24;Y49

    [BCLConvert_Data]
    Sample_ID,index
    Sample1_SI,GGTTTACT
    Sample1_SI,CTAAACGG
    Sample1_SI,TCGGCGTC
    Sample1_SI,AACCGTAA
    Sample2_SI,AGCTGCGT
    Sample2_SI,CAACCATC
    Sample2_SI,GTGGAGCA
    Sample2_SI,TCTATTAG
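The field-count requirement for OverrideCycles can be checked against RunInfo.xml before conversion. The sketch below assumes that each read appears as a Read element in RunInfo.xml; the XML fragment is a simplified, hypothetical example rather than output from a real run, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET


def count_reads(run_info_xml):
    """Count the Read elements (sequencing and index reads) in RunInfo.xml text."""
    root = ET.fromstring(run_info_xml.strip())
    return sum(1 for _ in root.iter("Read"))


def override_cycles_matches(override_cycles, run_info_xml):
    """True if OverrideCycles has one semicolon-delimited field per read."""
    fields = [field for field in override_cycles.split(";") if field]
    return len(fields) == count_reads(run_info_xml)


# Simplified, hypothetical RunInfo.xml fragment with four reads.
RUN_INFO = """
<RunInfo>
  <Run Id="example_run" Number="1">
    <Reads>
      <Read Number="1" NumCycles="50" IsIndexedRead="N"/>
      <Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
      <Read Number="3" NumCycles="24" IsIndexedRead="Y"/>
      <Read Number="4" NumCycles="49" IsIndexedRead="N"/>
    </Reads>
  </Run>
</RunInfo>
"""

print(override_cycles_matches("Y50;I8;U24;Y49", RUN_INFO))
```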
The command to run BCL Convert:
    bcl-convert --bcl-input-directory <folder-with-bcls> \
      --output-directory <name-of-output-dir-for-FASTQs> \
      --sample-sheet <samplesheet-filename.csv>
Required arguments:
--bcl-input-directory: path to the input directory containing BCL files.

--output-directory: path to an output directory for newly created FASTQ files. This directory must not exist before command execution.

--sample-sheet: path to a CSV file containing sample information, as described in the section on creating the sample sheet. Providing a path to the directory instead of the specific CSV file can cause the software to hang.

A successful BCL Convert run looks like this:
    Sample sheet being processed by common lib? Yes
    SampleSheet Settings:
      CreateFastqForIndexReads = 1
      OverrideCycles = Y50;I8;U24;Y49
      TrimUMI = 0

    shared-thread-linux-native-asio output is disabled
    bcl-convert Version 00.000.000.4.1.5
    Copyright (c) 2014-2022 Illumina, Inc.

    Command Line: --bcl-input-directory miseq-gex --output-directory gex-fastqs --sample-sheet gex_samplesheet.
    Conversion Begins.
    # CPU hw threads available: 64
    Parallel Tiles: 4. Threads Per Tile: 16
    SW compressors: 64
    SW decompressors: 32
    SW FASTQ compression level: 1
    Conversion Complete.
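The constraints on the required arguments (the output directory must not exist yet, and the sample sheet must be a file rather than a directory) can be checked before launching bcl-convert. The following is a minimal pre-flight sketch; the function name and messages are hypothetical, and it does not call bcl-convert itself.

```python
from pathlib import Path


def preflight_problems(bcl_input_dir, output_dir, sample_sheet):
    """Return a list of problems that would make the run fail or hang."""
    problems = []
    if not Path(bcl_input_dir).is_dir():
        problems.append(f"input directory not found: {bcl_input_dir}")
    if Path(output_dir).exists():
        problems.append(f"output directory already exists: {output_dir}")
    sheet = Path(sample_sheet)
    if sheet.is_dir():
        # A directory here can cause the software to hang.
        problems.append(f"sample sheet is a directory, not a CSV file: {sample_sheet}")
    elif not sheet.is_file():
        problems.append(f"sample sheet not found: {sample_sheet}")
    return problems


# Using the current directory for every argument triggers two of the checks.
for problem in preflight_problems(".", ".", "."):
    print(problem)
```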
A new folder is created (name specified by the --output-directory flag). This folder contains FASTQ file sets, one per sample. The folder also contains Logs and Reports sub-directories that contain the run logs and metrics output files, respectively.
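To confirm that each sample has the expected set of reads, the FASTQ file names in the output folder can be grouped by sample. The sketch below assumes the file naming pattern seen in the example outputs on this page (sample name, S number, lane, read, 001); the function name is hypothetical.

```python
import re
from collections import defaultdict

# Pattern assumed from names such as test_sample_gex_S1_L001_R1_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def group_fastqs(filenames):
    """Map each sample name to the sorted list of read types found for it."""
    groups = defaultdict(set)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match:  # ignore files that are not FASTQs
            groups[match["sample"]].add(match["read"])
    return {sample: sorted(reads) for sample, reads in groups.items()}


grouped = group_fastqs([
    "test_sample_gex_S1_L001_R1_001.fastq.gz",
    "test_sample_gex_S1_L001_R2_001.fastq.gz",
    "Undetermined_S0_L001_R1_001.fastq.gz",
    "Undetermined_S0_L001_R2_001.fastq.gz",
    "SampleSheet.csv",
])
print(grouped)
```

In practice the file names would come from listing the output directory, for example with Path(output_dir).glob("*.fastq.gz").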
A convenient way to test BCL Convert is by downloading the MiSeq example dataset. This dataset has been selected for its relatively small size. The example below applies to Chromium Single Cell ATAC and Multiome libraries. It should not be used to run downstream pipelines (e.g. cellranger-arc count).
To follow along:
    tar -xvf /working-directory/cellranger-arc-tiny-bcl-atac-2.0.0.tar.gz
    tar -xvf /working-directory/cellranger-arc-tiny-bcl-gex-2.0.0.tar.gz
    bcl-convert --bcl-input-directory /working-directory/miseq-gex \
      --output-directory /working-directory/gex-fastqs \
      --sample-sheet /working-directory/cellranger-arc-tiny-bcl-gex-samplesheet-2.0.0.csv
Remember to customize the --bcl-input-directory path with the path to your input directory. This command takes ~10 minutes to complete.

A folder called gex-fastqs is created in the working directory. This folder contains your newly created FASTQ files.
    .
    ├── gex-fastqs
    │   ├── Logs
    │   │   ├── Errors.log
    │   │   ├── FastqComplete.txt
    │   │   ├── Info.log
    │   │   └── Warnings.log
    │   ├── Reports
    │   │   ├── Adapter_Cycle_Metrics.csv
    │   │   ├── Adapter_Metrics.csv
    │   │   ├── Demultiplex_Stats.csv
    │   │   ├── Demultiplex_Tile_Stats.csv
    │   │   ├── fastq_list.csv
    │   │   ├── Index_Hopping_Counts.csv
    │   │   ├── IndexMetricsOut.bin
    │   │   ├── Quality_Metrics.csv
    │   │   ├── Quality_Tile_Metrics.csv
    │   │   ├── RunInfo.xml
    │   │   ├── SampleSheet.csv
    │   │   └── Top_Unknown_Barcodes.csv
    │   ├── test_sample_gex_S1_L001_R1_001.fastq.gz
    │   ├── test_sample_gex_S1_L001_R2_001.fastq.gz
    │   ├── Undetermined_S0_L001_R1_001.fastq.gz
    │   └── Undetermined_S0_L001_R2_001.fastq.gz
    ├── gex_samplesheet.csv
    ├── runinfo_gex.txt
    └── runinfo.txt
    bcl-convert --bcl-input-directory /working-directory/miseq-atac \
      --output-directory /working-directory/atac-fastqs \
      --sample-sheet /working-directory/cellranger-arc-tiny-bcl-atac-samplesheet-2.0.0.csv
Remember to customize the --bcl-input-directory path with the path to your input directory. This command takes ~10 minutes to complete.

A folder called atac-fastqs is created in the working directory. This folder contains your newly created FASTQ files.
    .
    ├── atac-fastqs
    │   ├── Logs
    │   │   ├── Errors.log
    │   │   ├── FastqComplete.txt
    │   │   ├── Info.log
    │   │   └── Warnings.log
    │   ├── Reports
    │   │   ├── Adapter_Cycle_Metrics.csv
    │   │   ├── Adapter_Metrics.csv
    │   │   ├── Demultiplex_Stats.csv
    │   │   ├── Demultiplex_Tile_Stats.csv
    │   │   ├── fastq_list.csv
    │   │   ├── Index_Hopping_Counts.csv
    │   │   ├── IndexMetricsOut.bin
    │   │   ├── Quality_Metrics.csv
    │   │   ├── Quality_Tile_Metrics.csv
    │   │   ├── RunInfo.xml
    │   │   ├── SampleSheet.csv
    │   │   └── Top_Unknown_Barcodes.csv
    │   ├── test_sample_S1_L001_I1_001.fastq.gz
    │   ├── test_sample_S1_L001_I2_001.fastq.gz
    │   ├── test_sample_S1_L001_R1_001.fastq.gz
    │   ├── test_sample_S1_L001_R2_001.fastq.gz
    │   ├── Undetermined_S0_L001_I1_001.fastq.gz
    │   ├── Undetermined_S0_L001_I2_001.fastq.gz
    │   ├── Undetermined_S0_L001_R1_001.fastq.gz
    │   └── Undetermined_S0_L001_R2_001.fastq.gz
    ├── bcl_convert_samplesheet.csv
    └── runinfo.txt
Please contact Illumina Support for general questions about BCL Convert, or refer to the BCL Convert documentation.
After generating FASTQs, you are ready to run cellranger-arc count.