Instructions to Download and Process FASTQs of 1.3M Brain Cells
Technical Note, Last Modified on June 17, 2022
Due to the large size of the data (3.6 terabytes), the raw data is not available directly from our website. Users can download the raw FASTQ data from Amazon S3 at their own cost using the 'Requester Pays' bucket option (depending on the user's Amazon region, Amazon may charge up to ~$350 for this data transfer). Interested users should follow the steps below to download and process the FASTQ files.
1. Download the files
- Following Amazon's documentation for Requester Pays buckets, transfer the FASTQ files from this bucket: s3://10x.largefiles/1M_neurons
- Download the MRO archive (mros.tar.gz) from s3://10x.largefiles/1M_neurons to the location where you want to run the pipestances, e.g. /path/to/pipestances
- Download the aggregator.csv file from s3://10x.largefiles/1M_neurons to the location you want to run the aggregation, e.g. /path/to/aggregator
- Download the reanalyze.csv file from https://support.10xgenomics.com/single-cell/datasets/1M_neurons to the location you want to run the reanalysis, e.g. /path/to/reanalyze
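The transfers above can be sketched with the AWS CLI; this is a minimal sketch assuming the CLI is installed and configured with credentials, and the destination directories (placeholders from the steps above) already exist. The --request-payer requester flag accepts the Requester Pays charges.

```shell
#!/usr/bin/env bash
# Sketch of the Requester Pays download, assuming the AWS CLI is installed
# and configured. Destination paths are the placeholders from the steps above.
BUCKET=s3://10x.largefiles/1M_neurons

# Run only once the CLI is available and the destination directories exist.
if command -v aws >/dev/null 2>&1 && [ -d /path/to/pipestances ]; then
    # --request-payer requester bills the transfer to your AWS account.
    aws s3 sync "$BUCKET" /path/to/fastqs --request-payer requester
    aws s3 cp "$BUCKET/mros.tar.gz"    /path/to/pipestances/ --request-payer requester
    aws s3 cp "$BUCKET/aggregator.csv" /path/to/aggregator/  --request-payer requester
fi
```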
2. Extract the FASTQs
- Extract one flowcell per folder, to a common location, e.g.
/path/to/fastqs/<flowcell ID>
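One way to script the extraction is sketched below; the assumption that the downloaded archives are per-flowcell *.tar.gz files sitting in the FASTQ root is ours, not the note's, so adjust the glob to your actual archive names.

```shell
#!/usr/bin/env bash
# Unpack each per-flowcell tarball into the common FASTQ root, so each
# expands to /path/to/fastqs/<flowcell ID>. Archive naming is an assumption.
FASTQ_ROOT=/path/to/fastqs
shopt -s nullglob   # the loop is skipped cleanly if no archives match
for archive in "$FASTQ_ROOT"/*.tar.gz; do
    tar -xzf "$archive" -C "$FASTQ_ROOT"
done
```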
3. Run the count pipeline
- Extract mros.tar.gz
- Replace /path/to/fastqs in all the MRO files with your actual path to the flowcell folders
- Replace /path/to/refdata-cellranger/mm-1.2.0 in all the MRO files with your actual path to the refdata-cellranger/mm-1.2.0 reference
- For each MRO file, run:
cellranger mrp <mrofile>.mro <mrofile>
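The path substitutions and per-MRO launches can be scripted; the sketch below assumes the MROs were extracted into /path/to/pipestances, uses /data/fastqs and /data/refdata-cellranger/mm-1.2.0 as hypothetical stand-ins for your real locations, and uses GNU sed's in-place syntax.

```shell
#!/usr/bin/env bash
# Patch the placeholder paths in every MRO, then launch each pipestance.
# /data/fastqs and /data/refdata-cellranger/mm-1.2.0 are hypothetical
# stand-ins for your actual locations.
MRO_DIR=/path/to/pipestances
shopt -s nullglob
for mro in "$MRO_DIR"/*.mro; do
    sed -i \
        -e 's|/path/to/fastqs|/data/fastqs|g' \
        -e 's|/path/to/refdata-cellranger/mm-1.2.0|/data/refdata-cellranger/mm-1.2.0|g' \
        "$mro"
    # Pipestance ID taken from the MRO filename, per the command above.
    cellranger mrp "$mro" "$(basename "$mro" .mro)"
done
```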
4. Run the aggregator pipeline
- Replace /path/to/pipestances in aggregator.csv with your actual pipestance path
- Run:
cellranger aggr --id=neuron_aggregation --csv=aggregator.csv --nosecondary
- NOTE: this step uses >300 GB of RAM; it was run at 10x on a machine with 384 GB of RAM
- NOTE: E18_20160930_Neurons_Sample_14 and E18_20160930_Neurons_Sample_26 are omitted from the aggregation
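The CSV edit and the aggregation launch can likewise be scripted; in this sketch, /data/pipestances is a hypothetical stand-in for your actual pipestance path, and GNU sed's in-place syntax is assumed.

```shell
#!/usr/bin/env bash
# Point aggregator.csv at the real pipestance directory, then aggregate.
# /data/pipestances is a hypothetical stand-in for your actual path.
CSV=/path/to/aggregator/aggregator.csv
if [ -f "$CSV" ]; then
    sed -i 's|/path/to/pipestances|/data/pipestances|g' "$CSV"
    # --nosecondary skips secondary analysis here; it is redone in step 5.
    cellranger aggr --id=neuron_aggregation --csv="$CSV" --nosecondary
fi
```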
5. Run the reanalyze pipeline:
cellranger reanalyze --id=neuron_reanalyze --matrix=/path/to/aggregator/neuron_aggregation/outs/filtered_gene_bc_matrices_h5.h5 --params=reanalyze.csv
(NOTE: this step uses >300 GB of RAM; it was run at 10x on a machine with 384 GB of RAM)