Technical Note, Last Modified on December 9, 2016
Supernova is a de novo assembly program that has been designed to assemble germline **human** genomes from data generated in a precise fashion that we outline below.
Not following these instructions entails significant risk. Both choice of organism and processing are critical. We note in particular two pitfalls:
Assembling a nonhuman genome may produce outstanding or unsatisfactory results. Due to the diversity of genome characteristics, it is impossible to know a priori how well a specific genome will assemble.
Supernova requires long, undamaged DNA. We provide protocols for DNA extraction from blood and cell lines, both of which have been validated for human samples. Getting good material from other sample types and organisms can be challenging.
More details and guidance are provided below.
Supernova Support Policy

| Class | Support |
| --- | --- |
| Human germline genomes, per instructions below | Send us the information packet described below. We will do everything we can to make your project work. |
| All other | Send us the information packet. We will look at it, but cannot guarantee a solution. It can be difficult to troubleshoot failed Supernova assemblies. |
Human germline genomes: officially supported and validated to give high quality results.
All non-human genomes: experimental, may have unexpected features that limit assembly quality
Mammalian genomes: likely to work well (but not identically to human)
High repeat content: may not work well (and may increase runtime several fold)
Non-diploid genomes: unlikely to work well
Small genomes (100 Mb or greater): may work if specific instructions below are followed, but high repeat content can be a problem.
Large genomes (> 3.2 Gb): risky, see instructions below
Please keep reading! Other factors can dramatically affect results.
Clonality: We strongly recommend that DNA be obtained from an individual organism or clonal population.
DNA size: **Long, undamaged DNA is required for a high quality assembly.** Supernova measures DNA length against an intermediate assembly and reports the length-weighted mean that it observes. We recommend that this value be at least 50 kb, and preferably 100 kb. DNA length is highly correlated with several assembly statistics, including contig length, phase block length and scaffold length. Because of differences in measurement and possible breaking of molecules during library construction, the DNA length observed by Supernova may be smaller than the DNA length observed initially on a gel. The exact mechanisms by which DNA is broken are not fully understood, but may include breaking at nick sites.
_Note: The accuracy of the DNA length reported by Supernova depends on achieving a minimum assembly quality. In cases where the assembly has a problem, the length reported will be inaccurate. A value of 0, in particular, indicates that a problem has occurred in the calculation._
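To make the term concrete, here is a minimal sketch of a length-weighted mean using its standard definition (the sum of squared lengths divided by the sum of lengths). How Supernova estimates molecule lengths internally is not described in this note, so the function names and the example lengths below are illustrative assumptions only.

```python
def length_weighted_mean(lengths_kb):
    """Length-weighted mean: each molecule counts in proportion to its length.

    Standard definition, sum(L^2) / sum(L). This is an illustration, not
    Supernova's internal calculation.
    """
    total = sum(lengths_kb)
    if total <= 0:
        raise ValueError("need at least one molecule with positive length")
    return sum(length * length for length in lengths_kb) / total


def meets_recommendation(lw_mean_kb, minimum_kb=50.0):
    """Compare against the recommended minimum of 50 kb from the text."""
    return lw_mean_kb >= minimum_kb


# Hypothetical molecule lengths in kb, invented for illustration.
lengths = [20.0, 40.0, 120.0]
lw = length_weighted_mean(lengths)
arithmetic = sum(lengths) / len(lengths)
print(f"arithmetic mean: {arithmetic:.1f} kb, length-weighted mean: {lw:.1f} kb")
print("meets 50 kb recommendation:", meets_recommendation(lw))
```

Because long molecules contribute more bases, the length-weighted mean is never smaller than the plain arithmetic mean of the same lengths.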
NextSeq Standard: not tested.
Read length: Supernova requires as input 2x150 base reads. Use of significantly shorter or longer reads may result in inferior assemblies.
Sequencing depth: For genomes >1.6 Gb, we recommend sequencing to depth between 38x and 56x. (See below for smaller genomes.) Within this range, higher coverage will generally give better results. For highly polymorphic organisms, we recommend 56x. Coverage higher than 56x may not improve results, and could make them worse.
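As a rough planning aid, the depth recommendation can be turned into a read count. The sketch below assumes that depth means raw coverage (total sequenced bases divided by genome size) and that reads are counted individually at 150 bases each; both are assumptions, and the function name is hypothetical.

```python
import math

READ_LENGTH = 150  # bases per read, from the 2x150 requirement


def reads_for_depth(genome_size_gb, depth, read_length=READ_LENGTH):
    """Number of individual reads needed to reach a given raw coverage.

    Assumes coverage = reads * read_length / genome_size.
    """
    if genome_size_gb <= 0 or depth <= 0:
        raise ValueError("genome size and depth must be positive")
    total_bases = genome_size_gb * 1e9 * depth
    return math.ceil(total_bases / read_length)


# Example: a 3.2 Gb genome at the two ends of the recommended range.
for depth in (38, 56):
    n = reads_for_depth(3.2, depth)
    print(f"{depth}x: about {n / 1e6:.0f}M reads")
```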
Genomes significantly smaller than human may be challenging to assemble with Supernova for three reasons:
They may be highly repetitive or have other features that confound the algorithm.
If starting material is solid tissue, it may be challenging to obtain high quality DNA. This is not unique to small genomes, but empirically it is a common problem for them.
The number of read pairs for each long molecule (Linked-Reads Per Molecule, or LPM) will be too low without adjusting sequencing. See below and the Tech Note "Chromium Genome Guidance for Alternative Applications" for more details.
For this third point, we describe here a scheme that should work in theory but has not been tested. Following these instructions entails risk, and success is not guaranteed.
Load less DNA. For human germline genomes, the standard input mass is 1.25 ng. Loading of as little as 0.6 ng is likely to give high quality data. For genomes in the range 1.6-3.2 Gb, the mass can be reduced proportionally to the genome size, and that is the only modification needed. It is very important that input DNA be quantified accurately. Accidentally loading too little DNA may result in a high read duplication rate.
Sequence to greater depth and subsample by barcode. For genomes smaller than 1.6 Gb, load 0.6 ng and sequence to depth (1.6/G) * 56x, where G is your genome size in Gb. Once sequencing is complete, subsample your reads, selecting all reads in a random sample of barcodes, comprising (G/1.6) times the total number of barcodes. For now you will need to provide your own solution to do this.
These instructions should yield the same LPM that would be obtained for human genomes at 56x coverage.
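The two steps above can be sketched in code. The first function computes the input mass, sequencing depth, and fraction of barcodes to keep for a genome of size G; the second selects all reads from a random sample of barcodes. This is an untested-in-practice illustration, not a supported tool: the function names are hypothetical, 56x is assumed as the target depth for genomes in the 1.6-3.2 Gb range, and the reads are assumed to have already been parsed into (barcode, read) pairs by whatever means you use.

```python
import random

HUMAN_GENOME_GB = 3.2    # approximate human genome size used for scaling
STANDARD_MASS_NG = 1.25  # standard input mass for human
MIN_MASS_NG = 0.6        # mass to load for genomes smaller than 1.6 Gb
SMALL_GENOME_GB = 1.6
TARGET_DEPTH = 56.0


def small_genome_plan(genome_gb):
    """Return input mass (ng), sequencing depth, and barcode fraction to keep."""
    if not 0 < genome_gb <= HUMAN_GENOME_GB:
        raise ValueError("this sketch covers genomes up to 3.2 Gb only")
    if genome_gb >= SMALL_GENOME_GB:
        # 1.6-3.2 Gb: scale the mass by genome size; no subsampling needed.
        mass = STANDARD_MASS_NG * genome_gb / HUMAN_GENOME_GB
        return {"mass_ng": mass, "depth": TARGET_DEPTH, "barcode_fraction": 1.0}
    # Below 1.6 Gb: load 0.6 ng, sequence deeper, then subsample barcodes.
    return {
        "mass_ng": MIN_MASS_NG,
        "depth": (SMALL_GENOME_GB / genome_gb) * TARGET_DEPTH,
        "barcode_fraction": genome_gb / SMALL_GENOME_GB,
    }


def subsample_by_barcode(records, fraction, seed=0):
    """Keep every read whose barcode falls in a random sample of barcodes.

    `records` is a list of (barcode, read) pairs; how barcodes are extracted
    from your data is outside the scope of this sketch.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    barcodes = sorted({barcode for barcode, _ in records})
    keep_count = round(fraction * len(barcodes))
    keep = set(random.Random(seed).sample(barcodes, keep_count))
    return [(barcode, read) for barcode, read in records if barcode in keep]


plan = small_genome_plan(0.8)
print(plan)
```

Sampling whole barcodes, rather than individual reads, is what preserves the number of reads per molecule: every retained barcode keeps all of its reads.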
Genomes larger than human can in principle be assembled with Supernova, provided that the genome characteristics are not too different from human. However, you should supply at most 2000M reads to Supernova. Thus, for example, at 38x coverage (the lowest recommended) the largest genome would be 8.0 Gb, and at 56x coverage (the highest recommended) the largest genome would be 5.3 Gb.
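The read cap can be related to genome size with a short calculation. This sketch assumes that the 2000M limit counts individual 150-base reads and that coverage is total bases divided by genome size; under those assumptions it gives roughly 7.9 Gb at 38x and 5.4 Gb at 56x, close to the figures quoted above. The function names are hypothetical.

```python
MAX_READS = 2000e6   # upper limit on reads supplied to Supernova, from the text
READ_LENGTH = 150    # bases per read (assumption: reads counted individually)


def max_genome_size_gb(depth, max_reads=MAX_READS, read_length=READ_LENGTH):
    """Largest genome (in Gb) that reaches `depth` within the read limit."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return max_reads * read_length / depth / 1e9


def fits_read_budget(genome_size_gb, depth):
    """True if the genome can reach `depth` without exceeding the read limit."""
    return genome_size_gb <= max_genome_size_gb(depth)


for depth in (38, 56):
    print(f"{depth}x: up to about {max_genome_size_gb(depth):.1f} Gb")
```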
If you need assistance reviewing your results, please perform the following steps:
Upload Supernova diagnostic logs:
Run supernova upload youremail yoursample/yoursample.mri.tgz
http://support.10xgenomics.com/de-novo-assembly/software/pipelines/latest/troubleshooting
Create a support ticket by emailing support@10xgenomics.com. Please include the following information:
Make a comment that your logs have been uploaded.
Attach your supernova summary.csv
Attach your gel image and any information about your expected molecule length
State which sequencing instrument was used
Include information about your organism and genome characteristics
Haploid genome size
Ploidy
Organism
Non-human organisms
Questions about internals of the Supernova software.
We provide support for questions regarding use of the Supernova pipeline.
Supernova source code is available for advanced customers who want to understand the algorithm internals. However, we cannot provide support for questions pertaining to modifications (e.g. adjusting parameters or source code) or to specifics of how the algorithm works.