Technical Note, Last Modified on January 12, 2018
Supernova is a de novo assembly program designed to assemble germline human genomes from data generated in the precise fashion outlined below. For users interested in using Supernova to assemble human or non-human genomes, this guidance document outlines some important considerations.
For Supernova assemblies of non-human genomes, we have seen both successes and failures, depending on the characteristics of the genome and the quality of the data. Because genome and data characteristics vary so widely, it is impossible to know a priori how well a specific dataset will assemble. For that reason, using Supernova to assemble non-human genomes remains experimental and may lead to unexpected failures or limited assembly quality.
Supernova requires long, undamaged DNA. We provide protocols for DNA extraction from blood and cell lines, both of which have been validated for human samples. Getting good material from other sample types and organisms can be challenging.
Please Note: Following all of the guidance in this document does not guarantee success in assembling non-human genomes.
Class | Support Policy |
---|---|
Human germline genomes | Officially supported. Send us the information packet described below. |
All other | Experimental. Send us the information packet. We will try to help, but cannot guarantee a solution. |
Human germline genomes: officially supported and validated to give high-quality results.
All non-human genomes: experimental; may have unexpected features that limit assembly quality.
Mammalian genomes: likely to work well (but not identically to human).
Human non-germline genomes (e.g. cancer): unlikely to work well.
High repeat content: may not work well (and may increase runtime severalfold).
Genomes with ploidy > 2: unlikely to work well; performance depends on the particular genome.
Small genomes (at least 100 Mb, but significantly smaller than human): may work if the specific instructions below are followed, but high repeat content can be a problem.
Microbes (< 100 Mb): unlikely to work well.
Large genomes (> 3.2 Gb): risky; see instructions below.
Please keep reading! Other factors can dramatically affect results.
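As an illustration only, the size, ploidy, and sample-type guidance above can be condensed into a rough triage helper. The function `supernova_outlook` and its parameters are hypothetical and not part of Supernova. It deliberately ignores repeat content, clonality, and DNA quality, which the factors below show can dominate the outcome.

```python
def supernova_outlook(size_mb: float, human: bool = False, germline: bool = True,
                      ploidy: int = 2, mammal: bool = False) -> str:
    """Rough triage based on the genome classes listed above (hypothetical helper).

    Ignores repeat content, clonality, and DNA quality, any of which can
    change the outcome.
    """
    if human:
        # Human germline is the only officially supported class.
        return "supported" if germline else "unlikely"
    if ploidy > 2:
        return "unlikely"
    if size_mb < 100:
        # Microbial-scale genomes.
        return "unlikely"
    if size_mb > 3200:
        # Large genomes: subject to the read-count limit for genomes larger than human.
        return "risky"
    if mammal:
        return "likely"
    # Everything else, including small genomes of 100 Mb or more, is experimental.
    return "experimental"


print(supernova_outlook(500))                # experimental
print(supernova_outlook(2700, mammal=True))  # likely
```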
Clonality: We strongly recommend that DNA be obtained from an individual organism or clonal population.
DNA size: Long, undamaged DNA is required for a high-quality assembly. Supernova estimates DNA molecule length relative to an intermediate assembly and reports the length-weighted mean length that it observes. We recommend that this value be at least 50 kb, and preferably 100 kb. Reported DNA length is highly correlated with several assembly statistics, including contig length, phase block length, and scaffold length. Because of differences in measurement and possible breakage of molecules during library construction, the DNA length observed by Supernova may be smaller than the length observed initially on a gel. The exact mechanisms by which DNA is broken are not fully understood, but may include breakage at nick sites.
Note: The accuracy of the DNA length reported by Supernova depends on the assembly reaching a minimum quality. If Supernova is unable to estimate the DNA length, it reports a value of zero, meaning "not estimated." In some cases, a fragmented assembly may cause Supernova to report an incorrectly low value. We are working to make this calculation more robust.
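A length-weighted mean weights each molecule by its own length, so it equals sum(L^2) / sum(L) and is pulled toward the longest molecules. The sketch below assumes you already have a list of per-molecule length estimates in kb (a hypothetical input; Supernova derives its own estimate internally from an intermediate assembly) and checks the result against the 50 kb and 100 kb recommendations. The function names are illustrative, not Supernova's.

```python
from typing import Sequence


def length_weighted_mean(lengths_kb: Sequence[float]) -> float:
    """Return sum(L^2) / sum(L) over molecule lengths L (in kb)."""
    total = sum(lengths_kb)
    if total == 0:
        return 0.0  # use 0 for "not estimated", matching the convention in the note
    return sum(length * length for length in lengths_kb) / total


def assess_dna_length(lw_mean_kb: float) -> str:
    """Compare a length-weighted mean against the recommendations in this note."""
    if lw_mean_kb == 0:
        return "not estimated"
    if lw_mean_kb >= 100:
        return "preferred"
    if lw_mean_kb >= 50:
        return "acceptable"
    return "below recommendation"


lengths = [20.0, 50.0, 130.0]
lw = length_weighted_mean(lengths)
print(lw)                     # 99.0 (the plain mean is about 66.7)
print(assess_dna_length(lw))  # acceptable
```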
Instrument | Configuration | Results |
---|---|---|
HiSeq X | Standard | Excellent. |
HiSeq 2500 | Rapid run | Excellent. |
HiSeq 2500 | High output | Not tested. |
HiSeq 4000 | Standard | Usable, but observed to produce contigs half as long as those from HiSeq X, for 38x human data. |
MiSeq | Standard | Not tested, but performance likely comparable to HiSeq 2500 in rapid run mode. |
NextSeq | Standard | Not recommended. |
NovaSeq | Standard | Preliminary results look comparable to HiSeq X results. |
Read length: Supernova requires 2x150 base paired-end reads as input. Significantly shorter or longer reads may produce inferior assemblies.
Sequencing depth: For genomes larger than 1.6 Gb, we recommend sequencing to a depth between 38x and 56x (see below for smaller genomes). Within this range, higher coverage generally gives better results. For highly polymorphic organisms, we recommend 56x. Coverage higher than 56x may not improve results and could make them worse.
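To translate the recommended depth into a read count, here is a minimal sketch. It assumes coverage is computed as (number of reads x 150 bases) / genome size, with each read of a 2x150 pair counted separately. Supernova's own coverage accounting may differ, so treat the results as estimates. The constants and function names are hypothetical.

```python
BASES_PER_READ = 150                 # assumption: each read of a 2x150 pair counted separately
MIN_COVERAGE, MAX_COVERAGE = 38, 56  # recommended range for genomes > 1.6 Gb


def reads_for_coverage(genome_size_gb: float, coverage: float) -> float:
    """Approximate reads (in millions) needed to reach `coverage`."""
    return coverage * genome_size_gb * 1e9 / BASES_PER_READ / 1e6


def recommended_read_range(genome_size_gb: float) -> tuple[float, float]:
    """Read counts (in millions) spanning the 38x to 56x recommendation."""
    if genome_size_gb <= 1.6:
        raise ValueError("genomes <= 1.6 Gb need the small-genome instructions")
    return (reads_for_coverage(genome_size_gb, MIN_COVERAGE),
            reads_for_coverage(genome_size_gb, MAX_COVERAGE))


low, high = recommended_read_range(3.2)    # roughly human-sized genome
print(f"{low:.0f}M to {high:.0f}M reads")  # 811M to 1195M reads
```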
Genomes significantly smaller than human may be challenging to assemble with Supernova for three reasons:
They may be highly repetitive or have other features that confound the algorithm.
If the starting material is solid tissue, it may be challenging to obtain high-quality DNA. This problem is not unique to small genomes, but empirically it is common for them.
Read depth per molecule will generally be too low unless laboratory and computational protocols are adjusted. Specific instructions for doing this may be found (here). If you plan to assemble a small genome, it is essential that you read these instructions before you start generating data.
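One way to see why per-molecule read depth drops for small genomes: if a fixed mass of DNA is loaded and every genome is sequenced to the same coverage, the number of sequenced bases shrinks with genome size while the amount of loaded DNA does not. The sketch below estimates per-molecule depth as sequenced bases divided by loaded base pairs. It assumes roughly 650 g/mol per base pair of double-stranded DNA, uniform sampling of molecules, and a hypothetical 1 ng input chosen only for illustration (not a protocol value). It is not the adjustment procedure referenced above.

```python
AVOGADRO = 6.022e23
GRAMS_PER_MOL_PER_BP = 650.0  # common approximation for double-stranded DNA


def loaded_bp(input_ng: float) -> float:
    """Base pairs contained in `input_ng` nanograms of double-stranded DNA."""
    return input_ng * 1e-9 / GRAMS_PER_MOL_PER_BP * AVOGADRO


def depth_per_molecule(genome_size_gb: float, coverage: float, input_ng: float) -> float:
    """Average read depth on each loaded molecule, assuming uniform sampling."""
    sequenced_bases = coverage * genome_size_gb * 1e9
    return sequenced_bases / loaded_bp(input_ng)


HYPOTHETICAL_INPUT_NG = 1.0  # illustration only, not a recommended input mass
human = depth_per_molecule(3.2, 56, HYPOTHETICAL_INPUT_NG)
small = depth_per_molecule(0.5, 56, HYPOTHETICAL_INPUT_NG)
print(f"3.2 Gb: {human:.3f}x per molecule; 0.5 Gb: {small:.3f}x per molecule")
```

Under these assumptions, depth per molecule scales linearly with genome size, so at equal coverage and input a 0.5 Gb genome gets about 6.4 times less depth per molecule than a 3.2 Gb genome.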
Genomes larger than human can in principle be assembled with Supernova, provided that the genome characteristics are not too different from human. However, you should supply at most 2000M reads to Supernova. For example, at 38x coverage (the lowest recommended), the largest genome would be roughly 8.0 Gb, and at 56x coverage (the highest recommended), the largest genome would be roughly 5.3 Gb. Memory and disk resources should be scaled accordingly.
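The genome-size limits for large genomes follow from dividing the sequence carried by 2000M reads by the target coverage. A minimal sketch, assuming 150 bases per read (each read of a 2x150 pair counted separately); the names are hypothetical:

```python
MAX_READS = 2_000_000_000  # upper limit on reads supplied to Supernova, per this note
BASES_PER_READ = 150       # assumption: 2x150 reads, each read counted separately


def max_genome_gb(coverage: float, max_reads: int = MAX_READS) -> float:
    """Largest genome (Gb) reachable at `coverage` without exceeding `max_reads`."""
    return max_reads * BASES_PER_READ / coverage / 1e9


for cov in (38, 56):
    print(f"{cov}x -> {max_genome_gb(cov):.1f} Gb")
```

Under this assumption the results are about 7.9 Gb at 38x and 5.4 Gb at 56x, close to the approximate figures in the text.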
If you need assistance reviewing your results, please do the following:
Upload Supernova diagnostic logs (upload instructions)
Create a support ticket by emailing support@10xgenomics.com. Please include the following information:
Non-Human organisms
Questions about internals of the Supernova software.