Supernova2.0, printed on 11/12/2024
The Supernova 2.0 release includes a number of significant code changes, with corresponding changes in performance. We compared versions 1.2 and 2.0 on 20 datasets, including datasets run previously, customer datasets, and new datasets created for this comparison. Most of these datasets, along with their assemblies, are available for download.
- Supernova has been tested across a broad range of sample types, including vertebrates, plants, and insects; performance in all three categories is markedly improved.
- Assembly quality is greatly improved. For human samples, for example, the typical contig N50 is now about 160 kb and the typical scaffold N50 about 40 Mb.
- We demonstrate practical, end-to-end assembly of single insects, avoiding the problems of standard approaches that inbreed or mix wild individuals.
- The user experience is streamlined, with many critical metrics added or improved, including accurate measurement of molecule length and genome size.
- Assemblies are produced by a turnkey laboratory and computational process from a single library, at markedly low cost. A new, simplified workflow requires only standard-depth sequencing and is compatible with the NovaSeq platform.
- Supernova runs on a single server. It uses 256 GB of memory for most genomes (18 of the 20 tested) and 512 GB for most of the others; here, only maize and chili pepper required 512 GB. Run time ranges from a few hours for small genomes to a few days for human and similarly sized genomes; the longest observed run time was about 8 days, for maize. Several computational bottlenecks have been removed.
We tested Supernova on an extensive range of samples, from controls to wild-caught specimens. For each sample, we created a single Chromium Linked-Read library, sequenced it, and assembled it with both Supernova 1.2 and 2.0.0, without tuning or specifying any parameters other than the number of input reads, which we varied in a few cases (see below).
| # | Sample | Description | Material | DNA Prep | Notes |
|---|---|---|---|---|---|
| 1 | hgp | Human Genome Project, male [1] | blood | MagAttract | control |
| 2 | chm | equimolar mix of CHM1/CHM13 | cell line | MagAttract | control |
| 3 | wfu | NA12878, European, female | cell line | MagAttract | control |
| 4 | chi | HG00512, Chinese, male | cell line | MagAttract | |
| 5 | yor | NA19240, Yoruba, female | cell line | MagAttract | |
| 6 | yorm | NA19238, Yoruba, female | cell line | MagAttract | |
| 7 | ash | NA24385, Ashkenazi, male | cell line | MagAttract | |
| 8 | pr | HG00733, Puerto Rican, female | cell line | MagAttract | |
| 9 | hummer | hummingbird [2] | tissue | KingFisher | |
| 10 | fish | zebrafish SAT from ZIRC | tissue | Amplicon Express | control |
| 11 | ruby | dog named Ruby | blood | MagAttract | |
| 12 | grape | Flame Seedless grape [3] | leaves | grape protocol | |
| 13 | maize | maize B73 | leaves | Amplicon Express | control |
| 14 | chili | chili pepper [4] | leaves | modified CTAB | |
| 15 | fly | fruit fly iso-1 x Canton-S [5] | one insect | salting out | control |
| 16 | omoth | one moth collected in Pleasanton | one insect | salting out | |
| 17 | pmoth | second moth collected in Pleasanton | one insect | salting out | |
| 18 | cater | caterpillar collected in Pleasanton | one insect | salting out | |
| 19 | aphid | aphid collected in Pleasanton | one insect | salting out | |
| 20 | aedes | Aedes aegypti F1 ref cross [6] | one insect | salting out | control |
1. Anonymous donor
2. Erich Jarvis, HHMI, bioRxiv
3. Doreen Ware, CSHL
4. Allen van Deynze, UCD, Hort Res 5
5. Bloomington Stock Center
6. Ben Matthews, Rockefeller Institute
The number of input reads was generally our best guess from the estimated genome size, chosen to yield about 56x coverage. Where the genome size was unknown, a preliminary run with a guessed size was enough for Supernova to produce an estimate. In a few cases, it was advantageous to raise coverage above 56x; the actual number of reads used for each sample is available.
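The read count for a target coverage follows from simple arithmetic: coverage is total sequenced bases divided by genome size. The sketch below assumes 150 bp reads (2 x 150 bp paired-end, each read of a pair counted separately) and reads the gsize column in the next table as megabases; neither assumption is stated here, and Supernova's own raw_coverage metric may be computed differently. The function names are illustrative only.

```python
import math

# Assumption: each read is 150 bp (2 x 150 bp paired-end, both reads counted).
READ_LENGTH_BP = 150


def reads_for_coverage(genome_size_bp, coverage=56.0, read_length_bp=READ_LENGTH_BP):
    """Number of reads whose bases add up to `coverage` x genome size."""
    return math.ceil(genome_size_bp * coverage / read_length_bp)


def raw_coverage(num_reads, genome_size_bp, read_length_bp=READ_LENGTH_BP):
    """Inverse calculation: coverage implied by a given number of reads."""
    return num_reads * read_length_bp / genome_size_bp


if __name__ == "__main__":
    # Estimated genome size of the hgp sample (3274, read as Mb).
    n = reads_for_coverage(3274e6)
    print(f"{n:,} reads for 56x")
    print(f"check: {raw_coverage(n, 3274e6):.1f}x")
```

For a genome of unknown size, the same function can be applied twice: once with a guessed size for the preliminary run, then again with the size Supernova estimates.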
Supernova computes various metrics describing the input data and the genome it represents, such as genome size, repetitiveness, heterozygosity, and molecule length. These vary widely across the datasets, as shown below; the Supernova metric name for each column is given after the table.
| Sample | gsize | %rep | het | %hat | mol_len | p10 | seq | raw_cov |
|---|---|---|---|---|---|---|---|---|
| hgp | 3274 | 8.1 | 1.42 | 0.09 | 139 | 234 | X | 55.0 |
| chm | 3212 | 6.5 | 1.30 | 0.10 | 79 | 139 | X | 56.0 |
| wfu | 3391 | 8.1 | 1.38 | 0.09 | 95 | 146 | X | 53.1 |
| chi | 3247 | 8.1 | 1.61 | 0.11 | 103 | 125 | X | 55.4 |
| yor | 3156 | 6.4 | 1.03 | 0.10 | 122 | 156 | X | 57.0 |
| yorm | 3288 | 7.4 | 1.13 | 0.10 | 119 | 132 | X | 54.7 |
| ash | 3124 | 7.2 | 1.39 | 0.11 | 119 | 140 | X | 57.6 |
| pr | 3399 | 8.5 | 1.44 | 0.11 | 103 | 146 | X | 53.0 |
| hummer | 1102 | 4.2 | 0.36 | 0.06 | 66 | 230 | 2500 | 61.2 |
| fish | 1680 | 12.6 | 0.30 | 0.47 | 89 | 93 | Nova | 54.3 |
| ruby | 2407 | 4.5 | 0.86 | 0.22 | 81 | 180 | 2500 | 54.0 |
| grape | 602 | 20.5 | 0.21 | 1.03 | 74 | 247 | 2500 | 47.0 |
| maize | 2219 | 35.8 | 7.01 | 0.03 | 81 | 175 | X | 64.0 |
| chili | 3215 | 6.7 | 0.29 | 0.25 | 45 | 88 | X | 61.1 |
| fly | 143 | 8.3 | 0.23 | 0.12 | 68 | 455 | Nova | 68.8 |
| omoth | 199 | 7.7 | 0.08 | 0.51 | 20 | 34 | Nova | 56.4 |
| pmoth | 330 | 6.0 | 0.17 | 0.20 | 22 | 40 | Nova | 72.8 |
| cater | 458 | 13.0 | 0.16 | 0.08 | 20 | 18 | Nova | 72.1 |
| aphid | 512 | 15.7 | 0.41 | 0.99 | 30 | 78 | Nova | 57.1 |
| aedes | 1323 | 17.6 | 0.41 | 0.04 | 70 | 69 | Nova | 62.9 |
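Column key (Supernova metric names): gsize = est_genome_size; %rep = repfrac; het = hetdist; %hat = high_AT_index; mol_len = lw_mean_mol_len; p10 = p10; seq = likely_sequencers (X = HiSeq X, 2500 = HiSeq 2500, Nova = NovaSeq); raw_cov = raw_coverage.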
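The mol_len column corresponds to Supernova's lw_mean_mol_len. Judging by the name, this is a length-weighted mean; the sketch below assumes the usual definition, sum(L²) / sum(L), which this document does not confirm. It shows why such a mean sits well above the plain average when a few molecules are long. The molecule lengths are hypothetical.

```python
from statistics import mean


def length_weighted_mean(lengths_kb):
    """Mean molecule length with each molecule weighted by its own length.

    Assumed definition: sum(L_i^2) / sum(L_i), i.e. the expected length of
    the molecule that a randomly chosen base belongs to.
    """
    total = sum(lengths_kb)
    if total <= 0:
        raise ValueError("need at least one molecule of positive length")
    return sum(length * length for length in lengths_kb) / total


if __name__ == "__main__":
    # Hypothetical molecule lengths in kb, not taken from any sample above.
    molecules = [10, 20, 30, 140]
    print(f"plain mean:           {mean(molecules):.1f} kb")
    print(f"length-weighted mean: {length_weighted_mean(molecules):.1f} kb")
```

Here the plain mean is 50 kb while the length-weighted mean is 105 kb, because the single 140 kb molecule carries most of the bases.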
The following table adds N50 contig, phase block, and scaffold sizes. (In the metrics description table, these metrics are contig_N50, phase_block_N50 and scaffold_N50, respectively.) These all show a notable improvement in almost every case. For example, for the hgp sample, the N50 contig size rose from 120.9 to 162.0 kb, the N50 phase block size rose from 4.30 to 5.83 Mb, and the N50 scaffold size rose from 17.18 to 45.60 Mb, a greater than two-fold improvement.
| Sample | gsize | %rep | het | %hat | mol_len | p10 | seq | raw_cov | 1.2 contig (kb) | 2.0 contig (kb) | 1.2 phase (Mb) | 2.0 phase (Mb) | 1.2 scaff (Mb) | 2.0 scaff (Mb) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hgp | 3274 | 8.1 | 1.42 | 0.09 | 139 | 234 | X | 55.0 | 120.9 | 162.0 | 4.30 | 5.83 | 17.18 | 45.60 |
| chm | 3212 | 6.5 | 1.30 | 0.10 | 79 | 139 | X | 56.0 | 116.4 | 175.2 | 2.65 | 3.21 | 14.78 | 39.53 |
| wfu | 3391 | 8.1 | 1.38 | 0.09 | 95 | 146 | X | 53.1 | 120.1 | 165.5 | 2.79 | 3.15 | 18.31 | 39.92 |
| chi | 3247 | 8.1 | 1.61 | 0.11 | 103 | 125 | X | 55.4 | 113.7 | 156.1 | 2.60 | 3.12 | 15.51 | 38.17 |
| yor | 3156 | 6.4 | 1.03 | 0.10 | 122 | 156 | X | 57.0 | 119.2 | 167.4 | 9.76 | 14.15 | 15.23 | 47.78 |
| yorm | 3288 | 7.4 | 1.13 | 0.10 | 119 | 132 | X | 54.7 | 113.4 | 159.0 | 8.68 | 12.55 | 19.42 | 49.47 |
| ash | 3124 | 7.2 | 1.39 | 0.11 | 119 | 140 | X | 57.6 | 106.1 | 153.8 | 4.02 | 5.26 | 16.71 | 36.11 |
| pr | 3399 | 8.5 | 1.44 | 0.11 | 103 | 146 | X | 53.0 | 122.3 | 169.0 | 3.29 | 3.96 | 18.16 | 46.30 |
| hummer | 1102 | 4.2 | 0.36 | 0.06 | 66 | 230 | 2500 | 61.2 | 100.5 | 175.0 | 11.38 | 17.48 | 12.42 | 31.86 |
| fish | 1680 | 12.6 | 0.30 | 0.47 | 89 | 93 | Nova | 54.3 | 17.1 | 20.5 | 0.17 | 1.70 | 0.68 | 4.04 |
| ruby | 2407 | 4.5 | 0.86 | 0.22 | 81 | 180 | 2500 | 54.0 | 77.5 | 100.4 | 2.91 | 3.69 | 13.05 | 36.24 |
| grape | 602 | 20.5 | 0.21 | 1.03 | 74 | 247 | 2500 | 47.0 | 38.3 | 55.7 | 0.48 | 1.70 | 0.58 | 2.29 |
| maize | 2219 | 35.8 | 7.01 | 0.03 | 81 | 175 | X | 64.0 | 20.9 | 31.0 | 0.04 | 0.04 | 0.27 | 1.78 |
| chili | 3215 | 6.7 | 0.29 | 0.25 | 45 | 88 | X | 61.1 | 105.7 | 167.2 | 1.72 | 3.91 | 3.09 | 13.60 |
| fly | 143 | 8.3 | 0.23 | 0.12 | 68 | 455 | Nova | 68.8 | 113.7 | 166.5 | 5.00 | 13.68 | 9.12 | 20.49 |
| omoth | 199 | 7.7 | 0.08 | 0.51 | 20 | 34 | Nova | 56.4 | 37.8 | 63.2 | 0.23 | 0.68 | 0.24 | 0.69 |
| pmoth | 330 | 6.0 | 0.17 | 0.20 | 22 | 40 | Nova | 72.8 | 63.7 | 107.8 | 0.97 | 2.23 | 1.71 | 6.68 |
| cater | 458 | 13.0 | 0.16 | 0.08 | 20 | 18 | Nova | 72.1 | 21.7 | 32.8 | 0.07 | 0.17 | 0.06 | 0.06 |
| aphid | 512 | 15.7 | 0.41 | 0.99 | 30 | 78 | Nova | 57.1 | 75.4 | 104.7 | 0.98 | 4.48 | 1.04 | 5.00 |
| aedes | 1323 | 17.6 | 0.41 | 0.04 | 70 | 69 | Nova | 62.9 | 20.3 | 29.7 | 0.09 | 0.35 | 0.07 | 0.15 |
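The N50 values in the table above summarize a length distribution with a single number. A minimal sketch of the textbook N50 definition follows, together with the fold-change arithmetic behind the "greater than two-fold" hgp scaffold comparison. Supernova's own contig_N50, phase_block_N50 and scaffold_N50 may apply length cutoffs or other filters not described here, so this is an illustration, not its implementation; the contig lengths in the demo are hypothetical.

```python
def n50(lengths):
    """Textbook N50: the largest length L such that sequences of length >= L
    together contain at least half of the total length."""
    if not lengths:
        raise ValueError("no sequences given")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down, accumulating length until half of
    # the total is covered; the sequence that crosses that line is the N50.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be positive")


def fold_change(before, after):
    """Ratio of a 2.0 value to the corresponding 1.2 value."""
    return after / before


if __name__ == "__main__":
    # Hypothetical contig lengths in kb, for illustration only.
    contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print("N50:", n50(contigs), "kb")
    # hgp scaffold N50 from the table: 17.18 Mb (1.2) vs 45.60 Mb (2.0).
    print(f"hgp scaffold N50 fold change: {fold_change(17.18, 45.60):.2f}x")
```

In the demo, the three longest contigs (10 + 9 + 8 = 27 kb) reach half of the 54 kb total, so the N50 is 8 kb; the hgp scaffold N50 grows about 2.65-fold.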
The following table adds assembly accuracy and organization measures. Because the perfect stretch and misassembly estimate rely on a reference sequence, these two metrics are not generally available; we have calculated them here where we can. All three metrics are described below the table.
| Sample | gsize | %rep | het | %hat | mol_len | p10 | seq | raw_cov | 1.2 perf | 2.0 perf | 1.2 mis | 2.0 mis | 1.2 m10 | 2.0 m10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hgp | 3274 | 8.1 | 1.42 | 0.09 | 139 | 234 | X | 55.0 | 22.77 | 26.79 | 1.13 | 0.44 | 2.41 | 1.89 |
| chm | 3212 | 6.5 | 1.30 | 0.10 | 79 | 139 | X | 56.0 | . | . | 0.32 | 0.12 | 2.05 | 1.57 |
| wfu | 3391 | 8.1 | 1.38 | 0.09 | 95 | 146 | X | 53.1 | 20.04 | 21.74 | 1.02 | 0.69 | 2.08 | 1.59 |
| chi | 3247 | 8.1 | 1.61 | 0.11 | 103 | 125 | X | 55.4 | . | . | 0.75 | 0.42 | 2.38 | 1.93 |
| yor | 3156 | 6.4 | 1.03 | 0.10 | 122 | 156 | X | 57.0 | . | . | 0.41 | 0.26 | 2.18 | 1.73 |
| yorm | 3288 | 7.4 | 1.13 | 0.10 | 119 | 132 | X | 54.7 | . | . | 0.38 | 0.18 | 2.51 | 1.99 |
| ash | 3124 | 7.2 | 1.39 | 0.11 | 119 | 140 | X | 57.6 | . | . | 0.62 | 0.35 | 2.46 | 1.90 |
| pr | 3399 | 8.5 | 1.44 | 0.11 | 103 | 146 | X | 53.0 | . | . | 0.46 | 0.24 | 2.24 | 1.84 |
| hummer | 1102 | 4.2 | 0.36 | 0.06 | 66 | 230 | 2500 | 61.2 | . | . | . | . | 6.02 | 5.40 |
| fish | 1680 | 12.6 | 0.30 | 0.47 | 89 | 93 | Nova | 54.3 | . | . | . | . | 31.62 | 25.18 |
| ruby | 2407 | 4.5 | 0.86 | 0.22 | 81 | 180 | 2500 | 54.0 | . | . | . | . | 2.89 | 2.11 |
| grape | 602 | 20.5 | 0.21 | 1.03 | 74 | 247 | 2500 | 47.0 | . | . | . | . | 26.67 | 15.26 |
| maize | 2219 | 35.8 | 7.01 | 0.03 | 81 | 175 | X | 64.0 | 15.82 | 30.55 | 2.14 | 1.40 | 26.38 | 9.85 |
| chili | 3215 | 6.7 | 0.29 | 0.25 | 45 | 88 | X | 61.1 | . | . | . | . | 6.79 | 4.48 |
| fly | 143 | 8.3 | 0.23 | 0.12 | 68 | 455 | Nova | 68.8 | 29.27 | 37.10 | 0.62 | 0.09 | 7.06 | 5.66 |
| omoth | 199 | 7.7 | 0.08 | 0.51 | 20 | 34 | Nova | 56.4 | . | . | . | . | 26.39 | 14.16 |
| pmoth | 330 | 6.0 | 0.17 | 0.20 | 22 | 40 | Nova | 72.8 | . | . | . | . | 6.88 | 3.29 |
| cater | 458 | 13.0 | 0.16 | 0.08 | 20 | 18 | Nova | 72.1 | . | . | . | . | 36.93 | 20.14 |
| aphid | 512 | 15.7 | 0.41 | 0.99 | 30 | 78 | Nova | 57.1 | . | . | . | . | 12.28 | 6.92 |
| aedes | 1323 | 17.6 | 0.41 | 0.04 | 70 | 69 | Nova | 62.9 | 10.30 | 13.22 | 3.76 | 3.32 | 44.41 | 22.27 |
Supernova metric name for the m10 columns: m10.
The following table adds computational performance statistics. All but two assemblies (maize and chili) were run on 256 GB servers. Memory use has increased by about 10% since Supernova 1.2, and run times have increased by 60% on average. However, targeted optimizations should make the extreme run times that some users experienced with 1.2 much less likely.
| Sample | gsize | %rep | het | %hat | mol_len | p10 | seq | raw_cov | mem (GB) | days |
|---|---|---|---|---|---|---|---|---|---|---|
| hgp | 3274 | 8.1 | 1.42 | 0.09 | 139 | 234 | X | 55.0 | 256 | 3.2 |
| chm | 3212 | 6.5 | 1.30 | 0.10 | 79 | 139 | X | 56.0 | 256 | 2.8 |
| wfu | 3391 | 8.1 | 1.38 | 0.09 | 95 | 146 | X | 53.1 | 256 | 3.4 |
| chi | 3247 | 8.1 | 1.61 | 0.11 | 103 | 125 | X | 55.4 | 256 | 3.3 |
| yor | 3156 | 6.4 | 1.03 | 0.10 | 122 | 156 | X | 57.0 | 256 | 2.9 |
| yorm | 3288 | 7.4 | 1.13 | 0.10 | 119 | 132 | X | 54.7 | 256 | 3.1 |
| ash | 3124 | 7.2 | 1.39 | 0.11 | 119 | 140 | X | 57.6 | 256 | 3.2 |
| pr | 3399 | 8.5 | 1.44 | 0.11 | 103 | 146 | X | 53.0 | 256 | 3.4 |
| hummer | 1102 | 4.2 | 0.36 | 0.06 | 66 | 230 | 2500 | 61.2 | 256 | 1.1 |
| fish | 1680 | 12.6 | 0.30 | 0.47 | 89 | 93 | Nova | 54.3 | 256 | 2.3 |
| ruby | 2407 | 4.5 | 0.86 | 0.22 | 81 | 180 | 2500 | 54.0 | 256 | 1.9 |
| grape | 602 | 20.5 | 0.21 | 1.03 | 74 | 247 | 2500 | 47.0 | 256 | 0.7 |
| maize | 2219 | 35.8 | 7.01 | 0.03 | 81 | 175 | X | 64.0 | 512 | 7.9 |
| chili | 3215 | 6.7 | 0.29 | 0.25 | 45 | 88 | X | 61.1 | 512 | 3.4 |
| fly | 143 | 8.3 | 0.23 | 0.12 | 68 | 455 | Nova | 68.8 | 256 | 0.1 |
| omoth | 199 | 7.7 | 0.08 | 0.51 | 20 | 34 | Nova | 56.4 | 256 | 0.2 |
| pmoth | 330 | 6.0 | 0.17 | 0.20 | 22 | 40 | Nova | 72.8 | 256 | 0.3 |
| cater | 458 | 13.0 | 0.16 | 0.08 | 20 | 18 | Nova | 72.1 | 256 | 0.8 |
| aphid | 512 | 15.7 | 0.41 | 0.99 | 30 | 78 | Nova | 57.1 | 256 | 0.5 |
| aedes | 1323 | 17.6 | 0.41 | 0.04 | 70 | 69 | Nova | 62.9 | 256 | 2.4 |
Column key (Supernova metric names): mem = mem_peak; days = etime_h.
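The statement that run times increased by 60% on average does not say how the average was taken. The sketch below contrasts two common summaries: the mean of per-sample relative increases and the relative increase in total run time. With unequal run times the two can differ considerably. The input pairs are hypothetical and are not measurements from this study; the function names are illustrative.

```python
def mean_relative_increase(pairs):
    """Mean of per-sample relative changes, new/old - 1 (0.6 means +60%)."""
    if not pairs:
        raise ValueError("no samples given")
    return sum(new / old - 1 for old, new in pairs) / len(pairs)


def pooled_relative_increase(pairs):
    """Relative change in summed run time over all samples."""
    total_old = sum(old for old, _ in pairs)
    total_new = sum(new for _, new in pairs)
    return total_new / total_old - 1


if __name__ == "__main__":
    # Hypothetical (1.2 days, 2.0 days) pairs, NOT data from the table above.
    runs = [(1.0, 2.0), (4.0, 4.8)]
    print(f"mean of per-sample increases: {mean_relative_increase(runs):.0%}")
    print(f"increase in total run time:   {pooled_relative_increase(runs):.0%}")
```

Here one short job doubles while one long job grows by 20%: the per-sample mean is 60%, but total compute time rises by only 36%.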
All assemblies were run at 10x Genomics on 28-core servers with the processor “Intel Xeon CPU E5-2697 v3 @ 2.6GHz”.
All assemblies were run twice to confirm exact reproducibility of results.
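The text does not say how the two runs were compared. A minimal sketch, assuming exact reproducibility means byte-for-byte identical output files: hash every file in two output directories and report any that differ or exist in only one run. The directory layout and file names are hypothetical; logs and timestamped files would differ even when the assemblies match, so the comparison is best pointed at directories holding only the final assembly files.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(dir_a, dir_b):
    """Sorted relative paths of files that differ between two directories,
    including files present in only one. An empty list means identical."""
    a, b = Path(dir_a), Path(dir_b)
    files_a = {p.relative_to(a) for p in a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(b) for p in b.rglob("*") if p.is_file()}
    mismatched = {str(p) for p in files_a ^ files_b}
    for rel in files_a & files_b:
        if sha256_of(a / rel) != sha256_of(b / rel):
            mismatched.add(str(rel))
    return sorted(mismatched)


if __name__ == "__main__":
    # Toy demo with two temporary "runs" holding the same hypothetical file.
    with tempfile.TemporaryDirectory() as tmp:
        run1, run2 = Path(tmp, "run1"), Path(tmp, "run2")
        for run in (run1, run2):
            run.mkdir()
            (run / "assembly.fasta").write_text(">contig1\nACGTACGT\n")
        print(compare_runs(run1, run2))  # → []
```

Hashing rather than diffing keeps memory use flat for multi-gigabyte FASTA files and yields a short fingerprint per file that can be recorded alongside each run.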