General Information
Version 1.3.2 of ABySS (Assembly By Short Sequences). ABySS is a de novo, parallel, paired-end sequence assembler designed for short reads.
See also the installed [intlink id=”6043″ type=”post”]Velvet[/intlink] assembler; we have published an article comparing both.
How to use
The executables can be found in /software/abyss/bin. To run ABySS from a script, add the following line to it:
/software/abyss/bin/abyss-pe [abyss-pe options]
Performance
Parallelization
Some benchmarks have been run with ABySS, using a file from an Illumina HiSeq2000 sequencer with 100 bp per sequence. Table 1 shows an example of how ABySS scales as a function of the number of cores. ABySS scales well up to 8 cores; beyond that the gain flattens, and with 24 cores the run is slower than with 12. The results are valid at least for inputs of more than 10^6 sequences.
cores | 2 | 4 | 8 | 12 | 24 |
Time (s) | 47798 | 27852 | 16874 | 14591 | 18633 |
Speedup | 1 | 1.7 | 2.8 | 3.3 | 2.6 |
Efficiency (%) | 100 | 86 | 71 | 55 | 21 |
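The last two rows of the table above can be reproduced from the measured times. The following is a minimal sketch, assuming that the 2-core run is the reference (no 1-core time is reported) and that efficiency is the speedup divided by the relative increase in cores; the function and variable names are illustrative only.

```python
# Wall-clock times from the table above (cores -> seconds).
times_by_cores = {2: 47798, 4: 27852, 8: 16874, 12: 14591, 24: 18633}

# Assumption: the 2-core run is the baseline, since it has speedup 1 in the table.
BASE_CORES = 2


def speedup(cores):
    """Time of the baseline run divided by the time with `cores` cores."""
    return times_by_cores[BASE_CORES] / times_by_cores[cores]


def efficiency(cores):
    """Speedup divided by the ideal speedup (cores / BASE_CORES)."""
    return speedup(cores) / (cores / BASE_CORES)


if __name__ == "__main__":
    for cores in sorted(times_by_cores):
        print(f"{cores:>2} cores: speedup {speedup(cores):.1f}, "
              f"efficiency {100 * efficiency(cores):.0f}%")
```

Rounded to one decimal and to whole percentages, the computed values agree with the table.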
Execution time
We have also analyzed the execution time as a function of the size of the data. Table 2 shows that from 1 million to 10 million sequences the execution time increases by a factor of about 10 as well. From 10 to 100 million sequences the time increases somewhat more, by a factor between 10 and 20. The behavior is therefore roughly linear.
sequences | 10^6 | 10^7 | 10^8 |
Time in 2 cores (s) | 247 | 2620 | 47798 |
Time in 4 cores (s) | 134 | 1437 | 27852 |
Time in 8 cores (s) | 103 | 923 | 16874 |
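The growth factors quoted in the text can be checked directly against the table above. This is a small sketch, not part of the original benchmark; the names are hypothetical and the data are copied from the table.

```python
# Execution times in seconds for 10^6, 10^7 and 10^8 sequences (cores -> times).
times_by_cores = {
    2: [247, 2620, 47798],
    4: [134, 1437, 27852],
    8: [103, 923, 16874],
}


def growth_factors(series):
    """Ratio between consecutive entries, i.e. the cost of each 10x increase in input."""
    return [later / earlier for earlier, later in zip(series, series[1:])]


if __name__ == "__main__":
    for cores, series in sorted(times_by_cores.items()):
        first, second = growth_factors(series)
        print(f"{cores} cores: x{first:.1f} (10^6 -> 10^7), x{second:.1f} (10^7 -> 10^8)")
```

A factor of exactly 10 per decade would mean strictly linear scaling; factors above 10 in the second step indicate slightly worse than linear growth for the largest input.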
RAM memory
In this kind of program, RAM usage matters more than the execution time, which is reasonable, because memory can limit which calculations can be run. Table 3 shows how the RAM usage increases as a function of the number of sequences. It also lists the base-10 logarithms of the measured values, which were used for a linear regression. The jobs were run on 12 cores.
sequences | 10^6 | 5*10^6 | 10^7 | 5*10^7 | 10^8 |
RAM (GB) | 4.0 | 7.6 | 11 | 29 | 44 |
log(sequences) | 6 | 6.7 | 7 | 7.7 | 8 |
log(RAM) | 0.60 | 0.88 | 1.03 | 1.46 | 1.65 |
From the values in the table we obtain a fit of the RAM in GB as a function of the number of sequences (s):
log(RAM)=0.53*log(s)-2.65
or equivalently
RAM=(s^0.53)/447
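The fit above can be reproduced with an ordinary least-squares regression on the tabulated logarithms. The sketch below assumes that the rounded log values from the table were the regression input; the helper names are illustrative, and the resulting estimate is only a rough guide for sizing a job, not a guarantee.

```python
# Base-10 logarithms copied from the table above.
log_sequences = [6, 6.7, 7, 7.7, 8]
log_ram_gb = [0.60, 0.88, 1.03, 1.46, 1.65]


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(log_sequences, log_ram_gb)


def estimate_ram_gb(sequences):
    """RAM estimate in GB from the fitted power law RAM = 10^intercept * s^slope."""
    return 10 ** intercept * sequences ** slope


if __name__ == "__main__":
    print(f"log(RAM) = {slope:.2f}*log(s) {intercept:+.2f}")
    for s in (10**6, 10**7, 10**8):
        print(f"{s:>11} sequences: about {estimate_ram_gb(s):.0f} GB")
```

The estimate is only meaningful within the measured range of 10^6 to 10^8 sequences on 12 cores; extrapolating beyond it is an assumption the data do not back.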
Conclusion
The memory usage is lower than in other assemblers such as [intlink id=”6043″ type=”post”]Velvet[/intlink]; see also the report Velvet performance in the machines of the Computing Service of the UPV/EHU and the article we have published comparing both. In addition, the MPI parallelization of ABySS makes it possible to aggregate the RAM of several nodes to perform larger calculations.
More information
ABySS web page.
[intlink id=”6043″ type=”post”]Velvet[/intlink] assembler.
Velvet performance in the machines of the Computing Service of the UPV/EHU report.
Velvet and ABySS performance in the machines of the Computing Service of the UPV/EHU, post in the hpc blog.
abyss-pe