ABySS | IZO-SGI Scientific Computation

General Information

1.3.2 version of ABySS (Assembly By Short Sequences). ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. ABySS can be executed in parallel.

See also the installed [intlink id=”6043″ type=”post”]velvet[/intlink] and comparing both we have published article.

How to use

The executables can be found in /software/abyss/bin. To run abyss in a script type in it:

/software/abyss/bin/abyss-pe [abyss-pe options]

Performance

See also the installed [intlink id=”6043″ type=”post”]velvet[/intlink] and comparing both we have published article.

Parallelization

Some benchmarks has been performed with ABySS. They have been performed using file from an Illumina HiSeq2000 NGS with 100 bp per sequence. In the table 1 we can see an example about how ABySS scales as a function of the number of cores. As we can see ABySS scales very up to 8 cores. The results is valid unless for more than 10e6 sequences.

Table 1. Execution time of `abyss-pe` in seconds as a function the number of cores
cores	2	4	8	12	24
Time (s)	47798	27852	16874	14591	18633
Aceleration	1	1.7	2.8	3.3	2.6
Performance(%)	100	86	71	55	21

Execution time

We have analized as well the execution time as a function of the size of the data. In the table 2 we observe how from 1 million to 10 millions of sequences the execution time increases by 10 as well. From 10 to 100 millions of sequences the time increases a little more, between 10 t0 20. Therefore, the behavior is more or less lineal.

Table 2. Execution time in seconds of `abyss-pe` executed in 2, 4 y 8 cores as a function of the number of processed sequences.
sequences	10e6	10e7	10e8
Time in 2 cores (s)	247	2620	47798
Time in 4 cores (s)	134	1437	27852
Time in 8 cores (s)	103	923	16874

RAM memory

In these kind of programs more important than the execution time, which is reasonable, is the RAM memory usage, which can limit the calculation type. In the table 3 we observe how the RAM increases as a function of the number of sequences. We also show the logarithms of the measured values which has been used for a lineal regression. The jobs has been performed in 12 cores.

Tabl3 3. RAM memory used by `abyss-pe` as a function of the number of processed sequences. The logarithms of the measured values are also shown.
sequences	10e6	5*10e6	10e7	5*10e7	10e8
RAM (GB)	4.0	7.6	11	29	44
log(sequences)	6	6.7	7	7.7	8
log(RAM)	0.60	0.88	1.03	1.46	1.65

From the values of the table we obtain a fitting of the RAM in GB as a function of the number of sequences (s) to the equation

log(RAM)=0.53*log(s)-2.65

o equivalently

RAM=(s^0.53)/447

Conclusion

The memory usage is smaller than in other assemblers like [intlink id=”6043″ type=”post”]Velvet[/intlink], see as well the report Velvet performance in the machines of the Computing Service of the UPV/EHU and comparing both we have published article. In addition, the parallelization with MPI of ABySS allows to aggregate the RAM memory of several nodes to perform larger calculations.

More information

ABySS web page.
[intlink id=”6043″ type=”post”]Velvet[/intlink] assembler.
Velvet performance in the machines of the Computing Service of the UPV/EHU report.

Velvet and ABySS performance in the machines of the Computing Service of the UPV/EHU, post in the hpc blog.

abyss-pe