General information
Version 4.02 of the MD program for macromolecules, polymers, ionic systems, solutions and other molecular systems, developed at the Daresbury Laboratory. Version 2.2 remains installed on Pendulo. There is also the DL_POLY_CLASSIC version, which is not currently being developed.
How to submit to the queue
The program is installed on all architectures, Arina and Pendulo (DL_POLY 2.2). To run it, include the following in your scripts:
/software/bin/DL_POLY/DL_POLY.Z
The program will run on GPGPUs if it starts on a node of this kind. These nodes can also be selected by using the gpu label within the queue system.
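Before submitting, it can help to check that the working directory holds the input files DL_POLY reads. The following is a minimal sketch, not part of the installation: `missing_inputs` is a hypothetical helper name, and the file list assumes the standard DL_POLY input names CONTROL, CONFIG and FIELD.

```python
import os

# Standard DL_POLY input files, expected in the directory where the job runs.
REQUIRED_INPUTS = ("CONTROL", "CONFIG", "FIELD")


def missing_inputs(workdir="."):
    """Return the names of required input files that are absent from workdir."""
    return [
        name
        for name in REQUIRED_INPUTS
        if not os.path.isfile(os.path.join(workdir, name))
    ]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All required input files are present.")
```

Running the script in the job directory before queuing avoids waiting for a job that stops immediately because an input file is missing.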
The GUI is also installed. To execute it use:
/software/bin/DL_POLY/gui
Some utilities have been installed in the /software/bin/DL_POLY/ directory.
Benchmark
We show some small benchmarks performed with dl_poly_4.02. We study the parallelization as well as the performance of the GPGPUs.
System | 1 core | 4 cores | 8 cores | 16 cores | 32 cores | 64 cores |
Itanium 1.6 GHz | 1500 | 419 | 248 | 149 | 92 | 61 |
Opteron | 1230 | 503 | 264 | 166 | 74 | |
Xeon 2.27 GHz | 807 | 227 | 126 | 67 | 37 | 25 |
The first benchmark shows that DL_POLY scales very well and that the Xeon nodes are the fastest, so we recommend them for large jobs.
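The scaling in the first table can be quantified as speed-up and parallel efficiency relative to the single-core run. The sketch below uses the Xeon 2.27 GHz row; the function name `scaling` is illustrative, and it assumes the table entries are run times, so smaller is better.

```python
# Run times copied from the Xeon 2.27 GHz row of the first benchmark table.
# Assumption: the entries are times, so smaller values mean faster runs.
xeon_times = {1: 807, 4: 227, 8: 126, 16: 67, 32: 37, 64: 25}


def scaling(times):
    """Return {cores: (speedup, efficiency)} relative to the 1-core time."""
    t1 = times[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in times.items()}


if __name__ == "__main__":
    for cores, (speedup, efficiency) in scaling(xeon_times).items():
        print(f"{cores:3d} cores: speed-up {speedup:5.1f}, efficiency {efficiency:4.2f}")
```

The same function can be applied to the Itanium and Opteron rows to compare how each architecture scales.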
System | 1 core | 2 cores | 4 cores | 8 cores | 16 cores | 32 cores |
Itanium 1.6 GHz | 2137 | 303 | 165 | 93 | 47 | |
Opteron | 1592 | 482 | 177 | 134 | 55 | |
Xeon 2.27 GHz | 848 | 180 | 92 | 48 | 28 | |
1 GPGPU | 125 | 114 | 104 | 102 | ||
2 GPGPU | 77 | 72 | 69 | |||
4 GPGPU | 53 | 50 | ||||
8 GPGPU | 37 |
System | 1 core | 2 cores | 4 cores | 8 cores | 16 cores | 32 cores | 64 cores |
Xeon 2.27 GHz | 2918 | 774 | 411 | 223 | 122 | 71 | |
1 GPGPU | 362 | 333 | 338 | 337 | |||
2 GPGPU | 240 | 222 | 220 | ||||
4 GPGPU | 145 | 142 | |||||
8 GPGPU | 97 |
The benchmarks show that the GPGPUs speed up the calculation, but each time the number of GPGPUs is doubled the speed-up factor is only about 1.5. For this reason, with large numbers of GPGPUs or cores it is better to parallelize over cores. For example, one node has 8 cores and 2 GPGPUs. The 2 GPGPUs need 220 s, while 8 cores need 411 s. 4 GPGPUs are still faster than 16 cores, but 32 cores, with 71 s, are faster than 8 GPGPUs, which need 97 s. Therefore, GPGPUs can speed up jobs on PCs or single nodes, but for jobs that require a higher degree of parallelization, parallelizing over cores is more effective.
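The factor of about 1.5 per doubling can be checked directly against the last benchmark table. This is a small sketch that assumes the last value in each GPGPU row is the run to compare; `doubling_gains` is an illustrative name.

```python
# Times in seconds from the last benchmark table.
# Assumption: the last value of each GPGPU row is the one compared in the text.
gpu_times = {1: 337, 2: 220, 4: 142, 8: 97}


def doubling_gains(times):
    """Return the speed-up factor obtained each time the device count doubles."""
    counts = sorted(times)
    return {(a, b): times[a] / times[b] for a, b in zip(counts, counts[1:])}


if __name__ == "__main__":
    for (a, b), gain in doubling_gains(gpu_times).items():
        print(f"{a} -> {b} GPGPUs: x{gain:.2f}")
```

An ideal doubling would give a factor of 2, so a factor near 1.5 means the benefit of each added GPGPU shrinks as the job grows.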
DL_POLY is designed for large systems and for use on up to thousands of cores. According to the documentation:
The DL_POLY_4 parallel performance and efficiency are considered very-good-to-excellent as long as (i) all CPU cores are loaded
with no less than 500 particles each and (ii) the major linked cells algorithm has no dimension less than 4.
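The two conditions quoted above can be turned into a quick check when choosing a core count. This is only a sketch: the function names are hypothetical, and the number of linked cells per dimension is assumed to be supplied by the user rather than computed here.

```python
def max_efficient_cores(n_particles, min_per_core=500):
    """Largest core count that still leaves at least min_per_core particles per core."""
    return max(1, n_particles // min_per_core)


def meets_guideline(n_particles, n_cores, link_cell_dims,
                    min_per_core=500, min_cells=4):
    """Check both documented conditions for good parallel efficiency.

    link_cell_dims: linked cells along x, y and z (supplied by the user;
    this sketch does not derive them from the cutoff or the box size).
    """
    enough_particles = n_particles / n_cores >= min_per_core
    enough_cells = min(link_cell_dims) >= min_cells
    return enough_particles and enough_cells


if __name__ == "__main__":
    n = 32000  # example system size, chosen only for illustration
    print("Upper bound on cores:", max_efficient_cores(n))
    print("64 cores OK:", meets_guideline(n, 64, (4, 4, 4)))
    print("128 cores OK:", meets_guideline(n, 128, (4, 4, 4)))
```

With these thresholds, a system of 32000 particles should not be spread over more than 64 cores, whatever the linked-cell dimensions are.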