DL_POLY | IZO-SGI Scientific Computation

General information

Version 4.02 of the molecular dynamics (MD) program for macromolecules, polymers, ionic systems, solutions and other molecular systems, developed at the Daresbury Laboratory. On Pendulo, version 2.2 remains installed. There is also the DL_POLY_CLASSIC version, which is not currently being developed.

How to submit to the queue

The program is installed on all architectures, on both Arina and Pendulo (DL_POLY 2.2 on Pendulo). To run it, include the following line in your scripts:

/software/bin/DL_POLY/DL_POLY.Z
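
Before submitting a job it can help to confirm that the working directory holds the input files DL_POLY reads. The following is a minimal sketch, not part of the installation: the helper name `missing_inputs` is hypothetical, and it assumes the standard DL_POLY input file names CONTROL, CONFIG and FIELD.

```python
from pathlib import Path

# Standard DL_POLY input file names (optional files such as TABLE are not checked).
REQUIRED_INPUTS = ("CONTROL", "CONFIG", "FIELD")


def missing_inputs(workdir="."):
    """Return the required input files that are absent from workdir."""
    workdir = Path(workdir)
    return [name for name in REQUIRED_INPUTS if not (workdir / name).is_file()]


# Check the current directory; an empty list means all three files are present.
print("Missing input files:", missing_inputs("."))
```

If the list is empty, the job script can go on to call /software/bin/DL_POLY/DL_POLY.Z as shown above.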

The program will run on GPGPUs if it starts on this kind of node. These nodes can also be selected by using the gpu label in the queue system.

The GUI is also installed. To execute it use:

/software/bin/DL_POLY/gui

Some utilities have been installed in the /software/bin/DL_POLY/ directory.

Benchmark

We show some small benchmarks performed with dl_poly_4.02. We study the parallelization as well as the performance of the GPGPUs.

System	1 core	4 cores	8 cores	16 cores	32 cores	64 cores
Itanium 1.6 GHz	1500	419	248	149	92	61
Opteron	1230	503	264	166	74
Xeon 2.27 GHz	807	227	126	67	37	25

The first benchmark shows that DL_POLY scales very well and that the Xeon nodes are the fastest, so we recommend them for large jobs.
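
The scaling in the first table can be quantified as speedup and parallel efficiency relative to the 1-core run. The sketch below is only illustrative: it assumes the table values are wall times in seconds, and it leaves out the Opteron row because one of its cells is missing.

```python
# Wall times from the first benchmark table, keyed by number of cores.
# Assumption: the values are times in seconds.
times = {
    "Itanium 1.6 GHz": {1: 1500, 4: 419, 8: 248, 16: 149, 32: 92, 64: 61},
    "Xeon 2.27 GHz": {1: 807, 4: 227, 8: 126, 16: 67, 32: 37, 64: 25},
}


def scaling(row):
    """Return {cores: (speedup, efficiency)} relative to the 1-core time."""
    t1 = row[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in row.items()}


for system, row in times.items():
    print(system)
    for n, (speedup, efficiency) in scaling(row).items():
        print(f"  {n:>2} cores: speedup {speedup:5.1f}, efficiency {efficiency:4.0%}")
```

Speedup is the 1-core time divided by the n-core time, and efficiency is that speedup divided by n.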

System	1 core	2 cores	4 cores	8 cores	16 cores	32 cores
Itanium 1.6 GHz	2137		303	165	93	47
Opteron	1592		482	177	134	55
Xeon 2.27 GHz	848		180	92	48	28
1 GPGPU	125	114	104	102
2 GPGPU		77	72	69
4 GPGPU			53	50
8 GPGPU				37

System	1 core	2 cores	4 cores	8 cores	16 cores	32 cores	64 cores
Xeon 2.27 GHz	2918		774	411	223	122	71
1 GPGPU	362	333	338	337
2 GPGPU		240	222	220
4 GPGPU			145	142
8 GPGPU				97

The GPGPUs speed up the calculation, but each doubling of the number of GPGPUs increases the speed by a factor of only about 1.5. Because of this, for large numbers of GPGPUs or cores it is better to parallelize over cores. For example, one node has 8 cores and 2 GPGPUs. In the last benchmark, 2 GPGPUs need 220 s while 8 cores need 411 s. 4 GPGPUs (142 s) are still faster than 16 cores (223 s), but 64 cores, with 71 s, are faster than 8 GPGPUs, which need 97 s. Therefore, GPGPUs can speed up jobs on PCs or single nodes, but for jobs that require higher parallelization, parallelizing over cores is more effective.
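
The gain per doubling can be read directly from the last table. The following sketch assumes the values are times in seconds and uses the 8-core column for the GPGPU runs; the function name `doubling_gains` is only illustrative.

```python
# Times from the last benchmark table (Xeon 2.27 GHz nodes), assumed to be seconds.
gpu_times = {1: 337, 2: 220, 4: 142, 8: 97}       # keyed by number of GPGPUs
core_times = {8: 411, 16: 223, 32: 122, 64: 71}   # keyed by number of cores


def doubling_gains(times):
    """Speedup factor obtained at each doubling of the resource count."""
    counts = sorted(times)
    return {b: times[a] / times[b] for a, b in zip(counts, counts[1:])}


gpu_gain = doubling_gains(gpu_times)
core_gain = doubling_gains(core_times)

for n, gain in gpu_gain.items():
    print(f"{n // 2} -> {n} GPGPUs: x{gain:.2f}")
for n, gain in core_gain.items():
    print(f"{n // 2} -> {n} cores:  x{gain:.2f}")
```

With these numbers, every GPGPU doubling gives a factor between 1.4 and 1.6, while every core doubling gives more than 1.7.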

DL_POLY is designed for large systems and for use on up to thousands of cores. According to the documentation:

The DL_POLY_4 parallel performance and efficiency are considered very-good-to-excellent as long as (i) all CPU cores are loaded with no less than 500 particles each and (ii) the major linked cells algorithm has no dimension less than 4.
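
Criterion (i) gives a quick upper bound on the number of cores worth requesting. The sketch below applies only that criterion; the function name and the example system size are hypothetical, and criterion (ii) on the linked cells is not checked.

```python
MIN_PARTICLES_PER_CORE = 500  # threshold quoted from the documentation


def max_recommended_cores(n_particles, min_per_core=MIN_PARTICLES_PER_CORE):
    """Largest core count that keeps at least min_per_core particles per core."""
    return max(1, n_particles // min_per_core)


# Hypothetical example: a system of 100,000 particles.
print(max_recommended_cores(100_000))
```

For the hypothetical 100,000-particle system this gives 200 cores. Any system with fewer than 500 particles gives 1.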

More information

DL_POLY web page.

DL_POLY user guide (pdf).

DL_POLY GUI user guide (pdf).