...
Code Block |
---|
`esp-ver-bsub <version> myscript.py` |
`esp-ver-bsub-native <version> -q suncat-test -o my.log -n 8 pw.x -in pw.inp` |
k-point Parallelization
- NOTE: typically one does NOT use k-point parallelization for large systems, since only the gamma point is needed there.
- k-point parallelization across nodes will not be as cpu-efficient as planewave parallelization within one node, so use it judiciously
- k-point parallelization is not as memory efficient as planewave parallelization, but it is supposed to scale better to more nodes (ask cpo if you want a better explanation). In particular, my understanding is that k-point parallelization will not reduce the memory usage per node.
- vossj and cpo have not yet seen good scaling behavior for k-point parallelization for small systems (a 2x2x3 system). lausche has reported good k-point scaling for 3x3x4 systems. there have been some not-yet-understood hangs with npool=3 or 4 (see below).
- to turn on k-point parallelization:
- for ase mode: add parameter "parflags='-npool 2'" to the espresso object. This is a general-purpose string for passing run-time options to espresso executables.
- for native mode: add something like "-npool 2" at the end of the line
- an example for 16 cores (2 nodes) and npool=2: each of the 2 pools of 8 cores would parallelize over planewaves, but the 2 pools would process pairs of k-points in parallel. If one had 9 k-points, they would get processed in pairs, but the last one would only be processed on one node, leaving the other idle, which is not ideal.
- if you have done it correctly, you should see a line about "K-points division" in your espresso log file (the planewave parallelization produces a line like "R & G space division")
- there is a chicken-and-egg problem: to run your job, one needs to know the number of reduced k-points (to determine npool), but one has to run the job to learn that number. A workaround is to first run the job in the test queue just to learn the reduced number of k-points.
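The load-balancing arithmetic in the 16-core example above (9 k-points over 2 pools) can be sketched with a small helper. This is purely illustrative; `pool_schedule` is a hypothetical function, not part of espresso or ase:

```python
import math

def pool_schedule(n_kpoints, npool):
    """Return (rounds, idle_pools_in_last_round) when n_kpoints are
    distributed round-robin, one k-point per pool per round."""
    rounds = math.ceil(n_kpoints / npool)
    # pools with no k-point left in the final round sit idle
    idle = rounds * npool - n_kpoints
    return rounds, idle

# 9 k-points on 2 pools: 5 rounds, and in the last round 1 pool is idle
print(pool_schedule(9, 2))  # -> (5, 1)
# 8 k-points on 2 pools divide evenly: 4 rounds, no idle pool
print(pool_schedule(8, 2))  # -> (4, 0)
```

Choosing npool so that it evenly divides the number of reduced k-points avoids the idle-pool situation described above.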
...