...

Code Block
esp-ver-bsub <version> myscript.py
esp-ver-bsub-native <version> -q suncat-test -o my.log -n 8 pw.x -in pw.inp

k-point Parallelization

  • NOTE: one typically does NOT use k-point parallelization for large systems. Only the gamma point is needed there, so there is just one k-point to process.
  • k-point parallelization across nodes is not as cpu-efficient as planewave parallelization within one node, so use it judiciously.
  • k-point parallelization is less memory-efficient than planewave parallelization, but it is supposed to scale better to more nodes (ask cpo if you want a better explanation). In particular, my understanding is that k-point parallelization does not reduce the memory usage per node.
  • vossj and cpo have not yet seen good k-point scaling for small systems (a 2x2x3 system). lausche has reported good k-point scaling for 3x3x4 systems. There have been some unexplained hangs with npool=3 or 4 (see below).
  • to turn on k-point parallelization:
    • for ase mode: add the parameter "parflags='-npool 2'" to the espresso object. parflags is a general-purpose string for passing run-time options to the espresso executables.
    • for native mode: add something like "-npool 2" at the end of the submit command line (that is, after the pw.x arguments such as "-in pw.inp")
  • an example for 16 cores (2 nodes) and npool=2: each of the 2 pools of 8 cores parallelizes over planewaves, while the 2 pools work on different k-points at the same time, so k-points are processed in pairs. With 9 k-points, the first 8 are processed in pairs, but the last one is processed by only one pool (one node), leaving the other node idle, which is not ideal.
  • if you have done it correctly, you should see a line about "K-points division" in your espresso log file (the planewave parallelization produces a line like "R & G space division")
  • there is a chicken-and-egg problem: to choose npool one needs to know the number of reduced k-points, but one has to run the job to learn what this number is. A workaround is to run the job first in the test queue to learn the reduced number of k-points.
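
The 16-core, npool=2 example above can be worked through with a small sketch. This is only a simplified bookkeeping model, not what espresso does internally: it assumes k-points are dealt out to the pools as evenly as possible and that npool must divide the core count. The names pool_layout and balanced_npools are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class PoolLayout:
    cores_per_pool: int     # cores sharing the planewave work inside one pool
    kpoints_per_pool: list  # number of k-points handled by each pool
    rounds: int             # k-points processed by the busiest pool
    idle_slots: int         # pool-rounds in which some pool has no k-point


def pool_layout(n_cores, n_kpoints, npool):
    """Simplified model: deal k-points out to the pools as evenly as possible."""
    if npool < 1 or n_cores % npool != 0:
        # Assumption of this sketch: the cores split evenly into pools.
        raise ValueError("npool must be positive and divide the core count")
    base, extra = divmod(n_kpoints, npool)
    # The first `extra` pools take one k-point more than the others.
    per_pool = [base + 1 if i < extra else base for i in range(npool)]
    rounds = max(per_pool)
    idle = rounds * npool - n_kpoints
    return PoolLayout(n_cores // npool, per_pool, rounds, idle)


def balanced_npools(n_cores, n_kpoints):
    """npool values that leave no pool idle in this model."""
    upper = min(n_cores, n_kpoints)
    return [p for p in range(1, upper + 1)
            if n_cores % p == 0 and n_kpoints % p == 0]


if __name__ == "__main__":
    # The case from the text: 16 cores, 9 reduced k-points, npool=2.
    print(pool_layout(16, 9, 2))
    # With 8 reduced k-points, several npool values balance evenly.
    print(balanced_npools(16, 8))
```

For 9 k-points on 16 cores the only perfectly balanced choice in this model is npool=1, which is the arithmetic behind the "not ideal" remark above. Whether an unbalanced npool is still worth it depends on the actual timings.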

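The log check and the test-queue workaround described above could be scripted roughly as follows. The two "division" strings are the ones quoted in the list. The "number of k points=" pattern is an assumption about how pw.x reports the reduced k-point count and should be checked against a real log. The function name check_log and the sample text are hypothetical stand-ins, not real espresso output.

```python
import re


def check_log(text):
    """Pull the k-point count and parallelization markers out of a log."""
    # Assumed format of the pw.x summary line; verify against your own log.
    match = re.search(r"number of k points\s*=\s*(\d+)", text)
    return {
        "n_kpoints": int(match.group(1)) if match else None,
        # Line expected when k-point parallelization is active.
        "kpoint_parallel": "K-points division" in text,
        # Line expected from the planewave parallelization.
        "planewave_parallel": "R & G space division" in text,
    }


if __name__ == "__main__":
    # Hand-written stand-in for a log file, for illustration only.
    sample = (
        "K-points division: npool = 2\n"
        "R & G space division: proc/pool = 8\n"
        "number of k points= 9\n"
    )
    print(check_log(sample))
```

In practice one would read the text from the file given with "-o" in the test-queue run (my.log in the example above) and use the reported k-point count to pick npool for the production job.
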
...