...
- k-point parallelization across nodes will not be as cpu-efficient as planewave parallelization within one node, so use it judiciously
- k-point parallelization is not as memory efficient as planewave parallelization, but it is supposed to scale better to more nodes (ask cpo if you want a better explanation)
- vossj and cpo have not yet seen good scaling behavior for the k-point parallelization, at least with small systems, so perhaps we're doing something wrong
- to turn on k-point parallelization:
- for ase mode: add parameter "parflags='-npool 2'" to the espresso object. This is a general-purpose string for passing run-time options to espresso executables.
- for native mode: add something like "-npool 2" at the end of the line
- an example for 16 cores (2 nodes) and npool=2: each of the 2 pools of 8 cores would parallelize over planewaves, but the 2 pools would process pairs of k-points in parallel.
- if you have done it correctly, you should see a line about "K-points division" in your espresso log file (the planewave parallelization produces a line like "R & G space division")
- there is a chicken-and-egg problem: to run your job one needs to know the number of reduced k-points (to determine npool) however one has to run the job to learn what this number is. a workaround for this would be to run it first in the test queue to learn the reduced number of k-points.
...