
GPAW Convergence Behavior

A talk given by Ansgar Schaefer studying convergence behaviour for rutiles is here (pdf).

General suggestions for helping GPAW convergence are here.

A discussion and suggestions for converging some simple systems can be found here.

Other convergence experience:

System	Who	Action
Graphene with vacancy	Felix Studt	Increase Fermi Temp from 0.1 to 0.2, use cg
Graphene with vacancy	Chris O'Grady	change nbands from -10 to -20, MixerDif(beta=0.03, nmaxold=5, weight=50.0)
Nitrogenase FeVCo for CO2 reduction	Lars Grabow	use Davidson solver (faster as well?), although later jvarley said MixerSum
Several surfaces	Andy Peterson	Broyden mixer with Beta=0.5
TiO2	Monica Garcia-Mota	MixerSum(0.05,6, 50.)
MnxOy	Monica Garcia-Mota	Broyden MixerSum
Co3O4	Monica Garcia-Mota	Davidson eigensolver, MixerSum(beta=0.05, nmaxold=5, weight=50) or MixerSum(beta=0.05, nmaxold=6, weight=100)
MnO2 with DFT+U U=+2eV	Monica Garcia-Mota	Marcin suggests we disable the DipoleCorrectionPoissonSolver (not yet tested)
MnO2 with DFT+U U=+2eV	Monica Garcia-Mota	Henrik Kristofferson suggests: convergence is easier with high U (U=4eV) and then one can shift to preferred value
MnO2 with DFT+U U=+2eV	Monica Garcia-Mota	(from Heine) increase U in steps of say 0.1 (or smaller) and reuse the density and/or wave functions from the previous calculation? This tends to reduce the problem of being trapped in meta-stable electronic states, and it also makes convergence easier. Monica later reported that this helped.
Cu	Ask Hjorth Larsen	first mixer parameter should probably be 0.1 for faster convergence, because it has a low DOS at the Fermi level. (Other transition metals may require lower values.)
N on Co/Ni (with BEEF)	Tuhin	rmm-diis and MixerSum(beta=0.1, nmaxold=5, weight=50)
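
As a rough illustration of how settings like the ones in the table get passed to GPAW, here is a minimal sketch. The mixer numbers are copied from the table; the names REPORTED_MIXERS and make_calculator, and the defaults for eigensolver, width and nbands, are assumptions made up for this example. MixerSum assumes a spin-polarized calculation.

```python
# Mixer parameters copied from the table above (beta, nmaxold, weight).
REPORTED_MIXERS = {
    'TiO2': {'beta': 0.05, 'nmaxold': 6, 'weight': 50.0},
    'Co3O4': {'beta': 0.05, 'nmaxold': 5, 'weight': 50.0},
    'N on Co/Ni (BEEF)': {'beta': 0.1, 'nmaxold': 5, 'weight': 50.0},
}


def make_calculator(system, eigensolver='dav', width=0.1, nbands=-20,
                    txt='gpaw.txt'):
    """Build a spin-polarized FD calculator using one of the mixers above.

    The defaults are illustrative guesses, not recommendations.
    """
    # Imported here so the table can be inspected without GPAW installed.
    from gpaw import GPAW, FermiDirac, MixerSum

    return GPAW(mode='fd',
                spinpol=True,                   # MixerSum is for spin-polarized runs
                nbands=nbands,                  # negative: number of empty bands
                eigensolver=eigensolver,        # 'rmm-diis', 'cg' or 'dav'
                occupations=FermiDirac(width),  # Fermi smearing width in eV
                mixer=MixerSum(**REPORTED_MIXERS[system]),
                txt=txt)
```

Typical use would be something like atoms.calc = make_calculator('TiO2'), then adjusting one setting at a time if the SCF still does not converge.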

Other Tricks:

To speed up the Poisson solver, use something like gpts=h2gpts(0.18, atoms.get_cell(), idiv=8) to get a grid whose dimensions are divisible by a large power of 2. This helps the multigrid Poisson solver.
experiment with Fermi smearing to help improve convergence
experiment with number of empty bands to help improve convergence

be sure to specify nbands, otherwise GPAW will add "plenty" of bands which is very expensive in FD calculations. nbands=6*[number of atoms] should be more than enough.
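
To make the grid and nbands tricks concrete, here is a small sketch of the arithmetic. The functions round_gpts and nbands_guess are hypothetical helpers written for this page; round_gpts is only a stand-in for orthorhombic cells and is not GPAW's implementation. In a real script you would call h2gpts from gpaw.utilities instead.

```python
import numpy as np


def round_gpts(cell_lengths, h=0.18, idiv=8):
    """Grid points per axis for spacing ~h, rounded to a multiple of idiv.

    Illustrative stand-in for orthorhombic cells only (lengths in Angstrom).
    """
    n = np.asarray(cell_lengths, dtype=float) / h
    # Round to the nearest multiple of idiv, but never below idiv itself.
    return np.maximum(idiv, (n / idiv + 0.5).astype(int) * idiv)


def nbands_guess(n_atoms, per_atom=6):
    """Rule of thumb from the text: 6 bands per atom."""
    return per_atom * n_atoms


gpts = round_gpts([10.0, 10.0, 20.0])   # each entry is a multiple of 8
nbands = nbands_guess(12)               # 72 bands for a 12-atom cell
```

The resulting values would then be passed as GPAW(gpts=..., nbands=...) in place of a grid spacing h.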

GPAW Planewave Mode

Jens Jurgen has a post here that discusses how to select plane wave mode in your script.

It looks like we have to manually turn off the real-space parallelization with the keyword:

parallel={'domain': 1}

In planewave mode I believe we also can only parallelize over reduced k-points, spins, and bands. We have to manually set the right numbers for these to match the numbers of CPUs.

To get parallelization over bands, we can at the moment only use the rmm-diis eigensolver (cg and davidson don't work).
The number of bands must be divisible by the number of CPUs, according to Jens Jurgen.
At the moment there is no dipole correction in planewave mode.
Density mixing is still done in real space.
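
Since the k-point/spin and band numbers have to be matched to the CPU count by hand, a small helper can do the bookkeeping. This is only a sketch: pw_parallel and pw_calculator are hypothetical names, the 340 eV cutoff is a placeholder, and the divisibility rules coded here (k-points times spins divisible by the number of k-point groups, nbands divisible by the number of band groups) are my reading of the notes above rather than a documented GPAW requirement.

```python
def pw_parallel(n_cores, n_kpt_spin, nbands):
    """Return a GPAW 'parallel' dict for plane-wave mode.

    n_kpt_spin is (reduced k-points) x (spins). Prefers as little band
    parallelization as possible; domain decomposition is switched off.
    """
    for band in range(1, n_cores + 1):
        if n_cores % band:
            continue
        kpt_groups = n_cores // band
        if n_kpt_spin % kpt_groups == 0 and nbands % band == 0:
            return {'domain': 1, 'band': band}
    raise ValueError('no k-point/band split fits this core count')


def pw_calculator(n_cores, n_kpt_spin, nbands, ecut=340, **kwargs):
    """Plane-wave calculator using the split chosen above."""
    from gpaw import GPAW, PW  # imported lazily; needs GPAW installed

    return GPAW(mode=PW(ecut),
                nbands=nbands,
                eigensolver='rmm-diis',  # needed for band parallelization
                parallel=pw_parallel(n_cores, n_kpt_spin, nbands),
                **kwargs)
```

For example, 16 cores with 4 k-point/spin combinations and 64 bands gives {'domain': 1, 'band': 4}.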

GPAW Geometry Optimizations

With GPAW one can do geometry optimizations a factor of 10 faster in LCAO mode (with smaller memory requirements). Then it's necessary to "tweak" the optimization with a little bit of running in FD mode.

Plus, LCAO mode has the added feature that convergence is typically easier, according to Heine.

I think it's difficult to automate the above process in one script, since the number of cores required for LCAO is typically lower than FD (because of the lower memory usage).

But if you're limited by CPU time when doing GPAW optimizations it might be worth keeping the above in mind.
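
For reference, here is a minimal sketch of the two-stage idea in a single script. It runs both stages on the same cores, which is usually wasteful as noted above, so in practice one would more likely submit two jobs and pass the geometry via a trajectory file. The names STAGES and relax_two_stage, the dzp basis, and the fmax values are assumptions chosen for illustration.

```python
# Stage settings: a cheap LCAO pre-relaxation, then a short FD refinement.
STAGES = [
    {'mode': 'lcao', 'basis': 'dzp', 'fmax': 0.1},
    {'mode': 'fd', 'fmax': 0.05},
]


def relax_two_stage(atoms, stages=STAGES, **common):
    """Relax atoms with each stage in turn; common kwargs go to every GPAW call."""
    # Imported lazily so the stage list can be inspected without GPAW/ASE.
    from ase.optimize import BFGS
    from gpaw import GPAW

    for i, stage in enumerate(stages):
        settings = dict(stage)
        fmax = settings.pop('fmax')
        name = 'stage%d_%s' % (i, settings['mode'])
        atoms.calc = GPAW(txt=name + '.txt', **settings, **common)
        # Each stage starts from the geometry left by the previous one.
        BFGS(atoms, trajectory=name + '.traj', logfile=name + '.log').run(fmax=fmax)
    return atoms
```

A call might look like relax_two_stage(atoms, xc='PBE', kpts=(4, 4, 1)), with the same functional and k-points used in both stages.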

GPAW Memory Estimation

To get a guess for the right number of nodes to run on for GPAW, run the
following line interactively:

gpaw-python <yourjob>.py --dry-run=<numberofnodes>
(e.g. gpaw-python graphene.py --dry-run=16)

Number of nodes should be a multiple of 8 for the suncat farm,
multiples of 12 for the suncat2 farm. The above will run quickly
(because it doesn't do the calculation). Then check that the
following number is <3GB for the 8-core suncat farm, <4GB for the 12-core suncat2 farm:

Memory estimate
---------------
Calculator  574.32 MiB
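
If you want to check the estimate automatically, something like the following sketch would do. It assumes the dry-run output contains a "Calculator" line in the format shown above; the function names are hypothetical, and the limits are treated as GiB, which is an assumption about what "GB" means here.

```python
import re

# Conversion factors to GiB for the units GPAW prints.
UNIT_TO_GIB = {'KiB': 1.0 / 1024 ** 2, 'MiB': 1.0 / 1024, 'GiB': 1.0}


def calculator_memory_gib(dry_run_text):
    """Extract the 'Calculator' memory estimate from dry-run output, in GiB."""
    match = re.search(r'^\s*Calculator\s+([\d.]+)\s*(KiB|MiB|GiB)',
                      dry_run_text, re.MULTILINE)
    if match is None:
        raise ValueError('no Calculator line found in dry-run output')
    return float(match.group(1)) * UNIT_TO_GIB[match.group(2)]


def fits_in_memory(dry_run_text, limit_gib):
    """True if the estimate is below the limit (3 for suncat, 4 for suncat2)."""
    return calculator_memory_gib(dry_run_text) < limit_gib


sample = """Memory estimate
---------------
Calculator  574.32 MiB
"""
ok_on_suncat = fits_in_memory(sample, 3.0)
```

The text passed in would be the output of the dry-run command above, for example read from the calculator's txt file.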

Tips for Running with BEEF

If you use the BEEF functional:

use xc='BEEF-vdW'
the code parallelizes over 20 cores (because of the way the VDW contribution is calculated)
you typically need a lot of memory, even for small calculations. For larger calculations you sometimes have to use more than 20 cores, just to get enough memory (you don't get additional parallelization).
the GPAW memory estimates are incorrect
it is typically best to run on the suncat2 (more memory per core) or, even better, the suncat3 farm (more memory per core, plus a much faster infiniband node-node interconnect)

Building a Private Version of GPAW

Use svn to check out the version of GPAW that you want to use (described here).
copy /afs/slac/g/suncat/share/scripts/privgpaw.csh into whatever directory you like
edit the two variables GPAW_BASE ("base" GPAW release that you want to re-use for numpy, mpi etc.) and GPAW_HOME (directory where you checked out GPAW)

Use the commands:

./privgpaw.csh build (build)
./privgpaw.csh test (run gpaw self-tests)
./privgpaw.csh gpaw-bsub <arguments> (run in batch)
./privgpaw.csh <cmd> (run <cmd> interactively, using privgpaw.csh environment.  e.g. "gpaw-python junk.py")

Some notes:

You can see a list of available installed versions here (to use for GPAW_BASE).
The syntax for the "bsub" option is identical to the gpaw-bsub command described here.
The above assumes that the base release is compatible with your checked out GPAW version. Talk to cpo if you have questions about this.
The above doesn't include support for a private version of ASE. When that becomes important we will add it.

Jacapo Parallel NEB Example

You can find a Jacapo parallel NEB example here. This same script can be used for a restart (the interpolated traj files are only recreated if they don't exist).

Some important notes:

the number of processors selected for a parallel NEB must be an integer multiple of the NumberOfImages-2 (the images at the end points are "fixed").
Parallel NEB settings (e.g. number of cores) can be debugged running in the suncat-test queue with 1 processor per non-fixed image.
Later on in the NEB (discussed here) it is good to set climbing=True and switch to the FIRE method, shown here. As best I can tell, this page doesn't provide good instructions for when to turn those on.
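
The processor-count rule in the first note can be checked before submitting. This is a minimal sketch; cores_per_image is a hypothetical helper name, not part of Jacapo or ASE.

```python
def cores_per_image(n_images, n_cores):
    """Cores available to each movable NEB image.

    n_images counts all images including the two fixed end points, so
    n_cores must be an integer multiple of n_images - 2.
    """
    n_free = n_images - 2
    if n_free < 1:
        raise ValueError('need at least one image between the end points')
    if n_cores % n_free:
        raise ValueError('%d cores is not a multiple of %d movable images'
                         % (n_cores, n_free))
    return n_cores // n_free


# 7 images (5 movable) on 10 cores: 2 cores per movable image.
layout = cores_per_image(7, 10)
```

For debugging in the suncat-test queue as described above, the core count would simply equal n_images - 2, so the function returns 1.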

Perturbing Jacapo Calculations

When taking an existing Jacapo calculation and making changes (e.g. adding an external field) it is important to not instantiate a new calculator (to work around some Jacapo bugs) but instead read in the previous atoms/calculator from the .nc file. Johannes Voss has kindly provided an example with some comments here.