4.4 Algorithms and practicalities

In this section, the numerical implementation of QMC methods using wavefunctions based on Slater determinants is discussed. Although critical for any QMC simulation, this discussion has been delayed until now because the implementation is relatively straightforward. An implementation of the VMC and DMC algorithms for molecular and solid-state systems is similar in complexity to a conventional quantum chemical HF or solid-state DFT code. However, although many of the quantities involved are the same, the actual implementation is very different. For example, no integrals are calculated analytically and all evaluations are performed in real space.

4.4.1 Calculation and update of the Slater determinants

The calculation and update of the Slater determinants in trial wavefunctions gives rise to the $O(N^3)$ scaling of QMC methods. Provided the efficient rank-1 update procedure described below is adopted, the update does not require a substantial proportion of processor time for molecular and solid-state systems of between approximately $10^1$ and $10^3$ electrons.$^{4.5}$ For sufficiently large systems the update will dominate. Note that the $O(N^3)$ scaling describes the cost of moving all of the electrons as a function of system size, not the cost of obtaining a fixed statistical accuracy as a function of system size.

Two quantities are required during a QMC calculation: the ratio of the new to the old determinant when a single electron is moved, and the update of the determinant when a single-electron move is accepted. Although similar, the first of these operations is an $O(N)$ process and the second is $O(N^2)$. There are $O(N)$ accepted single-electron moves per ``configuration move'', giving rise to the $O(N^3)$ scaling of QMC methods.

It has been shown [50,26,23] that it is possible to work entirely with the matrix of cofactors of the Slater determinant. This also applies to multi-determinant wavefunctions.

The Slater determinant has elements

\begin{displaymath}
D_{ji}=\phi_{j}({\bf r}_{i}) \;\;.
\end{displaymath} (4.15)

When electron $i$ is moved from position ${\bf r}_{i}$ to ${\bf r}^{\prime}_{i}$, only the elements $D_{ji}$ of the determinant, where $j=1,\ldots,N$, are changed. (Since a single electron is moved, only the corresponding column of the Slater determinant of the same spin changes.)
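
As an illustration, the following minimal sketch (in Python with NumPy; the Gaussian orbitals and all variable names are invented for the example and are not taken from any particular code) constructs the Slater matrix $D_{ji}=\phi_{j}({\bf r}_{i})$ and the inverse transpose matrix used below.

\begin{verbatim}
import numpy as np

def slater_matrix(orbitals, positions):
    """Build D with D[j, i] = phi_j(r_i) for N orbitals and N same-spin electrons."""
    n = len(orbitals)
    D = np.empty((n, n))
    for j, phi in enumerate(orbitals):      # row index j labels the orbital
        for i in range(n):                  # column index i labels the electron
            D[j, i] = phi(positions[i])
    return D

# Example: three hypothetical s-type Gaussian orbitals on different centres.
centres = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
orbitals = [lambda r, c=c: np.exp(-np.sum((r - c) ** 2)) for c in centres]
electrons = np.random.default_rng(0).normal(size=(3, 3))

D = slater_matrix(orbitals, electrons)
Dbar = np.linalg.inv(D.T)    # inverse transpose, written D-bar in equation 4.17 below
\end{verbatim}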

The relation

\begin{displaymath}
\vert{\bf D}\vert{\bf I} = {\bf D}^T {\bf D}^C \;\;\;,
\end{displaymath} (4.16)

states that the value of the determinant, $D$, can be obtained from the transpose of the matrix ${\bf D}$ and its matrix of cofactors ${\bf D}^C$. By definition,
\begin{displaymath}
({\bf D}^T)^{-1}=\frac{1}{D}{\bf D}^C=\overline{\bf D} \;\;\;,
\end{displaymath} (4.17)

so that
\begin{displaymath}
D^{C}_{ji}=\vert{\bf D}\vert \overline{\bf D}_{ji} \;\;\;.
\end{displaymath} (4.18)

From equation 4.16, we can write
\begin{displaymath}
\vert{\bf D}\vert=\sum_{j=1}^{N} D_{ji} D^{C}_{ji} \;\;\;
\end{displaymath} (4.19)

and
\begin{displaymath}
\vert{\bf D}^\prime\vert=\sum_{j=1}^{N} D^{\prime}_{ji} D^{C}_{ji} \;\;\;,
\end{displaymath} (4.20)

where the primes indicate the matrices after the move. When electron $i$ is moved, only the $i^{\mathrm{th}}$ column of the matrix ${\bf D}$ changes, and the $i^{\mathrm{th}}$ column of the matrix of cofactors ${\bf D}^C$ is unchanged. Hence,
\begin{displaymath}
q_{i}=\frac{\vert{\bf D}^\prime\vert}{\vert{\bf D}\vert}=
\sum_{j=1}^{N} D^{\prime}_{ji} \overline{D}_{ji}\;\;,
\end{displaymath} (4.21)

and the ratio of the new to the old determinant is computed in $O(N)$ operations, without constructing ${\bf D}^\prime$ in full or recomputing its determinant from scratch. The efficiency of this method is particularly important in the evaluation of the non-local pseudopotential energy, which requires this ratio to be evaluated at each point of an integration grid (see section 4.5.1).

If the move of electron $i$ to position ${\bf r}^{\prime}_{i}$ is accepted, the matrix $\overline{\bf D}$ is updated:

\begin{displaymath}
\overline{D}^{\prime}_{jk}=
\left\{ \begin{array}{ll}
\frac{1}{q_{i}}\,\overline{D}_{ji} & \mathrm{for} \; k=i \\
\overline{D}_{jk}-\frac{\overline{D}_{ji}}{q_{i}}
\left(\sum_{l=1}^{N}\overline{D}_{lk}D^{\prime}_{li}\right) & \mathrm{for} \; k\neq i
\end{array}\right.
\end{displaymath} (4.22)

These rules may be derived by considering the change to the $\overline{\bf D}$ matrix that results from changing column $i$ of the Slater matrix, [50]
\begin{displaymath}
D^{\prime}_{jk}=D_{jk}+\Delta_{jk}\;\;\;
\end{displaymath} (4.23)

where
\begin{displaymath}
\Delta_{jk}=\delta_{ik}\left(\phi_{j}({\bf r}^{\prime}_{i})-\phi_{j}({\bf r}_{i})\right) \;\;.
\end{displaymath} (4.24)
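
The ratio of equation 4.21 and the update of equation 4.22 translate directly into a short routine. The sketch below (Python with NumPy, continuing the illustrative names used above; it is an example implementation under these conventions, not the code used for this work) evaluates the ratio in $O(N)$ operations, applies the rank-1 update in $O(N^2)$ operations, and checks both against direct recomputation.

\begin{verbatim}
import numpy as np

def det_ratio(Dbar, new_column, i):
    """q_i = |D'|/|D| = sum_j phi_j(r'_i) * Dbar[j, i]  (equation 4.21), O(N)."""
    return np.dot(new_column, Dbar[:, i])

def accept_move(Dbar, new_column, i):
    """Rank-1 update of Dbar = (D^T)^{-1} when column i of D is replaced by
    new_column[j] = phi_j(r'_i)  (equation 4.22), O(N^2)."""
    q = det_ratio(Dbar, new_column, i)
    s = Dbar.T @ new_column                 # s[k] = sum_l Dbar[l, k] D'[l, i]
    Dbar_new = Dbar - np.outer(Dbar[:, i], s) / q
    Dbar_new[:, i] = Dbar[:, i] / q         # the k = i column is simply rescaled
    return Dbar_new

# Consistency check against brute-force linear algebra for a random 5x5 matrix.
rng = np.random.default_rng(1)
D = rng.normal(size=(5, 5))
Dbar = np.linalg.inv(D.T)
new_column = rng.normal(size=5)
i = 2
D_new = D.copy()
D_new[:, i] = new_column
assert np.isclose(det_ratio(Dbar, new_column, i),
                  np.linalg.det(D_new) / np.linalg.det(D))
assert np.allclose(accept_move(Dbar, new_column, i), np.linalg.inv(D_new.T))
\end{verbatim}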

4.4.2 Evaluation of the local energy

Evaluation of the local energy during a QMC calculation requires computation of the kinetic and potential energies. Calculation of the kinetic energy requires derivatives of the trial wavefunction; these are evaluated analytically, as described below.

Calculation of the potential energies requires evaluation of the electron-ion and electron-electron potentials. For pseudopotential calculations, the electron-ion potential is evaluated using the technique of section 4.5.1. The remaining Coulomb potentials are evaluated by direct summation in finite systems, and by Ewald summation (see section 4.6.1) in periodic systems.
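
For a finite system the direct summation amounts to simple double loops. The sketch below (Python with NumPy, in atomic units) assumes bare point charges $Z_{I}$ for the ions rather than pseudopotentials and is illustrative only.

\begin{verbatim}
import numpy as np

def coulomb_energy(r_elec, r_ion, Z_ion):
    """Electron-electron plus electron-ion Coulomb energy by direct summation
    for a finite system (atomic units).  r_elec: (N, 3), r_ion: (M, 3), Z_ion: (M,)."""
    v_ee = 0.0
    n_elec = len(r_elec)
    for i in range(n_elec):                           # electron-electron repulsion
        for j in range(i + 1, n_elec):
            v_ee += 1.0 / np.linalg.norm(r_elec[i] - r_elec[j])
    v_ei = 0.0
    for i in range(n_elec):                           # electron-ion attraction
        for I, Z in enumerate(Z_ion):
            v_ei -= Z / np.linalg.norm(r_elec[i] - r_ion[I])
    return v_ee + v_ei
\end{verbatim}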

4.4.3 Calculation of the kinetic energy

Calculation of the kinetic energy is performed by evaluating the estimator
\begin{displaymath}
\mathrm{KE}=\sum_{i=1}^{N}-\frac{1}{2}\frac{\nabla^{2}_{i}\Psi}{\Psi} \;\;,
\end{displaymath} (4.25)

where $\Psi$ is the trial wavefunction. When averaged over the probability distribution $\vert\Psi\vert^{2}$, this estimator gives the exact kinetic energy of the trial wavefunction,
\begin{displaymath}
\langle\mathrm{KE}\rangle=\sum_{i=1}^{N}-\frac{1}{2}\frac{\langle\Psi\vert\nabla^{2}_{i}\vert\Psi\rangle}{\langle\Psi\vert\Psi\rangle} \;\;.
\end{displaymath} (4.26)

The kinetic energy contribution from a single electron, $\mathrm{KE}_{i}$, is most conveniently evaluated by defining two additional quantities, $T_{i}$ and $F_{i}$:

\begin{displaymath}
T_{i} = -\frac{1}{4}\nabla_{i}^{2}\ln \Psi
\end{displaymath} (4.27)

and
\begin{displaymath}
F_{i} = \frac{1}{\sqrt{2}}\nabla_{i}\ln \Psi \;\;\; .
\end{displaymath} (4.28)

Then the kinetic energy is given by

\begin{displaymath}
\mathrm{KE}_{i}=2T_{i}-F^{2}_{i}=
-\frac{1}{2}\frac{\nabla_i^2\Psi}{\Psi}\;\;\;.
\end{displaymath} (4.29)

The use of logarithms is particularly convenient for Slater-Jastrow wavefunctions (section 4.3.1). For a single-determinant wavefunction, written as a product of spin up and down determinants,

\begin{eqnarray*}
\nabla_i\ln \Psi & = & \nabla_i\ln D^{\uparrow} + \nabla_i\ln D^{\downarrow}
+ \nabla_i \ln \left[\exp(J)\right] \\
& = & \frac{1}{D^{\uparrow}}\nabla_i D^{\uparrow} +
\frac{1}{D^{\downarrow}}\nabla_i D^{\downarrow} + \nabla_i J \;\;\; ,
\end{eqnarray*} (4.30)

and
\begin{displaymath}
\nabla_i^2\ln \Psi = - \left( \frac{1}{D^{\uparrow}}\nabla_i D^{\uparrow} \right)^{2}
- \left( \frac{1}{D^{\downarrow}}\nabla_i D^{\downarrow} \right)^{2}
+ \frac{1}{D^{\uparrow}}\nabla_i^2 D^{\uparrow}
+ \frac{1}{D^{\downarrow}}\nabla_i^2 D^{\downarrow}
+ \nabla_i^2 J \;\;\; .
\end{displaymath} (4.31)

Averaging the quantities $T_{i}$ and $F^{2}_{i}$ independently provides a useful check on the simulation. In VMC, Green's relation shows that

\begin{displaymath}
\langle\mathrm{KE}_{i}\rangle=\langle T_{i}\rangle=\langle F^{2}_{i}\rangle \;\;\;,
\end{displaymath} (4.32)

provided all of configuration space, ${\bf R}$, has been sampled [26,23]. The quantities $T_{i}$ and $F^{2}_{i}$ usually have worse statistics than the kinetic energy and are not used to determine the kinetic energy directly.$^{4.6}$
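
As a concrete illustration of equations 4.27-4.31, the sketch below (Python with NumPy; the argument names are invented for the example, and the per-electron gradients and Laplacians of the determinants and Jastrow factor are assumed to have been computed elsewhere) assembles $T_{i}$, $F^{2}_{i}$ and $\mathrm{KE}_{i}$ for a single electron.

\begin{verbatim}
import numpy as np

def kinetic_terms(grad_ln_Dup, lap_ln_Dup, grad_ln_Ddn, lap_ln_Ddn,
                  grad_J, lap_J):
    """Per-electron T_i, F_i^2 and KE_i for a Slater-Jastrow wavefunction.

    grad_* are 3-vectors holding nabla_i ln D = (nabla_i D)/D and nabla_i J;
    lap_*  are scalars holding nabla_i^2 ln D = nabla_i^2 D/D - ((nabla_i D)/D)^2
    and nabla_i^2 J.
    """
    grad_ln_psi = grad_ln_Dup + grad_ln_Ddn + grad_J      # equation 4.30
    lap_ln_psi = lap_ln_Dup + lap_ln_Ddn + lap_J          # equation 4.31
    T_i = -0.25 * lap_ln_psi                              # equation 4.27
    F_i_sq = 0.5 * np.dot(grad_ln_psi, grad_ln_psi)       # |F_i|^2, equation 4.28
    KE_i = 2.0 * T_i - F_i_sq                             # equation 4.29
    return T_i, F_i_sq, KE_i
\end{verbatim}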

4.4.4 Calculation of the potential energy

Calculation of the potential energy terms in QMC simulations is straightforward. The electron-electron and electron-ion potentials in periodic systems deserve special mention due to the use of the Ewald summation ([51] and section 4.6.1).

The Ewald sum consists of separate real and reciprocal space sums. For computational efficiency, an optimised Ewald sum is used in which only a single real-space term is taken; the sum is therefore weighted towards the reciprocal-space part, which is more rapidly evaluated. An optimised form for the real and reciprocal space potentials [52] minimises the resulting errors.

4.4.5 Parallelisation of VMC

Almost all of the calculations in this thesis were performed on parallel machines. Extension of the VMC algorithm of section 3.3.5 to parallel machines requires very few changes, as the algorithm is naturally parallel.

To parallelise the VMC algorithm, it is sufficient to perform independent VMC calculations on each node of a parallel machine, provided care is taken to ensure that each calculation uses a different random number sequence. Communication between the nodes is necessary only to obtain global averages and is therefore negligible: the parallel efficiency is essentially $100\%$, and the calculation can in principle exploit any number of nodes without loss of efficiency.
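
A minimal sketch of this strategy is given below (Python, using the standard multiprocessing module to stand in for the nodes of a parallel machine; run_vmc is a toy stand-in for the real sampler of section 3.3.5, and the seeds and numbers are purely illustrative).

\begin{verbatim}
import numpy as np
from multiprocessing import Pool

def run_vmc(args):
    """Toy stand-in for one independent VMC run: returns (sum of local
    energies, number of samples).  A real run would perform the Metropolis
    walk of section 3.3.5; here dummy energies are drawn so the script runs."""
    seed, n_steps = args
    rng = np.random.default_rng(seed)        # an independent stream per worker
    local_energies = rng.normal(loc=-1.0, scale=0.1, size=n_steps)
    return local_energies.sum(), n_steps

if __name__ == "__main__":
    n_workers, n_steps = 4, 100_000
    tasks = [(1000 + worker, n_steps) for worker in range(n_workers)]
    with Pool(n_workers) as pool:            # each worker plays the role of a node
        results = pool.map(run_vmc, tasks)
    # The only communication: combine the per-worker sums into a global average.
    total_e = sum(e for e, _ in results)
    total_n = sum(n for _, n in results)
    print("global VMC energy estimate:", total_e / total_n)
\end{verbatim}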

4.4.6 Parallelisation of DMC

The DMC algorithm of section 3.4.6 requires more changes than the VMC algorithm to be parallelised, but very high parallel efficiencies are still obtainable.

During a DMC calculation an ensemble of walkers is propagated in imaginary time. It is natural to divide the walkers evenly amongst the nodes, so that communication is only strictly necessary at the end of each block, when the population is renormalised. However, this simple approach would result in a low parallel efficiency because the load balance would degrade as the calculation progressed: the branching and deletion of walkers would leave some nodes with more walkers than others, and at the end of each block many nodes would sit idle whilst the others moved their remaining walkers. A load-balanced DMC algorithm requires the even distribution of walkers over all nodes, and this is accomplished by redistributing the walkers after every move.

Therefore, in the parallel DMC algorithm, walkers are initially evenly distributed over all nodes. Each node is then responsible for propagating its walkers. After each ``ensemble move'', the walkers are redistributed as evenly as possible across all of the nodes.
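
A sketch of the bookkeeping for this redistribution step is given below (Python; only the transfer plan is computed, the actual movement of walker data between nodes, for example with MPI, is omitted, and the function and variable names are illustrative).

\begin{verbatim}
def redistribution_plan(walkers_per_node):
    """Return (source, destination, n_walkers) transfers that level the walker
    counts as evenly as possible after a branching/deletion step."""
    total = sum(walkers_per_node)
    n_nodes = len(walkers_per_node)
    base, extra = divmod(total, n_nodes)
    # Target counts differ by at most one walker between nodes.
    targets = [base + (1 if node < extra else 0) for node in range(n_nodes)]
    senders = [[n, c - targets[n]] for n, c in enumerate(walkers_per_node)
               if c > targets[n]]
    receivers = [[n, targets[n] - c] for n, c in enumerate(walkers_per_node)
                 if c < targets[n]]
    plan = []
    while senders and receivers:
        src, dst = senders[0], receivers[0]
        moved = min(src[1], dst[1])
        plan.append((src[0], dst[0], moved))
        src[1] -= moved
        dst[1] -= moved
        if src[1] == 0:
            senders.pop(0)
        if dst[1] == 0:
            receivers.pop(0)
    return plan

# Example: populations [12, 7, 9, 4] on four nodes are levelled to [8, 8, 8, 8].
print(redistribution_plan([12, 7, 9, 4]))   # [(0, 1, 1), (0, 3, 3), (2, 3, 1)]
\end{verbatim}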

The parallel efficiency is limited by two factors:

  1. The number of walkers on each node.
  2. The accuracy of the guiding wavefunction.

In general the walkers do not remain perfectly evenly distributed between the nodes, and the first limitation arises from the time lost while the most heavily loaded nodes finish moving their walkers and the remaining nodes wait. The efficiency of the algorithm is given by

\begin{displaymath}
\mathrm{efficiency} \%=100\times\frac{N_{w}}{n_{\mathrm{max}}\times
N_{\mathrm{nodes}}} \;\;\;,
\end{displaymath} (4.33)

where there are $N_{w}$ walkers in total, a maximum of $n_{\mathrm{max}}$ walkers on any one node, and $N_{\mathrm{nodes}}$ nodes. The efficiency is maximised by keeping the number of walkers well balanced and by having a large number of walkers per node.
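
For example, equation 4.33 can be evaluated directly from the per-node walker counts (a sketch in Python; the sample numbers are invented for illustration).

\begin{verbatim}
def dmc_parallel_efficiency(walkers_per_node):
    """Percentage efficiency of equation 4.33: 100 * N_w / (n_max * N_nodes)."""
    n_w = sum(walkers_per_node)              # N_w, total number of walkers
    n_max = max(walkers_per_node)            # n_max, busiest node
    n_nodes = len(walkers_per_node)          # N_nodes
    return 100.0 * n_w / (n_max * n_nodes)

# A mildly unbalanced ensemble: 398 walkers on 4 nodes, worst-loaded node has 104.
print(dmc_parallel_efficiency([104, 99, 97, 98]))   # about 95.7%
\end{verbatim}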

The accuracy of the guiding wavefunction governs the fluctuations in the walker population. Hence improved guiding wavefunctions have the combined effect of increasing the statistical accuracy of the calculations and increasing the efficiency with which they are carried out. For the calculations in this thesis, parallel efficiencies of $90-95\%$ were typical.

