Set up PySCF the performant way on Debian 12 Bookworm

Published on Oct 6, 2023

#debian #administration #quantum chemistry #simulations #software

TL;DR

In high-performance computing, speed and reproducibility matter. In this article I compare two ways of installing the quantum chemistry package PySCF, a prebuilt binary from pip and a build from source, to see which one runs faster. The result might be surprising.


Preliminaries

Install a BLAS library. For a scientist, OpenBLAS is a good choice because we focus more on reproducibility than on performance. We also need cmake for the source build later on.


sudo apt install -y cmake libopenblas-dev
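Before going further, it can help to confirm that the build tools and the BLAS library are actually visible. The following is a minimal sketch using only the Python standard library; the helper name check_prerequisites is made up for illustration, and the inclusion of make and git in the tool list is an assumption based on the source build steps later in this article.

```python
import shutil
from ctypes.util import find_library


def check_prerequisites(tools=("cmake", "make", "git"), libraries=("openblas",)):
    """Report which build tools and shared libraries can be located."""
    report = {}
    for tool in tools:
        # shutil.which returns the full path of the executable, or None
        report[tool] = shutil.which(tool)
    for lib in libraries:
        # find_library returns the library file name if it is found, else None
        report["lib" + lib] = find_library(lib)
    return report


if __name__ == "__main__":
    for name, location in check_prerequisites().items():
        print(f"{name:12s} {location or 'NOT FOUND'}")
```

Anything reported as NOT FOUND should be installed before attempting the source build.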

Install via pip

In a directory inside your home directory (I use pyscf-pip):


python3 -m venv venv 
source venv/bin/activate 
pip install --prefer-binary pyscf
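With two installations on one machine, it is easy to benchmark the wrong one by accident. The sketch below, run inside the activated environment, reports where pyscf would be imported from and which version pip has registered. The function name describe_install is hypothetical; it relies only on importlib from the standard library.

```python
import importlib.util
from importlib import metadata


def describe_install(package="pyscf"):
    """Return where a package would be imported from and its pip-reported version."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return {"found": False, "location": None, "version": None}
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable but not registered with pip, e.g. a source tree on sys.path
        version = None
    return {"found": True, "location": spec.origin, "version": version}


print(describe_install("pyscf"))
```

For the pip installation, the location should point inside the venv directory created above.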

Install from source

Compiling from source might speed up the calculations for you. In your home directory:


git clone https://github.com/pyscf/pyscf.git
cd pyscf
python3 -m venv venv 
source venv/bin/activate 
pip install h5py scipy numpy
cd pyscf/lib
mkdir build
cd build
cmake ..
make

That’s it.
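One detail worth checking before opening a notebook: the steps above build the C libraries but do not install the package into the venv, so Python has to be told where the checkout lives. A minimal sketch follows, assuming the clone sits at ~/pyscf and that the build leaves its shared objects in pyscf/lib. The helper name use_source_tree is hypothetical; setting PYTHONPATH to the repository root in the shell should have the same effect.

```python
import sys
from pathlib import Path


def use_source_tree(repo_root):
    """Put a PySCF checkout first on sys.path and list its compiled libraries."""
    repo_root = Path(repo_root).expanduser()
    lib_dir = repo_root / "pyscf" / "lib"
    shared_objects = []
    if lib_dir.is_dir():
        # Assumption: the cmake/make step places its .so files in pyscf/lib
        shared_objects = sorted(p.name for p in lib_dir.glob("*.so"))
    if str(repo_root) not in sys.path:
        sys.path.insert(0, str(repo_root))
    return shared_objects


# "~/pyscf" is an assumed location: the clone made in the home directory above
built = use_source_tree("~/pyscf")
print(built if built else "no compiled libraries found, check the make output")
```

Running this in the first notebook cell, before importing pyscf, makes the import resolve to the source build.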

Comparing the installations

In a Jupyter notebook for each environment I ran two timed cells.

An import cell comes first (I don’t want to measure that):

from pyscf import gto, scf

The first timed cell runs standard Hartree-Fock calculations on two systems with a good-sized basis set:


%%timeit
mol = gto.M(
    atom = '''
O        0.000000    0.000000    0.117790
H        0.000000    0.755453   -0.471161
H        0.000000   -0.755453   -0.471161''',
    basis = 'ccpvdz',
    charge = 1,
    spin = 1  # = 2S = spin_up - spin_down
)

#
# 1. open-shell system (for spin > 0, scf.RHF dispatches to the ROHF solver)
#
mf = scf.RHF(mol)
mf.kernel()

mf = scf.ROHF(mol)
mf.kernel()

mf = scf.UHF(mol)
mf.kernel()


#
# 2. closed-shell system
#
mol = gto.M(
    atom = '''
O 0 0      0
H 0 -2.757 2.587
H 0  2.757 2.587''',
    basis = 'ccpvdz',
)

#
# Using restricted closed shell solver
#
mf = scf.RHF(mol)
mf.kernel()

#
# Using restricted open shell solver
#

mf = scf.ROHF(mol)
mf.kernel()

mf = scf.UHF(mol)
mf.kernel()

The second timed cell runs CISD calculations on the same molecule without specifying the spin state:


%%timeit

mol = gto.M(
   atom = '''
O        0.000000    0.000000    0.117790
H        0.000000    0.755453   -0.471161
H        0.000000   -0.755453   -0.471161''',
   basis = 'ccpvdz',
)

mf = mol.HF().run()
mycc1 = mf.CISD().run()


mf = mol.UHF().run()
mycc2 = mf.CISD().run()
print('RCISD correlation energy', mycc1.e_corr)
print('UCISD correlation energy', mycc2.e_corr)
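Outside of Jupyter, the %%timeit measurement can be approximated with the standard timeit module. The sketch below is only an illustration: benchmark and placeholder_workload are hypothetical names, and the placeholder stands in for the body of one of the cells above so that the snippet runs without PySCF installed.

```python
import statistics
import timeit


def benchmark(func, repeat=7, number=1):
    """Time func() over `repeat` rounds and return (mean, stdev) in seconds per call."""
    # timeit.repeat returns one total time per round; divide by the calls per round
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


def placeholder_workload():
    # Stand-in so the sketch runs anywhere; replace with the HF or CISD cell body
    return sum(i * i for i in range(10_000))


mean, stdev = benchmark(placeholder_workload)
print(f"{mean * 1e3:.3f} ms ± {stdev * 1e3:.3f} ms per call")
```

Running the same script once in each environment gives numbers that can be compared directly.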

Result

Let’s dive into the results.

Binary from pip

PySCF Benchmark 1

Result from the binary installation on the HF calculation

PySCF Benchmark 2

Result from the binary installation on the CISD calculation

Compiled from source

PySCF Benchmark 3

Result from the source installation on the HF calculation

PySCF Benchmark 4

Result from the source installation on the CISD calculation

Conclusion

The binary installation was as fast as the source build on the HF calculation and much faster on the CISD calculation.

Thus, we choose the binary installation, and we might have learned something for other HPC programs as well: compiling from source is not automatically the faster option.


Helpful Resources

Install PySCF - PySCF