Set up PySCF the performant way on Debian 12 Bookworm
Published at Oct 6, 2023
TL;DR
In high-performance computing, speed and reproducibility matter. In this article I compare two ways of installing the quantum chemistry package PySCF, the prebuilt binary from pip and a build from source, to see which one runs faster. The result might be surprising.
Preliminaries
Install a BLAS library (MKL is one option). For a scientist, OpenBLAS is a good choice because we focus more on reproducibility than performance. We also need cmake for the build from source.
sudo apt install -y cmake libopenblas-dev
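To confirm that the BLAS library is visible to the dynamic linker after the apt step, a quick check from Python can help. This is a minimal sketch using only the standard library; the helper name `find_blas` is my own, and the candidate names are an assumption about how the libraries are named on your system.

```python
from ctypes.util import find_library


def find_blas(candidates=("openblas", "blas")):
    """Return {name: resolved library or None} for each candidate BLAS name."""
    # find_library asks the system linker tools, so None means "not visible".
    return {name: find_library(name) for name in candidates}


if __name__ == "__main__":
    for name, path in find_blas().items():
        print(f"{name}: {path or 'not found'}")
```

If `openblas` shows up as not found, the cmake step of the source build will likely fall back to another BLAS or fail, so it is worth checking before compiling.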
Install via pip
In a directory inside your home directory (I use pyscf-pip):
python3 -m venv venv
source venv/bin/activate
pip install --prefer-binary pyscf
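Since the comparison later depends on knowing which copy of PySCF each environment actually imports, it can be useful to print the version and location of the package. The following is a small sketch with a hypothetical helper `describe_install`; it relies only on the standard library and does not import PySCF itself.

```python
import importlib.util
from importlib import metadata


def describe_install(package="pyscf"):
    """Return (version, location) for an importable package, or None if absent."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable but without pip metadata, e.g. a source tree on PYTHONPATH.
        version = "unknown"
    return version, spec.origin


if __name__ == "__main__":
    print(describe_install("pyscf"))
```

In the pip environment the location should point into `venv/lib/.../site-packages`; in a source checkout it should point into the cloned repository instead.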
Install from source
Compiling from source might speed up the calculations for you. In your home directory:
git clone https://github.com/pyscf/pyscf.git
cd pyscf
python3 -m venv venv
source venv/bin/activate
pip install h5py scipy numpy
cd pyscf/lib
mkdir build
cd build
cmake ..
make
That’s it.
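One detail worth checking: these steps compile the C libraries but do not install the package into the venv, so Python has to find the cloned repository on its path (the PySCF documentation suggests adding the repository root to `PYTHONPATH`). The sketch below, with hypothetical helper names, lists the shared libraries the build produced and puts the checkout on `sys.path`. It assumes the build writes its `lib*.so` files into `pyscf/lib` inside the repository.

```python
import sys
from pathlib import Path


def compiled_libs(repo_root):
    """List shared libraries found in <repo_root>/pyscf/lib after the build."""
    lib_dir = Path(repo_root) / "pyscf" / "lib"
    return sorted(p.name for p in lib_dir.glob("lib*.so*"))


def use_source_tree(repo_root):
    """Put the cloned repository first on sys.path so `import pyscf` finds it."""
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


if __name__ == "__main__":
    repo = Path.home() / "pyscf"  # assumed clone location from the steps above
    print(compiled_libs(repo))
    use_source_tree(repo)
```

An empty list from `compiled_libs` would suggest that `make` did not finish, in which case `import pyscf` fails when it tries to load its C extensions.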
Comparing the installations
In a Jupyter notebook for each environment I ran two cells.
As a base cell (not timed, because I don't want to measure the import):
from pyscf import gto, scf
The first cell runs standard Hartree-Fock calculations on two systems with a good-sized basis set:
%%timeit
mol = gto.M(
atom = '''
O 0.000000 0.000000 0.117790
H 0.000000 0.755453 -0.471161
H 0.000000 -0.755453 -0.471161''',
basis = 'ccpvdz',
charge = 1,
spin = 1 # = 2S = spin_up - spin_down
)
#
# 1. open-shell system (scf.RHF falls back to ROHF when spin != 0)
#
mf = scf.RHF(mol)
mf.kernel()
mf = scf.ROHF(mol)
mf.kernel()
mf = scf.UHF(mol)
mf.kernel()
#
# 2. closed-shell system
#
mol = gto.M(
atom = '''
O 0 0 0
H 0 -2.757 2.587
H 0 2.757 2.587''',
basis = 'ccpvdz',
)
#
# Using restricted closed shell solver
#
mf = scf.RHF(mol)
mf.kernel()
#
# Using restricted open shell solver
#
mf = scf.ROHF(mol)
mf.kernel()
mf = scf.UHF(mol)
mf.kernel()
The second cell is a CISD calculation on the same molecule without specifying the spin states:
%%timeit
mol = gto.M(
atom = '''
O 0.000000 0.000000 0.117790
H 0.000000 0.755453 -0.471161
H 0.000000 -0.755453 -0.471161''',
basis = 'ccpvdz',
)
mf = mol.HF().run()
mycc1 = mf.CISD().run()
mf = mol.UHF().run()
mycc2 = mf.CISD().run()
print('RCISD correlation energy', mycc1.e_corr)
print('UCISD correlation energy', mycc2.e_corr)
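The `%%timeit` magic only works inside IPython or Jupyter. To repeat the measurement from a plain script, for example on a cluster node, something like the following sketch could be used. The helper name `time_workload` is my own; it wraps the standard library's `timeit.repeat` and reports a mean and standard deviation per call, similar in spirit to what `%%timeit` prints.

```python
import statistics
import timeit


def time_workload(func, repeat=7, number=1):
    """Time `func` over `repeat` runs of `number` calls each.

    Returns (mean, stdev) of the per-call wall time in seconds.
    """
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    # stdev needs at least two samples; report 0.0 for a single run.
    spread = statistics.stdev(per_call) if len(per_call) > 1 else 0.0
    return statistics.mean(per_call), spread


if __name__ == "__main__":
    mean, spread = time_workload(lambda: sum(range(100_000)))
    print(f"{mean:.6f} s ± {spread:.6f} s per call")
```

To use it for the comparison, wrap the body of a cell in a function (say `def hf_cell(): ...`) and pass that function to `time_workload` in each environment.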
Result
Let’s dive into the results.
Binary from pip
Result from the binary installation on HF calculation
Result of the binary installation on the CISD calculations
Compiled from source
Result from the source installation on HF
Result from the source installation on CISD
Conclusion
The binary installation was as fast as the source build on the HF calculations and much faster on the CISD calculations.
Thus, we choose the binary installation. The lesson may carry over to other HPC programs as well: measure before assuming that compiling from source is faster.