# Install PiNN & download QM9 dataset
!pip install git+https://github.com/Teoroo-CMC/PiNN
!mkdir -p /tmp/dsgdb9nsd && curl -sSL https://ndownloader.figshare.com/files/3195389 | tar xj -C /tmp/dsgdb9nsd
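The archive unpacks into one .xyz file per molecule; a quick count confirms the extraction succeeded (the full QM9 set contains 133,885 structures):
!ls /tmp/dsgdb9nsd | wc -l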
import os, warnings
import tensorflow as tf
from glob import glob
from ase.collections import g2
from pinn.io import load_qm9, sparse_batch
from pinn import get_model, get_calc
# CPU is used for documentation generation, feel free to use your GPU!
os.environ['CUDA_VISIBLE_DEVICES'] = ''
# We heavily use indexed slices to do sparse summations,
# which causes TensorFlow to complain;
# we believe it is safe to ignore this warning.
index_warning = 'Converting sparse IndexedSlices'
warnings.filterwarnings('ignore', index_warning)
## Getting the dataset
PiNN adapts TensorFlow's dataset API to handle different datasets.
For this and the following notebooks, the QM9 dataset (https://doi.org/10.6084/m9.figshare.978904) is used.
To follow the notebooks, download the dataset and adjust the directory accordingly.
The dataset will be automatically split into subsets according to the `splits` dictionary.
Note that to be used with the estimator, each dataset must be supplied as a function that constructs and returns it, rather than as a dataset object.
filelist = glob('/tmp/dsgdb9nsd/*.xyz')
# The input functions are wrapped in lambdas so that the estimator
# can construct a fresh dataset for each run
dataset = lambda: load_qm9(filelist, splits={'train': 8, 'test': 2})
train = lambda: dataset()['train'].repeat().shuffle(1000).apply(sparse_batch(100))
test = lambda: dataset()['test'].repeat().apply(sparse_batch(100))
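As an optional sanity check, you can pull one batch from the input function and inspect its contents. This assumes eager execution (the default in TensorFlow 2.x); the exact keys vary with the PiNN version, so treat the printed names as illustrative.
# Peek at a single pre-processed batch; key names depend on the PiNN version
for batch in train().take(1):
    for key, value in batch.items():
        print(key, value.shape, value.dtype)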
## Defining the model
In PiNN, models are defined at two levels: the model and the network.
- A model (model_fn) defines the targets, the loss, and the training details.
- A network defines the structure of the neural network.
In this example, we use the potential model and the PiNet network. The configuration of a model is stored in a nested dictionary, as shown below. The available options for the network and the model can be found in the documentation.
!rm -rf /tmp/PiNet_QM9
params = {
    'model_dir': '/tmp/PiNet_QM9',
    'network': {
        'name': 'PiNet',
        'params': {
            'depth': 4,
            'rc': 4.0,
            'atom_types': [1, 6, 7, 8, 9],
        },
    },
    'model': {
        'name': 'potential_model',
        'params': {
            'learning_rate': 1e-3,
        },
    },
}
model = get_model(params)
## Configuring the training process
The model defined above is in fact a tf.estimator.Estimator object, so the training process can be controlled through the standard train and eval specs:
train_spec = tf.estimator.TrainSpec(input_fn=train, max_steps=1000)
eval_spec = tf.estimator.EvalSpec(input_fn=test, steps=100)
## Train and evaluate
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)
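train_and_evaluate interleaves training with periodic evaluation. To obtain a final set of metrics on their own, the estimator can also be evaluated directly; this is a standard tf.estimator call, not PiNN-specific:
# Evaluate the trained model on 100 batches of the test set
metrics = model.evaluate(input_fn=test, steps=100)
print(metrics)
The loss curves logged during training can also be followed by pointing TensorBoard at the model_dir.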
## Using the model
The trained model can be used as an ASE calculator.
# Reuse the same parameters, pointing at the trained model's directory
params = {
    'model_dir': '/tmp/PiNet_QM9',
    'network': {
        'name': 'PiNet',
        'params': {
            'depth': 4,
            'rc': 4.0,
            'atom_types': [1, 6, 7, 8, 9],
        },
    },
    'model': {
        'name': 'potential_model',
        'params': {
            'learning_rate': 1e-3,
        },
    },
}
calc = get_calc(params)
calc.properties = ['energy']
atoms = g2['C2H4']
atoms.set_calculator(calc)  # in newer ASE versions: atoms.calc = calc
atoms.get_forces(), atoms.get_potential_energy()
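Since the calculator implements ASE's standard interface, it can drive any ASE workflow, for example a geometry relaxation. Below is a minimal sketch; the BFGS optimizer and the fmax threshold are arbitrary choices here, and the forces from this barely-trained model should not be trusted.
from ase.optimize import BFGS
# Relax C2H4 with forces from the PiNN calculator (illustrative settings only)
opt = BFGS(atoms)
opt.run(fmax=0.05, steps=50)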
## Conclusion
You have trained your first PiNN model, though the accuracy is not yet satisfactory (an RMSE of 21 Hartree!), and the training is slow, since it is limited by the IO and pre-processing of the data.
We will show in the following notebooks that:
- Proper scaling of the energy will improve the accuracy of the model.
- The training speed can be enhanced by caching and pre-processing the data.