Debugging PiNN layers and networks

!pip install tensorflow==2.9
!pip install git+https://github.com/Teoroo-CMC/PiNN
!wget -nv -nc https://raw.githubusercontent.com/Teoroo-CMC/PiNN_lab/master/resources/qm9_train.{yml,tfr}

Loading data

For the purpose of testing we download a subset of the QM9 dataset used in PiNN_lab.

from pinn.io import load_tfrecord, sparse_batch

dataset = load_tfrecord("qm9_train.yml").apply(sparse_batch(10))
for datum in dataset:
    print({k: v.shape for k, v in datum.items()})
    break

{'A': TensorShape([10]), 'B': TensorShape([10]), 'C': TensorShape([10]), 'Cv': TensorShape([10]), 'G': TensorShape([10]), 'H': TensorShape([10]), 'U': TensorShape([10]), 'U0': TensorShape([10]), 'alpha': TensorShape([10]), 'coord': TensorShape([176, 3]), 'elems': TensorShape([176]), 'gap': TensorShape([10]), 'homo': TensorShape([10]), 'lumo': TensorShape([10]), 'mu': TensorShape([10]), 'r2': TensorShape([10]), 'zpve': TensorShape([10]), 'ind_1': TensorShape([176, 1])}

Using PiNN Layers

PiNN layers and networks are Keras Layers and Models, respectively.

To use a layer, create an instance of it; the resulting object can then be called like a function. Each layer is initialized with different parameters and requires different input tensors; see the individual documentation for details.

from pinn.layers import CellListNL
from pinn.networks.pinet import PILayer, PiNet

nl = CellListNL(rc=5)

for datum in dataset:
    nl(datum)
    break
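To picture what a neighbor list layer with a cutoff such as rc=5 works towards, the sketch below finds atom pairs closer than a cutoff with a brute-force distance matrix in NumPy. It is only an illustration: the function name brute_force_pairs and the toy coordinates are made up here, ind_1 is flattened to one structure index per atom, periodic cells are ignored, and the cell-list algorithm and output format of CellListNL itself are not reproduced.

```python
import numpy as np


def brute_force_pairs(coord, ind_1, rc):
    """Hypothetical helper: (i, j) pairs with i != j, same structure, distance < rc."""
    diff = coord[:, None, :] - coord[None, :, :]      # [n_atoms, n_atoms, 3]
    dist = np.linalg.norm(diff, axis=-1)              # [n_atoms, n_atoms]
    same_structure = ind_1[:, None] == ind_1[None, :]
    not_self = ~np.eye(len(coord), dtype=bool)
    return np.argwhere((dist < rc) & same_structure & not_self)


# Two toy structures in one sparse batch: atoms 0-1 and atoms 2-3.
coord = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.5],
                  [9.0, 0.0, 0.0]])
ind_1 = np.array([0, 0, 1, 1])

pairs = brute_force_pairs(coord, ind_1, rc=5.0)
print(pairs)
```

Atoms 0 and 2 are close in space but belong to different structures, so they are not paired; atoms 2 and 3 share a structure but lie outside the cutoff.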

The definition of a layer needs three parts:

__init__ defines the layer object;
build creates the necessary variables or sub-layers;
call defines how the input tensors are processed.

The build() method is called only once, when the layer is used for the first time (e.g. in a loop). See below for an example: the definition of PILayer.
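Before reading the PILayer source, a toy layer may help make the three parts concrete. The sketch below is not part of PiNN: ScaleLayer, init_value, and the build_calls counter are hypothetical names introduced only to show that build() runs on the first call and is skipped afterwards.

```python
import tensorflow as tf


class ScaleLayer(tf.keras.layers.Layer):
    """Hypothetical toy layer: scales the last axis by a trainable vector."""

    def __init__(self, init_value=1.0, **kwargs):
        # __init__ only stores configuration; no variables exist yet.
        super().__init__(**kwargs)
        self.init_value = init_value
        self.build_calls = 0

    def build(self, input_shape):
        # build sees the input shape and creates the variables that depend on it.
        self.build_calls += 1
        self.scale = self.add_weight(
            name="scale",
            shape=(int(input_shape[-1]),),
            initializer=tf.keras.initializers.Constant(self.init_value),
        )

    def call(self, inputs):
        # call defines how the input tensor is processed.
        return inputs * self.scale


layer = ScaleLayer(init_value=2.0)
for _ in range(3):
    out = layer(tf.ones((4, 3)))

print(layer.build_calls, out.shape)
```

The layer is called three times in the loop, yet the counter shows that build() ran only for the first call.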

??PILayer

Init signature: PILayer(*args, **kwargs)
Source:        
class PILayer(tf.keras.layers.Layer):
    """PiNN style interaction layer

    Args:
        n_nodes: number of nodes to use
            Note that the last element of n_nodes specifies the dimention of
            the fully connected network before applying the basis function.
            Dimension of the last node is [pairs*n_nodes[-1]*n_basis], the
            output is then summed with the basis to form the interaction nodes
        **kwargs: keyword arguments will be parsed to the feed forward layers
    """
    def __init__(self, n_nodes=[64], **kwargs):
        super(PILayer, self).__init__()
        self.n_nodes = n_nodes
        self.kwargs = kwargs

    def build(self, shapes):
        self.n_basis = shapes[2][-1]
        n_nodes_iter = self.n_nodes.copy()
        n_nodes_iter[-1] *= self.n_basis
        self.ff_layer = FFLayer(n_nodes_iter, **self.kwargs)

    def call(self, tensors):
        ind_2, prop, basis = tensors
        ind_i = ind_2[:, 0]
        ind_j = ind_2[:, 1]
        prop_i = tf.gather(prop, ind_i)
        prop_j = tf.gather(prop, ind_j)

        inter = tf.concat([prop_i, prop_j], axis=-1)
        inter = self.ff_layer(inter)
        inter = tf.reshape(inter, tf.concat(
            [tf.shape(inter)[:-1], [self.n_nodes[-1], self.n_basis]], 0))
        inter = tf.reduce_sum(inter*basis, axis=-1)
        return inter
File:           ~/code/PiNN/pinn/networks/pinet.py
Type:           type
Subclasses:
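The last three statements of call() can be hard to follow from the shapes alone. The NumPy sketch below replays them on random data, assuming the basis tensor has shape [pairs, 1, n_basis] so that it broadcasts against the reshaped feed-forward output; that shape, the sizes, and the variable names are assumptions made for illustration.

```python
import numpy as np

n_pairs, n_out, n_basis = 4, 2, 3          # n_out plays the role of n_nodes[-1]
rng = np.random.default_rng(0)

# Stand-in for the output of the feed-forward layer: one row per pair,
# with n_out * n_basis features (build() multiplied the last width by n_basis).
ff_out = rng.normal(size=(n_pairs, n_out * n_basis))
basis = rng.normal(size=(n_pairs, 1, n_basis))   # assumed shape

# Same steps as in call(): reshape, weight by the basis, sum over the basis axis.
inter = ff_out.reshape(n_pairs, n_out, n_basis)
inter = (inter * basis).sum(axis=-1)             # [n_pairs, n_out]

# The same contraction written as an einsum, for comparison.
check = np.einsum("pob,pb->po",
                  ff_out.reshape(n_pairs, n_out, n_basis), basis[:, 0, :])
print(inter.shape, np.allclose(inter, check))
```

Each pair thus ends up with n_nodes[-1] interaction features, each one a basis-weighted sum of n_basis feed-forward outputs.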

Using PiNN Networks

Networks (Keras Models) are defined similarly, but they can be used directly to perform regression tasks.

By default, a network produces per-atom predictions. This can be changed with the out_pool parameter to obtain simple per-structure predictions. In that case, the network object can be used to perform training directly.
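As a rough picture of what pooling per-atom outputs into per-structure outputs means, the NumPy sketch below takes the minimum of per-atom values within each structure, using a flattened ind_1 as the structure index of each atom. The numbers are invented, and this only mirrors the idea behind an option such as out_pool='min'; how PiNet implements the pooling internally is not shown here.

```python
import numpy as np

# Invented per-atom predictions for a batch of two structures (3 + 2 atoms).
per_atom = np.array([0.3, -0.1, 0.7, 0.2, 0.5])
ind_1 = np.array([0, 0, 0, 1, 1])        # structure index of each atom
n_struct = int(ind_1.max()) + 1

# Segment-wise minimum: start from +inf and fold every atom into its structure.
pooled = np.full(n_struct, np.inf)
np.minimum.at(pooled, ind_1, per_atom)

print(pooled)
```

After pooling there is one value per structure, which matches the shape of per-structure labels such as 'lumo' and lets a standard Keras loss be applied.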

def label_data(data):
    # defines the label to train on
    x = data
    y = data['lumo']
    return x, y

train = dataset.map(label_data)
pinet = PiNet(out_pool='min')
pinet.compile(optimizer='Adam', loss='MAE')
pinet.fit(train, epochs=3)

Epoch 1/3

/home/yunqi/.miniconda/envs/pinn-tf2/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:444: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("Adam/gradients/concat_1:0", shape=(None,), dtype=int32), values=Tensor("Adam/gradients/concat:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradient_tape/pi_net_1/gc_block_7/pi_layer_7/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/yunqi/.miniconda/envs/pinn-tf2/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:444: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("Adam/gradients/concat_3:0", shape=(None,), dtype=int32), values=Tensor("Adam/gradients/concat_2:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradient_tape/pi_net_1/gc_block_6/pi_layer_6/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
/home/yunqi/.miniconda/envs/pinn-tf2/lib/python3.9/site-packages/tensorflow/python/framework/indexed_slices.py:444: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("Adam/gradients/concat_5:0", shape=(None,), dtype=int32), values=Tensor("Adam/gradients/concat_4:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradient_tape/pi_net_1/gc_block_5/pi_layer_5/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(

2000/2000 [==============================] - 37s 17ms/step - loss: 0.0551
Epoch 2/3
2000/2000 [==============================] - 35s 18ms/step - loss: 0.0285
Epoch 3/3
2000/2000 [==============================] - 35s 17ms/step - loss: 0.0240

<keras.callbacks.History at 0x7f17c86591f0>

Further benchmarks

For more advanced usage, it is recommended to use the Model API to define the training loss and derived predicates. For training potential energy surfaces, it is recommended to use pinn.models.potential_model in combination with the command line interface (CLI).

Alternatively, see the Training Tips notebook for how to run the training interactively in a notebook.