PiNN workflows

pinnTrain

The pinnTrain process runs a training given a dataset and a PiNN input file. The input file can optionally be a model folder in such case the process contines training from the last checkpoint available. For list of flags available, see the PiNN documentation.

Input Channels

Channel	Type	i/o	Note
`name`	`val`	`in[0]`	an id to identify the process
`dataset`	`file`	`in[1]`	a dataset recognizable by PiNN
`input`	`file`	`in[2]`	a PiNN `.yml` input file
`flag`	`val`	`in[3]`	flags for `pinn train`
`name`	`val`	`out.*[0]`	same as input
`model`	`file`	`out.model[1]`	Trained PiNN model
`log`	`file`	`out.log[2]`	PiNN log (the evaluation errors)

pinnMD

The pinnMD process takes a trained model and runs a MD trajecotry, with some limited options regarding the dynamics. The process suppors PiNN models only for now. A list of models can be supplied to run an ensemble MD. For more complex processes, consider writing a customized MD process and include it in your workflow.

Input Channels

Channel	Type	i/o	Note
`name`	`val`	`in[0]`	an id to identify the process
`model`	`file`	`in[1]`	trained PiNN model, can be a list of models
`init`	`file`	`in[2]`	initial geometry, in any ASE recognizable format
`flags`	`val`	`in[3]`	a string specifying the MD simulation see options below
`name`	`val`	`out.*[0]`	same as input
`traj`	`file`	`out.traj[1]`	output trajectory
`log`	`file`	`out.log[1]`	MD log

Options

The flags for pinnMD is specified in the form of --option1 val1 --option2 val, the available options and default valeus are listed below.

Option	Default	Note
`--ensemble`	`'nvt'`	`'npt'` or `'nvt'`
`--T`	`273`	Temperature in K
`--t`	`100`	Time in ps
`--dt`	`0.5`	Time step in fs
`--taut`	`100`	Damping factor for thermostat in steps
`--taup`	`1000`	Damping factor for barostat in steps
`--log-every`	`5`	Log interval in steps
`--pressure`	`1`	pressure in bar
`--compressibility`	`4.57e-5`	compressibility in bar

Source code

nextflow.enable.dsl=2

params.publish = 'pinn'

process train {
  label 'pinn'
  publishDir "$params.publish/$name"

  input:
    tuple val(name), path(dataset), path(input, stageAs:'input'), val(flags)

  output:
    tuple val(name), path('model', type:'dir'), emit: model
    tuple val(name), path('pinn.log'), emit: log

  script:
    convert_flag = "${(flags =~ /--seed[\s,\=]\d+/)[0]}"
    train_flags = "${flags.replaceAll(/--seed[\s,\=]\d+/, '')}"
    dataset = (dataset instanceof Path) ? dataset : dataset[0].baseName+'.yml'
    """
    #!/bin/bash

    pinn convert $dataset -o 'train:9,eval:1' $convert_flag

    if [ ! -f $input/params.yml ];  then
        mkdir -p model; cp $input model/params.yml
    else
        cp -rL $input model
    fi
    pinn train model/params.yml --model-dir='model'\
        --train-ds='train.yml' --eval-ds='eval.yml'\
        $train_flags
    pinn log model/eval > pinn.log
    """
}

process md {
  label 'pinn'
  publishDir "$params.publish/$name"

  input:
    tuple val(name), path(model,stageAs:'model*'), path(init, stageAs:'init*'), val(flags)

  output:
    tuple val(name), path('asemd.traj'), emit: traj
    tuple val(name), path('asemd.log'), emit: log

  script:
    """
    #!/usr/bin/env python
    import re
    import pinn
    import tensorflow as tf
    from ase import units
    from ase.io import read
    from ase.io.trajectory import Trajectory
    from ase.md import MDLogger
    from ase.md.velocitydistribution import MaxwellBoltzmannDistribution
    from ase.md.nptberendsen import NPTBerendsen
    from ase.md.nvtberendsen import NVTBerendsen
    from tips.bias import EnsembleBiasedCalculator

    # ------------ patch ase properties to write extra cols --------------------
    from ase.calculators.calculator import all_properties
    all_properties+=[f'{prop}_{extra}' for prop in ['energy', 'forces', 'stress'] for extra in ['avg','std','bias']]
    # --------------------------------------------------------------------------

    setup = {
      'ensemble': 'nvt', # ensemble
      'T': 340, # temperature in K
      't': 50, # time in ps
      'dt': 0.5, # timestep is fs
      'taut': 100, # thermostat damping in steps
      'taup': 1000, # barastat dampling in steps
      'log-every': 20, # log interval in steps
      'pressure': 1, # pressure in bar
      'compressibility': 4.57e-4, # compressibility in bar^{-1}
      'bias': None,
      'kb': 0,
      'sigma0': 0,
    }

    flags = {
      k: v for k,v in
        re.findall('--(.*?)[\\s,\\=]([^\\s]*)', "$flags")
    }
    setup.update(flags)
    ensemble=setup['ensemble']
    T=float(setup['T'])
    t=float(setup['t'])*units.fs*1e3
    dt=float(setup['dt'])*units.fs
    taut=int(setup['taut'])
    taup=int(setup['taup'])
    every=int(setup['log-every'])
    pressure=float(setup['pressure'])
    compressibility=float(setup['compressibility'])

    ${(model instanceof Path) ?
    "calc = pinn.get_calc('$model')" :
    """
    models = ["${model.join('", "')}"]
    calcs = [pinn.get_calc(model) for model in models]
    if len(calcs) == 1:
        calc =  calcs[0]
    else:
        calc = EnsembleBiasedCalculator(calcs,
                                        bias=setup['bias'],
                                        kb=float(setup['kb']),
                                        sigma0=float(setup['sigma0']))
    """}

    atoms = read("$init")
    atoms.set_calculator(calc)
    if not atoms.has('momenta'):
        MaxwellBoltzmannDistribution(atoms, T*units.kB)

    if ensemble == 'npt':
        dyn = NPTBerendsen(atoms, timestep=dt, temperature=T, pressure=pressure,
                      taut=dt * taut, taup=dt * taup, compressibility=compressibility)
    if ensemble == 'nvt':
        dyn = NVTBerendsen(atoms, timestep=dt, temperature=T, taut=dt * taut)

    dyn.attach(
        MDLogger(dyn, atoms, 'asemd.log',stress=True, mode="w"),
        interval=int(every))
    dyn.attach(
        Trajectory('asemd.traj', 'w', atoms).write,
        interval=int(every))
    dyn.run(int(t/dt))
    """
}