TIPS module
The tips.nf module contains several processes supplied by the TIPS library.
convertDS
The convertDS process converts one dataset to another. The input and output
formats are controlled by the flag element of the input channel.
Channel specification
Element | Type | i/o | Note |
---|---|---|---|
name | val | in[0] | an id to identify the process |
input | path | in[1] | input dataset |
flag | val | in[2] | flags for tips convert |
name | val | out[0] | same as input |
converted | path | out[1] | converted dataset [converted.*] |
mergeDS
The mergeDS process merges a number of single point calculations into one
dataset. Note that the process also expects an idx element in the input channel,
which gives the index of each corresponding single point calculation and is
saved into the merged.idx file.
Channel specification
Element | Type | i/o | Note |
---|---|---|---|
name | val | in[0] | an id to identify the process |
idx | val | in[1] | indices of single point simulations |
logs | path | in[2] | logs from single point computations |
flag | val | in[3] | flags for tips convert |
name | val | out[0] | same as input |
idx | path | out[1] | file that records the indices [merged.idx] |
merged | path | out[2] | merged dataset [merged.traj] |
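The source listing writes the indices to merged.idx one per line and later reads them back as integers. The following is a minimal Python sketch of that round trip, assuming this plain-text layout; `write_idx` and `read_idx` are hypothetical helper names and not part of TIPS.

```python
import tempfile
from pathlib import Path


def write_idx(path, idx):
    """Write one index per line, without a trailing newline."""
    Path(path).write_text("\n".join(str(i) for i in idx))


def read_idx(path):
    """Read the indices back as a list of integers."""
    return [int(token) for token in Path(path).read_text().split()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "merged.idx"
        write_idx(target, [0, 50, 100])
        # Entry k of merged.idx belongs to frame k of the merged dataset.
        print(read_idx(target))
```

Keeping the index file aligned with the merged dataset is what lets a later step look up the matching frame in the sampled trajectory.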
mixDS
The mixDS process takes two datasets, newDS and oldDS, and two flags, newFlag
and oldFlag. The datasets are first subsampled with their corresponding flags
and then merged together. This process is mainly used to update a training set
in an active learning loop.
Channel specification
Element | Type | i/o | Note |
---|---|---|---|
name | val | in[0] | an id to identify the process |
newDS | path | in[1] | new dataset |
oldDS | path | in[2] | old dataset |
newFlag | val | in[3] | subsample flag for newDS |
oldFlag | val | in[4] | subsample flag for oldDS |
name | val | out[0] | same as input |
mixed | path | out[1] | mixed dataset [mix-ds.{tfr,yml}] |
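The order of operations can be sketched by assembling the four tips convert calls that the dsmix listing in the Source code section runs. This is only an illustration in Python: `mix_commands` is a hypothetical helper, and the flag values are placeholders because the available subsample flags are not listed here.

```python
def mix_commands(new_files, old_name, new_flag, old_flag, filters=""):
    """Return the tips convert calls in the order the dsmix listing runs them."""
    commands = [
        # 1. Subsample the old dataset (PiNN format) into an ASE trajectory.
        f"tips convert old/{old_name}.yml -f pinn -o old-ds -of asetraj {old_flag}",
        # 2. Concatenate the new trajectories into one temporary file.
        f"tips convert {' '.join(new_files)} -f asetraj -o tmp.traj -of asetraj",
        # 3. Subsample the concatenated new data.
        f"tips convert tmp.traj -f asetraj -o new-ds -of asetraj {new_flag}",
        # 4. Merge both subsamples, shuffle, and write the mixed dataset.
        f"tips convert new-ds.traj old-ds.traj -f asetraj -o mix-ds -of pinn --shuffle {filters}",
    ]
    return [command.strip() for command in commands]


if __name__ == "__main__":
    for line in mix_commands(["a.traj", "b.traj"], "train", "<newFlag>", "<oldFlag>"):
        print(line)
```

The sketch only builds the command strings; it does not run TIPS.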
checkConverge
This process compares a sampled trajectory to labelled data. The output geometry will be:
- The last frame of the trajectory if the trajectory is deemed converged;
- The first frame of the trajectory otherwise.
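In the source listing, this choice is made among the labelled frames: the one with the largest trajectory index is used when converged, and the one with the smallest index otherwise. A minimal Python sketch of that rule, where `select_restart_frame` is a hypothetical name used only for illustration:

```python
def select_restart_frame(idx, frames, converged):
    """Return the frame with the largest index if converged, else the smallest."""
    if not idx or len(idx) != len(frames):
        raise ValueError("idx and frames must be non-empty and of equal length")
    pick = max if converged else min
    # Position of the extreme index, like argmax/argmin over idx.
    position = pick(range(len(idx)), key=idx.__getitem__)
    return frames[position]


if __name__ == "__main__":
    idx = [10, 0, 20]
    frames = ["frame@10", "frame@0", "frame@20"]
    print(select_restart_frame(idx, frames, converged=True))
    print(select_restart_frame(idx, frames, converged=False))
```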
Convergence is controlled by the parameters listed under Parameters below.
Channel specification
Element | Type | i/o | Note |
---|---|---|---|
name | val | in[0] | an id to identify the process |
idx | path | in[1] | index of labels in the trajectory |
label | path | in[2] | labelled data set |
traj | path | in[3] | sampled trajectory |
name | val | out[0] | same as input |
geo | path | out[1] | geometry |
out | val | out[2] | a string of convergence information |
Parameters
Parameter | Default | Description |
---|---|---|
fmaxtol | 2.0 | Max error on forces |
emaxtol | 0.02 | Max error on energy |
frmsetol | 0.15 | Tolerance for force RMSE |
ermsetol | 0.005 | Tolerance for energy RMSE |
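The four tolerances combine as a strict "all below threshold" test. The following pure-Python sketch mirrors that logic with the defaults from the table; it assumes per-atom energies and flattened force components as plain lists. `error_stats` and `is_converged` are hypothetical names, and the additional check on the number of labelled points (params.sp_points in the source listing) is left out.

```python
import math

# Default tolerances from the Parameters table.
DEFAULTS = {"fmaxtol": 2.0, "emaxtol": 0.02, "frmsetol": 0.15, "ermsetol": 0.005}


def error_stats(pred, label):
    """Return (max absolute error, RMSE) for two equal-length sequences."""
    diffs = [p - l for p, l in zip(pred, label)]
    max_err = max(abs(d) for d in diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return max_err, rmse


def is_converged(e_pred, e_label, f_pred, f_label, tol=DEFAULTS):
    """True only if all four error measures are below their tolerances."""
    emax, ermse = error_stats(e_pred, e_label)
    fmax, frmse = error_stats(f_pred, f_label)
    return (emax < tol["emaxtol"] and fmax < tol["fmaxtol"]
            and ermse < tol["ermsetol"] and frmse < tol["frmsetol"])


if __name__ == "__main__":
    e_label, f_label = [-1.000, -1.002], [0.0, 0.1, -0.1]
    # Small deviations on both energy and force: passes every test.
    print(is_converged([-1.001, -1.004], e_label, [0.05, 0.0, -0.2], f_label))
    # One energy off by 0.05: exceeds emaxtol, so not converged.
    print(is_converged([-1.050, -1.002], e_label, [0.05, 0.0, -0.2], f_label))
```

A single outlier is enough to fail the max-error tests even when the RMSE is small, which is why both kinds of tolerance are checked.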
Source code
nextflow.enable.dsl=2
params.publish = "."
def space_sep(in) { (in instanceof Path) ? in : in.join(' ') }
process convert {
    label 'tips'
    publishDir "$params.publish/$name"

    input:
    tuple val(name), path(in, stageAs:'.in*/*'), val(flags)

    output:
    tuple val(name), path('*')

    script:
    """
    tips convert ${space_sep(in)} $flags
    """
}
process dsmix {
    label 'tips'
    publishDir "$params.publish/$name"

    input: tuple val(name), path(newDS, stageAs:'*.traj'), path(oldDS, stageAs:'old/*'), val(newFlag), val(oldFlag)
    output: tuple val(name), path('mix-ds.{tfr,yml}')

    script:
    """
    tips convert old/${oldDS[0].baseName}.yml -f pinn -o old-ds -of asetraj $oldFlag
    tips convert ${space_sep(newDS)} -f asetraj -o tmp.traj -of asetraj
    tips convert tmp.traj -f asetraj -o new-ds -of asetraj $newFlag
    tips convert new-ds.traj old-ds.traj -f asetraj -o mix-ds -of pinn --shuffle $params.filters
    rm {new-ds,old-ds,tmp}.*
    """
}
process merge {
    label 'tips'
    publishDir "$params.publish/$name"

    input: tuple val(name), val(idx), path(in, stageAs:'.in*/*'), val(flags)
    output: tuple val(name), path('merged.idx'), path('merged.traj')

    script:
    """
    printf "${idx.join('\\n')}" > merged.idx
    tips convert ${space_sep(in)} -o merged -of asetraj $flags
    """
}
process check {
    label 'tips'
    publishDir "$params.publish/$name"

    input:
    tuple val(name), path(idx), path(logs), path(traj)

    output:
    tuple val(name), path('*.xyz'), stdout

    script:
    fmaxtol = params.fmaxtol
    emaxtol = params.emaxtol
    frmsetol = params.frmsetol
    ermsetol = params.ermsetol
    sp_points = params.sp_points
    """
    #!/usr/bin/env python
    import numpy as np
    from ase import Atoms
    from ase.io import read, write
    from tips.io import load_ds
    from tips.io.filter import filters2fn

    filters = "$params.filters".replace("'", '').split(' ')[1::2]
    filter_fn = filters2fn(filters) # ^ a crude extractor

    idx = [int(i) for i in np.loadtxt("$idx")]
    logs = load_ds("$logs", fmt='asetraj')
    traj = load_ds("$traj", fmt='asetraj')

    # Keep only the labelled frames that pass the filters.
    idx, logs = tuple(zip(*(
        (i, datum) for (i, datum) in zip(idx, logs) if filter_fn(datum))))

    # Per-atom energies and forces: labels vs. predictions at the same frames.
    e_label = np.array([datum['energy']/len(datum['elem']) for datum in logs])
    f_label = np.array([datum['force'] for datum in logs])
    e_pred = np.array([traj[i]['energy']/len(traj[i]['elem']) for i in idx])
    f_pred = np.array([traj[i]['force'] for i in idx])

    # Number of frames above the max tolerances, then max errors and RMSEs.
    ecnt = np.sum(np.abs(e_pred-e_label)>$emaxtol)
    fcnt = np.sum(np.any(np.abs(f_pred-f_label)>$fmaxtol,axis=(1,2)))
    emax = np.max(np.abs(e_pred-e_label))
    fmax = np.max(np.abs(f_pred-f_label))
    ermse = np.sqrt(np.mean((e_pred-e_label)**2))
    frmse = np.sqrt(np.mean((f_pred-f_label)**2))

    converged = (emax<$emaxtol) and (fmax<$fmaxtol) and (ermse<$ermsetol) and (frmse<$frmsetol) and (len(idx)==$sp_points)

    # Restart from the labelled frame with the largest index if converged,
    # otherwise from the one with the smallest index.
    geoname = "$name".split('/')[1]
    if converged:
        msg = f'Converged; will restart from latest frame.'
        new_geo = logs[np.argmax(idx)]
    else:
        msg = f'energy: {ecnt}/{len(idx)} failed, max={emax:.2f} rmse={ermse:.2f}; '\
              f'force: {fcnt}/{len(idx)} failed, max={fmax:.2f} rmse={frmse:.2f}.'
        new_geo = logs[np.argmin(idx)]

    atoms = Atoms(new_geo['elem'], positions=new_geo['coord'], cell=new_geo['cell'],
                  pbc=True)
    write(f'{geoname}.xyz', atoms)
    print(msg)
    """
}