TIPS module
The tips.nf module contains several processes supplied by the TIPS library.
convertDS
The convertDS process converts a one dataset to another. The input/output
formats are controlled by the flags channel.
Channel specification
| Element | Type | i/o | Note |
|---|---|---|---|
name |
val |
in[0] |
an id to identify the process |
input |
path |
in[1] |
input dataset |
flag |
val |
in[2] |
flags for tips convert |
name |
val |
out[0] |
same as input |
converted |
path |
out[1] |
converted dataset [converted.*] |
mergeDS
The mergeDS process merges a number of single point calculations into one.
Note that the process also expect a idx element in the input channel, which
should give an index of the corresponding single point calculation, and will be
saved into the merged.idx file.
Channel specification
| Element | Type | i/o | Note |
|---|---|---|---|
name |
val |
in[0] |
an id to identify the process |
idx |
val |
in[1] |
indices of single point simulations |
logs |
path |
in[2] |
logs from single point computations |
name |
val |
out[0] |
same as input |
idx |
path |
out[1] |
file that records the indices [merged.idx] |
merged |
path |
out[2] |
merged dataset [merged.traj] |
mixDS
The mixDS process takes two datasets, called newDS and oldDS, and two
flags newFlag and oldFlag, the datasets are first subsampled with
corresponding flags, and them merged together. This process is mainly used to
update a training set in an activated learning loop.
Channel specification
| Element | Type | i/o | Note |
|---|---|---|---|
name |
val |
in[0] |
an id to identify the process |
newDS |
path |
in[1] |
new dataset |
oldDS |
path |
in[2] |
old dataset |
newFlag |
path |
in[3] |
subsample flag for newDS |
oldFlag |
path |
in[4] |
subsample flag for oldDS |
name |
val |
out[0] |
same as input |
idx |
path |
out[1] |
merged index (merged.idx) |
checkConverge
This workflow compares a sampled trajectories to labelled data. The output geometry will be:
- The last frame of the trajectory if the trajectory is deemed converted;
- The first frame of the trajectory otherwise.
The convergence is controlled by the following parameters.
Channel specification
| Element | Type | i/o | Note |
|---|---|---|---|
name |
val |
in[0] |
an id to identify the process |
idx |
path |
in[1] |
index of labels in the trajectory |
label |
val |
in[2] |
labelled data set |
traj |
val |
in[3] |
sampled trajectory |
name |
val |
out[0] |
same as input |
geo |
path |
out[1] |
geometry |
out |
val |
out[2] |
a string of convergence information |
Parameters
| Parameter | Default | Description |
|---|---|---|
fmaxtol |
2.0 |
Max error on forces |
emaxtol |
0.02 |
Max error on energy |
frmsetol |
0.15 |
Tolerance for force RMSE |
ermsetol |
0.005 |
Tolerance for energy RMSE |
Source code
nextflow.enable.dsl=2
params.publish = "."
def space_sep(in) {(in instanceof Path) ?in :in.join(' ')}
process convert {
label 'tips'
publishDir "$params.publish/$name"
input:
tuple val(name), path(in, stageAs:'.in*/*'), val(flags)
output:
tuple val(name), path('*')
script:
"""
tips convert ${space_sep(in)} $flags
"""
}
process dsmix {
label 'tips'
publishDir "$params.publish/$name"
input: tuple val(name), path(newDS, stageAs:'*.traj'), path(oldDS, stageAs:'old/*'), val(newFlag), val(oldFlag)
output: tuple val(name), path('mix-ds.{tfr,yml}')
script:
"""
tips convert old/${oldDS[0].baseName}.yml -f pinn -o old-ds -of asetraj $oldFlag
tips convert ${space_sep(newDS)} -f asetraj -o tmp.traj -of asetraj
tips convert tmp.traj -f asetraj -o new-ds -of asetraj $newFlag
tips convert new-ds.traj old-ds.traj -f asetraj -o mix-ds -of pinn --shuffle $params.filters
rm {new-ds,old-ds,tmp}.*
"""
}
process merge {
label 'tips'
publishDir "$params.publish/$name"
input: tuple val(name), val(idx), path(in, stageAs:'.in*/*'), val(flags)
output: tuple val(name), path('merged.idx'), path('merged.traj')
script:
"""
printf "${idx.join('\\n')}" > merged.idx
tips convert ${space_sep(in)} -o merged -of asetraj $flags
"""
}
process check {
label 'tips'
publishDir "$params.publish/$name"
input:
tuple val(name), path(idx), path(logs), path(traj)
output:
tuple val(name), path('*.xyz'), stdout
script:
fmaxtol = params.fmaxtol
emaxtol = params.emaxtol
frmsetol = params.frmsetol
ermsetol = params.ermsetol
sp_points = params.sp_points
"""
#!/usr/bin/env python
import numpy as np
from ase import Atoms
from ase.io import read, write
from tips.io import load_ds
from tips.io.filter import filters2fn
filters = "$params.filters".replace("'", '').split(' ')[1::2]
filter_fn = filters2fn(filters) # ^ a crude extractor
idx = [int(i) for i in np.loadtxt("$idx")]
logs = load_ds("$logs", fmt='asetraj')
traj = load_ds("$traj", fmt='asetraj')
idx, logs = tuple(zip(*(
(i, datum) for (i, datum) in zip(idx, logs) if filter_fn(datum))))
e_label = np.array([datum['energy']/len(datum['elem']) for datum in logs])
f_label = np.array([datum['force'] for datum in logs])
e_pred = np.array([traj[i]['energy']/len(traj[i]['elem']) for i in idx])
f_pred = np.array([traj[i]['force'] for i in idx])
ecnt = np.sum(np.abs(e_pred-e_label)>$emaxtol)
fcnt = np.sum(np.any(np.abs(f_pred-f_label)>$fmaxtol,axis=(1,2)))
emax = np.max(np.abs(e_pred-e_label))
fmax = np.max(np.abs(f_pred-f_label))
ermse = np.sqrt(np.mean((e_pred-e_label)**2))
frmse = np.sqrt(np.mean((f_pred-f_label)**2))
converged = (emax<$emaxtol) and (fmax<$fmaxtol) and (ermse<$ermsetol) and (frmse<$frmsetol) and (len(idx)==$sp_points)
geoname = "$name".split('/')[1]
if converged:
msg = f'Converged; will restart from latest frame.'
new_geo = logs[np.argmax(idx)]
else:
msg = f'energy: {ecnt}/{len(idx)} failed, max={emax:.2f} rmse={ermse:.2f}; '\
f'force: {fcnt}/{len(idx)} failed, max={fmax:.2f} rmse={frmse:.2f}.'
new_geo = logs[np.argmin(idx)]
atoms = Atoms(new_geo['elem'], positions=new_geo['coord'], cell=new_geo['cell'],
pbc=True)
write(f'{geoname}.xyz', atoms)
print(msg)
"""
}