Finite-temperature simulations

Finite-temperature properties are typically obtained by performing Markov chain Monte Carlo (MCMC) simulations in the desired thermodynamic ensemble.

The mcmc command of the ece module runs MC simulations in the desired thermodynamic ensemble. Arguments can be given directly via the command line interface (CLI) or via a YAML settings file. When an argument is given both via the CLI and in the YAML settings file, the value in the YAML settings file takes priority. The accepted arguments are summarized in the table below.

Argument	Shortcut	Type	Help	Default	Notes
`model`	`m`	`str`	Path to the model to use.	model.pth
`device`		`str`	Device to be used for PyTorch operations.	cpu
`cpu`		`store_true`	Use cpu device for PyTorch operations.		Shortcut for `--device cpu`
`cuda`		`store_true`	Use cuda device for PyTorch operations.		Shortcut for `--device cuda`
`ensemble`	`e`	`str`	Name of the MCEnsemble object available in ‘pyece.montecarlo.mcmc’.	Canonical	Choose between `Canonical` and `SemiGrandCanonical`.
`increment`	`i`	`str`	List of string dictionaries that specifies the evolution of the fixed variables.		The dictionaries contain 4 keys: ‘mode’, ‘number’, ‘initial’, and ‘final’.
`config`	`c`	`str`	Path to the file with the initial configuration.		Typically the path to a VASP POSCAR file of the supercell.
`supercell`	`C`	`str`	String matrix of the supercell transformation of the primitive cell.		The composition can be given either in the `--composition` argument or within the `--increment` argument.
`composition`	`x`	`str`	String list with the concentration of each species, sorted as in `Prim`.		Sites are randomly assigned a species so that the requested concentration is satisfied.
`background_specie`	`b`	`str`	Name of the background species for the chemical potentials.		It corresponds to the species for which the chemical potential is not fixed.
`chemical_transformation`		`str`	String matrix that transforms the true chemical potentials to the modified chemical potentials.		The matrix is given in row-wise convention.
`properties`	`q`	`str`	String dictionary with the names of the properties to compute as first-level keys and the settings specific to each property in inner dictionaries.		The properties available so far can be found in `properties`.
`precision`	`y`	`str`	String dictionary that specifies the precision that one wants to achieve.		The keys must correspond to either fluctuating variables or properties in `--properties`.
`confidence`	`a`	`float`	Level of confidence for the construction of the confidence interval.	0.95
`n_uncorrelated`	`n`	`int`	Minimum number of uncorrelated samples to compute in each MC simulation.	50
`min`		`int`	Minimum number of samples to compute before stopping the simulation.	100
`max`		`int`	Maximum number of samples to compute before stopping the simulation.	5000
`sampling_frequency`	`f`	`int`	Number of MCMC steps to wait before collecting the state as a sample.		If not set, the frequency is set to the number of sites in the configuration.
`n_changes`		`int`	Number of occupational changes to perform at each MCMC step.	1
`settings`	`s`	`str`	Path to a YAML file containing the settings to perform the MCMC simulation.
`path`	`p`	`str`	Path to save the results of the MC run in the JSON format.	results.json
`path_to_configs`		`str`	Path to save the final configurations for fixed state of the MC run in the JSON format.
`raw`	`r`	`str`	Path to save the raw data of the MC run in gz-compressed tar file format.		Be careful, as the files might be large.
`verbose`	`v`	`store_true`	Print information about the model.	False

Tip

The command can be used in the command line interface through the ece module. Internally, the module calls the run_mcmc() function. The latter also accepts a dictionary with the same arguments as keys and can be called from a Python script.
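As a minimal sketch of the dictionary form, the settings below use the argument names of the table as keys. The file names are placeholders, and the import location of run_mcmc() is not given on this page, so the call is left commented out and the path shown there is only an assumption.

```python
# Settings dictionary whose keys are the CLI argument names from the table.
settings = {
    "model": "model.pth",
    "device": "cpu",
    "ensemble": "Canonical",
    "config": "POSCAR",  # placeholder path to the initial configuration
    "increment": [
        {
            "mode": "linear",
            "number": 3,
            "initial": {"temperature": 1500},
            "final": {"temperature": 500},
        }
    ],
    "n_uncorrelated": 50,
    "min": 100,
    "max": 5000,
    "path": "results.json",
}

# Assumption: the import path below is a guess, check the package for the real one.
# from pyece.montecarlo.mcmc import run_mcmc
# run_mcmc(settings)
```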

mcmc Arguments

model

It specifies the path to the model to use for the MC simulations. PyeCE eCE models are saved either in the default PyTorch format (typically with a .pt or .pth extension) or, if the model is not built using the from_settings building attribute, as a gz-compressed tarfile (with a .tar.gz extension).

device

It indicates the PyTorch device to be used during PyTorch operations.

cpu

This argument indicates that the CPU device must be used during PyTorch operations. It is a shortcut for the --device cpu argument.

cuda

This argument indicates that the CUDA device must be used during PyTorch operations. It is a shortcut for the --device cuda argument.

ensemble

It specifies the name of the MCEnsemble object available in mcmc.

So far, 2 thermodynamic ensembles are implemented:

Canonical: this ensemble corresponds to a situation with fixed temperature and fixed number of atoms for each species. The partition function reads as:

\[Z = \sum_{\sigma} \exp\Big(-\beta E(\sigma) \Big)\]
Semi-Grand Canonical: this ensemble corresponds to a situation with fixed temperature, fixed total number of atoms, and fixed modified chemical potentials (\(\tilde{\mu}\)); the number of atoms of each species fluctuates. The partition function reads as:

\[Z = \sum_{\sigma} \exp\Big(-\beta \Big[ E(\sigma) - \sum_i \tilde{\mu}_i N_i(\sigma) \Big] \Big)\]
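The two partition functions differ only in the exponent, and the same difference appears in a Metropolis acceptance rule. The sketch below illustrates this; it is not the PyeCE implementation. The function name is hypothetical, and energies and chemical potentials are assumed to be in eV with the temperature in K.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def acceptance_probability(delta_energy, temperature, delta_n=None, mu_tilde=None):
    """Metropolis acceptance probability min(1, exp(-beta * delta_phi)).

    Canonical:            delta_phi = delta_energy
    Semi-grand canonical: delta_phi = delta_energy - sum_i mu_tilde_i * delta_n_i
    """
    delta_phi = delta_energy
    if delta_n is not None and mu_tilde is not None:
        delta_phi -= sum(mu * dn for mu, dn in zip(mu_tilde, delta_n))
    if delta_phi <= 0.0:
        return 1.0
    beta = 1.0 / (K_B * temperature)
    return math.exp(-beta * delta_phi)

# Same energy change, with and without a chemical potential term.
p_canonical = acceptance_probability(0.05, 1000.0)
p_sgc = acceptance_probability(0.05, 1000.0, delta_n=[1, 0], mu_tilde=[0.05, 0.0])
```

In the second call the chemical potential term compensates the energy cost of the change, so the move is always accepted.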

increment

This argument specifies how the variables fixing the state vary along the simulation (i.e., the temperature and the composition for the canonical ensemble, and the temperature and the modified chemical potentials for the semi-grand canonical ensemble). The path is given as a list of dictionaries. Each dictionary contains the following 4 keys:

mode: Type of spacing between the initial and the final states. It can be either linear or logarithmic.
number: Number of steps between the initial and the final states (both included).
initial: Dictionary that specifies the initial state. The dictionary contains keys corresponding to the name of the fixed variables characterizing the thermodynamic ensemble.
final: Dictionary that specifies the final state. The dictionary contains keys corresponding to the name of the fixed variables characterizing the thermodynamic ensemble.

When the list is composed of several dictionaries, the simulation follows the paths described in each dictionary one after the other. A variable left unspecified is assumed constant and keeps its previous value.

Note

The logarithmic mode is primarily meant for varying the temperature from a high temperature to a lower temperature in cooldown simulations.

Note

If the composition is specified, the sites are randomly occupied according to the composition. If the composition is kept constant, it can be specified either through the --config argument or through the --composition argument.

Below is an example of a path composed first of a 5-step logarithmic cooldown from 10’000 K to 1’000 K at fixed composition, followed by a 5-step change in composition performed at constant temperature.

increment:
     - mode: "logarithmic"
       number: 5
       initial:
          temperature: 10000
          composition: [1, 1, 0, 0]
       final:
          temperature: 1000
          composition: [1, 1, 0, 0]
     - mode: "linear"
       number: 5
       initial:
          composition: [1, 1, 0, 0]
       final:
          composition: [0, 0, 1, 1]

which yields the following path:

1.  Temperature: 10000.0
    Composition: [0.500 0.500 0.0   0.0  ]
2.  Temperature:  5623.4
    Composition: [0.500 0.500 0.0   0.0  ]
3.  Temperature:  3162.3
    Composition: [0.500 0.500 0.0   0.0  ]
4.  Temperature:  1778.3
    Composition: [0.500 0.500 0.0   0.0  ]
5.  Temperature:  1000.0
    Composition: [0.500 0.500 0.0   0.0  ]
6.  Temperature:  1000.0
    Composition: [0.500 0.500 0.0   0.0  ]
7.  Temperature:  1000.0
    Composition: [0.375 0.375 0.125 0.125]
8.  Temperature:  1000.0
    Composition: [0.250 0.250 0.250 0.250]
9.  Temperature:  1000.0
    Composition: [0.125 0.125 0.375 0.375]
10. Temperature:  1000.0
    Composition: [0.0   0.0   0.500 0.500]

Notice that it is not necessary to normalize the composition to 1.
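The path listed above can be reproduced with a few lines of Python. This is only a sketch of the spacing rules, not the PyeCE code: the function names are hypothetical, compositions are normalized before interpolation, and a variable whose initial and final values coincide is simply kept constant (both choices are assumptions).

```python
def normalize(composition):
    total = sum(composition)
    return [c / total for c in composition]

def spaced(start, stop, number, mode):
    """Return `number` values from start to stop, both included."""
    if number == 1 or start == stop:
        return [start] * number
    fractions = [k / (number - 1) for k in range(number)]
    if mode == "logarithmic":
        return [start * (stop / start) ** f for f in fractions]
    return [start + (stop - start) * f for f in fractions]

def build_path(increment):
    path, previous = [], {}
    for segment in increment:
        # Unspecified variables keep the value of the previous state.
        initial = {**previous, **segment["initial"]}
        final = {**previous, **segment["final"]}
        n, mode = segment["number"], segment["mode"]
        temperatures = spaced(initial["temperature"], final["temperature"], n, mode)
        x0, x1 = normalize(initial["composition"]), normalize(final["composition"])
        columns = [spaced(a, b, n, mode) for a, b in zip(x0, x1)]
        for k in range(n):
            path.append({"temperature": temperatures[k],
                         "composition": [column[k] for column in columns]})
        previous = path[-1]
    return path

increment = [
    {"mode": "logarithmic", "number": 5,
     "initial": {"temperature": 10000, "composition": [1, 1, 0, 0]},
     "final": {"temperature": 1000, "composition": [1, 1, 0, 0]}},
    {"mode": "linear", "number": 5,
     "initial": {"composition": [1, 1, 0, 0]},
     "final": {"composition": [0, 0, 1, 1]}},
]
path = build_path(increment)
```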

config

This argument indicates the path to a file containing the input structure geometry. Generally, this file corresponds to a VASP POSCAR. However, any file that can be read by Pymatgen is accepted.

supercell

This argument can be used together with the --composition argument instead of the --config argument. It specifies the 3x3 supercell matrix (in row-wise convention) that transforms the primitive cell in Prim to the desired supercell to be used in the MC simulation.

composition

Use this argument to specify the initial composition of the configuration. The sites are randomly occupied according to the composition. The composition does not need to be normalized to 1.
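A random occupation satisfying a given composition can be sketched as follows. The function name, the species labels, and the rounding rule (largest remainder first) are assumptions made for illustration; PyeCE may handle non-integer site counts differently.

```python
import random

def random_occupation(species, composition, n_sites, seed=None):
    """Randomly assign species to sites so that the composition is satisfied."""
    total = sum(composition)  # the composition need not be normalized
    exact = [c / total * n_sites for c in composition]
    counts = [int(e) for e in exact]
    # Hand out the sites lost to rounding, largest remainder first.
    by_remainder = sorted(range(len(species)),
                          key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_remainder[: n_sites - sum(counts)]:
        counts[i] += 1
    occupation = [name for name, n in zip(species, counts) for _ in range(n)]
    random.Random(seed).shuffle(occupation)
    return occupation

occupation = random_occupation(["Al", "Fe", "Ti"], [2, 1, 1], n_sites=16, seed=0)
```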

background_specie

For the semi-grand canonical ensemble, a chemical transformation, corresponding to an (N-1)xN matrix \(T\) with N being the number of species, needs to be specified. In most cases, this transformation corresponds to setting one species as the background species, so that each modified chemical potential corresponds to the chemical potential of a species minus that of the background species. This argument specifies the name of the background species and automatically generates the associated chemical transformation matrix.

For a ternary alloy composed of species A, B, and C with species C chosen as background, the chemical transformation matrix \(T\) reads as:

\[\begin{split}T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\end{split}\]

so that the modified chemical potentials \(\tilde{\vec{\mu}}\) read:

\[\begin{split}\begin{bmatrix} \tilde{\mu}_1 \\ \tilde{\mu}_2 \\ \mu^* \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}^{-T} \vec{\mu} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \vec{\mu} = \begin{bmatrix} \mu_A - \mu_C \\ \mu_B - \mu_C \\ \mu_C \end{bmatrix}\end{split}\]
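The matrix associated with a background species can be sketched as below. Both function names are hypothetical and the construction is inferred from the ternary example above rather than taken from the PyeCE source.

```python
def background_transformation(species, background):
    """(N-1) x N matrix with one unit row per non-background species."""
    rows = []
    for i, name in enumerate(species):
        if name == background:
            continue
        rows.append([1 if j == i else 0 for j in range(len(species))])
    return rows

def modified_potentials(species, background, mu):
    """mu_tilde_i = mu_i - mu_background for every non-background species."""
    mu_background = mu[species.index(background)]
    return [m - mu_background for name, m in zip(species, mu) if name != background]

species = ["A", "B", "C"]
T = background_transformation(species, "C")
mu_tilde = modified_potentials(species, "C", [0.3, 0.1, -0.2])
```

With these inputs, T reproduces the 2x3 matrix shown above and mu_tilde contains the two differences with respect to the chemical potential of C.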

chemical_transformation

For the semi-grand canonical ensemble, a chemical transformation, corresponding to an (N-1)xN matrix \(T\) with N being the number of species, needs to be specified. This argument specifies the (N-1)xN chemical transformation matrix in row-wise convention.

For a ternary alloy composed of species A, B, and C, the following chemical transformation matrix \(T\):

\[\begin{split}T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\end{split}\]

corresponds to the transformation that yields the following modified chemical potentials \(\tilde{\vec{\mu}}\):

\[\begin{split}\begin{bmatrix} \tilde{\mu}_1 \\ \tilde{\mu}_2 \\ \mu^* \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}^{-T} \vec{\mu} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \vec{\mu} = \begin{bmatrix} \mu_A - \mu_C \\ \mu_B - \mu_C \\ \mu_C \end{bmatrix}\end{split}\]
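For an arbitrary transformation matrix, the relation above can be evaluated by solving the linear system \(M^T y = \vec{\mu}\), where \(M\) is \(T\) with a row of ones appended. The sketch below does this with a small Gaussian elimination in pure Python; the function names are hypothetical and the code only illustrates the formula.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x

def transform_potentials(T, mu):
    """Return [mu_tilde_1, ..., mu_tilde_{N-1}, mu_star] = M^{-T} mu."""
    M = [list(row) for row in T] + [[1] * len(mu)]
    M_transposed = [list(column) for column in zip(*M)]
    return solve(M_transposed, mu)

T = [[1, 0, 0], [0, 1, 0]]
result = transform_potentials(T, [0.3, 0.1, -0.2])
```

For this matrix, the result contains \(\mu_A - \mu_C\), \(\mu_B - \mu_C\), and \(\mu_C\), in agreement with the expression above.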

properties

This argument specifies properties to compute during the MC simulation. Generally, these properties correspond to ensemble averages of functions. Computing them directly during the simulation avoids the need to save every configuration. The argument must be a dictionary (or a string dictionary in the CLI) with the names of the properties to compute as first-level keys and the settings specific to each property in an inner dictionary. The properties are available in properties. The following properties are implemented so far:

Corr: computes the averaged correlation functions of an eCE model.
Prob: computes the averaged joint probabilities between pairs of atoms.
SRO: computes the averaged short-range order parameters between pairs of atoms.
SublatticeOccupation: computes the averaged sublattice compositions.

For a BCC structure, the following argument requests the computation of the averaged correlation functions for the model found in “model.tar.gz”, the averaged joint probabilities for pairs of lengths up to 4 Å, the averaged short-range order (SRO) parameters for pairs of lengths up to 4 Å, and the averaged sublattice compositions of the B2-supercell:

--properties '{"Corr": {"path_to_model": "model.tar.gz"}, "Prob": {"cutoff": 4.0}, "SRO": {"cutoff": 4.0}, "SublatticeOccupation": {"supercell_transformation": [[0,1,1],[1,0,1],[1,1,0]]}}'
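Writing such a string dictionary by hand is error-prone, so it can be convenient to build it from a Python dictionary. The sketch below assumes that the CLI accepts a JSON-compatible string, as the example above suggests, and that a POSIX shell is used for quoting.

```python
import json
import shlex

properties = {
    "Corr": {"path_to_model": "model.tar.gz"},
    "Prob": {"cutoff": 4.0},
    "SRO": {"cutoff": 4.0},
    "SublatticeOccupation": {
        "supercell_transformation": [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    },
}

# Serialize to a JSON-style string and quote it for a POSIX shell.
argument = "--properties " + shlex.quote(json.dumps(properties))
print(argument)
```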

precision

This argument specifies the precision to reach for selected variables/properties. The precision is computed as the size of the confidence interval, with the level of confidence specified in the --confidence argument. The possible variables/properties are the mean and (co)variance estimators of the fluctuating extensive variables of the thermodynamic ensemble (typically the energy and the number of atoms of a species) and the mean estimators of the properties set in the --properties argument. The argument must be a dictionary (or a string dictionary in the CLI) with the names of the variables/properties to converge as keys and the target precisions as values.

Note

For the fluctuating variables, the mean estimator name is written as ‘<A>’ and the (co)variance estimator name as ‘<A B>-<A><B>’. For the properties in the --properties argument, the mean estimator name is the same as in the output dictionaries of the functions in properties (e.g., ‘A-B_(Orbit_2_0)’ for the SRO property corresponding to the first nearest neighbor short-range order parameter between species A and B, and ‘A_(0.000_0.000_0.000)’ for the SublatticeOccupation property corresponding to the composition of species A at the site [0, 0, 0]).

In order to account for correlation inherent to MCMC processes, the statistical inefficiency \(g\) is computed for all fluctuating variables as:

\[g = \frac{1}{T} \sum_{i,j = 1}^T C_{|i-j|} = 1 + 2 \sum_{n=1}^{T-1} \Big(1-\frac{n}{T}\Big) C_n \quad \text{(if stationary)}\]

where \(C_{|i-j|}\) denotes the (auto-)correlation between samples with lag \(|i-j| = n\) (see compute_inefficiency_ACF()). The samples are then subsampled according to the largest statistical inefficiency, which ensures the same number of samples for every fluctuating variable and removes the correlation (see get_uncorrelated_indexes()). Estimators corresponding to these fluctuating variables are computed using t-statistics (for the means and the covariances) and \(\chi^2\)-statistics (for the variances) (see compute_estimators_uncorrelated()). Estimators corresponding to properties in the --properties argument are computed from the uncorrelated samples according to t-statistics. For each property, the statistical inefficiency is recomputed and a correction factor for the residual correlation is applied to the variance of the estimator as:

\[\hat{\sigma}^2 = \frac{g}{T} \mathbb{Var}[X]\]

where \(g\) is the statistical inefficiency, \(X\) is the property of interest, and \(T\) is the number of samples (see compute_average()). This is to ensure the variance is not underestimated by residual correlation in these properties since the uncorrelated samples are estimated on the fluctuating variables and not directly on samples corresponding to the property of interest.
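The procedure can be illustrated with a short pure-Python sketch. It uses the standard form \(g = 1 + 2 \sum_n (1 - n/T) C_n\) and, as an assumption that is common in practice, truncates the sum at the first non-positive autocorrelation to limit noise. The function names are hypothetical and do not correspond to the PyeCE functions cited above.

```python
import math

def statistical_inefficiency(series):
    """Estimate g = 1 + 2 * sum_n (1 - n/T) * C_n from a time series."""
    t = len(series)
    mean = sum(series) / t
    dev = [x - mean for x in series]
    variance = sum(d * d for d in dev) / t
    if variance == 0.0:
        return 1.0
    g = 1.0
    for n in range(1, t):
        # Normalized autocorrelation at lag n.
        c_n = sum(dev[i] * dev[i + n] for i in range(t - n)) / ((t - n) * variance)
        if c_n <= 0.0:  # assumption: stop at the first non-positive lag
            break
        g += 2.0 * (1.0 - n / t) * c_n
    return max(g, 1.0)

def uncorrelated_indices(n_samples, g):
    """Keep one sample every ceil(g) samples."""
    return list(range(0, n_samples, math.ceil(g)))

def variance_of_mean(series, g):
    """Variance of the mean estimator, inflated by g: (g / T) * Var[X]."""
    t = len(series)
    mean = sum(series) / t
    sample_variance = sum((x - mean) ** 2 for x in series) / (t - 1)
    return g / t * sample_variance

blocks = ([0.0] * 20 + [1.0] * 20) * 2  # strongly correlated toy series
g_blocks = statistical_inefficiency(blocks)
kept = uncorrelated_indices(len(blocks), g_blocks)
```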

For an Al-Fe-Ti system, the following argument requests a precision of 0.001 eV on the energy, 0.1 on the number of Al atoms, 0.0001 on the covariance between the energy and the number of Al atoms, and 0.01 on the short-range order parameter between Al and Ti:

--precision '{"<E>": 1e-3, "<N_Al>": 0.1, "<E N_Al>-<E><N_Al>": 1e-4, "Al-Ti_(Orbit_2_0)": 0.01}'

confidence

This argument sets the level of confidence \(0 < \alpha < 1\) used when computing confidence intervals. Given a random variable \(X\), the confidence interval \([a, b]\) satisfies:

\[\mathbb{P}\Big(a \leq X \leq b \Big) = \alpha\]
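As a rough illustration of how --confidence and --precision interact, the sketch below builds a two-sided interval for the mean of uncorrelated samples and compares its full width to a target precision. It uses a normal approximation from the Python standard library instead of the t-statistics mentioned in this page, and interpreting the size of the interval as its full width is an assumption.

```python
from statistics import NormalDist, mean, stdev

def confidence_interval(samples, confidence=0.95):
    """Two-sided interval for the mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    center = mean(samples)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    return center - half_width, center + half_width

def is_converged(samples, precision, confidence=0.95):
    """True when the full width of the interval is within the precision."""
    low, high = confidence_interval(samples, confidence)
    return (high - low) <= precision

samples = [1.0, 2.0, 3.0, 4.0, 5.0]
low, high = confidence_interval(samples, confidence=0.95)
```

A higher level of confidence widens the interval, so more samples are needed to reach the same precision.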

n_uncorrelated

This argument specifies the minimum number of uncorrelated samples to reach, in order to compute statistics, before the simulation is considered converged. It is typically recommended to have at least 30-50 uncorrelated samples for the central limit theorem to apply. This is particularly important for the computation of variances and covariances.

min

This argument specifies the minimum number of samples to produce before stopping the simulation.

max

This argument specifies the maximum number of samples to produce before stopping the simulation.
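How the three sample counts could combine into a stopping rule is sketched below. The exact logic used by PyeCE is not described in this page, so the order of the checks is an assumption; the default values are those of the table and the function name is hypothetical.

```python
def should_stop(n_samples, n_uncorrelated_samples, precision_reached,
                n_uncorrelated=50, minimum=100, maximum=5000):
    """Decide whether the MC run for the current state can stop."""
    if n_samples >= maximum:   # hard upper bound on the number of samples
        return True
    if n_samples < minimum:    # never stop before the minimum is reached
        return False
    # In between, require enough uncorrelated samples and the target precision.
    return n_uncorrelated_samples >= n_uncorrelated and precision_reached
```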

sampling_frequency

This argument indicates the number of MC steps to perform before collecting a sample. If this argument is not specified, the frequency is set to the number of sites in the configuration.

n_changes

This argument specifies the number of occupational changes to perform at each MC step. The value should remain small enough to ensure an acceptance rate greater than 10%. Generally, 1 change at each MC step is a good choice. However, in some cases, performing more changes might result in a significant speed-up.
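The roles of the sampling frequency and of the number of changes per step can be seen in a toy canonical simulation. The sketch below uses a periodic binary chain with a made-up nearest-neighbour energy and site swaps as occupational changes; it is unrelated to the PyeCE sampler, and all names and parameter values are illustrative.

```python
import math
import random

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def chain_energy(occupation, coupling=0.01):
    """Toy energy in eV: `coupling` per like nearest-neighbour pair (periodic)."""
    n = len(occupation)
    return sum(coupling for i in range(n) if occupation[i] == occupation[(i + 1) % n])

def run_toy_canonical(occupation, temperature, n_samples,
                      sampling_frequency=None, n_changes=1, seed=0):
    rng = random.Random(seed)
    occupation = list(occupation)
    if sampling_frequency is None:
        sampling_frequency = len(occupation)  # default: number of sites
    beta = 1.0 / (K_B * temperature)
    energy = chain_energy(occupation)
    samples, accepted, steps = [], 0, 0
    while len(samples) < n_samples:
        trial = list(occupation)
        for _ in range(n_changes):  # n_changes swaps form one MC step
            i, j = rng.sample(range(len(trial)), 2)
            trial[i], trial[j] = trial[j], trial[i]
        delta = chain_energy(trial) - energy
        if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
            occupation, energy = trial, energy + delta
            accepted += 1
        steps += 1
        if steps % sampling_frequency == 0:  # collect one sample
            samples.append(energy)
    return samples, accepted / steps, occupation

samples, acceptance_rate, final = run_toy_canonical([0, 1] * 8, 1000.0, n_samples=5)
```

Increasing n_changes makes each trial configuration differ more from the current one, which generally lowers the acceptance rate returned as the second value.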

settings

A path to a YAML file containing all arguments can be given in place of, or together with, arguments given in the CLI. In case of conflict between arguments given in the settings file and in the CLI, the arguments in the settings file take priority. Missing arguments are set to the values given in the CLI or, failing that, to their default values.

Below is an example of such a settings file. The simulation is performed within the semi-grand canonical ensemble for a 10x10x10 supercell of a quinary alloy with an equiatomic initial composition. It consists of a 50-step logarithmic cooldown from 20’000 K to 1’000 K with the modified chemical potentials set to 0, followed by a 20-step linear chemical potential scan at a constant temperature of 1’000 K.

# Model
model: "model.tar.gz"                            # Use the eCE model stored in 'model.tar.gz'
device: "cuda"                                   # Use the cuda device

# Thermodynamics
ensemble: "SemiGrandCanonical"                   # Use the semi-grand canonical ensemble
increment:
 - mode: "logarithmic"                           # Logarithmic cooldown from 20'000K to 1'000K
   number: 50
   initial:
      temperature: 20000
      mu_tilde: [0, 0, 0, 0]
   final:
      temperature: 1000
      mu_tilde: [0, 0, 0, 0]
 - mode: "linear"                                # linear chemical potential scan at 1'000K
   number: 20
   initial:
      mu_tilde: [0, 0, 0, 0]
   final:
      mu_tilde: [1, 1, 1, 1]
supercell: [[10, 0, 0], [0, 10, 0], [0, 0, 10]]  # Use a 10x10x10 supercell
composition: [1, 1, 1, 1, 1]                     # Set the initial composition to equiatomic
chemical_transformation: [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
properties:                                      # Compute the short-range order parameter within a radius of 4 Å
   SRO:
      cutoff: 4

# Simulation
precision:
   <E>: 1e-3                                     # Reach convergence of the energy to within 1meV before stopping the simulation
confidence: 0.99                                 # Level of confidence set to 99%
n_uncorrelated: 100                              # Compute at least 100 uncorrelated samples before stopping the simulation
min: 200                                         # Compute at least 200 samples before stopping the simulation
max: 5000                                        # Compute a maximum of 5000 samples before stopping the simulation

# Global
path: results_mcmc.json                          # Save the processed data in 'results_mcmc.json'
path_to_configs: last_configs.json               # Save the last configuration for each state in 'last_configs.json'
raw: raw_data.tar.gz                             # Save the raw data in 'raw_data.tar.gz'
verbose: True
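The priority rule between defaults, CLI arguments, and the settings file can be sketched with plain dictionaries. The function name is hypothetical, the defaults are taken from the table at the top of this page, and treating a CLI value of None as "not given" is an assumption.

```python
DEFAULTS = {
    "model": "model.pth", "device": "cpu", "ensemble": "Canonical",
    "confidence": 0.95, "n_uncorrelated": 50, "min": 100, "max": 5000,
}

def resolve_settings(cli_args, yaml_settings, defaults=DEFAULTS):
    """Later sources override earlier ones: defaults < CLI < settings file."""
    given_cli = {key: value for key, value in cli_args.items() if value is not None}
    return {**defaults, **given_cli, **yaml_settings}

resolved = resolve_settings(
    cli_args={"device": "cuda", "max": 2000, "min": None},
    yaml_settings={"max": 5000, "confidence": 0.99},
)
```

Here the device comes from the CLI, max and confidence from the settings file, and min from the defaults.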

path

This argument indicates the path to save the processed data resulting from the simulation. The file corresponds to a dictionary with the keys being:

The fixed variables
The mean estimators for the fluctuating variables
The (co)variance estimators for the fluctuating variables
The mean estimator for the properties defined in the --properties argument
The metadata related to the simulation and the convergence status

Each of these keys is associated with a list containing one value per state defined in the --increment argument. For each estimator, the value is a dictionary with the mean value and the confidence interval.
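A quick way to inspect such a file is to count the states stored under each key. The sketch below works on mock data standing in for the content of results.json; the key names and the inner "mean" and "interval" fields are hypothetical, since the exact names are not listed in this page.

```python
import json

def summarize_results(results):
    """Map each top-level key to the number of states stored under it."""
    return {key: len(values) for key, values in results.items()}

# Mock content with hypothetical key names, one entry per state.
mock = json.loads("""
{
  "temperature": [1000, 900],
  "<E>": [
    {"mean": -1.2, "interval": [-1.3, -1.1]},
    {"mean": -1.4, "interval": [-1.5, -1.3]}
  ]
}
""")
summary = summarize_results(mock)
```

For a real run, the dictionary would be obtained with json.load() on the file given in the --path argument.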

path_to_configs

This argument should be set in order to save the last configuration, in the form of an occupation bitstring, for each state defined in the --increment argument.

raw

This argument should be set in order to save the raw data for each state defined in the --increment argument. The raw data correspond to the time series of the fluctuating variables as well as the occupation bitstrings.

Warning

Saving the raw data can result in the creation of large files.

verbose

Set this flag to print information during the MC simulation.