Finite-temperature simulations
==================================

Finite-temperature properties are typically obtained by performing Markov chain Monte Carlo (MCMC) simulations in the desired thermodynamic ensemble.

The ``mcmc`` command of the :py:mod:`~.pyece.cli.ece` module runs MC simulations in the desired thermodynamic ensemble. Arguments can be given directly on the command line or in a YAML settings file. When an argument is given both on the command line and in the YAML settings file, the value from the YAML settings file takes priority. The accepted arguments are summarized in the table below.

===========================  ==========  ==============  ==================================================  ============  ==================================================
Argument                     Shortcut    Type            Help                                                Default       Notes
===========================  ==========  ==============  ==================================================  ============  ==================================================
``model``                    ``m``       ``str``         Path to the model to use.                           model.pth
``device``                               ``str``         Device to be used for PyTorch operations.           cpu
``cpu``                                  ``store_true``  Use cpu device for PyTorch operations.                            Shortcut for ``--device cpu``
``cuda``                                 ``store_true``  Use cuda device for PyTorch operations.                           Shortcut for ``--device cuda``
``ensemble``                 ``e``       ``str``         Name of the MCEnsemble object available in          Canonical     Choose between ``Canonical`` and
                                                         'pyece.montecarlo.mcmc'.                                          ``SemiGrandCanonical``.
``increment``                ``i``       ``str``         List of string dictionaries that specifies the                    The dictionaries contain 4 keys: 'mode', 'number',
                                                         evolution of the fixed variables.                                 'initial', and 'final'.
``config``                   ``c``       ``str``         Path to the file with the initial configuration.                  Typically the path to a vasp POSCAR file of the
                                                                                                                           supercell.
``supercell``                ``C``       ``str``         String matrix of the supercell transformation of                  The composition can be given either in the
                                                         the primitive cell.                                               ``--composition`` or in the ``--increment``
                                                                                                                           argument.
``composition``              ``x``       ``str``         String list with the concentration for each                       Sites are randomly assigned with a specie
                                                         species as sorted in                                              satisfying the concentration
                                                         :py:obj:`~.pyece.core.prim.Prim`, where sites
                                                         are randomly assigned with a specie satisfying the
                                                         concentration.
``background_specie``        ``b``       ``str``         Name of the background specie for the chemical                    It corresponds to the specie for which the
                                                         potentials.                                                       chemical potential is not fixed.
``chemical_transformation``              ``str``         Chemical transformation matrix used in the                        String matrix that transforms the true chemical
                                                         semi-grand canonical ensemble.                                    potentials to the modified chemical potentials.
``properties``               ``q``       ``str``         String dictionary with the names of the properties                The properties available so far can be found in
                                                         to compute as the first level keys and settings                   :py:mod:`~.pyece.montecarlo.properties`.
                                                         specific to the chosen properties are written in
                                                         inner dictionaries.
``precision``                ``y``       ``str``         String dictionary that specifies the precision                    The keys must correspond to either fluctuating
                                                         that one wants to achieve.                                        variables or properties in ``--properties``.
``confidence``               ``a``       ``float``       Level of confidence for the construction of the     0.95
                                                         confidence interval.
``n_uncorrelated``           ``n``       ``int``         Minimum number of uncorrelated samples to compute   50
                                                         in each MC simulation.
``min``                                  ``int``         Minimum number of samples to compute before         100
                                                         stopping the simulation.
``max``                                  ``int``         Maximum number of samples to compute before         5000
                                                         stopping the simulation.
``sampling_frequency``       ``f``       ``int``         Number of MCMC steps to wait before collecting the                If not set, the frequency is set to the number of
                                                         state as a sample.                                                sites in the configuration.
``n_changes``                            ``int``         Number of occupational changes to perform at each   1
                                                         MCMC step.
``settings``                 ``s``       ``str``         Path to a YAML file containing the settings to
                                                         perform the MCMC simulation.
``path``                     ``p``       ``str``         Path to save the results of the MC run in the JSON  results.json
                                                         format.
``path_to_configs``                      ``str``         Path to save the final configurations for fixed     
                                                         state of the MC run in the JSON format.
``raw``                      ``r``       ``str``         Path to save the raw data of the MC run as a                      Be careful, as the size of the files might be
                                                         gz-compressed tarfile.                                            large.
``verbose``                  ``v``       ``store_true``  Print information during the MC simulation.         ``False``
===========================  ==========  ==============  ==================================================  ============  ==================================================

.. tip::
	This command is available from the command line interface through the :py:mod:`~.pyece.cli.ece` module. Internally, the module calls the :py:func:`~.pyece.cli.ece_mcmc.run_mcmc` function. The latter also accepts a dictionary with the same arguments as keys and can therefore be called from a Python script.


*mcmc* Arguments
****************

**model**
---------
It specifies the path to the model to use for the MC simulations. `PyeCE` eCE models are saved either in the default PyTorch format (typically with a *.pt* or *.pth* extension) or as a *gz-compressed tarfile* (with a *.tar.gz* extension) if the eci model is not built using the :py:attr:`~.pyece.core.clex.eCE.from_settings` building attribute.

**device**
----------
It indicates the `PyTorch device <https://pytorch.org/docs/stable/tensor_attributes.html#torch.device>`_ to be used during `PyTorch` operations.

**cpu**
-------
This argument indicates that the CPU device must be used during `PyTorch` operations. This is a shortcut to the ``--device cpu`` argument.

**cuda**
--------
This argument indicates that the CUDA device must be used during `PyTorch` operations. This is a shortcut to the ``--device cuda`` argument.

**ensemble**
------------
It specifies the name of the :py:obj:`~.pyece.montecarlo.mcmc.MCEnsemble` object available in :py:mod:`~.pyece.montecarlo.mcmc`.

So far, 2 thermodynamic ensembles are implemented:

* Canonical: this ensemble corresponds to a situation with fixed temperature and fixed number of atoms for each species. The partition function reads as:

	.. math:: Z = \sum_{\sigma} \exp\Big(-\beta E(\sigma) \Big)

* Semi-Grand Canonical: this ensemble corresponds to a situation with fixed temperature, fixed total number of atoms, and fixed modified chemical potentials (:math:`\tilde{\mu}`). The partition function reads as:

	.. math:: Z = \sum_{\sigma} \exp\Big(-\beta \Big[ E(\sigma) - \sum_i \tilde{\mu}_i N_i(\sigma) \Big] \Big)
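
As a rough illustration of how the two statistical weights differ in practice, the sketch below evaluates the acceptance probability of a trial move in each ensemble. It is not the `PyeCE` implementation: the function names, the use of the Metropolis criterion, and the choice of eV/K units for the Boltzmann constant are assumptions made for this example.

.. code-block:: python

    import math

    K_B = 8.617333262e-5  # Boltzmann constant in eV/K (assumes energies in eV)


    def canonical_acceptance(delta_e, temperature):
        """Metropolis acceptance probability for an energy change delta_e."""
        if delta_e <= 0.0:
            return 1.0  # downhill moves are always accepted
        beta = 1.0 / (K_B * temperature)
        return math.exp(-beta * delta_e)


    def semi_grand_acceptance(delta_e, delta_n, mu_tilde, temperature):
        """Same criterion applied to E - sum_i mu_tilde_i * N_i.

        delta_n[i] is the change in the number of atoms N_i caused by the
        trial move and mu_tilde[i] the matching modified chemical potential.
        """
        delta_phi = delta_e - sum(m * dn for m, dn in zip(mu_tilde, delta_n))
        if delta_phi <= 0.0:
            return 1.0
        beta = 1.0 / (K_B * temperature)
        return math.exp(-beta * delta_phi)


    # A move costing 0.1 eV at 1000 K, without and with a chemical driving force.
    print(canonical_acceptance(0.1, 1000.0))
    print(semi_grand_acceptance(0.1, [1, 0], [0.05, 0.0], 1000.0))

The only difference between the two functions is the extra chemical potential term, which mirrors the difference between the two partition functions.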

**increment**
-------------
This argument specifies how the variables fixing the state vary (i.e., the temperature and the composition for the canonical ensemble, and the temperature and the modified chemical potentials for the semi-grand canonical ensemble). The path is given as a list of dictionaries. Each dictionary is composed of the following 4 keys:

* ``mode``: Type of spacing between the initial and the final states. It can be either ``linear`` or ``logarithmic``.
* ``number``: Number of steps between the initial and the final states (both included).
* ``initial``: Dictionary that specifies the initial state. The dictionary contains keys corresponding to the name of the fixed variables characterizing the thermodynamic ensemble.
* ``final``: Dictionary that specifies the final state. The dictionary contains keys corresponding to the name of the fixed variables characterizing the thermodynamic ensemble.

When the list is composed of several dictionaries, the simulation follows the paths described in each dictionary one after the other. A variable left unspecified is assumed constant and keeps its previous value.

.. note:: The logarithmic mode is primarily meant for varying the temperature from a high temperature to a lower temperature in cooldown simulations.

.. note:: If the composition is specified, the occupation is randomly filled according to the composition. If the composition is kept constant, it can be specified either through the ``--config`` argument or through the ``--composition`` argument.

Below is an example of a path composed of a 5-step logarithmic cooldown from 10'000 K to 1'000 K at fixed composition, followed by a 5-step change in composition performed at constant temperature.

.. code-block:: YAML

	increment:
	 - mode: "logarithmic"
	   number: 5
	   initial:
	      temperature: 10000
	      composition: [1, 1, 0, 0]
	   final:
	      temperature: 1000
	      composition: [1, 1, 0, 0]
	 - mode: "linear"
	   number: 5
	   initial:
	      composition: [1, 1, 0, 0]
	   final:
	      composition: [0, 0, 1, 1]
		      
which yields the following path:

.. code-block::

	1.  Temperature: 10000.0
	    Composition: [0.500 0.500 0.0   0.0  ]
	2.  Temperature:  5623.4
	    Composition: [0.500 0.500 0.0   0.0  ]
	3.  Temperature:  3162.3
	    Composition: [0.500 0.500 0.0   0.0  ]
	4.  Temperature:  1778.3
	    Composition: [0.500 0.500 0.0   0.0  ]
	5.  Temperature:  1000.0
	    Composition: [0.500 0.500 0.0   0.0  ]
	6.  Temperature:  1000.0
	    Composition: [0.500 0.500 0.0   0.0  ]
	7.  Temperature:  1000.0
	    Composition: [0.375 0.375 0.125 0.125]
	8.  Temperature:  1000.0
	    Composition: [0.250 0.250 0.250 0.250]
	9.  Temperature:  1000.0
	    Composition: [0.125 0.125 0.375 0.375]
	10. Temperature:  1000.0
	    Composition: [0.0   0.0   0.500 0.500]
   
Notice that it is not necessary to normalize the composition to 1.
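
The expansion of the increment list into a path can be sketched in plain Python as follows. This is an illustration only, not the `PyeCE` code: the helper names are hypothetical, and the sketch assumes that the logarithmic spacing applies to the temperature while the composition is always interpolated linearly after normalization.

.. code-block:: python

    def normalize(composition):
        """Scale a composition so that it sums to 1."""
        total = sum(composition)
        return [c / total for c in composition]


    def interpolate(start, stop, number, mode):
        """Return `number` values from start to stop, end points included."""
        if number == 1:
            return [start]
        fractions = [k / (number - 1) for k in range(number)]
        if mode == "logarithmic":
            return [start * (stop / start) ** f for f in fractions]
        return [start + (stop - start) * f for f in fractions]


    def expand_path(increments):
        """Expand a list of increment dictionaries into a list of states."""
        path, previous = [], {}
        for inc in increments:
            # Unspecified variables keep their previous value.
            initial = {**previous, **inc["initial"]}
            final = {**initial, **inc["final"]}
            temperatures = interpolate(
                initial["temperature"], final["temperature"], inc["number"], inc["mode"]
            )
            x0 = normalize(initial["composition"])
            x1 = normalize(final["composition"])
            columns = [interpolate(a, b, inc["number"], "linear") for a, b in zip(x0, x1)]
            for k, temperature in enumerate(temperatures):
                composition = [column[k] for column in columns]
                path.append({"temperature": temperature, "composition": composition})
            previous = final
        return path


    increments = [
        {"mode": "logarithmic", "number": 5,
         "initial": {"temperature": 10000, "composition": [1, 1, 0, 0]},
         "final": {"temperature": 1000, "composition": [1, 1, 0, 0]}},
        {"mode": "linear", "number": 5,
         "initial": {"composition": [1, 1, 0, 0]},
         "final": {"composition": [0, 0, 1, 1]}},
    ]
    path = expand_path(increments)
    for index, state in enumerate(path, start=1):
        print(index, round(state["temperature"], 1), [round(x, 3) for x in state["composition"]])

Under these assumptions, the ten printed states should carry the same temperatures and compositions as the path listed above, including the repeated state at the junction between the two segments.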

**config**
----------
This argument indicates the path to a file containing the input structure geometry. Generally, this file corresponds to a `VASP POSCAR <https://www.vasp.at/wiki/index.php/POSCAR>`_. However, any file that can be read by `Pymatgen <https://pymatgen.org/pymatgen.core.html#pymatgen.core.structure.IStructure.from_file>`_ is accepted.

**supercell**
-------------
This argument can be used together with the ``--composition`` argument instead of the ``--config`` argument. It specifies the 3x3 supercell matrix (in row-wise convention) that transforms the primitive cell in :py:obj:`~.pyece.core.prim.Prim` to the desired supercell to be used in the MC simulation.

**composition**
---------------
Use this argument to specify the initial composition of the configuration. The occupation is randomly filled according to the composition. The composition does not need to be normalized to 1.

**background_specie**
---------------------
In the semi-grand canonical ensemble, a chemical transformation, corresponding to an (N-1)xN matrix :math:`T` with N being the number of species, needs to be specified. In most cases, this transformation corresponds to setting one specie as the background specie so that the modified chemical potentials correspond to the chemical potentials minus that of the background specie. This argument specifies the name of the background specie and automatically generates the associated chemical transformation matrix.

For a ternary alloy composed of species A, B, and C, with the specie C chosen as background, the chemical transformation matrix :math:`T` reads as:
        
.. math:: 

    T = \begin{bmatrix} 
        1 & 0 & 0 \\
        0 & 1 & 0  
        \end{bmatrix}

so that the modified chemical potentials :math:`\tilde{\vec{\mu}}` read:

.. math::

    \begin{bmatrix} \tilde{\mu}_1 \\ \tilde{\mu}_2 \\ \mu^* \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}^{-T} \vec{\mu} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \vec{\mu} = \begin{bmatrix} \mu_A - \mu_C \\ \mu_B - \mu_C \\ \mu_C  \end{bmatrix}
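
The matrix algebra above can be checked with a few lines of Python. The sketch below is only a verification aid under stated assumptions: the helper names are hypothetical, the inversion is a textbook Gauss-Jordan elimination with exact fractions, and the chemical potential values are arbitrary illustrative numbers.

.. code-block:: python

    from fractions import Fraction


    def inverse(matrix):
        """Invert a square matrix by Gauss-Jordan elimination (exact arithmetic)."""
        n = len(matrix)
        aug = [
            [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
            for i, row in enumerate(matrix)
        ]
        for col in range(n):
            pivot = next(r for r in range(col, n) if aug[r][col] != 0)
            aug[col], aug[pivot] = aug[pivot], aug[col]
            pivot_value = aug[col][col]
            aug[col] = [v / pivot_value for v in aug[col]]
            for r in range(n):
                if r != col:
                    factor = aug[r][col]
                    aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
        return [row[n:] for row in aug]


    def background_transformation(species, background):
        """One row per non-background specie, selecting that specie."""
        return [[int(s == t) for t in species] for s in species if s != background]


    species = ["A", "B", "C"]
    T = background_transformation(species, "C")
    extended = T + [[1] * len(species)]  # append the row of ones
    inv = inverse(extended)
    n = len(species)
    inverse_transpose = [[inv[j][i] for j in range(n)] for i in range(n)]

    mu = [3, 1, -2]  # arbitrary values standing for mu_A, mu_B, mu_C
    transformed = [sum(a * m for a, m in zip(row, mu)) for row in inverse_transpose]

    for row in inverse_transpose:
        print(" ".join(str(v) for v in row))
    print([str(v) for v in transformed])

The first two entries of ``transformed`` are the modified chemical potentials and the last one is the chemical potential of the background specie.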

**chemical_transformation**
---------------------------
In the semi-grand canonical ensemble, a chemical transformation, corresponding to an (N-1)xN matrix :math:`T` with N being the number of species, needs to be specified. This argument specifies the (N-1)xN chemical transformation matrix in row-wise convention.

For a ternary alloy composed of species A, B, and C, the following chemical transformation matrix :math:`T`:
        
.. math:: 

    T = \begin{bmatrix} 
        1 & 0 & 0 \\
        0 & 1 & 0  
        \end{bmatrix}

corresponds to the transformation that yields the following modified chemical potentials :math:`\tilde{\vec{\mu}}`:

.. math::

    \begin{bmatrix} \tilde{\mu}_1 \\ \tilde{\mu}_2 \\ \mu^* \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}^{-T} \vec{\mu} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \vec{\mu} = \begin{bmatrix} \mu_A - \mu_C \\ \mu_B - \mu_C \\ \mu_C  \end{bmatrix}

**properties**
--------------
This argument specifies properties to compute during the MC simulation. Generally, these properties correspond to ensemble averages of functions. Computing them directly during the simulation avoids the need to save every configuration. The argument must be a dictionary (or a string dictionary in the CLI) with the names of the properties to compute as first-level keys and the settings specific to each property written in an inner dictionary. The properties are available in :py:mod:`~.pyece.montecarlo.properties`. The following properties are implemented so far:

* :py:obj:`~.pyece.montecarlo.properties.Corr`: computes the averaged correlation functions of an :py:obj:`~.pyece.core.clex.eCE` model.
* :py:obj:`~.pyece.montecarlo.properties.Prob`: computes the averaged joint probabilities between pairs of atoms.
* :py:obj:`~.pyece.montecarlo.properties.SRO`: computes the averaged short-range order parameters between pairs of atoms.
* :py:obj:`~.pyece.montecarlo.properties.SublatticeOccupation`: computes the averaged sublattice compositions.

For a BCC structure, the following argument computes the averaged correlation functions for the model found in "model.tar.gz", the averaged joint probabilities for pairs of lengths up to 4 Å, the averaged short-range order (SRO) parameters for pairs of lengths up to 4 Å, and the averaged sublattice compositions of the B2-supercell:

.. code-block:: bash

	--properties '{"Corr": {"path_to_model": "model.tar.gz"}, "Prob": {"cutoff": 4.0}, "SRO": {"cutoff": 4.0}, "SublatticeOccupation": {"supercell_transformation": [[0,1,1],[1,0,1],[1,1,0]]}}'
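
When the command line is assembled from a script, the string dictionary can be generated from a regular Python dictionary. The sketch below assumes that the CLI accepts a JSON-formatted string, which is consistent with the example above but is not confirmed here; only standard library calls are used.

.. code-block:: python

    import json
    import shlex

    properties = {
        "Corr": {"path_to_model": "model.tar.gz"},
        "Prob": {"cutoff": 4.0},
        "SRO": {"cutoff": 4.0},
        "SublatticeOccupation": {
            "supercell_transformation": [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
        },
    }

    # shlex.quote protects the braces and double quotes from the shell.
    argument = "--properties " + shlex.quote(json.dumps(properties))
    print(argument)

Building the string this way avoids quoting mistakes when the inner dictionaries grow.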


**precision**
-------------
This argument specifies the precision to reach for selected variables/properties. The precision is computed as the size of the confidence interval, with the level of confidence specified in the ``--confidence`` argument. The possible variables/properties are the mean and (co)variance estimators of the fluctuating extensive variables of the thermodynamic ensemble (typically the energy and the number of atoms of a specie) and the mean estimator of the properties set in the ``--properties`` argument. The argument must be a dictionary (or a string dictionary in the CLI) with the names of the variables/properties to converge as keys and the target precision as the associated value.

.. note::

	For the fluctuating variables, the mean estimator name is written as '<A>' and the (co)variance estimator name as '<A B>-<A><B>'. For the properties in the ``--properties`` argument, the mean estimator name is the same as in the output dictionaries of the functions in :py:mod:`~.pyece.montecarlo.properties` (e.g., 'A-B_(Orbit_2_0)' for the ``SRO`` property corresponding to the first nearest neighbor short-range order parameter between species A and B, and 'A_(0.000_0.000_0.000)' for the ``SublatticeOccupation`` property corresponding to the composition of specie A at the site [0, 0, 0]).

	In order to account for the correlation inherent to MCMC processes, the statistical inefficiency :math:`g` is computed for all fluctuating variables as:

	.. math:: g = \frac{1}{T} \sum_{i,j = 1}^T C_{|i-j|} = 1 + 2 \sum_{n=1}^{T-1} \Big(1-\frac{n}{T}\Big) C_n \quad \text{(if stationary)}

	where :math:`C_{|i-j|}` denotes the (auto-)correlation between samples with lag :math:`|i-j| = n` and :math:`T` is the number of samples (see :py:func:`~.pyece.montecarlo.timeseries.compute_inefficiency_ACF`). Uncorrelated samples are then obtained by subsampling according to the largest statistical inefficiency, which ensures the same number of samples for all fluctuating variables and no correlation (see :py:func:`~.pyece.montecarlo.timeseries.get_uncorrelated_indexes`). Estimators corresponding to these fluctuating variables are computed using Student's t-statistics (for the means and the covariances) and :math:`\chi^2`-statistics (for the variances) (see :py:func:`~.pyece.montecarlo.timeseries.compute_estimators_uncorrelated`). Estimators corresponding to properties in the ``--properties`` argument are computed from the uncorrelated samples according to Student's t-statistics. For each property, the statistical inefficiency is recomputed and a correlation factor is added to all estimators as:

	.. math:: \hat{\sigma}^2 = \frac{g}{T} \mathbb{Var}[X]

	where :math:`g` is the statistical inefficiency, :math:`X` is the property of interest, and :math:`T` is the number of samples (see :py:func:`~.pyece.montecarlo.timeseries.compute_average`). This is to ensure the variance is not underestimated by residual correlation in these properties since the uncorrelated samples are estimated on the fluctuating variables and not directly on samples corresponding to the property of interest.
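
The role of the statistical inefficiency can be illustrated with a short, self-contained sketch. It does not reproduce the functions of :py:mod:`~.pyece.montecarlo.timeseries`: the helper names are hypothetical, the estimator is the usual form :math:`g = 1 + 2 \sum_n (1 - n/T) C_n`, and truncating the sum at the first non-positive :math:`C_n` is an assumption made here to limit the noise of long-lag terms.

.. code-block:: python

    import math
    import random


    def autocorrelation(series, lag):
        """Normalized autocorrelation C_n of a series at the given lag."""
        n = len(series)
        mean = sum(series) / n
        variance = sum((x - mean) ** 2 for x in series) / n
        covariance = sum(
            (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
        ) / (n - lag)
        return covariance / variance


    def statistical_inefficiency(series):
        """Estimate g, stopping at the first non-positive C_n (assumption)."""
        n = len(series)
        g = 1.0
        for lag in range(1, n - 1):
            c_n = autocorrelation(series, lag)
            if c_n <= 0.0:
                break
            g += 2.0 * (1.0 - lag / n) * c_n
        return g


    def uncorrelated_indexes(n_samples, g):
        """Keep one sample every ceil(g) samples."""
        return list(range(0, n_samples, math.ceil(g)))


    # Synthetic correlated series: AR(1) process x_t = 0.9 * x_(t-1) + noise.
    random.seed(0)
    series = [0.0]
    for _ in range(1999):
        series.append(0.9 * series[-1] + random.gauss(0.0, 1.0))

    g = statistical_inefficiency(series)
    indexes = uncorrelated_indexes(len(series), g)
    print(f"g = {g:.1f}, {len(indexes)} uncorrelated samples out of {len(series)}")

A value of :math:`g` well above 1 means that consecutive samples carry redundant information, so fewer samples effectively contribute to the estimators.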

For an Al-Fe-Ti system, the following argument requests a precision of 0.001 eV on the energy, 0.1 on the number of Al atoms, 0.0001 on the covariance between the energy and the number of Al atoms, and 0.01 on the short-range order parameter between Al and Ti:

.. code-block:: bash

	--precision '{"<E>": 1e-3, "<N_Al>": 0.1, "<E N_Al>-<E><N_Al>": 1e-4, "Al-Ti_(Orbit_2_0)": 0.01}'

**confidence**
--------------
This argument is used to set the level of confidence :math:`0 < \alpha < 1` when computing confidence intervals. Given a random variable :math:`X`, it consists in determining an interval :math:`[a, b]` satisfying:

.. math:: \mathbb{P}\Big(a \leq X \leq b \Big) = \alpha
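
As an illustration, the sketch below builds a two-sided confidence interval for the mean of a set of samples. It is a simplification and not the `PyeCE` implementation: the function name is hypothetical, and it uses the normal quantile from the standard library instead of a Student's t quantile, which is a reasonable approximation only for a sufficiently large number of uncorrelated samples.

.. code-block:: python

    import math
    from statistics import NormalDist, fmean, variance


    def confidence_interval(samples, confidence=0.95, inefficiency=1.0):
        """Two-sided confidence interval of the mean (normal approximation)."""
        n = len(samples)
        mean = fmean(samples)
        # Variance of the mean, inflated by the statistical inefficiency g.
        sigma2 = inefficiency * variance(samples) / n
        z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
        half_width = z * math.sqrt(sigma2)
        return mean - half_width, mean + half_width


    energies = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values only
    low, high = confidence_interval(energies, confidence=0.95)
    print(f"[{low:.3f}, {high:.3f}]")

Raising the level of confidence widens the interval, so a given precision requires more samples.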

**n_uncorrelated**
------------------
This argument specifies the minimum number of uncorrelated samples to reach in order to compute statistics before the simulation is considered converged. It is typically recommended to have at least 30-50 uncorrelated samples for the central limit theorem to apply. This is particularly important for the computation of variances and covariances.

**min**
-------
This argument specifies the minimum number of samples to produce before stopping the simulation.

**max**
-------
This argument specifies the maximum number of samples to produce before stopping the simulation.

**sampling_frequency**
----------------------
This argument indicates the number of MC steps to perform before collecting a sample. If this argument is not specified, the frequency is set to the number of sites in the configuration.

**n_changes**
-------------
This argument specifies the number of occupational changes to perform at each MC step. The value should remain small enough to ensure an acceptance rate greater than 10%. Generally, 1 change at each MC step is a good choice. However, in some cases, performing more changes might result in a significant speed-up.

**settings**
------------
A path to a YAML file containing all arguments can be given in place of, or together with, arguments given in the CLI. In case of conflict between arguments given in the settings file and in the CLI, the arguments in the settings file take priority. Missing arguments are set to the values given in the CLI or, failing that, to their default values.

Below is an example of such a settings file. The simulation is performed within the semi-grand canonical ensemble for a 10x10x10 supercell of a quinary alloy with an equiatomic initial composition. A 50-step logarithmic cooldown from 20'000 K to 1'000 K with the modified chemical potentials set to 0 is followed by a 20-step linear chemical potential scan at a constant temperature of 1'000 K.

.. code-block:: YAML

	# Model
	model: "model.tar.gz"                            # Use the eCE model stored in 'model.tar.gz'
	device: "cuda"                                   # Use the cuda device

	# Thermodynamics
	ensemble: "SemiGrandCanonical"                   # Use the semi-grand canonical ensemble
	increment: 
	 - mode: "logarithmic"                           # Logarithmic cooldown from 20'000K to 1'000K
	   number: 50
	   initial: 
	      temperature: 20000
	      mu_tilde: [0, 0, 0, 0]
	   final:
	      temperature: 1000
	      mu_tilde: [0, 0, 0, 0]
	 - mode: "linear"                                # linear chemical potential scan at 1'000K
	   number: 20
	   initial: 
	      mu_tilde: [0, 0, 0, 0]
	   final:
	      mu_tilde: [1, 1, 1, 1]
	supercell: [[10, 0, 0], [0, 10, 0], [0, 0, 10]]  # Use a 10x10x10 supercell
	composition: [1, 1, 1, 1, 1]                     # Set the initial composition to equiatomic
	chemical_transformation: [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
	properties:                                      # Compute the short-range order parameter within a radius of 4 Å
	   SRO:
	      cutoff: 4

	# Simulation
	precision:
	   <E>: 1e-3                                     # Reach convergence of the energy to within 1meV before stopping the simulation
	confidence: 0.99                                 # Level of confidence set to 99%
	n_uncorrelated: 100                              # Compute at least 100 uncorrelated samples before stopping the simulation
	min: 200                                         # Compute at least 200 samples before stopping the simulation
	max: 5000                                        # Compute a maximum of 5000 samples before stopping the simulation

	# Global
	path: results_mcmc.json                          # Save the processed data in 'results_mcmc.json'
	path_to_configs: last_configs.json               # Save the last configuration for each state in 'last_configs.json'
	raw: raw_data.tar.gz                             # Save the raw data in 'raw_data.tar.gz'
	verbose: True
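
The priority rule between the default values, the CLI, and the settings file can be summarized by the following sketch. The function name and the merging logic are illustrative assumptions and not the actual `PyeCE` code; the default values are those listed in the table of arguments.

.. code-block:: python

    DEFAULTS = {
        "model": "model.pth",
        "device": "cpu",
        "ensemble": "Canonical",
        "confidence": 0.95,
        "n_uncorrelated": 50,
        "min": 100,
        "max": 5000,
        "n_changes": 1,
        "path": "results.json",
    }


    def resolve_arguments(cli_args, yaml_settings, defaults=DEFAULTS):
        """Later sources override earlier ones: defaults < CLI < settings file."""
        resolved = dict(defaults)
        # Arguments left unset on the command line (None) do not override anything.
        resolved.update({k: v for k, v in cli_args.items() if v is not None})
        resolved.update(yaml_settings)
        return resolved


    cli_args = {"device": "cuda", "confidence": 0.9, "max": None}
    yaml_settings = {"confidence": 0.99, "ensemble": "SemiGrandCanonical"}
    resolved = resolve_arguments(cli_args, yaml_settings)
    print(resolved["device"], resolved["confidence"], resolved["max"])

In this example, ``device`` comes from the CLI, ``confidence`` from the settings file, and ``max`` from the default values.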

**path**
--------
This argument indicates the path to save the processed data resulting from the simulation. The file corresponds to a dictionary with the keys being:

* The fixed variables
* The mean estimators for the fluctuating variables
* The (co)variance estimators for the fluctuating variables
* The mean estimator for the properties defined in the ``--properties`` argument
* The metadata related to the simulation and the convergence status

A list containing the values for each state defined in the ``--increment`` argument is associated to all these keys. For each estimator, the value corresponds to a dictionary with the mean value and the confidence interval.

**path_to_configs**
-------------------
This argument should be set in order to save the last configuration, in the form of an occupation bitstring, for each state defined in the ``--increment`` argument.

**raw**
-------
This argument should be set in order to save the raw data for each state defined in the ``--increment`` argument. The raw data correspond to the timeseries of the fluctuating variables as well as the occupation bitstring.

.. warning:: Saving the raw data can result in the creation of large files.

**verbose**
-----------
Set this flag to print information during the MC simulation.