.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "examples/lode-linear/lode-linear.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_examples_lode-linear_lode-linear.py: LODE Tutorial ============= :Authors: Philip Loche `@PicoCentauri `_, Kevin Huguenin-Dumittan `@kvhuguenin `_ This tutorial explains how Long range equivariant descriptors can be constructed using rascaline and the resulting descriptors be used to construct a linear model with equisolve First, import all the necessary packages .. GENERATED FROM PYTHON SOURCE LINES 16-27 .. code-block:: Python import ase.io import matplotlib.pyplot as plt import metatensor import numpy as np from equisolve.numpy.models.linear_model import Ridge from equisolve.utils.convert import ase_to_tensormap from rascaline import AtomicComposition, LodeSphericalExpansion, SphericalExpansion from rascaline.utils import PowerSpectrum .. GENERATED FROM PYTHON SOURCE LINES 28-43 Step 0: Prepare Data Set ------------------------ Get structures ~~~~~~~~~~~~~~ We take a small subset of the dimer dataset from `A. Grisafi et al., 2021 `_ for which we additionally calculated the forces. Each structure in the dataset contains two small organic molecules which are extended along a certain direction in the subsequent structures. For speeding up the calculations we already selected the first 130 :download:`structures ` of the charge-charge molecule pairs. .. GENERATED FROM PYTHON SOURCE LINES 44-48 .. code-block:: Python frames = ase.io.read("charge-charge.xyz", ":") .. GENERATED FROM PYTHON SOURCE LINES 49-56 Convert target properties to metatensor format ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If we want to train models using the `equisolve `_ package, we need to convert the target properties (in this case, the energies and forces) into the appropriate format #justequistorethings .. GENERATED FROM PYTHON SOURCE LINES 57-61 .. code-block:: Python y = ase_to_tensormap(frames, energy="energy", forces="forces") .. GENERATED FROM PYTHON SOURCE LINES 62-73 Step 1: Compute short-range and LODE features --------------------------------------------- Define hypers and get the expansion coefficients :math:`\langle anlm | \rho_i \rangle` and :math:`\langle anlm | V_i \rangle` The short-range and long-range descriptors have very similar hyperparameters. We highlight the differences below. We first define the hyperparameters for the short-range (SR) part. These will be used to create SOAP features. .. GENERATED FROM PYTHON SOURCE LINES 74-86 .. code-block:: Python SR_HYPERS = { "cutoff": 3.0, "max_radial": 6, "max_angular": 2, "atomic_gaussian_width": 0.3, "center_atom_weight": 1.0, "radial_basis": {"Gto": {}}, "cutoff_function": {"ShiftedCosine": {"width": 0.5}}, } .. GENERATED FROM PYTHON SOURCE LINES 87-88 And next the hyperparaters for the LODE / long-range (LR) part .. GENERATED FROM PYTHON SOURCE LINES 89-107 .. code-block:: Python LR_HYPERS = { # Cutoff on which to project potential density "cutoff": 3.0, # keep max_radial slightly smaller than for SR part "max_radial": 3, # max_angular should be <= 4, more precisely, max_angular + potential_exponent < 10 "max_angular": 2, # keep at >=1, WARNING: CUBIC SCALING, do not use values <0.5 "atomic_gaussian_width": 3.0, "center_atom_weight": 1.0, "radial_basis": {"Gto": {}}, # the exponent p that determines the 1/r^p potential "potential_exponent": 1, } .. GENERATED FROM PYTHON SOURCE LINES 108-110 We then use the above defined hyperparaters to define the per atom short range (sr) and long range (sr) descriptors. .. GENERATED FROM PYTHON SOURCE LINES 110-115 .. code-block:: Python calculator_sr = SphericalExpansion(**SR_HYPERS) calculator_lr = LodeSphericalExpansion(**LR_HYPERS) .. GENERATED FROM PYTHON SOURCE LINES 116-136 Note that LODE requires periodic systems. Therefore, if the data set does not come with periodic boundary conditions by default you can not use the data set and you will face an error if you try to compute the features. As you notices the calculation of the long range features takes significant more time compared to the sr features. Taking a look at the output we find that the resulting :py:class:`metatensor.TensorMap` are quite similar in their structure. The short range :py:class:`metatensor.TensorMap` contains more blocks due to the higher ``max_angular`` paramater we choosed above. Generate the rotational invariants (power spectra) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Rotationally invariant features can be obtained by taking two of the calculators that were defines above. For the short-range part, we use the SOAP vector which is obtained by computing the invariant combinations of the form :math:`\rho \otimes \rho`. .. GENERATED FROM PYTHON SOURCE LINES 137-142 .. code-block:: Python ps_calculator_sr = PowerSpectrum(calculator_sr, calculator_sr) ps_sr = ps_calculator_sr.compute(frames, gradients=["positions"]) .. GENERATED FROM PYTHON SOURCE LINES 143-149 We calculate gradients with respect to pistions by providing the ``gradients=["positions"]`` option to the :py:meth:`rascaline.calculators.CalculatorBase.compute()` method. For the long-range part, we combine the long-range descriptor :math:`V` with one a short-range density :math:`\rho` to get :math:`\rho \otimes V` features. .. GENERATED FROM PYTHON SOURCE LINES 150-155 .. code-block:: Python ps_calculator_lr = PowerSpectrum(calculator_sr, calculator_lr) ps_lr = ps_calculator_lr.compute(systems=frames, gradients=["positions"]) .. GENERATED FROM PYTHON SOURCE LINES 156-165 Step 2: Building a Simple Linear SR + LR Model with energy baselining --------------------------------------------------------------------- Preprocessing (model dependent) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For our current model, we do not wish to treat the individual center and neighbor species separately. Thus, we move the ``"species_center"`` key into the ``sample`` direction, over which we will later sum over. .. GENERATED FROM PYTHON SOURCE LINES 166-171 .. code-block:: Python ps_sr = ps_sr.keys_to_samples("species_center") ps_lr = ps_lr.keys_to_samples("species_center") .. GENERATED FROM PYTHON SOURCE LINES 172-174 For linear models only: Sum features up over atoms (``samples``) in the same structure. .. GENERATED FROM PYTHON SOURCE LINES 175-182 .. code-block:: Python sample_names_to_sum = ["center", "species_center"] ps_sr = metatensor.sum_over_samples(ps_sr, sample_names=sample_names_to_sum) ps_lr = metatensor.sum_over_samples(ps_lr, sample_names=sample_names_to_sum) .. GENERATED FROM PYTHON SOURCE LINES 183-189 Initialize tensormaps for energy baselining ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We add a simple extra descriptor :py:class:`rascaline.AtomicComposition` that stores how many atoms of each chemical species are contained in the structures. This is used for energy baselining. .. GENERATED FROM PYTHON SOURCE LINES 190-197 .. code-block:: Python calculator_co = AtomicComposition(per_structure=False) descriptor_co = calculator_co.compute(frames, gradients=["positions"]) co = descriptor_co.keys_to_properties(["species_center"]) co = metatensor.sum_over_samples(co, sample_names=["center"]) .. GENERATED FROM PYTHON SOURCE LINES 198-219 The :py:class:`rascaline.AtomicComposition` calculator also allows to directly perform the the sum over center atoms by using the following lines. .. code:: python descriptor_co = AtomicComposition(per_structure=True).compute(**compute_args) co = descriptor_co.keys_to_properties(["species_center"]) Stack all the features together for linear model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A linear model on SR + LR features can be thought of as a linear model built on a feature vector that is simply the concatenation of the SR and LR features. Furthermore, energy baselining can be performed by concatenating the information about chemical species as well. There is an metatensor function called :py:func:`metatensor.join()` for this purpose. Formally, we can write for the SR model. X_sr: :math:`1 \oplus \left(\rho \otimes \rho\right)` .. GENERATED FROM PYTHON SOURCE LINES 220-224 .. code-block:: Python X_sr = metatensor.join([co, ps_sr], axis="properties") .. GENERATED FROM PYTHON SOURCE LINES 225-232 We used the ``axis="properties"`` parameter since we want to concatenate along the features/properties dimensions. For the long range model we can formerly write X_lr: :math:`1 \oplus \left(\rho \otimes \rho\right) \oplus \left(\rho \otimes V\right)` .. GENERATED FROM PYTHON SOURCE LINES 233-237 .. code-block:: Python X_lr = metatensor.join([co, ps_sr, ps_lr], axis="properties") .. GENERATED FROM PYTHON SOURCE LINES 238-245 The features are now ready! Let us now perform some actual learning. Below we initialize two instances of the :py:class:`equisolve.numpy.models.linear_model.Ridge` class. :py:class:`equisolve.numpy.models.linear_model.Ridge` will perform a regression with respect to ``"values"`` (energies) and ``"positions"`` gradients (forces). If you only want a fit with respect to energies you can remove the gradients with ``metatensor.remove_gradients()`` .. GENERATED FROM PYTHON SOURCE LINES 246-251 .. code-block:: Python clf_sr = Ridge() clf_lr = Ridge() .. GENERATED FROM PYTHON SOURCE LINES 252-258 Split training and target data into train and test dat ------------------------------------------------------ Split the training and the test data by the distance :math:`r_{\rm train}=6\,\mathrm{Å}` between the center of mass of the two molecules. A structure with a :math:`r_{\rm train}<6 {\rm Å}` is used for training. .. GENERATED FROM PYTHON SOURCE LINES 259-263 .. code-block:: Python r_cut = 6.0 .. GENERATED FROM PYTHON SOURCE LINES 264-266 We calculate the indices from the dataset by list comprehension. The center of mass distance is stored in the ``"distance""`` attribute. .. GENERATED FROM PYTHON SOURCE LINES 267-272 .. code-block:: Python idx_train = [i for i, f in enumerate(frames) if f.info["distance"] < r_cut] idx_test = [i for i, f in enumerate(frames) if f.info["distance"] >= r_cut] .. GENERATED FROM PYTHON SOURCE LINES 273-275 For doing the split we define two ``Labels`` instances and combine them in a :py:class:`List`. .. GENERATED FROM PYTHON SOURCE LINES 276-282 .. code-block:: Python samples_train = metatensor.Labels(["structure"], np.reshape(idx_train, (-1, 1))) samples_test = metatensor.Labels(["structure"], np.reshape(idx_test, (-1, 1))) grouped_labels = [samples_train, samples_test] .. GENERATED FROM PYTHON SOURCE LINES 283-284 That we use as input to the :py:func:`metatensor.split()` function .. GENERATED FROM PYTHON SOURCE LINES 285-297 .. code-block:: Python X_sr_train, X_sr_test = metatensor.split( X_sr, axis="samples", grouped_labels=grouped_labels ) X_lr_train, X_lr_test = metatensor.split( X_lr, axis="samples", grouped_labels=grouped_labels ) y_train, y_test = metatensor.split(y, axis="samples", grouped_labels=grouped_labels) .. GENERATED FROM PYTHON SOURCE LINES 298-305 Fit the model ------------- For this model, we use a very simple regularization scheme where all features are regularized in the same way (the amount being controlled by the parameter ``alpha``). For more advanced regularization schemes (regularizing energies and forces differently and/or the SR and LR parts differently), see further down. .. GENERATED FROM PYTHON SOURCE LINES 306-311 .. code-block:: Python clf_sr.fit(X_sr_train, y_train, alpha=1e-6) clf_lr.fit(X_lr_train, y_train, alpha=1e-6) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 312-317 Evaluation ---------- For evaluating the model we calculate the RMSEs using the ``score()`` method. With the ``parameter_key`` parameter we select which RMSE should be calculated. .. GENERATED FROM PYTHON SOURCE LINES 318-338 .. code-block:: Python print( "SR: RMSE energies = " f"{clf_sr.score(X_sr_test, y_test, parameter_key='values')[0]:.3f} eV" ) print( "SR: RMSE forces = " f"{clf_sr.score(X_sr_test, y_test, parameter_key='positions')[0]:.3f} eV/Å" ) print( "LR: RMSE energies = " f"{clf_lr.score(X_lr_test, y_test, parameter_key='values')[0]:.3f} eV" ) print( "LR: RMSE forces = " f"{clf_lr.score(X_lr_test, y_test, parameter_key='positions')[0]:.3f} eV/Å" ) .. rst-class:: sphx-glr-script-out .. code-block:: none SR: RMSE energies = 0.557 eV SR: RMSE forces = 0.188 eV/Å LR: RMSE energies = 0.158 eV LR: RMSE forces = 0.178 eV/Å .. GENERATED FROM PYTHON SOURCE LINES 339-345 We find that the RMSE of the energy and the force of the LR model is smaller compared to the SR model. From this we conclude that the LR model performs better for the selection of the dataset. We additionally, can plot of the binding energy as a function of the distance. For the plot we select some properties from the dataset .. GENERATED FROM PYTHON SOURCE LINES 346-352 .. code-block:: Python dist = np.array([f.info["distance"] for f in frames]) energies = np.array([f.info["energy"] for f in frames]) monomer_energies = np.array([f.info["energyA"] + f.info["energyB"] for f in frames]) .. GENERATED FROM PYTHON SOURCE LINES 353-354 and select only the indices corresponding to our test set. .. GENERATED FROM PYTHON SOURCE LINES 358-359 Next we calculate the predicted SR and LR ``TensorMaps``. .. GENERATED FROM PYTHON SOURCE LINES 360-365 .. code-block:: Python y_sr_pred = clf_sr.predict(X_sr) y_lr_pred = clf_lr.predict(X_lr) .. GENERATED FROM PYTHON SOURCE LINES 366-367 And, finally perform the plot. .. GENERATED FROM PYTHON SOURCE LINES 368-396 .. code-block:: Python plt.scatter( dist, y.block().values[:, 0] - monomer_energies, label="target data", color="black" ) plt.scatter( dist, y_sr_pred.block().values[:, 0] - monomer_energies, label="short range model", marker="x", ) plt.scatter( dist, y_lr_pred.block().values[:, 0] - monomer_energies, label="long range model", marker="s", facecolor="None", edgecolor="orange", ) plt.xlabel("center of mass distance in Å") plt.ylabel(r"$E - E_\mathrm{monomer}$ in eV") plt.axvline(r_cut, c="red", label=r"$r_\mathrm{train}$") plt.legend() plt.tight_layout() plt.show() .. image-sg:: /examples/lode-linear/images/sphx_glr_lode-linear_001.png :alt: lode linear :srcset: /examples/lode-linear/images/sphx_glr_lode-linear_001.png :class: sphx-glr-single-img .. _sphx_glr_download_examples_lode-linear_lode-linear.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download :download:`Download Conda environment file: environment.yml ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: lode-linear.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: lode-linear.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: lode-linear.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_