diff --git a/main/search/search_index.json b/main/search/search_index.json index 8b05bf4..492d533 100644 --- a/main/search/search_index.json +++ b/main/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to ElementEmbeddings","text":"
This site contains the project documentation for the ElementEmbeddings
package which provides tools and examples of analysing and visualising elemental representation data.
The documentation consists of the following six parts:
Analyse elemental representation data.
Modules exported by this package:
elementembeddings.core
: Provides the Embedding
class.
elementembeddings.composition
: Tools to featurise compositions.
elementembeddings.plotter
: Tools to plot embeddings.
Utility functions for elementembeddings.
Modules exported in elementembeddings.utils
:
elementembeddings.utils.config
: Provides configuration settings and constants.
elementembeddings.utils.io
: Tools to read and write data.
elementembeddings.utils.math
: Tools for mathematical operations.
elementembeddings.utils.species
: Tools to handle ionic species.
"},{"location":"about/","title":"About the ElementEmbeddings package","text":"The Element Embeddings package provides high-level tools for analysing elemental embeddings data. This primarily involves visualising the correlation between embedding schemes using different statistical measures.
Machine learning approaches for materials informatics have become increasingly widespread. Some of these involve the use of deep learning techniques where the representation of the elements is learned rather than specified by the user of the model. While an important goal of machine learning training is to minimise the chosen error function to make more accurate predictions, it is also important for us material scientists to be able to interpret these models. As such, we aim to evaluate and compare different atomic embedding schemes in a consistent framework.
"},{"location":"about/#developer","title":"Developer","text":"H. Park et al, \"Mapping inorganic crystal chemical space\" Faraday Discuss. (2024)
A. Onwuli et al, \"Element similarity in high-dimensional materials representations\" Digital Discovery 2, 1558 (2023)
"},{"location":"contribution/","title":"Contributing","text":"This is a quick guide on how to follow best practice and contribute smoothly to ElementEmbeddings
.
We are always looking for ways to make ElementEmbeddings
better and more useful to a wider community. To contribute, use the \"Fork and Pull\" workflow and stick as closely as possible to the following:
The steps required to add a new representation scheme are:
DEFAULT_ELEMENT_EMBEDDINGS
and CITATIONS
.
We follow the GitHub flow (https://guides.github.com/introduction/flow/index.html), using branches for new work and pull requests for verifying the work.
The steps for a new piece of work can be summarised as follows:
For a general overview of using pull requests on GitHub look in the GitHub docs.
When creating a pull request you should:
Recommended reading: How to Write the Perfect Pull Request
"},{"location":"contribution/#dev-requirements","title":"Dev requirements","text":"When developing locally, it is recommended to install the python packages in requirements-dev.txt
.
pip install -r requirements-dev.txt\n
This will allow you to run the tests locally with pytest as described in the main README, as well as run pre-commit hooks to automatically format python files with isort and black. To install the pre-commit hooks (only needs to be done once):
pre-commit install\npre-commit run --all-files # optionally run hooks on all files\n
Pre-commit hooks will check all files when you commit changes, automatically fixing any files which are not formatted correctly. Those files will need to be staged again before re-attempting the commit.
"},{"location":"contribution/#bug-reports-feature-requests-and-questions","title":"Bug reports, feature requests and questions","text":"Please use the Issue Tracker to report bugs or request features in the first instance. Contributions are always welcome.
"},{"location":"installation/","title":"Getting Started","text":"The latest stable release can be installed via pip using:
pip install ElementEmbeddings\n
"},{"location":"installation/#developers-installation-optional","title":"Developer's installation (optional)","text":"For development work, ElementEmbeddings
can be installed from a copy of the source repository; this is preferred if using experimental code branches.
To clone the project from Github and make a local installation:
git clone https://github.com/WMD-group/ElementEmbeddings.git\ncd ElementEmbeddings\npip install -e .\n
With -e
, pip will create links to the source folder so that changes to the code take effect without reinstalling the package.
Here we will demonstrate how to use some of ElementEmbeddings
's features. For full worked examples of using the package, please refer to the Jupyter notebooks in the examples section of the Github repo.
The Embedding
class lies at the heart of the package. It handles elemental representation data and enables analysis and visualisation.
For simple usage, you can instantiate an Embedding object using one of the embeddings in the data directory. For this example, let's use the magpie elemental representation.
# Import the class\nfrom elementembeddings.core import Embedding\n\n# Load the magpie data\nmagpie = Embedding.load_data(\"magpie\")\n
We can access some of the properties of the Embedding
class. For example, we can find the dimensions of the elemental representation and the list of elements for which an embedding exists.
# Print out some of the properties of the ElementEmbeddings class\nprint(f\"The magpie representation has embeddings of dimension {magpie.dim}\")\nprint(\n f\"The magpie representation contains these elements: \\n {magpie.element_list}\"\n) # prints out all the elements considered for this representation\nprint(\n f\"The magpie representation contains these features: \\n {magpie.feature_labels}\"\n) # Prints out the feature labels of the chosen representation\n\n# The magpie representation has embeddings of dimension 22\n# The magpie representation contains these elements:\n[\n \"H\",\n \"He\",\n \"Li\",\n \"Be\",\n \"B\",\n \"C\",\n \"N\",\n \"O\",\n \"F\",\n \"Ne\",\n \"Na\",\n \"Mg\",\n \"Al\",\n \"Si\",\n \"P\",\n \"S\",\n \"Cl\",\n \"Ar\",\n \"K\",\n \"Ca\",\n \"Sc\",\n \"Ti\",\n \"V\",\n \"Cr\",\n \"Mn\",\n \"Fe\",\n \"Co\",\n \"Ni\",\n \"Cu\",\n \"Zn\",\n \"Ga\",\n \"Ge\",\n \"As\",\n \"Se\",\n \"Br\",\n \"Kr\",\n \"Rb\",\n \"Sr\",\n \"Y\",\n \"Zr\",\n \"Nb\",\n \"Mo\",\n \"Tc\",\n \"Ru\",\n \"Rh\",\n \"Pd\",\n \"Ag\",\n \"Cd\",\n \"In\",\n \"Sn\",\n \"Sb\",\n \"Te\",\n \"I\",\n \"Xe\",\n \"Cs\",\n \"Ba\",\n \"La\",\n \"Ce\",\n \"Pr\",\n \"Nd\",\n \"Pm\",\n \"Sm\",\n \"Eu\",\n \"Gd\",\n \"Tb\",\n \"Dy\",\n \"Ho\",\n \"Er\",\n \"Tm\",\n \"Yb\",\n \"Lu\",\n \"Hf\",\n \"Ta\",\n \"W\",\n \"Re\",\n \"Os\",\n \"Ir\",\n \"Pt\",\n \"Au\",\n \"Hg\",\n \"Tl\",\n \"Pb\",\n \"Bi\",\n \"Po\",\n \"At\",\n \"Rn\",\n \"Fr\",\n \"Ra\",\n \"Ac\",\n \"Th\",\n \"Pa\",\n \"U\",\n \"Np\",\n \"Pu\",\n \"Am\",\n \"Cm\",\n \"Bk\",\n]\n# The magpie representation contains these features:\n[\n \"Number\",\n \"MendeleevNumber\",\n \"AtomicWeight\",\n \"MeltingT\",\n \"Column\",\n \"Row\",\n \"CovalentRadius\",\n \"Electronegativity\",\n \"NsValence\",\n \"NpValence\",\n \"NdValence\",\n \"NfValence\",\n \"NValence\",\n \"NsUnfilled\",\n \"NpUnfilled\",\n \"NdUnfilled\",\n \"NfUnfilled\",\n \"NUnfilled\",\n \"GSvolume_pa\",\n \"GSbandgap\",\n \"GSmagmom\",\n \"SpaceGroupNumber\",\n]\n
"},{"location":"tutorials/#plotting","title":"Plotting","text":"We can quickly generate heatmaps of distance/similarity measures between the element vectors using heatmap_plotter
and plot the representations in two dimensions using the dimension_plotter
from the plotter module. Before we do that, we will standardise the embedding using the standardise
method available to the Embedding class
from elementembeddings.plotter import heatmap_plotter, dimension_plotter\nimport matplotlib.pyplot as plt\n\nmagpie.standardise(inplace=True) # Standardises the representation\n\nfig, ax = plt.subplots(1, 1, figsize=(6, 6))\nheatmap_params = {\"vmin\": -1, \"vmax\": 1}\nheatmap_plotter(\n embedding=magpie,\n metric=\"cosine_similarity\",\n show_axislabels=False,\n cmap=\"Blues_r\",\n ax=ax,\n **heatmap_params\n)\nax.set_title(\"Magpie cosine similarities\")\nfig.tight_layout()\nfig.show()\n
fig, ax = plt.subplots(1, 1, figsize=(6, 6))\n\nreducer_params = {\"n_neighbors\": 30, \"random_state\": 42}\nscatter_params = {\"s\": 100}\n\ndimension_plotter(\n embedding=magpie,\n reducer=\"umap\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n reducer_params=reducer_params,\n scatter_params=scatter_params,\n)\nax.set_title(\"Magpie UMAP (n_neighbours=30)\")\nax.legend().remove()\nhandles, labels = ax.get_legend_handles_labels()\nfig.legend(handles, labels, bbox_to_anchor=(1.25, 0.5), loc=\"center right\", ncol=1)\n\nfig.tight_layout()\nfig.show()\n
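The heatmap above is built from pairwise cosine similarities between standardised element vectors. The following is a minimal sketch of that calculation in plain Python, assuming that standardisation means z-scoring each feature across elements (population standard deviation). The vectors in `toy` are hypothetical and are not taken from the magpie data, and the helper names are illustrative rather than part of the package.

```python
import math


def standardise(vectors):
    # Z-score each feature (column) across all elements.
    names = list(vectors)
    dim = len(vectors[names[0]])
    out = {name: [] for name in names}
    for j in range(dim):
        column = [vectors[name][j] for name in names]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
        for name in names:
            # Constant features are set to zero instead of dividing by zero.
            out[name].append((vectors[name][j] - mean) / std if std else 0.0)
    return out


def cosine_similarity(u, v):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical two-feature vectors for three made-up elements.
toy = {'A': [1.0, 10.0], 'B': [2.0, 20.0], 'C': [3.0, 0.0]}
scaled = standardise(toy)
similarity = {
    (p, q): cosine_similarity(scaled[p], scaled[q]) for p in toy for q in toy
}
```

Cosine similarity always lies between -1 and 1, which is why the heatmap call fixes vmin and vmax to those values.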
"},{"location":"tutorials/#compositions","title":"Compositions","text":"The package can also be used to featurise compositions. Your data could be a list of formula strings or a pandas dataframe of the following format:
formula CsPbI3 Fe2O3 NaCl ZnS
The composition_featuriser
function can be used to featurise the data. The compositions can be featurised using different representation schemes and different types of pooling through the embedding
and stats
arguments respectively.
from elementembeddings.composition import composition_featuriser\n\ndf_featurised = composition_featuriser(df, embedding=\"magpie\", stats=[\"mean\", \"sum\"])\n\ndf_featurised\n
formula mean_Number mean_MendeleevNumber mean_AtomicWeight mean_MeltingT mean_Column mean_Row mean_CovalentRadius mean_Electronegativity mean_NsValence mean_NpValence mean_NdValence mean_NfValence mean_NValence mean_NsUnfilled mean_NpUnfilled mean_NdUnfilled mean_NfUnfilled mean_NUnfilled mean_GSvolume_pa mean_GSbandgap mean_GSmagmom mean_SpaceGroupNumber sum_Number sum_MendeleevNumber sum_AtomicWeight sum_MeltingT sum_Column sum_Row sum_CovalentRadius sum_Electronegativity sum_NsValence sum_NpValence sum_NdValence sum_NfValence sum_NValence sum_NsUnfilled sum_NpUnfilled sum_NdUnfilled sum_NfUnfilled sum_NUnfilled sum_GSvolume_pa sum_GSbandgap sum_GSmagmom sum_SpaceGroupNumber CsPbI3 59.2 74.8 144.16377238 412.55 13.2 5.4 161.39999999999998 2.22 1.8 3.4 8.0 2.8000000000000003 16.0 0.2 1.4 0.0 0.0 1.6 54.584 0.6372 0.0 129.20000000000002 296.0 374.0 720.8188619 2062.75 66.0 27.0 807.0 11.100000000000001 9.0 17.0 40.0 14.0 80.0 1.0 7.0 0.0 0.0 8.0 272.92 3.186 0.0 646.0 Fe2O3 15.2 74.19999999999999 31.937640000000002 757.2800000000001 12.8 2.8 92.4 2.7960000000000003 2.0 2.4 2.4000000000000004 0.0 6.8 0.0 1.2 1.6 0.0 2.8 9.755 0.0 0.8442651200000001 98.80000000000001 76.0 371.0 159.6882 3786.4 64.0 14.0 462.0 13.98 10.0 12.0 12.0 0.0 34.0 0.0 6.0 8.0 0.0 14.0 48.775000000000006 0.0 4.2213256 494.0 NaCl 14.0 48.0 29.221384640000004 271.235 9.0 3.0 134.0 2.045 1.5 2.5 0.0 0.0 4.0 0.5 0.5 0.0 0.0 1.0 26.87041666665 1.2465 0.0 146.5 28.0 96.0 58.44276928000001 542.47 18.0 6.0 268.0 4.09 3.0 5.0 0.0 0.0 8.0 1.0 1.0 0.0 0.0 2.0 53.7408333333 2.493 0.0 293.0 ZnS 23.0 78.5 48.7225 540.52 14.0 3.5 113.5 2.115 2.0 2.0 5.0 0.0 9.0 0.0 1.0 0.0 0.0 1.0 19.8734375 1.101 0.0 132.0 46.0 157.0 97.445 1081.04 28.0 7.0 227.0 4.23 4.0 4.0 10.0 0.0 18.0 0.0 2.0 0.0 0.0 2.0 39.746875 2.202 0.0 264.0 The returned dataframe contains the mean- and sum-pooled features of the magpie representation for the four formulas.
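The Number columns in the table can be checked by hand, since that feature is simply the atomic number. The sketch below assumes that mean pooling weights each element by its fraction of the atoms in the formula and that sum pooling weights by the raw amounts. The compositions are written out as dictionaries to avoid a formula parser, and `pool_number` is a hypothetical helper, not part of the package.

```python
# Atomic numbers of the elements that appear in the four example formulas.
ATOMIC_NUMBER = {
    'O': 8, 'Na': 11, 'S': 16, 'Cl': 17, 'Fe': 26,
    'Zn': 30, 'I': 53, 'Cs': 55, 'Pb': 82,
}

# Element -> amount, written out by hand for each formula.
COMPOSITIONS = {
    'CsPbI3': {'Cs': 1, 'Pb': 1, 'I': 3},
    'Fe2O3': {'Fe': 2, 'O': 3},
    'NaCl': {'Na': 1, 'Cl': 1},
    'ZnS': {'Zn': 1, 'S': 1},
}


def pool_number(composition):
    # Sum pooling uses the raw amounts; mean pooling divides by the atom count.
    total_atoms = sum(composition.values())
    summed = sum(amount * ATOMIC_NUMBER[el] for el, amount in composition.items())
    return {'sum_Number': float(summed), 'mean_Number': summed / total_atoms}


pooled = {formula: pool_number(comp) for formula, comp in COMPOSITIONS.items()}
```

For Fe2O3 this gives a sum of 2 x 26 + 3 x 8 = 76 and a mean of 76 / 5 = 15.2, in agreement with the sum_Number and mean_Number entries of the table.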
"},{"location":"embeddings/element/","title":"Elemental Embeddings","text":"The data contained in this repository are a collection of various elemental representation/embedding schemes. We provide the literature source for these representations as well as the data source for which the files were obtained. Some representations have been obtained from the following repositories:
For the linear/scalar representations, the Embedding
class will load these representations as one-hot vectors where the vector components are ordered following the scale (i.e. the atomic
representation is ordered by atomic numbers).
The following paper describes the details of the modified Pettifor chemical scale: The optimal one-dimensional periodic table: a modified Pettifor chemical scale from data mining
Data source
"},{"location":"embeddings/element/#atomic-numbers","title":"Atomic numbers","text":"We included atomic
as a linear representation to generate one-hot vectors corresponding to the atomic numbers.
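As a rough illustration of how a linear scale becomes one-hot vectors, the sketch below assigns each element a vector with a single 1 at its position in the ordered scale. The function name `one_hot_embeddings` is hypothetical, and only the first five elements are used here; the scale shipped with the package covers more elements.

```python
def one_hot_embeddings(ordered_symbols):
    # Position in the ordered scale decides which component is set to 1.
    size = len(ordered_symbols)
    return {
        symbol: [1.0 if i == position else 0.0 for i in range(size)]
        for position, symbol in enumerate(ordered_symbols)
    }


# First five elements in order of atomic number (illustrative subset).
atomic_scale = ['H', 'He', 'Li', 'Be', 'B']
atomic_one_hot = one_hot_embeddings(atomic_scale)
```

With this subset, Li is the third entry on the scale, so its vector has a 1 in the third component and zeros elsewhere.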
The following representations are all vector representations (some are local, some are distributed) and the Embedding
class will load these representations as they are.
The following paper describes the implementation of the composition graph neural fingerprint (cgnf) from the node embedding vectors of a pre-trained crystal graph convolution neural network: Synthesizability of materials stoichiometry using semi-supervised learning
Data source
"},{"location":"embeddings/element/#crystallm","title":"crystallm","text":"The following paper describes the details behind the generative crystal structure model based on a large language model: Crystal Structure Generation with Autoregressive Large Language Modeling
"},{"location":"embeddings/element/#magpie","title":"magpie","text":"The following paper describes the details of the Materials Agnostic Platform for Informatics and Exploration (Magpie) framework: A general-purpose machine learning framework for predicting properties of inorganic materials
The source code for Magpie can be found here
Data source
The 22 dimensional embedding vector includes the following elemental properties:
Click to see the 22 properties
- Number
- Mendeleev number
- Atomic weight
- Melting temperature
- Group number
- Period
- Covalent Radius
- Electronegativity
- no. of s, p, d, f valence electrons (4 features)
- no. of valence electrons
- no. of unfilled s, p, d, f orbitals (4 features)
- no. of unfilled orbitals
- GSvolume_pa (DFT volume per atom of T=0K ground state from the OQMD)
- GSbandgap (DFT bandgap energy of T=0K ground state from the OQMD)
- GSmagmom (DFT magnetic moment of T=0K ground state from the OQMD)
- Space Group Number
magpie_sc
is a scaled version of the magpie embeddings. Data source
The following paper describes the implementation of mat2vec: Unsupervised word embeddings capture latent knowledge from materials science literature
Data source
"},{"location":"embeddings/element/#matscholar","title":"matscholar","text":"The following paper describes the natural language processing implementation of Materials Scholar (matscholar): Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature
Data source
"},{"location":"embeddings/element/#megnet","title":"megnet","text":"The following paper describes the details of the construction of the MatErials Graph Network (MEGNet): Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. The 16 dimensional vectors are drawn from the atomic weights of a model trained to predict the formation energies of crystalline materials.
Data source
"},{"location":"embeddings/element/#oliynyk","title":"oliynyk","text":"The following paper describes the details: High-Throughput Machine-Learning-Driven Synthesis of Full-Heusler Compounds
Data source
The 44 features of the embedding vector are formed of the following properties:
Click to see the 44 features!
- Number
- Atomic_Weight
- Period
- Group
- Families
- Metal
- Nonmetal
- Metalliod
- Mendeleev_Number
- l_quantum_number
- Atomic_Radius
- Miracle*Radius*[pm]
- Covalent_Radius
- Zunger_radii_sum
- Ionic_radius
- crystal_radius
- Pauling_Electronegativity
- MB_electonegativity
- Gordy_electonegativity
- Mulliken_EN
- Allred-Rockow_electronegativity
- Metallic_valence
- Number_of_valence_electrons
- Gilmor_number_of_valence_electron
- valence_s
- valence_p
- valence_d
- valence_f
- Number_of_unfilled_s_valence_electrons
- Number_of_unfilled_p_valence_electrons
- Number_of_unfilled_d_valence_electrons
- Number_of_unfilled_f_valence_electrons
- Outer_shell_electrons
- 1st*ionization_potential*(kJ/mol)
- Polarizability(A^3)
- Melting*point*(K)
- Boiling*Point*(K)
- Density\\_(g/mL)
- Specific*heat*(J/g*K)*
- Heat*of_fusion*(kJ/mol)\\_
- Heat*of_vaporization*(kJ/mol)\\_
- Thermal*conductivity*(W/(m*K))*
- Heat_atomization(kJ/mol)
- Cohesive_energy
oliynyk_sc
is a scaled version of the oliynyk embeddings: Data source
This is a set of 200-dimensional vectors in which the components are randomly generated.
The 118 200-dimensional vectors in random_200_new
were generated using the following code:
import numpy as np\n\nmu, sigma = 0, 1 # mean and standard deviation\ns = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))\n
"},{"location":"embeddings/element/#skipatom","title":"skipatom","text":"The following paper describes the details: Distributed representations of atoms and materials for machine learning
Data source
"},{"location":"embeddings/element/#xenonpy","title":"xenonpy","text":"The XenonPy embedding uses the 58 features which are commonly used in publications that use the XenonPy package. See the following publications:
The ElementEmbeddings
package is distributed with a number of element and ionic species embedding schemes. These schemes are used to represent elements and ionic species in a high-dimensional space. The schemes are stored in the ElementEmbeddings
package and can be accessed using the ElementEmbeddings
API.
Element Embeddings
Species Embeddings
"},{"location":"embeddings/species/","title":"Species Embeddings","text":"The ElementEmbeddings
package has been expanded to incorporate representations of ionic species. We provide the literature source for these representations as well as the data source from which the files were obtained.
The following paper describes the details of how the SkipSpecies embeddings were developed.
Ionic species representations for materials informatics
Data Source
"},{"location":"python_api/composition/","title":"Composition module","text":"This module provides a class for handling compositional embeddings.
Typical usage example
Fe2O3_magpie = CompositionalEmbedding(\"Fe2O3\", \"magpie\")
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding","title":"CompositionalEmbedding
","text":"Class to handle compositional embeddings.
formula (str): A string formula e.g. CsPbI3, Li7La3Zr2O12\nembedding (Union[str, Embedding]): Either a string name of the embedding\nor an Embedding instance\nx (int, optional): The non-stoichiometric amount.\n
Source code in src/elementembeddings/composition.py
class CompositionalEmbedding:\n \"\"\"Class to handle compositional embeddings.\n\n Args:\n ----\n formula (str): A string formula e.g. CsPbI3, Li7La3Zr2O12\n embedding (Union[str, Embedding]): Either a string name of the embedding\n or an Embedding instance\n x (int, optional): The non-stoichiometric amount.\n \"\"\"\n\n def __init__(self, formula: str, embedding: str | Embedding, x=1) -> None:\n \"\"\"Initialise a CompositionalEmbedding instance.\"\"\"\n self.embedding = embedding\n\n # If a string has been passed for embedding, create an Embedding instance\n if isinstance(embedding, str):\n self.embedding = Embedding.load_data(embedding)\n\n self.embedding_name: str = self.embedding.embedding_name\n # Set an attribute for the formula\n self.formula = formula\n\n # Set an attribute for the comp dict\n comp_dict = formula_parser(self.formula)\n self._natoms = 0\n for v in comp_dict.values():\n if v < 0:\n msg = \"Formula cannot contain negative amounts of elements\"\n raise ValueError(msg)\n self._natoms += abs(v)\n\n self.composition = comp_dict\n\n # Set an attribute for the element list\n self.element_list = list(self.composition.keys())\n # Set an attribute for the element matrix\n self.el_matrix = np.zeros(\n shape=(len(self.composition), len(self.embedding.embeddings[\"H\"])),\n )\n for i, k in enumerate(self.composition.keys()):\n self.el_matrix[i] = self.embedding.embeddings[k]\n self.el_matrix = np.nan_to_num(self.el_matrix)\n\n # Set an attribute for the stoichiometric vector\n self.stoich_vector = np.array(list(self.composition.values()))\n\n # Set an attribute for the normalised stoichiometric vector\n self.norm_stoich_vector = self.stoich_vector / self._natoms\n\n @property\n def fractional_composition(self):\n \"\"\"Fractional composition of the Composition.\"\"\"\n return _get_fractional_composition(self.formula)\n\n @property\n def num_atoms(self) -> float:\n \"\"\"Total number of atoms in Composition.\"\"\"\n return self._natoms\n\n def 
as_dict(self) -> dict:\n # TO-DO: Need to create a dict representation for the embedding class\n \"\"\"Return the CompositionalEmbedding class as a dict.\"\"\"\n return {\n \"formula\": self.formula,\n \"composition\": self.composition,\n \"fractional_composition\": self.fractional_composition,\n }\n\n def _mean_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a weighted mean feature vector based of the embedding.\n\n The dimension of the feature vector is the same as the embedding.\n\n \"\"\"\n return np.dot(self.norm_stoich_vector, self.el_matrix)\n\n def _variance_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a weighted variance feature vector.\"\"\"\n diff_matrix = self.el_matrix - self._mean_feature_vector()\n\n diff_matrix = diff_matrix**2\n return np.dot(self.norm_stoich_vector, diff_matrix)\n\n def _minpool_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a min pooled feature vector.\"\"\"\n return np.min(self.el_matrix, axis=0)\n\n def _maxpool_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a max pooled feature vector.\"\"\"\n return np.max(self.el_matrix, axis=0)\n\n def _range_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a range feature vector.\"\"\"\n return np.ptp(self.el_matrix, axis=0)\n\n def _sum_feature_vector(self) -> np.ndarray:\n \"\"\"Compute the weighted sum feature vector.\"\"\"\n return np.dot(self.stoich_vector, self.el_matrix)\n\n def _geometric_mean_feature_vector(self) -> np.ndarray:\n \"\"\"Compute the geometric mean feature vector.\"\"\"\n return np.exp(np.dot(self.norm_stoich_vector, np.log(self.el_matrix)))\n\n def _harmonic_mean_feature_vector(self) -> np.ndarray:\n \"\"\"Compute the harmonic mean feature vector.\"\"\"\n return np.reciprocal(\n np.dot(self.norm_stoich_vector, np.reciprocal(self.el_matrix)),\n )\n\n _stats_functions_dict: ClassVar = {\n \"mean\": \"_mean_feature_vector\",\n \"variance\": \"_variance_feature_vector\",\n \"minpool\": \"_minpool_feature_vector\",\n \"maxpool\": 
\"_maxpool_feature_vector\",\n \"range\": \"_range_feature_vector\",\n \"sum\": \"_sum_feature_vector\",\n \"geometric_mean\": \"_geometric_mean_feature_vector\",\n \"harmonic_mean\": \"_harmonic_mean_feature_vector\",\n }\n\n def feature_vector(self, stats: str | list = \"mean\"):\n \"\"\"Compute a feature vector.\n\n The feature vector is a concatenation of\n the statistics specified in the stats argument.\n\n Args:\n ----\n stats (list): A list of strings specifying the statistics to be computed.\n The default is ['mean'].\n\n Returns:\n -------\n np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n \"\"\"\n implemented_stats = [\n \"mean\",\n \"variance\",\n \"minpool\",\n \"maxpool\",\n \"range\",\n \"sum\",\n \"geometric_mean\",\n \"harmonic_mean\",\n ]\n if isinstance(stats, str):\n stats = [stats]\n if not all(s in implemented_stats for s in stats):\n msg = f\" {[stat for stat in stats if stat not in implemented_stats]} \" f\"are not valid statistics.\"\n raise ValueError(\n msg,\n )\n feature_vector = []\n for s in stats:\n feature_vector.append(getattr(self, self._stats_functions_dict[s])())\n return np.concatenate(feature_vector)\n\n def distance(\n self,\n comp_other,\n distance_metric: str = \"euclidean\",\n stats: str | list[str] = \"mean\",\n ):\n \"\"\"Compute the distance between two compositions.\n\n Args:\n ----\n comp_other (Union[str, CompositionalEmbedding]): The other composition.\n distance_metric (str): The metric to be used. 
The default is 'euclidean'.\n stats (Union[str, list], optional): A list of statistics to be computed.\n\n Returns:\n -------\n float: The distance between the two CompositionalEmbedding objects.\n \"\"\"\n if isinstance(comp_other, str):\n comp_other = CompositionalEmbedding(comp_other, self.embedding)\n if not isinstance(comp_other, CompositionalEmbedding):\n msg = \"comp_other must be a string or a CompositionalEmbedding object.\"\n raise TypeError(\n msg,\n )\n if self.embedding_name != comp_other.embedding_name:\n msg = \"The two CompositionalEmbedding objects must have the same embedding.\"\n raise TypeError(\n msg,\n )\n return _composition_distance(\n self,\n comp_other,\n self.embedding,\n distance_metric,\n stats,\n )\n\n def __repr__(self) -> str:\n return f\"CompositionalEmbedding(formula={self.formula}, \" f\"embedding={self.embedding_name})\"\n\n def __str__(self) -> str:\n return f\"CompositionalEmbedding(formula={self.formula}, \" f\"embedding={self.embedding_name})\"\n\n def __eq__(self, other):\n if isinstance(other, self.__class__):\n return self.formula == other.formula and self.embedding_name == other.embedding_name\n else:\n return False\n\n def __ne__(self, other):\n return not self.__eq__(other)\n\n def __hash__(self):\n return hash((self.formula, self.embedding))\n
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.fractional_composition","title":"fractional_composition
property
","text":"Fractional composition of the Composition.
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.num_atoms","title":"num_atoms: float
property
","text":"Total number of atoms in Composition.
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.__init__","title":"__init__(formula, embedding, x=1)
","text":"Initialise a CompositionalEmbedding instance.
Source code in src/elementembeddings/composition.py
def __init__(self, formula: str, embedding: str | Embedding, x=1) -> None:\n \"\"\"Initialise a CompositionalEmbedding instance.\"\"\"\n self.embedding = embedding\n\n # If a string has been passed for embedding, create an Embedding instance\n if isinstance(embedding, str):\n self.embedding = Embedding.load_data(embedding)\n\n self.embedding_name: str = self.embedding.embedding_name\n # Set an attribute for the formula\n self.formula = formula\n\n # Set an attribute for the comp dict\n comp_dict = formula_parser(self.formula)\n self._natoms = 0\n for v in comp_dict.values():\n if v < 0:\n msg = \"Formula cannot contain negative amounts of elements\"\n raise ValueError(msg)\n self._natoms += abs(v)\n\n self.composition = comp_dict\n\n # Set an attribute for the element list\n self.element_list = list(self.composition.keys())\n # Set an attribute for the element matrix\n self.el_matrix = np.zeros(\n shape=(len(self.composition), len(self.embedding.embeddings[\"H\"])),\n )\n for i, k in enumerate(self.composition.keys()):\n self.el_matrix[i] = self.embedding.embeddings[k]\n self.el_matrix = np.nan_to_num(self.el_matrix)\n\n # Set an attribute for the stoichiometric vector\n self.stoich_vector = np.array(list(self.composition.values()))\n\n # Set an attribute for the normalised stoichiometric vector\n self.norm_stoich_vector = self.stoich_vector / self._natoms\n
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.as_dict","title":"as_dict()
","text":"Return the CompositionalEmbedding class as a dict.
Source code in src/elementembeddings/composition.py
def as_dict(self) -> dict:\n # TO-DO: Need to create a dict representation for the embedding class\n \"\"\"Return the CompositionalEmbedding class as a dict.\"\"\"\n return {\n \"formula\": self.formula,\n \"composition\": self.composition,\n \"fractional_composition\": self.fractional_composition,\n }\n
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.distance","title":"distance(comp_other, distance_metric='euclidean', stats='mean')
","text":"Compute the distance between two compositions.
comp_other (Union[str, CompositionalEmbedding]): The other composition.\ndistance_metric (str): The metric to be used. The default is 'euclidean'.\nstats (Union[str, list], optional): A list of statistics to be computed.\n
float: The distance between the two CompositionalEmbedding objects.\n
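The internals of the distance helper are not shown in this section, so the following is only a sketch of one plausible reading: both compositions are pooled into feature vectors (mean pooling here, matching the default stats value) and the Euclidean distance is taken between the two vectors. `TOY_EMBEDDING` and both helper functions are hypothetical and use made-up two-dimensional element vectors.

```python
import math

# Hypothetical two-dimensional element vectors, for illustration only.
TOY_EMBEDDING = {'Na': [1.0, 0.0], 'Cl': [0.0, 1.0], 'K': [2.0, 0.0]}


def mean_pool(composition):
    # Weight each element vector by its fraction of the atoms.
    total = sum(composition.values())
    dim = len(next(iter(TOY_EMBEDDING.values())))
    return [
        sum(amount / total * TOY_EMBEDDING[el][j] for el, amount in composition.items())
        for j in range(dim)
    ]


def euclidean_distance(comp_a, comp_b):
    # Distance between the two pooled feature vectors.
    vec_a, vec_b = mean_pool(comp_a), mean_pool(comp_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))


nacl = {'Na': 1, 'Cl': 1}
kcl = {'K': 1, 'Cl': 1}
distance_nacl_kcl = euclidean_distance(nacl, kcl)
```

In this toy setting the pooled vectors are [0.5, 0.5] and [1.0, 0.5], so the distance is 0.5. Both compositions must be pooled with the same embedding for the result to be meaningful, which is consistent with the TypeError raised when the embedding names differ.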
Source code in src/elementembeddings/composition.py
def distance(\n self,\n comp_other,\n distance_metric: str = \"euclidean\",\n stats: str | list[str] = \"mean\",\n):\n \"\"\"Compute the distance between two compositions.\n\n Args:\n ----\n comp_other (Union[str, CompositionalEmbedding]): The other composition.\n distance_metric (str): The metric to be used. The default is 'euclidean'.\n stats (Union[str, list], optional): A list of statistics to be computed.\n\n Returns:\n -------\n float: The distance between the two CompositionalEmbedding objects.\n \"\"\"\n if isinstance(comp_other, str):\n comp_other = CompositionalEmbedding(comp_other, self.embedding)\n if not isinstance(comp_other, CompositionalEmbedding):\n msg = \"comp_other must be a string or a CompositionalEmbedding object.\"\n raise TypeError(\n msg,\n )\n if self.embedding_name != comp_other.embedding_name:\n msg = \"The two CompositionalEmbedding objects must have the same embedding.\"\n raise TypeError(\n msg,\n )\n return _composition_distance(\n self,\n comp_other,\n self.embedding,\n distance_metric,\n stats,\n )\n
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.feature_vector","title":"feature_vector(stats='mean')
","text":"Compute a feature vector.
The feature vector is a concatenation of the statistics specified in the stats argument.
stats (list): A list of strings specifying the statistics to be computed.\nThe default is ['mean'].\n
np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n
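To make the concatenation concrete, here is a minimal plain-Python sketch of the same idea: each requested statistic yields a vector of the embedding dimension, and the results are joined in the order given. It uses a hypothetical two-dimensional `TOY_EMBEDDING` and lists in place of NumPy arrays, and covers only four of the eight statistics.

```python
# Hypothetical two-dimensional element vectors, for illustration only.
TOY_EMBEDDING = {'Fe': [1.0, 4.0], 'O': [3.0, 2.0]}
DIM = 2


def mean_vector(comp):
    total = sum(comp.values())
    return [sum(n / total * TOY_EMBEDDING[el][j] for el, n in comp.items()) for j in range(DIM)]


def sum_vector(comp):
    return [sum(n * TOY_EMBEDDING[el][j] for el, n in comp.items()) for j in range(DIM)]


def minpool_vector(comp):
    return [min(TOY_EMBEDDING[el][j] for el in comp) for j in range(DIM)]


def maxpool_vector(comp):
    return [max(TOY_EMBEDDING[el][j] for el in comp) for j in range(DIM)]


STATS = {
    'mean': mean_vector,
    'sum': sum_vector,
    'minpool': minpool_vector,
    'maxpool': maxpool_vector,
}


def feature_vector(comp, stats='mean'):
    # Accept a single statistic name or a list of names.
    if isinstance(stats, str):
        stats = [stats]
    unknown = [s for s in stats if s not in STATS]
    if unknown:
        raise ValueError(f'{unknown} are not valid statistics.')
    vector = []
    for s in stats:
        vector.extend(STATS[s](comp))  # concatenate in the requested order
    return vector


fe2o3 = {'Fe': 2, 'O': 3}
features = feature_vector(fe2o3, ['mean', 'sum', 'maxpool'])
```

Three statistics over a two-dimensional embedding give a vector of length 3 x 2 = 6, matching the stated dimension of len(stats) * embedding_dim.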
Source code in src/elementembeddings/composition.py
def feature_vector(self, stats: str | list = \"mean\"):\n \"\"\"Compute a feature vector.\n\n The feature vector is a concatenation of\n the statistics specified in the stats argument.\n\n Args:\n ----\n stats (list): A list of strings specifying the statistics to be computed.\n The default is ['mean'].\n\n Returns:\n -------\n np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n \"\"\"\n implemented_stats = [\n \"mean\",\n \"variance\",\n \"minpool\",\n \"maxpool\",\n \"range\",\n \"sum\",\n \"geometric_mean\",\n \"harmonic_mean\",\n ]\n if isinstance(stats, str):\n stats = [stats]\n if not all(s in implemented_stats for s in stats):\n msg = f\" {[stat for stat in stats if stat not in implemented_stats]} \" f\"are not valid statistics.\"\n raise ValueError(\n msg,\n )\n feature_vector = []\n for s in stats:\n feature_vector.append(getattr(self, self._stats_functions_dict[s])())\n return np.concatenate(feature_vector)\n
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding","title":"SpeciesCompositionalEmbedding
","text":"Class to handle species compositional embeddings.
formula_dict (dict): A dictionary of the form {species: amount}\nembedding (Union[str, SpeciesEmbedding]): Either a string name of the embedding\nor an SpeciesEmbedding instance\nx (int, optional): The non-stoichiometric amount.\n
Source code in src/elementembeddings/composition.py
class SpeciesCompositionalEmbedding:\n \"\"\"Class to handle species compositional embeddings.\n\n Args:\n ----\n formula_dict (dict): A dictionary of the form {species: amount}\n embedding (Union[str, SpeciesEmbedding]): Either a string name of the embedding\n or an SpeciesEmbedding instance\n x (int, optional): The non-stoichiometric amount.\n \"\"\"\n\n def __init__(self, formula_dict: dict, embedding: str | SpeciesEmbedding, x=1) -> None:\n \"\"\"Initialise a SpeciesCompositionalEmbedding instance.\"\"\"\n self.embedding = embedding\n\n # If a string has been passed for embedding, create an Embedding instance\n if isinstance(embedding, str):\n self.embedding = SpeciesEmbedding.load_data(embedding)\n\n self.embedding_name: str = self.embedding.embedding_name\n\n # Set an attribute for the comp dict\n self.composition = formula_dict\n\n # Set an attribute for the number of atoms\n self._natoms = 0\n for v in self.composition.values():\n if v < 0:\n msg = \"Formula cannot contain negative amounts of elements\"\n raise ValueError(msg)\n self._natoms += abs(v)\n\n # Set an attribute for the species list\n self.species_list = list(self.composition.keys())\n\n # Set an attribute for the element list\n self.element_list = list({parse_species(sp)[0] for sp in self.species_list})\n # Set an attribute for the species matrix\n self.species_matrix = np.zeros(\n shape=(len(self.composition), len(self.embedding.embeddings[\"Zn2+\"])),\n )\n for i, k in enumerate(self.composition.keys()):\n self.species_matrix[i] = self.embedding.embeddings[k]\n self.species_matrix = np.nan_to_num(self.species_matrix)\n\n # Set an attribute for the stoichiometric vector\n self.stoich_vector = np.array(list(self.composition.values()))\n\n # Set an attribute for the normalised stoichiometric vector\n self.norm_stoich_vector = self.stoich_vector / np.sum(self.stoich_vector)\n\n @property\n def num_atoms(self) -> float:\n \"\"\"Total number of atoms in Composition.\"\"\"\n return self._natoms\n\n 
def get_el_amt_dict(self) -> dict:\n \"\"\"\n Return the composition as dictionary of element symbol : stoichiometry.\n\n e.g. {\"Fe2+\":1, \"Fe3+\":2, \"O2-\": 4} -> {\"Fe\":3, \"O\":4}.\n \"\"\"\n dct: dict[str, float] = collections.defaultdict(float)\n for sp, stoich in self.composition.items():\n el = parse_species(sp)[0]\n dct[el] += stoich\n return dct\n\n @property\n def formula_pretty(self) -> str:\n \"\"\"Return the pretty formula of the composition.\"\"\"\n els_amt_dict = self.get_el_amt_dict()\n els = sorted(els_amt_dict, key=lambda el: X[el])\n formula = [f\"{el}{self._stoich_formatter(els_amt_dict[el])}\" for el in els]\n return \"\".join(formula)\n\n def _stoich_formatter(self, stoich: float, tol: float = 1e-8) -> str:\n \"\"\"Return the stoichiometry as a string.\"\"\"\n if stoich == 1:\n return \"\"\n if abs(stoich - int(stoich)) < tol:\n return str(int(stoich))\n return str(round(stoich, 8))\n\n def as_dict(self) -> dict:\n # TO-DO: Need to create a dict representation for the embedding class\n \"\"\"Return the SpeciesCompositionalEmbedding class as a dict.\"\"\"\n return {\n \"composition\": self.composition,\n }\n\n @property\n def fractional_composition(self):\n \"\"\"Fractional composition of the Composition.\"\"\"\n return {k: v / self._natoms for k, v in self.composition.items()}\n\n def _mean_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a weighted mean feature vector based of the embedding.\n\n The dimension of the feature vector is the same as the embedding.\n\n \"\"\"\n return np.dot(self.norm_stoich_vector, self.species_matrix)\n\n def _variance_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a weighted variance feature vector.\"\"\"\n diff_matrix = self.species_matrix - self._mean_feature_vector()\n\n diff_matrix = diff_matrix**2\n return np.dot(self.norm_stoich_vector, diff_matrix)\n\n def _minpool_feature_vector(self) -> np.ndarray:\n return np.min(self.species_matrix, axis=0)\n\n def _maxpool_feature_vector(self) -> 
np.ndarray:\n return np.max(self.species_matrix, axis=0)\n\n def _range_feature_vector(self) -> np.ndarray:\n return np.ptp(self.species_matrix, axis=0)\n\n def _sum_feature_vector(self) -> np.ndarray:\n return np.dot(self.stoich_vector, self.species_matrix)\n\n def _geometric_mean_feature_vector(self) -> np.ndarray:\n return np.exp(np.dot(self.norm_stoich_vector, np.log(self.species_matrix)))\n\n def _harmonic_mean_feature_vector(self) -> np.ndarray:\n return np.reciprocal(\n np.dot(self.norm_stoich_vector, np.reciprocal(self.species_matrix)),\n )\n\n _stats_functions_dict: ClassVar = {\n \"mean\": \"_mean_feature_vector\",\n \"variance\": \"_variance_feature_vector\",\n \"minpool\": \"_minpool_feature_vector\",\n \"maxpool\": \"_maxpool_feature_vector\",\n \"range\": \"_range_feature_vector\",\n \"sum\": \"_sum_feature_vector\",\n \"geometric_mean\": \"_geometric_mean_feature_vector\",\n \"harmonic_mean\": \"_harmonic_mean_feature_vector\",\n }\n\n def feature_vector(self, stats: str | list = \"mean\"):\n \"\"\"Compute a feature vector.\n\n The feature vector is a concatenation of\n the statistics specified in the stats argument.\n\n Args:\n ----\n stats (list): A list of strings specifying the statistics to be computed.\n The default is ['mean'].\n\n Returns:\n -------\n np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n \"\"\"\n implemented_stats = [\n \"mean\",\n \"variance\",\n \"minpool\",\n \"maxpool\",\n \"range\",\n \"sum\",\n \"geometric_mean\",\n \"harmonic_mean\",\n ]\n if isinstance(stats, str):\n stats = [stats]\n if not all(s in implemented_stats for s in stats):\n msg = f\" {[stat for stat in stats if stat not in implemented_stats]} \" f\"are not valid statistics.\"\n raise ValueError(\n msg,\n )\n feature_vector = []\n for s in stats:\n feature_vector.append(getattr(self, self._stats_functions_dict[s])())\n return np.concatenate(feature_vector)\n\n def distance(\n self,\n comp_other,\n distance_metric: str = \"euclidean\",\n 
stats: str | list[str] = \"mean\",\n ):\n \"\"\"Compute the distance between two compositions.\n\n Args:\n ----\n comp_other (Union[dict, SpeciesCompositionalEmbedding]):\n The other composition.\n distance_metric (str): The metric to be used. The default is 'euclidean'.\n stats (Union[str, list], optional): A list of statistics to be computed.\n\n Returns:\n -------\n float: The distance between the two SpeciesCompositionalEmbedding objects.\n \"\"\"\n if isinstance(comp_other, dict):\n comp_other = SpeciesCompositionalEmbedding(comp_other, self.embedding)\n if not isinstance(comp_other, SpeciesCompositionalEmbedding):\n msg = \"comp_other must be a dict or a SpeciesCompositionalEmbedding object.\"\n raise TypeError(\n msg,\n )\n if self.embedding_name != comp_other.embedding_name:\n msg = \"\"\"The two SpeciesCompositionalEmbedding\n objects must have the same embedding.\"\"\"\n raise ValueError(\n msg,\n )\n return _species_composition_distance(\n self,\n comp_other,\n self.embedding,\n distance_metric,\n stats,\n )\n\n def __repr__(self) -> str:\n return f\"SpeciesCompositionalEmbedding(formula={self.formula_pretty}, \" f\"embedding={self.embedding_name})\"\n\n def __str__(self) -> str:\n return f\"SpeciesCompositionalEmbedding(formula={self.formula_pretty}, \" f\"embedding={self.embedding_name})\"\n\n def __eq__(self, other):\n if isinstance(other, self.__class__):\n return (\n self.formula_pretty == other.formula_pretty\n and self.embedding_name == other.embedding_name\n and self.composition == other.composition\n )\n else:\n return False\n\n def __ne__(self, other):\n return not self.__eq__(other)\n\n def __hash__(self):\n return hash((self.formula_pretty, self.embedding))\n
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.formula_pretty","title":"formula_pretty: str
property
","text":"Return the pretty formula of the composition.
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.fractional_composition","title":"fractional_composition
property
","text":"Fractional composition of the Composition.
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.num_atoms","title":"num_atoms: float
property
","text":"Total number of atoms in Composition.
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.__init__","title":"__init__(formula_dict, embedding, x=1)
","text":"Initialise a SpeciesCompositionalEmbedding instance.
Source code in src/elementembeddings/composition.py
def __init__(self, formula_dict: dict, embedding: str | SpeciesEmbedding, x=1) -> None:\n \"\"\"Initialise a SpeciesCompositionalEmbedding instance.\"\"\"\n self.embedding = embedding\n\n # If a string has been passed for embedding, create an Embedding instance\n if isinstance(embedding, str):\n self.embedding = SpeciesEmbedding.load_data(embedding)\n\n self.embedding_name: str = self.embedding.embedding_name\n\n # Set an attribute for the comp dict\n self.composition = formula_dict\n\n # Set an attribute for the number of atoms\n self._natoms = 0\n for v in self.composition.values():\n if v < 0:\n msg = \"Formula cannot contain negative amounts of elements\"\n raise ValueError(msg)\n self._natoms += abs(v)\n\n # Set an attribute for the species list\n self.species_list = list(self.composition.keys())\n\n # Set an attribute for the element list\n self.element_list = list({parse_species(sp)[0] for sp in self.species_list})\n # Set an attribute for the species matrix\n self.species_matrix = np.zeros(\n shape=(len(self.composition), len(self.embedding.embeddings[\"Zn2+\"])),\n )\n for i, k in enumerate(self.composition.keys()):\n self.species_matrix[i] = self.embedding.embeddings[k]\n self.species_matrix = np.nan_to_num(self.species_matrix)\n\n # Set an attribute for the stoichiometric vector\n self.stoich_vector = np.array(list(self.composition.values()))\n\n # Set an attribute for the normalised stoichiometric vector\n self.norm_stoich_vector = self.stoich_vector / np.sum(self.stoich_vector)\n
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.as_dict","title":"as_dict()
","text":"Return the SpeciesCompositionalEmbedding class as a dict.
Source code in src/elementembeddings/composition.py
def as_dict(self) -> dict:\n # TO-DO: Need to create a dict representation for the embedding class\n \"\"\"Return the SpeciesCompositionalEmbedding class as a dict.\"\"\"\n return {\n \"composition\": self.composition,\n }\n
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.distance","title":"distance(comp_other, distance_metric='euclidean', stats='mean')
","text":"Compute the distance between two compositions.
comp_other (Union[dict, SpeciesCompositionalEmbedding]):\n The other composition.\ndistance_metric (str): The metric to be used. The default is 'euclidean'.\nstats (Union[str, list], optional): A list of statistics to be computed.\n
float: The distance between the two SpeciesCompositionalEmbedding objects.\n
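The helper _species_composition_distance is not shown in this section, so the following is only a minimal sketch of what a Euclidean distance between two mean feature vectors could look like. The function mean_vector and the two-dimensional embeddings dictionary are hypothetical and made up for illustration; they are not part of the package.

```python
import numpy as np


def mean_vector(composition, embeddings):
    # Hypothetical helper: stoichiometry-weighted mean of the species vectors.
    species = list(composition)
    amounts = np.array([composition[s] for s in species], dtype=float)
    matrix = np.array([embeddings[s] for s in species], dtype=float)
    return (amounts / amounts.sum()) @ matrix


# Made-up two-dimensional species vectors, for illustration only.
embeddings = {'Na+': [1.0, 0.0], 'Cl-': [0.0, 1.0], 'K+': [4.0, 4.0]}

a = mean_vector({'Na+': 1, 'Cl-': 1}, embeddings)
b = mean_vector({'K+': 1, 'Cl-': 1}, embeddings)

# Euclidean distance between the two mean feature vectors.
distance = float(np.linalg.norm(a - b))
print(distance)
```

Both compositions must be featurised with the same embedding for the comparison to be meaningful, which is why the method raises a ValueError when the embedding names differ.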
Source code in src/elementembeddings/composition.py
def distance(\n self,\n comp_other,\n distance_metric: str = \"euclidean\",\n stats: str | list[str] = \"mean\",\n):\n \"\"\"Compute the distance between two compositions.\n\n Args:\n ----\n comp_other (Union[dict, SpeciesCompositionalEmbedding]):\n The other composition.\n distance_metric (str): The metric to be used. The default is 'euclidean'.\n stats (Union[str, list], optional): A list of statistics to be computed.\n\n Returns:\n -------\n float: The distance between the two SpeciesCompositionalEmbedding objects.\n \"\"\"\n if isinstance(comp_other, dict):\n comp_other = SpeciesCompositionalEmbedding(comp_other, self.embedding)\n if not isinstance(comp_other, SpeciesCompositionalEmbedding):\n msg = \"comp_other must be a dict or a SpeciesCompositionalEmbedding object.\"\n raise TypeError(\n msg,\n )\n if self.embedding_name != comp_other.embedding_name:\n msg = \"\"\"The two SpeciesCompositionalEmbedding\n objects must have the same embedding.\"\"\"\n raise ValueError(\n msg,\n )\n return _species_composition_distance(\n self,\n comp_other,\n self.embedding,\n distance_metric,\n stats,\n )\n
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.feature_vector","title":"feature_vector(stats='mean')
","text":"Compute a feature vector.
The feature vector is a concatenation of the statistics specified in the stats argument.
stats (Union[str, list]): A statistic name, or a list of statistic names, to be computed.\nThe default is 'mean'.\n
np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n
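As a rough illustration of how the statistics are concatenated, the sketch below computes a weighted mean and a weighted variance with NumPy and joins them into one vector. The species matrix and stoichiometry are made-up numbers, not values from any embedding shipped with the package.

```python
import numpy as np

# Made-up data: two species (rows), each with a two-dimensional vector.
species_matrix = np.array([[1.0, 2.0], [3.0, 6.0]])
stoich_vector = np.array([1.0, 3.0])

# Normalise the stoichiometry so the weights sum to one.
weights = stoich_vector / stoich_vector.sum()

# Weighted mean and weighted variance, one value per embedding dimension.
mean = weights @ species_matrix
variance = weights @ (species_matrix - mean) ** 2

# Equivalent in spirit to stats=['mean', 'variance']: the result has
# len(stats) * embedding_dim entries.
feature_vector = np.concatenate([mean, variance])
print(feature_vector)
```

The order of the entries follows the order of the requested statistics, so the first embedding_dim entries are the means and the next embedding_dim entries are the variances.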
Source code in src/elementembeddings/composition.py
def feature_vector(self, stats: str | list = \"mean\"):\n \"\"\"Compute a feature vector.\n\n The feature vector is a concatenation of\n the statistics specified in the stats argument.\n\n Args:\n ----\n stats (list): A list of strings specifying the statistics to be computed.\n The default is ['mean'].\n\n Returns:\n -------\n np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n \"\"\"\n implemented_stats = [\n \"mean\",\n \"variance\",\n \"minpool\",\n \"maxpool\",\n \"range\",\n \"sum\",\n \"geometric_mean\",\n \"harmonic_mean\",\n ]\n if isinstance(stats, str):\n stats = [stats]\n if not all(s in implemented_stats for s in stats):\n msg = f\" {[stat for stat in stats if stat not in implemented_stats]} \" f\"are not valid statistics.\"\n raise ValueError(\n msg,\n )\n feature_vector = []\n for s in stats:\n feature_vector.append(getattr(self, self._stats_functions_dict[s])())\n return np.concatenate(feature_vector)\n
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.get_el_amt_dict","title":"get_el_amt_dict()
","text":"Return the composition as dictionary of element symbol : stoichiometry.
e.g. {\"Fe2+\":1, \"Fe3+\":2, \"O2-\": 4} -> {\"Fe\":3, \"O\":4}.
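The aggregation can be sketched without the package as follows. The function element_of is a hypothetical stand-in for parse_species(sp)[0] and assumes that every species string begins with its element symbol.

```python
import collections
import re


def element_of(species):
    # Hypothetical stand-in for parse_species(sp)[0]:
    # take the leading element symbol of a species string such as 'Fe3+'.
    match = re.match('[A-Z][a-z]?', species)
    if match is None:
        raise ValueError(f'Cannot parse species {species!r}')
    return match.group(0)


def el_amt_dict(composition):
    # Sum the amounts of all species that share the same element.
    totals = collections.defaultdict(float)
    for species, amount in composition.items():
        totals[element_of(species)] += amount
    return dict(totals)


print(el_amt_dict({'Fe2+': 1, 'Fe3+': 2, 'O2-': 4}))
```

Because the totals are accumulated in a defaultdict(float), the amounts come back as floats.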
Source code in src/elementembeddings/composition.py
def get_el_amt_dict(self) -> dict:\n \"\"\"\n Return the composition as dictionary of element symbol : stoichiometry.\n\n e.g. {\"Fe2+\":1, \"Fe3+\":2, \"O2-\": 4} -> {\"Fe\":3, \"O\":4}.\n \"\"\"\n dct: dict[str, float] = collections.defaultdict(float)\n for sp, stoich in self.composition.items():\n el = parse_species(sp)[0]\n dct[el] += stoich\n return dct\n
"},{"location":"python_api/composition/#elementembeddings.composition.composition_featuriser","title":"composition_featuriser(data, formula_column='formula', embedding='magpie', stats='mean', inplace=False)
","text":"Compute a feature vector for a composition.
The feature vector is based on the statistics specified in the stats argument.
data (Union[pd.DataFrame, pd.Series, list, CompositionalEmbedding]):\n A pandas DataFrame or Series containing a column named 'formula',\n a list of formulae, or a CompositionalEmbedding instance\nformula_column (str, optional): The name of the column containing the formula.\nembedding (Union[Embedding, str], optional): An Embedding instance or the string name of an embedding\nstats (Union[str, list], optional): A list of statistics to be computed.\n The default is ['mean'].\ninplace (bool, optional): Whether to perform the operation in place on the data.\n The default is False.\n
Union[pd.DataFrame,list]: A pandas DataFrame containing the feature vectors,\nor a list of feature vectors.\n
Source code in src/elementembeddings/composition.py
def composition_featuriser(\n data: pd.DataFrame | pd.Series | CompositionalEmbedding | list,\n formula_column: str = \"formula\",\n embedding: Embedding | str = \"magpie\",\n stats: str | list = \"mean\",\n inplace: bool = False,\n) -> pd.DataFrame:\n \"\"\"Compute a feature vector for a composition.\n\n The feature vector is based on the statistics specified\n in the stats argument.\n\n Args:\n ----\n data (Union[pd.DataFrame, pd.Series, list, CompositionalEmbedding]):\n A pandas DataFrame or Series containing a column named 'formula',\n a list of formula, or a CompositionalEmbedding class\n formula_column (str, optional): The column name containing the formula.\n embedding (Union[Embedding, str], optional): A Embedding class or a string\n stats (Union[str, list], optional): A list of statistics to be computed.\n The default is ['mean'].\n inplace (bool, optional): Whether to perform the operation in place on the data.\n The default is False.\n\n Returns:\n -------\n Union[pd.DataFrame,list]: A pandas DataFrame containing the feature vector,\n or a list of feature vectors is returned\n \"\"\"\n if isinstance(stats, str):\n stats = [stats]\n if isinstance(data, pd.Series):\n data = data.to_frame(name=\"formula\")\n if isinstance(data, pd.DataFrame):\n if not inplace:\n data = data.copy()\n if formula_column not in data.columns:\n msg = f\"The data must contain a column named {formula_column} to featurise.\"\n raise ValueError(\n msg,\n )\n print(\"Featurising compositions...\")\n comps = [CompositionalEmbedding(x, embedding) for x in tqdm(data[formula_column].tolist())]\n print(\"Computing feature vectors...\")\n fvs = [x.feature_vector(stats) for x in tqdm(comps)]\n feature_names = comps[0].embedding.feature_labels\n feature_names = [f\"{stat}_{feature}\" for stat in stats for feature in feature_names]\n return pd.concat([data, pd.DataFrame(fvs, columns=feature_names)], axis=1)\n elif isinstance(data, list):\n comps = [CompositionalEmbedding(x, embedding) for x 
in data]\n return [x.feature_vector(stats) for x in tqdm(comps)]\n\n elif isinstance(data, CompositionalEmbedding):\n return data.feature_vector(stats)\n else:\n msg = \"The data must be a pandas DataFrame, Series,\" \" list or CompositionalEmbedding class.\"\n raise TypeError(\n msg,\n )\n
"},{"location":"python_api/composition/#elementembeddings.composition.formula_parser","title":"formula_parser(formula)
","text":"Parse a string formula.
Returns a dictionary of the composition with key:value pairs of element symbol:amount.
formula (str): A string formula e.g. CsPbI3, Li7La3Zr2O12\n
(dict): A dictionary of the composition\n
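For formulae without brackets, the symbol and amount pairs can be pulled out with a single regular expression. The function below is a simplified, hypothetical sketch; it does not handle bracketed groups or the @ character, and the internal helper _get_sym_dict is not shown in this section.

```python
import collections
import re


def simple_formula_parser(formula):
    # Hypothetical simplified parser: an element symbol followed by an
    # optional amount; a missing amount is treated as 1.
    amounts = collections.defaultdict(float)
    for symbol, amount in re.findall('([A-Z][a-z]*)([0-9.]*)', formula):
        amounts[symbol] += float(amount) if amount else 1.0
    return dict(amounts)


print(simple_formula_parser('CsPbI3'))
print(simple_formula_parser('Li7La3Zr2O12'))
```

Repeated symbols are summed, so a formula that lists the same element twice still yields one entry per element.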
Source code in src/elementembeddings/composition.py
def formula_parser(formula: str) -> dict[str, float]:\n # TO-DO: Add validation to check composition contains real elements.\n \"\"\"Parse a string formula.\n\n Returns a dictionary of the composition with key:value pairs\n of element symbol:amount.\n\n Args:\n ----\n formula (str): A string formula e.g. CsPbI3, Li7La3Zr2O12\n\n Returns:\n -------\n (dict): A dictionary of the composition\n\n \"\"\"\n # For Metallofullerene\n formula = formula.replace(\"@\", \"\")\n\n regex = r\"\\(([^\\(\\)]+)\\)\\s*([\\.e\\d]*)\"\n r = re.compile(regex)\n m = re.search(r, formula)\n if m:\n factor = 1.0\n if m.group(2) != \"\":\n factor = float(m.group(2))\n unit_sym_dict = _get_sym_dict(m.group(1), factor)\n expanded_sym = \"\".join([f\"{el}{amt}\" for el, amt in unit_sym_dict.items()])\n expanded_formula = formula.replace(m.group(), expanded_sym)\n return formula_parser(expanded_formula)\n return _get_sym_dict(formula, 1)\n
"},{"location":"python_api/composition/#elementembeddings.composition.species_composition_featuriser","title":"species_composition_featuriser(data, embedding='skipspecies', stats='mean', to_dataframe=False)
","text":"Compute a feature vector for a composition.
The feature vector is based on the statistics specified in the stats argument.
data (Union[list, SpeciesCompositionalEmbedding]):\n a list of composition dictionaries, or a SpeciesCompositionalEmbedding class\nembedding (Union[SpeciesEmbedding, str], optional): A SpeciesEmbedding class\n or a string\nstats (Union[str, list], optional): A list of statistics to be computed.\n The default is ['mean'].\nto_dataframe (bool, optional): Whether to return the feature vectors\n as a DataFrame. The default is False.\n
Union[pd.DataFrame,list]: A pandas DataFrame containing the feature vectors,\nor a list of feature vectors.\n
Source code in src/elementembeddings/composition.py
def species_composition_featuriser(\n data: SpeciesCompositionalEmbedding | list,\n embedding: Embedding | str = \"skipspecies\",\n stats: str | list = \"mean\",\n to_dataframe: bool = False,\n) -> list | pd.DataFrame:\n \"\"\"Compute a feature vector for a composition.\n\n The feature vector is based on the statistics specified\n in the stats argument.\n\n Args:\n ----\n data (Union[list, SpeciesCompositionalEmbedding]):\n a list of composition dictionaries, or a SpeciesCompositionalEmbedding class\n embedding (Union[SpeciesEmbedding, str], optional): A SpeciesEmbedding class\n or a string\n stats (Union[str, list], optional): A list of statistics to be computed.\n The default is ['mean'].\n to_dataframe (bool, optional): Whether to return the feature vectors\n as a DataFrame. The default is False.\n\n Returns:\n -------\n Union[pd.DataFrame,list]: A pandas DataFrame containing the feature vector,\n or a list of feature vectors is returned\n \"\"\"\n if isinstance(stats, str):\n stats = [stats]\n if isinstance(data, list):\n comps = [SpeciesCompositionalEmbedding(x, embedding) for x in data]\n comp_vectors = [x.feature_vector(stats) for x in tqdm(comps, desc=\"Computing feature vectors\")]\n elif isinstance(data, SpeciesCompositionalEmbedding):\n comps = [data]\n comp_vectors = data.feature_vector(stats)\n else:\n msg = \"The data must be a list or SpeciesCompositionalEmbedding class.\"\n raise TypeError(\n msg,\n )\n if to_dataframe:\n feature_names = comps[0].embedding.feature_labels\n feature_names = [f\"{stat}_{feature}\" for stat in stats for feature in feature_names]\n formulae = [x.formula_pretty for x in comps]\n # Create a DataFrame with formula, composition and feature vectors\n df = pd.DataFrame(comp_vectors, columns=feature_names)\n df[\"formula\"] = formulae\n df[\"composition\"] = data\n # Reorder the columns\n return df[[\"formula\", \"composition\", *feature_names]]\n\n return comp_vectors\n
"},{"location":"python_api/core/","title":"Core module","text":"Provides the Embedding
class.
This module enables the user to load elemental representation data and analyse it using statistical functions.
Typical usage example: megnet16 = Embedding.load_data('megnet16')
"},{"location":"python_api/core/#elementembeddings.core.Embedding","title":"Embedding
","text":" Bases: EmbeddingBase
Represent an elemental representation.
To load an embedding distributed with the package, use the load_data() method.
Works like a standard Python dictionary. The items are {element: vector} pairs.
Adds a few convenience methods related to elemental representations.
Source code in src/elementembeddings/core.py
class Embedding(EmbeddingBase):\n \"\"\"Represent an elemental representation.\n\n To load an embedding distributed from the package use the load_data() method.\n\n Works like a standard python dictionary. The keys are {element: vector} pairs.\n\n Adds a few convenience methods related to elemental representations.\n \"\"\"\n\n @staticmethod\n def load_data(embedding_name: str | None = None):\n \"\"\"Create an instance of the `Embedding` class from a default embedding file.\n\n The default embeddings are in the table below:\n\n | **Name** | **str_name** |\n |-------------------------|--------------|\n | Magpie | magpie |\n | Magpie (scaled) | magpie_sc |\n | Mat2Vec | mat2vec |\n | Matscholar | matscholar |\n | Megnet (16 dimensions) | megnet16 |\n | Modified Pettifor scale | mod_petti |\n | Oliynyk | oliynyk |\n | Oliynyk (scaled) | oliynyk_sc |\n | Random (200 dimensions) | random_200 |\n | SkipAtom | skipatom |\n | Atomic Number | atomic |\n | CrystaLLM | crystallm |\n | XenonPy | xenonpy |\n | Cgnf | cgnf |\n\n\n Args:\n ----\n embedding_name (str): The str_name of an embedding file.\n\n Returns:\n -------\n Embedding :class:`Embedding` instance.\n \"\"\"\n if DEFAULT_ELEMENT_EMBEDDINGS[embedding_name].endswith(\".csv\"):\n return Embedding.from_csv(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n elif \"megnet\" in DEFAULT_ELEMENT_EMBEDDINGS[embedding_name]:\n return Embedding.from_json(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n ).remove_elements([\"Null\"])\n elif DEFAULT_ELEMENT_EMBEDDINGS[embedding_name].endswith(\".json\"):\n return Embedding.from_json(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n else:\n return None\n\n @staticmethod\n def from_json(embedding_json, embedding_name: str | 
None = None):\n \"\"\"Create an instance of the Embedding class from a json file.\n\n Args:\n ----\n embedding_json (str): Filepath of the json file\n embedding_name (str): The name of the elemental representation\n \"\"\"\n # Need to add validation handling for JSONs in different formats\n with open(embedding_json) as f:\n embedding_data = json.load(f)\n return Embedding(embedding_data, embedding_name)\n\n @staticmethod\n def from_csv(embedding_csv, embedding_name: str | None = None):\n \"\"\"Create an instance of the Embedding class from a csv file.\n\n The first column of the csv file must contain the elements and be named element.\n\n Args:\n ----\n embedding_csv (str): Filepath of the csv file\n embedding_name (str): The name of the elemental representation\n\n \"\"\"\n # Need to add validation handling for csv files\n df = pd.read_csv(embedding_csv)\n elements = list(df[\"element\"])\n df = df.drop([\"element\"], axis=1)\n feature_labels = list(df.columns)\n embeds_array = df.to_numpy()\n embedding_data = {elements[i]: embeds_array[i] for i in range(len(embeds_array))}\n return Embedding(embedding_data, embedding_name, feature_labels)\n\n def as_dataframe(self, columns: str = \"components\") -> pd.DataFrame:\n \"\"\"Return the embedding as a pandas Dataframe.\n\n The first column is the elements and each other\n column represents a component of the embedding.\n\n Args:\n ----\n columns (str): A string to specify if the columns are the vector components\n and the index is the elements (`columns='components'`)\n or the columns are the elements (`columns='elements'`).\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n\n\n \"\"\"\n embedding = self.embeddings\n df = pd.DataFrame(embedding, index=self.feature_labels)\n if columns == \"components\":\n return df.T\n elif columns == \"elements\":\n return df\n else:\n msg = f\"{columns} is not a valid keyword argument. 
\" f\"Choose either 'components' or 'elements\"\n raise (\n ValueError(\n msg,\n )\n )\n\n def to(self, fmt: str = \"\", filename: str | None = \"\"):\n \"\"\"Output the embedding to a file.\n\n Args:\n ----\n fmt (str): The file format to output the embedding to.\n Options include \"json\" and \"csv\".\n filename (str): The name of the file to be outputted\n\n Returns:\n -------\n (str) if filename not specified, otherwise None.\n \"\"\"\n fmt = fmt.lower()\n\n if fmt == \"json\" or fnmatch.fnmatch(filename, \"*.json\"):\n j = json.dumps(self.embeddings, cls=NumpyEncoder)\n if filename:\n if not filename.endswith(\".json\"):\n filename = filename + \".json\"\n with open(filename, \"w\") as file:\n file.write(j)\n return None\n else:\n return j\n elif fmt == \"csv\" or fnmatch.fnmatch(filename, \"*.csv\"):\n if filename:\n if not filename.endswith(\".csv\"):\n filename = filename + \".csv\"\n self.as_dataframe().to_csv(filename, index_label=\"element\")\n return None\n else:\n return self.as_dataframe().to_csv(index_label=\"element\")\n\n else:\n msg = f\"{fmt!s} is an invalid file format\"\n raise ValueError(msg)\n\n @property\n def element_list(self) -> list:\n \"\"\"Return the elements of the embedding.\"\"\"\n return self._embeddings_keys_list()\n\n def remove_elements(self, elements: str | list[str], inplace: bool = False):\n # TO-DO allow removal by atomic numbers\n \"\"\"Remove elements from the Embedding instance.\n\n Args:\n ----\n elements (str,list(str)): An element symbol or a list of element symbols\n inplace (bool): If True, elements are removed from the Embedding instance.\n If false, the original embedding instance is unchanged\n and a new embedding instance with the elements removed is created.\n\n \"\"\"\n if inplace:\n if isinstance(elements, str):\n del self.embeddings[elements]\n elif isinstance(elements, list):\n for el in elements:\n del self.embeddings[el]\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n if 
isinstance(elements, str):\n del embeddings_copy[elements]\n elif isinstance(elements, list):\n for el in elements:\n del embeddings_copy[el]\n return Embedding(embeddings_copy, self.embedding_name)\n\n def standardise(self, inplace: bool = False):\n \"\"\"Standardise the embeddings.\n\n Mean is 0 and standard deviation is 1.\n\n \"\"\"\n if self._is_standardised():\n warnings.warn(\n \"Embedding is already standardised. \" \"Returning None and not changing the embedding.\",\n )\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n embeddings_array = np.array(list(embeddings_copy.values()))\n embeddings_array = StandardScaler().fit_transform(embeddings_array)\n for el, emb in zip(embeddings_copy.keys(), embeddings_array):\n embeddings_copy[el] = emb\n\n if inplace:\n self.embeddings = embeddings_copy\n self.is_standardised = True\n return None\n else:\n return Embedding(embeddings_copy, self.embedding_name)\n\n @property\n def element_groups_dict(self) -> dict[str, str]:\n \"\"\"Return a dictionary of {element: element type} pairs.\n\n e.g. {'He':'Noble gas'}\n\n \"\"\"\n with open(path.join(data_directory, \"element_data/element_group.json\")) as f:\n _dict = json.load(f)\n return {i: _dict[i] for i in self.element_list}\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.element_groups_dict","title":"element_groups_dict: dict[str, str]
property
","text":"Return a dictionary of {element: element type} pairs.
e.g. {'He':'Noble gas'}
"},{"location":"python_api/core/#elementembeddings.core.Embedding.element_list","title":"element_list: list
property
","text":"Return the elements of the embedding.
"},{"location":"python_api/core/#elementembeddings.core.Embedding.as_dataframe","title":"as_dataframe(columns='components')
","text":"Return the embedding as a pandas Dataframe.
By default, the index holds the elements and each column represents a component of the embedding.
columns (str): A string to specify if the columns are the vector components\nand the index is the elements (`columns='components'`)\nor the columns are the elements (`columns='elements'`).\n
df (pandas.DataFrame): A pandas dataframe object\n
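The two orientations can be reproduced with plain pandas. The embedding values and feature labels in this sketch are made up for illustration.

```python
import pandas as pd

# Made-up three-dimensional vectors for two elements.
embeddings = {'H': [0.1, 0.2, 0.3], 'He': [0.4, 0.5, 0.6]}
feature_labels = ['f0', 'f1', 'f2']

# columns='elements': one column per element, one row per component.
by_element = pd.DataFrame(embeddings, index=feature_labels)

# columns='components': the transpose, with one row per element.
by_component = by_element.T

print(by_component)
```

The transposed layout is usually the more convenient one for downstream analysis, since each row is then a single element.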
Source code in src/elementembeddings/core.py
def as_dataframe(self, columns: str = \"components\") -> pd.DataFrame:\n \"\"\"Return the embedding as a pandas Dataframe.\n\n The first column is the elements and each other\n column represents a component of the embedding.\n\n Args:\n ----\n columns (str): A string to specify if the columns are the vector components\n and the index is the elements (`columns='components'`)\n or the columns are the elements (`columns='elements'`).\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n\n\n \"\"\"\n embedding = self.embeddings\n df = pd.DataFrame(embedding, index=self.feature_labels)\n if columns == \"components\":\n return df.T\n elif columns == \"elements\":\n return df\n else:\n msg = f\"{columns} is not a valid keyword argument. \" f\"Choose either 'components' or 'elements\"\n raise (\n ValueError(\n msg,\n )\n )\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.from_csv","title":"from_csv(embedding_csv, embedding_name=None)
staticmethod
","text":"Create an instance of the Embedding class from a csv file.
The first column of the csv file must contain the elements and be named element.
embedding_csv (str): Filepath of the csv file\nembedding_name (str): The name of the elemental representation\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef from_csv(embedding_csv, embedding_name: str | None = None):\n \"\"\"Create an instance of the Embedding class from a csv file.\n\n The first column of the csv file must contain the elements and be named element.\n\n Args:\n ----\n embedding_csv (str): Filepath of the csv file\n embedding_name (str): The name of the elemental representation\n\n \"\"\"\n # Need to add validation handling for csv files\n df = pd.read_csv(embedding_csv)\n elements = list(df[\"element\"])\n df = df.drop([\"element\"], axis=1)\n feature_labels = list(df.columns)\n embeds_array = df.to_numpy()\n embedding_data = {elements[i]: embeds_array[i] for i in range(len(embeds_array))}\n return Embedding(embedding_data, embedding_name, feature_labels)\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.from_json","title":"from_json(embedding_json, embedding_name=None)
staticmethod
","text":"Create an instance of the Embedding class from a json file.
embedding_json (str): Filepath of the json file\nembedding_name (str): The name of the elemental representation\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef from_json(embedding_json, embedding_name: str | None = None):\n \"\"\"Create an instance of the Embedding class from a json file.\n\n Args:\n ----\n embedding_json (str): Filepath of the json file\n embedding_name (str): The name of the elemental representation\n \"\"\"\n # Need to add validation handling for JSONs in different formats\n with open(embedding_json) as f:\n embedding_data = json.load(f)\n return Embedding(embedding_data, embedding_name)\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.load_data","title":"load_data(embedding_name=None)
staticmethod
","text":"Create an instance of the Embedding
class from a default embedding file.
The default embeddings are in the table below:
| Name | str_name |
|-------------------------|--------------|
| Magpie | magpie |
| Magpie (scaled) | magpie_sc |
| Mat2Vec | mat2vec |
| Matscholar | matscholar |
| Megnet (16 dimensions) | megnet16 |
| Modified Pettifor scale | mod_petti |
| Oliynyk | oliynyk |
| Oliynyk (scaled) | oliynyk_sc |
| Random (200 dimensions) | random_200 |
| SkipAtom | skipatom |
| Atomic Number | atomic |
| CrystaLLM | crystallm |
| XenonPy | xenonpy |
| Cgnf | cgnf |

embedding_name (str): The str_name of an embedding file.\n
Embedding :class:`Embedding` instance.\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef load_data(embedding_name: str | None = None):\n \"\"\"Create an instance of the `Embedding` class from a default embedding file.\n\n The default embeddings are in the table below:\n\n | **Name** | **str_name** |\n |-------------------------|--------------|\n | Magpie | magpie |\n | Magpie (scaled) | magpie_sc |\n | Mat2Vec | mat2vec |\n | Matscholar | matscholar |\n | Megnet (16 dimensions) | megnet16 |\n | Modified Pettifor scale | mod_petti |\n | Oliynyk | oliynyk |\n | Oliynyk (scaled) | oliynyk_sc |\n | Random (200 dimensions) | random_200 |\n | SkipAtom | skipatom |\n | Atomic Number | atomic |\n | CrystaLLM | crystallm |\n | XenonPy | xenonpy |\n | Cgnf | cgnf |\n\n\n Args:\n ----\n embedding_name (str): The str_name of an embedding file.\n\n Returns:\n -------\n Embedding :class:`Embedding` instance.\n \"\"\"\n if DEFAULT_ELEMENT_EMBEDDINGS[embedding_name].endswith(\".csv\"):\n return Embedding.from_csv(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n elif \"megnet\" in DEFAULT_ELEMENT_EMBEDDINGS[embedding_name]:\n return Embedding.from_json(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n ).remove_elements([\"Null\"])\n elif DEFAULT_ELEMENT_EMBEDDINGS[embedding_name].endswith(\".json\"):\n return Embedding.from_json(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n else:\n return None\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.remove_elements","title":"remove_elements(elements, inplace=False)
","text":"Remove elements from the Embedding instance.
elements (str,list(str)): An element symbol or a list of element symbols\ninplace (bool): If True, elements are removed from the Embedding instance.\nIf False, the original Embedding instance is unchanged\nand a new Embedding instance with the elements removed is created.\n
Source code in src/elementembeddings/core.py
def remove_elements(self, elements: str | list[str], inplace: bool = False):\n # TO-DO allow removal by atomic numbers\n \"\"\"Remove elements from the Embedding instance.\n\n Args:\n ----\n elements (str,list(str)): An element symbol or a list of element symbols\n inplace (bool): If True, elements are removed from the Embedding instance.\n If false, the original embedding instance is unchanged\n and a new embedding instance with the elements removed is created.\n\n \"\"\"\n if inplace:\n if isinstance(elements, str):\n del self.embeddings[elements]\n elif isinstance(elements, list):\n for el in elements:\n del self.embeddings[el]\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n if isinstance(elements, str):\n del embeddings_copy[elements]\n elif isinstance(elements, list):\n for el in elements:\n del embeddings_copy[el]\n return Embedding(embeddings_copy, self.embedding_name)\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.standardise","title":"standardise(inplace=False)
","text":"Standardise the embeddings.
Mean is 0 and standard deviation is 1.
Source code in src/elementembeddings/core.py
def standardise(self, inplace: bool = False):\n \"\"\"Standardise the embeddings.\n\n Mean is 0 and standard deviation is 1.\n\n \"\"\"\n if self._is_standardised():\n warnings.warn(\n \"Embedding is already standardised. \" \"Returning None and not changing the embedding.\",\n )\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n embeddings_array = np.array(list(embeddings_copy.values()))\n embeddings_array = StandardScaler().fit_transform(embeddings_array)\n for el, emb in zip(embeddings_copy.keys(), embeddings_array):\n embeddings_copy[el] = emb\n\n if inplace:\n self.embeddings = embeddings_copy\n self.is_standardised = True\n return None\n else:\n return Embedding(embeddings_copy, self.embedding_name)\n
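What standardisation does to each feature can be shown on a toy dictionary. This is a minimal sketch of the same transformation in plain Python, not the package's implementation: the function name standardise_columns and the toy values are hypothetical. It assumes the population standard deviation and a scale of 1 for constant columns, which is how scikit-learn's StandardScaler behaves to the best of my knowledge.

```python
from statistics import fmean, pstdev


def standardise_columns(embeddings):
    '''Return a new dict with every feature scaled to mean 0 and std 1.'''
    keys = list(embeddings)
    # Transpose so that each tuple holds one feature across all elements.
    columns = list(zip(*(embeddings[k] for k in keys)))
    means = [fmean(col) for col in columns]
    # Population standard deviation; a constant column falls back to 1
    # so that it maps to 0 instead of dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return {
        k: [(x - m) / s for x, m, s in zip(embeddings[k], means, stds)]
        for k in keys
    }


toy = {'H': [1.0, 10.0], 'He': [3.0, 10.0]}
scaled = standardise_columns(toy)
print(scaled)
```

The input dictionary is left untouched, which corresponds to the inplace=False branch in the source above.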
"},{"location":"python_api/core/#elementembeddings.core.Embedding.to","title":"to(fmt='', filename='')
","text":"Output the embedding to a file.
fmt (str): The file format to output the embedding to.\n Options include \"json\" and \"csv\".\nfilename (str): The name of the file to be outputted\n
(str): The serialised embedding if filename is not specified, otherwise None.\n
Source code in src/elementembeddings/core.py
def to(self, fmt: str = \"\", filename: str | None = \"\"):\n \"\"\"Output the embedding to a file.\n\n Args:\n ----\n fmt (str): The file format to output the embedding to.\n Options include \"json\" and \"csv\".\n filename (str): The name of the file to be outputted\n\n Returns:\n -------\n (str) if filename not specified, otherwise None.\n \"\"\"\n fmt = fmt.lower()\n\n if fmt == \"json\" or fnmatch.fnmatch(filename, \"*.json\"):\n j = json.dumps(self.embeddings, cls=NumpyEncoder)\n if filename:\n if not filename.endswith(\".json\"):\n filename = filename + \".json\"\n with open(filename, \"w\") as file:\n file.write(j)\n return None\n else:\n return j\n elif fmt == \"csv\" or fnmatch.fnmatch(filename, \"*.csv\"):\n if filename:\n if not filename.endswith(\".csv\"):\n filename = filename + \".csv\"\n self.as_dataframe().to_csv(filename, index_label=\"element\")\n return None\n else:\n return self.as_dataframe().to_csv(index_label=\"element\")\n\n else:\n msg = f\"{fmt!s} is an invalid file format\"\n raise ValueError(msg)\n
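The order in which the format is resolved is easy to miss: json is checked before csv, and an explicit fmt wins over the filename extension. The sketch below isolates that dispatch under the assumption that it matches the source above. The helper resolve_output is hypothetical and writes nothing to disk.

```python
import fnmatch


def resolve_output(fmt='', filename=''):
    '''Return (format, filename) using the dispatch order shown above.'''
    fmt = fmt.lower()
    for candidate in ('json', 'csv'):
        suffix = '.' + candidate
        if fmt == candidate or fnmatch.fnmatch(filename, '*' + suffix):
            # Append the extension only when a filename was given without it.
            if filename and not filename.endswith(suffix):
                filename = filename + suffix
            return candidate, filename
    raise ValueError(f'{fmt!s} is an invalid file format')


print(resolve_output('csv', 'out'))
print(resolve_output('', 'out.json'))
print(resolve_output('JSON'))
print(resolve_output('json', 'table.csv'))
```

The last call shows an edge case worth knowing about: with fmt set to json and a filename ending in .csv, the json branch matches first and the extension is appended to the existing name.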
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding","title":"SpeciesEmbedding
","text":" Bases: EmbeddingBase
Represent an ion representation.
To load an embedding distributed with the package, use the load_data() method.
Works like a standard Python dictionary. The items are {species: vector} pairs.
Source code in src/elementembeddings/core.py
class SpeciesEmbedding(EmbeddingBase):\n \"\"\"Represent an ion representation.\n\n To load an embedding distributed from the package use the load_data() method.\n\n Works like a standard python dictionary. The keys are {species: vector} pairs.\n \"\"\"\n\n @staticmethod\n def load_data(embedding_name: str, include_neutral: bool = False):\n \"\"\"Create a `SpeciesEmbedding` from a preset embedding file.\n\n The default embeddings are in the table below:\n\n | **Name** | **str_name** |\n |-------------------------|--------------|\n | SkipSpecies (200 dim, MPv2022) | skipspecies |\n | SkipSpecies (induced, 200 dim, MPv2022) | skipspecies_induced |\n\n Args:\n ----\n embedding_name (str): The str_name of the species representation\n include_neutral (bool): If True, neutral species are\n included in the embedding\n\n Returns:\n -------\n SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n \"\"\"\n if DEFAULT_SPECIES_EMBEDDINGS[embedding_name].endswith(\".csv\"):\n embedding = SpeciesEmbedding.from_csv(\n path.join(\n data_directory,\n \"species_representations\",\n DEFAULT_SPECIES_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n if not include_neutral:\n embedding.remove_neutral_species(inplace=True)\n return embedding\n elif DEFAULT_SPECIES_EMBEDDINGS[embedding_name].endswith(\".json\"):\n embedding = SpeciesEmbedding.from_json(\n path.join(\n data_directory,\n \"species_representations\",\n DEFAULT_SPECIES_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n if not include_neutral:\n embedding.remove_neutral_species(inplace=True)\n return embedding\n else:\n return None\n\n @staticmethod\n def from_csv(csv_path, embedding_name: str | None = None):\n \"\"\"Create an instance of the SpeciesEmbedding class from a csv file.\n\n The first column of the csv file must contain the species and be named species.\n\n Args:\n ----\n csv_path (str): Filepath of the csv file\n embedding_name (str): The name of the species representation\n\n Returns:\n -------\n 
SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n\n \"\"\"\n # Need to add validation handling for csv files\n df = pd.read_csv(csv_path)\n species = list(df[\"species\"])\n df = df.drop([\"species\"], axis=1)\n feature_labels = list(df.columns)\n embeds_array = df.to_numpy()\n embedding_data = {species[i]: embeds_array[i] for i in range(len(embeds_array))}\n return SpeciesEmbedding(embedding_data, embedding_name, feature_labels)\n\n @staticmethod\n def from_json(json_path, embedding_name: str | None = None):\n \"\"\"Create an instance of the SpeciesEmbedding class from a json file.\n\n Args:\n ----\n json_path (str): Filepath of the json file\n embedding_name (str): The name of the species representation\n\n Returns:\n -------\n SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n\n \"\"\"\n # Need to add validation handling for json files\n with open(json_path) as f:\n embedding_data = json.load(f)\n return SpeciesEmbedding(embedding_data, embedding_name)\n\n @property\n def species_list(self) -> list:\n \"\"\"Return the species of the embedding.\"\"\"\n return list(self.embeddings.keys())\n\n @property\n def element_list(self) -> list:\n \"\"\"Return the elements of the embedding.\"\"\"\n return list({parse_species(species)[0] for species in self.species_list})\n\n def remove_neutral_species(self, inplace: bool = False):\n \"\"\"Remove neutral species from the SpeciesEmbedding instance.\n\n Args:\n ----\n inplace (bool): If True, neutral species are removed\n from the SpeciesEmbedding instance.\n If false, the original SpeciesEmbedding instance is unchanged\n and a new SpeciesEmbedding instance with the\n neutral species removed is created.\n\n \"\"\"\n neutral_species = [s for s in self.species_list if parse_species(s)[1] == 0]\n return self.remove_species(neutral_species, inplace)\n\n def get_element_oxi_states(self, el: str) -> list:\n \"\"\"Return the oxidation states for a given element.\n\n Args:\n ----\n el (str): An element symbol\n\n Returns:\n 
-------\n oxidation_states (list[int]): A list of oxidation states\n \"\"\"\n assert el in self.element_list, f\"There are no species of the element {el} in this SpeciesEmbedding\"\n parsed_species = [parse_species(species) for species in self.species_list]\n\n el_species_list = [species for species in parsed_species if species[0] == el]\n oxidation_states = [species[1] for species in el_species_list]\n return sorted(oxidation_states)\n\n def remove_species(self, species: str | list[str], inplace: bool = False):\n \"\"\"Remove species from the SpeciesEmbedding instance.\n\n Args:\n ----\n species (str,list(str)): A species or a list of species\n inplace (bool): If True, species are removed\n from the SpeciesEmbedding instance.\n If false, the original SpeciesEmbedding instance is unchanged\n and a new SpeciesEmbedding instance with the species removed is created.\n\n \"\"\"\n if inplace:\n if isinstance(species, str):\n try:\n del self.embeddings[species]\n except KeyError:\n warnings.warn(\n f\"{species} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n elif isinstance(species, list):\n for sp in species:\n try:\n del self.embeddings[sp]\n except KeyError:\n warnings.warn(\n f\"{sp} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n if isinstance(species, str):\n try:\n del embeddings_copy[species]\n except KeyError:\n warnings.warn(\n f\"{species} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n elif isinstance(species, list):\n for sp in species:\n try:\n del embeddings_copy[sp]\n except KeyError:\n warnings.warn(\n f\"{sp} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n return SpeciesEmbedding(embeddings_copy, self.embedding_name)\n\n @property\n def ion_type_dict(self) -> dict[str, str]:\n \"\"\"Return a dictionary of {species: ion type} pairs.\n\n e.g. 
{'Fe2+':'cation'}\n\n \"\"\"\n ion_dict = {}\n for species in self.species_list:\n el, charge = parse_species(species)\n if charge > 0:\n ion_dict[species] = \"Cation\"\n elif charge < 0:\n ion_dict[species] = \"Anion\"\n else:\n ion_dict[species] = \"Neutral\"\n\n return ion_dict\n\n @property\n def species_groups_dict(self) -> dict[str, str]:\n \"\"\"Return a dictionary of {species: element type} pairs.\n\n e.g. {'Fe2+':'transition metal'}\n\n \"\"\"\n with open(path.join(data_directory, \"element_data/element_group.json\")) as f:\n _dict = json.load(f)\n return {i: _dict[parse_species(i)[0]] for i in self.species_list}\n\n def distance_df(self, metric=\"euclidean\") -> pd.DataFrame:\n \"\"\"Return a dataframe of the distance between species.\n\n Args:\n ----\n metric (str): The metric to use to calculate the distance.\n Options are 'euclidean', 'cosine', 'manhattan' and 'chebyshev'.\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n \"\"\"\n return super().distance_df(metric).rename(mapper={\"ele_1\": \"species_1\", \"ele_2\": \"species_2\"}, axis=1)\n\n def correlation_df(self, metric: str = \"pearson\") -> pd.DataFrame:\n \"\"\"Return a dataframe of the correlation between species.\n\n Args:\n ----\n metric (str): The metric to use to calculate the correlation.\n Options are 'pearson' and 'spearman'.\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n\n \"\"\"\n return super().correlation_df(metric).rename(mapper={\"ele_1\": \"species_1\", \"ele_2\": \"species_2\"}, axis=1)\n\n def to(self, fmt: str = \"\", filename: str | None = \"\"):\n \"\"\"Output the embedding to a file.\n\n Args:\n ----\n fmt (str): The file format to output the embedding to.\n Options include \"json\" and \"csv\".\n filename (str): The name of the file to be outputted\n\n Returns:\n -------\n (str) if filename not specified, otherwise None.\n \"\"\"\n fmt = fmt.lower()\n\n if fmt == \"json\" or fnmatch.fnmatch(filename, \"*.json\"):\n j = 
json.dumps(self.embeddings, cls=NumpyEncoder)\n if filename:\n if not filename.endswith(\".json\"):\n filename = filename + \".json\"\n with open(filename, \"w\") as file:\n file.write(j)\n return None\n else:\n return j\n return None\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.element_list","title":"element_list: list
property
","text":"Return the elements of the embedding.
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.ion_type_dict","title":"ion_type_dict: dict[str, str]
property
","text":"Return a dictionary of {species: ion type} pairs.
e.g. {'Fe2+':'Cation'}
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.species_groups_dict","title":"species_groups_dict: dict[str, str]
property
","text":"Return a dictionary of {species: element type} pairs.
e.g. {'Fe2+':'transition metal'}
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.species_list","title":"species_list: list
property
","text":"Return the species of the embedding.
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.correlation_df","title":"correlation_df(metric='pearson')
","text":"Return a dataframe of the correlation between species.
metric (str): The metric to use to calculate the correlation.\nOptions are 'pearson' and 'spearman'.\n
df (pandas.DataFrame): A pandas dataframe object\n
Source code in src/elementembeddings/core.py
def correlation_df(self, metric: str = \"pearson\") -> pd.DataFrame:\n \"\"\"Return a dataframe of the correlation between species.\n\n Args:\n ----\n metric (str): The metric to use to calculate the correlation.\n Options are 'pearson' and 'spearman'.\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n\n \"\"\"\n return super().correlation_df(metric).rename(mapper={\"ele_1\": \"species_1\", \"ele_2\": \"species_2\"}, axis=1)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.distance_df","title":"distance_df(metric='euclidean')
","text":"Return a dataframe of the distance between species.
metric (str): The metric to use to calculate the distance.\nOptions are 'euclidean', 'cosine', 'manhattan' and 'chebyshev'.\n
df (pandas.DataFrame): A pandas dataframe object\n
Source code in src/elementembeddings/core.py
def distance_df(self, metric=\"euclidean\") -> pd.DataFrame:\n \"\"\"Return a dataframe of the distance between species.\n\n Args:\n ----\n metric (str): The metric to use to calculate the distance.\n Options are 'euclidean', 'cosine', 'manhattan' and 'chebyshev'.\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n \"\"\"\n return super().distance_df(metric).rename(mapper={\"ele_1\": \"species_1\", \"ele_2\": \"species_2\"}, axis=1)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.from_csv","title":"from_csv(csv_path, embedding_name=None)
staticmethod
","text":"Create an instance of the SpeciesEmbedding class from a csv file.
The first column of the csv file must contain the species and be named species.
csv_path (str): Filepath of the csv file\nembedding_name (str): The name of the species representation\n
SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef from_csv(csv_path, embedding_name: str | None = None):\n \"\"\"Create an instance of the SpeciesEmbedding class from a csv file.\n\n The first column of the csv file must contain the species and be named species.\n\n Args:\n ----\n csv_path (str): Filepath of the csv file\n embedding_name (str): The name of the species representation\n\n Returns:\n -------\n SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n\n \"\"\"\n # Need to add validation handling for csv files\n df = pd.read_csv(csv_path)\n species = list(df[\"species\"])\n df = df.drop([\"species\"], axis=1)\n feature_labels = list(df.columns)\n embeds_array = df.to_numpy()\n embedding_data = {species[i]: embeds_array[i] for i in range(len(embeds_array))}\n return SpeciesEmbedding(embedding_data, embedding_name, feature_labels)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.from_json","title":"from_json(json_path, embedding_name=None)
staticmethod
","text":"Create an instance of the SpeciesEmbedding class from a json file.
json_path (str): Filepath of the json file\nembedding_name (str): The name of the species representation\n
SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef from_json(json_path, embedding_name: str | None = None):\n \"\"\"Create an instance of the SpeciesEmbedding class from a json file.\n\n Args:\n ----\n json_path (str): Filepath of the json file\n embedding_name (str): The name of the species representation\n\n Returns:\n -------\n SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n\n \"\"\"\n # Need to add validation handling for json files\n with open(json_path) as f:\n embedding_data = json.load(f)\n return SpeciesEmbedding(embedding_data, embedding_name)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.get_element_oxi_states","title":"get_element_oxi_states(el)
","text":"Return the oxidation states for a given element.
el (str): An element symbol\n
oxidation_states (list[int]): A list of oxidation states\n
Source code in src/elementembeddings/core.py
def get_element_oxi_states(self, el: str) -> list:\n \"\"\"Return the oxidation states for a given element.\n\n Args:\n ----\n el (str): An element symbol\n\n Returns:\n -------\n oxidation_states (list[int]): A list of oxidation states\n \"\"\"\n assert el in self.element_list, f\"There are no species of the element {el} in this SpeciesEmbedding\"\n parsed_species = [parse_species(species) for species in self.species_list]\n\n el_species_list = [species for species in parsed_species if species[0] == el]\n oxidation_states = [species[1] for species in el_species_list]\n return sorted(oxidation_states)\n
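The lookup depends on splitting each species label into an element symbol and a charge. The sketch below is a simplified illustration only: parse_species_sketch is a hypothetical stand-in for the package's parse_species and assumes labels of the form Fe2+, O2- or Fe, so it may not cover every label the real parser accepts.

```python
def parse_species_sketch(species):
    '''Split a label such as Fe2+ into (element, charge). Hypothetical.'''
    symbol = ''.join(ch for ch in species if ch.isalpha())
    digits = ''.join(ch for ch in species if ch.isdigit())
    if species.endswith('+'):
        sign = 1
    elif species.endswith('-'):
        sign = -1
    else:
        # No trailing sign: treat the species as neutral.
        return symbol, 0
    # A bare sign such as Na+ means a charge of magnitude 1.
    return symbol, sign * int(digits or '1')


def oxidation_states(species_list, el):
    '''Return the sorted charges of every species of element el.'''
    parsed = [parse_species_sketch(s) for s in species_list]
    return sorted(charge for symbol, charge in parsed if symbol == el)


species = ['Fe3+', 'Fe2+', 'O2-', 'Fe', 'Na+']
print(oxidation_states(species, 'Fe'))
print(parse_species_sketch('O2-'))
```

Unlike the method above, this sketch returns an empty list for an element that is absent instead of failing an assertion.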
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.load_data","title":"load_data(embedding_name, include_neutral=False)
staticmethod
","text":"Create a SpeciesEmbedding
from a preset embedding file.
The default embeddings are in the table below:
| Name | str_name |
|-------------------------|--------------|
| SkipSpecies (200 dim, MPv2022) | skipspecies |
| SkipSpecies (induced, 200 dim, MPv2022) | skipspecies_induced |

embedding_name (str): The str_name of the species representation\ninclude_neutral (bool): If True, neutral species are\n included in the embedding\n
SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef load_data(embedding_name: str, include_neutral: bool = False):\n \"\"\"Create a `SpeciesEmbedding` from a preset embedding file.\n\n The default embeddings are in the table below:\n\n | **Name** | **str_name** |\n |-------------------------|--------------|\n | SkipSpecies (200 dim, MPv2022) | skipspecies |\n | SkipSpecies (induced, 200 dim, MPv2022) | skipspecies_induced |\n\n Args:\n ----\n embedding_name (str): The str_name of the species representation\n include_neutral (bool): If True, neutral species are\n included in the embedding\n\n Returns:\n -------\n SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n \"\"\"\n if DEFAULT_SPECIES_EMBEDDINGS[embedding_name].endswith(\".csv\"):\n embedding = SpeciesEmbedding.from_csv(\n path.join(\n data_directory,\n \"species_representations\",\n DEFAULT_SPECIES_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n if not include_neutral:\n embedding.remove_neutral_species(inplace=True)\n return embedding\n elif DEFAULT_SPECIES_EMBEDDINGS[embedding_name].endswith(\".json\"):\n embedding = SpeciesEmbedding.from_json(\n path.join(\n data_directory,\n \"species_representations\",\n DEFAULT_SPECIES_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n if not include_neutral:\n embedding.remove_neutral_species(inplace=True)\n return embedding\n else:\n return None\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.remove_neutral_species","title":"remove_neutral_species(inplace=False)
","text":"Remove neutral species from the SpeciesEmbedding instance.
inplace (bool): If True, neutral species are removed\n from the SpeciesEmbedding instance.\nIf False, the original SpeciesEmbedding instance is unchanged\nand a new SpeciesEmbedding instance with the\n neutral species removed is created.\n
Source code in src/elementembeddings/core.py
def remove_neutral_species(self, inplace: bool = False):\n \"\"\"Remove neutral species from the SpeciesEmbedding instance.\n\n Args:\n ----\n inplace (bool): If True, neutral species are removed\n from the SpeciesEmbedding instance.\n If false, the original SpeciesEmbedding instance is unchanged\n and a new SpeciesEmbedding instance with the\n neutral species removed is created.\n\n \"\"\"\n neutral_species = [s for s in self.species_list if parse_species(s)[1] == 0]\n return self.remove_species(neutral_species, inplace)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.remove_species","title":"remove_species(species, inplace=False)
","text":"Remove species from the SpeciesEmbedding instance.
species (str,list(str)): A species or a list of species\ninplace (bool): If True, species are removed\nfrom the SpeciesEmbedding instance.\nIf False, the original SpeciesEmbedding instance is unchanged\nand a new SpeciesEmbedding instance with the species removed is created.\n
Source code in src/elementembeddings/core.py
def remove_species(self, species: str | list[str], inplace: bool = False):\n \"\"\"Remove species from the SpeciesEmbedding instance.\n\n Args:\n ----\n species (str,list(str)): A species or a list of species\n inplace (bool): If True, species are removed\n from the SpeciesEmbedding instance.\n If false, the original SpeciesEmbedding instance is unchanged\n and a new SpeciesEmbedding instance with the species removed is created.\n\n \"\"\"\n if inplace:\n if isinstance(species, str):\n try:\n del self.embeddings[species]\n except KeyError:\n warnings.warn(\n f\"{species} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n elif isinstance(species, list):\n for sp in species:\n try:\n del self.embeddings[sp]\n except KeyError:\n warnings.warn(\n f\"{sp} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n if isinstance(species, str):\n try:\n del embeddings_copy[species]\n except KeyError:\n warnings.warn(\n f\"{species} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n elif isinstance(species, list):\n for sp in species:\n try:\n del embeddings_copy[sp]\n except KeyError:\n warnings.warn(\n f\"{sp} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n return SpeciesEmbedding(embeddings_copy, self.embedding_name)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.to","title":"to(fmt='', filename='')
","text":"Output the embedding to a file.
fmt (str): The file format to output the embedding to.\n Only \"json\" is handled by the source shown below;\n any other format returns None.\nfilename (str): The name of the file to be outputted\n
(str): The JSON string if filename is not specified, otherwise None.\n
Source code in src/elementembeddings/core.py
def to(self, fmt: str = \"\", filename: str | None = \"\"):\n \"\"\"Output the embedding to a file.\n\n Args:\n ----\n fmt (str): The file format to output the embedding to.\n Options include \"json\" and \"csv\".\n filename (str): The name of the file to be outputted\n\n Returns:\n -------\n (str) if filename not specified, otherwise None.\n \"\"\"\n fmt = fmt.lower()\n\n if fmt == \"json\" or fnmatch.fnmatch(filename, \"*.json\"):\n j = json.dumps(self.embeddings, cls=NumpyEncoder)\n if filename:\n if not filename.endswith(\".json\"):\n filename = filename + \".json\"\n with open(filename, \"w\") as file:\n file.write(j)\n return None\n else:\n return j\n return None\n
"},{"location":"python_api/plotter/","title":"Plotter module","text":"Provides the plotting functions for visualising Embeddings.
"},{"location":"python_api/plotter/#elementembeddings.plotter.dimension_plotter","title":"dimension_plotter(embedding, ax=None, n_components=2, reducer='umap', adjusttext=True, reducer_params=None, scatter_params=None, include_species=None)
","text":"Plot the reduced dimensions of the embeddings.
embedding (Embedding | SpeciesEmbedding): The embedding to be plotted.\nax (plt.axes, optional): The axes to plot on, by default None\nn_components (int): The number of components to reduce to, by default 2\nreducer (str): The dimensionality reduction algorithm to use, by default \"umap\".\nOptions are \"umap\", \"tsne\" and \"pca\".\nadjusttext (bool): Whether to avoid overlap of the text labels, by default True\nreducer_params (dict, optional): Additional keyword arguments to pass to\nthe reducer, by default None\nscatter_params (dict, optional): Additional keyword arguments to pass to\nthe scatterplot, by default None\ninclude_species (list, optional): The elements/species to include in the plot,\nby default None (all are included)\n
Source code in src/elementembeddings/plotter.py
def dimension_plotter(\n embedding: Embedding | SpeciesEmbedding,\n ax: plt.axes | None = None,\n n_components: int = 2,\n reducer: str = \"umap\",\n adjusttext: bool = True,\n reducer_params: dict | None = None,\n scatter_params: dict | None = None,\n include_species: list | None = None,\n):\n \"\"\"Plot the reduced dimensions of the embeddings.\n\n Args:\n ----\n embedding (Embedding): The embedding to be plotted.\n ax (plt.axes, optional): The axes to plot on, by default None\n n_components (int): The number of components to reduce to, by default 2\n reducer (str): The dimensionality reduction algorithm to use, by default \"umap\"\n adjusttext (bool): Whether to avoid overlap of the text labels, by default True\n reducer_params (dict, optional): Additional keyword arguments to pass to\n the reducer, by default None\n scatter_params (dict, optional): Additional keyword arguments to pass to\n the scatterplot, by default None\n include_species (list, optional): The elements/species to include in the plot,\n\n \"\"\"\n if reducer_params is None:\n reducer_params = {}\n if reducer == \"umap\":\n reduced = embedding.calculate_umap(n_components=n_components, **reducer_params)\n elif reducer == \"tsne\":\n reduced = embedding.calculate_tsne(n_components=n_components, **reducer_params)\n elif reducer == \"pca\":\n reduced = embedding.calculate_pca(n_components=n_components, **reducer_params)\n else:\n msg = \"Unrecognised reducer.\"\n raise ValueError(msg)\n\n if isinstance(embedding, Embedding):\n group_dict = embedding.element_groups_dict\n el_sp_array = np.array(embedding.element_list)\n\n data = {\n \"x\": reduced[:, 0],\n \"y\": reduced[:, 1],\n \"element\": el_sp_array,\n \"Group\": list(group_dict.values()),\n }\n elif isinstance(embedding, SpeciesEmbedding):\n group_dict = embedding.species_groups_dict\n el_sp_array = np.array(embedding.species_list)\n ion_type = embedding.ion_type_dict\n data = {\n \"x\": reduced[:, 0],\n \"y\": reduced[:, 1],\n \"element\": 
el_sp_array,\n \"Group\": list(group_dict.values()),\n \"ion_type\": list(ion_type.values()),\n }\n if reduced.shape[1] == 2:\n df = pd.DataFrame(data)\n if include_species:\n df = df[df[\"element\"].isin(include_species)].reset_index(drop=True)\n if not ax:\n fig, ax = plt.subplots()\n if scatter_params is None:\n scatter_params = {}\n if isinstance(embedding, SpeciesEmbedding):\n sns.scatterplot(\n data=df,\n x=\"x\",\n y=\"y\",\n hue=\"Group\",\n ax=ax,\n palette=ELEMENT_GROUPS_PALETTES,\n style=\"ion_type\",\n **scatter_params,\n )\n # Convert the species to (element, charge) format\n parsed_species = [parse_species(spec) for spec in df[\"element\"].tolist()]\n signs = [get_sign(charge) for _, charge in parsed_species]\n\n species_labels = [\n rf\"$\\mathregular{{{element}^{{{abs(charge)}{sign}}}}}$\"\n for (element, charge), sign in zip(parsed_species, signs)\n ]\n\n texts = [ax.text(df[\"x\"][i], df[\"y\"][i], species_labels[i], fontsize=12) for i in range(len(df))]\n elif isinstance(embedding, Embedding):\n sns.scatterplot(\n data=df,\n x=\"x\",\n y=\"y\",\n hue=\"Group\",\n ax=ax,\n palette=ELEMENT_GROUPS_PALETTES,\n **scatter_params,\n )\n texts = [ax.text(df[\"x\"][i], df[\"y\"][i], df[\"element\"][i], fontsize=12) for i in range(len(df))]\n ax.set_xlabel(\"Dimension 1\")\n ax.set_ylabel(\"Dimension 2\")\n if adjusttext:\n adjust_text(\n texts,\n arrowprops={\"arrowstyle\": \"-\", \"color\": \"gray\", \"lw\": 0.5},\n ax=ax,\n )\n\n elif reduced.shape[1] == 3:\n df = pd.DataFrame(\n {\n \"x\": reduced[:, 0],\n \"y\": reduced[:, 1],\n \"z\": reduced[:, 2],\n \"element\": el_sp_array,\n \"group\": list(group_dict.values()),\n },\n )\n if include_species:\n df = df[df[\"element\"].isin(include_species)].reset_index(drop=True)\n if not ax:\n fig = plt.figure() # noqa: F841\n ax = plt.axes(projection=\"3d\")\n ax.scatter3D(\n df[\"x\"],\n df[\"y\"],\n df[\"z\"],\n )\n ax.set_xlabel(\"Dimension 1\")\n ax.set_ylabel(\"Dimension 2\")\n ax.set_zlabel(\"Dimension 
3\")\n for i in range(len(df)):\n ax.text(df[\"x\"][i], df[\"y\"][i], df[\"z\"][i], df[\"element\"][i], fontsize=12)\n else:\n msg = \"Unrecognised number of dimensions.\"\n raise ValueError(msg)\n ax.set_title(embedding.embedding_name, fontdict={\"fontweight\": \"bold\"})\n return ax\n
"},{"location":"python_api/plotter/#elementembeddings.plotter.heatmap_plotter","title":"heatmap_plotter(embedding, metric, cmap='Blues', sortaxisby='mendeleev', ax=None, show_axislabels=True, **kwargs)
","text":"Plot multiple heatmaps of the embeddings.
embedding (Embedding | SpeciesEmbedding): The embedding to be plotted.\nmetric (str): The distance metric / similarity measure to be plotted.\ncmap (str): The colourmap for the heatmap.\nsortaxisby (str, optional): The attribute to sort the axis by,\nby default \"mendeleev\".\nOptions are \"mendeleev\", \"atomic_number\"\nax (plt.axes, optional): The axes to plot on, by default None\nshow_axislabels (bool, optional): Whether to show the axis labels, by default True\n**kwargs: Additional keyword arguments to pass to seaborn.heatmap\n
Source code in src/elementembeddings/plotter.py
def heatmap_plotter(\n embedding: Embedding | SpeciesEmbedding,\n metric: str,\n cmap: str = \"Blues\",\n sortaxisby: str = \"mendeleev\",\n ax: plt.axes | None = None,\n show_axislabels: bool = True,\n **kwargs,\n):\n \"\"\"Plot multiple heatmaps of the embeddings.\n\n Args:\n ----\n embedding (Embedding): The embeddings to be plotted.\n metric (str): The distance metric / similarity measure to be plotted.\n cmap (str): The colourmap for the heatmap.\n sortaxisby (str, optional): The attribute to sort the axis by,\n by default \"mendeleev_number\".\n Options are \"mendeleev_number\", \"atomic_number\"\n ax (plt.axes, optional): The axes to plot on, by default None\n show_axislabels (bool, optional): Whether to show the axis, by default True\n **kwargs: Additional keyword arguments to pass to seaborn.heatmap\n\n \"\"\"\n if not ax:\n fig, ax = plt.subplots()\n\n correlation_metrics = [\"spearman\", \"pearson\", \"cosine_similarity\"]\n distance_metrics = [\n \"euclidean\",\n \"manhattan\",\n \"cosine_distance\",\n \"chebyshev\",\n \"wasserstein\",\n \"energy\",\n ]\n if metric in correlation_metrics:\n p = embedding.correlation_pivot_table(metric=metric, sortby=sortaxisby)\n\n elif metric in distance_metrics:\n p = embedding.distance_pivot_table(metric=metric, sortby=sortaxisby)\n else:\n raise ValueError(\"Unrecognised metric.\")\n xlabels = [i[1] for i in p.index]\n ylabels = [i[1] for i in p.columns]\n sns.heatmap(\n p,\n cmap=cmap,\n square=\"True\",\n linecolor=\"k\",\n ax=ax,\n cbar_kws={\n \"shrink\": 0.5,\n },\n xticklabels=True,\n yticklabels=True,\n **kwargs,\n )\n ax.set_title(\n embedding.embedding_name,\n fontdict={\n \"fontweight\": \"bold\",\n },\n )\n if not show_axislabels:\n ax.set_xticklabels([])\n ax.set_yticklabels([])\n ax.set_xticks([])\n ax.set_yticks([])\n else:\n ax.set_xticklabels(\n xlabels,\n )\n ax.set_yticklabels(ylabels)\n ax.set_xlabel(\"\")\n ax.set_ylabel(\"\")\n return ax\n
"},{"location":"python_api/python_api/","title":"ElementEmbeddings Python package","text":"The core module of the ElementEmbeddings
package contains the Embedding
class which is used to store and manipulate elemental representation data. This part of the project documentation provides the python API for the ElementEmbeddings
package.
Core module
Composition module
Plotter module
"},{"location":"python_api/utils/io/","title":"io","text":"IO utils for AtomicEmbeddings.
"},{"location":"python_api/utils/io/#elementembeddings.utils.io.NumpyEncoder","title":"NumpyEncoder
","text":" Bases: JSONEncoder
Special json encoder for numpy types.
Source code in src/elementembeddings/utils/io.py
class NumpyEncoder(json.JSONEncoder):\n \"\"\"Special json encoder for numpy types.\"\"\"\n\n def default(self, obj):\n \"\"\"Encode numpy types.\"\"\"\n if isinstance(obj, np.ndarray):\n return obj.tolist()\n return json.JSONEncoder.default(self, obj)\n
"},{"location":"python_api/utils/io/#elementembeddings.utils.io.NumpyEncoder.default","title":"default(obj)
","text":"Encode numpy types.
Source code in src/elementembeddings/utils/io.py
def default(self, obj):\n \"\"\"Encode numpy types.\"\"\"\n if isinstance(obj, np.ndarray):\n return obj.tolist()\n return json.JSONEncoder.default(self, obj)\n
"},{"location":"python_api/utils/math/","title":"Math","text":"Math functions for the AtomicEmbeddings package.
"},{"location":"python_api/utils/math/#elementembeddings.utils.math.cosine_distance","title":"cosine_distance(a, b)
","text":"Cosine distance of two vectors.
Source code in src/elementembeddings/utils/math.py
def cosine_distance(\n a: list[int | float],\n b: list[int | float],\n) -> int | float:\n \"\"\"Cosine distance of two vectors.\"\"\"\n return 1 - cosine_similarity(a, b)\n
"},{"location":"python_api/utils/math/#elementembeddings.utils.math.cosine_similarity","title":"cosine_similarity(a, b)
","text":"Cosine similarity of two vectors.
Source code in src/elementembeddings/utils/math.py
def cosine_similarity(\n a: list[int | float],\n b: list[int | float],\n) -> int | float:\n \"\"\"Cosine similarity of two vectors.\"\"\"\n return dot(a, b) / ((dot(a, a) ** 0.5) * (dot(b, b) ** 0.5))\n
"},{"location":"python_api/utils/math/#elementembeddings.utils.math.dot","title":"dot(a, b)
","text":"Dot product of two vectors.
Source code in src/elementembeddings/utils/math.py
def dot(a: list[int | float], b: list[int | float]) -> int | float:\n \"\"\"Dot product of two vectors.\"\"\"\n return sum(map(operator.mul, a, b))\n
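As a quick illustration of how these three helpers relate, the sketch below restates them in a self-contained form (type hints dropped for brevity) and evaluates them on two small vectors. The vectors u and v are made up for the example. Note that a zero vector would lead to a division by zero in cosine_similarity.

```python
import operator


def dot(a, b):
    '''Dot product of two equal-length vectors.'''
    return sum(map(operator.mul, a, b))


def cosine_similarity(a, b):
    '''Cosine of the angle between a and b.'''
    return dot(a, b) / ((dot(a, a) ** 0.5) * (dot(b, b) ** 0.5))


def cosine_distance(a, b):
    '''One minus the cosine similarity.'''
    return 1 - cosine_similarity(a, b)


# Illustrative vectors (not taken from any embedding)
u = [1.0, 0.0]
v = [1.0, 1.0]

print(dot(u, v))                # 1.0
print(cosine_similarity(u, v))  # about 0.7071, i.e. cos(45 degrees)
print(cosine_distance(u, u))    # 0.0, a vector has zero distance to itself
```

Because cosine distance depends only on the angle between the vectors, rescaling either vector leaves the result unchanged.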
"},{"location":"python_api/utils/species/","title":"Species","text":"Utilities for species.
"},{"location":"python_api/utils/species/#elementembeddings.utils.species.get_sign","title":"get_sign(charge)
","text":"Get string representation of a number's sign.
Parameters:
Name Type Description Default
charge int The number whose sign to derive. required
Returns:
Name Type Description
sign str either '+', '-', or '' for neutral.
Source code in src/elementembeddings/utils/species.py
def get_sign(charge: int) -> str:\n \"\"\"Get string representation of a number's sign.\n\n Args:\n charge (int): The number whose sign to derive.\n\n Returns:\n sign (str): either '+', '-', or '' for neutral.\n\n \"\"\"\n if charge > 0:\n return \"+\"\n elif charge < 0:\n return \"-\"\n else:\n return \"\"\n
"},{"location":"python_api/utils/species/#elementembeddings.utils.species.parse_species","title":"parse_species(species)
","text":"Parse a species string into its atomic symbol and oxidation state.
:param species: the species string :return: a tuple of the atomic symbol and oxidation state
Source code in src/elementembeddings/utils/species.py
def parse_species(species: str) -> tuple[str, int]:\n \"\"\"\n Parse a species string into its atomic symbol and oxidation state.\n\n :param species: the species string\n :return: a tuple of the atomic symbol and oxidation state\n\n \"\"\"\n try:\n ele, oxi_state = re.match(r\"([A-Za-z]+)([0-9]*[\\+\\-])\", species).groups()\n if oxi_state[-1] in [\"+\", \"-\"]:\n charge = (int(oxi_state[:-1] or 1)) * (-1 if \"-\" in oxi_state else 1)\n return ele, charge\n else:\n return ele, 0\n except AttributeError:\n return _parse_species_old(species)\n
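To make the parsing rule concrete, here is a minimal, self-contained sketch of the same regular-expression approach. The name parse_species_sketch is hypothetical and this is not the package function: in particular, the fallback helper _parse_species_old is not shown in this documentation, so the sketch simply assumes that a string without a trailing sign is neutral.

```python
import re


def get_sign(charge):
    '''Return '+', '-', or '' depending on the sign of charge.'''
    if charge > 0:
        return '+'
    if charge < 0:
        return '-'
    return ''


def parse_species_sketch(species):
    '''Hypothetical simplified parser: symbol, optional digits, then a sign.'''
    match = re.match(r'([A-Za-z]+)([0-9]*[+-])', species)
    if match is None:
        # Assumption for this sketch only: no trailing sign means neutral.
        return species, 0
    element, oxi_state = match.groups()
    # A bare sign such as the '-' in 'Cl-' means a magnitude of 1.
    magnitude = int(oxi_state[:-1] or 1)
    return element, (-magnitude if oxi_state.endswith('-') else magnitude)


for spec in ['Fe3+', 'O2-', 'Cl-', 'Fe']:
    element, charge = parse_species_sketch(spec)
    print(spec, element, charge, repr(get_sign(charge)))
```

With these inputs the sketch yields ('Fe', 3), ('O', -2), ('Cl', -1) and ('Fe', 0) respectively.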
"},{"location":"tutorial/composition/","title":"Using the composition module","text":"In\u00a0[1]: Copied! import pandas as pd\nfrom elementembeddings.composition import composition_featuriser\nfrom elementembeddings.composition import CompositionalEmbedding\nimport numpy as np\n\nnp.set_printoptions(suppress=True)\nimport pandas as pd from elementembeddings.composition import composition_featuriser from elementembeddings.composition import CompositionalEmbedding import numpy as np np.set_printoptions(suppress=True)
The core class of the elementembeddings.composition
module is the CompositionalEmbedding
class. We can use this class to create objects which represent a composition together with an elemental representation. We can create an instance of this class as follows:
CsPbI3_magpie = CompositionalEmbedding(formula='CsPbI3', embedding='magpie')\nIn\u00a0[2]: Copied!
CsPbI3_magpie = CompositionalEmbedding(formula=\"CsPbI3\", embedding=\"magpie\")\nCsPbI3_magpie = CompositionalEmbedding(formula=\"CsPbI3\", embedding=\"magpie\")
We can access the elemental embeddings of the individual elements in the composition from the el_matrix
attribute.
>>> CsPbI3_magpie.el_matrix\nIn\u00a0[3]: Copied!
# Print the individual element feature vectors\nprint(CsPbI3_magpie.el_matrix)\n# Print the individual element feature vectors print(CsPbI3_magpie.el_matrix)
[[ 55. 5. 132.9054519 301.59 1. 6.\n 244. 0.79 1. 0. 0. 0.\n 1. 1. 0. 0. 0. 1.\n 115.765 0. 0. 229. ]\n [ 82. 81. 207.2 600.61 14. 6.\n 146. 2.33 2. 2. 10. 14.\n 28. 0. 4. 0. 0. 4.\n 28.11 0. 0. 225. ]\n [ 53. 96. 126.90447 386.85 17. 5.\n 139. 2.66 2. 5. 10. 0.\n 17. 0. 1. 0. 0. 1.\n 43.015 1.062 0. 64. ]]\n
Some properties which are accessible are the composition
and fractional composition
which are dictionaries of element:amount key:value pairs.
# Print the composition and the fractional composition\nprint(CsPbI3_magpie.composition)\nprint(CsPbI3_magpie.fractional_composition)\n# Print the composition and the fractional composition print(CsPbI3_magpie.composition) print(CsPbI3_magpie.fractional_composition)
defaultdict(<class 'float'>, {'Cs': 1.0, 'Pb': 1.0, 'I': 3.0})\n{'Cs': 0.2, 'Pb': 0.2, 'I': 0.6}\n
Other properties and attributes that can be accessed include the list of elements, the stoichiometry and its normalised form represented as vectors, and the number of atoms.
In\u00a0[5]: Copied!# Print the list of elements\nprint(CsPbI3_magpie.element_list)\n# Print the stoichiometric vector\nprint(CsPbI3_magpie.stoich_vector)\n\n# Print the normalized stoichiometric vector\nprint(CsPbI3_magpie.norm_stoich_vector)\n\n# Print the number of atoms\nprint(CsPbI3_magpie.num_atoms)\n# Print the list of elements print(CsPbI3_magpie.element_list) # Print the stoichiometric vector print(CsPbI3_magpie.stoich_vector) # Print the normalized stoichiometric vector print(CsPbI3_magpie.norm_stoich_vector) # Print the number of atoms print(CsPbI3_magpie.num_atoms)
['Cs', 'Pb', 'I']\n[1. 1. 3.]\n[0.2 0.2 0.6]\n5.0\n
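The relationship between these quantities can be sketched in a few lines of plain Python. This is an illustration of the arithmetic only, not the package's implementation, and the variable names simply mirror the attribute names printed above.

```python
# Composition of CsPbI3 as element:amount pairs (values from the output above)
composition = {'Cs': 1.0, 'Pb': 1.0, 'I': 3.0}

element_list = list(composition)
stoich_vector = list(composition.values())

# The number of atoms is the sum of the stoichiometric amounts
num_atoms = sum(stoich_vector)

# Dividing each amount by the total gives the normalised stoichiometry
norm_stoich_vector = [n / num_atoms for n in stoich_vector]
fractional_composition = dict(zip(element_list, norm_stoich_vector))

print(element_list)            # ['Cs', 'Pb', 'I']
print(num_atoms)               # 5.0
print(fractional_composition)  # {'Cs': 0.2, 'Pb': 0.2, 'I': 0.6}
```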
We can create composition-based feature vectors using the feature_vector
method.
>>> CsPbI3_magpie.feature_vector()\n
By default, this will return the weighted average of the elemental embeddings of the composition. This would have the same dimension as the individual elemental embeddings. We can also specify the type of feature vector we want to create by passing the stats
argument.
>>> CsPbI3_magpie.feature_vector(stats=['mean', 'variance'])\n
This would return a feature vector which is the concatenation of the mean and variance of the elemental embeddings of the composition. This would have twice the dimension of the individual elemental embeddings. In general, the dimension of the feature vector is the product of the dimension of the elemental embeddings and the number of statistics requested.
The available statistics are:
mean
variance
minpool
maxpool
sum
range
harmonic_mean
geometric_mean
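As a rough illustration of how the dimension grows with the number of statistics, the sketch below computes five of the statistics listed above by hand for the first three Magpie features of CsPbI3 (Number, MendeleevNumber and AtomicWeight, copied from the el_matrix output earlier on this page). The formulas used here (composition-weighted mean and variance, element-wise min and max, stoichiometry-weighted sum) are assumptions inferred from the printed outputs in this tutorial, not taken from the package source.

```python
# First three Magpie features per element, copied from the el_matrix output
el_matrix = {
    'Cs': [55.0, 5.0, 132.9054519],
    'Pb': [82.0, 81.0, 207.2],
    'I': [53.0, 96.0, 126.90447],
}
stoich = {'Cs': 1.0, 'Pb': 1.0, 'I': 3.0}
total = sum(stoich.values())
weights = {el: n / total for el, n in stoich.items()}
n_features = 3

# Assumed definitions, chosen because they match the printed outputs
mean = [
    sum(weights[el] * el_matrix[el][j] for el in el_matrix)
    for j in range(n_features)
]
variance = [
    sum(weights[el] * (el_matrix[el][j] - mean[j]) ** 2 for el in el_matrix)
    for j in range(n_features)
]
minpool = [min(vec[j] for vec in el_matrix.values()) for j in range(n_features)]
maxpool = [max(vec[j] for vec in el_matrix.values()) for j in range(n_features)]
total_sum = [
    sum(stoich[el] * el_matrix[el][j] for el in el_matrix)
    for j in range(n_features)
]

# Concatenating the statistics multiplies the dimension by their count
feature_vector = mean + variance + minpool + maxpool + total_sum
print(len(feature_vector))  # 15 = 3 features x 5 statistics
print(mean[:2], variance[:2])
```

With all 22 Magpie features and the same five statistics, the same reasoning gives 22 x 5 = 110 dimensions.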
# Print the mean feature vector\nprint(CsPbI3_magpie.feature_vector(stats=\"mean\"))\n# Print the mean feature vector print(CsPbI3_magpie.feature_vector(stats=\"mean\"))
[ 59.2 74.8 144.16377238 412.55 13.2\n 5.4 161.4 2.22 1.8 3.4\n 8. 2.8 16. 0.2 1.4\n 0. 0. 1.6 54.584 0.6372\n 0. 129.2 ]\nIn\u00a0[7]: Copied!
print(CompositionalEmbedding(formula=\"NaCl\", embedding=\"magpie\").feature_vector())\nprint(CompositionalEmbedding(formula=\"NaCl\", embedding=\"magpie\").feature_vector())
[ 14. 48. 29.22138464 271.235 9.\n 3. 134. 2.045 1.5 2.5\n 0. 0. 4. 0.5 0.5\n 0. 0. 1. 26.87041667 1.2465\n 0. 146.5 ]\nIn\u00a0[8]: Copied!
# Print the feature vector for the mean, variance, minpool, maxpool, and sum\nCsPbI3_magpie_cbfv = CsPbI3_magpie.feature_vector(\n stats=[\"mean\", \"variance\", \"minpool\", \"maxpool\", \"sum\"]\n)\nprint(f\"The dimension of the feature vector is {CsPbI3_magpie_cbfv.shape[0]}\")\n\nprint(CsPbI3_magpie_cbfv)\n# Print the feature vector for the mean, variance, minpool, maxpool, and sum CsPbI3_magpie_cbfv = CsPbI3_magpie.feature_vector( stats=[\"mean\", \"variance\", \"minpool\", \"maxpool\", \"sum\"] ) print(f\"The dimension of the feature vector is {CsPbI3_magpie_cbfv.shape[0]}\") print(CsPbI3_magpie_cbfv)
The dimension of the feature vector is 110\n[ 59.2 74.8 144.16377238 412.55 13.2\n 5.4 161.4 2.22 1.8 3.4\n 8. 2.8 16. 0.2 1.4\n 0. 0. 1.6 54.584 0.6372\n 0. 129.2 130.56 1251.76 998.7932657\n 9932.03104 38.56 0.24 1713.04 0.52756\n 0.16 4.24 16. 31.36 74.4\n 0.16 1.84 0. 0. 1.44\n 969.102544 0.27068256 0. 6378.16 53.\n 5. 126.90447 301.59 1. 5.\n 139. 0.79 1. 0. 0.\n 0. 1. 0. 0. 0.\n 0. 1. 28.11 0. 0.\n 64. 82. 96. 207.2 600.61\n 17. 6. 244. 2.66 2.\n 5. 10. 14. 28. 1.\n 4. 0. 0. 4. 115.765\n 1.062 0. 229. 296. 374.\n 720.8188619 2062.75 66. 27. 807.\n 11.1 9. 17. 40. 14.\n 80. 1. 7. 0. 0.\n 8. 272.92 3.186 0. 646. ]\n
We can also featurise multiple formulas at once using the composition_featuriser
function.
>>> composition_featuriser([\"CsPbI3\", \"Fe2O3\", \"NaCl\"], embedding='magpie')\n
This will return a numpy
array of the feature vectors of the compositions. The order of the feature vectors will be the same as the order of the formulas in the input list.
formulas = [\"CsPbI3\", \"Fe2O3\", \"NaCl\"]\n\ncomposition_featuriser(formulas, embedding=\"magpie\", stats=\"mean\")\nformulas = [\"CsPbI3\", \"Fe2O3\", \"NaCl\"] composition_featuriser(formulas, embedding=\"magpie\", stats=\"mean\")
\nOut[9]:
[array([ 59.2 , 74.8 , 144.16377238, 412.55 ,\n 13.2 , 5.4 , 161.4 , 2.22 ,\n 1.8 , 3.4 , 8. , 2.8 ,\n 16. , 0.2 , 1.4 , 0. ,\n 0. , 1.6 , 54.584 , 0.6372 ,\n 0. , 129.2 ]),\n array([ 15.2 , 74.2 , 31.93764 , 757.28 ,\n 12.8 , 2.8 , 92.4 , 2.796 ,\n 2. , 2.4 , 2.4 , 0. ,\n 6.8 , 0. , 1.2 , 1.6 ,\n 0. , 2.8 , 9.755 , 0. ,\n 0.84426512, 98.8 ]),\n array([ 14. , 48. , 29.22138464, 271.235 ,\n 9. , 3. , 134. , 2.045 ,\n 1.5 , 2.5 , 0. , 0. ,\n 4. , 0.5 , 0.5 , 0. ,\n 0. , 1. , 26.87041667, 1.2465 ,\n 0. , 146.5 ])]In\u00a0[10]: Copied!
df = pd.DataFrame({\"formula\": formulas})\ncomposition_featuriser(df, embedding=\"magpie\", stats=[\"mean\", \"sum\"])\ndf = pd.DataFrame({\"formula\": formulas}) composition_featuriser(df, embedding=\"magpie\", stats=[\"mean\", \"sum\"])
Featurising compositions...\n
\n
Computing feature vectors...\n
\nOut[10]:
formula mean_Number mean_MendeleevNumber mean_AtomicWeight mean_MeltingT mean_Column mean_Row mean_CovalentRadius mean_Electronegativity mean_NsValence ... sum_NValence sum_NsUnfilled sum_NpUnfilled sum_NdUnfilled sum_NfUnfilled sum_NUnfilled sum_GSvolume_pa sum_GSbandgap sum_GSmagmom sum_SpaceGroupNumber
0 CsPbI3 59.2 74.8 144.163772 412.550 13.2 5.4 161.4 2.220 1.8 ... 80.0 1.0 7.0 0.0 0.0 8.0 272.920000 3.186 0.000000 646.0
1 Fe2O3 15.2 74.2 31.937640 757.280 12.8 2.8 92.4 2.796 2.0 ... 34.0 0.0 6.0 8.0 0.0 14.0 48.775000 0.000 4.221326 494.0
2 NaCl 14.0 48.0 29.221385 271.235 9.0 3.0 134.0 2.045 1.5 ... 8.0 1.0 1.0 0.0 0.0 2.0 53.740833 2.493 0.000000 293.0
3 rows \u00d7 45 columns
We can also calculate the \"distance\" between two compositions using their feature vectors. This can be used to determine which compositions are more similar to each other.
In\u00a0[11]: Copied!print(\n f\"The euclidean distance between CsPbI3 and Fe2O3 is {CsPbI3_magpie.distance('Fe2O3', distance_metric='euclidean', stats='mean'):.2f}\"\n)\nprint(\n f\"The euclidean distance between CsPbI3 and NaCl is {CsPbI3_magpie.distance('NaCl',distance_metric='euclidean', stats='mean'):.2f}\"\n)\nprint(\n f\"The euclidean distance between CsPbI3 and CsPbCl3 is {CsPbI3_magpie.distance('CsPbCl3',distance_metric='euclidean', stats='mean'):.2f}\"\n)\nprint( f\"The euclidean distance between CsPbI3 and Fe2O3 is {CsPbI3_magpie.distance('Fe2O3', distance_metric='euclidean', stats='mean'):.2f}\" ) print( f\"The euclidean distance between CsPbI3 and NaCl is {CsPbI3_magpie.distance('NaCl',distance_metric='euclidean', stats='mean'):.2f}\" ) print( f\"The euclidean distance between CsPbI3 and CsPbCl3 is {CsPbI3_magpie.distance('CsPbCl3',distance_metric='euclidean', stats='mean'):.2f}\" )
The euclidean distance between CsPbI3 and Fe2O3 is 375.77\nThe euclidean distance between CsPbI3 and NaCl is 194.94\nThe euclidean distance between CsPbI3 and CsPbCl3 is 144.39\n
Based on the mean-pooled feature vectors, we can see that CsPbI3 and CsPbCl3 are more similar to each other than CsPbI3 and Fe2O3.
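To show what the Euclidean distance measures here, the sketch below recomputes the CsPbI3 to NaCl distance directly from the two mean-pooled Magpie vectors printed earlier in this tutorial. It uses only the standard library and is an illustration of the arithmetic, not the package's distance method. The variable names are chosen for the example.

```python
import math

# Mean-pooled Magpie feature vectors, copied from the outputs above
cspbi3_mean = [
    59.2, 74.8, 144.16377238, 412.55, 13.2, 5.4, 161.4, 2.22, 1.8, 3.4,
    8.0, 2.8, 16.0, 0.2, 1.4, 0.0, 0.0, 1.6, 54.584, 0.6372, 0.0, 129.2,
]
nacl_mean = [
    14.0, 48.0, 29.22138464, 271.235, 9.0, 3.0, 134.0, 2.045, 1.5, 2.5,
    0.0, 0.0, 4.0, 0.5, 0.5, 0.0, 0.0, 1.0, 26.87041667, 1.2465, 0.0, 146.5,
]

# Euclidean distance: square root of the summed squared differences
distance = math.dist(cspbi3_mean, nacl_mean)
print(f'{distance:.2f}')
```

This should reproduce the value of 194.94 reported above. Because the Magpie features are not rescaled, features with large magnitudes such as the melting temperature and atomic weight dominate the result.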
"},{"location":"tutorial/composition/#using-the-composition-module","title":"Using the composition module\u00b6","text":""},{"location":"tutorial/species/","title":"Interacting with ionic species representations using ElementEmbeddings","text":"In\u00a0[1]: Copied!from elementembeddings.core import SpeciesEmbedding\nfrom elementembeddings.composition import (\n SpeciesCompositionalEmbedding,\n species_composition_featuriser,\n)\nfrom elementembeddings.core import SpeciesEmbedding from elementembeddings.composition import ( SpeciesCompositionalEmbedding, species_composition_featuriser, )
Elements are the building blocks of chemistry, but species (elements in a given charge state) dictate the structure and properties of inorganic compounds.
For example, the local spin and atomic environment in Fe(s), FeO, Fe2O3, and Fe3O4 solids are different due to variations in the charge state and coordination of iron.
For composition-only machine learning, there are many representation schemes that enable us to represent compounds as vectors, built on embeddings of elements. However, this may present a limitation when we want to represent ionic species, as the charge state of the element is not taken into account. As such, we need to represent ionic species as vectors.
The ElementEmbeddings package contains a set of pre-trained embeddings for elements and ionic species, which can be used to represent ionic species in a vector space.
At the time of writing, the 200-dimension SkipSpecies vector embeddings are available for ionic species representations. These embeddings are trained using the Skip-gram model on a large dataset of inorganic compounds.
In\u00a0[2]: Copied!# Load the SkipSpecies vectors as a SpeciesEmbedding object\n\nskipspecies = SpeciesEmbedding.load_data(embedding_name=\"skipspecies\")\n\n\nprint(\"Below is the representation of Fe3+ using the SkipSpecies vectors.\")\n\nprint(skipspecies.embeddings[\"Fe3+\"])\n# Load the SkipSpecies vectors as a SpeciesEmbedding object skipspecies = SpeciesEmbedding.load_data(embedding_name=\"skipspecies\") print(\"Below is the representation of Fe3+ using the SkipSpecies vectors.\") print(skipspecies.embeddings[\"Fe3+\"])
Below is the representation of Fe3+ using the SkipSpecies vectors.\n[-3.46536078e-02 -3.23320180e-02 -6.41056001e-02 -6.64595328e-03\n -3.81412022e-02 -9.60185826e-02 -1.92383174e-02 -2.02107765e-02\n 8.79131556e-02 9.14798677e-02 -3.54749635e-02 -1.33267939e-01\n -1.77447721e-01 -9.33702961e-02 -7.14094117e-02 -6.68478478e-03\n -1.49846703e-01 3.65290008e-02 -1.11083306e-01 2.04584867e-01\n -7.30767250e-02 7.07381591e-02 1.29051596e-01 8.26864019e-02\n -3.41298096e-02 1.55206323e-01 5.24081439e-02 7.91398287e-02\n 1.86461732e-02 1.88235074e-01 1.51956931e-01 1.14296928e-01\n -1.12691864e-01 6.95107281e-02 -1.16133653e-01 -1.42861262e-01\n -3.24610062e-02 -6.37443736e-02 9.47019458e-02 -7.04379454e-02\n 1.51012568e-02 -6.04141466e-02 -7.57871270e-02 6.90726042e-02\n -3.73109318e-02 -1.04284994e-01 -7.36037940e-02 -3.05999294e-02\n -4.32690326e-03 -6.09171018e-02 1.28173083e-02 4.53064829e-01\n 4.73245084e-02 -1.39801240e+00 -1.01322591e-01 -1.62838653e-01\n -4.33158763e-02 -1.32046595e-01 1.88525077e-02 -9.60192643e-03\n -5.94866455e-01 1.12727061e-01 1.86967605e-03 8.49850774e-02\n 1.26277655e-01 -5.00426851e-02 -4.56427746e-02 -3.25046569e-01\n 1.37247995e-01 -9.46224555e-02 7.27631105e-03 -5.33877499e-02\n -3.18312906e-02 -8.66127461e-02 -1.40548006e-01 6.63848501e-03\n 6.23855107e-02 1.06035680e-01 -1.68600217e-01 -1.79605886e-01\n -9.72149730e-01 1.33717686e-01 -5.84784038e-02 -1.49619198e+00\n 1.86823923e-02 7.76157603e-02 -5.89469783e-02 -9.49078351e-02\n -1.11909047e-01 3.17605101e-02 5.79413511e-02 1.40282623e-02\n 7.69326091e-02 -1.12443836e-02 -8.67934301e-02 -6.59158587e-01\n 9.15968940e-02 -3.47942114e-01 -9.98707302e-03 -4.93343398e-02\n 7.81614780e-02 1.12851635e-01 2.69402359e-02 1.41710088e-01\n 5.72816245e-02 1.60002038e-01 -2.57115781e-01 -1.09435096e-01\n -4.88008857e-02 5.72116769e-05 -1.07527770e-01 5.56552038e-02\n 7.56548047e-02 8.72470587e-02 -1.57128468e-01 -1.33189365e-01\n -1.06330979e+00 -5.80653787e-01 -7.17684031e-02 -3.73947710e-01\n 
1.13771893e-02 -1.42221987e-01 -1.48932025e-01 -2.07824185e-02\n 3.69309634e-02 1.27229178e-02 4.40038621e-01 -1.32923722e-01\n -1.88622907e-01 2.58340001e-01 2.99438331e-02 1.02058776e-01\n 1.04237549e-01 -9.04425755e-02 2.39991665e-01 8.11270997e-02\n -2.99125281e-03 2.83314623e-02 -2.62917858e-02 7.42266746e-03\n -5.04185539e-03 -4.37292382e-02 1.17831230e-01 -4.98771993e-03\n 1.18534625e-01 1.53611377e-01 5.65077439e-02 -1.91291913e-01\n -9.52507034e-02 -8.89603943e-02 2.01912194e-01 1.17760837e-01\n -2.85485648e-02 -9.52739790e-02 1.49672581e-02 -7.14538768e-02\n 4.95206676e-02 3.00312508e-02 8.33884105e-02 9.99914482e-02\n -9.40189809e-02 -4.94113080e-02 5.30362427e-02 -3.15267175e-01\n -3.44095714e-02 1.56485736e-02 2.91987918e-02 -7.36336783e-02\n -1.27800524e-01 5.92167228e-02 1.07430264e-01 5.31437919e-02\n -1.76421866e-01 2.23079890e-01 7.48595372e-02 -5.39487004e-01\n 5.16922653e-01 1.29015148e-01 4.36748080e-02 -5.45317074e-03\n 1.46122992e-01 -7.71054178e-02 3.18054631e-02 -4.02254723e-02\n -7.62721375e-02 5.14244894e-03 -6.23153821e-02 -6.00104272e-01\n 6.64846972e-02 6.28835186e-02 -1.06045604e-01 -1.76288888e-01\n -4.96284366e-02 -7.97898546e-02 7.50872344e-02 -5.45614585e-03\n -6.50706142e-02 -2.17388973e-01 -3.25618118e-01 4.77024205e-02]\n
We can check which ionic species have a feature vector for a particular embedding:
In\u00a0[3]: Copied!print(\"SkipSpecies has feature vectors for the following ionic species:\\n\")\nprint(skipspecies.species_list)\nprint(\"SkipSpecies has feature vectors for the following ionic species:\\n\") print(skipspecies.species_list)
SkipSpecies has feature vectors for the following ionic species:\n\n['H+', 'H-', 'Li+', 'Be2+', 'B+', 'B2+', 'B2-', 'B3-', 'B3+', 'B-', 'C4-', 'C-', 'C4+', 'C+', 'C2+', 'C3+', 'C2-', 'C3-', 'N3-', 'N2+', 'N3+', 'N-', 'N+', 'N2-', 'N5+', 'N4+', 'O2-', 'O-', 'F-', 'Na+', 'Mg2+', 'Al3+', 'Al2+', 'Si2+', 'Si4+', 'Si-', 'Si2-', 'Si4-', 'Si3+', 'Si3-', 'P5+', 'P2-', 'P3-', 'P4+', 'P+', 'P-', 'P3+', 'P2+', 'S2-', 'S6+', 'S-', 'S2+', 'S3+', 'S+', 'S4+', 'S5+', 'Cl-', 'Cl7+', 'Cl5+', 'Cl3+', 'K+', 'Ca2+', 'Sc3+', 'Sc+', 'Sc2+', 'Ti3+', 'Ti4+', 'Ti2+', 'V4+', 'V3+', 'V2+', 'V5+', 'Cr3+', 'Cr2+', 'Cr6+', 'Cr4+', 'Cr5+', 'Mn2+', 'Mn3+', 'Mn4+', 'Mn+', 'Mn7+', 'Mn6+', 'Mn5+', 'Fe2+', 'Fe3+', 'Fe+', 'Fe4+', 'Fe6+', 'Fe5+', 'Co2+', 'Co4+', 'Co3+', 'Co+', 'Ni2+', 'Ni4+', 'Ni3+', 'Ni+', 'Cu2+', 'Cu3+', 'Cu+', 'Zn2+', 'Ga+', 'Ga3+', 'Ga4+', 'Ga2+', 'Ge4-', 'Ge4+', 'Ge2-', 'Ge2+', 'Ge3+', 'As-', 'As2-', 'As3+', 'As5+', 'As3-', 'As+', 'As2+', 'As4+', 'Se2-', 'Se-', 'Se4+', 'Se6+', 'Se5+', 'Se2+', 'Se+', 'Se3+', 'Br-', 'Br+', 'Br2+', 'Br5+', 'Br3+', 'Rb+', 'Sr2+', 'Y3+', 'Y2+', 'Y+', 'Zr2+', 'Zr4+', 'Zr3+', 'Zr+', 'Nb5+', 'Nb3+', 'Nb4+', 'Nb2+', 'Nb+', 'Nb7+', 'Mo3+', 'Mo4+', 'Mo6+', 'Mo5+', 'Mo2+', 'Tc-', 'Tc4+', 'Tc3-', 'Tc3+', 'Tc+', 'Tc7+', 'Tc5+', 'Tc6+', 'Tc2-', 'Tc2+', 'Ru2+', 'Ru6+', 'Ru4+', 'Ru5+', 'Ru3+', 'Rh+', 'Rh4+', 'Rh3+', 'Pd2+', 'Pd4+', 'Pd3+', 'Ag3+', 'Ag+', 'Ag2+', 'Cd2+', 'In3+', 'In+', 'In2+', 'Sn4+', 'Sn3+', 'Sn2+', 'Sb5+', 'Sb2-', 'Sb3-', 'Sb3+', 'Sb4+', 'Sb-', 'Sb+', 'Te-', 'Te2-', 'Te4+', 'Te6+', 'Te2+', 'Te5+', 'Te+', 'I-', 'I3+', 'I7+', 'I5+', 'I+', 'I2+', 'Cs+', 'Ba2+', 'La3+', 'La2+', 'La+', 'Ce3+', 'Ce2+', 'Ce4+', 'Pr3+', 'Pr4+', 'Pr2+', 'Nd3+', 'Nd2+', 'Pm3+', 'Sm3+', 'Sm2+', 'Eu2+', 'Eu3+', 'Gd2+', 'Gd3+', 'Tb3+', 'Tb+', 'Tb2+', 'Tb4+', 'Dy3+', 'Dy2+', 'Ho3+', 'Ho2+', 'Er3+', 'Tm3+', 'Tm2+', 'Yb3+', 'Yb2+', 'Lu3+', 'Hf3+', 'Hf2+', 'Hf4+', 'Ta5+', 'Ta3+', 'Ta4+', 'Ta+', 'Ta2+', 'W6+', 'W4+', 'W2+', 'W3+', 'W5+', 'Re5+', 'Re3+', 'Re6+', 'Re2+', 'Re4+', 
'Re7+', 'Os7+', 'Os6+', 'Os5+', 'Os2-', 'Os3+', 'Os-', 'Os4+', 'Os8+', 'Os2+', 'Os+', 'Ir3+', 'Ir4+', 'Ir5+', 'Ir6+', 'Pt2+', 'Pt2-', 'Pt4+', 'Pt3+', 'Pt5+', 'Pt-', 'Pt6+', 'Pt+', 'Au-', 'Au2+', 'Au+', 'Au3+', 'Au5+', 'Au4+', 'Hg2+', 'Hg+', 'Tl+', 'Tl3+', 'Tl2+', 'Pb2+', 'Pb3+', 'Pb4+', 'Bi3+', 'Bi5+', 'Bi2+', 'Bi3-', 'Bi4+', 'Bi+', 'Ac3+', 'Th4+', 'Th3+', 'Pa4+', 'Pa5+', 'Pa3+', 'U6+', 'U4+', 'U3+', 'U2+', 'U5+', 'Np6+', 'Np4+', 'Np3+', 'Np7+', 'Np5+', 'Pu7+', 'Pu6+', 'Pu3+', 'Pu4+', 'Pu5+']\n
We can also check which elements have an ionic species representation in the embedding:
In\u00a0[4]: Copied!print(\"The following elements have SkipSpecies ionic species representations:\\n\")\nprint(skipspecies.element_list)\nprint(\"The following elements have SkipSpecies ionic species representations:\\n\") print(skipspecies.element_list)
The following elements have SkipSpecies ionic species representations:\n\n['Sr', 'I', 'Ni', 'Y', 'Na', 'Sb', 'Rb', 'Mn', 'C', 'Pd', 'Ir', 'K', 'Br', 'Pt', 'Ca', 'Li', 'Ru', 'Pr', 'Cl', 'U', 'Au', 'Er', 'Al', 'S', 'Te', 'Os', 'Hg', 'Ge', 'Nd', 'B', 'Co', 'Pb', 'Tm', 'Pu', 'Mg', 'Cd', 'Eu', 'Sn', 'Ac', 'La', 'Tb', 'W', 'F', 'In', 'Rh', 'Lu', 'Ce', 'Pa', 'Sc', 'Zr', 'Ag', 'Np', 'Re', 'Yb', 'Ta', 'Sm', 'Be', 'Cu', 'Gd', 'Si', 'O', 'As', 'Ti', 'Cr', 'Mo', 'P', 'Se', 'H', 'Ba', 'Pm', 'Hf', 'Bi', 'Fe', 'V', 'Zn', 'Ga', 'Tc', 'Dy', 'Ho', 'Tl', 'Nb', 'Th', 'N', 'Cs']\n
As with the element representations, BibTeX citation information is available for the ionic species embeddings.
In\u00a0[5]: Copied!print(skipspecies.citation())\nprint(skipspecies.citation())
['@article{Onwuli_Butler_Walsh_2024, title={Ionic species representations for materials informatics}, DOI={10.26434/chemrxiv-2024-8621l}, journal={ChemRxiv}, author={Onwuli, Anthony and Butler, Keith T. and Walsh, Aron}, year={2024}} This content is a preprint and has not been peer-reviewed.', '@article{antunes2022distributed,title={Distributed representations of atoms and materials for machine learning},author={Antunes, Luis M and Grau-Crespo, Ricardo and Butler, Keith T},journal={npj Computational Materials},volume={8},number={1},pages={1--9},year={2022},publisher={Nature Publishing Group} }']\nIn\u00a0[6]: Copied!
composition = {\"Fe2+\": 1, \"Fe3+\": 2, \"O2-\": 4}\n\nFe3O4_skipspecies = SpeciesCompositionalEmbedding(\n formula_dict=composition, embedding=skipspecies\n)\ncomposition = {\"Fe2+\": 1, \"Fe3+\": 2, \"O2-\": 4} Fe3O4_skipspecies = SpeciesCompositionalEmbedding( formula_dict=composition, embedding=skipspecies )
A few properties are accessible from the SpeciesCompositionalEmbedding
class
# Print the pretty formula\n\nprint(Fe3O4_skipspecies.formula_pretty)\n\n# Print the list of elements in the composition\nprint(Fe3O4_skipspecies.element_list)\n# Print the list of ionic species in the composition\nprint(Fe3O4_skipspecies.species_list)\n\n\n# Print the stoichiometric vector of the composition\nprint(Fe3O4_skipspecies.stoich_vector)\n\n# Print the normalised stoichiometric vector of the composition\nprint(Fe3O4_skipspecies.norm_stoich_vector)\n\n# Print the number of atoms\nprint(Fe3O4_skipspecies.num_atoms)\n# Print the pretty formula print(Fe3O4_skipspecies.formula_pretty) # Print the list of elements in the composition print(Fe3O4_skipspecies.element_list) # Print the list of ionic species in the composition print(Fe3O4_skipspecies.species_list) # Print the stoichiometric vector of the composition print(Fe3O4_skipspecies.stoich_vector) # Print the normalised stoichiometric vector of the composition print(Fe3O4_skipspecies.norm_stoich_vector) # Print the number of atoms print(Fe3O4_skipspecies.num_atoms)
Fe3O4\n['O', 'Fe']\n['Fe2+', 'Fe3+', 'O2-']\n[1 2 4]\n[0.14285714 0.28571429 0.57142857]\n7\nIn\u00a0[8]: Copied!
compositions = [\n {\"Fe2+\": 1, \"Fe3+\": 2, \"O2-\": 4},\n {\"Fe3+\": 2, \"O2-\": 3},\n {\"Li+\": 7, \"La3+\": 3, \"Zr4+\": 1, \"O2-\": 12},\n {\"Cs+\": 1, \"Pb2+\": 1, \"I-\": 3},\n {\"Pb2+\": 1, \"Pb4+\": 1, \"O2-\": 3},\n]\n\nfeaturised_comps_df = species_composition_featuriser(\n data=compositions, embedding=\"skipspecies\", stats=\"mean\", to_dataframe=True\n)\n\nfeaturised_comps_df\ncompositions = [ {\"Fe2+\": 1, \"Fe3+\": 2, \"O2-\": 4}, {\"Fe3+\": 2, \"O2-\": 3}, {\"Li+\": 7, \"La3+\": 3, \"Zr4+\": 1, \"O2-\": 12}, {\"Cs+\": 1, \"Pb2+\": 1, \"I-\": 3}, {\"Pb2+\": 1, \"Pb4+\": 1, \"O2-\": 3}, ] featurised_comps_df = species_composition_featuriser( data=compositions, embedding=\"skipspecies\", stats=\"mean\", to_dataframe=True ) featurised_comps_df
\nOut[8]:
formula composition mean_0 mean_1 mean_2 mean_3 mean_4 mean_5 mean_6 mean_7 ... mean_190 mean_191 mean_192 mean_193 mean_194 mean_195 mean_196 mean_197 mean_198 mean_199
0 Fe3O4 {'Fe2+': 1, 'Fe3+': 2, 'O2-': 4} -0.018255 0.001659 -0.009839 0.005230 -0.010928 -0.057023 -0.002567 -0.005813 ... -0.037202 -0.008057 -0.027421 -0.008534 -0.009001 0.002369 0.017834 -0.055822 -0.219390 0.020507
1 Fe2O3 {'Fe3+': 2, 'O2-': 3} -0.036597 -0.009373 -0.013700 -0.015516 -0.020896 -0.071463 0.002221 -0.014784 ... -0.045530 -0.024589 -0.037825 -0.025545 0.010654 -0.002034 -0.001094 -0.096479 -0.211483 0.035755
2 Li7La3ZrO12 {'Li+': 7, 'La3+': 3, 'Zr4+': 1, 'O2-': 12} -0.031236 -0.015952 -0.018968 -0.029273 -0.005297 -0.035049 0.045972 -0.032007 ... -0.042820 0.045177 -0.056733 0.006726 0.017449 -0.023732 0.021772 -0.034134 -0.102773 0.061038
3 CsPbI3 {'Cs+': 1, 'Pb2+': 1, 'I-': 3} -0.002381 0.023988 -0.026468 -0.020235 -0.002876 -0.033317 0.076300 -0.069057 ... 0.055368 0.058231 -0.079549 -0.032172 -0.076099 -0.024554 0.108428 -0.058528 -0.055804 -0.031679
4 Pb2O3 {'Pb2+': 1, 'Pb4+': 1, 'O2-': 3} -0.077403 -0.015334 0.023065 -0.060073 -0.043160 -0.140865 0.067917 -0.044093 ... 0.038975 0.102474 -0.051598 0.001011 -0.131225 -0.026707 0.145250 -0.057493 -0.188810 0.055239
5 rows \u00d7 202 columns
In\u00a0[9]: Copied!print(\n f\"The euclidean distance between Fe3O4 and Fe2O3 is {Fe3O4_skipspecies.distance({'Fe3+': 2, 'O2-': 3}, distance_metric='euclidean', stats='mean'):.2f}\"\n)\nprint(\n f\"The euclidean distance between Fe3O4 and Pb2O3 is {Fe3O4_skipspecies.distance({'Pb2+': 1, 'Pb4+': 1, 'O2-': 3}, distance_metric='euclidean', stats='mean'):.2f}\"\n)\nprint(\n f\"The euclidean distance between Fe3O4 and CsPbI3 is {Fe3O4_skipspecies.distance({'Cs+': 1, 'Pb2+': 1, 'I-': 3},distance_metric='euclidean', stats='mean'):.2f}\"\n)\nprint( f\"The euclidean distance between Fe3O4 and Fe2O3 is {Fe3O4_skipspecies.distance({'Fe3+': 2, 'O2-': 3}, distance_metric='euclidean', stats='mean'):.2f}\" ) print( f\"The euclidean distance between Fe3O4 and Pb2O3 is {Fe3O4_skipspecies.distance({'Pb2+': 1, 'Pb4+': 1, 'O2-': 3}, distance_metric='euclidean', stats='mean'):.2f}\" ) print( f\"The euclidean distance between Fe3O4 and CsPbI3 is {Fe3O4_skipspecies.distance({'Cs+': 1, 'Pb2+': 1, 'I-': 3},distance_metric='euclidean', stats='mean'):.2f}\" )
The euclidean distance between Fe3O4 and Fe2O3 is 0.38\nThe euclidean distance between Fe3O4 and Pb2O3 is 1.60\nThe euclidean distance between Fe3O4 and CsPbI3 is 2.11\n
Based on the mean-pooled feature vectors, we can see that Fe3O4 is closer to Fe2O3 than to either Pb2O3 or CsPbI3.
"},{"location":"tutorial/species/#interacting-with-ionic-species-representations-using-elementembeddings","title":"Interacting with ionic species representations using ElementEmbeddings\u00b6","text":"This notebook will serve as a tutorial for using the ElementEmbeddings package to interact with ionic species representations.
"},{"location":"tutorial/species/#representing-ionic-compositions-using-elementembeddings","title":"Representing ionic compositions using ElementEmbeddings\u00b6","text":"In addition to representing individual ionic species, we can also represent ionic compositions using the ElementEmbeddings package. This is useful for representing inorganic compounds as vectors. Let's take the example of Fe3O4.
Fe3O4 is a mixed-valence iron oxide containing both Fe2+ and Fe3+. We pass the composition as a dictionary that maps each ionic species to its amount in the formula unit:
composition = {\n 'Fe2+': 1,\n 'Fe3+': 2,\n 'O2-': 4\n }\n"},{"location":"tutorial/species/#featurising-compositions","title":"Featurising compositions\u00b6","text":"
We can featurise the composition using the .feature_vector
method, which returns the feature vector for the composition. It operates in the same way as featurisation with the CompositionalEmbedding
class.
The species_composition_featuriser
can be used to featurise a list of compositions. This is useful for featurising a large number of compositions. It can also export the feature vectors to a pandas DataFrame by setting the to_dataframe
argument to True
.
We can also calculate the \"distance\" between two compositions using their feature vectors. This can be used to determine which compositions are more similar to each other.
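The idea of comparing compositions through pooled vectors can be sketched in plain Python. This is a minimal illustration, not the package's implementation: the three-dimensional species vectors are invented, the helper names `mean_pool` and `euclidean` are hypothetical, and mean pooling is assumed to weight each species by its fraction in the formula unit.

```python
import math

# Hypothetical 3-dimensional species vectors, invented purely for illustration.
# They are NOT values from any embedding shipped with the package.
toy_vectors = {
    'Fe2+': [1.0, 0.0, 2.0],
    'Fe3+': [2.0, 1.0, 0.0],
    'O2-': [0.0, 3.0, 1.0],
}


def mean_pool(composition, vectors):
    # Average the species vectors, weighting each by its share of the formula unit.
    total = sum(composition.values())
    dim = len(next(iter(vectors.values())))
    pooled = [0.0] * dim
    for species, amount in composition.items():
        for i, value in enumerate(vectors[species]):
            pooled[i] += amount * value / total
    return pooled


def euclidean(u, v):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


fe3o4 = mean_pool({'Fe2+': 1, 'Fe3+': 2, 'O2-': 4}, toy_vectors)
fe2o3 = mean_pool({'Fe3+': 2, 'O2-': 3}, toy_vectors)
distance = euclidean(fe3o4, fe2o3)
print(f'Toy distance between Fe3O4 and Fe2O3: {distance:.3f}')
```

A smaller distance means the two pooled vectors, and by extension the two compositions, are more alike under the chosen representation.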
"},{"location":"tutorial/usage/","title":"Using the ElementEmbeddings package","text":"In\u00a0[1]: Copied!# Imports\nimport numpy as np\nimport pandas as pd\nimport seaborn as sns\n\nfrom elementembeddings.core import Embedding\nfrom elementembeddings.plotter import heatmap_plotter, dimension_plotter\nimport matplotlib.pyplot as plt\n\nsns.set(font_scale=1.5)\n# Imports import numpy as np import pandas as pd import seaborn as sns from elementembeddings.core import Embedding from elementembeddings.plotter import heatmap_plotter, dimension_plotter import matplotlib.pyplot as plt sns.set(font_scale=1.5)
# Create a list of the available CBFVs included in the package\n\ncbfvs = [\n \"magpie\",\n \"mat2vec\",\n \"matscholar\",\n \"megnet16\",\n \"oliynyk\",\n \"random_200\",\n \"skipatom\",\n \"mod_petti\",\n \"magpie_sc\",\n \"oliynyk_sc\",\n]\n\n# Create a dictionary of {cbfv name : Embedding objects} key, value pairs\nAtomEmbeds = {cbfv: Embedding.load_data(cbfv) for cbfv in cbfvs}\n
Taking the magpie representation as our example, we will demonstrate some features of the Embedding
class.
# Let's use magpie as our example\n\n# Let's look at the CBFV of hydrogen for the magpie representation\nprint(\n \"Below is the CBFV/representation of the hydrogen atom from the magpie data we have \\n\"\n)\nprint(AtomEmbeds[\"magpie\"].embeddings[\"H\"])\n
Below is the CBFV/representation of the hydrogen atom from the magpie data we have \n\n[ 1. 92. 1.00794 14.01 1. 1. 31.\n 2.2 1. 0. 0. 0. 1. 1.\n 0. 0. 0. 1. 6.615 7.853 0.\n 194. ]\n
We can check which elements have a feature vector for a particular embedding.
# We can also check to see what elements have a CBFV for our chosen representation\nprint(\"Magpie has composition-based feature vectors for the following elements: \\n\")\nprint(AtomEmbeds[\"magpie\"].element_list)\n
Magpie has composition-based feature vectors for the following elements: \n\n['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk']\n
For the elemental representations distributed with the package, we also include BibTeX citations of the original papers from which these representations are derived. These are accessible through the .citation()
method.
# Print the bibtex citation for the magpie embedding\nprint(AtomEmbeds[\"magpie\"].citation())\n
['@article{ward2016general,title={A general-purpose machine learning framework for predicting properties of inorganic materials},author={Ward, Logan and Agrawal, Ankit and Choudhary, Alok and Wolverton, Christopher},journal={npj Computational Materials},volume={2},number={1},pages={1--7},year={2016},publisher={Nature Publishing Group}}']\n
We can also check the dimensionality of the elemental representation.
# We can quickly check the dimensionality of this CBFV\nmagpie_dim = AtomEmbeds[\"magpie\"].dim\nprint(f\"The magpie CBFV has a dimensionality of {magpie_dim}\")\n
The magpie CBFV has a dimensionality of 22\n
# Let's find the dimensionality of all of the CBFVs that we have loaded\n\n\nAtomEmbeds_dim = {\n cbfv: {\"dim\": AtomEmbeds[cbfv].dim, \"type\": AtomEmbeds[cbfv].embedding_type}\n for cbfv in cbfvs\n}\n\ndim_df = pd.DataFrame.from_dict(AtomEmbeds_dim)\ndim_df.T\n
Out[7]:
            dim  type
magpie      22   vector
mat2vec     200  vector
matscholar  200  vector
megnet16    16   vector
oliynyk     44   vector
random_200  200  vector
skipatom    200  vector
mod_petti   103  one-hot
magpie_sc   22   vector
oliynyk_sc  44   vector
We can see a wide range of dimensions of the composition-based feature vectors.
Let's now explore more of the core features of the package. The numerical representation of the elements enables us to quantify the differences between atoms. With these embedding features, we can explore how similar two atoms are by using a 'distance' metric. Atoms with distances close to zero are 'similar', whereas atoms which have a large distance between them should in theory be dissimilar.
Using the class method compute_distance_metric
, we can compute these distances.
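As a rough illustration of what three of these metrics measure, the sketch below implements them directly. It is not the package's code: the function names and the two short vectors are assumptions made for the example, and the vectors are invented rather than taken from any embedding.

```python
import math


def euclidean(u, v):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    # Sum of the absolute differences along each dimension.
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    # Largest absolute difference along any single dimension.
    return max(abs(a - b) for a, b in zip(u, v))


# Invented toy vectors standing in for two element representations.
vector_a = [3.0, 1.0, 0.5]
vector_b = [19.0, 4.0, 0.5]

distances = {
    'euclidean': euclidean(vector_a, vector_b),
    'manhattan': manhattan(vector_a, vector_b),
    'chebyshev': chebyshev(vector_a, vector_b),
}
for name, value in distances.items():
    print(f'{name}: {value:.2f}')
```

All three metrics are driven by the dimensions with the largest numerical spread, which is one reason the unscaled and scaled representations can give very different distances.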
# Let's continue using our magpie cbfv\n# The package contains some default distance metrics: euclidean, manhattan, chebyshev\n\nmetrics = [\"euclidean\", \"manhattan\", \"chebyshev\", \"wasserstein\", \"energy\"]\n\ndistances = [\n AtomEmbeds[\"magpie\"].compute_distance_metric(\"Li\", \"K\", metric=metric)\n for metric in metrics\n]\nprint(\"For the magpie representation:\")\nfor i, distance in enumerate(distances):\n print(\n f\"Using the metric {metrics[i]}, the distance between Li and K is {distance:.2f}\"\n )\n
For the magpie representation:\nUsing the metric euclidean, the distance between Li and K is 154.41\nUsing the metric manhattan, the distance between Li and K is 300.99\nUsing the metric chebyshev, the distance between Li and K is 117.16\nUsing the metric wasserstein, the distance between Li and K is 13.68\nUsing the metric energy, the distance between Li and K is 1.25\n
# Now let's repeat the comparison with the scaled magpie cbfv (magpie_sc)\n# The package contains some default distance metrics: euclidean, manhattan, chebyshev\n\nmetrics = [\"euclidean\", \"manhattan\", \"chebyshev\", \"wasserstein\", \"energy\"]\n\ndistances = [\n AtomEmbeds[\"magpie_sc\"].compute_distance_metric(\"Li\", \"K\", metric=metric)\n for metric in metrics\n]\nprint(\"For the scaled magpie representation:\")\nfor i, distance in enumerate(distances):\n print(\n f\"Using the metric {metrics[i]}, the distance between Li and K is {distance:.2f}\"\n )\n
For the scaled magpie representation:\nUsing the metric euclidean, the distance between Li and K is 4.09\nUsing the metric manhattan, the distance between Li and K is 7.87\nUsing the metric chebyshev, the distance between Li and K is 3.39\nUsing the metric wasserstein, the distance between Li and K is 0.32\nUsing the metric energy, the distance between Li and K is 0.23\n
fig, ax = plt.subplots(figsize=(24, 24))\nheatmap_plotter(\n embedding=AtomEmbeds[\"magpie\"],\n metric=\"pearson\",\n sortaxisby=\"atomic_number\",\n # show_axislabels=False,\n ax=ax,\n)\n\nfig.show()\n
fig, ax = plt.subplots(figsize=(24, 24))\nheatmap_plotter(\n embedding=AtomEmbeds[\"magpie_sc\"],\n metric=\"pearson\",\n sortaxisby=\"atomic_number\",\n # show_axislabels=False,\n ax=ax,\n)\n\nfig.show()\n
As we can see from the above Pearson correlation heatmaps, the visualisation of the correlations across the atomic embeddings is sensitive to the components of the embedding vectors. The unscaled magpie representation produces a plot which makes qualitative assessment of chemical trends difficult, whereas with the scaled representation it is possible to perform some qualitative analysis on the (dis)similarity of elements based on their feature vectors.
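One way to see why scaling matters is a toy calculation. The sketch below uses invented numbers and hypothetical helpers (`pearson`, `standardise`); it assumes standardisation means scaling each feature to zero mean and unit variance, which may differ in detail from how the scaled embeddings in the package were produced.

```python
import math


def pearson(u, v):
    # Pearson correlation coefficient between two equal-length vectors.
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def standardise(rows):
    # Scale every feature (column) to zero mean and unit variance.
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Invented vectors for three 'elements'; the second feature is far larger
# in magnitude than the others, as happens with unscaled property data.
raw = [
    [1.0, 1000.0, 0.5, 2.0],
    [2.0, 3000.0, 0.1, 1.0],
    [3.0, 2000.0, 0.9, 3.0],
]
scaled = standardise(raw)

raw_r = pearson(raw[0], raw[1])
scaled_r = pearson(scaled[0], scaled[1])
print(f'raw: {raw_r:.3f}, scaled: {scaled_r:.3f}')
```

In the raw data the single large feature dominates, so every pair of vectors looks almost perfectly correlated; after standardisation the remaining features contribute and the correlation changes markedly.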
fig, ax = plt.subplots(figsize=(24, 24))\nheatmap_plotter(\n embedding=AtomEmbeds[\"megnet16\"],\n metric=\"pearson\",\n sortaxisby=\"atomic_number\",\n # show_axislabels=False,\n ax=ax,\n)\n\nfig.show()\n
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"magpie\"],\n reducer=\"pca\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\n
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"magpie_sc\"],\n reducer=\"pca\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\n
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"megnet16\"],\n reducer=\"pca\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\n
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"magpie\"],\n reducer=\"tsne\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\n
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"magpie_sc\"],\n reducer=\"tsne\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\n
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"megnet16\"],\n reducer=\"tsne\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\n"},{"location":"tutorial/usage/#using-the-elementembeddings-package","title":"Using the ElementEmbeddings package\u00b6","text":"
This notebook will serve as a tutorial for using the ElementEmbeddings package and going over the core features.
"},{"location":"tutorial/usage/#elemental-representations","title":"Elemental representations\u00b6","text":"A key problem in supervised machine learning is determining the featurisation/representation scheme for a material in order to pass it through a mathematical algorithm. For composition-only machine learning, we want to be able to create a numerical representation of a chemical formula AwBxCyDz. We can achieve this by creating a composition-based feature vector (CBFV) derived from the elemental properties of the constituent atoms, or a representation can be learned during the supervised training process.
A few of these CBFVs have been included in the package and we can load them using the load_data
class method.
We can also explore the correlation between embedding vectors. In the example below, we will plot a heatmap of the Pearson correlation of our magpie CBFV, a scaled magpie CBFV and the 16-dim megnet embeddings.
"},{"location":"tutorial/usage/#pearson-correlation-plots","title":"Pearson Correlation plots\u00b6","text":""},{"location":"tutorial/usage/#unscaled-and-scaled-magpie","title":"Unscaled and scaled Magpie\u00b6","text":""},{"location":"tutorial/usage/#pca-plots","title":"PCA plots\u00b6","text":""},{"location":"tutorial/usage/#t-sne-plots","title":"t-SNE plots\u00b6","text":""}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to ElementEmbeddings","text":"This site contains the project documentation for the ElementEmbeddings
package which provides tools and examples of analysing and visualising elemental representation data.
The documentation consists of the following six parts:
Analyse elemental representation data.
Modules exported by this package:
elementembeddings.core
: Provides the Embedding
class.elementembeddings.composition
: Tools to featurise compositions.elementembeddings.plotter
: Tools to plot embeddings.Utility functions for elementembeddings.
Modules exported in elementembeddings.utils
:
elementembeddings.utils.config
: Provides configuration settings and constants.elementembeddings.utils.io
: Tools to read and write data.elementembeddings.utils.math
: Tools for mathematical operations.elementembeddings.utils.species
: Tools to handle ionic species.
"},{"location":"about/","title":"About the ElementEmbeddings package","text":"The Element Embeddings package provides high-level tools for analysing elemental embeddings data. This primarily involves visualising the correlation between embedding schemes using different statistical measures.
Machine learning approaches for materials informatics have become increasingly widespread. Some of these involve the use of deep learning techniques where the representation of the elements is learned rather than specified by the user of the model. While an important goal of machine learning training is to minimise the chosen error function to make more accurate predictions, it is also important for us material scientists to be able to interpret these models. As such, we aim to evaluate and compare different atomic embedding schemes in a consistent framework.
"},{"location":"about/#developer","title":"Developer","text":"H. Park et al, \"Mapping inorganic crystal chemical space\" Faraday Discuss. (2024)
A. Onwuli et al, \"Element similarity in high-dimensional materials representations\" Digital Discovery 2, 1558 (2023)
"},{"location":"contribution/","title":"Contributing","text":"This is a quick guide on how to follow best practice and contribute smoothly to ElementEmbeddings
.
We are always looking for ways to make ElementEmbeddings
better and more useful to a wider community. To contribute, use the \"Fork and Pull\" workflow and stick as closely as possible to the following:
The steps required to add a new representation scheme are:
DEFAULT_ELEMENT_EMBEDDINGS
and CITATIONS
.
We follow the [GitHub flow](https://guides.github.com/introduction/flow/index.html), using branches for new work and pull requests for verifying the work.
The steps for a new piece of work can be summarised as follows:
For a general overview of using pull requests on GitHub look in the GitHub docs.
When creating a pull request you should:
Recommended reading: How to Write the Perfect Pull Request
"},{"location":"contribution/#dev-requirements","title":"Dev requirements","text":"When developing locally, it is recommended to install the python packages in requirements-dev.txt
.
pip install -r requirements-dev.txt\n
This will allow you to run the tests locally with pytest as described in the main README, as well as run pre-commit hooks to automatically format python files with isort and black. To install the pre-commit hooks (only needs to be done once):
pre-commit install\npre-commit run --all-files # optionally run hooks on all files\n
Pre-commit hooks will check all files when you commit changes, automatically fixing any files which are not formatted correctly. Those files will need to be staged again before re-attempting the commit.
"},{"location":"contribution/#bug-reports-feature-requests-and-questions","title":"Bug reports, feature requests and questions","text":"Please use the Issue Tracker to report bugs or request features in the first instance. Contributions are always welcome.
"},{"location":"installation/","title":"Getting Started","text":"The latest stable release can be installed via pip using:
pip install ElementEmbeddings\n
"},{"location":"installation/#developers-installation-optional","title":"Developer's installation (optional)","text":"For development work, ElementEmbeddings
can be installed from a copy of the source repository; this is preferred if using experimental code branches.
To clone the project from GitHub and make a local installation:
git clone https://github.com/WMD-group/ElementEmbeddings.git\ncd ElementEmbeddings\npip install -e .\n
With -e
, pip will create links to the source folder so that the changes to the code will be reflected on the PATH.
Here we will demonstrate how to use some of ElementEmbeddings
's features. For full worked examples of using the package, please refer to the Jupyter notebooks in the examples section of the GitHub repo.
The Embedding
class lies at the heart of the package. It handles elemental representation data and enables analysis and visualisation.
For simple usage, you can instantiate an Embedding object using one of the embeddings in the data directory. For this example, let's use the magpie elemental representation.
# Import the class\nfrom elementembeddings.core import Embedding\n\n# Load the magpie data\nmagpie = Embedding.load_data(\"magpie\")\n
We can access some of the properties of the Embedding
class. For example, we can find the dimensions of the elemental representation and the list of elements for which an embedding exists.
# Print out some of the properties of the ElementEmbeddings class\nprint(f\"The magpie representation has embeddings of dimension {magpie.dim}\")\nprint(\n f\"The magpie representation contains these elements: \\n {magpie.element_list}\"\n) # prints out all the elements considered for this representation\nprint(\n f\"The magpie representation contains these features: \\n {magpie.feature_labels}\"\n) # Prints out the feature labels of the chosen representation\n\n# The magpie representation has embeddings of dimension 22\n# The magpie representation contains these elements:\n[\n \"H\",\n \"He\",\n \"Li\",\n \"Be\",\n \"B\",\n \"C\",\n \"N\",\n \"O\",\n \"F\",\n \"Ne\",\n \"Na\",\n \"Mg\",\n \"Al\",\n \"Si\",\n \"P\",\n \"S\",\n \"Cl\",\n \"Ar\",\n \"K\",\n \"Ca\",\n \"Sc\",\n \"Ti\",\n \"V\",\n \"Cr\",\n \"Mn\",\n \"Fe\",\n \"Co\",\n \"Ni\",\n \"Cu\",\n \"Zn\",\n \"Ga\",\n \"Ge\",\n \"As\",\n \"Se\",\n \"Br\",\n \"Kr\",\n \"Rb\",\n \"Sr\",\n \"Y\",\n \"Zr\",\n \"Nb\",\n \"Mo\",\n \"Tc\",\n \"Ru\",\n \"Rh\",\n \"Pd\",\n \"Ag\",\n \"Cd\",\n \"In\",\n \"Sn\",\n \"Sb\",\n \"Te\",\n \"I\",\n \"Xe\",\n \"Cs\",\n \"Ba\",\n \"La\",\n \"Ce\",\n \"Pr\",\n \"Nd\",\n \"Pm\",\n \"Sm\",\n \"Eu\",\n \"Gd\",\n \"Tb\",\n \"Dy\",\n \"Ho\",\n \"Er\",\n \"Tm\",\n \"Yb\",\n \"Lu\",\n \"Hf\",\n \"Ta\",\n \"W\",\n \"Re\",\n \"Os\",\n \"Ir\",\n \"Pt\",\n \"Au\",\n \"Hg\",\n \"Tl\",\n \"Pb\",\n \"Bi\",\n \"Po\",\n \"At\",\n \"Rn\",\n \"Fr\",\n \"Ra\",\n \"Ac\",\n \"Th\",\n \"Pa\",\n \"U\",\n \"Np\",\n \"Pu\",\n \"Am\",\n \"Cm\",\n \"Bk\",\n]\n# The magpie representation contains these features:\n[\n \"Number\",\n \"MendeleevNumber\",\n \"AtomicWeight\",\n \"MeltingT\",\n \"Column\",\n \"Row\",\n \"CovalentRadius\",\n \"Electronegativity\",\n \"NsValence\",\n \"NpValence\",\n \"NdValence\",\n \"NfValence\",\n \"NValence\",\n \"NsUnfilled\",\n \"NpUnfilled\",\n \"NdUnfilled\",\n \"NfUnfilled\",\n \"NUnfilled\",\n \"GSvolume_pa\",\n \"GSbandgap\",\n \"GSmagmom\",\n \"SpaceGroupNumber\",\n]\n
"},{"location":"tutorials/#plotting","title":"Plotting","text":"We can quickly generate heatmaps of distance/similarity measures between the element vectors using heatmap_plotter
and plot the representations in two dimensions using the dimension_plotter
from the plotter module. Before we do that, we will standardise the embedding using the standardise
method available to the Embedding class:
from elementembeddings.plotter import heatmap_plotter, dimension_plotter\nimport matplotlib.pyplot as plt\n\nmagpie.standardise(inplace=True) # Standardises the representation\n\nfig, ax = plt.subplots(1, 1, figsize=(6, 6))\nheatmap_params = {\"vmin\": -1, \"vmax\": 1}\nheatmap_plotter(\n embedding=magpie,\n metric=\"cosine_similarity\",\n show_axislabels=False,\n cmap=\"Blues_r\",\n ax=ax,\n **heatmap_params\n)\nax.set_title(\"Magpie cosine similarities\")\nfig.tight_layout()\nfig.show()\n
fig, ax = plt.subplots(1, 1, figsize=(6, 6))\n\nreducer_params = {\"n_neighbors\": 30, \"random_state\": 42}\nscatter_params = {\"s\": 100}\n\ndimension_plotter(\n embedding=magpie,\n reducer=\"umap\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n reducer_params=reducer_params,\n scatter_params=scatter_params,\n)\nax.set_title(\"Magpie UMAP (n_neighbours=30)\")\nax.legend().remove()\n# Collect the legend entries from the same axes that was plotted on\nhandles, labels = ax.get_legend_handles_labels()\nfig.legend(handles, labels, bbox_to_anchor=(1.25, 0.5), loc=\"center right\", ncol=1)\n\nfig.tight_layout()\nfig.show()\n
"},{"location":"tutorials/#compositions","title":"Compositions","text":"The package can also be used to featurise compositions. Your data could be a list of formula strings or a pandas dataframe of the following format:
formula
CsPbI3
Fe2O3
NaCl
ZnS
The composition_featuriser
function can be used to featurise the data. The compositions can be featurised using different representation schemes and different types of pooling through the embedding
and stats
arguments respectively.
from elementembeddings.composition import composition_featuriser\n\ndf_featurised = composition_featuriser(df, embedding=\"magpie\", stats=[\"mean\", \"sum\"])\n\ndf_featurised\n
formula mean_Number mean_MendeleevNumber mean_AtomicWeight mean_MeltingT mean_Column mean_Row mean_CovalentRadius mean_Electronegativity mean_NsValence mean_NpValence mean_NdValence mean_NfValence mean_NValence mean_NsUnfilled mean_NpUnfilled mean_NdUnfilled mean_NfUnfilled mean_NUnfilled mean_GSvolume_pa mean_GSbandgap mean_GSmagmom mean_SpaceGroupNumber sum_Number sum_MendeleevNumber sum_AtomicWeight sum_MeltingT sum_Column sum_Row sum_CovalentRadius sum_Electronegativity sum_NsValence sum_NpValence sum_NdValence sum_NfValence sum_NValence sum_NsUnfilled sum_NpUnfilled sum_NdUnfilled sum_NfUnfilled sum_NUnfilled sum_GSvolume_pa sum_GSbandgap sum_GSmagmom sum_SpaceGroupNumber CsPbI3 59.2 74.8 144.16377238 412.55 13.2 5.4 161.39999999999998 2.22 1.8 3.4 8.0 2.8000000000000003 16.0 0.2 1.4 0.0 0.0 1.6 54.584 0.6372 0.0 129.20000000000002 296.0 374.0 720.8188619 2062.75 66.0 27.0 807.0 11.100000000000001 9.0 17.0 40.0 14.0 80.0 1.0 7.0 0.0 0.0 8.0 272.92 3.186 0.0 646.0 Fe2O3 15.2 74.19999999999999 31.937640000000002 757.2800000000001 12.8 2.8 92.4 2.7960000000000003 2.0 2.4 2.4000000000000004 0.0 6.8 0.0 1.2 1.6 0.0 2.8 9.755 0.0 0.8442651200000001 98.80000000000001 76.0 371.0 159.6882 3786.4 64.0 14.0 462.0 13.98 10.0 12.0 12.0 0.0 34.0 0.0 6.0 8.0 0.0 14.0 48.775000000000006 0.0 4.2213256 494.0 NaCl 14.0 48.0 29.221384640000004 271.235 9.0 3.0 134.0 2.045 1.5 2.5 0.0 0.0 4.0 0.5 0.5 0.0 0.0 1.0 26.87041666665 1.2465 0.0 146.5 28.0 96.0 58.44276928000001 542.47 18.0 6.0 268.0 4.09 3.0 5.0 0.0 0.0 8.0 1.0 1.0 0.0 0.0 2.0 53.7408333333 2.493 0.0 293.0 ZnS 23.0 78.5 48.7225 540.52 14.0 3.5 113.5 2.115 2.0 2.0 5.0 0.0 9.0 0.0 1.0 0.0 0.0 1.0 19.8734375 1.101 0.0 132.0 46.0 157.0 97.445 1081.04 28.0 7.0 227.0 4.23 4.0 4.0 10.0 0.0 18.0 0.0 2.0 0.0 0.0 2.0 39.746875 2.202 0.0 264.0 The returned dataframe contains the mean- and sum-pooled features of the magpie representation for the four formulas.
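To make the two pooling types concrete, the sketch below recomputes only the Number (atomic number) columns of the table for the four formulas. It is an illustration rather than the package's code: the `pool` helper is hypothetical and the compositions are written out as dictionaries instead of being parsed from formula strings.

```python
# Atomic numbers of the elements that appear in the four formulas.
atomic_number = {
    'Cs': 55, 'Pb': 82, 'I': 53,
    'Fe': 26, 'O': 8,
    'Na': 11, 'Cl': 17,
    'Zn': 30, 'S': 16,
}

# Compositions written out by hand (element -> amount in the formula unit).
compositions = {
    'CsPbI3': {'Cs': 1, 'Pb': 1, 'I': 3},
    'Fe2O3': {'Fe': 2, 'O': 3},
    'NaCl': {'Na': 1, 'Cl': 1},
    'ZnS': {'Zn': 1, 'S': 1},
}


def pool(composition, feature):
    # Sum pooling adds the feature over every atom in the formula unit;
    # mean pooling divides that sum by the number of atoms.
    n_atoms = sum(composition.values())
    feature_sum = sum(amount * feature[el] for el, amount in composition.items())
    return feature_sum / n_atoms, feature_sum


pooled = {formula: pool(comp, atomic_number) for formula, comp in compositions.items()}
for formula, (mean_value, sum_value) in pooled.items():
    print(f'{formula}: mean_Number={mean_value}, sum_Number={sum_value}')
```

The same weighting applies to every other feature column, so each formula ends up with one mean-pooled and one sum-pooled value per feature.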
"},{"location":"embeddings/element/","title":"Elemental Embeddings","text":"The data contained in this repository are a collection of various elemental representation/embedding schemes. We provide the literature source for these representations as well as the data source for which the files were obtained. Some representations have been obtained from the following repositories:
For the linear/scalar representations, the Embedding
class will load these representations as one-hot vectors where the vector components are ordered following the scale (i.e. the atomic
representation is ordered by atomic numbers).
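A one-hot vector built from an ordered scale can be sketched as follows. This is only an illustration under stated assumptions: the scale is truncated to the first five elements by atomic number, and `one_hot` is a hypothetical helper, not a function from the package.

```python
# Hypothetical, truncated scale: the first five elements ordered by atomic number.
scale = ['H', 'He', 'Li', 'Be', 'B']


def one_hot(element, ordered_elements):
    # The vector has one component per element on the scale; only the
    # component at the element's position on the scale is set to 1.
    vector = [0] * len(ordered_elements)
    vector[ordered_elements.index(element)] = 1
    return vector


li_vector = one_hot('Li', scale)
print(li_vector)
```

With a different scale, such as the modified Pettifor scale, only the ordering of the components would change.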
The following paper describes the details of the modified Pettifor chemical scale: The optimal one-dimensional periodic table: a modified Pettifor chemical scale from data mining
Data source
"},{"location":"embeddings/element/#atomic-numbers","title":"Atomic numbers","text":"We included atomic
as a linear representation to generate one-hot vectors corresponding to the atomic numbers.
The following representations are all vector representations (some are local, some are distributed) and the Embedding
class will load these representations as they are.
The following paper describes the implementation of the composition graph neural fingerprint (cgnf) from the node embedding vectors of a pre-trained crystal graph convolution neural network: Synthesizability of materials stoichiometry using semi-supervised learning
Data source
"},{"location":"embeddings/element/#crystallm","title":"crystallm","text":"The following paper describes the details behind the generative crystal structure model based on a large language model: Crystal Structure Generation with Autoregressive Large Language Modeling
"},{"location":"embeddings/element/#magpie","title":"magpie","text":"The following paper describes the details of the Materials Agnostic Platform for Informatics and Exploration (Magpie) framework: A general-purpose machine learning framework for predicting properties of inorganic materials
The source code for Magpie can be found here
Data source
The 22 dimensional embedding vector includes the following elemental properties:
Click to see the 22 properties
- Number
- Mendeleev number
- Atomic weight
- Melting temperature
- Group number
- Period
- Covalent radius
- Electronegativity
- No. of s, p, d, f valence electrons (4 features)
- No. of valence electrons
- No. of unfilled s, p, d, f orbitals (4 features)
- No. of unfilled orbitals
- GSvolume_pa (DFT volume per atom of T=0K ground state from the OQMD)
- GSbandgap (DFT bandgap energy of T=0K ground state from the OQMD)
- GSmagmom (DFT magnetic moment of T=0K ground state from the OQMD)
- Space group number
magpie_sc
is a scaled version of the magpie embeddings. Data source
The following paper describes the implementation of mat2vec: Unsupervised word embeddings capture latent knowledge from materials science literature
Data source
"},{"location":"embeddings/element/#matscholar","title":"matscholar","text":"The following paper describes the natural language processing implementation of Materials Scholar (matscholar): Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature
Data source
"},{"location":"embeddings/element/#megnet","title":"megnet","text":"The following paper describes the details of the construction of the MatErials Graph Network (MEGNet): Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. The 16 dimensional vectors are drawn from the atomic weights of a model trained to predict the formation energies of crystalline materials.
Data source
"},{"location":"embeddings/element/#oliynyk","title":"oliynyk","text":"The following paper describes the details: High-Throughput Machine-Learning-Driven Synthesis of Full-Heusler Compounds
Data source
The 44 features of the embedding vector are formed of the following properties:
Click to see the 44 features!
- Number
- Atomic_Weight
- Period
- Group
- Families
- Metal
- Nonmetal
- Metalliod
- Mendeleev_Number
- l_quantum_number
- Atomic_Radius
- Miracle_Radius_[pm]
- Covalent_Radius
- Zunger_radii_sum
- Ionic_radius
- crystal_radius
- Pauling_Electronegativity
- MB_electonegativity
- Gordy_electonegativity
- Mulliken_EN
- Allred-Rockow_electronegativity
- Metallic_valence
- Number_of_valence_electrons
- Gilmor_number_of_valence_electron
- valence_s
- valence_p
- valence_d
- valence_f
- Number_of_unfilled_s_valence_electrons
- Number_of_unfilled_p_valence_electrons
- Number_of_unfilled_d_valence_electrons
- Number_of_unfilled_f_valence_electrons
- Outer_shell_electrons
- 1st_ionization_potential_(kJ/mol)
- Polarizability(A^3)
- Melting_point_(K)
- Boiling_Point_(K)
- Density_(g/mL)
- Specific_heat_(J/g_K)_
- Heat_of_fusion_(kJ/mol)_
- Heat_of_vaporization_(kJ/mol)_
- Thermal_conductivity_(W/(m_K))_
- Heat_atomization(kJ/mol)
- Cohesive_energy
oliynyk_sc
is a scaled version of the oliynyk embeddings. Data source
This is a set of 200-dimensional vectors in which the components are randomly generated.
The 118 200-dimensional vectors in random_200_new
were generated using the following code:
import numpy as np\n\nmu, sigma = 0, 1 # mean and standard deviation\ns = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))\n
"},{"location":"embeddings/element/#skipatom","title":"skipatom","text":"The following paper describes the details: Distributed representations of atoms and materials for machine learning
Data source
"},{"location":"embeddings/element/#xenonpy","title":"xenonpy","text":"The XenonPy embedding uses the 58 features which are commonly used in publications that use the XenonPy package. See the following publications:
The ElementEmbeddings
package is distributed with a number of element and ionic species embedding schemes. These schemes are used to represent elements and ionic species in a high-dimensional space. The schemes are stored in the ElementEmbeddings
package and can be accessed using the ElementEmbeddings
API.
Element Embeddings
Species Embeddings
"},{"location":"embeddings/species/","title":"Species Embeddings","text":"The ElementEmbeddings
package has been expanded to incorporate representations of ionic species. We provide the literature source for these representations as well as the data source from which the files were obtained.
The following paper describes the details of how the SkipSpecies embeddings were developed.
Ionic species representations for materials informatics
Data Source
"},{"location":"python_api/composition/","title":"Composition module","text":"This module provides a class for handling compositional embeddings.
Typical usage example: Fe2O3_magpie = CompositionalEmbedding(\"Fe2O3\", \"magpie\")
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding","title":"CompositionalEmbedding
","text":"Class to handle compositional embeddings.
formula (str): A string formula e.g. CsPbI3, Li7La3Zr2O12\nembedding (Union[str, Embedding]): Either a string name of the embedding\nor an Embedding instance\nx (int, optional): The non-stoichiometric amount.\n
Source code in src/elementembeddings/composition.py
class CompositionalEmbedding:\n \"\"\"Class to handle compositional embeddings.\n\n Args:\n ----\n formula (str): A string formula e.g. CsPbI3, Li7La3Zr2O12\n embedding (Union[str, Embedding]): Either a string name of the embedding\n or an Embedding instance\n x (int, optional): The non-stoichiometric amount.\n \"\"\"\n\n def __init__(self, formula: str, embedding: str | Embedding, x=1) -> None:\n \"\"\"Initialise a CompositionalEmbedding instance.\"\"\"\n self.embedding = embedding\n\n # If a string has been passed for embedding, create an Embedding instance\n if isinstance(embedding, str):\n self.embedding = Embedding.load_data(embedding)\n\n self.embedding_name: str = self.embedding.embedding_name\n # Set an attribute for the formula\n self.formula = formula\n\n # Set an attribute for the comp dict\n comp_dict = formula_parser(self.formula)\n self._natoms = 0\n for v in comp_dict.values():\n if v < 0:\n msg = \"Formula cannot contain negative amounts of elements\"\n raise ValueError(msg)\n self._natoms += abs(v)\n\n self.composition = comp_dict\n\n # Set an attribute for the element list\n self.element_list = list(self.composition.keys())\n # Set an attribute for the element matrix\n self.el_matrix = np.zeros(\n shape=(len(self.composition), len(self.embedding.embeddings[\"H\"])),\n )\n for i, k in enumerate(self.composition.keys()):\n self.el_matrix[i] = self.embedding.embeddings[k]\n self.el_matrix = np.nan_to_num(self.el_matrix)\n\n # Set an attribute for the stoichiometric vector\n self.stoich_vector = np.array(list(self.composition.values()))\n\n # Set an attribute for the normalised stoichiometric vector\n self.norm_stoich_vector = self.stoich_vector / self._natoms\n\n @property\n def fractional_composition(self):\n \"\"\"Fractional composition of the Composition.\"\"\"\n return _get_fractional_composition(self.formula)\n\n @property\n def num_atoms(self) -> float:\n \"\"\"Total number of atoms in Composition.\"\"\"\n return self._natoms\n\n def 
as_dict(self) -> dict:\n # TO-DO: Need to create a dict representation for the embedding class\n \"\"\"Return the CompositionalEmbedding class as a dict.\"\"\"\n return {\n \"formula\": self.formula,\n \"composition\": self.composition,\n \"fractional_composition\": self.fractional_composition,\n }\n\n def _mean_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a weighted mean feature vector based of the embedding.\n\n The dimension of the feature vector is the same as the embedding.\n\n \"\"\"\n return np.dot(self.norm_stoich_vector, self.el_matrix)\n\n def _variance_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a weighted variance feature vector.\"\"\"\n diff_matrix = self.el_matrix - self._mean_feature_vector()\n\n diff_matrix = diff_matrix**2\n return np.dot(self.norm_stoich_vector, diff_matrix)\n\n def _minpool_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a min pooled feature vector.\"\"\"\n return np.min(self.el_matrix, axis=0)\n\n def _maxpool_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a max pooled feature vector.\"\"\"\n return np.max(self.el_matrix, axis=0)\n\n def _range_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a range feature vector.\"\"\"\n return np.ptp(self.el_matrix, axis=0)\n\n def _sum_feature_vector(self) -> np.ndarray:\n \"\"\"Compute the weighted sum feature vector.\"\"\"\n return np.dot(self.stoich_vector, self.el_matrix)\n\n def _geometric_mean_feature_vector(self) -> np.ndarray:\n \"\"\"Compute the geometric mean feature vector.\"\"\"\n return np.exp(np.dot(self.norm_stoich_vector, np.log(self.el_matrix)))\n\n def _harmonic_mean_feature_vector(self) -> np.ndarray:\n \"\"\"Compute the harmonic mean feature vector.\"\"\"\n return np.reciprocal(\n np.dot(self.norm_stoich_vector, np.reciprocal(self.el_matrix)),\n )\n\n _stats_functions_dict: ClassVar = {\n \"mean\": \"_mean_feature_vector\",\n \"variance\": \"_variance_feature_vector\",\n \"minpool\": \"_minpool_feature_vector\",\n \"maxpool\": 
\"_maxpool_feature_vector\",\n \"range\": \"_range_feature_vector\",\n \"sum\": \"_sum_feature_vector\",\n \"geometric_mean\": \"_geometric_mean_feature_vector\",\n \"harmonic_mean\": \"_harmonic_mean_feature_vector\",\n }\n\n def feature_vector(self, stats: str | list = \"mean\"):\n \"\"\"Compute a feature vector.\n\n The feature vector is a concatenation of\n the statistics specified in the stats argument.\n\n Args:\n ----\n stats (list): A list of strings specifying the statistics to be computed.\n The default is ['mean'].\n\n Returns:\n -------\n np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n \"\"\"\n implemented_stats = [\n \"mean\",\n \"variance\",\n \"minpool\",\n \"maxpool\",\n \"range\",\n \"sum\",\n \"geometric_mean\",\n \"harmonic_mean\",\n ]\n if isinstance(stats, str):\n stats = [stats]\n if not all(s in implemented_stats for s in stats):\n msg = f\" {[stat for stat in stats if stat not in implemented_stats]} \" f\"are not valid statistics.\"\n raise ValueError(\n msg,\n )\n feature_vector = []\n for s in stats:\n feature_vector.append(getattr(self, self._stats_functions_dict[s])())\n return np.concatenate(feature_vector)\n\n def distance(\n self,\n comp_other,\n distance_metric: str = \"euclidean\",\n stats: str | list[str] = \"mean\",\n ):\n \"\"\"Compute the distance between two compositions.\n\n Args:\n ----\n comp_other (Union[str, CompositionalEmbedding]): The other composition.\n distance_metric (str): The metric to be used. 
The default is 'euclidean'.\n stats (Union[str, list], optional): A list of statistics to be computed.\n\n Returns:\n -------\n float: The distance between the two CompositionalEmbedding objects.\n \"\"\"\n if isinstance(comp_other, str):\n comp_other = CompositionalEmbedding(comp_other, self.embedding)\n if not isinstance(comp_other, CompositionalEmbedding):\n msg = \"comp_other must be a string or a CompositionalEmbedding object.\"\n raise TypeError(\n msg,\n )\n if self.embedding_name != comp_other.embedding_name:\n msg = \"The two CompositionalEmbedding objects must have the same embedding.\"\n raise TypeError(\n msg,\n )\n return _composition_distance(\n self,\n comp_other,\n self.embedding,\n distance_metric,\n stats,\n )\n\n def __repr__(self) -> str:\n return f\"CompositionalEmbedding(formula={self.formula}, \" f\"embedding={self.embedding_name})\"\n\n def __str__(self) -> str:\n return f\"CompositionalEmbedding(formula={self.formula}, \" f\"embedding={self.embedding_name})\"\n\n def __eq__(self, other):\n if isinstance(other, self.__class__):\n return self.formula == other.formula and self.embedding_name == other.embedding_name\n else:\n return False\n\n def __ne__(self, other):\n return not self.__eq__(other)\n\n def __hash__(self):\n return hash((self.formula, self.embedding))\n
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.fractional_composition","title":"fractional_composition
property
","text":"Fractional composition of the Composition.
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.num_atoms","title":"num_atoms: float
property
","text":"Total number of atoms in Composition.
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.__init__","title":"__init__(formula, embedding, x=1)
","text":"Initialise a CompositionalEmbedding instance.
Source code in src/elementembeddings/composition.py
def __init__(self, formula: str, embedding: str | Embedding, x=1) -> None:\n \"\"\"Initialise a CompositionalEmbedding instance.\"\"\"\n self.embedding = embedding\n\n # If a string has been passed for embedding, create an Embedding instance\n if isinstance(embedding, str):\n self.embedding = Embedding.load_data(embedding)\n\n self.embedding_name: str = self.embedding.embedding_name\n # Set an attribute for the formula\n self.formula = formula\n\n # Set an attribute for the comp dict\n comp_dict = formula_parser(self.formula)\n self._natoms = 0\n for v in comp_dict.values():\n if v < 0:\n msg = \"Formula cannot contain negative amounts of elements\"\n raise ValueError(msg)\n self._natoms += abs(v)\n\n self.composition = comp_dict\n\n # Set an attribute for the element list\n self.element_list = list(self.composition.keys())\n # Set an attribute for the element matrix\n self.el_matrix = np.zeros(\n shape=(len(self.composition), len(self.embedding.embeddings[\"H\"])),\n )\n for i, k in enumerate(self.composition.keys()):\n self.el_matrix[i] = self.embedding.embeddings[k]\n self.el_matrix = np.nan_to_num(self.el_matrix)\n\n # Set an attribute for the stoichiometric vector\n self.stoich_vector = np.array(list(self.composition.values()))\n\n # Set an attribute for the normalised stoichiometric vector\n self.norm_stoich_vector = self.stoich_vector / self._natoms\n
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.as_dict","title":"as_dict()
","text":"Return the CompositionalEmbedding class as a dict.
Source code in src/elementembeddings/composition.py
def as_dict(self) -> dict:\n # TO-DO: Need to create a dict representation for the embedding class\n \"\"\"Return the CompositionalEmbedding class as a dict.\"\"\"\n return {\n \"formula\": self.formula,\n \"composition\": self.composition,\n \"fractional_composition\": self.fractional_composition,\n }\n
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.distance","title":"distance(comp_other, distance_metric='euclidean', stats='mean')
","text":"Compute the distance between two compositions.
comp_other (Union[str, CompositionalEmbedding]): The other composition.\ndistance_metric (str): The metric to be used. The default is 'euclidean'.\nstats (Union[str, list], optional): A list of statistics to be computed.\n
float: The distance between the two CompositionalEmbedding objects.\n
Source code in src/elementembeddings/composition.py
def distance(\n self,\n comp_other,\n distance_metric: str = \"euclidean\",\n stats: str | list[str] = \"mean\",\n):\n \"\"\"Compute the distance between two compositions.\n\n Args:\n ----\n comp_other (Union[str, CompositionalEmbedding]): The other composition.\n distance_metric (str): The metric to be used. The default is 'euclidean'.\n stats (Union[str, list], optional): A list of statistics to be computed.\n\n Returns:\n -------\n float: The distance between the two CompositionalEmbedding objects.\n \"\"\"\n if isinstance(comp_other, str):\n comp_other = CompositionalEmbedding(comp_other, self.embedding)\n if not isinstance(comp_other, CompositionalEmbedding):\n msg = \"comp_other must be a string or a CompositionalEmbedding object.\"\n raise TypeError(\n msg,\n )\n if self.embedding_name != comp_other.embedding_name:\n msg = \"The two CompositionalEmbedding objects must have the same embedding.\"\n raise TypeError(\n msg,\n )\n return _composition_distance(\n self,\n comp_other,\n self.embedding,\n distance_metric,\n stats,\n )\n
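As a rough illustration of what distance computes with its default arguments, the sketch below mean-pools two compositions and takes the Euclidean distance between the pooled vectors. The two-dimensional toy_embedding values and the helper names mean_pool and euclidean are made up for this example; the package itself delegates to _composition_distance, whose internals are not shown here.

```python
import math

# Hypothetical 2-dimensional element vectors (illustrative values only).
toy_embedding = {'Na': [1.0, 0.0], 'Cl': [0.0, 1.0], 'K': [1.0, 2.0]}


def mean_pool(composition):
    '''Stoichiometry-weighted mean of the element vectors.'''
    total = sum(composition.values())
    dim = len(next(iter(toy_embedding.values())))
    pooled = [0.0] * dim
    for element, amount in composition.items():
        for i, value in enumerate(toy_embedding[element]):
            pooled[i] += (amount / total) * value
    return pooled


def euclidean(u, v):
    '''Euclidean distance between two equal-length vectors.'''
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


nacl = mean_pool({'Na': 1, 'Cl': 1})
kcl = mean_pool({'K': 1, 'Cl': 1})
print(euclidean(nacl, kcl))
```

Both compositions must be pooled with the same embedding for the result to be meaningful, which is why the method raises an error when the embedding names differ.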
"},{"location":"python_api/composition/#elementembeddings.composition.CompositionalEmbedding.feature_vector","title":"feature_vector(stats='mean')
","text":"Compute a feature vector.
The feature vector is a concatenation of the statistics specified in the stats argument.
stats (list): A list of strings specifying the statistics to be computed.\nThe default is ['mean'].\n
np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n
Source code in src/elementembeddings/composition.py
def feature_vector(self, stats: str | list = \"mean\"):\n \"\"\"Compute a feature vector.\n\n The feature vector is a concatenation of\n the statistics specified in the stats argument.\n\n Args:\n ----\n stats (list): A list of strings specifying the statistics to be computed.\n The default is ['mean'].\n\n Returns:\n -------\n np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n \"\"\"\n implemented_stats = [\n \"mean\",\n \"variance\",\n \"minpool\",\n \"maxpool\",\n \"range\",\n \"sum\",\n \"geometric_mean\",\n \"harmonic_mean\",\n ]\n if isinstance(stats, str):\n stats = [stats]\n if not all(s in implemented_stats for s in stats):\n msg = f\" {[stat for stat in stats if stat not in implemented_stats]} \" f\"are not valid statistics.\"\n raise ValueError(\n msg,\n )\n feature_vector = []\n for s in stats:\n feature_vector.append(getattr(self, self._stats_functions_dict[s])())\n return np.concatenate(feature_vector)\n
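The concatenation can be sketched in plain Python for a toy case. The element vectors below are invented two-dimensional values, not a real embedding, and the function names are illustrative only; the weighted mean and weighted variance follow the same formulas as _mean_feature_vector and _variance_feature_vector above.

```python
# Hypothetical 2-dimensional element vectors (illustrative values only).
toy_embedding = {'Fe': [1.0, 4.0], 'O': [6.0, 2.0]}
composition = {'Fe': 2, 'O': 3}

total = sum(composition.values())
# Normalised stoichiometry: Fe -> 0.4, O -> 0.6
weights = {el: amt / total for el, amt in composition.items()}
dim = 2


def weighted_mean():
    '''Stoichiometry-weighted mean of each embedding dimension.'''
    return [sum(weights[el] * toy_embedding[el][i] for el in composition) for i in range(dim)]


def weighted_variance():
    '''Stoichiometry-weighted variance about the weighted mean.'''
    mean = weighted_mean()
    return [
        sum(weights[el] * (toy_embedding[el][i] - mean[i]) ** 2 for el in composition)
        for i in range(dim)
    ]


stats_functions = {'mean': weighted_mean, 'variance': weighted_variance}


def feature_vector(stats):
    '''Concatenate the requested statistics in the order given.'''
    if isinstance(stats, str):
        stats = [stats]
    out = []
    for s in stats:
        out.extend(stats_functions[s]())
    return out


fv = feature_vector(['mean', 'variance'])
print(len(fv))  # len(stats) * embedding_dim
```

The statistics appear in the order requested, so ['mean', 'variance'] and ['variance', 'mean'] give vectors with the same entries in different positions.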
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding","title":"SpeciesCompositionalEmbedding
","text":"Class to handle species compositional embeddings.
formula_dict (dict): A dictionary of the form {species: amount}\nembedding (Union[str, SpeciesEmbedding]): Either a string name of the embedding\nor an SpeciesEmbedding instance\nx (int, optional): The non-stoichiometric amount.\n
Source code in src/elementembeddings/composition.py
class SpeciesCompositionalEmbedding:\n \"\"\"Class to handle species compositional embeddings.\n\n Args:\n ----\n formula_dict (dict): A dictionary of the form {species: amount}\n embedding (Union[str, SpeciesEmbedding]): Either a string name of the embedding\n or an SpeciesEmbedding instance\n x (int, optional): The non-stoichiometric amount.\n \"\"\"\n\n def __init__(self, formula_dict: dict, embedding: str | SpeciesEmbedding, x=1) -> None:\n \"\"\"Initialise a SpeciesCompositionalEmbedding instance.\"\"\"\n self.embedding = embedding\n\n # If a string has been passed for embedding, create an Embedding instance\n if isinstance(embedding, str):\n self.embedding = SpeciesEmbedding.load_data(embedding)\n\n self.embedding_name: str = self.embedding.embedding_name\n\n # Set an attribute for the comp dict\n self.composition = formula_dict\n\n # Set an attribute for the number of atoms\n self._natoms = 0\n for v in self.composition.values():\n if v < 0:\n msg = \"Formula cannot contain negative amounts of elements\"\n raise ValueError(msg)\n self._natoms += abs(v)\n\n # Set an attribute for the species list\n self.species_list = list(self.composition.keys())\n\n # Set an attribute for the element list\n self.element_list = list({parse_species(sp)[0] for sp in self.species_list})\n # Set an attribute for the species matrix\n self.species_matrix = np.zeros(\n shape=(len(self.composition), len(self.embedding.embeddings[\"Zn2+\"])),\n )\n for i, k in enumerate(self.composition.keys()):\n self.species_matrix[i] = self.embedding.embeddings[k]\n self.species_matrix = np.nan_to_num(self.species_matrix)\n\n # Set an attribute for the stoichiometric vector\n self.stoich_vector = np.array(list(self.composition.values()))\n\n # Set an attribute for the normalised stoichiometric vector\n self.norm_stoich_vector = self.stoich_vector / np.sum(self.stoich_vector)\n\n @property\n def num_atoms(self) -> float:\n \"\"\"Total number of atoms in Composition.\"\"\"\n return self._natoms\n\n 
def get_el_amt_dict(self) -> dict:\n \"\"\"\n Return the composition as dictionary of element symbol : stoichiometry.\n\n e.g. {\"Fe2+\":1, \"Fe3+\":2, \"O2-\": 4} -> {\"Fe\":3, \"O\":4}.\n \"\"\"\n dct: dict[str, float] = collections.defaultdict(float)\n for sp, stoich in self.composition.items():\n el = parse_species(sp)[0]\n dct[el] += stoich\n return dct\n\n @property\n def formula_pretty(self) -> str:\n \"\"\"Return the pretty formula of the composition.\"\"\"\n els_amt_dict = self.get_el_amt_dict()\n els = sorted(els_amt_dict, key=lambda el: X[el])\n formula = [f\"{el}{self._stoich_formatter(els_amt_dict[el])}\" for el in els]\n return \"\".join(formula)\n\n def _stoich_formatter(self, stoich: float, tol: float = 1e-8) -> str:\n \"\"\"Return the stoichiometry as a string.\"\"\"\n if stoich == 1:\n return \"\"\n if abs(stoich - int(stoich)) < tol:\n return str(int(stoich))\n return str(round(stoich, 8))\n\n def as_dict(self) -> dict:\n # TO-DO: Need to create a dict representation for the embedding class\n \"\"\"Return the SpeciesCompositionalEmbedding class as a dict.\"\"\"\n return {\n \"composition\": self.composition,\n }\n\n @property\n def fractional_composition(self):\n \"\"\"Fractional composition of the Composition.\"\"\"\n return {k: v / self._natoms for k, v in self.composition.items()}\n\n def _mean_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a weighted mean feature vector based of the embedding.\n\n The dimension of the feature vector is the same as the embedding.\n\n \"\"\"\n return np.dot(self.norm_stoich_vector, self.species_matrix)\n\n def _variance_feature_vector(self) -> np.ndarray:\n \"\"\"Compute a weighted variance feature vector.\"\"\"\n diff_matrix = self.species_matrix - self._mean_feature_vector()\n\n diff_matrix = diff_matrix**2\n return np.dot(self.norm_stoich_vector, diff_matrix)\n\n def _minpool_feature_vector(self) -> np.ndarray:\n return np.min(self.species_matrix, axis=0)\n\n def _maxpool_feature_vector(self) -> 
np.ndarray:\n return np.max(self.species_matrix, axis=0)\n\n def _range_feature_vector(self) -> np.ndarray:\n return np.ptp(self.species_matrix, axis=0)\n\n def _sum_feature_vector(self) -> np.ndarray:\n return np.dot(self.stoich_vector, self.species_matrix)\n\n def _geometric_mean_feature_vector(self) -> np.ndarray:\n return np.exp(np.dot(self.norm_stoich_vector, np.log(self.species_matrix)))\n\n def _harmonic_mean_feature_vector(self) -> np.ndarray:\n return np.reciprocal(\n np.dot(self.norm_stoich_vector, np.reciprocal(self.species_matrix)),\n )\n\n _stats_functions_dict: ClassVar = {\n \"mean\": \"_mean_feature_vector\",\n \"variance\": \"_variance_feature_vector\",\n \"minpool\": \"_minpool_feature_vector\",\n \"maxpool\": \"_maxpool_feature_vector\",\n \"range\": \"_range_feature_vector\",\n \"sum\": \"_sum_feature_vector\",\n \"geometric_mean\": \"_geometric_mean_feature_vector\",\n \"harmonic_mean\": \"_harmonic_mean_feature_vector\",\n }\n\n def feature_vector(self, stats: str | list = \"mean\"):\n \"\"\"Compute a feature vector.\n\n The feature vector is a concatenation of\n the statistics specified in the stats argument.\n\n Args:\n ----\n stats (list): A list of strings specifying the statistics to be computed.\n The default is ['mean'].\n\n Returns:\n -------\n np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n \"\"\"\n implemented_stats = [\n \"mean\",\n \"variance\",\n \"minpool\",\n \"maxpool\",\n \"range\",\n \"sum\",\n \"geometric_mean\",\n \"harmonic_mean\",\n ]\n if isinstance(stats, str):\n stats = [stats]\n if not all(s in implemented_stats for s in stats):\n msg = f\" {[stat for stat in stats if stat not in implemented_stats]} \" f\"are not valid statistics.\"\n raise ValueError(\n msg,\n )\n feature_vector = []\n for s in stats:\n feature_vector.append(getattr(self, self._stats_functions_dict[s])())\n return np.concatenate(feature_vector)\n\n def distance(\n self,\n comp_other,\n distance_metric: str = \"euclidean\",\n 
stats: str | list[str] = \"mean\",\n ):\n \"\"\"Compute the distance between two compositions.\n\n Args:\n ----\n comp_other (Union[dict, SpeciesCompositionalEmbedding]):\n The other composition.\n distance_metric (str): The metric to be used. The default is 'euclidean'.\n stats (Union[str, list], optional): A list of statistics to be computed.\n\n Returns:\n -------\n float: The distance between the two SpeciesCompositionalEmbedding objects.\n \"\"\"\n if isinstance(comp_other, dict):\n comp_other = SpeciesCompositionalEmbedding(comp_other, self.embedding)\n if not isinstance(comp_other, SpeciesCompositionalEmbedding):\n msg = \"comp_other must be a dict or a SpeciesCompositionalEmbedding object.\"\n raise TypeError(\n msg,\n )\n if self.embedding_name != comp_other.embedding_name:\n msg = \"\"\"The two SpeciesCompositionalEmbedding\n objects must have the same embedding.\"\"\"\n raise ValueError(\n msg,\n )\n return _species_composition_distance(\n self,\n comp_other,\n self.embedding,\n distance_metric,\n stats,\n )\n\n def __repr__(self) -> str:\n return f\"SpeciesCompositionalEmbedding(formula={self.formula_pretty}, \" f\"embedding={self.embedding_name})\"\n\n def __str__(self) -> str:\n return f\"SpeciesCompositionalEmbedding(formula={self.formula_pretty}, \" f\"embedding={self.embedding_name})\"\n\n def __eq__(self, other):\n if isinstance(other, self.__class__):\n return (\n self.formula_pretty == other.formula_pretty\n and self.embedding_name == other.embedding_name\n and self.composition == other.composition\n )\n else:\n return False\n\n def __ne__(self, other):\n return not self.__eq__(other)\n\n def __hash__(self):\n return hash((self.formula_pretty, self.embedding))\n
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.formula_pretty","title":"formula_pretty: str
property
","text":"Return the pretty formula of the composition.
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.fractional_composition","title":"fractional_composition
property
","text":"Fractional composition of the Composition.
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.num_atoms","title":"num_atoms: float
property
","text":"Total number of atoms in Composition.
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.__init__","title":"__init__(formula_dict, embedding, x=1)
","text":"Initialise a SpeciesCompositionalEmbedding instance.
Source code in src/elementembeddings/composition.py
def __init__(self, formula_dict: dict, embedding: str | SpeciesEmbedding, x=1) -> None:\n \"\"\"Initialise a SpeciesCompositionalEmbedding instance.\"\"\"\n self.embedding = embedding\n\n # If a string has been passed for embedding, create an Embedding instance\n if isinstance(embedding, str):\n self.embedding = SpeciesEmbedding.load_data(embedding)\n\n self.embedding_name: str = self.embedding.embedding_name\n\n # Set an attribute for the comp dict\n self.composition = formula_dict\n\n # Set an attribute for the number of atoms\n self._natoms = 0\n for v in self.composition.values():\n if v < 0:\n msg = \"Formula cannot contain negative amounts of elements\"\n raise ValueError(msg)\n self._natoms += abs(v)\n\n # Set an attribute for the species list\n self.species_list = list(self.composition.keys())\n\n # Set an attribute for the element list\n self.element_list = list({parse_species(sp)[0] for sp in self.species_list})\n # Set an attribute for the species matrix\n self.species_matrix = np.zeros(\n shape=(len(self.composition), len(self.embedding.embeddings[\"Zn2+\"])),\n )\n for i, k in enumerate(self.composition.keys()):\n self.species_matrix[i] = self.embedding.embeddings[k]\n self.species_matrix = np.nan_to_num(self.species_matrix)\n\n # Set an attribute for the stoichiometric vector\n self.stoich_vector = np.array(list(self.composition.values()))\n\n # Set an attribute for the normalised stoichiometric vector\n self.norm_stoich_vector = self.stoich_vector / np.sum(self.stoich_vector)\n
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.as_dict","title":"as_dict()
","text":"Return the SpeciesCompositionalEmbedding class as a dict.
Source code in src/elementembeddings/composition.py
def as_dict(self) -> dict:\n # TO-DO: Need to create a dict representation for the embedding class\n \"\"\"Return the SpeciesCompositionalEmbedding class as a dict.\"\"\"\n return {\n \"composition\": self.composition,\n }\n
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.distance","title":"distance(comp_other, distance_metric='euclidean', stats='mean')
","text":"Compute the distance between two compositions.
comp_other (Union[dict, SpeciesCompositionalEmbedding]):\n The other composition.\ndistance_metric (str): The metric to be used. The default is 'euclidean'.\nstats (Union[str, list], optional): A list of statistics to be computed.\n
float: The distance between the two SpeciesCompositionalEmbedding objects.\n
Source code in src/elementembeddings/composition.py
def distance(\n self,\n comp_other,\n distance_metric: str = \"euclidean\",\n stats: str | list[str] = \"mean\",\n):\n \"\"\"Compute the distance between two compositions.\n\n Args:\n ----\n comp_other (Union[dict, SpeciesCompositionalEmbedding]):\n The other composition.\n distance_metric (str): The metric to be used. The default is 'euclidean'.\n stats (Union[str, list], optional): A list of statistics to be computed.\n\n Returns:\n -------\n float: The distance between the two SpeciesCompositionalEmbedding objects.\n \"\"\"\n if isinstance(comp_other, dict):\n comp_other = SpeciesCompositionalEmbedding(comp_other, self.embedding)\n if not isinstance(comp_other, SpeciesCompositionalEmbedding):\n msg = \"comp_other must be a dict or a SpeciesCompositionalEmbedding object.\"\n raise TypeError(\n msg,\n )\n if self.embedding_name != comp_other.embedding_name:\n msg = \"\"\"The two SpeciesCompositionalEmbedding\n objects must have the same embedding.\"\"\"\n raise ValueError(\n msg,\n )\n return _species_composition_distance(\n self,\n comp_other,\n self.embedding,\n distance_metric,\n stats,\n )\n
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.feature_vector","title":"feature_vector(stats='mean')
","text":"Compute a feature vector.
The feature vector is a concatenation of the statistics specified in the stats argument.
stats (list): A list of strings specifying the statistics to be computed.\nThe default is ['mean'].\n
np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n
Source code in src/elementembeddings/composition.py
def feature_vector(self, stats: str | list = \"mean\"):\n \"\"\"Compute a feature vector.\n\n The feature vector is a concatenation of\n the statistics specified in the stats argument.\n\n Args:\n ----\n stats (list): A list of strings specifying the statistics to be computed.\n The default is ['mean'].\n\n Returns:\n -------\n np.ndarray: A feature vector of dimension (len(stats) * embedding_dim).\n \"\"\"\n implemented_stats = [\n \"mean\",\n \"variance\",\n \"minpool\",\n \"maxpool\",\n \"range\",\n \"sum\",\n \"geometric_mean\",\n \"harmonic_mean\",\n ]\n if isinstance(stats, str):\n stats = [stats]\n if not all(s in implemented_stats for s in stats):\n msg = f\" {[stat for stat in stats if stat not in implemented_stats]} \" f\"are not valid statistics.\"\n raise ValueError(\n msg,\n )\n feature_vector = []\n for s in stats:\n feature_vector.append(getattr(self, self._stats_functions_dict[s])())\n return np.concatenate(feature_vector)\n
"},{"location":"python_api/composition/#elementembeddings.composition.SpeciesCompositionalEmbedding.get_el_amt_dict","title":"get_el_amt_dict()
","text":"Return the composition as dictionary of element symbol : stoichiometry.
e.g. {\"Fe2+\":1, \"Fe3+\":2, \"O2-\": 4} -> {\"Fe\":3, \"O\":4}.
Source code in src/elementembeddings/composition.py
def get_el_amt_dict(self) -> dict:\n \"\"\"\n Return the composition as dictionary of element symbol : stoichiometry.\n\n e.g. {\"Fe2+\":1, \"Fe3+\":2, \"O2-\": 4} -> {\"Fe\":3, \"O\":4}.\n \"\"\"\n dct: dict[str, float] = collections.defaultdict(float)\n for sp, stoich in self.composition.items():\n el = parse_species(sp)[0]\n dct[el] += stoich\n return dct\n
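The same aggregation can be tried without the package. In this minimal sketch, element_of is a hypothetical stand-in for parse_species that simply keeps the leading element symbol; the real helper may behave differently for unusual inputs.

```python
import collections
import re


def element_of(species):
    '''Hypothetical stand-in for parse_species: keep only the element symbol.'''
    return re.match('[A-Z][a-z]?', species).group()


def el_amt_dict(composition):
    '''Sum the amounts of all species that share an element.'''
    totals = collections.defaultdict(float)
    for species, amount in composition.items():
        totals[element_of(species)] += amount
    return dict(totals)


result = el_amt_dict({'Fe2+': 1, 'Fe3+': 2, 'O2-': 4})
print(result)
```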
"},{"location":"python_api/composition/#elementembeddings.composition.composition_featuriser","title":"composition_featuriser(data, formula_column='formula', embedding='magpie', stats='mean', inplace=False)
","text":"Compute a feature vector for a composition.
The feature vector is based on the statistics specified in the stats argument.
data (Union[pd.DataFrame, pd.Series, list, CompositionalEmbedding]):\n A pandas DataFrame or Series containing a column named 'formula',\n a list of formula, or a CompositionalEmbedding class\nformula_column (str, optional): The column name containing the formula.\nembedding (Union[Embedding, str], optional): A Embedding class or a string\nstats (Union[str, list], optional): A list of statistics to be computed.\n The default is ['mean'].\ninplace (bool, optional): Whether to perform the operation in place on the data.\n The default is False.\n
Union[pd.DataFrame,list]: A pandas DataFrame containing the feature vector,\nor a list of feature vectors is returned\n
Source code in src/elementembeddings/composition.py
def composition_featuriser(\n data: pd.DataFrame | pd.Series | CompositionalEmbedding | list,\n formula_column: str = \"formula\",\n embedding: Embedding | str = \"magpie\",\n stats: str | list = \"mean\",\n inplace: bool = False,\n) -> pd.DataFrame:\n \"\"\"Compute a feature vector for a composition.\n\n The feature vector is based on the statistics specified\n in the stats argument.\n\n Args:\n ----\n data (Union[pd.DataFrame, pd.Series, list, CompositionalEmbedding]):\n A pandas DataFrame or Series containing a column named 'formula',\n a list of formula, or a CompositionalEmbedding class\n formula_column (str, optional): The column name containing the formula.\n embedding (Union[Embedding, str], optional): A Embedding class or a string\n stats (Union[str, list], optional): A list of statistics to be computed.\n The default is ['mean'].\n inplace (bool, optional): Whether to perform the operation in place on the data.\n The default is False.\n\n Returns:\n -------\n Union[pd.DataFrame,list]: A pandas DataFrame containing the feature vector,\n or a list of feature vectors is returned\n \"\"\"\n if isinstance(stats, str):\n stats = [stats]\n if isinstance(data, pd.Series):\n data = data.to_frame(name=\"formula\")\n if isinstance(data, pd.DataFrame):\n if not inplace:\n data = data.copy()\n if formula_column not in data.columns:\n msg = f\"The data must contain a column named {formula_column} to featurise.\"\n raise ValueError(\n msg,\n )\n print(\"Featurising compositions...\")\n comps = [CompositionalEmbedding(x, embedding) for x in tqdm(data[formula_column].tolist())]\n print(\"Computing feature vectors...\")\n fvs = [x.feature_vector(stats) for x in tqdm(comps)]\n feature_names = comps[0].embedding.feature_labels\n feature_names = [f\"{stat}_{feature}\" for stat in stats for feature in feature_names]\n return pd.concat([data, pd.DataFrame(fvs, columns=feature_names)], axis=1)\n elif isinstance(data, list):\n comps = [CompositionalEmbedding(x, embedding) for x 
in data]\n return [x.feature_vector(stats) for x in tqdm(comps)]\n\n elif isinstance(data, CompositionalEmbedding):\n return data.feature_vector(stats)\n else:\n msg = \"The data must be a pandas DataFrame, Series,\" \" list or CompositionalEmbedding class.\"\n raise TypeError(\n msg,\n )\n
"},{"location":"python_api/composition/#elementembeddings.composition.formula_parser","title":"formula_parser(formula)
","text":"Parse a string formula.
Returns a dictionary of the composition with key:value pairs of element symbol:amount.
formula (str): A string formula e.g. CsPbI3, Li7La3Zr2O12\n
(dict): A dictionary of the composition\n
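The bracket handling described above can be illustrated with a standalone sketch. It is not the package's implementation: the names `flat_formula_to_dict` and `parse_formula_sketch` are hypothetical, and only the overall approach (expand one bracketed group, then recurse) follows the listed source.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration only.
ELEMENT_AMOUNT = re.compile('([A-Z][a-z]*)([0-9.]*)')
BRACKET_GROUP = re.compile('[(]([^()]+)[)]([0-9.]*)')


def flat_formula_to_dict(formula, factor=1.0):
    # Sum the amount of each element in a bracket-free formula.
    amounts = defaultdict(float)
    for symbol, amount in ELEMENT_AMOUNT.findall(formula):
        amounts[symbol] += (float(amount) if amount else 1.0) * factor
    return dict(amounts)


def parse_formula_sketch(formula):
    # Expand the first innermost bracketed group, then recurse until none remain.
    match = BRACKET_GROUP.search(formula)
    if match:
        factor = float(match.group(2)) if match.group(2) else 1.0
        unit = flat_formula_to_dict(match.group(1), factor)
        expanded = ''.join(f'{el}{amt}' for el, amt in unit.items())
        return parse_formula_sketch(formula.replace(match.group(), expanded))
    return flat_formula_to_dict(formula)


print(parse_formula_sketch('CsPbI3'))
print(parse_formula_sketch('Ca3(PO4)2'))
```

Like the listed source, this sketch does not check that the symbols are real elements.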
Source code in src/elementembeddings/composition.py
def formula_parser(formula: str) -> dict[str, float]:\n # TO-DO: Add validation to check composition contains real elements.\n \"\"\"Parse a string formula.\n\n Returns a dictionary of the composition with key:value pairs\n of element symbol:amount.\n\n Args:\n ----\n formula (str): A string formula e.g. CsPbI3, Li7La3Zr2O12\n\n Returns:\n -------\n (dict): A dictionary of the composition\n\n \"\"\"\n # For Metallofullerene\n formula = formula.replace(\"@\", \"\")\n\n regex = r\"\\(([^\\(\\)]+)\\)\\s*([\\.e\\d]*)\"\n r = re.compile(regex)\n m = re.search(r, formula)\n if m:\n factor = 1.0\n if m.group(2) != \"\":\n factor = float(m.group(2))\n unit_sym_dict = _get_sym_dict(m.group(1), factor)\n expanded_sym = \"\".join([f\"{el}{amt}\" for el, amt in unit_sym_dict.items()])\n expanded_formula = formula.replace(m.group(), expanded_sym)\n return formula_parser(expanded_formula)\n return _get_sym_dict(formula, 1)\n
"},{"location":"python_api/composition/#elementembeddings.composition.species_composition_featuriser","title":"species_composition_featuriser(data, embedding='skipspecies', stats='mean', to_dataframe=False)
","text":"Compute a feature vector for a composition.
The feature vector is based on the statistics specified in the stats argument.
data (Union[list, SpeciesCompositionalEmbedding]):\n a list of composition dictionaries, or a SpeciesCompositionalEmbedding class\nembedding (Union[SpeciesEmbedding, str], optional): A SpeciesEmbedding class\n or a string\nstats (Union[str, list], optional): A list of statistics to be computed.\n The default is ['mean'].\nto_dataframe (bool, optional): Whether to return the feature vectors\n as a DataFrame. The default is False.\n
Union[pd.DataFrame,list]: A pandas DataFrame containing the feature vector,\nor a list of feature vectors is returned\n
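When a DataFrame is returned, the feature columns are named from the statistic and the embedding's feature labels. A minimal sketch of that naming scheme follows; the helper name `feature_column_names`, the labels, and the second statistic name are placeholders, not part of the package.

```python
def feature_column_names(stats, feature_labels):
    # A single statistic given as a string is wrapped in a list first.
    if isinstance(stats, str):
        stats = [stats]
    # Statistic-major order: every label for the first statistic, then the next.
    return [f'{stat}_{label}' for stat in stats for label in feature_labels]


labels = ['dim_0', 'dim_1']  # hypothetical feature labels
print(feature_column_names('mean', labels))
print(feature_column_names(['mean', 'other_stat'], labels))
```

The ordering means that, for an embedding with n components, the first n columns belong to the first statistic.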
Source code in src/elementembeddings/composition.py
def species_composition_featuriser(\n data: SpeciesCompositionalEmbedding | list,\n embedding: Embedding | str = \"skipspecies\",\n stats: str | list = \"mean\",\n to_dataframe: bool = False,\n) -> list | pd.DataFrame:\n \"\"\"Compute a feature vector for a composition.\n\n The feature vector is based on the statistics specified\n in the stats argument.\n\n Args:\n ----\n data (Union[list, SpeciesCompositionalEmbedding]):\n a list of composition dictionaries, or a SpeciesCompositionalEmbedding class\n embedding (Union[SpeciesEmbedding, str], optional): A SpeciesEmbedding class\n or a string\n stats (Union[str, list], optional): A list of statistics to be computed.\n The default is ['mean'].\n to_dataframe (bool, optional): Whether to return the feature vectors\n as a DataFrame. The default is False.\n\n Returns:\n -------\n Union[pd.DataFrame,list]: A pandas DataFrame containing the feature vector,\n or a list of feature vectors is returned\n \"\"\"\n if isinstance(stats, str):\n stats = [stats]\n if isinstance(data, list):\n comps = [SpeciesCompositionalEmbedding(x, embedding) for x in data]\n comp_vectors = [x.feature_vector(stats) for x in tqdm(comps, desc=\"Computing feature vectors\")]\n elif isinstance(data, SpeciesCompositionalEmbedding):\n comps = [data]\n comp_vectors = data.feature_vector(stats)\n else:\n msg = \"The data must be a list or SpeciesCompositionalEmbedding class.\"\n raise TypeError(\n msg,\n )\n if to_dataframe:\n feature_names = comps[0].embedding.feature_labels\n feature_names = [f\"{stat}_{feature}\" for stat in stats for feature in feature_names]\n formulae = [x.formula_pretty for x in comps]\n # Create a DataFrame with formula, composition and feature vectors\n df = pd.DataFrame(comp_vectors, columns=feature_names)\n df[\"formula\"] = formulae\n df[\"composition\"] = data\n # Reorder the columns\n return df[[\"formula\", \"composition\", *feature_names]]\n\n return comp_vectors\n
"},{"location":"python_api/core/","title":"Core module","text":"Provides the Embedding
class.
This module enables the user to load elemental representation data and analyse it using statistical functions.
Typical usage example:
megnet16 = Embedding.load_data('megnet16')
"},{"location":"python_api/core/#elementembeddings.core.Embedding","title":"Embedding
","text":" Bases: EmbeddingBase
Represent an elemental representation.
To load an embedding distributed with the package, use the load_data() method.
Works like a standard Python dictionary. Each item is an {element: vector} pair.
Adds a few convenience methods related to elemental representations.
Source code in src/elementembeddings/core.py
class Embedding(EmbeddingBase):\n \"\"\"Represent an elemental representation.\n\n To load an embedding distributed from the package use the load_data() method.\n\n Works like a standard python dictionary. The keys are {element: vector} pairs.\n\n Adds a few convenience methods related to elemental representations.\n \"\"\"\n\n @staticmethod\n def load_data(embedding_name: str | None = None):\n \"\"\"Create an instance of the `Embedding` class from a default embedding file.\n\n The default embeddings are in the table below:\n\n | **Name** | **str_name** |\n |-------------------------|--------------|\n | Magpie | magpie |\n | Magpie (scaled) | magpie_sc |\n | Mat2Vec | mat2vec |\n | Matscholar | matscholar |\n | Megnet (16 dimensions) | megnet16 |\n | Modified Pettifor scale | mod_petti |\n | Oliynyk | oliynyk |\n | Oliynyk (scaled) | oliynyk_sc |\n | Random (200 dimensions) | random_200 |\n | SkipAtom | skipatom |\n | Atomic Number | atomic |\n | CrystaLLM | crystallm |\n | XenonPy | xenonpy |\n | Cgnf | cgnf |\n\n\n Args:\n ----\n embedding_name (str): The str_name of an embedding file.\n\n Returns:\n -------\n Embedding :class:`Embedding` instance.\n \"\"\"\n if DEFAULT_ELEMENT_EMBEDDINGS[embedding_name].endswith(\".csv\"):\n return Embedding.from_csv(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n elif \"megnet\" in DEFAULT_ELEMENT_EMBEDDINGS[embedding_name]:\n return Embedding.from_json(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n ).remove_elements([\"Null\"])\n elif DEFAULT_ELEMENT_EMBEDDINGS[embedding_name].endswith(\".json\"):\n return Embedding.from_json(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n else:\n return None\n\n @staticmethod\n def from_json(embedding_json, embedding_name: str | 
None = None):\n \"\"\"Create an instance of the Embedding class from a json file.\n\n Args:\n ----\n embedding_json (str): Filepath of the json file\n embedding_name (str): The name of the elemental representation\n \"\"\"\n # Need to add validation handling for JSONs in different formats\n with open(embedding_json) as f:\n embedding_data = json.load(f)\n return Embedding(embedding_data, embedding_name)\n\n @staticmethod\n def from_csv(embedding_csv, embedding_name: str | None = None):\n \"\"\"Create an instance of the Embedding class from a csv file.\n\n The first column of the csv file must contain the elements and be named element.\n\n Args:\n ----\n embedding_csv (str): Filepath of the csv file\n embedding_name (str): The name of the elemental representation\n\n \"\"\"\n # Need to add validation handling for csv files\n df = pd.read_csv(embedding_csv)\n elements = list(df[\"element\"])\n df = df.drop([\"element\"], axis=1)\n feature_labels = list(df.columns)\n embeds_array = df.to_numpy()\n embedding_data = {elements[i]: embeds_array[i] for i in range(len(embeds_array))}\n return Embedding(embedding_data, embedding_name, feature_labels)\n\n def as_dataframe(self, columns: str = \"components\") -> pd.DataFrame:\n \"\"\"Return the embedding as a pandas Dataframe.\n\n The first column is the elements and each other\n column represents a component of the embedding.\n\n Args:\n ----\n columns (str): A string to specify if the columns are the vector components\n and the index is the elements (`columns='components'`)\n or the columns are the elements (`columns='elements'`).\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n\n\n \"\"\"\n embedding = self.embeddings\n df = pd.DataFrame(embedding, index=self.feature_labels)\n if columns == \"components\":\n return df.T\n elif columns == \"elements\":\n return df\n else:\n msg = f\"{columns} is not a valid keyword argument. 
\" f\"Choose either 'components' or 'elements\"\n raise (\n ValueError(\n msg,\n )\n )\n\n def to(self, fmt: str = \"\", filename: str | None = \"\"):\n \"\"\"Output the embedding to a file.\n\n Args:\n ----\n fmt (str): The file format to output the embedding to.\n Options include \"json\" and \"csv\".\n filename (str): The name of the file to be outputted\n\n Returns:\n -------\n (str) if filename not specified, otherwise None.\n \"\"\"\n fmt = fmt.lower()\n\n if fmt == \"json\" or fnmatch.fnmatch(filename, \"*.json\"):\n j = json.dumps(self.embeddings, cls=NumpyEncoder)\n if filename:\n if not filename.endswith(\".json\"):\n filename = filename + \".json\"\n with open(filename, \"w\") as file:\n file.write(j)\n return None\n else:\n return j\n elif fmt == \"csv\" or fnmatch.fnmatch(filename, \"*.csv\"):\n if filename:\n if not filename.endswith(\".csv\"):\n filename = filename + \".csv\"\n self.as_dataframe().to_csv(filename, index_label=\"element\")\n return None\n else:\n return self.as_dataframe().to_csv(index_label=\"element\")\n\n else:\n msg = f\"{fmt!s} is an invalid file format\"\n raise ValueError(msg)\n\n @property\n def element_list(self) -> list:\n \"\"\"Return the elements of the embedding.\"\"\"\n return self._embeddings_keys_list()\n\n def remove_elements(self, elements: str | list[str], inplace: bool = False):\n # TO-DO allow removal by atomic numbers\n \"\"\"Remove elements from the Embedding instance.\n\n Args:\n ----\n elements (str,list(str)): An element symbol or a list of element symbols\n inplace (bool): If True, elements are removed from the Embedding instance.\n If false, the original embedding instance is unchanged\n and a new embedding instance with the elements removed is created.\n\n \"\"\"\n if inplace:\n if isinstance(elements, str):\n del self.embeddings[elements]\n elif isinstance(elements, list):\n for el in elements:\n del self.embeddings[el]\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n if 
isinstance(elements, str):\n del embeddings_copy[elements]\n elif isinstance(elements, list):\n for el in elements:\n del embeddings_copy[el]\n return Embedding(embeddings_copy, self.embedding_name)\n\n def standardise(self, inplace: bool = False):\n \"\"\"Standardise the embeddings.\n\n Mean is 0 and standard deviation is 1.\n\n \"\"\"\n if self._is_standardised():\n warnings.warn(\n \"Embedding is already standardised. \" \"Returning None and not changing the embedding.\",\n )\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n embeddings_array = np.array(list(embeddings_copy.values()))\n embeddings_array = StandardScaler().fit_transform(embeddings_array)\n for el, emb in zip(embeddings_copy.keys(), embeddings_array):\n embeddings_copy[el] = emb\n\n if inplace:\n self.embeddings = embeddings_copy\n self.is_standardised = True\n return None\n else:\n return Embedding(embeddings_copy, self.embedding_name)\n\n @property\n def element_groups_dict(self) -> dict[str, str]:\n \"\"\"Return a dictionary of {element: element type} pairs.\n\n e.g. {'He':'Noble gas'}\n\n \"\"\"\n with open(path.join(data_directory, \"element_data/element_group.json\")) as f:\n _dict = json.load(f)\n return {i: _dict[i] for i in self.element_list}\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.element_groups_dict","title":"element_groups_dict: dict[str, str]
property
","text":"Return a dictionary of {element: element type} pairs.
e.g. {'He':'Noble gas'}
"},{"location":"python_api/core/#elementembeddings.core.Embedding.element_list","title":"element_list: list
property
","text":"Return the elements of the embedding.
"},{"location":"python_api/core/#elementembeddings.core.Embedding.as_dataframe","title":"as_dataframe(columns='components')
","text":"Return the embedding as a pandas Dataframe.
By default, the index holds the elements and each column represents a component of the embedding.
columns (str): A string to specify if the columns are the vector components\nand the index is the elements (`columns='components'`)\nor the columns are the elements (`columns='elements'`).\n
df (pandas.DataFrame): A pandas dataframe object\n
Source code in src/elementembeddings/core.py
def as_dataframe(self, columns: str = \"components\") -> pd.DataFrame:\n \"\"\"Return the embedding as a pandas Dataframe.\n\n The first column is the elements and each other\n column represents a component of the embedding.\n\n Args:\n ----\n columns (str): A string to specify if the columns are the vector components\n and the index is the elements (`columns='components'`)\n or the columns are the elements (`columns='elements'`).\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n\n\n \"\"\"\n embedding = self.embeddings\n df = pd.DataFrame(embedding, index=self.feature_labels)\n if columns == \"components\":\n return df.T\n elif columns == \"elements\":\n return df\n else:\n msg = f\"{columns} is not a valid keyword argument. Choose either 'components' or 'elements'.\"\n raise ValueError(msg)\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.from_csv","title":"from_csv(embedding_csv, embedding_name=None)
staticmethod
","text":"Create an instance of the Embedding class from a csv file.
The first column of the csv file must contain the elements and be named element.
embedding_csv (str): Filepath of the csv file\nembedding_name (str): The name of the elemental representation\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef from_csv(embedding_csv, embedding_name: str | None = None):\n \"\"\"Create an instance of the Embedding class from a csv file.\n\n The first column of the csv file must contain the elements and be named element.\n\n Args:\n ----\n embedding_csv (str): Filepath of the csv file\n embedding_name (str): The name of the elemental representation\n\n \"\"\"\n # Need to add validation handling for csv files\n df = pd.read_csv(embedding_csv)\n elements = list(df[\"element\"])\n df = df.drop([\"element\"], axis=1)\n feature_labels = list(df.columns)\n embeds_array = df.to_numpy()\n embedding_data = {elements[i]: embeds_array[i] for i in range(len(embeds_array))}\n return Embedding(embedding_data, embedding_name, feature_labels)\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.from_json","title":"from_json(embedding_json, embedding_name=None)
staticmethod
","text":"Create an instance of the Embedding class from a json file.
embedding_json (str): Filepath of the json file\nembedding_name (str): The name of the elemental representation\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef from_json(embedding_json, embedding_name: str | None = None):\n \"\"\"Create an instance of the Embedding class from a json file.\n\n Args:\n ----\n embedding_json (str): Filepath of the json file\n embedding_name (str): The name of the elemental representation\n \"\"\"\n # Need to add validation handling for JSONs in different formats\n with open(embedding_json) as f:\n embedding_data = json.load(f)\n return Embedding(embedding_data, embedding_name)\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.load_data","title":"load_data(embedding_name=None)
staticmethod
","text":"Create an instance of the Embedding
class from a default embedding file.
The default embeddings are in the table below:
| Name | str_name |
|-------------------------|--------------|
| Magpie | magpie |
| Magpie (scaled) | magpie_sc |
| Mat2Vec | mat2vec |
| Matscholar | matscholar |
| Megnet (16 dimensions) | megnet16 |
| Modified Pettifor scale | mod_petti |
| Oliynyk | oliynyk |
| Oliynyk (scaled) | oliynyk_sc |
| Random (200 dimensions) | random_200 |
| SkipAtom | skipatom |
| Atomic Number | atomic |
| CrystaLLM | crystallm |
| XenonPy | xenonpy |
| Cgnf | cgnf |
embedding_name (str): The str_name of an embedding file.\n
Embedding :class:`Embedding` instance.\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef load_data(embedding_name: str | None = None):\n \"\"\"Create an instance of the `Embedding` class from a default embedding file.\n\n The default embeddings are in the table below:\n\n | **Name** | **str_name** |\n |-------------------------|--------------|\n | Magpie | magpie |\n | Magpie (scaled) | magpie_sc |\n | Mat2Vec | mat2vec |\n | Matscholar | matscholar |\n | Megnet (16 dimensions) | megnet16 |\n | Modified Pettifor scale | mod_petti |\n | Oliynyk | oliynyk |\n | Oliynyk (scaled) | oliynyk_sc |\n | Random (200 dimensions) | random_200 |\n | SkipAtom | skipatom |\n | Atomic Number | atomic |\n | CrystaLLM | crystallm |\n | XenonPy | xenonpy |\n | Cgnf | cgnf |\n\n\n Args:\n ----\n embedding_name (str): The str_name of an embedding file.\n\n Returns:\n -------\n Embedding :class:`Embedding` instance.\n \"\"\"\n if DEFAULT_ELEMENT_EMBEDDINGS[embedding_name].endswith(\".csv\"):\n return Embedding.from_csv(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n elif \"megnet\" in DEFAULT_ELEMENT_EMBEDDINGS[embedding_name]:\n return Embedding.from_json(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n ).remove_elements([\"Null\"])\n elif DEFAULT_ELEMENT_EMBEDDINGS[embedding_name].endswith(\".json\"):\n return Embedding.from_json(\n path.join(\n data_directory,\n \"element_representations\",\n DEFAULT_ELEMENT_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n else:\n return None\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.remove_elements","title":"remove_elements(elements, inplace=False)
","text":"Remove elements from the Embedding instance.
elements (str,list(str)): An element symbol or a list of element symbols\ninplace (bool): If True, elements are removed from the Embedding instance.\nIf False, the original embedding instance is unchanged\nand a new embedding instance with the elements removed is created.\n
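The two modes of the inplace flag can be sketched on a plain dictionary. This is an illustration under stated assumptions rather than the class itself: `remove_keys` is a hypothetical helper, and a plain dict stands in for the embedding.

```python
def remove_keys(embeddings, elements, inplace=False):
    # Accept a single symbol or a list of symbols.
    if isinstance(elements, str):
        elements = [elements]
    # Work on the original mapping, or on a shallow copy that leaves it untouched.
    target = embeddings if inplace else embeddings.copy()
    for el in elements:
        del target[el]  # raises KeyError if the symbol is absent
    return None if inplace else target


toy = {'H': [0.1], 'He': [0.2], 'Li': [0.3]}
trimmed = remove_keys(toy, ['He'])
print(sorted(trimmed), sorted(toy))
remove_keys(toy, 'H', inplace=True)
print(sorted(toy))
```

The copy is shallow, so the vectors themselves are shared between the original and the new mapping.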
Source code in src/elementembeddings/core.py
def remove_elements(self, elements: str | list[str], inplace: bool = False):\n # TO-DO allow removal by atomic numbers\n \"\"\"Remove elements from the Embedding instance.\n\n Args:\n ----\n elements (str,list(str)): An element symbol or a list of element symbols\n inplace (bool): If True, elements are removed from the Embedding instance.\n If false, the original embedding instance is unchanged\n and a new embedding instance with the elements removed is created.\n\n \"\"\"\n if inplace:\n if isinstance(elements, str):\n del self.embeddings[elements]\n elif isinstance(elements, list):\n for el in elements:\n del self.embeddings[el]\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n if isinstance(elements, str):\n del embeddings_copy[elements]\n elif isinstance(elements, list):\n for el in elements:\n del embeddings_copy[el]\n return Embedding(embeddings_copy, self.embedding_name)\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.standardise","title":"standardise(inplace=False)
","text":"Standardise the embeddings.
After standardisation, each component of the embedding has a mean of 0 and a standard deviation of 1 across the elements.
Source code in src/elementembeddings/core.py
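As a rough illustration of what standardisation does, the sketch below rescales each component across elements using only the standard library. The listed source uses scikit-learn's StandardScaler; `standardise_sketch` is a hypothetical stand-in, assuming the population standard deviation and a fallback of 1.0 for constant components.

```python
from statistics import fmean, pstdev


def standardise_sketch(embeddings):
    elements = list(embeddings)
    # Transpose so that each tuple holds one component across all elements.
    columns = list(zip(*(embeddings[el] for el in elements)))
    means = [fmean(col) for col in columns]
    # Population standard deviation; constant components fall back to 1.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return {
        el: [(x - m) / s for x, m, s in zip(embeddings[el], means, stds)]
        for el in elements
    }


toy = {'H': [1.0, 10.0], 'He': [3.0, 10.0], 'Li': [5.0, 10.0]}
scaled = standardise_sketch(toy)
print(scaled)
```

The scaling is applied per component (column), not per element vector.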
def standardise(self, inplace: bool = False):\n \"\"\"Standardise the embeddings.\n\n Mean is 0 and standard deviation is 1.\n\n \"\"\"\n if self._is_standardised():\n warnings.warn(\n \"Embedding is already standardised. \" \"Returning None and not changing the embedding.\",\n )\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n embeddings_array = np.array(list(embeddings_copy.values()))\n embeddings_array = StandardScaler().fit_transform(embeddings_array)\n for el, emb in zip(embeddings_copy.keys(), embeddings_array):\n embeddings_copy[el] = emb\n\n if inplace:\n self.embeddings = embeddings_copy\n self.is_standardised = True\n return None\n else:\n return Embedding(embeddings_copy, self.embedding_name)\n
"},{"location":"python_api/core/#elementembeddings.core.Embedding.to","title":"to(fmt='', filename='')
","text":"Output the embedding to a file.
fmt (str): The file format to output the embedding to.\n Options are \"json\" and \"csv\".\nfilename (str): The name of the output file.\n
(str) if filename not specified, otherwise None.\n
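How the format and filename arguments interact can be sketched as follows. The helper `resolve_output` is hypothetical; it mirrors the branching order of the listed source (json is checked before csv, and a matching filename extension is enough to select a format) without writing any file.

```python
from fnmatch import fnmatch


def resolve_output(fmt='', filename=''):
    fmt = fmt.lower()
    for ext in ('json', 'csv'):
        # Either an explicit format or a matching filename pattern selects the branch.
        if fmt == ext or fnmatch(filename, f'*.{ext}'):
            # Append the extension when a filename is given without it.
            if filename and not filename.endswith(f'.{ext}'):
                filename = f'{filename}.{ext}'
            return ext, filename
    raise ValueError(f'{fmt!s} is an invalid file format')


print(resolve_output('JSON', 'emb'))
print(resolve_output(filename='emb.csv'))
print(resolve_output('csv'))
```

An empty filename in the result corresponds to the case where the serialised string is returned instead of being written to disk.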
Source code in src/elementembeddings/core.py
def to(self, fmt: str = \"\", filename: str | None = \"\"):\n \"\"\"Output the embedding to a file.\n\n Args:\n ----\n fmt (str): The file format to output the embedding to.\n Options include \"json\" and \"csv\".\n filename (str): The name of the file to be outputted\n\n Returns:\n -------\n (str) if filename not specified, otherwise None.\n \"\"\"\n fmt = fmt.lower()\n\n if fmt == \"json\" or fnmatch.fnmatch(filename, \"*.json\"):\n j = json.dumps(self.embeddings, cls=NumpyEncoder)\n if filename:\n if not filename.endswith(\".json\"):\n filename = filename + \".json\"\n with open(filename, \"w\") as file:\n file.write(j)\n return None\n else:\n return j\n elif fmt == \"csv\" or fnmatch.fnmatch(filename, \"*.csv\"):\n if filename:\n if not filename.endswith(\".csv\"):\n filename = filename + \".csv\"\n self.as_dataframe().to_csv(filename, index_label=\"element\")\n return None\n else:\n return self.as_dataframe().to_csv(index_label=\"element\")\n\n else:\n msg = f\"{fmt!s} is an invalid file format\"\n raise ValueError(msg)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding","title":"SpeciesEmbedding
","text":" Bases: EmbeddingBase
Represent an ion representation.
To load an embedding distributed with the package, use the load_data() method.
Works like a standard Python dictionary. Each item is a {species: vector} pair.
Source code in src/elementembeddings/core.py
class SpeciesEmbedding(EmbeddingBase):\n \"\"\"Represent an ion representation.\n\n To load an embedding distributed from the package use the load_data() method.\n\n Works like a standard python dictionary. The keys are {species: vector} pairs.\n \"\"\"\n\n @staticmethod\n def load_data(embedding_name: str, include_neutral: bool = False):\n \"\"\"Create a `SpeciesEmbedding` from a preset embedding file.\n\n The default embeddings are in the table below:\n\n | **Name** | **str_name** |\n |-------------------------|--------------|\n | SkipSpecies (200 dim, MPv2022) | skipspecies |\n | SkipSpecies (induced, 200 dim, MPv2022) | skipspecies_induced |\n\n Args:\n ----\n embedding_name (str): The str_name of the species representation\n include_neutral (bool): If True, neutral species are\n included in the embedding\n\n Returns:\n -------\n SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n \"\"\"\n if DEFAULT_SPECIES_EMBEDDINGS[embedding_name].endswith(\".csv\"):\n embedding = SpeciesEmbedding.from_csv(\n path.join(\n data_directory,\n \"species_representations\",\n DEFAULT_SPECIES_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n if not include_neutral:\n embedding.remove_neutral_species(inplace=True)\n return embedding\n elif DEFAULT_SPECIES_EMBEDDINGS[embedding_name].endswith(\".json\"):\n embedding = SpeciesEmbedding.from_json(\n path.join(\n data_directory,\n \"species_representations\",\n DEFAULT_SPECIES_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n if not include_neutral:\n embedding.remove_neutral_species(inplace=True)\n return embedding\n else:\n return None\n\n @staticmethod\n def from_csv(csv_path, embedding_name: str | None = None):\n \"\"\"Create an instance of the SpeciesEmbedding class from a csv file.\n\n The first column of the csv file must contain the species and be named species.\n\n Args:\n ----\n csv_path (str): Filepath of the csv file\n embedding_name (str): The name of the species representation\n\n Returns:\n -------\n 
SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n\n \"\"\"\n # Need to add validation handling for csv files\n df = pd.read_csv(csv_path)\n species = list(df[\"species\"])\n df = df.drop([\"species\"], axis=1)\n feature_labels = list(df.columns)\n embeds_array = df.to_numpy()\n embedding_data = {species[i]: embeds_array[i] for i in range(len(embeds_array))}\n return SpeciesEmbedding(embedding_data, embedding_name, feature_labels)\n\n @staticmethod\n def from_json(json_path, embedding_name: str | None = None):\n \"\"\"Create an instance of the SpeciesEmbedding class from a json file.\n\n Args:\n ----\n json_path (str): Filepath of the json file\n embedding_name (str): The name of the species representation\n\n Returns:\n -------\n SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n\n \"\"\"\n # Need to add validation handling for json files\n with open(json_path) as f:\n embedding_data = json.load(f)\n return SpeciesEmbedding(embedding_data, embedding_name)\n\n @property\n def species_list(self) -> list:\n \"\"\"Return the species of the embedding.\"\"\"\n return list(self.embeddings.keys())\n\n @property\n def element_list(self) -> list:\n \"\"\"Return the elements of the embedding.\"\"\"\n return list({parse_species(species)[0] for species in self.species_list})\n\n def remove_neutral_species(self, inplace: bool = False):\n \"\"\"Remove neutral species from the SpeciesEmbedding instance.\n\n Args:\n ----\n inplace (bool): If True, neutral species are removed\n from the SpeciesEmbedding instance.\n If false, the original SpeciesEmbedding instance is unchanged\n and a new SpeciesEmbedding instance with the\n neutral species removed is created.\n\n \"\"\"\n neutral_species = [s for s in self.species_list if parse_species(s)[1] == 0]\n return self.remove_species(neutral_species, inplace)\n\n def get_element_oxi_states(self, el: str) -> list:\n \"\"\"Return the oxidation states for a given element.\n\n Args:\n ----\n el (str): An element symbol\n\n Returns:\n 
-------\n oxidation_states (list[int]): A list of oxidation states\n \"\"\"\n assert el in self.element_list, f\"There are no species of the element {el} in this SpeciesEmbedding\"\n parsed_species = [parse_species(species) for species in self.species_list]\n\n el_species_list = [species for species in parsed_species if species[0] == el]\n oxidation_states = [species[1] for species in el_species_list]\n return sorted(oxidation_states)\n\n def remove_species(self, species: str | list[str], inplace: bool = False):\n \"\"\"Remove species from the SpeciesEmbedding instance.\n\n Args:\n ----\n species (str,list(str)): A species or a list of species\n inplace (bool): If True, species are removed\n from the SpeciesEmbedding instance.\n If false, the original SpeciesEmbedding instance is unchanged\n and a new SpeciesEmbedding instance with the species removed is created.\n\n \"\"\"\n if inplace:\n if isinstance(species, str):\n try:\n del self.embeddings[species]\n except KeyError:\n warnings.warn(\n f\"{species} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n elif isinstance(species, list):\n for sp in species:\n try:\n del self.embeddings[sp]\n except KeyError:\n warnings.warn(\n f\"{sp} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n if isinstance(species, str):\n try:\n del embeddings_copy[species]\n except KeyError:\n warnings.warn(\n f\"{species} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n elif isinstance(species, list):\n for sp in species:\n try:\n del embeddings_copy[sp]\n except KeyError:\n warnings.warn(\n f\"{sp} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n return SpeciesEmbedding(embeddings_copy, self.embedding_name)\n\n @property\n def ion_type_dict(self) -> dict[str, str]:\n \"\"\"Return a dictionary of {species: ion type} pairs.\n\n e.g. 
{'Fe2+':'cation'}\n\n \"\"\"\n ion_dict = {}\n for species in self.species_list:\n el, charge = parse_species(species)\n if charge > 0:\n ion_dict[species] = \"Cation\"\n elif charge < 0:\n ion_dict[species] = \"Anion\"\n else:\n ion_dict[species] = \"Neutral\"\n\n return ion_dict\n\n @property\n def species_groups_dict(self) -> dict[str, str]:\n \"\"\"Return a dictionary of {species: element type} pairs.\n\n e.g. {'Fe2+':'transition metal'}\n\n \"\"\"\n with open(path.join(data_directory, \"element_data/element_group.json\")) as f:\n _dict = json.load(f)\n return {i: _dict[parse_species(i)[0]] for i in self.species_list}\n\n def distance_df(self, metric=\"euclidean\") -> pd.DataFrame:\n \"\"\"Return a dataframe of the distance between species.\n\n Args:\n ----\n metric (str): The metric to use to calculate the distance.\n Options are 'euclidean', 'cosine', 'manhattan' and 'chebyshev'.\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n \"\"\"\n return super().distance_df(metric).rename(mapper={\"ele_1\": \"species_1\", \"ele_2\": \"species_2\"}, axis=1)\n\n def correlation_df(self, metric: str = \"pearson\") -> pd.DataFrame:\n \"\"\"Return a dataframe of the correlation between species.\n\n Args:\n ----\n metric (str): The metric to use to calculate the correlation.\n Options are 'pearson' and 'spearman'.\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n\n \"\"\"\n return super().correlation_df(metric).rename(mapper={\"ele_1\": \"species_1\", \"ele_2\": \"species_2\"}, axis=1)\n\n def to(self, fmt: str = \"\", filename: str | None = \"\"):\n \"\"\"Output the embedding to a file.\n\n Args:\n ----\n fmt (str): The file format to output the embedding to.\n Options include \"json\" and \"csv\".\n filename (str): The name of the file to be outputted\n\n Returns:\n -------\n (str) if filename not specified, otherwise None.\n \"\"\"\n fmt = fmt.lower()\n\n if fmt == \"json\" or fnmatch.fnmatch(filename, \"*.json\"):\n j = 
json.dumps(self.embeddings, cls=NumpyEncoder)\n if filename:\n if not filename.endswith(\".json\"):\n filename = filename + \".json\"\n with open(filename, \"w\") as file:\n file.write(j)\n return None\n else:\n return j\n return None\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.element_list","title":"element_list: list
property
","text":"Return the elements of the embedding.
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.ion_type_dict","title":"ion_type_dict: dict[str, str]
property
","text":"Return a dictionary of {species: ion type} pairs.
e.g. {'Fe2+':'Cation'}
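The classification by charge sign can be sketched as below. The species parser here is a hypothetical stand-in that assumes strings such as Fe2+, O2- or Cl-, with anything unsigned treated as neutral; the package's own parse_species may accept other formats.

```python
import re

# Assumed species format: element symbol, optional magnitude, optional sign.
SPECIES = re.compile('([A-Z][a-z]*)([0-9.]*)([+-]?)')


def parse_species_sketch(species):
    match = SPECIES.fullmatch(species)
    if match is None:
        raise ValueError(f'Cannot parse species {species!r}')
    symbol, magnitude, sign = match.groups()
    if not sign:
        return symbol, 0
    value = float(magnitude) if magnitude else 1.0
    return symbol, value if sign == '+' else -value


def ion_types(species_list):
    # Label each species by the sign of its charge.
    result = {}
    for species in species_list:
        _, charge = parse_species_sketch(species)
        if charge > 0:
            result[species] = 'Cation'
        elif charge < 0:
            result[species] = 'Anion'
        else:
            result[species] = 'Neutral'
    return result


print(ion_types(['Fe2+', 'O2-', 'Cl-', 'Fe0']))
```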
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.species_groups_dict","title":"species_groups_dict: dict[str, str]
property
","text":"Return a dictionary of {species: element type} pairs.
e.g. {'Fe2+':'transition metal'}
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.species_list","title":"species_list: list
property
","text":"Return the species of the embedding.
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.correlation_df","title":"correlation_df(metric='pearson')
","text":"Return a dataframe of the correlation between species.
metric (str): The metric to use to calculate the correlation.\nOptions are 'pearson' and 'spearman'.\n
df (pandas.DataFrame): A pandas dataframe object\n
Source code in src/elementembeddings/core.py
def correlation_df(self, metric: str = \"pearson\") -> pd.DataFrame:\n \"\"\"Return a dataframe of the correlation between species.\n\n Args:\n ----\n metric (str): The metric to use to calculate the correlation.\n Options are 'pearson' and 'spearman'.\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n\n \"\"\"\n return super().correlation_df(metric).rename(mapper={\"ele_1\": \"species_1\", \"ele_2\": \"species_2\"}, axis=1)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.distance_df","title":"distance_df(metric='euclidean')
","text":"Return a dataframe of the distance between species.
metric (str): The metric to use to calculate the distance.\nOptions are 'euclidean', 'cosine', 'manhattan' and 'chebyshev'.\n
df (pandas.DataFrame): A pandas dataframe object\n
Source code in src/elementembeddings/core.py
def distance_df(self, metric=\"euclidean\") -> pd.DataFrame:\n \"\"\"Return a dataframe of the distance between species.\n\n Args:\n ----\n metric (str): The metric to use to calculate the distance.\n Options are 'euclidean', 'cosine', 'manhattan' and 'chebyshev'.\n\n Returns:\n -------\n df (pandas.DataFrame): A pandas dataframe object\n \"\"\"\n return super().distance_df(metric).rename(mapper={\"ele_1\": \"species_1\", \"ele_2\": \"species_2\"}, axis=1)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.from_csv","title":"from_csv(csv_path, embedding_name=None)
staticmethod
","text":"Create an instance of the SpeciesEmbedding class from a csv file.
The first column of the csv file must contain the species and be named species.
csv_path (str): Filepath of the csv file\nembedding_name (str): The name of the species representation\n
SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef from_csv(csv_path, embedding_name: str | None = None):\n \"\"\"Create an instance of the SpeciesEmbedding class from a csv file.\n\n The first column of the csv file must contain the species and be named species.\n\n Args:\n ----\n csv_path (str): Filepath of the csv file\n embedding_name (str): The name of the species representation\n\n Returns:\n -------\n SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n\n \"\"\"\n # Need to add validation handling for csv files\n df = pd.read_csv(csv_path)\n species = list(df[\"species\"])\n df = df.drop([\"species\"], axis=1)\n feature_labels = list(df.columns)\n embeds_array = df.to_numpy()\n embedding_data = {species[i]: embeds_array[i] for i in range(len(embeds_array))}\n return SpeciesEmbedding(embedding_data, embedding_name, feature_labels)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.from_json","title":"from_json(json_path, embedding_name=None)
staticmethod
","text":"Create an instance of the SpeciesEmbedding class from a json file.
json_path (str): Filepath of the json file\nembedding_name (str): The name of the species representation\n
SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef from_json(json_path, embedding_name: str | None = None):\n \"\"\"Create an instance of the SpeciesEmbedding class from a json file.\n\n Args:\n ----\n json_path (str): Filepath of the json file\n embedding_name (str): The name of the species representation\n\n Returns:\n -------\n SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n\n \"\"\"\n # Need to add validation handling for json files\n with open(json_path) as f:\n embedding_data = json.load(f)\n return SpeciesEmbedding(embedding_data, embedding_name)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.get_element_oxi_states","title":"get_element_oxi_states(el)
","text":"Return the oxidation states for a given element.
el (str): An element symbol\n
oxidation_states (list[int]): A list of oxidation states\n
Source code in src/elementembeddings/core.py
def get_element_oxi_states(self, el: str) -> list:\n \"\"\"Return the oxidation states for a given element.\n\n Args:\n ----\n el (str): An element symbol\n\n Returns:\n -------\n oxidation_states (list[int]): A list of oxidation states\n \"\"\"\n assert el in self.element_list, f\"There are no species of the element {el} in this SpeciesEmbedding\"\n parsed_species = [parse_species(species) for species in self.species_list]\n\n el_species_list = [species for species in parsed_species if species[0] == el]\n oxidation_states = [species[1] for species in el_species_list]\n return sorted(oxidation_states)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.load_data","title":"load_data(embedding_name, include_neutral=False)
staticmethod
","text":"Create a SpeciesEmbedding
from a preset embedding file.
The default embeddings are in the table below:
| Name | str_name |
|------|----------|
| SkipSpecies (200 dim, MPv2022) | skipspecies |
| SkipSpecies (induced, 200 dim, MPv2022) | skipspecies_induced |
embedding_name (str): The str_name of the species representation\ninclude_neutral (bool): If True, neutral species are\n included in the embedding\n
SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n
Source code in src/elementembeddings/core.py
@staticmethod\ndef load_data(embedding_name: str, include_neutral: bool = False):\n \"\"\"Create a `SpeciesEmbedding` from a preset embedding file.\n\n The default embeddings are in the table below:\n\n | **Name** | **str_name** |\n |-------------------------|--------------|\n | SkipSpecies (200 dim, MPv2022) | skipspecies |\n | SkipSpecies (induced, 200 dim, MPv2022) | skipspecies_induced |\n\n Args:\n ----\n embedding_name (str): The str_name of the species representation\n include_neutral (bool): If True, neutral species are\n included in the embedding\n\n Returns:\n -------\n SpeciesEmbedding :class:`SpeciesEmbedding` instance.\n \"\"\"\n if DEFAULT_SPECIES_EMBEDDINGS[embedding_name].endswith(\".csv\"):\n embedding = SpeciesEmbedding.from_csv(\n path.join(\n data_directory,\n \"species_representations\",\n DEFAULT_SPECIES_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n if not include_neutral:\n embedding.remove_neutral_species(inplace=True)\n return embedding\n elif DEFAULT_SPECIES_EMBEDDINGS[embedding_name].endswith(\".json\"):\n embedding = SpeciesEmbedding.from_json(\n path.join(\n data_directory,\n \"species_representations\",\n DEFAULT_SPECIES_EMBEDDINGS[embedding_name],\n ),\n embedding_name,\n )\n if not include_neutral:\n embedding.remove_neutral_species(inplace=True)\n return embedding\n else:\n return None\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.remove_neutral_species","title":"remove_neutral_species(inplace=False)
","text":"Remove neutral species from the SpeciesEmbedding instance.
inplace (bool): If True, neutral species are removed\n from the SpeciesEmbedding instance.\nIf false, the original SpeciesEmbedding instance is unchanged\nand a new SpeciesEmbedding instance with the\n neutral species removed is created.\n
Source code in src/elementembeddings/core.py
def remove_neutral_species(self, inplace: bool = False):\n \"\"\"Remove neutral species from the SpeciesEmbedding instance.\n\n Args:\n ----\n inplace (bool): If True, neutral species are removed\n from the SpeciesEmbedding instance.\n If false, the original SpeciesEmbedding instance is unchanged\n and a new SpeciesEmbedding instance with the\n neutral species removed is created.\n\n \"\"\"\n neutral_species = [s for s in self.species_list if parse_species(s)[1] == 0]\n return self.remove_species(neutral_species, inplace)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.remove_species","title":"remove_species(species, inplace=False)
","text":"Remove species from the SpeciesEmbedding instance.
species (str,list(str)): A species or a list of species\ninplace (bool): If True, species are removed\nfrom the SpeciesEmbedding instance.\nIf false, the original SpeciesEmbedding instance is unchanged\nand a new SpeciesEmbedding instance with the species removed is created.\n
Source code in src/elementembeddings/core.py
def remove_species(self, species: str | list[str], inplace: bool = False):\n \"\"\"Remove species from the SpeciesEmbedding instance.\n\n Args:\n ----\n species (str,list(str)): A species or a list of species\n inplace (bool): If True, species are removed\n from the SpeciesEmbedding instance.\n If false, the original SpeciesEmbedding instance is unchanged\n and a new SpeciesEmbedding instance with the species removed is created.\n\n \"\"\"\n if inplace:\n if isinstance(species, str):\n try:\n del self.embeddings[species]\n except KeyError:\n warnings.warn(\n f\"{species} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n elif isinstance(species, list):\n for sp in species:\n try:\n del self.embeddings[sp]\n except KeyError:\n warnings.warn(\n f\"{sp} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n return None\n else:\n embeddings_copy = self.embeddings.copy()\n if isinstance(species, str):\n try:\n del embeddings_copy[species]\n except KeyError:\n warnings.warn(\n f\"{species} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n elif isinstance(species, list):\n for sp in species:\n try:\n del embeddings_copy[sp]\n except KeyError:\n warnings.warn(\n f\"{sp} is not in the SpeciesEmbedding. \" \"Skipping this species.\",\n )\n return SpeciesEmbedding(embeddings_copy, self.embedding_name)\n
"},{"location":"python_api/core/#elementembeddings.core.SpeciesEmbedding.to","title":"to(fmt='', filename='')
","text":"Output the embedding to a file.
fmt (str): The file format to output the embedding to.\n Options include \"json\" and \"csv\".\nfilename (str): The name of the file to be outputted\n
(str) if filename not specified, otherwise None.\n
Source code in src/elementembeddings/core.py
def to(self, fmt: str = \"\", filename: str | None = \"\"):\n \"\"\"Output the embedding to a file.\n\n Args:\n ----\n fmt (str): The file format to output the embedding to.\n Options include \"json\" and \"csv\".\n filename (str): The name of the file to be outputted\n\n Returns:\n -------\n (str) if filename not specified, otherwise None.\n \"\"\"\n fmt = fmt.lower()\n\n if fmt == \"json\" or fnmatch.fnmatch(filename, \"*.json\"):\n j = json.dumps(self.embeddings, cls=NumpyEncoder)\n if filename:\n if not filename.endswith(\".json\"):\n filename = filename + \".json\"\n with open(filename, \"w\") as file:\n file.write(j)\n return None\n else:\n return j\n return None\n
"},{"location":"python_api/plotter/","title":"Plotter module","text":"Provides the plotting functions for visualising Embeddings.
"},{"location":"python_api/plotter/#elementembeddings.plotter.dimension_plotter","title":"dimension_plotter(embedding, ax=None, n_components=2, reducer='umap', adjusttext=True, reducer_params=None, scatter_params=None, include_species=None)
","text":"Plot the reduced dimensions of the embeddings.
embedding (Embedding): The embedding to be plotted.\nax (plt.axes, optional): The axes to plot on, by default None\nn_components (int): The number of components to reduce to, by default 2\nreducer (str): The dimensionality reduction algorithm to use, by default \"umap\"\nadjusttext (bool): Whether to avoid overlap of the text labels, by default True\nreducer_params (dict, optional): Additional keyword arguments to pass to\nthe reducer, by default None\nscatter_params (dict, optional): Additional keyword arguments to pass to\nthe scatterplot, by default None\ninclude_species (list, optional): The elements/species to include in the plot,\n
Source code in src/elementembeddings/plotter.py
def dimension_plotter(\n embedding: Embedding | SpeciesEmbedding,\n ax: plt.axes | None = None,\n n_components: int = 2,\n reducer: str = \"umap\",\n adjusttext: bool = True,\n reducer_params: dict | None = None,\n scatter_params: dict | None = None,\n include_species: list | None = None,\n):\n \"\"\"Plot the reduced dimensions of the embeddings.\n\n Args:\n ----\n embedding (Embedding): The embedding to be plotted.\n ax (plt.axes, optional): The axes to plot on, by default None\n n_components (int): The number of components to reduce to, by default 2\n reducer (str): The dimensionality reduction algorithm to use, by default \"umap\"\n adjusttext (bool): Whether to avoid overlap of the text labels, by default True\n reducer_params (dict, optional): Additional keyword arguments to pass to\n the reducer, by default None\n scatter_params (dict, optional): Additional keyword arguments to pass to\n the scatterplot, by default None\n include_species (list, optional): The elements/species to include in the plot,\n\n \"\"\"\n if reducer_params is None:\n reducer_params = {}\n if reducer == \"umap\":\n reduced = embedding.calculate_umap(n_components=n_components, **reducer_params)\n elif reducer == \"tsne\":\n reduced = embedding.calculate_tsne(n_components=n_components, **reducer_params)\n elif reducer == \"pca\":\n reduced = embedding.calculate_pca(n_components=n_components, **reducer_params)\n else:\n msg = \"Unrecognised reducer.\"\n raise ValueError(msg)\n\n if isinstance(embedding, Embedding):\n group_dict = embedding.element_groups_dict\n el_sp_array = np.array(embedding.element_list)\n\n data = {\n \"x\": reduced[:, 0],\n \"y\": reduced[:, 1],\n \"element\": el_sp_array,\n \"Group\": list(group_dict.values()),\n }\n elif isinstance(embedding, SpeciesEmbedding):\n group_dict = embedding.species_groups_dict\n el_sp_array = np.array(embedding.species_list)\n ion_type = embedding.ion_type_dict\n data = {\n \"x\": reduced[:, 0],\n \"y\": reduced[:, 1],\n \"element\": 
el_sp_array,\n \"Group\": list(group_dict.values()),\n \"ion_type\": list(ion_type.values()),\n }\n if reduced.shape[1] == 2:\n df = pd.DataFrame(data)\n if include_species:\n df = df[df[\"element\"].isin(include_species)].reset_index(drop=True)\n if not ax:\n fig, ax = plt.subplots()\n if scatter_params is None:\n scatter_params = {}\n if isinstance(embedding, SpeciesEmbedding):\n sns.scatterplot(\n data=df,\n x=\"x\",\n y=\"y\",\n hue=\"Group\",\n ax=ax,\n palette=ELEMENT_GROUPS_PALETTES,\n style=\"ion_type\",\n **scatter_params,\n )\n # Convert the species to (element, charge) format\n parsed_species = [parse_species(spec) for spec in df[\"element\"].tolist()]\n signs = [get_sign(charge) for _, charge in parsed_species]\n\n species_labels = [\n rf\"$\\mathregular{{{element}^{{{abs(charge)}{sign}}}}}$\"\n for (element, charge), sign in zip(parsed_species, signs)\n ]\n\n texts = [ax.text(df[\"x\"][i], df[\"y\"][i], species_labels[i], fontsize=12) for i in range(len(df))]\n elif isinstance(embedding, Embedding):\n sns.scatterplot(\n data=df,\n x=\"x\",\n y=\"y\",\n hue=\"Group\",\n ax=ax,\n palette=ELEMENT_GROUPS_PALETTES,\n **scatter_params,\n )\n texts = [ax.text(df[\"x\"][i], df[\"y\"][i], df[\"element\"][i], fontsize=12) for i in range(len(df))]\n ax.set_xlabel(\"Dimension 1\")\n ax.set_ylabel(\"Dimension 2\")\n if adjusttext:\n adjust_text(\n texts,\n arrowprops={\"arrowstyle\": \"-\", \"color\": \"gray\", \"lw\": 0.5},\n ax=ax,\n )\n\n elif reduced.shape[1] == 3:\n df = pd.DataFrame(\n {\n \"x\": reduced[:, 0],\n \"y\": reduced[:, 1],\n \"z\": reduced[:, 2],\n \"element\": el_sp_array,\n \"group\": list(group_dict.values()),\n },\n )\n if include_species:\n df = df[df[\"element\"].isin(include_species)].reset_index(drop=True)\n if not ax:\n fig = plt.figure() # noqa: F841\n ax = plt.axes(projection=\"3d\")\n ax.scatter3D(\n df[\"x\"],\n df[\"y\"],\n df[\"z\"],\n )\n ax.set_xlabel(\"Dimension 1\")\n ax.set_ylabel(\"Dimension 2\")\n ax.set_zlabel(\"Dimension 
3\")\n for i in range(len(df)):\n ax.text(df[\"x\"][i], df[\"y\"][i], df[\"z\"][i], df[\"element\"][i], fontsize=12)\n else:\n msg = \"Unrecognised number of dimensions.\"\n raise ValueError(msg)\n ax.set_title(embedding.embedding_name, fontdict={\"fontweight\": \"bold\"})\n return ax\n
"},{"location":"python_api/plotter/#elementembeddings.plotter.heatmap_plotter","title":"heatmap_plotter(embedding, metric, cmap='Blues', sortaxisby='mendeleev', ax=None, show_axislabels=True, **kwargs)
","text":"Plot multiple heatmaps of the embeddings.
embedding (Embedding): The embeddings to be plotted.\nmetric (str): The distance metric / similarity measure to be plotted.\ncmap (str): The colourmap for the heatmap.\nsortaxisby (str, optional): The attribute to sort the axis by,\nby default \"mendeleev_number\".\nOptions are \"mendeleev_number\", \"atomic_number\"\nax (plt.axes, optional): The axes to plot on, by default None\nshow_axislabels (bool, optional): Whether to show the axis, by default True\n**kwargs: Additional keyword arguments to pass to seaborn.heatmap\n
Source code in src/elementembeddings/plotter.py
def heatmap_plotter(\n embedding: Embedding | SpeciesEmbedding,\n metric: str,\n cmap: str = \"Blues\",\n sortaxisby: str = \"mendeleev\",\n ax: plt.axes | None = None,\n show_axislabels: bool = True,\n **kwargs,\n):\n \"\"\"Plot multiple heatmaps of the embeddings.\n\n Args:\n ----\n embedding (Embedding): The embeddings to be plotted.\n metric (str): The distance metric / similarity measure to be plotted.\n cmap (str): The colourmap for the heatmap.\n sortaxisby (str, optional): The attribute to sort the axis by,\n by default \"mendeleev_number\".\n Options are \"mendeleev_number\", \"atomic_number\"\n ax (plt.axes, optional): The axes to plot on, by default None\n show_axislabels (bool, optional): Whether to show the axis, by default True\n **kwargs: Additional keyword arguments to pass to seaborn.heatmap\n\n \"\"\"\n if not ax:\n fig, ax = plt.subplots()\n\n correlation_metrics = [\"spearman\", \"pearson\", \"cosine_similarity\"]\n distance_metrics = [\n \"euclidean\",\n \"manhattan\",\n \"cosine_distance\",\n \"chebyshev\",\n \"wasserstein\",\n \"energy\",\n ]\n if metric in correlation_metrics:\n p = embedding.correlation_pivot_table(metric=metric, sortby=sortaxisby)\n\n elif metric in distance_metrics:\n p = embedding.distance_pivot_table(metric=metric, sortby=sortaxisby)\n else:\n raise ValueError(\"Unrecognised metric.\")\n xlabels = [i[1] for i in p.index]\n ylabels = [i[1] for i in p.columns]\n sns.heatmap(\n p,\n cmap=cmap,\n square=\"True\",\n linecolor=\"k\",\n ax=ax,\n cbar_kws={\n \"shrink\": 0.5,\n },\n xticklabels=True,\n yticklabels=True,\n **kwargs,\n )\n ax.set_title(\n embedding.embedding_name,\n fontdict={\n \"fontweight\": \"bold\",\n },\n )\n if not show_axislabels:\n ax.set_xticklabels([])\n ax.set_yticklabels([])\n ax.set_xticks([])\n ax.set_yticks([])\n else:\n ax.set_xticklabels(\n xlabels,\n )\n ax.set_yticklabels(ylabels)\n ax.set_xlabel(\"\")\n ax.set_ylabel(\"\")\n return ax\n
"},{"location":"python_api/python_api/","title":"ElementEmbeddings Python package","text":"The core module of the ElementEmbeddings
package contains the Embedding
class, which is used to store and manipulate elemental representation data. This part of the project documentation provides the Python API for the ElementEmbeddings
package.
Core module
Composition module
Plotter module
"},{"location":"python_api/utils/io/","title":"io","text":"IO utils for AtomicEmbeddings.
"},{"location":"python_api/utils/io/#elementembeddings.utils.io.NumpyEncoder","title":"NumpyEncoder
","text":" Bases: JSONEncoder
Special json encoder for numpy types.
Source code in src/elementembeddings/utils/io.py
class NumpyEncoder(json.JSONEncoder):\n \"\"\"Special json encoder for numpy types.\"\"\"\n\n def default(self, obj):\n \"\"\"Encode numpy types.\"\"\"\n if isinstance(obj, np.ndarray):\n return obj.tolist()\n return json.JSONEncoder.default(self, obj)\n
"},{"location":"python_api/utils/io/#elementembeddings.utils.io.NumpyEncoder.default","title":"default(obj)
","text":"Encode numpy types.
Source code in src/elementembeddings/utils/io.py
def default(self, obj):\n \"\"\"Encode numpy types.\"\"\"\n if isinstance(obj, np.ndarray):\n return obj.tolist()\n return json.JSONEncoder.default(self, obj)\n
"},{"location":"python_api/utils/math/","title":"Math","text":"Math functions for the AtomicEmbeddings package.
"},{"location":"python_api/utils/math/#elementembeddings.utils.math.cosine_distance","title":"cosine_distance(a, b)
","text":"Cosine distance of two vectors.
Source code in src/elementembeddings/utils/math.py
def cosine_distance(\n a: list[int | float],\n b: list[int | float],\n) -> int | float:\n \"\"\"Cosine distance of two vectors.\"\"\"\n return 1 - cosine_similarity(a, b)\n
"},{"location":"python_api/utils/math/#elementembeddings.utils.math.cosine_similarity","title":"cosine_similarity(a, b)
","text":"Cosine similarity of two vectors.
Source code in src/elementembeddings/utils/math.py
def cosine_similarity(\n a: list[int | float],\n b: list[int | float],\n) -> int | float:\n \"\"\"Cosine similarity of two vectors.\"\"\"\n return dot(a, b) / ((dot(a, a) ** 0.5) * (dot(b, b) ** 0.5))\n
"},{"location":"python_api/utils/math/#elementembeddings.utils.math.dot","title":"dot(a, b)
","text":"Dot product of two vectors.
Source code in src/elementembeddings/utils/math.py
def dot(a: list[int | float], b: list[int | float]) -> int | float:\n \"\"\"Dot product of two vectors.\"\"\"\n return sum(map(operator.mul, a, b))\n
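To see how the three helpers on this page fit together, here is a small self-contained sketch that restates them as they appear in the source listings and applies them to made-up vectors. The vectors u, v and w are illustrative only and are not part of the package.

```python
import operator


def dot(a, b):
    # Sum of element-wise products
    return sum(map(operator.mul, a, b))


def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector norms
    return dot(a, b) / ((dot(a, a) ** 0.5) * (dot(b, b) ** 0.5))


def cosine_distance(a, b):
    # One minus the similarity, so identical directions give zero
    return 1 - cosine_similarity(a, b)


u = [1.0, 0.0]
v = [0.0, 1.0]  # orthogonal to u
w = [3.0, 0.0]  # parallel to u, different length

print(cosine_distance(u, v))
print(cosine_distance(u, w))
```

Orthogonal vectors give a distance of 1 and parallel vectors give 0 regardless of their lengths, so this measure compares direction only.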
"},{"location":"python_api/utils/species/","title":"Species","text":"Utilities for species.
"},{"location":"python_api/utils/species/#elementembeddings.utils.species.get_sign","title":"get_sign(charge)
","text":"Get string representation of a number's sign.
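Parameters:
| Name | Type | Description | Default |
|------|------|-------------|---------|
| charge | int | The number whose sign to derive. | required |
Returns:
| Name | Type | Description |
|------|------|-------------|
| sign | str | either '+', '-', or '' for neutral. |
Source code in src/elementembeddings/utils/species.py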
def get_sign(charge: int) -> str:\n \"\"\"Get string representation of a number's sign.\n\n Args:\n charge (int): The number whose sign to derive.\n\n Returns:\n sign (str): either '+', '-', or '' for neutral.\n\n \"\"\"\n if charge > 0:\n return \"+\"\n elif charge < 0:\n return \"-\"\n else:\n return \"\"\n
"},{"location":"python_api/utils/species/#elementembeddings.utils.species.parse_species","title":"parse_species(species)
","text":"Parse a species string into its atomic symbol and oxidation state.
:param species: the species string :return: a tuple of the atomic symbol and oxidation state
Source code in src/elementembeddings/utils/species.py
def parse_species(species: str) -> tuple[str, int]:\n \"\"\"\n Parse a species string into its atomic symbol and oxidation state.\n\n :param species: the species string\n :return: a tuple of the atomic symbol and oxidation state\n\n \"\"\"\n try:\n ele, oxi_state = re.match(r\"([A-Za-z]+)([0-9]*[\\+\\-])\", species).groups()\n if oxi_state[-1] in [\"+\", \"-\"]:\n charge = (int(oxi_state[:-1] or 1)) * (-1 if \"-\" in oxi_state else 1)\n return ele, charge\n else:\n return ele, 0\n except AttributeError:\n return _parse_species_old(species)\n
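The main branch of the listing above can be illustrated with a minimal sketch. It uses a hypothetical helper, parse_signed_species, that follows the same regular-expression logic for strings ending in a sign. The real function falls back to _parse_species_old when the pattern does not match; that helper is not shown here, so treating unmatched input as neutral is an assumption made only for this sketch.

```python
import re


def parse_signed_species(species):
    # Hypothetical helper: element symbol, optional digits, then a sign
    match = re.match(r'([A-Za-z]+)([0-9]*[+-])', species)
    if match is None:
        # Assumption for this sketch only: no sign means a neutral species
        return species, 0
    element, oxi_state = match.groups()
    magnitude = int(oxi_state[:-1] or 1)  # a bare sign means a charge of 1
    sign = -1 if oxi_state.endswith('-') else 1
    return element, sign * magnitude


for s in ['Fe2+', 'O2-', 'Cl-', 'Na+']:
    print(s, parse_signed_species(s))
```

A species written with only a sign, such as Cl- or Na+, is read as a charge of magnitude one.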
"},{"location":"tutorial/composition/","title":"Using the composition module","text":"In\u00a0[1]: Copied! import pandas as pd\nfrom elementembeddings.composition import composition_featuriser\nfrom elementembeddings.composition import CompositionalEmbedding\nimport numpy as np\n\nnp.set_printoptions(suppress=True)\nimport pandas as pd from elementembeddings.composition import composition_featuriser from elementembeddings.composition import CompositionalEmbedding import numpy as np np.set_printoptions(suppress=True)
The core class of the elementembeddings.composition
module is the CompositionalEmbedding
class. We can use this class to create objects which represent a composition and an elemental representation. We can create an instance of this class as follows:
CsPbI3_magpie = CompositionalEmbedding(formula='CsPbI3', embedding='magpie')\n
CsPbI3_magpie = CompositionalEmbedding(formula=\"CsPbI3\", embedding=\"magpie\")\n
We can access the elemental embeddings of the individual elements in the composition from the el_matrix
attribute.
>>> CsPbI3_magpie.el_matrix\n
# Print the individual element feature vectors\nprint(CsPbI3_magpie.el_matrix)\n
[[ 55. 5. 132.9054519 301.59 1. 6.\n 244. 0.79 1. 0. 0. 0.\n 1. 1. 0. 0. 0. 1.\n 115.765 0. 0. 229. ]\n [ 82. 81. 207.2 600.61 14. 6.\n 146. 2.33 2. 2. 10. 14.\n 28. 0. 4. 0. 0. 4.\n 28.11 0. 0. 225. ]\n [ 53. 96. 126.90447 386.85 17. 5.\n 139. 2.66 2. 5. 10. 0.\n 17. 0. 1. 0. 0. 1.\n 43.015 1.062 0. 64. ]]\n
Some properties which are accessible are the composition
and fractional composition
which are dictionaries of element:amount key:value pairs.
# Print the composition and the fractional composition\nprint(CsPbI3_magpie.composition)\nprint(CsPbI3_magpie.fractional_composition)\n
defaultdict(<class 'float'>, {'Cs': 1.0, 'Pb': 1.0, 'I': 3.0})\n{'Cs': 0.2, 'Pb': 0.2, 'I': 0.6}\n
Other accessible properties and attributes include the list of elements, the stoichiometry and normalised stoichiometry represented as vectors, and the number of atoms.
# Print the list of elements\nprint(CsPbI3_magpie.element_list)\n# Print the stoichiometric vector\nprint(CsPbI3_magpie.stoich_vector)\n\n# Print the normalized stoichiometric vector\nprint(CsPbI3_magpie.norm_stoich_vector)\n\n# Print the number of atoms\nprint(CsPbI3_magpie.num_atoms)\n
['Cs', 'Pb', 'I']\n[1. 1. 3.]\n[0.2 0.2 0.6]\n5.0\n
We can create composition-based feature vectors using the feature_vector
method.
>>> CsPbI3_magpie.feature_vector()\n
By default, this will return the weighted average of the elemental embeddings of the composition. This would have the same dimension as the individual elemental embeddings. We can also specify the type of feature vector we want to create by passing the stats
argument.
>>> CsPbI3_magpie.feature_vector(stats=['mean', 'variance'])\n
This would return a feature vector which is the concatenation of the mean and variance of the elemental embeddings of the composition. This would have twice the dimension of the individual elemental embeddings. In general, the dimension of the feature vector is the product of the dimension of the elemental embeddings and the number of statistics requested.
The available statistics are:
mean
variance
minpool
maxpool
sum
range
harmonic_mean
geometric_mean
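As a rough illustration of what the mean and variance statistics compute, the sketch below reproduces the first entry of each (the atomic-number feature of CsPbI3) with plain NumPy. It assumes both statistics are weighted by the fractional composition, which is consistent with the values printed in this tutorial. The names numbers and weights are illustrative and not part of the package.

```python
import numpy as np

# Atomic numbers of Cs, Pb and I (first column of el_matrix)
numbers = np.array([55.0, 82.0, 53.0])
# Fractional composition of CsPbI3
weights = np.array([0.2, 0.2, 0.6])

# Composition-weighted mean
mean = np.sum(weights * numbers)
# Composition-weighted variance about that mean
variance = np.sum(weights * (numbers - mean) ** 2)

# Two statistics of a 22-dimensional embedding give a 44-dimensional vector
n_features, n_stats = 22, 2
dimension = n_features * n_stats

print(round(mean, 2), round(variance, 2), dimension)
```

This gives 59.2 and 130.56, which match the first entries of the mean and variance blocks in the tutorial output, and the same product rule gives 110 entries when five statistics are requested.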
# Print the mean feature vector\nprint(CsPbI3_magpie.feature_vector(stats=\"mean\"))\n
[ 59.2 74.8 144.16377238 412.55 13.2\n 5.4 161.4 2.22 1.8 3.4\n 8. 2.8 16. 0.2 1.4\n 0. 0. 1.6 54.584 0.6372\n 0. 129.2 ]\n
print(CompositionalEmbedding(formula=\"NaCl\", embedding=\"magpie\").feature_vector())\n
[ 14. 48. 29.22138464 271.235 9.\n 3. 134. 2.045 1.5 2.5\n 0. 0. 4. 0.5 0.5\n 0. 0. 1. 26.87041667 1.2465\n 0. 146.5 ]\n
# Print the feature vector for the mean, variance, minpool, maxpool, and sum\nCsPbI3_magpie_cbfv = CsPbI3_magpie.feature_vector(\n stats=[\"mean\", \"variance\", \"minpool\", \"maxpool\", \"sum\"]\n)\nprint(f\"The dimension of the feature vector is {CsPbI3_magpie_cbfv.shape[0]}\")\n\nprint(CsPbI3_magpie_cbfv)\n
The dimension of the feature vector is 110\n[ 59.2 74.8 144.16377238 412.55 13.2\n 5.4 161.4 2.22 1.8 3.4\n 8. 2.8 16. 0.2 1.4\n 0. 0. 1.6 54.584 0.6372\n 0. 129.2 130.56 1251.76 998.7932657\n 9932.03104 38.56 0.24 1713.04 0.52756\n 0.16 4.24 16. 31.36 74.4\n 0.16 1.84 0. 0. 1.44\n 969.102544 0.27068256 0. 6378.16 53.\n 5. 126.90447 301.59 1. 5.\n 139. 0.79 1. 0. 0.\n 0. 1. 0. 0. 0.\n 0. 1. 28.11 0. 0.\n 64. 82. 96. 207.2 600.61\n 17. 6. 244. 2.66 2.\n 5. 10. 14. 28. 1.\n 4. 0. 0. 4. 115.765\n 1.062 0. 229. 296. 374.\n 720.8188619 2062.75 66. 27. 807.\n 11.1 9. 17. 40. 14.\n 80. 1. 7. 0. 0.\n 8. 272.92 3.186 0. 646. ]\n
We can also featurise multiple formulas at once using the composition_featuriser
function.
>>> composition_featuriser([\"CsPbI3\", \"Fe2O3\", \"NaCl\"], embedding='magpie')\n
This will return a numpy
array of the feature vectors of the compositions. The order of the feature vectors will be the same as the order of the formulas in the input list.
formulas = [\"CsPbI3\", \"Fe2O3\", \"NaCl\"]\n\ncomposition_featuriser(formulas, embedding=\"magpie\", stats=\"mean\")\n
\nOut[9]:
[array([ 59.2 , 74.8 , 144.16377238, 412.55 ,\n 13.2 , 5.4 , 161.4 , 2.22 ,\n 1.8 , 3.4 , 8. , 2.8 ,\n 16. , 0.2 , 1.4 , 0. ,\n 0. , 1.6 , 54.584 , 0.6372 ,\n 0. , 129.2 ]),\n array([ 15.2 , 74.2 , 31.93764 , 757.28 ,\n 12.8 , 2.8 , 92.4 , 2.796 ,\n 2. , 2.4 , 2.4 , 0. ,\n 6.8 , 0. , 1.2 , 1.6 ,\n 0. , 2.8 , 9.755 , 0. ,\n 0.84426512, 98.8 ]),\n array([ 14. , 48. , 29.22138464, 271.235 ,\n 9. , 3. , 134. , 2.045 ,\n 1.5 , 2.5 , 0. , 0. ,\n 4. , 0.5 , 0.5 , 0. ,\n 0. , 1. , 26.87041667, 1.2465 ,\n 0. , 146.5 ])]\n
df = pd.DataFrame({\"formula\": formulas})\ncomposition_featuriser(df, embedding=\"magpie\", stats=[\"mean\", \"sum\"])\n
Featurising compositions...\n
Computing feature vectors...\n
\nOut[10]: formula mean_Number mean_MendeleevNumber mean_AtomicWeight mean_MeltingT mean_Column mean_Row mean_CovalentRadius mean_Electronegativity mean_NsValence ... sum_NValence sum_NsUnfilled sum_NpUnfilled sum_NdUnfilled sum_NfUnfilled sum_NUnfilled sum_GSvolume_pa sum_GSbandgap sum_GSmagmom sum_SpaceGroupNumber 0 CsPbI3 59.2 74.8 144.163772 412.550 13.2 5.4 161.4 2.220 1.8 ... 80.0 1.0 7.0 0.0 0.0 8.0 272.920000 3.186 0.000000 646.0 1 Fe2O3 15.2 74.2 31.937640 757.280 12.8 2.8 92.4 2.796 2.0 ... 34.0 0.0 6.0 8.0 0.0 14.0 48.775000 0.000 4.221326 494.0 2 NaCl 14.0 48.0 29.221385 271.235 9.0 3.0 134.0 2.045 1.5 ... 8.0 1.0 1.0 0.0 0.0 2.0 53.740833 2.493 0.000000 293.0
3 rows \u00d7 45 columns
We can also calculate the \"distance\" between two compositions using their feature vectors. This can be used to determine which compositions are more similar to each other.
print(\n f\"The euclidean distance between CsPbI3 and Fe2O3 is {CsPbI3_magpie.distance('Fe2O3', distance_metric='euclidean', stats='mean'):.2f}\"\n)\nprint(\n f\"The euclidean distance between CsPbI3 and NaCl is {CsPbI3_magpie.distance('NaCl', distance_metric='euclidean', stats='mean'):.2f}\"\n)\nprint(\n f\"The euclidean distance between CsPbI3 and CsPbCl3 is {CsPbI3_magpie.distance('CsPbCl3', distance_metric='euclidean', stats='mean'):.2f}\"\n)\n
The euclidean distance between CsPbI3 and Fe2O3 is 375.77\nThe euclidean distance between CsPbI3 and NaCl is 194.94\nThe euclidean distance between CsPbI3 and CsPbCl3 is 144.39\n
Based on the mean-pooled feature vectors, we can see that CsPbI3 and CsPbCl3 are more similar to each other than CsPbI3 and Fe2O3.
"},{"location":"tutorial/composition/#using-the-composition-module","title":"Using the composition module\u00b6","text":""},{"location":"tutorial/species/","title":"Interacting with ionic species representations using ElementEmbeddings","text":"In\u00a0[1]: Copied!from elementembeddings.core import SpeciesEmbedding\nfrom elementembeddings.composition import (\n SpeciesCompositionalEmbedding,\n species_composition_featuriser,\n)\nfrom elementembeddings.core import SpeciesEmbedding from elementembeddings.composition import ( SpeciesCompositionalEmbedding, species_composition_featuriser, )
Elements are the building blocks of chemistry, but species (elements in a given charge state) dictate the structure and properties of inorganic compounds.
For example, the local spin and atomic environment in Fe(s), FeO, Fe2O3, and Fe3O4 solids are different due to variations in the charge state and coordination of iron.
For composition-only machine learning, there are many representation schemes that enable us to represent compounds as vectors built on embeddings of elements. However, these schemes are limited when we want to represent ionic species, because the charge state of the element is not taken into account. As such, we need to represent the ionic species themselves as vectors.
The ElementEmbeddings package contains a set of pre-trained embeddings for elements and ionic species, which can be used to represent ionic species in a vector space.
At the time of writing, the 200-dimension SkipSpecies vector embeddings are available for ionic species representations. These embeddings are trained using the Skip-gram model on a large dataset of inorganic compounds.
In\u00a0[2]: Copied!# Load the SkipSpecies vectors as a SpeciesEmbedding object\n\nskipspecies = SpeciesEmbedding.load_data(embedding_name=\"skipspecies\")\n\n\nprint(\"Below is the representation of Fe3+ using the SkipSpecies vectors.\")\n\nprint(skipspecies.embeddings[\"Fe3+\"])\n# Load the SkipSpecies vectors as a SpeciesEmbedding object skipspecies = SpeciesEmbedding.load_data(embedding_name=\"skipspecies\") print(\"Below is the representation of Fe3+ using the SkipSpecies vectors.\") print(skipspecies.embeddings[\"Fe3+\"])
Below is the representation of Fe3+ using the SkipSpecies vectors.\n[-3.46536078e-02 -3.23320180e-02 -6.41056001e-02 -6.64595328e-03\n -3.81412022e-02 -9.60185826e-02 -1.92383174e-02 -2.02107765e-02\n 8.79131556e-02 9.14798677e-02 -3.54749635e-02 -1.33267939e-01\n -1.77447721e-01 -9.33702961e-02 -7.14094117e-02 -6.68478478e-03\n -1.49846703e-01 3.65290008e-02 -1.11083306e-01 2.04584867e-01\n -7.30767250e-02 7.07381591e-02 1.29051596e-01 8.26864019e-02\n -3.41298096e-02 1.55206323e-01 5.24081439e-02 7.91398287e-02\n 1.86461732e-02 1.88235074e-01 1.51956931e-01 1.14296928e-01\n -1.12691864e-01 6.95107281e-02 -1.16133653e-01 -1.42861262e-01\n -3.24610062e-02 -6.37443736e-02 9.47019458e-02 -7.04379454e-02\n 1.51012568e-02 -6.04141466e-02 -7.57871270e-02 6.90726042e-02\n -3.73109318e-02 -1.04284994e-01 -7.36037940e-02 -3.05999294e-02\n -4.32690326e-03 -6.09171018e-02 1.28173083e-02 4.53064829e-01\n 4.73245084e-02 -1.39801240e+00 -1.01322591e-01 -1.62838653e-01\n -4.33158763e-02 -1.32046595e-01 1.88525077e-02 -9.60192643e-03\n -5.94866455e-01 1.12727061e-01 1.86967605e-03 8.49850774e-02\n 1.26277655e-01 -5.00426851e-02 -4.56427746e-02 -3.25046569e-01\n 1.37247995e-01 -9.46224555e-02 7.27631105e-03 -5.33877499e-02\n -3.18312906e-02 -8.66127461e-02 -1.40548006e-01 6.63848501e-03\n 6.23855107e-02 1.06035680e-01 -1.68600217e-01 -1.79605886e-01\n -9.72149730e-01 1.33717686e-01 -5.84784038e-02 -1.49619198e+00\n 1.86823923e-02 7.76157603e-02 -5.89469783e-02 -9.49078351e-02\n -1.11909047e-01 3.17605101e-02 5.79413511e-02 1.40282623e-02\n 7.69326091e-02 -1.12443836e-02 -8.67934301e-02 -6.59158587e-01\n 9.15968940e-02 -3.47942114e-01 -9.98707302e-03 -4.93343398e-02\n 7.81614780e-02 1.12851635e-01 2.69402359e-02 1.41710088e-01\n 5.72816245e-02 1.60002038e-01 -2.57115781e-01 -1.09435096e-01\n -4.88008857e-02 5.72116769e-05 -1.07527770e-01 5.56552038e-02\n 7.56548047e-02 8.72470587e-02 -1.57128468e-01 -1.33189365e-01\n -1.06330979e+00 -5.80653787e-01 -7.17684031e-02 -3.73947710e-01\n 
1.13771893e-02 -1.42221987e-01 -1.48932025e-01 -2.07824185e-02\n 3.69309634e-02 1.27229178e-02 4.40038621e-01 -1.32923722e-01\n -1.88622907e-01 2.58340001e-01 2.99438331e-02 1.02058776e-01\n 1.04237549e-01 -9.04425755e-02 2.39991665e-01 8.11270997e-02\n -2.99125281e-03 2.83314623e-02 -2.62917858e-02 7.42266746e-03\n -5.04185539e-03 -4.37292382e-02 1.17831230e-01 -4.98771993e-03\n 1.18534625e-01 1.53611377e-01 5.65077439e-02 -1.91291913e-01\n -9.52507034e-02 -8.89603943e-02 2.01912194e-01 1.17760837e-01\n -2.85485648e-02 -9.52739790e-02 1.49672581e-02 -7.14538768e-02\n 4.95206676e-02 3.00312508e-02 8.33884105e-02 9.99914482e-02\n -9.40189809e-02 -4.94113080e-02 5.30362427e-02 -3.15267175e-01\n -3.44095714e-02 1.56485736e-02 2.91987918e-02 -7.36336783e-02\n -1.27800524e-01 5.92167228e-02 1.07430264e-01 5.31437919e-02\n -1.76421866e-01 2.23079890e-01 7.48595372e-02 -5.39487004e-01\n 5.16922653e-01 1.29015148e-01 4.36748080e-02 -5.45317074e-03\n 1.46122992e-01 -7.71054178e-02 3.18054631e-02 -4.02254723e-02\n -7.62721375e-02 5.14244894e-03 -6.23153821e-02 -6.00104272e-01\n 6.64846972e-02 6.28835186e-02 -1.06045604e-01 -1.76288888e-01\n -4.96284366e-02 -7.97898546e-02 7.50872344e-02 -5.45614585e-03\n -6.50706142e-02 -2.17388973e-01 -3.25618118e-01 4.77024205e-02]\n
We can check the ionic species which have a feature vector for a particular embedding
In\u00a0[3]: Copied!print(\"SkipSpecies has feature vectors for the following ionic species:\\n\")\nprint(skipspecies.species_list)\nprint(\"SkipSpecies has feature vectors for the following ionic species:\\n\") print(skipspecies.species_list)
SkipSpecies has feature vectors for the following ionic species:\n\n['H+', 'H-', 'Li+', 'Be2+', 'B+', 'B2+', 'B2-', 'B3-', 'B3+', 'B-', 'C4-', 'C-', 'C4+', 'C+', 'C2+', 'C3+', 'C2-', 'C3-', 'N3-', 'N2+', 'N3+', 'N-', 'N+', 'N2-', 'N5+', 'N4+', 'O2-', 'O-', 'F-', 'Na+', 'Mg2+', 'Al3+', 'Al2+', 'Si2+', 'Si4+', 'Si-', 'Si2-', 'Si4-', 'Si3+', 'Si3-', 'P5+', 'P2-', 'P3-', 'P4+', 'P+', 'P-', 'P3+', 'P2+', 'S2-', 'S6+', 'S-', 'S2+', 'S3+', 'S+', 'S4+', 'S5+', 'Cl-', 'Cl7+', 'Cl5+', 'Cl3+', 'K+', 'Ca2+', 'Sc3+', 'Sc+', 'Sc2+', 'Ti3+', 'Ti4+', 'Ti2+', 'V4+', 'V3+', 'V2+', 'V5+', 'Cr3+', 'Cr2+', 'Cr6+', 'Cr4+', 'Cr5+', 'Mn2+', 'Mn3+', 'Mn4+', 'Mn+', 'Mn7+', 'Mn6+', 'Mn5+', 'Fe2+', 'Fe3+', 'Fe+', 'Fe4+', 'Fe6+', 'Fe5+', 'Co2+', 'Co4+', 'Co3+', 'Co+', 'Ni2+', 'Ni4+', 'Ni3+', 'Ni+', 'Cu2+', 'Cu3+', 'Cu+', 'Zn2+', 'Ga+', 'Ga3+', 'Ga4+', 'Ga2+', 'Ge4-', 'Ge4+', 'Ge2-', 'Ge2+', 'Ge3+', 'As-', 'As2-', 'As3+', 'As5+', 'As3-', 'As+', 'As2+', 'As4+', 'Se2-', 'Se-', 'Se4+', 'Se6+', 'Se5+', 'Se2+', 'Se+', 'Se3+', 'Br-', 'Br+', 'Br2+', 'Br5+', 'Br3+', 'Rb+', 'Sr2+', 'Y3+', 'Y2+', 'Y+', 'Zr2+', 'Zr4+', 'Zr3+', 'Zr+', 'Nb5+', 'Nb3+', 'Nb4+', 'Nb2+', 'Nb+', 'Nb7+', 'Mo3+', 'Mo4+', 'Mo6+', 'Mo5+', 'Mo2+', 'Tc-', 'Tc4+', 'Tc3-', 'Tc3+', 'Tc+', 'Tc7+', 'Tc5+', 'Tc6+', 'Tc2-', 'Tc2+', 'Ru2+', 'Ru6+', 'Ru4+', 'Ru5+', 'Ru3+', 'Rh+', 'Rh4+', 'Rh3+', 'Pd2+', 'Pd4+', 'Pd3+', 'Ag3+', 'Ag+', 'Ag2+', 'Cd2+', 'In3+', 'In+', 'In2+', 'Sn4+', 'Sn3+', 'Sn2+', 'Sb5+', 'Sb2-', 'Sb3-', 'Sb3+', 'Sb4+', 'Sb-', 'Sb+', 'Te-', 'Te2-', 'Te4+', 'Te6+', 'Te2+', 'Te5+', 'Te+', 'I-', 'I3+', 'I7+', 'I5+', 'I+', 'I2+', 'Cs+', 'Ba2+', 'La3+', 'La2+', 'La+', 'Ce3+', 'Ce2+', 'Ce4+', 'Pr3+', 'Pr4+', 'Pr2+', 'Nd3+', 'Nd2+', 'Pm3+', 'Sm3+', 'Sm2+', 'Eu2+', 'Eu3+', 'Gd2+', 'Gd3+', 'Tb3+', 'Tb+', 'Tb2+', 'Tb4+', 'Dy3+', 'Dy2+', 'Ho3+', 'Ho2+', 'Er3+', 'Tm3+', 'Tm2+', 'Yb3+', 'Yb2+', 'Lu3+', 'Hf3+', 'Hf2+', 'Hf4+', 'Ta5+', 'Ta3+', 'Ta4+', 'Ta+', 'Ta2+', 'W6+', 'W4+', 'W2+', 'W3+', 'W5+', 'Re5+', 'Re3+', 'Re6+', 'Re2+', 'Re4+', 
'Re7+', 'Os7+', 'Os6+', 'Os5+', 'Os2-', 'Os3+', 'Os-', 'Os4+', 'Os8+', 'Os2+', 'Os+', 'Ir3+', 'Ir4+', 'Ir5+', 'Ir6+', 'Pt2+', 'Pt2-', 'Pt4+', 'Pt3+', 'Pt5+', 'Pt-', 'Pt6+', 'Pt+', 'Au-', 'Au2+', 'Au+', 'Au3+', 'Au5+', 'Au4+', 'Hg2+', 'Hg+', 'Tl+', 'Tl3+', 'Tl2+', 'Pb2+', 'Pb3+', 'Pb4+', 'Bi3+', 'Bi5+', 'Bi2+', 'Bi3-', 'Bi4+', 'Bi+', 'Ac3+', 'Th4+', 'Th3+', 'Pa4+', 'Pa5+', 'Pa3+', 'U6+', 'U4+', 'U3+', 'U2+', 'U5+', 'Np6+', 'Np4+', 'Np3+', 'Np7+', 'Np5+', 'Pu7+', 'Pu6+', 'Pu3+', 'Pu4+', 'Pu5+']\n
We can also check which elements have an ionic species representation in the embedding
In\u00a0[4]: Copied!print(\"The folliowing elements have SkipSpecies ionic species representations:\\n\")\nprint(skipspecies.element_list)\nprint(\"The folliowing elements have SkipSpecies ionic species representations:\\n\") print(skipspecies.element_list)
The folliowing elements have SkipSpecies ionic species representations:\n\n['Ge', 'W', 'Os', 'Np', 'La', 'F', 'Ho', 'Mo', 'Ag', 'Ca', 'Li', 'Ba', 'Ru', 'Pm', 'Ce', 'Zn', 'Na', 'Pd', 'Nd', 'I', 'Cl', 'Tc', 'Ac', 'Y', 'Rb', 'Tb', 'Mn', 'V', 'Re', 'Ti', 'Nb', 'Dy', 'Au', 'Cr', 'C', 'Ta', 'S', 'B', 'Sb', 'Sn', 'Br', 'Ni', 'H', 'Pa', 'Mg', 'Cu', 'Ga', 'Sr', 'Yb', 'Pu', 'O', 'Ir', 'Se', 'Tm', 'Eu', 'Fe', 'N', 'Co', 'Te', 'Cs', 'Si', 'Gd', 'Th', 'Rh', 'Be', 'Pr', 'Sm', 'Pt', 'K', 'In', 'Tl', 'U', 'P', 'Zr', 'Pb', 'Er', 'As', 'Al', 'Sc', 'Cd', 'Hf', 'Hg', 'Bi', 'Lu']\n
Like the element representations, BibTex citation information is available for the ionic species embeddings.
In\u00a0[5]: Copied!print(skipspecies.citation())\nprint(skipspecies.citation())
['@article{Onwuli_Butler_Walsh_2024, title={Ionic species representations for materials informatics}, DOI={10.26434/chemrxiv-2024-8621l}, journal={ChemRxiv}, author={Onwuli, Anthony and Butler, Keith T. and Walsh, Aron}, year={2024}} This content is a preprint and has not been peer-reviewed.', '@article{antunes2022distributed,title={Distributed representations of atoms and materials for machine learning},author={Antunes, Luis M and Grau-Crespo, Ricardo and Butler, Keith T},journal={npj Computational Materials},volume={8},number={1},pages={1--9},year={2022},publisher={Nature Publishing Group} }']\nIn\u00a0[6]: Copied!
composition = {\"Fe2+\": 1, \"Fe3+\": 2, \"O2-\": 4}\n\nFe3O4_skipspecies = SpeciesCompositionalEmbedding(\n formula_dict=composition, embedding=skipspecies\n)\ncomposition = {\"Fe2+\": 1, \"Fe3+\": 2, \"O2-\": 4} Fe3O4_skipspecies = SpeciesCompositionalEmbedding( formula_dict=composition, embedding=skipspecies )
A few properties are accessible from the SpeciesCompositionalEmbedding
class
# Print the pretty formula\n\nprint(Fe3O4_skipspecies.formula_pretty)\n\n# Print the list of elements in the composition\nprint(Fe3O4_skipspecies.element_list)\n# Print the list of ionic species in the composition\nprint(Fe3O4_skipspecies.species_list)\n\n\n# Print the stoichiometric vector of the composition\nprint(Fe3O4_skipspecies.stoich_vector)\n\n# Print the normalised stoichiometric vector of the composition\nprint(Fe3O4_skipspecies.norm_stoich_vector)\n\n# Print the number of atoms\nprint(Fe3O4_skipspecies.num_atoms)\n# Print the pretty formula print(Fe3O4_skipspecies.formula_pretty) # Print the list of elements in the composition print(Fe3O4_skipspecies.element_list) # Print the list of ionic species in the composition print(Fe3O4_skipspecies.species_list) # Print the stoichiometric vector of the composition print(Fe3O4_skipspecies.stoich_vector) # Print the normalised stoichiometric vector of the composition print(Fe3O4_skipspecies.norm_stoich_vector) # Print the number of atoms print(Fe3O4_skipspecies.num_atoms)
Fe3O4\n['O', 'Fe']\n['Fe2+', 'Fe3+', 'O2-']\n[1 2 4]\n[0.14285714 0.28571429 0.57142857]\n7\nIn\u00a0[8]: Copied!
compositions = [\n {\"Fe2+\": 1, \"Fe3+\": 2, \"O2-\": 4},\n {\"Fe3+\": 2, \"O2-\": 3},\n {\"Li+\": 7, \"La3+\": 3, \"Zr4+\": 1, \"O2-\": 12},\n {\"Cs+\": 1, \"Pb2+\": 1, \"I-\": 3},\n {\"Pb2+\": 1, \"Pb4+\": 1, \"O2-\": 3},\n]\n\nfeaturised_comps_df = species_composition_featuriser(\n data=compositions, embedding=\"skipspecies\", stats=\"mean\", to_dataframe=True\n)\n\nfeaturised_comps_df\ncompositions = [ {\"Fe2+\": 1, \"Fe3+\": 2, \"O2-\": 4}, {\"Fe3+\": 2, \"O2-\": 3}, {\"Li+\": 7, \"La3+\": 3, \"Zr4+\": 1, \"O2-\": 12}, {\"Cs+\": 1, \"Pb2+\": 1, \"I-\": 3}, {\"Pb2+\": 1, \"Pb4+\": 1, \"O2-\": 3}, ] featurised_comps_df = species_composition_featuriser( data=compositions, embedding=\"skipspecies\", stats=\"mean\", to_dataframe=True ) featurised_comps_df
\rComputing feature vectors: 0%| | 0/5 [00:00<?, ?it/s]
\rComputing feature vectors: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 5/5 [00:00<00:00, 10727.12it/s]
\nOut[8]: formula composition mean_0 mean_1 mean_2 mean_3 mean_4 mean_5 mean_6 mean_7 ... mean_190 mean_191 mean_192 mean_193 mean_194 mean_195 mean_196 mean_197 mean_198 mean_199 0 Fe3O4 {'Fe2+': 1, 'Fe3+': 2, 'O2-': 4} -0.018255 0.001659 -0.009839 0.005230 -0.010928 -0.057023 -0.002567 -0.005813 ... -0.037202 -0.008057 -0.027421 -0.008534 -0.009001 0.002369 0.017834 -0.055822 -0.219390 0.020507 1 Fe2O3 {'Fe3+': 2, 'O2-': 3} -0.036597 -0.009373 -0.013700 -0.015516 -0.020896 -0.071463 0.002221 -0.014784 ... -0.045530 -0.024589 -0.037825 -0.025545 0.010654 -0.002034 -0.001094 -0.096479 -0.211483 0.035755 2 Li7La3ZrO12 {'Li+': 7, 'La3+': 3, 'Zr4+': 1, 'O2-': 12} -0.031236 -0.015952 -0.018968 -0.029273 -0.005297 -0.035049 0.045972 -0.032007 ... -0.042820 0.045177 -0.056733 0.006726 0.017449 -0.023732 0.021772 -0.034134 -0.102773 0.061038 3 CsPbI3 {'Cs+': 1, 'Pb2+': 1, 'I-': 3} -0.002381 0.023988 -0.026468 -0.020235 -0.002876 -0.033317 0.076300 -0.069057 ... 0.055368 0.058231 -0.079549 -0.032172 -0.076099 -0.024554 0.108428 -0.058528 -0.055804 -0.031679 4 Pb2O3 {'Pb2+': 1, 'Pb4+': 1, 'O2-': 3} -0.077403 -0.015334 0.023065 -0.060073 -0.043160 -0.140865 0.067917 -0.044093 ... 0.038975 0.102474 -0.051598 0.001011 -0.131225 -0.026707 0.145250 -0.057493 -0.188810 0.055239
5 rows \u00d7 202 columns
In\u00a0[9]: Copied!print(\n f\"The euclidean distance between Fe3O4 and Fe2O3 is {Fe3O4_skipspecies.distance({'Fe3+': 2, 'O2-': 3}, distance_metric='euclidean', stats='mean'):.2f}\"\n)\nprint(\n f\"The euclidean distance between Fe3O4 and Pb2O3 is {Fe3O4_skipspecies.distance({'Pb2+': 1, 'Pb4+': 1, 'O2-': 3}, distance_metric='euclidean', stats='mean'):.2f}\"\n)\nprint(\n f\"The euclidean distance between Fe3O4 and CsPbI3 is {Fe3O4_skipspecies.distance({'Cs+': 1, 'Pb2+': 1, 'I-': 3},distance_metric='euclidean', stats='mean'):.2f}\"\n)\nprint( f\"The euclidean distance between Fe3O4 and Fe2O3 is {Fe3O4_skipspecies.distance({'Fe3+': 2, 'O2-': 3}, distance_metric='euclidean', stats='mean'):.2f}\" ) print( f\"The euclidean distance between Fe3O4 and Pb2O3 is {Fe3O4_skipspecies.distance({'Pb2+': 1, 'Pb4+': 1, 'O2-': 3}, distance_metric='euclidean', stats='mean'):.2f}\" ) print( f\"The euclidean distance between Fe3O4 and CsPbI3 is {Fe3O4_skipspecies.distance({'Cs+': 1, 'Pb2+': 1, 'I-': 3},distance_metric='euclidean', stats='mean'):.2f}\" )
The euclidean distance between Fe3O4 and Fe2O3 is 0.38\nThe euclidean distance between Fe3O4 and Pb2O3 is 1.60\nThe euclidean distance between Fe3O4 and CsPbI3 is 2.11\n
Based on the mean-pooled feature vectors, we can see that Fe3O4 is closer to Fe2O3 than to either Pb2O3 or CsPbI3.
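The comparison above can be sketched without the package. The snippet below is only an illustration: the three-dimensional vectors are made-up toy values rather than SkipSpecies data, and it assumes that mean pooling is a stoichiometry-weighted average of the species vectors, which may differ in detail from what the package does internally. The names toy_vectors, mean_pool and euclidean are hypothetical helpers.

```python
import math

# Toy 3-dimensional species vectors (hypothetical values, NOT real SkipSpecies data).
toy_vectors = {
    'Fe2+': [0.2, 0.1, -0.3],
    'Fe3+': [0.3, 0.0, -0.4],
    'O2-': [-0.5, 0.4, 0.1],
    'Pb2+': [0.9, -0.6, 0.2],
    'Pb4+': [1.1, -0.8, 0.3],
}


def mean_pool(composition):
    """Stoichiometry-weighted mean of the species vectors (assumed pooling scheme)."""
    total = sum(composition.values())
    dim = len(next(iter(toy_vectors.values())))
    pooled = [0.0] * dim
    for species, amount in composition.items():
        weight = amount / total  # normalised stoichiometric coefficient
        for i, value in enumerate(toy_vectors[species]):
            pooled[i] += weight * value
    return pooled


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


fe3o4 = mean_pool({'Fe2+': 1, 'Fe3+': 2, 'O2-': 4})
fe2o3 = mean_pool({'Fe3+': 2, 'O2-': 3})
pb2o3 = mean_pool({'Pb2+': 1, 'Pb4+': 1, 'O2-': 3})

print(f'Fe3O4 vs Fe2O3: {euclidean(fe3o4, fe2o3):.2f}')
print(f'Fe3O4 vs Pb2O3: {euclidean(fe3o4, pb2o3):.2f}')
```

With these toy values the two iron oxides end up close together because they share species and have similar normalised stoichiometries, which mirrors the trend in the real output above.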
"},{"location":"tutorial/species/#interacting-with-ionic-species-representations-using-elementembeddings","title":"Interacting with ionic species representations using ElementEmbeddings\u00b6","text":"This notebook will serve as a tutorial for using the ElementEmbeddings package to interact with ionic species representations.
"},{"location":"tutorial/species/#representing-ionic-compositions-using-elementembeddings","title":"Representing ionic compositions using ElementEmbeddings\u00b6","text":"In addition to representing individual ionic species, we can also represent ionic compositions using the ElementEmbeddings package. This is useful for representing inorganic compounds as vectors. Let's take the example of Fe3O4.
Fe3O4 is a mixed-valence iron oxide: each formula unit contains one Fe2+, two Fe3+ and four O2- ions. We pass the composition as a dictionary in the following format:
composition = {\n 'Fe2+': 1,\n 'Fe3+': 2,\n 'O2-': 4\n }\n"},{"location":"tutorial/species/#featurising-compositions","title":"Featurising compositions\u00b6","text":"
We can featurise the composition using the .feature_vector
method. This method returns the feature vector for the composition. This is identical in operation to the CompositionEmbedding
class for featurising compositions.
The species_composition_featuriser
can be used to featurise a list of compositions. This is useful for featurising a large number of compositions. It can also export the feature vectors to a pandas DataFrame by setting the to_dataframe
argument to True
.
We can also calculate the \"distance\" between two compositions using their feature vectors. This can be used to determine which compositions are more similar to each other.
"},{"location":"tutorial/usage/","title":"Using the ElementEmbeddings package","text":"In\u00a0[1]: Copied!# Imports\nimport numpy as np\nimport pandas as pd\nimport seaborn as sns\n\nfrom elementembeddings.core import Embedding\nfrom elementembeddings.plotter import heatmap_plotter, dimension_plotter\nimport matplotlib.pyplot as plt\n\nsns.set(font_scale=1.5)\n# Imports import numpy as np import pandas as pd import seaborn as sns from elementembeddings.core import Embedding from elementembeddings.plotter import heatmap_plotter, dimension_plotter import matplotlib.pyplot as plt sns.set(font_scale=1.5)
In\u00a0[2]: Copied!
# Create a list of the available CBFVs included in the package\n\ncbfvs = [\n \"magpie\",\n \"mat2vec\",\n \"matscholar\",\n \"megnet16\",\n \"oliynyk\",\n \"random_200\",\n \"skipatom\",\n \"mod_petti\",\n \"magpie_sc\",\n \"oliynyk_sc\",\n]\n\n# Create a dictionary of {cbfv name : Embedding objects} key, value pairs\nAtomEmbeds = {cbfv: Embedding.load_data(cbfv) for cbfv in cbfvs}\n# Create a list of the available CBFVs included in the package cbfvs = [ \"magpie\", \"mat2vec\", \"matscholar\", \"megnet16\", \"oliynyk\", \"random_200\", \"skipatom\", \"mod_petti\", \"magpie_sc\", \"oliynyk_sc\", ] # Create a dictionary of {cbfv name : Embedding objects} key, value pairs AtomEmbeds = {cbfv: Embedding.load_data(cbfv) for cbfv in cbfvs}
Taking the magpie representation as our example, we will demonstrate some features of the Embedding
class.
# Let's use magpie as our example\n\n# Let's look at the CBFV of hydrogen for the magpie representation\nprint(\n \"Below is the CBFV/representation of the hydrogen atom from the magpie data we have \\n\"\n)\nprint(AtomEmbeds[\"magpie\"].embeddings[\"H\"])\n# Let's use magpie as our example # Let's look at the CBFV of hydrogen for the magpie representation print( \"Below is the CBFV/representation of the hydrogen atom from the magpie data we have \\n\" ) print(AtomEmbeds[\"magpie\"].embeddings[\"H\"])
Below is the CBFV/representation of the hydrogen atom from the magpie data we have \n\n[ 1. 92. 1.00794 14.01 1. 1. 31.\n 2.2 1. 0. 0. 0. 1. 1.\n 0. 0. 0. 1. 6.615 7.853 0.\n 194. ]\n
We can check the elements which have a feature vector for a particular embedding
In\u00a0[4]: Copied!# We can also check to see what elements have a CBFV for our chosen representation\nprint(\"Magpie has composition-based feature vectors for the following elements: \\n\")\nprint(AtomEmbeds[\"magpie\"].element_list)\n# We can also check to see what elements have a CBFV for our chosen representation print(\"Magpie has composition-based feature vectors for the following elements: \\n\") print(AtomEmbeds[\"magpie\"].element_list)
Magpie has composition-based feature vectors for the following elements: \n\n['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk']\n
For the elemental representations distributed with the package, we also include BibTeX citations of the original papers from which these representations are derived. These are accessible through the .citation()
method.
# Print the bibtex citation for the magpie embedding\nprint(AtomEmbeds[\"magpie\"].citation())\n# Print the bibtex citation for the magpie embedding print(AtomEmbeds[\"magpie\"].citation())
['@article{ward2016general,title={A general-purpose machine learning framework for predicting properties of inorganic materials},author={Ward, Logan and Agrawal, Ankit and Choudhary, Alok and Wolverton, Christopher},journal={npj Computational Materials},volume={2},number={1},pages={1--7},year={2016},publisher={Nature Publishing Group}}']\n
We can also check the dimensionality of the elemental representation.
In\u00a0[6]: Copied!# We can quickly check the dimensionality of this CBFV\nmagpie_dim = AtomEmbeds[\"magpie\"].dim\nprint(f\"The magpie CBFV has a dimensionality of {magpie_dim}\")\n# We can quickly check the dimensionality of this CBFV magpie_dim = AtomEmbeds[\"magpie\"].dim print(f\"The magpie CBFV has a dimensionality of {magpie_dim}\")
The magpie CBFV has a dimensionality of 22\nIn\u00a0[7]: Copied!
# Let's find the dimensionality of all of the CBFVs that we have loaded\n\n\nAtomEmbeds_dim = {\n cbfv: {\"dim\": AtomEmbeds[cbfv].dim, \"type\": AtomEmbeds[cbfv].embedding_type}\n for cbfv in cbfvs\n}\n\ndim_df = pd.DataFrame.from_dict(AtomEmbeds_dim)\ndim_df.T\n# Let's find the dimensionality of all of the CBFVs that we have loaded AtomEmbeds_dim = { cbfv: {\"dim\": AtomEmbeds[cbfv].dim, \"type\": AtomEmbeds[cbfv].embedding_type} for cbfv in cbfvs } dim_df = pd.DataFrame.from_dict(AtomEmbeds_dim) dim_df.T Out[7]: dim type magpie 22 vector mat2vec 200 vector matscholar 200 vector megnet16 16 vector oliynyk 44 vector random_200 200 vector skipatom 200 vector mod_petti 103 one-hot magpie_sc 22 vector oliynyk_sc 44 vector
We can see a wide range of dimensions of the composition-based feature vectors.
Let's now explore more of the core features of the package. The numerical representation of the elements enables us to quantify the differences between atoms. With these embedding features, we can explore how similar two atoms are by using a 'distance' metric. Atoms with distances close to zero are 'similar', whereas atoms which have a large distance between them should in theory be dissimilar.
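To make the idea of a 'distance' concrete, here is a minimal sketch of three common metrics (euclidean, manhattan and chebyshev) written from their textbook definitions. The vectors atom_a and atom_b are hypothetical toy values, and these helper functions are illustrative only; they are not the package's own implementation.

```python
import math


def euclidean(u, v):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    """Largest absolute difference along any single dimension."""
    return max(abs(a - b) for a, b in zip(u, v))


# Hypothetical feature vectors for two atoms.
atom_a = [1.0, 2.0, 3.0]
atom_b = [4.0, 6.0, 3.0]

for name, metric in [('euclidean', euclidean), ('manhattan', manhattan), ('chebyshev', chebyshev)]:
    print(f'{name}: {metric(atom_a, atom_b):.2f}')
```

For any pair of vectors the chebyshev distance is never larger than the euclidean distance, which in turn is never larger than the manhattan distance, so different metrics give values on different scales for the same pair of atoms.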
Using the class method compute_distance_metric
, we can compute these distances.
# Let's continue using our magpie cbfv\n# The package contains some default distance metrics: euclidean, manhattan, chebyshev\n\nmetrics = [\"euclidean\", \"manhattan\", \"chebyshev\", \"wasserstein\", \"energy\"]\n\ndistances = [\n AtomEmbeds[\"magpie\"].compute_distance_metric(\"Li\", \"K\", metric=metric)\n for metric in metrics\n]\nprint(\"For the magpie representation:\")\nfor i, distance in enumerate(distances):\n print(\n f\"Using the metric {metrics[i]}, the distance between Li and K is {distance:.2f}\"\n )\n# Let's continue using our magpie cbfv # The package contains some default distance metrics: euclidean, manhattan, chebyshev metrics = [\"euclidean\", \"manhattan\", \"chebyshev\", \"wasserstein\", \"energy\"] distances = [ AtomEmbeds[\"magpie\"].compute_distance_metric(\"Li\", \"K\", metric=metric) for metric in metrics ] print(\"For the magpie representation:\") for i, distance in enumerate(distances): print( f\"Using the metric {metrics[i]}, the distance between Li and K is {distance:.2f}\" )
For the magpie representation:\nUsing the metric euclidean, the distance between Li and K is 154.41\nUsing the metric manhattan, the distance between Li and K is 300.99\nUsing the metric chebyshev, the distance between Li and K is 117.16\nUsing the metric wasserstein, the distance between Li and K is 13.68\nUsing the metric energy, the distance between Li and K is 1.25\nIn\u00a0[9]: Copied!
# Let's continue using our magpie cbfv\n# The package contains some default distance metrics: euclidean, manhattan, chebyshev\n\nmetrics = [\"euclidean\", \"manhattan\", \"chebyshev\", \"wasserstein\", \"energy\"]\n\ndistances = [\n AtomEmbeds[\"magpie_sc\"].compute_distance_metric(\"Li\", \"K\", metric=metric)\n for metric in metrics\n]\nprint(\"For the scaled magpie representation:\")\nfor i, distance in enumerate(distances):\n print(\n f\"Using the metric {metrics[i]}, the distance between Li and K is {distance:.2f}\"\n )\n# Let's continue using our magpie cbfv # The package contains some default distance metrics: euclidean, manhattan, chebyshev metrics = [\"euclidean\", \"manhattan\", \"chebyshev\", \"wasserstein\", \"energy\"] distances = [ AtomEmbeds[\"magpie_sc\"].compute_distance_metric(\"Li\", \"K\", metric=metric) for metric in metrics ] print(\"For the scaled magpie representation:\") for i, distance in enumerate(distances): print( f\"Using the metric {metrics[i]}, the distance between Li and K is {distance:.2f}\" )
For the scaled magpie representation:\nUsing the metric euclidean, the distance between Li and K is 4.09\nUsing the metric manhattan, the distance between Li and K is 7.87\nUsing the metric chebyshev, the distance between Li and K is 3.39\nUsing the metric wasserstein, the distance between Li and K is 0.32\nUsing the metric energy, the distance between Li and K is 0.23\nIn\u00a0[10]: Copied!
fig, ax = plt.subplots(figsize=(24, 24))\nheatmap_plotter(\n embedding=AtomEmbeds[\"magpie\"],\n metric=\"pearson\",\n sortaxisby=\"atomic_number\",\n # show_axislabels=False,\n ax=ax,\n)\n\nfig.show()\nfig, ax = plt.subplots(figsize=(24, 24)) heatmap_plotter( embedding=AtomEmbeds[\"magpie\"], metric=\"pearson\", sortaxisby=\"atomic_number\", # show_axislabels=False, ax=ax, ) fig.show() In\u00a0[11]: Copied!
fig, ax = plt.subplots(figsize=(24, 24))\nheatmap_plotter(\n embedding=AtomEmbeds[\"magpie_sc\"],\n metric=\"pearson\",\n sortaxisby=\"atomic_number\",\n # show_axislabels=False,\n ax=ax,\n)\n\nfig.show()\nfig, ax = plt.subplots(figsize=(24, 24)) heatmap_plotter( embedding=AtomEmbeds[\"magpie_sc\"], metric=\"pearson\", sortaxisby=\"atomic_number\", # show_axislabels=False, ax=ax, ) fig.show()
As we can see from the above pearson correlation heatmaps, the visualisation of the correlations across the atomic embeddings is sensitive to the components of the embedding vectors. The unscaled magpie representation produces a plot which makes qualitative assessment of chemical trends difficult, whereas with the scaled representation it is possible to perform some qualitative analysis on the (dis)similarity of elements based on their feature vector.
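The sensitivity to scaling can be illustrated with a toy example. The sketch below assumes z-score standardisation of each feature across elements; how the scaled magpie data was actually prepared is not stated here, so treat that choice, the element labels A, B, C and all numbers as hypothetical.

```python
import math
import statistics

# Hypothetical features per element: one on a scale of hundreds, two on a scale of units.
toy_features = {
    'A': [1000.0, 1.0, 3.0],
    'B': [1200.0, 3.0, 1.0],
    'C': [300.0, 2.0, 2.0],
}


def standardise(table):
    """Z-score each feature (column) across all elements (assumed scaling scheme)."""
    columns = list(zip(*table.values()))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return {
        name: [(x - m) / s for x, m, s in zip(row, means, stdevs)]
        for name, row in table.items()
    }


def pearson(u, v):
    """Pearson correlation between two equal-length vectors."""
    mean_u, mean_v = statistics.fmean(u), statistics.fmean(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    numerator = sum(a * b for a, b in zip(du, dv))
    denominator = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return numerator / denominator


scaled = standardise(toy_features)
raw_corr = pearson(toy_features['A'], toy_features['B'])
scaled_corr = pearson(scaled['A'], scaled['B'])

print(f'Unscaled correlation between A and B: {raw_corr:.3f}')
print(f'Scaled correlation between A and B: {scaled_corr:.3f}')
```

In the unscaled case the large-magnitude feature dominates, so A and B appear almost perfectly correlated even though their remaining features are opposed; after standardisation that opposition becomes visible.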
In\u00a0[12]: Copied!fig, ax = plt.subplots(figsize=(24, 24))\nheatmap_plotter(\n embedding=AtomEmbeds[\"megnet16\"],\n metric=\"pearson\",\n sortaxisby=\"atomic_number\",\n # show_axislabels=False,\n ax=ax,\n)\n\nfig.show()\nfig, ax = plt.subplots(figsize=(24, 24)) heatmap_plotter( embedding=AtomEmbeds[\"megnet16\"], metric=\"pearson\", sortaxisby=\"atomic_number\", # show_axislabels=False, ax=ax, ) fig.show() In\u00a0[13]: Copied!
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"magpie\"],\n reducer=\"pca\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\nfig, ax = plt.subplots(figsize=(16, 12)) dimension_plotter( embedding=AtomEmbeds[\"magpie\"], reducer=\"pca\", n_components=2, ax=ax, adjusttext=True, ) fig.tight_layout() fig.show() In\u00a0[14]: Copied!
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"magpie_sc\"],\n reducer=\"pca\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\nfig, ax = plt.subplots(figsize=(16, 12)) dimension_plotter( embedding=AtomEmbeds[\"magpie_sc\"], reducer=\"pca\", n_components=2, ax=ax, adjusttext=True, ) fig.tight_layout() fig.show() In\u00a0[15]: Copied!
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"megnet16\"],\n reducer=\"pca\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\nfig, ax = plt.subplots(figsize=(16, 12)) dimension_plotter( embedding=AtomEmbeds[\"megnet16\"], reducer=\"pca\", n_components=2, ax=ax, adjusttext=True, ) fig.tight_layout() fig.show() In\u00a0[16]: Copied!
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"magpie\"],\n reducer=\"tsne\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\nfig, ax = plt.subplots(figsize=(16, 12)) dimension_plotter( embedding=AtomEmbeds[\"magpie\"], reducer=\"tsne\", n_components=2, ax=ax, adjusttext=True, ) fig.tight_layout() fig.show() In\u00a0[17]: Copied!
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"magpie_sc\"],\n reducer=\"tsne\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\nfig, ax = plt.subplots(figsize=(16, 12)) dimension_plotter( embedding=AtomEmbeds[\"magpie_sc\"], reducer=\"tsne\", n_components=2, ax=ax, adjusttext=True, ) fig.tight_layout() fig.show() In\u00a0[18]: Copied!
fig, ax = plt.subplots(figsize=(16, 12))\n\ndimension_plotter(\n embedding=AtomEmbeds[\"megnet16\"],\n reducer=\"tsne\",\n n_components=2,\n ax=ax,\n adjusttext=True,\n)\n\nfig.tight_layout()\nfig.show()\nfig, ax = plt.subplots(figsize=(16, 12)) dimension_plotter( embedding=AtomEmbeds[\"megnet16\"], reducer=\"tsne\", n_components=2, ax=ax, adjusttext=True, ) fig.tight_layout() fig.show()"},{"location":"tutorial/usage/#using-the-elementembeddings-package","title":"Using the ElementEmbeddings package\u00b6","text":"
This notebook will serve as a tutorial for using the ElementEmbeddings package and going over the core features.
"},{"location":"tutorial/usage/#elemental-representations","title":"Elemental representations\u00b6","text":"A key problem in supervised machine learning is determining the featurisation/representation scheme for a material in order to pass it through a mathematical algorithm. For composition-only machine learning, we want to be able to create a numerical representation of a chemical formula AwBxCyDz. We can achieve this by creating a composition-based feature vector (CBFV) derived from the elemental properties of the constituent atoms, or a representation can be learned during the supervised training process.
A few of these CBFV have been included in the package and we can load them using the load_data
class method.
We can also explore the correlation between embedding vectors. In the example below, we will plot a heatmap of the pearson correlation of our magpie CBFV, a scaled magpie CBFV and the 16-dim megnet embeddings
"},{"location":"tutorial/usage/#pearson-correlation-plots","title":"Pearson Correlation plots\u00b6","text":""},{"location":"tutorial/usage/#unscaled-and-scaled-magpie","title":"Unscaled and scaled Magpie\u00b6","text":""},{"location":"tutorial/usage/#pca-plots","title":"PCA plots\u00b6","text":""},{"location":"tutorial/usage/#t-sne-plots","title":"t-SNE plots\u00b6","text":""}]} \ No newline at end of file diff --git a/main/sitemap.xml b/main/sitemap.xml index a010728..ac3eb34 100644 --- a/main/sitemap.xml +++ b/main/sitemap.xml @@ -2,74 +2,74 @@100%|██████████| 3/3 [00:00<00:00, 24966.10it/s]+
100%|██████████| 3/3 [00:00<00:00, 23921.89it/s]
100%|██████████| 3/3 [00:00<00:00, 561.94it/s]+
100%|██████████| 3/3 [00:00<00:00, 522.03it/s]
100%|██████████| 3/3 [00:00<00:00, 25523.15it/s]+
100%|██████████| 3/3 [00:00<00:00, 4850.78it/s]
The folliowing elements have SkipSpecies ionic species representations: -['Sr', 'I', 'Ni', 'Y', 'Na', 'Sb', 'Rb', 'Mn', 'C', 'Pd', 'Ir', 'K', 'Br', 'Pt', 'Ca', 'Li', 'Ru', 'Pr', 'Cl', 'U', 'Au', 'Er', 'Al', 'S', 'Te', 'Os', 'Hg', 'Ge', 'Nd', 'B', 'Co', 'Pb', 'Tm', 'Pu', 'Mg', 'Cd', 'Eu', 'Sn', 'Ac', 'La', 'Tb', 'W', 'F', 'In', 'Rh', 'Lu', 'Ce', 'Pa', 'Sc', 'Zr', 'Ag', 'Np', 'Re', 'Yb', 'Ta', 'Sm', 'Be', 'Cu', 'Gd', 'Si', 'O', 'As', 'Ti', 'Cr', 'Mo', 'P', 'Se', 'H', 'Ba', 'Pm', 'Hf', 'Bi', 'Fe', 'V', 'Zn', 'Ga', 'Tc', 'Dy', 'Ho', 'Tl', 'Nb', 'Th', 'N', 'Cs'] +['Ge', 'W', 'Os', 'Np', 'La', 'F', 'Ho', 'Mo', 'Ag', 'Ca', 'Li', 'Ba', 'Ru', 'Pm', 'Ce', 'Zn', 'Na', 'Pd', 'Nd', 'I', 'Cl', 'Tc', 'Ac', 'Y', 'Rb', 'Tb', 'Mn', 'V', 'Re', 'Ti', 'Nb', 'Dy', 'Au', 'Cr', 'C', 'Ta', 'S', 'B', 'Sb', 'Sn', 'Br', 'Ni', 'H', 'Pa', 'Mg', 'Cu', 'Ga', 'Sr', 'Yb', 'Pu', 'O', 'Ir', 'Se', 'Tm', 'Eu', 'Fe', 'N', 'Co', 'Te', 'Cs', 'Si', 'Gd', 'Th', 'Rh', 'Be', 'Pr', 'Sm', 'Pt', 'K', 'In', 'Tl', 'U', 'P', 'Zr', 'Pb', 'Er', 'As', 'Al', 'Sc', 'Cd', 'Hf', 'Hg', 'Bi', 'Lu']
Computing feature vectors: 100%|██████████| 5/5 [00:00<00:00, 23250.02it/s]+
Computing feature vectors: 100%|██████████| 5/5 [00:00<00:00, 10727.12it/s]