diff --git a/notebooks/camera_ready/virtual_eve/01 - Finetune Virtual EVE.ipynb b/notebooks/camera_ready/virtual_eve/01 - Finetune Virtual EVE.ipynb
index 0a1d329..71ca681 100644
--- a/notebooks/camera_ready/virtual_eve/01 - Finetune Virtual EVE.ipynb
+++ b/notebooks/camera_ready/virtual_eve/01 - Finetune Virtual EVE.ipynb
@@ -14,21 +14,34 @@
 "### Background\n",
 "This notebook provides two examples of finetuning SDOFM for scientific use cases. First we finetune a virtual eve virtual instrument starting from a SDOFM pretrained foundation model, accomplishing a production ready model much faster and resource efficient than training from scratch. As a second example, we finetune train a missing data data generator\n",
 "\n",
- "There are two further examples shared in this directory:\n",
+ "There are three further examples shared in this directory:\n",
 " - `02 - Finetune Missing Data`\n",
 " - `03 - Finetune Instrument Degradation`\n",
+ " - `04 - Embeddings to F10.7`\n",
 " \n",
- "However, this notebook has been annotated as it contains multiple aspects and examples of our foundation model. \n",
+ "This notebook is annotated throughout, as it covers multiple aspects and examples of our foundation model. \n",
 "\n",
 "#### Foundation Models\n",
- "In principle, one may train an AI model from scratch for a variety of SDO data-driven use cases. This is a wasteful process however as all models have a common foundational training on the same SDO dataset. The aim of producing a foundation model for SDO use cases is to avoid this inefficient process; we first run a process called pretraining to generate a base model that one can think of as a compressed representation of the dataset it was trained on, in this case SDO, and start from there. A loose analogy would be that to building a space elevator for launching new space vessels. \n",
+ "It is possible to train an AI model from scratch for each of a variety of SDO data-driven use cases, but every such model would repeat the same foundational training on the same SDO dataset. The aim of producing a foundation model for SDO use cases is to avoid this inefficient process. The first step is to run a process called pretraining to generate a base model that one can think of as a compressed representation of the dataset it was trained on, in this case SDO. \n",
 "\n",
- "This process of training a common foundation model from which many others can be adapted is akin to that of transfer learning, a method that has been around for about 15 years or so. For an extensive treatment of transfer learning please see [review paper](https://arxiv.org/abs/1811.08883). The creation of a foundation model takes this theoretical underpinning further by conducting the foundational model training on a much larger dataset, with a much larger model, which is thought to have the expressivity capable of understanding deep underlying dynamics in the data. This could be considered the holy grail of scientific foundation models; to train a massive neural network on a massive dataset such as that from CERN's Large Hadron Collider, and somehow have the model express back to us the Standard Model of Particle Physics.\n",
+ "This process of training a common foundation model, from which many others can be adapted, is similar to transfer learning (see this [review paper](https://arxiv.org/abs/1811.08883)). The foundation model training is conducted on a much larger dataset, with a much larger model, which is thought to have the expressivity capable of understanding deep underlying dynamics in the data. \n",
 "\n",
 "##### Typical Foundation Model Architectures\n",
 "Over the course of the past few years AI researchers have began to converge on architectures that lend themselves well to the foundation model archetype. Typically the pretrained portion of the neural network that is later then repurposed for specific use cases is called the \"Head\". The Head can be thought of a feature extractor; that is, a function $f(\\cdot)$ that maps input data, for example AIA images, to a compressed latent space representation which underlines the important dynamics present in this sample. Though this latent space representation is encoded in a digital language unbeknownst to us, this latent space \"language\" representation underlines the important aspects of this sample such as if it may contain active regions, or if it is part of a calm solar cycle. This latent space representation is learned from the data itself with no other input; understanding how to further encode our current physics understanding relevant to various foundation models is an open area of research†.\n",
 "\n",
- "Only the head is trained during model pretraining, the resulting trained model being what we call a foundation model. Once created, the foundation model can be utilized by the scientific community at large for their own particular studies without needing the burden of the expensive and resource requirements to train a large foundation model. This is for example what Meta does by sharing the pretrained model weights of it's llama family of models so researchers and practitioners do not have to begin from scratch every time. Reproducing this approach for specific scientific domains is the main aim and motivation of the SDOFM collaboration.\n",
+ "Only the Head is trained during model pretraining; the resulting trained model is what we call a foundation model. Once created, the foundation model can be utilized by the scientific community at large without needing to duplicate the original training. \n",
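+ "\n",
+ "As a minimal sketch of this finetuning pattern (here `load_pretrained_sdofm` and `embed_dim` are illustrative placeholders, not the actual SDOFM API):\n",
+ "\n",
+ "```python\n",
+ "import torch.nn as nn\n",
+ "\n",
+ "backbone = load_pretrained_sdofm()  # hypothetical loader for the pretrained Head\n",
+ "for p in backbone.parameters():\n",
+ "    p.requires_grad = False  # freeze the foundation model weights\n",
+ "\n",
+ "task_model = nn.Linear(backbone.embed_dim, 14)  # only this small layer is trained; 14 outputs is illustrative\n",
+ "```\n",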
\n", "\n", "##### Typical Foundation Model Architectures\n", "Over the course of the past few years AI researchers have began to converge on architectures that lend themselves well to the foundation model archetype. Typically the pretrained portion of the neural network that is later then repurposed for specific use cases is called the \"Head\". The Head can be thought of a feature extractor; that is, a function $f(\\cdot)$ that maps input data, for example AIA images, to a compressed latent space representation which underlines the important dynamics present in this sample. Though this latent space representation is encoded in a digital language unbeknownst to us, this latent space \"language\" representation underlines the important aspects of this sample such as if it may contain active regions, or if it is part of a calm solar cycle. This latent space representation is learned from the data itself with no other input; understanding how to further encode our current physics understanding relevant to various foundation models is an open area of research†.\n", "\n", - "Only the head is trained during model pretraining, the resulting trained model being what we call a foundation model. Once created, the foundation model can be utilized by the scientific community at large for their own particular studies without needing the burden of the expensive and resource requirements to train a large foundation model. This is for example what Meta does by sharing the pretrained model weights of it's llama family of models so researchers and practitioners do not have to begin from scratch every time. Reproducing this approach for specific scientific domains is the main aim and motivation of the SDOFM collaboration.\n", + "Only the Head is trained during model pretraining, the resulting trained model being what we call a foundation model. Once created, the foundation model can be utilized by the scientific community at large without needing to duplicate the original training. \n", "\n", "†[Opportunities for Machine Learning in Physics - Max Welling](https://www.youtube.com/watch?v=DmoVHzMbZGI)\n", "\n", @@ -39,16 +40,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Setting up the environment [skip if on running sdofm.org!]\n", - "For this section, please be sure to be located in the project root directory before executing any commands. None of the cells in this section are meant to be ran from the notebook IDE, but rather your terminal.\n", + "### Setting up the environment\n", + "For this section, please be sure to be located in the project root directory before executing any commands. None of the cells in this section are meant to be run from the notebook IDE, but rather your terminal.\n", "\n", "#### System Requirements\n", - "This tutorial assumes that you have conda or miniconda installed and are on a linux or macos machine. It's advisable to install miniconda if you have to decide between the two (smaller install), however, if you already have conda installed, you can skip on to the next step.\n", + "This tutorial assumes that you have conda or miniconda installed and are on a linux or macos machine. 
@@ -63,8 +82,7 @@
 "pip install -e .\n",
 "```\n",
 "Lastly, make sure to select the correct python kernel associated with this environment, (likely located in `${CONDA_PREFIX_1}/envs/virtual-eve-finetuning/bin/python`)\n",
- "\n",
- "Nice, you should now be all set to go!"
+ "\n"
 ]
 },
 {
@@ -73,7 +91,7 @@ "cell_type": "markdown",
 "source": [
 "### Getting started\n",
 "\n",
- "We begin by importing the newly installed libraries we will need tp run this notebook. Note: the import cell below is the first one you should be executing in this notebook."
+ "We begin by importing the newly installed libraries we will need to run this notebook. Note: the import cell below is the first one you should be executing in this notebook."
 ]
 },
 {
@@ -241,7 +259,17 @@
 "metadata": {},
 "source": [
 "## Create a finetuning model\n",
- "Now time to get serious, this model will become our model's \"head.\" The objective of this component is to take now a set of finetuned embeddings and have them predict our true science task. This model was created during FDL-X 2023 and is used as an quick example. It has a switching mode that transitions the model from linear to influenced by a CNN after a defned number of epochs. We're going to do this with Pytorch Lighning for keep hardware agnostic. \n",
+ "The objective of this component is to take a set of finetuned embeddings and have them predict our true science task. We'll use an existing SDO model (created during FDL-X 2023) with a switching mode that transitions the model from linear to CNN-influenced after a defined number of epochs. This model will become our model's \"head.\" We use PyTorch Lightning to keep the implementation hardware agnostic. \n",
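+ "\n",
+ "Conceptually, the epoch-based switch looks something like this (a hedged sketch with assumed attribute names, not the actual FDL-X implementation):\n",
+ "\n",
+ "```python\n",
+ "def forward(self, x):\n",
+ "    out = self.linear(x)  # linear-only predictions early in training\n",
+ "    if self.current_epoch >= self.switch_epoch:  # assumed switch-point hyperparameter\n",
+ "        out = out + self.cnn(x)  # afterwards, blend in the CNN branch\n",
+ "    return out\n",
+ "```\n",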
 "\n",
 "We first import necessary components."
 ]
@@ -294,7 +322,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "Then the CNN efficientnet model."
+ "Then the CNN EfficientNet model."
 ]
 },
 {
@@ -625,7 +653,7 @@
 "### Another option: Training only from the latents\n",
 "![Figure 3: Architectural Diagram of Virtual EVE Training with Latents](assets/architecture_diags_virtualeve_latents.svg)\n",
 "\n",
- "As the this foundation model includes an autoencoder archetecture use of the decoder is optional. The latents created by the autoencoder can be used directly, the below is an naive implementation. For a real-world use case you'd want to design the model around these new input."
+ "As this foundation model includes an autoencoder architecture, use of the decoder is optional. The latents created by the autoencoder can be used directly; below is a naive implementation. For a real-world use case you'd want to design the model around these new inputs."
 ]
 },
 {
@@ -794,7 +822,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
- "Trainer will use only 1 of 4 GPUs because it is running inside an interactive / notebook environment. You may try to set `Trainer(devices=4)` but please note that multi-GPU inside interactive / notebook environments is considered experimental and unstable. Your mileage may vary.\n",
+ "Trainer will use only 1 of 4 GPUs because it is running inside an interactive / notebook environment. You may try to set `Trainer(devices=4)` but please note that multi-GPU inside interactive / notebook environments is considered experimental and unstable. \n",
 "GPU available: True (cuda), used: True\n",
 "TPU available: False, using: 0 TPU cores\n",
 "HPU available: False, using: 0 HPUs\n",
@@ -865,7 +893,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "## Now over to you? What would you like to see with models like these?"
+ "## If you have questions, please join the conversation on [Hugging Face](https://huggingface.co/SpaceML/SDO-FM).\n"
 ]
 },
 {