✨ AiiDAConverter: add full_dos conversion

mbercx · mbercx · commit 4a012d5cab6d · 2025-10-10T12:40:09.000+10:00
To demonstrate the motivation for having a `full_dos` output, we add a conversion
mapping from `full_dos` to an AiiDA `XyData`. This mapping can clearly be improved:
we're missing a true `DosData` data type, which could also store the Fermi energy.
However, it shows the advantage of separating "extraction" into base outputs from
"conversion" into library ones: a single converter class can take care of converting
the outputs of the various Quantum ESPRESSO codes.

While working on this, some other considerations regarding the design came to mind.
These are added to the corresponding design section.
diff --git a/docs/design/outputs.md b/docs/design/outputs.md
@@ -159,13 +159,27 @@ class PymatgenConverter(BaseConverter):
             },
         ),
     }
-
 ```
 
 However, if this is not the case, the output cannot be directly constructed with this approach.
 An example here is AiiDA's `StructureData`.
 This points to poor design of this class' constructor, but we can still support the class by allowing the first element in the now `(<output_converter>, <glom_spec>)` tuple to be a function.
 
+!!! note
+
+    This approach requires carefully syncing the `_output_spec_mapping` of the output classes with the `conversion_mapping` of the converter classes, and hence the code logic for obtaining a converted output is not fully localized.
+    To make things worse, in some cases it also requires understanding the raw outputs (but this can be prevented with clear schemas for the base outputs).
+    We're not fully converged on the design here, but some considerations are listed below:
+
+    1. If we want the code for converting to a certain library to be isolated, we will always have to accept some delocalization.
+       We could consider directly extracting the data required from the raw outputs, but then a developer still has to go check the corresponding output class for the keys it uses to store the raw output, as well as the raw output itself.
+       Moreover: it could lead to a lot of code duplication; right now the base outputs are already in the default units.
+
+    2. One other issue could be name conflicts: if multiple outputs from different output classes have the same name but different content, you cannot define a conversion (or lack thereof) for each of them separately.
+       However, it seems clear that we should try to have consistent and distinct names for each output.
+
+    At this stage, we think clearly structured and defined "base outputs" are a better approach than direct extraction.
+
 ## User interface
 
 A `get_output` method is implemented on the `PwOutput`, which is the main user-facing interface for all these features.
diff --git a/src/qe_tools/converters/aiida.py b/src/qe_tools/converters/aiida.py
@@ -2,6 +2,7 @@
 
 import numpy as np
 
+from glom import T
 from qe_tools.converters.base import BaseConverter
 
 try:
@@ -23,6 +24,21 @@ def _convert_structure_data(cell, symbols, positions):
     return structure
 
 
+def _convert_dos(energy: list, dos: list | dict):
+    xy_data = orm.XyData()
+    xy_data.set_x(np.array(energy), "energy", "eV")
+    if isinstance(dos, dict):
+        xy_data.set_y(
+            [np.array(dos["dos_down"]), np.array(dos["dos_up"])],
+            ["dos_spin_down", "dos_spin_up"],
+            ["states/eV", "states/eV"],
+        )
+    else:
+        xy_data.set_y(np.array(dos), "dos", "states/eV")
+
+    return xy_data
+
+
 class AiiDAConverter(BaseConverter):
     conversion_mapping = {
         "structure": (
@@ -33,4 +49,19 @@ class AiiDAConverter(BaseConverter):
                 "positions": ("positions", lambda positions: np.array(positions)),
             },
         ),
+        "full_dos": (
+            _convert_dos,
+            {
+                "energy": "energy",
+                "dos": (
+                    T,
+                    lambda full_dos: full_dos["dos"]
+                    if "dos" in full_dos
+                    else {
+                        "dos_down": full_dos["dos_down"],
+                        "dos_up": full_dos["dos_up"],
+                    },
+                ),
+            },
+        ),
     }
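
For readers unfamiliar with the spec above, the following is a minimal plain-Python sketch of what the `full_dos` mapping selects, written without `glom` or AiiDA so it runs on its own. The helper names `select_dos` and `build_kwargs` and the sample dictionaries are hypothetical, and it assumes the converter passes the resulting dictionary to `_convert_dos` as keyword arguments, which the diff itself does not show.

```python
def select_dos(full_dos: dict):
    """Mirror the lambda in the spec: prefer the total DOS, else both spin channels."""
    if "dos" in full_dos:
        return full_dos["dos"]
    return {"dos_down": full_dos["dos_down"], "dos_up": full_dos["dos_up"]}


def build_kwargs(full_dos: dict) -> dict:
    """Mirror the dict spec: one entry per parameter of `_convert_dos`."""
    return {"energy": full_dos["energy"], "dos": select_dos(full_dos)}


# Hypothetical base outputs, for illustration only.
unpolarized = {"energy": [-1.0, 0.0, 1.0], "dos": [0.1, 0.5, 0.2]}
polarized = {
    "energy": [-1.0, 0.0, 1.0],
    "dos_down": [0.05, 0.2, 0.1],
    "dos_up": [0.05, 0.3, 0.1],
}

kwargs_unpolarized = build_kwargs(unpolarized)
kwargs_polarized = build_kwargs(polarized)
```

In the unpolarized case `dos` stays a plain list, while in the spin-polarized case it becomes a dictionary with both channels, which is the distinction `_convert_dos` checks with `isinstance(dos, dict)`.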