[WIP] Cp2k and defects #242

Open: wants to merge 48 commits into base: main
48 commits
dc3f263
Very first committ of emmet supporting cp2k for defect calculations. …
nwinner May 31, 2021
38c10fc
Update.
nwinner Jul 28, 2021
5955cf5
Remove unecessary methods. Update condition.
nwinner Jul 28, 2021
bd17382
Update to parameter getter.
nwinner Jul 30, 2021
4e96e06
Clean imports and update doc.
nwinner Jul 30, 2021
665644f
Merge branch 'main' of https://github.com/materialsproject/emmet into…
nwinner Jul 30, 2021
cd814b9
CP2K settings.
nwinner Jul 30, 2021
dce71e8
Update imports. stubs gone
nwinner Jul 30, 2021
a3dd4b0
Fix imports. Stubs gone.
nwinner Jul 30, 2021
87a6286
Clean imports
nwinner Jul 30, 2021
9590dc9
Clean imports
nwinner Jul 30, 2021
9d2947f
Remove validator
nwinner Jul 30, 2021
f6d3b79
Remove sandboxes
nwinner Jul 30, 2021
bc8f4af
Another log point
nwinner Jul 30, 2021
7bfec2b
Update to material doc
nwinner Aug 3, 2021
9833317
Temporary MaterialsBuilder update. Some things are not implemented.
nwinner Aug 3, 2021
c133909
Some updates, subject to change, for the Defect builder.
nwinner Aug 3, 2021
f36f5d0
Update for dielectric. Will need further adjustments.
nwinner Aug 3, 2021
35067ad
Cp2kTaskType. Should be unified?
nwinner Aug 3, 2021
83a15bf
Store as int so MPID can work with it.
nwinner Aug 3, 2021
ad48136
Use integers
nwinner Aug 3, 2021
ccee065
Move imports.
nwinner Aug 5, 2021
e731d28
Improved builders with private methods, more docstrings, and support
nwinner Aug 5, 2021
a5edc7b
New methods to update document without re-building the whole thing.
nwinner Aug 5, 2021
5462795
Include BaseTaskDocument
nwinner Aug 6, 2021
a28ea14
(1) Reordered the imports.
nwinner Aug 6, 2021
2d36240
Update query for defects and reorder imports.
nwinner Aug 6, 2021
4fd13c7
Extra log message.
nwinner Aug 6, 2021
ae1698f
Update.
nwinner Aug 11, 2021
0b4b103
Add R2SCAN option
nwinner Aug 26, 2021
f2b70e0
Updates and refinements.
nwinner Nov 2, 2021
f51ddd1
bug
nwinner Nov 3, 2021
6a3a73b
Updates
nwinner Mar 7, 2022
c0c434e
Todos
nwinner Mar 8, 2022
598f912
Merge remote-tracking branch 'upstream/main' into cp2k
nwinner Mar 8, 2022
e6310e9
Versions
nwinner Mar 8, 2022
182e820
Import updates
nwinner Mar 9, 2022
9a9baac
Changes
nwinner Mar 15, 2022
d53da01
Updates
nwinner Mar 23, 2022
f61a481
Merge branch 'main' of https://github.com/materialsproject/emmet into…
nwinner Mar 23, 2022
a45100d
Updates
nwinner May 3, 2022
c12e5fc
Possible additions
nwinner May 3, 2022
30aebee
Merge branch 'main' of https://github.com/materialsproject/emmet into…
nwinner May 3, 2022
80f6672
Update
nwinner May 4, 2022
893b93a
Merge branch 'main' of https://github.com/materialsproject/emmet into…
nwinner May 9, 2022
bea2f86
Updates
nwinner May 11, 2022
8c44520
Merge branch 'main' of https://github.com/materialsproject/emmet into…
nwinner Jun 24, 2022
ff60417
Merge branch 'main' of https://github.com/materialsproject/emmet into…
nwinner Sep 28, 2022
910 changes: 910 additions & 0 deletions emmet-builders/emmet/builders/cp2k/defects.py

Large diffs are not rendered by default.

275 changes: 275 additions & 0 deletions emmet-builders/emmet/builders/cp2k/materials.py
@@ -0,0 +1,275 @@
from datetime import datetime
from itertools import chain
from typing import Dict, Iterator, List, Optional

from maggma.builders import Builder
from maggma.stores import Store

from emmet.builders.settings import EmmetBuildSettings
from emmet.core.utils import group_structures, jsanitize
from emmet.core.cp2k.material import MaterialsDoc
from emmet.core.cp2k.task import TaskDocument

SETTINGS = EmmetBuildSettings()

__author__ = "Nicholas Winner <[email protected]>"
__maintainer__ = "Shyam Dwaraknath <[email protected]>"


class MaterialsBuilder(Builder):
    """
    The Materials Builder matches CP2K task documents by structure similarity into materials
    documents. The purpose of this builder is to group calculations and determine the best structure.
    All other properties are derived from other builders.

    The process is as follows:

    1.) Find all documents with the same formula
    2.) Select only task documents for the task_types we can select properties from
    3.) Aggregate task documents based on structure similarity
    4.) Convert task docs to property docs with metadata for selection and aggregation
    5.) Select the best property doc for each property
    6.) Build material document from best property docs
    7.) Post-process material document
    8.) Validate material document

    """

    def __init__(
        self,
        tasks: Store,
        materials: Store,
        task_validation: Optional[Store] = None,
        query: Optional[Dict] = None,
        settings: Optional[EmmetBuildSettings] = None,
        **kwargs,
    ):
        """
        Args:
            tasks: Store of task documents
            materials: Store of materials documents to generate
            task_validation: optional Store of validation documents; tasks marked
                invalid there are flagged with is_valid=False before processing
            query: dictionary to limit tasks to be analyzed
            settings: EmmetBuildSettings to use; the allowed task types
                (CP2K_ALLOWED_TASK_TYPES), SYMPREC, and the StructureMatcher
                tolerances LTOL, STOL, and ANGLE_TOL are read from here
        """

        self.tasks = tasks
        self.materials = materials
        self.task_validation = task_validation
        self.query = query if query else {}
        self.settings = EmmetBuildSettings.autoload(settings)
        self.kwargs = kwargs

        sources = [tasks]
        if self.task_validation:
            sources.append(self.task_validation)
        super().__init__(sources=sources, targets=[materials], **kwargs)

    def ensure_indexes(self):
        """
        Ensures indices on the tasks and materials collections
        """

        # Basic search index for tasks
        self.tasks.ensure_index("task_id")
        self.tasks.ensure_index("last_updated")
        self.tasks.ensure_index("state")
        self.tasks.ensure_index("formula_pretty")

        # Search index for materials
        self.materials.ensure_index("material_id")
        self.materials.ensure_index("last_updated")
        self.materials.ensure_index("sandboxes")
        self.materials.ensure_index("task_ids")

        if self.task_validation:
            self.task_validation.ensure_index("task_id")
            self.task_validation.ensure_index("valid")

    def get_items(self) -> Iterator[List[Dict]]:
        """
        Gets all items to process into materials documents.
        This does no datetime checking; it relies on whether
        task_ids are included in the Materials Collection

        Returns:
            generator of lists of relevant tasks, one list per formula, to process
            into materials documents
        """

        self.logger.info("Materials builder started")
        # TODO make a cp2k allowed type setting
        self.logger.info(
            f"Allowed task types: {[task_type.value for task_type in self.settings.CP2K_ALLOWED_TASK_TYPES]}"
        )

        self.logger.info("Setting indexes")
        self.ensure_indexes()

        # Save timestamp to mark buildtime for material documents
        self.timestamp = datetime.utcnow()

        # Only consider successfully completed tasks
        temp_query = dict(self.query)
        temp_query["state"] = "successful"

        self.logger.info("Finding tasks to process")
        all_tasks = {
            doc[self.tasks.key]
            for doc in self.tasks.query(temp_query, [self.tasks.key])
        }
        processed_tasks = {
            t_id
            for d in self.materials.query({}, ["task_ids"])
            for t_id in d.get("task_ids", [])
        }
        to_process_tasks = all_tasks - processed_tasks
        to_process_forms = self.tasks.distinct(
            "formula_pretty", {self.tasks.key: {"$in": list(to_process_tasks)}}
        )
        self.logger.info(f"Found {len(to_process_tasks)} unprocessed tasks")
        self.logger.info(f"Found {len(to_process_forms)} unprocessed formulas")

        # Set total for builder bars to have a total
        self.total = len(to_process_forms)

        if self.task_validation:
            # TaskValidator stores the flag as "valid" (the field indexed in ensure_indexes)
            invalid_ids = {
                doc[self.task_validation.key]
                for doc in self.task_validation.query(
                    {"valid": False}, [self.task_validation.key]
                )
            }
        else:
            invalid_ids = set()

        # Not currently used: the full task documents are fetched below (properties=None)
        projected_fields = [
            "last_updated",
            "completed_at",
            "task_id",
            "formula_pretty",
            "output.energy_per_atom",
            "output.structure",
            "input",
            "orig_inputs",
            "input.structure",
            "tags",
        ]

        for formula in to_process_forms:
            tasks_query = dict(temp_query)
            tasks_query["formula_pretty"] = formula
            tasks = list(
                self.tasks.query(criteria=tasks_query, properties=None)
            )
            for t in tasks:
                t["is_valid"] = t[self.tasks.key] not in invalid_ids

            yield tasks

    def process_item(self, tasks: List[Dict]) -> List[Dict]:
        """
        Process the tasks into a list of materials

        Args:
            tasks [dict] : a list of task docs

        Returns:
            [dict] : a list of new materials docs
        """

        tasks = [TaskDocument(**task) for task in tasks]
        formula = tasks[0].formula_pretty
        task_ids = [task.task_id for task in tasks]
        self.logger.debug(f"Processing {formula} : {task_ids}")

        grouped_tasks = self.filter_and_group_tasks(tasks)
        materials = []
        for group in grouped_tasks:
            try:
                materials.append(
                    MaterialsDoc.from_tasks(
                        group,
                        quality_scores=self.settings.CP2K_QUALITY_SCORES,
                    )
                )
            except Exception as e:
                # TODO construct deprecated

                failed_ids = list({t_.task_id for t_ in group})
                # Deprecate only the group that failed, not every task of this formula
                doc = MaterialsDoc.construct_deprecated_material(group)
                doc.warnings.append(str(e))
                materials.append(doc)
                self.logger.warning(
                    f"Failed making material for {failed_ids}."
                    f" Inserted as deprecated Material: {doc.material_id}"
                )

        self.logger.debug(f"Produced {len(materials)} materials for {formula}")

        return jsanitize([mat.dict() for mat in materials], allow_bson=True)

    def update_targets(self, items: List[List[Dict]]):
        """
        Inserts the new materials documents into the materials collection,
        replacing any existing documents with the same material_id

        Args:
            items ([[dict]]): A list of lists of materials docs, one list per
                processed formula
        """

        items = list(filter(None, chain.from_iterable(items)))

        for item in items:
            item.update({"_bt": self.timestamp})

        material_ids = list({item["material_id"] for item in items})

        if len(items) > 0:
            self.logger.info(f"Updating {len(items)} materials")
            self.materials.remove_docs({self.materials.key: {"$in": material_ids}})
            self.materials.update(
                docs=jsanitize(items, allow_bson=True),
                key=["material_id"],
            )
        else:
            self.logger.info("No items to update")

    def filter_and_group_tasks(self, tasks: List[TaskDocument]) -> Iterator[List[TaskDocument]]:
        """
        Filters tasks to the allowed CP2K task types and groups them by structure matching
        """

        # TODO why did the way vasp builder did it not work here?
        filtered_tasks = []
        for task in tasks:
            for allowed_type in self.settings.CP2K_ALLOWED_TASK_TYPES:
                if task.task_type is allowed_type:
                    filtered_tasks.append(task)
                    # Stop at the first match so a task is never appended twice
                    break

        structures = []

        for idx, task in enumerate(filtered_tasks):
            s = task.output.structure
            s.index: int = idx  # type: ignore
            s.remove_oxidation_states()
            s.remove_spin()
            structures.append(s)

        grouped_structures = group_structures(
            structures,
            ltol=self.settings.LTOL,
            stol=self.settings.STOL,
            angle_tol=self.settings.ANGLE_TOL,
            symprec=self.settings.SYMPREC,
        )

        for group in grouped_structures:
            grouped_tasks = [filtered_tasks[struc.index] for struc in group]  # type: ignore
            yield grouped_tasks
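The incremental selection in `get_items` reduces to set arithmetic: every successful task ID that no material lists in its `task_ids` is unprocessed, and the builder then yields one batch per formula that contains such a task, flagging tasks the validation store marked invalid. The sketch below reproduces only that selection logic, assuming plain lists of dicts in place of maggma `Store.query`/`Store.distinct` results; the function name `select_unprocessed` is hypothetical and not part of emmet.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set


def select_unprocessed(
    tasks: List[Dict],
    materials: List[Dict],
    invalid_ids: Set[int],
) -> Iterator[List[Dict]]:
    """Yield one batch of tasks per formula that has at least one unprocessed task.

    Hypothetical in-memory stand-in for MaterialsBuilder.get_items: `tasks` and
    `materials` play the role of Store.query() results.
    """
    # Only successful tasks are candidates, mirroring temp_query["state"] = "successful"
    successful = [t for t in tasks if t.get("state") == "successful"]
    all_ids = {t["task_id"] for t in successful}

    # Task IDs already attached to some material document
    processed = {tid for m in materials for tid in m.get("task_ids", [])}
    to_process = all_ids - processed

    # Formulas containing at least one unprocessed task (sorted for a stable order)
    formulas = sorted(
        {t["formula_pretty"] for t in successful if t["task_id"] in to_process}
    )

    # The builder re-queries ALL successful tasks of each such formula
    by_formula: Dict[str, List[Dict]] = defaultdict(list)
    for t in successful:
        by_formula[t["formula_pretty"]].append(t)

    for formula in formulas:
        # Copy each task so the caller's dicts are not mutated
        yield [dict(t, is_valid=t["task_id"] not in invalid_ids) for t in by_formula[formula]]
```

Because a formula with even one new task is re-batched with all of its successful tasks, `process_item` can rebuild every material for that formula from scratch, and `update_targets` then replaces the old documents by `material_id`.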
63 changes: 63 additions & 0 deletions emmet-builders/emmet/builders/cp2k/task_validator.py
@@ -0,0 +1,63 @@
from typing import Dict, Optional

from maggma.builders import MapBuilder
from maggma.core import Store

from emmet.builders.settings import EmmetBuildSettings
from emmet.core.cp2k.task import TaskDocument
from emmet.core.cp2k.validation import DeprecationMessage, ValidationDoc


class TaskValidator(MapBuilder):
    def __init__(
        self,
        tasks: Store,
        task_validation: Store,
        settings: Optional[EmmetBuildSettings] = None,
        query: Optional[Dict] = None,
        **kwargs,
    ):
        """
        Creates validation documents for tasks

        Args:
            tasks: Store of task documents
            task_validation: Store of validation documents for tasks
            settings: EmmetBuildSettings to use (DEPRECATED_TAGS is read from here)
            query: dictionary to limit the tasks to be validated
        """
        self.tasks = tasks
        self.task_validation = task_validation
        self.settings = EmmetBuildSettings.autoload(settings)
        self.query = query
        self.kwargs = kwargs

        super().__init__(
            source=tasks,
            target=task_validation,
            projection=[
                "input",
                "output.forces",
                "tags",
            ],
            query=query,
            **kwargs,
        )

    def unary_function(self, item):
        """
        Validate a single task and return its validation document

        Args:
            item (dict): a (projection of a) task doc
        """
        task_doc = TaskDocument(**item)
        validation_doc = ValidationDoc.from_task_doc(
            task_doc=task_doc,
        )

        bad_tags = list(set(task_doc.tags).intersection(self.settings.DEPRECATED_TAGS))
        if len(bad_tags) > 0:
            validation_doc.warnings.append(f"Manual Deprecation by tags: {bad_tags}")
            validation_doc.valid = False
            validation_doc.reasons.append(DeprecationMessage.MANUAL)

        return validation_doc
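`unary_function` invalidates a task when any of its tags appears in `settings.DEPRECATED_TAGS`. A minimal stand-alone sketch of that rule follows; it assumes a simplified, hypothetical `ValidationResult` dataclass in place of `emmet.core.cp2k.validation.ValidationDoc` (whose full schema is not part of this diff) and the plain string `"MANUAL"` in place of `DeprecationMessage.MANUAL`.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ValidationResult:
    """Hypothetical, simplified stand-in for ValidationDoc."""

    task_id: int
    valid: bool = True
    reasons: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def apply_tag_deprecation(
    result: ValidationResult, tags: Iterable[str], deprecated_tags: Iterable[str]
) -> ValidationResult:
    """Mark `result` invalid if any task tag is in `deprecated_tags`."""
    # Sorted so the warning text is deterministic (a set has no defined order)
    bad_tags = sorted(set(tags) & set(deprecated_tags))
    if bad_tags:
        result.warnings.append(f"Manual Deprecation by tags: {bad_tags}")
        result.valid = False
        result.reasons.append("MANUAL")  # stands in for DeprecationMessage.MANUAL
    return result
```

Sorting the intersection is a small departure from the builder, which uses `list(set(...))`; it only affects the order of tags in the warning message, not whether a task is deprecated.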