Commit bb5345d

Merge branch 'main' into wangc/architecture

2 parents 2c98bcf + e9e6889
File tree: 4 files changed (+47 −15 lines)

.github/workflows/github-actions.yml

Lines changed: 30 additions & 9 deletions
@@ -36,16 +36,21 @@ jobs:
       - name: Install DACKAR Required Libraries
         # Either fix scikit-learn==1.5 to allow quantulum3 to use the pretrained classifier or
         # Run "quantulum3-training -s" to retrain classifier
+
         run: |
           pwd
           conda create -n dackar_libs python=3.11
           conda init bash && source ~/.bashrc && conda activate dackar_libs
           pip install spacy==3.5 stumpy textacy matplotlib nltk coreferee beautifulsoup4 networkx pysbd tomli numerizer autocorrect pywsd openpyxl quantulum3[classifier] numpy==1.26 scikit-learn pyspellchecker contextualSpellCheck pandas wordcloud jsonschema toml
           pip install neo4j jupyterlab
           pip install pytest
-          python3 -m spacy download en_core_web_lg
-          python3 -m coreferee install en
-          python3 -m nltk.downloader all
+          # python -m spacy download en_core_web_lg [for some reason, GitHub machine complains this command]
+      - name: Download trained models
+        run: |
+          conda init bash && source ~/.bashrc && conda activate dackar_libs
+          python -m coreferee install en
+          python -m nltk.downloader all
+          pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.5.0/en_core_web_lg-3.5.0-py3-none-any.whl
           quantulum3-training -s
 
       - name: Test
@@ -84,11 +89,15 @@ jobs:
           pip install spacy==3.5 stumpy textacy matplotlib nltk coreferee beautifulsoup4 networkx pysbd tomli numerizer autocorrect pywsd openpyxl quantulum3[classifier] numpy==1.26 scikit-learn pyspellchecker contextualSpellCheck pandas wordcloud jsonschema toml
           pip install neo4j jupyterlab
           pip install pytest
-          python3 -m spacy download en_core_web_lg
-          python3 -m coreferee install en
-          python3 -m nltk.downloader all
-          quantulum3-training -s
 
+          # python -m spacy download en_core_web_lg [for some reason, GitHub machine complains this command]
+      - name: Download trained models
+        run: |
+          conda init zsh && source ~/.zshrc && conda activate dackar_libs
+          python -m coreferee install en
+          python -m nltk.downloader all
+          pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.5.0/en_core_web_lg-3.5.0-py3-none-any.whl
+          quantulum3-training -s
 
       - name: Test
         run: |
@@ -98,7 +107,7 @@ jobs:
 
 
   Test-DACKAR-Windows:
-    runs-on: windows-latest
+    runs-on: windows-2025
     steps:
       - name: Setup Conda
         uses: conda-incubator/setup-miniconda@v3
@@ -128,13 +137,25 @@ jobs:
           pip install spacy==3.5 stumpy textacy matplotlib nltk coreferee beautifulsoup4 networkx pysbd tomli numerizer autocorrect pywsd openpyxl quantulum3[classifier] numpy==1.26 scikit-learn pyspellchecker contextualSpellCheck pandas wordcloud jsonschema toml
           pip install neo4j jupyterlab
           pip install pytest
+          pip uninstall numba llvmlite
+          pip install --no-cache-dir numba==0.61 llvmlite==0.44
           conda list
           which python
-          python -m spacy download en_core_web_lg
+          pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.5.0/en_core_web_lg-3.5.0-py3-none-any.whl
           python -m coreferee install en
           python -m nltk.downloader all
           quantulum3-training -s
 
+          # python -m spacy download en_core_web_lg [for some reason, GitHub machine complains this command]
+          # python -m spacy download en_core_web_lg
+          # pip install numba
+          # - name: Download trained models
+          #   run: |
+          #     pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.5.0/en_core_web_lg-3.5.0-py3-none-any.whl
+          #     python -m coreferee install en
+          #     python -m nltk.downloader all
+          #     quantulum3-training -s
+
       - name: Test
         run: |
           cd tests
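The workflow above sidesteps `python -m spacy download en_core_web_lg` by pip-installing the model wheel directly from the spacy-models release page. As a minimal sketch of how that pinned URL is composed (the helper name is ours, not part of spaCy or DACKAR; the pattern matches the URL used in the workflow):

```python
def model_wheel_url(model: str, version: str) -> str:
    """Build the direct-download wheel URL for a pretrained spaCy model.

    spaCy model releases are tagged "<model>-<version>" and ship a
    universal wheel named "<model>-<version>-py3-none-any.whl".
    """
    base = "https://github.com/explosion/spacy-models/releases/download"
    return f"{base}/{model}-{version}/{model}-{version}-py3-none-any.whl"

# The exact URL pinned in the workflow for en_core_web_lg 3.5.0:
print(model_wheel_url("en_core_web_lg", "3.5.0"))
```

Pinning the wheel keeps the model version in lockstep with `spacy==3.5` regardless of what the `spacy download` resolver would pick on the CI machine.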

README.md

Lines changed: 11 additions & 1 deletion
@@ -1,5 +1,15 @@
 # DACKAR
-*Digital Analytics, Causal Knowledge Acquisition and Reasoning for Technical Language Processing*
+*Digital Analytics, Causal Knowledge Acquisition and Reasoning*
+
+A Knowledge Management and Discovery Tool for Equipment Reliability Data
+
+To improve the performance and reliability of highly dependable technological systems such as nuclear power plants, advanced monitoring and health management systems are employed to inform system engineers about observed degradation processes and anomalous behaviors of assets and components. This information is captured as large amounts of data that can be heterogeneous in nature (e.g., numeric, textual). Such volumes of data pose a challenge when system engineers must parse and analyze them to track the historical reliability performance of assets and components. DACKAR tackles this challenge by organizing equipment reliability data in the form of a knowledge graph. DACKAR distinguishes itself from current knowledge-graph-based methods in that model-based systems engineering (MBSE) models are used to capture system architecture and health and performance data. MBSE models serve as the skeleton of the knowledge graph; numeric and textual data elements, once processed, are associated with MBSE model elements. This feature opens the door to new data analytics methods designed to identify causal relations between observed phenomena.
+
+DACKAR is structured as a set of workflows, each designed to process raw data elements (i.e., anomalies, events reported in textual form, MBSE models) and construct or update a knowledge graph. For each workflow, the user can specify the sequence of pipelines that perform specific processing actions on the raw data or on data already processed within the same workflow. Specific guidelines on the formats of the raw data are provided. In addition, each workflow defines a specific data-object; each pipeline is tasked either to process a portion of this data-object or to create knowledge graph data. The available workflows are:
+* mbse_workflow: Workflow to process system and equipment MBSE models
+* anomaly_workflow: Workflow to process numeric data and anomalies
+* tlp_workflow: Workflow to process textual data
+* kg_workflow: Workflow to construct and update knowledge graphs
 
 ## Installation
 
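The README describes workflows as ordered sequences of pipelines that each read or update a shared data-object. A minimal sketch of that pattern, assuming nothing about DACKAR's actual API (all names and the toy pipelines below are hypothetical):

```python
# Sketch of the workflow/pipeline pattern the README describes: a workflow
# runs an ordered list of pipelines, each transforming a shared data-object.
# All names here are illustrative, not DACKAR's actual API.
from typing import Callable

Pipeline = Callable[[dict], dict]  # a pipeline maps data-object -> data-object

def run_workflow(pipelines: list[Pipeline], data_object: dict) -> dict:
    """Apply each pipeline in sequence to the shared data-object."""
    for pipeline in pipelines:
        data_object = pipeline(data_object)
    return data_object

# Toy pipelines standing in for steps of e.g. tlp_workflow:
def split_sentences(obj: dict) -> dict:
    obj["sentences"] = obj["raw_text"].split(". ")
    return obj

def count_tokens(obj: dict) -> dict:
    obj["n_tokens"] = sum(len(s.split()) for s in obj["sentences"])
    return obj

result = run_workflow(
    [split_sentences, count_tokens],
    {"raw_text": "Pump P-101 vibrates. Bearing replaced"},
)
print(result["n_tokens"])  # -> 5
```

The point of the pattern is that each pipeline only touches its portion of the data-object, so pipeline sequences can be reordered or extended per workflow without changing the runner.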

docs/contributors.rst

Lines changed: 4 additions & 0 deletions
@@ -1,2 +1,6 @@
 Contributors
 ============
+
+Congjian Wang: [email protected]
+Diego Mandelli: [email protected]
+Joshua J. Cogliati: [email protected]

docs/support.rst

Lines changed: 2 additions & 5 deletions
@@ -2,11 +2,8 @@
 Support
 =======
 
-The easiest way to get help with the project is to open an issue on Github_.
-
-.. The mailing list at ... is also available for support.
-
-.. _Github: https://github.inl.gov/congjian-wang/DACKAR/issues
+The easiest way to get help with the project is to open an issue on `GitHub`_ .
+.. _Github: https://github.com/idaholab/DACKAR/issues/
 
 Developers:
-----------
