Implement concurrent parsing #743

Draft
wants to merge 46 commits into alpha-test-dev

Conversation

@MicahGale (Collaborator) commented Apr 10, 2025

Pull Request Checklist for MontePy

Description

This PR adds concurrent parsing using multiprocessing via concurrent.futures. This allows large models to be parsed faster, though it does not scale perfectly. The option is off by default because Windows requires an if __name__ == "__main__": guard.
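
For illustration only (read_input is MontePy's normal entry point; the option that actually switches concurrent parsing on is not named in this description, and the file name is made up), a Windows-safe user script would need to be shaped like this:

import montepy

def main():
    # On Windows, multiprocessing re-imports this module in each worker process,
    # so the actual work has to live behind the __main__ guard below.
    problem = montepy.read_input("model.imcnp")  # enable concurrent parsing via this PR's option
    print(len(problem.cells))

if __name__ == "__main__":
    main()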

I am merging this onto alpha-test-dev because this needs to be a minor release. I am not sure when we want to do a minor release yet; probably after jit_parsing is implemented as well.

Implementation

In general, parsing can be broken into three main tasks:

  1. Chunking: splitting the file into individual inputs (a simplified sketch of this step follows the list).
  2. Parsing the inputs into their full syntax trees and MontePy objects.
  3. Post-parsing processing: linking the objects together and appending them to the problem.
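
Purely as an illustration of the chunking idea, not MontePy's actual chunker (which also has to handle comments, block boundaries, and the exact continuation rules), a splitter that treats heavily indented lines as continuations could look like:

def chunk_inputs(lines):
    # Very simplified: a line starting with 5+ spaces (or a blank line) is
    # treated as part of the previous input; anything else starts a new input.
    current = []
    for line in lines:
        is_continuation = line.startswith("     ") or not line.strip()
        if current and not is_continuation:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)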

How this works in multi-processing mode is best understood by looking at what each process does (a minimal sketch follows the list):

  • Main process. The main process handles chunking and post-parsing processing. It iterates through the file and dispatches the inputs as an iterator that fills a queue (handled by concurrent.futures). Once it is ready, it reads the resulting objects out of another queue and appends them to the problem. Once all objects are appended, the subprocesses are terminated and the objects are linked together.
  • Child processes. The children take inputs from a queue, parse them, and return the instantiated objects back through the queue.
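
A minimal sketch of that flow with concurrent.futures (the function names are illustrative, not MontePy's internal API; chunk_inputs is the toy splitter above):

from concurrent.futures import ProcessPoolExecutor

def parse_input(input_text):
    # Stand-in for the child-process work: build the syntax tree and
    # instantiate the corresponding MontePy object.
    return input_text

def read_problem(path, max_workers=None):
    objects = []
    with open(path) as fh, ProcessPoolExecutor(max_workers=max_workers) as pool:
        # Main process: chunk the file and feed inputs to the worker pool.
        # Note that executor.map submits every input up front rather than lazily.
        for obj in pool.map(parse_input, chunk_inputs(fh)):
            objects.append(obj)  # "append to the problem"
    # Post-parsing: linking the objects together happens serially in the main process.
    return objects

On Windows this only works when the calling script sits behind the if __name__ == "__main__": guard discussed above.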

Performance

I haven't collected performance data rigorously yet; that can be done at a later date. The preliminary data I have gathered suggest there is now a floor on load times of about a second due to the added overhead.

Fixes #742, #745


General Checklist

  • I have performed a self-review of my own code.
  • The code follows the standards outlined in the development documentation.
  • I have formatted my code with black version 25.
  • I have added tests that prove my fix is effective or that my feature works (if applicable).

Documentation Checklist

  • I have documented all added classes and methods.
  • For infrastructure updates, I have updated the developer's guide.
  • For significant new features, I have added a section to the getting started guide.

First-Time Contributor Checklist

  • If this is your first contribution, add yourself to pyproject.toml if you wish to do so.

Additional Notes for Reviewers

Ensure that:

  • The submitted code is consistent with the merge checklist outlined here.
  • The PR covers all relevant aspects according to the development guidelines.
  • 100% coverage of the patch is achieved, or justification for a variance is given.

📚 Documentation preview 📚: https://montepy--743.org.readthedocs.build/en/743/

@MicahGale (Collaborator, Author) commented:

So the final failing test is due to check mode not being supported.

@MicahGale MicahGale self-assigned this Apr 13, 2025
@MicahGale MicahGale added the performance 🐌 Issues related to speed and memory label Apr 13, 2025
@MicahGale (Collaborator, Author) commented:

I have narrowed the issue down to unpickling ParsingError. This is wild. Pickle is trying to run __init__?

I have boiled it down:

import montepy
import pickle

error = montepy.errors.ParsingError("", "", [])
pickled = pickle.dumps(error)
pickle.loads(pickled)

This causes:

Traceback (most recent call last):
  File "/home/mgale/mambaforge/lib/python3.13/site-packages/IPython/core/interactiveshell.py", line 3667, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<ipython-input-23-8c9e5f955a19>", line 1, in <module>
    pickle.loads(pickled)
    ~~~~~~~~~~~~^^^^^^^^^
TypeError: ParsingError.__init__() missing 2 required positional arguments: 'message' and 'error_queue'
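
For context, this is general CPython behavior rather than anything MontePy-specific: an exception pickles as its type plus self.args, so unpickling calls the class again with whatever was passed to super().__init__(). When a subclass requires extra positional arguments that never reach self.args, reconstruction fails. A standalone reproduction with a made-up MyError instead of the real ParsingError:

import pickle

class MyError(ValueError):
    def __init__(self, message, error_queue):
        super().__init__(message)       # only 'message' lands in self.args
        self.error_queue = error_queue

err = MyError("bad input", [])
pickle.loads(pickle.dumps(err))         # TypeError: missing required positional argument 'error_queue'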

@MicahGale (Collaborator, Author) commented:

Maybe this is our solution? https://bugs.python.org/issue32696

@MicahGale (Collaborator, Author) commented Apr 14, 2025

It seems to be due to the fact that we are inheriting from ValueError and not Exception?

Update: It doesn't seem like ValueError inherits from Exception, but only Exception, which is a problem:

exception Exception
All built-in, non-system-exiting exceptions are derived from this class. All user-defined exceptions should also be derived from this class.
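
Whatever the base class ends up being, the usual ways to make such an exception picklable are to forward every constructor argument to super().__init__() (so they land in self.args) or to override __reduce__. A minimal sketch of the first option, again with a made-up class rather than the real ParsingError:

import pickle

class MyError(ValueError):
    def __init__(self, message, error_queue):
        # Passing both arguments to the base class puts them in self.args,
        # so pickle can rebuild the exception as MyError(*self.args).
        super().__init__(message, error_queue)
        self.error_queue = error_queue

err = pickle.loads(pickle.dumps(MyError("bad input", [])))
print(err.error_queue)  # []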

@MicahGale (Collaborator, Author) commented:

The issues with Windows are independent of serial/multi-process mode. This is wild. These issues should be abstracted away by Python.

@MicahGale MicahGale mentioned this pull request Jun 26, 2025
Fixed broken test cases with Windows
@MicahGale (Collaborator, Author) commented Jun 26, 2025

Update: I was able to effectively fix #745. Now to get this ready I need to:

  1. address the drop in coverage.
  2. Document how to use concurrent parsing in practice (i.e., if __name__ == "__main__":)

@MicahGale MicahGale changed the base branch from develop to alpha-test-dev July 11, 2025 22:04
@MicahGale MicahGale requested a review from tjlaboss July 12, 2025 04:25
@MicahGale (Collaborator, Author) commented:

@tjlaboss I think this is finally ready.

@MicahGale MicahGale marked this pull request as ready for review July 14, 2025 15:41
@MicahGale (Collaborator, Author) commented:

I have gotten some preliminary data from @dodu94 showing that this actually slows down the parsing of large models. I think we need to pause this until we can figure out that issue.

@MicahGale MicahGale marked this pull request as draft July 15, 2025 19:33
@MicahGale (Collaborator, Author) commented:

TIL: multiprocessing.pool.imap may be better suited for this use case. We may also need to multi-process the input chunking step, or seriously increase its efficiency.
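
For reference, a sketch of what the imap variant could look like (same illustrative names as before; chunk_inputs is the toy splitter from the description): Pool.imap yields results in input order without building the whole result list first, and its chunksize argument batches inputs per inter-process message to amortize queue overhead.

from multiprocessing import Pool

def parse_input(input_text):
    # Stand-in for the per-input parsing work done in a worker process.
    return input_text

def read_problem(path):
    objects = []
    with open(path) as fh, Pool() as pool:
        # imap returns results in order as they become ready; chunksize
        # batches inputs to reduce the per-item inter-process overhead.
        for obj in pool.imap(parse_input, chunk_inputs(fh), chunksize=64):
            objects.append(obj)
    return objects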

Labels: performance 🐌 Issues related to speed and memory

Successfully merging this pull request may close these issues: Implement parallel parsing

2 participants