Use TextIOWrapper for opening file from repository in text mode #6847
See also https://aiida.discourse.group/t/memory-usage-folderdata-open/524/22 for a discussion on this issue. |
Thanks a lot @ahkole
Looks good to me already.
Just to make sure it works as expected, could you please add your tests to the repository? They may require some modification, but then we'll know for sure if something breaks.
I've added a small test for reading in text mode. I think that should cover the changes, right? The small change to the cif-interface is already covered by an existing test (that is also how I discovered that it needed to be changed). Note btw that the changes in the |
Thanks a lot @ahkole,
To link the correct dependency to your PR, remove the version constraint in pyproject.toml:
'disk-objectstore~=1.2'
-> 'disk-objectstore'
Then point uv at your fork and branch:
[tool.uv.sources]
disk-objectstore = {git = "https://github.com/ahkole/disk-objectstore.git", branch = "XX"}
In environment.yml, likewise drop the version constraint:
'disk-objectstore~=1.2'
-> 'disk-objectstore'
Finally, running pre-commit run automatically updates the uv file.
Co-authored-by: Ali Khosravi
Ah, I wasn't aware this was possible. Thanks for the suggestion! I think this should be set correctly now. |
Unrelated to this PR, but can I propose btw to update the "Setting up your development environment" page of the Contributor wiki to mention using |
Thanks a lot @ahkole. The changes look good to me; I'm running the workflow now to see if the tests pass. |
Ok, we're hitting this GitHub bug again, |
Good! The trick worked, now let's see the tests. |
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6847      +/-   ##
==========================================
- Coverage   78.59%   78.59%   -0.00%
==========================================
  Files         567      567
  Lines       43092    43093       +1
==========================================
- Hits        33866    33863       -3
- Misses       9226     9230       +4
☔ View full report in Codecov by Sentry.
|
@ahkole, thanks again for your contribution. One of the tests is failing for To reproduce run this: It appears that the restapi server has problems properly reading and loading the data from a
|
Hi @khsrali, I have tried to run the test as you suggested, but on my system the test passes, see below:
I seem to be using a different Python version though (3.11 vs 3.9 in the results you shared). I don't know if that could make the difference? |
Yeah, strange. I tried to run locally with |
Is there a way to look more into this |
Yes, that would be the way to go when debugging. The pain is that we cannot reproduce it locally, so it's more difficult to understand and resolve. @eimrek, we don't understand why the test fails here. |
@eimrek Thanks for the reply! If I check the history of my branch (https://github.com/ahkole/aiida-core/commits/objectstore-textio/), it does seem to include #6763. It was also last synced with |
Is it somehow possible to run interactively on the CI host to try to debug the issue? |
@ahkole the tests pass now 🤷‍♂️ For me this PR looks good already. Let me have a look at your other PR in disk-objectstore. |
@khsrali that's great to hear! Let me know what you think of the disk-objectstore PR. |
In the current implementation, the contents of a file are wrapped in a StringIO if a user tries to open a file from the repository in text mode (i.e. with folderdata.open('somefile.txt', 'r')). This means the entire file is loaded into memory, which can cause a significant increase in memory usage for large text files. This PR proposes to instead wrap the binary stream in a TextIOWrapper for streaming-based access in text mode, which avoids loading the entire file into memory.
A small change was also necessary for the cif-file interface, since apparently the CifParser from pymatgen only accepts filenames or StringIO streams.
This PR depends on a PR for the disk-objectstore dependency (see aiidateam/disk-objectstore#192) that adds some attributes to custom streaming types that are required by TextIOWrapper.
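To illustrate the difference described above, here is a minimal standard-library sketch, not the actual aiida-core or disk-objectstore code. ChunkedBinaryStream is a hypothetical stand-in for a custom binary stream type, and the methods it defines are the ones CPython's TextIOWrapper appears to call on its underlying buffer; the exact attributes added in the disk-objectstore PR may differ.

```python
import io


class ChunkedBinaryStream:
    """Hypothetical custom binary stream (illustration only, not a disk-objectstore class)."""

    def __init__(self, payload: bytes) -> None:
        self._raw = io.BytesIO(payload)
        self.bytes_served = 0  # number of bytes handed out so far

    # Capability queries made by TextIOWrapper when it is constructed.
    def readable(self) -> bool:
        return True

    def writable(self) -> bool:
        return False

    def seekable(self) -> bool:
        return False

    # Checked by TextIOWrapper before each read and on close.
    @property
    def closed(self) -> bool:
        return self._raw.closed

    def read(self, size: int = -1) -> bytes:
        chunk = self._raw.read(size)
        self.bytes_served += len(chunk)
        return chunk

    # Called by TextIOWrapper.close().
    def flush(self) -> None:
        pass

    def close(self) -> None:
        self._raw.close()


payload = b"".join(b"line %d\n" % i for i in range(100_000))

# Eager approach: the whole file is read and decoded before the first line is available.
eager = io.StringIO(payload.decode("utf-8"))
first_eager = eager.readline()

# Streaming approach: TextIOWrapper pulls bytes from the stream in chunks on demand.
stream = ChunkedBinaryStream(payload)
with io.TextIOWrapper(stream, encoding="utf-8") as text:
    first_line = text.readline()
    served_after_first_line = stream.bytes_served
```

After reading one line, served_after_first_line is only a small fraction of len(payload), whereas the StringIO variant had to materialise the full decoded content up front.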