Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A quick append mode for already existing hdf5 files #146

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sherjeelshabih
Copy link
Collaborator

This is in line with what @mkuehbach brought up. I just cooked up some basic draft to share ideas on this.

One of the questions we had was what to do if we are adding new data to an existing hdf5 path in the file. We decided to offer a simple (y/n) prompt for the overwrite. But this can be discussed.

If there is anything else you would like to add please feel free to work on this branch or leave comments.

@domna @RubelMozumder @sanbrock @lukaspie

@coveralls
Copy link

coveralls commented Aug 23, 2023

Pull Request Test Coverage Report for Build 5952173941

  • 18 of 18 (100.0%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.07%) to 49.569%

Totals Coverage Status
Change from base Build 5715296537: 0.07%
Covered Lines: 5405
Relevant Lines: 10904

💛 - Coveralls

@lukaspie
Copy link
Collaborator

I fully support having the possibility to append data. Maybe this could also work nicely with the ideas of Area A that were discussed yesterday, i.e., allowing for appending ELN data individually (e.g., from a python dict)? In any way, the proposed changes lgtm.

Overwriting old data seems a bit more tricky since you wouldn't want to accidentally overwrite existing data (especially if the writing is automated, e.g. during exporting of experimental data), A y/n prompt may be a good compromise.

@mkuehbach
Copy link
Collaborator

I am strongly against adding an option to overwrite existent data especially if there is not a rock-solid mechanism in-place which keeps hashes and provenance of each piece of information or groupings of pieces of information.

@sherjeelshabih
Copy link
Collaborator Author

I am strongly against adding an option to overwrite existent data especially if there is not a rock-solid mechanism in-place which keeps hashes and provenance of each piece of information or groupings of pieces of information.

I understand. But the problem is if a user wants to replace this, do we block them from doing so and ignore the replacing keys for now? So no asking (y/n) and just warning them about it?

They could always go around our code and replace anything they like. in the h5 files. So I feel like giving the user more power to outright replace what they like using our framework is better.

@domna
Copy link
Collaborator

domna commented Sep 19, 2023

I am strongly against adding an option to overwrite existent data especially if there is not a rock-solid mechanism in-place which keeps hashes and provenance of each piece of information or groupings of pieces of information.

I understand. But the problem is if a user wants to replace this, do we block them from doing so and ignore the replacing keys for now? So no asking (y/n) and just warning them about it?

They could always go around our code and replace anything they like. in the h5 files. So I feel like giving the user more power to outright replace what they like using our framework is better.

I see the point. People will do it if they really want to. Maybe we could do something like the --i-am-really-sure flag we have in the nomad cli? Something which you actively need to add, instead of maybe accidentally hitting y out of habit?

@sherjeelshabih
Copy link
Collaborator Author

I am strongly against adding an option to overwrite existent data especially if there is not a rock-solid mechanism in-place which keeps hashes and provenance of each piece of information or groupings of pieces of information.

I understand. But the problem is if a user wants to replace this, do we block them from doing so and ignore the replacing keys for now? So no asking (y/n) and just warning them about it?
They could always go around our code and replace anything they like. in the h5 files. So I feel like giving the user more power to outright replace what they like using our framework is better.

I see the point. People will do it if they really want to. Maybe we could do something like the --i-am-really-sure flag we have in the nomad cli? Something which you actively need to add, instead of maybe accidentally hitting y out of habit?

I'm on board with that. Do we make it work just like Nomad where the user has to run the program again with this flag or do we let the user type in "I am really sure!". Note I capitalized the "I" and added an exclamation mark there to make for a stronger approval.

@RubelMozumder
Copy link
Collaborator

RubelMozumder commented Sep 19, 2023

I am strongly against adding an option to overwrite existent data especially if there is not a rock-solid mechanism in-place which keeps hashes and provenance of each piece of information or groupings of pieces of information.

I understand. But the problem is if a user wants to replace this, do we block them from doing so and ignore the replacing keys for now? So no asking (y/n) and just warning them about it?
They could always go around our code and replace anything they like. in the h5 files. So I feel like giving the user more power to outright replace what they like using our framework is better.

I see the point. People will do it if they really want to. Maybe we could do something like the --i-am-really-sure flag we have in the nomad cli? Something which you actively need to add, instead of maybe accidentally hitting y out of habit?

I'm on board with that. Do we make it work just like Nomad where the user has to run the program again with this flag or do we let the user type in "I am really sure!". Note I capitalized the "I" and added an exclamation mark there to make for a stronger approval.

Though we want to get strong consent from users either by typing or a flag, I would say, it is still important to keep track of the old data at the same time. So that nobody can exploit this functionality for any unpleasant intention.

But, Appending new data regarding new fields can be considered and nice to have.

@sherjeelshabih
Copy link
Collaborator Author

sherjeelshabih commented Sep 19, 2023

I am strongly against adding an option to overwrite existent data especially if there is not a rock-solid mechanism in-place which keeps hashes and provenance of each piece of information or groupings of pieces of information.

I understand. But the problem is if a user wants to replace this, do we block them from doing so and ignore the replacing keys for now? So no asking (y/n) and just warning them about it?
They could always go around our code and replace anything they like. in the h5 files. So I feel like giving the user more power to outright replace what they like using our framework is better.

I see the point. People will do it if they really want to. Maybe we could do something like the --i-am-really-sure flag we have in the nomad cli? Something which you actively need to add, instead of maybe accidentally hitting y out of habit?

I'm on board with that. Do we make it work just like Nomad where the user has to run the program again with this flag or do we let the user type in "I am really sure!". Note I capitalized the "I" and added an exclamation mark there to make for a stronger approval.

Though we want to get strong consent from users either by typing or a flag, I would say, it is still important to keep track of the old data at the same time. So that nobody can exploit this functionality for any unpleasant intention.

But, Appending new data can be considered and nice to have.

The issue is we cannot prevent anyone from doing anything nefarious. It will be good to provide an option to keep old data. But in most cases, we will only append /nexus/paths. We will most likely never overwrite "raw" h5 data paths. Where I could see this overwrite conflict is from entries added by one of our readers. In that case, the overwrite will provide a more recent either bug fixed version of the /nexus/path in question or a more feature rich version. For example, we gain some new plot functionality and a user wants to quickly update their NXS files to gain this new feature. I want to accommodate these users and not make it a hassle for them in the name of behind the curtain of trying to prevent nefarious acts. Those folks will find an even simpler way to manipulate the data.

We can, in the future, provide an option to preserve old data under certain NeXus concepts with pynxtool/reader versions, etc.

@rettigl
Copy link
Collaborator

rettigl commented Aug 20, 2024

I would find such an option very important, e.g. to add evaluation results to a file. What is the status on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants