A quick append mode for already existing hdf5 files #146

sherjeelshabih · 2023-08-23T13:45:09Z

This is in line with what @mkuehbach brought up. I just cooked up a basic draft to share ideas on this.

One of the questions we had was what to do when new data targets an HDF5 path that already exists in the file. We decided to offer a simple (y/n) prompt before overwriting, but this is up for discussion.
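A minimal sketch of the (y/n) prompt described above. It assumes the target is anything mapping-like keyed by HDF5 path: a plain dict for testing or, as an assumption, an `h5py.File` opened with mode `"a"`, whose groups support `in`, `del` and item assignment with slash-separated paths. `append_entries` is a hypothetical name, not the function in this PR.

```python
from collections.abc import Callable, Mapping, MutableMapping
from typing import Any


# Hypothetical helper for illustration; not part of pynxtools.
def append_entries(
    target: MutableMapping[str, Any],
    new_entries: Mapping[str, Any],
    ask: Callable[[str], str] = input,
) -> list[str]:
    """Write new_entries into target, prompting before replacing existing paths.

    Returns the list of paths that were actually written.
    """
    written = []
    for path, value in new_entries.items():
        if path in target:
            answer = ask(f"{path} already exists. Overwrite? (y/n) ")
            if answer.strip().lower() != "y":
                continue  # keep the existing data untouched
            # h5py refuses to assign over an existing name, so unlink it first.
            del target[path]
        target[path] = value
        written.append(path)
    return written
```

Passing `ask` as a parameter keeps the prompt testable and lets automated pipelines supply a non-interactive policy instead of `input`.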

If there is anything else you would like to add, please feel free to work on this branch or leave comments.

@domna @RubelMozumder @sanbrock @lukaspie

coveralls · 2023-08-23T13:48:10Z

Pull Request Test Coverage Report for Build 5952173941

18 of 18 (100.0%) changed or added relevant lines in 3 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.07%) to 49.569%

Totals
Change from base Build 5715296537:	0.07%
Covered Lines:	5405
Relevant Lines:	10904

💛 - Coveralls

lukaspie · 2023-09-19T09:31:20Z

I fully support having the possibility to append data. Maybe this could also work nicely with the ideas from Area A that were discussed yesterday, i.e., allowing ELN data to be appended individually (e.g., from a Python dict)? In any case, the proposed changes LGTM.
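Appending ELN data from a Python dict could start by flattening the nested dict into HDF5-style paths, which any append routine can then consume. This is a sketch under the assumption that dict keys map one-to-one onto group and field names; `flatten_to_paths` is hypothetical, and empty sub-dicts are simply dropped.

```python
from collections.abc import Iterator, Mapping
from typing import Any


# Hypothetical helper: assumes nested dict keys correspond to HDF5 group/field names.
def flatten_to_paths(tree: Mapping[str, Any], prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (hdf5_path, leaf_value) pairs for every leaf of a nested dict."""
    for key, value in tree.items():
        path = f"{prefix}/{key}"
        if isinstance(value, Mapping):
            yield from flatten_to_paths(value, path)  # recurse into sub-groups
        else:
            yield path, value
```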

Overwriting old data seems a bit trickier, since you wouldn't want to overwrite existing data by accident (especially if the writing is automated, e.g., when exporting experimental data). A y/n prompt may be a good compromise.

mkuehbach · 2023-09-19T10:50:33Z

I am strongly against adding an option to overwrite existing data, especially if there is no rock-solid mechanism in place that keeps track of the hashes and provenance of each piece of information or group of pieces of information.
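One possible building block for such a mechanism, sketched under the assumption that values are serialized to bytes first (e.g., via `numpy.ndarray.tobytes()` for array data): record SHA-256 fingerprints of the replaced and replacing values together with a UTC timestamp. Where these records would live (a log, an attribute, a dedicated group) is left open, and `provenance_record` is hypothetical.

```python
import hashlib
from datetime import datetime, timezone


# Hypothetical helper; assumes both values have already been serialized to bytes.
def provenance_record(path: str, old_value: bytes, new_value: bytes) -> dict[str, str]:
    """Describe a replacement by SHA-256 fingerprints of the old and new bytes."""
    return {
        "path": path,
        "old_sha256": hashlib.sha256(old_value).hexdigest(),
        "new_sha256": hashlib.sha256(new_value).hexdigest(),
        "replaced_at": datetime.now(timezone.utc).isoformat(),
    }
```

Note that a hash alone only proves what the old data was; recovering it would still require keeping a copy.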

sherjeelshabih · 2023-09-19T11:13:54Z

I understand. But the problem is: if a user wants to replace existing data, do we block them and simply ignore the conflicting keys for now? So no (y/n) prompt, just a warning?

They could always bypass our code and replace anything they like in the h5 files. So I feel that giving users the power to replace what they like directly through our framework is the better option.
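For comparison, the "block and just warn" alternative raised above could look like this minimal sketch: conflicting paths are skipped with a Python warning and reported back to the caller. It assumes the same mapping-like target keyed by HDF5 path (a dict here); `append_without_overwrite` is a hypothetical name.

```python
import warnings
from collections.abc import Mapping, MutableMapping
from typing import Any


# Hypothetical helper illustrating a warn-only policy; not part of pynxtools.
def append_without_overwrite(
    target: MutableMapping[str, Any], new_entries: Mapping[str, Any]
) -> list[str]:
    """Write only paths that do not exist yet; warn about and skip the rest."""
    skipped = []
    for path, value in new_entries.items():
        if path in target:
            warnings.warn(f"{path} already exists; keeping the existing data", stacklevel=2)
            skipped.append(path)
        else:
            target[path] = value
    return skipped
```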

domna · 2023-09-19T12:02:55Z

I see the point. People will do it if they really want to. Maybe we could do something like the --i-am-really-sure flag we have in the NOMAD CLI? Something you actively need to add, instead of accidentally hitting y out of habit?
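A sketch of how such an explicit opt-in flag could be wired with the standard library's `argparse`. The flag name is taken from the comment above; the command-line tool, its positional `file` argument and `build_parser` are hypothetical and not an existing pynxtools option. Without the flag, any conflicting path would be refused.

```python
import argparse


# Hypothetical CLI for illustration only.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Append data to an existing NeXus/HDF5 file (sketch)."
    )
    parser.add_argument("file", help="existing .nxs/.h5 file to append to")
    parser.add_argument(
        "--i-am-really-sure",
        action="store_true",  # defaults to False unless explicitly passed
        help="allow replacing paths that already exist in the file",
    )
    return parser
```

`argparse` exposes the flag as `args.i_am_really_sure`, so a caller can pass it straight through as the overwrite permission.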

sherjeelshabih · 2023-09-19T12:11:35Z

I'm on board with that. Should it work just like NOMAD, where the user has to rerun the program with this flag, or do we let the user type in "I am really sure!"? Note that I capitalized the "I" and added an exclamation mark to make for a stronger approval.
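The typed-confirmation variant could be as simple as the following sketch: list the paths that would be replaced and accept only an exact match of the phrase, so capitalization and the exclamation mark matter. `confirm_overwrite` and `CONFIRMATION_PHRASE` are hypothetical names; `ask` is injectable for testing.

```python
from collections.abc import Callable

# Hypothetical constant; the exact phrase is the one proposed above.
CONFIRMATION_PHRASE = "I am really sure!"


def confirm_overwrite(paths: list[str], ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user types the confirmation phrase exactly."""
    listing = "\n".join(f"  {p}" for p in paths)
    answer = ask(
        f"The following paths already exist and would be replaced:\n{listing}\n"
        f'Type "{CONFIRMATION_PHRASE}" to continue: '
    )
    # Only surrounding whitespace is ignored; case and punctuation must match.
    return answer.strip() == CONFIRMATION_PHRASE
```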

RubelMozumder · 2023-09-19T12:34:00Z

Even though we want strong consent from users, either by typing or via a flag, I would say it is still important to keep track of the old data at the same time, so that nobody can exploit this functionality with malicious intent.
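One way to keep track of the old data is to move it aside instead of deleting it, as in this sketch. The `_previous_N` naming scheme is a made-up assumption, not a NeXus convention, and `replace_keeping_old` is hypothetical. The code is shown against a dict; with h5py, assigning an existing object to a new name should create a hard link, so the data would survive the subsequent `del`, but that is an assumption to verify.

```python
from collections.abc import MutableMapping
from typing import Any, Optional


# Hypothetical helper; the "_previous_N" suffix is an illustrative naming scheme.
def replace_keeping_old(target: MutableMapping[str, Any], path: str, value: Any) -> Optional[str]:
    """Replace target[path] with value, first moving any old value to a backup path.

    Returns the backup path, or None if nothing was replaced.
    """
    backup = None
    if path in target:
        n = 1
        while f"{path}_previous_{n}" in target:
            n += 1  # never clobber an earlier backup
        backup = f"{path}_previous_{n}"
        target[backup] = target[path]
        del target[path]
    target[path] = value
    return backup
```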

Appending new data for new fields, however, is worth considering and would be nice to have.

sherjeelshabih · 2023-09-19T12:41:17Z

The issue is that we cannot prevent anyone from doing anything nefarious. It would be good to provide an option to keep old data. But in most cases, we will only append /nexus/paths; we will most likely never overwrite "raw" h5 data paths. Where I could see an overwrite conflict arising is with entries added by one of our readers. In that case, the overwrite would provide a more recent version of the /nexus/path in question, either with a bug fix or with more features. For example, we gain some new plot functionality and a user wants to quickly update their NXS files to get this new feature. I want to accommodate these users and not make it a hassle for them ~~in the name of~~ behind the curtain of trying to prevent nefarious acts. Those folks will find an even simpler way to manipulate the data.
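The distinction drawn here between reader-generated /nexus/paths and "raw" h5 data could be enforced with a path allow-list, sketched below under the assumption that a reader knows which groups it writes. `may_overwrite` is hypothetical and the prefixes in the test are made-up examples. Matching on whole path components keeps `/entry/plot` from accidentally matching `/entry/plotting`.

```python
from collections.abc import Iterable


# Hypothetical helper; which prefixes are "overwritable" would be decided per reader.
def may_overwrite(path: str, overwritable_prefixes: Iterable[str]) -> bool:
    """True if path lies inside one of the groups a reader is allowed to regenerate."""
    for prefix in overwritable_prefixes:
        prefix = prefix.rstrip("/")
        # Compare whole path components, not raw string prefixes.
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False
```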

We could, in the future, provide an option to preserve old data under certain NeXus concepts, together with the pynxtools/reader versions, etc.

rettigl · 2024-08-20T21:07:35Z

I would find such an option very important, e.g., to add evaluation results to a file. What is the status of this?

Commit d2c72ca: A quick append mode for already existing hdf5 files

rettigl mentioned this pull request on Aug 20, 2024 in "Workflow for adding analysis results to an existing Nexus file" #366 (open).
