Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

57 use tifffiles #58

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

57 use tifffiles #58

wants to merge 12 commits into from

Conversation

KerekesDavid
Copy link
Collaborator

No description provided.

@KerekesDavid KerekesDavid self-assigned this Sep 24, 2024
@KerekesDavid KerekesDavid linked an issue Sep 24, 2024 that may be closed by this pull request
@KerekesDavid
Copy link
Collaborator Author

Remaining:

  • Spacenet7 (will use resample from cv2)
  • sen1floods11
  • utae

I'll get to these tonight.

@KerekesDavid
Copy link
Collaborator Author

KerekesDavid commented Sep 25, 2024

I can't test all the datasets myself (since I don't know the proper setups for most of them), but we should check if the results are still correct after the changes above.

I've tested MADOS, and will test Sen1Floods11.

Can you guys help out with the ones that you have access to?

@SebastianHafner
Copy link
Collaborator

SebastianHafner commented Sep 26, 2024

It seems that tifffile loads images as (H, W, C), whereas rasterio loads them as (C, H, W). I rearranged the dimensions for the SpaceNet 7 dataset. I tested this and it works.

BTW I also had to pip install imagecodecs. Should we add it to the environment?

Copy link
Collaborator

@SebastianHafner SebastianHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for SpaceNet 7.

@KerekesDavid
Copy link
Collaborator Author

The imagecodecs thing is interesting, it is in the dependencies list of tifffiles. Just conda installing tifffiles worked for me.

Thanks for spotting the dimension order mismatch, I'll have to modify the code in a few places. I missed it with mados since it only uses 1-channel images :(

@gle-bellier
Copy link
Collaborator

Are we sure that changing to tifffile does not bring some algorithmic changes? For example, I doubt that rasterio resampling, torch.interpolate and cv2.resize are equivalent (even if we use nearest neighbors in every case). For this reason, I suggest we keep the rasterio dependency so we can ensure that we use the datasets as they are released.

@KerekesDavid
Copy link
Collaborator Author

KerekesDavid commented Oct 1, 2024

I would be very concerned if there was any difference in a nearest neighbor (or even bilinear) interpolate between these libraries.

These are standard algorithms with well defined behaviors. If there is any difference apart from floating point noise I would file an issue on their github :D

(I wouldn't even be surprised if they were calling the same libraries under the hood.)

@KerekesDavid
Copy link
Collaborator Author

KerekesDavid commented Oct 1, 2024

So I ran some tests, and there are indeed differences in rounding patterns (eg. which pixel gets chosen if resizing to 0.5).
But even then, we are only using cv2.resize for mados and spacenet7, and for both of those we quite significantly rewrote the dataloaders already.

I see two upsides to removing rasterio/any other tiff reading module:

  1. Standardizing our data loading, so it's harder to run into cases where an image is in the wrong band order etc.
  2. Reducing the chance of running into a dependency conflict in the future. This can happen both to us if we try to update a package, or to users who want to integrate models/datasets that were written with a specific version of a package in mind.

On the other hand, if we decide to keep all dataloaders with their original packages, we'll have to keep doing this for all datasets we add in the future, no matter how many different dependencies we accumulate. There's no guarantee this is even possible in the same venv.
None of these resampling implementations are more "wrong" than the others, and all models receive the same input data anyways so it shouldn't affect their relative performance.

@KerekesDavid
Copy link
Collaborator Author

KerekesDavid commented Oct 1, 2024

New update, I did a bit of RTFM. CV2 has a nearest interpolate mode INTERP_NEAREST_EXACT that's explicitly compatible with PIL/Matlab etc. Using that eliminates all differences in my test.

@KerekesDavid
Copy link
Collaborator Author

KerekesDavid commented Oct 2, 2024

It seems that tifffile loads images as (H, W, C), whereas rasterio loads them as (C, H, W). I rearranged the dimensions for the SpaceNet 7 dataset. I tested this and it works.

BTW I also had to pip install imagecodecs. Should we add it to the environment?

So about the channel order in tiffiles: reading the files in Sen1Floods11 returns files in (C, H, W).
Turns out tifffiles just returns the images in the exact dimension order they were saved in.

So I'll have to go over each dataset and see which format they save their data in. Luckily, the info is available in TiffFile.series[0].axes as a string.

@yurujaja
Copy link
Owner

yurujaja commented Oct 7, 2024

I wanted to check on the progress of this PR. Are all the checks complete and is it ready to be merged? Please let me know if there's anything else needed, and I’d be happy to assist.

@KerekesDavid
Copy link
Collaborator Author

KerekesDavid commented Oct 7, 2024

I'm now thinking it might be more complication than it's worth to remove rasterio from the dependencies, so I put it on the backburner for a bit.

If we still want to do this, we'll either need to manually check the dimension order for each remaining dataset, or we need to write a function that takes the dimension order from tiffiles and converts it to (C, H, W). Einops could probably handle it quite easy.

Then we'll need to check each dataset and see if it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace rasterio with tifffiles everywhere
4 participants