Feature/auto detect format #77
base: main
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@ Coverage Diff @@
##             main      #77      +/-   ##
==========================================
- Coverage   98.82%   98.68%   -0.14%
==========================================
  Files           5        5
  Lines         255      305      +50
==========================================
+ Hits          252      301      +49
- Misses          3        4       +1
```

☔ View full report in Codecov by Sentry.
Hi @rajatkriplani, thanks for this! I approved the CI workflows now; would you mind having a look at the failed checks? Additionally, we usually open PRs separately for different features / issues, but I noticed in this PR the diff with … Feel free to have a go, and let me know if you get stuck at any point.
Force-pushed 47105b4 to aacdbe8
Hi @sfmig, I have improved the test coverage, but the following lines are still uncovered:

```python
except Exception as e:  # Catch other potential file reading errors
    raise ValueError(f"Could not read file {file_path}: {e}") from e
```

For the above, mocking is required, so should I go with `unittest.mock`?
Hi @rajatkriplani, yes, do have a go at using `unittest.mock` for this. You may find examples of its use in the …
Hope this helps!
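As a minimal sketch of the approach being discussed: patching `builtins.open` with `unittest.mock.patch` lets a test force the generic `except Exception` branch without touching the filesystem. The `_read_json` helper below is a hypothetical stand-in for the file-reading part of `_detect_format`, not the actual function from this PR.

```python
import json
from pathlib import Path
from unittest.mock import patch


def _read_json(file_path: Path) -> dict:
    # Hypothetical stand-in for the file-reading part of _detect_format
    try:
        with open(file_path) as f:
            return json.load(f)
    except json.JSONDecodeError as e:
        raise ValueError(
            f"Error decoding JSON data from file {file_path}: {e}"
        ) from e
    except Exception as e:  # catch other potential file reading errors
        raise ValueError(f"Could not read file {file_path}: {e}") from e


# Patch builtins.open so it raises before any real I/O happens,
# exercising the otherwise hard-to-reach except branch.
with patch("builtins.open", side_effect=OSError("disk error")):
    try:
        _read_json(Path("annotations.json"))
    except ValueError as e:
        caught = str(e)
```

In a pytest test the same patch would typically be combined with `pytest.raises(ValueError, match="Could not read file")`.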
Also, do have a look at the docs building check which seems to be failing (alongside the code coverage checks) |
@sfmig I have added mocking for some of the tests; please have a look.
Hello @sfmig
Hello @sfmig just dropping a quick follow-up here since Zulip’s been a bit quiet — would love any further thoughts or feedback on this when you get a chance. |
sfmig left a comment
Hi @rajatkriplani, thanks for having a go!
I think this is a good attempt, although we will likely need to iterate a bit. I left some in-line comments in the code, but also have some general comments:
- In general, the PR feels a bit too verbose. If you can, I would try to keep the implementation as minimal as possible. For example, some of the error handling is dealt with further down the pipeline using the validators module. I would recommend having a look at the full annotation pipeline (i.e. the process of reading an annotation file as a dataframe), understanding it well, and then trying to keep only the essential bits here.
- Could you have a look at the CI checks, and investigate why they are failing? I found these two likely culprits in the logs:
```
/home/runner/work/ethology/ethology/ethology/annotations/io/load_bboxes.py:docstring of ethology.annotations.io.load_bboxes.from_files:39: ERROR: Unexpected indentation. [docutils]
/home/runner/work/ethology/ethology/ethology/annotations/io/load_bboxes.py:docstring of ethology.annotations.io.load_bboxes.from_files:40: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
```
Hope this helps!
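For context on those docutils errors: in reStructuredText (which Sphinx parses from NumPy-style docstrings), an indented block such as a list or literal block must be preceded by a blank line, otherwise docutils reports "Unexpected indentation". The sketch below illustrates the pattern; the signature and wording are simplified placeholders, not the actual `from_files` docstring from this PR.

```python
def from_files(file_paths, format="auto"):
    """Load bounding box annotations from one or more files.

    Parameters
    ----------
    format : str, optional
        Annotation format, by default ``"auto"``. Accepted values:

        - ``"VIA"``
        - ``"COCO"``
        - ``"auto"`` (infer the format from the file contents)

    Note the blank line before the bullet list above: omitting it is
    what typically triggers the "Unexpected indentation" error.
    """
```

Running `sphinx-build` locally (or the docs CI job) should confirm whether the warning disappears.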
```python
if not file_path.is_file():
    raise FileNotFoundError(f"Annotation file not found: {file_path}")

try:
    with open(file_path) as f:
        # Load only enough to check keys, avoid loading huge files
        # if possible.
        # For simplicity here, load the whole thing.
        # Optimization is possible if needed.
        data = json.load(f)
except json.JSONDecodeError as e:
    raise ValueError(
        f"Error decoding JSON data from file {file_path}: {e}"
    ) from e
except Exception as e:  # Catch other potential file reading errors
    raise ValueError(f"Could not read file {file_path}: {e}") from e

if not isinstance(data, dict):
    raise ValueError(
        f"Expected JSON root to be a dictionary, but got {type(data)} "
        f"in file {file_path}"
    )
```
I think we could do away with these since later we run the validators when loading the data...
Would you mind having a look and checking if we can remove this?
Thanks for the detailed feedback! I've refactored `_detect_format` to be more minimal and rely on the downstream validators.
```python
# --- Format Detection Logic ---
determined_format: Literal["VIA", "COCO"]
if format == "auto":
```
In the multiple file case: could we instead infer the format from every file, and throw an error if there is no consensus between them?
Added a new helper `_determine_format_from_paths` that calls `_detect_format` for each file in a list and raises a `ValueError` if the detected formats are inconsistent.
```diff
 file_paths: Path | str | list[Path | str],
-format: Literal["VIA", "COCO"],
 images_dirs: Path | str | list[Path | str] | None = None,
+format: Literal["VIA", "COCO", "auto"] = "auto",  # Changed default and
```
Feel free to remove the comments that highlight the changes: it adds a bit of clutter and the changes are clear to the reviewer in the Github interface
Done
```diff
 def _from_multiple_files(
-    list_filepaths: list[Path | str], format: Literal["VIA", "COCO"]
+    list_filepaths: list[Path], format: Literal["VIA", "COCO"]
```
The `list_filepaths` type annotation should be `list[Path | str]`, right?
```python
@pytest.fixture
def invalid_json_file(tmp_path: Path) -> Path:
```
Would you mind having a look at the existing fixtures, and re-using them whenever possible (rather than creating new ones)?
Description

What is this PR

Why is this PR needed?
This PR aims to improve user experience by automatically detecting the format based on the file's content.

What does this PR do?
This PR introduces automatic detection for the format of input annotation files:
- Adds `_detect_format` within `load_bboxes.py`.
- Changes the `from_files` function signature to make the `format` argument optional, defaulting to `"auto"`.
- Adds tests covering `format="auto"` success and failure scenarios.

References
Closes #43

How has this PR been tested?
The code has been tested locally by running `pytest`. New unit tests have been added to `tests/test_unit/test_annotations/test_load_bboxes.py`.

Is this a breaking change?
No. This PR only adds a new, optional behavior (`format="auto"`) as the default.

Does this PR require an update to the documentation?
Yes. The docstring for the `ethology.annotations.io.load_bboxes.from_files` function …

Checklist: