Skip to content

Fix incorrect session grouping #904

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

mohamedelabbas1996
Copy link
Contributor

@mohamedelabbas1996 mohamedelabbas1996 commented Jul 22, 2025

Summary

This PR fixes the issue of incorrectly grouped sessions (events) by refactoring and improving the image grouping logic when syncing a deployment.

List of Changes

  • Refactored group_images_into_events to support a new use_existing flag:

    • If use_existing=True:

      • Only new images (images not assigned to an event) are grouped.
      • Groups are merged into existing events based on time proximity and overlap.
    • If use_existing=False:

      • All images in the deployment are re-grouped.
      • If a new group exactly matches an existing event (same start and end time), it reuses the event.
      • Otherwise, new events are created.
  • Removed the group_by field from the Event model.

  • Added two admin actions:

    • Fix Sessions: allows to fix incorrectly grouped sessions.
    • Remove from Event: allows admins to manually remove selected source images from selected events.

Related Issues

Closes #237

Detailed Description

Previously, session grouping relied on the group_by field, which reused an existing event if a group had the same start date. This caused issues when images taken on the same day—but far apart in time—were incorrectly grouped into a single session, even though timestamp-based grouping split them into multiple groups. Since all groups shared the same start date, they got assigned to the same event due to group_by.
This PR fixes this issue by improving the group_images_into_events function. It introduces a use_existing flag to control the behavior: when use_existing=False, all deployment images are regrouped; when True, only new images (those not yet assigned to an event) are processed. Images are grouped based on their timestamps using a max_time_gap threshold, and then each group is either merged into an existing event (if overlapping or close enough and use_existing=True) or assigned to a new event (or an existing one if its start and end time exactly match the group). The group_by field is removed from the Event model, and an admin action is added to help fix incorrectly grouped sessions. Additionally, cached fields in event related models (e.g. Occurrence) are updated accordingly.

Screenshots

N/A

Deployment Notes

N/A

Checklist

  • I have tested these changes appropriately.
  • I have added and/or modified relevant tests.
  • I updated relevant documentation or comments.
  • I have verified that this PR follows the project's coding standards.
  • Any dependent changes have already been merged to main.

Copy link

netlify bot commented Jul 22, 2025

Deploy Preview for antenna-preview canceled.

Name Link
🔨 Latest commit 21e39eb
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/6883b069de915d0008ae9f32

@mohamedelabbas1996 mohamedelabbas1996 self-assigned this Jul 22, 2025
# Get only newly added images (images without an event)
image_qs = image_qs.filter(event__isnull=True)

images = list(image_qs.order_by("timestamp"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You likely don't have to evaluate the queryset yet with list(image_qs). You can check if images are found with images_qs.exists(), which is efficient for large datasets.

event = None
if use_existing:
# Look for overlap or proximity
for existing_event in existing_events:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are looping over a queryset, you can do for existing_event in events_qs, which supposedly avoids loading the whole queryset result into memory. Sometimes you need to convert to a list so you can index the list like events[3], but often you never need to convert to a list.

@@ -416,3 +461,5 @@ def create_local_admin_user():
email = os.environ.get("DJANGO_SUPERUSER_EMAIL", "Unknown")
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown")
logger.info(f"Test user credentials: {email} / {password}")
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this intentional?

@@ -162,6 +162,51 @@ def create_captures(
return created


def create_captures_in_range(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks helpful, thanks!


if event:
if use_existing:
# Adjust times if necessary (merge)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check necessary? I think you just checked if and existing event has the exact start & end time. Perhaps you meant to do an OR query? If an existing event has either the same start or end time as the group.

If there is an existing event with exactly the same start AND end time (for same deployment), then I don't think we should check for use_existing. Just re-use those without question.

@mihow mihow requested a review from Copilot July 24, 2025 00:55
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes incorrect session grouping by refactoring the group_images_into_events function to support better handling of existing events during deployment synchronization. The key improvement is the addition of a use_existing flag that controls whether new images are merged into existing events or all images are re-grouped from scratch.

  • Refactored image grouping logic to support merging new images into existing events based on time proximity and overlap
  • Removed the group_by field from the Event model in favor of a unique constraint on deployment, start, and end times
  • Added comprehensive test coverage for various grouping scenarios including overlapping groups, proximity-based merging, and exact overlap handling

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
ami/tests/fixtures/main.py Added create_captures_in_range helper function for test data generation
ami/main/tests.py Added comprehensive test cases for the new grouping logic scenarios
ami/main/models.py Refactored group_images_into_events function and removed group_by field from Event model
ami/main/migrations/0061_remove_event_unique_event_and_more.py Database migration to remove group_by field and update constraints

Comment on lines +464 to +465
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown")
logger.info(f"Test user credentials: {email} / {password}")
Copy link
Preview

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line appears to be duplicated from line 462. The duplicate assignment and logging should be removed.

Suggested change
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown")
logger.info(f"Test user credentials: {email} / {password}")

Copilot uses AI. Check for mistakes.

@@ -416,3 +461,5 @@ def create_local_admin_user():
email = os.environ.get("DJANGO_SUPERUSER_EMAIL", "Unknown")
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown")
logger.info(f"Test user credentials: {email} / {password}")
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown")
logger.info(f"Test user credentials: {email} / {password}")
Copy link
Preview

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logging statement is duplicated from line 463. The duplicate should be removed.

Suggested change
logger.info(f"Test user credentials: {email} / {password}")

Copilot uses AI. Check for mistakes.

# event based on overlapping or proximity if use_existing is True.
# Otherwise if use is_existing is False, we look for an existing event
# with the exact same start and end times and reuse it,
# if not found create a new event.
Copy link
Preview

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing word in comment. Should be 'if none is found, create a new event.' or 'if not found, create a new event.'

Suggested change
# if not found create a new event.
# if not found, create a new event.

Copilot uses AI. Check for mistakes.

Comment on lines 1029 to 1032
for group in timestamp_groups:
if not len(group):
continue

start_date = group[0]
end_date = group[-1]

# Print debugging info about groups
delta = end_date - start_date
hours = round(delta.seconds / 60 / 60, 1)
logger.debug(
f"Found session starting at {start_date} with {len(group)} images that ran for {hours} hours.\n"
f"From {start_date.strftime('%c')} to {end_date.strftime('%c')}."
)
group_start, group_end = group[0], group[-1]
group_set = set(group)
group_image_ids = [img.pk for img in images if img.timestamp in group_set]
Copy link
Preview

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list comprehension creates a new list for each group and checks membership in group_set for every image. Consider creating a mapping of timestamps to image IDs beforehand to improve performance when processing multiple groups.

Suggested change
for group in timestamp_groups:
if not len(group):
continue
start_date = group[0]
end_date = group[-1]
# Print debugging info about groups
delta = end_date - start_date
hours = round(delta.seconds / 60 / 60, 1)
logger.debug(
f"Found session starting at {start_date} with {len(group)} images that ran for {hours} hours.\n"
f"From {start_date.strftime('%c')} to {end_date.strftime('%c')}."
)
group_start, group_end = group[0], group[-1]
group_set = set(group)
group_image_ids = [img.pk for img in images if img.timestamp in group_set]
# Precompute a mapping of timestamps to image IDs
timestamp_to_image_ids = collections.defaultdict(list)
for img in images:
timestamp_to_image_ids[img.timestamp].append(img.pk)
for group in timestamp_groups:
group_start, group_end = group[0], group[-1]
group_image_ids = []
for timestamp in group:
group_image_ids.extend(timestamp_to_image_ids.get(timestamp, []))

Copilot uses AI. Check for mistakes.

if use_existing:
# Look for overlap or proximity
for existing_event in existing_events:
existing_event.refresh_from_db(fields=["start", "end"])
Copy link
Preview

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Database refresh is called for every existing event in every group iteration. Consider fetching fresh event data once before the loop or only refresh when actually needed after a merge.

Copilot uses AI. Check for mistakes.

@mihow
Copy link
Collaborator

mihow commented Jul 24, 2025

This is looking good!! One of our oldest issues :)

Will you also add a function for fixing existing events? It can be just a Django admin function in the Session list view (allow selecting multiple sessions). We need something to fix the sessions like this:

https://antenna.insectai.org/projects/18/sessions/2579
https://antenna.insectai.org/projects/18/sessions/5284

One approach could be to set the images in the selected sessions so that event=None, e.g. "Remove images from session". Then run the normal group_images_into_events function for the whole deployment. But open to other ideas.

I still like the idea of a function that can scan all sessions in the deployment (or project) and "detect" if there are images that shouldn't be there (based on the gap setting). Then we can alert the user that it needs to be regrouped.

Will you make a follow-up ticket for making the max gap setting a Project setting? for #893

@mihow
Copy link
Collaborator

mihow commented Jul 24, 2025

I tested on the sessions in the Mothra deployment in project 18 and was able to repair the existing sessions:

before:
image

after:
image
image

you can see it broke the two long sessions into smaller chunks

@mihow
Copy link
Collaborator

mihow commented Jul 24, 2025

I pushed a change with my suggested action for removing source images from existing events. I also noticed that Occurrences have a cached field that keeps track of the event as well, so this needs to be updated in our grouping methods. There are other ways to keep occurrences in sync, but they will likely be per-occurrence or per-image update, whereas this can update more at once.

@mohamedelabbas1996 mohamedelabbas1996 changed the title [Draft] Fix incorrect session grouping Fix incorrect session grouping Jul 25, 2025
@mohamedelabbas1996 mohamedelabbas1996 marked this pull request as ready for review July 25, 2025 09:11
queryset.dissociate_related_objects()
self.message_user(request, f"Dissociated {queryset.count()} events from captures and occurrences.")

@admin.action(description="Fix sessions by regrouping images")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like having a dedicated fix_sessions action. However you can use the function above (queryset.dissociate_related_objects()) to remove images and occurrences from the Event. That's what I designed it for. Something like:

queryset.dissociate_related_objects()
for deployment in deployments:
    group_images_into_events(deployment)

I think use_existing=True works in this case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some sessions are incorrectly grouped
2 participants