-
Notifications
You must be signed in to change notification settings - Fork 4
Fix incorrect session grouping #904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for antenna-preview canceled.
|
ami/main/models.py
Outdated
# Get only newly added images (images without an event) | ||
image_qs = image_qs.filter(event__isnull=True) | ||
|
||
images = list(image_qs.order_by("timestamp")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You likely don't have to evaluate the queryset yet with list(image_qs)
. You can check if images are found with images_qs.exists()
, which is efficient for large datasets.
ami/main/models.py
Outdated
event = None | ||
if use_existing: | ||
# Look for overlap or proximity | ||
for existing_event in existing_events: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are looping over a queryset, you can do for existing_event in events_qs
, which supposedly avoids loading the whole queryset result into memory. Sometimes you need to convert to a list so you can index the list like events[3], but often you never need to convert to a list.
@@ -416,3 +461,5 @@ def create_local_admin_user(): | |||
email = os.environ.get("DJANGO_SUPERUSER_EMAIL", "Unknown") | |||
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown") | |||
logger.info(f"Test user credentials: {email} / {password}") | |||
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this intentional?
@@ -162,6 +162,51 @@ def create_captures( | |||
return created | |||
|
|||
|
|||
def create_captures_in_range( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks helpful, thanks!
ami/main/models.py
Outdated
|
||
if event: | ||
if use_existing: | ||
# Adjust times if necessary (merge) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check necessary? I think you just checked if and existing event has the exact start & end time. Perhaps you meant to do an OR query? If an existing event has either the same start or end time as the group.
If there is an existing event with exactly the same start AND end time (for same deployment), then I don't think we should check for use_existing
. Just re-use those without question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes incorrect session grouping by refactoring the group_images_into_events
function to support better handling of existing events during deployment synchronization. The key improvement is the addition of a use_existing
flag that controls whether new images are merged into existing events or all images are re-grouped from scratch.
- Refactored image grouping logic to support merging new images into existing events based on time proximity and overlap
- Removed the
group_by
field from the Event model in favor of a unique constraint on deployment, start, and end times - Added comprehensive test coverage for various grouping scenarios including overlapping groups, proximity-based merging, and exact overlap handling
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
ami/tests/fixtures/main.py | Added create_captures_in_range helper function for test data generation |
ami/main/tests.py | Added comprehensive test cases for the new grouping logic scenarios |
ami/main/models.py | Refactored group_images_into_events function and removed group_by field from Event model |
ami/main/migrations/0061_remove_event_unique_event_and_more.py | Database migration to remove group_by field and update constraints |
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown") | ||
logger.info(f"Test user credentials: {email} / {password}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line appears to be duplicated from line 462. The duplicate assignment and logging should be removed.
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown") | |
logger.info(f"Test user credentials: {email} / {password}") |
Copilot uses AI. Check for mistakes.
@@ -416,3 +461,5 @@ def create_local_admin_user(): | |||
email = os.environ.get("DJANGO_SUPERUSER_EMAIL", "Unknown") | |||
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown") | |||
logger.info(f"Test user credentials: {email} / {password}") | |||
password = os.environ.get("DJANGO_SUPERUSER_PASSWORD", "Unknown") | |||
logger.info(f"Test user credentials: {email} / {password}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logging statement is duplicated from line 463. The duplicate should be removed.
logger.info(f"Test user credentials: {email} / {password}") |
Copilot uses AI. Check for mistakes.
# event based on overlapping or proximity if use_existing is True. | ||
# Otherwise if use is_existing is False, we look for an existing event | ||
# with the exact same start and end times and reuse it, | ||
# if not found create a new event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing word in comment. Should be 'if none is found, create a new event.' or 'if not found, create a new event.'
# if not found create a new event. | |
# if not found, create a new event. |
Copilot uses AI. Check for mistakes.
ami/main/models.py
Outdated
for group in timestamp_groups: | ||
if not len(group): | ||
continue | ||
|
||
start_date = group[0] | ||
end_date = group[-1] | ||
|
||
# Print debugging info about groups | ||
delta = end_date - start_date | ||
hours = round(delta.seconds / 60 / 60, 1) | ||
logger.debug( | ||
f"Found session starting at {start_date} with {len(group)} images that ran for {hours} hours.\n" | ||
f"From {start_date.strftime('%c')} to {end_date.strftime('%c')}." | ||
) | ||
group_start, group_end = group[0], group[-1] | ||
group_set = set(group) | ||
group_image_ids = [img.pk for img in images if img.timestamp in group_set] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This list comprehension creates a new list for each group and checks membership in group_set for every image. Consider creating a mapping of timestamps to image IDs beforehand to improve performance when processing multiple groups.
for group in timestamp_groups: | |
if not len(group): | |
continue | |
start_date = group[0] | |
end_date = group[-1] | |
# Print debugging info about groups | |
delta = end_date - start_date | |
hours = round(delta.seconds / 60 / 60, 1) | |
logger.debug( | |
f"Found session starting at {start_date} with {len(group)} images that ran for {hours} hours.\n" | |
f"From {start_date.strftime('%c')} to {end_date.strftime('%c')}." | |
) | |
group_start, group_end = group[0], group[-1] | |
group_set = set(group) | |
group_image_ids = [img.pk for img in images if img.timestamp in group_set] | |
# Precompute a mapping of timestamps to image IDs | |
timestamp_to_image_ids = collections.defaultdict(list) | |
for img in images: | |
timestamp_to_image_ids[img.timestamp].append(img.pk) | |
for group in timestamp_groups: | |
group_start, group_end = group[0], group[-1] | |
group_image_ids = [] | |
for timestamp in group: | |
group_image_ids.extend(timestamp_to_image_ids.get(timestamp, [])) |
Copilot uses AI. Check for mistakes.
if use_existing: | ||
# Look for overlap or proximity | ||
for existing_event in existing_events: | ||
existing_event.refresh_from_db(fields=["start", "end"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Database refresh is called for every existing event in every group iteration. Consider fetching fresh event data once before the loop or only refresh when actually needed after a merge.
Copilot uses AI. Check for mistakes.
This is looking good!! One of our oldest issues :) Will you also add a function for fixing existing events? It can be just a Django admin function in the Session list view (allow selecting multiple sessions). We need something to fix the sessions like this: https://antenna.insectai.org/projects/18/sessions/2579 One approach could be to set the images in the selected sessions so that I still like the idea of a function that can scan all sessions in the deployment (or project) and "detect" if there are images that shouldn't be there (based on the gap setting). Then we can alert the user that it needs to be regrouped. Will you make a follow-up ticket for making the max gap setting a Project setting? for #893 |
I pushed a change with my suggested action for removing source images from existing events. I also noticed that Occurrences have a cached field that keeps track of the event as well, so this needs to be updated in our grouping methods. There are other ways to keep occurrences in sync, but they will likely be per-occurrence or per-image update, whereas this can update more at once. |
…ryset evaluations
…olnickLab/antenna into fix/group-images-into-sessions
…b/antenna into fix/group-images-into-sessions
…olnickLab/antenna into fix/group-images-into-sessions
queryset.dissociate_related_objects() | ||
self.message_user(request, f"Dissociated {queryset.count()} events from captures and occurrences.") | ||
|
||
@admin.action(description="Fix sessions by regrouping images") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like having a dedicated fix_sessions
action. However you can use the function above (queryset.dissociate_related_objects()
) to remove images and occurrences from the Event. That's what I designed it for. Something like:
queryset.dissociate_related_objects()
for deployment in deployments:
group_images_into_events(deployment)
I think use_existing=True
works in this case
Summary
This PR fixes the issue of incorrectly grouped sessions (events) by refactoring and improving the image grouping logic when syncing a deployment.
List of Changes
Refactored
group_images_into_events
to support a newuse_existing
flag:If
use_existing=True
:If
use_existing=False
:Removed the
group_by
field from theEvent
model.Added two admin actions:
Related Issues
Closes #237
Detailed Description
Previously, session grouping relied on the
group_by
field, which reused an existing event if a group had the same start date. This caused issues when images taken on the same day—but far apart in time—were incorrectly grouped into a single session, even though timestamp-based grouping split them into multiple groups. Since all groups shared the same start date, they got assigned to the same event due togroup_by
.This PR fixes this issue by improving the
group_images_into_events
function. It introduces ause_existing
flag to control the behavior: whenuse_existing=False
, all deployment images are regrouped; whenTrue
, only new images (those not yet assigned to an event) are processed. Images are grouped based on their timestamps using amax_time_gap
threshold, and then each group is either merged into an existing event (if overlapping or close enough anduse_existing=True
) or assigned to a new event (or an existing one if its start and end time exactly match the group). Thegroup_by
field is removed from theEvent
model, and an admin action is added to help fix incorrectly grouped sessions. Additionally, cached fields in event related models (e.g.Occurrence
) are updated accordingly.Screenshots
N/A
Deployment Notes
N/A
Checklist