Optimized add_all and upsert_all operations in ordered_map module #16084

Open
wants to merge 1 commit into
base: main

PersonaNormale

Performance Optimization for OrderedMap Bulk Operations

Problem

The current implementation of add_all and upsert_all in ordered_map performs a separate binary search over the map's entries for every element of the unsorted input array.
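To make the cost concrete, here is a minimal Python sketch of the per-element pattern described above. It is not the Move code from the module: `entries` is a hypothetical stand-in for the map's sorted internal vector, `add_all_naive` and `lower_bound` are illustrative names, and rejecting an existing key is an assumption about `add_all` semantics.

```python
def lower_bound(entries, key):
    """Return the first index whose key is >= `key` (binary search over all entries)."""
    lo, hi = 0, len(entries)
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def add_all_naive(entries, keys, values):
    """Insert each (key, value) pair independently into the sorted list `entries`."""
    if len(keys) != len(values):
        raise ValueError("keys and values differ in length")
    for key, value in zip(keys, values):
        # Every element restarts the search over the whole vector: O(log(entries)).
        idx = lower_bound(entries, key)
        # Assumption: adding an existing key is an error (upsert would overwrite instead).
        if idx < len(entries) and entries[idx][0] == key:
            raise KeyError(f"key already exists: {key!r}")
        # Shifting the tail of the vector costs O(entries).
        entries.insert(idx, (key, value))
```

Because the input is unsorted, no search can reuse the position found by the previous one.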

Solution

This PR replaces the vector::zip approach with a quicksort-based algorithm that:

  • Pre-sorts key-value pairs together in a single pass
  • Tracks insertion positions to minimize redundant binary searches
  • Properly handles error cases for mismatched arrays

Performance Improvement

Before:

  • Vector Element Insertion = O(entries)
  • Find Index = O(log(entries))
    Both steps run once per input element, i.e. vec_elements times.
  • Leading to: O(vec_elements * (log(entries) + entries)), which simplifies to O(vec_elements * entries)

After:
The binary search now resumes from the last index found, which is valid because the input vector is guaranteed to be sorted:

  • Sorting = O(vec_elements log(vec_elements))
  • Insertion = O(entries * log(entries - last_index_found))
    The sorting term O(vec_elements log(vec_elements)) dominates when vec_elements is large.
    The insertion term O(entries * log(entries - last_index_found)) dominates when entries is larger or vec_elements is already sorted or pseudo-sorted.
  • In total = O(vec_elements log(vec_elements) + entries * log(entries - last_index_found))
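The idea of reusing the last index can be sketched as follows. This is an illustrative Python sketch, not the PR's Move code: `add_all_sorted` and `lower_bound_from` are hypothetical names, `entries` stands in for the map's sorted vector, and the inputs are assumed to be already sorted by key.

```python
def lower_bound_from(entries, key, lo):
    """Binary search for the first index >= `key`, looking only at entries[lo:]."""
    hi = len(entries)
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def add_all_sorted(entries, sorted_keys, sorted_values):
    """Insert pre-sorted pairs, narrowing each search to the range after the last hit."""
    last_index = 0
    for key, value in zip(sorted_keys, sorted_values):
        # Keys arrive in ascending order, so nothing before last_index can match.
        idx = lower_bound_from(entries, key, last_index)
        # Assumption: an existing key is rejected (upsert would overwrite instead).
        if idx < len(entries) and entries[idx][0] == key:
            raise KeyError(f"key already exists: {key!r}")
        entries.insert(idx, (key, value))
        # Resume at idx (not idx + 1) so a repeated key in the input is still detected.
        last_index = idx
```

Each search covers only `entries - last_index_found` elements, which is where the logarithmic term in the estimate above comes from.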

Implementation Details

  • Added quick_sort_with_companion_array implementation using Hoare's partition algorithm
  • Added an is_ordered helper function to detect already-sorted arrays, which are a worst case for quicksort
  • Added error constant EARRAYS_DIFFERENT_LENGTH (error code 5)
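The pieces listed above might fit together roughly as in the Python sketch below. The function names mirror the PR, but the bodies are assumptions rather than the actual Move implementation: the middle-element pivot, the explicit stack instead of recursion, and the use of `ValueError` in place of the abort code are all illustrative choices.

```python
def is_ordered(keys):
    """Return True if `keys` is already in non-decreasing order."""
    return all(keys[i] <= keys[i + 1] for i in range(len(keys) - 1))


def _hoare_partition(keys, values, lo, hi):
    """Hoare partition of keys[lo..hi]; every swap is mirrored in `values`."""
    pivot = keys[(lo + hi) // 2]  # assumption: middle element as pivot
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while keys[i] < pivot:
            i += 1
        j -= 1
        while keys[j] > pivot:
            j -= 1
        if i >= j:
            return j
        keys[i], keys[j] = keys[j], keys[i]
        values[i], values[j] = values[j], values[i]


def quick_sort_with_companion_array(keys, values):
    """Sort `keys` in place and apply the same permutation to `values`."""
    if len(keys) != len(values):
        # Stand-in for aborting with EARRAYS_DIFFERENT_LENGTH.
        raise ValueError("EARRAYS_DIFFERENT_LENGTH")
    if is_ordered(keys):
        return  # nothing to do; also covers empty and single-element inputs
    stack = [(0, len(keys) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        p = _hoare_partition(keys, values, lo, hi)
        stack.append((lo, p))
        stack.append((p + 1, hi))
```

Sorting the two arrays in lockstep avoids building an intermediate vector of pairs, which is presumably why a companion-array variant was chosen over zipping first.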

Testing

Added test coverage for:

  • Edge cases (empty arrays, single elements, already sorted data)
  • Error handling for mismatched arrays and invalid inputs
  • Various input data patterns and sizes
  • Validation of sorting correctness

All existing tests continue to pass with no regressions.

trunk-io bot commented Mar 8, 2025

⏱️ 6s total CI duration on this PR

Job | Cumulative Duration | Recent Runs
permission-check | 3s | 🟥
permission-check | 3s | 🟥
