Enterprise Form Submissions Iterators #35295
base: master
# if a limit exists, increase it by 1 to allow us to check whether additional items remain at the end
padded_limit = limit + 1 if limit else None
self.original_it = iter(sequence_factory_fn(padded_limit))

Why pass `padded_limit` to `sequence_factory_fn`? Based on the test cases, `sequence_factory_fn` ignores the parameter.
Oh, we found that it is being used in `create_multi_domain_form_generator`.
If you're asking why we need a limit, the issue is that Tastypie needs some way to communicate the 'limit' it receives to the underlying limit used by Elasticsearch. If we can't communicate this to Elasticsearch, it leads to situations like the API asking for 100 records and Elasticsearch fetching 5000, leaving 4900 unused records. Or the reverse, where the API asks for 5000 records and Elasticsearch fetches them 100 at a time, leading to 50 calls to Elasticsearch rather than 1.
Ideally, we'd just pass the limit all the way down, but we don't have control over how the Paginator is instantiated or how it is called: we can tell Tastypie to use a certain Paginator class, but it controls the instantiation and the call. What we can control is the 'objects' object that gets passed to the paginator. That object can't know about the limit yet, because the paginator is responsible for setting that limit. So the object needs a way to receive the limit prior to retrieving the results.
The important part here is that the iterator needs to be able to receive a limit parameter, even if it does nothing with it. The tests ignore that limit, but if the underlying sequence were an Elasticsearch query, it would need access to that limit.
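As a rough illustration of the explanation above, here is a minimal sketch of an object that is built without a limit and only receives one when the paginator asks for results. `LimitAwareSequence`, `fake_query`, and `ignores_limit` are hypothetical names for this example, not code from the PR; only the `padded_limit` line mirrors the diff.

```python
from typing import Callable, Iterable, Iterator, Optional


class LimitAwareSequence:
    """Hypothetical stand-in for the `objects` handed to a paginator."""

    def __init__(self, sequence_factory_fn: Callable[[Optional[int]], Iterable]):
        # Nothing is fetched yet: the limit is not known at construction time.
        self.sequence_factory_fn = sequence_factory_fn

    def execute(self, limit: Optional[int] = None) -> Iterator:
        # Ask for one extra item so the caller can tell whether more remain.
        padded_limit = limit + 1 if limit else None
        return iter(self.sequence_factory_fn(padded_limit))


calls = []


def fake_query(limit):
    # Assumption: stands in for an Elasticsearch query that honors the size it is given.
    calls.append(limit)
    data = range(5000)
    return data if limit is None else data[:limit]


def ignores_limit(_limit):
    # Like the test cases: accepts the limit but does nothing with it.
    return range(10)


objects = LimitAwareSequence(fake_query)
page = list(objects.execute(limit=100))  # one call, sized to the request plus one
```

With this shape the backing query is asked once for 101 items instead of over-fetching 5000 or making repeated small fetches, while a factory that ignores the limit still works.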
I didn't go through all the commits due to time constraints.
if self.limit and not self.is_complete:
    # the end of the limited sequence was reached, check if items beyond the limit remain
    try:
        next(self.original_it)

In [3]: initial_it = iter(range(10))
...: it = ResumableIteratorWrapper(lambda _: initial_it,limit =2)
In [4]: next(it)
Out[4]: 0
In [5]: next(it)
Out[5]: 1
In [6]: next(it)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
Cell In[6], line 1
----> 1 next(it)
File ~/commcare-hq/corehq/apps/api/resumable_iterator_wrapper.py:27, in ResumableIteratorWrapper.__next__(self)
24 self.iteration_started = True
26 try:
---> 27 self.prev_element = next(self.it)
28 except StopIteration:
29 if self.limit and not self.is_complete:
30 # the end of the limited sequence was reached, check if items beyond the limit remain
StopIteration:
In [7]: next(initial_it)
Out[7]: 3
Calling `next` on `original_it` will result in us skipping an element. Do you have a later commit to overcome it?
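To make the concern concrete, the sketch below first reproduces the skipped element with plain `itertools.islice`, then shows one possible way around it: buffering the peeked item so it is handed back on the next call. `PeekableIterator` is a hypothetical helper, not something from the PR, and whether the skip matters in practice depends on whether the underlying iterator is ever resumed.

```python
from itertools import islice

# Reproduction: peeking with next() consumes the element it looks at.
source = iter(range(10))
first_page = list(islice(source, 2))   # takes 0 and 1
peeked = next(source)                  # the "is there more?" check consumes 2
resumed = next(source)                 # resuming now yields 3


class PeekableIterator:
    """Hypothetical wrapper that keeps a peeked element instead of dropping it."""

    _MISSING = object()

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffered = self._MISSING

    def __iter__(self):
        return self

    def __next__(self):
        if self._buffered is not self._MISSING:
            value, self._buffered = self._buffered, self._MISSING
            return value
        return next(self._it)

    def has_more(self):
        # Pull one element ahead, but hold on to it for the next __next__ call.
        if self._buffered is self._MISSING:
            try:
                self._buffered = next(self._it)
            except StopIteration:
                return False
        return True


safe = PeekableIterator(range(10))
safe_page = list(islice(safe, 2))
safe_has_more = safe.has_more()
safe_resumed = next(safe)              # the peeked element is returned, not skipped
```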
objects = SequenceWrapper(range(5), lambda ele: {'next': ele})
paginator = KeysetPaginator(request_data, objects, resource_uri='http://test.com/')
page = paginator.page()
self.assertEqual(page['meta']['next'], 'http://test.com/?limit=3&next=2')

I'm confused about why the next URI has `limit=3` if we delete the `limit` key in `request_data`.
class KeysetPaginator(Paginator):
    '''
    An alternate paginator meant to support paginating by keyset rather than by index/offset.
    `objects` is expected to represent a query object that exposes an `.execute(limit)`

Why not have a base `QueryObject` class with an `execute` function that raises `NotImplementedError`? Then add a docstring explaining what you expect from the `execute` function, or even give an example. Anyone who wants to use this `KeysetPaginator` should pass an object that inherits from the base `QueryObject`.
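A minimal sketch of what that suggestion could look like. `QueryObject` is the name proposed in the comment; `ListQuery` and the exact `execute` signature are assumptions for illustration only.

```python
class QueryObject:
    """Base class for objects a keyset paginator can page over (name from the review comment)."""

    def execute(self, limit=None):
        """Return an iterable of results, fetching at most `limit` items when a limit is given.

        Example: ``ListQuery([1, 2, 3]).execute(limit=2)`` returns ``[1, 2]``.
        """
        raise NotImplementedError


class ListQuery(QueryObject):
    """Hypothetical subclass backed by an in-memory list."""

    def __init__(self, items):
        self.items = list(items)

    def execute(self, limit=None):
        return self.items if limit is None else self.items[:limit]


limited = ListQuery(range(5)).execute(limit=3)
```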
num_fetched = len(results.hits)

if num_fetched >= results.total or (remaining and num_fetched >= remaining):

When is `num_fetched` different from `results.total`?
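For context on this question: in Elasticsearch the total hit count describes how many documents match the query, while the returned hits are capped by the requested size, so the two differ whenever a page is smaller than the full result set. The sketch below simulates that in plain Python; `FakeResults`, `fake_search`, and `stop_condition` are hypothetical and only mirror the condition in the diff.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeResults:
    hits: List[dict]   # the documents actually returned for this request
    total: int         # how many documents match the query overall


def fake_search(matching_docs, size):
    # Assumption: mimics a size-limited search over an in-memory list.
    return FakeResults(hits=matching_docs[:size], total=len(matching_docs))


def stop_condition(results, remaining: Optional[int] = None) -> bool:
    # Mirrors the condition under review.
    num_fetched = len(results.hits)
    return num_fetched >= results.total or bool(remaining and num_fetched >= remaining)


docs = [{'id': i} for i in range(250)]
partial = fake_search(docs, size=100)   # 100 hits fetched, 250 matching
full = fake_search(docs, size=300)      # everything fetched in one request
```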
"this is how elasticsearch works" -- Matt cc @gherceg
corehq/apps/enterprise/iterators.py
Outdated
limit=limit
)

xform_converter = RawFormConverter()

Initialize this in `__init__` and make it something that can be passed into this query and swapped out for something else depending on the use case.
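One way to read this suggestion is plain constructor injection. The sketch below is only illustrative: `FormQuery`, `IdentityConverter`, `SummaryConverter`, and the `convert` method are hypothetical names, not the PR's actual classes.

```python
class IdentityConverter:
    """Hypothetical default: returns each raw hit unchanged."""

    def convert(self, raw_hit):
        return raw_hit


class SummaryConverter:
    """Hypothetical alternative: keeps only a couple of fields."""

    def convert(self, raw_hit):
        return {'form_id': raw_hit['_id'], 'domain': raw_hit['domain']}


class FormQuery:
    def __init__(self, converter=None):
        # The converter is chosen once, at construction, and can be swapped per use case.
        self.converter = converter if converter is not None else IdentityConverter()

    def convert_hits(self, raw_hits):
        return [self.converter.convert(hit) for hit in raw_hits]


raw_hits = [{'_id': 'abc', 'domain': 'demo', 'extra': 1}]
default_rows = FormQuery().convert_hits(raw_hits)
summary_rows = FormQuery(SummaryConverter()).convert_hits(raw_hits)
```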
corehq/apps/enterprise/iterators.py
Outdated
    return start_date, end_date


class RawFormConverter:

Rename to something like `EnterpriseFormReportConverter`, or something better? A more specific name for the use case would help.
def _create_enterprise_account_covering_domains(self, domains):
    billing_account = generator.billing_account('[email protected]', '[email protected]')
    billing_account.enterprise_admin_emails = ['[email protected]']

This should be a customer billing account because it is mapped to multiple domains.
Resolved in bff5fac
def _create_enterprise_admin(self, email, domain):
    user = WebUser.create(
        domain, email, 'test123', None, None, email)
    user.is_superuser = True

This should be avoided in tests. Please add to `enterprise_admin_emails` in the related `BillingAccount`.
Resolved as part of bff5fac
role = Role.objects.create(slug="test_role")
UserRole.objects.create(user=user.get_django_user(), role=role)
accounting_admin_role = Role.objects.get_or_create(
    name="Accounting Admin",
    slug=privileges.ACCOUNTING_ADMIN,
)[0]
Grant.objects.create(from_role=role, to_role=accounting_admin_role)

This isn't needed when the user is actually an enterprise admin and not a superuser.
Changed as part of bff5fac
encoded_auth = base64.b64encode(auth_string.encode()).decode()
request = factory.get(
    '/',
    {'startdate': '2004-10-10', 'enddate': '2004-11-10'},

will push to a different level
Included as part of bff5fac
@es_test(requires=[form_adapter])
class FormSubmissionResourceTests(TestCase):
    def test_happy_path(self):

Maybe rename to `test_resource_is_accessible`?
Possibly add a test to ensure permissions restrict the users that need to be restricted?
corehq/apps/enterprise/iterators.py
Outdated
    return self.domain_lookup_tables[domain].get(app_id, None)


def loop_over_domains(domains, query_factory, limit=None, last_domain=None, **kwargs):

Maybe call this `run_query_over_domains`? And maybe switch the order of `query_factory` and `domains`.
corehq/apps/enterprise/iterators.py
Outdated
current_iterator = _get_domain_iterator(**next_args)


def loop_over_domain(domain, query_factory, limit=None, **kwargs):

Similarly, `loop_query_over_domain`, and switch the order of the args.
the previous progress arguments
- start_date: a date to start the date range. Can be None
- end_date: the inclusive date to finish the date range. Can be None
last_domain, last_time, and last_id are intended to represent the last result from a previous query

nit: does Sphinx support back-ticks (`) for denoting variables?
}

@classmethod
def get_query_paraams(cls, fetched_object):

`get_query_params` instead of paraaaaaaaams
Tests were passing with this incorrect name. It would be good to verify whether the tests cover this.
    return start_date, end_date


class EnterpriseFormReportConverter:

Possibly use `ABC`?
@classmethod
def get_kwargs_from_map(cls, map):
    '''
    Takes a map-like object from a continuation request (generally GET/POST) and extracts

It might be good to include a note here about where it's used.
next_query_args = query_factory.get_next_query_args(next_query_args, last_hit)


class ReportQueryFactoryInterface:

Maybe good to use `ABC` here?
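A small sketch of how `abc` could enforce the interface. Only `get_next_query_args(next_query_args, last_hit)` appears in the diff; `get_query` and both subclasses are hypothetical and exist just to show the behavior.

```python
from abc import ABC, abstractmethod


class ReportQueryFactoryInterface(ABC):
    @abstractmethod
    def get_query(self, **kwargs):
        """Hypothetical method: build the query for the given arguments."""

    @abstractmethod
    def get_next_query_args(self, previous_args, last_hit):
        """Return the arguments for the query that continues after `last_hit`."""


class IncompleteFactory(ReportQueryFactoryInterface):
    # Forgets get_next_query_args, so it cannot be instantiated.
    def get_query(self, **kwargs):
        return kwargs


class CompleteFactory(ReportQueryFactoryInterface):
    def get_query(self, **kwargs):
        return kwargs

    def get_next_query_args(self, previous_args, last_hit):
        return {**previous_args, 'last_id': last_hit['id']}


try:
    IncompleteFactory()
    instantiation_failed = False
except TypeError:
    # ABC refuses to instantiate classes with unimplemented abstract methods.
    instantiation_failed = True

next_args = CompleteFactory().get_next_query_args({'domain': 'demo'}, {'id': 42})
```

Compared with a plain base class whose methods raise `NotImplementedError`, the failure here happens when the object is created rather than when the missing method is first called.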
if limit and has_more:
    last_fetched = objects[-1]
    next_page_params = self.objects.get_query_params(last_fetched)

Check if there is a missing test for this? Nothing failed when the method was misnamed.
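One possible shape for such a test, as a sketch: exercise the same call the paginator makes against the real query class, so a misspelled method surfaces as an `AttributeError`. `build_next_page_params`, `CorrectQuery`, and `MisnamedQuery` are hypothetical stand-ins, not the PR's code.

```python
import unittest


class CorrectQuery:
    @classmethod
    def get_query_params(cls, fetched_object):
        return {'last_id': fetched_object['id']}


class MisnamedQuery:
    @classmethod
    def get_query_paraams(cls, fetched_object):  # typo kept on purpose
        return {'last_id': fetched_object['id']}


def build_next_page_params(objects, last_fetched):
    # Mirrors the paginator line under review.
    return objects.get_query_params(last_fetched)


class NextPageParamsTests(unittest.TestCase):
    def test_params_come_from_query_object(self):
        params = build_next_page_params(CorrectQuery, {'id': 7})
        self.assertEqual(params, {'last_id': 7})

    def test_misnamed_method_is_detected(self):
        with self.assertRaises(AttributeError):
            build_next_page_params(MisnamedQuery, {'id': 7})
```

If the existing tests only page over a test double, a typo in the production class would pass unnoticed, which may be what happened here.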
}


class PageableQueryInterface:

Is this still being used?
request_data,
objects,
resource_uri=resource_uri,
limit=limit,

Potentially rename `limit` to indicate page size, since that is how it is used?
Product Description
Technical Summary
This PR is now ready for review. This PR creates iterators to handle enterprise report requests in a scalable manner. Previously, both our enterprise reports and the enterprise report APIs needed to generate the entire enterprise report in order to deliver any results. With this PR, the form submissions report API has been modified to instead source that information from an iterator that will only fetch data up until a page boundary. If more data than a page boundary is needed, the request will be paginated.
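For readers unfamiliar with the approach, here is a minimal sketch of keyset pagination over an in-memory list, fetching only up to a page boundary plus one item. It is not the PR's implementation: `fetch_after`, `get_page`, and the `last_id` parameter are hypothetical, and the records are assumed to be sorted by `id`.

```python
from urllib.parse import urlencode


def fetch_after(records, last_id, limit):
    """Stand-in for a query: up to `limit` records whose id comes after `last_id`."""
    matching = [r for r in records if last_id is None or r['id'] > last_id]
    return matching[:limit]


def get_page(records, resource_uri, limit, last_id=None):
    # Fetch one item beyond the page size to learn whether another page exists.
    fetched = fetch_after(records, last_id, limit + 1)
    has_more = len(fetched) > limit
    objects = fetched[:limit]

    next_uri = None
    if has_more:
        # The next page is keyed on the last item returned, not on an offset.
        query = urlencode({'limit': limit, 'last_id': objects[-1]['id']})
        next_uri = f'{resource_uri}?{query}'

    return {'objects': objects, 'meta': {'limit': limit, 'next': next_uri}}


records = [{'id': i} for i in range(1, 8)]
page_one = get_page(records, 'http://test.com/', limit=3)
page_two = get_page(records, 'http://test.com/', limit=3, last_id=3)
page_three = get_page(records, 'http://test.com/', limit=3, last_id=6)
```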
Feature Flag
No feature flag.
Safety Assurance
Safety story
I've done local testing, created multiple test suites, and verified this PR on staging.
Automated test coverage
New test suites created in test_iterators.py and test_apis.py
QA Plan
As there is no user-facing component to this other than the API results, I don't think this needs to be run through QA -- the same process I'm using would be duplicated by QA.
Rollback instructions
Labels & Review