-
-
Notifications
You must be signed in to change notification settings - Fork 227
Create new user_domain_memberships field in ES user index #36397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
As it is, this works fine but the filtering function is a bit inefficient since domain_memberships isn’t a nested field, so it’ll search across all domain_memberships a user might have, instead of just the current one. Reindexing to turn it into a nested type I feel like more trouble than its worth vs my current solution, which is to check in the returned list of users whether any have multiple domain_memberships and do another check specifically for the current domain’s domain_membership if so, but I was wondering if anyone else had more thoughts on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes make sense. I'm not sure about the performance implications that you mention in your comment, but I also think it's fine as is.
corehq/apps/es/users.py
Outdated
is_active, | ||
is_active_in_domain, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also try to combine these two filters into one, so it works regardless of user type. The location
and role_id
filters do similarly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm getting a bit confused with how these filters typically stack. If active
and is_active_in_domain
were combined similarly to how location
is handled, on its own, that'll return both mobile workers and web users that are active/deactivated, right? So is it expected that the full query should always look like UserES().web_users().active(True)
/ UserES().mobile_users.active(True)
to limit it to the doc_types we're trying to query for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think so - otherwise .active()
implicitly filters to mobile workers, and is_active_in_domain
likewise filters to web users.
In my opinion, UserES().active(True)
should return all active users, of both types. If you want to limit to a particular user type, then you can always add a separate filter for that.
…h will eventually replace domain_membership and domain_memberships as one combined field Also adds function to populate new combined field
…mbership/domain_memberships changes to user_domain_memberships
@AmitPhulera soliciting your feedback here since I’m unsure about this approach. I assumed we don't actually want a unified domain_membership couch doc (mostly since I don't know how That migration would work, but assumed it would be a huge hassle) and opted to just copy both |
corehq/apps/es/users.py
Outdated
if not domain: | ||
return filters.term("is_active", active) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall, how does WebUser.is_active
fit in here?
corehq/apps/es/users.py
Outdated
) | ||
|
||
|
||
def is_active(active=True): | ||
return filters.term("is_active", active) | ||
def is_active(active=True, domain=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need to use the is_active
filter without passing in a domain? I worry that making optional makes it too easy to forget to support web users
domain_membership_filter | ||
) | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviving this discussion - I was imagining this filter being something that works for all users, like:
UserES().is_active()
returns all active usersUserES().is_active().mobile_users()
returns all active mobile workersUserES().is_active().web_users()
returns all active mobile workers
Instead, it behaves like:
UserES().is_active()
returns all active mobile workersUserES().is_active(domain=domain)
returns all active web usersUserES().OR(is_active(), is_active(domain=domain))
returns both
This behavior seems unintuitive to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I've been trying to think of a better solution to this for a while too. I'm going to address your other comments about the need for the domain arg and the how WebUser.is_active fits into all of this here too.
So because of how web user domain active status works, we must have a domain arg to filter for domain-scoped deactivated/active web users (the whole reason we wanted to create this new nested field). This unfortunately does mean that not passing in a domain implies we only really care about getting mobile workers, and to a lesser extent web users (for their SSO active status), but technically that is included here. I just pushed a slight refactor so now the behavior looks like:
UserES().is_active()
returns all user.is_active=True users. This will usually be filtered for mobile users (all usage of this filter previously already did)UserES().is_active(domain=domain)
returns all user.is_active=True and user.domain_membership/s.is_active=True users on the domain. This will usually be filtered for web users (which I'll do in the other PR).
For active=True filters, it must return users that have their user.is_active status and user.domain_membership/s.is_active equal to True because if either is False, it's deactivated in one way or another (by an IDP or by the domain). This is a bit confusing to put down in words honestly, I'd be up to chat briefly if this still looks concerning/confusing in any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unfortunately does mean that not passing in a domain implies we only really care about getting mobile workers
This is why I think we should not allow callers to exclude domain
. If it's always required, then there's no ambiguity. It doesn't look like there are too many references to update. You can define the function like this to make domain a required keyword-only argument that comes after an optional positional argument:
def is_active(active=True, *, domain):
Or of course you could make domain the first arg:
def is_active(domain, active=True):
Otherwise I just think it's too easy for someone without the context to accidentally make a change that breaks a feature for projects like ours that use web users for data entry. It is suprising that adding .is_active()
to a query excludes all web users - I wouldn't expect people to anticipate that or notice it in code review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
domain is now a required parameter in is_active but the commit I pushed has a bunch of comments (for Martin) about places where I'm unsure what is being returned is appropriate for the context. I think there's going to have to be an additional exclude-or-include domain deactivated users parameter to either include domain deactivated when active is True (users that are domain deactivated but still active as users) or exclude them when active is False (same type of users), since we're still following the philosophy that a domain deactivated user is still a valid user in the domain. I can't tell without knowing exactly what the returned users will be used for unfortunately so i left a few guesses. @MartinRiese
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look good to me, but going to hold off approval until Ethan has a chance to respond about the filtering.
return memberships | ||
|
||
|
||
def populate_user_domain_memberships(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this just going to be ran once in a private release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
twice - before deploy, to get the bulk of the data populated and after deploy, to fill in gaps (user changes/new users) that might've occurred between the first run and the code is deployed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll also need to be included in a Django migration to account for third party environments where we can't run this code manually. That migration needs to happen in a subsequent deploy if we're to worry about gaps. However we'd probably want to add @once_off_migration
or similar to this code now so that the later migration noops on prod.
An alternative approach is to move the query changes to a subsequent deploy and follow the approach described here: https://commcare-hq.readthedocs.io/migrations_in_practice.html#multiple-deploys
(I think I already linked to that doc, but I can't find where I did, so apologies if I'm repeating myself)
filters.AND( | ||
filters.term('user_domain_memberships.domain.exact', domain), | ||
domain_membership_filter | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure if this is true but instead of doing and_or
would this be equivalent?
filter = filters.AND(
user_active_filter,
queries.nested(
'user_domain_memberships',
filters.AND(
filters.term('user_domain_memberships.domain.exact', domain),
filters.NOT(filters.term("user_domain_memberships.is_active", False))
)
if active:
return filter
else:
return filters.NOT(filter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not a 100% sure in my reading of this modified logic but doing a filters.NOT on an AND I think means it'll only filter for things where both user_active_filter and the nested filter don't match, which is not the intention. If I'm filtering for "inactive" users I want to return users that either have their is_active as False or their domain_membership.is_active as False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, that's faintly how i remember it working too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Jonathan's read is correct - it should be a complete disjunction between the two clauses. I also find the and_or
logic to be hard to think about. You could also do it with a recursive call if you like the way that reads:
def is_active(active=True, *, domain):
if not active:
return filters.NOT(is_active(domain=domain))
return filters.AND(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aha this made it click in my branch - changed to follow Jonathan's original suggestion.
Technical Summary
Jira Ticket
The ES related changes/migration to enable filtering for the current domain active status.
Adds the
user_domain_memberships
field to act as unified field for both domain_membership and domain_memberships. This new field is exactly the same as the 2 others but adds the newis_active
property in DomainMembership and is of typenested
, to make filtering for specific DomainMemberships possible.Additionally updates all filtering functions to use this new field instead of either domain_membership or domain_memberships.
The migration and the initial population command will be run on a private release after approval and before merging to avoid conflicts during the deploy process.
Safety Assurance
Safety story
Tested locally (with supporting changes that'll be found in another PR) to deactivate/reactivate and show deactivated/active users. Will test on staging after initial approval since these migrations are difficult to reverse.
Migrations
Rollback instructions
I'm not sure how the migration rollback will work but reverting this would require remaking the user index.
Labels & Review