Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update split_one_chevs balancemode v2 #328

Merged
merged 10 commits into from
Jul 10, 2024

Conversation

jauggy
Copy link
Member

@jauggy jauggy commented Jun 12, 2024

Context

I played a game with split_one_chevs on and noticed a few issues. The 1Chevs were not split. This was because Macwhite was recognised by Teiserver as a 2Chev whereas Chobby showed them as a 1Chev. This is likely because Chobby gets a player's rank on login and then never updates it so they get out of sync with TeiServer.
3
2

How to resolve?

To resolve this my algorithm will now group both 1 and 2Chevs into the "Noob" bucket. There's also going to be slight change on how we draft players who are in the "Noob" bucket detailed below.

Other Findings

When putting the replay

https://www.beyondallreason.info/replays?gameId=39096966518c40a66b2130b057864aa5

into https://openskill-test.web.app/ I noticed that the library expects Team 1 to win, despite that most humans would bet on Team 2.

4

Since the library expects Team 1 to win, if my team were to win I would gain a lot of OS i.e. +1. If my team were to lose, I only lose -0.37.

From this we can conclude that if you were to choose from the three "Noobs": CindersFire, Macwhite, Victorious_Dead you probably want to avoid the overrated players, which are likely the ones with highest uncertainty. If an overrated player is on the other team, you stand to win more and lose less, and since they're overrated, you're also more likely to win. Therefore, for those in the "Noobs" category, we probably want to pick those with low uncertainty as they are less likely to be overrated.

split_one_chevs Algorithm v2

Based on these findings, the algorithm will now draft players based on these criteria:

  1. Always pick experienced (3Chev+) players over noobs (1-2Chevs).
  2. Prefer higher OS for experienced players.
  3. Prefer lower uncertainty for noobs.

This draft mimics how a human might draft players with the given visible information in a lobby. It's not super mathematical. Players generally look at chevron level to determine how overrated someone might be. Someone did complain in chat about the lobby balance in the game I played mentioned above. They were obviously eyeballing the chevron levels and assuming those two players were overrated.

Further enchancements

  1. Previously, if the balancer was called it would check the result and if the deviation in ratings between teams was too high, it would rerun the balancer but split all parties into solo players. Now the balance result will have a new field: has_parties?. If this is false we do not need to rerun the balancer again.
  2. Permissions have been reduced for fake users that are not admin. The purpose of this is if you want to test that the Balance tab doesn't appear to normal users, you need to reduce the permissions of fake users.
  3. Anyone with Staff role e.g. Tester, Contributor, etc. can now see the balance tab. The balance tab now has a dropdown allowing you to switch to difference balance modes.
  4. fuzz_multiplier (randomness added to match rating) is now only enabled for Teifion's algos. This makes it easier to debug issues.
  5. If there are no noobs, an alternate balancer (loser_picks) will be called. This is the default algo and it supports parties.

Known Bugs

Teiserver doesn't know the rank shown to the user in Chobby. Chobby gets the rank on login and then never updates. Teiserver, therefore, may classify a user as 2Chev but they might be shown as 1Chev in Chobby.

Unit Tests

Run this to run multiple unit tests that relate to balance

  mix test --only balance_test

Local Dev Tests

See comment here for test steps.

Theoretical Testing on past replays

Go here: https://balance-algo-web.web.app/
And enter a past replay. Change algorithm to Split One Chevs v2

@jauggy jauggy marked this pull request as ready for review June 12, 2024 23:57
Add split-one-chevs-v2
Update moduledoc
This reverts commit 0dd77e3.
@jauggy jauggy force-pushed the jauggy/split-one-chevs-v2 branch 2 times, most recently from 4f05643 to dacd9fd Compare June 15, 2024 11:19
Check for bad balancer
@jauggy jauggy force-pushed the jauggy/split-one-chevs-v2 branch from c09fbef to 95d352e Compare June 15, 2024 13:24
@jauggy
Copy link
Member Author

jauggy commented Jun 15, 2024

Have updated and now you can see dropdown to change the balancer. Also removed the admin pages (now redirects) to be consistent with the previous PR made by Perfi to remove them.

1

@jauggy jauggy force-pushed the jauggy/split-one-chevs-v2 branch from 0dd4639 to d638612 Compare June 15, 2024 20:23
Refactor minor
@jauggy jauggy force-pushed the jauggy/split-one-chevs-v2 branch from d638612 to a2c5976 Compare June 15, 2024 22:42
Minor update
@jauggy jauggy force-pushed the jauggy/split-one-chevs-v2 branch from 0497a07 to 2cd66d9 Compare June 16, 2024 00:11
@jauggy
Copy link
Member Author

jauggy commented Jun 25, 2024

Sample video of testing balancer tab in Integration server:
https://www.youtube.com/watch?v=KUqzvU6GBug

Note that the balancer tab only appears for rated games. Also the chevron level of players is based on current data always (since we don't store history of this).

@jauggy jauggy force-pushed the jauggy/split-one-chevs-v2 branch 2 times, most recently from ed2850c to 21f8613 Compare June 25, 2024 06:30
@jauggy
Copy link
Member Author

jauggy commented Jun 28, 2024

Local Dev Tests

You must rerun the fake data task. This is because if you ran it previously, the fake users will have too high permissions. I modified the fakedata task so fake users will have normal permissions. Also the task will now also add fake playtime data.

mix teiserver.fakedata

Launch the website

mix phx.server

Login to the website using root@localhost
Now go to Admin > Matches > Select a Match > Balance Tab.
You will see the logs for the loser_picks algo.

There is a dropdown with the label "Balance Algorithm" near the top. Change this to split_one_chevs
You will now see the logs for split_one_chevs.

Testing permissions of normal users

  1. Login as root admin and find a user. Copy their email which should be a guid. Relogin as that user using that email and the password is password. Check that they cannot see the balance tab.
  2. Relogin as admin and then give that user a contributor/tester role. Relogin as the user and now they should have access.

Copy link
Collaborator

@L-e-x-o-n L-e-x-o-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Teiserver doesn't know the rank shown to the user in Chobby. Chobby gets the rank on login and then never updates. Teiserver, therefore, may classify a user as 2Chev but they might be shown as 1Chev in Chobby.

I thought rank was only calculated on login, when/where else does Teiserver calculate rank?

@@ -96,7 +96,7 @@ defmodule Mix.Tasks.Teiserver.Fakedata do
name: generate_throwaway_name() |> String.replace(" ", ""),
email: UUID.uuid1(),
password: root_user.password,
permissions: ["admin.dev.developer"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in order to test that a user cannot see the balance tab, they must have zero permissions. That admin.dev.developer permission basically gives them admin role.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be good to integrate FakePlaytime into Fakedata to avoid the need for 2 commands?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll investigate to see if I can do it. This is much more about my Elixir skills not being strong enough to safely modify this code.

socket
|> mount_require_any(["Reviewer"])
|> mount_require_any(["Reviewer", "Contributor"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ratings were locked behind Overwatch, balance behind Reviewer, neither available to Contributor which in my opinion would find the data more useful, mainly for debugging purposes, than Overwatch for moderation. This doesn't make much sense to me and I would change it all to Contributor but this is something for Beherith to decide.

end)
end)
|> List.flatten()

past_balance =
BalanceLib.create_balance(groups, match.team_count, mode: :loser_picks)
BalanceLib.create_balance(groups, match.team_count, algorithm: balancer)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are balancer and algorithm the same thing in this context? Some places are using one, some the other, it might be nice to always use the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can update my word usage to be more consistent. Will look into it.

end

@spec has_enough_noobs?([BT.expanded_group()]) :: bool()
def has_enough_noobs?(expanded_group) do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_enough_noobs only checks if there are noobs, not how many?
Is this intended or not?

Copy link
Member Author

@jauggy jauggy Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some ideas like enough noobs be something like

  • At least a single 1chev or
  • At least two, 2chevs

But decided to keep it simple for now. So the function is more about allowing additional complexity in the future.

A single noob actually gets treated differently compared to Teifion's balancer. Teifion's balancer will pick the noob higher (since it's based on their OS). My algorithm will always pick the noob last irrespective of OS.

@jauggy
Copy link
Member Author

jauggy commented Jun 28, 2024

Teiserver doesn't know the rank shown to the user in Chobby. Chobby gets the rank on login and then never updates. Teiserver, therefore, may classify a user as 2Chev but they might be shown as 1Chev in Chobby.

I thought rank was only calculated on login, when/where else does Teiserver calculate rank?

To be honest this bug really baffled me. I still have no idea why they are out of sync. From my searching it only gets calculated on login.

@L-e-x-o-n
Copy link
Collaborator

Teiserver doesn't know the rank shown to the user in Chobby. Chobby gets the rank on login and then never updates. Teiserver, therefore, may classify a user as 2Chev but they might be shown as 1Chev in Chobby.

I thought rank was only calculated on login, when/where else does Teiserver calculate rank?

To be honest this bug really baffled me. I still have no idea why they are out of sync. From my searching it only gets calculated on login.

That was my understanding as well. I checked again, not sure what we are missing...

@jauggy
Copy link
Member Author

jauggy commented Jun 28, 2024

Currently at least there is an issue for it: #332
So that the bug is at least recorded.

@jauggy
Copy link
Member Author

jauggy commented Jun 28, 2024

@L-e-x-o-n I have updated the PR now with the following changes:

  • fakedata task now also calls task to add playtime stats
  • balancer renamed to be more consistent with other usage

@L-e-x-o-n L-e-x-o-n merged commit 4e28cd9 into beyond-all-reason:master Jul 10, 2024
3 checks passed
# which has an expiry of 60s
# See application.ex for cache settings
rating_type_id = MatchRatingLib.rating_type_name_lookup()[rating_type]
rating = get_user_balance_rating_value(userid, rating_type_id)
{skill, uncertainty} = get_user_rating_value_uncertainty_pair(userid, rating_type_id)
rating = calculate_rating_value(skill, uncertainty)
rating = fuzz_rating(rating, fuzz_multiplier)
Copy link
Member Author

@jauggy jauggy Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mistake
calculate_rating_value should not be called since we already have the rating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants