Motivation
Temporary accounts were switched to being generated in a "scrambled" order. This task is to determine the preferred order for temporary accounts.
This came up in the discussion on T332805 (Decide the prefix character for temporary usernames).
Relevant comments from that discussion:
In T332805#9075416, @Tgr wrote: The numbers aren't incrementing, they are pseudo-random (at least that's how the test setup is currently configured). They don't reset, but with pseudo-random numbers there is no apparent difference anyway.
In T332805#9075501, @RHo wrote (in reply to @Tgr's comment above):
@Niharika or @Tchanders could you please confirm? This whole time myself and previous AHT designer had been operating under the understanding that it was an incrementing number. If not the case, can it be made so for the benefits mentioned?
In T332805#9078131, @Tchanders wrote:
It was set to scramble a couple of weeks ago in this patch from the Growth Team: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/938915
In T332805#9079288, @RHo wrote (quoting the exchange above):
@Tgr @Urbanecm_WMF - per above, can we reset to not scramble but serial?
In T332805#9079420, @Urbanecm_WMF wrote:
We not only use scramble as of now; we also use multiple shards of the serial provider (aka multiple sources of incrementing integers, where each time a source is selected randomly). The way how this works is that each number source generates every Nth number (if we have three of them, the first one generates numbers 1, 4, 7, ..., the second numbers 2, 5, 8, ... and the third one numbers like 3, 6, 9, ...). This means that if we switched back to serial, the temporary account names probably wouldn't form a perfectly incrementing sequence. The following could be a perfectly valid sequence of temporary account names:
- *Unregistered 1
- *Unregistered 4
- *Unregistered 2
- *Unregistered 5
- *Unregistered 3
Scrambling takes this a level up and makes the account names seemingly random. Unfortunately, merely switching back to serial wouldn't give us a perfectly incrementing series of account names, as illustrated above. I'm not sure how scrambling contributes to the interpretation: for big numbers, users probably won't see minor ordering hiccups, unless they're by an order of magnitude wrong.
I'm not really sure about the technical reason for switching to scrambling. About switching to multiple shards of the serial provider, my assumption is that using only one shard would put a lot of burden on a single counter shared across all wikis. The counter can't be really made local to each wiki, as we need to ensure the generated usernames are unique across all wikis (temp accounts can switch between projects, retaining the same temp account, just as regular users do, so we need to "reserve" their name on all projects).
@Tgr and @tstarling (who originally suggested switching to scrambling and increasing the shard count on the patch), please correct me if I'm mistaken in any part of the comment above.
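For illustration, the sharding and scrambling behaviour described in @Urbanecm_WMF's comment can be modelled with a minimal Python sketch. This is not MediaWiki's code: the class `ShardedSerialProvider`, the function `scramble`, and the constants `modulus` and `multiplier` are hypothetical names and values chosen for this example, and the scrambling shown is just one simple bijection, assumed here only to show that names can look random while staying unique.

```python
import random


class ShardedSerialProvider:
    """Hypothetical model: shard i (1-based) yields i, i+N, i+2N, ... for N shards."""

    def __init__(self, num_shards: int, seed: int = 0) -> None:
        self.num_shards = num_shards
        # Each shard starts at its own 1-based index.
        self.next_value = [i + 1 for i in range(num_shards)]
        self.rng = random.Random(seed)

    def acquire(self) -> int:
        # A shard is picked at random for every new temporary account.
        shard = self.rng.randrange(self.num_shards)
        value = self.next_value[shard]
        # Advance this shard by N so shards never produce the same number.
        self.next_value[shard] += self.num_shards
        return value


def scramble(index: int, modulus: int = 101, multiplier: int = 37) -> int:
    """Illustrative bijection on 1..modulus-1 (assumed, NOT MediaWiki's mapping).

    Because modulus is prime and multiplier is not a multiple of it,
    distinct inputs in 1..modulus-1 always give distinct outputs.
    """
    return (index * multiplier) % modulus


if __name__ == "__main__":
    provider = ShardedSerialProvider(num_shards=3)
    serials = [provider.acquire() for _ in range(6)]
    # Plain serial names: unique, but not necessarily in increasing order.
    print([f"*Unregistered {n}" for n in serials])
    # Scrambled names: still unique, but with no visible ordering.
    print([f"*Unregistered {scramble(n)}" for n in serials])
```

In this model the numbers are always unique, because each shard owns a separate residue class modulo the shard count, but the overall sequence is only increasing within a single shard. That matches the point made in the comment: switching from scrambled back to plain serial would remove the scrambling step, not the out-of-order effect of random shard selection.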