THC Science

May 2020

Hello, and welcome to Wikipedia. This is a message letting you know that one or more of your recent edits to Sandbox has been undone by an automated computer program called ClueBot NG.

ClueBot NG makes very few mistakes, but it does happen. If you believe the change you made was constructive, please read about it, report it here, remove this message from your talk page, and then make the edit again.
For help, take a look at the introduction.
The following is the log entry regarding this message: Sandbox was changed by Yapperbot (u) (t) ANN scored at 0.952021 on 2020-05-15T13:44:44+00:00

Thank you. ClueBot NG (talk) 13:44, 15 May 2020 (UTC)[reply]

Just a note - this wasn't a malfunctioning bot, this was me making a mistake myself when manually trying something. Sorry, won't happen again! Naypta ☺ | ✉ talk page | 13:46, 15 May 2020 (UTC)[reply]

Current discussions

Wikipedia_talk:Feedback_request_service#Bot_enabled_--_concerns
User_talk:Naypta#Yapperbot_is_frisky_this_morning and other sections on User_talk:Naypta (revised 01:42, 17 June 2020 (UTC))[reply]
Wikipedia_talk:Feedback_request_service#Editors_removing_themselves_from_notification_list (added 01:42, 17 June 2020 (UTC))
Wikipedia:Administrators'_noticeboard/Incidents#Please_stop_Yapperbot_NOW (permalink) <-- This one initially had most activity.

Moved to User talk:Naypta § Frequency functionality continued

and then to

Moved to User talk:Yapperbot § Frequency functionality continued

^please add new comments here.

--David Tornheim (talk) 09:15, 16 June 2020 (UTC) [revised 23:57, 16 June 2020 (UTC)][reply]

I was here to notify about some issues too. Already done. Also, did you kidnap legobot? What have you done with legobot?! —usernamekiran (talk) 09:40, 16 June 2020 (UTC)[reply]

@Usernamekiran: I would ping the bot coder to get his/her attention to your question. I believe questions/concerns about the bot should be here, not on the coder's talk page. --David Tornheim (talk) 11:06, 16 June 2020 (UTC)[reply]

@Usernamekiran: Hi, thanks for getting in touch - hopefully you've seen the explanations as to what's going on, sorry for any inconvenience! As to Legobot, no kidnapping involved - it's still working just fine on other tasks, it had just stopped doing the FRS for some reason a number of months ago. Attempts to get in touch with Legoktm about it weren't successful, so I rebuilt the functionality into my own bot. The discussion that prompted me to do that is here. Naypta ☺ | ✉ talk page | 11:16, 16 June 2020 (UTC)[reply]

Frequency functionality continued

Moved from WP:ANI § Please stop Yapperbot NOW

(permalink)

Discussion continued from WP:ANI (and other locations as noted above). Mathglot (talk) 10:22, 16 June 2020 (UTC) [restored with permlink --David Tornheim (talk) 23:39, 16 June 2020 (UTC)][reply]

Copied from Naypta's talk page (permalink) on 00:17, 17 June 2020 (UTC):[reply]

Discussion continued from WP:ANI (and other locations as noted at User_talk:Yapperbot#Current_discussions). Mathglot (talk) 10:27, 16 June 2020 (UTC)[reply]

@Mathglot: Thanks for opening up this continued discussion.

Can you commit to looking into an adjustment to the code so that a cold start after some time offline won't repeat this? I wrote my answer to whether or not bot should be turned off during an edit-conflict. I'm willing to commit to looking at the code, but I expect it will take a few days before I have any sense of how it works, given my experience with programming/coding does not include wiki-bot coding. I can't promise I will have the time and patience to sufficiently understand it to verify that this wouldn't happen again, but I will give it a shot. I promise that within the week I will at least get started and will put in at least an hour to looking at it and possibly asking the coder or other bot-coders key questions about how it or certain bot-commands work.

If there is no documentation, I might start (or add to) it.

That's it for this subject for tonight for me... --David Tornheim (talk) 10:48, 16 June 2020 (UTC)[reply]

@David Tornheim and Mathglot: I won't comment on where this discussion should be - as mentioned previously, I'm more than happy to go wherever it takes me!

This is as far as I can tell the first time that the bot being "turned off" during an edit conflict has been mentioned. What do you envisage that would do? Also, doesn't that have the potential to create quite serious issues with frequently-used talk pages? (It may also not be possible in the current implementation, as Yapperbot is coded deliberately to use MediaWiki's "New section" functionality to avoid ever having edit conflicts.)

The idea of rate limiting is clearly one that's possible, though. In theory, this issue shouldn't ever reoccur anyway, but in the event that it did, it might be good to have rate limits involved. I already have edit limiting code from the bot trial, which is hooked into the FRS bot, so changing that to have a limit on the number of messages sent to a single user per run (the bot currently runs on Toolforge every hour) would definitely be possible if people think that's a good idea. One alternative would be simply to add another parameter to {{Frs user}} that allows users to customise a daily limit - perhaps with a default of 3, then allowing users to set any number there, or 0 for no limit.
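As a rough illustration of the per-user limit floated above, here is a minimal Go sketch. It is not taken from Yapperbot's code: the `limiter` type, its fields, and the user names are hypothetical, and the only details borrowed from the comment are the default of 3 and the convention that 0 means "no limit".

```go
package main

import "fmt"

// limiter caps how many messages each user may receive within one window
// (for example one bot run, or one day). All names here are hypothetical.
type limiter struct {
	defaultLimit int            // used when a user has set no limit of their own
	perUser      map[string]int // per-user overrides; 0 means "no limit"
	sent         map[string]int // messages sent so far in the current window
}

func newLimiter(defaultLimit int) *limiter {
	return &limiter{
		defaultLimit: defaultLimit,
		perUser:      map[string]int{},
		sent:         map[string]int{},
	}
}

// allow reports whether user may receive another message in this window,
// and records the message if so.
func (l *limiter) allow(user string) bool {
	limit, ok := l.perUser[user]
	if !ok {
		limit = l.defaultLimit
	}
	if limit != 0 && l.sent[user] >= limit {
		return false
	}
	l.sent[user]++
	return true
}

func main() {
	l := newLimiter(3)
	l.perUser["Example2"] = 0 // this user has chosen "no limit"
	for i := 0; i < 5; i++ {
		fmt.Println(i, l.allow("Example1"), l.allow("Example2"))
	}
}
```

Resetting `sent` at the start of each window (run, day, or week) is left out here; which window makes sense is exactly the question being discussed.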

Whatever changes are made, I want to make sure that everyone is happy with them - so please let me know your thoughts! Naypta ☺ | ✉ talk page | 11:09, 16 June 2020 (UTC)[reply]

On further reflection, I think there are two good options going forward (although there may well be other ones that I've not considered - I welcome additional suggestions!):

Add a per-week limit to {{Frs user}}. I previously said per day limit, but in the vast majority of cases (i.e. pretty much any apart from this edge case of edge cases) that wouldn't be helpful. A week limit would accomplish much the same thing, just with far more utility in normal times, too.
Build the code of the bot to ship multiple notifications to a user in one template. This has advantages and disadvantages: whilst it'd mean less talk page spam in this edge case, it would also mean that the notification might potentially be less clear, as the heading would have to be just "feedback requested" rather than a category (as they might contain multiple categories). It'd also mean the bot would be less easy to debug if there were issues to come up: at the moment, because each message is a product of a single RfC, it's easy to track back issues if they occur and fix them, which would be more problematic without that clear connection.

Let me know your thoughts

Naypta ☺ | ✉ talk page | 11:36, 16 June 2020 (UTC)[reply]

Hiya - Allie from IRC here. I would advise putting a hard limit on how many times Yapperbot can write to a specific user's talkpage in a one-hour period, and I would suggest that limit is once - all the FRS notifications for a day should really be delivered in a single edit anyway. I would also suggest implementing a proper rate-limit which takes the per-month limit and uses that to calculate a "cooling-off" period between notifications to a user. For instance, I think I'm set at 30 notifications per month, so a 24 hour cooling-off period would be appropriate, but someone who is set at one notification per month should receive a notification (on average) every 30 days, instead of just on the first of each calendar month. I'm a bit concerned you're referring to this as an 'edge case' - in my opinion, scheduling is core bot functionality. -- a ^{they/them | argue | contribs} 12:01, 16 June 2020 (UTC)[reply]

@Alfie: Hi Allie. This is an edge case, because it is by no means normal for there to be this many "new" RfCs to process. If you take a look at the history of the pages Legobot transcludes RfCs onto - for instance, take the Biographies category - you can see that, on a daily basis, there's normally one, maybe two, RfCs per category. Ninety-nine to process at once is, in every sense of the word, a rarity.

That being said, of course, it being a rarity and an edge case does not mean that it's not something that would be useful to address. A cooling-off period, as you refer to it, is of course possible to implement, but I'm not sure it's really all that necessary - if you look at the history of the way that Legobot previously did this, this was never an issue, and I suspect had I just not sent any notifications of the ongoing RfCs and only started sending messages regarding new ones, it wouldn't have ever come up as an issue either. Bundling FRS notifications in a run is definitely possible, although there's nothing to guarantee that a further run that same day wouldn't pick up a new RfC or GA nom, which would then send another message. Once again, the thing to bear in mind here is that the vast majority of the time, each run will consist of one RfC, maybe two at a push - nowhere near the number experienced this morning. Naypta ☺ | ✉ talk page | 12:07, 16 June 2020 (UTC)[reply]

PS: As to documentation, the specific bot code doesn't have explicit documentation, because it's not a library, but all the relevant bits of code are commented. Code for ybtools, which is the shared library used across all of the bot's tasks, is commented in standard Godoc style as it is a library, so its full documentation is available here. Naypta ☺ | ✉ talk page | 11:11, 16 June 2020 (UTC)[reply]

P.S. shouldn't this discussion be at Wikipedia_talk:Feedback_request_service or User_talk:Yapperbot? --David Tornheim (talk) 10:52, 16 June 2020 (UTC)[reply]

If you just bundled all the invites into a single section (possibly by detecting whether the last section on a user's talkpage is an existing recent notification, and adding a new notification to it) I think people would be 90% less annoyed. But people are making much too big a deal of this, if indeed it's just a startup phenomenon. E Eng 13:51, 16 June 2020 (UTC)[reply]
@EEng: That's one of the options I've mentioned above, yeah. I disagree that people are making too big a deal of it, though; it's important. Being a botop means being in a position of trust, by the very nature of running a bot, and I want people to feel that they can put that trust in me. If people feel I've broken that trust, that's a huge issue, so it is important to have these discussions - at least from my perspective. As I said at the ANI thread, bots are here to serve the community, not the other way around, and I want to make sure that mine works the way it's supposed to. Naypta ☺ | ✉ talk page | 13:56, 16 June 2020 (UTC)[reply]
I can send you a whip with which to flagellate yourself. E Eng 13:57, 16 June 2020 (UTC)[reply]
@EEng: No flagellation intended, self or otherwise - it's not about going "oh no, aren't I awful", it's about going "okay, how can we make sure this doesn't happen again?" Naypta ☺ | ✉ talk page | 08:29, 17 June 2020 (UTC)[reply]

I very much appreciate your saying that in that tone. --David Tornheim (talk) 23:25, 16 June 2020 (UTC)[reply]
@David Tornheim and Mathglot: Thanks David for moving the discussion over here. To give an update on some steps I've already taken to try and ensure a better distribution even if such a large list were ever to come up again for simultaneous sending:
- The random number generator that selects which users are going to be invited to give feedback is now re-seeded every selection, rather than on every bot run, which should significantly increase the variety in users selected when a bot run has a lot of messages to process.
- The bot now waits for five seconds between each invitation, to try and prevent people being spammed and unable to edit their own talk page from edit conflicts.
- The number of messages being sent for each RfC/GA nom has been lowered - it was previously a random number between 15 and 25, now it's a random number between 5 and 15.
- Some issues with storing the state of processing GA nominations have been fixed, which had previously caused problems with GA nominations sometimes going out twice.
Hopefully these will all be helpful to ensure the bot works better! Naypta ☺ | ✉ talk page | 08:35, 17 June 2020 (UTC)[reply]

Thanks for the update. I haven't delved into the code, except a brief look at this code you referred to. (What language is it written in?) Responding to some of the points:

The random number generator is re-seeded every selection, rather than on every bot run.

It seems odd to me that that would make any significant difference. The only possible concern I would have is if the same process started with the exact same seed. (I would only keep the seed the same if I were trying to debug it.) Are you using the clock for the seed? That's how I used to do it, but I was told that even with the clock, patterns can still emerge. Maybe there are new techniques for dealing with the seed that were not available many years ago.

The number of messages being sent for each RfC/GA nom has been lowered - it was previously a random number between 15 and 25, now it's a random number between 5 and 15.

That sounds way too low to me. I've seen RfC's with as many as 200 responses. Do you mean 15-25/day has been reduced to 5-15/day?

Did you get those numbers from LegoBot?

Have you looked at the LegoBot code to see how it works? Do you know if it is available for public review too?

...[problems with] GA nominations have been fixed...sometimes going out twice.

Glad to hear it!

Is there a location where other users reviewed your code before it was released? If I find it, I'll delete this, or tell others where that is.

--David Tornheim (talk) 10:32, 17 June 2020 (UTC)[reply]

@David Tornheim: I'm using the clock for the seed, yes - the reason I changed it to every selection is to improve randomisation on runs that have huge numbers of messages, like the one we had yesterday morning. As the RNG was only seeded once, at the start of the process, with the timestamp at the start, the random permutation algorithm was returning the same permutations throughout - because it had the same seed - meaning that a small number of people received a very large number of messages, because their usernames were sorted to the front of the queue by the permutation generator. Now that it's reseeding the generator every selection, it will perform a new random selection every time.
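The general property at issue can be shown with a short Go sketch. It does not reproduce the bot's code; `permWithSeed` is a hypothetical helper and the seed value is an arbitrary number chosen for illustration. It only demonstrates that, with Go's `math/rand`, a generator built from a given seed always yields the same first permutation, so any code path that ends up reusing one seed will keep favouring the same ordering.

```go
package main

import (
	"fmt"
	"math/rand"
)

// permWithSeed builds a fresh generator from seed and returns the first
// permutation of 0..n-1 that it produces.
func permWithSeed(seed int64, n int) []int {
	return rand.New(rand.NewSource(seed)).Perm(n)
}

func main() {
	const seed = 1592300000 // arbitrary fixed value, for illustration only

	fmt.Println(permWithSeed(seed, 8))
	fmt.Println(permWithSeed(seed, 8))   // same seed: identical to the line above
	fmt.Println(permWithSeed(seed+1, 8)) // different seed: usually a different order
}
```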

The Legobot code is available for public review over here, but it's not exactly well-documented, to say the least - I'll freely admit that my eyes gloss over a bit when I see things like $temp01, $temp02 and foreach ($temp02 as $temp2) {. I had difficulty gleaning much at all from it, so no, I didn't get the numbers from Legobot - but I can adjust them as necessary, if the feeling is that they ought to be different. I might end up putting them on-wiki, so it doesn't take a code change to adjust them as needed.

As to code review, the code was available for review during the BRFA, but I don't know if it was actually reviewed by anyone - I don't imagine that it was, as it would make BAG members' lives even more difficult having to review entire codebases in languages they might not be familiar with to approve bots. The trial run over there went smoothly, because it was dealing with a smaller volume of messages to send, so there wasn't really an opportunity for this sort of an issue to arise. Naypta ☺ | ✉ talk page | 10:36, 17 June 2020 (UTC)[reply]

That's really too bad if no one reviewed it. I find it troubling that code of this importance may not have been reviewed by at least one long-term experienced bot coder--ideally someone who had been here close to the start of the project. If truly no one reviewed it, I applaud your bravery at putting yourself out there like this, and give you way more slack than I might have from the beginning. No individual should have that level of responsibility.

I had assumed that all bot code had to be reviewed by a number of coders--maybe that was the case in the past.

How much time did you spend analyzing this kind of data:

1. the average number of new RfC's launched per month

2. " " per category

3. the typical number of responses to an RfC?

Do you have some data that you compiled and analyzed?

It sounds like the tests you did were insufficient.

Maybe we can work on some better tests and data analysis to be sure it is doing what editors expect and the input/output ratio makes sense. If we can figure out what LegoBot did, that might be the best, since the challenges we are seeing now may have been worked out already over the years the bot was working.

I'm going to try to do some more research on how Legobot handled the RfCs. Did anyone assist you at all in working on the code?

I see the programming language is Golang.

I'm a little unclear on exactly where Yapperbot is located. I think you said: ybtools

If that's the case, where is the entry point(s)? I hope you can be patient with me. This may be obvious to others... --David Tornheim (talk) 11:10, 17 June 2020 (UTC)[reply]

P.S. After writing the above, I see that three editors commented at Wikipedia:Bots/Requests for approval/Yapperbot. I'm surprised they are not here commenting and making suggestions on how to move forward. I think we should invite them at some point, but hopefully they will find their way here on their own. I think those who reviewed and approved this may share some responsibility for the problems and glitches that could have been caught. --David Tornheim (talk) 11:10, 17 June 2020 (UTC)[reply]

@David Tornheim: At the end of the day, this is a volunteer project, and whilst MediaWiki has paid developers, neither I nor the BAG, nor any other user, has any financial incentive for building bots - we do it to make the encyclopedia better. Imposing a requirement for code review makes total sense when committing stuff to the core repositories for MediaWiki, and indeed is used for MediaWiki extensions, but for a bot - something that runs off a single user account, can be easily blocked if need be, and is still subject to many (albeit not all) of the same limits that other users would have - it's just not a requirement. To the best of my knowledge, it's never been one.

I don't have a formal analysis of the number of RfC responses, no; I of course have anecdotal experience, but I don't have a specific report to show you. If this was a new bot task, I might have considered doing such a report, but as it's just taking over from what a bot was doing previously, replicating the functionality without adding anything new, my feeling (and evidently the feeling of the BAG) was that such a requirement would have been unnecessary. This is, again, the difference between an enwiki bot and commercial code: this is a passion project, not something where there's a list of deliverables, a timeframe and some set objectives.

I'm the sole author of all of the code - there have been no other contributors, pull requests or specific code suggestions made, with the exception of some help I had with some of the regexes from some very helpful people on #regex on Freenode. ybtools is the library that powers all of Yapperbot's tasks, including the FRS one; the specific FRS code is here (https://github.com/mashedkeyboard/yapperbot-frs). The entry point in that code is main() in main.go, as is standard for Golang code. Naypta ☺ | ✉ talk page | 11:19, 17 June 2020 (UTC)[reply]

I wrote the below response while you were posting the above. I haven't had a chance to read it yet. Need to call it a night... --David Tornheim (talk) 11:55, 17 June 2020 (UTC)[reply]

Legobot's code

@Naypta: I just looked at the LegoBot code you referred me to (https://github.com/legoktm/harej-bots/blob/master/frsbot.php). It actually looks pretty straightforward to me--far easier to read than I expected. I see it does a per day calculation:

70: $time = 30/$frsl_limit;

That's how it avoided the problem you ran into.

I don't see any limits to how many notices go out per RfC. I do see that it uses an SQL database to make it efficient.

It looks like it cycles through every single RfC and every single user who wants notification and there is nothing random about it.

I believe it focuses on how much time is left before user can be notified again.

I believe it is something like this: (for the "all" RfC case)

initialize MinTime = maximum time to wait before bot runs again (e.g. 1 day)
For each user ($u) {
    For each rfc ($r) { /* starting with oldest first */
        if ($u has not been notified of $r -and- $u has not reached notification limit yet) {
            notify $u of $r
            tell database that $u has been notified of $r
        }
        Calculate delta time $t required before $u can receive next notice.
        If $t < MinTime then MinTime = $t
    }
}

The bot would wait the larger of MinTime -and- some min. increment (e.g. 10 seconds) before bot cycles through these again.

To be efficient it could keep track of how much time is left before $u can be notified in each category $c.

Are you familiar with SQL? I recently took a Database course, so I know it.

Because it is database, a properly designed query should take care of both loops, returning all relevant records.

Anyway, I believe I can figure out how it works and document it. I will be very curious to see how things are similar to or different from your code.

--David Tornheim (talk) 11:55, 17 June 2020 (UTC)[reply]


Revision as of 11:55, 17 June 2020
