Wishing you a happy 2022! Happy holidays

Happy New Year!
GreenC,
Have a great 2022 and thanks for your continued contributions to Wikipedia.

[Image: Merry Christmas and Happy New Year - 1908 Australian postcard]

   – Background color is Very Peri (#6868ab), Pantone's 2022 Color of the year

Send New Year cheer by adding {{subst:Happy New Year 2022}} to user talk pages.

North America1000 16:20, 3 January 2022 (UTC)[reply]

IABot edits on dawiki

Hi! I noticed edits like da:Special:Diff/10981352 and came to your page to read more. I noticed that you have a bot task for "fixes known problems with Internet Archive Wayback Machine links" etc. I wonder what those problems are and if you could also fix them on dawiki if relevant? --MGA73 (talk) 21:13, 4 January 2022 (UTC)[reply]

In theory, it depends on what the problem is, the technical limits of working with templates in other languages, and how complex the problem is. It would be like following in the footsteps of IABot and fixing formatting errors on-wiki, but not fixing IABot itself. -- GreenC 05:38, 5 January 2022 (UTC)[reply]
Thank you. Yes, you are right that language could be a problem. As far as I know, all English templates and parameters work on dawiki too. I checked out User:GreenC/WaybackMedic 2.5. Is there anywhere else I can look to learn more about the edits? --MGA73 (talk) 12:58, 5 January 2022 (UTC)[reply]
I do not mean that it is not good enough. I just wanted to be sure I'm reading the right place. It looks very cool so far. --MGA73 (talk) 13:02, 5 January 2022 (UTC)[reply]
Yes, that's generally what it does, plus a lot more; it's continually under development. The program is so large that making it work in another language would be a major project. I'd like to, but it won't be any time soon. For example, there are many requests at WP:BOTREQ that can currently only be done on Enwiki but should also be done in other languages. -- GreenC 15:18, 5 January 2022 (UTC)[reply]
Thank you. Well, if you ever want to try, you could perhaps just run the bot on dawiki (like 50 test edits) without any changes to the code. Then we could see what happens. Hopefully it will just skip templates in Danish but still fix those in English. If it breaks stuff I can fix that manually. --MGA73 (talk) 19:58, 5 January 2022 (UTC)[reply]
MGA73 - can you choose a few articles good for testing? Not 50, like 4 or 5. -- GreenC 15:02, 6 January 2022 (UTC)[reply]

Hi! Thank you. I picked a few articles IABot edited recently that also have many links (is that a good criterion?):

  1. da:Muhammed-tegningerne
  2. da:Movia
  3. da:Motorveje i Danmark
  4. da:Monte-Carlo Masters

Please only test if it is not too much work :-) And in case there are any errors or problems I can clean up. --MGA73 (talk) 15:20, 6 January 2022 (UTC)[reply]

OK. I need to finish another project first, since the bot is currently tooled for that project; then I can run some tests on these articles. The bot separates processing from upload, so I can process the articles and look at the proposed diffs and logs to see what it would do before uploading. -- GreenC 16:11, 6 January 2022 (UTC)[reply]
Thanks a million! --MGA73 (talk) 18:37, 6 January 2022 (UTC)[reply]
I tried it (no upload). Unfortunately the dates are a problem: it is seeing |archive-date=5. januar 2007 and switching it to |archive-date=January 5, 2007. There are a lot of functions related to dates, so it would not be quick or easy to add support for other formats. GreenC 21:55, 9 January 2022 (UTC)[reply]
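As an illustration of why that support is non-trivial, here is a minimal sketch of parsing a Danish |archive-date= value so it could be re-emitted in the same format instead of being rewritten to the English form. This is not WaybackMedic's actual code; the month table and function names are made up for the example.

```python
# A minimal sketch (not WaybackMedic's actual code) of parsing a Danish
# |archive-date= value such as "5. januar 2007" so it can be re-emitted
# in the dawiki format rather than rewritten as "January 5, 2007".
import re

# Hypothetical lookup table of Danish month names.
DANISH_MONTHS = {
    "januar": 1, "februar": 2, "marts": 3, "april": 4, "maj": 5, "juni": 6,
    "juli": 7, "august": 8, "september": 9, "oktober": 10, "november": 11,
    "december": 12,
}

def parse_danish_date(value: str):
    """Parse '5. januar 2007' into (year, month, day), or None if not Danish."""
    m = re.fullmatch(r"(\d{1,2})\.\s*([a-zæøå]+)\s+(\d{4})", value.strip(), re.I)
    if not m:
        return None
    day, month_name, year = m.groups()
    month = DANISH_MONTHS.get(month_name.lower())
    if month is None:
        return None
    return int(year), month, int(day)

def format_danish_date(year: int, month: int, day: int) -> str:
    """Re-emit the date in the dawiki style, e.g. '5. januar 2007'."""
    names = {v: k for k, v in DANISH_MONTHS.items()}
    return f"{day}. {names[month]} {year}"

print(parse_danish_date("5. januar 2007"))   # (2007, 1, 5)
print(format_danish_date(2007, 1, 5))        # 5. januar 2007
```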

Another question: at some point dawiki had bad code in the CS1 module, so a mix of bad things caused some URLs to be marked as dead instead of live. Does the bot check and fix that? Or can we ask the bot to fix that? --MGA73 (talk) 11:02, 9 January 2022 (UTC)[reply]

Not that I am aware of. It would probably require a separate bot to do header checks for status 200. If there is a redirect URL it is more difficult due to soft-404s; those could be skipped initially. -- GreenC 22:03, 9 January 2022 (UTC)[reply]
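For what it's worth, such a header check could look something like the sketch below. This is an assumed approach, not an existing bot; the function name and the "skip redirects" rule are illustrative only.

```python
# A minimal sketch of a header check: treat a direct HTTP 200 as "live",
# and skip redirecting URLs because soft-404s make them unreliable to
# classify automatically.
import requests

def check_status(url: str) -> str:
    try:
        r = requests.head(url, allow_redirects=False, timeout=15)
    except requests.RequestException:
        return "dead"
    if r.status_code == 200:
        return "live"
    if r.status_code in (301, 302, 303, 307, 308):
        return "skip"   # possible soft-404 behind the redirect
    return "dead"

print(check_status("https://example.com/"))
```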
Thanks. Too bad. Well if everything else fails we can always remove "dead" and have the bot check them again. --MGA73 (talk) 06:41, 10 January 2022 (UTC)[reply]
Someone had an old database dump, so I got a list of 419 articles that I think may have a wrong url-status. So I will just remove "dead" from all links in those articles, and hopefully IABot will be so kind as to check the links again :-) --MGA73 (talk) 14:57, 10 January 2022 (UTC)[reply]
It seems that if I remove "dead" or "url-status=dead" then the bot will still ignore the links. Do I have to remove "archive-url" and/or "archive-date" to get the bot to check the link? Or should I change to "live"? --MGA73 (talk) 16:03, 10 January 2022 (UTC)[reply]
You could try, but I am not sure it will work: IABot might either do nothing or set it back to dead, since a missing url-status is the same as url-status=dead for CS1. Another option that's available on enwiki (not sure about dawiki) is |url-status=bot: unknown, which is a flag saying the status was set by a bot, but the bot doesn't know the true status. It's a flag for humans or other bots so they know it needs help. -- GreenC 16:48, 10 January 2022 (UTC)[reply]
Thank you. I tried to remove the url and that seems to work. But according to https://iabot.toolforge.org/index.php?page=runbotqueue I can't ask the bot to check all the pages. I have to take them 1 page at a time :-( --MGA73 (talk) 18:27, 10 January 2022 (UTC)[reply]
Oh I see, good idea. Yes, when using the archive-all-links option, it allows one request (one page) at a time. -- GreenC
It seems that bot jobs are completely disabled. --MGA73 (talk) 20:50, 10 January 2022 (UTC)[reply]
I think Cyberpower is trying to fix an unrelated problem today. -- GreenC 21:03, 10 January 2022 (UTC)[reply]

Hi! On Help_talk:Citation_Style_1#CS1_maint:_url-status we talked about "url-status=dead" and the use of "Dead link". On da.wiki I noticed da:Special:Diff/11055454, where IABot set the status to dead, leading to a "CS1 maint: url-status" error. The bot does know about the "Dead link" template (in Danish "Dødt link") per da:Special:Diff/11055568. So I wonder why this happens. Should IABot be able to fix it, or is it an example of what your bot would fix? --MGA73 (talk) 09:58, 13 February 2022 (UTC)[reply]

It is a bug in IABot, or an incorrect site config, not sure which. I left a bug report. My bot would simply remove an orphan |url-status=dead, since it lacks the other members, |archive-url= and |archive-date=, which are required. -- GreenC 15:11, 13 February 2022 (UTC)[reply]
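A rough sketch of that orphan cleanup is below. This is assumed logic for illustration, not the bot's actual implementation; the function name is made up.

```python
# A minimal sketch of removing an orphan |url-status=dead from a citation
# that has neither |archive-url= nor |archive-date=.
import re

def remove_orphan_url_status(citation: str) -> str:
    if re.search(r"\|\s*archive-(url|date)\s*=\s*\S", citation):
        return citation  # not an orphan; leave it alone
    return re.sub(r"\|\s*url-status\s*=\s*dead\s*", "", citation)

print(remove_orphan_url_status(
    "{{cite web |url=https://example.com |title=Example |url-status=dead}}"
))
# {{cite web |url=https://example.com |title=Example }}
```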

Edit conflict

Sorry we overlapped edits. Thanks for fixing this. GA-RT-22 (talk) 05:04, 5 January 2022 (UTC)[reply]

GA-RT-22, no problem, it was my fault to begin with for deleting that source earlier today; I missed that the factoid needed it. -- GreenC 05:10, 5 January 2022 (UTC)[reply]

archive.org and cloudflare captcha

Is there any way to save a page/site which has the Cloudflare captcha enabled? For example, this page as a reference for Citizen News. I have been trying the different archival sites, but they can't get past the captcha. – robertsky (talk) 05:51, 5 January 2022 (UTC)[reply]

That's a good question. Archive.today is often ahead of the curve, but it's hit or miss. Another option: create a free account with Conifer (the old webrecorder.io), which lets you interact with the page as it records the save in real time. Thus you can save archives of infinite scroll, multi-click slideshows, and presumably captchas, though I never tried a captcha. Another one to try is ghostarchive.org .. let me know if you find a solution. -- GreenC 07:36, 5 January 2022 (UTC)[reply]
Conifer proxies the Cloudflare hCaptcha network requests, and thus fed me into a loop of the page reloading constantly. I tried ghostarchive and archive.is, but they also faced the same issue. – robertsky (talk) 10:33, 5 January 2022 (UTC)[reply]
I got it! I utilised Webrecorder's complementary tool, archiveweb.page, to record the page in the browser. I also have Cloudflare's Privacy Pass extension enabled to skip their captcha with pre-loaded challenge tokens. After that it is a matter of exporting the WARC file and uploading it to Conifer. – robertsky (talk) 16:27, 5 January 2022 (UTC)[reply]
Hey, nifty! I did this because IABot will otherwise delete the archive URL as unrecognized. Webrecorder has cool stuff. Wikipedia users could generate WARCs that can be moved, copied, saved and hosted from anywhere: https://replayweb.page/docs/ -- GreenC 17:16, 5 January 2022 (UTC)[reply]
I have extracted the list of URLs that are currently on enwiki and archived them all, just in case we want to manually or semi-automatically insert archive-url on the existing refs. https://conifer.rhizome.org/robertsky/citizen-news-wp-ext-links – robertsky (talk) 23:55, 5 January 2022 (UTC)[reply]
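For anyone repeating this, a page's external links can be pulled through the MediaWiki API; the sketch below is an assumed approach, not robertsky's actual method, and the function name is made up.

```python
# A minimal sketch of listing a page's external links via the MediaWiki API
# so they can be fed to an archiving service.
import requests

API = "https://en.wikipedia.org/w/api.php"

def external_links(title: str):
    links, params = [], {
        "action": "query", "prop": "extlinks", "titles": title,
        "ellimit": "max", "format": "json",
    }
    while True:
        data = requests.get(API, params=params, timeout=30).json()
        for page in data["query"]["pages"].values():
            links += [e["*"] for e in page.get("extlinks", [])]
        if "continue" not in data:
            return links
        params.update(data["continue"])

print(len(external_links("Citizen News")))
```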
– robertsky: are the YouTube links working for you? They do not work for me: /embed/5wAppSO66_w. Wonder if the "embed" is the problem. A good service for on-demand YouTube is ghostarchive: /embed/5wAppSO66_w. Wayback also supports it but may be slower to save via a queue: /embed/5wAppSO66_w. BTW, Ghost uses Webrecorder software on the back-end. There are a couple of things about Conifer to be aware of. It is a security risk, since anyone can create a WARC and modify the content to support a conspiracy theory or misinformation; I could imagine a day when enwiki decides to ban the site for that reason. I'm looking into solutions such as checksums. Also, there is limited storage space per account, and YouTube uses a lot. --- GreenC 14:40, 6 January 2022 (UTC)[reply]
I didn't test the YouTube links; those are generated incidentally when trying to archive webpages with YouTube videos embedded. I wasn't too concerned with YouTube, as there isn't much of a running cost to having the videos hosted on YouTube, unlike the website. Agreed on Conifer being a possible security risk. I could think of multiple ways to conduct such nefarious activities when I was doing the archiving. – robertsky (talk) 16:33, 6 January 2022 (UTC)[reply]
Relevant. I know one of the authors. Basically there are no accepted standards, but people have been looking at it for years. Hard problem it seems. -- GreenC 19:31, 6 January 2022 (UTC)[reply]
reminds me of Quis custodiet ipsos custodes? – robertsky (talk) 06:32, 9 January 2022 (UTC)[reply]

List of Pokémon Go-related injuries and deaths

I am currently making the page, which also covers the 2 deaths caused by that Burger King pokeball. Draft:List of deaths caused by the Pokemon franchise — Preceding unsigned comment added by 98.148.167.84 (talk) 06:45, 6 February 2022 (UTC)[reply]

Replacing WebCite archives

Now that WebCite archives are no longer accessible (they might have been destroyed, who knows) is anyone/any bot doing anything to replace them with working archives? Kailash29792 (talk) 05:34, 18 February 2022 (UTC)[reply]

@Kailash29792: There was consensus to replace them with Wayback links, and I think GreenC had an approved bot to do so. The problem, according to him (or at least what I think he said), is content drift: sometimes the WebCite archived page is different from the one on web.archive.org. In addition, maybe there were pages that worked with WebCite but not web.archive.org. And also, apparently there are some talks going on between WebCite and archive.org to transfer the database (not sure if that's true or whether the talks are still ongoing).
It's my personal belief that if we know WebCite is never coming back, then we should just replace all the links with web.archive.org. Rlink2 (talk) 14:49, 18 February 2022 (UTC)[reply]
That's right. There is still some thread of hope, but what he's attempting to do will take time and money he has to raise. From what I know of the owner, I don't particularly trust him to do the right thing; for Wikipedia's purposes, even if he gets it back online, we'd be better off without it long term.
Also, it might be that archive.today could be better than Wayback in this case. WebCite has been around since 1997 and was the only 'save page now' option until Archive.today arrived in 2012 (Wayback SPN started around 2014, I think). At the time, archive.today did a sweep of all links on Wikipedia, including saving the WebCite archive pages themselves (a double archive). Don't bother checking Wayback: due to how WebCite structures its archives, the Wayback Machine to this day is incapable of reliably double-archiving WebCite. There is an opportunity to find WebCite archive pages hosted at Archive.today that would match the dates we need. No idea how many there might be. -- GreenC 05:39, 20 February 2022 (UTC)[reply]

Following up from external links discussion

Hi, you were pinged in that discussion as someone who knows the answer to my question: Museum Folkwang recently (don’t know how long ago) restructured their site index, leaving many links on Wikipedia broken. How do I go about getting a bot to check, update, and fix these links? Viriditas (talk) 21:44, 19 February 2022 (UTC)[reply]

Hi User:Viriditas, yes you found the right person. Normally these are reported at WP:URLREQ. Each case is different. The more that is known the better: for example, are the pages still alive but at new URLs (a site migration)? If so, can the new URLs be derived from the old ones, or are they completely different? Do the old URLs redirect, or are they plain dead and in need of archives? Anything that can be discovered would help determine how to configure the bot. -- GreenC 05:09, 20 February 2022 (UTC)[reply]
Good to know for the future! Luckily, Special:Linksearch shows it can be fixed manually since there’s so few errors. Thank you! Viriditas (talk) 06:53, 20 February 2022 (UTC)[reply]
Great. Those are the best kinds :) -- GreenC 16:26, 20 February 2022 (UTC)[reply]
I fixed the broken links on the English Wikipedia, but there are still issues with wikis in other languages and sister projects like Commons, which continue to have broken links. To fix them, all I did was add "eMP" to the URL, like this. Is that enough info for you to take a look at links to Museum Folkwang on the other projects? Viriditas (talk) 21:42, 20 February 2022 (UTC)[reply]
I’ll make a request at URLREQ. Viriditas (talk) 21:47, 20 February 2022 (UTC)[reply]

NUMBEROF and Wikidata

Quick bug report: {{NUMBEROF}} doesn't seem to work with counts for Wikidata? Eg {{NUMBEROF|activeusers|wikidata}} returns 24291. (Thanks for the great template/module/bot, by the way :) ) --Yair rand (talk) 23:24, 22 February 2022 (UTC)[reply]

List of sites tracked. Not a bug, just not tracked (thank you). I think Wikidata is so different from everything else that it doesn't fit the model. I have not looked at it too closely before. I don't think API:Siteinfo, which it uses to get stats, works with Wikidata? -- GreenC 02:17, 23 February 2022 (UTC)[reply]
Siteinfo seems to work on Wikidata. But thanks anyway. --Yair rand (talk) 08:13, 23 February 2022 (UTC)[reply]
So it does. I guess that would require a new Lua module and an update to the bot. -- GreenC 14:52, 23 February 2022 (UTC)[reply]
It now works: {{NUMBEROF|activeusers|www.wikidata}} -> 24291 .. let's follow up at the other thread on the template talk page. -- GreenC 15:55, 23 February 2022 (UTC)[reply]
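For reference, a minimal sketch showing that the siteinfo statistics are indeed available from Wikidata; this is an assumed approach for illustration, not the NUMBEROF bot's actual code, and the function name is made up.

```python
# A minimal sketch querying the MediaWiki siteinfo API for Wikidata's
# active-user count, the same statistic {{NUMBEROF|activeusers|www.wikidata}} shows.
import requests

def active_users(api_url: str) -> int:
    params = {
        "action": "query", "meta": "siteinfo", "siprop": "statistics",
        "format": "json",
    }
    data = requests.get(api_url, params=params, timeout=30).json()
    return data["query"]["statistics"]["activeusers"]

print(active_users("https://www.wikidata.org/w/api.php"))
```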

web.archive.org can archive ghostarchive videos

Example: https://web.archive.org/web/20220206195653/https://ghostarchive.org/varchive/-CIOhY4ysRE

Could be useful as a "second-level backup" in case Ghost goes down (not that I think it would). Thought I would let you know. Rlink2 (talk) 14:01, 25 February 2022 (UTC)[reply]

Wayback has saved, or plans to save, every YouTube link on Wikipedia, so presumably if Ghost went down there might be a Wayback version available: https://web.archive.org/web/20220217052941/https://www.youtube.com/watch?v=-CIOhY4ysRE .. good to know Wayback can archive Ghost, and that Ghost allows itself to be archived. It's not being done automatically, though it might be a good idea. -- GreenC 14:34, 25 February 2022 (UTC)[reply]
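A minimal sketch of triggering such a capture through the public Save Page Now endpoint is below. This is an assumed approach for illustration; the authenticated SPN2 API offers more control, and the function name is made up.

```python
# A minimal sketch of requesting a Wayback Machine "Save Page Now" capture
# of a Ghostarchive page as a second-level backup.
import requests

def save_page_now(url: str) -> int:
    # The simple GET form of Save Page Now; captures can take a while.
    r = requests.get(f"https://web.archive.org/save/{url}", timeout=120)
    return r.status_code

print(save_page_now("https://ghostarchive.org/varchive/-CIOhY4ysRE"))
```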

Unclear on autonomy of IABot

Hi, following up on our last discussion, I was wondering about the autonomy of IABot. Is it trawling all the wikis and checking for dead links? My understanding is that it’s not. If, as I assume, it’s not, can I request that it trawl all the articles linked to the WikiProject Visual Arts template on the English Wikipedia? Viriditas (talk) 21:38, 25 February 2022 (UTC)[reply]

It does auto-scan all pages in theory, but enwiki is so large it might take a very long time. Template:WikiProject Visual arts has a transclusion count of 75.6k .. you could get the list of pages, break it into 5 or 10 parts, and submit each as a user-submitted job via iabot.org -- GreenC 05:05, 26 February 2022 (UTC)[reply]
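A minimal sketch of building those job lists follows: list the template's transclusions via the MediaWiki API and split them into parts. This is an assumed approach, not how iabot.org itself works, and the function name is made up.

```python
# A minimal sketch of listing pages that transclude a template and splitting
# them into chunks suitable for user-submitted IABot jobs.
import requests

API = "https://en.wikipedia.org/w/api.php"

def transclusions(template: str):
    pages, params = [], {
        "action": "query", "list": "embeddedin", "eititle": template,
        "eilimit": "max", "format": "json",
    }
    while True:
        data = requests.get(API, params=params, timeout=30).json()
        pages += [p["title"] for p in data["query"]["embeddedin"]]
        if "continue" not in data:
            return pages
        params.update(data["continue"])

# WikiProject banners sit on Talk: pages; article titles would be derived from them.
titles = transclusions("Template:WikiProject Visual arts")
chunks = [titles[i::10] for i in range(10)]   # 10 roughly equal parts
print(len(titles), len(chunks[0]))
```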
Thanks. Maybe I should just start small and focus on one category for now. How do I ask the bot to fix all articles listed in Category:Pierre-Auguste Renoir, including subcategories? Viriditas (talk) 09:23, 26 February 2022 (UTC)[reply]

IA Bot - Books and languages

Hello! I wanted to ask 2 questions in regard to IA Bot, mostly out of curiosity.

  1. It has come to my attention lately that IABot "has been merged" with GreenCBot in some tasks about books. Can you tell me more about this whole thing?
  2. In SqWiki (my homewiki) we ask for all our citations to have their languages specified for statistical purposes. We have this category that has around 12k articles with missing language values in their citations. Considering that IA Bot is very powerful and has access to a lot of information regarding references, could it be possible for it to also determine and fill in the language value in some of our citations? Anything that could lower that number somehow would be appreciated. - Klein Muçi (talk) 23:12, 27 February 2022 (UTC)[reply]
@Klein Muçi Nice to see you again. I am sorry you lost the Steward elections, but I think you may have a chance in the next year or two. Keep up the good work, and I will be voting for you if you decide to go at it again.
Regarding Number 2, it doesn't seem to me like IABot would have the ability to detect the language of any arbitrary source. At best, maybe it could detect the language parameter for a limited set of whitelisted sources. GreenC would know more about this. (I don't know anything about question 1; GreenC can answer.)
In case IABot does not handle this, it is possible to develop a tool like this for your wiki to detect the language used in any given source. There are some offline tools (they don't depend on any external service) that do this; I have used them and they are pretty good at language detection (not translation, just detection and identification). Rlink2 (talk) 00:03, 28 February 2022 (UTC)[reply]
@Rlink2, hello! :) Thank you very much for your support! Can you tell me more about such tools? That's exactly what I'm looking for. As for the IA capabilities, that's, again, what I was hoping for. Maybe, given the vast information it has about citations in general, it can deduce the language for some links taken from certain websites or other mediums which are known to only produce content in one specific language. - Klein Muçi (talk) 00:12, 28 February 2022 (UTC)[reply]
@Klein Muçi
Sure, I've used such tools (as in "programming libraries") before. Sometimes the website will declare the language of its content in the code, so you can just use what it gives you.
For the sites that don't (99% of them, most likely), there are libraries for most popular programming languages that will detect the language of any given text. At the very least, it is really good at determining whether something is English or not English (not sure about its accuracy rate for actually identifying the right language, but it's a start).
The hardest part of making such a tool would probably be making sure the text extracted from the website is correct. As GreenC will know, there are a whole bunch of strange sites with strange layouts. Some sites like YouTube, Facebook, Twitter, and Instagram may not work (but those sites have always been special cases; IABot can't even handle any of those sites correctly, see phab:T294880).
I just tried it on a couple of random sites and it seemed to work just fine, though. Could be an interesting idea to explore, who knows.
If IABot gets language detection, it would probably just be for mediums which are known to only produce content in one specific language, as you said. Rlink2 (talk) 00:34, 28 February 2022 (UTC)[reply]
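A minimal sketch of that kind of detection, assuming the langdetect library and a made-up confidence threshold; this is for illustration only and is not Rlink2's actual script.

```python
# A minimal sketch of guessing a page's language with langdetect and only
# accepting the guess above a configurable confidence threshold.
import requests
from langdetect import detect_langs   # pip install langdetect

def page_language(url: str, min_confidence: float = 0.95):
    html = requests.get(url, timeout=30).text
    # Real use would extract the visible text; raw HTML is enough for a sketch.
    best = detect_langs(html)[0]
    return best.lang if best.prob >= min_confidence else None

print(page_language("https://example.com/"))   # e.g. 'en', or None if unsure
```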
@Rlink2, just being able to automatically put |language=en in articles that use English citations would be an immense help. I'm sure tens of thousands of entries would immediately be removed from that category, making the remaining list manageable.
That idea was suggested to me 2 years ago. Take a look here. But since then I haven't been able to find further help with it. - Klein Muçi (talk) 00:44, 28 February 2022 (UTC)[reply]
@Klein Muçi
Regarding Majavah's comment, the attribute is one way of determining the language (as I stated above as well). But it's not the only way; most sites do not use that attribute. For the ones that do, it would usually be more reliable than the language detector. For the ones that don't, that's where the language-detection libraries come in.
You can set the tool to only mark sources as English if it is certain (at a confidence percentage you can set) that the content is actually English.
Do you have a list of these backlogged citations that are in need of a lang parameter? Rlink2 (talk) 01:08, 28 February 2022 (UTC)[reply]
@Rlink2, I'm not sure I understand your question correctly. I've already provided the category (list) of the said citations above. Do you mean something other than that? - Klein Muçi (talk) 01:10, 28 February 2022 (UTC)[reply]
@Klein Muçi
Oh yes, I see it now. Didn't read. Silly me.... Rlink2 (talk) 01:13, 28 February 2022 (UTC)[reply]
@Rlink2, no problem at all. I was thinking that maybe you wanted something more specific. - Klein Muçi (talk) 01:15, 28 February 2022 (UTC)[reply]
@Klein Muçi
I cooked up a quick script using the tool, and copied and pasted the result into the edit window. It seems to be working OK (note it only works for cite web templates). See https://sq.wikipedia.org/wiki/Speciale:Kontributet/Rlink2 for diffs.
For cite books and journals it obviously does not have access to the cited material (maybe this is where IABot can come in), but it could detect whether the title in the citation is in English. If the title of the book is in English, it is highly probable that the actual content is in English. Rlink2 (talk) 01:39, 28 February 2022 (UTC)[reply]
@Rlink2, yes, the results do indeed look good. If the work can be automated, we can do a full run for the web sources, see how many are left, and after that decide how to handle the remaining ones. Though I believe the title's language can be good enough to determine the content's language. - Klein Muçi (talk) 01:42, 28 February 2022 (UTC)[reply]
@Klein Muçi
If the work can be automated: yes, it can be automated if that is your wish. It would be good to see a larger sample of diffs before running it fully unsupervised, to catch and prevent any bugs and false markings.
Thanks for bringing this matter to us, it is very much appreciated. I am always happy to see people appreciate stuff I do. Rlink2 (talk) 02:01, 28 February 2022 (UTC)[reply]
@Rlink2, thank you. I have been looking for ways to solve that problem for more than 2 years, and I've even asked for help at WP:User scripts, but so far this is the first time we're discussing something concrete about it.
Will you be running the script from your account? If so, I can give you autopatrolled rights now so everyone has an easier time. Does it work on any language, or only English for the moment? Also, can we set it up with the parameter in its long form for standardization reasons (|language= instead of |lang=)? I believe we can continue the conversation further on SqWiki, either at your talk page or mine, so GreenC doesn't get a notification for each message we send. - Klein Muçi (talk) 02:10, 28 February 2022 (UTC)[reply]

Phil Yates

Hi there! Do you have anything that might help Draft:Phil Yates get back into article space? BOZ (talk) 19:00, 1 March 2022 (UTC)[reply]
