Revision as of 21:37, 30 August 2023

Plain DISAMBIG magic word

@Evad37: I just had a look over this module's code, and it seems to be missing the case when a user specifies a plain __DISAMBIG__ magic word without using a template. I'm not sure how many disambiguation pages actually do this, but it's probably worth putting it in. Best — Mr. Stradivarius ♪ talk ♪ 04:23, 20 October 2020 (UTC)[reply]

About 7 pages according to this search, most of which already have a disambiguation template. But yeah, probably still worth checking for it. - Evad37 [talk] 23:16, 20 October 2020 (UTC)[reply]
 Done - Evad37 [talk] 22:47, 25 October 2020 (UTC)[reply]
Thanks! Mr. Stradivarius ♪ talk ♪ 11:59, 29 October 2020 (UTC)[reply]

Not working?

  • {{#invoke:Disambiguation|isDisambiguationPage|Abbas Kola}} → yes
    Abbas Kola is a dab-page
  • {{#invoke:Disambiguation|isDisambiguationPage|Kamenitsa}} → yes
    Kamenitsa is a dab-page
  • {{#invoke:Disambiguation|isDisambiguationPage|List of schools in Ireland}} → yes
    List of schools in Ireland is a dab-page

— Martin (MSGJ · talk) 22:37, 21 October 2020 (UTC)[reply]

@Evad37: can any improvement be made here? — Martin (MSGJ · talk) 20:05, 25 October 2020 (UTC)[reply]
 Fixed, was missing the {{geodis}} and {{schooldis}} templates - Evad37 [talk] 22:47, 25 October 2020 (UTC)[reply]
Thanks! I'll do a bit more testing, then look to implement this module in the class mask template — Martin (MSGJ · talk) 12:46, 26 October 2020 (UTC)[reply]

Found another few:

  • {{#invoke:Disambiguation|isDisambiguationPage|K7}} → yes
    K7 is a dab-page
  • {{#invoke:Disambiguation|isDisambiguationPage|2600}} → yes
    2600 is a dab-page
  • {{#invoke:Disambiguation|isDisambiguationPage|Lost Battalion}} → yes
    Lost Battalion is a dab-page

— Martin (MSGJ · talk) 18:14, 26 October 2020 (UTC)[reply]

I tried to add {{mil-unit-dis}} to the module and failed miserably. Please can someone help with this? — Martin (MSGJ · talk) 12:11, 29 October 2020 (UTC)[reply]

Sandbox:
  • 2600 -> yes (numberdis)
  • K7 -> yes (Letter-Number Combination Disambiguation)
  • Lost Battalion -> yes (mil-unit-dis).
Something to do with the pattern looking for dashes earlier in the list? --Izno (talk) 15:47, 29 October 2020 (UTC)[reply]
(Maybe. Moving mil-unit-dis doesn't fix it. --Izno (talk) 15:50, 29 October 2020 (UTC))[reply]
The dashes need escaping with %, which I have now done — Martin (MSGJ · talk)
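A minimal sketch of why the unescaped dashes failed, using only the standard Lua string library. In a Lua pattern, `-` after a character is a lazy "zero or more" quantifier, so it never matches a literal dash. The helper name `matchesWhole` is hypothetical and is not taken from the module.

```lua
-- Hypothetical helper: true if `name` matches `pattern` from start to end.
local function matchesWhole(name, pattern)
	return string.match(name, "^" .. pattern .. "$") ~= nil
end

-- Unescaped: "l-" and "t-" are lazy repetitions, so no literal dash can match.
print(matchesWhole("mil-unit-dis", "[Mm]il-unit-dis"))   -- false
-- Escaped: "%-" matches a literal dash.
print(matchesWhole("mil-unit-dis", "[Mm]il%-unit%-dis")) -- true
-- The unescaped pattern matches a dashless name instead.
print(matchesWhole("milunitdis", "[Mm]il-unit-dis"))     -- true
```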

Set index article

A set index article is not a dab page, but I think it might be useful to be able to detect them, whether with this module or a different one — Martin (MSGJ · talk) 18:13, 26 October 2020 (UTC)[reply]

  • {{#invoke:Disambiguation|isDisambiguationPage|War of independence}}
    War of independence is a set index article

Tracking

Category:Disambiguation pages not detected by Module:Disambiguation lists 0 pages which are tagged with {{WikiProject Disambiguation}} but whose subject pages are not detected as being dab pages by this module. They consist of:

— Martin (MSGJ · talk) 12:04, 27 October 2020 (UTC)[reply]

Methodology

There are 32 templates in Category:Disambiguation message boxes. Each one may have several redirects (e.g. Template:Place name disambiguation has 7). That is a lot of possibilities to check for, and there is nothing to stop editors creating a new template (or redirect) at any time. Is there any better approach we could take? — Martin (MSGJ · talk) 12:32, 27 October 2020 (UTC)[reply]

We could have a bot update a list of disambiguation templates, I suppose. — Mr. Stradivarius ♪ talk ♪ 12:01, 29 October 2020 (UTC)[reply]
It might also help if I look to see which redirects are not being used much and take them to RfD. By the way, could you help me with the request above? — Martin (MSGJ · talk) 12:12, 29 October 2020 (UTC)[reply]

Templates

Template-protected edit request on 1 October 2021

Please change "[Ll]etter%-Number Combination Disambiguation" to "[Ll]etter%-[Nn]umber [Cc]ombination [Dd]isambiguation", so that articles with {{Letter-number combination disambiguation}} are also correctly identified. rchard2scout (talk) 09:20, 1 October 2021 (UTC)[reply]

 Done firefly ( t · c ) 09:29, 1 October 2021 (UTC)[reply]

Template-protected edit request on 1 October 2021 (2)

Please add the following list of templates:

	"[Mm]il-unit-disambig",
	"[Mm]ilitary unit disambiguation",
	"[Gg]eo-dis",
	"[Gg]eodisambig",
	"[Dd]isambig[Gg]eo",
	"[Dd]isambig[GN]",
	"[Dd]isambigNm",
	"[Dd]isambigName",
	"[Ss]urname",
	"[Ss]urnames",
	"[Ss]pecies Latin name disambiguation",
	"[Ss]peciesLatinNameDisambig",
	"[Ll]atinNameDisambig",
	"[Mm]athematical disambiguation",
	"[Mm]athematics disambiguation",
	"[Mm]athdab",
	"[Mm]ath dab",
	"[Rr]oad disambiguation",
	"[Rr]oaddis",

And change "[Gg]eodis" to "[Gg]eod[ai][bs]"

These were found by sampling pages in Category:Disambiguation pages not detected by Module:Disambiguation, and I think they'll cover the majority of pages that are not a set list.

I'm not very proficient in Lua string patterns, so it might be possible to merge some of the items in this list with smarter regexing (which might or might not impact performance, I have no idea). rchard2scout (talk) 10:28, 1 October 2021 (UTC)[reply]
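As a rough illustration of the merging idea, the sketch below checks what two merged patterns accept: the requested "[Gg]eod[ai][bs]" and "[Ss]urnames?", which folds two list entries into one. It uses only the standard Lua string library, and the helper name `matchesWhole` is hypothetical. Note that a character class also admits combinations nobody asked for (here "geodas" and "geodib"), which is harmless only if no unrelated template has such a name.

```lua
-- Hypothetical helper: true if `name` matches `pattern` from start to end.
local function matchesWhole(name, pattern)
	return string.match(name, "^" .. pattern .. "$") ~= nil
end

local merged = "[Gg]eod[ai][bs]"
for _, name in ipairs({ "Geodis", "geodab", "geodas", "geodib", "Geodisambig" }) do
	-- The first four names match; "Geodisambig" still needs its own entry.
	print(name, matchesWhole(name, merged))
end

-- "s?" makes the final letter optional, covering both spellings.
print(matchesWhole("Surname", "[Ss]urnames?"))
print(matchesWhole("surnames", "[Ss]urnames?"))
```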

@Rchard2scout Could we expand all templates (Special:ExpandTemplates – which in Lua seems to be frame:preprocess) and just check for __DISAMBIG__ in the expanded output? If possible, that would be accurate, though at the cost of performance. – SD0001 (talk) 10:40, 1 October 2021 (UTC)[reply]
I think that would be even better. It would certainly be cleaner and less likely to break. I have no idea of the performance cost of that. You could just try it (per WP:PERF) if you think it won't cause any huge problems. --rchard2scout (talk) 10:59, 1 October 2021 (UTC)[reply]
I wouldn't be surprised if this broke some transclusions. When wikitext is preprocessed in a Lua module, that counts towards the total Lua execution time for the page. If this time exceeds 10 seconds, then an error is shown and no more Lua modules are processed. If the page being tested happens to be a huge article that takes several seconds to render, then there is a good chance that the original page being rendered will go over the 10-second limit, particularly if it is using Module:Disambiguation multiple times. — Mr. Stradivarius ♪ talk ♪ 11:54, 22 October 2021 (UTC)[reply]

I have added those requested above — Martin (MSGJ · talk) 12:02, 25 October 2021 (UTC)[reply]

Template-protected edit request on 14 April 2022

Can "[Hh]ndab" and "[Gg]iven name" be added? Similar templates such as "[Hh][Nn][Dd][Ii][Ss]" and "[Ss]urname" are included. Michaelwallace22 (talk) 17:37, 14 April 2022 (UTC)[reply]

Given name and surname list pages are not disambiguation pages. olderwiser 17:42, 14 April 2022 (UTC)[reply]
 Not done as there is active discussion disputing this request above. If a consensus is reached please reactivate the edit request. — xaosflux Talk 15:50, 21 April 2022 (UTC)[reply]

Error

The module seems to identify Category:Disambig-Class biography articles as a disambiguation page (see the class rating on Category talk:Disambig-Class biography articles) probably because of the Disambig in the page title. — Martin (MSGJ · talk) 07:37, 21 April 2023 (UTC)[reply]

No, perhaps I'm wrong. The example below shows only the talk page is incorrectly identified — Martin (MSGJ · talk) 07:40, 21 April 2023 (UTC)[reply]

{{#invoke:Disambiguation|isDisambiguationPage|Category:Disambig-Class biography articles}}
{{#invoke:Disambiguation|isDisambiguationPage|Category talk:Disambig-Class biography articles}}

 Resolved The module identifies Category talk:Disambig-Class biography articles as a dab page because it contains the template {{WikiProject Disambiguation}}. This could be fixed, but it's a minor issue — Martin (MSGJ · talk) 07:54, 21 April 2023 (UTC)[reply]

Performance issues

I've been doing some work to cut down the execution time of this module after Module:Class mask started using it, since it's been heavily impacting the execution time of Talk:World War II (to the point of occasional timeouts). I've implemented a faster version in the sandbox. It works as expected, cutting the execution time from ~8.5s to ~3.5s for the above talk page while still detecting pages correctly. However, when I tried using mw.ustring instead of string, to match how the current version works, it killed the performance. Two questions:

  1. How important is using mw.ustring here as opposed to string?
  2. Why does mw.ustring impact the performance so much?

Thanks. Aidan9382 (talk) 11:30, 27 April 2023 (UTC)[reply]

Thanks, that would be fantastic. There are no unicode characters in the strings being checked, so I would guess that ustring is not required here. On question 2, I suppose there are 150,000 characters to check instead of 128? By the way, I did something on Talk:World War II to improve the situation but if you need it for testing then just revert me — Martin (MSGJ · talk) 11:53, 27 April 2023 (UTC)[reply]
The main reason the performance impact from using mw.ustring confuses me is that testing the current live version of the module with and without mw.ustring doesn't seem to change the performance, but for the /sandbox version it does. Judging from a very quick look, it appears it might be an issue with (g)match in specific, since swapping out ustring.match for ustring.find seems to be significantly faster. I don't think it's worth converting from gmatch to some system that avoids using match just to use ustring, so I'll just use string. Thanks for the advice. Aidan9382 (talk) 12:10, 27 April 2023 (UTC)[reply]
3.5s is still significant and I feel sure there are more efficient methods we could use instead of analysing the whole text of the page. How about using Module:Is instance to check if the page is an instance of Wikimedia disambiguation page (Q4167410)? — Martin (MSGJ · talk) 12:14, 27 April 2023 (UTC)[reply]
3.5s isn't exactly accurate - that's the processing time of every module there, not just Module:Disambiguation. A more accurate measure of the change is ~6.6s to ~0.5s.
As for implementing Module:Is instance - that will probably work, but that module only has a template entry point, so I'd have to implement a solution here or modify that module to add a module entry point. I've not used Wikidata functions before, so someone else might be better placed to do that. Aidan9382 (talk) 12:33, 27 April 2023 (UTC)[reply]
~6.6s to ~0.5s sounds even more impressive. I'll let you deploy your version first because that is a clear improvement. Then I might experiment with using wikidata and do some comparison tests later. — Martin (MSGJ · talk) 12:40, 27 April 2023 (UTC)[reply]
(edit conflict) Ustring is not required when checking for __DISAMBIG__ or (disambiguation). It does make a difference when checking for the disambiguation template patterns, however. With the normal string library, %s will match only ASCII space characters (spaces, newlines, carriage returns, tabs, vertical tabs, and I think form feed characters). With ustring, it will match "all characters with General Category "Separator", plus tab, linefeed, carriage return, vertical tab, and form feed". This includes things like non-breaking spaces, zero-width spaces, etc. These kinds of spaces are counted as whitespace by the Mediawiki parser when parsing template syntax, so ignoring them will potentially lead to this module missing disambiguation templates. For example, it would miss {{ disambiguation }} - note the full-width spaces. The number of such disambiguation template transclusions is probably going to be very small, however. As for question 2, Scribunto calls back into PHP every time you do an operation with mw.ustring. Lua doesn't handle Unicode natively, and the Scribunto developers chose to work around this by using PHP's Unicode libraries. Each time you switch between Lua and PHP there is some cross-process overhead, and this can add up if you use ustring patterns frequently. — Mr. Stradivarius ♪ talk ♪ 12:23, 27 April 2023 (UTC)[reply]
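The whitespace point can be sketched with the plain string library alone (mw.ustring exists only inside Scribunto, so it is not called here). The sketch assumes the default "C" locale, where `%s` does not match any byte of a UTF-8 full-width space (U+3000). The function names are hypothetical. The template-name pattern is the one quoted later in this thread.

```lua
-- U+3000 (full-width space) encoded as UTF-8 bytes.
local FULLWIDTH_SPACE = "\227\128\128"

-- Hypothetical helper: name of the first template in `content`, or nil.
local function firstTemplateName(content)
	return string.match(content, "{{%s*([^|}]-)%s*[|}]")
end

-- Hypothetical check against a single disambiguation pattern.
local function looksLikeDab(content)
	local name = firstTemplateName(content)
	return name ~= nil and string.match(name, "^[Dd]isambiguation$") ~= nil
end

-- ASCII spaces are trimmed by %s*, so the name is found.
print(looksLikeDab("{{ disambiguation }}"))
-- Full-width spaces stay inside the captured name, so the check fails.
print(looksLikeDab("{{" .. FULLWIDTH_SPACE .. "disambiguation" .. FULLWIDTH_SPACE .. "}}"))
```

With the byte-oriented library the second call is expected to report no match, which is the kind of miss described above.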
Thanks. This sounds roughly like what I suspected in terms of behaviour differences. Do you know if there's a way to implement ustring here without the issue of major overhead (which is mostly coming from the single gmatch call)? I'm not overly familiar with ustring, so I'm not sure if there's anything special possible there. I don't expect this to be a major issue (I've never personally run into a template with abnormal whitespace, and don't imagine they're common either), so I may deploy the string version anyway, but having this fixed would still be nice. Aidan9382 (talk) 12:42, 27 April 2023 (UTC)[reply]
I don't think there is a way to make ustring itself faster. There are some things you could tweak in your algorithm, though. At the moment you are finding the names of all the templates on the page, then checking each name to see if it matches one of the patterns. Instead, you could check each pattern each time you found a new template name. If you find a match, then this would mean you wouldn't have to check any of the other templates on the page.
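	-- Check each template name against the patterns as soon as it is found,
	-- returning on the first match instead of collecting every name first.
	for template in string.gmatch(content, "{{%s*([^|}]-)%s*[|}]") do
		for _i, v in ipairs(disambigTemplates) do
			if string.match(template, "^"..v.."$") then
				return true
			end
		end
	end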
In practice, this probably won't make too much difference, however, as disambiguation templates are usually at the end of the page. If you made a custom gmatch function that searches from the end of the string to the start of the string, then this might result in a speed-up. Having said that, the original gmatch is written in C, and the custom one would be in Lua, so if there are no disambiguation templates on a page, there is a good chance that this will be slower. If you do decide to do more work on performance for this module, you should measure everything to find out what works and what doesn't. — Mr. Stradivarius ♪ talk ♪ 02:15, 29 April 2023 (UTC)[reply]
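One possible shape for the end-to-start search is sketched below. It is an assumption-laden illustration, not the module's code: the function name `lastMatchingTemplate` and the sample pattern list are hypothetical, and whether it beats the C-implemented gmatch would have to be measured, as noted above. It walks backwards looking for "{{" and applies the same template-name pattern anchored at that position.

```lua
-- Hypothetical: scan from the end of `content` and return the name of the
-- last template matching any pattern in `patterns`, or nil if none does.
local function lastMatchingTemplate(content, patterns)
	for i = #content - 1, 1, -1 do
		-- 123 is the byte value of "{".
		if string.byte(content, i) == 123 and string.byte(content, i + 1) == 123 then
			-- "^" anchors the match at position i (the init argument).
			local name = string.match(content, "^{{%s*([^|}]-)%s*[|}]", i)
			if name then
				for _, pattern in ipairs(patterns) do
					if string.match(name, "^" .. pattern .. "$") then
						return name
					end
				end
			end
		end
	end
	return nil
end

-- Illustrative pattern list only; the real list lives in the module.
local samplePatterns = { "[Dd]isambiguation", "[Gg]eodis" }
print(lastMatchingTemplate("{{Infobox}} text {{geodis}}", samplePatterns))
```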

Using Wikidata

In the sandbox is a version which will look at instance of (P31) to see if the page is a disambiguation page. It isn't foolproof (as there are a few pages not correctly identified) but it might be an efficient thing to check before analysing the content of the page. — Martin (MSGJ · talk) 15:29, 15 May 2023 (UTC)[reply]

Would be interested to see a comparison of speeds of the sandbox vs live. (Not sure how best to test it.) — Martin (MSGJ · talk) 17:44, 15 May 2023 (UTC)[reply]
How I perform speed tests is to fire a template call a bunch of times (Like this), and to compare the "Lua time usage" stat in show preview on both the live and sandbox version. Using an #invoke: with isDisambiguationPage, the time difference (from live to sandbox) appears to be from 0.09s to 0.25s on a non-disambiguation page (Peter Dulley) and 0.07s to 0.11s for a disambiguation page (Example), though timings are a bit give or take due to how much they fluctuate. I didn't think that the sandbox would be slower, so I wonder where most of the time impact is coming from. Aidan9382 (talk) 18:06, 15 May 2023 (UTC)[reply]
Interesting - so no benefit at all then. — Martin (MSGJ · talk) 21:50, 15 May 2023 (UTC)[reply]

False positives for pattern [%w_%s]-%f[%w][Dd]isam[%w]-

The pattern [%w_%s]-%f[%w][Dd]isam[%w]- matches {{italic disambiguation}}, which is not a disambiguation template. (See the comment about fictional character pages here.) This pattern seems overly broad - can it be made more specific? — Mr. Stradivarius ♪ talk ♪ 02:10, 30 August 2023 (UTC)[reply]

Partly fixed: those characters are now Articles with short description. However, they're also still Pages with short description; is that correct? Certes (talk) 10:33, 30 August 2023 (UTC)[reply]
@Mr. Stradivarius: I made an edit for this to the sandbox: [1] which leaves the regex pattern but adds a false positives list that then gets checked against; will wait for someone who is more familiar with the template/Lua to promote. --PresN 12:35, 30 August 2023 (UTC)[reply]
Thanks. Given Lua's limited implementation of regexp, I think an exception list is the best we can do. Certes (talk) 18:07, 30 August 2023 (UTC)[reply]
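A minimal sketch of the exception-list approach, assuming the broad pattern is kept and a separate table of known false positives is consulted afterwards. The names `BROAD_PATTERN`, `falsePositives` and `isDisambiguationTemplate` are hypothetical, and the table holds only the one example raised in this thread. The actual sandbox edit may be structured differently.

```lua
-- The broad pattern from this section, anchored to the whole template name.
local BROAD_PATTERN = "^[%w_%s]-%f[%w][Dd]isam[%w]-$"

-- Hypothetical exception list, keyed by lower-cased template name.
local falsePositives = {
	["italic disambiguation"] = true,
}

-- Hypothetical check: the name must match the broad pattern
-- and must not be listed as a known false positive.
local function isDisambiguationTemplate(name)
	if not string.match(name, BROAD_PATTERN) then
		return false
	end
	return not falsePositives[string.lower(name)]
end

print(isDisambiguationTemplate("Place name disambiguation")) -- matches the pattern
print(isDisambiguationTemplate("Italic disambiguation"))     -- excluded by the list
```

Keying the table by lower-cased name is a design choice made for this sketch, so that both capitalisations of an excluded template are rejected.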
Okay, I added some more testcases to the test page to cover this and other use cases, and I felt confident enough to promote. Ping me if there are any issues. --PresN 21:32, 30 August 2023 (UTC)[reply]

Surnames

Why is "[Ss]urnames?" in the template list, causing isDisambiguationPage() to return "yes" for a surname? Surname pages are set index articles, not disambiguation pages. Given name articles do not display this behaviour. ({{Hndis}} is correctly included, because full name lists are dabs, though we should really also add its widely used redirect {{Hndab}}.) Certes (talk) 18:06, 30 August 2023 (UTC)[reply]

Pages broken

PresN every page with a WikiProject template seems to be broken as a result of your change, e.g. Talk:Basil Spence, please revert asap. – Isochrone (T) 21:37, 30 August 2023 (UTC)[reply]
