Revision as of 21:37, 30 August 2023
Plain DISAMBIG magic word
@Evad37: I just had a look over this module's code, and it seems to be missing the case when a user specifies a plain __DISAMBIG__ magic word without using a template. I'm not sure how many disambiguation pages actually do this, but it's probably worth putting it in. Best — Mr. Stradivarius ♪ talk ♪ 04:23, 20 October 2020 (UTC)
- About 7 pages according to this search, most of which already have a disambiguation template. But yeah, probably still worth checking for it. - Evad37 [talk] 23:16, 20 October 2020 (UTC)
Done - Evad37 [talk] 22:47, 25 October 2020 (UTC)
- Thanks! — Mr. Stradivarius ♪ talk ♪ 11:59, 29 October 2020 (UTC)
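A minimal sketch of the kind of check discussed above, not the code that was actually deployed. The helper name `hasDisambigMagicWord` is hypothetical, and the sketch assumes the raw wikitext of the page is already available as a string. It does not try to exclude occurrences inside nowiki tags or comments.

```lua
-- Hypothetical helper: true when the raw wikitext contains the bare magic word.
local function hasDisambigMagicWord(content)
    -- The fourth argument (true) makes string.find treat the needle as a
    -- literal substring rather than as a Lua pattern.
    return string.find(content, "__DISAMBIG__", 1, true) ~= nil
end

print(hasDisambigMagicWord("'''Foo''' may refer to:\n* [[Foo (band)]]\n__DISAMBIG__"))
print(hasDisambigMagicWord("'''Foo''' is a band. {{Italic title}}"))
```

A literal search like this is cheap, so it could reasonably run before any template-name matching.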
Not working?
{{#invoke:Disambiguation|isDisambiguationPage|Abbas Kola}}
→ yes
- Abbas Kola is a dab-page
{{#invoke:Disambiguation|isDisambiguationPage|Kamenitsa}}
→ yes
- Kamenitsa is a dab-page
{{#invoke:Disambiguation|isDisambiguationPage|List of schools in Ireland}}
→ yes
- List of schools in Ireland is a dab-page
— Martin (MSGJ · talk) 22:37, 21 October 2020 (UTC)
Found another few:
{{#invoke:Disambiguation|isDisambiguationPage|K7}}
→ yes
- K7 is a dab-page
{{#invoke:Disambiguation|isDisambiguationPage|2600}}
→ yes
- 2600 is a dab-page
{{#invoke:Disambiguation|isDisambiguationPage|Lost Battalion}}
→ yes
- Lost Battalion is a dab-page
— Martin (MSGJ · talk) 18:14, 26 October 2020 (UTC)
I tried to add {{mil-unit-dis}} to the module and failed miserably. Please can someone help with this? — Martin (MSGJ · talk) 12:11, 29 October 2020 (UTC)
- Sandbox:
- 2600 -> yes (numberdis)
- K7 -> yes (Letter-Number Combination Disambiguation)
- Lost Battalion -> yes (mil-unit-dis).
- Something to do with the pattern looking for dashes earlier in the list? --Izno (talk) 15:47, 29 October 2020 (UTC)
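One possible reading of the dash comment above, offered as a guess rather than a diagnosis of the failed edit: in a Lua pattern an unescaped `-` is a lazy repetition operator, so a template name containing hyphens only matches literally when each hyphen is written as `%-`. The helper name `matchesName` is hypothetical; the anchoring with `^` and `$` mirrors the loop quoted later on this page.

```lua
-- Hypothetical helper: anchored match of a template name against a pattern.
local function matchesName(name, pattern)
    return string.match(name, "^" .. pattern .. "$") ~= nil
end

local unescaped = "[Mm]il-unit-dis"   -- "l-" and "t-" become lazy repetitions
local escaped   = "[Mm]il%-unit%-dis" -- "%-" matches a literal hyphen

print(matchesName("mil-unit-dis", unescaped)) -- false: the hyphens are never consumed
print(matchesName("milunitdis", unescaped))   -- true: not the intended target
print(matchesName("mil-unit-dis", escaped))   -- true
```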
Set index article
A set index article is not a dab page, but I think it might be useful to be able to detect them, whether with this module or a different one — Martin (MSGJ · talk) 18:13, 26 October 2020 (UTC)
{{#invoke:Disambiguation|isDisambiguationPage|War of independence}}
→
- War of independence is a set index article
Tracking
Category:Disambiguation pages not detected by Module:Disambiguation lists 0 pages which are tagged with {{WikiProject Disambiguation}} but whose subject pages are not detected as being dab pages by this module. They consist of:
- Articles: perhaps they have been converted from dab pages into articles and {{WikiProject Disambiguation}} was not removed
- Redirects: moved to separate category Category:Redirects tagged as disambiguation pages
- Set index articles: many of these have been tagged with {{WikiProject Disambiguation}}
- Real dab pages, which are not yet being detected
— Martin (MSGJ · talk) 12:04, 27 October 2020 (UTC)
Methodology
There are 32 templates in Category:Disambiguation message boxes. Each one may have several redirects (e.g. Template:Place name disambiguation has 7). That is a lot of possibilities to check for, and there is nothing to stop editors creating a new template (or redirect) at any time. Is there any better approach we could take? — Martin (MSGJ · talk) 12:32, 27 October 2020 (UTC)
- We could have a bot update a list of disambiguation templates, I suppose. — Mr. Stradivarius ♪ talk ♪ 12:01, 29 October 2020 (UTC)
Templates
- Template:Disambiguation 196489
- Template:Dmbox 410202
- Template:Airport disambiguation 487
- Template:Biology disambiguation 31
- Template:Call sign disambiguation 2786
- Template:Caselaw disambiguation 55
- Template:Chinese title disambiguation 128
- Template:Disambiguation cleanup 9
- Template:Disamb-cleanup 0
- Template:Cleanup disambig 0
- Template:CleanupDisambig 0
- Template:Dabclean 0
- Template:Dab-cleanup 0
- Template:Disambig-CU 0
- Template:Disambig-cu 0
- Template:Disambig cleanup 0
- Template:Geodis-cleanup 0
- Template:Disambig-cleanup 0
- Template:Disambiguation-cleanup 0
- Template:Disambcleanup 0
- Template:Disambigcleanup 0
- Template:Diaambig-cleanup 0
- Template:Disambig-cleaup 0
- Template:Disambig-cleanip 0
- Template:Disambig-clenup 0
- Template:Disambiguate-cleanup 0
- Template:Dab cleanup 0
- Template:Cleanup disambiguation 0
- Template:Genus disambiguation 569
- Template:Hospital disambiguation 127
- Template:Human name disambiguation 63031
- Template:Human name disambiguation cleanup 1
- Template:Letter disambiguation 0
- Template:Letter-number combination disambiguation 4688
- Template:Mathematical disambiguation 356
- Template:Military unit disambiguation 1298
- Template:Music disambiguation 4
- Template:Number disambiguation 734
- Template:Phonetics disambiguation 5
- Template:Place name disambiguation 38880
- Template:Portal disambiguation 6
- Template:Road disambiguation 853
- Template:Roaddis 141
- Template:School disambiguation 3144
- Template:Species Latin name abbreviation disambiguation 2148
- Template:Species Latin name disambiguation 148
- Template:Station disambiguation 1506
- Template:Synagogue disambiguation 39
- Template:Taxonomic authority disambiguation 0
- Template:Taxonomy disambiguation 129
- Template:Wikipedia disambiguation 158
Template-protected edit request on 1 October 2021
Please change "[Ll]etter%-Number Combination Disambiguation" to "[Ll]etter%-[Nn]umber [Cc]ombination [Dd]isambiguation", so that articles with {{Letter-number combination disambiguation}} are also correctly identified. rchard2scout (talk) 09:20, 1 October 2021 (UTC)
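To illustrate why the requested change matters, here is a small sketch comparing the two patterns from the request against both capitalisations of the template name. The anchoring with `^` and `$` is assumed from the loop quoted later on this page; the helper name `matchesName` is hypothetical.

```lua
-- Patterns copied from the edit request above.
local oldPattern = "[Ll]etter%-Number Combination Disambiguation"
local newPattern = "[Ll]etter%-[Nn]umber [Cc]ombination [Dd]isambiguation"

-- Hypothetical helper: anchored match of a template name against a pattern.
local function matchesName(name, pattern)
    return string.match(name, "^" .. pattern .. "$") ~= nil
end

local names = {
    "Letter-Number Combination Disambiguation",
    "Letter-number combination disambiguation",
}

for _, name in ipairs(names) do
    print(name, matchesName(name, oldPattern), matchesName(name, newPattern))
end
```

Lua patterns have no case-insensitive flag, which is why each letter that may vary in case needs its own character class.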
Template-protected edit request on 1 October 2021 (2)
Please add the following list of templates:
"[Mm]il-unit-disambig", "[Mm]ilitary unit disambiguation", "[Gg]eo-dis", "[Gg]eodisambig", "[Dd]isambig[Gg]eo", "[Dd]isambig[GN]", "[Dd]isambigNm", "[Dd]isambigName", "[Ss]urname", "[Ss]urnames", "[Ss]pecies Latin name disambiguation", "[Ss]peciesLatinNameDisambig", "[Ll]atinNameDisambig", "[Mm]athematical disambiguation", "[Mm]athematics disambiguation", "[Mm]athdab", "[Mm]ath dab", "[Rr]oad disambiguation", "[Rr]oaddis",
And change "[Gg]eodis" to "[Gg]eod[ai][bs]"
These were found by sampling pages in Category:Disambiguation pages not detected by Module:Disambiguation, and I think they'll cover the majority of pages that are not a set list.
I'm not very proficient in Lua string patterns, so it might be possible to merge some of the items in this list with smarter regexing (which might or might not impact performance, I have no idea). rchard2scout (talk) 10:28, 1 October 2021 (UTC)
- @Rchard2scout Could we expand all templates (Special:ExpandTemplates – which in Lua seems to be frame:preprocess) and just check for __DISAMBIG__ in the expanded output? If possible, that would be accurate, though at the cost of performance. – SD0001 (talk) 10:40, 1 October 2021 (UTC)
- I think that would be even better. It would certainly be cleaner and less likely to break. I have no idea of the performance cost of that. You could just try it (per WP:PERF) if you think it won't cause any huge problems. --rchard2scout (talk) 10:59, 1 October 2021 (UTC)
- I wouldn't be surprised if this broke some transclusions. When wikitext is preprocessed in a Lua module, that counts towards the total Lua execution time for the page. If this time exceeds 10 seconds, then an error is shown and no more Lua modules are processed. If the page being tested happens to be a huge article that takes several seconds to render, then there is a good chance that the original page being rendered will go over the 10-second limit, particularly if it is using Module:Disambiguation multiple times. — Mr. Stradivarius ♪ talk ♪ 11:54, 22 October 2021 (UTC)
I have added those requested above — Martin (MSGJ · talk) 12:02, 25 October 2021 (UTC)
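The expand-then-search idea from this thread could be sketched roughly as follows. This is an illustration under assumptions, not a tested module change: the expander is passed in as a function so the sketch runs outside Scribunto, and `fakeExpand` is a stand-in invented here. On-wiki the expander might be something like `function(text) return frame:preprocess(text) end`, assuming the magic word survives in the expanded text as suggested above, and subject to the execution-time concern raised in the reply.

```lua
-- expand: function(wikitext) -> expanded wikitext (injected so this runs anywhere).
local function isDisambiguationByExpansion(content, expand)
    local expanded = expand(content)
    -- Plain (non-pattern) search for the magic word in the expanded text.
    return string.find(expanded, "__DISAMBIG__", 1, true) ~= nil
end

-- Stand-in expander for illustration only: pretends {{disambiguation}}
-- expands to the magic word and leaves everything else untouched.
local function fakeExpand(content)
    return (string.gsub(content, "{{%s*[Dd]isambiguation%s*}}", "__DISAMBIG__"))
end

print(isDisambiguationByExpansion("Foo may refer to:\n{{disambiguation}}", fakeExpand))
print(isDisambiguationByExpansion("Foo is a band.\n{{Italic title}}", fakeExpand))
```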
Template-protected edit request on 14 April 2022
Can "[Hh]ndab" and "[Gg]iven name" be added? Similar templates such as "[Hh][Nn][Dd][Ii][Ss]" and "[Ss]urname" are included. Michaelwallace22 (talk) 17:37, 14 April 2022 (UTC)
- Given name and surname list pages are not disambiguation pages. older ≠ wiser 17:42, 14 April 2022 (UTC)
Not done as there is active discussion disputing this request above. If a consensus is reached please reactivate the edit request. — xaosflux Talk 15:50, 21 April 2022 (UTC)
Error
The module seems to identify Category:Disambig-Class biography articles as a disambiguation page (see the class rating on Category talk:Disambig-Class biography articles) probably because of the Disambig in the page title. — Martin (MSGJ · talk) 07:37, 21 April 2023 (UTC)
- No, perhaps I'm wrong. The example below shows only the talk page is incorrectly identified — Martin (MSGJ · talk) 07:40, 21 April 2023 (UTC)
{{#invoke:Disambiguation|isDisambiguationPage|Category:Disambig-Class biography articles}}
→
{{#invoke:Disambiguation|isDisambiguationPage|Category talk:Disambig-Class biography articles}}
→
Resolved The module identifies Category talk:Disambig-Class biography articles as a dab page because it contains the template {{WikiProject Disambiguation}}. This could be fixed, but it's a minor issue — Martin (MSGJ · talk) 07:54, 21 April 2023 (UTC)
Performance issues
I've been doing some work to cut down on the execution time of this module after Module:Class mask started using it, since it's been heavily impacting the execution time of Talk:World War II (to the point of occasional timeouts). I've implemented a faster version in the sandbox, and it works as expected, cutting down the execution time from ~8.5s to ~3.5s for the above talk page, and it still detects pages as expected, but when I tried using mw.ustring instead of string to match how the current version works, it killed the performance. Two questions:
- How important is using mw.ustring here as opposed to string?
- Why does mw.ustring impact the performance so much?
Thanks. Aidan9382 (talk) 11:30, 27 April 2023 (UTC)
- Thanks, that would be fantastic. There are no unicode characters in the strings being checked, so I would guess that ustring is not required here. On question 2, I suppose there are 150,000 characters to check instead of 128? By the way, I did something on Talk:World War II to improve the situation but if you need it for testing then just revert me — Martin (MSGJ · talk) 11:53, 27 April 2023 (UTC)
- The main reason the performance impact from using mw.ustring confuses me is that testing the current live version of the module with and without mw.ustring doesn't seem to change the performance, but for the /sandbox version it does. Judging from a very quick look, it appears it might be an issue with (g)match in specific, since swapping out ustring.match for ustring.find seems to be significantly faster. I don't think it's worth converting from gmatch to some system that avoids using match just to use ustring, so I'll just use string. Thanks for the advice. Aidan9382 (talk) 12:10, 27 April 2023 (UTC)
- 3.5s is still significant and I feel sure there are more efficient methods we could use instead of analysing the whole text of the page. How about using Module:Is instance to check if the page is an instance of Wikimedia disambiguation page (Q4167410)? — Martin (MSGJ · talk) 12:14, 27 April 2023 (UTC)
- 3.5s isn't exactly accurate - that's the processing time of every module there, not just Module:Disambiguation. A more accurate measure of the change is ~6.6s to ~0.5s.
- As for implementing Module:Is instance - that probably will work, but that module in specific has only a template entry point, so I'd have to implement a solution into here or modify that module for a module entry point, and I've not used wikidata functions before, so someone else might be better for doing that. Aidan9382 (talk) 12:33, 27 April 2023 (UTC)
- (edit conflict) Ustring is not required when checking for __DISAMBIG__ or (disambiguation). It does make a difference when checking for the disambiguation template patterns, however. With the normal string library, %s will match only ASCII space characters (spaces, newlines, carriage returns, tabs, vertical tabs, and I think form feed characters). With ustring, it will match "all characters with General Category "Separator", plus tab, linefeed, carriage return, vertical tab, and form feed". This includes things like non-breaking spaces, zero-width spaces, etc. These kinds of spaces are counted as whitespace by the MediaWiki parser when parsing template syntax, so ignoring them will potentially lead to this module missing disambiguation templates. For example, it would miss {{ disambiguation }} - note the full-width spaces. The number of such disambiguation template transclusions is probably going to be very small, however. As for question 2, Scribunto calls back into PHP every time you do an operation with mw.ustring. Lua doesn't handle Unicode natively, and the Scribunto developers chose to work around this by using PHP's Unicode libraries. Each time you switch between Lua and PHP there is some cross-process overhead, and this can add up if you use ustring patterns frequently. — Mr. Stradivarius ♪ talk ♪ 12:23, 27 April 2023 (UTC)
- Thanks. This sounds roughly like what I suspected in terms of behaviour differences. Do you know if there's a way to implement ustring here without the issue of major overhead (which is mostly coming from the single gmatch call)? I'm not overly familiar with ustring, so I'm not sure if there's anything special possible there. I don't suspect this to be a major issue (I've never personally run into a template with abnormal whitespace, and don't imagine they're common either) so I may deploy the string version anyways, but having this fixed would still be nice. Aidan9382 (talk) 12:42, 27 April 2023 (UTC)
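The whitespace difference described above can be reproduced with the plain string library. This sketch uses the template-name pattern quoted later in this thread and writes the ideographic space (U+3000) as its three UTF-8 bytes; it assumes Lua's default "C" locale, where none of those bytes count as whitespace. The helper name `templateName` is hypothetical.

```lua
-- Template-name pattern as quoted in this discussion.
local pattern = "{{%s*([^|}]-)%s*[|}]"

-- Hypothetical helper: returns the first captured template name, or nil.
local function templateName(wikitext)
    return string.match(wikitext, pattern)
end

local asciiSpaces = "{{ disambiguation }}"
-- U+3000 IDEOGRAPHIC SPACE encoded as UTF-8 bytes E3 80 80.
local wideSpaces = "{{\227\128\128disambiguation\227\128\128}}"

print(templateName(asciiSpaces) == "disambiguation") -- true: %s strips ASCII spaces
print(templateName(wideSpaces) == "disambiguation")  -- false: the extra bytes stay in the capture
print(#templateName(wideSpaces))                     -- 20 bytes: 14 letters plus 2 x 3 bytes
```

Because the captured name keeps the stray bytes, an anchored match against "[Dd]isambiguation" would then fail, which is the missed-template case described above.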
- I don't think there is a way to make ustring itself faster. There are some things you could tweak in your algorithm, though. At the moment you are finding the names of all the templates on the page, then checking each name to see if it matches one of the patterns. Instead, you could check each pattern each time you found a new template name. If you find a match, then this would mean you wouldn't have to check any of the other templates on the page.
for template in string.gmatch(content, "{{%s*([^|}]-)%s*[|}]") do
    for _, v in ipairs(disambigTemplates) do
        if string.match(template, "^"..v.."$") then
            return true
        end
    end
end
- In practice, this probably won't make too much difference, however, as disambiguation templates are usually at the end of the page. If you made a custom gmatch function that searches from the end of the string to the start of the string, then this might result in a speed-up. Having said that, the original gmatch is written in C, and the custom one would be in Lua, so if there are no disambiguation templates on a page, there is a good chance that this will be slower. If you do decide to do more work on performance for this module, you should measure everything to find out what works and what doesn't. — Mr. Stradivarius ♪ talk ♪ 02:15, 29 April 2023 (UTC)
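As a rough way to follow the advice to measure everything, the early-exit loop can be wrapped in a function and timed with `os.clock`. Everything here other than the loop itself is an assumption made for illustration: the two-entry `disambigTemplates` subset, the synthetic page, the helper names `hasDisambigTemplate` and `timeIt`, and the repetition count. Absolute numbers will differ from those seen on-wiki.

```lua
-- Illustrative subset only; the real module's list is much longer.
local disambigTemplates = { "[Dd]isambiguation", "[Hh]ndis" }

-- The early-exit loop from the comment above, wrapped in a function.
local function hasDisambigTemplate(content)
    for template in string.gmatch(content, "{{%s*([^|}]-)%s*[|}]") do
        for _, v in ipairs(disambigTemplates) do
            if string.match(template, "^" .. v .. "$") then
                return true
            end
        end
    end
    return false
end

-- Runs fn the given number of times and returns the CPU time used, in seconds.
local function timeIt(fn, repetitions)
    local start = os.clock()
    for _ = 1, repetitions do
        fn()
    end
    return os.clock() - start
end

-- Synthetic page: many unrelated templates, with the dab template at the very end.
local page = string.rep("{{cite web|url=x}} some text\n", 2000) .. "{{disambiguation}}"

local elapsed = timeIt(function() hasDisambigTemplate(page) end, 50)
print(string.format("50 runs: %.4f s CPU time", elapsed))
```

Swapping in an alternative implementation and comparing the two timings on the same input gives a like-for-like comparison.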
Using Wikidata
In the sandbox is a version which will look at instance of (P31) to see if the page is a disambiguation page. It isn't foolproof (as there are a few pages not correctly identified) but it might be an efficient thing to check before analysing the content of the page. — Martin (MSGJ · talk) 15:29, 15 May 2023 (UTC)
- Would be interested to see a comparison of speeds of the sandbox vs live. (Not sure how best to test it.) — Martin (MSGJ · talk) 17:44, 15 May 2023 (UTC)
- How I perform speed tests is to fire a template call a bunch of times (Like this), and to compare the "Lua time usage" stat in show preview on both the live and sandbox version. Using an #invoke: with isDisambiguationPage, the time difference (from live to sandbox) appears to be from 0.09s to 0.25s on a non-disambiguation page (Peter Dulley) and 0.07s to 0.11s for a disambiguation page (Example), though timings are a bit give or take due to how much they fluctuate. I didn't think that the sandbox would be slower, so I wonder where most of the time impact is coming from. Aidan9382 (talk) 18:06, 15 May 2023 (UTC)
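For readers unfamiliar with the Wikidata approach, here is a sketch of the statement check only. It is not the sandbox code. It takes a list of P31 statements as a plain table so it can run outside Scribunto; on-wiki such a list might come from `mw.wikibase.getBestStatements(entityId, "P31")`, which is an assumption about how the sandbox could obtain it. The function name and the sample data are invented for illustration.

```lua
local DISAMBIG_QID = "Q4167410" -- Wikimedia disambiguation page, as named in the thread

-- statements: a list of P31 statements in the Wikibase statement table layout.
local function isInstanceOfDisambiguation(statements)
    for _, statement in ipairs(statements or {}) do
        local snak = statement.mainsnak
        if snak and snak.snaktype == "value"
            and snak.datavalue and snak.datavalue.value
            and snak.datavalue.value.id == DISAMBIG_QID then
            return true
        end
    end
    return false
end

-- Made-up sample data in the assumed layout.
local dabStatements = {
    { mainsnak = { snaktype = "value", datavalue = { value = { id = "Q4167410" } } } },
}
local humanStatements = {
    { mainsnak = { snaktype = "value", datavalue = { value = { id = "Q5" } } } },
}

print(isInstanceOfDisambiguation(dabStatements))
print(isInstanceOfDisambiguation(humanStatements))
print(isInstanceOfDisambiguation(nil)) -- page with no Wikidata item
```

Since some pages are not correctly identified on Wikidata, a false result here would still need to fall back to the content-based check.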
False positives for pattern [%w_%s]-%f[%w][Dd]isam[%w]-
The pattern [%w_%s]-%f[%w][Dd]isam[%w]- matches {{italic disambiguation}}, which is not a disambiguation template. (See the comment about fictional character pages here.) This pattern seems overly broad - can it be made more specific? — Mr. Stradivarius ♪ talk ♪ 02:10, 30 August 2023 (UTC)
- Partly fixed: those characters are now Articles with short description. However, they're also still Pages with short description; is that correct? Certes (talk) 10:33, 30 August 2023 (UTC)
- @Mr. Stradivarius: I made an edit for this to the sandbox: [1] which leaves the regex pattern but adds a false positives list that then gets checked against; will wait for someone who is more familiar with the template/Lua to promote. --PresN 12:35, 30 August 2023 (UTC)
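A sketch of the general idea of keeping the broad pattern and checking an exclusion list afterwards. This is not the sandbox edit linked above. The `falsePositives` list, its single entry, and the function name are assumptions for illustration, and the pattern is anchored with `^` and `$` in the same way as the loop quoted in the performance discussion.

```lua
-- Broad pattern from the section heading, anchored to the whole template name.
local broadPattern = "^[%w_%s]-%f[%w][Dd]isam[%w]-$"

-- Hypothetical exclusion list: names that match the broad pattern
-- but are not disambiguation templates.
local falsePositives = { "[Ii]talic [Dd]isambiguation" }

local function isDisambigTemplateName(name)
    if not string.match(name, broadPattern) then
        return false
    end
    for _, fp in ipairs(falsePositives) do
        if string.match(name, "^" .. fp .. "$") then
            return false
        end
    end
    return true
end

print(isDisambigTemplateName("disambiguation"))            -- true
print(isDisambigTemplateName("Place name disambiguation")) -- true
print(isDisambigTemplateName("italic disambiguation"))     -- false: excluded
print(isDisambigTemplateName("Italic title"))              -- false: never matched
```

The trade-off is the same one raised under Methodology: an exclusion list has to be maintained by hand whenever a new non-dab template with "disam" in its name appears.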
Surnames
Why is "[Ss]urnames?" in the template list, causing isDisambiguationPage() to return "yes" for a surname? Surname pages are set index articles, not disambiguation pages. Given name articles do not display this behaviour. ({{Hndis}} is correctly included, because full name lists are dabs, though we should really also add its widely used redirect {{Hndab}}.) Certes (talk) 18:06, 30 August 2023 (UTC)
Pages broken
PresN every page with a WikiProject template seems to be broken as a result of your change, e.g. Talk:Basil Spence, please revert asap. – Isochrone (T) 21:37, 30 August 2023 (UTC)