MediaWiki_talk:Captcha-addurl-whitelist

Hi, can we add the BBC, the Guardian and the Independent?

ϢereSpielChequers 17:58, 21 March 2018 (UTC)
Holding for at least 1 day; should this have a wider review (perhaps at WP:RSN)? — xaosflux Talk 12:22, 26 March 2018 (UTC)
 On hold @WereSpielChequers: I think this needs a wider review as it has a broad impact. Please post at least a discussion at WP:RS about the specific additions you would like. If there is no objection after a week (or if consensus forms in support), please include that discussion link here and reactivate the edit request. — xaosflux Talk 14:13, 28 March 2018 (UTC)
We have the New York Times, which is unfortunately paywalled, and it seems to me that dozens of other sources could safely be listed. In the UK, the BBC and Guardian are obviously of a similar calibre. The Independent is not impartial but I think it still qualifies along with the Times (of London), Financial Times, Telegraph and some of the tabloids. I expect most other countries with free speech could provide a similar list. Certes (talk) 14:39, 28 March 2018 (UTC)
@Certes: this page has very few watchers; I suggest you bring this up at WP:RSN or another large forum. Please include specific URLs/domain names in discussions for review. This certainly CAN be expanded easily at a technical level. — xaosflux Talk 14:46, 28 March 2018 (UTC)
Thanks. I certainly didn't know about this page until you kindly pointed me at a link to it. My reply was more aimed at WereSpielChequers or anyone else bringing the topic up at WP:RSN. It's all too easy for us to take easy editing for granted and to overlook the obstacles which (perhaps for good reasons) lie in the way of newcomers. Certes (talk) 15:35, 28 March 2018 (UTC)
@Certes: if you brought this up at a larger venue like RSN and it was OK, feel free to reactivate the edit request and add the discussion link below. — xaosflux Talk 00:29, 19 April 2018 (UTC)

From the Wikipedia Library

Hi,

Sam Walton provided this list of websites from the Wikipedia Library partners. Clayoquot (talk | contribs) 23:13, 29 March 2018 (UTC)

Extended content: Publisher ...
@Clayoquot: I posted at Wikipedia:Reliable_sources/Noticeboard#White_listing_sites_from_WP:TWL for a review; if there are no issues within a week, please activate the edit request tag at the top of this section. Thanks, — xaosflux Talk 01:51, 30 March 2018 (UTC)
{{on hold}} pending RSN or time. — xaosflux Talk 14:51, 30 March 2018 (UTC)
Thanks. The relevant discussion is now archived and there were no objections. Cheers, Clayoquot (talk | contribs) 22:14, 18 April 2018 (UTC)
 Doing... xaosflux Talk 23:09, 18 April 2018 (UTC)
 Done @Clayoquot: these have been added, let me know if you see any trouble. — xaosflux Talk 23:14, 18 April 2018 (UTC)
Excellent! I'm glad I mentioned it, which I think is what led to all this activity. Thanks for getting some sensible updates through, all. :) Quiddity (WMF) (talk) 23:41, 18 April 2018 (UTC)
Thanks to @Samwalton9 (WMF): as well. — xaosflux Talk 00:27, 19 April 2018 (UTC)
No problem! There are definitely many more sites that could be added here, but that's a good start :) Samwalton9 (WMF) (talk) 09:50, 19 April 2018 (UTC)

Proposal to add major newspapers etc.

A short RSN discussion showed some support for the principle of adding major newspapers to this list, and I think we can extend that to some other media such as the BBC. Should we produce a full list for approval?

Please can non-UK editors add respected journals from their own countries? The Washington Post, The Globe and Mail and The Hindu have been suggested. I've left off tabloids such as The Sun (United Kingdom) and the Daily Mirror to maximise the chance of approval. I hope we can leave the initial www off the URL pattern, to allow variants such as news.bbc.co.uk. The Times has a paywall; is it worth including such sources?

Someone recently posted a link to a useful article with a Venn diagram classifying news sources by political bias and level of detail, but I've lost it. Please can someone point us at that again? Thanks, Certes (talk) 10:44, 19 April 2018 (UTC)

 On hold: activated an edit request to see if any patrolling admins want to comment before processing. — xaosflux Talk 12:03, 19 April 2018 (UTC)
Would it be better to start this discussion somewhere else, returning if and when it has enough detail and support to qualify as an edit request? If so, is WP:RSN the right forum? I don't think anyone doubts that these are reliable sources; the question is whether they should be added to this whitelist. Certes (talk) 12:19, 19 April 2018 (UTC)
@Certes: RSN is the best forum I can think of for these, you can move it there, or just link in to this from there with a summary. Basically if domains are representative of reliable sources, are useful for new users, and not being abused (such as for spam, advertising, selling subscriptions, etc) they are OK to be on this list as far as I'm concerned. — xaosflux Talk 12:27, 19 April 2018 (UTC)
A notice was posted at WP:RSN on 19 April asking that people come here to comment. EdJohnston (talk) 14:39, 22 April 2018 (UTC)
FWIW, I fully support this. Ed [talk] [majestic titan] 19:57, 22 April 2018 (UTC)
 Doing... xaosflux Talk 20:08, 22 April 2018 (UTC)
 Done xaosflux Talk 20:12, 22 April 2018 (UTC)
Thank you! I still hope editors from beyond the UK will contribute similar lists for their countries. Certes (talk) 22:48, 22 April 2018 (UTC)

What exactly is this?

I wonder what exactly this is. Is it just a list of URLs that don't require a CAPTCHA for unregistered users? If so, should we add all low-risk but popular URLs? --Emir of Wikipedia (talk) 20:49, 22 April 2018 (UTC) (please mention me on reply; thanks!)

@Emir of Wikipedia: yes, normally unregistered and new editors have to solve a captcha to add links; these specific domains are exempt from that. There is some performance cost to consider, so keeping this to "popular" domains, as in links that are actually being appropriately added to pages, is a factor. In general this means the links should be for "reliable sources". It is also important that the exemptions are not useful for disruptive editing. We have only recently begun using this and this page is not well watched; I suggest discussing additions at WP:RSN first. — xaosflux Talk 21:39, 22 April 2018 (UTC)
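As a rough illustration of the behaviour described above, here is a JavaScript sketch of the decision. It is a simplification under stated assumptions, not ConfirmEdit's actual code: the whitelist entries, the needsCaptcha helper and the isTrustedUser flag are hypothetical, and the bundling (fragments joined with | behind the host prefix quoted later in this thread, matched case-insensitively) is assumed.

```javascript
// Simplified sketch of the "addurl" CAPTCHA decision (an assumption,
// not ConfirmEdit's actual implementation).

// Hypothetical whitelist entries, one regex fragment per line as on the page.
const whitelistLines = [String.raw`\bwikipedia\.org`, String.raw`\bbbc\.co\.uk`];

// Assumed bundling: fragments joined with "|" behind the host prefix,
// matched case-insensitively against the whole URL.
const whitelist = new RegExp(
  String.raw`^(?:https?:)?\/\/+[a-z0-9_\-.]*(?:` + whitelistLines.join("|") + ")",
  "i"
);

// Returns true if the edit should be challenged with a CAPTCHA.
function needsCaptcha(oldLinks, newLinks, isTrustedUser) {
  if (isTrustedUser) return false; // hypothetical flag for exempt editors
  // Only links that were not already on the page count as "added".
  const added = newLinks.filter((url) => !oldLinks.includes(url));
  // A single non-whitelisted addition is enough to trigger the CAPTCHA.
  return added.some((url) => !whitelist.test(url));
}

console.log(needsCaptcha([], ["https://news.bbc.co.uk/1/hi/default.stm"], false)); // false
console.log(needsCaptcha([], ["https://spam.example/offer"], false)); // true
```

Under these assumptions, mixing one whitelisted link with one unknown link in the same edit still produces a CAPTCHA.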
Thanks for the information. I have seen the discussions at RSN and came here for clarification. --Emir of Wikipedia (talk) 20:01, 23 April 2018 (UTC)

Please add IPCC and National Academies domains

Could you please add:

  • ipcc.ch (Intergovernmental Panel on Climate Change)
  • nap.edu (National Academies of Sciences, Engineering, and Medicine)

? Clayoquot (talk | contribs) 22:52, 22 February 2020 (UTC)

 Not done (not yet): following the directions, please link to where these additions were discussed publicly, such as at Wikipedia:Reliable sources/Noticeboard. — xaosflux Talk 14:02, 23 February 2020 (UTC)
Xaosflux, it's pretty inconceivable that a discussion at RSN would yield a result other than "yes, those are reliable sources". Would you consider pulling an IAR to add these two without going through a community process? Best, Clayoquot (talk | contribs) 17:57, 23 February 2020 (UTC)
@Clayoquot: I'll leave this open for at least a day in case anyone else wants to skip the discussion (which for these is usually of the 'no objections, go ahead' type). I've never heard of ipcc.ch (it appears to have only 5 article usages), and nap.edu appears to have only 4 article usages as well, so at the very least these don't seem to be popular sources. — xaosflux Talk 19:00, 23 February 2020 (UTC)
Xaosflux, For www.nap.edu, I'm seeing usage in 957 pages, and www.ipcc.ch appears to be referenced in 736 pages. Clayoquot (talk | contribs) 17:42, 24 February 2020 (UTC)
Looks like I had my wildcard wrong, more popular than my first count indeed :) — xaosflux Talk 18:14, 24 February 2020 (UTC)
Xaosflux, We've all done that :) Clayoquot (talk | contribs) 02:54, 25 February 2020 (UTC)
@Clayoquot: please post at WP:RSN; if there is no response within a week, reactivate the request and I'll add them here. — xaosflux Talk 15:26, 27 February 2020 (UTC)
Posted there. Thanks. Clayoquot (talk | contribs) 18:20, 27 February 2020 (UTC)
Done. There were no objections: https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Noticeboard/Archive_286#CAPTCHA_exemption_for_reliable_domains Clayoquot (talk | contribs) 22:00, 7 March 2020 (UTC)
Could someone make this change please? @Xaosflux:? Clayoquot (talk | contribs) 17:30, 11 March 2020 (UTC)
 Done @Clayoquot: as there were no objections, I've added. — xaosflux Talk 17:38, 11 March 2020 (UTC)

RfC on adding generally reliable sources to the CAPTCHA whitelist

There is a request for comment on adding generally reliable sources from the perennial sources list to the CAPTCHA whitelist, which allows new and anonymous users to cite them in articles without needing to solve a CAPTCHA. If you are interested, please participate at WP:RSN § Adding generally reliable sources to the CAPTCHA whitelist. — Newslinger talk 19:42, 7 March 2020 (UTC)

The discussion has passed with "near-unanimous" consensus in favour of the proposal and should be implemented. For future reference, it is now archived at Wikipedia:Reliable_sources/Noticeboard/Archive_291#Adding_generally_reliable_sources_to_the_CAPTCHA_whitelist. 107.190.33.254 (talk) 17:01, 7 May 2020 (UTC)
Would someone please regex this up into a ready-to-go addition, then activate the edit request here? — xaosflux Talk 00:57, 8 May 2020 (UTC)

@Newslinger and Xaosflux: Not sure why this discussion died out, but on WP:RSNP, this did the trick:

console.log([...$('.perennial-sources .s-gr a[href*="Linksearch&target=https://"]')].map(a => '\\b' + a.href.match(/\*\.(.*)/)[1].replaceAll(".", "\\.")).join("\n"))
RSNP list ...
Duplicates to remove from the old list ...

I participated in that discussion, but see no reason to think the consensus isn't still valid. Suffusion of Yellow (talk) 19:19, 20 May 2023 (UTC)

@Suffusion of Yellow I was only here as an edit request patrolling admin, the ER wasn't ready - if it's ready now, please reactivate the request to enqueue this again. — xaosflux Talk 19:50, 20 May 2023 (UTC)
Well, I don't see any problems, but can't hurt to ask Headbomb who probably has RSNP memorized. Does it look like I generated that list properly? Suffusion of Yellow (talk) 23:21, 20 May 2023 (UTC)
Minor quibble: does the /en after bdw.com actually work? I'm not sure exactly how the check works with the whitelist, but I imagine it matches only on the domain (not the path within the host), to prevent citations such as wikipedia.org.spamsite.tld/spamspamspam.doc. Certes (talk) 11:10, 21 May 2023 (UTC)
Oops, it doesn't: see #Protected edit request on 11 April 2021 (updated today) below. Certes (talk) 22:34, 21 May 2023 (UTC)
I've reactivated the request, per lack of objection. Please:
  • Add all lines from the "RSNP" list above
  • Remove all lines from the "Duplicates" list
Thanks. Suffusion of Yellow (talk) 20:54, 23 May 2023 (UTC)
 Done Izno (talk) 23:09, 24 May 2023 (UTC)

Adding NCBI to the list

Resolved

NCBI is undeniably a source of reliable peer-reviewed journal articles and is often used in citations (e.g. WP:PUBMED), i.e. the same as jstor.org, which is already on the list. 107.190.33.254 (talk) 17:08, 7 May 2020 (UTC)

The entire nih.gov domain is already on the list - is it not working? — xaosflux Talk 17:48, 7 May 2020 (UTC)
My bad, then; I only searched for "ncbi" using Ctrl+F and couldn't find it. Though I could have sworn it didn't always work; maybe it was some other website added through citation templates, or maybe I was adding multiple sources. Anyway, it now works without a doubt, case closed. Thanks, 107.190.33.254 (talk) 18:19, 7 May 2020 (UTC)

Protected edit request on 14 May 2020

Remove "such as those used in {{cite doi}}." from the header and "and in Template:Cite doi" from the comment after doi.org, since Template:Cite doi was deprecated. * Pppery * it has begun... 19:35, 14 May 2020 (UTC)

 Done. Thanks for submitting this! — Newslinger talk 21:46, 14 May 2020 (UTC)

Protected edit request on 11 April 2021

  • Change every single regex entry to have $ at the end. Two example lines:
    • - \bwikipedia\.org # All language versions of Wikipedia
    • + \bwikipedia\.org$ # All language versions of Wikipedia
    • (...)
    • - \bbbc\.com
    • + \bbbc\.com$

I've indicated the removed (-) and added (+) version of each of these lines, but I think the changes should be self-explanatory.

This change is necessary because the whitelist currently also accepts URLs such as http://wikipedia.org.phishing.site.example.org/my_virus_url, to give a blatant example of a bad URL. Please do test this yourself, but in my testing on another wiki, such URLs were accepted whenever the regular expressions did not end with a $. As the page states: "Every non-blank line is a regex fragment which will only match hosts inside URLs". This means that the end of the domain name can safely be marked with a $, since the text being matched will never contain anything after the last character of the domain name.
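The problem can be reproduced outside MediaWiki with a short JavaScript sketch. It assumes that each fragment is appended to the host prefix quoted later in this thread (^(?:https?:)?\/\/+[a-z0-9_\-.]*) and matched case-insensitively; the isWhitelisted helper is hypothetical.

```javascript
// Host prefix as quoted later in this thread (assumed, not verified here).
const HOST_PREFIX = String.raw`^(?:https?:)?\/\/+[a-z0-9_\-.]*`;

// Hypothetical helper: does a single whitelist fragment accept this URL?
function isWhitelisted(url, fragment) {
  return new RegExp(HOST_PREFIX + "(?:" + fragment + ")", "i").test(url);
}

const current = String.raw`\bwikipedia\.org`;   // entry as it is today
const proposed = String.raw`\bwikipedia\.org$`; // entry with "$" appended
const phishing = "http://wikipedia.org.phishing.site.example.org/my_virus_url";

// Nothing after "org" is checked, so the lookalike host is accepted.
console.log(isWhitelisted(phishing, current)); // true
// With "$", the match must end right after "org", so this URL is rejected.
console.log(isWhitelisted(phishing, proposed)); // false
```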

I'm not sure if this should be communicated to other international versions of wikipedia, but it seems relevant for you guys to change this since you are the first hit on Google when I search for the system message name ("MediaWiki:Captcha-addurl-whitelist"). Joeytje50 (talk) 17:43, 11 April 2021 (UTC)

I'm pretty sure this would break it to only allow https://wikipedia.org, and not say https://wikipedia.org/any/page.php. If I'm right, what you actually want is to add a / to the end. Anomie 01:03, 12 April 2021 (UTC)
If the trailing slash is optional then we need something like \bwikipedia\.org(/.*)?$, though I think this still allows not-wikipedia.org. Certes (talk) 10:14, 12 April 2021 (UTC)
The \b boundaries aren't stopping that? — xaosflux Talk 17:58, 16 April 2021 (UTC)
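Both objections can be checked with the same kind of JavaScript sketch, again assuming the host prefix quoted later in this thread; matches is a hypothetical helper and the URLs are illustrative.

```javascript
// Host prefix as quoted later in this thread (an assumption about MediaWiki).
const HOST_PREFIX = String.raw`^(?:https?:)?\/\/+[a-z0-9_\-.]*`;
// Hypothetical helper: test one whitelist fragment against one URL.
const matches = (url, fragment) =>
  new RegExp(HOST_PREFIX + "(?:" + fragment + ")", "i").test(url);

const withDollar = String.raw`\bwikipedia\.org$`;
const withOptionalPath = String.raw`\bwikipedia\.org(/.*)?$`;

const article = "https://en.wikipedia.org/wiki/Example";
const lookalike = "https://not-wikipedia.org/page";

// "$" forces the match to end at "org", so every URL with a path fails.
console.log(matches(article, withDollar)); // false
// An optional "/..." tail restores ordinary article links.
console.log(matches(article, withOptionalPath)); // true
// "-" is not a word character, so "\b" still matches before "wikipedia".
console.log(matches(lookalike, withOptionalPath)); // true
```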
 Not done this needs more review and testing before bulk changes are made. — xaosflux Talk 17:58, 16 April 2021 (UTC)

@Joeytje50, Anomie, Xaosflux, and Certes: Some tests at test2wiki (testwiki's link handling is broken). Anything not marked (captcha) didn't get a captcha:

So yes, the problem is real. It looks like the right format is (?<=[./])some\.good\.site(?:/|$) Not sure what to do here. Adding all those (?:/|$) seems cheap enough. But what about all those (?<=[./]) lookbehinds? Could that cause a performance hit? Suffusion of Yellow (talk) 21:54, 21 May 2023 (UTC)
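A JavaScript sketch of that format, combined with the host prefix quoted below (an assumption about how MediaWiki bundles the lines). The URLs are illustrative stand-ins rather than the test2wiki cases, and JavaScript's lookbehind is assumed to behave like PCRE's for this fixed-width case.

```javascript
// Host prefix as quoted below in this thread (assumed).
const HOST_PREFIX = String.raw`^(?:https?:)?\/\/+[a-z0-9_\-.]*`;

// Proposed shape: the domain must follow "." or "/" and be followed by "/" or the end.
const fragment = String.raw`(?<=[./])wikipedia\.org(?:/|$)`;
const re = new RegExp(HOST_PREFIX + "(?:" + fragment + ")", "i");

const cases = [
  ["https://wikipedia.org", true],                              // bare host
  ["https://en.wikipedia.org/wiki/Example", true],              // subdomain and path
  ["https://fake-wikipedia.org/", false],                       // hyphenated lookalike
  ["http://wikipedia.org.phishing.site.example.org/x", false],  // domain used as a subdomain
  ["https://spam.site/wikipedia.org", false],                   // domain only in the path
  ["https://wikipedia.org:80/innocent.doc", false],             // explicit port is also rejected
];

for (const [url, expected] of cases) {
  console.log(url, re.test(url) === expected ? "as expected" : "UNEXPECTED");
}
```

The last case shows the trade-off noted in the reply below: a port after the host is treated like any other non-whitelisted URL.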

Even that will match https://malicious.domain/pretending.to.be.some.good.site/virus.exe, though not https://some.good.site:80/innocent.doc. Is the whole URL matched against the pattern? If so, we may need to parse the whole URL, starting the regexp with ^. There's at least one whole website devoted to how to do that properly, or see page 50 of https://www.ietf.org/rfc/rfc3986.txt. Certes (talk) 23:03, 21 May 2023 (UTC)
No, see the https://spam.site/acm.org example above. Assuming this is the right place, the regexes are bundled together, then prefixed with ^(?:https?:)?\/\/+[a-z0-9_\-.]*. We could use the <noprotocol> option and supply the prefixes ourselves, but would that be even slower? Or we could do the bundling ourselves, but that would make this page as unreadable as some edit filters. Suffusion of Yellow (talk) 23:47, 21 May 2023 (UTC)
@Suffusion of Yellow and Certes: If the prefix ^(?:https?:)?\/\/+[a-z0-9_\-.]* is added, then that would be an issue in MediaWiki itself, right? You would expect the prefix to require a period at the end, if there is any subdomain preceding the whitelisted domain. Otherwise I'm pretty sure almost every single wiki that has a whitelist is vulnerable to adding a link to http://fake-wikipedia.org (demo). A simple \b is not sufficient, due to the existence of the dash in domain names.
So regardless of this protected edit request, I'd say MediaWiki should change the prefix to ^(?:https?:)?\/\/+([a-z0-9_\-.]*\.)* to enforce the period at the end. Let me know what you guys think about that.
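A JavaScript sketch comparing the current prefix with the suggested one, using the example entry \bwikipedia\.org; the URLs are illustrative. As the next comment points out, the new prefix alone does not stop wikipedia.org.spam.site, which is why a trailing check is still requested. One design note, offered as a suggestion rather than part of the proposal: because the character class already contains the dot, ([a-z0-9_\-.]*\.)* accepts exactly the same strings as (?:[a-z0-9_\-.]*\.)?, and the non-nested form gives a backtracking engine fewer ways to split long runs of dots.

```javascript
const entry = String.raw`\bwikipedia\.org`;

// Prefix quoted above (assumed to be what MediaWiki currently prepends).
const currentPrefix = String.raw`^(?:https?:)?\/\/+[a-z0-9_\-.]*`;
// Suggested prefix: anything before the entry must end with a period.
const suggestedPrefix = String.raw`^(?:https?:)?\/\/+([a-z0-9_\-.]*\.)*`;
// Same set of strings as the suggested prefix, without the nested quantifier.
const flatPrefix = String.raw`^(?:https?:)?\/\/+(?:[a-z0-9_\-.]*\.)?`;

const build = (prefix) => new RegExp(prefix + "(?:" + entry + ")", "i");

const urls = [
  "http://fake-wikipedia.org/",            // accepted only by the current prefix
  "https://en.wikipedia.org/wiki/Example", // accepted by all three
  "https://wikipedia.org/",                // accepted by all three
  "http://wikipedia.org.spam.site/",       // still accepted: no trailing check
];

for (const url of urls) {
  console.log(url, {
    current: build(currentPrefix).test(url),
    suggested: build(suggestedPrefix).test(url),
    flat: build(flatPrefix).test(url),
  });
}
```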
Regarding this edit request, I'd say the testing done by Suffusion of Yellow is pretty conclusive that some changes are needed. The lookbehind is required because of the aforementioned issue with hyphens (simple \b is insufficient), and the lookahead for the trailing slash or string terminator is required because otherwise wikipedia.org.spam.site would be whitelisted as well. I haven't re-enabled the edit request template at the top, but if anyone knows what the impact would be on performance, I think this request can be re-enabled. If performance is impacted significantly, I think the aforementioned change to MediaWiki software is even more important, and if lookbehinds are impacting performance, I'd assume changing the lookbehind to (/|$) as a regular capturing group would work as well.
The updated edit request is now:
At the start of every line: \b(?<=[./])
At the end of every line: (?:/|$)
Joeytje50 (talk) 11:49, 29 January 2024 (UTC)
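Applying the request by hand to every line is error-prone, so here is a hypothetical JavaScript helper that rewrites the page text. It assumes that # starts a comment and that blank or comment-only lines should be left alone. It prepends \b(?<=[./]) exactly as requested, so on lines that already begin with \b the second \b is redundant but is evaluated at the same position and gives the same result.

```javascript
// Hypothetical helper: wrap every regex line of the whitelist with the
// requested prefix and suffix, keeping any trailing "# comment".
function rewriteWhitelist(text) {
  return text
    .split("\n")
    .map((line) => {
      const hash = line.indexOf("#"); // assumption: "#" always starts a comment
      const regex = (hash === -1 ? line : line.slice(0, hash)).trim();
      if (regex === "") return line; // blank or comment-only line: unchanged
      const comment = hash === -1 ? "" : " " + line.slice(hash);
      return String.raw`\b(?<=[./])` + regex + "(?:/|$)" + comment;
    })
    .join("\n");
}

const before = [
  String.raw`\bwikipedia\.org # All language versions of Wikipedia`,
  "",
  "# A comment-only line",
  String.raw`\bbbc\.com`,
].join("\n");

console.log(rewriteWhitelist(before));
```

Under the host-prefix assumption discussed above, the rewritten \bbbc\.com line accepts https://www.bbc.com/news but rejects https://fake-bbc.com/.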
Thanks, that looks good to me. It's hard to be sure without analysing the code which will apply the regexp, but I am hopeful that it will work without side effects. Certes (talk) 13:48, 29 January 2024 (UTC)

Protected edit request on 20 May 2023

Please add:

\btoolforge\.org

I assume this will be uncontroversial; wmflabs is already there. Suffusion of Yellow (talk) 00:22, 20 May 2023 (UTC)

 Done xaosflux Talk 01:07, 20 May 2023 (UTC)

Protected edit request on 1 June 2023

Please add the following URLs (all except books.google.com and cnbc.com are auto-generated by various CS1 templates when the required IDs are passed to them; see Template:Citation Style documentation/id2):

\bapi\.semanticscholar\.org
\barxiv\.org
\bbiorxiv\.org
\bbooks\.google\.com
\bciteseerx\.ist\.psu\.edu
\bcnbc\.com
\bhdl\.handle\.net
\blccn\.loc\.gov
\bmathscinet\.ams\.org
\bopenlibrary\.org
\bosti\.gov
\bpapers\.ssrn\.com
\btools\.ietf\.org
\bui\.adsabs\.harvard\.edu
\bzbmath\.org

93.72.49.123 (talk) 14:50, 1 June 2023 (UTC)

 Done  Martin (MSGJ · talk) 12:28, 13 June 2023 (UTC)

This article uses material from the Wikipedia article MediaWiki_talk:Captcha-addurl-whitelist, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.