Template_talk:AI_generated

Template talk:AI-generated



WikiProject AI Cleanup
This template is within the scope of WikiProject AI Cleanup, a collaborative effort to clean up artificial intelligence-generated content on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

Encourage editors to delete or to fix?

My proposed change. Seems to me that best practice would be to encourage editors to delete AI-generated text when found. It is too time-consuming to fix it when both the prose and the sources are usually fictitious. To me, AI-generated text has a similar flavor to copyright violations and hoaxes, the best practice for which is also deletion.

Regarding the argument that we shouldn't encourage deletion via a maintenance tag, I think it'd be fine, because 1) a maintenance tag can probably get consensus quicker than a new CSD criterion, 2) there may be some exceptions/edge cases, 3) tagging before deleting can help with crowdsourcing (for example, the tagger wants a second opinion or is too busy or too unsure to execute the deletion themselves), 4) maybe just a section is AI-generated and not the whole article. Edit to clarify: The tag would encourage someone else to use a deletion process or to delete the problematic section, and also allow room for human judgment and edge cases.

Thoughts? –Novem Linguae (talk) 21:05, 27 January 2023 (UTC)

I think what @Thryduulf wrote at the CSD discussion is right. If an article is in mainspace we should encourage it to be nominated for deletion for all the reasons you say Novem. If it's in draft or userspace, we should give the cleanup options. Best, Barkeep49 (talk) 21:07, 27 January 2023 (UTC)
Deletion discussions are the absolute best way of getting a new CSD criterion. Without them there is no evidence of frequency, and no evidence of a consensus that they all (or some subset of them) should always be deleted. They also aid in crafting the criterion, because more data makes it easier to objectively specify what separates things that get deleted from things that don't. Thryduulf (talk) 22:01, 27 January 2023 (UTC)

Different Icon?

While the current icon is a robot, it doesn't seem to convey the idea that it's from a robot "conversation". How about  rsjaffe 🗣️ 22:35, 27 January 2023 (UTC)

The current icon looks far better DFlhb (talk) 00:58, 28 January 2023 (UTC)
@CactiStaccingCrane changed it to even a better one, so I withdraw this suggestion. — rsjaffe 🗣️ 03:11, 28 January 2023 (UTC)

My buddy is a real MJ whiz, so I have asked him if he can come up with something. jp×g 03:11, 28 January 2023 (UTC)

Be interested in seeing it. — rsjaffe 🗣️ 03:12, 28 January 2023 (UTC)

Renaming category

The category should be renamed from "Articles containing suspected AI-generated texts" to "Articles containing non-compliant AI-generated texts", or perhaps "suspected non-compliant", to make it clear that this is a cleanup category rather than a general category for all articles where LLMs were used (though I suppose there is yet no affirmative consensus on whether LLM text generation is allowed or forbidden). DFlhb (talk) 17:14, 31 January 2023 (UTC)

Hidden text from within documentation.

No idea why this was put in the documentation itself as hidden text. Seems better to put it here where people can see it. — Trey Maturin 13:14, 23 May 2023 (UTC)

Previously hidden text

Some trials on articles I (User:JPxG) have written, using the lead paragraphs: Powder House Island and Nina Jankowicz are estimated at 99% human-written, but First Wikipedia edit is at 20% GPT for some reason. 1-Pentadecanol returns 40% GPT based on the first sentence; the GPT estimate decreases as subsequent sentences are added, until the text is rated 99% human-written. However, when I asked ChatGPT to "write me a Wikipedia article about 1-pentadecanol", the result was this:

1-Pentadecanol, also known as n-pentadecanol or pentadecyl alcohol, is a straight-chain saturated primary alcohol with the chemical formula CH3(CH2)14OH. It is a white, waxy solid that is insoluble in water, but is miscible with most organic solvents.

1-Pentadecanol is found in many natural sources, including vegetable oils and animal fats. It is also used as a raw material in the manufacture of lubricants, candles, and other products.

This was estimated as 92% human-written. I don't know exactly what the underlying mechanism of this tool is, but we may want to take its output with a grain of salt.

When encountering a newly created page, given that the prose seems reasonably competently written (and, typically, at least several paragraphs long), these are some very mild indicators that undeclared LLM use is involved, which could mean that checking with a machine identifier could be worthwhile:

  • The creator has words like "AI" or "neural" (and similar terms indicating an interest in LLMs or, more broadly, deep learning) in their username
  • The content concerns a fictitious subject; sometimes the title will just be fairly nonsensical, or will render as poor English, yet the content (not having any grammatical or orthographic errors) seems surprisingly coherent on the surface, as if the "author" had a good idea what they were writing about; due to their incredible language-manipulating capacity, LLMs far surpass humans at stringing together plausible sentences about nonsense
  • There are fictitious references which otherwise look persuasive (see these examples)
  • The references are not fictitious, and there are inline citations, but the text–source integrity is abysmal (indicating a facile after-the-fact effort by the creator to evade suspicion by inserting citations into raw LLM output, without making the needed adjustments to establish genuine verifiability, which could otherwise be a quite painstaking process of correcting, rearranging, and copyediting)
    or
    there simply aren't any references (meaning that the model might have generated some but they were manually removed because others would notice that they are junk)
    • one wonders how someone seemingly familiar enough with Wikipedia content standards to write a decent-seeming chunk of article prose could at the same time be so incompetent at ensuring minimal verifiability
  • There are references in the style of what is output by Wikipedia's usual citation templates, but there are no inline citations
  • The content looks as if copied from somewhere due to not having wiki markup (wikilinks, templates, etc.)
    • one wonders how someone seemingly familiar enough with Wikipedia content standards to write a decent-seeming chunk of article prose could fail to add even a bare minimum of wikilinks
  • The article obviously serves to promote an entity (such as by giving it visibility) but the prose seems very carefully tweaked to look objective
    • one wonders how someone so unfamiliar with Wikipedia content standards that they would attempt to publish a promotional page could be so skilled at crafting exceedingly neutral verbiage (creators of promotional articles and drafts are typically incapable of completely eschewing promotional language)
  • The last paragraph is oddly out of place since the text ends with a conclusion of sorts, encapsulating some earlier points; it may start with or contain a phrase such as "In conclusion", "This article has examined" or similar. Such structures and phrases may be extremely prevalent in LLMs' corpora, so they can't shake the habit off even when told to "write a Wikipedia article", despite the fact that Wikipedia articles do not have this characteristic.
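A few of the indicators above are mechanical enough to check automatically. The following is only a rough sketch in Python, not an existing tool: the function name, the labels it returns, and the regular expressions are all assumptions made for illustration, and the phrase list contains only the two examples quoted above. It looks at raw wikitext for missing wikilinks, missing references, citation templates without inline `<ref>` citations, and a concluding last paragraph. Like the indicators themselves, a hit is very mild evidence and only suggests that a closer human look may be worthwhile.

```python
import re

# Hypothetical phrase list: only the two examples quoted in the list above.
CONCLUSION_PHRASES = ("in conclusion", "this article has examined")

# Assumed patterns for common wikitext constructs (illustrative, not exhaustive).
WIKILINK_RE = re.compile(r"\[\[[^\[\]]+\]\]")               # [[Target]] or [[Target|label]]
INLINE_REF_RE = re.compile(r"<ref[\s>/]", re.IGNORECASE)    # <ref>, <ref name=...>
CITE_TEMPLATE_RE = re.compile(r"\{\{\s*cite\s+\w+", re.IGNORECASE)  # {{cite web ...}}


def weak_llm_indicators(wikitext: str) -> list[str]:
    """Return labels for the mild indicators that fire for this wikitext."""
    found = []
    paragraphs = [p.strip() for p in wikitext.split("\n\n") if p.strip()]

    # No wiki markup at all: the text may have been pasted from elsewhere.
    if not WIKILINK_RE.search(wikitext):
        found.append("no_wikilinks")

    has_inline = bool(INLINE_REF_RE.search(wikitext))
    has_cite = bool(CITE_TEMPLATE_RE.search(wikitext))
    if not has_inline and not has_cite:
        found.append("no_references")
    elif has_cite and not has_inline:
        # Citation-template style references, but nothing cited inline.
        found.append("references_without_inline_citations")

    # Essay-style conclusion in the final paragraph.
    if paragraphs:
        last = paragraphs[-1].lower()
        if any(phrase in last for phrase in CONCLUSION_PHRASES):
            found.append("concluding_last_paragraph")

    return found
```

The checks are deliberately independent, so a reviewer can see which indicator fired instead of a single opaque score. None of them can assess text-source integrity or whether a reference is fictitious; those still need a human reading the sources.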

A few more things to note:

  1. It's surprisingly easy to fool the detector through minor edits to the GPT output. The detector is also pretty much useless for non-prose text.
  2. From my own experimentation I've found that machine-translated content, regardless of whether written by human or GPT, tends to yield "unclear" on the detector, which I assume is probably an intentional foresight to prevent obfuscation of AI output using machine translators.
  3. GPT-4 is now a thing (albeit something you either have to buy yourself for $20 or acquire through Bing's waitlist), and since OpenAI's own detector is designed for GPT-3, GPT-4 output fools it, at least for the time being. I'm pretty sure I've heard of GPT-4 having baked-in flag tokens in order to make future detection easier, though.
  4. You'd have to get on a waitlist to actually access either, but it's now possible through both Bing and ChatGPT (both GPT-4) to browse the web, allowing for "legitimate" citations (although even with those present it's still very possible for the AI to hallucinate anyways).
  5. In addition to "in conclusion...", some other common dead giveaways include "as X, ...", "it is important...", and "firstly, secondly, thirdly..." (especially in a Wikipedia context). These are ubiquitous on GPT-3, but can also be found on GPT-4. However, again, it's very easy for people to realize this and revise the output to obfuscate these...

WiktionariThrowaway (talk) 22:59, 20 March 2023 (UTC)

OpenAI has retracted their own detection model. Frostly (talk) 23:46, 9 January 2024 (UTC)

Revert

Feel free to revert my changes, but I've reverted the template back to diff 1158367048 as it maintains consistency with other maintenance templates and already includes the changes previously made to the template. Dawnbails (talk) 14:15, 11 June 2023 (UTC)

Cont.

Alalch E. 14:48, 11 June 2023 (UTC)

I'll copy what I said on Template_talk:AI-generated here:

Feel free to revert my changes, but I've reverted the template back to diff 1158367048 as it maintains consistency with other maintenance templates and already includes the changes previously made to the template.

I think it matters that maintenance templates should probably maintain relative consistency to each other, and I don't see how the revisions after it improve much other than, for no particular reason, moving text down and linking the same page on it twice.

I also can't seem to find the discussion you speak of. I'd appreciate a link to it. Dawnbails (talk) 14:36, 11 June 2023 (UTC)

@Dawnbails: The difference between revisions is substantive. It's about what to recommend: removal or "cleanup by throughly verifying the information and ensuring that it does not contain any copyright violations". The discussion is this one: WT:LLM#Non-compliant LLM text: remove or tag?. The core message of the template is what's important, not mere form, as in whether it resembles other maintenance templates. Alalch E. 14:48, 11 June 2023 (UTC)
I see. I did read the discussion earlier, but I seem to have missed the change on the template related to cleanup; that's my bad. Cheers. Dawnbails (talk) 14:52, 11 June 2023 (UTC)

Name of model

Is it really necessary to explicitly mention ChatGPT? Whilst I understand this template is somewhat of an anomaly due to the emerging field, I feel that naming a certain brand in a maintenance template is inappropriate. – Isochrone (T) 20:39, 12 September 2023 (UTC)

I think the idea is that LLM is too jargony for folks to understand. I agree with this idea and wouldn't mind keeping an explicit mention of ChatGPT for now. –Novem Linguae (talk) 18:37, 13 September 2023 (UTC)
Looks like all mentions of GPT/ChatGPT were recently removed by InfiniteNexus. I am still concerned that LLM is too jargony for most folks to understand. –Novem Linguae (talk) 21:59, 16 February 2024 (UTC)
That's what the link to the article is there for. InfiniteNexus (talk) 02:41, 17 February 2024 (UTC)
I would be open to adding "AI" or "artificial intelligence" in there for clarification. InfiniteNexus (talk) 02:43, 17 February 2024 (UTC)


This article uses material from the Wikipedia article Template_talk:AI_generated, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.