Wikipedia:Prosesize


Prosesize is a gadget that adds a toolbox link showing the size of a page and the number of words in it. It is a rewrite of User:Dr_pda/prosesize.js.

Installation and removal

As with most Wikipedia tools, you must be logged in to install or use the Prosesize gadget.

To install it, select it in the "Browsing" section of Special:Preferences#mw-prefsection-gadgets, then save. To remove it, first delete User:Dr pda/prosesize.js from your Special:MyPage/common.js or Special:MyPage/skin.js if you installed it there, then disable the gadget in your preferences.

Usage instructions

Once you are logged in and have installed the gadget, go to the left panel of any Wikipedia page (or the right panel if you use Vector 2022) and click "Page size" under "Tools". The link does not appear if you are not logged in. The gadget's output is shown at the top left of the page, below the article title.

Sample output

  • HTML document size: 270 kB
  • Prose size (including all HTML code): 88 kB
  • References (including all HTML code): 65 kB
  • Wiki text: 83 kB
  • Prose size (text only): 56 kB (9412 words) "readable prose size"
  • References (text only): 8241 B

Meaning of output

Summary

  • HTML document size: Size of the HTML downloaded by your browser.
  • Prose size (including all HTML code): Size of HTML within <p> tags.
  • References (including all HTML code): Size of reference HTML.
  • Wiki text: Size of wikitext (seen when editing).
  • Prose size (text only): Size and word count of text within <p> tags (called "readable prose size").
  • References (text only): Size of reference text.

HTML document size

This is the total size of the HTML document. If you went to View->Page Source (or the equivalent) in your browser and saved the resulting output to your computer, the size of the saved file would be this number. It does not include any images.

Prose size

The script counts the text within <p> tags in the HTML source of the document, which corresponds almost exactly to the definition of "readable prose". This method is not perfect, however: it may include text which isn't prose, or exclude text which is (e.g. text in {{cquote}}, or prose written in bullet-point form). The text counted as prose is highlighted in yellow, so it is easy to see whether the prose size is over- or underestimated.
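As an illustration only, the idea of counting text within <p> tags can be sketched in Python with the standard-library HTML parser. This is not the gadget's code (the gadget is JavaScript running in the browser); the class name ParagraphText, the function prose_text and the sample HTML are all invented for this example.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text that appears inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of <p> elements we are currently inside
        self.chunks = []    # text fragments found inside paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text outside any <p> (headings, list items, ...) is ignored.
        if self.depth > 0:
            self.chunks.append(data)


def prose_text(html):
    """Return the concatenated text found inside <p> elements."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Made-up sample: the heading and the bullet point are not inside <p> tags.
sample = (
    "<h2>History</h2>"
    "<p>The town was founded in <b>1850</b>.</p>"
    "<ul><li>Prose written as a bullet point.</li></ul>"
    "<p>It grew quickly.</p>"
)
print(prose_text(sample))
```

In this sample the bullet-point sentence is left out even though it reads as prose, which is the kind of underestimate described above.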

Two numbers are given for the prose size: HTML and text only. The HTML size is the size of the HTML code contained within <p> tags. This number can be compared to the file size to see how much of the document consists of readable prose. The text-only size is the size of just the words, without any formatting. (This is what you would get if you copied and pasted the prose from the article into a plain-text editor such as Notepad, which strips out all the formatting.) The word count is obtained by splitting the text on spaces.
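The difference between the two prose numbers, and the space-based word count, can be sketched as follows. This is a rough Python illustration under stated assumptions: sizes are taken as UTF-8 byte counts (the gadget may measure differently), tags are stripped with a simple regular expression, and the helper names and the sample paragraph are hypothetical.

```python
import re

TAG = re.compile(r"<[^>]+>")  # crude tag matcher, good enough for this sample


def byte_size(s):
    """Size of a string in bytes, assuming UTF-8 encoding."""
    return len(s.encode("utf-8"))


def strip_tags(html):
    """Remove HTML tags, keeping only the text between them."""
    return TAG.sub("", html)


def word_count(text):
    """Count words by splitting on spaces, ignoring empty pieces."""
    return len([w for w in text.split(" ") if w])


# Made-up content of a single <p> element.
inner_html = 'The <a href="/wiki/Moon">Moon</a> orbits <i>Earth</i>.'
text_only = strip_tags(inner_html)

html_size = byte_size(inner_html)   # "including all HTML code"
text_size = byte_size(text_only)    # "text only"
words = word_count(text_only)

print(html_size, text_size, words)
```

The link and italic markup add to the HTML size but not to the text-only size or the word count.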

References size

The HTML references size is the size of what is produced by the <references/> tag, plus the size of the HTML that produces the markers (e.g. [1]). The text-only size is again just the text of the references, plus the text of the markers. Note that the contribution of the markers is explicitly subtracted from both prose size numbers. The markers also should not affect the word count, since there should be no spaces between them and the preceding word or punctuation.
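The claim about markers and the word count can be checked with a small Python sketch. It works on plain text with a regular expression, whereas the gadget works on the page's HTML, so treat the pattern MARKER, the helper word_count and the sample sentence as assumptions made for illustration.

```python
import re

MARKER = re.compile(r"\[\d+\]")  # matches reference markers such as [1] or [12]


def word_count(text):
    """Count words by splitting on spaces, ignoring empty pieces."""
    return len([w for w in text.split(" ") if w])


# Made-up prose with markers attached directly to the preceding punctuation.
with_markers = "The sky is blue.[1] Water is wet.[2][3]"
without_markers = MARKER.sub("", with_markers)

# Bytes contributed by the markers; this is the part counted as references
# rather than as prose.
marker_bytes = len(with_markers.encode("utf-8")) - len(without_markers.encode("utf-8"))

print(word_count(with_markers), word_count(without_markers), marker_bytes)
```

Because no space separates a marker from the word before it, removing the markers changes the byte size but leaves the word count the same.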

Wikitext size

In addition to the above numbers, which are calculated from the HTML source of the page, there is also the size of the text plus wiki markup which appears in the edit box when you edit a page. This number is shown next to each revision on the History tab. The script queries the API to retrieve this value for the current article.
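One way to ask the MediaWiki API for a page's wikitext size is a query with prop=revisions and rvprop=size. The Python sketch below builds such a request URL and reads the size out of a response. The exact request the gadget sends is not stated here, so the parameters chosen are an assumption, the function names are hypothetical, and the sample response is made up rather than fetched.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_size_query(title):
    """Build a URL asking for the size of the latest revision of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "size",        # size of the revision's wikitext in bytes
        "titles": title,
        "format": "json",
        "formatversion": "2",    # pages are returned as a list
    }
    return API_ENDPOINT + "?" + urlencode(params)


def wikitext_size(response):
    """Extract the revision size from a parsed formatversion=2 response."""
    page = response["query"]["pages"][0]
    return page["revisions"][0]["size"]


# Made-up response with the expected shape; not real data.
sample_response = json.loads(
    '{"query": {"pages": [{"pageid": 1, "ns": 0, "title": "Example",'
    ' "revisions": [{"size": 83000}]}]}}'
)

print(build_size_query("Example"))
print(wikitext_size(sample_response))
```

The sketch performs no network request; fetching the URL and passing the decoded JSON to wikitext_size would be the remaining step.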


This article uses material from the Wikipedia article Wikipedia:Prosesize, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.