Wikipedia:Size of Wikipedia


The size of the English Wikipedia can be measured in terms of the number of articles, number of words, number of pages, and the size of the database, among other ways. As of 26 April 2024, there are 6,816,939 articles in the English Wikipedia containing over 4.5 billion words (giving an average of about 668 words per article). The total number of pages is 60,533,450. Articles make up 11.26 percent of all pages on Wikipedia.[1] As of 2 July 2023, the size of the current version of all articles compressed is about 22.14 GB without media.[2][3]

A treemap-like breakdown of Wikipedia's topic areas as of February 2016, based on a random sampling of 1,000 articles
An image estimating the size of a printed version of Wikipedia as of March 2020 (from an automatically updated image based on using volumes of Encyclopædia Britannica with a silhouette of an average man for scale)

Wikipedia continues to grow: as of January 2024, the number of articles is increasing by about 14,000 a month. The number of articles added each month peaked in 2006, at over 50,000 new articles a month, and has been slowly but steadily declining since then. While this might suggest that Wikipedia's growth is slowing or stopping, the amount of text added to Wikipedia articles every year has been constant since 2006, at roughly 1 gigabyte of (compressed) text per year. This implies that, as time progresses, proportionally more content is added to existing articles than to new articles, and that Wikipedia has maintained the same rate of growth since the 2010s. In other words, over time, the average article size is growing faster than the number of articles.

Most of the earlier entries were extracted from Wikipedia:Milestones. Later entries are taken from observations of the new software's built-in article count features. For information on what Wikipedia's software counts as an article, see Wikipedia:What is an article#Lists of articles and statistics.

The article counts of bot-generated Wikipedias, such as the Cebuano-, Swedish-, Dutch-, and Waray-language editions, grow much faster than those of Wikipedias that are primarily written by humans, such as the English Wikipedia. Swedish Wikipedian Sverker Johansson's Lsjbot is the primary author of those four primarily bot-generated Wikipedias. Cebuano and Waray are languages of the Philippines. However, individual articles in bot-generated Wikipedias are on average much shorter than those in primarily human-written Wikipedias.[4] Thus, article count alone is a very poor indicator of the scale and scope of a Wikipedia edition.

Wikipedia growth of article count

Before 2012, Wikipedia's growth approximately followed a Gompertz growth model. This model was created in June 2010, and it is determined by the Gompertz function,

$$y(t) = a\,e^{b\,e^{ct}},$$

with parameters

a = 4378449
b = −15.42677
c = −0.384124
t is the time in years since 2000/1/1 (so 2010/1/1 is t = 10.00)

and where e is the constant 2.71828... (Euler's number).
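The model can be evaluated directly from the parameters listed above. The following is a minimal sketch, assuming the standard three-parameter Gompertz form y(t) = a·exp(b·exp(c·t)); the function name `gompertz_articles` is illustrative and not part of any Wikipedia tooling.

```python
import math

# Parameters quoted in the text (model fitted in June 2010).
A = 4378449
B = -15.42677
C = -0.384124


def gompertz_articles(t_years: float) -> float:
    """Modelled article count; t_years is the time in years since 2000/1/1.

    Assumes the standard Gompertz form a * exp(b * exp(c * t)).
    """
    return A * math.exp(B * math.exp(C * t_years))


if __name__ == "__main__":
    # Print the modelled count at the start of a few years.
    for year in (2005, 2010, 2015):
        print(year, round(gompertz_articles(year - 2000)))
```

Under this assumed form, the model gives a little over 3.1 million articles for t = 10.00 (2010/1/1), and it approaches the value of a as t grows.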

Number of English Wikipedia articles[5]
English Wikipedia editors with >100 edits per month[6]

Some characteristics of this model are:

  • a pivot point at which the growth is at its peak. For en.wikipedia.org (English Wikipedia), this might have been in August 2006 with 60,000 new articles per month.
  • a maximum number of articles of about 4.4 million (as determined by parameter a of the model). The model does not account for the fact that there will always be new events, people, organizations, objects, works, locations, concepts, and receptions to describe in the future, and the actual number of English Wikipedia articles exceeded this maximum in December 2013.
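Both characteristics can be derived from the fitted parameters. The sketch below again assumes the standard Gompertz form y(t) = a·exp(b·exp(c·t)); for that form the growth rate peaks where b·exp(c·t) = −1, and the rate at that point is −a·c/e per year. The function names are illustrative.

```python
import math

# Parameters quoted in the text.
A = 4378449      # upper asymptote of the model (maximum number of articles)
B = -15.42677
C = -0.384124


def inflection_time() -> float:
    """Years since 2000/1/1 at which the modelled growth rate peaks.

    The second derivative of a*exp(b*exp(c*t)) vanishes where b*exp(c*t) = -1.
    """
    return math.log(-1.0 / B) / C


def peak_rate_per_month() -> float:
    """Modelled new articles per month at the inflection point.

    At the inflection point dy/dt = -a*c/e (articles per year).
    """
    return -A * C / math.e / 12.0


if __name__ == "__main__":
    print(f"inflection at t = {inflection_time():.2f} years")
    print(f"peak rate = {peak_rate_per_month():.0f} articles per month")
```

With these parameters the fitted curve's pivot point falls at about t = 7.12 with a peak of roughly 51,600 articles per month. These are properties of the fitted curve only, so they need not coincide exactly with the observed peak mentioned above.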

This model is related to the quantity (number of articles). The quality might still increase independently depending on the individual article. The model does not account for article size.

Graphs of size and growth rate

In this section, the first graph shows the historical and expected total number of articles. The second graph shows the monthly growth rate, which has been slowing since late 2006 (the series trends downward).

Detailed analysis of the data shows that from 2006 to 2009 the article growth rate followed a six-monthly cycle with faster growth in February and August than in May and November. This cycle does not appear in the growth-rate graph here because the values shown in the graph have been averaged over periods of six months.

The final graph in this section shows content page growth (i.e. including articles and other pages) to May 2019: note the near-linear growth trend since 2018.

Note the small drop in article count from late August 2022 to early September 2022.

Annual growth rate for the English Wikipedia


At this rate, there are 348 days until the English Wikipedia reaches 7 million articles.
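The growth rate behind this estimate is not reproduced here, but the arithmetic is a simple division. The sketch below is a hedged illustration: it uses the article count quoted on this page (6,816,939), backs out the daily rate implied by the 348-day figure, and, for comparison, applies the "about 14,000 a month" rate mentioned earlier. The function names and the 365.25-day year are assumptions.

```python
import math

CURRENT_ARTICLES = 6_816_939   # count quoted on this page (26 April 2024)
MILESTONE = 7_000_000


def implied_daily_rate(milestone: int, current: int, days: int) -> float:
    """Articles per day needed to reach the milestone in the given number of days."""
    return (milestone - current) / days


def days_until(milestone: int, current: int, articles_per_day: float) -> int:
    """Whole days until the milestone at a constant daily rate."""
    return math.ceil((milestone - current) / articles_per_day)


if __name__ == "__main__":
    # Daily rate implied by the 348-day estimate.
    print(implied_daily_rate(MILESTONE, CURRENT_ARTICLES, 348))
    # Assumption: 14,000 articles a month, converted with a 365.25-day year.
    per_day = 14_000 * 12 / 365.25
    print(days_until(MILESTONE, CURRENT_ARTICLES, per_day))
```

The 348-day figure corresponds to roughly 526 new articles per day; a constant 14,000 a month would instead give an estimate of a little under 400 days.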

Number of words

As of February 2024, there are more than 4.5 billion words in all English Wikipedia articles, about 670 words per article, and about 27 billion characters, assuming that each word is six characters long (five letters for each word on average plus a space or punctuation mark). For the most recent word count, please see the Special:Statistics page.

The table below shows only the number of words in all content pages, meaning the 6,816,939 articles; it does not include words in other namespaces like Talk, User, or Wikipedia. Data for 2002 through 2010 is from the old Wikistats-1 and thus only precise to the month rather than a specific day within a month. Data for 2018 to the present is from the Special:Statistics page, as saved on that date by the Internet Archive. There is no record of the number of words from January 2010 to December 2017; Wikistats-1 no longer includes the number of words after January 2010, and the Special:Statistics page only started showing the number of words in all content pages in December 2017. Some time within that almost eight-year span, the average number of words dropped. Note that the Internet Archive does not always have an archived version of the Special:Statistics page on the first day of each year.

Yearly statistics

A Average increase per year from 2010 to 2018; total increase of 1,279,581,000 words over the same time period
B Average percent increase per year from 2010 to 2018; overall increase of 71% over the same time period
C Average increase per day from 2010 to 2018
D So far this year

Monthly statistics since January 2019

The table below gives, for the start (or near the start) of each month since January 2019, the total number of words in all articles and the number of words added.


Number of pages

As of 26 April 2024, there are 60,533,450 pages in the English Wikipedia, of which 6,816,939 (11.26%) are articles, which are found in the main namespace, or simply mainspace. The rest of the pages belong to one of the other 11 namespaces, listed here in alphabetical order: Category, Draft, File, Help, MediaWiki, Module, Portal, Template, TimedText, User, and Wikipedia. Each page on Wikipedia can have a corresponding talk page as well. The data for this table is from the Special:Statistics page, as saved on that date by the Internet Archive. Note that the Internet Archive does not always have an archived version of the Special:Statistics page on the first day of each year. Over time, the percentage of all pages on Wikipedia that consist of articles drops as more non-article pages are created than articles.


Size of the English Wikipedia database

Total article text in English Wikipedia, measured in gigabytes (compressed).[7]

There are various elements of the Wikipedia database to consider when describing its size. The most obvious include the markup text of the articles, templates, media/file descriptions, and primary meta-pages that would be needed to render the text of the latest version of the current encyclopedia proper. The associated talk pages and the other namespaces (User, Wikipedia, Help, etc.) are often considered separately. Each of these has an associated edit history, etc. In addition, there are the images and other multimedia (stored in common across all Wikipedias). It is important to take into account whether the data is compressed and if so what compression scheme is used. Besides the English Wikipedia, there are hundreds of Wikipedias in other languages to consider, as well.

In April 2010, the size of the full English Wikipedia edit history was 5.6 TB uncompressed.[8]

As of June 2015, the dump of all pages with complete edit history in XML format (enwiki dump of 20150602) is about 100 GB compressed using 7-Zip, and about 10 TB uncompressed.

As of May 2015, the current version of the English Wikipedia article / template / redirect text was about 51 GB uncompressed in XML format.

The size of the article text in the English Wikipedia, measured in gigabytes (compressed), grew steadily from 1 GB in 2006 to 9 GB in 2013 to 11.5 GB in 2015, as shown in the chart. However, due to an error in compiling the data dump for April 2016, the size of the article text appeared to shrink by approximately 9 percent to 10.8 GB compressed (though the actual size of the article text grew, as can be seen by comparing the March 2016 and May 2016 data dumps).[9] A similar error affected the April 20, 2018 data, which shows 12.85 GB (again, the actual size of the article text grew, as can be seen by comparing the previous and following months).[10]

As of February 2013, the XML file containing current pages only, no user or talk pages, was 42,987,293,445 bytes uncompressed (43 GB). The XML file with current pages, including user and talk pages, was 93,754,003,797 bytes uncompressed (94 GB). The full history dumps, all 174 files of them, took 10,005,676,791,734 bytes (10 TB).[11]

As of August 2023, Wikimedia Commons, which includes the images, videos, and other media used across all the language-specific Wikipedias, contained 96,519,778 files totalling 470,991,810,222,099 bytes (about 471 TB, or 428.36 TiB).[12]
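The byte counts in this section are converted with two different conventions: the February 2013 dump sizes use decimal units (1 TB = 10^12 bytes), while the 428.36 figure for Wikimedia Commons matches a binary conversion (2^40 bytes per unit). A short sketch makes the difference explicit; the helper names are illustrative.

```python
def to_decimal_tb(n_bytes: int) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return n_bytes / 10**12


def to_binary_tib(n_bytes: int) -> float:
    """Convert bytes to binary tebibytes (1 TiB = 2**40 bytes)."""
    return n_bytes / 2**40


if __name__ == "__main__":
    commons_bytes = 470_991_810_222_099       # Wikimedia Commons, August 2023
    history_bytes = 10_005_676_791_734        # full history dumps, February 2013
    print(f"Commons: {to_decimal_tb(commons_bytes):.2f} TB "
          f"= {to_binary_tib(commons_bytes):.2f} TiB")
    print(f"History dumps: {to_decimal_tb(history_bytes):.2f} TB "
          f"= {to_binary_tib(history_bytes):.2f} TiB")
```

When comparing sizes across dates, it helps to convert everything from the raw byte counts using a single convention.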

Other sources for recent size estimates are:

Comparisons with other Wikipedias

Distribution of the 62,886,707 articles in different language editions (as of 26 April 2024)[13]

  English (10.8%)
  Cebuano (9.7%)
  German (4.6%)
  French (4.1%)
  Swedish (4.1%)
  Dutch (3.4%)
  Russian (3.1%)
  Spanish (3.1%)
  Italian (3%)
  Egyptian Arabic (2.6%)
  Other (51.5%)

Codes: en - English, es - Spanish, de - German, ja - Japanese, ru - Russian, fr - French, it - Italian, pl - Polish, pt - Portuguese, zh - Chinese

This graph is based on data from https://stats.wikimedia.org/EN/TablesArticlesTotal.htm as of 14 June 2015, with recent values for the English Wikipedia taken from the data below. The sum includes all 270+ Wikipedia languages. See the front page at https://www.wikipedia.org for a recent article count for the ten largest Wikipedias.

The English edition remains the largest Wikipedia, slightly larger than the second-largest edition, the Cebuano Wikipedia. Many other editions shared the quasi-exponential growth of the English edition, though lagging one to three years behind. As these other Wikipedias have grown, the overall percentage of articles in English has been steadily decreasing, and it fell below 25 percent in March 2007. The percentage of articles in the ten largest Wikipedias has also been decreasing, although these top ten still account for about 67 percent of all Wikipedia articles as of June 2007. Note that Lsjbot, a bot run by Sverker Johansson, is responsible for much of the growth of the second- and fifth-largest Wikipedias, the Cebuano and the Swedish Wikipedias, respectively, as well as the rapid growth of the Waray Wikipedia. The charts don't show the Cebuano, Swedish, or the Waray Wikipedias. Those three Wikipedias' article count growth primarily consists of stubs pertaining to living organisms and geographical entities (such as islands, rivers, dams, and mountains).

Currently, the English Wikipedia makes up 10.84 percent of all articles in all editions of Wikipedia.
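The percentage shares quoted on this page follow from the raw counts. The sketch below recomputes two of them from the figures dated 26 April 2024; the constant and function names are illustrative.

```python
ENGLISH_ARTICLES = 6_816_939        # English Wikipedia articles
ENGLISH_PAGES = 60_533_450          # all English Wikipedia pages
ALL_EDITIONS_ARTICLES = 62_886_707  # articles across all language editions


def percent_share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Share of all Wikipedia articles that are in the English edition.
    print(round(percent_share(ENGLISH_ARTICLES, ALL_EDITIONS_ARTICLES), 2))
    # Share of English Wikipedia pages that are articles.
    print(round(percent_share(ENGLISH_ARTICLES, ENGLISH_PAGES), 2))
```

Because both shares depend on counts that change daily, they are only meaningful together with the date on which the counts were taken.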

As of October 2023, the English Wikipedia's database is just over 10 times larger than that of the next-largest Wikipedia by article count, the Cebuano Wikipedia.

Chronology of software versions

  • Phase I UseMod Wiki-based software: January 10, 2001 – January 25, 2002
  • Phase II PHP-based software: January 25, 2002 – July 20, 2002
  • Phase III PHP-based software: July 20, 2002 – present

The figures in this data set are drawn from multiple data sources and different estimates (see the key below for details) and are presented as a spreadsheet-ready table for graphing. The original data sets are archived: see the links below. Note also that the figures are sampled at random times of day.

Hard copy size

In early 2015, Michael Mandiberg published the English Wikipedia in 7473 volumes of 700 pages each via Lulu, an online e-books and print self-publishing platform, distributor, and retailer.[14]

The following graphic illustrates how big the English Wikipedia might be if the articles (without images and other multimedia content) were to be printed and bound in book form with a format similar to Encyclopædia Britannica. Each volume is assumed to be 25 cm (9.8 in) tall, 5 cm (2.0 in) thick, and containing 1,600,000 words or 8,000,000 characters. The size of this illustration is based upon the live article count manually adjusted by the average word count on an irregular basis.
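The volume count can be approximated from the figures on this page. The sketch below is an illustration under stated assumptions, not the method used to generate the graphic: it takes the rounded figure of 4.5 billion words, six characters per word (the assumption used in the word-count section), and 8,000,000 characters per volume. Note that a volume of 1,600,000 words at 8,000,000 characters implies five characters per word, so a word-based count would come out lower.

```python
WORDS = 4_500_000_000           # rounded word count quoted on this page
CHARS_PER_WORD = 6              # assumption from the word-count section
CHARS_PER_VOLUME = 8_000_000    # from the volume description above
VOLUME_THICKNESS_CM = 5         # from the volume description above


def volumes_needed(words: int, chars_per_word: int, chars_per_volume: int) -> int:
    """Number of volumes, rounded up, needed to hold the given text."""
    total_chars = words * chars_per_word
    return -(-total_chars // chars_per_volume)   # ceiling division on integers


if __name__ == "__main__":
    n = volumes_needed(WORDS, CHARS_PER_WORD, CHARS_PER_VOLUME)
    print(n, "volumes")
    print(n * VOLUME_THICKNESS_CM / 100, "metres of shelf space")
```

With these inputs the estimate is 3,375 volumes, close to the count shown in the graphic, which is adjusted manually on an irregular basis.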

3363 volumes in 17 stacks

The data set

Key to the data below:

  • approx: this figure is an approximation
  • lowerbound: there were at least this many pages
  • mpac3.2: main page article count from the Phase III software following an adjustment to the count on March 29, 2015
  • mpac3.1: main page article count from the Phase III software from May 25, 2003, to March 28, 2015: article namespace, not redirects, containing at least one internal wiki link
  • mpacIII: main page article count from the Phase III software up to May 22, 2003: article namespace, comma, not redirect
  • mpacII: main page article count from the Phase II software
  • spII: stats page article count from the Phase II software
  • all: total of all pages of any sort
  • commapp: pages that include a comma, a crude way of finding "real" articles
  • conscnt: "conservative count" taken by removing the count of various types of non-article from the comma page count
  • MF: Malcolm Farmer
  • LMS: Larry Sanger
  • WA: Wikipedia:Announcements

The data set has been extended and annotated with (somewhat gnomic) source information. Sampling times are only recorded to the day given by the user recording the entry, and there is no clear time-zone information for that day. However, most of the more recent counts (up to 2022) were taken at the start of the day (UTC) from the List of Wikipedias table on Meta-Wiki. Since 2023, the counts have been taken at around the same time of day, though not necessarily at midnight UTC, from the new List of Wikipedias table on Wikimedia Commons, which leaves a revision record in its history page.

Note: The current mpac3.2 article count for the English-language Wikipedia is 6,816,939 articles


These pages hold the earlier source data in its original ad-hoc tabular format:


References

  1. Calculated as follows: 6,816,939 articles / 60,533,450 pages * 100 = 11.26 percent
  2. "Index of /enwiki/latest/". Wikimedia Downloads. Wikimedia Foundation. Retrieved 1 February 2021.
  3. Click on the link in reference 2 and search for "pages-articles-multistream.xml.bz2" to see the current size of the database (as defined as the current version of all articles compressed). See Wikipedia:Database download#Where do I get it? for more information.
  4. In November 2021, the English Wikipedia had 624 words per article, versus 195 words per article for Cebuano, 193 for Swedish, 169 for Dutch, and 117 for Waray-Waray. See Wikipedia:Wikipedia Signpost/2021-12-28/By the numbers.
  5. "Wikistats - Statistics For Wikimedia Projects". stats.wikimedia.org. Wikimedia Foundation. Retrieved 11 February 2022.
  6. "Full history dump for English Wikipedia is back – Infodisiac". Archived from the original on 28 August 2018. Retrieved 1 January 2017.
  7. "7,473 volumes at 700 pages each: meet Print Wikipedia". Wikimedia blog. 19 June 2015. Retrieved 2 July 2015.


This article uses material from the Wikipedia article Wikipedia:Size_of_Wikipedia, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.