Posted by: bluesyemre | April 6, 2018

What are the ten most cited sources on #Wikipedia? Let’s ask the data


A new dataset of fifteen million records documents source usage in Wikipedia by identifier and across languages.

In the Spanish Wikipedia, at the top of the list of most-cited sources in articles you’ll find: a “Catalog of Fishes”, a dictionary of minor planets, an encyclopedia of Argentinian films, a field guide to the songbirds of South America, and an atlas of Spanish popular culture.

Citations are the foundation of Wikipedia’s reliability: they trace the connection between content added by our community of volunteer contributors and its sources. For readers, citations provide a mechanism to validate and check for themselves that what Wikipedia says is sound and trustworthy: they act as a gateway towards a broader ecosystem of reliable knowledge. In an effort to spearhead more research on where Wikipedia gets its facts from, and to celebrate Open Citations Month, we asked ourselves: what are the most cited sources across all of Wikipedia’s language editions?

To answer this question, we published a dataset of every citation referencing an identifier across all 297 Wikipedia language editions. The dataset breaks down sources cited in each language by identifier: a PMID or PMC ID (for articles in the biomedical literature), a DOI (for scholarly papers), an ISBN (for book editions), or an arXiv ID (for preprints).

What’s in the data?

The full dataset, extracted from the March 1, 2018 Wikipedia content dumps, includes a total of 15,693,732 records and shows important variations across languages in the kinds of sources volunteer contributors cite. The dataset only includes citations that carry an identifier, so it does not reflect every citation on Wikipedia: many more publications are cited without referencing any identifier. (Our next analysis will be able to tell you what percentage of total citations this dataset represents.)

The ten most cited sources across all languages:

  1. Updated world map of the Köppen-Geiger climate classification: 2,830,341 citations []
  2. Prediction of Hydrophobic (Lipophilic) Properties of Small Organic Molecules Using Fragment Methods: An Analysis of AlogP and CLogP Methods: 21,350 citations []
  3. The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC): 20,247 citations []
  4. The de Vaucouleurs Atlas of Galaxies: 19,068 citations [ISBN: 9780521820486]
  5. The Complete New General Catalogue and Index Catalogues of Nebulae and Star Clusters by J. L. E. Dreyer: 19,060 citations [ISBN: 9780933346512]
  6. Galaxies and How to Observe Them: 19,058 citations [ISBN: 9781852337520]
  7. A Concise History of Romania: 15,597 citations [ISBN: 9780521872386]
  8. Catalog of Fishes, California Academy of Sciences: 11,980 citations [ISBN: 0940228475]
  9. Dictionary of Minor Planet Names: 10,651 citations [ISBN: 9783540002383]
  10. National and religious composition of the population of Croatia, 1880-1991: By settlements: 8,230 citations [ISBN: 9789536667079]
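
For readers who want to build a ranking like the one above from the published records, here is a minimal sketch in Python. It assumes the data is tab-separated with one row per citation and two columns named `type` and `id`. Those column names, the `top_cited` helper, and the tiny inline sample are illustrative assumptions, not the dataset's documented layout, and the sample counts are made up.

```python
import csv
import io
from collections import Counter


def top_cited(rows, n=10):
    """Return the n most frequent (identifier type, identifier) pairs.

    `rows` is any iterable of dicts with "type" and "id" keys
    (hypothetical column names chosen for this sketch).
    """
    counts = Counter((row["type"], row["id"]) for row in rows)
    return counts.most_common(n)


# Made-up sample standing in for the real records: one row per citation.
SAMPLE = (
    "type\tid\n"
    "isbn\t9780521820486\n"
    "isbn\t9783540002383\n"
    "isbn\t9780521820486\n"
    "isbn\t9780521820486\n"
    "isbn\t9783540002383\n"
    "isbn\t9780521872386\n"
)

if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
    for (id_type, identifier), count in top_cited(reader, n=3):
        print(f"{id_type}:{identifier}\t{count} citations")
```

To run this against a real file, you would swap the `io.StringIO(SAMPLE)` object for an open file handle and adjust the column names to match the file's header. Counting on the pair of type and identifier keeps, say, a DOI and an ISBN that happen to share a string from being merged.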
