Wikidata:Open data publishing

This page provides an overview of open data for data publishers, including information on database rights, best practices and adding data to Wikidata.

Open data definition

  • Open data is data that anyone can freely and easily access, use and share. Whether data is open is determined by both its license and the formats in which it is made available.
  • Linked open data is open data that is linked to other data sets.

Resources

About Wikidata

Wikidata is a multilingual free knowledge base about the world that can be read and edited by humans and machines alike. The data on Wikidata is added by a community of volunteers, both manually and by using software, much like other Wikimedia projects including Wikipedia. Wikidata has millions of items, each representing an entity such as a person, a place, an artwork or an abstract concept.

Resources

Database rights

Databases can fall under copyright and/or sui generis database rights depending on which country they were created in. Individual facts cannot be protected using database rights or copyright.

  • Copyright is a legal right created by the law of a country that grants the creator of an original work exclusive rights for its use and distribution.
  • Sui generis database rights are comparable to, but distinct from, copyright. They exist to recognise the investment made in compiling a database, even when this does not involve the "creative" aspect that copyright reflects. It is unclear how much of a database can be copied before database rights are infringed.

Copyright and sui generis database rights differ by country:

  • In the United States of America a database is protected by copyright when the selection or arrangement is original and creative.
  • In the European Union databases are protected under the Database Directive which provides a sui generis database right for "the initiative and the risk of investing" in "obtaining, verifying or presenting the contents" by deploying "financial resources" or expending "time, effort and energy".

For data to be considered open data, it must be free to access, use and share. Wikidata uses the Creative Commons CC0 public domain dedication, which allows people to use the data without restrictions so that it can reach the largest possible audience. Wikidata is designed to attribute the source of the data for each fact it holds.

Resources

Benefits of open data

Common benefits of open data include:

  • Transparency
  • Releasing social and commercial value
  • Participation and engagement

Open Knowledge International identifies open data as being a contributor to:

  • Meeting global challenges
  • Enhancing research, science, and culture
  • Strengthening citizens, democratic accountability and governance
  • Holding business accountable to consumers

Resources

Licensing

The Creative Commons licensing tool provides licenses for both open data and other kinds of content. It also generates HTML code for the chosen license that you can add to your web pages.

Open data formats best practice

Tim Berners-Lee, the inventor of the Web, has suggested a 5-star deployment scheme for open data.

the "five stars of open data"
Number of stars | Description | Properties | Example format
★ | make your data available on the Web (whatever format) under an open license | Open license | PDF
★★ | make it available as structured data (e.g., Excel instead of image scan of a table) | Open license; Machine readable | XLS
★★★ | make it available in a non-proprietary open format (e.g., CSV instead of Excel) | Open license; Machine readable; Open format | CSV
★★★★ | use URIs to denote things, so that people can point at your stuff | Open license; Machine readable; Open format; Data has URIs | RDF
★★★★★ | link your data to other data to provide context | Open license; Machine readable; Open format; Data has URIs; Linked data | LOD

Open data producers can use Wikidata IDs as identifiers in their datasets to make the data 5-star linked open data. Importing data into Wikidata also makes it 5-star data. The more stars the data has, the easier it is to import into Wikidata; in practice, the minimum required is 2 stars.
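As a minimal sketch of using Wikidata IDs as identifiers, the snippet below reads a small CSV table whose rows carry a Wikidata ID and derives the entity URI for each row, which is the step that turns a local identifier into a link to other data. The column names (`name`, `wikidata_id`) and the sample rows are assumptions made for illustration; the URI pattern `http://www.wikidata.org/entity/<ID>` is the one Wikidata uses for its entities.

```python
import csv
import io
import re

# URI prefix that Wikidata uses for its entities.
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"

# Item IDs are the letter Q followed by a positive integer, e.g. Q42.
ITEM_ID_PATTERN = re.compile(r"Q[1-9][0-9]*")

# Hypothetical sample data; the column names are assumptions for this sketch.
SAMPLE_CSV = """name,wikidata_id
Douglas Adams,Q42
Berlin,Q64
"""


def wikidata_uri(item_id: str) -> str:
    """Return the entity URI for a Wikidata item ID such as 'Q42'."""
    item_id = item_id.strip()
    if not ITEM_ID_PATTERN.fullmatch(item_id):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return WIKIDATA_ENTITY_PREFIX + item_id


def link_rows(csv_text: str) -> list[dict[str, str]]:
    """Read CSV text and add a 'wikidata_uri' field to every row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        linked = dict(row)
        linked["wikidata_uri"] = wikidata_uri(row["wikidata_id"])
        rows.append(linked)
    return rows


if __name__ == "__main__":
    for row in link_rows(SAMPLE_CSV):
        print(row["name"], row["wikidata_uri"])
```

Rejecting malformed IDs early is a deliberate choice here: a dataset that silently emits broken URIs no longer links to anything.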

Resources

Labeling

The Linked Data Research Centre (LiDRC) Laboratory badges are available to display on a web page to indicate the 5 Star Open Data rating.

Recognition

Open Data Institute certificates recognise well-published open data.

Organisations working on Open Data

There are several organisations working on open data including:

Open data platforms

There are many options for publishing open data, which fall into two categories:

  1. Self-publishing on your own website
  2. Publishing on external data platforms

Software for hosting data

  • CKAN: a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data.
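As a rough sketch of how software can find data on a CKAN site, the snippet below builds a dataset search URL for CKAN's Action API (`/api/3/action/package_search`). The portal address `https://data.example.org` and the search term are placeholders rather than a real site, and sending the request is left out.

```python
from urllib.parse import urlencode

# Hypothetical CKAN portal; replace with the address of a real CKAN site.
PORTAL_URL = "https://data.example.org"


def package_search_url(portal_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API URL that searches datasets for `query`."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


if __name__ == "__main__":
    print(package_search_url(PORTAL_URL, "air quality", rows=5))
```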

External platforms

Organisations producing open data

Thousands of organisations produce open data:

Adding data to Wikidata

Once data has been published it can be added to Wikidata by: