
Wikipedia’s Next Big Thing: Wikidata, A Machine-Readable, User-Editable Database Funded By Google, Paul Allen And Others

Wikidata, the first new project to emerge from the Wikimedia Foundation since 2006, is now beginning development. The organization, best known for its user-edited encyclopedia, Wikipedia, recently announced the new project at February’s Semantic Tech & Business Conference in Berlin, describing Wikidata as a new effort to provide a database of knowledge that can be read and edited by humans and machines alike.

There have been earlier attempts to build a semantic database from Wikipedia’s data: for example, DBpedia, a community effort to extract structured content from Wikipedia and make it available online. The difference is that, with Wikidata, the data won’t just be made available; it will also be editable by anyone.

Developing a semantic, machine-readable database doesn’t just help push the web forward; it also helps Wikipedia itself. The data will bring all the localized versions of Wikipedia on par with each other in terms of the basic facts they house. Today, the English, German, French and Dutch versions offer the most coverage, with other languages falling much further behind.

Wikidata will also enable users to ask different types of questions, such as “Which of the world’s ten largest cities have a female mayor?” Queries like this are answered today by user-created Wikipedia lists, that is, manually created structured answers. Wikidata, on the other hand, will be able to create these lists automatically.
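
To make the idea concrete, here is a minimal Python sketch of how such a question could be answered once the facts exist as structured records. It is an illustration only, not Wikidata’s actual query interface: the `City` record, its field names, the `largest_cities_with_female_mayor` function and the sample cities are all hypothetical placeholders, and the population figures are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    """Hypothetical structured record; the field names are assumptions."""
    name: str
    population: int
    mayor_gender: str  # "female", "male", or "unknown"


# Placeholder data for illustration only, not real figures.
CITIES = [
    City("City A", 9_000_000, "female"),
    City("City B", 12_000_000, "male"),
    City("City C", 7_500_000, "female"),
    City("City D", 3_000_000, "female"),
]


def largest_cities_with_female_mayor(cities, top_n=10):
    """Return the names of the top_n most populous cities whose mayor is female."""
    # Rank by population first, then filter, so the filter applies
    # only to the largest cities rather than to every city.
    largest = sorted(cities, key=lambda c: c.population, reverse=True)[:top_n]
    return [c.name for c in largest if c.mayor_gender == "female"]


print(largest_cities_with_female_mayor(CITIES, top_n=3))
```

With the placeholder data, restricting the ranking to the three largest cities leaves out City D even though its mayor is female. A hand-maintained list has to get that ordering right every time a population figure or a mayor changes.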

The initial effort to create Wikidata is being led by the German chapter of Wikimedia, Wikimedia Deutschland, whose CEO Pavel Richter calls the project “ground-breaking,” and describes it as “the largest technical project ever undertaken by one of the 40 international Wikimedia chapters.” Much of the early experimentation which resulted in the Wikidata concept was done in Germany, which is why it’s serving as the base of operations for the new undertaking.

The German chapter will perform the initial development of Wikidata, then hand over operation and maintenance to the Wikimedia Foundation once that work is complete. The hand-off is expected to occur a year from now, in March 2013.

The overall project will have three phases, the first of which involves creating one Wikidata page for each Wikipedia entry across Wikipedia’s more than 280 supported languages. This will provide the online encyclopedia with one common source of structured data that can be used in all articles, no matter which language they’re in. For example, the date of someone’s birth would be recorded and maintained in one place: Wikidata. Phase one will also involve centralizing the links between the different language versions of Wikipedia. This part of the work will be finished by August 2012.

In phase two, editors will be able to add and use data in Wikidata, and this will be available by December 2012. Finally, phase three will allow for the automatic creation of lists and charts based on the data in Wikidata, which can then populate the pages of Wikipedia.

As for how Wikidata will affect Wikipedia’s user interface, the plan is for the data to live in the “info boxes” that run down the right-hand side of a Wikipedia page (for example, those on the right side of NYC’s page). The data will be entered at data.wikipedia.org, which will then drive the info boxes wherever they appear, across languages and in other pages that use the same info boxes. However, because the project is just now going into development, some of these details may change.
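
The single-source idea described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the project’s design: the `Item` class, the `LABELS` table, the `render_infobox` function, the property name `date_of_birth` and the example person are all hypothetical, and the date is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical central record: facts stored once, shared by all languages."""
    item_id: str
    facts: dict = field(default_factory=dict)      # property -> value
    sitelinks: dict = field(default_factory=dict)  # language code -> article title


# Per-language display labels for properties (illustrative only).
LABELS = {
    "en": {"date_of_birth": "Born"},
    "de": {"date_of_birth": "Geboren"},
}


def render_infobox(item, lang):
    """Build a plain-text info box for one language from the shared facts."""
    lines = [item.sitelinks[lang]]
    for prop, value in item.facts.items():
        # Fall back to the raw property name if no label is defined.
        label = LABELS[lang].get(prop, prop)
        lines.append(f"{label}: {value}")
    return "\n".join(lines)


# Placeholder person and date, not real data.
person = Item(
    item_id="item-1",
    facts={"date_of_birth": "1900-01-01"},
    sitelinks={"en": "Example Person", "de": "Beispielperson"},
)

print(render_infobox(person, "en"))
print(render_infobox(person, "de"))
```

Because both info boxes read from the same `facts` dictionary, correcting the birth date once changes every language version, and the `sitelinks` mapping plays the role of the centralized cross-language links planned for phase one.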

Below, an early concept for Wikidata:

All the data contained in Wikidata will be published under a free Creative Commons license, which opens it up for use by any number of external applications, including e-government, the sciences and more.

Dr. Denny Vrandečić, who joined Wikimedia from the Karlsruhe Institute of Technology, is leading a team of eight developers to build Wikidata, and is joined by Dr. Markus Krötzsch of the University of Oxford. Krötzsch and Vrandečić, notably, were both co-founders of the Semantic MediaWiki project, which has pursued goals similar to those of Wikidata over the past few years.

The initial development of Wikidata is being funded through a donation of 1.3 million euros, half of which comes from the Allen Institute for Artificial Intelligence, an organization established by Microsoft co-founder Paul Allen in 2010. The goal of the Institute is to support long-range research activities that have the potential to accelerate progress in artificial intelligence, which includes web semantics.

“Wikidata will build on semantic technology that we have long supported, will accelerate the pace of scientific discovery, and will create an extraordinary new data resource for the world,” says Dr. Mark Greaves, VP of the Allen Institute.

Another quarter of the funding comes from the Gordon and Betty Moore Foundation, through its Science program, and another quarter comes from Google. According to Google’s Director of Open Source, Chris DiBona, Google hopes that Wikidata will make large amounts of structured data available to “all.” (All meaning, of course, Google itself, too.)

This ties back to all those vague reports of “major changes” coming to Google’s search engine in the coming months, seemingly published far ahead of any actual news (like this), possibly in a bit of a PR push to take the focus off the growing criticism surrounding Google+…or possibly to simply tease the news by educating the public about what the “semantic web” is.

Google, which has said it would increase its efforts to provide direct answers to common queries, such as those seeking a specific, factual piece of data, could obviously build on top of something like Wikidata. As it moves further into semantic search, it could provide details about the people, places and things its users search for. It would actually know what things are, whether birth dates, locations, distances, sizes, temperatures, etc., and also how they’re connected to other points of data. Google previously said it expects semantic search changes to impact 10% to 20% of queries. (Google declined to provide any on-the-record comment regarding its future plans in this area.)

Ironically, the results of Wikidata’s efforts may then actually mean fewer Google referrals to Wikipedia pages. Short answers could be provided by Google itself, positioned at the top of the search results. The need to click through to read full Wikipedia articles (or any articles, for that matter) would be reduced, leading Google users to spend more time on Google.
