This project will create mechanisms for rapid and flexible access to over 9,000 hours of spoken audio, drawn from some of the leading British and American spoken word corpora.

Mining a Year of Speech

This project shall address the challenge of providing rich, intelligent data mining capabilities for a substantial collection of spoken audio data in American and British English. We shall apply and extend state-of-the-art techniques to offer sophisticated, rapid, and flexible access to a richly annotated corpus containing a year of speech (about 9,000 hours, 100 million words, or 2 terabytes of speech), derived from the Linguistic Data Consortium, the British National Corpus, and other existing resources. This is at least ten times more data than has previously been used by researchers in fields such as phonetics, linguistics, or psychology, and more than 100 times common practice in spoken language research.
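As a rough sanity check on the "year of speech" figures quoted above, the arithmetic can be sketched as follows. The only inputs are the three numbers from the description; reading "2 terabytes" as decimal terabytes (10^12 bytes each) is an assumption, and the derived rates are back-of-envelope averages, not measurements from the corpus.

```python
# Figures quoted in the project description.
HOURS = 9_000
WORDS = 100_000_000
BYTES = 2 * 10**12  # assumption: decimal terabytes

# A calendar year of continuous audio, for comparison with 9,000 hours.
hours_in_calendar_year = 365 * 24

# Average rates implied by the quoted totals.
words_per_minute = WORDS / (HOURS * 60)
bytes_per_second = BYTES / (HOURS * 3600)

print(hours_in_calendar_year)             # 8760
print(round(words_per_minute))            # 185
print(round(bytes_per_second / 1000, 1))  # 61.7 (kB per second)
```

The quoted 9,000 hours is therefore a round figure for one calendar year (8,760 hours), and the three totals are mutually consistent with an average of roughly 185 words per minute.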

It is impractical for a researcher to listen to a year of audio to search for certain words or phrases, or to manually measure the resulting data. With our methods, such tasks will take just a few seconds. The purposes for which scholars in different fields conduct such searches are very varied, and it is neither possible nor desirable to predict what people might want to look for. That said, the following questions illustrate some possibilities:

  1. When did X say Y? For example, "find the video clip where George Bush said 'read my lips'."
  2. Are there changes in dialects, or in their social status, that are tied to the new social media?
  3. How do arguments work? For example, how do different people handle interruptions?
  4. How frequent are linguistic features such as phrase-final rising intonation ("uptalk") across different age groups, genders, social classes, and regions?
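Question 1 can be illustrated with a minimal sketch of phrase search over a word-level, time-aligned transcript. This is not the project's own tooling: the `ALIGNED` data, its timestamps, and the function names `build_index` and `find_phrase` are hypothetical, chosen only to show why an index over aligned words turns "listen to the audio" into a lookup.

```python
from collections import defaultdict

# Hypothetical word-level alignment: (word, start_seconds, end_seconds).
ALIGNED = [
    ("please", 0.00, 0.40),
    ("read", 12.10, 12.35),
    ("my", 12.35, 12.50),
    ("lips", 12.50, 12.95),
    ("no", 13.20, 13.40),
]

def build_index(aligned):
    """Map each lower-cased word to the positions where it occurs."""
    index = defaultdict(list)
    for pos, (word, _start, _end) in enumerate(aligned):
        index[word.lower()].append(pos)
    return index

def find_phrase(aligned, index, phrase):
    """Return (start, end) times for each exact occurrence of the phrase."""
    words = phrase.lower().split()
    if not words:
        return []
    hits = []
    # Only positions of the first word need checking, not the whole transcript.
    for pos in index.get(words[0], []):
        window = aligned[pos:pos + len(words)]
        if [w.lower() for w, _, _ in window] == words:
            hits.append((window[0][1], window[-1][2]))
    return hits

index = build_index(ALIGNED)
print(find_phrase(ALIGNED, index, "read my lips"))  # [(12.1, 12.95)]
```

The returned time span is what a player would need to jump straight to the matching clip; questions 2 to 4 would need richer annotation (speaker, dialect, intonation) than this sketch assumes.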

Though our experience and research interests happen to be focussed on such matters as intonation, pronunciation differences between dialects, and dialogue modeling, the text-to-speech alignment and search tools produced by the project will open up this 'year of speech' for use by a wide variety of researchers interested in e.g. oral history, newsreels, or media studies. Audio-video usage on the Internet is large and growing at an extraordinarily high rate – witness the huge growth of Skype and YouTube (now the second most frequently used search engine in the world).

In the multimedia space of Web 2.0, automatic and reliable annotation and searchable indexing of spoken materials would be a "killer app". It is easy to envisage a near-future world in which a search query would return the relevant video clip and associated metadata about the event. The Penn team wishes to study what problems stand in the way of constructing such an audio search engine and to try to solve them.

Summary
Start date: 4 January 2010
End date: 30 June 2011
Funding programme: Digitisation and e-Content
Strand: Digging into data challenge
Lead institutions: University of Oxford (UK partners); University of Pennsylvania (US partners)