
Wikimedia blog

News from inside the Wikimedia Foundation

The Taj to the Tuk-Tuk. Language in the Indian Wikiworld.

(This is the seventh installment in a series of updates from the WikiHistories summer research fellows, who will be studying the virtual community history of different Wikipedia editing communities.)

Let’s just cut to the chase. Yes, the Taj Mahal is every bit as amazing as it’s supposed to be. It’s huge, it changes color with the rays of the sun, and its intricate carvings truly are breathtaking. It is worth putting up with the hassle of Agra’s touts and what may be the worst weather on the entire planet. Really, even in winter it’s pushing 90°, though at least without the sticky humidity that makes the air feel like a sponge the rest of the year. All the misery, though, doesn’t make a bit of difference when you’re in front of the gardens, surrounded by Indians dressed in their finest, everyone gasping as the Taj comes into view.

Clearly, this building is a source of pride both for humanity and for the people of the nation in which it was built. As I wandered the grounds, I encountered one of the most unexpected local customs of my whole trip. Foreigners at the Taj Mahal, who pay about 37 times more than Indians to see the site (not an exaggeration), are part of the local attraction. I was approached by dozens of people, some of whom simply handed me their children without warning so they could take pictures. This would happen at all the major historical sites, but nowhere more than at the Taj. I’d come halfway around the world to see their history, and that, apparently, needed to be documented.

This pride made me curious. What gems of information would I find in the Hindi Wikipedia’s entry on the Taj Mahal that weren’t in the English Wikipedia entry? It was exciting to think that with this tool at my disposal I would learn something special, something to get me on the inside. When I eagerly looked up the entry, I found … a translation of the English page. Bummer.

 

Surely the monsoon, a season so tied to the Indian collective consciousness that it is more than a season, inspiring festivals and literature, has a page that explains all this, adding poetry and national identity to a scientifically leaning article. Negative. The page appears to be an early translation of the English page.

But perhaps I’m looking in the wrong place. Just because I, as a visitor, find these places and things fascinating and think they define India doesn’t mean that the local population feels the same. It makes sense that, even though the monsoon affects India for months, a well-written and lengthy English article that predates the Hindi Wikipedia page would be translated rather than rewritten from scratch. Many pages are, and several of the Indian Wikipedians I spoke with thought this was just fine. Marathi Wikipedian Mandar Kulkarni, whom I met in Pune, envisions a Wikiworld in which articles are written in any language and translated into all the others. Logistically, that is not very realistic, but it is in the true spirit of an open Internet, in which one can write about one’s local community in one’s local language and share that information with anyone on earth in their own language.

I asked Kulkarni whether translating pages this way leaves out the Indian perspective, since the source pages are in English and other non-Indic languages, but he assured me that because so many Indians edit English Wikipedia, the Western viewpoint isn’t the only one represented there. English Wikipedia editors Pradeep Mohandas and Pranav Curumsey echoed this sentiment.

For Indic-language editors, writing in their local language is a way to keep that language alive and add to its long literary tradition, while English-language editors are more focused on the globalized world of knowledge. For the many whose local language is another Indic language, Hindi becomes the language of “us,” or India, while the local language is that of “me.” It’s the language that ties the country together, but not necessarily the one that ties neighbors together. Further, the definition of “Hindi” is rather complex. Colloquial Hindi, used in conversation, varies subtly depending on where the speaker comes from. These variations can include loanwords from other Indic languages that are used in one region but not another, as well as differences in pronunciation. To me, the Hindi of Central India and New Delhi sounds the most familiar, while the pronunciation used in Mumbai and other parts of Maharashtra makes my ears work a little harder. Wikipedia doesn’t suffer much from these differences: first, because it is written, so pronunciation differences don’t come into play, and second, because it is written in Modern Standard Hindi, a Sanskritized Hindi that differs from the Hindi one would use when, say, picking up a tuk-tuk on the street.

It sounds confusing, but it’s really no different from the regional dialects and varieties of English that exist throughout the English-speaking world. The difference, though, is that many students aren’t as literate in Hindi as they are in their local language and in English. They’re fluent, but Hindi education doesn’t continue throughout school with the rigor English education does. For this reason, many editors have worked on Hindi Wikipedia as a way to practice writing a language they can speak effortlessly.

But that doesn’t mean the tuk-tuk driver, or his son or daughter, is left out completely. Modern Standard Hindi doesn’t always mean lengthy literary prose. Sometimes a page is just a short stub: translating the English page remains an option, but in the meantime something more local and unique can be understood by readers without a high level of education, who can also add to it if they choose.

 

Patricia Sauthoff

Masters Candidate

South Asian History

School of Oriental and African Studies, University of London

Filter preventing abusive edits comes to all wikis

The AbuseFilter extension for MediaWiki, which helps prevent vandalism on wikis, will be globally enabled on all Wikimedia projects later today.

AbuseFilter was developed by Andrew Garrett with support from the Wikimedia Foundation; it was first enabled on the English Wikipedia in March 2009.

Since then, many local wiki communities have individually asked for AbuseFilter to be enabled on their wiki. As of July 2011, AbuseFilter was already enabled on 66 of the 843 wikis the Wikimedia Foundation hosts.

It recently became clear that it would be simpler to enable AbuseFilter by default on all wikis rather than on request.

When enabled, AbuseFilter comes with no built-in default filters, so no immediate change will be visible on wikis where it is enabled.

Unlike other anti-vandalism tools, AbuseFilter analyzes edits before they’re saved, rather than trying to identify (and revert) them after the fact.

Filters, or “rules”, can be added to AbuseFilter to identify certain kinds of edits matching a pattern. Actions can be taken for these edits, like tagging the edit, preventing the user from saving the page, or even automatically blocking the user. The AbuseFilter documentation provides the format in which filters must be written.
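To make the “check before save” idea concrete, here is a minimal Python sketch of how rules and their actions fit together. It is not AbuseFilter’s code or its filter syntax (AbuseFilter is a PHP extension with its own rule language, described in its documentation); the `Edit` and `Rule` classes, their field names, the `check_edit` function, and the two sample rules are all hypothetical illustrations.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Edit:
    """Hypothetical snapshot of a pending edit (field names are illustrative)."""
    user: str
    user_editcount: int
    old_text: str
    new_text: str


@dataclass
class Rule:
    """Hypothetical rule: a predicate over the edit plus the actions to take."""
    name: str
    matches: Callable[[Edit], bool]
    actions: List[str]  # any of "tag", "disallow", "block"


def check_edit(edit: Edit, rules: List[Rule]) -> Tuple[bool, List[str], List[str]]:
    """Evaluate every rule before the edit is saved.

    Returns (save_allowed, tags_to_apply, users_to_block).
    """
    allowed, tags, blocks = True, [], []
    for rule in rules:
        if not rule.matches(edit):
            continue
        if "tag" in rule.actions:
            tags.append(rule.name)
        if "disallow" in rule.actions or "block" in rule.actions:
            allowed = False
        if "block" in rule.actions:
            blocks.append(edit.user)
    return allowed, tags, blocks


rules = [
    # A new account removing over 90% of a page: refuse to save the edit.
    Rule("page blanking by new user",
         lambda e: e.user_editcount < 10 and len(e.new_text) < 0.1 * len(e.old_text),
         ["disallow"]),
    # A long run of exclamation marks: save the edit but tag it for patrollers.
    Rule("excessive punctuation",
         lambda e: re.search(r"!{5,}", e.new_text) is not None,
         ["tag"]),
]

blanking = Edit("Newbie", 2, "A long article. " * 20, "lol")
shouting = Edit("Regular", 500, "Some text.", "Some text!!!!!!")
print(check_edit(blanking, rules))  # (False, [], [])
print(check_edit(shouting, rules))  # (True, ['excessive punctuation'], [])
```

The point of the design is that every rule sees the pending edit, so a matching rule can stop the save outright, while a tag-only rule lets the edit through and simply flags it for patrollers.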

A screenshot of the list of AbuseFilter rules on the English Wikipedia

AbuseFilter catches abusive edits matching defined patterns.

Because AbuseFilter has been in use on the English Wikipedia for more than two years, more details about how it works are available in that wiki’s documentation; instructions on how to create a filter are also available.

It is possible to export filters from a wiki, and to import them into another one.

AbuseFilter is an extremely powerful tool, with the potential to prevent edits, block users, and make a whole wiki unusable. It must therefore be used with extreme caution; filters should only be created and edited by administrators who understand their purpose and syntax.

AbuseFilter can also be used to identify edits that are not abusive, for tracking purposes. Tags can be automatically added to edits matching a certain pattern, thus giving editors and patrollers a heads-up about certain edits (see examples).

Because such tags can also be used to identify legitimate edits, AbuseFilter is sometimes referred to as the “Edit filter”.

AbuseFilter offers the possibility for certain filters to be private, to prevent long-time abusers from knowing how their edits are being identified.

We hope this tool will prove useful to our community of editors and patrollers.

Guillaume Paumier
Technical communications manager

New Media Order in Turkey

(This is the sixth installment in a series of updates from the WikiHistories summer research fellows, who will be studying the virtual community history of different Wikipedia editing communities.)

During my trip in Turkey, I’ve met many interesting Vikipedians who truly believe in the importance of their contributions to Vikipedi and enjoy the many hours they spend in front of their computers editing the encyclopedia. It has been a remarkable experience to meet so many users with strong educational backgrounds and great ambitions for their futures. Most Vikipedians are at different stages of their high school and college educations and see Vikipedi as an important part of their academic growth, as well as a significant part of their social life. This is one reason why many of them repeatedly expressed interest in organizing regular meetings. They are also interested in organizing international meetings such as Wikimania.

This year, Vikipedians worked really hard on their Wikimania 2012 campaign. However, they were not able to find a sponsor for the event and, as a result, could not organize a comprehensive campaign. Although Vikipedi lost the opportunity to host next year’s conference, there is new hope for Wikimania 2013. Another event that really excites the community is New Media Order, an upcoming conference on new media in Istanbul, where Wikipedia co-founder Jimmy Wales has been invited to be a keynote speaker along with Julian Assange, the co-founder of Wikileaks.org. In addition to the synergy these two prominent free-information advocates might bring, the conference has great potential to be an important venue for discussing Internet freedom in Turkey.

Thousands of Turks gathered in some 40 cities and towns around the country on Sunday, May 15th, to join marches organized against Internet censorship

This issue has recently drawn a great deal of attention because of the imminent threat posed by the Internet ban recently proposed by the Turkish government, aimed at controlling access to “harmful content.” Last May, thousands of people protested the proposal in the streets of major cities across the country, arguing that the filtering system is compulsory, based on very arbitrary criteria, and too comprehensive. However, there has been no significant progress in talks with the agency that is to put this new blanket filtering mechanism into effect by 22 August 2011.

In light of these recent debates, the New Media Order conference might serve as a platform for serious discussion by including a larger group of people who have a stake in free access to information on the Internet. During my interactions with members of the Vikipedi community, I noticed a keen sensitivity to the prospect of these restrictions. To take part in such vital debates, however, Vikipedi users think that a Wikimedia Turkey office would be highly useful, and establishing one seems to be the community’s next big step.

“We don’t think they would ever filter contents of Vikipedi, but who knows, maybe we would be the first to go, because Vikipedi has already seen many threats, and on some occasions we even had to delete revision histories,” said one Vikipedi user, who prefers to remain anonymous. Most of the time, however, these possibilities translate into a concerted awareness of Vikipedi’s responsibility that further energizes the community, bringing more hope for saving free information in Turkey.

Ayhan Aytes

PhD Candidate

Communication and Cognitive Science

University of California San Diego

Kiwix localisation is supported at translatewiki.net

Offline use of Wikimedia content is a strategic goal for the Wikimedia Foundation. Kiwix is an offline app that allows users to read content without an Internet connection, and it can now be localized into many languages on translatewiki.net.

There are many situations where people do not have an Internet connection available, or where it is cheaper to work offline, notably in the Global South.

Data from Wikimedia projects can be exported to the openZIM format, and then read offline on Kiwix, the only openZIM client.

In several projects, local developers have invested considerable time in creating their own offline apps for their language, their script, or special requirements such as book formatting.

With Kiwix now localized on translatewiki.net, building such features into Kiwix itself has become a much more practical option. Customizations such as bundling fonts with a package or applying specific formatting for a book or a source remain possible.

We hope our community will help localize Kiwix in the 270+ languages we currently support with Wikimedia projects. Please start translating the interface and let us know how it goes.

Thanks,

Gerard Meijssen
Internationalization / Localization outreach consultant

2011 Fundraiser Engineering Is Underway!

Engineering efforts for the 2011 annual Wikimedia Foundation fundraiser are underway. This year’s efforts kicked off at the end of May and will be ongoing through the 2011 fundraiser.

This article is the first in a series of posts that we will make following the completion of our development sprints. We will provide an overview of what happened during the sprint, discuss some of the challenges faced, and highlight our achievements.

This year, the fundraiser engineering team is following an agile methodology that came out of an ‘inception’ process facilitated by ThoughtWorks.

During the process, we defined and prioritized the high-level requirements for this year’s engineering efforts, identified pain points in our development process, and strategized solutions to enable the team to quickly respond to the constantly changing needs of the fundraiser at a sustainable pace.

We established clearly defined roles and lines of communication for everyone involved in the development process, daily time-boxed stand-up meetings, two-week development sprints, and a flexible yet well-defined format for writing user stories and acceptance criteria.

We also resolved to implement unit tests for all new software we develop and generally strive for good code hygiene in an effort to build more resilient and reusable software.

After exploring a myriad of open- and closed-source agile-oriented project management tools to help us coordinate our work, we settled on Mingle. While we would much prefer an open-source solution, we chose this proprietary tool because it meets our needs far more closely than any of the others we explored.

You can log in to Mingle to view our backlog, sprint histories, and sprint progress with:

  • Username: guest
  • Password: guest

This year’s team comprises:

Sprint 4 wrap up

We just completed our fourth development sprint. Our efforts during this sprint were somewhat hampered by vacation and travel for Wikimania. During this sprint, we:

  • Began adding an API for the ContributionTracking extension, which will allow us to seamlessly forward donors to PayPal.
  • Added filtering mechanisms for campaign and banner logs in CentralNotice, making it easier to track changes to campaigns and banners.

You can view sprint 4 in Mingle (log in with guest/guest) and read our notes from the retrospective.

Sprint 5 kick off

We are currently exploring the possibility of adding new payment providers for processing donations (in addition to our current providers, PayPal and PayflowPro), in order to increase the number of currencies in which we can accept donations and potentially open up new donation methods (e.g. bank transfer).

Adding a new payment provider to the current architecture is a significant engineering challenge, requiring some serious refactoring of the DonationInterface extension, and we are eager to get started. So we have decided to make sprint 5 a one-week sprint to wrap up the unfinished tasks from sprint 4, so that we can begin the work of accommodating additional payment providers as soon as possible.
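One common way to let a system accept new payment providers without rewriting the donation flow each time is an adapter interface: each provider is wrapped in a class exposing the same methods, and a selector picks one per donation. The Python sketch below illustrates that general pattern under stated assumptions; it is not the DonationInterface code (a PHP MediaWiki extension), and the class names, currency lists, and `choose_provider` function are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Donation:
    amount: int    # in minor units, e.g. cents
    currency: str  # ISO 4217 code such as "EUR"
    method: str    # e.g. "card" or "bank_transfer"


class PaymentProvider(ABC):
    """Hypothetical common interface that every provider adapter implements."""
    name: str = ""
    currencies: FrozenSet[str] = frozenset()
    methods: FrozenSet[str] = frozenset()

    def supports(self, donation: Donation) -> bool:
        return donation.currency in self.currencies and donation.method in self.methods

    @abstractmethod
    def process(self, donation: Donation) -> str:
        """Submit the donation and return a provider-specific reference."""


class ProviderA(PaymentProvider):
    name = "provider-a"  # hypothetical provider
    currencies = frozenset({"USD", "EUR"})
    methods = frozenset({"card"})

    def process(self, donation: Donation) -> str:
        # A real adapter would call the provider's API here.
        return f"{self.name}:{donation.currency}:{donation.amount}"


class ProviderB(PaymentProvider):
    name = "provider-b"  # hypothetical provider
    currencies = frozenset({"EUR", "JPY"})
    methods = frozenset({"card", "bank_transfer"})

    def process(self, donation: Donation) -> str:
        return f"{self.name}:{donation.currency}:{donation.amount}"


def choose_provider(donation: Donation,
                    providers: List[PaymentProvider]) -> Optional[PaymentProvider]:
    """Return the first registered provider able to handle this donation."""
    for provider in providers:
        if provider.supports(donation):
            return provider
    return None


# Adding a provider means writing one more adapter and registering it here.
PROVIDERS: List[PaymentProvider] = [ProviderA(), ProviderB()]

jpy = Donation(1000, "JPY", "bank_transfer")
chosen = choose_provider(jpy, PROVIDERS)
print(chosen.process(jpy))  # provider-b:JPY:1000
```

With this shape, supporting a new currency or donation method is a matter of adding an adapter rather than changing the code that renders the form or records the donation.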

You can view sprint 5 in Mingle (log in with guest/guest).

Upcoming deployments

Pending code review, we will be deploying the following later this week:

  • Fixes to CentralNotice that allow banner dismissal by banner category
  • CentralNotice enhancements which allow for logging banner settings changes as well as filtering logs by time, user, campaign, and banner
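Filtering a settings log by time, user, campaign, and banner amounts to keeping only the entries that satisfy every criterion the user supplied. The following Python sketch shows that idea on an in-memory list; it is not CentralNotice’s implementation (a PHP extension backed by database queries), and the `LogEntry` fields, the `filter_log` function, and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class LogEntry:
    """Hypothetical record of one change to a campaign's or banner's settings."""
    timestamp: datetime
    user: str
    campaign: str
    banner: str
    change: str


def filter_log(entries: Iterable[LogEntry],
               start: Optional[datetime] = None,
               end: Optional[datetime] = None,
               user: Optional[str] = None,
               campaign: Optional[str] = None,
               banner: Optional[str] = None) -> List[LogEntry]:
    """Keep entries matching every criterion supplied; None means "any"."""
    result = []
    for e in entries:
        if start is not None and e.timestamp < start:
            continue
        if end is not None and e.timestamp > end:
            continue
        if user is not None and e.user != user:
            continue
        if campaign is not None and e.campaign != campaign:
            continue
        if banner is not None and e.banner != banner:
            continue
        result.append(e)
    return result


# Hypothetical sample log.
log = [
    LogEntry(datetime(2011, 8, 1, 9, 0), "Alice", "Fall2011", "banner-1", "enabled"),
    LogEntry(datetime(2011, 8, 2, 14, 30), "Bob", "Fall2011", "banner-2", "weight 25 -> 50"),
    LogEntry(datetime(2011, 8, 3, 11, 15), "Alice", "Test", "banner-1", "disabled"),
]

# All changes Alice made to banner-1 on or after 2 August.
for entry in filter_log(log, start=datetime(2011, 8, 2), user="Alice", banner="banner-1"):
    print(entry.timestamp, entry.campaign, entry.change)
```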

Get involved

If you are interested in getting involved, visit us on IRC in #wikimedia-fundraising.

Arthur Richards
Fundraiser tech lead

Wikimedia engineering July 2011 report

Major news in July includes:

  • Ongoing data replication from our primary Florida data center to our new Virginia data center;
  • The deployment of the Article Feedback feature to all articles on the English Wikipedia, and the deployment of MoodBar;
  • The successful implementation of a MySQL-based parser cache on Wikimedia wikis;
  • Mid-term evaluation of our Summer of Code projects.


Calling mobile testers for round two

Thanks to everyone for participating in our first round of mobile gateway testing.

This time around, we’d like you to make our new mobile gateway your default experience.

Follow this link on your mobile phone to opt in: http://tinyurl.com/woptin and send us feedback.


What is “Platform Engineering”?

If you’ve been following this blog or other Wikimedia Foundation updates closely over the past year, you may have seen several references to the “Platform Engineering” group (née “General Engineering”), which I’ve been managing for the past year. I’d like to explain who we are and what we’re doing. We always strive for transparency as a group, but one ulterior motive for this particular narrative is that we’re hiring (more on that in a bit), and we hope this helps people understand what we’re looking for.


Come beta test offline Wikipedia

I’m happy to report that we have a new beta version of Kiwix available for testing. For those new to the project, Kiwix is the simplest and easiest way to take Wikipedia with you when you have no internet connection.

We’ve added some features that I’ll talk about below, but for those of you who are just looking to get involved: download a fresh copy and give us feedback. Head over to our project pages if you want to see our full roadmap.

With this new beta we have some exciting new features:

  • Mac OS X version
  • Content manager
  • Revised search interface

While the majority of our users are on Linux and Windows, we didn’t want OS X users to feel left out. The Mac version is now part of our regular build process: three platform builds per release, that’s our goal.

We’re especially happy with how the content manager has turned out. Rather than having to scour the Internet to find openZIM files, you’ll now be able to discover new ones right within Kiwix.

We’re starting out with a limited set of data files to simplify our testing, but we’ll expand it in the coming months as we connect the download manager to the Books collection extension. This will greatly expand the amount of content you can download from Wikipedia. With the extra content, we’ll also add filtering capabilities to make sorting easier.

Finally, we’ve tweaked the look and feel of search results, which now closely resemble search engine results pages; this will hopefully make both searching and browsing much easier. There are also lots of other changes under the hood; if you’re curious, head over to the change log.

Tomasz Finc
Director Mobile & Special Projects

Results from the Japanese Editor Survey

We have blogged recently about the results from our semi-annual editor survey. Although the survey was conducted in 22 languages, it didn’t include Japanese because of the March earthquake and ensuing tsunami in Japan. We are pleased to share toplines from a survey of editors conducted recently on the Japanese Wikipedia. We fielded it for about a week at the end of July and received 208 complete responses. Like the semi-annual editor survey, the Japanese editor survey was available only to registered users of the Japanese Wikipedia, and every editor saw the invitation to participate only once. The latter was done to control for bias towards more active editors.
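One way to see why showing the invitation only once controls this bias is a deliberately simplified model, assumed here for illustration only: each time an editor sees the invitation, they respond independently with the same probability p. An editor who sees it on k separate visits then responds with probability

$$P(\text{respond} \mid k \text{ views}) = 1 - (1 - p)^{k}$$

With an illustrative p = 0.05 (not survey data), an editor who sees the invitation once responds with probability 0.05, while one who sees it on 20 visits responds with probability 1 - 0.95^20 ≈ 0.64. Showing the invitation only once sets k = 1 for everyone, so the chance of responding no longer grows with how often an editor visits.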

The topline data covers all the questions from the survey: demographics, interactions with community members, technology ecology, and editing behaviors.  We are hoping that the Japanese community (as well as others) will check the data, conduct some analysis and provide feedback to us.

Please also check out the graphs for some key demographics of Japanese editors. The results from the editor survey in the Japanese Wikipedia show that the Japanese editing community is demographically similar to others: predominantly male, highly educated, and slightly older than we had imagined our community to be before we conducted the survey.

 

 

Mani Pande, Head of Global Development Research

(This is the ninth in a series of blog posts in which we have shared insights from the April 2011 Editors Survey.)