
weeklyOSM 521

10:56, Sunday, 19 July 2020 UTC

07/07/2020-13/07/2020

lead picture

OSM now with ÖPNV public transport map 1 | © Wikipedia | Map data © OpenStreetMap contributors

Mapping

  • Mateusz Konieczny wants to know whether there is a way of tagging to distinguish between company offices the public can walk into and those where if you tried, it would result in being escorted out by security.
  • Michael Montani has requested comments on their proposal to introduce a natural=bare_soil tag for ‘an area covered by soil, without any vegetation’. (Nabble)
  • Matthew Woehlke wants feedback on his junction=intersection proposal. The new tag would identify portions of a highway which are part of an intersection. (Nabble)
  • Skyler Hawthorne asked the tagging list if there is an accepted way to tag terrace buildings that have names.
  • Mike Thompson has noticed that the network tag has different meanings and possible values depending on the type of route it is added to. They asked the tagging list why the network tag can’t have a consistent meaning across all route types.
  • Someone has made a site relation for the Aurelian city walls of Rome. Martin Koppenhoefer asks the readers of the tagging list if this makes sense.
  • Speciality coffee is a term for the highest grade of coffee available. Jake Edmonds is seeking suggestions for how to tag cafes that serve speciality coffee.
  • Martijn van Exel’s series of Tuesday evening JOSM streams continues. He is looking for suggestions of what he could cover in future sessions.
  • Fabian Kowatsch introduced the new filter parameter available in the OpenStreetMap History Data Analytics Platform (ohsome).
  • higa4 analysed (ja) (automatic translation) the change over time of the OSM map of Japan using ohsome.
  • The Belgian Green Party launched a new tool (automatic translation) to help crowdsource information on nature reserves and forests. The tool uses and contributes to OpenStreetMap directly.

Community

  • On the Talk-at mailing list, the contributor plepe presented (de) (automatic translation) his ogd-wikimedia-osm-checker. It compares (automatic translation) the entries of different OGD datasets with Wikidata, Wikipedia, Wikimedia Commons and OpenStreetMap. The source code is also available.
  • The OSM April Fool’s joke from 2017 (adaptations to plate tectonics) was not recognised as such by ScubbX and a discussion about it was started (automatic translation) on the Talk-at mailing list recently.
  • OSMF Board member Rory McCann reported on his activities in June – both within and outside the Board.
  • Harry Wood intends to end the decade-old tradition of weekly ‘featured images’, unless others are willing to step up and take over his role.
  • Dara Carney-Nedelman blogged about their joy on discovering the OSM community. Dara also calls on all students, young and old, whose summer plans may not be what they imagined, to learn a new skill: mapping.

Imports

  • Homy is asking for feedback on a proposed import of public bicycle repair stations in Baden-Württemberg, Germany.

OpenStreetMap Foundation

  • The minutes of the non-public OSMF Board meeting on 11 June 2020 have been published. The agenda included the selection of Microgrant applicants, the membership application of a possible ODbL violator, and responses to the RFC on iD governance.
  • The Data Working Group has published its activity report for the second quarter of 2020. Besides the number of tickets, it contains concise descriptions of some outstanding cases.
  • The OSMF Board has amended its Rules of Procedure.
  • The minutes of the Licensing Working Group meeting on 11 June have been published.
  • John Whelan explained why he prefers TransferWise to PayPal when making payment or donating to the OSMF.
  • The minutes of the OSM System Administrators Group meeting of 4 June have been published.

Events

  • Videos of the State of the Map 2020, which took place online, continue to be uploaded to media.ccc.de.

Humanitarian OSM

  • HOT is looking for a Head of Community to manage HOT’s Community Team. Applications close 26 July 2020.
  • HOT is supporting the Greater Accra Resilient and Integrated Development Project to assist in the protection of communities from flooding.

Maps

  • Martin Ždila announced that freemap.sk has been expanded to further (European) countries. The map menu is available in English, Slovak, Czech and Hungarian.
  • [1] The public transport map ÖPNVKarte is now available as a map layer on OpenStreetMap.org. The OpenStreetMap Blog features an article on the new layer’s arrival.
  • Are you planning an action and need an off-line map you can distribute? Using Aktionskarten, such a map can be created in five minutes.
  • A Dutch user of OsmAnd would like to display 1m contours, a not unreasonable request for a resident of the relatively flat landscape of the Netherlands.

Licences

  • Reddit user brezherov asked if Google Maps is copying OSM. Their question arose when they noticed that after adding local businesses and buildings to OSM, within a couple of weeks Google Maps had significantly updated those same areas.

Software

  • Sam Crawford explained how Trail Router works. Trail Router is a route planner whose routing algorithm favours greenery and nature, and biases against busy roads.
  • Openbloc has created a new JavaScript library for the creation of 3D maps. A demo can be found here.

Programming

  • SviMik has created a tool to synchronise your Mapillary and OpenStreetCam accounts. There is a discussion thread on the Mapillary forum.
  • User K_Sakanoshita announced (ja) an update to ‘Town Walk Map Maker’ (ja). The update improves the map representation, POI information, and interface.

Did you know …

  • … you have the opportunity to contribute a little bit to OSM every day? Ilya Zverev will give you a small daily task with his Telegram bot ‘OSM Streak’.
  • … MapRoulette? It gives you small and easy tasks you can complete in under a minute to improve OpenStreetMap.

Other “geo” things

  • Harald Schernthanner distributed a funny map (automatic translation) via Twitter. It is supposed to show how a Viennese person imagines Austria appears on a map.
  • The wealthier you are the more light pollution you create. Asmi Kumar explains how machine learning can estimate the wealth of an area by comparing daytime and night-time satellite images.
  • The Long Beach Post reported on how they analysed a detailed data log of every person the Long Beach Police Department stopped or detained over the span of 2019. They used OpenRefine for data cleanup and the creation of n-gram fingerprints to reconcile incorrect street names against a canonical list they created using OpenStreetMap data of official street names and intersections in Long Beach.
  • The Advanced Subjects Test, a standardised university entrance exam in Taiwan, was held from 3 to 5 July. The geography (automatic translation) quiz contained much more geographic technology related material than previous years’ tests. One of the quiz topics, illustrating the concept of GIS, was the mask map and determining where to buy masks to prevent the spread of COVID-19.

Upcoming Events

Where What When Country
Budapest Auguszt patisserie test & drinks in La Piazza 2020-07-16 hungary
Cologne Bonn Airport 129. Bonner OSM-Stammtisch (Online) 2020-07-21 germany
Nottingham Nottingham pub meetup 2020-07-21 united kingdom
Lüneburg Lüneburger Mappertreffen 2020-07-21 germany
Berlin 13. OSM-Berlin-Verkehrswendetreffen (Online) 2020-07-21 germany
Budapest Cziniel patisserie test & hake on bank Római 2020-07-21 hungary
Ludwigshafen a.Rhein (Stadtbibliothek) Mannheimer Mapathons e.V. 2020-07-23 germany
Düsseldorf Düsseldorfer OSM-Stammtisch 2020-07-29 germany
London London Missing Maps Mapathon (ONLINE) 2020-08-04 uk
Stuttgart Stuttgarter Stammtisch 2020-08-05 germany
Kandy 2020 State of the Map Asia 2020-10-31-2020-11-01 sri lanka

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Nakaner, Nordpfeil, Polyglot, Rogehm, Supaplex, TheSwavu, YoViajo, derFred, geologist, k_zoar.

The bias for Wikipedia as a project is strong, the bias for English makes it worse. When our aim is to share the sum of all knowledge, we have to acknowledge this and consider the consequences and allow for potential remedies.

"Bias" is a loaded word. When you read the Wikipedia article it is only negative. Dictionaries give more room an example: "our strong bias in favor of the idea". The Wikimedia Foundation is considering rebranding and it explicitly states that it seeks a closer relation with its premier brand Wikipedia. 

This is a published bias. It follows that other projects do not receive the same attention and do not get the same priority. For me it is obvious that, as a consequence, the WMF could do better when it intends to "share in the sum of all available knowledge", let alone the knowledge that is available to it.

Arguably another, more insidious, bias is the bias for English, particularly the bias for the English Wikipedia. The proof of the pudding is in the eating: we have a worldwide public, yet the use of our information hardly grows. Research is done on the English Wikipedia, so in effect we arguably do not even know what we are talking about.

When we are to do better, it means that we need to be free to discuss our biases, present arguments and even use the arguments or publications of others to make a point. The COO of the WMF states, in the context of diversity in tech and media, that "when the bonus of executives relies on diversity, diversity will happen". It is reasonable to use this same argument. When the bonuses for executives of the WMF rely on the growth of all our projects, it stands to reason that they will make the necessary room for growth. When one of the best Wikipedians says "There are only a limited number of projects that the WMF can take on at any time, and this wouldn't have been my priority", this demonstrates a bias against the other projects. Arguably the WMF has never really, really, really supported the other projects: it does not market them, it does not support them; they exist because the MediaWiki software allows for the functionality.

When we are to counter the institutional bias of the WMF, we have to be able to make the case, present arguments and ask for the WMF to accept the premise and consider suggestions for change. This proves to be an issue and makes our biases even more intractable.
Thanks,
       GerardM

Migrating tools.wmflabs.org to HTTPS

17:18, Friday, 17 July 2020 UTC

Starting 2019-01-03, GET and HEAD requests to http://tools.wmflabs.org will receive a 301 redirect to https://tools.wmflabs.org. This change should be transparent to most visitors. Some webservices may need to be updated to use explicit https:// or protocol relative URLs for stylesheets, images, JavaScript, and other content that is rendered as part of the pages they serve to their visitors.

Three and a half years ago @yuvipanda created T102367: Migrate tools.wmflabs.org to https only (and set HSTS) about making this change. Fifteen months ago a change was made to the 'admin' tool that serves the landing page for tools.wmflabs.org so that it performs an http to https redirect and sets a Strict-Transport-Security: max-age=86400 header in its response. This header instructs modern web browsers to remember to use https instead of http when talking to tools.wmflabs.org for the next 24 hours. Since that change there have been no known reports of tools breaking.

The new step we are taking now is to make this same redirect and set the same header for all visits to tools.wmflabs.org where it is safe to redirect the visitor. As mentioned in the lead paragraph, there may be some tools that this will break due to the use of hard coded http://... URLs in the pages they serve. Because of the HSTS header covering tools.wmflabs.org, this breakage should be limited to resources that are loaded from external domains.

Fixing tools should be relatively simple. Hardcoded URLs can be updated to be either protocol relative (//example.org) or to use the https protocol explicitly (https://example.org). The proxy server also sends an X-Forwarded-Proto: https header to the tool's webservice, which can be detected and used to switch to generating https links. Many common web application frameworks already support this.
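For a Flask-based tool, for example, one minimal sketch (an assumption about your setup, not part of the original announcement) is to apply Werkzeug's ProxyFix middleware (Werkzeug 0.15 or later) so the X-Forwarded-Proto header is honoured when URLs are generated:

from flask import Flask
from werkzeug.middleware.proxy_fix import ProxyFix

app = Flask(__name__)
# Trust one level of proxy for X-Forwarded-Proto so that
# url_for(..., _external=True) produces https:// links.
app.wsgi_app = ProxyFix(app.wsgi_app, x_proto=1)

Other frameworks expose similar proxy or trusted-header settings.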

If you need some help figuring out how to fix your own tool's output, or to report a tool that needs to be updated, join us in the #wikimedia-cloud IRC channel.

TJ Bliss

TJ Bliss, Chief Academic Officer at the Idaho State Board of Education and former Chief Advancement Officer at Wiki Education, has been appointed to Wiki Education’s new Advisory Board.

TJ is the first member of Wiki Education’s Advisory Board, which is tasked with increasing Wiki Education’s reputation and network for revenue generation. He will work closely with me on adding influencers in key areas of the organization’s programmatic focus (equity, communicating science, OER/OEP, linked open data, GLAM, Wikimedia, etc.), as well as prospective donors, to the newly formed Advisory Board.

I’m thrilled to have TJ spearhead the creation of Wiki Education’s Advisory Board with me. His passion for Open Educational Resources and for Wiki Education’s mission will make a big difference. 

TJ has a long track record of supporting Wiki Education. As a Program Officer at the William and Flora Hewlett Foundation, TJ provided a major initial grant to support Wiki Education’s efforts with Open Educational Practice. In 2017, TJ left the Hewlett Foundation to join Wiki Education’s senior staff to lead advancement, fundraising, and business development efforts. TJ’s transition to the Advisory Board will allow him to directly support Wiki Education’s important mission going forward. 

“I am honored to be able to continue my involvement with Wiki Education, which I believe is one of the most important organizations working in the open knowledge space today,” TJ says. “I’m looking forward to helping build Wiki Education’s reputation and influence, to ensure the organization can support faculty, students, and open knowledge projects for many years to come.”

Speeding up Toolforge tools with Redis

11:22, Friday, 17 July 2020 UTC

Over the past two weeks I significantly sped up two of my Toolforge tools by using Redis, a key-value database. The two tools, checker and shorturls, were slow for different reasons, but now respond instantaneously. Note that I didn't do any proper benchmarking; it's just noticeably faster.

If you're not familiar with it already, Toolforge is a shared hosting platform for the Wikimedia community built entirely using free software. A key component is providing web hosting services so developers can build all sorts of tools to help Wikimedians with really whatever they want to do.

Toolforge provides a Redis server (see the documentation) for tools to use for key-value caching, pub/sub, etc. One important security note is that this is a shared service for all Toolforge users to use, so it's especially important to prefix your keys to avoid collisions. Depending on what exactly you're storing, you may want to use a cryptographically-random key prefix, see the security documentation for more details.
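As a rough illustration of that advice (the key names and client library here are my own choices, not Toolforge requirements), a prefixed cache entry written with redis-py might look like this:

import secrets
import redis

# Generate the random prefix once and persist it (for example in a
# config file only your tool can read); regenerating it on every start
# would effectively empty your cache.
KEY_PREFIX = 'tool-mytool-' + secrets.token_hex(16)

r = redis.Redis(host='tools-redis', port=6379)
# Every key carries the prefix, so it cannot collide with other tools' keys.
r.setex(KEY_PREFIX + ':wikis', 3600, '["enwiki", "frwiki"]')
cached = r.get(KEY_PREFIX + ':wikis')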

Redis on Toolforge is really straightforward to take advantage of for caching, and that's what I want to highlight.

checker

"checker"

Visit the toolSource code

checker is a tool that helps Wikisource contributors quickly see the proofread status of pages. The tool was originally written as a Python CGI script and I've since lightly refactored it to use Flask and jinja2 templates.

On each page load, checker would make a database query to get the list of all available wikis, then an additional query to get information about the selected wiki, and an API query to get namespace information. This data is basically static; it only changes when a new wiki is created, which is rare.

<+bd808> I think it would be a lot faster with a tiny bit of redis cache mixed in

I used the Flask-Caching library, which provides convenient decorators to cache the results of Python functions. Using that, adding caching was about 10 lines of code.

To set up the library, you'll need to configure the Cache object to use tools-redis.

from flask import Flask
from flask_caching import Cache
app = Flask(__name__)
cache = Cache(
    app,
    config={'CACHE_TYPE': 'redis',
            'CACHE_REDIS_HOST': 'tools-redis',
            'CACHE_KEY_PREFIX': 'tool-checker'}
)

And then use the @cache.memoize() decorator for whatever needs caching. I set an expiry of a week so that any changes are picked up within a reasonable time for users.
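As a small illustrative sketch (the function name and body here are placeholders, not the tool's actual code), the cached lookup ends up looking something like this:

ONE_WEEK = 7 * 24 * 3600

@cache.memoize(timeout=ONE_WEEK)
def get_wiki_list():
    # In the real tool this is the database query for the list of
    # available wikis; the result is served from Redis for a week.
    return run_expensive_database_query()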

shorturls

"shorturls"

Visit the toolSource code

shorturls is a tool that displays statistics and historical data for the w.wiki URL shortener. It's written in Rust, primarily using the rocket.rs framework. It parses dumps and generates JSON data files with counts of the total number of shortened URLs, overall and by domain.

On each page load, shorturls generates an SVG chart plotting the historical counts from each dump. To generate the chart, it would need to read every single data file, over 60 as of this week. On Toolforge, the filesystem is using NFS, which allows for files to be shared across all the Toolforge servers, but it's sloooow.

<+bd808> but this circles back to "the more you can avoid reading/writing to the NFS $HOME, the better your tool will run"

So to avoid reading 60+ files on each page view, I cached each data file in Redis. There's still one filesystem call to get the list of data files on disk, but so far that seems to be acceptable.

I used the redis-rs crate combined with rocket's connection pooling. The change was about 40 lines of code. It was a bit more involved because redis-rs doesn't have any support for key prefixing or automatic (de)serialization, so I had to convert to/from JSON manually.

The data being cached is immutable, but I still set a 30-day expiry on it, just in case I change the format or cache key; I don't want old data to sit around forever in the Redis database.
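The tool itself is written in Rust, so the following is just the same get-or-compute pattern re-sketched in Python for illustration (names are placeholders): check Redis first, fall back to the slow NFS read, and store the result with a 30-day expiry.

import json
import redis

r = redis.Redis(host='tools-redis', port=6379)
THIRTY_DAYS = 30 * 24 * 3600

def load_data_file(path):
    # Keys are prefixed with the tool name to avoid collisions.
    key = 'tool-shorturls:' + path
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    with open(path) as f:  # slow NFS read, only on a cache miss
        data = json.load(f)
    r.setex(key, THIRTY_DAYS, json.dumps(data))
    return data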

Conclusion

Caching mostly static data in Redis is a great way to make your Toolforge tools faster if you are regularly making SQL queries, API requests or filesystem reads whose results don't change often. If you need help or want tips on how to make other Toolforge tools faster, stop by the #wikimedia-cloud IRC channel or ask on the Cloud mailing list. Thanks to Bryan Davis (bd808) for helping me out.

Today, we are writing to share the discovery and squashing of a bug that occurred earlier this year. This particular bug was also one of the rare instances in which we kept a Phabricator ticket private to address a security issue. To help address questions about when and why we make a security-related ticket private, we’re also sharing some insight into what happens when a private ticket about a security issue is closed.

Late last year, User:Suffusion of Yellow spotted a bug that could have allowed an association to be made between logged-in and non-logged-in edits made from the same IP address. Users with dynamic IP addresses could have been affected, even if they personally never made any non-logged-in edits.

Suffusion of Yellow created a Phabricator ticket about it, and immediately worked to get eyes on the issue. The bug was repaired with their help. We’re grateful for their sharp eyes and their diligent work to diagnose and fix the problem. As part of our normal procedure, the Security team investigated once the bug was resolved. They found no evidence of exploit. We are not able to reveal further technical details about the bug, and here is why:

When a Phabricator ticket discussing a security bug is closed, Legal and Security teams at the Wikimedia Foundation evaluate whether or not to make the ticket public. Our default is for all security tickets to become public after they are closed, so that members of the communities can see what issues have been identified and fixed. The majority of tickets end up public. But once in a while, we need to keep a ticket private.

We have a formal policy we use to determine whether a ticket can be publicly viewable, and it calls for consideration of the following factors:

  • Does the ticket contain non-public personal data? For example, in the case of an attempt to compromise an account, the ticket may include IP addresses normally associated with the account, to identify login attempts by an attacker.
  • Does the ticket contain technical information that could be exploited by an attacker? For example, in discussing a bug that was ultimately resolved, a ticket may include information about other potential bugs or vulnerabilities.
  • Does the ticket contain legally sensitive information? For example, a ticket may contain confidential legal advice from Foundation lawyers, or information that could harm the Foundation’s legal strategy.

In this case, we evaluated the ticket and decided that it could not be made public based on the criteria listed above.

Even when we can’t make a ticket public, we can sometimes announce that a bug has been identified and resolved in another venue, such as this blog. In this case, Suffusion of Yellow encouraged us to make the ticket public, and while pandemic-related staff changes have caused a delay, that request reminded us to follow through with this post. We appreciate their diligence. Keeping the projects secure is a true partnership between the communities of users and Foundation technical staff, and we are committed to keeping users informed as much as possible.

Respectfully,

David Sharpe
Senior Information Security Analyst
Wikimedia Foundation

Clara de Pablo is a Fellow in the Office of Communications and Marketing at the Smithsonian’s National Museum of American History. She enrolled in Wiki Education’s introductory Wikidata course to learn more about how to apply linked data practices to her work.

Clara de Pablo

My involvement with Wikidata began — as all great stories do — with a long and meandering intern task. I work in communications at the Smithsonian’s National Museum of American History, and we’d decided to invite members of Congress to a preview of an upcoming exhibition. To narrow down the 535 members of Congress, we asked our high-school intern Sofia to make a list of all the representatives with daughters. We thought this would take her a few days. A full month later, Sofia was still Googling senators and typing the ages of their daughters into a Word document. The information existed, but it wasn’t searchable or organized in a usable way.

Wikidata seemed like a perfect foil. Wikidata steps in essentially as Wikipedia for data — so much data exists online, and in theory, Wikidata makes it easy to search and navigate. A week or so into Sofia’s research project (she was on senators whose last names started with “S”), I signed up for a Wikidata course. 

The course itself was set up in a very friendly, helpful way. The class met every Tuesday afternoon via video call, during which our instructors, Will and Ian, would screen share and show us how to navigate or use one aspect of Wikidata. The weeks were organized into tangible objectives — learning what “item” or “property” meant, learning how to edit entries, learning how to query. Before each meeting, there was a short slideshow tutorial on our class dashboard, which introduced the concepts and often guided us through short exercises to apply them. The video calls were especially useful for troubleshooting places we were getting stuck, or to see what other people in the course were doing with their newfound skills. The instructors often recommended queries to look at for inspiration, and made themselves available to answer questions via email or Slack outside of our meeting times. 

Queries in particular demanded special attention. Wikidata is searchable through the process of querying: using the computer language SPARQL to “ask” questions and sort a massive amount of data into a dataset that answers a specific question. As the course progressed, it became very apparent that Wikidata carries a steep learning curve. A few members of the class had figured out how to use queries to make elaborate interactive data trees; I had only succeeded in changing “Instance of: dog” to “Instance of: cat.” For fun, I tried to see if I could make a chart of the US Senators. I was frustrated for nearly an hour before I realized that I needed to add the constraint “Instance of: human.”
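For readers curious what such a query looks like, here is a rough sketch in Python using the SPARQLWrapper library against the Wikidata query service. The identifiers (P31 ‘instance of’, Q5 ‘human’, P39 ‘position held’, Q4416090 ‘United States senator’) reflect my own reading of Wikidata and are worth double-checking in the query service before relying on them:

from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
SELECT ?senator ?senatorLabel WHERE {
  ?senator wdt:P31 wd:Q5 ;        # instance of: human
           wdt:P39 wd:Q4416090 .  # position held: United States senator
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="wikidata-course-example/0.1")
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["senatorLabel"]["value"])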

This points to a fundamental challenge within Wikidata: linked data is only useful or usable to people who understand querying. Without access to a months-long course, I would never have been able to figure it out. Even with the help of the course, there is still a lot about Wikidata left to learn before I can build my own datasets in a meaningful way. This barrier to usability keeps most people from joining the Wikidata community — and, just like Wikipedia, Wikidata works best when more people contribute their expertise to it. 

The course helped alleviate some of these barriers for me — I am able to create and edit items, and adapt existing queries to answer simple questions. I feel confident that I could create usable datasets using what I learned about items and properties, and link them to existing Wikidata entries. The course helped dismantle something that looked intimidating on the outside — strings of numbers that defined properties, a scary new coding language — and broke them down into a series of simple, logical steps. Ultimately, the greatest benefit of joining was having access to teachers who could help answer my questions when I got stuck. 

Wikidata has great potential in the museum field if it becomes more user-friendly. The Smithsonian (and cultural institutions across the country) have an incredible treasure trove of data and information in our collections, but it’s poorly (if at all) accessible to members of the public. The ability to use linked data to search our digital collections would make our information usable to anyone who wanted it. For example, take classroom education. Using linked data, teachers and students could easily search for historic events from the same year, baseball gloves owned by World Series champions, American presidents with pet pigs. Museum curators might know these things off the top of their heads from countless years spent in the collections, but the information in their heads isn’t searchable by the general public. 

The Smithsonian was founded in 1846 as an “establishment for the increase and diffusion of knowledge among men.” As the possibilities for sharing knowledge have rapidly expanded, the Smithsonian is racing to adapt to a technological world. The Smithsonian’s new Secretary, Lonnie Bunch, has declared one of his priorities to be making the Smithsonian “digital first.” Across the Smithsonian, hundreds of people are working to ensure that our digital databases reflect the full scope of the collections they represent. Linked data would help make these massive online stores useful and usable — all of the Smithsonian’s knowledge would be available to the public we serve. 

Wikidata could be an incredible resource for making data usable to the public, but linked data has a steep learning curve. Learning how to use it requires practice, a lot of patience, and a little bit of help sometimes. 

Interested in taking a course like the one Clara took? Visit learn.wikiedu.org to see current course offerings.

UBC student creates article on missing genus

18:52, Wednesday, 15 July 2020 UTC

In the spring 2020 academic term, higher education around the world shifted from in-person to virtual classes. The University of British Columbia was no exception, but for students in Dr. Shona Ellis’s Morphology and Evolution of Bryophytes class, the switch wasn’t as challenging as it could’ve been, because a key assignment was to improve a Wikipedia article. The Wikipedia assignment, done through Wiki Education’s Student Program, lent itself naturally to the virtual learning environment, student Geoff Lau says.

“It was already an independent project so we didn’t need to spend class time on it, and Dr. Ellis was really good about giving us enough time to ask questions about the assignment and do the research,” he says. “We knew the details of this assignment about halfway into the course and it wasn’t due until the end, so we had a lot of time to prepare. Another great feature was that Dr. Ellis was able to monitor our progress online, so she knew exactly where we were at in the assignment.”

Monitoring progress is something that happens through Wiki Education’s Dashboard software, where Dr. Ellis could see Geoff and his classmates’ work. For Geoff, that was creating the new article on a genus of mosses called Ulota.

How did Geoff pick Ulota? Funny story: He was collecting specimens of bryophytes for another class assignment, and thought he’d nabbed a Ulota specimen. Turns out the specimen in question actually wasn’t, but it caused Geoff to realize Wikipedia was missing the article on Ulota, so when he was picking his topic, he knew just which genus to add.

“I quickly did some background searches to check how many scientific journal articles had been published about the genus, and what species had been examined in detail,” Geoff says. “It turned out that there had actually been quite a lot of studies published on this genus so I decided to write my assignment about it.”

This pre-writing research work is what sets student editors up for success in Wikipedia assignments. Since Wikipedia requires that all information added be cited to a reliable source, student editors like Geoff not only need to identify topics that need expansion — they also need to ensure sufficient reference materials exist to expand it. In finding reference materials, Geoff had to also dig into taxonomy for his particular topic.

“Ulota is tightly intertwined with Orthotrichum, another genus of mosses, so when I was sifting through the literature, it was important that I find articles that are recent enough to address taxonomic shifts,” he says. “Some species had been placed originally in Ulota, then into Orthotrichum, and some species had originally been placed in Orthotrichum, then into Ulota. I wanted to make sure the species I was looking at was placed in the right group first, so I had to do a little digging into recent phylogenies and taxonomic shifts. Luckily, there were several papers on the phylogeny of this genus so it helped a lot.”

Geoff succeeded, adding 26 references to his brand-new article. He also added a handful of freely licensed images he found on Wikimedia Commons. Throughout the assignment, he found himself learning new skills, from using Wikipedia’s sandbox feature to focusing only on facts and his own conclusions.

“This experience was a really new way of tackling an assignment in a university course. It was purely descriptive and although that may sound simple, it was difficult at the beginning not to write any personal interpretations of the data. I wasn’t explaining experimental outcomes or methods as I was used to, instead I had to look for largely morphological descriptions and new species descriptions. That was a change from articles I usually see or read about, which often involve an experimental procedure and interpretations of the outcomes,” he says. “It was largely an aesthetic change from a traditional assignment but a welcome one.”

Interview by Cassidy Villeneuve, post by LiAnna Davis.

The Wikimedia Foundation is extremely concerned about the national security law recently passed in China and implemented in Hong Kong that prohibits a broad range of speech and grants wide-ranging surveillance powers to authorities. This law may have serious implications for protection of the privacy of users on Wikipedia and other projects operated by the Foundation, potentially enabling authorities to request personal user data.

Privacy is critical to sustaining freedom of expression and association, enabling knowledge and ideas to thrive. We strive to protect and preserve those values for the people who contribute their time, energy, and knowledge to Wikimedia projects. The Wikimedia Foundation is dedicated to protecting the privacy of readers and editors, allowing them to contribute clear, fact-based, and uncensored knowledge to the platforms we host.

As detailed in our policies, the Wikimedia Foundation has stringent legal, ethical, and human rights-based standards for responding to requests for user data. We only disclose nonpublic user information if we believe that there is a credible and imminent threat to life or limb that the data would help prevent, or if we are required to under applicable and enforceable law in our jurisdiction. If we were to receive any requests for nonpublic user information from the government of Hong Kong, we would not provide any data unless we were certain of the request’s legal validity after a thorough analysis of this new law, and also a full assessment of applicable law, human rights standards, and the rights of our community members.

The Foundation approaches all requests for nonpublic user information received from government authorities in the same way, safeguarding the ability for anyone, anywhere, to contribute their knowledge to the world without fear of reprisal. As can be seen from our transparency report, released twice a year, we do not grant the large majority of requests that we receive.

Knowledge captured on Wikipedia and its related projects is the work of thousands of global volunteer editors. Every day, volunteers discuss, debate, improve, edit, and even revert contributions, striving to ensure that information is trustworthy and up-to-date. Protecting the privacy of those editors is a critical and necessary foundation to ensure that people feel able to add to, alter, and improve their contributions to knowledge from anywhere in the world.

The members of our community who contribute to, edit, and administer Wikipedia and its related projects do so out of their own generosity and on their own initiative. These projects are a truly collaborative effort and no one person can make a final decision about the information on Wikipedia or other Wikimedia projects by themselves. The Foundation does not dictate or direct their speech; they are not agents or representatives of the Foundation; and they, in turn, are not responsible for the text of this statement or the Foundation’s policy on requests for nonpublic user information.

The Foundation relies on its volunteers to provide the most accurate, reliable information possible to the world. We are dedicated to supporting them in this work, protecting them as they do so to the best of our ability while also providing a safe space for readers to benefit from the information they assemble.

In 2015 I noticed that git fetches from our most active repositories were unreasonably slow, sometimes taking up to a minute, which hindered fast development and collaboration. You can read some of the debugging details I gathered at the time in T103990. Gerrit upstream was aware of the issue and a workaround was presented, though we never implemented it.

When fetching source code from a git repository, the client and server conduct a negotiation to discover which objects have to be sent. The server sends an advertisement that lists every single reference it knows about. For a very active repository in Gerrit it means sending references for each patchset and each change ever made to the repository, or almost 200,000 references for mediawiki/core. That is a noticeable amount of data resulting in a slow fetch, especially on a slow internet connection.

Gerrit originated at Google and has full-time maintainers. In 2017 a team at Google set out to tackle the problem and proposed a new protocol to address the issue, working closely with the git maintainers while doing so. The new protocol makes git smarter during the advertisement phase, notably by filtering out references the client is not interested in. You can read Google's introduction post at https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html

Since June 28th 2020, our Gerrit has been upgraded and now supports git protocol version 2. But to benefit from faster fetches, your client also needs to know about the newer protocol and have it explicitly enabled. For git, you will want version 2.18 or later. Enable the new protocol by setting git configuration protocol.version to 2.

It can be done either on an on-demand basis:

git -c protocol.version=2 fetch

Or enabled in your user configuration file:

$HOME/.gitconfig
[protocol]
    version = 2

On my internet connection, fetching for mediawiki/core.git went from ~15 seconds to just 3 seconds. A noticeable difference in my day to day activity.

If you encounter any issue with the new protocol, you can file a task in our Phabricator and tag it with git-protocol-v2.

This post was written for the Women in Red blog by their intern, Laura Rose Wood. We launched the internship with Women in Red in January.

To the Future of Women in Red and Online Diversity

I find it hard to reconcile that my time as a Women in Red Wikimedia Trainer is coming to an end. The time has flown by and it seems like only yesterday I was setting up my desk with a castle-view in Argyle House. I definitely didn’t foresee myself conducting the majority of my internship work from my student bedroom! But nonetheless, working with Wikipedia and promoting diversity have given me an immense sense of pride, taught me a great deal, and given me experiences I’d never expected.

The changes pushed by COVID-19 were dramatic and unexpected. It was almost as if my internship was completed in three acts.

Act I – Imposter Syndrome

When asked in my first week how I would measure my success at my internship, the deliverables seemed daunting and intimidating for one part-time intern to accomplish. I’m just a Graphic Design student!

I knew the Women in Red project was massively important, and a fantastic initiative for creating gender parity in open content. I knew Wikipedia was a powerful platform for change, often visited and accessible to new users once the first hurdle of article creation was crossed. But I was still slightly unsure how I could make a significant dent in the systemic gender bias which many open knowledge platforms, not just Wikipedia, face.

I had a baseline understanding of systemic bias in these early weeks, but less so how that would affect event preparation. My concept of systemic bias was that, in a Wikipedia sense, it lay largely in the lack of diversity among editors and therefore the representation in the content being created was skewed. However, given that participants in our events are more often women, it became clear that there is not a lack of gender diversity among potential editors. So what would encourage diversity among senior Wikipedia editors?

It became clear to me that creating some form of sustained engagement would be key. This would not simply be about pulling in a new audience, but how we can keep experienced editors feeling supported, continue development, cover new topics and satiate their hunger to continue contributing whilst also making our events accessible to new editors.

In researching lists of suggested women and finding sources for them I found that systemic bias was far deeper than a reflection of those creating content on Wikipedia. Wikipedia relies on reliable sources to back up information. And oftentimes in fields where women and people of colour have faced barriers to entry, or their achievements devalued, this reliable secondary sourcing can be difficult to come by. We need researchers in academic sectors or publishing to continue to document these minorities, but the power we have as editors is to surface this knowledge.

Act II – Isolation(ish)

Enter, stage left: Coronavirus. In a blink, my intention to create or utilise some kind of online hosting platform, where attendees could support each other beyond our in-person training, was cut short. Or so I’d worried…

But this radical shake up has forced people across all sectors to re-examine delivery and communication of information. The move to remote delivery was not without its challenges. There were concerns about the potential for the experience feeling more impersonal. We’d have to bring the sense of community to people’s homes.

Following suit with the new workplace normal, we needed a hosting platform on which to conduct a webinar and re-create the Women in Red edit-a-thon event conventions remotely. Not only this, but we needed supplementary resources in place of physical hand-outs, and collated lists of further readings to help participants. No more physical merch.

As a design student, my obvious route to consistency was to create a visual identity which I could use across our core resources, and in the webinar itself. I created banners for editable resources and tried to make a consistent presentation layout which I could change the colours of to suit the theme of each session. That’s not to say that no changes were made as sessions progressed – every edit-a-thon I have reflected on what went well and made changes to how we focus our training, and the whole experience has been a huge learning curve. But keeping the same overarching structure and design has, I think, helped editors feel that Women in Red is more than ‘just another WikiProject’.

Design takes away from the at times monolithic, white and greyscale interface of Wikimedia (which I am by no means critiquing). If you by any small chance are a branding nerd like me, get excited about visual communication and want to read up on why the Wikimedia Foundation follows the visual style they do, they have a style guide here. They believe that ‘Content precedes Chrome’, a kind of modern, user-focused content version of the modernist ‘form follows function’ philosophy.

Child falling over in three stages. Photo from Wikipedia page for Falling – “three phases in timed shutter release”, by Jamie Campbell, licensed under CC BY 2.0.

In a Wikipedia context, the idea that the content should come first makes articles easy to navigate from a browsing standpoint. It makes it all the easier for us to fall into a click hole of wiki links and, before we know it, it’s 3am and we’re looking at the Wikipedia article for Falling (accident) or learning what a Squonk is.

However, for our new editors, it can seem daunting to be faced with such a design. Visual cues and links can seem hard to differentiate at first, and the pace of our sessions requires that we go through the basic user interface stuff reasonably quickly so users can get on with editing. The Visual Editor tool is a massive help for this, is extremely useable and works much like a word processor with which most of us have some degree of familiarity. But, especially when the site is so ubiquitous, I think there can still be a kind of editing anxiety.

This is where the kind of repetitive, kinaesthetic learning of creating your own article can give confidence. There’s a sense of cradle to grave achievement in creating a biography from scratch and hitting that final ‘Publish’ button that can instil confidence for future editing.

Act III – Time for positiviTea!

During our in-person sessions, there’s usually a tea and coffee break where attendees can have a wee chat and get to know each other. The challenge as we move forward in this changing climate is how we continue to facilitate a community atmosphere. In the webinars, we usually encourage editors to introduce themselves at the beginning of the session, both so I can gauge experience and wiki literacy and to bring a face to the names in the chat panel. I hope that this gives users more confidence in asking questions and being bold.

Promoting a Women in Red community in which our participants feel welcomed and supported is an in-road into the wider global Wikipedia community. One of the barriers to individual editors’ contribution to open source is that participation is both self-guided and self-sustaining. But this doesn’t have to be isolating, even when we’re self-isolating.

Epilogue

What does the future hold for the sharing of knowledge in the post-COVID world?

We seem increasingly likely to turn to the internet as our primary source of reliable information. It is therefore up to us to construct and contribute to repositories with verifiable information.

The democratisation of knowledge through open access platforms may seem like a utopian ideal, but if the recent pandemic has highlighted one thing it is that such alternatives to physical resources are becoming increasingly important to the functioning of our society as a whole. Archives that opened access to their collections during lockdown prove this is achievable.

Sharing and communication are key. Information is, and always has been, free; it is the medium by which it is shared that can create barriers to access. The huge community effort which we’ve witnessed on social media in creating resources to support the Black Lives Matter movement has been a testament to this. If we all work to give a platform to minority voices in our own way, we can ensure that traditionally overlooked pockets of knowledge are given representation. We can make way for cross-community discussion and enable discouraged potential voices to come to the forefront.

Over the course of my internship, I saw more than 60 attendees learn to edit Wikipedia and hone their newfound editing skills. There are now 57 new biographical articles about women and 12 new articles about queer books, authors, artists, bookshops and publications. We’ve run some wonderfully diverse, intersectional events and had attendees from all over the UK thanks to our ability to host the sessions online. Whilst I may not have planned to run events in this way, this pandemic and the subsequent move to online delivery has made our materials more accessible to a broader audience. I hope that this outreach will continue to inspire editors to continue Women in Red work, or any editing in the name of diversity and open knowledge. To help keep this momentum going now that my post is coming to an end, I’ve created a resource which synthesises all the essential information about editing and creating biographical articles, and how to deliver your own Women in Red online editathon.

If you’re curious exactly what we’ve been up to, these are some of my personal stand-out articles created by our editors, although this list is by no means exhaustive:

  • Mountaineer and rock climber who co-founded the Ladies’ Scottish Climbing Club, Jane Inglis Clark.
  • Doctor and former captain of the Afghanistan national women’s football team, Hajar Abulfazl.
  • Advocate for Women’s football who created the first professional women’s football team in Mexico, Marbella Ibarra.
  • 22 year old Filipino climate activist Marinel Sumook Ubaldo.
  • Ugandan climate and environmental rights activist Hilda Flavia Nakabuye.
  • ECA alumni and botanical artist Olga Stewart.
  • Scottish botanist and teacher Mary Pirie.
  • Edinburgh midwife who kept a casebook of 1,296 labours which she assisted, Margaret Bethune.
  • Diabetes researcher who created the Metabolic Unit at the Western General Hospital, Edinburgh, Joyce Baird. A new ‘Baird Family Hospital’ is due to open in Aberdeen in 2021, named for her and her family’s contributions to the field of medicine.
  • And of course Lavender Menace Lesbian & Gay Community Bookshop, a pioneering LGBT+ space in Edinburgh in the 1980s, whose founders are still doing fab things today in the name of archiving queer literature.
Dr. Kathleen Sheppard.
Image: File:Kathleen Sheppard WEF blog.jpg, Kathleen Sheppard, CC BY-SA 4.0, via Wikimedia Commons.

Dr. Kathleen Sheppard has done it again! In 2019, Dr. Sheppard received two awards for her innovative pedagogy and her outstanding leadership at Missouri University of Science and Technology where she is Associate Professor of History and Political Science. She was awarded the Faculty Experiential Learning Award specifically for her use of the Wikipedia assignment in her History of Science course, and later in 2019 received Missouri S&T’s Woman of the Year Award for her research in the field of women and science and her efforts in making Missouri S&T a more equitable place for women and minorities.

She has now received Missouri S&T’s President’s Award for Innovative Teaching for her ongoing use of the Wikipedia assignment in her History of Science courses as well as other “novel teaching strategies.” As the announcement notes, “Student comments repeatedly note Dr. Sheppard’s passion for teaching, preparedness, approachability, and ability to make students see history through different perspectives.”

Dr. Sheppard joined the Wikipedia Student Program in 2017, and in that time she has taught a total of 9 courses as part of our program. Her students have contributed over 150,000 words to Wikipedia, and their work has been viewed millions of times. Her students have greatly expanded the biographies of dozens of women scientists, including 19th century American astronomer Maria Mitchell.

In a reflection that Dr. Sheppard published in 2018, she described her motivation for adopting the Wikipedia assignment as a way to address the “failures of pseudotransactionality” prevalent in education: “Pseudotransactionality is the practice of having students pretend to write a letter to an employer, a newspaper article, or even a tweet. It is a real process, but with an artificial end; they know this, so they tend not to work that hard at it… However, I drove home the point that writing for Wikipedia is a real transaction between the student and the real-world reader.”

Students rarely have the chance to make their voices heard beyond the classroom, and thanks to the dedication of instructors like Dr. Sheppard, thousands of students have had the chance to leave their mark on a site that reaches millions daily.

Congratulations again to Dr. Sheppard for your outstanding work! We’re so grateful to have you and your students in our program.

My first week as the new Wikimedia Training Intern

15:28, Monday, 13 July 2020 UTC

Hannah Rothmann is an intern at Edinburgh University, training with our Wikimedian in Residence who is based there, Ewan McAndrew. Hannah wrote this post for the University’s blog.

Hi, my name is Hannah and I will be going into the final year of my Classics degree in September. I have just finished week 1 of my Wikimedia Training Internship; the start date was delayed because of the COVID-19 pandemic and the uncertainty that came with it. Adjusting to working remotely from home, meeting new people over video calls and Microsoft Teams, and learning about entirely new things has made for a strange and somewhat nerve-racking first week, and not what I would have expected from a summer internship a year ago. Thankfully, my line manager, Ewan McAndrew, has been very welcoming and made me feel at ease despite this novel situation!

The Wikimedia Training Internship caught my attention among a long and varied list of Employ.Ed internships. The aim of my internship is to create materials to teach people how to edit and use Wikipedia and Wikidata, with the goal of them becoming active editors and contributing to a growing database of free, credible and jointly gathered information. I was shocked when I discovered this week that only around 18% of biographical pages on the English Wikipedia are about women! Hopefully, by making more accessible teaching materials we will be able to address this imbalance and increase the diversity of Wikipedia and Wikidata. This means making resources that avoid complicated jargon, address all the stumbling blocks a beginner wiki-user may encounter and enable the uninitiated to become confident editors and contributors.

Wikimedia UK believes ‘that open access to knowledge is a fundamental right and a driver in the democratic creation, distribution and consumption of knowledge’. These aims demonstrate the importance of the work of Wikimedia UK. My line manager Ewan stressed this importance, and the growing significance of Wikimedia-related activities in a learning environment shifting towards the digital world, when he had to argue that the internship should go ahead despite the financial impact of COVID-19 on the university; many internships were cancelled. My internship will hopefully enable remote learning and help people see how they can change their approach to teaching to incorporate Wikimedia-related activities into how students learn.

This aim means that the work I am doing is firmly rooted in the present and even the future. Just this week I have learnt new ways to use technology, and skills which will be indispensable in a world moving ever further online: online learning and the online experience. Although at first glance this internship appears in direct contrast to my Classics degree, which is focussed, among other things, on reading and interpreting ancient texts, the aim of a Classics degree, in my opinion, is to understand that ideas and concepts of whatever period always have relevance and that there is always the possibility of continual learning. The different skills I will develop in my internship, together with the skills I am learning from my degree, will hopefully enrich my approach to any work that I do now and in the future.

So far, I have been getting used to remote working and all the quirks that come with it (hoovering is not something that goes too well with a work video call for example!) and I have also been figuring out where the gaps are in the current resources that Ewan has to teach people about Wikipedia and Wikidata while also filling in my rather large gaps of knowledge. For example, I had no idea what Wikidata really was before the start of my first week and I am still trying to understand it fully. I was lucky enough to attend the NHLI Women in Science Wikithon at the end of my first week which gave me a chance to implement what I had learnt about Wikipedia editing and it showed me how much more still needs to be done to improve diversity. Dr Jess Wade, who was Wikimedia UK’s Wikimedian of the year 2019, gave an introduction exploring why we should all edit Wikipedia. She has personally made hundreds and hundreds of Wikipedia pages for women and for notable women in science who previously had been ignored and in doing this has increased awareness regarding Wikipedia and how it can be used to tackle inequality and lack of diversity. After this introduction, it was a treat to have some training from Dr Alice White who showed us how to begin editing and creating our own pages. I edited some pages already created but lacking details, for example a page about Dr Susan Bewley, as I did not feel quite ready to begin making my own pages. The work Dr Jess Wade has been doing and continues to do along with this event really showed me how Wikipedia could be used as a force for good and also the importance of ensuring people have access to learning materials.

I am excited about getting to grips with my internship, developing skills, challenging my abilities all with the aim to make Wikipedia and Wikidata a platform that anyone anywhere will feel able to use, edit and appreciate!

Tech News issue #29, 2020 (July 13, 2020)

00:00, Monday, 13 July 2020 UTC

weeklyOSM 520

09:54, Sunday, 12 July 2020 UTC

30/06/2020-06/07/2020

lead picture

GroundZero published Wambachers Boundaries – still in pre-alpha 1 | map data © OpenStreetMap contributors

About us

  • To strengthen our teams, we are urgently looking for volunteer native speakers to help us with proofreading. Okay, I want in. What do I have to do? Just:
    1. visit https://osmbc.openstreetmap.de/ – then we already have your OSM nickname and an email address for the invitation to our MatterMost instance.
    2. then please write an email to theweekly[.]osm[at]gmail[.]com, so we know that you want to participate. Then we will send you a link to our internal wiki page where we try to explain how OSMBC works.

Mapping

  • Tan is considering expanding their shop=bubble_tea proposal to cover all shops providing takeaway beverages.
  • Garry Keenor’s proposal to tag railway tracks with electrification systems using third or fourth rails has been approved.

Community

  • The OpenStreetMap Asia community has decided to hold State of the Map Asia 2020 as an online event. To volunteer, please log in to Slack or GitHub.
  • The record recently set for the number of daily mappers has attracted attention (we reported earlier). Andrew and other CWG members asked Pascal Neis if he could provide some insight. According to Pascal, the record week in May featured increased mapping in Peru, Botswana, Central African Republic and other countries. In particular, a large number of newly registered members started contributing to the Cusco region of Peru.
  • Sarah Treanor and Katie Prescott wrote up a series of interviews that they had with people involved with HOT, for the BBC. The article highlights how the availability of good maps can be a matter of life and death, and how the lack of a commercial incentive to identify the nearest Starbucks leaves large parts of the world inadequately mapped.
  • Volker Gringmuth, aka kreuzschnabel, known through countless very productive posts in the German forum and from his tutorial ‘Getting started with OpenStreetMap’ (de), declared (automatic translation) on his blog that he does not want to contribute to the OSM community any more and wants to give up mapping. Hopefully the numerous blog comments will encourage him to change his mind.
  • Frederik Ramm’s talk ‘There might have been a misunderstanding…’ at SotM 2020 inspired a philosophical discussion (automatic translation) on the German forum. Skinfaxi started the conversation by asking ‘We have map in the name. Shouldn’t we then also convey a clear mission statement that the information we collect meets basic cartographic requirements?’.

Imports

  • Claire Halleux asked for feedback on a proposed import of around 3.8 million building footprints in north-eastern DRC and 2.7 million building footprints in western Uganda.

OpenStreetMap Foundation

  • The OSMF microgrants committee announced 12 successful applicants who will receive grants in the 2020 tranche. Projects range from mapping new cities in Uganda, to education (Pacific Islands, Albania, Kosovo), publicity material (UK), and software development (Street Complete and OSM Calendar).
  • Allan Mustard, the chair of the OpenStreetMap Foundation, answered questions from participants of SotM 2020. The video of Allan’s talk is available here.

Events

  • Poster presentations from the State of the Map 2020 have been uploaded here.
  • One of the biggest and most important Open Source conferences in Taiwan, COSCUP, will be held on 1 and 2 August in Taipei, Taiwan. And of course the OpenStreetMap Taiwan community will be around, collaborating with the Wikidata Taiwan community and providing numerous lectures on the second day of COSCUP. Even with the threat of COVID-19, COSCUP is still being held at a physical venue, with some speakers’ talks streamed. In-person attendees need to follow the Guidelines for Prevention and Protection of COVID-19 and fill out the Personal Health Declaration Form before participating in the event.
  • Videos from SotM 2020 are now starting to be available from here. Session pads are also available for each session.
  • Christoph Hormann reviewed the SotM 2020 artwork from a cartographic perspective and used it to note some of the pitfalls of generalisation; it is hard to shrink the world to the size of a sticker. Bernelle Verster responded by explaining the realities of time constraints, available skills, and available contributors.

Humanitarian OSM

  • The Missing Maps Group congratulated HOT on being selected as an Audacious Project.
  • The HOT Tech Team has launched their new blog.
  • The second round of rapid response micrograntees has been announced.

Education

  • Researchers from the Ohio State University have studied access to green spaces in metro areas. They combined census-block-group demographics, OpenStreetMap data and socioeconomic data with satellite imagery to analyse access to green spaces and vegetation in two metropolitan areas: Columbus, Ohio, and Atlanta, Georgia.

Maps

  • OpenAndroMaps reports that more and more boundaries are grouped in relations and thus cannot be reliably processed by the MapsForgeWriter.
  • Andreas Schneider reported about his work on the adaptation of the OSM leisure map for the Garmin Fenix smartwatch.
  • [1] The Swedish company Ground Zero Communications AB is trying to rebuild Wambacher’s OSM-Boundaries map series. The new site is pre-alpha.

switch2OSM

  • Pierre Béland thanks DJI for using an OpenStreetMap background for their Drones interactive rescue map and suggests that they add the appropriate OSM attribution to recognise the efforts of the community in supporting such projects.

Software

  • Michal Migurski, of the Operations Working Group, reported on action to monitor the server used for the OSM Q&A site. It is hoped that greater instrumentation of the server will help identify the cause of sporadic faults in the UI which we reported earlier.
  • Xiao Guoan has written a manual on setting up an OSM tile server for Debian Buster.
  • ZeLonewolf wrote a guide on how to install a ‘complete’ Overpass stack on an Ubuntu 18.04 server.

Programming

  • Even Rouault, one of the main developers of the GDAL library, pointed out in an e-mail to the mailing lists GDAL-Dev and QGIS-Developer that Spatialite, with no new release for almost five years now, will soon no longer be compatible with current versions of the Proj library. The geospatial community has to decide whether to fork Spatialite or to remove spatial analysis functions from GDAL and QGIS.
  • Write access to the OpenStreetMap Trac will be disabled at the end of July. The OpenStreetMap Subversion repository will follow in August.
  • Mapbox took a further step out of the open source ecosystem. The release notes of the first alpha releases of Mapbox GL Native for iOS 5.10 and Mapbox GL Native for Android 9.3.0 mention that they have started depending on non-free dependencies released under Mapbox Terms of Service, not a free and open licence. Only Mapbox clients can access these binaries. (via @RichardF on Twitter).

Releases

  • Changelog JOSM / 2020-07-02: Stable release 16731 (20.06).
  • François has released a new version of indoor= that brings visual, interaction, and deployment enhancements. indoor= renders indoor data available in OpenStreetMap.

Did you know …

  • … Prof. Alexander Zipf presents HeiGIT gGmbH, the Heidelberg Institute for Geoinformation Technology at Heidelberg University, which aims to improve knowledge and technology transfer from fundamental research in geoinformatics to practical applications.
  • … the journaliststoolbox, a list of mapping and geocoding resources for journalists?
  • … the OSM mailing list for science?

OSM in the media

  • Listen to Jerry Brotton, who navigates the transformation from paper to digital mapping, from print to pixels, and asks what is being gained and lost.
  • Julien Guillot searched (fr) (automatic translation) for an application that will find you the ideal cycling route. In their Libération article they note that the lack of cycling infrastructure information in both Google and Apple Maps is indicative of the influence of car traffic on our societies.
  • On the website of the Ukrainian blog 0629.com.ua an article has appeared (ru) (automatic translation) in which the ‘on-the-ground’ rule is discussed in relation to the disputed parts of the Donbass region.

Other “geo” things

Upcoming Events

Where What When Country
Munich Münchner Stammtisch 2020-07-14 germany
Hamburg Hamburger Mappertreffen 2020-07-14 germany
Nottingham Nottingham pub meetup 2020-07-21 united kingdom
Lüneburg Lüneburger Mappertreffen 2020-07-21 germany
Berlin 13. OSM-Berlin-Verkehrswendetreffen (Online) 2020-07-21 germany
Düsseldorf Düsseldorfer OSM-Stammtisch 2020-07-29 germany
Kandy 2020 State of the Map Asia 2020-10-31-2020-11-01 sri lanka

Note: If you would like to see your event here, please add it to the calendar. Only data which is in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by AnisKoutsi, Joker234, Nakaner, Nordpfeil, PierZen, Polyglot, Rogehm, SK53, Supaplex, TheSwavu, YoViajo, derFred, geologist.

Telling the story of governors of Mozambique

08:09, Sunday, 12 2020 July UTC
As part of my Africa project I look for political positions like Presidents, Prime Ministers, Ministers and now also Governors. I started with provinces and the like because a provincial South African minister of health was considered not notable enough.

With Wikilambda, or if you wish "Abstract Wikipedia", becoming a thing, it is important to consider how the story is told. The bare bones of a story already show in Reasonator. Most of the Mozambican governors are new to Wikidata. They have a position of "governor of their province", a start and end date and, where applicable, a predecessor and a successor. Obviously they are politicians and Mozambican.
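As an illustration (a sketch, not taken from this post), those bare bones can be pulled out of Wikidata with a query roughly like the one below; Qxxxxxxx is a placeholder for a concrete "governor of ..." position item and would need to be replaced with a real item ID.

# Sketch only: people who have held a given governorship, with dates and predecessor.
# Qxxxxxxx is a placeholder; substitute the item for the specific position.
SELECT ?person ?personLabel ?start ?end ?predecessorLabel
WHERE
{
  ?person p:P39 ?statement .                    # position held
  ?statement ps:P39 wd:Qxxxxxxx .               # the governorship in question
  OPTIONAL { ?statement pq:P580 ?start }        # start time
  OPTIONAL { ?statement pq:P582 ?end }          # end time
  OPTIONAL { ?statement pq:P1365 ?predecessor } # replaces (predecessor)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}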

This time I had to go to the Portuguese Wikipedia for a source. The list there is mixed with colonial governors, and they need to fit a different mold. They are Portuguese and arguably they are not politicians but administrators.

What I am eager to learn is how Wikilambda will be able to tell these stories, and how it will expand the stories as more is known. I wonder if a tool like ShEx will play a role. Anyway, good times.
Thanks,
      GerardM

What Does A Volunteer Development Coordinator Do?

08:01, Sunday, 12 2020 July UTC

A giant wall of text follows, giving a snapshot of work I do. I nurture the software community that supports the Wikimedia movement. So here's a big swath of stuff I did between February 1st and today.

Wrote and posted a blog entry about the San Francisco hackathon. Still need to do more followup with participants.

Handed over the MediaWiki 1.19 deployment communications plan to Guillaume Paumier, WMF Technical Communications Manager. He blogged a summary of the deployment and of our efforts and that's just the tip of the iceberg; he also set up a global message delivery and improved the CentralNotice maintenance message and did even more to make sure that we thoroughly communicate about the upcoming deployment to all the Wikimedia communities. I also shared information with various folks regarding testing of site-specific gadgets on 1.19.

I sent at least 285 work-related emails. That's 41 per workday but I definitely sent some work-related email on weekends.

Some patch queue work, responding to contributors and getting experienced developers to review the patches. I'm just trying to keep our queue from growing while code reviewers are mostly focused on getting MediaWiki 1.19 reviewed, polished, and deployed. But I do want to take care of all parts of the volunteer pipeline -- initial outreach and recruiting, training, code improvement, commit access, continued interest and participation, and debriefing when they leave -- so the patch review queue is a continuing worry.

Some work preparing for the Pune hackathon and for GLAMCamp DC, neither of which I am attending. I wrote or edited some tutorials and made a tutorial category which pleases me. We have more good material for workshops and stuff now, yay! And I helped the GLAMCamp people a bit in talking through what technical goals they wanted to achieve during the weekend.

Got dates from Wikimedia Germany for the Berlin hackathon, 1-3 June, and started trumpeting it. Also worked on planning for it and did outreach. For example, I reached out to about 13 chapters that are pursuing or interested in some kind of technology work like, say, funding or working on the offline Wikipedia reader (Wikimedia Switzerland), or usability and accessibility for Wikisource (Wikimedia Italy), or the Toolserver, a shared hosting service for tools and stuff that hackers use to improve or make use of the Wikimedia sites (for example, Wikimedia Germany & Wikimedia Hungary). We hope they can convene, share insights and collaborate at the WMDE hackfest.

Told at least 30 contributors to apply for Wikimania scholarships because the deadline is 16 February.

Talked to some Wikimedia India folks about planning technical events, and contributed to a page of resources for upcoming events.

Worked on some event planning & decisions for a potential event.

Passed the word to some friends, acquaintances, and email lists about some job openings at the Foundation.

Google Summer of Code has been announced, and I am managing MediaWiki's participation. I have started -- flyers, emails, recruiting potential students, improving the wiki page, asking experts whether they might mentor, and so on. I'm trying to start a thing where every major women's college in North America gets a GSoC presentation by March 15th, to improve the number of GSoC applications that come from women; let's see how that goes. MediaWiki still needs to apply to participate as a mentoring organization and acceptances only go out after that, but I'm comfortable spending time preparing anyway. And the women's college outreach will lead to an increase in the number of applications for all the participating open source projects, instead of just aiming a firehose at MediaWiki; that's fine. Like Tim O'Reilly says, aim to create more value than you capture.

Related to that -- I set up a talk for one of our engineers to give at Mills, a women's college that has an interesting interdisciplinary computer science program (both grad and undergrad, the grad program being mixed-sex) and I think it may end up being a really amazing talk. Ian Baker is going to talk about how CS helps us work in Wikimedia engineering, how we collaborate with the community during the design, development, and testing phases, and what skills and experiences come in handy in his job. I'll publicize more once there's an official webpage to point to.

Had a videoconference with a developer and my boss about our conversion to Git. I prepped for it by collecting some questions and getting preliminary answers, and then after the call we ended up with all this raw material and I sent a fairly long summary to the developers' mailing list. There's a lot left to do, and the team needs to work on some open issues, but we have a lot more detail on those TODOs now, so that's good.

Saw a nice email from Erik Möller publicizing the San Francisco hackathon videos and tutorials/documentation, yay!

Talked with a few people about submitting talks to upcoming conferences. I ought to write some preliminary Grace Hopper, Open Source Bridge, and Wikimania proposals this week.

Various volunteer encouragement stuff (pointing to resources, helping with installation or development problems, troubleshooting, teaching, putting confused people in touch with relevant experts, etc.), especially talking in IRC to eager students who want to do GSoC. Many of them are from India. I wonder how many of them see my name and think I'm in India too.

Commit access queue as usual.

Saw privacy policy stuff mentioned on an agenda for an IRC meeting on the 18th, so I talked to a WMF lawyer a little bit about privacy policy stuff for our new Labs infrastructure. We set up a meeting for this week to iron stuff out.

Helped with the monthly report. I have a colleague who wants to learn more about All This Engineering Stuff, so every month we have a call where I explain and teach the context of the report, and for this month's call I suggested we add another colleague who also wants to learn how the tech side works. Who knows, maybe this will turn into a tradition!

Followed up on the GSoC 2011 students who never quite got their projects set up and deployed on Wikimedia servers, and looks like two of them have some time and want to finish it now, yay! Updated the Past Projects page.

Checked in on the UCOSP students who are working on a mobile app for Wiktionary and told them about Wikimania, new mobile research, etc. Also got some feedback from their mentor, Amgine, on how they're doing.

Started an onwiki thread to discuss the MobileFrontend rewrite question(s).

Talked to Oren Bochman, the volunteer who's working on our Lucene search stuff, and followed up on a bunch of his questions/interests.

Ran & attended meetings.

Suggested to the new Wikimedia Kenya chapter that maybe we could collaborate, since they're interested in helping schools get Wikipedia access via offline reading.

Looked into the code review situation by getting a list of committers with their associated numbers of commits, reviews, and statuschanges. It's just a first pass, but it's a start for discovering who's been committing way more than they review, so we can start efforts to mentor them into more code reviewing confidence. I also saw who's been reviewing way more than they commit, and saw a name I wasn't familiar with -- looks like I've now successfully recruited him to come to the Berlin hackathon. :-)

Put two groups of people in touch with each other: did a group-intro mail to people at various institutions working on Wikimedia accessibility, and another to people who want to work on a redesign of mediawiki.org's front page.

And there was other miscellaneous stuff, but this is already sooooo TL;DR (too long; didn't read). (Which is funny because that's the name of my team.) Monday awaits!

New names for everyone!

12:45, Saturday, 11 2020 July UTC

The Cloud Services team is in the process of updating and standardizing the use of DNS names throughout Cloud VPS projects and infrastructure, including the Toolforge project. A lot of this has to do with reducing our reliance on the badly-overloaded term 'Labs' in favor of the 'Cloud' naming scheme. The whole story can be found on this Wikitech proposal page. These changes will be trickling out over the coming weeks or months, but there is one change you might notice already.

New private domain for VPS instances

For several years virtual machines have been created with two internal DNS entries: <hostname>.eqiad.wmflabs and <hostname>.<project>.eqiad.wmflabs. As of today, hosts can also be found in a third place: <hostname>.<project>.eqiad1.wikimedia.cloud. There's no current timeline to phase out the old names, but the new names are now the preferred/official internal names. Reverse DNS lookups on an instance's IP address will return the new name, and many other internal cloud services (for example Puppet) will start using the new names for newly-created VMs.

Eventually the non-standard .wmflabs top level domain will be phased out, so you should start retraining your fingers and updating your .ssh/config today.
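For example, a minimal ~/.ssh/config entry using the new naming scheme might look like the sketch below. The project name, instance name, and shell user name are made up for illustration, and any bastion or ProxyJump settings you already use would stay the same.

# Hypothetical example only: an instance "myinstance" in the Cloud VPS project "myproject".
# Old name: myinstance.myproject.eqiad.wmflabs
Host myinstance
    HostName myinstance.myproject.eqiad1.wikimedia.cloud
    User my-shell-username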

This Month in GLAM: June 2020

15:59, Friday, 10 2020 July UTC

Emma Chow will graduate from Brown in 2021 with a Bachelor of Arts in American Studies. Last academic year, she was assigned to edit Wikipedia as a class assignment, supported by Wiki Education’s Student Program. She reflected on the experience on Brown’s website; her post is re-published here with permission.

Emma Chow

For as long as I can remember, I have been told to not use Wikipedia.

I distinctly recall my first lesson about searching the web. In first grade, the school’s librarian taught us about reliable sources. She projected the website “Pacific Northwest Tree Octopus” in front of the entire class. After spending time on our own researching the tree octopus, we reassembled as a group and the librarian asked, “Is the Pacific Northwest tree octopus real?” My tiny hand shot up with the rest of my peers in total belief of its existence. To my dismay, this would be the day that I learned that not everything you read online is true… It turned out that I was not the only one fooled by the tree octopus: Wikipedia currently has an article about the internet hoax that is the Pacific Northwest Tree Octopus. That class concluded, not surprisingly, with the warning to be very wary. The teacher said that anyone can contribute to online databases, including those “not qualified to do so.” The message was clear: the general public is not considered a reliable academic source, or so I thought until I wrote my own Wikipedia entry.

Moving forward fourteen years, I am now in my third year at Brown and enrolled last fall in an American Studies course entitled “American Publics.” The class, taught by Professor Susan Smulyan, examined the public sphere’s historical, cultural, and political dimensions, as well as the challenges of public life in America, all while also discussing the place of Public Humanities within the University. One of the main assignments for the course involved writing a Wikipedia page on a local Rhode Island subject. I chose to write on Mashapaug Pond, the largest freshwater pond in the city of Providence.

Our class worked in partnership with the Rhode Island Council for the Humanities (RICH) and their Arts and Culture Research Fellow, Janaya Kizzie, on a project aimed at contributing Wikipedia pages about Rhode Island’s artistic and cultural leaders, and geo-cultural and historical sites. You can read about RICH’s goals here and here. This project has an impressive range of goals: to enhance Rhode Island’s reputation as a creative destination, to forge a vital bridge between the past and present, connect arts and cultural communities, represent diverse backgrounds, and catalyze education focused on arts and culture – with all of this to be achieved through the medium of Wikipedia. My article on Mashapaug Pond, written with my classmate Grace DeLorme, is one of the 250 new articles that Janaya Kizzie is curating.

Despite my childhood teacher’s warnings, I use Wikipedia constantly. That said, I have never cited it in any of my academic essays out of concern about its questionable reliability and stature in the scholarly world. Yet we know Wikipedia is an obligatory first stop for many as they begin online research. It makes the complex simple; easily accessible, its language and tone facilitate direct, approachable use. Scholarly articles, in contrast, can seem incomprehensible, with their density of language and reliance, for many, on what are unfamiliar terminologies. Yet, and to my surprise, I had difficulty writing at the level of a 5th grade reader, the requirement for authoring a Wikipedia entry.

Throughout my education, I have always been taught to “pick a side” and make an argument when writing a paper. So, when I sat down to write a Wikipedia page on a local pond, I was lost. Every time I read something controversial about the pond, whether it was the mass genocide of the Native Americans, the forced eviction of the West Elmwood community, or the toxic legacy of the Gorham Silver plant, I found myself spiraling into an argumentative mindset. It was challenging to write for the “public:” I had to force myself to exclude the complex discourse that I would normally have used in my academic writing. By adding to and completing the Mashapaug Pond Wikipedia page, I learned to be clearer, more concise, and thus more accessible in my writing. As a final source of challenge, my classmates and I sometimes found it difficult to find the Wikipedia-required two primary sources for each entry!

Co-authoring this page connected me to the history and the community of Providence. When I decided to move from my hometown of Boulder, Colorado to go to school at Brown, I knew nothing about the city. Shockingly, after two years of living there, my experience of Providence and Rhode Island was grounded solely within the University, barely extending beyond College Hill. I was locked inside the “Brown Bubble,” and it wasn’t until I wrote this Wikipedia entry that I began to know something about the history and community of my new city.

I learned many local lessons, extending from the Narragansett tribe and their interactions with Roger Williams, to the manufacturing sector, exemplified by Gorham Silver, and the history of urban renewal. In addition, thinking about Mashapaug Pond brought me into contact with a local arts organization, first called the Urban Pond Procession and later UPP Arts, and their campaign to raise awareness of the pond’s toxicity. It was incredible that I learned so much about an ever-changing community from one particular location.

I especially appreciated having contact with people around Providence as I talked to them about Mashapaug Pond. One unforgettable person I met was Holly Ewald, artist and founder of UPP Arts.  Beyond talking about the site, she shared with me the stories of the annual parade produced by the Mashapaug Pond community.  I also came to know another community member, Mariani Lefas-Tetens, the Assistant Director of School and Teacher Programs at the Rhode Island School of Design Museum. She discussed her work about the new Gorham Silver exhibit at the Museum and how she met with different community members to find out what they’d like the installation to express.  Researching and writing about a local pond thus enabled me to access a broad range of knowledge about the wider Providence community.

I would definitely describe writing and researching a Wikipedia page as a form of public engagement. This is based not only on my own expanded understanding of Providence, but also on my deeper thinking about Wikipedia as a public platform. It defines itself as a “multilingual digital encyclopedia generated and preserved as an open collaboration project by a community of volunteer editors.” Indeed, it is THE largest general reference work on the internet. I would go as far as to say that Wikipedia is a “public sphere.” Jürgen Habermas, German sociologist and philosopher, gives this definition of such a concept, fittingly written for an encyclopedia:

By “public sphere” we mean first of all a domain of our social life in which such a thing as public opinion can be formed. Access to the public sphere is open in principle to all citizens. A portion of the public sphere is constituted in every conversation in which private persons come together to form a public… Citizens act as a public when they deal with matters of general interest without being subject to coercion; thus with the guarantee that they may assemble and unite freely, and express and publicize their opinions freely.[1]

Wikipedia is organized with ‘talk’ and ‘edit’ tabs that invite participant-editors to converse and share ideas. Even though this is easier said than done on particular pages (don’t even try to add or change anything on the Beyoncé page), in principle, Wikipedia is open to all. It is completely free. So as long as someone has access to the internet, they can fully participate with this resource. Also, because companies cannot write or hire someone else to write their own Wikipedia page, this eliminates coercion. And, as shown by my experience, it also opens the way for college students to contribute to their new communities. By writing about both local history and contemporary life, a place is given a visibility that makes more public opportunities possible.

Now let me ask you: would you write a Wikipedia page? Cite it in one of your academic papers? Assign it as a class project? Today I encourage you to take a leap of faith and join the digital public sphere that is Wikipedia.

Also feel free to check out my Wikipedia page here!

Other pages from our class can be found below:

RI Computer Museum https://en.wikipedia.org/wiki/Rhode_Island_Computer_Museum

Zeinab Kante and Kelvin Yang​

RI State Song https://en.wikipedia.org/wiki/Rhode_Island’s_It_for_Me and here as well, https://en.wikipedia.org/wiki/Music_of_Rhode_Island

Keiko Cooper-Hohn​

Rites and Reason Theatre https://en.wikipedia.org/wiki/Rites_and_Reason_Theatre

Khail Bryant and Mia Gratacos-Atterberry​

Providence Art Club https://en.m.wikipedia.org/wiki/Providence_Art_Club

Sophie Brown and Ava McEnroe​

RI Pride https://en.wikipedia.org/wiki/Rhode_Island_Pride

Evan Lincoln and Juniper Styles

Shey Rivera https://en.wikipedia.org/wiki/Shey_Rivera_R%C3%ADos

Sara Montoya and Santi Hernandez

Donna Freitas https://en.wikipedia.org/wiki/Donna_Freitas

Matthew Marciello

 

[1] Jürgen Habermas, “The Public Sphere: An Encyclopedia Article (1964),” translated by Sara Lennox and Frank Lennox, New German Critique 3 (1974): 49.

Just one week left to sign up for our 2020 AGM!

11:32, Friday, 10 2020 July UTC

Photo: Wikimedia UK 2019 AGM, by John Lubbock. CC BY-SA 4.0.

We’re gearing up for this year’s Annual General Meeting with just over a week to go until the event! On Saturday 18th July we will be meeting virtually, and though we won’t see each other in person we’re hoping to make the day as interactive as possible.

We’ll be using Zoom, with a conference link sent to all Eventbrite sign ups, so claim your free ticket here. If you’re not sure how to use Zoom, you can watch instructions on this support page, or contact us with any queries. While the AGM is an opportunity for our members to vote on essential governance of the charity, we also encourage participation from volunteers, partners, supporters, and anyone else who’s interested in the Wikimedia community in the UK.

Agenda

11am Welcome and introduction, including technical onboarding

11.15am Keynote talk from Gavin Wilshaw, Mass Digitisation Manager at the National Library of Scotland, on the Wikisource project that the library has been delivering since the March shutdown

11.35am Q&A with Gavin, and contributions from other participants about their work on Wikimedia during and in response to the pandemic

12noon BREAK

12.15pm A global movement – short updates on Wikimedia 2030, the rebranding project, and the Universal Code of Conduct

12.30pm Lightning talks

1pm BREAK (+ social networking)

2pm Start of the formal AGM: reports, questions and announcement of voting results for the Resolutions and Elections

3pm Wikimedian of the Year and Honorary Member Awards

3.30pm Thanks and close

Proxy voting

All voting for this year’s AGM will happen by proxy. Our current Articles of Association require members to be present in person at the AGM to vote on the day, something we’re not able to facilitate this year. This means all votes must be submitted by the proxy deadline which is 2pm on Thursday 16th July. You can find the director candidate statements here, ask them questions here, and read through the resolutions we’ll be voting on here.

Voting packs have been sent to all members by email, but if you haven’t received one and you think you should have, please do get in touch with Katie at membership@wikimedia.org.uk.

Sign up

You must register for the AGM on Eventbrite rather than on wiki, as we need your full name rather than your Wikimedia user name, and we’ll be sending out video conference links to all attendees registered through Eventbrite closer to the date.

If you have any questions or more general comments, please do get in touch with Katie at membership@wikimedia.org.uk.

We look forward to seeing you next Saturday!

 

If you’d like to join Wikimedia UK as a member or renew your current membership, you can do so here. To support free access to knowledge through our programmes, you can support us here.

Celebrating 600,000 commits for Wikimedia

11:27, Friday, 10 2020 July UTC

Earlier today, the 600,000th commit was pushed to Wikimedia's Gerrit server. We thought we'd take this moment to reflect on the developer services we offer and our community of developers, be they Wikimedia staff, third party workers, or volunteers.

At Wikimedia, we currently use a self-hosted installation of Gerrit to provide code review workflow management, and code hosting and browsing. We adopted this in 2011–12, replacing Apache Subversion.

Within Gerrit, we host several thousand repositories of code (2,441 as of today). This includes MediaWiki itself, plus all the many hundreds of extensions and skins people have created for use with MediaWiki. Approximately 90% of the MediaWiki extensions we host are not used by Wikimedia, only by third parties. We also host key Wikimedia server configuration repositories like puppet or site config, build artefacts like vetted docker images for production services or local .deb build repos for software we use like etherpad-lite, ancillary software like our special database exporting orchestration tool for dumps.wikimedia.org, and dozens of other uses.

Gerrit is not just (or even primarily) a code hosting service, but a code review workflow tool. Per the Wikimedia code review policy, all MediaWiki code heading to production should go through separate development and code review for security, performance, quality, and community reasons. Reviewers are required to use their "good judgement and careful action", which is a heavy burden, because "[m]erging a change to the MediaWiki core or an extension deployed by Wikimedia is a big deal". Gerrit helps them do this, providing clear views of what is changing, supporting itemised, character-level, file-level, or commit-level feedback and revision, and allowing series of complex changes to be chained together across multiple repositories, and ensuring that forthcoming and merged changes are visible to product owners, development teams, and other interested parties.

Across all of our repositories, we average over 200 human commits a day, though activity levels vary widely. Some repositories have dozens of patches a week (MediaWiki itself gets almost 20 patches a day; puppet gets nearly 30), whereas others get a patch every few years. There are over 8,000 accounts registered with Gerrit, although activity is not distributed uniformly throughout that cohort.

To focus engineer time where it's needed, a fair amount of low-risk development work is automated. This happens in both creating patches and also, in some cases, merging them.

For example, for many years we have partnered with TranslateWiki.net's volunteer community to translate and maintain MediaWiki interfaces in hundreds of languages. Exports of translators' updates are pushed and merged automatically by one of the TWN team each day, helping our users keep a fresh, usable system whatever their preferred language.

Another key area is LibraryUpgrader, a custom tool to automatically upgrade the libraries we use for continuous integration across hundreds of repositories, allowing us to make improvements and increase standards without a single central breaking change. Indeed, the 600,000th commit was one of these automatic commits, upgrading the version of the mediawiki-codesniffer tool in the GroupsSidebar extension to the latest version, ensuring it is written following the latest Wikimedia coding conventions for PHP.

Right now, we're working on upgrading our installation of Gerrit, moving from our old version based on the 2.x branch through 2.16 to 3.1, which will mean a new user interface and other user-facing changes, as well as improvements behind the scenes. More on those changes will be coming in later posts.


Header image: A vehicle used to transport miners to and from the mine face by 'undergrounddarkride', used under CC-BY-2.0.

Modeling wrongful convictions on Wikidata

21:00, Thursday, 09 2020 July UTC

Wikidata continues to play a more central role on the internet by supplying digital assistants like Siri and Alexa with facts. Amplifying Wikidata’s facts through these digital assistants has implications for discovering new ideas, answering questions, and representing history accurately. In a recent Wikidata course, this exact concern came up in the area of criminal justice.

Wrongful convictions have happened consistently throughout history. This is well-documented on Wikipedia and on projects like the National Registry of Exonerations. There is also the Innocence Project, whose mission is “to free the staggering number of innocent people who remain incarcerated, and to bring reform to the system responsible for their unjust imprisonment.”

In our Wikidata courses we introduce participants to several aspects of the open data repository — how linked data works, how to express relationships between items, and also how to create new ways of structuring information. This last concept is extremely important not only to ensure the completeness of Wikidata for the millions that use digital assistants, but also to ensure content is accurate and able to be queried.

If you consider how foundational structuring information is to the usefulness of Wikidata, then discovering information that has yet to be structured is one of the most important parts of contributing to Wikidata. The way to express these relationships on Wikidata is through properties (“depicts,” “country,” and “population,” for example). In one of our recent courses, a participant discovered a missing property and took the initiative to create it.

Ken Irwin, a trained reference librarian, has a passion for criminal justice reform. While searching Wikidata during our beginner course, he noticed that there were only properties concerning conviction. By only having conviction data, any post-conviction data could not be represented on Wikidata. The Wikidata community, of course, has a process for creating new properties. Irwin proposed an “exonerated of” property, and editors began discussing ways to structure this kind of data. An interesting question about data modeling followed.

Wikidata editors revealed potential ways to model post-conviction data. An “exonerated of” property would cover some of this information, but what about documenting data about pardons, amended sentences, and extended sentences? There is also an ongoing debate as to whether this information should exist as a qualifier modifying a conviction property (since you cannot have an exoneration without a conviction). Another school of thought suggested that exoneration data should exist as its own property, since that particular person would have been cleared of any conviction.

These kinds of discussions have a direct impact on queries — how to pull this information from Wikidata, and how to associate it with other criminal justice data (i.e., where does this fall in the spectrum of rendering judgement, etc.).

After a period of debate, this property was approved in January 2020. You can see how many items use this property by clicking this link. Once that page loads, click the blue “play button” in the lower left of your screen to run the query.
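The linked query is not reproduced in the post; as a rough sketch of its shape (with Pxxxx standing in for the actual “exonerated of” property ID, which is not stated here and would need to be looked up on Wikidata), it is essentially:

# Sketch only: people with an "exonerated of" statement and the offence involved.
# Pxxxx is a placeholder for the real "exonerated of" property ID.
SELECT ?person ?personLabel ?offenceLabel
WHERE
{
  ?person wdt:Pxxxx ?offence .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}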

These query results embody the word “wrong” in the phrase “wrongfully convicted”, and have far-reaching implications when it comes to describing people accurately. The ability to improve accuracy around the representation of information is one of the many reasons why so many people are drawn to working on Wikidata.

Stories like this underscore the importance of editors pursuing their passions, uncovering gaps, and taking steps to address those gaps on Wikidata. It is only in this way that those who choose to get involved in open data will be able to make Wikidata more reliable, equitable, and useful for all users and for anyone represented on Wikidata.

Interested in taking a course like the one Ken took? Visit learn.wikiedu.org to see current course offerings.

Following on from my blog post using OpenRefine for the first time, I continued my journey to fill Wikidata with all of the Tors on Dartmoor.

This post assumes you already have some knowledge of Wikidata and Quickstatements, and have OpenRefine set up.

Getting some data

I searched around for a while looking at various lists of tors on Dartmoor. Slowly I compiled a list that seemed to be quite complete from a variety of sources into a Google Sheet. This list included some initial names and rough OS Map grid coordinates (P613).

In order to load the data into OpenRefine I exported the sheet as a CSV and dragged it into OpenRefine using the same process as detailed in my previous post.

Reconciliation in OpenRefine

This data set doesn’t yet link to Wikidata at all! And that’s where the OpenRefine reconciliation features get used once again.

Column5 represents something that is close to a label for Wikidata items, and that is what I will use for reconciliation, alongside matching the type of tor (Q1343179).

Reconciliation took a few minutes and matched the tors that already exist on Wikidata with the names that were loaded into OpenRefine. Depending on the data you’re reconciling with, you might want to choose a more general type or even no type at all, but be prepared to do more manual work matching things.

The screenshot below shows the records, with reconciliation applied, filtered by judgement state (on the left hand side). “Matched” refers to records that were already linked to a Wikidata item and “none” refers to those that need some manual work.

Note: this screenshot was taken after I performed my data load, hence many are matched, but it still illustrates the manual matching process.

Even the “matched” records should probably be checked, depending on the options that are used for reconciliation. Next, the records with no match need to either be connected to one of the found Wikidata items or set to “Create new item”.

The case described here is very simple, and there are many more details that can be taken into account with reconciliation. You can find more docs here.

Mutating a data element

The grid reference in the data set is not yet in the correct format for Wikidata, which expects a format with no spaces, such as SX604940.

To do this, the “Edit cells” > “Replace” option can be used to simply replace any whitespace with nothing.

Although the screenshot doesn’t show much, as whitespace is being replaced, this had the desired effect on the data!

There are also many other mutations that can be applied, including regex alterations which open up a world of possibilities.

Mapping to Wikidata

The “Schema” tab is the next one to look at, allowing mapping the simple table of data to Wikidata items and statements.

To get here I clicked “+ add item” and used tor (Q1343179) as the type for the items.

The name of the tor, which is in my Column5, can be used as the English label.

Finally, the one data value from my table can be included as a Statement: OS grid reference (P613) can be added, referring to Column9 for the value. The data set also included a URL value in another column, which was the source of the grid reference. This was also added as a Reference with a retrieved (P813) date.

Editing with Quickstatements

I’m sure there is a way to create these items within OpenRefine itself; however, I wanted to try out the Quickstatements integration, which is why I chose this creation method.

Under the “Wikidata” menu there is an item allowing an “Export to QuickStatements”. Clicking this will generate a list of Quickstatements commands (sample below).

Q1343179       Len     "Fox Tor (Fox Tor Mires)"
Q1343179        P613    "SX62616981"    S813    +2020-07-08T00:00:00Z/11        S854    "https://someURL"
Q1343179        P613    "SX74257896"    S813    +2020-07-08T00:00:00Z/11        S854    "https://someURL"
Q1343179        P613    "SX70908147"    S813    +2020-07-08T00:00:00Z/11        S854    "https://someURL"
Q1343179        P613    "SX55689094"    S813    +2020-07-08T00:00:00Z/11        S854    "https://someURL"

These commands can be pasted into a “New batch” on the Quickstatements tool.

Clicking “Import V1 commands” and then “Run” will start making your edits.

The edits

You can see the initial batches of edits in the editgroups tool (which indexes this sort of batched editing) here and here. The first was a small test batch; the second completed the full run.

The post Creating new Wikidata items with OpenRefine and Quickstatements appeared first on Addshore.

Labs and Tool Labs being renamed

08:02, Wednesday, 08 2020 July UTC

(reposted with minor edits from https://lists.wikimedia.org/pipermail/labs-l/2017-July/005036.html)

TL;DR

  • Tool Labs is being renamed to Toolforge
  • The name for our OpenStack cluster is changing from Labs to Cloud VPS
  • The preferred term for projects such as Toolforge and Beta-Cluster-Infrastructure running on Cloud-VPS is VPS projects
  • Data Services is a new collective name for the databases, dumps, and other curated data sets managed by the cloud-services-team
  • Wiki replicas is the new name for the private-information-redacted copies of Wikimedia's production wiki databases
  • No domain name changes are scheduled at this time, but we control wikimediacloud.org, wmcloud.org, and toolforge.org
  • The Cloud Services logo will still be the unicorn rampant on a green field surrounded by the red & blue bars of the Wikimedia Community logo
  • Toolforge and Cloud VPS will have distinct images to represent them on wikitech and in other web contexts

In February, when the formation of the Cloud Services team was announced, there was a foreshadowing of more branding changes to come:

This new team will soon begin working on rebranding efforts intended to reduce confusion about the products they maintain. This refocus and re-branding will take time to execute, but the team is looking forward to the challenge.

In May we announced a consultation period on a straw dog proposal for the rebranding efforts. Discussion that followed both on and off wiki was used to refine the initial proposal. During the hackathon in Vienna the team started to make changes on Wikitech reflecting both the new naming and the new way that we are trying to think about the large suite of services that are offered. Starting this month, the changes that are planned (T168480) are becoming more visible in Phabricator and other locations.

It may come as a surprise to many of you on this list, but many people, even very active movement participants, do not know what Labs and Tool Labs are and how they work. The fact that the Wikimedia Foundation and volunteers collaborate to offer a public cloud computing service that is available for use by anyone who can show a reasonable benefit to the movement is a surprise to many. When we made the internal pitch at the Foundation to form the Cloud Services team, the core of our arguments were the "Labs labs labs" problem and this larger lack of awareness for our Labs OpenStack cluster and the Tool Labs shared hosting/platform as a service product.

The use of the term 'labs' in regards to multiple related-but-distinct products, and the natural tendency to shorten often used names, leads to ambiguity and confusion. Additionally the term 'labs' itself commonly refers to 'experimental projects' when applied to software; the OpenStack cloud and the tools hosting environments maintained by WMCS have been viable customer facing projects for a long time. Both environments host projects with varying levels of maturity, but the collective group of projects should not be considered experimental or inconsequential.

Using OpenRefine with Wikidata for the first time

23:20, Tuesday, 07 2020 July UTC

I have long known about OpenRefine (previously Google Refine), which is a tool for working with data, manipulating and cleaning it. As of version 3.0 (May 2018), OpenRefine included a Wikidata extension, allowing for extra reconciliation and also editing of Wikidata directly (as far as I understand it). You can find some documentation on this topic on Wikidata itself.

This post serves as a summary of my initial experiences with OpenRefine, including some very basic reconciliation from a Wikidata Query Service SPARQL query, and making edits on Wikidata.

In order to follow along you should already know a little about what Wikidata is.

Starting OpenRefine

I tried out OpenRefine in two different setups, both of which were easy to set up following the installation docs. The setups were on my actual machine and in a VM. For the VM I also had to use the -i option to make the service listen on a different IP: refine -i 172.23.111.140

Getting some data to work with

Recently I have been working on a project to correctly add all Tors for Dartmoor in the UK to Wikidata, and that is where this journey would begin.

This SPARQL query allowed me to find all instances of Tor (Q1343179) on Wikidata. This only came up with 10 initial results, although this query returns quite a few more results now as my work has continued.

SELECT ?item ?itemLabel ?country ?countryLabel ?locatedIn ?locatedInLabel ?historicCounty ?historicCountyLabel
WHERE 
{
  ?item wdt:P31 wd:Q1343179.
  OPTIONAL { ?item wdt:P17 ?country }
  OPTIONAL { ?item wdt:P131 ?locatedIn }
  OPTIONAL { ?item wdt:P7959 ?historicCounty }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

I then used the download CSV file option to export my data set for loading into OpenRefine.

This may not be the best way to work with SPARQL and OpenRefine, but after reading the docs this is where I ended up, and it seemed to work quite well.

Loading the data

One of the first options that you’ll see is the option to load a file from your computer. This file can be in multiple formats, CSV included, so this loading process from the query service was straightforward. It’s nice to see how many options there are for different data formats and structures. Even loading data directly from Google Sheets, or from the Clipboard, is supported.

I picked a project name, hit the “Create Project” button, and my data was loaded!

Connecting to Wikidata

After a quick look around the UI I found the Wikidata button with the menu item “Manage Wikidata account”. Clicking on this prompted me to log in with my Username and Password.

In most cases you probably don’t want to enter your password into any 3rd party application, OpenRefine included. MediaWiki, and thus Wikidata, allows you to create separate passwords for use in applications that have slightly restricted permissions. You can read the MediaWiki docs here and find the page to create them on Wikidata here.

I created a new “bot” with the name “OpenRefine”, and a few basic rights that I figured OpenRefine might need, including:

  • Basic rights
  • High-volume editing
  • Edit existing pages
  • Create, edit, and move pages

This new account, with the username and password that is generated as part of this process, could then be used to log into OpenRefine without sharing the password of my main account.

Reconcile & Edit

This basic example does not use OpenRefine at a high level, and there is still a lot of cool magic to be explored. But as a first step, I simply want to make a couple of basic edits.

Looking at the list of tors I could see some that I knew had a historic county value set, but did not include a value for located in. So this is where I started, manually filling the gaps in the locatedIn column (you can see this below, where locatedIn has a value but the locatedInLabel does not).

Next, I needed to connect this currently arbitrary collection of data that I have loaded from a CSV to Wikidata in some way, with a bit of simple ID-based reconciliation. OpenRefine has a handy feature for just this case, which can be found in the “Reconcile” >> “Use values as identifiers” option of the column that you want to reconcile. In my case, this is the item column and also the locatedIn column that I had altered, both of which are needed for the edits.

Next I tried the “Upload edits to Wikidata” button, which brought me to a page allowing me to map my table of data to Wikidata statements. For this first batch of edits, this involved dragging the item field into the item space on the page, and then filling out the property used for my data, and dragging locatedIn into place.

Once finished it looked something like this:

One of the next pages allows you to review all of the statements that will be “uploaded” when you hit the button, and another page provides you with any warnings for things you may have missed.

For my current case, it stated that I wasn’t adding references yet, something I was aware of but chose to skip this time.

Then a small edit summary is needed and you can hit the “Upload edits” button!

The edits

You can see the initial batches of edits in the editgroups tool (which indexes this sort of batched editing) here and here.

The edit groups tool is helpfully linked in the edit summary of the batch of edits.

One issue (not editing / logged out)

With version 3.3 I ran into one issue where my Wikidata session would apparently get logged out and OpenRefine and the Wikidata Toolkit (which OpenRefine uses) would choke on this case.

I already filed a bug here and the ability to log out and log back in again should be fixed with the next OpenRefine release! (It took less than 3 hours for the patch to get written and merged.)

Further

I have already continued editing with OpenRefine beyond this first basic batch and hope to continue writing some more posts, but for now I hope this very basic example serves someone well for a small set of initial edits.

I’d love to see more integration between the Wikidata Query Service and OpenRefine. I’m sure it is probably in the pipeline. But with the workflow followed in this blog post there is a lot of back and forth between the two: creating a CSV, downloading it, uploading it to OpenRefine, making changes, and so on. And in order to then continue an existing project with fresh data, you need to repeat the whole process.

The post Using OpenRefine with Wikidata for the first time appeared first on Addshore.

Extending the Met’s reach with Wikidata

20:55, Tuesday, 07 2020 July UTC

Jennie Choi is the General Manager of Collection Information at the Met Museum. She recently took two of Wiki Education’s courses on Wikidata and reflects on her experience with the Wikimedia community in this guest blog post.

Partnering with leading members of the Wiki community has been invaluable. We’ve learned a lot from their experience and expertise, but I wanted to expand my skill set and become more self-sufficient in editing and uploading images. My contributions started modestly at edit-a-thons hosted by the museum. During these events I learned the basics about statements, properties, and references. I filled out some items where properties were missing and manually uploaded a handful of images. Given the size of our collection (500,000 records online) and the large number of new images added since the Open Access launch (roughly 60,000), I needed to learn how to make contributions more quickly and at scale.

The Metropolitan Museum began working with the Wiki community in 2017 when we launched our Open Access program. Led by our Wikimedian-in-Residence, Richard Knipel, over 375,000 public domain images were uploaded to Wikimedia Commons that year. This resulted in a huge increase in visibility for images in our collection. Wikipedia articles on a wide range of topics including Henry VIII, Vincent Van Gogh, pineapples, Down Syndrome, and the economy of Japan have all used images from The Met’s collection. We have seen incredible growth in views of the Wikipedia pages containing our files. Between April 2017 and April 2020 views increased by 576%.

With the launch of our public API in 2018 we expanded our collaboration with the community. During an AI hackathon with Microsoft and MIT that year we worked with Wikimedian strategist Andrew Lih, who created a Wikidata game that allowed users to validate depicts statements generated by an AI algorithm. We have continued working with Andrew as he has led the effort to create Wikidata items for works in our collection. Because of Wikidata’s structured data schema, links to other items, language independence, and the reliance on it by internet search engines and voice assistants, we see great benefits in extending the reach of our collection by contributing to this important resource. Most rewarding is being able to reach new audiences and seeing our objects in new contexts. With Andrew’s guidance, over 14,000 items have been created for our objects.
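As a rough way of checking that kind of number yourself (a sketch, not taken from Jennie’s post, and assuming the usual modelling of collection (P195) pointing to the Metropolitan Museum of Art item (Q160236)), a query like this counts the museum’s items on Wikidata:

# Sketch only: count items recorded as part of the Met's collection.
# Assumes P195 = collection and Q160236 = Metropolitan Museum of Art.
SELECT (COUNT(DISTINCT ?item) AS ?count)
WHERE
{
  ?item wdt:P195 wd:Q160236 .
}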

My first project was to enhance our existing Wikidata items wherever possible. Many of our items were missing images, dates, depicts, and other statements important for describing artworks. Using the Quick Statements tool I was able to add over 1,000 images to our public domain records. Earlier this year I mapped all our subject keywords to their corresponding Wikidata items using OpenRefine and stored the Q numbers in our cataloguing database. This made it very easy to add depicts statements to over 4,000 Wikidata items where this was lacking. My next project will be to continue creating new items for works in our collection. We are fortunate to be building on Andrew’s work. He has made tremendous progress in developing a data model for the GLAM community and has also developed a crosswalk database mapping our thousands of object names to Wikidata items. While we do have a significant number of records on Wikidata, there are large areas of the collection that have not been added yet, like British satirical prints, Civil War photographs, African textiles, and our large baseball card collection, which many people probably don’t know we have. Tools like Quick Statements make creating items much easier, but we still face challenges in creating items for more complex artworks. Many objects have multiple creators, like the tapestry room from Croome Court, which has several designers and makers who worked on portions of the room, as well as a manufacturer and a workshop director. Most of the qualifiers for these names, like “Room after a design by” and “Plaster ceiling and cornice moldings by”, do not exist on Wikidata. Other objects like suits of armor may consist of different components created during different and approximate time periods. Sword scabbards can have multiple dimensions and will often include weights. More work is needed in developing a data model that will allow us to more accurately describe our collection, which we can then share with other art museums.

In addition to enhancing and creating new Wikidata items, I’ve started uploading more of our public domain images to Wikimedia Commons. During the past three years we have acquired more works and digitized thousands of new objects, and we want to continue sharing our open access content with the world. Using the Pattypan tool I uploaded over 5,000 images during the past month. I’m hoping the development of new tools will make it easier to add structured data statements to our Commons images, which will make our collection even more discoverable.

We’re still in the early stages of our work with Wikidata. There are many more areas I look forward to exploring, including creating records for all our artists, using queries to generate compelling data visualizations, adding translated content from our printed guidebook, and working with other museums to further the development of data models for complex artworks. I’d like to develop a strategy to improve access to our images on Commons: with hundreds of thousands of available images, how can we help users find our files more easily? I’d also like to explore ways to keep our records up to date in an automated manner. Our staff make cataloging changes every day; how can we ensure these changes are reflected on Wikidata?

Contributing our content to Wikimedia Commons and Wikidata has allowed the Met to further fulfill our mission to connect people to creativity, knowledge, and ideas. We are seeing our works used in new contexts that go well beyond art history while hopefully creating some inspiration along the way.

Interested in taking a course like the one Jennie took? Visit learn.wikiedu.org to see current course offerings.


Header/thumbnail image by Sailko, CC BY-SA 3.0 via Wikimedia Commons.

Before I was paid to work on code for a living, it was my hobby. My favorite project from when I was young, before The Internet came to the masses, was a support library for my other programs (little widgets, games, and utilities for myself): a loadable graphics driver system for VGA and SVGA cards, *just* before Windows became popular and provided all this infrastructure for you. ;)

The basics

I used a combination of Pascal, C, and assembly language to create host programs (mainly in Pascal) and loadable modules (linked together from C and asm code). I used C for higher-level parts of the drivers like drawing lines and circles, because I could express the code more easily than in asm yet I could still create a tiny linkable bit of code that was self-sufficient and didn’t need a runtime library.

High performance loop optimizations and BIOS calls were done in assembly language, directly invoking processor features like interrupt calls and manually unrolling and optimizing tight loops for blits, fills, and horizontal lines.

Driver model

A driver would be compiled with C’s “tiny” memory model and the C and asm code linked together into a DOS “.com” executable, which was the simplest executable format devisable — it’s simply loaded into memory at the start of a 64-KiB “segment”, with a little space at the top for command line args. Your code could safely assume the pointer value of the start of the executable within that segment, so you could use absolute pointers for branches and local memory storage.

I kept the same model, but loaded it within the host program’s memory and added one more convention: an address table at the start of the driver, pointing to the start of the various standard functions. The list was roughly as follows (a rough C sketch of such a table appears after the list):

  • set mode
  • clear screen
  • set palette
  • set pixel
  • get pixel
  • draw horizontal line
  • draw vertical line
  • draw arbitrary line
  • draw circle
  • blit/copy
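
In C terms, that convention might look roughly like the sketch below. This is an illustration only, written as present-day C with the real-mode far-pointer details left out; the names and signatures are mine, not those of the original driver.

    /* Illustrative driver entry table: a fixed-order set of entry points
     * at the very start of the loaded driver image.  The host loads the
     * .com-style image into a buffer, treats its first bytes as this
     * table, and calls through it; entries left at 0 could fall back to
     * generic host-side routines. */

    typedef struct {
        void (*set_mode)(int mode);
        void (*clear_screen)(unsigned char colour);
        void (*set_palette)(const unsigned char *palette); /* 256 RGB triplets */
        void (*set_pixel)(int x, int y, unsigned char colour);
        unsigned char (*get_pixel)(int x, int y);
        void (*hline)(int x1, int x2, int y, unsigned char colour);
        void (*vline)(int x, int y1, int y2, unsigned char colour);
        void (*line)(int x1, int y1, int x2, int y2, unsigned char colour);
        void (*circle)(int cx, int cy, int r, unsigned char colour);
        void (*blit)(int x, int y, int w, int h, const unsigned char *src);
    } driver_table;

    /* Host-side use, assuming the driver image has been loaded at `image`:
     *     driver_table *drv = (driver_table *)image;
     *     drv->set_mode(0x13);
     *     drv->hline(0, 319, 100, 4);
     */

In the real-mode original these entries would presumably have been plain 16-bit offsets (or a jump table) rather than C function pointers, but the fixed ordering serves the same purpose.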

Optimizations

IIRC, a driver could choose to implement only a few base functions like set mode & set/get pixel and the rest would be emulated in generic C or Pascal code that might be slower than an optimized version.

The main custom optimizations (rather than generic “make code go fast”) were around horizontal lines & fills, where you could sometimes make use of a feature of the graphics card — for instance in the “Mode X” variants of VGA’s 256-color mode used by many games of the era, the “planar” memory mode of the VGA could be invoked to write four same-color pixels simultaneously in a horiz line or solid box. You only had to go pixel-by-pixel at the left and right edges if they didn’t end on a 4-pixel boundary!
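
To make that concrete, here is a minimal sketch of such a horizontal-line fill, assuming Mode X at 320×240 and a Borland-style DOS compiler that provides outportb() and MK_FP() in <dos.h>. It illustrates the idea described above rather than reproducing the original driver code.

    #include <dos.h>

    #define SEQ_INDEX 0x3C4   /* VGA sequencer index port                  */
    #define SEQ_DATA  0x3C5   /* VGA sequencer data port                   */
    #define MAP_MASK  0x02    /* sequencer register: plane write enable    */
    #define SCREEN_W  320     /* pixels per scanline in Mode X 320x240     */

    /* Fill pixels x1..x2 on row y with one colour.  In unchained Mode X,
     * each byte of video memory covers four pixels (one per plane); with
     * all four planes enabled, a single byte write paints four pixels.   */
    void hline_fill(int x1, int x2, int y, unsigned char colour)
    {
        unsigned char far *vga = (unsigned char far *) MK_FP(0xA000, 0);
        unsigned offset = (unsigned) y * (SCREEN_W / 4);
        int x = x1;

        /* left edge: pixel by pixel until x reaches a 4-pixel boundary */
        for (; x <= x2 && (x & 3) != 0; ++x) {
            outportb(SEQ_INDEX, MAP_MASK);
            outportb(SEQ_DATA, 1 << (x & 3));   /* enable just one plane */
            vga[offset + x / 4] = colour;
        }

        /* middle run: all four planes on, one byte write = four pixels  */
        outportb(SEQ_INDEX, MAP_MASK);
        outportb(SEQ_DATA, 0x0F);
        for (; x + 3 <= x2; x += 4)
            vga[offset + x / 4] = colour;

        /* right edge: finish any remaining 1-3 pixels individually      */
        for (; x <= x2; ++x) {
            outportb(SEQ_INDEX, MAP_MASK);
            outportb(SEQ_DATA, 1 << (x & 3));
            vga[offset + x / 4] = colour;
        }
    }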

SVGA stuff sometimes also had special abilities you could invoke, though I’m not sure how far I ever got on that. (Mostly I remember using the VESA mode-setting and doing some generic fiddling at 640×480, 800×600, and maybe even the exotic promise of 1024×768!)
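
For what it is worth, the VESA part of that boils down to a single video BIOS call. A minimal sketch, again assuming a Borland-style real-mode compiler with int86() from <dos.h>; the mode numbers follow the VBE standard (0x101 = 640×480, 0x103 = 800×600, 0x105 = 1024×768, all in 256 colours).

    #include <dos.h>

    /* Ask the video BIOS (int 10h) to switch to a VESA SuperVGA mode.
     * VBE function 4F02h takes the mode number in BX and reports success
     * by returning AX = 004Fh.  Returns 1 on success, 0 otherwise.       */
    int vesa_set_mode(unsigned mode)
    {
        union REGS regs;

        regs.x.ax = 0x4F02;      /* VBE: set SuperVGA video mode          */
        regs.x.bx = mode;        /* e.g. 0x101 for 640x480 in 256 colours */
        int86(0x10, &regs, &regs);

        return regs.x.ax == 0x004F;
    }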

High-level GUI

I built a minimal high-level Pascal GUI on top of this driver which could do some very simple window & widget drawing & respond to mouse and keyboard events, using the low-level graphics driver to pick a suitable 256-color mode and draw stuff. If it’s the same project I’m thinking of, my dad actually paid me a token amount as a “subcontractor” to use my GUI library in a small program for a side consulting gig.

So that’s the story of my first paying job as a programmer! :)

Even more nostalgia

None of this was new or groundbreaking when I did it; most of it would’ve been old hat to anyone working in the graphics & GUI program library industries, I’m sure! But it was really exciting to me to work out how the pieces went together with the tools available to me at the time, with only a room full of Byte magazine and Dr. Dobb’s Journal to connect me to the outside world of programming.

I’m really glad that kids (and adults!) learning programming today have access to more people and more resources, but I worry they’re also flooded with a world of “everything’s already been done, so why do anything from scratch?” Well it’s fun to bake a cake from scratch too, even if you don’t *have* to because you can just buy a whole cake or a cake mix!

The loadable drivers and asymmetric use of programming languages to target specific areas of work are *useful*. The old Portland Pattern Repository Wiki called it “alternating hard and soft layers”. 8-bit programmers called it “doing awesome stuff with machine language embedded in my BASIC programs”. Embedded machine code in BASIC programs you typed in from magazines? That was how I lived in the late 1980s / early 1990s, my folks!

Future

I *might* still have the source code for some of this on an old backup CD-ROM. If I find it I’ll stick this stuff up on GitHub for the amusement of my fellow programmers. :)

Tech News issue #28, 2020 (July 6, 2020)

00:00, Monday, 06 2020 July UTC

There are lists for all the governors of all the current Nigerian states. They exist on many Wikipedias. The information was known to be incomplete, so, based on the lists on the English Wikipedia, I added information to Wikidata; as a result these lists may update with better data.

Obviously, when you copy data across to another platform, errors will occur. Sometimes the error is mine, sometimes it is in the source data. I have only indicated when a governor was in office, along with predecessors and successors.

The data is provided in a way that makes it easy to query; there is no information on elections (many governors were not elected), but there are proper start and end dates. The dates are as provided on the Wikipedia lists; the articles for individual governors are often more precise. People from Nigeria are often known by different names, so I added labels where I needed them for disambiguation.

When you want to know how many of these fine gentlemen are still alive, it will take some effort to “kill off” those who, according to Wikidata, are still walking around. It is relevant to know whether a governor was elected or not. To do that properly you want to include election data separately; there is no one-to-one relation between a position, elected officials, and their being in office.

There is plenty to improve in the data. When people do, the Listeria lists will update. Maybe someone will consider updating the English Wikipedia lists as well.
Thanks,
        GerardM