
Episode 129: Simon Stier

18:02, Tuesday, 3 January 2023 UTC

🕑 1 hour 28 minutes

Simon Stier is a researcher at Fraunhofer ISC in Germany, as well as a freelance software developer and consultant. He is the developer of the MediaWiki- and Semantic MediaWiki-based Open Semantic Lab platform.

Links for some of the topics discussed:

Post 26537

00:27, Tuesday, 3 January 2023 UTC

Toolforge Bundle version 1.4.6 is released.

Atari Mandelbrot fractal: imul16

05:24, Monday, 2 January 2023 UTC

Another nostalgia+practice project I’m poking at on the Atari 800 XL is a Mandelbrot fractal generator, which is still in the early stages of work. This is mostly an exercise in building a 16-bit integer multiplier for the MOS 6502 processor, which has only addition, subtraction, and bit-shift operations.

The Mandelbrot set consists of those complex-plane points which, when iterating z_(i+1) = z_i^2 + c (where c is the input coordinate and z_0 = 0), never escape to |z_i| > 2. Surprisingly, this creates a really cool shape, which has been the subject of fascination for decades:

https://en.wikipedia.org/wiki/Mandelbrot_set#/media/File:Mandel.png

Implementing this requires three multiplications per iteration, to calculate zx^2, zy^2, and zy*zx. The famous PC fractal program Fractint used a 16-bit integer size for low zooms, which is good because anything bigger gets real slow!

For higher zooms Fractint used a 32-bit integer with 29 fractional bits for Mandelbrot and Julia sets, which leaves a range of -4..3.9, plenty big enough. For the smaller 16-bit size that means a 3.13 layout (3 integer bits, 13 fractional), which should be plenty for a few zooms in on a 160×192 screen. :D Multiplying two such values creates a 32-bit integer with twice the integer and fractional bits, so a 6.26 layout, whose larger range covers the addition results for the new zx and zy values.

These products then need to be shifted back into 3.13 form and multiplied to get zx^2, zy^2, and zy*zx for the next iteration; the boundary condition is zx^2 + zy^2 >= 4.
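
To make the fixed-point bookkeeping concrete, here is a minimal sketch of one iteration in Rust (my illustration of the scheme described above, not a transcription of the 6502 code):

const FRAC_BITS: u32 = 13; // 3.13 fixed point: 3 integer bits, 13 fractional bits

// One Mandelbrot step in 3.13 fixed point; returns None once the point escapes.
fn iterate(zx: i16, zy: i16, cx: i16, cy: i16) -> Option<(i16, i16)> {
    // 16-bit × 16-bit products are 32-bit values in 6.26 format.
    let zx2 = zx as i32 * zx as i32;
    let zy2 = zy as i32 * zy as i32;
    let zxzy = zx as i32 * zy as i32;
    // Boundary condition zx^2 + zy^2 >= 4, checked while still in 6.26
    // (widened to i64 so the sum can't overflow at the extremes).
    if zx2 as i64 + zy2 as i64 >= 4i64 << (2 * FRAC_BITS) {
        return None;
    }
    // Shift the products back down to 3.13 and form z' = z^2 + c; the add can
    // wrap at the edge of the ±4 range, a hazard any 3.13 layout shares.
    let new_zx = (((zx2 - zy2) >> FRAC_BITS) as i16).wrapping_add(cx);
    let new_zy = (((2 * zxzy) >> FRAC_BITS) as i16).wrapping_add(cy);
    Some((new_zx, new_zy))
}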

imul16

Integer multiplication when you have binary shifts and addition only is kinda slow and super annoying. Because you have to do several operations for each bit, every cycle adds up — a single 16-bit add is just 18 cycles while a multiply can run several *hundred* cycles, and varies based on input.

Note that a 650-cycle function means a runtime of about half a millisecond on average (1.79 MHz processor, with about 30% of cycles taken by the display DMA). The whole shebang could easily take 2-3 ms per iteration with three multiplications and a number of additions and shifts.

Basically, for each bit in one operand, you either add, or don’t add, the other operand with the corresponding bitshift to the result. If you’re dealing with signed integers you need to either sign-extend the operands to 32 bits or negate the inputs and keep track of whether you need to negate the output; not extending can be faster because you can assume the top 16 bits are 0 and shortcut some operations. ;)
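
Here is the same shift-and-add idea in Rust for the unsigned case, as a sketch of the algorithm rather than a transcription of the 6502 routine:

fn imul16(a: u16, b: u16) -> u32 {
    let mut result: u32 = 0;
    let mut addend = a as u32; // shifted left once per bit of b
    let mut bits = b;
    while bits != 0 {
        if bits & 1 != 0 {
            result += addend; // add, or don't add, depending on the bit
        }
        addend <<= 1;
        bits >>= 1;
    }
    result
}

On the 6502, each of those adds and shifts has to be done 8 bits at a time across multiple bytes, which is where the hundreds of cycles go.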

Status and next steps

imul16 seems to be working, though it could maybe use more tuning. I’ve sketched out the Mandelbrot iteration function but haven’t written it yet.

Another trick Fractint used was a check for periodic repetition, to avoid having to go to max iterations within the “Mandelbrot lake”. Apparently, when working with finite precision, the operations often converge on a repeating sequence of zx & zy values that yield themselves again after one or a few iterations; these will never escape the boundary condition, so it’s safe to cut off without going to max iterations. I’ll have to write something up with a little buffer of saved values, perhaps only activated by an adjacent max-iters pixel.
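
That shortcut might look something like this, building on the iterate sketch above (the buffer size and activation policy here are guesses on my part):

fn escapes(cx: i16, cy: i16, max_iters: u16) -> Option<u16> {
    let (mut zx, mut zy) = (0i16, 0i16); // z_0 = 0
    let mut history = [(0i16, 0i16); 8]; // small ring buffer of recent z values
    for i in 0..max_iters {
        match iterate(zx, zy, cx, cy) {
            None => return Some(i), // escaped: color by iteration count
            Some((nx, ny)) => {
                if history.contains(&(nx, ny)) {
                    return None; // repeating cycle: in the lake, bail out early
                }
                history[i as usize % history.len()] = (nx, ny);
                zx = nx;
                zy = ny;
            }
        }
    }
    None // hit max iterations without escaping
}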

Once the internals are working I’ll wrap a front-end on it: 4-color graphics display, and allow a point-n-zoom with arrow keys or maybe joystick. :D

Uploading scanned letters with Pattypan

02:30, Monday, 2 January 2023 UTC

Pattypan is a great tool for uploading lots of files to Commons. I’m using it today for a set of scans of airgraphs. I scanned them all yesterday, and gave them hopefully-useful filenames that contain a prefix, date, and a bit of info about each. These documents all look pretty much the same when viewed as thumbnails, so it’s important to be able to determine at least something by glancing at the filename. That said, it’s never good to *rely* on metadata being in the filename, nor on all files having the same structure to their names.

Anyway, after creating the spreadsheet for these files, I did some tweaking to the data columns:

  1. Filled the source column based on the name column:
    • Formula: =CONCAT("{{HMW|wiki=",B2,"}}")
    • Result: {{HMW|wiki=Airgraph 1943-12-30 Adele to Murray}}
  2. Extracted the date:
    • Formula: =REGEX(B3,"[0-9]{4}-[0-9]{2}-[0-9]{2}")
    • Result: 1943-12-30
  3. Crucially, any column that uses a formula needs to be copied to a new column with *values only* (i.e. “Edit / Paste special / Values only”). Otherwise Pattypan can’t see the formula result.

Atari photo/video viewer project

19:29, Sunday, 1 January 2023 UTC

I recently picked up a vintage Atari 800 XL computer like one I had as a kid in the 1980s, and have been amusing myself learning more about low-level programming in that constrained environment.

The 8-bit Atari graphics are good for 1979 but pretty primitive; some sprite-like overlays (“player/missile graphics”) and a background that can either be character-mapped or a bitmap, trading off resolution for colors: 320×192 at 2 colors, 160×192 at 4 colors, or 80×192 at 9 colors (limited by the number of palette registers handy when they implemented the extended modes).

This not only means you have relatively few colors available for photorealistic images, but also that a 40-byte × 192-line framebuffer is 7,680 bytes, a large amount for a computer with a 64 KB address space.

However you have a lot of flexibility too: any scanline can switch modes in the display list so you can mix high-res text or instruments with many-colored playfields, and you can change palette registers between scanlines if you get the timing right.

I wondered whether video would be possible — if you go for the high res mode, and *do* touch every pixel, how long would it take to process a frame? Well, I did the numbers and it’s a *terrible* frame rate. BUT — if you had uncompressed frames ready to go in RAM or ROM, you can easily cycle between frames at 60 Hz, letting the display processor see each fresh frame.

With enough bank-switched ROM to back it, you could stream 60 Hz video at 480 KiB per second (that works out to 8 KiB per frame, matching the two 4 KiB sections per frame described below). A huge amount of data for its day, but now you could put a processed GIF animation onto a cartridge. ;)

So I’ve got a few things I want to explore on this fun project:

  • dithering to 4 colors, with per-scanline palettes (working as of December 2022)
  • can you also embed audio? 4-bit PCM at 15.8 or 7.9 KHz (working at 7.9; 15.8 may require a tweak)
  • try adding a temporal component to the error-diffusion dithering
  • add a couple lines of text mode for subtitles/captions

Dithering and palette selection

I’ve got a dither implementation hacked together in JS which reads in an image, sizes it, and then walks through the scanlines doing an error-propagation dither combined with a palette reduction.

To start with, the complete Atari palette is 7 bits (3 bits luminance, 4 bits hue, where hue 0 is grayscale and 1-15 are various points around the NTSC IQ hue wheel). I took an RGB list of the colors from the net and, after gamma adjustment to linear space, the dither looks for the closest color from the available palette, then divides up the difference from the original color among neighboring pixels. At the end of the scanline, we count how many colors were used, including black, which cannot be changed. If more than 3 colors besides black were used, they’re ranked based on usage and closeness, and the lowest-scoring color is removed. This continues until the dither selects only colors that fit.
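
In rough Rust terms, the per-scanline loop might look like the sketch below. It is a simplified 1-D version of the idea: it diffuses the whole error to the next pixel and scores colors purely by usage count, where the real JS spreads error across several neighbors and also weighs closeness.

use std::collections::HashMap;

// Squared distance between two linear-space RGB colors.
fn dist2(a: [f32; 3], b: [f32; 3]) -> f32 {
    (0..3).map(|i| (a[i] - b[i]) * (a[i] - b[i])).sum()
}

// Dither one scanline against the current palette, carrying the error along
// the line; returns the chosen indices and a usage count per palette entry.
fn dither_line(line: &[[f32; 3]], pal: &[[f32; 3]]) -> (Vec<usize>, HashMap<usize, u32>) {
    let mut err = [0.0f32; 3];
    let mut out = Vec::new();
    let mut counts: HashMap<usize, u32> = HashMap::new();
    for px in line {
        let want = [px[0] + err[0], px[1] + err[1], px[2] + err[2]];
        let best = (0..pal.len())
            .min_by(|&a, &b| dist2(pal[a], want).total_cmp(&dist2(pal[b], want)))
            .unwrap();
        for c in 0..3 {
            err[c] = want[c] - pal[best][c]; // push the residual to the next pixel
        }
        *counts.entry(best).or_insert(0) += 1;
        out.push(best);
    }
    (out, counts)
}

// Re-dither with a shrinking palette until at most 4 colors are actually used,
// treating index 0 as the fixed black that is never removed.
fn reduce_line(line: &[[f32; 3]], full_palette: &[[f32; 3]]) -> Vec<usize> {
    let mut pal = full_palette.to_vec();
    loop {
        let (out, counts) = dither_line(line, &pal);
        if counts.len() <= 4 {
            return out;
        }
        let worst = *counts
            .iter()
            .filter(|&(&i, _)| i != 0)
            .min_by_key(|&(_, &n)| n)
            .map(|(i, _)| i)
            .unwrap();
        pal.remove(worst);
    }
}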

Formatting and playback

Due to a quirk of the Atari’s display processor, a frame buffer can’t cross a 4096-byte boundary, so with a 40-byte screen width you have to divide it into two non-contiguous sections. Selecting a widescreen aspect ratio (also to leave room for captions later) means there’s enough room left over to fit in arrays for the palettes as well (3 bytes per scanline) and to fit audio (131 or 262 bytes depending on sample rate).

Note that for extra fun, the hardware register that gives you the current scanline number gives you the count *divided by two*. This is because the whole signal has 262 scanlines per frame, which is bigger than 256 and doesn’t fit in a byte! :D

So it makes sense to handle these by waiting until we’re synced up on line 0 and then doing an explicit timing loop with horizontal blanking waits (STA WSYNC). This way we know if we’re on the 0 or the 1 subline, and can use the VCOUNT register (0..130) as an index into arrays of palette or audio entries.

For testing without simulating bank-switching, I’m squishing two frames into RAM and switching between the two by making a complex display list: basically just the same thing twice, but pointing at different frame buffers and looping back around.

It seems to work pretty nicely! But the timing is tight and I have to disable interrupts.

Audio

The Atari doesn’t have DMA-based PCM audio where you just slap in some bytes and it plays the audio… you either use the square-wave generators, or you manually set the volume level of the voices for each sample *at the right time*.

Using the scan-line frequency is handy since we’re already in there changing palette entries during horizontal blanking. Every line is about 15.8 KHz; every other line is 7.9 KHz, slightly worse than telephone quality.

It seems to work at 7.9 at least, and I might be able to do 15.8 with ROM backing (bank-switching every frame makes things easier vs a long buffer in RAM). Note that you only get 4 bits of precision, and unpacking two samples from one byte is annoyingly expensive. ;)
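
For comparison, the unpacking itself is a one-liner in a modern language; the cost on the 6502 comes from doing the shifts for every sample at playback time. (Which nibble plays first is an assumption on my part.)

fn unpack(byte: u8) -> (u8, u8) {
    (byte >> 4, byte & 0x0f) // two 4-bit PCM samples per byte
}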

Next steps

The next thing I’ll try is a tweak to the dither algorithm to try to drive a more direct dither pattern between temporally adjacent frames; at least on an LCD, the 60 Hz flip looks great and it should “blend” even better on a classic CRT with longer phosphor retention times.

Then I’ll see if I can make a 1 MiB bank-switched cartridge image from the assembler that I can load in the emulator (and eventually flash onto a cartridge I can get for the physical device) so I can try running some longer animations/videos.

No rush though; I gotta get the flashable cartridge. ;)

Blog blog blog 2023

16:04, Sunday, 1 January 2023 UTC

I resolve this year to publish more long-form blog posts with whatever cool stuff I’m working on, for work or for fun.

I’m trying to treat social media as more ephemeral. I quit Twitter entirely last year, deleting the account; my mastodon.technology account has vanished with the server shutting down, and I’ve even set my new account to delete non-bookmarked posts after two weeks.

It’s fun to talk about my projects a couple hundred characters at a time, but it’s also really nice to put together a bigger post that can take you through something over time and collect all the pieces together.

A long-form blog, with updateable pages, allows for this, and I think makes for a better experience when you really *do* mean to publish something informative or interesting. Let’s bring embloggeration back!

weeklyOSM 649

11:19, Sunday, 1 January 2023 UTC

20/12/2022-26/12/2022

lead picture

Overtaking distance measurement for cyclists [1] | © openbikesensor.org | map data © OpenStreetMap contributors

About us

  • The weeklyOSM team would like to express our sincere gratitude to the entire OpenStreetMap community and to all of weeklyOSM’s readers. Your dedication and hard work have made a significant impact on the world of map making and have contributed to the growth and success of OpenStreetMap. The weeklyOSM team wish you all the best for the new year and hope it will be filled with new opportunities and continued growth for the OpenStreetMap community. May you all have a happy and prosperous new year!

Mapping

  • Florian Lohoff has started a series on ‘Geometry Mapping Antipatterns’:
    • first example: ‘Keep left’ – why your navigation device sometimes gives instructions on a straight road.
    • second example: ‘Landuse drag’ – when landuse gets complicated.
  • InfosReseaux blogged a call for mapping hydropower infrastructure. It’s not only about dams and powerhouses, but also about waterways. The OSM tagging model has been continuously developing to describe this.
  • Over the past year, a group of four mappers has added the origin of the names of over 1000 streets in Moers to OSM. The results can be seen on the Open Etymology Map.

Mapping campaigns

  • OpenStreetMap Indonesia launched a GLAM (galleries, libraries, archives & museums) themed mapathon from 20 to 25 December 2022.
  • Legitimater enjoys mapping trees, and thereby making the map greener. He invited mappers to join in and use the hashtag #OperationGreen on changesets.

Community

Maps

switch2OSM

  • Eugene Alvin Villar has created a small quiz about OSM communities. Do you know where these three OSM URLs point?
    1. openstreetmap.community (solution)
    2. community.openstreetmap.org (solution)
    3. openstreetmap.org/communities (solution)

Open Data

  • Topi Tjukanov has compiled the most liked tweets of the 30DayMapChallenge 2022 into one page.
  • Andrii Holovin wrote a two-part blog post (part 1, part 2) on his thoughts about Overture and the future of OSM.

Software

Programming

  • Paul Norman let us know that the API database servers will be unavailable on Sunday 22 January 2023 between 10:00 and 15:00 UTC due to maintenance. Map editing (using iD, JOSM, etc), and replication updates will be paused during that period. Login to services that require osm.org authentication may be also unavailable.
  • Max Bo examined the proximity of the City of Sydney’s QMS billboards (we reported earlier) to Telstra public telephones using OSM data for an article they wrote.

Releases

  • The December update of Organic Maps has been released. As usual the maps have been updated and the routing and translations improved. Also KML tracks are handled better now.

Did you know …

  • … that ‘mastodon near me’ lists the Mastodon instances with a local connection that may or may not have room for newbies?

Upcoming Events

Where | What | When
Düsseldorf | Düsseldorfer OpenStreetMap-Treffen | 2022-12-30
Stuttgart | Stuttgarter Stammtisch | 2023-01-03
City of Westminster | Missing Maps London Mapathon | 2023-01-03
Berlin | OSM-Verkehrswende #43 (Online) | 2023-01-03
København | OSMmapperCPH | 2023-01-08
– | OSMF Engineering Working Group meeting | 2023-01-09
München | Münchner OSM-Treffen | 2023-01-10
Berlin | 175. Berlin-Brandenburg OpenStreetMap Stammtisch | 2023-01-12
Seattle | World Railway Mapping Online Quarterly Meetup | 2023-01-14
– | 159. Treffen des OSM-Stammtisches Bonn | 2023-01-17
London | Geomob London | 2023-01-18
Karlsruhe | Stammtisch Karlsruhe | 2023-01-18
Dar es Salaam | State of the Map Tanzania | 2023-01-19 – 2023-01-21

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by MatthiasMatthias, SK53, Strubbl, TheSwavu, derFred.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

On 1 January every year, we join free knowledge enthusiasts around the world in celebrating Public Domain Day. Because copyright protection for creative works only lasts for a certain number of years before it lapses, we can open a treasure trove full of books, works of art, songs, and early films on each New Year’s Day. Year by year, we are watching an ever-expanding Public Domain of Free Knowledge, on which Wikipedia and its sister projects are built.

So what’s in store for Public Domain Day 2023? Well, it depends on where you live, because copyright law, especially the term of protection, has not yet been fully harmonized across the globe. In most countries, copyright lapses a certain number of years (usually 50 or 70) after the respective author’s death. The United States (still) forms a notable exception to this rule, calculating copyright protection based on the year in which a work was first published.

Public Domain Day 2023 in countries applying the “life plus 70” rule

For the countries where copyright ends 70 years after an author’s death, Public Domain Day 2023 is marked by the works of an educator and those of an explorer:

Maria Montessori (1870–1952)
American school child doing writing exercises according to the Montessori method

The Montessori method, founded by Italian educator Maria Montessori, is considered a milestone in the history of education and is still used in child development today, especially in kindergartens and primary schools. Her 1909 book The Method of Scientific Pedagogy Applied to Education in Children’s Houses can now be published and explored in the digital library of free-content textual sources, Wikisource.

Swedish geographer Sven Hedin achieved fame through several expeditions to Central Asia. For example, he was the first European to travel to the source of the Indus river in the Transhimalaya. He later published his discoveries in numerous books, including an atlas of Central Asia, which as of today can be freely referenced in Wikipedia and its sister projects. Because of his support for National Socialism, however, Hedin’s legacy as an explorer remains a difficult one.

In the field of literature, the year 1952 remains a sad memory: In the so-called Night of the Murdered Poets, numerous Soviet Jews were executed in Moscow after mock trials for their association with the Jewish Anti-Fascist Committee. Among the victims were well-known intellectuals, including a number of Yiddish-language authors. Wikipedia activists can use today’s date as an opportunity to remember the victims of this anti-semitic crime against the Jewish population of the Soviet Union under Josef Stalin.

Public Domain Day 2023 in the United States

The list of works entering the Public Domain in the United States today contains a number of milestones in the history of film: Fritz Lang’s epochal Metropolis; the first feature talkie The Jazz Singer; the first Academy Award for Best Picture winner Wings; as well as a number of early films by comedians Laurel and Hardy – they all are now no longer protected by copyright, which means that they are now free to be enjoyed – and re-used by today’s creative minds – both on Wikipedia and elsewhere, online and off.

As a final note, the works entering the Public Domain in the United States today also include the last two ‘Sherlock Holmes’ stories by Arthur Conan Doyle. In 2014, a U.S. Court of Appeals was tasked to answer the question of whether the characters of Sherlock Holmes and Doctor Watson can enjoy copyright protection in their own right, even though their first, classic whodunits had then already become free to share and re-use by anyone. The subject of this major legal dispute embodies the very spirit of Public Domain Day: After a certain period of time, all creative material must become a part of the public domain – and today’s artists are invited to make good use of these Creative Commons.

In this spirit, we are wishing everyone a very happy Public Domain Day 2023!

Reading in 2022

03:21, Sunday, 1 January 2023 UTC

Every book should be read no more slowly than it deserves, and no more quickly than you can read it with satisfaction and comprehension.

– Mortimer J Adler, How to Read a Book

My trusty, hated Kindle

Reading only “1000 books before you die” used to strike me as unambitious.

Then I started tracking my reading, and I realized it would take me 40+ years to read 1000 books.

I needed to set goals to improve my natural average pace of 24 books per year. And in 2022 I eked out a respectable 55 books.

This post catalogs the systems and habits I used to boost my reading.

Motivation

I like reading.

When I contrast how I feel after I spend an hour doomscrolling Reddit vs an hour spent reading, there’s no comparison—reading always wins. Too much internet can leave me feeling desolate.

Nonfiction continues to be the best way to learn more about myriad topics. And science now touts the benefits of reading fiction.

But there’s so much to read and so little time. Plus, I worried I was losing what I’d already read. So I set goals and built habits to achieve those goals.

What is working well

  • My Kindle – I wish an open device existed that was as wonderful as my Kindle. I hate that I love it so much. But it’s a boon to my reading, and the benefits are hard to quibble over:

    • Front-lit, ePaper display so I can read at night without a light and without interfering with my sleep
    • Stores 100s of books
    • Whispersync keeps it synced with audiobooks on Audible
    • Stores highlights in MyClippings.txt—makes it easy to export highlights
    • Stores words you look up in the dictionary in vocab.db—makes it easy to make vocabulary words into Anki flashcards
    • Light enough to drop on your face while reading in bed (this is a big concern for me)
  • Reading notes – I highlight quotes I like and save them in Readwise.

    Notes in my Readwise library

    This happens automatically for books I read on my Kindle.

    For paper books, I stole my entire process from Cal Newport:

    • Read with a Zebra #2 in hand
    • Highlight interesting passages—underline or bracket or make a mark in the margins
    • For each page where I highlight a passage, I also make a line across the corner of the page
    • Later, I can flip through the book and find all the pages with lines to find my highlights
    • Then I’ll use Readwise’s “Add via photo” feature to add the highlights to the app

    Readwise can automatically export to online notetaking apps like Evernote. But I like to export each book’s notes to markdown and save them for quick ripgrepping and offline reading under ~/Documents/notes/brain.

  • Tracking – It’s surprising how much benefit you get from simply writing down the books you read somewhere.

    I used to forget whole books all the time.

    I’ve tracked every book I’ve read since 2016 on this blog. Posting it online may give me a bit of public accountability, but I think a plain text file would net you the same benefits.

What still needs improvement

  • Reviewing – I failed to write a review for each book I read this year. I started strong but faltered around book 30.

    I want to improve this next year. Maybe I should finally concede and join a social reading forum—it might help to have some social accountability.

    The anti-corporate, ActivityPub-backed Goodreads alternative BookWyrm could be a cool place.

Goals for 2023

I’m going for fifty books again.

Here are a few of my vague notions for reading in 2023:

  • Math – I want to read about math. I’ve got A. N. Whitehead’s “An Introduction to Mathematics” and Mark C. Chu-Carroll’s Good Math on my list.
  • Trees – I read “The Overstory” by Richard Powers in 2020. In an interview with the Guardian in 2019, Powers said he’d read 120 books about trees while he was writing it. I wonder which was the best?
  • The Hainish Cycle books – I’m a sucker for Ursula K. Le Guin. The Dispossessed is one of my favorites. I’ve never read any other book in this series. Why not try a few in 2023?
  • Lonesome Dove by Larry McMurtry – In 1994, David Foster Wallace taught English 102 at Illinois State. His syllabus survives online. All the required reading is mass-market paperbacks. Lonesome Dove is one of these cheap paperbacks that also happens to have won the 1986 Pulitzer Prize for fiction, so it’s probably an OK read.
  • Moar US President biographies – reading a biography of every American president might be a fun project ¯\_(ツ)_/¯

Shrinking H2 database files

01:47, Saturday, 31 December 2022 UTC

Our code review system Gerrit has several caches, the largest ones being backed by files on disk. The disk caches offload memory usage and persist the data between restarts. Gerrit is a Java application, and the caches are stored in H2 database files; I recently had to find out how to connect to them in order to inspect their content and reduce their size.

In short: java -Dh2.maxCompactTime=15000 ... would cause the H2 driver to compact the database upon disconnection.

Context

During an upgrade, the Gerrit installation filled up the system root partition entirely (see the incident report for the Gerrit 3.5 upgrade). The reason was two caches occupying 9 GB and 11 GB of the 40 GB system partition. Those caches hold diffs of files made by patchsets and are stored in two files:

File in /var/lib/gerrit2/review_site/cache/ | Size (MB)
git_file_diff.h2.db | 8376
gerrit_file_diff.h2.db | 11597

An easy fix would have been to stop the service, delete all caches, restart the service, and let the application refill the cold caches. But that is a short-term solution; long term, what if it is an issue in the application and we have to do the same all over again in a few weeks? The large discrepancy also triggered my curiosity, and I had to know the exact root cause to find a definitive fix. Thus started my journey of debugging.

They are all empty?

Looking at the caches through the application shows they are way smaller, at around 150 MB:

ssh -p 29418 gerrit.wikimedia.org gerrit show-caches
  Name                          |Entries              |  AvgGet |Hit Ratio|
                                |   Mem   Disk   Space|         |Mem  Disk|
--------------------------------+---------------------+---------+---------+
D gerrit_file_diff              | 24562 150654 157.36m|  14.9ms | 72%  44%|
D git_file_diff                 | 12998 143329 158.06m|  14.8ms |  3%  14%|
                                               ^^^^^^^

One could assume some overhead, but there is no reason for metadata to occupy a hundred times more space than the actual data it describes, especially given that each cached item is a file diff, which is more than a few bytes. To retrieve the files locally I compressed them with gzip, and they shrank to a mere 32 MB! That is a strong indication that the files are mostly empty data, which suggests the database layer never reclaims no-longer-used blocks. Reclaiming is known as compacting in H2 and vacuuming in SQLite.

Connecting

Once I had retrieved the files, I tried to connect to them using the H2 database jar, and kept making mistake after mistake due to my complete lack of knowledge on that front:

Version matters

At first I tried the latest version, h2-2.1.214.jar, and it did not find any data. I eventually found out that the underlying storage system had been changed compared to version 1.3.176, the one used by Gerrit. I thus had to use an older version, which can be retrieved from the Gerrit.war package.

File parameter which is not a file

I then wanted to create an SQL dump of the database to inspect it, using the Script Java class: java -cp h2-1.3.176.jar org.h2.tools.Script. It requires a -url option, which is a JDBC URI containing the database name. Intuitively, I gave the full file name:

java -cp h2-1.3.176.jar org.h2.tools.Script -url 'jdbc:h2:git_file_diff.h2.db'

It returned instantly and generated the dump:

backup.sql
CREATE USER IF NOT EXISTS "" SALT '' HASH '' ADMIN;

Essentially an empty file. Looking at the files on disk, I saw it had created a git_file_diff.h2.db.h2.db file of 24 kB. Lesson learned: the .h2.db suffix must be removed from the URI. I was then able to create the dump using:

java -cp h2-1.3.176.jar org.h2.tools.Script -url 'jdbc:h2:git_file_diff'

This resulted in a properly sized backup.sql.

Web-based admin

I altered the SQL to make it fit SQLite in order to load it in SqliteBrowser (a graphical interface which is very convenient for inspecting such databases). Then I found that invoking the jar directly starts a background process attached to the database and opens my web browser on a web UI: java -jar h2-1.3.176.jar -url jdbc:h2:git_file_diff

That is very convenient for inspecting the file. The caches are key-value stores, with a column keeping track of the size of each record. Summing that column is how gerrit show-caches finds out the size of the caches (roughly 150 MB for the two diff caches).

Compacting solutions

The H2 Database feature page mentions that empty space is to be re-used, which is not the case, as seen above. The documentation states that when the database connection is closed, the database is compacted for up to 200 milliseconds. Gerrit establishes the connection on startup and keeps it open until shutdown, at which point the compaction occurs. That is not frequent enough, and the small delay is apparently not sufficient to compact our huge databases. To run a full compaction, several methods are possible:

SHUTDOWN COMPACT: this requests an explicit compaction and terminates the connection. The documentation implies it is not subject to the time limit. It would have required a change in the Gerrit Java code to issue the command.

org.h2.samples.Compact script: H2 has an org.h2.samples.Compact sample to manually compact a given database. It would need some instrumentation to trigger it against each file after Gerrit is shut down, possibly as a systemd.service ExecStopPost, iterating through each file.

JDBC URL parameter MAX_COMPACT_TIME: the 200 milliseconds can be raised by adding the parameter to the JDBC connection URL (separated by a semicolon, ;). Again, it would require a change in the Gerrit Java code to modify the way it connects.
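
Following that semicolon syntax, the connection URL would presumably look something like:

jdbc:h2:git_file_diff;MAX_COMPACT_TIME=15000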

The beauty of open source is that I could access the database source code. It is hosted at https://github.com/h2database/h2database, in the version-1.3 tag, which holds a subdirectory for each sub-version. When looking up a setting, the database driver uses the following piece of code (licensed under the Mozilla Public License Version 2.0 or the Eclipse Public License 1.0):

version-1.3.176/h2/src/main/org/h2/engine/SettingsBase.java
    /**
     * Get the setting for the given key.
     *
     * @param key the key
     * @param defaultValue the default value
     * @return the setting
     */
    protected String get(String key, String defaultValue) {
        StringBuilder buff = new StringBuilder("h2.");
        boolean nextUpper = false;
        for (char c : key.toCharArray()) {
            if (c == '_') {
                nextUpper = true;
            } else {
                // Character.toUpperCase / toLowerCase ignores the locale
                buff.append(nextUpper ? Character.toUpperCase(c) : Character.toLowerCase(c));
                nextUpper = false;
            }
        }
        String sysProperty = buff.toString();
        String v = settings.get(key);
        if (v == null) {
            v = Utils.getProperty(sysProperty, defaultValue);
            settings.put(key, v);
        }
        return v;
    }

When retrieving the setting MAX_COMPACT_TIME, it forges a camel-case version of the setting name prefixed by h2., which gives h2.maxCompactTime, then looks it up in the JVM system properties and, if set, picks its value.

Raising the compact time limit to 15 seconds is thus all about passing to java: -Dh2.maxCompactTime=15000.

Applying and resolution

Change 7f6215e039 in our Puppet repository applies the fix and summarizes the above. Once it was applied, I restarted Gerrit once to have the setting taken into account, and then restarted it a second time to have it disconnect from the databases with the setting applied. The results are unequivocal. Here are the largest gains:

File | Before | After
approvals.h2.db | 610M | 313M
gerrit_file_diff.h2.db | 12G | 527M
git_file_diff.h2.db | 8.2G | 532M
git_modified_files.h2.db | 899M | 149M
git_tags.h2.db | 1.1M | 32K
modified_files.h2.db | 905M | 208M
oauth_tokens.h2.db | 1.1M | 32K
pure_revert.h2.db | 1.1M | 32K

The gerrit_file_diff and git_file_diff caches went from 12 GB and 8.2 GB respectively to about 0.5 GB each, which addresses the issue.

Conclusion

Setting the Java property -Dh2.maxCompactTime=15000 was a straightforward fix which does not require any change to the application code. It also guarantees the databases will keep being compacted each time Gerrit is restarted, so the issue that led to a longer maintenance window than expected should not reappear.

Happy end of year 2022!

Outreachy report #39: December 2022

00:00, Friday, 30 December 2022 UTC

This December, I improved our manual feedback review system, our asynchronous communications, and created a proposal for a new feedback interface on our website. All Outreachy organizers use a shared document where we write notes about feedback we’ve reviewed. Notes used to be written freeform, with emoji keys used to symbolize specific issues. We typically manage, by hand, three queues that require further action:

  • The payment queue. This queue has two sub-queues: one that adds interns to the next payment authorization, and another that logs all payment authorizations sent to be executed by Software Freedom Conservancy.

What a Wikipedia assignment looks like day-to-day

19:46, Tuesday, 27 December 2022 UTC

Can’t get enough of other instructors’ experiences with the Wikipedia assignment? Dr. Laura Ingallinella of the University of Toronto has just published an excellent journal article in the Bibliotheca Dantesca: Journal of Dante Studies that details her successes, challenges, and learnings incorporating Wikipedia editing into her teaching at Wellesley College.

Dr. Ingallinella outlines the benefits of utilizing the Wikipedia assignment in her undergraduate class, which is dedicated to reading Dante’s Divine Comedy in English. Her insights can be applied across disciplines, beyond Dante Studies. In the article, Dr. Ingallinella covers the educational outcomes of the assignment and other applications of this work for educators interested in digital public scholarship and knowledge equity. And she lays out a set of best practices for utilizing Wiki Education’s free resources. Reading this, you’ll find a blueprint for how one instructor incorporates our trainings and Dashboard into an actual classroom environment. She answers questions like:

  • What does the assignment look like day by day?
  • How does the task of writing Wikipedia articles fit into larger discussions of knowledge equity in your field?
  • How do you set expectations with students who haven’t edited Wikipedia before, and have actually been told never to use it?

Dr. Ingallinella also provides insights into academia’s acceptance (and non-acceptance) of Wikipedia, how representation of scholarly journal articles on Wikipedia benefits both public audiences and the academic field, and how a Wikipedia assignment provides students with a good entry point into the reference works important to your field.

Thank you Dr. Ingallinella for sharing your insights with us and our instructor community. We’re proud to support your work and that of many others each term. Every instructor who utilizes our resources (there are hundreds of you!) is part of a community doing this work across the US and Canada through our program. Reach out to us and each other, attend our office hours, present at conferences together, and let us know when you publish work like this. We love to share it. Read the article here!

The deadline to run a Wikipedia assignment in Spring 2023 has been extended! Submit your course by January 16 to ensure your spot in our free programs. Visit teach.wikiedu.org to find out more about how you can incorporate a Wikipedia project into your syllabus.

MySQL connection pooling in Rust for Toolforge

00:24, Tuesday, 27 December 2022 UTC

Toolforge is a free cloud computing platform designed for and used by the Wikimedia movement to host various tools and bots. One of the coolest parts of using Toolforge is that you get access to redacted copies of the MediaWiki MySQL database replicas, aka the wiki replicas. (Note that whenever I say "MySQL" in this post I actually mean "MariaDB".)

In web applications, it's pretty common to use a connection pool, which keeps a set of open connections ready so there's less overhead when a new request comes in. But the wiki replicas are a shared resource and more importantly the database servers don't have enough connection slots for every tool that uses them to maintain idle connections. To quote from the Toolforge connection handling policy:

Usage of connection pools (maintaining open connections without them being in use), persistent connections, or any kind of connection pattern that maintains several connections open even if they are unused is not permitted on shared MySQL instances (Wiki Replicas and ToolsDB).

The memory and processing power available to the database servers is a finite resource. Each open connection to a database, even if inactive, consumes some of these resources. Given the number of potential users for the Wiki Replicas and ToolsDB, if even a relatively small percentage of users held open idle connections, the server would quickly run out of resources to allow new connections. Please close your connections as soon as you stop using them. Note that connecting interactively and being idle for a few minutes is not an issue—opening dozens of connections and maintaining them automatically open is.

But use of a connection pool in code has benefits beyond just keeping idle connections open and ready to go. A connection pool manages the maximum number of open connections, so we can wait for a connection slot to become available rather than showing the user an error that our user’s connection limit has already been reached. A pool also allows us to reuse open connections if we know something is waiting for them, instead of closing them. (Both of those are real issues Enterprisey ran into with their new fast-ec tool: T325501, T325511; which caused me to finally investigate this.)

With that in mind, let's set up a connection pool using the mysql_async crate that doesn't keep any idle connections open. You can pass pool options programmatically using a builder, or as part of the URL connection string. I was already using the connection string method, so that's the direction I went in, because it was trivial to tack more options on.

Here's the annotated Rust code I ended up with, from the toolforge crate (source code):

impl fmt::Display for DBConnectionInfo {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // pool_min=0 means the connection pool will hold 0 active connections at minimum
        // pool_max=? means the max number of connections the pool will hold (should be no more
        //            than the max_connections_limit for your user, which defaults to 10)
        // inactive_connection_ttl=0 means inactive connections will be dropped immediately
        // ttl_check_interval=30 means it will check for inactive connections every 30sec
        write!(
            f,
            "mysql://{}:{}@{}:3306/{}?pool_min=0&pool_max={}&inactive_connection_ttl=0&ttl_check_interval=30",
            self.user, self.password, self.host, self.database, self.pool_max
        )
    }
}

In the end, it was pretty simple to configure the pool to immediately close unused connections, while still getting us the other benefits! This was released as part of toolforge 5.3.0.
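
As a rough usage sketch (my own example, not from the crate's documentation), a connection string like the one built above plugs into mysql_async like this:

use mysql_async::prelude::*;

async fn example(url: &str) -> Result<(), mysql_async::Error> {
    // The pool_* options are parsed straight out of the connection URL.
    let pool = mysql_async::Pool::new(url);
    // Waits for a free slot if all pool_max connections are checked out.
    let mut conn = pool.get_conn().await?;
    let one: Option<u32> = conn.query_first("SELECT 1").await?;
    assert_eq!(one, Some(1));
    drop(conn); // handed back to the pool; reaped by the TTL check while inactive
    pool.disconnect().await?; // graceful shutdown closes all remaining connections
    Ok(())
}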

This is only half of the solution though, because this pool only works for connecting to a single database server. If your tool wants to support all the Wikimedia wikis, you're out of luck since the wikis are split across 8 different database servers ("slices").

Ideally our pool would automatically open connections on the correct database server, reusing them when appropriate. For example, the "enwiki" (English Wikipedia) database is on "s1", while "s2" has "fiwiki" (Finnish Wikipedia), "itwiki" (Italian Wikipedia), and a few more. There is a "meta_p" database that contains information about which wiki is on which server:

MariaDB [meta_p]> select dbname, url, slice from wiki where slice != "s3.labsdb" order by rand() limit 10;
+---------------+--------------------------------+-----------+
| dbname        | url                            | slice     |
+---------------+--------------------------------+-----------+
| mniwiktionary | https://mni.wiktionary.org     | s5.labsdb |
| labswiki      | https://wikitech.wikimedia.org | s6.labsdb |
| dewiki        | https://de.wikipedia.org       | s5.labsdb |
| igwiktionary  | https://ig.wiktionary.org      | s5.labsdb |
| viwiki        | https://vi.wikipedia.org       | s7.labsdb |
| cswiki        | https://cs.wikipedia.org       | s2.labsdb |
| enwiki        | https://en.wikipedia.org       | s1.labsdb |
| mniwiki       | https://mni.wikipedia.org      | s5.labsdb |
| wawikisource  | https://wa.wikisource.org      | s5.labsdb |
| fiwiki        | https://fi.wikipedia.org       | s2.labsdb |
+---------------+--------------------------------+-----------+
10 rows in set (0.006 sec)

(Most of the wikis are on s3, so I excluded it so we'd actually get some variety.)

Essentially we want 8 different connection pools, and then a way to route a connection request for a database to the server that contains the database. We can get the mapping of database to slice from the meta_p.wiki table.

This is what the new WikiPool type aims to do (again, in the toolforge crate). At construction, it loads the username/password from the my.cnf file. Then when a new connection is requested, it lazily loads the mapping, and opens a connection to the corresponding server, switches to the desired database and returns the connection.
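
As a rough sketch of that flow (hypothetical names and simplified error handling, not the actual WikiPool code), the routing idea looks something like this:

use mysql_async::prelude::*;
use std::collections::HashMap;

// One pool per slice, plus a wiki -> slice map loaded from meta_p.wiki.
struct RoutingPool {
    slices: HashMap<String, mysql_async::Pool>, // e.g. "s1" -> pool for s1
    wiki_to_slice: HashMap<String, String>,     // e.g. "enwiki" -> "s1"
}

impl RoutingPool {
    async fn get_conn(&self, dbname: &str) -> Result<mysql_async::Conn, mysql_async::Error> {
        // Look up which slice hosts the wiki (real code would handle unknown names).
        let slice = &self.wiki_to_slice[dbname];
        let mut conn = self.slices[slice].get_conn().await?;
        // Switch the connection to the requested database on that shared server.
        conn.query_drop(format!("USE {}", dbname)).await?;
        Ok(conn)
    }
}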

I've done some limited local testing of this, mostly using ab to fire off a bunch of concurrent requests and watching SHOW PROCESSLIST in another tab to observe all connection slots being used with no idle connections staying open. But it's not at a state where I feel comfortable declaring the API stable, so it's currently behind an unstable-pool feature, with the understanding that breaking changes may be made in the future, without a semver major bump. If you don't mind that, please try out toolforge 5.4.0 and provide feedback! T325951 tracks stabilizing this feature.

If this work interests you, the mwbot-rs project is always looking for more contributors. Please reach out, either on-wiki or in the #wikimedia-rust:libera.chat room (Matrix or IRC).

Tech News issue #52, 2022 (December 26, 2022)

00:00, Monday, 26 December 2022 UTC
2022, week 52 (Monday 26 December 2022)

There is no technical newsletter this week.

weeklyOSM 648

10:52, Sunday, 25 December 2022 UTC

13/12/2022-19/12/2022

lead picture

Colour-coded buildings by age [1] | © uMap and SK53 | map data © OpenStreetMap contributors

Breaking news

  • The first meeting of the new OSMF Board after the 16th Annual General Meeting was held on Tuesday 20 December. New officers have been selected, along with a Personnel Committee chair, an Open Source Initiative OSM delegate, and a Trademark licensing subcommittee representative.
    • The new officer roles are:
      • Chairperson: Guillaume Rischard
      • Secretary: Mikel Maron
      • Treasurer: Roland Olbricht
    • The non-officer roles are:
      • Sarah Hoffmann: Personnel Committee chair and Open Source Initiative OSM delegate
      • Mikel Maron: Finance Committee chair
      • Mateusz Konieczny: Trademark licensing subcommittee representative
      • Arnalie Vicario
      • Craig Allan
    • Officers’ contact emails.

Mapping

  • Fabio Lovato has built a map of fuel prices in Italy. Fabio explained how the map was made using data from Italy’s Ministry of Economic Development and uMap via a PHP script.
  • A request for comments has been made for healthcare=department, a tag for the departments of a hospital or a clinic.

Community

  • Who needs to be a heroic Google Maps Local Guide when you have weeklyOSM? OpenStreetMap Pontarlier, if you are reading this, thanks.
  • OpenStreetMap Perú was one of the organisers of the ‘Grassroots disaster risk management and OpenStreetMap LATAM 2022’, which was held from 24 to 27 November. Along with the usual conference activities there was a guided tour of the national disaster centre and the Caral archaeology site. meerkat88, Caminando Cusco, and mapeadora have written diary entries on their conference experiences.
  • Nigeria held its first State of the Map from 1 to 3 December. A number of participants have written of their experiences, including Teeman, whose journey to the conference was eventful; johanespeter9, who wanted to be local so ate local; and vicksun, who gave thanks to the many people responsible for the success of the event.
  • Rounding out this week’s rush of conference reports are those from Pista ng Mapa, which this year included the State of the Map Asia. There are so many that we can only list a small sample. frozenrabi blogged about how excited he was to travel outside Nepal for the first time, dichapabe highlighted some of the social activities they participated in, and Mia Kristine let us in on the confessions of a first time attendee.

Open Data

  • A summary of what we have found about Overture Maps:
    • The OSM Foundation’s view.
    • A number of thoughts about Overture Maps on the OSM Community forum.
    • A comment in the Russian OSM Blog.
    • An analysis in Le Temps.
    • ArsTECHNICA thinks that the idea is to ‘break the Google Maps monopoly’. AviationAnalysis sees it in the same way.
    • VentureBeat thinks that a combination of OSM data with AI is the target direction.
  • TUMI and Trufi Association presented a webinar exploring how transport data digitised in OSM can be developed, consumed, and iterated for better cities and sustainable mobility.

Software

  • Junior Flores has written an image recognition plugin for JOSM. Josm Magic Wand is designed to mark out the geometries of areas of similar colour and to merge the results where needed. Junior gave a presentation on Josm Magic Wand at Mapping USA 2022.

Programming

  • [1] Following up on his colour-coded buildings by age, SK53 has extended his Overpass Turbo code to work with any object in OSM.

Did you know …

  • … that you can track the progress of road gritters as they go about their business in Scotland? Take the opportunity to watch Spray Charles, Snowwayko Drift, Buzz Iceclear, Nitty Gritty, Mary Queen of Salt, and others as they carry out their work.

Other “geo” things

  • Apple has published improved maps of Switzerland and the Benelux countries. Improvements include colour highlighting of business districts. Buildings, parks, airports and shopping malls are captured in a higher level of detail than before.
  • OpenCage outlined their plan to move from Twitter to Mastodon as their primary social media channel for announcements.
  • GPS signals are being jammed in some Russian cities following Ukraine’s successful use of drone attacks. Wired reported how the GPS issues were first spotted by the monitoring system GPSJam, which uses data from planes to track problems with the satellite navigation system.

Upcoming Events

Where | What | When
Düsseldorf | Düsseldorfer OpenStreetMap-Treffen | 2022-12-30
Stuttgart | Stuttgarter Stammtisch | 2023-01-03
City of Westminster | Missing Maps London Mapathon | 2023-01-03
København | OSMmapperCPH | 2023-01-08
– | OSMF Engineering Working Group meeting | 2023-01-09
München | Münchner OSM-Treffen | 2023-01-10

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by PierZen, Strubbl, TheSwavu, TrickyFoxy, danilax86, derFred.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

Best MediaWiki Extensions in 2023

00:00, Sunday, 25 2022 December UTC

There are thousands of extensions available for MediaWiki, so it can be tough to know which ones are the most useful. To help you out, we've compiled a list of the 15 most useful MediaWiki extensions.

The list of the best MediaWiki extensions of course depends on your use case. We thus focused on popular and often useful extensions.

What is a MediaWiki Extension?

Extensions add functionality to MediaWiki, or enhance existing features. They are written in PHP, and need to be added to the wiki by someone who has server access before they can be enabled. Other terms for MediaWiki extensions are MediaWiki plugins and MediaWiki addons, though the latter two are rarely used. They are comparable to WordPress plugins.

#15: Cite

Add references, citations, and notes via the Cite extension. The footnotes are created automatically at the bottom of pages. Increase the credibility of your content or provide context via this feature-rich MediaWiki extension.

Categories: UI, data
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#14: Data Transfer

The Data Transfer extension is a practical tool to import data from CSV or spreadsheet files into your wiki. The import works by mapping the file content to templates and their parameters. Wiki pages are created automatically, and you can choose to do a full or partial import.

Data Transfer is an excellent tool for creating and updating structured data, so it works well together with Semantic MediaWiki.

Categories: data
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#13: Maps

Maps is the MediaWiki extension to visualize and work with geographical information. You can embed beautiful dynamic maps using either Leaflet or Google Maps.

The maps are highly customizable. Specify markers, lines, and polygons to show. Or choose the map layers, clustering options, or the map's dimensions. Dozens of options give you an immense amount of control over the look and feel of the map, and its behavior.

Maps comes with a visual map editor and GeoJSON support. It also provides parser functions for geocoding, coordinate formatting, and geospatial operations. Last but not least, it integrates with Semantic MediaWiki, enabling you to build maps from queried data.

Categories: visualization, data
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#12: CodeMirror

The CodeMirror extension is a practical tool. It provides syntax highlighting when editing the wikitext source of pages. It integrates into both the Visual Editor (via the so-called "2017 wikitext editor") and the WikiEditor.

Have you ever looked for a tool that highlights the wikitext syntax used for internal links, templates, or tags? Then this one is for you. If you forget to add a closing bracket or tag, it will show you.

Categories: UI, editing
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#11: Replace Text

The Replace Text extension provides a powerful tool for find-and-replace editing of strings within pages. Additionally, you can move pages via text replacements within their titles. Replace Text provides a concise interface with many options for wiki administrators.

Replace Text is a valuable tool for, e.g., correcting common typographical errors or changing content on a larger scale, e.g. renaming categories or templates.

Categories: editing
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#10: CategoryTree

The CategoryTree extension allows you to view and navigate the category structure of a wiki easily. It provides a dynamic, interactive tree view of the category namespace, with support for expanding and collapsing categories.

CategoryTree is a helpful tool for exploring the category structure of a wiki and finding out what categories exist. It can also be useful for finding out which articles are in a particular category.

Categories: UI
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#9: PDF Handler

If you're like most people, you probably use PDFs all the time. But what if you want to use them in MediaWiki? Then PDF Handler is just what you need! It allows you to view PDFs inline in MediaWiki pages.

Categories: UI, file handling
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#8: InputBox

InputBox is a handy little tool that allows you to add a form with a text box to a wiki page. You can create search forms, page creation forms, page moving forms, and commenting forms.

Categories: UI, editing, forms
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#7: Gadgets

The Gadgets extension provides a way for users to define JavaScript or CSS based "gadgets" that other wiki users can then use.

Categories: scripting
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#6: ParserFunctions

The ParserFunctions extension adds a set of parser functions. These are great for power users who wish to do wikitext scripting. The added parser functions provide conditional logic, improved handling of time, tools for working with page titles and various string functions.

Categories: scripting
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#5: WikiEditor

The WikiEditor extension provides a toolbar for editing wikitext to the source editor. It helps those not familiar with wikitext to create links, lists, headers and more, and to format text.

Categories: UI, editing
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#4: Visual Editor

The Visual Editor extension provides a user interface for editing wiki pages that is similar to a word processor. It includes features such as a toolbar with formatting options, a drag-and-drop interface for adding and rearranging content, and a preview function that shows how the page will look after the changes are saved.

Categories: UI, editing
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#3: Page Forms

Page Forms allows you to add, edit, and query data using forms. It's perfect for creating data-rich pages without having to write any code.

Categories: UI, editing, forms, data
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#2: Semantic MediaWiki

Semantic MediaWiki provides a way to store and query data within a MediaWiki installation. This data can be used to generate reports and visualizations, or to integrate MediaWiki with other systems.

Categories: data, queries
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

#1: ConfirmEdit

While preventing spam is not sexy, it is essential for wikis where anyone can register, and for wikis with open editing. ConfirmEdit provides several CAPTCHA modules to fight spam. We recommend the QuestyCaptcha module, as it is very effective and easy to set up.

Categories: anti-spam
Bundled with MediaWiki
Available in ProWiki
Used by Wikimedia

Your Favourites

Did we miss your favourites? What MediaWiki extensions do you think are indispensable?

Let us know via a tweet to @ProWikis!

Further Resources

ProWiki comes with many MediaWiki extensions. Some of those are enabled by default, while others can be enabled via one click in the ProWiki admin panel.

If you are installing new extensions yourself, check out our LocalSettings.php and Installing MediaWiki Extensions with Composer guides.

The extensions bundled with MediaWiki are useful for many wikis. They are also easy to enable, in case you manage your wiki yourself.

See also: MediaWiki Extensions help and How to Install MediaWiki

USW S1E3: Fighting Gender Gap in USW Community

07:00, Saturday, 24 December 2022 UTC
University Students Wikimedians members in a workshop in Iringa

Gender equality remains unfinished business in every country of the world. For women, it can mean restricted access to education, a lower standing in society, less freedom to make decisions around their personal and family lives, and lower wages for the jobs and work they do. Women and girls also experience rampant levels of violence and harassment.

Wiki Communities like the University Students Wikimedians Community have a role to play in making sure that we fight the gender gap that might occur in our community. Gender equity and inclusion are essentials on the checklist for a healthy Wiki Community.

University Students Wikimedians, being one of the Wiki Communities in Tanzania, currently interacts with many students from different colleges and universities across the country. How do they make sure gender balance is considered in their activities?

“Gender balance in all of our USW Communities is considered a priority in all the activities. Though sometimes it is not easy meeting this requirement as the C.O.T. we make sure that we nearly reach this target in all of our activities.” Magoiga (USW Program Coordinator)

“As far as I know, we girls have always been given the chance to participate fully with full support from our club leader and the other leaders. So far, I haven’t faced any harassment that would make me feel like leaving the community but also I have never heard anyone being harassed, I think this is due to the awareness created for us by the leaders and through that, we respect each other. We are as a family no matter the race or perspectives.” User: HappyDeLuckiest (USW Member)

When we talk of university students in Tanzania, we are referring to thousands of students whom we would love to reach and make aware of the Wikimedia Foundation and its projects. Running a program like this is never easy, because you have to consider many things, including gender balance. The University Students Wikimedians have made sure that gender balance is considered from the top level to the lower levels: in leadership, 50% are women and 50% are men. This was done to give both genders a chance to participate and to encourage team members to join and feel safe in the activities conducted by the University Students Wikimedians.

Approaches/strategies used by University Students Wikimedians.

  1. Analyzing all possible causes of gender imbalance.

Before the beginning of any program, it is best to analyze and outline all possible causes of gender imbalance and make plans for how to tackle them.

  2. Setting friendly user-space policies.

The Wikimedia Foundation already has friendly user-space policies that guide all members and communities on how different issues should be addressed. However, customizing the policies and having your community's own version is fine, as long as it does not violate the Foundation's policies.

  3. Having female role models.

Having female role models at the leadership level will boost the engagement of female participants in the community.

  4. Initiating programs that are engaging to both genders.
  5. Addressing and monitoring all issues (such as violence) that might arise; even issues that seem minor can have an impact on one or more participants.

Challenges being faced by the USW in fighting the gender gap.

The biggest challenge facing the University Students Wikimedians community is the inactivity of many female members. This is due to a lack of personal devices: many can only contribute to the projects when rental devices (laptops) are available.

Adapting to the Wikimedia projects and staying committed to them is another challenge we face. Computers are new to many members, who need more time to learn how to use them, which leads some to step down from the projects.

To date, the University Students Wikimedians have not recorded any violent incident, thanks to the implementation of the friendly user-space policy, which stands as a shield for all members of the community.

How are you fighting the gender gap in your community? What are the challenges you are facing as a community in fighting the gender gap? Share with us your experience in the comments section below.

Write us an email at hello@uswiki.africa or reach us via WhatsApp at +255685261018. Connect with us on our social media platforms: Facebook, Instagram, Twitter, and LinkedIn

Writing raft

02:34, Saturday, 24 2022 December UTC

✏️⚡🔪
The club that’s write or die.

If you want to be a writer, you must do two things above all others: read a lot and write a lot.

– Stephen King, “On Writing”

In 2022, I made a Ulysses pact to force myself to write—either write or feel the white-hot shame of (temporary) banishment.

And it worked.

I wrote more blog posts in 2022 than in any previous year—more than the last three years combined.

I eked out two blog posts a month, every month, for the whole of last year. And I owe much of my success to my write-or-die crew—the writing raft.

⛵ What is a writing raft?

First proposed by Hrvoje Šimić, a writing raft is a club that forces you to write.

The rules are simple:

  1. You must publish a blog post by the end of the month.
  2. If you do not publish on time, you’re out of the club (for a month).
  3. The club is limited to 5 members.

Željko Filipin conceived of our little Junto towards the end of 2020.

Today we have three members: Me, Željko, and Kosta Harlan. And as of December 2022, the three of us have managed to stay on our raft for one. full. year.

In honor of this milestone, we’re all posting about lessons we learned over the past 12 months.

🍎 Lessons from a year of writing

Blogs should be easy to read. In 1997, Jakob Nielsen succinctly summarized “How Users Read on the Web”: “They don’t.” I use short sentences and omit needless words. And I try to make my blogs look easy to read to keep readers moving.

Get to the point. Nobody has time for throat-clearing—start with your point. If you need more details, add them later.

Your unconscious mind is a better writer than you. Writing and publishing on the same day used to be my habit. Now, I let my rough drafts sit for a day. And I often wake up with a better idea, clearer point, or different direction—even if nobody gives me feedback. My unconscious mind was working on my writing the whole time.

The core question is not how you do math but how does the unconscious do it. How is it that it’s demonstrably better at it than you are?

– Cormac McCarthy, “Stella Maris”

Practice in public. It’s the fastest way to improve. Another deal I made with myself this year is that after I publish a post, I must link it somewhere online.

Internet strangers are a fickle crowd, which makes them a great litmus test for your writing. If a post generates nothing but silence—why not tweak it?

Figure out how to say it better and try again. And try to learn something for next time.

Peer pressure is a tool. The writing raft has shown me: I need someone to notice if I skip a month of writing. Writing the first draft is painful. Knowing I have a deadline keeps me moving through the pain.

And conclusions. I’m so bad at conclusions. But I’m working on it ¯\_(ツ)_/¯.


WIKIMOVE Recap 2022

17:00, Friday, 23 2022 December UTC

Dear friends and listeners, 

We would like to wrap up the year with a recap of our first year with WIKIMOVE, the podcast about the future of the Wikimedia movement, produced by Wikimedia Deutschland. 

A massive thank you to our guests for sharing their takes on the future of our movement. And we’d like to thank our audience for their engagement and constructive feedback. We will be back in February with new episodes and adjustments, based on the evaluation survey (still open!). 

This year we released 7 episodes and talked to 15 guests.

Our episodes were listened to in at least 52 countries, about 1,580 times.

Here is a list of 2022 WIKIMOVE episodes, in case you missed them: 

#1 Knowledge as a service, with Tochi Precious and Guillaume Paumier 

#2 “Communitizing” strategy, with Érica Azzellini and Lucas Piantá

#3 UNLOCKing innovation, with Kannika Thaimai and Ivana Madzarevic 

#4 Content and Knowledge Gaps, with Kiril Simeonovski, Daniel Bögre Udell and Lucy Crompton-Reid 

#5 Peer support, with Rebecca O’Neil and Jessica Stephenson 

#6 Hubs, with Johnny Alegre and Natalia Szafran-Kozakowska 

#7 Growing Communities, with Anass Sedrati and Pepe Flores 

All episodes are also available on the following platforms:

Spotify | Apple Podcast | Google Podcast | Stitcher

Soundcloud | Deezer | JioSaavn | Podchaser | YouTube playlist

We wish you a happy end of year and look forward to further conversations around movement strategy next year. 

Your WIKIMOVE Team

Nikki, Nicole & Eva

Effective approaches to community building empower people and groups within society to improve their lives in many respects. Community building is a positive, value-based process of change which aims to address imbalances in welfare and power, grounded in inclusion, human rights, social justice, equity, and equality.

Looking at Tanzania, we can say it is among the countries where it has taken some time for people to adapt to wiki communities, and sometimes that has been a challenge. But the question is: what does it take to run, host, facilitate, or train a wiki community?

This article is based on a survey performed by the University Students Wikimedians, based in Tanzania, in which we shared a form with organizers, facilitators, leaders, and trainers from different wiki communities across Tanzania.

A wiki community can be defined as a collective of individuals, from the same or different fields, coming together to take collective action and create results that overcome the challenges faced in their area, while aligning with the Wikimedia Foundation's mission.

So one might ask oneself:

  • Can I run a wiki community? If yes, what does it take?
  • What should I expect?
  • My checklist
  • What challenges might I face in the future and how can I overcome them?
  • Is there a support center that I can rely on when needed?

The above are some of the questions one can ask oneself. In this first part of the article we will tackle some of them; the rest will be covered in part two.

Can I run a wiki community?

Simply put: yes, anyone is eligible to run a wiki community, as long as he or she aligns with the Wikimedia Foundation's mission and policies.

The Wikimedia Foundation imposes no major limitations that might prevent one from running a wiki community. No special training is needed either; after all, wiki communities are community-led.

What does it take?

One of the questions in our survey asked, “What does it take to run a wiki community?”, and one community member answered with just two words: “Passion and desire”. Lol! These two words might seem very small, yet they are very powerful; don't underestimate them.

Passion is partly about enthusiasm, and enthusiasm is contagious. If we are excited about who we are and what we do, then other people will become excited too. And they’ll listen to us. And then they’ll begin to think they should be as enthusiastic as we are.

If you have a desire, you can build something in your life that will give you passion. And I do believe that passion is something you can build. You can build passion by adopting the words, actions, and attitudes that express it. This is not “faking” passion. You can’t fake passion because people will be able to tell you are faking and it will hurt you. But you can make passion a choice and a commitment, and once you commit, soon you will start to feel it in your blood and your bones. Your energy level will rise. You’ll begin to feel the real excitement. The people around you will begin to feel that excitement. And you’ll have taken your first step. You’ll be on your way.

However, lack of commitment is one of the key roadblocks to running a wiki community. It is never easy to bring people of different races together and have them work toward the same goal. It takes a lot of commitment.

“It wasn’t easy for me at first when starting the community, I used a lot of time and resources just to bring them together. After that, another hard part was to make them understand what I do and why should they do it. I had to commit most of my time and resource before we could even request support from the Wikimedia Foundation. So before thinking about a support request first show them that you’ve made progress by yourself, it will be easy for them to support you. Also, having the community members decide their community is one of the key aspects” Anonymous replied in the survey.

So, as we have seen, it is not just about wanting to run a community; the real questions are: “Are you ready to set a good example for the other community members? Are you ready to start on your own before seeking support from others?” Setting a good example means being an active contributor to the Wikimedia Foundation projects, and also abiding by the organization's policies, most importantly the Universal Code of Conduct (UCoC) and friendly-space policies, since you will be interacting with people from different places with different perspectives on different issues.

This first part of the article reflects on what it takes to build a wiki community. Let us know in the comments section below what you think it takes to build a wiki community where you are. We will share the second part of the article soon.

Write us an email at hello@uswiki.africa or reach us via WhatsApp at +255685261018. Connect with us on our social media platforms: Facebook, Instagram, Twitter, and LinkedIn

2022 Coolest Tool Awards: Thank you for the tools!

07:00, Friday, 23 2022 December UTC

Every year, the Wikimedia technical community celebrates software tools created and used by the community. The following are this year’s winners.

Wiki communities around the globe have diverse use cases and technical needs. Volunteer developers are often the first to experiment with new ideas, build local and global solutions and enhance the experience in our software. The Coolest Tool Award aims to acknowledge that, and make volunteer developers’ work more visible.

The fourth edition of the Coolest Tool Award took place on Friday, 16 December 2022. The event was live-streamed on YouTube on the MediaWiki channel.

The video is also available on Wikimedia Commons. The award was organized and selected by the Coolest Tool Academy 2022 and friends, based on nominations from our communities.

A total of nine tools were awarded in the categories listed below, and two more tools received honorable mentions. 

A tool is a piece of software used in the Wikimedia ecosystem. Examples of tools include gadgets, MediaWiki extensions, websites, web services, APIs, applications, bots, and user scripts.

Thank you for all your work! 🎉

2022 winners

Newbie Friendly – Tools that help newcomers to get up and running

Citation Hunt provides an easy way for editors to improve article quality without getting overly lost in the depths of a full article. 

It gamifies the process of improving a Wikipedia article.

Editor — Tools that augment editing

MoreMenu allows you to bring all your favourite tools with you to a wiki page. It adds a dropdown menu to pages that are not special pages.

The dropdown offers many useful links to advanced tools for Wikimedians, in one single place.

Reusable — Serves many wikis and projects

PetScan is a powerful and flexible tool for creating a collection of wiki pages using categories, page properties, Wikidata and more.

Developer — Tools that primarily serve developers

Patch demo makes it possible to write, review, and test patches without everyone having to set up a local test instance of MediaWiki.

Newcomer — New tools or tools by new developers

Bullseye is a powerful anti-abuse tool for investigating sockpuppetry and open proxies. It allows those with access to look up varied information about IP addresses that, before this tool existed, was spread across five different pages.

Diversity — Tools that help include a variety of people, languages, cultures

Wiki99 helps create more inclusive content by identifying articles that need to be created on various topics! It is a project to create 99 Wikipedia articles on a given topic in every language, helping editors find out which articles they could create or edit.

Impact — Tools that have broad or deep impact

Scholia's uniqueness is its ability to link everything from articles and authors to organizations and funding. Scholia also shows off the potential of Wikidata, especially the effort to make Wikidata into a database of published scholarship.

Mobile — Mobile apps and mobile-focused tools

TwinkleMobile brings the famous Twinkle tool to your mobile devices! Twinkle adds a dropdown menu to automate some common tasks, like reporting vandalism, warning vandals, requesting deletion or protection, or tagging articles.

Eggbeater — Tools in use for more than 10 years

Open Refine helps you clean data and easily add bulk data to Wikidata! Upload thousands of items and files to Wikidata and Wikimedia Commons with relative ease using Open Refine.

Honorable mentions

Created Articles is a collection of notebooks to analyze articles created by a user on Wikipedia.

Wikibooks Printable Version is a multilingual module that automatically creates a wikibook print version from its table of contents complete with navigation.

Big thanks to the 2022 academy, the Technical Engagement team, everyone who nominated their favorite tools, everyone who contributed code, and everyone who spread the word about the Coolest Tool Award!

For more details on the awarded projects, please see the Coolest_Tool_Award/2022 page.
To see winners from last year, please see the 2021 article.

Komla — for the 2022 Coolest Tool Academy

Revisiting WikiConvention Francophone 2022

23:35, Thursday, 22 2022 December UTC

November 19th marked the start of the 6th edition of the WikiConvention Francophone, the annual event which brings together French-speaking Wikimedian communities from around the globe.

The main event for French-speaking Wikimedians

This year, over 160 participants hailing from every corner of the French-speaking world gathered in Paris for the first in-person edition since 2019. On top of representing an opportunity for heartfelt reunions among the French-speaking Wikimedians of the world – the WikiConvention having knit together a large group of friends across the years – this year's convention was rich in surprises and novelty.

Theme-ordered sessions were given concurrently, and participants could choose to attend conferences on Wikimedia projects, or to partake in workshops or in discussions regarding the state of the movement. Out of over 80 sessions held this year, participants with interests in GLAM subjects could choose to listen to a report on open culture in France, outlining the challenges faced in the digitization of written sources and their subsequent publication. People interested in educational projects could discover “Wikeys”, a new board game designed for students as an entry point to contributing to Wikipedia. Experienced Wikimedians could also attend workshops on Toolforge, the tool-hosting environment for Wikimedia projects, or on Open Refine, an underused tool with applications across Commons, Wikidata, and Wikimedia projects overall.

Building capacity across the world

At the heart of the French-speaking Wikimedian sphere is WikiFranca, which held its board meeting and re-elected its board members during the WikiConvention Francophone. According to its president, Éliane Yao Sigan, the meeting was a means to legally formalize WikiFranca's status as an association, but it also marked a milestone in its rapid development.

The two days preceding the convention were dedicated to learning and exchange sessions. These were an opportunity for capacity-building among the French-speaking chapters and user groups. Users from three different continents discussed project-building and association governance. The sessions shed light on the existing geographical disparities between the user groups in capacity and means.

Participants were put into cross-national working groups and assigned a project which they were meant to put into practice, step by step. The first day started with a session on project planning (many of the participants had never worked on a project plan before), followed by others on institutional partnerships, financing, and volunteer involvement.

The day ended with a session on how to build an action plan and define an annual budget. Many user groups had been operating without an annual budget, and all the participants in that situation now intend to set one with their boards. With the remaining sessions spent on strategizing and governance, the learning days were met with enthusiasm. Minette Lontsie from Cameroon recalls that seeing the discrepancies between the user groups had a bolstering effect on her and on her projects. In addition, the sessions helped establish partnerships between members of affiliate groups from around the world.

Free internet and European regulation

The organizers also held a round-table discussion on the Digital Services Act, which was approved by the European Council in October of this year. This discussion, moderated by Joëlle Toledano of the French Digital Council, included participants from online platforms (Wikimedia Foundation, Dailymotion, Microsoft) as well as regulatory and institutional bodies (ARCOM and the French Council of State). Many subjects were discussed during the two-hour open session, among them the regulation of online platforms under the new law, and the challenge of taking into account the differences between those platforms in their aims and business models.

Keeping opportunities border-free

This year, a large number of grants were handed out, especially to Wikimedians from Africa and from Haiti. Unfortunately, this year again, many grantees had their visa applications rejected. People who were supposed to give a talk and did not receive a visa had the opportunity to design posters detailing the content of their session; those were then displayed in a dedicated room. Nonetheless, an increasing number of African grantees managed to attend the convention, marking a step towards the worldwide recognition of Wikimedia events.

Next year, the WikiConvention will take place in Abidjan, Ivory Coast. Wikimedians will be able to discover a wonderful city in the heart of West Africa, and to meet the members of the ever-growing Ivorian user group, many of whom delighted us with their presence in Paris. On top of that, it will be an opportunity for French-speaking Wikimedians from Africa to attend the WikiConvention with much less trouble.

Creating the Wikimedia Design System

23:04, Thursday, 22 2022 December UTC
pattern of shapes and colors

Why are we creating a design system?

Wikipedia has been around for two decades, and the Wikimedia ecosystem has grown over the years into multiple wikis and other platforms with numerous features. These interfaces serve vast and varied audiences, and are designed and built by many different people and organizations. Without standards for design and reusable patterns in place, these interfaces will lack consistency and may not serve all of our users equally. And because of the age of our systems and software, onboarding new designers and developers can be difficult to the point where many are excluded.

We’ve made significant efforts over the years to standardize user interfaces (UIs) across features and projects, such as the Design Style Guide and UI libraries like OOUI. We also decided to start using a modern JavaScript framework for front-end development in MediaWiki. Now, we have the opportunity and resources to create a single, central system built with modern tools to support design and front-end development across Wikimedia projects.

This kind of system is called a design system. It is a collection of guidelines, standards, and reusable elements that help people easily and quickly design and implement features that are consistent, scalable, and accessible for everyone. Designing and building interfaces across Wikimedia is challenging: we must ensure that we’re supporting billions of users with different needs, while maintaining a consistent experience across features and wikis. A design system houses design standards and solutions, plus tools for implementing those designs in code, all in a single place. This way, users of the design system can avoid solving the same problems over and over, and focus their energy on building great features for our users more quickly and with a wider array of contributors.

Introducing Codex

Codex will be the design system for Wikimedia. It is currently under development and is already being used by a few projects run by volunteers, Wikimedia Deutschland, and the Wikimedia Foundation. 

Let’s take a closer look at the different parts of Codex.

Guidelines

There are two groups of guidelines within Codex:

  • User guidelines: These guidelines help people use Codex in a consistent, effective way. User guidelines are critical to helping us meet the goal of enabling people with various backgrounds and levels of experience to contribute to the movement through building user-facing features. At the time of publishing this article, Codex user guidelines are a work in progress.
  • Collaboration guidelines: A single team at a single organization can’t and shouldn’t be responsible for building the entire design system for Wikimedia. Codex is meant to be collaboratively designed, implemented, and evolved by people across the movement. The parts of Codex that exist so far are the result of work done by many different people. Read more about how you can contribute to Codex.

Standards

Codex contains documented standards for designing and building things across Wikimedia. These standards include things like:

  • Design principles: These agreements drive design decisions by helping us recognize who we’re serving and what we’re prioritizing. One important example is the web accessibility standards that we have committed to meet. Read more about Wikimedia design principles.
  • Visual style principles: This set of primitive styles is the foundation of our design tokens (more on this below), and it guides the visual style of our components and patterns. These principles include things like color, typography, and media. Learn about Wikimedia visual style principles.

Reusable elements

Examples of reusable elements within Codex

There are various reusable elements within Codex that translate the design standards into tokens, assets, and components that can be used to build UIs in alignment with the design system:

  • Design tokens: Design tokens are the smallest pieces of our design system. These are design decisions codified into reusable variables that can be used to style components and patterns. We use them in place of hard-coded values (such as hex values for color or pixel values for spacing) in order to maintain a scalable and consistent visual system. For example, @color-progressive is a token with the value #36c, which is the hex code for the blue color you’ll see on buttons and other elements that help you move forward in a process. Design tokens are critical to sharing design decisions across our designs and code. Read more about Codex design tokens; a short usage sketch follows this list.
  • Assets: Assets are UI elements with strong visual meaning, used to support the recognition of available actions (icons), to generate engagement (illustrations), or to represent our brand/organization (logos). Codex includes a collection of icons, a library of illustrations, and logos for Wikimedia entities.
  • Components: Codex contains an ever-growing library of components, like buttons and inputs, that can be used to construct larger features. Components are a combination of elements and assets, styled by design tokens, that allow users to accomplish different tasks. They can be reused across our products to support multiple use cases in a variety of contexts. Most of these components are built with Vue.js, while some are CSS based and do not require JavaScript. Check our Codex component demos.
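To make the token idea concrete, here is a minimal Less sketch (not taken from the Codex stylesheets; the selector is hypothetical) of styling with the token mentioned above instead of a hard-coded value:

    // Resolves to #36c, but stays in sync with the design system
    // if the token's value ever changes.
    .my-progressive-link {
        color: @color-progressive;
    }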

How will this impact designers, developers, and others?

If you help create UIs somewhere in the Wikimedia ecosystem, you may be wondering how Codex will affect your work.

Benefits

We hope that people working on the front-end will experience the benefits of having a centralized, modern design system:

  • More consistent products: Codex will provide consistent, accessible, well-tested patterns that can be used across Wikimedia, leading to a more cohesive and equitable experience for all of our users. Providing a great experience regardless of language, disability, device, browser, or connection speed is critical to our movement.
  • Scalability and flexibility: A good system can scale to any type of user, any language, any culture, and many use cases.
  • Single source of truth: Right now, many of us who work on the front-end struggle to know where to start when it comes to finding documentation or implementing designs or code. Codex will be a single landing point for design and front-end work, which we hope will reduce confusion and streamline processes.
  • Focus on solving complex problems: Since systems feature pre-made decisions, designers and developers can spend less time thinking about those decisions and use their time solving other complex problems in their projects. Pre-made decisions let users of the system act quickly: for instance, with a Figma design system, designers complete their objectives 34% faster than without[1].
  • Easier to contribute: We’re adopting modern, popular tools like Figma (for design) and Vue.js (for development) to lower barriers to entry, bringing in more contributors from different backgrounds. Using these modern tools will make design and development easier and faster. It will also make it easier for any designer or developer to contribute to Codex itself, creating more reusable elements for everyone to share.

Considerations

As we begin to introduce Codex and migrate to a modern system and tools, there are various things to consider:

  • When to start using Codex: Codex is a work in progress, but a few projects already use it (see the Built with Vue page for details). When thinking about adopting Codex for a new project or migrating an existing project, you should consider if Codex contains enough of the components you would need, if you’re ready to start using Vue.js, and if you need to support no-JavaScript environments. We are also currently focused on the web, but intend to bring apps into the system in the future. You can read more about when to use Vue and Codex on our FAQ.
  • What about existing or legacy systems: Many projects are still built with existing libraries like OOUI, or even deprecated libraries like mediawiki.ui or jQuery UI. Using Codex might be a great opportunity to migrate off of these technologies. That said, we will be keeping OOUI aligned with Codex in terms of design for as long as possible.
  • How to get involved with this work: Codex won’t work without contributions from many folks around the movement! If you’re interested in contributing to Codex or providing feedback, check out our team page and get in touch with us.

Technical Enablement

Vue.js is now shipped as part of MediaWiki core; however, implementing Codex across Wikimedia is dependent upon a number of technical workstreams that the Design Systems team is shepherding. You can read about the current status of our technical infrastructure work.
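As a rough sketch of what building on this stack can look like, the snippet below mounts a Codex button component with Vue 3. It assumes a build setup where the @wikimedia/codex npm package and a template-compiling Vue build are available; inside MediaWiki, the equivalent modules are delivered through ResourceLoader instead:

    // Mount a tiny Vue 3 app that renders a Codex button.
    import { createApp } from 'vue';
    import { CdxButton } from '@wikimedia/codex';

    createApp( {
        components: { CdxButton },
        template: '<cdx-button>Save changes</cdx-button>'
    } ).mount( '#app' );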

The Design Systems Team

In collaboration with Wikimedia Foundation teams, partners and volunteers, the Design Systems Team develops an overarching strategy for front-end design and engineering across Wikimedia. The Design Systems Team manages Codex and stewards its growth and adoption, as well as implementing infrastructure that supports its use in Wikimedia software. 

As of December 2022, the Design Systems Team at WMF includes ten humans across five different functions.

How can I stay up to date and get involved?

“High-performing teams establish platforms (such as github.com repositories and other content publishing tools) that enable an array of individuals to continuously evolve a system’s definition.”[2] While the Design Systems Team is shepherding this work and is ultimately accountable for its successes and failures, we collaborate with and welcome contributions from people across the Wikimedia movement – this structure is known as a federated model. 

We welcome contributions from everyone! There are several ways to contribute, or simply follow our work:

  • Subscribe to and comment on tasks in our project management system, Phabricator
  • Contribute to the design process (such as designing new components or icons)
  • Suggest new components and design tokens
  • Write and submit code
  • Review code
  • Update and expand library documentation

Contributions to Codex are covered by the Code of Conduct for Wikimedia technical spaces.

We’re excited about breaking down technical barriers to accessing free knowledge, and hope that you are, too! 

Wiki Education hosted webinars all of October to celebrate Wikidata’s 10th birthday. Below is a summary of our fourth event. Watch the webinar in full on YouTube, and access the recordings and recaps of the other three events here.

For our fourth and final webinar celebrating Wikidata’s birthday, Hilary Thorsen, Julian Chambliss, Kate Topham, and Justin Wigard each shared how they invite newcomers into the linked open data fold. What does Wikidata allow that other platforms don’t? What advice do they have for getting people started? And what do we mean when we say we’re building a “community of practice”?

From upper left: Kate Topham, Will Kent, Julian Chambliss, Justin Wigard, and Hilary Thorsen in our webinar.

Hilary got her start with Wikidata as Wikimedian-in-Residence for the Linked Data for Production Project. While there, she helped library colleagues advance their own projects and had fun answering their linked data questions. She decided to capture that expertise and disseminate it even more widely through the LD4 Wikidata affinity group and has been doing so since April 2019.

Justin is a Postdoctoral Research Fellow in the Distant Viewing Lab at the University of Richmond, where he works and teaches courses on comics and popular culture. For Justin, Wikidata provides fruitful ways of thinking about community engagement and facilitating open data work in the humanities classroom. He’s thinking about ways we can connect the dots between the classroom, the academy, and Wikidata’s global community of users.

Kate is a Digital Archivist at Michigan State University (MSU). She specializes in metadata, data migration, and digital collections. Kate got into Wikidata as a form of data cleaning through her work in Open Refine, which she utilizes so often she refers to it as her “software spouse.” She’s interested in using Wikidata for research and making things that are hidden more visible to everyday people.

Julian is a professor of English and the Val Berryman Curator of History at the MSU Museum at Michigan State University. He leads the Department of English Graphic Possibilities Research Workshop, a group that brings comic studies faculty and graduate students together to contribute to Wikidata. Justin and Kate have contributed extensively to Graphic Possibilities–Justin as a recent PhD graduate of MSU and Kate in her archivist role. Together, the group is creating a data set from MSU’s library of comic art metadata collection to share with the world.

What Wikidata allows that other platforms don’t

In Wikidata, you can describe collection items in more depth and nuance, disrupting library authority and traditional modes of collaboration. The possibilities are endless when you can crowd-source corrections to your data and share results with an audience that spans the globe.

Julian, Justin, and Kate often consult Wikipedia to fill in missing data in their catalogs, which is how they found their way to Wikidata. They appreciated the abundance of information already in the repository, but also saw the gaps. Filling them was a worthwhile pursuit, not only for the project but for the many researchers that would come after them. “We began to think of Wikidata as a means of providing that information about comics that would really enhance peoples’ ability to write about them,” Julian said. Wikidata allows you to provide detail and nuance to an item in an unparalleled way. That freedom was an attractive feature for Hilary, too. “With Wikidata, the sky’s the limit,” she added. “And you can find anything that interests you and add it if it’s notable enough. I found that exciting.”

Julian is curious about how we can make nuances around culture more visible in a data record. Wikidata is useful in surfacing the omissions in a record, especially related to race and gender. “We can’t change the library record [to be more inclusive],” Julian noted. “But we can do something in Wikidata that has a substantive impact in peoples’ ability to understand what the record is showing, or what it doesn’t show. Questions of race and metadata are linked in a way that’s a challenge, but it’s something we have to wrestle with.”

Other Wikidatans can help. As the Graphic Possibilities team were combing their collection, they discovered some errors in their bibliographic data. “All our Marmaduke comics were attributed to the wrong person. And that same problem existed in a lot of places,” Kate said. “By bringing together this community in Wikidata we could figure out where the errors were and that community of knowledge and practice allowed for us to improve.”

“I love the way Wikidata disrupts library authority,” Kate continued. “We can incorporate different expertise and ways of seeing the world. The way we structure things is better because we can draw on so many different communities.”

Wikidata is a tool for examining topics in new, multidisciplinary ways. Justin invites his humanities students to create visualizations about comics in the platform, where they see the instant ramifications of their work. “They think about how their work extends beyond the classroom, beyond the gated silo of academia. And for me, I can connect with colleagues I didn’t know before. It’s not just linked data, it’s linked people.”

The value of the Wikidata community has been a through-thread across our Speakers Series. “Before, cataloging had been internal and focused only on what I was working on at my institution,” Hilary shared. “But with Wikidata it becomes so much easier to collaborate with people around the world and contribute to other projects and learn something new. It broadens the way you can contribute and it’s a more accessible practice too. You can start participating in linked data immediately, which before was really hard to do. Overall, the community is what drew me to Wikidata and what makes all the contributions so worthwhile and keeps me coming back.”

Wikidata as a “community of practice”

Wikidata provides a forum for anyone to participate in discussions around data integrity. With archives of past discussions, decisions are transparent and up for friendly debate. And Wikidatans share a deep interest in adapting until we get it right. As the Graphic Possibilities team said more than once, it’s a community of practice.

Given that the platform can be a little overwhelming at first, it’s important to give newbies different modes of entry and participation. “That’s more sustainable for the long run,” Hilary said of her work with LD4. “People don’t always have time to join every call or working hour, but because we have consistent programming, people know that if they miss a week they can join the next week.” Justin, who helps lead Wikidata edit-a-thons with Graphic Possibilities, noted that the platform was great for both synchronous and asynchronous work as the pandemic forced them to transition to remote work. “We had to try to find ways to reach folks who were not fluent in comics or Wikidata or might not be digital experts, but still wanted to be part of a community.”

When asked what was most helpful in building community around the Graphic Possibilities project, Kate thought of two things. “Hilary Thorsen and Will Kent,” Kate said with a smile. “There’s so much within Wikidata and Wikipedia that we joke about satisfies the need for nerds to correct each other. And I feel like both of you have provided a model for this very generous, opening space that makes working with linked data, and Wikidata in particular, a lot easier. This whole thing is a big conversation and we get to decide what the best way forward is.”

Advice for bringing others into linked data

The Graphic Possibilities team has successfully invited comics-interested scholars from across institutions to join them in edit-a-thons and build their own capacity around linked open data. Having scaffolded events with clear, narrowly defined goals is helpful in fostering this community of learning. “It’s easy to get lost in the weeds, so we set firm boundaries about what to work on, what to avoid, and we have really clear tutorials and troubleshoot issues,” Justin shared. “Wikidata can be overwhelming if you’re not prepared for it. Having that support is helpful. And recognizing that smaller goals can be just as effective as something lofty. We actually started to scale our projects back so we can achieve more with less.”

Preparation is also key. Keep events focused and small, but have a back-up plan for what to work on in case you finish early. And be prepared to let people pursue their interests. “Allowing for creativity within your scoped event can be powerful and fun,” Kate added.

The future of Wikidata

Wikidata has grown so much over the last 10 years – it just hit 100 million items this year. We only see it becoming more important to library curricula, job training, and the World Wide Web as a whole.

“It’s a necessary skill,” Hilary said. “Five years from now, you’ll want to have that on your resume.”

“Wikidata and other open source repositories are going to become increasingly necessary and relevant as other avenues of data become more monitored, privatized, siloed,” said Justin. “There’s something really powerful and amazing about Wikidata and the fact that it’s grown so much over 10 years. … I want to see more of that, more projects, in more classrooms. I want to see what other people do with it that I haven’t thought about.”

“Understanding data becomes a fundamental question of civil society,” Julian added. “I’m no Wikidata expert, but I do recognize the tremendous potential in Wikidata to support really interesting conversations. How does a data description actually translate to how society operates? How do we tell stories with data? Students at some level have been born consumers of technology but explaining how it works is a real problem for them. Data especially is particularly complicated for them. I’ve said, you know, these platforms aren’t actually free. The thing they’re selling is you. If you don’t have a sense of data literacy, you’re going to be in trouble. If you get a little sense of it, you begin to understand that data is intrinsically connected to your life.”

Check out LD4 here and Graphic Possibilities here

If you’re the kind of learner who seeks community and guidance on your journey, the Wikidata Institute has three upcoming training courses starting in January, March, and May 2023.

How rich and famous people influence Wikipedia

09:33, Thursday, 22 2022 December UTC

There were two prominent stories this week about how rich and famous people tried to influence Wikipedia's coverage, and depending on your point of view, got their way. I think the coverage of both stories missed the mark so I'd like to dive into them a bit deeper.

But first, Canada is currently discussing enacting a new gun control law, known as Bill C-21. A prominent ice hockey player, Montreal Canadiens goalie Carey Price, spoke out in opposition to the bill, aligning himself with the Canadian Coalition for Firearm Rights. At the same time, the CCFR was under fire for creating an online coupon code, "POLY", which people assumed referred to the 1989 École Polytechnique massacre (the group denies this).

If you had wanted to look up the Canadian Coalition for Firearm Rights on Wikipedia prior to December 7, you wouldn't have found anything. You probably wouldn't have learned that in 2019 they asked members to file complaints against a doctor who called for a ban on assault rifles, or that their CEO shot his first firearm in...the United States.

I'm not very in tune with Canadian politics, so it's unclear to me how prominent this group actually is (it doesn't seem to be on the level of the NRA in the US). But Price put them on the map, and now there's a Wikipedia article that will educate people on its history. (It's even been approved to go on the Main Page, just pending scheduling.) 1 point for rich and famous people influencing Wikipedia's coverage for the better.

OK, so now onto author Emily St. John Mandel, who is divorced and wanted Wikipedia to not falsely say she was married. She posted on Twitter, "Friends, did you know that if you have a Wikipedia page and you get a divorce, the only way to update your Wikipedia is to say you’re divorced in an interview?"

She then did an interview in Slate, where she was specifically asked and answered that she was divorced.

The thing is, that probably wasn't necessary. Yes, Wikipedia strongly prefers independent, reliable sources, as the "Wikipedia:Reliable sources" policy page explains in great detail. But in certain cases, using the person themselves as a source is fine. In the section "Self-published and questionable sources as sources on themselves", the policy lists five criteria that should be met:

  1. The material is neither unduly self-serving nor an exceptional claim.
  2. It does not involve claims about third parties (such as people, organizations, or other entities).
  3. It does not involve claims about events not directly related to the subject.
  4. There is no reasonable doubt as to its authenticity.
  5. The Wikipedia article is not based primarily on such sources.

On top of this, Wikipedia has a strict policy regarding biographies of living persons (BLP), that would lend more weight to using the self-published source.

If Mandel had just tweeted, "I'm divorced now.", that would've been fine. In fact, the first person to update her article with a citation about her divorce used her tweet, not the Slate interview! In the past I've also used people's tweets to remove incorrect information from Wikipedia.

(That said, people do lie about their age, height, etc. So far the worst case I've ever run into was Taio Cruz, who reached the level of sending in a fake birth certificate. You can read the talk page, it's a giant mess.)

And then there's Elon Musk (sigh), who tweeted about how Wikipedia is biased, right after an "Articles for deletion" discussion was started on the Twitter Files article.

Vice covered it with: "We Are Watching Elon Musk and His Fans Create a Conspiracy Theory About Wikipedia in Real Time". It goes into good detail about the Wikipedia deletion process, but I don't fully agree with the conclusion that this is how the process is supposed to work, and how it usually works.

I cast a vote in the discussion, stating it was easily notable and an obvious keep. By the time it was closed, the tally was 73 keep votes, 27 delete votes, and 23 merge votes. Wikipedians will tell you that these discussions are not a vote, rather the conclusion is based on the strength of the arguments. But in this case, I want to focus on the direction of the discussion rather than the final result.

At the time Musk tweeted (Dec 6, 18:46 UTC), the vote count was 12 delete votes, 4 keep votes, 4 merge votes (I should say that I'm relying on Enterprisey's vote-history analysis for these numbers). The votes post-tweet were 69 keep, 15 delete, 19 merge. That's a pretty big shift!

I would like to think that Wikipedians would have reached the same (and IMO correct) conclusion regarding the existence of the Twitter Files article without Musk's "intervention", but it's hard to say that for sure.

But, as I've hopefully demonstrated, Musk is not alone in trying to influence Wikipedia. Rich and famous people do it all the time, for entirely different goals, and sometimes without even realizing it!

Seven Years – or thereabouts

21:53, Wednesday, 21 2022 December UTC

In January I will have been at the Wikimedia Foundation for 7 years (see my second week, two months, one year, and five year reflections). My role has changed a lot over those seven years, as has the organization and the wider Wikimedia movement. At the end of this year, I wanted to take a second to write down what I do and why. This is a pseudo-introduction post for social media (where I don’t have much of a presence) and a chance to pen a “What I do for a living” blog post.

One of the big things I do is help with movement-facing communications for the Wikimedia Foundation, the non-profit that supports Wikipedia and other free-knowledge projects. My job is to get teams at the Foundation to talk to the volunteers and share what they’re doing: lots of behind-the-scenes feedback and input on how to find folks and where to talk to them – before we go and talk to them!

That’s always ongoing, never-ending work that most of the time works well. Teams write their thinking down and understand what the community values (no surprises!). Folks know about the work we’re doing and can get involved. We understand their needs and concerns and address them. I try to be the voice of the community – as best any one person can – in internal conversations, so that we’re as understanding and aligned as a 700+ person org, the majority of whom are not contributors and are new to this community, can be.

The other big thing I do is help run a community news and event blog called Diff (https://diff.wikimedia.org). This is also ongoing, never-ending work that most of the time works well. It’s the more fun, direct work I do in support of the first big thing I mentioned.

The name is super dorky. It’s named after the “differential” view between two edits on a wiki and the difference volunteers make in their work. I get to help share what people are working on from around the world in the pursuit of free knowledge. I’m like the hype man for the Wikimedia movement. Ok, maybe just a hype man, but I love my job and feel very lucky that I get to do this for a living. 

In 2022, Diff saw 188,427 visitors making up 386,331 views. We published 640 posts in dozens of languages from close to 300 authors. We have over 720 email subscribers. On the scale of Wikipedia that’s small potatoes (English Wikipedia saw 96 billion views in 2022), but on the scale of the movement of volunteers – editors, organizers, affiliates, staff, etc. – I’m happy with what we’ve done. For comparison, we say we have about 300k contributors across all projects and languages, so to reach 188k “visitors” of that group, and a little beyond, is pretty good in my book.

Diff is very open. You can login with your Wikimedia account and submit a draft. I keep the site running on the software/feature side of things, documentation, and helping review the drafts that come in and answer questions from authors. That last bit takes up a lot of time. I really appreciate everyone who takes the time to write a post and I am here for giving folks the platform and support to share their work. 

Here’s a short list of some of my favorite posts this year. 

Araisyohei, a volunteer from Japan, takes us on a behind-the-scenes tour of the OYA Soichi Library, a small magazine library in Tokyo. There are some great photos of their event and an even more amazing video tour of this tiny library embedded in the post.

JA: https://diff.wikimedia.org/ja/2022/06/13/日本随一の雑誌専門図書館でエディッタソンを有/

EN: https://diff.wikimedia.org/2022/06/22/editathon-at-oya-soichi-library-japanese-magazine-library/

Every year Jimmy Wales celebrates Wikimedians for their efforts. Expanded in recent years, the “Wikimedian of the Year” awards are always a highlight. These folks are doing such unique and important work in their free time.

EN (numerous languages selectable in the drop-down): https://diff.wikimedia.org/2022/08/14/celebrating-the-2022-wikimedians-of-the-year/

One of the folks who won an award this year was Annie Rauwerda, of @depthsofwikipedia fame. I was fortunate enough to interview Annie in late 2021, right as she was blowing up. #humblebrag

https://diff.wikimedia.org/2021/12/07/from-the-depths-of-wikipedia-an-interview-with-wikimedian-and-influencer-annie-rauwerda/

Her work now has its own Wikipedia article in eight languages!

https://en.wikipedia.org/wiki/Depths_of_Wikipedia

Wikimedians host photo contests throughout the year on various themes: Wiki Loves Folklore, Wiki Loves Monuments, Wiki Loves Africa, Picture of the Year, and more. These contests capture the diversity of life on the planet and the amazing talents volunteers have – and share freely. I’m in awe and humbled every time we publish a recap of a contest. Here’s a recent one from the Wiki Loves Africa 2022 contest.

https://diff.wikimedia.org/2022/12/06/intimate-glimpses-of-home-expressed-in-wiki-loves-africas-photo-competition-on-wikipedia/

Another from April and the Wiki Loves Monuments 2021 contest. 

https://diff.wikimedia.org/2022/04/20/take-a-journey-around-the-world-with-the-wiki-loves-monuments-winners-2021/

One of the cornerstones – maybe _the_ cornerstone – of what makes Wikipedia work is citations to reliable sources. Access to these sources can be challenging. Many are behind paywalls or in journals that are hard to access. The Foundation helps by building a service called the Wikipedia Library, where volunteers can get free access to these sources to help create and improve articles.

https://diff.wikimedia.org/2022/01/19/the-wikipedia-library-accessing-free-reliable-sources-is-now-easier-than-ever/

I work with a lot of smart folks who are trying to figure out how to create, sustain, and grow healthy and independent communities. One way we do that is by developing programs, training, and resources for communities to succeed. In this three-part(!) series, Alex Stinson explores how organizing helps the movement grow in relation to our 2030 movement strategy. 

https://diff.wikimedia.org/2022/04/05/part-i-anyone-can-edit-is-not-a-strategy-for-growing-the-wikimedia-movement/

Editing an encyclopedia seems like a boring, harmless endeavor. Until you realize that there are people who don’t want this to happen – people who don’t like facts – and laws that could greatly hinder how volunteers can contribute and what we’re able to host. Our legal department and our global advocacy team are some of the most caring, invested folks I know, making sure people can express facts – and themselves – in areas of the world where that is dangerous.

https://diff.wikimedia.org/2022/04/20/how-smart-is-the-smart-copyright-act/

https://diff.wikimedia.org/2022/07/12/what-does-the-wikimedia-foundations-human-rights-impact-assessment-mean-for-the-wikimedia-movement/

We also love to republish articles from elsewhere on the web. Wikimedian and deep learning enthusiast Colin Morris shared his work in trying to discover the _least_ viewed article on Wikipedia. 

https://diff.wikimedia.org/2022/06/06/in-search-of-the-least-viewed-article-on-wikipedia/

Last, but not least, our product teams take building software for everyone very seriously. We have a new desktop interface (and, I think, secretly a new mobile interface too) coming in January. In this post the product manager, Olga – a good friend and foxhole comrade – talks about how the web team approaches their work with equity in mind. The sort of thoughtful product development we need to see more of in the world.

EN (and seven languages): 

https://diff.wikimedia.org/2022/08/18/prioritizing-equity-within-wikipedias-new-desktop/


The Wikimedia movement is messy. People can be jerks, and the barrier to entry is far too high for my liking. I show up every day trying to increase awareness of, and participation in, what folks are doing; to gather people together and connect interests and ideas. It’s funny to be working in the blog mines in 2022 – not just working, but thriving – when so many folks consider blogs antiquated. I think they have a place, and more folks should turn to them to share what they are doing and learn from others. I don’t know where I go from here professionally. It’s something I’ve been talking about with folks, but whatever is next, I hope it’s more of this: positivity, working together to tell the story of our movement, and supporting one another through difficult times.

We finally wrapped up the third edition of the Wikimedia Accelerator UNLOCK! And this year, it was all about collaboration and cross-regional exchange. Together with our partners Wikimedia Serbia and Impact Hub Belgrade, we were able to set up a strong, value-driven partnership. And together, we selected and supported a diverse group of projects and teams that were as committed to knowledge equity as we are.


This UNLOCK Insights Report 2022 provides a deep dive into our co-created program setup as well as the experiences of the project teams throughout the program.

Insights into our co-created program and support setup

UNLOCK Insights Report 2022 – Overview of the program and support structure https://commons.wikimedia.org/wiki/File:UNLOCK_Insights_Report_2022_Program_and_support_structure.png

Building upon recommendation 9 – Innovate in free knowledge – of the Movement Strategy, Wikimedia Deutschland (WMDE) initiated the UNLOCK program with the ultimate goal of promoting innovative ideas and projects that break down social and technical barriers – projects that achieve knowledge equity. The third edition of the UNLOCK Accelerator was co-designed and co-hosted by WMDE, Wikimedia Serbia (WMRS), and Impact Hub Belgrade. One of our core motivations was to explore how best to create a learning environment that invites all participants – coming as they do from different regions – to collaborate and strengthen their innovative capacities. 

Our highlight was creating a cohort learning experience for the participants. Through our jointly created working principles we were able to establish a safe space where open exchange was possible. Complemented by rituals and methods (have a look at our UNLOCK toolbox), participants could share their successes and challenges in the project and product development process as well as their experiences in working together as a team. 

With Wikimedia Serbia and Impact Hub Belgrade at our side, we were also able to expand our international networks and pull from a larger, more diverse pool of experts who lent their knowledge and skill sets to our program participants.

UNLOCK Insights Report 2022 – Cohort learning sessions within the UNLOCK Accelerator program 2022 https://commons.wikimedia.org/wiki/File:UNLOCK_Insights_Report_2022_Cohort_learning.png

We also gathered further lessons and tips for successfully navigating the challenges of an international, interdisciplinary collaboration, based on our experiences with WMRS and Impact Hub Belgrade – have a closer look here.

Impact unpacked: some highlights

UNLOCK Insights Report 2022 – Evaluation of the growth potential of participants after the UNLOCK program. https://commons.wikimedia.org/wiki/File:UNLOCK_Insights_Report_2022_Cohort_growth.png
  • In just a few months, teams were able to accelerate their ideas: 71.4% of participants stated that the UNLOCK Accelerator helped them advance themselves and the development of their project. 
  • Most valued program elements included:
    • Cohort learning with peer-to-peer exchange and expert input sessions;
    • Group mentorship sessions; and
    • Individual mentorship sessions
  • What do project teams wish they had more of? 1:1 and individualized sessions with mentors / coaches
  • TOP 3 challenges ahead:
    • Professionalizing skills and making the project more than ‘just a passion project’;
    • (Financial) sustainability and exploring different models for open source projects; as well as
    • Stakeholder engagement and elaborating partnership options.

Dive into the report!

Check out the UNLOCK Insights Report 2022 here. This report is based on regular debriefs and retrospectives with our partners, 1:1 feedback sessions and anonymous surveys with program participants, and our own evaluation sessions following each milestone of the program. Enjoy the read!

Reach out to me if you have further questions regarding the report. Beyond that, we are more than happy to engage in further conversations and are eager to hear additional perspectives, insights, and feedback from across the movement related to recommendation 9 – Innovate in free knowledge. 

With 1.2 million respondents across 50 countries, the Peoples’ Climate Vote (2020/2021) is the largest survey of public opinion on climate change ever conducted. People were asked about their belief in the climate emergency and which policies, across six areas – energy, economy, transportation, farms and food, protecting people, and nature – they would like their government to enact.

The first question was “Do you think climate change is a global emergency?” (Yes/No).

And Italy is in first position, tied with the UK: 81% of respondents from Italy think climate change is a global emergency.

The second question was “If yes, what should the world do about it?” (a. Do everything necessary, urgently / b. Act slowly while we learn more about what to do / c. The world is already doing enough / d. Do nothing)

And Italy is first here, too: 78% of respondents from Italy who answered Yes to the first question say the world should do everything necessary, urgently.

I did not at all have the perception that awareness of climate change as a global emergency was so high in Italy – higher than in any other country – but let me appreciate it anyway: now it is time to move from belief and awareness to action.

The survey asked people which of 18 climate policies they would like their country to pursue to address climate change. Overall, the most popular among participating countries were

– Conserve forests and land (54%)
– Use solar, wind and renewable power (53%)
– Climate friendly farming techniques (52%)
– Invest more money in green business and jobs (50%)

The Wikipedia Library has a new partner that will greatly support editors working on wiki projects about Ottoman history: Wikilala.

Wikilala is a Turkey-based digital repository and search engine with a very ambitious purpose: collecting and digitizing all Turkish documents printed between 1729, when the first printed book was produced in Turkey, and 1928, when the Ottoman Turkish alphabet based on Arabic script was replaced with the Latin-based modern Turkish alphabet. Additionally, dozens of papers from the early Republican period were uploaded as well, including Ulus, Akşam, Milliyet, and many others. The initiative has resulted in a repository of tens of thousands of printed documents, including more than one million pages of newspapers and journals along with thousands of books concerning the history, culture, sociology, and geography of the Ottoman Empire and early modern Turkey – and all of it is now freely accessible to Wikipedia Library users.

But that’s not all. The most important feature of Wikilala is its search mechanism, which works with both the Arabic and Latin alphabets, allowing users to search the repository and find relevant information even if they cannot read Arabic script. This feature is expected to be a great support for Wikipedians as well.
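
For readers curious how a search engine can match Latin-script queries against Arabic-script text, here is a toy sketch of the general idea (entirely my own illustration, not Wikilala's actual implementation; the tiny character map and the function names are invented for demonstration). Both the query and the stored text are normalized to a shared Latin transliteration before matching.

    # Toy cross-script search: normalize Arabic-script characters to a
    # Latin transliteration so one index serves both alphabets.
    # This map is a tiny illustrative subset, not a real Ottoman
    # Turkish transliteration table.
    ARABIC_TO_LATIN = {
        "ب": "b", "ت": "t", "ج": "c", "د": "d", "ر": "r",
        "س": "s", "ش": "ş", "ك": "k", "ل": "l", "م": "m",
        "ن": "n", "و": "v", "ي": "i", "ا": "a",
    }

    def normalize(text: str) -> str:
        """Transliterate Arabic-script characters; lowercase the rest."""
        return "".join(ARABIC_TO_LATIN.get(ch, ch.lower()) for ch in text)

    def search(query: str, documents: list[str]) -> list[str]:
        """Return documents whose normalized text contains the query."""
        q = normalize(query)
        return [doc for doc in documents if q in normalize(doc)]

    # A Latin-script query matches an Arabic-script document:
    # normalize("مكتب") == "mktb", which contains "ktb".
    print(search("ktb", ["مكتب", "matbaa"]))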

This partnership was made possible when Wikilala founder Sadi Özgür was contacted by members of Wikimedia Community User Group Turkey. Sadi Özgür told our colleagues that Wikilala provides a detailed account of social and cultural life in the Ottoman Empire, along with the political events and literary works of the period, and that they are very excited about the possibility of seeing Wikimedia projects improved using the material provided at Wikilala.

Please find Wikilala's partner page at the Wikipedia Library here: Partner page 

Episode 128: BTB Digest 20

22:29, Tuesday, 20 2022 December UTC

🕑 22 minutes

It's another BTB Digest! Hear highlights from five recent episodes. Lawrence McCray and Dave Anderson discuss some tradeoffs in data structuring, Cindy Cicalese shares developments in authentication, Marc Laporte promotes the use of structured data (in Tiki), Jacqueline Wong describes the difficulties in running a video game wiki, William Beutler ponders whether people should donate to the Wikimedia Foundation, and more!