I have recently needed to work with the MediaWiki API. I wanted to create a trivial script to update the UID/GID assignment table from its text counterpart. Sounds trivial? Well, it was not, as the update-wiki-table script proves.
MediaWiki API really feels like someone took the webpage and replaced HTML templates with JSON, preserving all the silly aspects that do not make any sense. In this short article, I would like to summarize my experience by pointing out what is wrong with it, why and how it could be done much better.
How many requests does it take?
How many API requests does it take to change a Project page? None, because you can’t grant your bot password permissions to do that. Sadly, that’s not a joke but the reality in Gentoo Wiki. Surely it is our fault for not configuring it properly, or maybe upstream’s for making that configuration so hard? Nevertheless, the actual table had to be moved to public space to resolve that.
Now, seriously, how many requests? I would have thought: maybe two. Actually, it’s four. Five, if you want to be nice. That is, in order:
- request login token,
- log in using the login token,
- request CSRF token,
- update the page using CSRF token,
- log out.
Good news: you don’t need to fetch yet another token to log out. You can use the same CSRF token you used to edit the Wiki page!
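The five requests listed above can be sketched roughly as follows. The parameter names follow the MediaWiki API documentation as I understand it; `call` is a hypothetical transport helper (not part of any library) that sends one request and returns the decoded JSON response, so treat this as an outline rather than the actual script.

```python
def edit_page(call, username, password, title, text):
    """Run the five-request sequence described above.

    `call(params, post)` is an assumed helper that performs one API
    request and returns the parsed JSON response as a dict.
    """
    # 1. Request a login token.
    reply = call({"action": "query", "meta": "tokens", "type": "login"},
                 post=False)
    login_token = reply["query"]["tokens"]["logintoken"]

    # 2. Log in using the login token.
    call({"action": "login", "lgname": username, "lgpassword": password,
          "lgtoken": login_token}, post=True)

    # 3. Request a CSRF token.
    reply = call({"action": "query", "meta": "tokens"}, post=False)
    csrf_token = reply["query"]["tokens"]["csrftoken"]

    # 4. Update the page using the CSRF token.
    result = call({"action": "edit", "title": title, "text": text,
                   "token": csrf_token}, post=True)

    # 5. Log out, reusing the same CSRF token.
    call({"action": "logout", "token": csrf_token}, post=True)
    return result
```

Passing the transport in as an argument keeps the sequence itself easy to read and to exercise without touching a real wiki.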
Now, what’s the deal with all those tokens? CSRF attacks are a danger to people using a web browser! How would you issue a CSRF attack against an API client if the client has pretty clearly defined what it’s supposed to do? Really, requesting extra tokens is just busywork that makes the API unpleasant and slow.
So, the bare minimum: remove those useless tokens, and get down to three requests. Ideally, since I only care to perform a single action, the API would let me provide credentials along with it without the login-logout ping-pong. This would get it down to one request.
The Python examples in MediaWiki documentation use the requests module (e.g. API:Edit example). Since I don’t like extraneous external dependencies, I initially rewrote it to use the built-in urllib. That was a mistake, and a badly documented one.
I got as far as the login request. However, it repeatedly claimed that I was implicitly requesting a login token (which is deprecated) and gave me a new one rather than actually logging me in. Except that I did pass the login token!
It turned out that everything hinges on… cookies. Sure, it’s my fault for not reading the API documentation thoroughly. But it is upstream’s fault for making a really silly API that requires deep browser emulation to work, and for providing horribly misleading error messages.
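For the record, urllib can be taught to keep cookies between requests. The following is a minimal sketch, not the code from my script: `make_session` and `api_call` are names made up for illustration, and `api_url` stands for whatever your wiki's `api.php` endpoint is.

```python
import http.cookiejar
import json
import urllib.parse
import urllib.request


def make_session():
    """Build an opener that stores and resends cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def api_call(opener, api_url, params, post=False):
    """Send one API request through the cookie-aware opener."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    if post:
        # Supplying a body makes urllib issue a POST request.
        request = urllib.request.Request(api_url, data=query.encode("utf-8"))
    else:
        request = urllib.request.Request(api_url + "?" + query)
    with opener.open(request) as response:
        return json.load(response)
```

The important part is that every request, starting with the one that fetches the login token, goes through the same opener. Using plain `urllib.request.urlopen()` drops the session cookie, which would explain the "implicit login token" complaint.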
Bot passwords, bot usernames
Now, MediaWiki prohibits you from using your account credentials to log in, unless you jump through ever bigger hoops. Of course, that makes sense: I neither want my password stored somewhere script-accessible, nor want to give the script full admin powers. Instead, I am supposed to obtain a bot password and grant it specific permissions. Feels like a typical case of an API key? Well, it’s not.
Not only do I need to explicitly pass a username with the autogenerated bot password, but I need to pass a special bot username. This is just plain silly. Since bot passwords are autogenerated, it should be trivially possible to enforce their uniqueness and infer the correct bot account from that. There is no technical reason to require a username/password pair for bot login, and it just adds complexity for no benefit.
I am actively using both Bugzilla and GitHub APIs. Both work fine with a simple API token that I keep stored in an unstructured text file. Now, I’m being picky, but why does MediaWiki have to be a special snowflake here?
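To illustrate, the special bot username is, as far as I can tell, the account name and the bot name joined with `@`. A small sketch of building the login parameters, with `bot_login_params` and the example values being made up for illustration:

```python
def bot_login_params(account, bot_name, bot_password, login_token):
    """Build the parameters for a bot-password login request."""
    return {
        "action": "login",
        # The bot username combines the account name and the bot name.
        "lgname": f"{account}@{bot_name}",
        "lgpassword": bot_password,
        "lgtoken": login_token,
        "format": "json",
    }


# Hypothetical values, for illustration only.
params = bot_login_params("Example", "uid-gid-updater", "s3cret", "TOKEN")
```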
Summary, or the ideal API
What should the MediaWiki API look like, if done properly? For a start, it would be freed of all its useless complexity. Instead of a bot username/password pair, just a single unique API key. No login tokens, no CSRF tokens, no cookies! Just issue a login request with your API key, get a session key in return and pass it to other requests. Or, even better, just pass the API key directly to all the requests, so simple one-shot actions such as edits would actually take one request.
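To make the wish concrete, here is what a one-shot edit could look like under such a design. This is purely hypothetical: the key-in-a-header scheme and the `ideal_edit_request` helper are my own invention, and the MediaWiki API as described above does not work this way. The function only builds the request and sends nothing.

```python
import urllib.parse
import urllib.request


def ideal_edit_request(api_url, api_key, title, text):
    """Build a single self-authenticating edit request (hypothetical API)."""
    body = urllib.parse.urlencode({
        "action": "edit",
        "title": title,
        "text": text,
        "format": "json",
    }).encode("utf-8")
    request = urllib.request.Request(api_url, data=body, method="POST")
    # Assumption: the imagined API accepts the key in an Authorization header.
    request.add_header("Authorization", "Bearer " + api_key)
    return request
```

One request, no tokens, no cookie jar, and the key can live in a plain text file just like the Bugzilla and GitHub ones.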