{"id":978,"date":"2019-09-22T08:44:59","date_gmt":"2019-09-22T06:44:59","guid":{"rendered":"https:\/\/blogs.gentoo.org\/mgorny\/?p=978"},"modified":"2019-09-22T09:37:22","modified_gmt":"2019-09-22T07:37:22","slug":"the-gruesome-mediawiki-api","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/mgorny\/2019\/09\/22\/the-gruesome-mediawiki-api\/","title":{"rendered":"The\u00a0gruesome MediaWiki API"},"content":{"rendered":"<p>I&nbsp;recently needed to&nbsp;work with the&nbsp;MediaWiki API.  I&nbsp;wanted to&nbsp;create a&nbsp;trivial script to&nbsp;update the&nbsp;<a rel=\"external\" href=\"https:\/\/wiki.gentoo.org\/wiki\/Project:Quality_Assurance\/UID_GID_Assignment\">UID\/GID assignment table<\/a> from&nbsp;its&nbsp;text counterpart.  Sounds trivial?  Well, it was not, as&nbsp;the&nbsp;<a rel=\"external\" href=\"https:\/\/gitweb.gentoo.org\/data\/api.git\/tree\/bin\/update-wiki-table.py\">update-wiki-table<\/a> script proves.<\/p>\n<p>The&nbsp;MediaWiki API really feels like someone took the&nbsp;webpage and&nbsp;replaced HTML templates with JSON, preserving all the&nbsp;silly aspects that do not&nbsp;make any&nbsp;sense.  In&nbsp;this short article, I would like to&nbsp;summarize my experience by&nbsp;pointing out what is&nbsp;wrong with it, why, and&nbsp;how it could be done much better.<\/p>\n<p><!--more--><\/p>\n<h2>How many requests does it take?<\/h2>\n<p>How many API requests does it take to&nbsp;change a&nbsp;Project page?  None, because you can&#8217;t grant your bot password permissions to&nbsp;do&nbsp;that.  Sadly, that&#8217;s not&nbsp;a&nbsp;joke but&nbsp;the&nbsp;reality on&nbsp;the&nbsp;Gentoo Wiki.  Surely it is our fault for not&nbsp;configuring it properly \u2014 or&nbsp;maybe upstream&#8217;s for&nbsp;making that configuration so hard?  Nevertheless, the&nbsp;actual table had&nbsp;to be&nbsp;moved to&nbsp;public space to&nbsp;resolve that.<\/p>\n<p>Now, seriously, how many requests?  I&nbsp;would have thought: maybe two.  Actually, it&#8217;s four.  
Five, if&nbsp;you want to be&nbsp;nice.  That is, in&nbsp;order:<\/p>\n<ol>\n<li>request a&nbsp;login token,<\/li>\n<li>log in using the&nbsp;login token,<\/li>\n<li>request a&nbsp;CSRF token,<\/li>\n<li>update the page using the&nbsp;CSRF token,<\/li>\n<li>log out.<\/li>\n<\/ol>\n<p>Good news: you don&#8217;t need to&nbsp;fetch yet another token to&nbsp;log out.  You can use the&nbsp;same CSRF token you used to&nbsp;edit the&nbsp;Wiki page!<\/p>\n<p>Now, what&#8217;s the&nbsp;deal with all those tokens?  <abbr title=\"Cross-Site Request Forgery\">CSRF<\/abbr> attacks are a&nbsp;danger to&nbsp;people using a&nbsp;web browser!  How would you issue a&nbsp;CSRF attack against an&nbsp;API client if&nbsp;the&nbsp;client has pretty clearly defined what it&#8217;s supposed to do?  Really, requesting extra tokens is just busywork that makes the&nbsp;API unpleasant and&nbsp;slow.<\/p>\n<p>So, the&nbsp;bare minimum: remove those useless tokens, and&nbsp;get down to&nbsp;three requests.  Ideally, since I&nbsp;only care to&nbsp;perform a&nbsp;single action, the&nbsp;API would let me provide credentials along with it without the&nbsp;login-logout ping-pong.  This would get it down to&nbsp;one request.<\/p>\n<h2>Cookies, sir?<\/h2>\n<p>The&nbsp;Python examples in&nbsp;the&nbsp;MediaWiki documentation use the&nbsp;<a rel=\"external\" href=\"https:\/\/python-requests.org\/\">requests<\/a> module (e.g. the&nbsp;<a href=\"https:\/\/www.mediawiki.org\/wiki\/API:Edit#Sample_code\">API:Edit example<\/a>).  Since I&nbsp;don&#8217;t like extraneous external dependencies, I&nbsp;initially rewrote the&nbsp;example to use the&nbsp;built-in <a rel=\"external\" href=\"https:\/\/docs.python.org\/3\/library\/urllib.html\">urllib<\/a>.  That was a&nbsp;mistake, and&nbsp;a&nbsp;badly documented one.<\/p>\n<p>I&nbsp;got as&nbsp;far as&nbsp;the&nbsp;login request.  
However, the&nbsp;API repeatedly claimed that I&nbsp;was implicitly requesting a&nbsp;login token (which is deprecated) and&nbsp;gave me a&nbsp;new one rather than actually logging me in.  Except that I did pass the&nbsp;login token!<\/p>\n<p>It turned out that everything hinges on\u2026 cookies.  Sure, it&#8217;s my fault for not&nbsp;reading the&nbsp;API documentation thoroughly.  But it is upstream&#8217;s fault for making a&nbsp;really silly API that requires a&nbsp;deep browser emulation to&nbsp;work, and&nbsp;for providing horribly misleading error messages.<\/p>\n<p>Why should an&nbsp;API use cookies in&nbsp;the&nbsp;first place?  You don&#8217;t need to pass data behind my back!  Since I&nbsp;am writing the&nbsp;API client, I&nbsp;am more than happy to&nbsp;pass whatever data needs to be&nbsp;passed explicitly, in&nbsp;API requests.  After all, you require me to&nbsp;pass lots of&nbsp;tokens explicitly anyway, so&nbsp;why not&nbsp;actually make them do something useful?!<\/p>\n<h2>Bot passwords, bot usernames<\/h2>\n<p>Now, MediaWiki prohibits you from using your account credentials to&nbsp;log&nbsp;in, without jumping through ever bigger hoops.  Of&nbsp;course, that makes sense \u2014 I neither want my password stored somewhere script-accessible, nor&nbsp;want to&nbsp;give the&nbsp;script full admin powers.  Instead, I&nbsp;am supposed to&nbsp;obtain a&nbsp;<a rel=\"external\" href=\"https:\/\/wiki.gentoo.org\/wiki\/Special:BotPasswords\">bot password<\/a> and&nbsp;grant it specific permissions.  Feels like a&nbsp;typical case of&nbsp;an&nbsp;API key?  Well, it&#8217;s not.<\/p>\n<p>Not only do I need to&nbsp;explicitly pass a&nbsp;username with the&nbsp;autogenerated bot password, but I need to pass a&nbsp;special bot username.  This is just plain silly.  Since bot passwords are autogenerated, it should be&nbsp;trivially possible to&nbsp;enforce their uniqueness and&nbsp;infer the&nbsp;correct bot account from that.  
There is no&nbsp;technical reason to&nbsp;require a&nbsp;username\/password pair for bot login, and&nbsp;it just adds complexity for no&nbsp;benefit.<\/p>\n<p>I&nbsp;am actively using both Bugzilla and&nbsp;GitHub APIs.  Both work fine with a&nbsp;simple API token that I keep stored in&nbsp;an&nbsp;unstructured text file.  Maybe I&#8217;m being picky, but&nbsp;why does MediaWiki have to be a&nbsp;special snowflake here?<\/p>\n<h2>Summary, or&nbsp;the&nbsp;ideal API<\/h2>\n<p>What should the&nbsp;MediaWiki API look like, if done properly?  For a&nbsp;start, it would be&nbsp;freed of&nbsp;all its&nbsp;useless complexity.  Instead of&nbsp;a&nbsp;bot username\/password pair, just a&nbsp;single unique API key.  No&nbsp;login tokens, no&nbsp;CSRF tokens, no&nbsp;cookies!  Just issue a&nbsp;login request with your API key, get a&nbsp;session key in&nbsp;return and&nbsp;pass it to&nbsp;other requests.  Or&nbsp;even better \u2014 just pass the&nbsp;API key directly to&nbsp;all the&nbsp;requests, so&nbsp;simple one-shot actions such as&nbsp;edits would actually take one request.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&nbsp;recently needed to&nbsp;work with the&nbsp;MediaWiki API. I&nbsp;wanted to&nbsp;create a&nbsp;trivial script to&nbsp;update the&nbsp;UID\/GID assignment table from&nbsp;its&nbsp;text counterpart. Sounds trivial? Well, it was not, as&nbsp;the&nbsp;update-wiki-table script proves. The&nbsp;MediaWiki API really feels like someone took the&nbsp;webpage and&nbsp;replaced HTML templates with JSON, preserving all the&nbsp;silly aspects that do not&nbsp;make any&nbsp;sense. 
In&nbsp;this short article, I would like to&nbsp;summarize my &hellip; <a href=\"https:\/\/blogs.gentoo.org\/mgorny\/2019\/09\/22\/the-gruesome-mediawiki-api\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;The\u00a0gruesome MediaWiki API&#8221;<\/span><\/a><\/p>\n","protected":false},"author":137,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[9],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/978"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/users\/137"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/comments?post=978"}],"version-history":[{"count":18,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/978\/revisions"}],"predecessor-version":[{"id":996,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/978\/revisions\/996"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/media?parent=978"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/categories?post=978"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/tags?post=978"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}