{"id":121,"date":"2012-04-25T20:32:00","date_gmt":"2012-04-25T18:32:00","guid":{"rendered":"https:\/\/blogs.gentoo.org\/mgorny\/?p=121"},"modified":"2013-07-23T11:06:58","modified_gmt":"2013-07-23T09:06:58","slug":"five-commandments-for-xml-format-designers","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/mgorny\/2012\/04\/25\/five-commandments-for-xml-format-designers\/","title":{"rendered":"Five commandments for XML\u00a0format designers"},"content":{"rendered":"<p>If you&#8217;re designing an\u00a0XML-based data format, then I beg you, please read the\u00a0following few rules and\u00a0obey them. XML may look easy, and\u00a0even <em>is<\/em> easy, but that doesn&#8217;t mean that writing <em>a\u00a0good<\/em> one is. And\u00a0if you&#8217;re going to invent a\u00a0second HTML, then please, just use JSON or\u00a0any other random container. That will be easier for\u00a0you, and\u00a0easier for\u00a0us.<\/p>\n<p><!--more--><\/p>\n<h2>1.\u00a0Thou shalt always write a\u00a0schema<\/h2>\n<p>Every XML format should be\u00a0well described. And\u00a0no, your ten-stanza poem is\u00a0not\u00a0enough. A\u00a0complete, dedicated Wiki isn&#8217;t either. These usually describe nicely (or\u00a0less nicely) how to <em>write<\/em> your XML. That could be great if\u00a0that&#8217;s all you&#8217;re interested in. But if that&#8217;s supposed to\u00a0be a\u00a0public format, there is one more important thing\u2026<\/p>\n<p>It&#8217;s called reading. Or\u00a0parsing. Or\u00a0just transforming. If\u00a0you need to\u00a0handle random XML files, coming from various sources, written by\u00a0random people, you have to\u00a0know what you can expect and\u00a0what you can assume. It&#8217;s not\u00a0enough to\u00a0say what <code>&lt;x\/&gt;<\/code> does \u2014 I need to\u00a0know where it can appear and\u00a0what I can find inside.<\/p>\n<p>There are already well-deployed XML description formats such as\u00a0DTD, RELAX NG or\u00a0XML Schema. Please use one of\u00a0them; I will be grateful. 
Not only do they describe the\u00a0format strictly and\u00a0accurately, but they also provide a\u00a0very simple means to\u00a0validate XML files. It&#8217;s helpful both to\u00a0us, who parse it, and\u00a0to\u00a0people who actually write such XML.<\/p>\n<p>An\u00a0XML without a\u00a0spec is an\u00a0XML where every element can appear anywhere in\u00a0the\u00a0document. In\u00a0other words, it&#8217;s\u00a0not even XML but\u00a0an\u00a0ugly <q>tag soup<\/q>.<\/p>\n<h2>2.\u00a0Thy XML shalt be\u00a0structured, not\u00a0flat<\/h2>\n<p>XML provides means to\u00a0create neat, hierarchical structures. <em>Use them.<\/em> If\u00a0your document consists of\u00a0logical parts like sections or\u00a0chapters, put their complete content in\u00a0a\u00a0single <code>&lt;section\/&gt;<\/code> or <code>&lt;chapter\/&gt;<\/code>, or\u00a0any other thing that may come into your head. That&#8217;s the\u00a0correct way of\u00a0doing that in\u00a0XML.<\/p>\n<p>Random headings and\u00a0separators are not\u00a0enough. Even if\u00a0your spec says they always and\u00a0definitely start a\u00a0new section, that&#8217;s <em>not\u00a0enough<\/em>. If\u00a0you don&#8217;t believe us, try splitting that thing into parts yourself. Especially when you have sub-headings, sub-sub-headings and\u00a0so on.<\/p>\n<p>A\u00a0flat-structured XML is no\u00a0real XML. It&#8217;s just a\u00a0text file with a\u00a0few unnecessary elements.<\/p>\n<h2>3.\u00a0Thou shalt split text into blocks using XML, not\u00a0text delimiters<\/h2>\n<p>Even if\u00a0you think that&#8217;ll make writing much easier, <em>do\u00a0not<\/em> ever try to\u00a0use simple character delimiters to\u00a0split text into blocks. If\u00a0you need a\u00a0list, create a\u00a0list of\u00a0XML elements. Like the\u00a0following:<\/p>\n<pre><code>&lt;l&gt;elem1&lt;\/l&gt;\r\n&lt;l&gt;elem2&lt;\/l&gt;\r\n&lt;l&gt;elem3&lt;\/l&gt;<\/code><\/pre>\n<p>And\u00a0yes, I know <code>elem1,elem2,elem3<\/code> is\u00a0shorter and\u00a0easier to\u00a0type. 
But guess what \u2014 it&#8217;s hell to\u00a0parse. It isn&#8217;t even XML \u2014 you either have to handle it externally or\u00a0create a\u00a0complex recursive template which will split it and\u00a0handle each token separately. That&#8217;s very bad.<\/p>\n<p>An\u00a0XML which uses random delimiters to\u00a0create lists is no\u00a0XML. It&#8217;s\u00a0called <abbr title=\"Comma-separated values\">CSV<\/abbr>.<\/p>\n<h2>4.\u00a0Thou shalt not\u00a0allow insane structures<\/h2>\n<p>Even if\u00a0you think no\u00a0one will create an\u00a0insane structure in\u00a0your document, it&#8217;s not\u00a0enough. Saying it&#8217;s disallowed on your awesome Wiki is not\u00a0enough either. <em>Forbid it<\/em> if\u00a0it&#8217;s supposed to be\u00a0forbidden.<\/p>\n<p>Otherwise, someone will eventually use it. He or\u00a0she will deliberately ignore your warning because <q>it works<\/q>. And\u00a0even if\u00a0they don&#8217;t, we will have to\u00a0support it anyway in\u00a0a\u00a0compliant parser.<\/p>\n<p>If\u00a0you expect your data to\u00a0be\u00a0interchangeable with widely used formats, take a\u00a0look at\u00a0them. Don&#8217;t allow insane things which none of\u00a0these formats do \u2014 or\u00a0we&#8217;ll have to\u00a0either refuse to\u00a0convert some files, convert them incorrectly or\u00a0waste our time writing complex blocks converting them to\u00a0sane ones.<\/p>\n<p>Simply, <em>don&#8217;t do\u00a0it<\/em>. Even HTML doesn&#8217;t do that\u2026 well, that much.<\/p>\n<h2>5.\u00a0Thou shalt write readable XML, not\u00a0bytecode<\/h2>\n<p>The\u00a0major point of\u00a0using XML is\u00a0that the\u00a0data is readable to\u00a0both machines and\u00a0humans. Leave it that way. You have the\u00a0whole human language at\u00a0your disposal, so don&#8217;t write zeros, ones and\u00a0other random numbers which are\u00a0explained on your great Wiki.<\/p>\n<p>Say, an\u00a0attribute called <code>type<\/code> should actually name some type. 
Say, <code>article<\/code> can be\u00a0some type. <code>1<\/code> usually ain&#8217;t. And\u00a0if that <q>type<\/q> only describes the\u00a0width of\u00a0the\u00a0indent, then name it so! Calling it a\u00a0<q>type<\/q> is\u00a0as\u00a0useful as\u00a0calling it a\u00a0<q>thing<\/q>. Or\u00a0<code>some-other-thing<\/code> and\u00a0<code>a-third-thing<\/code>.<\/p>\n<p>XML without human-readable text is no\u00a0XML. Hell, even byte-compiled XML should have readable element names! That&#8217;s the\u00a0whole point of\u00a0it. Otherwise, you just end up developing another custom, useless format.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you&#8217;re designing an\u00a0XML-based data format, then I beg you, please read the\u00a0following few rules and\u00a0obey them. XML may look easy, and\u00a0even is easy, but that doesn&#8217;t mean that writing a\u00a0good one is. And\u00a0if you&#8217;re going to invent a\u00a0second HTML, then please, just use JSON or\u00a0any other random container. That will be easier for\u00a0you, and\u00a0easier &hellip; <a href=\"https:\/\/blogs.gentoo.org\/mgorny\/2012\/04\/25\/five-commandments-for-xml-format-designers\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Five commandments for XML\u00a0format 
designers&#8221;<\/span><\/a><\/p>\n","protected":false},"author":137,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[1],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/121"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/users\/137"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/comments?post=121"}],"version-history":[{"count":6,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/121\/revisions"}],"predecessor-version":[{"id":211,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/121\/revisions\/211"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/media?parent=121"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/categories?post=121"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/tags?post=121"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}