{"id":307,"date":"2014-09-05T10:31:15","date_gmt":"2014-09-05T08:31:15","guid":{"rendered":"https:\/\/blogs.gentoo.org\/mgorny\/?p=307"},"modified":"2014-09-05T10:54:45","modified_gmt":"2014-09-05T08:54:45","slug":"bash-pitfalls-globbing-everywhere","status":"publish","type":"post","link":"https:\/\/blogs.gentoo.org\/mgorny\/2014\/09\/05\/bash-pitfalls-globbing-everywhere\/","title":{"rendered":"Bash pitfalls: globbing everywhere!"},"content":{"rendered":"<p>Bash has many subtle pitfalls, some of\u00a0which can go unnoticed for a\u00a0very long time. A\u00a0common example of\u00a0that kind of\u00a0pitfall is ubiquitous filename expansion, or\u00a0globbing. What many script writers fail to notice is that practically anything that looks like a\u00a0pattern and\u00a0is not\u00a0quoted is\u00a0subject to globbing, including unquoted variables.<\/p>\n<p>There are two extra snags that add to this. Firstly, many people forget that not only asterisks (<code>*<\/code>) and\u00a0question marks (<code>?<\/code>) make up patterns; square brackets (<code>[<\/code>) do that as\u00a0well. Secondly, by\u00a0default bash (like any POSIX shell) takes failed expansions literally. That is, if\u00a0your glob does not match any\u00a0file, you may not even know that you are globbing.<\/p>\n<p>It only takes running the\u00a0script in\u00a0the\u00a0wrong directory for the\u00a0result to\u00a0change. Of\u00a0course, that is often unlikely, maybe even close to impossible, and you can work towards preventing it by\u00a0running in\u00a0a\u00a0safe directory. But in\u00a0the\u00a0end, predictability is\u00a0a\u00a0fine quality in software.<\/p>\n<p><!--MORE--><\/p>\n<h2>How to notice mistakes?<\/h2>\n<p>Bash provides two major facilities that can help you catch mistakes: the shopts <em>nullglob<\/em> and\u00a0<em>failglob<\/em>.<\/p>\n<p>The\u00a0nullglob option is\u00a0a\u00a0good default for your script. 
After enabling it, failing filename expansions result in\u00a0no\u00a0parameters rather than the verbatim pattern itself. This has two important implications.<\/p>\n<p>Firstly, it makes iterating over optional files easy:<\/p>\n<pre><code>for f in a\/* b\/* c\/*; do\r\n    some_magic \"${f}\"\r\ndone<\/code><\/pre>\n<p>Without nullglob, the\u00a0loop above may actually receive the literal <code>a\/*<\/code> if no\u00a0file matches the\u00a0pattern. For this reason, you would need to add an\u00a0additional check for the existence of\u00a0the file inside the\u00a0loop. With nullglob, it will just \u2018omit\u2019 the\u00a0unmatched arguments. In\u00a0fact, if none of\u00a0the\u00a0patterns match, the\u00a0loop won&#8217;t run even once.<\/p>\n<p>Secondly, it turns every accidental glob into nothing. While this isn&#8217;t the\u00a0most friendly warning and\u00a0may in\u00a0fact have very undesirable results, you&#8217;re more likely to notice that something is going wrong.<\/p>\n<p>The\u00a0failglob option is\u00a0better if you can assume you don&#8217;t need to match files in\u00a0its scope. In\u00a0this case, bash treats every failing filename expansion as an\u00a0error and\u00a0aborts the\u00a0command that contains it with an\u00a0appropriate message.<\/p>\n<p>The\u00a0main advantage of\u00a0failglob is that it makes you aware of\u00a0any mistake before someone hits it the\u00a0hard way. Of course, that assumes the\u00a0pattern doesn&#8217;t accidentally match something already.<\/p>\n<p>There is also\u00a0the\u00a0noglob option. However, I wouldn&#8217;t recommend it since it works around mistakes rather than fixing them, and\u00a0makes the\u00a0code rely on a\u00a0non-standard environment.<\/p>\n<h2>Word splitting without globbing<\/h2>\n<p>One of\u00a0the\u00a0pitfalls I noticed myself lately is the\u00a0attempt to use unquoted variable substitution to do word splitting. 
For example:<\/p>\n<pre><code>for i in ${v}; do\r\n    echo \"${i}\"\r\ndone<\/code><\/pre>\n<p>At\u00a0first glance, everything looks fine. <code>${v}<\/code> contains a\u00a0whitespace-separated list of\u00a0words and\u00a0we iterate over each word. The\u00a0pitfall here is that words in\u00a0<code>${v}<\/code> are subject to filename expansion. For\u00a0example, if\u00a0a\u00a0lone asterisk happened to be\u00a0there (like <code>v='10 * 4'<\/code>), you&#8217;d actually get all files in\u00a0the\u00a0current directory. Unexpected, isn&#8217;t it?<\/p>\n<p>I am aware of\u00a0three solutions that can be used to accomplish word splitting without implicit globbing:<\/p>\n<ol>\n<li>enabling <code>set -o noglob<\/code> (or <code>set -f<\/code>) locally,<\/li>\n<li>setting <code>GLOBIGNORE='*'<\/code> locally,<\/li>\n<li>using the\u00a0Swiss Army knife of\u00a0<code>read<\/code> to perform word splitting.<\/li>\n<\/ol>\n<p>Personally, I dislike the\u00a0first two since they require set-and-restore magic, and\u00a0the\u00a0second also has the\u00a0penalty of\u00a0doing the\u00a0globbing and then discarding the\u00a0result. Therefore, I will expand on using <code>read<\/code>:<\/p>\n<pre><code>read -r -d '' -a words &lt;&lt;&lt;\"${v}\"\r\nfor i in \"${words[@]}\"; do\r\n    echo \"${i}\"\r\ndone<\/code><\/pre>\n<p>While normally <code>read<\/code> is used to read from\u00a0files, we can use the\u00a0<em>here string<\/em> syntax of\u00a0bash to feed the\u00a0variable into it. The\u00a0<code>-r<\/code> option disables backslash escape processing, which is undesired here. <code>-d ''<\/code> sets the\u00a0delimiter to the\u00a0NUL character, so read processes the\u00a0whole input and\u00a0does not\u00a0stop at a\u00a0newline. Because read then reaches the\u00a0end of\u00a0input before seeing a\u00a0delimiter, it returns a\u00a0non-zero status even though the\u00a0array is filled, which matters under <code>set -e<\/code>. 
<code>-a words<\/code> causes it to put the\u00a0split words into the\u00a0array <code>words<\/code>, and\u00a0since we know how to safely iterate over an\u00a0array, the\u00a0underlying issue is solved.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Bash has many subtle pitfalls, some of\u00a0which can go unnoticed for a\u00a0very long time. A\u00a0common example of\u00a0that kind of\u00a0pitfall is ubiquitous filename expansion, or\u00a0globbing. What many script writers fail to notice is that practically anything that looks like a\u00a0pattern and\u00a0is not\u00a0quoted is\u00a0subject to globbing, including unquoted variables. There are two extra snags &hellip; <a href=\"https:\/\/blogs.gentoo.org\/mgorny\/2014\/09\/05\/bash-pitfalls-globbing-everywhere\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Bash pitfalls: globbing everywhere!&#8221;<\/span><\/a><\/p>\n","protected":false},"author":137,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[3],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/307"}],"collection":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/users\/137"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/comments?post=307"}],"version-history":[{"count":7,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/307\/revisions"}],"predecessor-version":[{"id":315,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/posts\/307\/r
evisions\/315"}],"wp:attachment":[{"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/media?parent=307"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/categories?post=307"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gentoo.org\/mgorny\/wp-json\/wp\/v2\/tags?post=307"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}