Bash pitfalls: globbing everywhere!

Bash has many subtle pitfalls, some of them being able to live unnoticed for a very long time. A common example of that kind of pitfall is ubiquitous filename expansion, or globbing. What many script writers forget about to notice is that practically anything that looks like a pattern and is not quoted is subject to globbing, including unquoted variables.

There are two extra snags that add up to this. Firstly, many people forget that not only asterisks (*) and question marks (?) make up patterns — square brackets ([) do that as well. Secondly, by default bash (and POSIX shell) take failed expansions literally. That is, if your glob does not match any file, you may not even know that you are globbing.

It’s all just a matter of running in the proper directory for the result to change. Of course, it’s often unlikely — maybe even close to impossible. You can work towards preventing that by running in a safe directory. But in the end, writing predictable software is a fine quality.

How to notice mistakes?

Bash provides a two major facilities that could help you stop mistakes — shopts nullglob and failglob.

The nullglob option is a good choice for a default for your script. After enabling it, failing filename expansions result in no parameters rather than verbatim pattern itself. This has two important implications.

Firstly, it makes iterating over optional files easy:

for f in a/* b/* c/*; do
    some_magic "${f}"
done

Without nullglob, the above may actually return a/* if no file matches the pattern. For this reason, you would need to add an additional check for existence of file inside the loop. With nullglob, it will just ‘omit’ the unmatched arguments. In fact, if none of the patterns match the loop won’t be run even once.

Secondly, it turns every accidental glob into null. While this isn’t the most friendly warning and in fact it may have very undesired results, you’re more likely to notice that something is going wrong.

The failglob option is better if you can assume you don’t need to match files in its scope. In this case, bash treats every failing filename expansion as a fatal error and terminates execution with an appropriate message.

The main advantage of failglob is that it makes you aware of any mistake before someone hits it the hard way. Of course, assuming that it doesn’t accidentally expand into something already.

There is also a choice of noglob. However, I wouldn’t recommend it since it works around mistakes rather than fixing them, and makes the code rely on a non-standard environment.

Word splitting without globbing

One of the pitfalls I myself noticed lately is the attempt of using unquoted variable substitution to do word splitting. For example:

for i in ${v}; do
    echo "${i}"
done

At a first glance, everything looks fine. ${v} contains a whitespace-separated list of words and we iterate over each word. The pitfall here is that words in ${v} are subject to filename expansion. For example, if a lone asterisk would happen to be there (like v='10 * 4'), you’d actually get all files in the current directory. Unexpected, isn’t it?

I am aware of three solutions that can be used to accomplish word splitting without implicit globbing:

  1. setting shopt -s noglob locally,
  2. setting GLOBIGNORE='*' locally,
  3. using the swiss army knife of read to perform word splitting.

Personally, I dislike the first two since they require set-and-restore magic, and the latter also has the penalty of doing the globbing then discarding the result. Therefore, I will expand on using read:

read -r -d '' -a words <<<"${v}"
for i in "${words[@]}"; do
    echo "${i}"
done

While normally read is used to read from files, we can use the here string syntax of bash to feed the variable into it. The -r option disables backslash escape processing that is undesired here. -d '' causes read to process the whole input and not stop at any delimiter (like newline). -a words causes it to put the split words into array ${words[@]} — and since we know how to safely iterate over an array, the underlying issue is solved.