Unexpected carriage returns
Explaining a strange output from sed
Posted by Owen Stephens on December 16, 2019
This time, a quick blog post to show a gotcha that we encountered recently,
with sed seemingly not working as we expected (unsurprising spoiler: sed is
just fine, the problem was our input). We started with a file similar to the
following (but with thousands of rows):
$ head -n3 example.csv
ref
AB0001
BC0002
We were working with a simple "CSV" file (which had no comma separators, since there was only a single column). Our aim was to generate a SQL query that was restricted to these refs, for example:
SELECT * FROM the_table WHERE ref IN
-- We wanted to generate the following line...
('AB0001','BC0002')
Our first thought was "we can use sed to easily add quotes around these
refs", so we quickly ran through a simple example to test our sed command:
$ echo -e 'abc\ndef' | sed "s/.*/'&'/"
'abc'
'def'
So far, so good. We used the & replacement to reference the whole matched
portion (see the manual for details), so the command says "match any number
of any characters, and replace the match with itself, but surrounded by
single quotes".
Unexpected output
However, when we ran the CSV file through our previously tested sed program,
the output was not as we expected:
$ head -n3 example.csv | sed "s/.*/'&'/"
'ref
'AB0001
'BC0002
It appeared that the final single quote wasn't being printed. On seeing this,
we tried to check that our sed replacement was working:
$ head -n3 example.csv | sed "s/.*/'%%%&'/"
'%%%ref
'%%%AB0001
'%%%BC0002
Hmm, we were still not seeing the trailing quote, but additional leading characters were being printed. We wondered what would happen if we tried adding some additional trailing characters (this time, by chance, we added characters outside the single quotes):
$ head -n3 example.csv | sed "s/.*/'&'%%%/"
'%%%
'%%%001
'%%%002
Aha! It's as if the trailing characters are overwriting the already-printed
line... That sounded to us like the behaviour of a carriage return (\r), so
we checked if there were any, with file:
$ file example.csv
example.csv: ASCII text, with CRLF line terminators
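If file isn't to hand, the carriage returns can also be made visible with standard tools. The following is a minimal sketch rather than what we ran at the time: sample_csv is a hypothetical stand-in for example.csv, and the cr variable is just a convenient way to hand grep a literal carriage return.

```shell
# Hypothetical stand-in for example.csv, with CRLF line endings.
sample_csv() { printf 'ref\r\nAB0001\r\nBC0002\r\n'; }

# Hold a literal carriage return in a variable, so that we don't rely on
# grep understanding an escape such as \r.
cr=$(printf '\r')

# Count the lines that contain a carriage return.
crlf_lines=$(sample_csv | grep -c "$cr")
echo "lines containing a carriage return: $crlf_lines"

# Dump the first line character by character; od -c shows the \r explicitly.
sample_csv | head -n1 | od -c
```

A non-zero count from grep, or a \r in the od dump, points at the same problem that file reported.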
Bingo! We'd received this file from a colleague who uses Windows, and
we'd thus inherited their line-ending style. As a quick fix, we ran dos2unix
to convert \r\n line endings into \n, and tried our original command again:
$ head -n3 example.csv | dos2unix | sed "s/.*/'&'/"
'ref'
'AB0001'
'BC0002'
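Where dos2unix isn't installed, the carriage return can probably be stripped within the same sed invocation. This is a sketch under the assumption that removing a single \r at the end of each line is all that's needed; the sample input and the cr and quoted variable names are ours, for illustration only.

```shell
# A literal carriage return, so the sed script doesn't depend on sed
# understanding the \r escape (which not every sed implementation does).
cr=$(printf '\r')

# First expression: drop a carriage return at the end of the line.
# Second expression: wrap what is left in single quotes.
quoted=$(printf 'ref\r\nAB0001\r\n' | sed -e "s/$cr\$//" -e "s/.*/'&'/")
printf '%s\n' "$quoted"
```

Unlike a blanket removal of every \r, this only touches a carriage return that sits directly before the newline.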
Success; now all that remained was for us to:
- Remove the header row with: tail -n+2
- Use paste to join lines with commas, using: paste -s -d,
- Surround in parentheses with: sed 's/.*/(&)/'
This left us with:
$ dos2unix < example.csv | sed "s/.*/'&'/" | tail -n+2 | \
paste -s -d, | sed 's/.*/(&)/'
('AB0001','BC0002','CD0003','XY1337')
and we were done.
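For reuse, the steps can be collected into a small shell function that reads the file on stdin. This is a sketch, not the exact command we ran: refs_to_in_list is a hypothetical name, tr -d '\r' stands in for dos2unix so that there is no extra dependency (note that it removes every carriage return, not only those at line ends), and paste is given an explicit - operand because some implementations require it to read from stdin.

```shell
# Hypothetical helper: turn a single-column CSV (with a header row) on stdin
# into a parenthesised, comma-separated list of quoted values.
refs_to_in_list() {
  tr -d '\r' |            # drop carriage returns (stand-in for dos2unix)
    tail -n +2 |          # skip the header row
    sed "s/.*/'&'/" |     # wrap each ref in single quotes
    paste -s -d, - |      # join all lines with commas
    sed 's/.*/(&)/'       # surround the result with parentheses
}

in_list=$(printf 'ref\r\nAB0001\r\nBC0002\r\n' | refs_to_in_list)
echo "$in_list"
# prints: ('AB0001','BC0002')
```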
Why was this happening?
sed operates by reading a line at a time, removing the trailing newline
character, applying the command(s) and then printing the result with the
newline character added back.
This means that when matching against a file with \r\n line endings, sed
operates as follows:
1. Read in a line: AB0001\r\n
2. Remove the newline character: AB0001\r
3. Apply the s// command: 'AB0001\r'
4. Write out the result, with a trailing newline character: 'AB0001\r'\n
Notice how in step 3 we added ' after the \r. This means that when the
string is printed to the terminal, the second ' overwrites the first ', as
the cursor is returned to the start of the line by the \r character. To
demonstrate that this is what is happening, we can change the second single
quote to another character, and check that we only see that new character
(since the quote is being overwritten):
$ echo -e 'abc\r' | sed "s/.*/'&~/"
~abc
Notice that now the ' is overwritten by the ~.
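To look at the characters sed actually wrote, rather than at what the terminal rendered, one option is to pipe the result through sed's l command, which prints the pattern space in an unambiguous form. A small sketch (the visible variable name is ours):

```shell
# Run the original substitution on a CRLF-terminated line, then ask a second
# sed to print the result unambiguously: l writes a carriage return as \r
# and marks the end of the line with $.
visible=$(printf 'AB0001\r\n' | sed "s/.*/'&'/" | sed -n l)
printf '%s\n' "$visible"
# prints: 'AB0001\r'$
```

Both quotes are present in the output; the closing one simply follows the carriage return, which is why it ends up drawn at the start of the line.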
Bonus use of \r
Due to the "overwriting" behaviour of carriage return characters, they can be used to create simple progress bars for interactive console applications. A small example in Ruby is:
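puts "Progressing..."
21.times do |i|
  # Build a 20-character bar: i filled cells, the rest blank.
  bar = "#{"=" * i}#{" " * (20 - i)}"
  counter = "#{i}/20"
  # \r returns the cursor to the start of the line, so each print
  # overwrites the previous bar.
  print "\r[#{bar}]#{counter}"
  sleep 0.1
end
puts "\nDone!"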
This redraws the progress bar in place on a single line of the terminal, which is neat, given a small amount of code. For something more production-ready that is based on the same approach under the hood, check out the ruby progressbar library.