How to process a file line by line in a Linux bash script


It’s pretty easy to read the contents of a Linux text file line by line in a shell script, as long as you deal with some subtle gotchas. Here’s how to do it safely.

Files, text and idioms

Every programming language has a set of idioms. These are the standard, no-frills ways to accomplish a set of common tasks. They are the elementary or default way of using one of the features of the language the programmer is working with. They become part of a programmer’s toolkit of mental blueprints.

Actions such as reading data from files, working with loops, and swapping the values of two variables are good examples. The programmer will know at least one way to achieve their ends in a generic or vanilla fashion. Perhaps that is sufficient for the requirement at hand. Or maybe they will embellish the code to make it more efficient or better suited to the specific solution they are developing. But having the basic idiom at your fingertips is a great starting point.

Knowing and understanding the idioms of one language also makes it easier to pick up a new programming language. Knowing how things are built in one language and looking for the equivalent, or the closest thing, in another is a good way to appreciate the similarities and differences between the programming languages you already know and the one you are learning.

Reading lines from a file: the one-liner

In Bash, you can use a while loop on the command line to read each line of text from a file and do something with it. Our text file is called “data.txt”. It contains a list of the months of the year.

January
February
March
.
.
October
November
December

Our one-liner is:

while read line; do echo $line; done < data.txt

The while loop reads a line from the file, and the flow of execution of the small program passes into the body of the loop. The echo command writes the line of text in the terminal window. The read attempt fails when there are no more lines to read and the loop is finished.

A nice trick is the ability to redirect a file into a loop. In other programming languages, you would need to open the file, read it, and close it again when you are done. With Bash, you can simply use file redirection and let the shell handle all that low-level stuff for you.

Of course, this one-liner isn’t terribly useful. Linux already provides the cat command, which does exactly that for us. We have created a long-winded way to replace a three-letter command. But it does visibly demonstrate the principles of reading from a file.

That works quite well, up to a point. Suppose we have another text file that contains the names of the months. In this file, the escape sequence for a new line character has been added to each line. We will call it “data2.txt”.

January\n
February\n
March\n
.
.
October\n
November\n
December\n

Let’s use our one-liner in our new file.

while read line; do echo $line; done < data2.txt

The backslash escape character “\” has been discarded. The result is that an “n” has been appended to each line. Bash is interpreting the backslash as the start of an escape sequence. Often, we don’t want Bash to interpret what it is reading. It can be more convenient to read a line in its entirety (backslash escape sequences and all) and choose what to parse or substitute yourself, within your own code.
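The difference can be reproduced in isolation. The following is a minimal sketch, assuming a throwaway file created with mktemp; the variable names sample, plain, and raw are illustrative only. It reads the same line twice, once with plain read and once with read -r, and uses printf rather than echo so that the output step does not interpret the backslash itself.

```shell
#!/bin/bash
# Hypothetical sample file holding one line: January followed by a
# literal backslash and the letter n.
sample=$(mktemp)
printf '%s\n' 'January\n' > "$sample"

# Plain read treats the backslash as an escape character and drops it.
read line < "$sample"
plain=$line

# read -r keeps the backslash as an ordinary character.
read -r line < "$sample"
raw=$line

printf 'without -r: %s\n' "$plain"   # without -r: Januaryn
printf 'with -r:    %s\n' "$raw"     # with -r:    January\n

rm -f "$sample"
```
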

If we want to do any meaningful processing or analysis of the lines of text, we will need to use a script.

Read lines from a file with a script

Here is our script. It’s called “script1.sh”.

#!/bin/bash

Counter=0

while IFS='' read -r LinefromFile || [[ -n "${LinefromFile}" ]]; do

    ((Counter++))
    echo "Accessing line $Counter: ${LinefromFile}"

done < "$1"

We set a variable called Counter to zero, then we define our while loop.

The first statement on the while line is IFS=''. IFS stands for internal field separator. It holds the characters that Bash uses to identify word boundaries. By default, the read command strips off leading and trailing whitespace. If we want to read the lines from the file exactly as they are, we need to set IFS to an empty string.

We could set this once outside the loop, just as we set the value of Counter. But in more complex scripts, especially those with many user-defined functions, IFS could be set to different values elsewhere in the script. Ensuring that IFS is set to an empty string each time the while loop iterates guarantees that we know what its behavior will be.
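A short sketch of the effect, assuming a temporary file with one padded line (the names sample, trimmed, exact, and saved_ifs are made up for this illustration). It also shows that writing IFS='' in front of read applies the empty value to that single command only, so the rest of the script keeps whatever IFS it had.

```shell
#!/bin/bash
# Hypothetical sample file: one line with three spaces on each side.
sample=$(mktemp)
printf '%s\n' '   indented line   ' > "$sample"

saved_ifs=$IFS

# Default IFS: read strips the leading and trailing whitespace.
read -r trimmed < "$sample"

# Empty IFS for this one read command: the line is kept exactly as stored.
IFS='' read -r exact < "$sample"

printf '[%s]\n' "$trimmed"   # [indented line]
printf '[%s]\n' "$exact"     # [   indented line   ]

# The prefix assignment did not change IFS for the rest of the script.
if [ "$IFS" = "$saved_ifs" ]; then
    printf 'IFS is unchanged after the read\n'
fi

rm -f "$sample"
```
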

We are going to read a line of text into a variable called LinefromFile. We are using the -r (read backslash as a normal character) option so that backslashes are not interpreted. They will be treated like any other character and will not receive any special treatment.

There are two conditions that will satisfy the while loop and allow the text to be processed by the body of the loop:

  • read -r LinefromFile : When a line of text is successfully read from the file, the read command sends a success signal to the while, and the while loop passes the flow of execution to the body of the loop. Note that the read command needs to see a newline character at the end of the line of text to consider it a successful read. If the file is not a POSIX-compliant text file, the last line may not include a newline character. If the read command sees the end-of-file marker (EOF) before the line is terminated by a newline, it will not treat it as a successful read. If that happens, the last line of text will not be passed to the body of the loop and will not be processed.
  • [[ -n "${LinefromFile}" ]] : We need to do some additional work to handle non-POSIX-compliant files. This test checks the text that was read from the file. If the variable is not empty, which is the case when a final line was read but not terminated with a newline, the test still returns success to the while loop. This ensures that any trailing line fragment is processed by the body of the loop.

These two clauses are separated by the logical OR operator “||”, so that if either clause returns success, the retrieved text is processed by the body of the loop, whether there is a newline character or not.
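To see why the second clause matters, here is a minimal sketch that counts lines in a file whose last line has no terminating newline. The file contents and the names sample, naive, and robust are assumptions made for the illustration. The sketch uses the single-bracket test [ -n ... ], which gives the same result as the double-bracket form in the script for this simple non-empty check.

```shell
#!/bin/bash
# Hypothetical file with three lines; the last one lacks a newline.
sample=$(mktemp)
printf 'one\ntwo\nthree' > "$sample"

# read on its own: the unterminated last line never reaches the loop body.
naive=0
while IFS='' read -r line; do
    naive=$((naive + 1))
done < "$sample"

# read OR non-empty test: the trailing fragment is processed as well.
robust=0
while IFS='' read -r line || [ -n "$line" ]; do
    robust=$((robust + 1))
done < "$sample"

printf 'read only:      %d lines\n' "$naive"    # read only:      2 lines
printf 'read plus test: %d lines\n' "$robust"   # read plus test: 3 lines

rm -f "$sample"
```

On the final pass, read reports failure but has still stored “three” in the variable, so the non-empty test lets the body run once more. The next read stores an empty string, both clauses fail, and the loop ends.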

In the body of our loop, we are increasing the Counter variable by one and using echo to send some output to the terminal window. The line number and text for each line are displayed.

We can still use our redirection trick to redirect a file into a loop. In this case, we are redirecting $1, a variable that holds the first command line parameter passed to the script. Using this trick, we can easily pass in the name of the data file that we want the script to work on.

Copy and paste the script into an editor and save it with the file name “script1.sh”. Use the chmod command to make it executable.

chmod +x script1.sh

Let’s see what our script does with the data2.txt text file and the backslashes it contains.

./script1.sh data2.txt

Each character on the line is displayed literally. Backslashes are not interpreted as escape characters. They are printed as regular characters.

Pass the line to a function

We are still just echoing the text to the screen. In a real-world programming scenario, we would probably want to do something more interesting with the line of text. In most cases, it is good programming practice to handle the further processing of the line in a separate function.

This is how we could do it. This is “script2.sh”.

#!/bin/bash

Counter=0

function process_line() {

    echo "Processing line $Counter: $1"

}

while IFS='' read -r LinefromFile || [[ -n "${LinefromFile}" ]]; do

    ((Counter++))
    process_line "$LinefromFile"

done < "$1"

We define our Counter variable as before, and then we define a function called process_line(). The definition of a function must appear before the function is first called in the script.

Our function will be passed the newly read line of text in each iteration of the while loop. We can access that value within the function using the $1 variable. If two variables had been passed to the function, we could access those values using $1 and $2, and so on for more variables.

The while loop is mostly the same. There is only one change within the body of the loop. The echo line has been replaced by a call to the process_line() function. Note that you do not need to use the brackets “()” after the function name when you call it.

The name of the variable that holds the line of text, LinefromFile, is wrapped in quotation marks when it is passed to the function. This caters for lines that have spaces in them. Without the quotation marks, the first word is treated as $1 by the function, the second word is treated as $2, and so on. Using quotation marks ensures that the entire line of text is handled, as a whole, as $1. Note that this is not the same $1 that holds the name of the data file passed to the script.
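The effect of the quotation marks can be checked with a small sketch. The helper count_args and the sample text are hypothetical, and the sketch assumes IFS has its default value so that unquoted text is split on spaces.

```shell
#!/bin/bash
# Hypothetical helper: records how many arguments it received and
# what its first argument was.
count_args() {
    arg_count=$#
    first_arg=$1
}

line='More text at the end'

# Unquoted: the shell splits the line into five separate arguments.
count_args $line
unquoted_count=$arg_count
unquoted_first=$first_arg

# Quoted: the whole line arrives as a single argument.
count_args "$line"
quoted_count=$arg_count
quoted_first=$first_arg

printf 'unquoted: %d arguments, first is "%s"\n' "$unquoted_count" "$unquoted_first"
# unquoted: 5 arguments, first is "More"
printf 'quoted:   %d argument, first is "%s"\n' "$quoted_count" "$quoted_first"
# quoted:   1 argument, first is "More text at the end"
```
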

Because Counter has been declared in the main body of the script and not within a function, it can be referenced within the process_line() function.
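A minimal sketch of that scoping behavior, using a hypothetical function named bump_counter. A variable set at the top level of a script is visible inside functions, and a change made inside the function is still there after the function returns.

```shell
#!/bin/bash
Counter=0

# Hypothetical function that reads and updates the script-level Counter.
bump_counter() {
    Counter=$((Counter + 1))
    printf 'inside the function: Counter=%d\n' "$Counter"
}

bump_counter   # inside the function: Counter=1
bump_counter   # inside the function: Counter=2

printf 'after the calls: Counter=%d\n' "$Counter"   # after the calls: Counter=2
```
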

Copy or write the above script in an editor and save it with the file name “script2.sh”. Make it executable with chmod :

chmod +x script2.sh

Now we can run it and pass in a new data file, “data3.txt”. This has a list of the months and a line with many words.

January
February
March
.
.
October
November \nMore text "at the end of the line"
December

Our command is:

./script2.sh data3.txt

The lines are read from the file and passed one by one to the process_line() function. All lines display correctly, including the one with the backslash, quotation marks, and multiple words.

Building blocks are useful

There is a train of thought that says an idiom must contain something unique to that language. That is not a belief that I subscribe to. What matters is that it makes good use of the language, is easy to remember, and provides a reliable and robust way to implement some functionality in your code.
