Get a grip on searching file contents with grep

Who doesn’t have to search files for specific content? At some point, everyone who works with computers needs to find files containing a specific piece of text, data, string, or whatever term you prefer. Linux users have always boasted of the grep utility. Windows users have relied on the graphical search interface and the Select-String cmdlet. With WSL2, you can use traditional Linux utilities to assist with your work on Windows as well. Let’s look at a few variants of grep that help us find the information we seek.

For the purpose of this blog post, we will be searching files on a Windows 10 machine.

Set up WSL2

For full details, refer to the official instructions for enabling WSL2. Here is the short version for those who don’t want to read through the whole document:

  • Verify that you are running a supported version of Windows 10 by running winver. For x64 systems, it should be version 1903 or higher, with Build 18362 or higher. For ARM64 systems, it should be version 2004 or higher, with Build 19041 or higher.
  • Open a PowerShell window as administrator and run the command below. Restart when prompted.
Enable-WindowsOptionalFeature -Online -FeatureName VirtualMachinePlatform, Microsoft-Windows-Subsystem-Linux
  • After the reboot, set WSL2 as the default version:
wsl --set-default-version 2

Install Linux Distro

You can now install the Linux distro of your choice from the Microsoft Store, which you can open from the Start menu. For the purpose of this blog post, we’ll use Ubuntu 20.04 LTS. If you don’t want to use the Store, follow the steps in the WSL docs for a manual install.

Once it is installed, start your distro. The first time, it will ask you to set up a username and password. Go ahead and do so. Let’s also enable password-less sudo for the %sudo group:

# Edit the sudoers with the visudo command
sudo visudo

# Change the %sudo group to be password-less
%sudo   ALL=(ALL:ALL) NOPASSWD: ALL

Also, let’s update the package repositories and upgrade the distro. For Ubuntu, run the commands below:

# Update the repositories and list of the packages available
sudo apt update
# Upgrade the installed packages; the "-y" approves the changes automatically
sudo apt upgrade -y

grep is available out of the box in most Linux distros, so you do not need to download and install it.

Search for given string in Single file

The most basic use of grep is searching for a specific string in a single file: grep literal_string filename. The output contains every line of the file that contains the string literal_string, with each match on its own line:

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum

The file break-point.sh just happens to be in our current directory. We could have given its full path (in Unix-like format) and grep would happily process it:

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum /mnt/d/mohit/src/bash/break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum

Search for given string in Multiple files

This can be done by passing a shell wildcard pattern (a glob) for the filename. For example, if you want to find the string literal_string in files ending in .sh, you can use *.sh as the second argument. If you want to search all files in the current directory, use the wildcard (*) on its own:

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum *.sh
break-point.sh:# echo the sum of values
break-point.sh:sum=$(($one_val + $two_val))
break-point.sh:echo $sum
ps4-variable.sh:# echo the sum of values
ps4-variable.sh:sum=$(($one_val + $two_val))
ps4-variable.sh:echo $sum

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum *
break-point.sh:# echo the sum of values
break-point.sh:sum=$(($one_val + $two_val))
break-point.sh:echo $sum
ps4-variable.sh:# echo the sum of values
ps4-variable.sh:sum=$(($one_val + $two_val))
ps4-variable.sh:echo $sum

As you may have noticed, the filename of each matching file is printed first in the output, before the line containing the matching string.

Not all the files through which you want to search may be that conveniently located. Of course, the shell doesn’t care how much pathname you type, so we could have done something like this:

grep sum ../d/*.sh ../d/mohit/*.sh ../*

The search pattern itself, which is the first non-option argument, can also be a regular expression rather than a plain string. Regular expressions are not the same as the shell’s pattern matching used for the file names, though they can look similar at times.
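To make the difference concrete, here is a minimal sketch. The scratch directory and the file names (demo-one.sh, notes.txt) are made up for illustration and are not part of the setup used elsewhere in this post. The shell expands *.sh into file names, while grep reads the quoted pattern as a regular expression:

```shell
#!/bin/bash
# Hypothetical scratch directory and files, created only for this demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'sum=3\nsuuum=4\nsm=5\nsunbeam=6\n' > demo-one.sh
printf 'sum in a text file\n' > notes.txt

# The shell expands the glob *.sh to demo-one.sh before grep runs.
# grep reads the quoted pattern 'su*m' as a regular expression:
# an "s", then zero or more "u", then an "m".
# As a shell glob, su*m would also match "sunbeam"; as a regex it does not.
regex_hits=$(grep 'su*m' *.sh)
echo "$regex_hits"
```

Quoting the pattern keeps the shell from trying to expand it as a glob before grep ever sees it.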

Perform Case Insensitive Search

grep searches are case-sensitive by default, as is the norm on Linux. Windows users, however, are used to case-insensitive searches. You can perform a case-insensitive search with grep -i:

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -i sum break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
# Sum found

This option is particularly useful for finding words in text that may use mixed case.

Getting just the filenames from Search

If you are not interested in the matching lines themselves and only want the names of the files that contain a match, use grep -l:

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -l sum *.sh
break-point.sh
debug-trap.sh
file-globbing.sh
ps4-variable.sh
unset-var.sh

This is quite useful for scripting, because the file names can be passed on for further processing. Put the grep command inside $() and those file names can be used as arguments on the command line.

Do note that if grep finds more than one match per file, it still only prints the name once. If grep finds no matches, it gives no output.
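As a rough sketch of the $() idea, the snippet below feeds the file names found by grep -l into a second grep. The files alpha.sh, beta.sh and gamma.sh are hypothetical and exist only in a temporary directory. The unquoted $() splits on whitespace, so this assumes the file names contain no spaces:

```shell
#!/bin/bash
# Hypothetical scratch files for the demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'sum=1\necho $sum\n' > alpha.sh
printf 'echo hello\n' > beta.sh
printf 'total=$((sum + 1))\n' > gamma.sh

# Names of the files that mention "sum" (each name is printed once).
matching=$(grep -l sum *.sh)
echo "$matching"

# Use those names as arguments to another command: here, count the
# lines containing "echo", but only in the files that mention "sum".
echo_counts=$(grep -c echo $(grep -l sum *.sh))
echo "$echo_counts"

# No match anywhere: grep -l prints nothing at all.
none=$(grep -l nosuchword *.sh)
```

If your file names may contain spaces, a plain $() is not safe; that case needs a different approach and is outside the scope of this sketch.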

Searching files recursively or all files in a given directory

You can search all files under a given directory, including its sub-directories, using grep -r. By default, grep does not search sub-directories.
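Here is a small sketch of the difference, using a made-up directory tree (project/ and project/lib/ are assumptions for the demo). The sort at the end is only there to make the output order predictable:

```shell
#!/bin/bash
# Hypothetical directory tree, created only for this demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir -p project/lib
printf 'sum=1\n' > project/top.sh
printf 'echo $sum\n' > project/lib/nested.sh

# Without -r, the sub-directory is not searched (GNU grep complains
# that project/lib is a directory, which we silence here).
flat_hits=$(grep sum project/* 2>/dev/null)
echo "$flat_hits"

# With -r, grep descends into project/lib as well.
recursive_hits=$(grep -r sum project | sort)
echo "$recursive_hits"
```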

Searching files for content not matching given string

When you want to display the lines that do not match the given string, use grep -v as shown below. You can also perform a case-insensitive inverted search by combining it with -i:

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -v -i sum break-point.sh
#!/bin/bash
trap read debug # enables debug trap
set -x # enables trace mode
one_val=10
two_val=5

set +x # disables trace mode
trap - debug # disables debug trap

Counting the number of matches

If you want to count how many lines in each of the given files contain the string, use grep -c. Note that it counts matching lines, not individual occurrences. You can combine it with other options such as -i and -v.

# counting matches in a single file
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -c sum break-point.sh
3

# counting matches in a set of files
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -c sum *.sh
break-point.sh:3
debug-trap.sh:4
dynamic-arguments.sh:0
file-globbing.sh:2
passing-arguments.sh:0
ps4-variable.sh:3
pstree.sh:0
script-parts.sh:0
set-verbose.sh:0
set-xtrace.sh:0
syntax-check.sh:0
unset-var.sh:6


# counting non-matches in a set of files
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -c -v sum *.sh
break-point.sh:9
debug-trap.sh:5
dynamic-arguments.sh:15
file-globbing.sh:8
passing-arguments.sh:24
ps4-variable.sh:6
pstree.sh:1
script-parts.sh:14
set-verbose.sh:7
set-xtrace.sh:9
syntax-check.sh:12
unset-var.sh:6

Show only the matched string (and skip full line)

By default, grep shows the whole line that matches the given pattern or string. If you want grep to show only the matched part, use the grep -o option:

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -o sum break-point.sh
sum
sum
sum

This is not very useful when you search for a literal string. But it becomes very useful when you supply a regex pattern and want to see exactly what it matches.
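As an illustration, the sketch below uses -o with a small regex to pull out variable names ending in _val. The file demo.sh is a hypothetical stand-in, loosely modelled on break-point.sh. It also shows one way to count individual occurrences rather than matching lines, by piping grep -o into wc -l:

```shell
#!/bin/bash
# Hypothetical sample file for the demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' 'one_val=10' 'two_val=5' \
  'sum=$(($one_val + $two_val))' 'sum sum sum' > "$workdir/demo.sh"

# Print only the text matched by the pattern, one match per line.
names=$(grep -o '[a-z_]*_val' "$workdir/demo.sh")
echo "$names"

# -c counts matching lines; -o piped to wc -l counts every occurrence.
line_count=$(grep -c sum "$workdir/demo.sh")
occurrence_count=$(grep -o sum "$workdir/demo.sh" | wc -l)
echo "lines: $line_count, occurrences: $occurrence_count"
```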

Checking for full words (and not parts of it)

By default, grep also matches parts of words. So a search for sum can match words like summary, summation, summer, summarizer, summersault, etc., which might not be what you intended. It’s easy to ignore such matches when you are searching manually, but scripts need to be as precise as possible. For this, you can use grep -w:

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
# summary summer summersault

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -w sum break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum

Displaying lines before/after/around the match

When troubleshooting an issue, you often need to search log files for messages like error, exception, failure, or denied. It can be useful to show a few lines before, after, or around the matching lines. For this, use -A n to show n lines after each matching line, -B n to show n lines before each matching line, and -C n to show n lines both before and after it.

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ cat break-point.sh
#!/bin/bash
trap read debug # enables debug trap
set -x # enables trace mode
one_val=10
two_val=5

# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
set +x # disables trace mode
trap - debug # disables debug trap
# Sum found

# summary summer summersault

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -w sum -A 2 break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
set +x # disables trace mode
trap - debug # disables debug trap

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -w sum -B 2 break-point.sh
two_val=5

# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -w sum -C 2 break-point.sh
two_val=5

# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
set +x # disables trace mode
trap - debug # disables debug trap

When the context of adjacent matches overlaps, grep prints each line only once instead of repeating it, which is particularly useful in this case.
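The following sketch shows how context lines behave with a tiny made-up log file (app.log and its contents are assumptions for the demo). With GNU grep, groups of context that are not adjacent are separated by a line containing --, while overlapping context is merged:

```shell
#!/bin/bash
# Hypothetical log file for the demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' 'start' 'error one' 'detail a' 'filler' \
  'filler two' 'error two' 'detail b' > "$workdir/app.log"

# One line of trailing context: the two groups are not adjacent,
# so GNU grep puts a "--" separator line between them.
separated=$(grep -A 1 error "$workdir/app.log")
echo "$separated"

# Two lines of context on each side: the windows overlap on "filler",
# which is printed only once, and no separator is needed.
merged=$(grep -C 2 error "$workdir/app.log")
echo "$merged"
```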

Searching Output from another Command

When writing a script, more often than not you will be using pipelines to make your commands more useful. Sometimes, before acting on all the output of a previous command, you may want to filter it using grep. To do so, just pipe the output of that command into grep.

For example, the command below filters the output of ls to select a particular file, uses awk to extract the file name, and hands that name to rm through command substitution:

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ rm -i $(ls -al | grep break-point | awk '{print $9}')
rm: remove regular file 'break-point.sh'?

If you also want to have grep search through error messages that come from the previous command, be sure to redirect its error output into standard output before the pipe:

make somefile 2>&1 | grep -i error

This command attempts to compile some hypothetical piece of code. We redirect standard error into standard output (2>&1) before we proceed to pipe (|) the output into grep, where it will search case-insensitively (-i) looking for the string error.
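If you don’t have a build at hand, the effect can be reproduced with a small stand-in. The function noisy_build below is hypothetical; it simply writes one line to standard output and one to standard error:

```shell
#!/bin/bash
# Hypothetical stand-in for a build command:
# one line goes to stdout, one line goes to stderr.
noisy_build() {
  echo "info: compiling"
  echo "ERROR: missing header" >&2
}

# stderr bypasses the pipe, so grep never sees the error line.
# (The outer 2>/dev/null only keeps the demo quiet.)
missed=$( { noisy_build | grep -i error; } 2>/dev/null )

# With 2>&1 before the pipe, stderr travels through the pipe too.
caught=$(noisy_build 2>&1 | grep -i error)

echo "without redirect: '$missed'"
echo "with redirect:    '$caught'"
```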

Pare down Output from grep using grep

As grep accepts input from another command, you can chain multiple grep commands in a pipeline to narrow down the data you want to see. For example, the second command below removes the lines matching the word summary:

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
# summary summer summersault

mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum break-point.sh | grep -v summary
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum

Searching for text patterns and not strings

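You can search not only for literal strings but also for patterns, using regular expressions. A regular expression may be followed by one of several repetition operators. In grep’s default basic syntax, ?, + and the braces must be written with a preceding backslash (\?, \+, \{n\}); with grep -E they are written as shown: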

  • ? The preceding item is optional and matched at most once.
  • * The preceding item will be matched zero or more times.
  • + The preceding item will be matched one or more times.
  • {n} The preceding item is matched exactly n times.
  • {n,} The preceding item is matched n or more times.
  • {,m} The preceding item is matched at most m times. This is a GNU extension.
  • {n,m} The preceding item is matched at least n times, but not more than m times.

The period . matches any single character. A set of characters enclosed in square brackets (e.g., [abc]) matches any one of those characters (e.g., “a” or “b” or “c”). If the first character inside the square brackets is a caret, then it matches any character that is not in that set. Many more combinations and metacharacters are available, letting you mix and match to build precise patterns.
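A quick way to get a feel for these building blocks is to try them on a handful of words. The word list below is made up for the demo, and the patterns use only grep’s default (basic) syntax:

```shell
#!/bin/bash
# Hypothetical word list, one word per line.
sample=$(printf 'cat\ncot\ncut\nct\ncoat\n')

# . matches exactly one character of any kind.
dot_hits=$(printf '%s\n' "$sample" | grep 'c.t')

# [ao] matches a single "a" or "o".
set_hits=$(printf '%s\n' "$sample" | grep 'c[ao]t')

# [^ao] matches a single character that is neither "a" nor "o".
negated_hits=$(printf '%s\n' "$sample" | grep 'c[^ao]t')

# o* matches zero or more "o" characters.
star_hits=$(printf '%s\n' "$sample" | grep 'co*t')

printf 'c.t     -> %s\n' "$(echo $dot_hits)"
printf 'c[ao]t  -> %s\n' "$(echo $set_hits)"
printf 'c[^ao]t -> %s\n' "$(echo $negated_hits)"
printf 'co*t    -> %s\n' "$(echo $star_hits)"
```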

Beyond this, there are also extended regular expressions. The following is taken verbatim from the man page for grep:

Pattern Syntax
       -E, --extended-regexp
              Interpret PATTERNS as extended regular expressions (EREs, see below).

       -F, --fixed-strings
              Interpret PATTERNS as fixed strings, not regular expressions.

       -G, --basic-regexp
              Interpret PATTERNS as basic regular expressions (BREs, see below).  This is the default.

       -P, --perl-regexp
              Interpret PATTERNS as Perl-compatible regular expressions (PCREs).  This option is experimental when combined with the -z (--null-data) option, and
              grep -P may warn of unimplemented features.
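To see how the choice of syntax changes the result, here is a small sketch that runs the same kind of pattern in basic (default), extended (-E), and fixed-string (-F) modes. The sample lines are invented for the demo, and the behaviour shown is that of GNU grep:

```shell
#!/bin/bash
# Hypothetical sample lines, one per line.
sample=$(printf 'color\ncolour\ncolouur\na+b\naab\n')

# -E: ? makes the preceding "u" optional.
ere_optional=$(printf '%s\n' "$sample" | grep -E 'colou?r')

# -E: {2} requires exactly two "u" characters.
ere_exact=$(printf '%s\n' "$sample" | grep -E 'colou{2}r')

# -E: + means "one or more a", so this matches aab but not a+b.
ere_plus=$(printf '%s\n' "$sample" | grep -E 'a+b')

# Default basic syntax: an unescaped + is an ordinary character.
bre_plus=$(printf '%s\n' "$sample" | grep 'a+b')

# -F: the whole pattern is a literal string, no regex at all.
fixed_plus=$(printf '%s\n' "$sample" | grep -F 'a+b')

printf '%s\n' "$ere_optional" "$ere_exact" "$ere_plus" "$bre_plus" "$fixed_plus"
```

When the text you are searching for contains characters such as + or ., -F is usually the simplest way to avoid surprises.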

Perform grep on Compressed files

zgrep is simply a version of grep that can search through various compressed and uncompressed file types (which types are understood varies from system to system). Most of the options that apply to the grep command also apply to the zgrep command.
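A minimal sketch, assuming gzip and zgrep are installed (they normally ship together), with a hypothetical file script.sh compressed just for the demo:

```shell
#!/bin/bash
# Hypothetical sample file, compressed only for this demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'sum=1\necho done\n' > "$workdir/script.sh"
gzip "$workdir/script.sh"   # replaces script.sh with script.sh.gz

# zgrep decompresses on the fly and searches the original text.
zhits=$(zgrep sum "$workdir/script.sh.gz")
echo "$zhits"

# Options such as -c work the same way as with grep.
zcount=$(zgrep -c sum "$workdir/script.sh.gz")
echo "$zcount"
```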

grep is a large utility and not all options are covered here. If you would like to know more, go through its man pages.
