Who hasn't had to search files for specific content? At some point, when working with computers, you will want to find files containing a specific piece of text, data, or string, whatever term you prefer. Linux users have always boasted of being able to use the grep utility. Windows users have relied on finding files through an easy user interface and the Select-String cmdlet. With WSL2, you can use traditional Linux utilities to assist with your work on Windows as well. Let's look at a few variants of grep that help us find the information we seek.
For the purpose of this blog post, we will be searching files on a Windows 10 machine.
Setup WSL2
For full instructions, refer to the official instructions for enabling WSL2. Here is the short version for those who don't want to read through the whole document:
- Verify if you are running on the correct version of Windows 10 by using winver. For x64 systems, it should be version 1903 or higher, with Build 18362 or higher. For ARM64 systems, it should be version 2004 or higher, with Build 19041 or higher.
- Open a PowerShell window as admin and run the command below. Restart when prompted.
Enable-WindowsOptionalFeature -Online -FeatureName VirtualMachinePlatform, Microsoft-Windows-Subsystem-Linux
- After the reboot, set the default WSL version to 2:
wsl --set-default-version 2
Install Linux Distro
You can now install the Linux distro of your choice by opening the Windows Store from the Start menu and installing it from there. For the purpose of this blog post, we'll use Ubuntu 20.04 LTS. If you don't want to use the Windows Store, follow the steps in the WSL docs for a manual install.
Once it is installed, start your distro. On first launch, it will ask you to set up a username and password. Go ahead and do so. Let's also enable password-less sudo for the group %sudo:
# Edit the sudoers with the visudo command
sudo visudo
# Change the %sudo group to be password-less
%sudo ALL=(ALL:ALL) NOPASSWD: ALL
Also, let's update the package lists and upgrade the installed packages for the distro. For Ubuntu, run the commands below:
# Update the repositories and list of the packages available
sudo apt update
# Update the system based on the packages installed > the "-y" will approve the change automatically
sudo apt upgrade -y
grep is available out of the box in most linux distros, so you do not need to download and install it.
Search for given string in Single file
A very basic usage of grep involves searching for a specific string in a single file: grep literal_string filename. The output contains every line of the specified file that contains the string literal_string, with each match on its own line:
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
The file break-point.sh just happens to be in our current directory. We could have given the full path (in Unix-like format) and grep would happily process it:
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum /mnt/d/mohit/src/bash/break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
Search for given string in Multiple files
This can be done by passing a shell wildcard (glob) pattern for the filename. For example, if you want to find the string literal_string in files ending in .sh, you can use *.sh as the second argument. If you want to search all files in a given directory, use the wildcard (*) on its own:
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum *.sh
break-point.sh:# echo the sum of values
break-point.sh:sum=$(($one_val + $two_val))
break-point.sh:echo $sum
ps4-variable.sh:# echo the sum of values
ps4-variable.sh:sum=$(($one_val + $two_val))
ps4-variable.sh:echo $sum
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum *
break-point.sh:# echo the sum of values
break-point.sh:sum=$(($one_val + $two_val))
break-point.sh:echo $sum
ps4-variable.sh:# echo the sum of values
ps4-variable.sh:sum=$(($one_val + $two_val))
ps4-variable.sh:echo $sum
As you may have noticed, the filename of each matching file is printed first in the output, before the line containing the matching string.
Not all the files through which you want to search may be that conveniently located. Of course, the shell doesn’t care how much pathname you type, so we could have done something like this:
grep sum ../d/*.sh ../d/mohit/*.sh ../*
The first argument, the search string, can also be a more complex regular expression. Regular expressions are not the same as the shell's pattern matching used for the filename arguments, though they can look similar at times.
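To make the difference between shell patterns and regular expressions concrete, here is a minimal sketch. The file names notes.txt and build.sh are hypothetical and created in a scratch directory. The shell expands an unquoted *.sh into file names before grep even starts, while a quoted pattern is passed to grep untouched and interpreted as a regular expression.

```shell
# Scratch directory with two hypothetical files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'run build.sh\nrun shell\n' > notes.txt
touch build.sh

# Shell glob: the shell replaces *.sh with the matching file names.
glob_expansion=$(echo *.sh)

# Regular expression: quoted so the shell leaves it alone. grep reads
# "\.sh$" as a literal dot followed by "sh" at the end of a line.
regex_hits=$(grep '\.sh$' notes.txt)

echo "glob expanded to: $glob_expansion"
echo "regex matched:    $regex_hits"
```

Quoting the search pattern is a good habit, since it keeps the shell from expanding characters such as * before grep sees them.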
Perform Case Insensitive Search
grep is case-sensitive by default, in keeping with the case-sensitive nature of the underlying OS. Windows users, however, are used to case-insensitive searches. You can perform a case-insensitive search with grep -i:
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -i sum break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
# Sum found
This option is particularly useful when the text you are searching may contain mixed case.
Getting just the filenames from Search
If you are not interested in the matching lines themselves and only want the names of the files that contain a match, use grep -l:
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -l sum *.sh
break-point.sh
debug-trap.sh
file-globbing.sh
ps4-variable.sh
unset-var.sh
This can be quite useful for scripting, as it lets us hand the file names to another command for further processing. Put the grep command inside $() and those filenames can be used on the command line of another command.
Do note that if grep finds more than one match per file, it still only prints the name once. If grep finds no matches, it gives no output.
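As a minimal sketch of using $() with grep -l, the snippet below creates three throwaway scripts (a.sh, b.sh and c.sh are hypothetical names, not the files from the examples above), collects the names of the matching files, and passes them to another command. It assumes the file names contain no spaces, since the shell splits the substituted output on whitespace.

```shell
# Scratch directory with three small sample scripts (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'sum=1\necho $sum\n' > a.sh
printf 'echo hello\n' > b.sh
printf '# the sum\n' > c.sh

# grep -l prints each matching file name once, however many lines match.
matches=$(grep -l sum *.sh)
echo "$matches"

# Reuse the names on another command line. $matches is left unquoted on
# purpose so that each name becomes a separate argument.
line_total=$(( $(cat $matches | wc -l) ))
echo "lines in matching files: $line_total"
```

For file names that may contain spaces, a null-delimited approach would be safer than plain command substitution.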
Searching files recursively or all files in a given directory
You can search all files under a given directory, including its sub-directories, using grep -r. By default, grep does not search sub-directories.
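The following sketch contrasts a plain search with a recursive one. The directory layout (top.sh and lib/nested.sh) is hypothetical and built in a scratch directory. The -s option, not covered above, only silences the "Is a directory" message that grep prints when it is handed a directory without -r.

```shell
# Scratch tree: one file at the top level, one inside a sub-directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/lib"
printf 'sum=1\n' > "$workdir/top.sh"
printf 'total_sum=2\n' > "$workdir/lib/nested.sh"
cd "$workdir" || exit 1

# Without -r, only the top-level file is searched; lib is skipped.
flat=$(grep -s -l sum *)

# With -r, grep descends into sub-directories of the starting point.
recursive=$(grep -r -l sum . | sort)

echo "without -r: $flat"
echo "with -r:"
echo "$recursive"
```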
Searching files for content not matching given string
When you want to display the lines that do not match the given string, use grep -v as shown below. You can also do a case-insensitive inverted search by combining it with -i:
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -v -i sum break-point.sh
#!/bin/bash
trap read debug # enables debug trap
set -x # enables trace mode
one_val=10
two_val=5
set +x # disables trace mode
trap - debug # disables debug trap
Counting the number of matches
If you want to count how many lines contain the given string in each of the given files, use grep -c. Note that it counts matching lines, not individual occurrences, so a line that contains the string twice is counted once. You can combine it with other options such as -i and -v.
# counting matches in single file
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -c sum break-point.sh
3
# counting matches in a set of files
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -c sum *.sh
break-point.sh:3
debug-trap.sh:4
dynamic-arguments.sh:0
file-globbing.sh:2
passing-arguments.sh:0
ps4-variable.sh:3
pstree.sh:0
script-parts.sh:0
set-verbose.sh:0
set-xtrace.sh:0
syntax-check.sh:0
unset-var.sh:6
# counting non-matches in a set of files
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -c -v sum *.sh
break-point.sh:9
debug-trap.sh:5
dynamic-arguments.sh:15
file-globbing.sh:8
passing-arguments.sh:24
ps4-variable.sh:6
pstree.sh:1
script-parts.sh:14
set-verbose.sh:7
set-xtrace.sh:9
syntax-check.sh:12
unset-var.sh:6
Show only the matched string (and skip full line)
By default, grep shows the whole line that matches the given pattern. If you want grep to print only the matched part of the line, use the grep -o option:
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -o sum break-point.sh
sum
sum
sum
This is not very useful when you search for a plain string, but it becomes very useful when you give a regex pattern and want to see exactly what it matches.
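As an illustration of -o with a pattern, the sketch below pulls the three-digit code from the end of each line of a made-up log. The log contents are hypothetical, and -E is used so that the {3} repetition can be written without backslashes.

```shell
# Sample log with hypothetical contents.
logfile=$(mktemp)
printf 'GET /index.html 200\nGET /missing 404\nPOST /login 500\n' > "$logfile"

# -o prints only the part of each line that the pattern matched:
# here, exactly three digits at the end of the line.
codes=$(grep -E -o '[0-9]{3}$' "$logfile")
echo "$codes"

rm -f "$logfile"
```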
Checking for full words (and not parts of it)
By default, grep also matches parts of words in the file contents. So a search for sum can match words like summary, summation, summer, summarizer, etc., which might not be what you intended. This is easy to overlook when you are searching manually, but it matters in scripts, which need to be as precise as possible. For this, you can use grep -w:
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
# summary summer summersault
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -w sum break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
Displaying lines before/after/around the match
When troubleshooting an issue, you often need to search log files for messages such as error, exception, failure, or denied. It can be useful to show a few lines before, after, or around the matching lines. For this, use -A n to show n lines after each matching line, -B n to show n lines before it, and -C n to show n lines both before and after it.
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ cat break-point.sh
#!/bin/bash
trap read debug # enables debug trap
set -x # enables trace mode
one_val=10
two_val=5
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
set +x # disables trace mode
trap - debug # disables debug trap
# Sum found
# summary summer summersault
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -w sum -A 2 break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
set +x # disables trace mode
trap - debug # disables debug trap
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -w sum -B 2 break-point.sh
two_val=5
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep -w sum -C 2 break-point.sh
two_val=5
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
set +x # disables trace mode
trap - debug # disables debug trap
When the context ranges of nearby matches overlap, grep prints each line only once, which is why the three consecutive matches above do not produce repeated lines.
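To see how grep handles context around several matches, here is a small sketch using a hypothetical eight-line sample file. When the context ranges are far apart, GNU grep separates the groups with a line containing "--"; when they overlap, it merges them and prints every line once.

```shell
# Hypothetical sample: two matching lines with four lines between them.
sample=$(mktemp)
printf 'one\nerror A\ntwo\nthree\nfour\nfive\nerror B\nsix\n' > "$sample"

# One line of context after each match. The two groups do not touch,
# so grep prints a "--" separator line between them.
apart=$(grep -A 1 error "$sample")
echo "$apart"

echo "=== wider context ==="

# Three lines of context on each side. The two ranges now overlap, so
# grep merges them into one group and prints each line once.
merged=$(grep -C 3 error "$sample")
echo "$merged"

rm -f "$sample"
```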
Searching Output from another Command
When writing a script, more often than not, you will be using pipelines to make your commands more useful. Sometimes, before acting on all the input supplied by the previous command, you may want to filter it using grep. For this, you just need to pipe the output of the command to grep.
For example, the command below filters the output of ls to select a particular file, uses awk to extract the file name, and then uses rm to remove that file:
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ ls -al | grep break-point | rm -i $(awk '{print $9}')
rm: remove regular file 'break-point.sh'?
If you also want to have grep search through error messages that come from the previous command, be sure to redirect its error output into standard output before the pipe:
make somefile 2>&1 | grep -i error
This command attempts to compile some hypothetical piece of code. We redirect standard error into standard output (2>&1) before we pipe (|) the output into grep, which searches case-insensitively (-i) for the string error.
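The effect of the redirect can be checked with a small sketch. The function noisy_build is a hypothetical stand-in for a real build command: it writes one line to standard output and one to standard error.

```shell
# Hypothetical stand-in for a build: one line on stdout, one on stderr.
noisy_build() {
  echo "compiling main.c"
  echo "Error: missing semicolon" >&2
}

# Here stderr is discarded, so only stdout reaches grep and nothing
# matches. grep exits with status 1 on no match, hence the "|| true".
stdout_only=$(noisy_build 2>/dev/null | grep -i error || true)

# With 2>&1 before the pipe, stderr is merged into stdout and grep sees it.
merged=$(noisy_build 2>&1 | grep -i error)

echo "stdout only: '$stdout_only'"
echo "merged:      '$merged'"
```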
Pare down Output from grep using grep
Because grep accepts input from another command, you can chain multiple grep commands in a pipeline to narrow down the output. For example, the second command below removes the lines containing the word summary:
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum break-point.sh
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
# summary summer summersault
mohit@mohitgoyalco:/mnt/d/mohit/src/bash$ grep sum break-point.sh | grep -v summary
# echo the sum of values
sum=$(($one_val + $two_val))
echo $sum
Searching for text patterns and not strings
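You can search not only for literal strings but also for patterns using regular expressions. A regular expression may be followed by one of several repetition operators (with grep's default basic syntax, ?, + and the braces must be written with a leading backslash, or you can use grep -E and write them as shown):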
- ? The preceding item is optional and matched at most once.
- * The preceding item will be matched zero or more times.
- + The preceding item will be matched one or more times.
- {n} The preceding item is matched exactly n times.
- {n,} The preceding item is matched n or more times.
- {,m} The preceding item is matched at most m times. This is a GNU extension.
- {n,m} The preceding item is matched at least n times, but not more than m times.
The period . matches any single character. A set of characters enclosed in square brackets (e.g., [abc]) matches any one of those characters (e.g., “a” or “b” or “c”). If the first character inside the square brackets is a caret, it matches any character that is not in that set. Many more metacharacters are available, and you can mix and match them in various combinations.
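The operators above can be tried out with a short sketch. The word list is hypothetical. Two options not discussed above are used for convenience: -E, so that ? and {n} can be written without backslashes, and -x, which requires the pattern to match the whole line.

```shell
# Hypothetical word list, one word per line.
words=$(mktemp)
printf 'color\ncolour\ncolouur\ncat\ncot\ncut\nsum\nsuum\n' > "$words"

# ? makes the preceding item optional.
optional=$(grep -E -x 'colou?r' "$words")

# The period matches any single character.
any_char=$(grep -E -x 'c.t' "$words")

# A bracket expression matches one character from the set.
bracket=$(grep -E -x 'c[ao]t' "$words")

# A leading caret inside the brackets negates the set.
negated=$(grep -E -x 'c[^ao]t' "$words")

# {2} repeats the preceding item exactly twice.
repeated=$(grep -E -x 'su{2}m' "$words")

printf 'optional: %s\n' $optional
printf 'any_char: %s\n' $any_char
printf 'bracket:  %s\n' $bracket
printf 'negated:  %s\n' $negated
printf 'repeated: %s\n' $repeated

rm -f "$words"
```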
Beyond basic regular expressions, grep also supports extended regular expressions and other pattern syntaxes. The following is taken verbatim from the man page for grep:
Pattern Syntax
    -E, --extended-regexp
        Interpret PATTERNS as extended regular expressions (EREs, see below).
    -F, --fixed-strings
        Interpret PATTERNS as fixed strings, not regular expressions.
    -G, --basic-regexp
        Interpret PATTERNS as basic regular expressions (BREs, see below). This is the default.
    -P, --perl-regexp
        Interpret PATTERNS as Perl-compatible regular expressions (PCREs). This option is experimental when combined with the -z (--null-data) option, and grep -P may warn of unimplemented features.
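A quick way to see why the choice of syntax matters is to search for a string that contains a regex metacharacter. In this sketch, with hypothetical file contents, the default mode treats the dot in 1.50 as "any character", while -F treats it as a literal dot.

```shell
# Hypothetical sample: only the first line contains a literal "1.50".
notes=$(mktemp)
printf 'price is 1.50\nprice is 1x50\n' > "$notes"

# Default (basic regex): the dot matches any character, so both lines match.
regex_count=$(grep -c '1.50' "$notes")

# -F: the pattern is a fixed string, so only the literal dot matches.
fixed_count=$(grep -c -F '1.50' "$notes")

echo "regex: $regex_count, fixed: $fixed_count"

rm -f "$notes"
```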
Perform grep on Compressed files
zgrep is simply a version of grep that can search through various compressed and uncompressed file types (which types are understood varies from system to system). All the options that apply to the grep command also apply to the zgrep command.
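As a minimal sketch, assuming gzip and zgrep are installed (they ship together on Ubuntu), the snippet below compresses a hypothetical log file and searches it without decompressing it first.

```shell
# Create and compress a hypothetical log file.
workdir=$(mktemp -d)
printf 'first line\nthe sum is 15\nSum found\n' > "$workdir/app.log"
gzip "$workdir/app.log"   # replaces app.log with app.log.gz

# zgrep accepts the same -i option as grep and reads the .gz directly.
zmatches=$(zgrep -i sum "$workdir/app.log.gz")
echo "$zmatches"

rm -rf "$workdir"
```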
grep is a large utility and not all options are covered here. If you would like to know more, go through its man pages.