The grep command can be viewed as a simplified or specialized database query, where the database consists of plain text files and each line represents a record. The grep command retrieves those lines (records) from a file that match the regular expression specified as part of the command.
Let's say you have a list of people given by their first, middle, and last names, and you want to find everyone whose first name is "Elvis" and whose last name is "Travolta", with any middle name. For this task you could use the following regular expression as the search string:
Elvis .* Travolta
The period matches any single character, and the '*' (star) means: match the preceding item (in this case the '.') zero or more times, as many as needed to make the regular expression match the line. If the star follows a parenthesized group, the whole group is repeated instead. In grep's default (basic) syntax such a group is written \( ... \), while the extended syntax used by egrep takes plain parentheses.
To illustrate the use of this regular expression as part of a grep command, let's assume the name of the file containing the list is guests.txt. Then the grep command would look like this:
grep 'Elvis .* Travolta' guests.txt
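As a minimal sketch of this command in action, the following builds a small guests.txt in a temporary directory and runs the query against it. The guest names and the temporary directory are made up for illustration.

```shell
# Hypothetical sample data: one guest per line.
tmpdir=$(mktemp -d)
cat > "$tmpdir/guests.txt" <<'EOF'
Elvis Aaron Travolta
Elvis Presley
John Joseph Travolta
Elvis Lee Travolta
Elvis Travolta
EOF

# Print every line that contains "Elvis ", any text, then " Travolta".
matches=$(grep 'Elvis .* Travolta' "$tmpdir/guests.txt")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

With this data, only the two lines with a middle name are printed. The line "Elvis Travolta" is not, because the pattern requires two separate spaces, one after "Elvis" and one before "Travolta".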
The general syntax of the grep command is
grep flags regular-expression file-name
If you are only interested in the number of lines that match the specified regular expression, you can use the -c flag. For example,
grep -c 'Elvis .* Travolta' guests.txt
This tells you how many lines match, that is, how many Elvis Travoltas are on the guest list (assuming one guest per line).
With the -v flag you effectively query for the complement, that is, all lines that do not match the specified regular expression. For example,
grep -v 'Elvis .* Travolta' guests.txt
You can combine flags (also called "options") by listing all the option letters after the dash as in this example:
grep -vc 'Elvis .* Travolta' guests.txt
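The three flag combinations above can be compared side by side on a small sample file. This is only a sketch; the file contents are invented for illustration.

```shell
# Hypothetical sample data: one guest per line.
tmpdir=$(mktemp -d)
cat > "$tmpdir/guests.txt" <<'EOF'
Elvis Aaron Travolta
Elvis Presley
John Joseph Travolta
Elvis Lee Travolta
EOF

# -c: count the matching lines instead of printing them.
count=$(grep -c 'Elvis .* Travolta' "$tmpdir/guests.txt")
# -v: print the lines that do NOT match.
others=$(grep -v 'Elvis .* Travolta' "$tmpdir/guests.txt")
# -vc: count the lines that do NOT match.
other_count=$(grep -vc 'Elvis .* Travolta' "$tmpdir/guests.txt")

echo "matching lines: $count"
echo "non-matching lines: $other_count"
printf '%s\n' "$others"

rm -rf "$tmpdir"
```

For any file, the counts reported by -c and -vc add up to the total number of lines.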
Frequently you don't know which of the letters of the words you are searching for are in upper case. With the "-i" flag you can make your query case insensitive, as in this example:
grep -i 'july.*2003' meetings.txt
You can apply your search to multiple files using wildcard characters in the file name specification. For example, the query
grep -i 'july.*2003' class*.txt
will find all lines that match the query string in all files in the current directory whose names start with "class" and end with ".txt". When more than one file is searched, each printed line is prefixed with the name of the file it came from.
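A short sketch of a case-insensitive search across several files follows. The file names and their contents are hypothetical and exist only to show the file name prefix in the output.

```shell
# Hypothetical sample files in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' 'Kickoff: JULY 14, 2003' 'Review: August 2, 2003' > class1.txt
printf '%s\n' 'Exam: July 30, 2003' > class2.txt
printf '%s\n' 'Picnic: july 4, 2003' > notes.txt

# The shell expands class*.txt to class1.txt and class2.txt;
# notes.txt is not searched because its name does not match.
result=$(grep -i 'july.*2003' class*.txt)
printf '%s\n' "$result"

cd / && rm -rf "$tmpdir"
```

Each output line starts with the file name and a colon, for example class1.txt:Kickoff: JULY 14, 2003.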
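To include all subdirectories in the search, add recursion with the "-r" flag and give grep a directory to start from. Because the shell expands class*.txt only in the current directory, restrict the recursive search to matching file names with the "--include" option (available in GNU grep) instead:
grep -ri --include='class*.txt' 'july.*2003' .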
It is also possible to explicitly exclude groups of files from the search. The following example applies the search to all files in the current directory and its subdirectories, except for files with the extension "doc". Note that "--exclude" takes a shell-style file name pattern, not a regular expression:
grep -ri --exclude='*.doc' 'july.*2003' .
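The following sketch shows recursive searching on a small directory tree. It assumes GNU grep, whose --include and --exclude options take shell-style file name patterns (such as '*.doc') rather than regular expressions. The directory layout and file contents are hypothetical.

```shell
# Hypothetical directory tree with one subdirectory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
mkdir -p archive
printf '%s\n' 'Exam: July 30, 2003' > class1.txt
printf '%s\n' 'Lab: July 9, 2003' > archive/class2.txt
printf '%s\n' 'Draft: July 1, 2003' > archive/draft.doc

# Search everything below the current directory except *.doc files.
no_doc=$(grep -ri --exclude='*.doc' 'july.*2003' . | LC_ALL=C sort)
# Search only files named class*.txt, at any depth.
only_class=$(grep -ri --include='class*.txt' 'july.*2003' . | LC_ALL=C sort)

printf '%s\n' "$no_doc"

cd / && rm -rf "$tmpdir"
```

Passing the directory "." and letting grep filter file names itself lets the pattern apply at every depth, whereas a wildcard on the command line is expanded by the shell in the current directory only.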
The "-l" option (lower case "L") lists only the names of the files that contain a line matching the query. The "-L" flag lists only the names of the files that do not contain such a line.
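As a brief sketch, the two options can be tried on a pair of files, one with a matching line and one without. The file names and contents are invented, and "-L" is assumed to be available, as it is in GNU and BSD grep.

```shell
# Hypothetical sample files: only class1.txt mentions July 2003.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' 'Exam: July 30, 2003' > class1.txt
printf '%s\n' 'Review: August 2, 2003' > class2.txt

# -l: names of files that contain at least one matching line.
with_match=$(grep -il 'july.*2003' class*.txt)
# -L: names of files that contain no matching line.
without_match=$(grep -iL 'july.*2003' class*.txt)

echo "with match: $with_match"
echo "without match: $without_match"

cd / && rm -rf "$tmpdir"
```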
Some versions of grep, such as egrep (equivalent to grep -E), can process extended regular expressions with disjunctions (the logical 'or'). For example,
egrep 'Br(ow|au)n' guests.txt
will retrieve any lines that contain either "Brown" or "Braun". The vertical bar '|' and parentheses are used to list the alternative substrings.
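The same alternation can be sketched with grep -E, which accepts the same extended syntax as egrep. The guest names below are made up for illustration.

```shell
# Hypothetical sample data with three similar surnames.
tmpdir=$(mktemp -d)
printf '%s\n' 'Charlie Brown' 'Karl Braun' 'Dan Brawn' > "$tmpdir/guests.txt"

# Match "Br", then either "ow" or "au", then "n".
found=$(grep -E 'Br(ow|au)n' "$tmpdir/guests.txt")
printf '%s\n' "$found"

rm -rf "$tmpdir"
```

"Dan Brawn" is not printed, since "aw" is not one of the listed alternatives.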
The command sed is frequently used in combination with grep, as it allows the modification of the selected lines.
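One possible way to combine the two is a pipeline in which grep selects the lines and sed rewrites them. The sketch below abbreviates the first name on each selected line; the sample data and the particular substitution are assumptions chosen for illustration.

```shell
# Hypothetical sample data: one guest per line.
tmpdir=$(mktemp -d)
printf '%s\n' 'Elvis Aaron Travolta' 'Elvis Presley' > "$tmpdir/guests.txt"

# grep selects the matching lines; sed then replaces the leading
# "Elvis " with "E. " on each line it receives.
short=$(grep 'Elvis .* Travolta' "$tmpdir/guests.txt" | sed 's/^Elvis /E. /')
printf '%s\n' "$short"

rm -rf "$tmpdir"
```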