7.1 Linux Advanced Text Processing Tools
Learn advanced Linux commands

Regular expressions (regexpr)

Regular expressions are used for "pattern" matching in search, replace, etc. They are often used with utilities (e.g., grep, sed) and programming languages (e.g., perl). File name patterns, as given to the shell command dir, use a slightly modified flavour of regular expressions (the two main differences are noted below). This brief writeup covers almost all the features of standard regular expressions. They are not as complicated as they might seem at first, and are definitely worth a closer look.

In regular expressions, most characters just match themselves. So to search for the string "peter", I would just use the search string "peter". The exceptions are so-called "special characters" ("metacharacters"), which have special meaning.
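As a small illustration of characters matching themselves, here is a sketch that feeds a few made-up lines to grep. The sample text and the variable name are assumptions chosen for this example only.

```shell
# The three input lines are invented sample data.
# "peter" contains no special characters, so it matches itself literally.
matches=$(printf 'peter piper\npetra\nsaint peter\n' | grep 'peter')
printf '%s\n' "$matches"
# prints:
# peter piper
# saint peter
```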

The regexpr special characters are: "\" (backslash), "." (dot), "*" (asterisk), "[" (bracket), "^" (caret, special only at the beginning of a string), "$" (dollar sign, special only at the end of a string). A character terminating a pattern string is also special for this string.

The backslash, "\", is used as an "escape" character, i.e., to quote a subsequent special character.
Thus, "\\" searches for a backslash, "\." searches for a dot, "\*" searches for the asterisk, "\[" searches for the bracket, "\^" searches for the caret even at the beginning of the string, "\$" searches for the dollar sign even at the end of the string.
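The effect of escaping can be sketched with the dot as an example. The two input lines and the variable names below are made up for illustration.

```shell
# Invented sample data: one line with a literal dot, one without.
# Unescaped dot: matches any single character, so both lines match.
any=$(printf 'a.b\naxb\n' | grep -c 'a.b')
# Escaped dot: only the line containing a literal "." matches.
literal=$(printf 'a.b\naxb\n' | grep 'a\.b')
echo "unescaped pattern matched $any lines; escaped pattern matched: $literal"
# prints: unescaped pattern matched 2 lines; escaped pattern matched: a.b
```

Note the single quotes around the pattern: they keep the shell itself from interpreting the backslash before grep sees it.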

Backslash followed by a regular (non-special) character may gain a special meaning. Thus, the symbols \< and \> match an empty string at the beginning and the end of a word, respectively. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word.

The dot, ".", matches any single character. [The dir command uses "?" in this place.] Thus, "m.a" matches "mpa" and "mea" but not "ma" or "mppa".
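The word-edge symbols and the dot can be tried out as follows. This is a sketch that assumes GNU grep, since \<, \>, \b and \B are GNU extensions rather than part of the POSIX basic syntax. The word lists and variable names are invented for the example.

```shell
# Invented sample words.
words='cat\nconcat\ncatalog\n'

whole_word=$(printf "$words" | grep '\<cat\>')   # "cat" as a complete word
word_start=$(printf "$words" | grep '\<cat')     # "cat" at the start of a word
word_end=$(printf "$words" | grep 'cat\>')       # "cat" at the end of a word
inner=$(printf "$words" | grep '\Bcat')          # "cat" not at the start of a word

# The dot stands for exactly one character.
dot_matches=$(printf 'mpa\nmea\nma\nmppa\n' | grep 'm.a')

printf '%s\n' "$whole_word"    # cat
printf '%s\n' "$inner"         # concat
printf '%s\n' "$dot_matches"   # mpa, then mea
```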

Any string is matched by ".*" (dot and asterisk). [The dir command uses "*" instead.] In general, any pattern followed by "*" matches zero or more occurrences of this pattern. Thus, "m*" matches zero or more occurrences of "m". To search for one or more "m", I could use "mm*".
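The difference between "zero or more" and "one or more" is easy to miss, so here is a short sketch that counts matching lines with grep -c. The input lines and variable names are assumptions made for this example.

```shell
# Invented sample data: lines with zero, one and two "m" characters.
# "m*" also matches zero occurrences, so every line matches.
zero_or_more=$(printf 'a\nma\nmma\n' | grep -c 'm*')
# "mm*" needs at least one "m", so the line "a" is left out.
one_or_more=$(printf 'a\nma\nmma\n' | grep -c 'mm*')
# ".*" matches any string, including an empty line.
anything=$(printf 'a\n\nb\n' | grep -c '.*')

echo "$zero_or_more $one_or_more $anything"
# prints: 3 2 3
```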

The * is a repetition operator. Other repetition operators are used less often--here is the full list:
*         the preceding item is to be matched zero or more times;

\+        the preceding item is to be matched one or more times;

\?        the preceding item is optional and matched at most once;

\{n\}     the preceding item is to be matched exactly n times;

\{n,\}    the preceding item is to be matched n or more times;

\{n,m\}   the preceding item is to be matched at least n times, but not more than m times.
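The repetition operators listed above can be compared side by side on the same input. This sketch assumes GNU grep in its default (basic) mode, where \+ and \? are GNU extensions and the closing brace of an interval is also written with a backslash. The sample words and variable names are made up.

```shell
# Invented sample data: "c", then zero to three "a", then "t".
words='ct\ncat\ncaat\ncaaat\n'

plus=$(printf "$words" | grep 'ca\+t')          # one or more "a"
optional=$(printf "$words" | grep 'ca\?t')      # zero or one "a"
exactly2=$(printf "$words" | grep 'ca\{2\}t')   # exactly two "a"
min2=$(printf "$words" | grep 'ca\{2,\}t')      # two or more "a"
range=$(printf "$words" | grep 'ca\{1,2\}t')    # one or two "a"

echo $plus       # cat caat caaat
echo $optional   # ct cat
echo $exactly2   # caat
echo $min2       # caat caaat
echo $range      # cat caat
```

The unquoted echo is used here only to print each result on a single line.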
The caret, "^", means "the beginning of the line". So "^a" means "find a line starting with an 'a'".

The dollar sign, "$", means "the end of the line". So "a$" means "find a line ending with an 'a'".

Example. This command searches the file myfile for lines starting with an "s" and ending with an "n", and prints them to the standard output (screen):

grep '^s.*n$' myfile
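To try this pattern without an existing myfile, the following sketch writes a few invented lines to a temporary file first. The file contents and the variable names are assumptions for illustration only.

```shell
# Create a throwaway file with made-up sample lines.
tmp=$(mktemp)
printf 'sun\nsoon\nsnow\nmoon\nspin cycle\n' > "$tmp"

# Keep only lines that start with "s" and end with "n".
matches=$(grep '^s.*n$' "$tmp")
rm -f "$tmp"

printf '%s\n' "$matches"
# prints:
# sun
# soon
```

"snow" and "spin cycle" start with "s" but do not end with "n", and "moon" ends with "n" but does not start with "s", so none of them is printed.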

Any character terminating the pattern string (for example, the "/" that delimits a pattern in sed) is special. Precede it with a backslash if you want to use it within this string.
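A common place where this matters is the substitute command of sed, where the pattern is terminated by the delimiter chosen after "s". The sketch below uses an invented path as sample input and shows two ways of handling a "/" inside the pattern.

```shell
# With "/" as the delimiter, every "/" inside the pattern or the
# replacement must be escaped with a backslash.
escaped=$(printf '/usr/bin\n' | sed 's/\/usr\/bin/\/opt/')

# sed accepts other delimiters; with "|" the slashes need no escaping.
other_delim=$(printf '/usr/bin\n' | sed 's|/usr/bin|/opt|')

echo "$escaped $other_delim"
# prints: /opt /opt
```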

The bracket, "[", introduces a set. Thus [abD] means: either a or b or D. [a-zA-C] means any character from a to z or from A to C.
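Both sets from the paragraph above can be checked against single-character lines. This is a sketch with invented input. It sets LC_ALL=C as a precaution, on the assumption that ranges such as a-z may otherwise be interpreted according to the current locale.

```shell
# Use the C locale so that ranges follow plain ASCII order.
export LC_ALL=C

# Invented sample data: one character per line.
chars='a\nb\nD\nd\nB\nE\n'

# Exactly one character out of a, b or D.
small_set=$(printf "$chars" | grep '^[abD]$')
# Exactly one character from a to z or from A to C.
ranges=$(printf "$chars" | grep '^[a-zA-C]$')

echo $small_set   # a b D
echo $ranges      # a b d B
```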

Take care with some characters inside sets. Within a set, the only special characters are "[", "]", "-", and "^", and the combinations "[:", "[=", and "[.". The backslash is not special within a set.
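A few consequences of these rules are sketched below: a "]" placed first in a set is taken literally, a backslash in a set stands for itself, and a "^" placed first negates the set. The sample lines and variable names are made up for the example.

```shell
# Invented sample data. printf '%s\n' prints each argument unchanged.
lines=$(printf '%s\n' 'a]b' 'a-b' 'a\b' 'ab')

# "]" directly after the opening bracket is an ordinary member of the set.
closing=$(printf '%s\n' "$lines" | grep '[]]')
# The backslash is not an escape inside a set; this set contains just "\".
backslash=$(printf '%s\n' "$lines" | grep '[\]')
# "^" directly after the opening bracket negates the set:
# lines made up entirely of non-digits.
no_digits=$(printf 'abc\n123\n' | grep '^[^0-9]*$')

printf '%s\n' "$closing"     # a]b
printf '%s\n' "$backslash"   # a\b
printf '%s\n' "$no_digits"   # abc
```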

Useful categories of characters are (as defined by the POSIX standard):
[:upper:]    upper-case letters;
[:lower:]    lower-case letters;
[:alpha:]    alphabetic (letters), meaning upper+lower;
[:digit:]    0 to 9;
[:alnum:]    alphanumeric, meaning alpha+digits;
[:space:]    whitespace, meaning <Space>+<Tab>+<Newline> and similar;
[:graph:]    graphically printable characters except space;
[:print:]    printable characters including space;
[:punct:]    punctuation characters, meaning graphical characters minus alpha and digits;
[:cntrl:]    control characters, meaning non-printable characters;
[:xdigit:]   characters that are hexadecimal digits.
A category name is used inside a set, so a single digit is written [[:digit:]].

Example. This command scans the output of the dir command, and prints lines containing a capital letter followed by a digit:

dir -l | grep '[[:upper:]][[:digit:]]'
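Because the output of dir depends on the current directory, the same pattern can be tried on fixed input instead. The lines below are invented stand-ins for a directory listing, and the variable name is an assumption.

```shell
# Made-up sample lines standing in for directory listing output.
matches=$(printf 'file A1.txt\nfile a1.txt\nreport B.txt\nX9\n' \
  | grep '[[:upper:]][[:digit:]]')

printf '%s\n' "$matches"
# prints:
# file A1.txt
# X9
```

"file a1.txt" has a digit after a lower-case letter, and "report B.txt" has a capital letter that is not followed by a digit, so neither line matches.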

©2015 About.com. All rights reserved.