
Greetings, guests!

In today's article I want to touch on a huge topic: regular expressions. I think everyone understands that the topic of regexes (as regular expressions are called in slang) is far too large to fit into a single post. Therefore, I will try to gather my thoughts briefly, but as clearly as possible, and convey them to you.

To begin with, there are several varieties of regular expressions:

1. Traditional regular expressions (also known as basic regular expressions, BRE)

  • The syntax of these expressions is considered obsolete, but it is still widespread and is used by many UNIX utilities.
  • Basic regular expressions include the following metacharacters (more on their meanings below):
    • . [ ] [^ ] ^ $ *
    • \{ \} - the original form of { } (in the extended syntax)
    • \( \) - the original form of ( ) (in the extended syntax)
    • \n, where n is a number from 1 to 9
  • Features of using these metacharacters:
    • The asterisk must come after an expression that matches a single character. Example: [xyz]*.
    • The expression \(block\)* should be considered invalid. In some implementations it matches zero or more repetitions of the string block; in others it matches the string block*.
    • Within a character class, the special meanings of characters are generally ignored. Special cases:
      • To add a ^ character to a set, it must not be placed first.
      • To add a - character to a set, it must be placed first or last. For example:
        • a DNS name template, which can include letters, digits, minus and the dot delimiter: [-0-9a-zA-Z.];
        • any character except minus and digit: [^-0-9].
      • To add a [ or ] character to a set, it must be placed first. For example:
        • [][ab] matches ], [, a or b.

2. Extended regular expressions (ERE)

  • The syntax of these expressions is similar to that of basic expressions, except that:
    • The backslashes before the metacharacters { } and ( ) are no longer needed.
    • A backslash before a metacharacter cancels its special meaning.
    • The theoretically non-regular construct \n was dropped.
    • The metacharacters +, ? and | were added.

3. Perl-compatible regular expressions (PCRE)

  • They have a richer yet predictable syntax than even POSIX ERE, and are therefore widely used by applications.
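To make the difference between the first two dialects concrete, here is a minimal sketch. It assumes a grep that follows POSIX (plain grep for BRE, grep -E for ERE); the three-line sample and the variable names are made up for illustration.

```shell
# Hypothetical sample: two candidate lines plus one with literal braces.
sample='aa
ab
a{2}'

# BRE: the interval needs backslashes; bare braces are ordinary characters.
bre_interval=$(printf '%s\n' "$sample" | grep '^a\{2\}$')
bre_literal=$(printf '%s\n' "$sample" | grep '^a{2}$')

# ERE: bare braces form the interval; a backslash makes them literal.
ere_interval=$(printf '%s\n' "$sample" | grep -E '^a{2}$')
ere_literal=$(printf '%s\n' "$sample" | grep -E '^a\{2\}$')

echo "BRE: $bre_interval / $bre_literal"   # BRE: aa / a{2}
echo "ERE: $ere_interval / $ere_literal"   # ERE: aa / a{2}
```

The same two lines are selected in both dialects; only the role of the backslash is swapped.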

Regular expressions consist of patterns, or rather, they define a search pattern. A pattern consists of search rules, which are made up of characters and metacharacters.

Search rules are defined by the following operations:

Alternation |

The vertical bar (|) separates the allowed alternatives; one can call it a logical OR. For example, "gray|grey" matches gray or grey.

Grouping ( )

Parentheses are used to define the scope and precedence of operators. For example, "gray|grey" and "gr(a|e)y" are different patterns, but both describe the set containing gray and grey.
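As a quick check of the claim that both patterns describe the same set, here is a small sketch using grep -E. The word list, including the near miss "gruy", is a made-up sample.

```shell
# Hypothetical word list: two spellings plus one near miss.
words='gray
grey
gruy'

# Alternation between two whole words, and the same set via a group.
by_alternation=$(printf '%s\n' "$words" | grep -E '^(gray|grey)$')
by_grouping=$(printf '%s\n' "$words" | grep -E '^gr(a|e)y$')

if [ "$by_alternation" = "$by_grouping" ]; then
    echo "both patterns select: $(echo $by_grouping)"
fi
# prints: both patterns select: gray grey
```

The parentheses in the first pattern also matter for the anchors: without them, ^ would apply only to the first alternative and $ only to the second.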

Quantification {} ? * +

A quantifier after a character or group determines how many times the preceding expression may occur.

{m,n} the general form; from m to n repetitions inclusive.

{m,} the general form; m or more repetitions.

{,n} the general form; no more than n repetitions.

{n} exactly n repetitions.

? The question mark means 0 or 1 times, the same as {0,1}. For example, "colou?r" matches both color and colour.

* The asterisk means 0, 1 or any number of times ({0,}). For example, "go*gle" matches ggle, gogle, google and so on.

+ The plus means at least 1 time ({1,}). For example, "go+gle" matches gogle, google and so on (but not ggle).

The exact syntax of these regular expressions depends on the implementation (for example, in basic regular expressions the characters { and } are escaped with a backslash).
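The quantifiers above can be tried out with a small helper. This is only a sketch: the function name full_matches is invented here, and it relies on grep -E with -x (whole-line match) and on paste to join the results.

```shell
# Print the candidate words that an ERE matches in full, space-separated.
full_matches() {
    pattern=$1
    shift
    printf '%s\n' "$@" | grep -Ex -e "$pattern" | paste -sd ' ' -
}

full_matches 'colou?r' color colour colouur               # color colour
full_matches 'go*gle' ggle gogle google                   # ggle gogle google
full_matches 'go+gle' ggle gogle google                   # gogle google
full_matches 'go{2,3}gle' gogle google gooogle goooogle   # google gooogle
```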

Metacharacters, in plain language, are characters that do not stand for themselves: the character . (dot) is not a dot but any single character, and so on. Please familiarize yourself with the metacharacters and their meanings:

. Matches any single character.
[something] Matches any single character from those enclosed in the brackets. The character "-" is interpreted literally only if it comes immediately after the opening bracket or immediately before the closing one: [abc-] or [-abc]. Otherwise it denotes a character range. For example, [abc] matches "a", "b", or "c"; [a-z] matches the lowercase letters of the Latin alphabet. These notations can be combined: [abcq-z] matches a, b, c, q, r, s, t, u, v, w, x, y, z. To match the characters "[" or "]", it is enough for the closing bracket to be the first character after the opening one: [][ab] matches "]", "[", "a", or "b".
[^something] Matches any single character that is not among those in the brackets. For example, [^abc] matches any character other than "a", "b", or "c"; [^a-z] matches any character except the lowercase letters of the Latin alphabet.
^ Matches the beginning of the text (or the beginning of any line in multiline mode).
$ Matches the end of the text (or the end of any line in multiline mode).
\(\) or () Declares a "marked subexpression" (a group) that can be referred to later (see the next entry, \n). A "marked subexpression" is also called a "block". Unlike the other operators, this one requires backslashes in the traditional syntax; in the extended and Perl syntax the backslash is not needed.
\n Where n is a digit from 1 to 9; matches the n-th marked subexpression (for example, \(abcd\)\1 matches abcdabcd). This construct is theoretically non-regular and was not adopted in the extended regular expression syntax.
*
  • An asterisk after an expression that matches a single character matches zero or more copies of that (preceding) expression. For example, "[xyz]*" matches the empty string, "x", "y", "zx", "zyx", and so on.
  • \n*, where n is a digit from 1 to 9, matches zero or more occurrences of the n-th marked subexpression. For example, "\(a.\)c\1*" matches "abcab" and "abcabab" in full, whereas in "abcac" it matches only the leading "abc", because "ac" is not a copy of the captured "ab".
  • An expression enclosed in "\(" and "\)" and followed by "*" should be considered invalid. In some implementations it matches zero or more occurrences of the parenthesized string; in others it matches the parenthesized expression followed by a literal "*" character.
\{x,y\} Matches the last (preceding) block occurring at least x and at most y times. For example, "a\{3,5\}" matches "aaa", "aaaa", or "aaaaa". Unlike the other operators, this one requires backslashes in the traditional syntax.
.* Denotes any number of any characters between two parts of a regular expression.
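A few of the entries above can be verified directly with grep in its default BRE mode. The helper name bre_full is invented for this sketch; it uses -x so that the whole line has to match, and it assumes a grep that supports backreferences in BRE (as POSIX requires).

```shell
# Report whether a whole line matches a basic regular expression (BRE).
bre_full() {
    if printf '%s\n' "$2" | grep -qx -e "$1"; then
        echo match
    else
        echo "no match"
    fi
}

bre_full '[][ab]' ']'            # match: "]" placed first is literal
bre_full '[-abc]' '-'            # match: "-" placed first is literal
bre_full '[^abc]' 'a'            # no match: the set is negated
bre_full '\(a.\)c\1*' 'abcab'    # match: \1 repeats the captured "ab"
bre_full '\(a.\)c\1*' 'abcac'    # no match: "ac" is not a copy of "ab"
bre_full 'a\{3,5\}' 'aaaa'       # match: between 3 and 5 letters "a"
```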

Metacharacters let us describe all kinds of matches. But how can a metacharacter be used as an ordinary character, so that, say, [ (square bracket) means just a square bracket? Simple:

  • the metacharacter (. * + \ ? ( ) and so on) must first be escaped, that is, preceded by a backslash. For example: \. or \[

To make some character sets easier to specify, they were combined into so-called character classes and categories. POSIX has standardized the declaration of certain character classes and categories, as shown in the following table:

POSIX class   similar to          meaning
[:upper:]     [A-Z]               uppercase characters
[:lower:]     [a-z]               lowercase characters
[:alpha:]     [A-Za-z]            uppercase and lowercase characters
[:alnum:]     [A-Za-z0-9]         digits, uppercase and lowercase characters
[:digit:]     [0-9]               digits
[:xdigit:]    [0-9A-Fa-f]         hexadecimal digits
[:punct:]     [.,!?:…]            punctuation marks
[:blank:]     [ \t]               space and TAB
[:space:]     [ \t\n\r\f\v]       whitespace characters
[:cntrl:]                         control characters
[:graph:]     [^ \t\n\r\f\v]      printable characters
[:print:]     [^\t\n\r\f\v]       printable characters and whitespace
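Note that a POSIX class name is only valid inside a bracket expression, so in a pattern it is written with double brackets, for example [[:digit:]]. The sketch below, with an invented helper all_in_class and made-up sample strings, checks whether a whole string consists of characters from one class.

```shell
# Succeeds when the whole string consists of characters from one POSIX class.
# The class name (digit, alpha, ...) is passed without brackets or colons.
all_in_class() {
    printf '%s\n' "$2" | grep -Eqx "[[:$1:]]+"
}

for sample in 2024 7fA0 abc123; do
    for class in digit xdigit alpha alnum; do
        if all_in_class "$class" "$sample"; then
            echo "$sample is all [[:$class:]]"
        fi
    done
done
```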

Regular expressions have a property known as:

Regex greediness

I will try to describe it as clearly as possible. Let's say we want to find all HTML tags in some text. Narrowing the problem down, we want to find everything between < and >, together with those angle brackets. But tags have different lengths, and there are at least 50 different tags. Listing them all and wrapping them in metacharacters is too laborious. However, we know the expression .* (dot asterisk), which stands for any number of any characters in a string. Using this expression, let's try to find, in the text

<p>So, How to create RAID level 10/50 on the LSI MegaRAID controller (also relevant for: Intel SRCU42x, Intel SRCS16):</p>

everything between < and >. As a result, the ENTIRE line will match this expression. Why? Because the regex is greedy and tries to capture as MANY characters as possible between < and >, so the entire line, starting with <p>So... and ending with ...>, falls under this rule!

I hope the example makes it clear what greediness is. To get rid of it, you can do one of the following:

  • match only the characters that do NOT end the desired pattern (for example, <[^>]*> for the case above)
  • turn off greediness by marking the quantifier as non-greedy (supported in PCRE, but not in POSIX BRE/ERE):
    • *? - the "non-greedy" ("lazy") equivalent of *
    • +? - the "non-greedy" ("lazy") equivalent of +
    • {n,}? - the "non-greedy" ("lazy") equivalent of {n,}
    • .*? - the "non-greedy" ("lazy") equivalent of .*
I would like to supplement all of the above with the extended regular expression syntax:

Extended regular expressions in POSIX are similar to the traditional Unix syntax, but with some metacharacters added:

+ The plus indicates that the preceding character or group may repeat one or more times. Unlike the asterisk, at least one repetition is required.

? The question mark makes the preceding character or group optional. In other words, in the matching string it may be absent or present exactly once.

| The vertical bar separates alternative variants of a regular expression. One bar gives two alternatives, but there may be more; just use more vertical bars. Remember that this operator takes in as much of the expression as possible. For this reason, the alternation operator is most often used inside parentheses.

The backslashes are also no longer needed: \{…\} becomes {…} and \(…\) becomes (…).

At the end of the post, here are some examples of using regexes:

$ cat text1
1 apple
2 pear
3 banana
$ grep p text1
1 apple
2 pear
$ grep "pp*" text1
1 apple
2 pear
$ cat text1 | grep "l\|n"
1 apple
3 banana
$ echo -e "find an\n* here" | grep "\*"
* here
$ grep "pl\?.*r" text1 # p, optionally followed by l, on lines that contain r further on
2 pear
$ grep "a.." text1 # lines with a followed by at least 2 characters
1 apple
3 banana
$ grep "[3p]" text1 # search for lines containing 3 or p
1 apple
2 pear
3 banana
$ echo -e "find an\n* here\nsomewhere." | grep "[.*]"
* here
somewhere.
$ echo -e "123\n456\n789\n0" | grep "[1-9]"
123
456
789
$ sed -e "/\(a.*a\)\|\(p.*p\)/s/a/A/g" text1 # replace a with A in all lines where an a is followed by another a, or a p by another p
1 Apple
2 pear
3 bAnAnA
*\./ LAST WORD./g"
First. A LAST WORD. This is a LAST WORD.

Sincerely, Mc.Sim!

In order to fully process texts in bash scripts with sed and awk, you just need to understand regular expressions. Implementations of this most useful tool can be found literally everywhere, and although all regular expressions are arranged in a similar way, based on the same ideas, working with them has certain features in different environments. Here we will talk about regular expressions that are suitable for use in Linux command line scripts.

This material is intended as an introduction to regular expressions for those who may not know what regular expressions are. Therefore, let's start from the very beginning.

What are regular expressions

For many, when they first see regular expressions, the thought immediately arises that they are looking at a meaningless jumble of characters. But this, of course, is far from the case. Take a look at this regex, for example:

^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9_.-]+)\.([a-zA-Z]{2,5})$

In our opinion, even an absolute beginner will immediately understand how it works and why it is needed :) If you don’t quite understand, just read on and everything will fall into place.
A regular expression is a pattern that programs like sed or awk use to filter text. Templates use regular ASCII characters that represent themselves, and so-called metacharacters that play a special role, for example, allowing you to refer to certain groups of characters.

Regular expression types

Implementations of regular expressions in various environments, for example in programming languages such as Java, Perl, and Python, and in Linux tools like sed, awk, and grep, have their own peculiarities. These peculiarities depend on the so-called regular expression engines, which interpret the patterns.
Linux has two regular expression engines:
  • An engine that supports the POSIX Basic Regular Expression (BRE) standard.
  • An engine that supports the POSIX Extended Regular Expression (ERE) standard.
Most Linux utilities conform to at least the POSIX BRE standard, but some utilities (including sed) only understand a subset of the BRE standard. One of the reasons for this limitation is the desire to make such utilities as fast as possible at text processing.

The POSIX ERE standard is often implemented in programming languages. It allows you to use a lot of tools when developing regular expressions. For example, these can be special character sequences for frequently used patterns, such as searching for individual words or sets of numbers in the text. Awk supports the ERE standard.

There are many ways to develop regular expressions, depending on the opinion of the programmer, and on the features of the engine under which they are created. It's not easy to write generic regular expressions that any engine can understand. Therefore, we will focus on the most commonly used regular expressions and look at the specifics of their implementation for sed and awk.

POSIX BRE regular expressions

Perhaps the simplest BRE pattern is a regular expression for finding an exact match of a sequence of characters in text. This is what searching for a string looks like in sed and awk:

$ echo "This is a test" | sed -n "/test/p"
$ echo "This is a test" | awk '/test/{print $0}'

Finding text by pattern in sed


Finding text by pattern in awk

You may notice that the search for a given pattern is performed without taking into account the exact location of the text in the string. In addition, the number of occurrences does not matter. After the regular expression finds the given text anywhere in the string, the string is considered suitable and is passed for further processing.

When working with regular expressions, keep in mind that they are case sensitive:

$ echo "This is a test" | awk '/Test/{print $0}'
$ echo "This is a test" | awk '/test/{print $0}'

Regular expressions are case sensitive

The first regular expression did not find any matches, since the word "Test", beginning with a capital letter, does not occur in the text. The second, configured to search for the word written in lowercase letters, found a suitable string in the stream.

In regular expressions, you can use not only letters, but also spaces and numbers:

$ echo "This is a test 2 again" | awk '/test 2/{print $0}'

Finding a piece of text containing spaces and numbers

Spaces are treated by the regular expression engine as regular characters.

Special symbols

When using different characters in regular expressions, there are a few things to keep in mind. For example, there are some special characters, or metacharacters, that require a special approach when used in a template. Here they are:

.*[]^${}\+?|()
If one of these is needed in the pattern as an ordinary character, it must be escaped with a backslash: \ .

For example, if you need to find a dollar sign in the text, it must be included in the template, preceded by an escape character. Let's say there is a file myfile with the following text:

There is 10$ on my pocket
The dollar sign can be detected with a pattern like this:

$ awk '/\$/{print $0}' myfile

Using a special character in a template

In addition, the backslash is also a special character, so if you want to use it in a pattern, you also need to escape it. It looks like two backslashes following each other:

$ echo "\ is a special character" | awk '/\\/{print $0}'

Backslash escaping

Although the forward slash is not in the above list of special characters, attempting to use it in a regular expression written for sed or awk will result in an error:

$ echo "3 / 2" | awk '///{print $0}'

Incorrect use of a forward slash in a template

If it is needed, it must also be escaped:

$ echo "3 / 2" | awk '/\//{print $0}'

Escaping a forward slash
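The three escapes above can be wrapped into small helpers for experimenting. The function names are invented for this sketch. Note that the awk programs are in single quotes: inside double quotes the shell itself would expand $0 and consume backslashes before awk ever sees them.

```shell
# Each helper prints "yes" if the input line contains the literal character.
has_dollar()    { printf '%s\n' "$1" | awk '/\$/ { print "yes" }'; }
has_backslash() { printf '%s\n' "$1" | awk '/\\/ { print "yes" }'; }
has_slash()     { printf '%s\n' "$1" | awk '/\// { print "yes" }'; }

has_dollar 'There is 10$ on my pocket'      # yes
has_backslash '\ is a special character'    # yes
has_slash '3 / 2'                           # yes
has_slash '3 * 2'                           # prints nothing
```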

Anchor symbols

There are two special characters for anchoring a pattern to the beginning or end of a text line. The caret, ^, lets you describe sequences of characters located at the beginning of a line. If the pattern you are looking for appears elsewhere in the line, the regular expression will not react to it. The use of this character looks like this:

$ echo "welcome to likegeeks website" | awk '/^likegeeks/{print $0}'
$ echo "likegeeks website" | awk '/^likegeeks/{print $0}'

Search for a pattern at the beginning of a string

The ^ symbol is designed to search for a pattern at the beginning of a line, while the case of characters is also taken into account. Let's see how this will affect the processing of a text file:

$ awk '/^this/{print $0}' myfile


Search for a pattern at the beginning of a line in text from a file

When using sed, if you place the caret anywhere other than the start of the pattern, it will be treated like any other ordinary character:

$ echo "This ^ is a test" | sed -n "/s ^/p"

A caret not at the start of a pattern in sed

In awk, when using the same pattern, this character must be escaped:

$ echo "This ^ is a test" | awk '/s \^/{print $0}'

A caret not at the start of a pattern in awk

We have figured out how to search for text fragments at the beginning of a line. What if you need to find something at the end of a line?

The dollar sign - $ , which is the anchor character for the end of the line, will help us with this:

$ echo "This is a test" | awk '/test$/{print $0}'

Finding text at the end of a line

Both anchor characters can be used in the same pattern. Let's process the file myfile , the contents of which are shown in the figure below, using the following regular expression:

$ awk '/^this is a test$/{print $0}' myfile


A pattern that uses special characters for the beginning and end of a string

As you can see, the template reacted only to a string that fully corresponded to the given sequence of characters and their location.

Here's how to filter out empty lines using anchor characters:

$ awk '!/^$/{print $0}' myfile
In this pattern I used the negation symbol, the exclamation mark: ! . The pattern itself finds lines that contain nothing between the beginning and the end of the line, and thanks to the exclamation mark only the lines that do not match this pattern are printed.
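Both anchor techniques can be tried without a file. The sketch below uses a made-up four-line text (with one empty line) held in a shell variable; the variable names are hypothetical.

```shell
# Hypothetical input: the second line is empty.
text='this is a test

This is another test
this is a test again'

# Both anchors: only a line that is exactly "this is a test" matches.
exact=$(printf '%s\n' "$text" | awk '/^this is a test$/')

# Negated empty-line pattern: count the lines that are not empty.
non_empty=$(printf '%s\n' "$text" | awk '!/^$/ { n++ } END { print n }')

echo "$exact"       # this is a test
echo "$non_empty"   # 3
```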

Dot character

The dot is used to match any single character except the newline character. Let's apply such a regular expression to the file myfile, whose contents are shown below:

$ awk '/.st/{print $0}' myfile


Using dot in regular expressions

As can be seen from the output, only the first two lines of the file match the pattern, since they contain the sequence of characters "st" preceded by another character. The third line does not contain a suitable sequence, and the fourth does contain "st", but at the very beginning of the line, with no character before it.
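Since the contents of myfile are not reproduced in the text, the sketch below assumes four lines that are consistent with the description above; the sample and the variable names are hypothetical.

```shell
# Assumed stand-in for myfile (the real contents are not shown in the text).
sample='this is a test
This is another test
And this is one more
start with this'

# ".st" needs some character in front of "st", so "start" alone is not enough.
dot_hits=$(printf '%s\n' "$sample" | awk '/.st/')

echo "$dot_hits"
# this is a test
# This is another test
```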

Character classes

A dot matches any single character, but what if you want to limit the set of characters you're looking for more flexibly? In such a situation, you can use character classes.

Thanks to this approach, you can search for any character from a given set. Square brackets, [], are used to describe a character class:

$ awk '/[oi]th/{print $0}' myfile


Description of a character class in a regular expression

Here we are looking for a sequence of characters "th" preceded by the character "o" or the character "i".

Classes come in handy when looking for words that can start with either an uppercase or lowercase letter:

$ echo "this is a test" | awk '/[Tt]his is a test/{print $0}'
$ echo "This is a test" | awk '/[Tt]his is a test/{print $0}'

Search for words that may start with a lowercase or uppercase letter

Character classes are not limited to letters. Other characters can be used here as well. It is impossible to say in advance in what situation the classes will be needed - it all depends on the problem being solved.

Negating character classes

Character classes can also be used to solve the reverse of the problem described above. Namely, instead of searching for characters included in the class, you can search for everything that is not included in it. To get this behavior, put a ^ sign in front of the list of class characters. It looks like this:

$ awk '/[^oi]th/{print $0}' myfile


Search for characters not in a class

In this case, sequences of "th" characters that are preceded by some character other than "o" or "i" will be found.
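Here is a sketch of a class and its negation side by side. The four sample lines are an assumed stand-in for myfile, whose real contents are not shown in the text.

```shell
# Assumed stand-in for myfile.
sample='this is a test
This is another test
And this is one more
start with this'

# "th" preceded by "o" or "i": an-oth-er, w-ith.
in_class=$(printf '%s\n' "$sample" | awk '/[oi]th/')

# "th" preceded by any other character (here: a space before "this").
negated=$(printf '%s\n' "$sample" | awk '/[^oi]th/')

# A class for the first letter, anchored to the start of the line.
either_case=$(printf '%s\n' "$sample" | awk '/^[Tt]his/')

printf '%s\n' "$in_class" "--" "$negated" "--" "$either_case"
```

A negated class still has to match one character, which is why a "th" at the very start of a line is not found by [^oi]th.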

Character ranges

In character classes, you can describe ranges of characters using dashes:

$ awk '/[e-p]st/{print $0}' myfile


Describing a range of characters in a character class

In this example the regular expression matches the character sequence "st" preceded by any character that lies, in alphabetical order, between the characters "e" and "p".

Ranges can also be created from numbers:

$ echo "123" | awk '/[0-9][0-9][0-9]/'
$ echo "12a" | awk '/[0-9][0-9][0-9]/'

A regular expression that finds any three digits

A character class can contain multiple ranges:

$ awk '/[a-fm-z]st/{print $0}' myfile


Character class consisting of multiple ranges

This regular expression will match all "st" sequences preceded by characters from ranges a-f and m-z .
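The ranges discussed in this section can be compared on a small made-up word list. The names below are hypothetical, and the last helper adds anchors so that "three digits" means exactly three digits and nothing else.

```shell
# Hypothetical word list; each word has a different letter before "st".
words='test
fast
list
must'

single_range=$(printf '%s\n' "$words" | awk '/[e-p]st/')
multi_range=$(printf '%s\n' "$words" | awk '/[a-fm-z]st/')

echo $single_range   # test list
echo $multi_range    # test fast must

# Exit status 0 only if the whole line is exactly three digits.
is_three_digits() {
    printf '%s\n' "$1" | awk '/^[0-9][0-9][0-9]$/ { ok = 1 } END { exit !ok }'
}

is_three_digits 123 && echo "123: three digits"
is_three_digits 12a || echo "12a: rejected"
is_three_digits 1234 || echo "1234: rejected"
```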

Special character classes

BRE has special character classes that can be used when writing regular expressions:
  • [[:alpha:]] - matches any alphabetic character written in upper or lower case.
  • [[:alnum:]] - matches any alphanumeric character, namely characters in the ranges 0-9 , A-Z , a-z .
  • [[:blank:]] - Matches a space and a tab.
  • [[:digit:]] - any numeric character from 0 to 9 .
  • [[:upper:]] - upper case alphabetic characters - A-Z .
  • [[:lower:]] - lower case alphabetic characters - a-z .
  • [[:print:]] - matches any printable character.
  • [[:punct:]] - matches punctuation marks.
  • [[:space:]] - whitespace characters, in particular - space, tab, characters NL , FF , VT , CR .
You can use special classes in templates like this:

$ echo "abc" | awk '/[[:alpha:]]/{print $0}'
$ echo "abc" | awk '/[[:digit:]]/{print $0}'
$ echo "abc123" | awk '/[[:digit:]]/{print $0}'


Special character classes in regular expressions

Asterisk symbol

If you place an asterisk after a character in a pattern, this will mean that the regular expression will work if the character appears in the string any number of times - including the situation when the character is absent in the string.

$ echo "test" | awk '/tes*t/{print $0}'
$ echo "tessst" | awk '/tes*t/{print $0}'


Using the * character in regular expressions

This wildcard is typically used for words that are frequently misspelled, or for words that have several correct spellings:

$ echo "I like green color" | awk '/colou*r/{print $0}'
$ echo "I like green colour " | awk '/colou*r/{print $0}'

Finding a word that has different spellings

In this example, the same regular expression matches both the word "color" and the word "colour". This is due to the fact that the character "u", followed by an asterisk, can either be absent or occur several times in a row.

Another useful feature stemming from the asterisk character is to combine it with a dot. This combination allows the regular expression to respond to any number of any characters:

$ awk '/this.*test/{print $0}' myfile


Template that responds to any number of any characters

In this case, it does not matter how many and what characters are between the words "this" and "test".

The asterisk can also be used with character classes:

$ echo "st" | awk '/s[ae]*t/{print $0}'
$ echo "sat" | awk '/s[ae]*t/{print $0}'
$ echo "set" | awk '/s[ae]*t/{print $0}'


Using the asterisk with character classes

In all three examples, the regular expression works because the asterisk after the character class means that if any number of "a" or "e" characters are found, or if they are not found, the string will match the given pattern.
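The asterisk cases from this section can be collected into one small helper. The name star is invented for this sketch; the pattern is passed to awk as a dynamic regex through -v, and the anchors are added so that the whole word has to match.

```shell
# Print "match" or "no match" for a pattern ($1) applied to a line ($2).
star() {
    printf '%s\n' "$2" |
        awk -v re="$1" '{ print (($0 ~ re) ? "match" : "no match") }'
}

star '^tes*t$' tet                 # match: "s" may be absent
star '^tes*t$' tessst              # match: "s" may repeat
star '^s[ae]*t$' seat              # match: any mix of "a" and "e"
star '^s[ae]*t$' sit               # no match: "i" is not in the class
star 'this.*test' 'this is a test' # match: .* spans " is a "
```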

POSIX ERE regular expressions

POSIX ERE patterns, which some Linux utilities support, may contain additional characters. As already mentioned, awk supports this standard, but sed does not by default (the GNU and BSD versions of sed switch to ERE only when given the -E option).

Here we will look at the most commonly used characters in ERE patterns, which will be useful for you when creating your own regular expressions.

▍Question mark

The question mark indicates that the preceding character may occur once or not at all in the text. This character is one of the repetition metacharacters. Here are some examples:

$ echo "tet" | awk '/tes?t/{print $0}'
$ echo "test" | awk '/tes?t/{print $0}'
$ echo "tesst" | awk '/tes?t/{print $0}'


Question mark in regular expressions

As you can see, in the third case, the letter “s” occurs twice, so the regular expression does not respond to the word “tesst”.

The question mark can also be used with character classes:

$ echo "tst" | awk '/t[ae]?st/{print $0}'
$ echo "test" | awk '/t[ae]?st/{print $0}'
$ echo "tast" | awk '/t[ae]?st/{print $0}'
$ echo "taest" | awk '/t[ae]?st/{print $0}'
$ echo "teest" | awk '/t[ae]?st/{print $0}'


Question mark and character classes

If there are no characters from the class in the string, or one of them occurs once, the regular expression works, but as soon as two characters appear in the word, the system no longer finds a match for the pattern in the text.

▍Plus symbol

The plus sign in a pattern indicates that the regular expression will match if the preceding character occurs one or more times in the text. At the same time, such a construct will not react to the absence of the character:

$ echo "test" | awk '/te+st/{print $0}'
$ echo "teest" | awk '/te+st/{print $0}'
$ echo "tst" | awk '/te+st/{print $0}'


Plus sign in regular expressions

In this example, if there is no “e” character in the word, the regular expression engine will not find matches in the text. The plus symbol also works with character classes - in this way it is similar to the asterisk and the question mark:

$ echo "tst" | awk '/t[ae]+st/{print $0}'
$ echo "test" | awk '/t[ae]+st/{print $0}'
$ echo "teast" | awk '/t[ae]+st/{print $0}'
$ echo "teeast" | awk '/t[ae]+st/{print $0}'


Plus sign and character classes

In this case, if the string contains any character from the class, the text will be considered to match the pattern.

▍ Curly braces

Curly brackets that can be used in ERE patterns are similar to the characters discussed above, but they allow you to more precisely specify the required number of occurrences of the character that precedes them. You can specify a limit in two formats:
  • n - a number specifying the exact number of searched occurrences
  • n, m - two numbers that are interpreted as follows: "at least n times, but not more than m".
Here are examples of the first option:

$ echo "tst" | awk '/te{1}st/{print $0}'
$ echo "test" | awk '/te{1}st/{print $0}'

Curly braces in patterns, finding the exact number of occurrences

In older versions of awk, you had to use the --re-interval command-line switch in order for the program to recognize intervals in regular expressions, but in newer versions you don't.

$ echo "tst" | awk '/te{1,2}st/{print $0}'
$ echo "test" | awk '/te{1,2}st/{print $0}'
$ echo "teest" | awk '/te{1,2}st/{print $0}'
$ echo "teeest" | awk '/te{1,2}st/{print $0}'


An interval given in curly braces

In this example, the character "e" must occur 1 or 2 times in the string, then the regular expression will respond to the text.

Curly braces can also be used with character classes. The principles already familiar to you apply here:

$ echo "tst" | awk '/t[ae]{1,2}st/{print $0}'
$ echo "test" | awk '/t[ae]{1,2}st/{print $0}'
$ echo "teest" | awk '/t[ae]{1,2}st/{print $0}'
$ echo "teeast" | awk '/t[ae]{1,2}st/{print $0}'


Curly braces and character classes

The template will react to the text if the character "a" or the character "e" occurs once or twice in it.
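To compare the three ERE repetition forms applied to a character class, the sketch below runs them over one made-up word list. It uses grep -E rather than awk, because some awk builds (older mawk versions, for instance) do not support intervals in braces; the helper name whole_word_hits is invented here.

```shell
# Hypothetical word list with 0, 1, 1, 2 and 3 class characters after "t".
words='tst test tast teest teeast'

# Print the words that the ERE in $1 matches in full, space-separated.
whole_word_hits() {
    printf '%s\n' $words | grep -Ex -e "$1" | paste -sd ' ' -
}

for re in 't[ae]?st' 't[ae]+st' 't[ae]{1,2}st'; do
    echo "$re -> $(whole_word_hits "$re")"
done
# t[ae]?st -> tst test tast
# t[ae]+st -> test tast teest teeast
# t[ae]{1,2}st -> test tast teest
```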

▍Logical “or” symbol

Symbol | - a vertical bar, means a logical "or" in regular expressions. When processing a regular expression containing several fragments separated by such a character, the engine will consider the parsed text as a match if it matches any of the fragments. Here is an example:

$ echo "This is a test" | awk '/test|exam/{print $0}'
$ echo "This is an exam" | awk '/test|exam/{print $0}'
$ echo "This is something else" | awk '/test|exam/{print $0}'


Boolean "or" in regular expressions

In this example, the regular expression is configured to search the text for the words "test" or "exam". Note that there must be no spaces between the pattern fragments and the | character that separates them.

Regular expression fragments can be grouped using parentheses. If you group a sequence of characters, the system treats it as a single unit, just like an ordinary character. This means, for example, that repetition metacharacters can be applied to it. Here's what it looks like:

$ echo "Like" | awk '/Like(Geeks)?/{print $0}'
$ echo "LikeGeeks" | awk '/Like(Geeks)?/{print $0}'


Grouping Regular Expression Fragments

In these examples, the word "Geeks" is enclosed in parentheses, followed by a question mark. Recall that the question mark means "0 or 1 repetition", as a result, the regular expression will match both the string "Like" and the string "LikeGeeks".

Practical examples

Now that we've covered the basics of regular expressions, it's time to do something useful with them.

▍Counting the number of files

Let's write a bash script that counts files located in directories that are written to the PATH environment variable. In order to do this, you will first need to form a list of paths to directories. Let's do this with sed, replacing colons with spaces:

$ echo $PATH | sed "s/:/ /g"
The substitute command supports regular expressions as patterns for searching text. In this case everything is extremely simple: we are looking for the colon character, but nothing stops you from using something else here; it all depends on the specific task.
Now we need to loop over the resulting list and do what is needed to count the files in each directory. The general outline of the script looks like this:

mypath=$(echo $PATH | sed "s/:/ /g")
for directory in $mypath
do
    : # count the files in $directory here
done
Now let's write the full text of the script, using the ls command to get information about the number of files in each of the directories:

#!/bin/bash
# Turn the colon-separated PATH into a space-separated list
mypath=$(echo $PATH | sed "s/:/ /g")
count=0
for directory in $mypath
do
    # Every word in the ls output counts as one file
    check=$(ls $directory)
    for item in $check
    do
        count=$((count + 1))
    done
    echo "$directory - $count"
    count=0
done
When running the script, it may turn out that some directories from PATH do not exist, however, this will not prevent it from counting files in existing directories.


File count

The main value of this example is that using the same approach, you can solve much more complex problems. Which one depends on your needs.
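As one possible variation on the same approach, here is a sketch that splits the list with the shell's own IFS instead of sed and skips directories that do not exist. The function name count_path_entries is invented here, the `local` keyword assumes bash (or a shell that supports it), and counting lines of ls output still miscounts file names that contain newlines.

```shell
#!/bin/bash
# Print "<directory> - <entry count>" for every existing directory
# in a colon-separated, PATH-style list passed as the first argument.
count_path_entries() {
    local IFS=:          # split on colons directly, no sed needed
    local dir entries
    for dir in $1; do
        [ -d "$dir" ] || continue              # skip missing directories
        entries=$(ls -A "$dir" | wc -l)        # one listing line per entry
        echo "$dir - $((entries))"             # $(( )) trims wc padding
    done
}

count_path_entries "$PATH"
```

Taking the list as an argument, rather than reading PATH inside the function, makes it easy to try the function on any other colon-separated list of directories.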

▍Verifying email addresses

There are websites with huge collections of regular expressions that let you check email addresses, phone numbers, and so on. However, it is one thing to take something ready-made, and quite another to create it yourself. So let's write a regular expression to validate email addresses. Let's start by analyzing the input data. For example, here is an address:

[email protected]
The username part can consist of alphanumeric characters and a few other characters, namely the dot, dash, underscore and plus sign. The username is followed by the @ sign.

Armed with this knowledge, let's start assembling the regular expression from its left side, which serves to check the username. Here's what we got:

^([a-zA-Z0-9_.+-]+)@
This regular expression can be read as follows: "At the beginning of the line must be at least one character from those in the group given in square brackets, and after that there must be an @ sign."

Now it is the turn of the host name, hostname. The same rules apply here as for the user name, so the pattern for it looks like this:

([a-zA-Z0-9_.+-]+)
The top-level domain name is subject to special rules. There can only be alphabetic characters, which must be at least two (for example, such domains usually contain a country code), and no more than five. All this means that the template for checking the last part of the address will be like this:

\.([a-zA-Z]{2,5})$
You can read it like this: "First there must be a period, then - from 2 to 5 alphabetic characters, and after that the line ends."

Having prepared the patterns for the individual parts of the regular expression, let's put them together:

^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9_.+-]+)\.([a-zA-Z]{2,5})$
Now it remains only to test what happened:

$ echo "username@hostname.com" | awk '/^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9_.+-]+)\.([a-zA-Z]{2,5})$/{print $0}'
$ echo "username@hostname.com.us" | awk '/^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9_.+-]+)\.([a-zA-Z]{2,5})$/{print $0}'


Validating an email address with regular expressions

The fact that the text passed to awk is displayed on the screen means that the system recognized it as an email address.
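
The same check can be wrapped in a reusable function. This is only a sketch: the function name is_valid_email is invented for illustration, it uses grep -E instead of awk, and it assumes the bracket groups are [a-zA-Z0-9_.+-] for the user and host names and [a-zA-Z] for the top-level domain, as the description above suggests.

```shell
#!/bin/bash
# is_valid_email ADDRESS: succeed if ADDRESS matches the assembled pattern.
# Hypothetical helper; -q makes grep report the result via its exit status only.
is_valid_email() {
    printf '%s\n' "$1" |
        grep -Eq '^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9_.+-]+)\.([a-zA-Z]{2,5})$'
}

for address in "username@hostname.com" "username@hostname" "user name@hostname.com"; do
    if is_valid_email "$address"; then
        echo "accepted: $address"
    else
        echo "rejected: $address"
    fi
done
```

A pattern like this only checks the general shape of an address; it does not prove that the mailbox exists.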

Results

If the regular expression for checking email addresses that you met at the very beginning of the article seemed completely incomprehensible then, we hope that now it no longer looks like a meaningless set of characters. If this is true, then this material has served its purpose. In fact, regular expressions are a topic that you can study all your life, but even the little that we have analyzed can already help you write scripts that process texts in a rather advanced way.

In this series of materials, we usually showed very simple examples of bash scripts that literally consisted of a few lines. Let's look at something bigger next time.

Dear readers! Do you use regular expressions when processing text in command line scripts?

One of the most useful and versatile commands in the Linux terminal is the "grep" command. The name grep stands for "global regular expression print" (that is, "search globally for lines matching a regular expression and print them"). This means that grep can be used to check whether input matches given patterns.

This seemingly trivial program is very powerful when used correctly. Its ability to filter input based on complex rules makes it a popular link in many command pipelines.

This tutorial looks at some of the features of the grep command and then moves on to using regular expressions. All the techniques described in this guide can be applied to managing a virtual server.

Usage Basics

In its simplest form, grep is used to match literal patterns in a text file. This means that if the grep command receives a search word, it will print every line of the file that contains that word.

As an example, you can use grep to search for lines containing the word "GNU" in version 3 of the GNU General Public License on an Ubuntu system.

cd /usr/share/common-licenses
grep "GNU" GPL-3
GNU GENERAL PUBLIC LICENSE





13. Use with the GNU Affero General Public License.
under version 3 of the GNU Affero General Public License into a single
...
...

The first argument, "GNU", is the pattern to search for, and the second argument, "GPL-3", is the input file to search.

As a result, all lines containing the text pattern will be displayed. In some Linux distributions the searched pattern will be highlighted in the displayed lines.

General options

By default, grep simply looks for the exact pattern specified in the input file and prints the lines it finds. However, grep's behavior can be changed by adding flags.

If you want to ignore the case of the search pattern and match both uppercase and lowercase variations, you can use the "-i" or "--ignore-case" option.

For example, you can use grep to search the same file for the word "license" in upper, lower, or mixed case.

grep -i "license" GPL-3
GNU GENERAL PUBLIC LICENSE
of this license document, but changing it is not allowed.
The GNU General Public License is a free, copyleft license for
The licenses for most software and other practical works are designed
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it also applies to


"This License" refers to version 3 of the GNU General Public License.
"The Program" refers to any copyrightable work licensed under this
...
...

As you can see, the output contains "LICENSE", "license", and "License". If there was an instance of "LiCeNsE" in the file, it would also be output.
If you want to find all lines that do not contain the specified pattern, you can use the "-v" or "--invert-match" flag.

As an example, you can use the following command to search the BSD license for all lines that do not contain the word "the":

grep -v "the" BSD
All rights reserved.
Redistribution and use in source and binary forms, with or without
are met:
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...

As you can see, the last two lines contain "THE" in capital letters but were still output as not containing "the", because the ignore-case option was not used.

It is often useful to know the numbers of the lines where matches were found. They can be displayed with the "-n" or "--line-number" flag.

If you apply this flag in the previous example, the following output will be displayed:

grep -vn "the" BSD
2:All rights reserved.
3:
4:Redistribution and use in source and binary forms, with or without
6:are met:
13: may be used to endorse or promote products derived from this software
14: without specific prior written permission.
15:
16:THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
17:ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...

You can now refer to the line number as needed to make changes on each line that does not contain "the".
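
The three options can be tried without any license file at hand. The sketch below uses a made-up four-line sample text (the variable name sample_text is an assumption for illustration) and feeds it to grep through a pipe.

```shell
# Hypothetical sample text standing in for a license file
sample_text='The License applies to the work.
All rights reserved.
THE SOFTWARE IS PROVIDED AS IS.
See the license for details.'

echo "-- lines containing 'license' in any case:"
printf '%s\n' "$sample_text" | grep -i "license"

echo "-- numbered lines without lowercase 'the':"
printf '%s\n' "$sample_text" | grep -vn "the"

echo "-- lines without 'the' in any case:"
printf '%s\n' "$sample_text" | grep -vi "the"
```

Note how the line in capital letters survives the second command but not the third: combining -v with -i excludes "THE" as well.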

Regular Expressions

As mentioned in the introduction, grep stands for "global regular expression print". A regular expression is a text string that describes a specific search pattern.

Different applications and programming languages use regular expressions in slightly different ways. This guide covers only a small subset of the ways grep patterns can be written.

Literal matches

The above examples of searching for the words "GNU" and "the" used very simple regular expressions that exactly matched the character strings "GNU" and "the".

It is more accurate to think of them as matches of character strings than as matches of words. As you become familiar with more complex patterns, this distinction will become more significant.

Patterns that exactly match the given characters are called "literals" because they match the pattern literally, character for character.

All alphabetic and numeric characters (as well as some other characters) are matched literally unless they are modified by other expression mechanisms.

Anchor matches

Anchors are special characters that indicate the location in a string of a desired match.

For example, you can specify that the search only looks for strings containing the word "GNU" at the very beginning. To do this, you need to use the anchor "^" before the literal string.

In this example, only the lines containing the word "GNU" at the very beginning are output.

grep "^GNU" GPL-3
GNU General Public License for most of our software; it also applies to
GNU General Public License, you may choose any version ever published

Similarly, the "$" anchor can be used after a literal string to indicate that the match is valid only if the character string being searched is at the end of the text string.

The following regular expression outputs only those lines that contain "and" at the end:

grep "and$" GPL-3
that there is no warranty for this free software. For both users' and
The precise terms and conditions for copying, distribution and


alternative is allowed only occasionally and noncommercially, and
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
provisionally, unless and until the copyright holder explicitly and
receives a license from the original licensors, to run, modify and
make, use, sell, offer for sale, import and otherwise run, modify and
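
Both anchors can be checked on a few made-up lines. This is a minimal sketch; the variable name lines and its contents are invented for illustration.

```shell
# Hypothetical input: "GNU" and "and" appear at different positions
lines='GNU tools are free
the GNU project
terms and conditions
copy, distribute and'

# "^" anchors the match to the start of the line
printf '%s\n' "$lines" | grep "^GNU"

# "$" anchors the match to the end of the line
printf '%s\n' "$lines" | grep 'and$'
```

Only the first line passes the first filter and only the last line passes the second, although "GNU" and "and" also occur elsewhere in the input.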

Match any character

The dot (.) is used in regular expressions to indicate that any character can appear at the specified location.

For example, if you want to find matches containing two characters and then the sequence "cept", you would use the following pattern:

grep "..cept" GPL-3
use, which is precisely where it is most unacceptable. Therefore, we
infringement under applicable copyright law, except executing it on a
tells the user that there is no warranty for the work (except to the

form of a separately written license, or stated as exceptions;
You may not propagate or modify a covered work except as expressly
9. Acceptance Not Required for Having Copies.
...
...

As you can see, the words "accept" and "except" are displayed in the results, as well as variations of these words. The pattern would also match the sequence "z2cept" if there was one in the text.

Expressions in brackets

By placing a group of characters in square brackets ("[" and "]"), you can indicate that any one of the characters in the brackets can be in this position.

This means that if you need to find lines containing "too" or "two", you can briefly specify these variations using the following pattern:

grep "t[wo]o" GPL-3
your programs, too.

Developers that use the GNU GPL protect your rights with two steps:
a computer network, with no transfer of a copy, is not conveying.

Corresponding Source from a network server at no charge.
...
...

As you can see, both variations were found in the file.

Bracketing characters also provides several useful features. You can specify that the pattern matches everything except the characters in brackets by starting the list of characters in brackets with the "^" character.

In this example, the pattern matches any sequence of the form ".ode" except "code":

grep "[^c]ode" GPL-3
1. Source code.
model, to give anyone who possesses the object code either (1) a
the only significant mode of use of the product.
notice like this when it starts in an interactive mode:

It is worth noting that the second output line contains the word "code". This is not a regex or grep error.

Rather, this line was output because it also contains the sequence "mode", found in the word "model", which matches the pattern.
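
One way to see exactly which part of such a line matched is the -o option, which prints only the matching text. A small sketch using the line from the output above (the variable name line is chosen for illustration):

```shell
# The second output line from the example above
line='model, to give anyone who possesses the object code either (1) a'

# -o prints only the part of the line that matched the pattern
printf '%s\n' "$line" | grep -o "[^c]ode"
```

The command prints "mode", which confirms that the match comes from "model" and not from "code".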

Another useful feature of brackets is the ability to specify a range of characters instead of typing each character separately.

This means that if you want to find every line that starts with a capital letter, you can use the following pattern:

grep "^[A-Z]" GPL-3
GNU General Public License for most of our software; it also applies to

License. Each licensee is addressed as "you". "Licensees" and


System Libraries, or general-purpose tools or generally available free
Source.

...
...

Because of some inherent sorting issues, it is more reliable to use POSIX character classes instead of character ranges like the one used in the example above.
There are many character classes not covered in this guide; for example, to perform the same search as in the example above, you can use the character class "[:upper:]" inside the brackets.

grep "^[[:upper:]]" GPL-3
GNU General Public License for most of our software; it also applies to
States should not allow patents to restrict development and use of
License. Each licensee is addressed as "you". "Licensees" and
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
System Libraries, or general-purpose tools or generally available free
Source.
User Product is transferred to the recipient in perpetuity or for a
...
...

Repeat pattern (0 or more times)

One of the most commonly used metacharacters is the character "*", which means "repeat the previous character or expression 0 or more times".

For example, if you want to find every line with opening or closing parentheses that contain only letters and single spaces between them, you can use the following expression:

grep "([A-Za-z ]*)" GPL-3

distribution (with or without modification), making available to the
than the work as a whole, that (a) is included in the normal form of
Component, and (b) serves only to enable use of the work with that
(if any) on which the executable work runs, or a compiler used to
(including a physical distribution medium), accompanied by the
(including a physical distribution medium), accompanied by a
place (gratis or for a charge), and offer equivalent access to the
...
...

Escaping metacharacters

Sometimes you may want to look for a literal dot or a literal opening parenthesis. Because these characters have a special meaning in regular expressions, you need to "escape" them, telling grep not to use their special meaning in this case.

These characters can be escaped by using a backslash (\) before a character that usually has a special meaning.

For example, if you want to find a line that starts with a capital letter and ends with a dot, you can use the following expression. The backslash before the last dot tells the command to "escape" it, so that the last dot represents a literal dot and does not mean "any character":

grep "^[A-Z].*\.$" GPL-3
Source.
License by making exceptions from one or more of its conditions.
License would be to refrain entirely from conveying the Program.
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
SUCH DAMAGES.
Also add information on how to contact you by electronic and paper mail.
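
The effect of the backslash is easiest to see by running the pattern with and without it. In this sketch the three-line sample (variable name text) is made up, and the class [[:upper:]] is used in place of the range A-Z to stay independent of locale.

```shell
# Hypothetical sample: only the first line ends with a real dot
text='Source.
a line without a final dot
Version 3X'

echo "-- escaped dot:"
printf '%s\n' "$text" | grep '^[[:upper:]].*\.$'

echo "-- unescaped dot:"
printf '%s\n' "$text" | grep '^[[:upper:]].*.$'
```

With the escaped dot only "Source." is printed. Without the backslash the final dot means "any character", so "Version 3X" is printed as well.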

Extended regular expressions

The grep command can also be used with the extended regular expression language by using the "-E" flag, or by calling the "egrep" command instead of "grep".

These commands open up the possibilities of "extended regular expressions". Extended regular expressions include all the basic metacharacters, as well as additional metacharacters to express more complex matches.

Grouping

One of the simplest and most useful features of extended regular expressions is the ability to group expressions and use them as a whole.

Parentheses are used to group expressions. In basic regular expressions (without the "-E" flag), parentheses must be escaped with a backslash to give them this grouping meaning:

grep "\(grouping\)" file.txt
grep -E "(grouping)" file.txt
egrep "(grouping)" file.txt

The above expressions are equivalent.
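
Grouping becomes visible once a repetition is applied to the whole group. The following sketch, with two made-up words, shows the same pattern written in basic and in extended syntax; the repetition count in braces is explained further below.

```shell
# Hypothetical input: only "banana" contains "na" twice in a row
words='banana
bandana'

# Basic syntax: parentheses and braces are escaped to make them special
printf '%s\n' "$words" | grep '\(na\)\{2\}'

# Extended syntax: the same pattern without backslashes
printf '%s\n' "$words" | grep -E '(na){2}'
```

Both commands print only "banana", because the repetition applies to the group "na" as a whole and not just to the letter "a".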

Alternation

Just as square brackets specify different possible matches for a single character, alternation allows you to specify alternative matches for strings of characters or sets of expressions.

The vertical bar character "|" is used to denote alternation. Alternation is often used within a group to indicate that any one of two or more options should be considered a match.

In this example, you need to find "GPL" or "General Public License":

grep -E "(GPL|General Public License)" GPL-3
The GNU General Public License is a free, copyleft license for
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it also applies to
price. Our General Public Licenses are designed to make sure that you
Developers that use the GNU GPL protect your rights with two steps:
For the developers' and authors' protection, the GPL clearly explains
authors' sake, the GPL requires that modified versions be marked as
have designed this version of the GPL to prohibit the practice for those
...
...

Alternation can be used to choose between two or more options; to do this, you need to enter the remaining options in the selection group, separating each with the pipe character "|".
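
To see which alternative matched on each line, alternation can be combined with the -o option. A small sketch on made-up input (the variable name text is an assumption):

```shell
# Hypothetical input: one line per alternative, plus a line with neither
text='Developers that use the GNU GPL protect your rights
The GNU General Public License is a free license
a line that mentions neither'

# -o prints only the alternative that actually matched
printf '%s\n' "$text" | grep -Eo '(GPL|General Public License)'
```

The output has two lines, "GPL" and "General Public License"; the third input line produces nothing.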

Quantifiers

In extended regular expressions, there are metacharacters that specify how many times a character repeats, much like the "*" metacharacter matches the previous character or group 0 or more times.

To match a character 0 or 1 times, you can use the character "?". It makes the previous character or group optional.

In this example, by adding the sequence "copy" to the optional group, the matches "copyright" and "right" are displayed:

grep -E "(copy)?right" GPL-3
Copyright (C) 2007 Free Software Foundation, Inc.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
"Copyright" also means copyright-like laws that apply to other kinds of
...
...

The "+" symbol matches expressions 1 or more times. It works almost like the "*" character, but when using "+", the expression must match at least 1 time.

The following expression matches the string "free" plus 1 or more non-whitespace characters:

grep -E "free[^[:space:]]+" GPL-3
The GNU General Public License is a free, copyleft license for
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
When we speak of free software, we are referring to freedom, not
have the freedom to distribute copies of free software (and charge for

freedoms that you received. You must make sure that they, too, receive
protecting users' freedom to change the software. The systematic
of the GPL, as needed to protect the freedom of users.
patents cannot be used to render the program non-free.
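
The difference between the two quantifiers can be checked on a short word list. The list below (variable name words) is made up for illustration.

```shell
# Hypothetical word list
words='copyright
right
free
freedom
non-free.'

# "?" makes the group optional: both "copyright" and "right" match
printf '%s\n' "$words" | grep -E '(copy)?right'

# "+" requires at least one more non-space character after "free"
printf '%s\n' "$words" | grep -E 'free[^[:space:]]+'
```

The second command prints "freedom" and "non-free." but not the bare word "free", because nothing follows it on that line.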

Number of match repetitions

Curly braces ("{" and "}") can be used to specify the number of repetitions of a match. These characters are used to indicate an exact number, a range, or an upper or lower limit on the number of times an expression can match.

If you want to find all lines that contain a sequence of three vowels, you can use the following expression:

grep -E "[AEIOUaeiou]{3}" GPL-3
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
receive it, in any medium, provided that you conspicuously and
give under the previous paragraph, plus a right to possession of the
covered work so as to satisfy simultaneously your obligations under this

If you need to find all words that are 16-20 characters long, use the following expression:

grep -E "[[:alpha:]]{16,20}" GPL-3
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
c) Prohibiting misrepresentation of the origin of that material, or
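
Repetition counts can be tried on a few words taken from the output above. This is a sketch that assumes the patterns are [AEIOUaeiou]{3} for three vowels in a row and [[:alpha:]]{16,20} for long words; the variable name words is invented.

```shell
# Words borrowed from the sample output, plus one short control word
words='previous
plain
responsibilities
simultaneously'

# Exactly three vowels in a row; -o shows the vowels that matched
printf '%s\n' "$words" | grep -Eo '[AEIOUaeiou]{3}'

# A run of 16 to 20 letters
printf '%s\n' "$words" | grep -E '[[:alpha:]]{16,20}'
```

The first command prints "iou" (from "previous") and "eou" (from "simultaneously"). The second prints only "responsibilities", which is 16 letters long.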

Conclusions

In many cases, the grep command is useful for finding patterns within files or within a file system hierarchy. It saves a lot of time, so it is worth becoming familiar with its options and syntax.

Regular expressions are even more versatile and can be used in many popular programs. For example, many text editors use regular expressions to find and replace text.

Moreover, many programming languages use regular expressions to perform operations on specific pieces of data. The ability to work with regular expressions will be useful in solving common computing problems.


Regular expressions are a very powerful tool for pattern matching, processing, and modifying strings that can be used to solve a variety of problems. Here are the main ones:

  • Checking text input;
  • Find and replace text in a file;
  • Batch rename files;
  • Interaction with services such as Apache;
  • Checking a string against a pattern.

This is not a complete list; regular expressions let you do much more. For new users, though, they may seem too complicated, because they are written in a special language. Given the possibilities they provide, however, every system administrator should know and use Linux regular expressions.

In this article, we are going to cover bash regular expressions for beginners so that you can understand all the features of this tool.

Two types of characters can be used in regular expressions:

  • regular letters;
  • metacharacters.

Regular characters are letters, numbers, and punctuation marks that make up any string. All texts are made up of letters and you can use them in regular expressions to find the desired position in the text.

Metacharacters are something else, they are what give power to regular expressions. With metacharacters, you can do a lot more than looking for a single character. You can search for character combinations, use a dynamic number of characters, and select ranges. All special characters can be divided into two types, these are replacement characters that replace ordinary characters, or operators that indicate how many times a character can be repeated. The syntax for a regular expression would look like this:

ordinary_character operator_special_character

replacement_special_character operator_special_character

  • \ - special sequences begin with a backslash; it is also used when a special character must be treated as a literal punctuation mark;
  • ^ - indicates the beginning of the line;
  • $ - indicates the end of the line;
  • * - indicates that the previous character can be repeated 0 or more times;
  • + - indicates that the previous character must be repeated one or more times;
  • ? - the previous character can occur zero or one time;
  • {n} - indicates how many times (n) the previous character must be repeated;
  • {N,n} - the previous character can be repeated from N to n times;
  • . - any character except line feed;
  • [az] - any character specified in brackets;
  • x|y - character x or character y;
  • [^az] - any character except those specified in brackets;
  • [a-z] - any character from the specified range;
  • [^a-z] - any character that is not in the range;
  • \b - denotes a word boundary;
  • \B - denotes a position that is not a word boundary; for example, ux\B will match the ux in uxb or tuxedo, but not in Linux;
  • \d - means that the character is a digit;
  • \D - non-digit character;
  • \n - line feed character;
  • \s - one of the whitespace characters: space, tab, and so on;
  • \S - any character other than whitespace;
  • \t - tab character;
  • \v - vertical tab character;
  • \w - any word character: a letter, a digit, or the underscore;
  • \W - any character that is not a word character;
  • \uXXX - Unicode character.

It is important to note that a backslash must be used before the letter-based special sequences to indicate that a special character follows. The reverse is also true: if you want to use a character that is special without a backslash as an ordinary character, you have to add a backslash.

For example, suppose you want to find the string 1+2=3 in the text. If you use this string as a regular expression, you won't find it, because the plus is interpreted as a special character meaning that the previous character must be repeated one or more times. So it needs to be escaped: 1\+2=3. Without escaping, our regular expression would match strings such as 12=3 or 112=3. You don't need to put a backslash before the equals sign, because it is not a special character.
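
This behavior is easy to verify with grep -E. The three-line sample below (variable name lines) is made up for illustration.

```shell
# Hypothetical input: one line with a literal plus, two with repeated 1s
lines='1+2=3
12=3
112=3'

echo "-- unescaped plus (one or more 1s, then 2=3):"
printf '%s\n' "$lines" | grep -E '1+2=3'

echo "-- escaped plus (a literal + character):"
printf '%s\n' "$lines" | grep -E '1\+2=3'
```

The first pattern matches "12=3" and "112=3" but not "1+2=3"; the second matches only "1+2=3".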

Regular Expression Examples

Now that we have covered the basics and you know how everything works, it remains to consolidate the knowledge gained about linux grep regular expressions in practice. Two very useful special characters are ^ and $, which indicate the beginning and end of a line. For example, we want to get all users registered in our system whose name starts with s. Then you can use the regular expression "^s". You can use the egrep command:

egrep "^s" /etc/passwd

If we want to select lines by the last character in the line, we can use $. For example, let's select all system users, without a shell, records about such users end with false:

egrep "false$" /etc/passwd

To display usernames that start with s or d use this expression:

egrep "^[sd]" /etc/passwd

The same result can be obtained by using the "|" symbol. The first option is more suitable for ranges, and the second is more often used for ordinary either/or choices:

egrep "^(s|d)" /etc/passwd

Now let's select all users whose name is exactly three characters long. The user name ends with a colon. We can say that it may contain any word character, which must be repeated three times before the colon:

egrep "^\w{3}:" /etc/passwd
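
These searches can be tried safely on sample data instead of the real /etc/passwd. In the sketch below the four records (variable name passwd_sample) are invented, grep -E is used in place of egrep, and [[:alnum:]_] stands in for \w, which is an extension not available in every grep.

```shell
# Hypothetical records in /etc/passwd format
passwd_sample='sshd:x:74:74:SSH daemon:/var/empty:/bin/false
daemon:x:2:2:daemon:/sbin:/sbin/nologin
bin:x:1:1:bin:/bin:/sbin/nologin
root:x:0:0:root:/root:/bin/bash'

echo "-- names starting with s or d (bracket form):"
printf '%s\n' "$passwd_sample" | grep -E '^[sd]'

echo "-- names starting with s or d (alternation form):"
printf '%s\n' "$passwd_sample" | grep -E '^(s|d)'

echo "-- records ending in false:"
printf '%s\n' "$passwd_sample" | grep -E 'false$'

echo "-- names exactly three characters long:"
printf '%s\n' "$passwd_sample" | grep -E '^[[:alnum:]_]{3}:'
```

The bracket and alternation forms select the same two records (sshd and daemon), and the last command selects only bin.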

Conclusions

In this article, we covered Linux regular expressions, but that was just the very basics. If you dig a little deeper, you will find that you can do a lot more interesting things with this tool. The time spent learning regular expressions will definitely be worth it.


The grep utility is a very powerful tool for finding and filtering textual information. This article shows several examples of its use, which will let you appreciate its capabilities.
The main use of grep is to search for words or phrases in files and output streams. You search by typing the query and the search scope (a file) on the command line.
For example, to find the string "needle" in the haystack.txt file, use the following command:

$ grep needle haystack.txt

As a result, grep will display all occurrences of needle that it encounters in the contents of the haystack.txt file. It is important to note that in this case, grep is looking for a set of characters, not a word. For example, lines containing the word "needless" and other words that contain the sequence "needle" will be displayed.


To tell grep that you are looking for a particular word, use the -w switch. This key will restrict the search to only the specified word. A word is a query delimited on both sides by any whitespace characters, punctuation marks, or line breaks.

$ grep -w needle haystack.txt

You don't have to limit your search to just one file; grep can also search through a group of files, and the results will name the file in which each match was found. The -n switch also adds the number of the line in which a match was found, and the -r switch performs a recursive search. This is very handy when searching through program source files.

$ grep -rnw function_name /home/www/dev/myprogram/

The file name is listed before each match. If you need to hide file names, use the -h switch; conversely, if only file names are needed, specify the -l switch.
In the following example, we search for URLs in an IRC log file and show the last 10 matches.

$ grep -wo "http://.*" channel.log | tail

The -o option tells grep to output only the pattern match, not the entire line. The grep output is piped to the tail command, which prints the last 10 lines by default.
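
A small sketch of URL extraction on a made-up log follows. Note one deliberate difference from the command above: the pattern here stops at the first whitespace character ([^[:space:]]*) instead of using .*, so that text following a URL on the same line is not included. The log contents and the variable name log are assumptions for illustration.

```shell
# Hypothetical IRC log lines
log='alice: see http://example.com/a for details
bob: no links here
carol: mirror at http://example.org/b'

# Print only the URLs, then keep at most the last 10 of them
printf '%s\n' "$log" | grep -o 'http://[^[:space:]]*' | tail
```

For this input the result is the two URLs, one per line, without the surrounding message text.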
Now we will count the number of messages sent to the irc channel by certain users. For example, all the messages that I sent from home and from work. They differ in nickname, at home I use the nickname user_at_home, and at work, user_at_work.

$ grep -c "^user_at_\(home\|work\)" channel.log

With the -c option, grep only prints the number of matching lines, not the matches themselves. The search string is enclosed in quotation marks because it contains special characters that the shell might otherwise interpret. Note that the quotation marks are not part of the search pattern. The backslash "\" is used to escape special characters.
Let's look through the messages of people who like to "shout" in the channel. By "shouting" we mean messages written entirely in CAPITAL LETTERS. To keep abbreviations from producing accidental hits, we will search for words of five or more characters:

$ grep -w "[A-Z]\{5,\}" channel.log
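
Both searches can be tried on a made-up log. This sketch uses the extended syntax (-E), so the parentheses, bar, and braces need no backslashes, and it uses [[:upper:]] instead of A-Z to stay independent of locale. The log contents and the variable name log are invented.

```shell
# Hypothetical IRC log lines
log='user_at_home: hello there
user_at_work: HELLO EVERYONE
guest: NASA is great
user_at_home: bye'

# Count the lines written under either nickname
printf '%s\n' "$log" | grep -Ec '^user_at_(home|work)'

# Lines containing a whole word of five or more capital letters
printf '%s\n' "$log" | grep -Ew '[[:upper:]]{5,}'
```

The count is 3, and only the "HELLO EVERYONE" line is reported as shouting; the four-letter abbreviation "NASA" is too short to match.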

For a more detailed description, see the grep man page.
A few more examples:

# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

Displays lines from the /etc/passwd file that contain the string root.

# grep -n root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
12:operator:x:11:0:operator:/root:/sbin/nologin

In addition, the line numbers containing the search string are displayed.

# grep -v bash /etc/passwd | grep -v nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
news:x:9:13:news:/var/spool/news:
mailnull:x:47:47::/var/spool/mqueue:/dev/null
xfs:x:43:43:X Font Server:/etc/X11/fs:/bin/false
rpc:x:32:32:Portmapper RPC user:/:/bin/false
nscd:x:28:28:NSCD Daemon:/:/bin/false
named:x:25:25:Named:/var/named:/bin/false
squid:x:23:23::/var/spool/squid:/dev/null
ldap:x:55:55:LDAP User:/var/lib/ldap:/bin/false
apache:x:48:48:Apache:/var/www:/bin/false

Checks which users are not using bash, excluding those user accounts that have nologin as their shell.

# grep -c false /etc/passwd
7

Counts the number of accounts that have /bin/false as their shell.

# grep -i games ~/.bash* | grep -v history

This command lists lines containing "games" from all files in the current user's home directory whose names start with ~/.bash, except for files that have the string history in their names; this excludes matches from ~/.bash_history, which may contain the same string in upper or lower case. The word "games" is only an example; you can substitute any other.

grep command and regular expressions

Unlike the previous example, now we will display only those lines that begin with the string "root":

# grep ^root /etc/passwd
root:x:0:0:root:/root:/bin/bash

If we want to see which accounts have no shell assigned at all, we look for lines ending in ":":

# grep :$ /etc/passwd
news:x:9:13:news:/var/spool/news:

To check whether the PATH variable in the ~/.bashrc file is exported, first select the lines with "export" and then look for words that begin with the string "PATH"; this way MANPATH and other possible paths are not displayed:

# grep export ~/.bashrc | grep '\<PATH'
export PATH="/bin:/usr/lib/mh:/lib:/usr/bin:/usr/local/bin:/usr/ucb:/usr/dbin:$PATH"

Character classes

A bracket expression is a list of characters enclosed between "[" and "]". It matches any single character in the list; if the first character of the list is "^", it matches any character that is NOT in the list. For example, the regular expression "[0123456789]" matches any single digit.

Inside a bracket expression, you can specify a range consisting of two characters separated by a hyphen. The expression then matches any single character that sorts between these two characters, inclusive; this takes into account the collating sequence and character set of the current locale. For example, in the default C locale, the expression "[a-d]" is equivalent to "[abcd]". Many locales sort characters in dictionary order, and in these locales "[a-d]" is usually not equivalent to "[abcd]"; it may, for example, be equivalent to "[aBbCcDd]". To get the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to "C".

Finally, certain named character classes are predefined and can be used within bracket expressions. For additional information on these predefined classes, see the man pages or the grep documentation.
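
A short sketch of both approaches: forcing the C locale for a single command so that a range has its traditional meaning, and using a named class that does not depend on the collating order. The three sample words and the variable name fruits are made up.

```shell
# Hypothetical input with mixed case
fruits='apple
Banana
cherry'

# In the C locale the range a-c means exactly the characters a, b and c
printf '%s\n' "$fruits" | LC_ALL=C grep '^[a-c]'

# A named class does not depend on the collating order of the locale
printf '%s\n' "$fruits" | grep '^[[:upper:]]'
```

Setting LC_ALL=C in front of the command affects only that one grep invocation. The first command prints "apple" and "cherry"; the second prints "Banana".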

# grep "[yf]" /etc/group
sys:x:3:root,bin,adm
tty:x:5:
mail:x:12:mail,postfix
ftp:x:50:
nobody:x:99:
floppy:x:19:
xfs:x:43:
nfsnobody:x:65534:
postfix:x:89:

The example displays all lines that contain either the character "y" or the character "f".

Generic characters (metacharacters)

Use "." to match any single character. Suppose you want a list of all five-character English words from a dictionary that start with "c" and end with "h" (handy for solving crossword puzzles):

# grep '\<c...h\>' /usr/share/dict/words
catch
clash
cloth
coach
couch
cough
crash
crush

If you want to display lines that contain a literal dot character, use the -F option of the grep command. The symbols "\<" and "\>" match the empty string at the beginning and at the end of a word, respectively, so the pattern only matches whole words. If you want to find the pattern anywhere in the text, including inside longer words, omit "\<" and "\>"; to match whole words only, you can also use the -w switch.

To similarly search for words that can contain any number of characters between "c" and "h", use an asterisk (*). The following example selects all words starting with "c" and ending with "h" from the system dictionary:

# grep '\<c.*h\>' /usr/share/dict/words
caliph
cash
catch
cheesecloth
cheetah
--output omitted--
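
Where no system dictionary is available, whole-word matching can be tried on a short made-up list with the -w switch mentioned above. This is a sketch; the word list and the variable name words are assumptions.

```shell
# Hypothetical word list standing in for /usr/share/dict/words
words='catch
couch
cash
catchy
cheetah'

echo "-- exactly three characters between c and h:"
printf '%s\n' "$words" | grep -w 'c...h'

echo "-- any number of characters between c and h:"
printf '%s\n' "$words" | grep -w 'c.*h'
```

The first search keeps "catch" and "couch". The second also keeps "cash" and "cheetah", while "catchy" is rejected by both because the match must be a whole word.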

If you want to search for a literal asterisk character in a file or output stream, enclose it in single quotes. The user in the example below first tries to find an asterisk in the /etc/profile file without quotes, which finds nothing. When quotes are used, the result is printed to the output stream:

# grep * /etc/profile
# grep '*' /etc/profile
for i in /etc/profile.d/*.sh ; do
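
The reason the unquoted form fails is that the shell expands the asterisk into file names before grep ever runs. The sketch below shows this in a temporary directory; the file names a.txt and b.txt and the variable names are made up, and -F is added so that grep treats the pattern as a plain string.

```shell
#!/bin/bash
# Scratch directory with two hypothetical files
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt"

# Unquoted, the shell replaces * with the file names before the command runs
unquoted=$(cd "$workdir" && echo *)
# Quoted, the asterisk reaches the command unchanged
quoted=$(cd "$workdir" && echo '*')
echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# With quotes, grep receives a literal asterisk as its pattern
printf '%s\n' 'for i in /etc/profile.d/*.sh ; do' 'done' | grep -F '*'

rm -rf "$workdir"
```

In the unquoted case grep would have received "a.txt" as the pattern and "b.txt" as a file to search, which is why the first command in the example above finds nothing useful.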
