Thursday, November 12, 2015

Regular Expression Concept

A regular expression, often called a pattern, is an expression that specifies a set of strings required for a particular purpose. A simple way to specify a finite set of strings is to list its elements, but a pattern can often describe the same set more concisely.

Boolean "or"
A vertical bar separates alternatives. For example, abc|abb can match "abc" or "abb".
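As a minimal sketch, the alternation above can be tried with Python's re module. Python is an assumption here, since the post names no particular engine, and the variable names are purely illustrative.

```python
import re

# Alternation: the pattern accepts either "abc" or "abb".
pattern = re.compile(r"abc|abb")

candidates = ["abc", "abb", "abd"]

# fullmatch succeeds only when the whole string matches the pattern.
alternation_results = {s: bool(pattern.fullmatch(s)) for s in candidates}
print(alternation_results)
```

Using fullmatch rather than search keeps the check strict: "abd" is rejected because neither alternative covers the entire string.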

Grouping
Parentheses define the scope and precedence of the operators (among other uses). For example, abc|abb and ab(c|b) are equivalent patterns that both describe the set containing "abc" and "abb".
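The equivalence of the two patterns can be checked by brute force over a small alphabet. The sketch below assumes Python's re module; the alphabet "abcd" and the variable names are arbitrary choices for illustration.

```python
import re
from itertools import product

flat = re.compile(r"abc|abb")
grouped = re.compile(r"ab(c|b)")

# Every three-letter string over the alphabet {a, b, c, d}.
samples = ["".join(chars) for chars in product("abcd", repeat=3)]

accepted_flat = {s for s in samples if flat.fullmatch(s)}
accepted_grouped = {s for s in samples if grouped.fullmatch(s)}

print(sorted(accepted_flat))
print(accepted_flat == accepted_grouped)

# Without parentheses the bar splits the whole pattern:
# abc|b means "abc" or "b", not "ab" followed by "c" or "b".
unscoped_accepts_b = re.fullmatch(r"abc|b", "b") is not None
grouped_accepts_b = grouped.fullmatch("b") is not None
print(unscoped_accepts_b, grouped_accepts_b)
```

The last two checks show why the parentheses matter: they limit the alternation to the final character instead of the whole pattern.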

Quantification
    ?     The question mark indicates zero or one occurrences of the preceding element. For example, colou?r matches both "color" and "colour".

    *     The asterisk indicates zero or more occurrences of the preceding element. For example, ab*c matches "ac", "abc", "abbc", "abbbc", and so on.

    +     The plus sign indicates one or more occurrences of the preceding element. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".

    {n}     The preceding item is matched exactly n times.

    {min,}     The preceding item is matched min or more times.

    {min,max}     The preceding item is matched at least min times, but not more than max times.
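The quantifiers above can be compared side by side on one list of test strings. This is a sketch that assumes Python's re module; the helper accepted and the list words are hypothetical names introduced only for the example.

```python
import re

def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

words = ["ac", "abc", "abbc", "abbbc", "abbbbc"]

quantifier_results = {
    "ab?c": accepted(r"ab?c", words),          # zero or one "b"
    "ab*c": accepted(r"ab*c", words),          # zero or more "b"
    "ab+c": accepted(r"ab+c", words),          # one or more "b"
    "ab{2}c": accepted(r"ab{2}c", words),      # exactly two "b"
    "ab{2,}c": accepted(r"ab{2,}c", words),    # two or more "b"
    "ab{1,3}c": accepted(r"ab{1,3}c", words),  # one to three "b"
}

for pattern, matches in quantifier_results.items():
    print(pattern, matches)

# The optional "u" from the text: both spellings are accepted.
colour_results = accepted(r"colou?r", ["color", "colour", "colouur"])
print(colour_results)
```

Each quantifier applies only to the element immediately before it, here the single character "b" or "u".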

Single characters and anchors
    .     The dot matches any single character. Many applications exclude newlines, and exactly which characters count as newlines depends on the flavor, character encoding, and platform, but it is safe to assume that the line feed character is one of them. Within POSIX bracket expressions, the dot character matches a literal dot. For example, a.c matches "abc", "axc", and so on, but [a.c] matches only "a", ".", or "c".

    [ ]     A bracket expression matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z]. The - character is treated as a literal character if it is the last or the first (after the ^, if present) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not allowed in POSIX bracket expressions. The ] character can be included in a bracket expression if it is the first (after the ^, if present) character: []abc].

    [^ ]     Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". As with [ ], literal characters and ranges can be mixed.
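The dot and bracket rules above can be checked with a few one-character tests. The sketch assumes Python's re module, which is not a POSIX engine: unlike POSIX, it does allow backslash escapes inside brackets, but the cases below behave the same way in both. The helper full is a hypothetical name used only here.

```python
import re

def full(pattern, text):
    """True when the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

# The dot matches any single character; Python excludes "\n" by default.
dot_results = [full(r"a.c", s) for s in ("abc", "a-c", "a\nc", "ac")]

# Inside brackets the dot is an ordinary character.
literal_dot = [ch for ch in "a.cb" if full(r"[a.c]", ch)]

# Individual characters and ranges can be mixed.
letters = "abcdwxyz"
mixed = [ch for ch in letters if full(r"[abcx-z]", ch)]
same_set = [ch for ch in letters if full(r"[a-cx-z]", ch)]

# A hyphen placed first or last is literal, and so is a leading "]".
hyphen_ok = full(r"[abc-]", "-") and full(r"[-abc]", "-")
bracket_ok = full(r"[]abc]", "]")

# Negation: any character except the listed ones or the given range.
negated = [ch for ch in "abcdQ7" if full(r"[^abc]", ch)]
not_lower = [ch for ch in "aqQ7" if full(r"[^a-z]", ch)]

print(dot_results, literal_dot, mixed, negated, not_lower)
```

Note that "ac" fails against a.c because the dot must consume exactly one character.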

    ^     Matches the starting position within the string. In line-based tools, it matches the starting position of any line.

    $     Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
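The difference between whole-string and line-based anchoring can be sketched in Python, where the re.MULTILINE flag stands in for the behavior of line-based tools. The choice of Python and the sample text are assumptions made for illustration.

```python
import re

text = "first line\nsecond line\n"

# Default mode: ^ anchors only at the very start of the string.
start_default = re.findall(r"^[a-z]+", text)

# MULTILINE mode: ^ also anchors at the start of every line.
start_multiline = re.findall(r"^[a-z]+", text, re.MULTILINE)

# Default mode: $ anchors at the end of the string, or just before
# a newline that ends the string.
end_default = re.findall(r"[a-z]+$", text)

# MULTILINE mode: $ also anchors at the end of every line.
end_multiline = re.findall(r"[a-z]+$", text, re.MULTILINE)

print(start_default, start_multiline)
print(end_default, end_multiline)
```

In default mode the final "line" is still found even though the text ends with a newline, which is the "just before a string-ending newline" case described above.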