RegEx

Regular Expressions (RegEx or RegExp) are special text strings that describe a search pattern. A RegEx either matches or does not match the string being searched: the matching function returns true on a match and false otherwise.
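As a minimal sketch of the match / no-match idea, assuming PHP's PCRE function preg_match() (the pattern and the test strings are made up for illustration): preg_match() returns 1 on a match and 0 on no match, so the result is compared against 1 to get a boolean.

```php
<?php
// Hypothetical pattern: one or more digits anywhere in the string.
$pattern = '/[0-9]+/';

// preg_match() returns 1 if the pattern matches, 0 if it does not
// (and false on error), so compare against 1 to get a boolean.
$hasDigits   = preg_match($pattern, 'room 42') === 1;    // matches
$hasNoDigits = preg_match($pattern, 'no numbers') === 1; // does not match

var_dump($hasDigits);   // bool(true)
var_dump($hasNoDigits); // bool(false)
```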

In RegEx, the following characters have special meaning. To match one of them literally, escape it by preceding it with “\”.

  • ^ = beginning of a string or, as the first character inside “[]”, negation of the character range.
  • $ = end of string.
  • *+? = denotes repetition, * is zero or more, + is one or more, and ? is zero or one. “a*” matches “”, “a”, “aa”, “aaa”, etc. “a+” does not match “”, but matches “a”, “aa”, “aaa”, etc. And, “a?” only matches “” and “a”.
  • . = any single character, alphanumeric, space, or other.
  • [] = a single character. What's inside defines what kind of character it can be. Ranges of characters are denoted by “-”. “[a-zA-Z0-9]” means any alphanumeric character. “[^a-zA-Z0-9]” means any NON-alphanumeric character, such as a period, space, slash, parenthesis, quote, question mark, etc.
  • () = groups several characters into one unit. “(abc)” matches a single “abc” occurrence, but “(abc)+” matches “abc”, “abcabc”, “abcabcabc”, etc.
  • | = means OR, but is not often needed for single characters because “(a|b|c)” is the same as “[a-c]”. It is useful for matching either of several strings longer than one character, such as “(facile|easy)”.
  • {} = denotes a repetition range. “\.[a-zA-Z]{2,3}” means a period followed by an alphabetic string of 2-3 characters, useful for domain names. Do not use “-” inside “{}”; there the “,” separates the bounds of the range. Inside “[]”, use “-” for ranges; a “,” inside “[]” is a literal comma.
  • \ = used to escape all the above special characters so that they mean their literal selves. Use it for ^.[$()|*+?{\. To match the literal character “\” in a string, use “\\” in your RegEx; because PHP string literals also treat “\” as an escape character, the backslashes may need to be doubled again in the source code. To make it more confusing, escaping some alphabetic characters gives them a special, non-alphanumeric meaning: “\s” means whitespace, “\n” new line, “\r” carriage return, “\t” tab.
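The special characters listed above can be tried out with a short sketch like the following. It assumes PHP's PCRE function preg_match(); the helper name matches() and the test strings are hypothetical, chosen only for illustration. Each pattern is anchored with ^ and $ so that the whole string has to fit.

```php
<?php
// Hypothetical helper: true when $pattern matches $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// Anchors and repetition: ^ and $ pin the match to the whole string.
var_dump(matches('/^a*$/', ''));    // bool(true):  * allows zero occurrences
var_dump(matches('/^a+$/', ''));    // bool(false): + needs at least one "a"
var_dump(matches('/^a?$/', 'aa'));  // bool(false): ? allows at most one "a"

// Character classes: [] matches one character, a leading ^ negates it.
var_dump(matches('/^[a-zA-Z0-9]$/', 'x'));   // bool(true)
var_dump(matches('/^[^a-zA-Z0-9]$/', '?'));  // bool(true)

// Grouping and alternation.
var_dump(matches('/^(abc)+$/', 'abcabc'));       // bool(true)
var_dump(matches('/^(facile|easy)$/', 'easy'));  // bool(true)

// Repetition range after an escaped (literal) period.
var_dump(matches('/^\.[a-zA-Z]{2,3}$/', '.com'));   // bool(true)
var_dump(matches('/^\.[a-zA-Z]{2,3}$/', '.info'));  // bool(false): 4 letters

// Literal backslash: the RegEx is \\ and a single-quoted PHP string
// needs each of those backslashes doubled, giving '\\\\'.
var_dump(matches('/^\\\\$/', '\\'));  // bool(true)
```

Single-quoted strings are used for the patterns so that PHP leaves sequences such as “\.” untouched and does not try to interpolate “$”.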

Example RegEx

To validate an email address, which can include underscores and dashes, the RegEx using the PHP POSIX-style function “eregi” would look like this:

eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$",$email)
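Note that eregi() was deprecated in PHP 5.3 and removed in PHP 7. As a rough sketch of one possible replacement, assuming preg_match() is used instead, the same pattern can be wrapped in “/” delimiters with the “i” modifier for case-insensitive matching. The sample addresses below are hypothetical.

```php
<?php
// Same pattern as the eregi() call, wrapped in / delimiters; the trailing
// "i" modifier makes the match case-insensitive, as eregi() was.
$pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/i';

// Hypothetical sample addresses for illustration.
$samples = ['John.Doe@Example.com', 'no-at-sign.example.com', 'user@host.info'];

$results = [];
foreach ($samples as $email) {
    // preg_match() returns 1 on a match, 0 otherwise.
    $results[$email] = preg_match($pattern, $email) === 1;
    echo $email, ' => ', $results[$email] ? 'valid' : 'invalid', "\n";
}
```

Only the first sample is reported as valid. The third one fails because the “{2,3}” range at the end of the pattern rejects top-level domains longer than three letters, such as “.info”.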

Notice that eregi() is case-insensitive (unlike ereg()), so [a-z] effectively means [a-zA-Z] in the eregi() function.

Useful regular expression tutorial links