computers:regex [2015/03/24 05:42] (current)
 +====== RegEx ======
  
 +Regular Expressions (RegEx or RegExp) are special text strings that describe a search pattern. A RegEx either matches or does not match the string being searched: a matching function returns true if it matches and false if it does not.
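
As a minimal sketch of the match / no-match idea, the snippet below uses PHP's PCRE function `preg_match()`. That choice is an assumption of this example, not something the page prescribes, and the helper name `matches()` is hypothetical.

```php
<?php
// Hypothetical helper: true if $pattern matches anywhere in $subject.
function matches(string $pattern, string $subject): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $subject) === 1;
}

var_dump(matches('/cat/', 'concatenate')); // bool(true)
var_dump(matches('/dog/', 'concatenate')); // bool(false)
```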
 +
 +In RegEx, the following characters have special meaning. To match one of them literally, escape it by preceding it with "\".
 +
 +  * ''^'' = beginning of a string or, as the first character inside "[]", negation of the character range.
 +  * ''$'' = end of a string.
 +  * ''*+?'' = denote repetition: * is zero or more, + is one or more, and ? is zero or one. "a*" matches "", "a", "aa", "aaa", etc. "a+" does not match "", but matches "a", "aa", "aaa", etc. And "a?" only matches "" and "a".
 +  * ''.'' = any single character: alphanumeric, space, or other.
 +  * ''[]'' = a single character. What's inside defines what kind of character it can be. Ranges of characters are denoted by "-". "[a-zA-Z0-9]" means any alphanumeric character. "[^a-zA-Z0-9]" means any NON-alphanumeric character, such as a period, space, slash, parenthesis, quote, question mark, etc.
 +  * ''()'' = groups more than one character into a unit. "(abc)" matches a single "abc" occurrence, but "(abc)+" matches "abc", "abcabc", "abcabcabc", etc.
 +  * ''|'' = means OR. It is rarely needed for single characters because "(a|b|c)" is the same as "[a-c]", but it is useful for matching either of several strings longer than one character, such as "(facile|easy)".
 +  * ''{}'' = denotes a repetition range. "\.[a-zA-Z]{2,3}" means a literal period followed by an alphabetic string of 2-3 characters, useful for top-level domain names. Do not use "-" inside "{}"; the "," denotes the range there. Inside "[]" it is the reverse: use "-" for a range, and "," is a literal comma.
 +  * ''\'' = escapes all the above special characters so that they are taken literally. Use it for ^.[$()|*+?{\. To match the literal character "\" in a string, use "\\" in your RegEx. In PHP source code the backslashes inside a string literal may themselves need escaping, so the pattern is often written "\\\\". To make it more confusing, escaping some alphabetic characters gives them a special, non-alphanumeric meaning: "\s" means a whitespace character, "\n" new line, "\r" carriage return, "\t" tab.
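
The special characters listed above can be tried out with a short sketch like the following. It assumes PHP's PCRE function `preg_match()` instead of the POSIX functions, and the helper `full_match()` is a hypothetical name introduced only for this illustration.

```php
<?php
// Hypothetical helper: true only if $pattern matches the whole of $subject.
function full_match(string $pattern, string $subject): bool
{
    // ^ and $ anchor the pattern to the start and end of the string;
    // (?: ) groups the pattern so the anchors apply to all of it.
    return preg_match('/^(?:' . $pattern . ')$/', $subject) === 1;
}

// Repetition: * (zero or more), + (one or more), ? (zero or one)
var_dump(full_match('a*', ''));                    // bool(true)
var_dump(full_match('a+', ''));                    // bool(false)
var_dump(full_match('a?', 'aa'));                  // bool(false)

// Character classes and negated classes
var_dump(full_match('[a-zA-Z0-9]', 'x'));          // bool(true)
var_dump(full_match('[^a-zA-Z0-9]', '?'));         // bool(true)

// Grouping and alternation
var_dump(full_match('(abc)+', 'abcabc'));          // bool(true)
var_dump(full_match('(facile|easy)', 'easy'));     // bool(true)

// Repetition range: a literal period plus 2-3 letters
var_dump(full_match('\.[a-zA-Z]{2,3}', '.com'));   // bool(true)
var_dump(full_match('\.[a-zA-Z]{2,3}', '.info'));  // bool(false)
```

Anchoring every pattern keeps the results easy to reason about: without ^ and $, a pattern such as "a?" would also succeed on "aa" by matching only part of it.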
 +
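
Escaping can be illustrated in the same way. The sketch below again assumes PHP's PCRE functions; `preg_quote()` is a standard PHP function that escapes the special characters in a string, while the helper `matches_literal()` is a hypothetical name used only here.

```php
<?php
// Unescaped, "." matches any single character, so "3x14" also matches.
var_dump(preg_match('/^3.14$/', '3x14'));   // int(1)
// Escaped, "\." matches only a literal period.
var_dump(preg_match('/^3\.14$/', '3x14'));  // int(0)
var_dump(preg_match('/^3\.14$/', '3.14'));  // int(1)

// A literal backslash needs "\\" in the pattern, and each of those
// backslashes is escaped once more inside the PHP string literal.
var_dump(preg_match('/^C:\\\\temp$/', 'C:\\temp')); // int(1)

// Hypothetical helper: true if $subject is exactly the text $literal,
// with every special character in $literal escaped by preg_quote().
function matches_literal(string $literal, string $subject): bool
{
    return preg_match('/^' . preg_quote($literal, '/') . '$/', $subject) === 1;
}

var_dump(matches_literal('1+1=2 (maybe?)', '1+1=2 (maybe?)')); // bool(true)
var_dump(matches_literal('a.c', 'abc'));                       // bool(false)
```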
 +
 +====== Example RegEx ======
 +To validate an email address, which can include underscores and dashes, the RegEx using the PHP POSIX-style function "eregi" would look like this:
 +
 +<code php>
 +// eregi() was deprecated in PHP 5.3 and removed in PHP 7.0, so this call needs an older PHP.
 +eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$",$email)
 +</code>
 +
 +Notice that ''eregi()'' is case insensitive (as opposed to ''ereg()''), and thus [a-z] effectively means [a-zA-Z] in the ''eregi()'' function.
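
Because the POSIX functions ''ereg()'' and ''eregi()'' were removed in PHP 7.0, a rough equivalent on current PHP could look like the sketch below. It assumes the PCRE function `preg_match()` with the `i` (case-insensitive) modifier in place of ''eregi()''; the function name `is_valid_email()` is hypothetical, and the pattern is the one from this page, unchanged apart from the added "/" delimiters.

```php
<?php
// Hypothetical helper wrapping the page's email pattern.
function is_valid_email(string $email): bool
{
    // The trailing "i" makes [a-z] match upper-case letters too,
    // which is what eregi() did implicitly.
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(is_valid_email('John.Doe@Example.COM')); // bool(true)
var_dump(is_valid_email('john@example'));         // bool(false)
var_dump(is_valid_email('john@example.info'));    // bool(false)
```

The last call returns false because the pattern only allows a final label of 2-3 letters, so this RegEx rejects addresses under longer top-level domains.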