Regex or Regular Expression
What is RegEx?
A RegEx, or regular expression, is a sequence of characters that defines a search pattern. Regular expressions are commonly used to find text, to find and replace text, and to validate that a string matches a specific pattern.
For example, the escape sequence \d matches any digit character, ^ (outside square brackets) denotes the start of the text, and $ denotes the end. Therefore, /^\d\d$/ matches text that consists of exactly two digits.
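The three uses mentioned above (finding, finding and replacing, and validating) can be sketched with Python's built-in re module. This is a minimal illustration, not the only way to do it; the sample strings and the name two_digits are made up for the example. Python writes the pattern as a raw string, r"^\d\d$", rather than between slashes as in /^\d\d$/.

```python
import re

# Validating: ^ anchors at the start, \d\d needs two digits, $ anchors at the end.
two_digits = re.compile(r"^\d\d$")
for candidate in ["42", "7", "123", "4a"]:
    print(candidate, bool(two_digits.search(candidate)))
# 42 True
# 7 False
# 123 False
# 4a False

text = "Rooms 12, 7 and 305"

# Finding: every run of one or more digits.
print(re.findall(r"\d+", text))  # ['12', '7', '305']

# Finding and replacing: mask every digit with '#'.
print(re.sub(r"\d", "#", text))  # Rooms ##, # and ###
```

One Python-specific detail: $ also matches just before a trailing newline, so two_digits.search("42\n") succeeds. re.fullmatch(r"\d\d", s) rejects that case and is the stricter choice for validation.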
To Remember
- The asterisk (*) matches the character preceding it zero or more times.
- The plus (+) matches the character preceding it one or more times.
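- The question mark (?) matches the preceding character zero or one time.
- The dot (.) matches exactly one character other than a newline (unless a "dot matches all" flag such as Python's re.DOTALL is set).
- […] matches any one of the characters inside the brackets.
- [^…] matches any single character except the ones inside the brackets.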
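- \D matches any non-digit character.
- \d matches any digit character.
- \w matches a word character, equivalent to [A-Za-z0-9_] for ASCII text (Python 3 also matches Unicode letters and digits by default).
- \W matches any non-word character, equivalent to [^A-Za-z0-9_] for ASCII text.
- \b matches a word boundary: a zero-width position between a \w and a \W character, or between a \w character and the start or end of the string.
- \s matches a whitespace character such as a space, tab, or newline.
- \S matches any character that is not whitespace.
- ^ matches the beginning of the string and $ matches the end; in multiline mode they also match at the beginning and end of each line.
- \A matches only the beginning of the string, even in multiline mode.
- {M,N} matches the preceding element at least M and at most N times.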
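A short Python sketch of the quantifiers and the dot from the list above. The sample strings are invented for illustration, and each comment shows what re.findall returns.

```python
import re

# * : zero or more of the preceding character
print(re.findall(r"ab*", "a ab abbb"))          # ['a', 'ab', 'abbb']
# + : one or more of the preceding character
print(re.findall(r"ab+", "a ab abbb"))          # ['ab', 'abbb']
# ? : zero or one of the preceding character
print(re.findall(r"colou?r", "color colour"))   # ['color', 'colour']
# {M,N} : at least M, at most N repetitions; "1" is too short, and
# "1234" yields "123" because the leftover "4" is also too short
print(re.findall(r"\d{2,3}", "1 12 123 1234"))  # ['12', '123', '123']
# . : any single character except a newline, unless re.DOTALL is given
print(re.findall(r"a.c", "abc a\nc"))             # ['abc']
print(re.findall(r"a.c", "abc a\nc", re.DOTALL))  # ['abc', 'a\nc']
```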
# Python code snippets for regex examples
import re
print(re.findall(".", "Hello")) # ['H', 'e', 'l', 'l', 'o']
print(re.findall(".*", "Hello")) # ['Hello', '']
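The character classes, escapes, and anchors from the list can be tried the same way. This is again a minimal sketch with made-up input strings; in Python it helps to write patterns as raw strings (r"...") so that backslashes reach the regex engine instead of being treated as string escapes.

```python
import re

s = "Room 42, floor_3\tEast"

print(re.findall(r"[aeiou]", s))     # ['o', 'o', 'o', 'o', 'a']  any one listed vowel
print(re.findall(r"[^A-Za-z ]", s))  # ['4', '2', ',', '_', '3', '\t']  not a letter or space
print(re.findall(r"\d", s))          # ['4', '2', '3']
print(re.findall(r"\D", "a1b2"))     # ['a', 'b']
print(re.findall(r"\w+", s))         # ['Room', '42', 'floor_3', 'East']
print(re.findall(r"\W", s))          # [' ', ',', ' ', '\t']
print(re.findall(r"\s", s))          # [' ', ' ', '\t']
print(re.findall(r"\S+", s))         # ['Room', '42,', 'floor_3', 'East']

# \b is zero-width: "floor" matches on its own, but not inside "floors".
print(re.findall(r"\bfloor\b", "floor floors"))  # ['floor']

# ^ and \A differ only in multiline mode.
lines = "first\nsecond"
print(re.findall(r"^\w+", lines))                 # ['first']
print(re.findall(r"^\w+", lines, re.MULTILINE))   # ['first', 'second']
print(re.findall(r"\A\w+", lines, re.MULTILINE))  # ['first']
```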
Greedy vs. Reluctant vs. Possessive Quantifiers
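In Python and many other regex engines, the common quantifiers (*, +, ?, and {M,N}) are greedy by default: they match as many characters as possible, then backtrack (give characters back) if the rest of the pattern fails to match.
In Java, a quantifier can be made possessive by appending a plus sign (for example .*+). A possessive quantifier never backtracks, even when backing off would allow the overall match to succeed. Python's re module has also supported possessive quantifiers since version 3.11.
A reluctant (non-greedy) quantifier, written by appending a question mark (for example .*?), first matches as little as possible and then expands one character at a time, backtracking as needed.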
Enter your regex: .*test  // Greedy quantifier
Enter input string to search: xtestxxxxxxtest
I found the text "xtestxxxxxxtest" starting at index 0 and ending at index 15.

Enter your regex: .*?test  // Reluctant quantifier
Enter input string to search: xtestxxxxxxtest
I found the text "xtest" starting at index 0 and ending at index 5.
I found the text "xxxxxxtest" starting at index 5 and ending at index 15.

Enter your regex: .*+test  // Possessive quantifier
Enter input string to search: xtestxxxxxxtest
No match found.
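The same three cases can be reproduced in Python. This sketch uses re.finditer to report start and end indices like the Java harness above (the end index is exclusive); the helper name spans is hypothetical. Possessive quantifiers are only accepted by Python's re module from version 3.11, so that part is guarded by a version check.

```python
import re
import sys

text = "xtestxxxxxxtest"

def spans(pattern, s):
    """Return (matched_text, start, end) for every non-overlapping match."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, s)]

# Greedy: .* grabs the whole string, then backtracks until "test" fits.
greedy = spans(r".*test", text)
# Reluctant: .*? starts empty and grows one character at a time.
reluctant = spans(r".*?test", text)

print(greedy)     # [('xtestxxxxxxtest', 0, 15)]
print(reluctant)  # [('xtest', 0, 5), ('xxxxxxtest', 5, 15)]

# Possessive: .*+ consumes everything and never gives it back (Python 3.11+).
if sys.version_info >= (3, 11):
    print(spans(r".*+test", text))  # []
```

Both engines agree here: the greedy pattern returns one long match, the reluctant one returns two short matches, and the possessive one returns none because .*+ swallows the final "test" and refuses to give it back.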
A Complete Example In Python
This example finds every match that starts with “This” and ends with a space, a “P”, 3 to 5 word characters, and a dot, all within a single line:
# Python code snippet for complete example
import re
text = "This matches given regular expression in PHP.\n"
text += "This matches given regular expression in Python.\n"
text += "This matches given regular expression in C.\n"
text += "This matches given regular expression in Perl."
# Raw string so the backslashes in \w and \. reach the regex engine unchanged.
result = re.findall(r"This.* P\w{3,5}\.", text)
if result:
    print(result)
else:
    print("No match")
['This matches given regular expression in Python.', 'This matches given regular expression in Perl.']
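The phrase “in a single line” comes from the dot: by default it does not match a newline, so .* cannot run past the end of a line. The sketch below, using a shorter made-up two-line text, shows how the result changes when the re.DOTALL flag lets the dot match newlines too.

```python
import re

text = "This line mentions PHP.\nThis line mentions Python."
pattern = r"This.* P\w{3,5}\."

# Default: . stops at "\n", so each match stays inside one line.
# " PHP." fails because only 2 word characters follow the "P".
print(re.findall(pattern, text))
# ['This line mentions Python.']

# re.DOTALL lets . match "\n", so the greedy .* runs from the first
# "This" all the way to the last " Python." and returns one long match.
print(re.findall(pattern, text, re.DOTALL))
# ['This line mentions PHP.\nThis line mentions Python.']
```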
