Regex or Regular Expression
What is RegEx?
A RegEx, short for Regular Expression, is a powerful tool used for pattern matching within strings. It consists of a sequence of characters that define a search pattern, allowing you to search for, manipulate, and validate text based on specific criteria.
Regular expressions are widely used in programming, text processing, and data validation tasks to efficiently handle complex search patterns.
Key Concepts to Remember
- *: Matches the preceding element (a character, class, or group) zero or more times.
- +: Matches the preceding element one or more times.
- ?: Matches the preceding element zero or one time.
- .: Matches any single character except a newline (by default).
- […]: Matches any single character inside the brackets.
- [^…]: Matches any single character not in the brackets.
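- \d: Matches any digit character (equivalent to [0-9] for ASCII text; in Python 3 string patterns it also matches other Unicode digits unless re.ASCII is set).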
- \D: Matches any non-digit character.
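- \w: Matches any word character (equivalent to [a-zA-Z0-9_] for ASCII text; in Python 3 string patterns it also matches Unicode letters and digits unless re.ASCII is set).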
- \W: Matches any non-word character.
- \b: Matches a word boundary.
- \s: Matches any whitespace character.
- \S: Matches any non-whitespace character.
- ^: Matches the start of a line or string.
- $: Matches the end of a line or string.
- {M,N}: Matches the preceding element at least M and not more than N times.
Example Using Python
Here are some basic examples of using regular expressions in Python:
# Python code snippets for regex examples
import re
print(re.findall(".", "Hello")) # Output: ['H', 'e', 'l', 'l', 'o']
print(re.findall(".*", "Hello")) # Output: ['Hello', ''] (the trailing '' is an empty match at the end of the string)
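The other metacharacters from the list above can be tried the same way. The following is a minimal sketch; the sample sentence and variable names are made up for illustration and are not part of the original examples.

```python
import re

# Hypothetical sample sentence, chosen only to exercise the metacharacters above
sample = "Order 66 shipped on 2024-05-17 to cat_lover99."

digits = re.findall(r"\d+", sample)                # runs of one or more digits
short_numbers = re.findall(r"\b\d{2,4}\b", sample) # standalone numbers, 2 to 4 digits long
c_words = re.findall(r"\bc\w*", sample)            # words that begin with "c"
spellings = re.findall(r"colou?r", "color colour colr")  # the "u" is optional
consonants = re.findall(r"[^aeiou\s]", "a b e z")  # neither a vowel nor whitespace
starts_with_order = re.search(r"^Order", sample) is not None
ends_with_period = re.search(r"\.$", sample) is not None

print(digits)             # ['66', '2024', '05', '17', '99']
print(short_numbers)      # ['66', '2024', '05', '17']
print(c_words)            # ['cat_lover99']
print(spellings)          # ['color', 'colour']
print(consonants)         # ['b', 'z']
print(starts_with_order)  # True
print(ends_with_period)   # True
```

Note that "99" is missing from the second result: it sits inside the word cat_lover99, so there is no word boundary in front of it. Raw strings (the r prefix) keep Python from interpreting the backslashes before the regex engine sees them.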
Greedy vs. Reluctant vs. Possessive Quantifiers
In regular expressions, quantifiers determine the number of occurrences of a character or group in a pattern. Understanding the differences between greedy, reluctant, and possessive quantifiers is crucial for efficient pattern matching.
- Greedy Quantifier (*): Matches as many characters as possible while still allowing the overall match to succeed.
- Reluctant Quantifier (*?): Matches as few characters as possible and still allows the match to succeed.
- Possessive Quantifier (*+): Matches as many characters as possible and does not backtrack, even if it causes the overall match to fail.
Let’s see these quantifiers in action:
Enter your regex: .*test    // Greedy quantifier
Enter input string to search: xtestxxxxxxtest
Match: "xtestxxxxxxtest" from index 0 to 15.

Enter your regex: .*?test    // Reluctant quantifier
Enter input string to search: xtestxxxxxxtest
Matches: "xtest" from index 0 to 5, "xxxxxxtest" from index 5 to 15.

Enter your regex: .*+test    // Possessive quantifier
Enter input string to search: xtestxxxxxxtest
No match found.
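The same comparison can be reproduced in Python. This is a rough sketch rather than the tool that produced the session above: the spans helper is a hypothetical name introduced here, and the possessive case is guarded because Python's re module only accepts possessive quantifiers from version 3.11 onward.

```python
import re
import sys

text = "xtestxxxxxxtest"

def spans(pattern, string):
    """Return (matched text, start index, end index) for every match."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, string)]

# Greedy: .* first consumes the whole string, then backtracks to the last "test"
greedy = spans(r".*test", text)

# Reluctant: .*? expands one character at a time and stops at the first "test"
reluctant = spans(r".*?test", text)

# Possessive: .*+ consumes the whole string and never gives anything back,
# so the literal "test" has nothing left to match
if sys.version_info >= (3, 11):
    possessive = spans(r".*+test", text)
else:
    possessive = None  # older versions reject this pattern with re.error

print(greedy)      # [('xtestxxxxxxtest', 0, 15)]
print(reluctant)   # [('xtest', 0, 5), ('xxxxxxtest', 5, 15)]
print(possessive)  # [] on Python 3.11 or later
```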
A Complete Example In Python
Let’s consider a more complex example where we want to extract specific patterns from a text using regular expressions in Python:
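# Python code snippet for a complete example
import re

# Four sample sentences, one per line
text = "This matches given regular expression in PHP.\n"
text += "This matches given regular expression in Python.\n"
text += "This matches given regular expression in C.\n"
text += "This matches given regular expression in Perl."

# "This", any characters, a space, then "P" plus 3 to 5 word characters and a literal period
result = re.findall(r"This.* P\w{3,5}\.", text)
if result:
    print(result)
else:
    print("No match")
The output will be:
['This matches given regular expression in Python.', 'This matches given regular expression in Perl.']
The PHP line is skipped because only two word characters follow the P, and the C line contains no word that starts with P.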
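The ^ and $ anchors from the list of key concepts can tighten this kind of pattern so that it only matches whole lines. The sketch below uses its own made-up sample text and assumes the re.MULTILINE flag, which makes the anchors apply at every line break instead of only at the start and end of the string.

```python
import re

# Hypothetical sample text, one sentence per line
lines = (
    "This line mentions PHP.\n"
    "This line mentions Python.\n"
    "This line mentions C.\n"
    "This line mentions Prolog."
)

pattern = r"^This.* P\w{3,5}\.$"

# Without the flag, ^ and $ only anchor to the start and end of the whole string
whole_string = re.findall(pattern, lines)

# With re.MULTILINE, ^ and $ also match at the start and end of each line
per_line = re.findall(pattern, lines, flags=re.MULTILINE)

print(whole_string)  # []
print(per_line)      # ['This line mentions Python.', 'This line mentions Prolog.']
```

The first call finds nothing because . does not cross newlines by default, so a single match can never stretch from the start of the string to its end.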
