Vinod Sebastian – B.Tech, M.Com, PGCBM, PGCPM, PGDBIO

Hi I'm a Web Architect by Profession and an Artist by nature. I love empowering People, aligning to Processes and delivering Projects.


Regex or Regular Expression

What is RegEx?

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. It is commonly used for finding, finding and replacing, or validating strings based on specific patterns.

For example, outside square brackets [ ], \d matches any digit character, ^ denotes the start of the string, and $ denotes the end. Therefore, /^\d\d$/ matches text that consists of exactly two digits.
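As a minimal sketch of the two-digit pattern in Python, the snippet below checks a few sample strings. The names two_digits, samples, and results are illustrative only and the inputs are made up for the demonstration.

```python
import re

# Raw string so the backslashes reach the regex engine unchanged.
two_digits = re.compile(r"^\d\d$")

# Hypothetical sample inputs, chosen only for illustration.
samples = ["42", "7", "123", "4a"]
results = {s: bool(two_digits.match(s)) for s in samples}
print(results)  # {'42': True, '7': False, '123': False, '4a': False}
```

Only "42" passes, because the anchors leave no room for a missing, extra, or non-digit character.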

To Remember

  • The asterisk (*) matches the character preceding it zero or more times.
  • The plus (+) matches the character preceding it one or more times.
  • The question mark (?) matches the character preceding it zero or one time.
  • The dot (.) matches exactly one character (any character except a newline, by default).
  • […] matches any one character listed inside the brackets.
  • [^…] matches any one character except the ones inside the brackets.
  • \D represents a non-digit character.
  • \d represents a digit character.
  • \w is the same as regex [A-Za-z0-9_] for ASCII text.
  • \W is the same as regex [^A-Za-z0-9_] for ASCII text.
  • \b matches a word boundary, the zero-width position described by (^\w|\w$|\W\w|\w\W).
  • \s matches white space.
  • \S matches anything but white space.
  • ^ matches the beginning of a string (or of each line in multiline mode), and $ matches the end of a string (or of each line in multiline mode).
  • \A matches the beginning of a string only.
  • {M,N} denotes a minimum of M and a maximum of N repetitions.

# Python code snippets for regex examples
import re

print(re.findall(".", "Hello"))  # ['H', 'e', 'l', 'l', 'o']
print(re.findall(".*", "Hello"))  # ['Hello', '']
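
The remaining items in the list can be illustrated in the same way. The following is a small sketch rather than an exhaustive reference; the sample strings and variable names are invented for the demonstration, and the patterns are written as raw strings so that the backslashes survive.

```python
import re

star = re.findall(r"ab*", "a ab abbb")            # b zero or more times
plus = re.findall(r"ab+", "a ab abbb")            # b one or more times
optional = re.findall(r"colou?r", "color colour") # u zero or one time
vowels = re.findall(r"[aeiou]", "regex")          # any listed character
non_vowels = re.findall(r"[^aeiou]", "regex")     # any unlisted character
counted = re.findall(r"\d{2,3}", "1 22 333 4444") # 2 to 3 digits
words = re.findall(r"\bcat\b", "cat concat cat.") # whole word only
pieces = re.split(r"\s+", "a  b\tc")              # split on white space

print(star)        # ['a', 'ab', 'abbb']
print(plus)        # ['ab', 'abbb']
print(optional)    # ['color', 'colour']
print(vowels)      # ['e', 'e']
print(non_vowels)  # ['r', 'g', 'x']
print(counted)     # ['22', '333', '444']
print(words)       # ['cat', 'cat']
print(pieces)      # ['a', 'b', 'c']
```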

Greedy vs. Reluctant vs. Possessive Quantifiers

In Python and some other implementations, the common quantifiers (*, +, and ?) are greedy by default: they match as many characters as possible and then backtrack, giving characters back one at a time, if the rest of the pattern fails to match.

In Java, quantifiers may be made possessive by appending a plus sign, which disables backing off even if it would allow the overall match to succeed.

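A reluctant or “non-greedy” quantifier, written by appending a question mark (for example .*?), first matches as little as possible and expands only as far as needed for the overall match to succeed.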

Enter your regex: .*test // Greedy quantifier
Enter input string to search: xtestxxxxxxtest
I found the text "xtestxxxxxxtest" starting at index 0 and ending at index 15.

Enter your regex: .*?test // Reluctant quantifier
Enter input string to search: xtestxxxxxxtest
I found the text "xtest" starting at index 0 and ending at index 5.
I found the text "xxxxxxtest" starting at index 5 and ending at index 15.

Enter your regex: .*+test // Possessive quantifier
Enter input string to search: xtestxxxxxxtest
No match found.
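
The greedy and reluctant results above can be reproduced in Python with re.finditer. This is a rough sketch using the same input string; the helper name spans is hypothetical. The possessive case is left out because support for the .*+ syntax in Python's re module depends on the interpreter version (it was added in 3.11).

```python
import re

text = "xtestxxxxxxtest"

def spans(pattern, string):
    """Return (matched text, start, end) for every match of pattern."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, string)]

greedy = spans(r".*test", text)      # .* grabs everything, then backs off
reluctant = spans(r".*?test", text)  # .*? stops at the first possible "test"

print(greedy)     # [('xtestxxxxxxtest', 0, 15)]
print(reluctant)  # [('xtest', 0, 5), ('xxxxxxtest', 5, 15)]
```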

A Complete Example In Python

The following example finds all matches that start with “This” and end with “P” followed by 3 to 5 word characters and a literal dot, all within a single line:

# Python code snippet for complete example
import re

text = "This matches given regular expression in PHP.\n"
text += "This matches given regular expression in Python.\n"
text += "This matches given regular expression in C.\n"
text += "This matches given regular expression in Perl."

# Raw string keeps the backslashes intact; \. matches a literal dot.
result = re.findall(r"This.* P\w{3,5}\.", text)

if result:
    print(result)
else:
    print("No match")

Output:
['This matches given regular expression in Python.', 'This matches given regular expression in Perl.']
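
The “single line” restriction comes from the dot not matching a newline by default. As a sketch of what changes when that restriction is lifted, the snippet below runs one pattern with and without the re.DOTALL flag; the shortened sample text and the variable names are assumptions made for illustration.

```python
import re

sample = "This is PHP.\nThis is Python.\nThis is Perl."
pattern = r"This.* P\w{3,5}\."

# Default: the dot stops at each newline, so every line is matched separately.
per_line = re.findall(pattern, sample)

# re.DOTALL: the dot also matches newlines, so the greedy .* runs to the
# last possible ending and a single match spans all three lines.
spanning = re.findall(pattern, sample, flags=re.DOTALL)

print(per_line)       # ['This is Python.', 'This is Perl.']
print(len(spanning))  # 1
```

With re.DOTALL the one match is the whole sample, which is why the default behaviour is usually the better fit for line-oriented text.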