Vinod Sebastian – B.Tech, M.Com, PGCBM, PGCPM, PGDBIO

Hi I'm a Web Architect by Profession and an Artist by nature. I love empowering People, aligning to Processes and delivering Projects.

  • Regex or Regular Expression

    What is RegEx?

    A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. It is commonly used for finding, finding and replacing, or validating strings based on specific patterns.

    For example, the escape sequence \d matches any digit character, ^ (when used outside [ ]) denotes the start, and $ denotes the end. Therefore, /^\d\d$/ matches text that consists of exactly two digits.
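
The two-digit pattern above can be tried out with a minimal Python sketch. The helper name is_two_digits is hypothetical and exists only for this illustration; the pattern is written as a raw string so the backslash in \d reaches the regex engine unchanged.

```python
import re

# Hypothetical helper, named only for this illustration.
def is_two_digits(text):
    # ^ anchors the start, each \d matches one digit, $ anchors the end.
    return re.search(r"^\d\d$", text) is not None

print(is_two_digits("42"))   # True
print(is_two_digits("4"))    # False (only one digit)
print(is_two_digits("421"))  # False (three digits)
print(is_two_digits("a42"))  # False (does not start with a digit)
```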

    To Remember

    • The asterisk (*) matches the preceding character zero or more times.
    • The plus (+) matches the preceding character one or more times.
    • The question mark (?) matches the preceding character zero or one time.
    • The dot (.) matches exactly one character (by default, any character except a newline).
    • […] matches any single character listed inside the brackets.
    • [^…] matches any single character except the ones inside the brackets.
    • \D represents a non-digit character.
    • \d represents a digit character.
    • \w is the same as regex [A-Za-z0-9_] (for ASCII text).
    • \W is the same as regex [^A-Za-z0-9_].
    • \b matches a word boundary; the positions it matches are roughly those covered by regex (^\w|\w$|\W\w|\w\W), but \b itself consumes no characters.
    • \s matches white space.
    • \S matches anything but white space.
    • ^ matches the beginning of a line or string, and $ matches the end of a line or string.
    • \A matches the beginning of a string.
    • {M,N} matches the preceding character at least M and at most N times.
    # Python code snippets for regex examples
    import re
    
    print(re.findall(".", "Hello"))  # ['H', 'e', 'l', 'l', 'o']
    print(re.findall(".*", "Hello"))  # ['Hello', '']
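
To make a few more of the rules listed above concrete, here is a small Python sketch. The sample strings and variable names are made up for illustration; only re.findall and re.split from the standard library are used.

```python
import re

# Made-up sample text for illustration.
sample = "Room 42, floor 7: call 555-0199"

digit_runs = re.findall(r"\d+", sample)       # + : one or more digits
long_runs = re.findall(r"\d{3,4}", sample)    # {M,N} : 3 to 4 digits
punctuation = re.findall(r"[^\w\s]", sample)  # [^...] : neither word char nor space
f_words = re.findall(r"\bf\w*", sample)       # \b : words starting with "f"
spellings = re.findall(r"colou?r", "color colour colr")  # ? : optional "u"
fields = re.split(r"\s+", "a  b\tc")          # \s : split on runs of white space

print(digit_runs)   # ['42', '7', '555', '0199']
print(long_runs)    # ['555', '0199']
print(punctuation)  # [',', ':', '-']
print(f_words)      # ['floor']
print(spellings)    # ['color', 'colour']
print(fields)       # ['a', 'b', 'c']
```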

    Greedy vs. Reluctant vs. Possessive Quantifiers

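    In Python and some other implementations, the common quantifiers (*, +, and ?) are greedy by default: they match as many characters as possible. If the rest of the pattern then fails, they backtrack, giving characters back one at a time until the overall match succeeds or every option is exhausted.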

    In Java, quantifiers may be made possessive by appending a plus sign, which disables backing off even if it would allow the overall match to succeed.

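    A reluctant or “non-greedy” quantifier, written by appending a question mark (*?, +?, ??), first matches as little as possible and takes more characters only when the rest of the pattern fails to match.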

    Enter your regex: .*test // Greedy quantifier
    Enter input string to search: xtestxxxxxxtest
    I found the text "xtestxxxxxxtest" starting at index 0 and ending at index 15.
    
    Enter your regex: .*?test // Reluctant quantifier
    Enter input string to search: xtestxxxxxxtest
    I found the text "xtest" starting at index 0 and ending at index 5.
    I found the text "xxxxxxtest" starting at index 5 and ending at index 15.
    
    Enter your regex: .*+test // Possessive quantifier
    Enter input string to search: xtestxxxxxxtest
    No match found.
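
The same three searches can be approximated in Python. This is a sketch, not a port of the test harness that produced the transcript above: the helper name spans is hypothetical, and the possessive case is guarded because Python's re module only accepts possessive quantifiers from version 3.11 onward.

```python
import re
import sys

text = "xtestxxxxxxtest"

def spans(pattern, string):
    """Hypothetical helper: (matched text, start, end) for every match."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, string)]

greedy = spans(r".*test", text)      # grabs everything, then backs off
reluctant = spans(r".*?test", text)  # grows one character at a time

print(greedy)     # [('xtestxxxxxxtest', 0, 15)]
print(reluctant)  # [('xtest', 0, 5), ('xxxxxxtest', 5, 15)]

# Possessive: .*+ consumes the whole string and never gives anything back,
# so the trailing "test" can never match.
if sys.version_info >= (3, 11):
    possessive = spans(r".*+test", text)
    print(possessive)  # []
```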

    A Complete Example In Python

    The following finds every match that starts with “This” and ends, on the same line, with a space, “P”, 3 to 5 word characters, and a dot:

    # Python code snippet for complete example
    import re
    
    # Four lines of input, separated by newline characters
    text = "This matches given regular expression in PHP.\n"
    text += "This matches given regular expression in Python.\n"
    text += "This matches given regular expression in C.\n"
    text += "This matches given regular expression in Pearl."
    
    # Raw string keeps the backslashes; \. matches a literal dot
    result = re.findall(r"This.* P\w{3,5}\.", text)
    
    if result:
        print(result)
    else:
        print("No match")
    
    # Output:
    # ['This matches given regular expression in Python.', 'This matches given regular expression in Pearl.']
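
As a variation on the example above, the "single line" requirement can be made explicit with anchors. The sketch below is one possible approach, not the only one: it assumes the re.MULTILINE flag, under which ^ and $ match at the start and end of every line instead of only at the ends of the whole string.

```python
import re

lines = [
    "This matches given regular expression in PHP.",
    "This matches given regular expression in Python.",
    "This matches given regular expression in C.",
    "This matches given regular expression in Pearl.",
]
text = "\n".join(lines)

# ^ and $ pin the match to one whole line; \. is a literal dot.
pattern = re.compile(r"^This.* P\w{3,5}\.$", re.MULTILINE)
matches = pattern.findall(text)

print(matches)
# ['This matches given regular expression in Python.', 'This matches given regular expression in Pearl.']
```

The PHP line is skipped because only two word characters follow its “P”, and the C line contains no “P” after a space at all.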