Introduction to Regular Expressions in Python by MarsDevs.
A Regular Expression (RegEx or RE) is a special sequence of characters that defines a search pattern, which is then used to find a string or set of strings. Regular expressions are supported by Python, Java, R, and many other languages.
Various common uses of regular expressions are mentioned below.
Python has a built-in module, "re", that supports the use of regex. Its search() function returns either the first match or None. Consider the following example:
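The original listing is not reproduced here, so the following is only a minimal sketch of re.search(). The pattern r'portal' comes from the text; the sample sentence is an assumption chosen so that it contains that word.

```python
import re

# Hypothetical sample text; any string containing "portal" would do.
text = "MarsDevs: a learning portal for developers"

# re.search() returns a match object for the first match, or None.
match = re.search(r"portal", text)

if match is not None:
    print("Start index:", match.start())  # 21
    print("End index:", match.end())      # 27

# A pattern that does not occur in the text yields None.
missing = re.search(r"gateway", text)
print(missing)  # None
```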
The r prefix (as in r'portal') stands for raw string, not for regex. A raw string differs from a regular string: it treats the \ character as a literal backslash rather than as an escape character. This matters because the regular expression engine uses \ as its own escape character.
Consider another example,
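As a rough illustration of the difference between raw and regular strings, the sketch below compares the two forms and then matches a literal backslash. The path value is a made-up sample, not something from the original article.

```python
import re

# In a regular string, "\n" is a single newline character.
# In a raw string, r"\n" is two characters: a backslash and the letter n.
regular = "\n"
raw = r"\n"
print(len(regular), len(raw))  # 1 2

# To match one literal backslash, the regex itself must be \\ .
# Written as a raw string that is r"\\"; as a regular string it is "\\\\".
path = r"C:\temp"  # hypothetical sample value
found_raw = re.search(r"\\", path)
found_regular = re.search("\\\\", path)
print(found_raw.start(), found_regular.start())  # 2 2
```

Using raw strings for patterns keeps the backslashes readable, which is why most regex examples are written with the r prefix.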
Let's discuss how to write a regex using metacharacters or special sequences.
Meta Characters
Metacharacters are essential because they are used in the patterns passed to the functions of the re module. They are briefly explained below.
(1) Backslash (\)
It is used to remove the special meaning of a character. For instance, a dot (.) is a special character in a regex; to find a literal dot in a given string, escape it with a backslash (\.).
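A small sketch of escaping the dot, assuming a made-up sample string:

```python
import re

text = "marsdevs.com"  # hypothetical sample string

# Unescaped, the dot matches any character, so the first match is "m".
any_char = re.search(r".", text)
print(any_char.group())  # m

# Escaped, the dot only matches a literal ".".
literal_dot = re.search(r"\.", text)
print(literal_dot.group(), literal_dot.start())  # . 8
```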
(2) Square Brackets ([])
It is used to specify a set of characters, any one of which may match at that position.
The set can be inverted by placing the caret (^) symbol first inside the brackets.
For example,
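The following sketch shows a plain set, a range, and an inverted set. The sample string is an assumption made for illustration.

```python
import re

text = "mars devs 2024"  # hypothetical sample string

# Any one of the listed characters.
vowels = re.findall(r"[aeiou]", text)
print(vowels)  # ['a', 'e']

# A range of characters.
digits = re.findall(r"[0-9]", text)
print(digits)  # ['2', '0', '2', '4']

# Inverted set: anything that is NOT a lowercase letter or a space.
others = re.findall(r"[^a-z ]", text)
print(others)  # ['2', '0', '2', '4']
```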
(3) Caret (^)
It is used to match the beginning of the string, i.e. it checks whether the string starts with the given pattern or not.
For example,
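A minimal sketch of the caret, using the string "marsdevs" as sample input:

```python
import re

# "^mars" only matches when the string starts with "mars".
at_start = re.search(r"^mars", "marsdevs")
print(at_start is not None)  # True

# "devs" is present, but not at the beginning, so there is no match.
not_at_start = re.search(r"^devs", "marsdevs")
print(not_at_start)  # None
```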
(4) Dollar ($)
It is used to match the end of the string, i.e. it checks whether the string ends with the given pattern or not.
For example,
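A minimal sketch of the dollar sign, again using "marsdevs" as sample input:

```python
import re

# "devs$" only matches when the string ends with "devs".
at_end = re.search(r"devs$", "marsdevs")
print(at_end is not None)  # True

# "mars" is present, but not at the end, so there is no match.
not_at_end = re.search(r"mars$", "marsdevs")
print(not_at_end)  # None
```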
(5) Dot (.)
It matches any single character except the newline character (\n).
For example,
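The sketch below uses a made-up string in which one candidate contains a newline, to show that the dot skips it by default:

```python
import re

# The third candidate has a newline between "d" and "vs".
text = "devs dovs d\nvs"  # hypothetical sample string

# "d.vs" needs exactly one non-newline character between "d" and "vs".
matches = re.findall(r"d.vs", text)
print(matches)  # ['devs', 'dovs']
```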
(6) Or (|)
It works as an OR operator: it checks whether the pattern before or the pattern after the | symbol is present in the string.
For example,
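A short sketch of alternation with illustrative sample strings:

```python
import re

# Either alternative may match; findall returns every hit in order.
both = re.findall(r"mars|devs", "mars and devs")
print(both)  # ['mars', 'devs']

# Only the second alternative occurs in this string.
only_second = re.search(r"cat|dog", "hot dog")
print(only_second.group())  # dog
```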
(7) Question Mark (?)
It checks whether the regex before the question mark occurs zero times or exactly once, i.e. it makes the preceding element optional.
For example,
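A minimal sketch of the optional element, using common spelling variants as sample data:

```python
import re

# "u?" means the "u" may appear once or not at all.
text = "color colour colouur"
spellings = re.findall(r"colou?r", text)
print(spellings)  # ['color', 'colour']
```

The third word has two "u" characters, which is more than the single optional occurrence allows, so it is not matched.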
(8) Star (*)
It matches zero or more occurrences of the regex before the * symbol in the sequence.
For example,
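A minimal sketch of the star quantifier on a made-up string:

```python
import re

# "b*" allows zero, one, or many "b" characters between "a" and "c".
text = "ac abc abbbc adc"
star_matches = re.findall(r"ab*c", text)
print(star_matches)  # ['ac', 'abc', 'abbbc']
```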
(9) Plus (+)
It matches one or more occurrences of the regex before the + symbol in the sequence.
For example,
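A minimal sketch of the plus quantifier. Compared with the star, at least one occurrence is required:

```python
import re

# "b+" requires at least one "b" between "a" and "c".
text = "ac abc abbbc"
plus_matches = re.findall(r"ab+c", text)
print(plus_matches)  # ['abc', 'abbbc']
```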
(10) Braces ( {m,n} )
It matches from m to n repetitions (both inclusive) of the regex that precedes the braces.
For example,
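The sketch below looks for whole words made of two or three "a" characters. The word boundaries (\b) are an addition for this example so that a longer run is not partially matched.

```python
import re

text = "a aa aaa aaaa"  # hypothetical sample string

# a{2,3}: at least 2 and at most 3 repetitions of "a".
# \b on both sides restricts the match to complete words.
runs = re.findall(r"\ba{2,3}\b", text)
print(runs)  # ['aa', 'aaa']
```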
(11) Group ( () )
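It is used to group parts of a regular expression together so that they are treated as a single unit when finding a match in a string. The text matched by each group can also be retrieved separately.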
For example,
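A short sketch of grouping with illustrative sample strings. It shows retrieving each group and combining a group with the | operator.

```python
import re

# Two groups; group(0) is the whole match, group(1) and group(2) the parts.
match = re.search(r"(mars)(devs)", "marsdevs")
print(match.group(0))  # marsdevs
print(match.group(1))  # mars
print(match.group(2))  # devs

# A group limits the scope of "|": either "a" or "b", followed by "cd".
# With a single group, findall returns what the group matched.
prefixes = re.findall(r"(a|b)cd", "acd bcd ccd")
print(prefixes)  # ['a', 'b']
```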
Special Sequences
Special sequences do not stand for one specific literal character. Some of them specify a location in the search string where the match should occur (for example \A or \b), while others stand for a whole class of characters (for example \d or \w). This makes it easier to write commonly used patterns.
| Sequence | Description | Syntax | Example |
| --- | --- | --- | --- |
| \A | It matches if the string begins with the given character. | \Amars | marsdevs |
| \b | It matches if the word begins or ends with the given character. \b(string) is for the beginning check; (string)\b is for the ending check. | \bmars | marsdevs |
| \B | The string must not begin or end with the given regex. It is the opposite of \b. | \Bde | marsdevs |
| \d | It matches any decimal digit [0-9]. | \d | marsdevs1 |
| \D | It matches any non-digit character [^0-9]. It is the opposite of \d. | \D | marsdevs1 |
| \s | It matches any whitespace character. | \s | mars devs |
| \S | It matches any non-whitespace character. It is the opposite of \s. | \S | mars devs |
| \w | It matches any alphanumeric character [a-zA-Z0-9_]. | \w | MarsDevs1 |
| \W | It matches any non-alphanumeric character. It is the opposite of \w. | \W | #@%< |
| \Z | It matches if the string ends with the given regex. | devs\Z | marsdevs |
To implement the above sequences, consider the following example code with the given string.
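The original code is not reproduced here, so this is only a sketch that tries several of the sequences on one made-up string:

```python
import re

text = "marsdevs1 builds apps"  # hypothetical sample string

starts = re.search(r"\Amars", text)        # anchored at the very start
word_start = re.findall(r"\bb\w*", text)   # words beginning with "b"
inside_word = re.search(r"\Bde", text)     # "de" not at a word boundary
digits = re.findall(r"\d", text)           # decimal digits
spaces = re.findall(r"\s", text)           # whitespace characters
non_word = re.findall(r"\W", text)         # non-alphanumeric characters
ends = re.search(r"apps\Z", text)          # anchored at the very end

print(starts is not None)   # True
print(word_start)           # ['builds']
print(inside_word.start())  # 4
print(digits)               # ['1']
print(spaces)               # [' ', ' ']
print(non_word)             # [' ', ' ']
print(ends is not None)     # True
```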
Regex Module
Python has a built-in module named re that is used for regular expressions. Import this module by using the import statement.
Syntax
import re
There are various functions provided by this module for working with regex in Python. We will briefly discuss these functions here.
(1) re.findall()
It returns a list of all non-overlapping matches of the pattern in the string. The string is scanned from left to right, and matches are returned in the order they were found.
Consider the following example,
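A minimal sketch of re.findall(), using a made-up sentence that contains several numbers:

```python
import re

text = "Order 66 shipped 12 boxes on day 7"  # hypothetical sample string

# \d+ matches each run of one or more digits.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['66', '12', '7']
```

Note that the matches come back as strings; converting them to integers is left to the caller.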
(2) re.compile()
It compiles a regular expression into a Pattern object, which provides methods for various tasks such as searching for pattern matches or performing string substitutions.
Consider the following example,
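A short sketch of compiling a pattern once and reusing it. The sample string is an assumption made for illustration.

```python
import re

# Compile once; the Pattern object can then be reused many times.
pattern = re.compile(r"[a-e]")

# Pattern objects offer the same operations as the module-level functions.
letters = pattern.findall("mars devs")
print(letters)  # ['a', 'd', 'e']

masked = pattern.sub("*", "mars devs")
print(masked)  # m*rs **vs
```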
(3) re.split()
It splits a string at each occurrence of a character or pattern and returns a list of the remaining pieces, i.e. the parts of the string other than the pattern.
Syntax
re.split(pattern, string, maxsplit=0, flags=0)
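Here, pattern denotes the regular expression to split on, string is the text to be split, maxsplit is the maximum number of splits to perform (0, the default, means no limit), and flags optionally modifies how the pattern is matched.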
Consider the following example,
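A minimal sketch of re.split(), with and without maxsplit, on a made-up string:

```python
import re

text = "mars, devs, regex"  # hypothetical sample string

# Split on every run of non-alphanumeric characters.
parts = re.split(r"\W+", text)
print(parts)  # ['mars', 'devs', 'regex']

# With maxsplit=1, only the first occurrence is used as a separator.
first_split = re.split(r"\W+", text, maxsplit=1)
print(first_split)  # ['mars', 'devs, regex']
```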
(4) re.sub()
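The "sub" in the name stands for substitution. The function searches the given string for the pattern and, where found, replaces each match with "repl". The optional count argument limits how many occurrences are replaced (0, the default, replaces all of them).
Syntax
re.sub(pattern, repl, string, count=0, flags=0)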
Consider the following example,
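A short sketch of re.sub(), first replacing every match and then limiting the number of replacements with count. The sample strings are illustrative.

```python
import re

# Replace every run of whitespace with a single hyphen.
slug = re.sub(r"\s+", "-", "mars  devs regex")
print(slug)  # mars-devs-regex

# With count=2, only the first two matches are replaced.
partial = re.sub(r"a", "A", "banana", count=2)
print(partial)  # bAnAna
```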
(5) re.subn()
It is the same as sub() except for the way it provides its output. subn() returns a tuple containing the new string and the number of replacements made, in that order.
Syntax
re.subn(pattern, repl, string, count=0, flags=0)
Consider the following example,
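A minimal sketch of re.subn() on an illustrative string, showing the order of the two values in the returned tuple:

```python
import re

# subn() returns (new_string, number_of_replacements).
result = re.subn(r"a", "A", "banana")
print(result)  # ('bAnAnA', 3)

new_text, replaced = result
print(new_text, replaced)  # bAnAnA 3
```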
(6) re.escape()
It returns the string with the characters that have a special meaning in a regular expression escaped by backslashes, which helps when matching an arbitrary literal string that may contain regular expression metacharacters.
Syntax
re.escape(pattern)
Consider the following example,
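The original listing is missing, so the input sentence in this sketch is reconstructed from the printed output quoted in the article. The escaped form assumes Python 3.7 or later; older versions escape a few more characters, such as the comma.

```python
import re

# Sentence reconstructed from the article's printed output (an assumption).
sentence = "The Python programming language was first released on February 20, 1991."
escaped = re.escape(sentence)
print(escaped)

# Typical use: search for user-supplied text that contains metacharacters.
literal = "1+1=2"  # hypothetical input; "+" would otherwise be a quantifier
found = re.search(re.escape(literal), "we know 1+1=2 holds")
print(found.group())  # 1+1=2
```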
It prints
The\ Python\ programming\ language\ was\ first\ released\ on\ February\ 20,\ 1991\.
(7) re.search()
It returns either None (if the pattern does not match) or a re.Match object that contains information about the matched part of the string. This method stops after the first match.
Consider the following example,
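A minimal sketch of re.search() with two groups. The sentence is a made-up sample that contains a month name followed by a day number.

```python
import re

text = "I was born on June 24"  # hypothetical sample string

# A word made of letters, a space, then one or more digits.
match = re.search(r"([a-zA-Z]+) (\d+)", text)

if match is not None:
    print(match.group(0))  # June 24
    print(match.group(1))  # June
    print(match.group(2))  # 24
else:
    print("The pattern does not match.")
```

Checking for None before calling group() avoids an AttributeError when nothing matches.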
Match Object
A Match object contains all the information about the search and its result. If no match is found, None is returned instead of a Match object. The Match object has various commonly used methods and properties. These are briefly explained below.
Consider the following example to understand these.
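The list of methods and properties is not reproduced in this text, so the sketch below simply shows a few commonly used ones from the standard re module: the re and string attributes and the group(), start(), end() and span() methods. The sample sentence is an assumption.

```python
import re

text = "mars devs write code"  # hypothetical sample string
match = re.search(r"\bdev\w*", text)

print(match.re.pattern)  # the pattern that produced the match: \bdev\w*
print(match.string)      # the string that was searched
print(match.group())     # the matched text: devs
print(match.start())     # index where the match begins: 5
print(match.end())       # index just past the match: 9
print(match.span())      # (start, end) as a tuple: (5, 9)
```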