Introduction to Regular Expressions in Python by MarsDevs.
MarsDevs introduces you to Regular Expressions in Python.

A Regular Expression (RegEx or RE) is a sequence of characters that defines a search pattern, which is used to find a string or a set of strings. Regular expressions are supported by Python, Java, R, and many other languages.

Some common uses of regular expressions are listed below.

  • To find patterns in a string or file.
  • To find a string or a substring in a file.
  • To split the string into substrings.
  • To replace part of a string with another string.
  • To validate email format.

Python has a module, "re", that supports the use of regex. Its search() function returns either the first match or None. Consider the following example:

[Image: regex example in Python]
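
A minimal sketch of such an example. The sample text is hypothetical, since the string used in the original example is not shown; only the pattern r'portal' comes from the article.

```python
import re

# Hypothetical sample text; the original example's string is not shown.
text = "MarsDevs: a learning portal for developers"

# re.search() scans the string and returns a Match object for the
# first occurrence of the pattern, or None if there is no match.
match = re.search(r"portal", text)

if match is not None:
    print("Start index:", match.start())
    print("End index:", match.end())
else:
    print("No match found")
```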

The r prefix (as in r'portal') marks a raw string; it does not stand for regex. A raw string differs from a regular string in that it treats the backslash (\) as a literal character rather than as an escape character. This matters because the regular expression engine uses the backslash for its own escaping.

Consider another example,

[Image: second regex example in Python]
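
One way to see the difference between a regular string and a raw string is sketched below. The sample text and variable names are hypothetical and are not taken from the original example.

```python
import re

# In a regular string, "\n" is a single newline character;
# in a raw string, r"\n" is two characters: a backslash and the letter n.
regular = "\n"
raw = r"\n"
print(len(regular), len(raw))

# Hypothetical sample text for illustration.
text = "Order 66 shipped"

# The raw string hands \d+ to the regex engine unchanged,
# where it means "one or more digits".
match = re.search(r"\d+", text)
print(match.group() if match else None)
```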

Let's discuss how to write a regex using metacharacters or special sequences.

Meta Characters

Metacharacters are characters that have a special meaning in a pattern. They are essential because they are used in the patterns passed to the functions of the re module. They are briefly explained below.

(1) Backslash (\)

The backslash removes the special meaning of the character that follows it. For example, a dot (.) is a special character; to find a literal dot in a given string, escape it with a backslash and write \. instead:

[Image: backslash example in Python]
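
A small sketch of the effect of escaping the dot, assuming a hypothetical sample string:

```python
import re

text = "marsdevs.com"  # hypothetical sample string

# Unescaped dot: matches any character, so the first match is "m".
any_char = re.search(r".", text)

# Escaped dot: matches only a literal "." character.
literal_dot = re.search(r"\.", text)

print(any_char)
print(literal_dot)
```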

(2) Square Brackets ([])

Square brackets define a set of characters; the pattern matches any single character from that set.

For example,

  • [0-4] is the same as [01234]
  • [a-d] is the same as [abcd]
  • [^a-d] matches any character except a, b, c, or d
  • [^0-3] matches any character except 0, 1, 2, or 3

As the last two items show, a caret (^) placed first inside the brackets inverts the set.
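
The character sets above can be tried out with re.findall(). This is only a sketch; the sample string and variable names are made up for illustration.

```python
import re

text = "abcde 01234 XYZ"  # hypothetical sample string

# A range inside brackets matches any single character in that range.
letters_a_to_d = re.findall(r"[a-d]", text)
digits_0_to_3 = re.findall(r"[0-3]", text)

# A caret placed first inside the brackets inverts the set
# (the space is listed too, so it is excluded as well).
everything_else = re.findall(r"[^a-d0-3 ]", text)

print(letters_a_to_d)
print(digits_0_to_3)
print(everything_else)
```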

(3) Caret (^)

It is used to match the beginning of the string i.e. it checks whether the string starts with the given character or not.

For example,

  • ^M checks if the input string starts with M or not
  • ^Moh checks if the input string starts with Moh or not.

(4) Dollar ($)

It is used to match the end of the string i.e. it checks whether the string ends with the given character or not.

For example,

  • i$ checks if the input string ends with i or not
  • dhi$ checks if the input string ends with dhi or not.
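
A short sketch of both anchors, assuming a hypothetical name as input. Note that the caret goes before the characters it anchors, while the dollar sign goes after them.

```python
import re

name = "Mohandas Gandhi"  # hypothetical sample string

# ^ anchors the pattern to the beginning of the string.
starts_with_moh = re.search(r"^Moh", name) is not None
starts_with_g = re.search(r"^G", name) is not None

# $ anchors the pattern to the end of the string.
ends_with_dhi = re.search(r"dhi$", name) is not None
ends_with_das = re.search(r"das$", name) is not None

print(starts_with_moh, starts_with_g)
print(ends_with_dhi, ends_with_das)
```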

(5) Dot (.)

It matches any single character except the newline character (\n).

For example,

  • x.y matches x and y with exactly one character in place of the dot (.), e.g. xay, x1y, or x-y, but not xy or xaby.
  • .. matches any two consecutive characters, so the string must have at least 2 characters.
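
The sketch below checks a few hypothetical candidate strings against x.y. It uses re.fullmatch() so that the whole string has to fit the pattern, which makes the "exactly one character" behavior of the dot visible.

```python
import re

candidates = ["xay", "x1y", "xy", "xaby"]  # hypothetical test strings

# fullmatch() succeeds only if the entire string matches the pattern.
results = {s: re.fullmatch(r"x.y", s) is not None for s in candidates}
print(results)

# By default the dot does not match a newline character.
newline_match = re.search(r".", "\n")
print(newline_match)
```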

(6) Or (|)

It is an operator that checks whether the pattern before or after the | symbol is present in the string.

For example,

  • x|y will match any string that contains x or y, such as xxx, yyy, xaby, etc.
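
As a quick illustration, the alternation can be tested against a handful of strings. The strings and names below are hypothetical.

```python
import re

candidates = ["xxx", "yyy", "xaby", "abc"]  # hypothetical test strings

# x|y matches if either x or y occurs anywhere in the string.
results = {s: re.search(r"x|y", s) is not None for s in candidates}
print(results)
```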

(7) Question Mark (?)

It checks whether the regex before the question mark occurs zero or one time in the sequence.

For example,

  • xy?z will be matched for the strings xz, xzy, and wxyz, but not for xyyz because there are two y's between x and z. Similarly, it will not match xywz because y is not followed by z.
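
The strings from the example above can be checked with re.search(), which looks for the pattern anywhere in the string. This is a sketch; the variable names are illustrative.

```python
import re

candidates = ["xz", "xzy", "wxyz", "xyyz", "xywz"]

# y? means "zero or one y" between x and z.
results = {s: re.search(r"xy?z", s) is not None for s in candidates}
print(results)
```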

(8) Star (*)

It matches zero or more occurrences of the regex before the * symbol in the sequence.

For example,

  • xy*z will be matched for the strings xz, xyz, xyyyz, etc., but not xywz because the w breaks the sequence.

(9) Plus (+)

It matches one or more occurrences of the regex before the + symbol in the sequence.

For example,

  • xy+z will be matched for the strings xyz, xyyz, xyyyz, etc., but not xz, xywz, etc.
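
To compare the * and + quantifiers side by side, the sketch below runs both patterns over the same hypothetical strings. The only difference shows up for xz, where y occurs zero times.

```python
import re

candidates = ["xz", "xyz", "xyyyz", "xywz"]  # hypothetical test strings

# y* allows zero or more y's; y+ requires at least one y.
star_results = {s: re.search(r"xy*z", s) is not None for s in candidates}
plus_results = {s: re.search(r"xy+z", s) is not None for s in candidates}

print(star_results)
print(plus_results)
```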

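(10) Braces ( {m,n} )

It matches from m to n repetitions (both inclusive) of the regex before the braces. Write the braces without a space after the comma; Python treats {m, n} with a space as literal text.

For example,

  • x{2,4} will be matched for the strings xxxy, yxxxxz, fxxd, etc., but not xy, xyz, etc.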
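
A sketch of the bounded repetition using the strings from the example, written as x{2,4} with no space after the comma. The last lines show that the quantifier is greedy and takes up to four x's at a time; the six-x string is a made-up input.

```python
import re

candidates = ["xxxy", "yxxxxz", "fxxd", "xy", "xyz"]

# x{2,4} needs between 2 and 4 consecutive x's somewhere in the string.
results = {s: re.search(r"x{2,4}", s) is not None for s in candidates}
print(results)

# Greedy matching: the first match takes four x's, the second the remaining two.
chunks = re.findall(r"x{2,4}", "xxxxxx")
print(chunks)
```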

(11) Group ( () )

It is used to group various regular expressions together and then find a match in a string.

For example,

  • (ab) is a group that can be matched in string ababaahabdyy.
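
The group from the example can be explored as sketched below. The second pattern, (ab)+, is an added illustration of applying a quantifier to a whole group and is not part of the original example.

```python
import re

text = "ababaahabdyy"

# With one group in the pattern, findall() returns the text captured by it.
occurrences = re.findall(r"(ab)", text)
print(occurrences)

# A quantifier after the group repeats the whole group.
repeated = re.search(r"(ab)+", text)
print(repeated.group(0))  # the full match
print(repeated.group(1))  # the last repetition captured by the group
```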

Special Sequences

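A special sequence is a backslash followed by a character. Some special sequences do not match an actual character but specify the location in the search string where the match should occur (for example \A, \b, and \Z), while others stand for a predefined set of characters (for example \d, \s, and \w). This makes it easier to write commonly used patterns.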

Sequence | Description | Syntax | Example
\A | Matches if the string begins with the given characters. | \Amars | marsdevs
\b | Matches if a word begins or ends with the given characters: \b(string) checks the beginning, (string)\b checks the ending. | \bmars | marsdevs
\B | Matches only where the given regex is not at the beginning or end of a word. It is the opposite of \b. | \Bde | marsdevs
\d | Matches any decimal digit [0-9]. | \d | marsdevs1
\D | Matches any non-digit character [^0-9]. It is the opposite of \d. | \D | marsdevs1
\s | Matches any whitespace character. | \s | mars devs
\S | Matches any non-whitespace character. It is the opposite of \s. | \S | mars devs
\w | Matches any alphanumeric character or underscore [a-zA-Z0-9_]. | \w | MarsDevs1
\W | Matches any non-alphanumeric character. It is the opposite of \w. | \W | #@%<
\Z | Matches if the string ends with the given regex. | devs\Z | marsdevs

To implement the above sequences consider the following example code with the given string.

[Image: special sequences example code]
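
A sketch that applies the special sequences to the example strings listed above. The variable names are illustrative.

```python
import re

# Position checks: these match a location, not a character.
starts_with_mars = re.search(r"\Amars", "marsdevs") is not None
word_starts_with_mars = re.search(r"\bmars", "marsdevs") is not None
de_inside_word = re.search(r"\Bde", "marsdevs") is not None
ends_with_devs = re.search(r"devs\Z", "marsdevs") is not None

# Character classes: these match one character from a predefined set.
digits = re.findall(r"\d", "marsdevs1")
non_digits = re.findall(r"\D", "marsdevs1")
spaces = re.findall(r"\s", "mars devs")
non_spaces = re.findall(r"\S", "mars devs")
word_chars = re.findall(r"\w", "MarsDevs1")
non_word_chars = re.findall(r"\W", "#@%<")

print(starts_with_mars, word_starts_with_mars, de_inside_word, ends_with_devs)
print(digits, spaces, non_word_chars)
```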

Regex Module

Python provides regular expression support through the re module. Import this module by using the import statement.

Syntax

import re

There are various functions provided by this module for working with regex in Python. We will briefly discuss these functions here.

(1) re.findall()

It returns a list of all non-overlapping matches of the pattern in the string. The string is scanned from left to right, and matches are returned in the order they were found.

Consider the following example,
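
A minimal sketch of re.findall(), assuming a hypothetical sentence that contains several numbers:

```python
import re

# Hypothetical sample text for illustration.
text = "Call 555-1234 or 555-9876 before 5 pm"

# \d+ matches each run of one or more digits, scanned left to right.
numbers = re.findall(r"\d+", text)
print(numbers)
```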

(2) re.compile()

re.compile() compiles a regular expression into a Pattern object, which has methods for tasks such as searching for pattern matches or performing string substitutions.

Consider the following example,

[Image: re.compile example]
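
A sketch of compiling a pattern once and reusing it for both searching and substitution. The pattern and sample string are hypothetical.

```python
import re

# Compile the pattern once; the resulting Pattern object can be reused.
vowels = re.compile(r"[aeiou]")

text = "marsdevs"  # hypothetical sample string

found = vowels.findall(text)      # all vowels in the string
masked = vowels.sub("*", text)    # replace each vowel with "*"

print(found)
print(masked)
```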

(3) re.split()

It splits the string at each occurrence of a character or pattern and returns the remaining pieces (the text other than the pattern) as a list.

Syntax

re.split(pattern, string, maxsplit=0, flags=0)

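The parameters are:

  • pattern - the regular expression to split on.
  • string - the string to be split.
  • maxsplit - the maximum number of splits. The default is 0, which means no limit; if it is 1, the string will be split only once, resulting in a list of length 2. It is an optional parameter.
  • flags - modifies how the pattern is matched, which helps to shorten code, e.g. flags=re.IGNORECASE. It is an optional parameter.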

Consider the following example,

[Image: re.split example in Python]
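
A sketch of re.split() with and without the optional parameters. The sample strings are hypothetical.

```python
import re

text = "Words, words , Words"  # hypothetical sample string

# Split on every run of non-word characters.
all_parts = re.split(r"\W+", text)

# maxsplit=1 splits only once, giving a list of length 2.
first_split = re.split(r"\W+", text, maxsplit=1)

# With re.IGNORECASE, [a-f]+ also splits on the uppercase letter B.
ignore_case = re.split(r"[a-f]+", "0a3B9", flags=re.IGNORECASE)

print(all_parts)
print(first_split)
print(ignore_case)
```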

(4) re.sub()

The name stands for substitution. It searches for the pattern in the given string and replaces each match with "repl". The optional count parameter limits how many replacements are made.

Syntax

re.sub(pattern, repl, string, count=0, flags=0)

Consider the following example,

[Image: re.sub example]
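
A sketch of re.sub(), first replacing every match and then limiting the number of replacements with count. The sample strings are hypothetical.

```python
import re

# Replace every run of whitespace with a single hyphen.
slug = re.sub(r"\s+", "-", "mars  devs   team")

# count=2 replaces only the first two matches.
partial = re.sub(r"a", "A", "banana", count=2)

print(slug)
print(partial)
```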

(5) re.subn()

It is the same as sub() except for the form of its output. subn() returns a tuple containing the new string and the number of replacements made.

Syntax

re.subn(pattern, repl, string, count=0, flags=0)

Consider the following example,

[Image: re.subn example in Python]
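
A sketch of re.subn() on a hypothetical string, unpacking the returned tuple into the new string and the replacement count:

```python
import re

# subn() returns a tuple: (new_string, number_of_replacements).
new_text, replacements = re.subn(r"a", "A", "banana")

print(new_text)
print(replacements)
```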

(6) re.escape()

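It returns the string with the characters that have a special meaning in a regular expression escaped by backslashes (before Python 3.7, all non-alphanumeric characters were escaped). This helps match an arbitrary literal string that can contain regular expression metacharacters in it.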

Syntax

re.escape(pattern)

Consider the following example,

[Image: re.escape example in Python]
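
A sketch of the call, with the input string inferred from the printed output shown in this section. It assumes Python 3.7 or later, where commas are left unescaped.

```python
import re

# Input inferred from the escaped output shown in this section.
text = "The Python programming language was first released on February 20, 1991."

# Spaces and the final dot get a backslash; letters, digits and the comma do not.
escaped = re.escape(text)
print(escaped)
```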

It prints

The\ Python\ programming\ language\ was\ first\ released\ on\ February\ 20,\ 1991\.

(7) re.search()

It returns either None (if the pattern is not matched) or a re.Match object that contains information about the matched part of the string. This method stops after the first match.

Consider the following example,

[Image: re.search example]
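
A sketch of re.search() with two groups, assuming a hypothetical sentence that contains a month name followed by a day number:

```python
import re

text = "I was born on June 24"  # hypothetical sample string

# Look for a word followed by a space and a number; search() stops at the first match.
match = re.search(r"([a-zA-Z]+) (\d+)", text)

if match is not None:
    print("Full match:", match.group(0))
    print("Month:", match.group(1))
    print("Day:", match.group(2))
else:
    print("No match found")
```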

Match Object

A Match object contains all the information about the search and its result. If no match is found, None is returned instead. The commonly used methods and properties of the Match object are briefly explained below.

  1. Getting the string and regex
  2. Getting the index of the matched object
  3. Getting matched substring

Consider the following example to understand these.

[Image: Match object example]
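
The three items above can be sketched as follows. The sample string and pattern are hypothetical.

```python
import re

text = "Welcome to MarsDevs"  # hypothetical sample string
match = re.search(r"\bM\w+", text)

if match is not None:
    # 1. Getting the string and the regex
    print(match.string)      # the string passed to search()
    print(match.re.pattern)  # the text of the pattern that was used

    # 2. Getting the index of the matched object
    print(match.start(), match.end(), match.span())

    # 3. Getting the matched substring
    print(match.group())

# When nothing matches, search() returns None instead of a Match object.
no_match = re.search(r"\d+", text)
print(no_match)
```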
