Learn How to Use Strings in Python 3
Text processing is an important part of computer programming because it lets users interact with software. Without text we wouldn’t be able to type search queries in Google, write essays in MS Word, or send our beloved emojis via SMS. In Python, you handle text with the string class (str).
A string in Python is text enclosed within delimiters, which are single, double, or triple quotes. This tutorial provides an introduction to the string class in Python. You’ll learn basic concepts such as how to create a string, built-in operations such as slicing, and a sample of the common string methods in the Python standard library. Note that in the following examples, >>> and ... indicate the interpreter prompt.
Creating Strings in Python
Below are three valid ways to create strings in Python.
>>> e = 'Hello'
>>> s = "Spanish"
>>> i = """Ciao"""
>>> e
'Hello'
>>> s
'Spanish'
>>> i
'Ciao'
Now, a question you may have brewing in your head is: which one should you use? Well, let’s take some advice from the guardian of Python, the Python Software Foundation.
In plain English: String literals can be enclosed in matching single quotes (') or double quotes (").
So, according to this quote (no pun intended), it doesn’t matter; it really boils down to which one you prefer.
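As a quick check of that claim, the short sketch below writes the same text with each kind of delimiter and compares the results. The variable names are made up for illustration.

```python
# The same text written with each kind of delimiter.
single = 'Hello'
double = "Hello"
triple = """Hello"""

# The delimiters are not part of the string, so all three are equal.
print(single == double == triple)  # True
print(type(single))                # <class 'str'>
```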
Escape Characters
Kon’nichiwa, genkidesuka
This is Japanese for Hello, how are you?
If we plug this friendly message into Python and run the program, we get an unexpected SyntaxError:
>>> j = 'Kon'nichiwa, genkidesuka'
  File "<stdin>", line 1
    j = 'Kon'nichiwa, genkidesuka'
                   ^
Ugh, not exactly what we had in mind. Looking at the error, we can conclude that the culprit is not the letter a but the apostrophe: Python treats it as the closing quote, so the string ends after Kon. One solution is to escape the apostrophe; an escape character is one that alters the interpretation of the characters that follow it. Here’s the corrected code:
>>> j = 'Kon\'nichiwa, genkidesuka'
>>> j
"Kon'nichiwa, genkidesuka"
As you can see, the backslash tells the interpreter to treat the apostrophe as part of the string. However, there are a couple more solutions. We could use a different kind of quote for the delimiters, so the following would be OK:
>>> j = "Kon'nichiwa, genkidesuka"
>>> j
"Kon'nichiwa, genkidesuka"
Or, this could work with triple quotes:
>>> j = """Kon'nichiwa, genkidesuka"""
>>> j
"Kon'nichiwa, genkidesuka"
To see the full list of escape characters, read the Python docs.
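To give a feel for a few other escape sequences, here is a minimal sketch. It covers only a small sample (newline, tab, and a literal backslash), and the variable names are invented for this example.

```python
# A few common escape sequences (a small sample, not the full list).
newline = 'first line\nsecond line'   # \n starts a new line
tab = 'name\tvalue'                   # \t inserts a tab
backslash = 'C:\\temp'                # \\ is a single literal backslash

print(newline)
print(tab)
print(backslash)

# Each escape sequence counts as one character.
print(len('\n'), len('\t'), len('\\'))  # 1 1 1
```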
Triple quotes are useful if you want to create a string that spans multiple lines. This can easily be accomplished by wrapping the content in triple quotes as shown below:
shakespeare = """
"Love is blind"
"The game is afoot"
"Seen better days"
"Good riddance"
"""
Another use case for triple quotes is docstrings, which are a convenient way of accessing documentation for Python modules, classes, functions, or methods. An example of how to access the docstring of a function interactively is shown below:
>>> def addnumbers(a, b):
...     """ this function sums two numbers together. """
...     return a + b
...
>>> help(addnumbers)
Help on function addnumbers in module __main__:

addnumbers(a, b)
    this function sums two numbers together.
(END)
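If you would rather read the docstring without opening the help pager, it is also stored on the function's __doc__ attribute. The sketch below reuses the addnumbers function from the example above.

```python
def addnumbers(a, b):
    """ this function sums two numbers together. """
    return a + b

# The raw docstring, including the padding spaces inside the triple quotes.
print(repr(addnumbers.__doc__))

# strip() removes the leading and trailing spaces.
print(addnumbers.__doc__.strip())
```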
String Indexing in Python
Strings can be broken down into individual characters, and each character has a corresponding index. Let’s look at the following string in Python:
s = 'jambo'
The diagram below illustrates how string indexing works.
[Figure: string indexes in Python]
As you can see, the first letter, ‘j’, corresponds to index 0, the second letter, ‘a’, corresponds to index 1, and the last letter, ‘o’, corresponds to index 4. We can access the character at an index by using subscript notation. In Python, the name of the string is followed by the index in square brackets: variable[index]
A concrete example is shown below:
s = 'jambo'
print(s[0])
print(s[1])
print(s[2])
print(s[3])
print(s[4])
j
a
m
b
o
Since indexes in Python start at 0 (don’t ask me why), we can safely deduce that the last index of a non-empty string will always be one less than the length of the string. In this case jambo consists of 5 letters, and we can compute the index of the last element by taking its length and subtracting one. The built-in len() function helps us with this. An example of how to compute the last index of a string is shown below:
print('The last index of {} is [{}] which = {}'.
      format('jambo', len('jambo') - 1, s[len('jambo') - 1]))
The last index of jambo is [4] which = o
Negative indices are also permitted in Python. They range from -len(s) to -1, where -1 refers to the last character. A diagram of negative indices is shown below:
[Figure: negative indices in Python]
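To make the relationship between positive and negative indices concrete, here is a small sketch using the same jambo string. It also shows what happens when an index falls outside the valid range.

```python
s = 'jambo'

# -1 is the last character and -len(s) is the first.
print(s[-1])        # o
print(s[-len(s)])   # j

# For a non-empty string, s[-k] is the same character as s[len(s) - k].
for k in range(1, len(s) + 1):
    print(-k, len(s) - k, s[-k])

# Going past either end raises IndexError.
try:
    s[5]
except IndexError as err:
    print('IndexError:', err)
```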
String Slicing in Python
String slicing is the process of extracting one or more characters from a string. The syntax for slicing in Python is as follows (step is optional):
str[start:end:step]
Let’s look at a simple example:
b = 'Bonjour'
print(b[0:3])
Output: Bon
An illustration of what’s happening is shown below:
[Figure: string slicing in Python]
When the string is sliced with [0:3], the returned string starts at index 0 and stops before index 3, so it contains the characters at indexes 0, 1, and 2. The ending index, 3, indicates where the slicing stops; that character isn’t included in the returned string. Below are more examples of how to slice strings in Python:
e = 'Hello'
# includes elements at index 0 and 1
print(e[0:2])
# includes elements from index 0-4
print(e[0:5])
# includes all elements
print(e[::])
# start at 0 and take every second element
print(e[::2])
Output:
He
Hello
Hello
Hlo
Strings can also be sliced using positive and negative indexes:
print('e[2:-2] = {}'.format(e[2:-2]))
print('e[0:-3] = {}'.format(e[0:-3]))
print('e[2:-1] = {}'.format(e[2:-1]))
# from index 4 to -3 (which is index 2) returns an empty string
print('e[4:-3] = {}\n'.format(e[4:-3]))
print('e[::2] = {}'.format(e[::2]))
print('e[::-1] = {}'.format(e[::-1]))
e[2:-2] = l
e[0:-3] = He
e[2:-1] = ll
e[4:-3] =
e[::2] = Hlo
e[::-1] = olleH
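A few more slicing behaviours are worth seeing side by side. The sketch below is only illustrative: it shows omitted bounds, how two slices at the same split point rebuild the original string, and that slicing past the end does not raise an error the way indexing does.

```python
e = 'Hello'

# Omitted bounds default to the start and end of the string.
print(e[:2])    # He
print(e[2:])    # llo

# Splitting at any point k and joining the halves gives back the original.
k = 2
print(e[:k] + e[k:] == e)   # True

# Unlike indexing, slicing past the end does not raise an error.
print(e[3:100])   # lo
print(e[10:])     # prints an empty line
```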
Common String Operations in Python
s = 'Hola '
print('{} + Amigos! = {}'.format(s, s + 'Amigos!'))
print('{} x {} = {}'.format('s', 3, s * 3))
# membership testing is case-sensitive
print('{} in {} = {}'.format('H', s, 'H' in s))
print('{} not in {} = {}'.format('h', s, 'h' not in s))
# the length is 5, not 4, because of the trailing space
print('The length of {} = {}'.format(s, len(s)))
print('The letter {} appears {} time'.format(s[3], s.count('a')))
Hola  + Amigos! = Hola Amigos!
s x 3 = Hola Hola Hola 
H in Hola  = True
h not in Hola  = True
The length of Hola  = 5
The letter a appears 1 time
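Two details from the example above are easy to miss: membership tests are case-sensitive, and the trailing space counts toward the length. The following sketch, using the same s, shows one possible way to deal with each.

```python
s = 'Hola '

# Membership tests are case-sensitive and also work for substrings.
print('ola' in s)            # True
print('hola' in s)           # False
print('hola' in s.lower())   # True

# strip() removes the trailing space before measuring the length.
print(len(s), len(s.strip()))   # 5 4
```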
Built-in String Methods in Python 3
g = 'good-day'
print('{} to upper case = {}'.format(g, g.upper()))
print('{} capitalized is {}'.format(g, g.capitalize()))
print('In {} replace {} with {} : {}\n'.format(g, 'd', 'f',
g.replace('d', 'f', 1).capitalize()))
print('Center \'{}\' \n{}\nThe length was {} but is now {}'.
format(g, g.center(50, '*'), len(g), len(g.center(50, '*'))))
print('{} ends with {}? {}'.format(g, 'i', g.endswith('i')))
print('{} index is: {}'.format(g, g.index('a')))
print('{} joined with {} is: {}'.format("' '", g, ' '.join(g)))
print('{} split at {} is {}'.format(g, '-', g.split('-')))
print('{} in titlecase format is: {}'.format(g, g.title()))
good-day to upper case = GOOD-DAY
good-day capitalized is Good-day
In good-day replace d with f : Goof-day
Center 'good-day'
*********************good-day*********************
The length was 8 but is now 50
good-day ends with i? False
good-day index is: 6
' ' joined with good-day is: g o o d - d a y
good-day split at - is ['good', 'day']
good-day in titlecase format is: Good-Day
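These methods can also be chained together. As a rough sketch, the hypothetical helper to_title_words below (not part of Python's library) combines split(), capitalize(), and join(). The example also shows find(), which returns -1 when the substring is missing, whereas index() raises a ValueError.

```python
def to_title_words(text, sep='-'):
    """Hypothetical helper: turn 'good-day' into 'Good Day'."""
    words = text.split(sep)
    return ' '.join(word.capitalize() for word in words)

print(to_title_words('good-day'))               # Good Day
print(to_title_words('buenos_dias', sep='_'))   # Buenos Dias

g = 'good-day'
# find() returns -1 for a missing substring instead of raising an error.
print(g.find('a'))   # 6
print(g.find('z'))   # -1
```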
This article was originally published on Purcell Consult.