Python - Lesson 6 (Data structures - Strings Part 3)
Kannan Piedy
IIM | Python, Data science | Technical Lead and Manager with experience leading large teams and building large projects.
Welcome to those who are here for the first time, and welcome back to everyone else. We are here for another lesson on Python. Today we will cover the last few functions belonging to class str and their uses, with some examples.
If you have not read the previous lessons, I'd suggest going through them first using these links.
- Lesson 1 : Python Fundamentals
- Lesson 2 : Arithmetic & Assignment Operators
- Lesson 3 : Operators & Precedence
- Lesson 4: Data Structures - Strings Part 1
- Lesson 5: Data Structures - Strings Part 2
Now, without further ado, let's dig right in.
Removing Leading and Trailing Spaces
Leading and trailing spaces are annoying. They usually end up in places where we least expect them and can cause a lot of trouble. For example:
In [1]: name1 = "Jenny"
In [2]: name2 = raw_input("please enter your name : ")
Jenny
In [3]: name1 == name2
Out [3]: False
You might be wondering what happened and why the comparison did not return True. Printing name2 reveals the culprit.
In [4]: name2
Out [4]: 'Jenny '
The extra space at the end of the string value causes the equality operator to return False. To remove trailing spaces like the one above, we use rstrip (basically a right strip function).
In [5]: name2_stripped = name2.rstrip()
In [6]: name2_stripped == name1
Out [6]: True
To remove leading spaces, as in ' Jenny', we use something similar called lstrip (basically a left strip function).
In [7]: value1 = " John"
In [8]: value2 = " John"
In [9]: value1.lstrip() == value2.lstrip()
Out [9]: True
To remove both leading and trailing spaces we can use strip, which is the one most people reach for.
In [10]: value1 = " JOIN "
In [11]: value1.strip()
Out [11]: 'JOIN'
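A common use of strip is to clean up user input before comparing it. The sketch below is only an illustration: same_name is a hypothetical helper (it is not part of Python or of this lesson), and hard-coded strings stand in for raw_input so that the snippet can be run as is. print is written with parentheses so that it behaves the same in Python 2 and Python 3.

```python
def same_name(expected, typed):
    """Compare two names, ignoring leading and trailing whitespace."""
    return expected.strip() == typed.strip()

# Hard-coded values stand in for what a user might type.
print(same_name("Jenny", "Jenny "))   # True: the trailing space is ignored
print(same_name("Jenny", "Jen ny"))   # False: inner spaces are never removed

# With no argument, strip also removes tabs and newlines, not just spaces.
cleaned = "\t Jenny \n".strip()
print(repr(cleaned))                  # 'Jenny'
```

Note that strip returns a new string each time; the original values are left untouched.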
Simple? Well, here is the kicker: all three strip functions can take an optional string argument. The characters in that argument are the ones that get removed from the ends of the string.
In [12]: value1 = "aaaaaaabasdasdasdassaaaabbbbbbbbabaaaaaa"
In [13]: value1.strip("a")
Out [13]: 'basdasdasdassaaaabbbbbbbbab'
I would also like to point out that passing an argument longer than one character gives a different result from the one you might expect.
In [14]: filename = "text.txt"
In [15]: value = "text"
In [16]: value == filename.rstrip(".txt")
Out [16]: False
You might be wondering what happened there. rstrip treats its argument as a set of characters rather than as a single sub-string. It keeps removing the last character of the string for as long as that character appears in the argument, and stops at the first character that does not. For example:
STEP 1 : CHECK LAST CHARACTER IN VARIABLE FILENAME
STEP 2: IF LAST CHARACTER EXISTS IN SUB-STRING , REMOVE THE CHARACTER FROM THE STRING
STEP 3 : REPEAT STEP 1 and 2 UNTIL LAST CHARACTER IS NOT IN SUB-STRING
STEP 4 : RETURN RESULT
In [17]: filename.rstrip(".txt")
Out [17]: 'te'
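The four steps above can be sketched in plain Python. rstrip_chars is a hypothetical function written only for illustration; it is not how Python implements rstrip internally, but it should give the same result when an argument is passed.

```python
def rstrip_chars(text, chars):
    """Mimic text.rstrip(chars) by following the four steps literally."""
    # Steps 1 to 3: while the last character is one of `chars`, drop it.
    while text and text[-1] in chars:
        text = text[:-1]
    # Step 4: return whatever is left.
    return text

filename = "text.txt"
manual = rstrip_chars(filename, ".txt")
builtin = filename.rstrip(".txt")
print(manual)    # te
print(builtin)   # te
```

The trailing "t", "x", "t", ".", "t" and "x" are all removed one by one, and the loop stops at "e" because it is not among the characters in ".txt".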
If I were to use strip rather than rstrip, the same operation would occur on both sides of the string. Characters are removed from the start and from the end until the first and last characters are both absent from the argument.
In [18]: filename.strip(".txt")
Out [18]: 'e'
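If the goal really is to remove the ".txt" ending as a whole, one option is to check for it with endswith and slice it off, and another is os.path.splitext from the standard library. The sketch below assumes a hypothetical helper named remove_suffix; on Python 3.9 and later, the built-in str.removesuffix does the same job.

```python
import os.path

def remove_suffix(text, suffix):
    """Return text without suffix, but only if text actually ends with it."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

filename = "text.txt"
stem = remove_suffix(filename, ".txt")
print(stem)                             # text
print(remove_suffix(filename, ".csv"))  # text.txt (nothing to remove)

# splitext separates the extension from the rest of a file name.
root, extension = os.path.splitext(filename)
print(root)                             # text
print(extension)                        # .txt
```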
The next function we can look at is swapcase. As the name implies, it swaps the case of every letter in the given string.
In [19]: a = "This is a Test"
In [20]: a.swapcase()
Out[20]: 'tHIS IS A tEST'
The next function we will see is ljust. It returns the string left-justified within a given width, padded on the right with a character of your choice. For example:
string_1 = "Hi there how are you"
string_2 = string_1.ljust(25,"$")
print string_2
---
Hi there how are you$$$$$
We gave the fill character as "$" and specified that the total width of the string after ljust must be 25 characters.
To pad the string with zeros on the left instead, we use zfill.
string_1 = "Hi there how are you"
string_2 = string_1.zfill(25)
print string_2
---
00000Hi there how are you
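However, note that zfill does not take a fill character as an optional argument; it always pads with "0".
zfill, ljust, rjust and center are the padding functions for strings. rjust and center take the same arguments as ljust but pad on the left and on both sides respectively.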
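For comparison, here is a small sketch that runs the padding functions side by side on the same value. rjust, which pads on the left, is not shown earlier in this lesson but belongs to the same family. The variable names are chosen only for this illustration.

```python
label = "Hi"

left = label.ljust(6, "*")      # pad on the right
right = label.rjust(6, "*")     # pad on the left
middle = label.center(6, "*")   # pad on both sides
print(left)     # Hi****
print(right)    # ****Hi
print(middle)   # **Hi**

# The width is a minimum: a string that is already longer is returned as is.
print(label.ljust(1))           # Hi

# zfill keeps a leading sign in front of the zeros.
signed = "-42".zfill(6)
print(signed)   # -00042
```

When no fill character is given, ljust, rjust and center pad with spaces.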
How do we replace one sub-string with another? To handle this scenario we have the replace function in Python.
string_value = "I bought a dog"
string_value2 = string_value.replace("dog","cat")
print string_value2
---
I bought a cat
However, keep in mind that the word being replaced might occur more than once.
In [21]: string_value = "this is a good book and I like reading good books"
In [22]: string_value.replace("book", "magazine")
Out [22]: 'this is a good magazine and I like reading good magazines'
As you can see, both occurrences of the word have been replaced. To replace only a specific number of occurrences, replace takes an optional third argument, the count; by default all occurrences are replaced.
In [23]: string_value.replace("book", "magazine", 1)
Out [23]: 'this is a good magazine and I like reading good books'
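Two details about replace are easy to miss: it returns a new string and leaves the original alone, and it quietly returns an unchanged copy when the text to replace is not found. A short sketch, with variable names invented for this example:

```python
sentence = "this is a good book and I like reading good books"

first_only = sentence.replace("book", "magazine", 1)
not_found = sentence.replace("novel", "magazine")

print(first_only)
# this is a good magazine and I like reading good books

# Nothing matched, so the result equals the original sentence.
print(not_found == sentence)   # True

# The original string is never modified in place.
print(sentence)
# this is a good book and I like reading good books
```

Because of this, the result of replace always has to be assigned to a variable (or used directly) for the change to have any effect.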
There are 3 more major functions left at the moment that we will be covering.
Splitting a string, Joining an iterable object and Partitioning a string
A string may be split on any separator. For example, split can be used to separate the fields of a row in a CSV file, to break a sentence into words, or to separate a paragraph into its sentences.
In [24]: string_value = "this is a good book"
In [25]: string_value.split()
Out [25]: ['this', 'is', 'a', 'good', 'book']
We will go further into the properties of List data structures in a later lesson, but for now assume it works like any other sequence of data. When called without an argument, split separates the string on whitespace, treating a run of consecutive whitespace characters as a single separator.
Please note, however, that giving an empty separator will not break a string into individual characters.
In [26]: string_value.split("")
ValueError: empty separator
We can, however, break a string into its constituent characters, with each character becoming an element of a list, like so:
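Calling split with no argument is not quite the same as calling it with " ". The sketch below shows the difference on a string with a double space and a tab, and then splits a made-up CSV-style row on commas. The sample values are invented for this example.

```python
messy = "a  b\tc"   # two spaces between a and b, a tab between b and c

by_whitespace = messy.split()
by_space = messy.split(" ")
print(by_whitespace)   # ['a', 'b', 'c']
print(by_space)        # ['a', '', 'b\tc']

# An explicit separator is what you want for delimited data.
row = "Jenny,29,London"
fields = row.split(",")
print(fields)          # ['Jenny', '29', 'London']
```

With an explicit " ", every single space counts as a separator, so two spaces in a row produce an empty string in the result, and the tab is not treated as a separator at all.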
In [27]: print list(string_value)
['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 'g', 'o', 'o', 'd', ' ', 'b', 'o', 'o', 'k']
Converting a string into a list breaks it into its constituent characters. However, you may ask how we can split on only one particular occurrence of a separator, such as the first or the last.
partition
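Partition is a special case of split, where we split on only the first occurrence of the argument. It returns a tuple of three values: the part before the separator, the separator itself, and the part after it. rpartition does the same for the last occurrence.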
In [28]: string_value = "10880 Malibu Point, Malibu, California"
To get the state we need to split on the last "," and to get the first part of the address we need to split on the first ",".
In [29]: list_value = string_value.partition(",")
In [30]: list_value
Out [30]: ('10880 Malibu Point', ',', ' Malibu, California')
In [31]: start_value = list_value[0]
In [32]: list_value2 = string_value.rpartition(",")
In [33]: list_value2
Out [33]: ('10880 Malibu Point, Malibu', ',', ' California')
In [34]: end_value = list_value2[-1]
As you can see above, partition and rpartition let us handle a few special-case scenarios. For anyone who is wondering whose address this is, it is Tony Stark's.
Note: the argument for partition is mandatory, unlike in the case of split.
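The three-part result can be unpacked directly into variables, and strip tidies up the space left after the comma. The sketch below also shows that split and rsplit, given a maximum of one split as their second argument, cut at the same places but return lists. The variable names are chosen only for this illustration.

```python
address = "10880 Malibu Point, Malibu, California"

# Unpack the three parts; "_" is a throwaway name for parts we do not need.
street, _, rest = address.partition(",")
_, _, state = address.rpartition(",")
state = state.strip()   # drop the space that followed the comma

print(street)   # 10880 Malibu Point
print(state)    # California

first_cut = address.split(",", 1)
last_cut = address.rsplit(",", 1)
print(first_cut)   # ['10880 Malibu Point', ' Malibu, California']
print(last_cut)    # ['10880 Malibu Point, Malibu', ' California']
```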
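It is also worth knowing what happens when the separator does not occur in the string at all: both functions still return three values, so unpacking never fails. A brief sketch, with names invented for this example:

```python
text = "no commas here"

head_first = text.partition(",")
tail_first = text.rpartition(",")
print(head_first)   # ('no commas here', '', '')
print(tail_first)   # ('', '', 'no commas here')

# Leaving the separator out is an error, unlike with split.
try:
    text.partition()
    accepted = True
except TypeError:
    accepted = False
print(accepted)     # False
```

partition puts the whole string in the first position, while rpartition puts it in the last, so check the middle value if you need to know whether the separator was found.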
To join a list back into a string, we may use the join function.
The syntax is as follows:
"separator".join(iterable)
For example :
In [35]: list_value = ['this', 'is', 'a', 'good', 'book']
In [36]: result = " ".join(list_value)
In [37]: result
Out [37]: 'this is a good book'
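A few more things can be tried with join. Any string can act as the separator, including an empty one, and every element being joined must itself be a string. The sketch below uses made-up values to show both points.

```python
words = ["this", "is", "a", "good", "book"]

dashed = "-".join(words)
squeezed = "".join(words)
print(dashed)     # this-is-a-good-book
print(squeezed)   # thisisagoodbook

# join only accepts strings, so numbers have to be converted first.
numbers = [2024, 1, 15]
date_text = "/".join(str(n) for n in numbers)
print(date_text)  # 2024/1/15

# A string is itself a sequence of characters, so it can be joined too.
spaced = ",".join("abc")
print(spaced)     # a,b,c
```

Passing the list of numbers to join directly, without str, would raise a TypeError.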
I hope most of the concepts touched on in this lesson are understandable. Next we will cover a few important concepts in Python programming, such as conditional statements, and then return to List data structures and looping constructs.
Thank you for being patient through the lesson. Practice the various functions that have been explained across the lessons on strings. There are a few which have been left out, which we will cover when we look into importing other packages, classes and objects.
Thank you for your continued support and see you in the next lesson.