Python - Lesson 5 (Data Structures - Strings Part 2)
Kannan Piedy
IIM | Python, Data science | Technical Lead and Manager with experience leading large teams and building large projects.
Welcome back to another lesson of Python Fundamentals. We'll continue where we left off in the previous lesson on strings. If you are new here, I'd suggest taking some time to read the lessons leading up to this one.
- Lesson 1 : Python Fundamentals
- Lesson 2 : Arithmetic & Assignment Operators
- Lesson 3 : Operators & Precedence
- Lesson 4: Data Structures - Strings Part 1
Thank you for starting this journey. Without further ado, let's dive right in.
Collecting a String Input
Reading from standard input is an important skill at this point. Python 2.x offers two built-in functions for collecting input from the user, input and raw_input. We'll go through both now.
In [1]: input_value1 = input("Please enter your desired value :")
Please enter your desired value :1
In [2]: input_value2 = raw_input("Please enter a desired value :")
Please enter a desired value :1
The difference between the two is that in Python 2.x input evaluates what the user types as a Python expression, so entering 1 gives an int, whereas raw_input always returns the typed text as a str.
In [3]: print type(input_value1)
<type 'int'>
In [4]: print type(input_value2)
<type 'str'>
However, in Python 3.x raw_input no longer exists and input always returns a str, so to collect an integer you have to convert the value yourself:
input_value1 = input("please enter your value: ")
input_value2 = int(input_value1)
This is how you type cast a value of one type into another. Let us look at some common type casting cases below.
string_value = raw_input("string you want to be printed here when collecting an input")
integer_value = int(string_value)
float_value = float(integer_value)
re_stringified_float_value = str(float_value)
But there are some issues here that we should be careful about. Let us look at what happens when we type cast non-numeric values into int or float.
In [5]: a = raw_input("Enter anything you want : ")
Enter anything you want : abc
In [6]: b = int(a)
ValueError: invalid literal for int() with base 10: 'abc'
As you may notice, it raised a ValueError because it could not convert the string into an integer. However, you may now ask: what if I have the hexadecimal value "abc", how do I convert it? Hexadecimal values are written with a 0x prefix, and passing 0 as the base tells int to infer the base from that prefix.
In [7]: input_value = raw_input("Enter the value : ")
Enter the value : 0xabc
In [8]: b = int(input_value, 0)
In [9]: b
Out[9]: 2748
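One way to guard against this ValueError is to wrap the conversion in a try/except block. We have not covered exceptions in these lessons yet, so treat this as a rough sketch only. parse_int is a hypothetical helper name of my own, not a built-in, and the snippet is written for Python 3 (print is a function there).

```python
def parse_int(text, base=10):
    """Return text converted to an int, or None if it is not valid in that base."""
    try:
        return int(text, base)
    except ValueError:
        # int() raises ValueError for strings it cannot convert
        return None


print(parse_int("123"))       # 123
print(parse_int("abc"))       # None, "abc" is not a base 10 number
print(parse_int("abc", 16))   # 2748, read as hexadecimal
print(parse_int("0xabc", 0))  # 2748, base inferred from the 0x prefix
```

Returning None is just one possible choice here. You could equally ask the user to try again.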
Please note, however, that in Python 3.x int accepts zero-padded strings when you give an explicit base (or leave it at the default of 10), but not when you give the base as 0. For example:
a = "033"
b = int(a)
print(b)
--
33
However :
a = "033"
b = int(a, 0)
ValueError: invalid literal for int() with base 0: '033'
This error occurs only in Python 3.x. In Python 2.x a leading zero with base 0 marks an octal value, so the same call returns 27.
We have seen that when an input is not a valid value, type casting fails with a ValueError. Let us look at how to check a string's contents.
isalpha , isspace, isdigit and others
There are a few string methods that allow us to check for certain conditions. For example:
value = raw_input("enter an input : ")
condition_value_is_space = value.isspace()
print condition_value_is_space
The above can be used to check whether the given value is whitespace. Please note, however, that this is a character-level check: it returns True only if the string is non-empty and every character in it is whitespace (spaces, tabs or newlines). A string of several spaces also returns True.
a = " "
print a.isspace()
True
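A small sketch of how isspace behaves on a few edge cases, written for Python 3. is_blank is a hypothetical helper name of my own. It treats the empty string as blank too, which isspace alone does not.

```python
def is_blank(text):
    # isspace() is False for "", so the empty string is checked separately
    return text == "" or text.isspace()


print(" \t\n".isspace())  # True, tabs and newlines count as whitespace
print("".isspace())       # False, there is no character to check
print(" a ".isspace())    # False, one non-whitespace character is enough
print(is_blank(""))       # True
print(is_blank("   "))    # True
```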
The next method is isalpha, which checks whether the string is made up of only letters. For example, if the input is for a name field and you do not want to accept anything other than letters, this is one way to do so. (Be warned: real names can contain special characters, so never rely on this for actual name validation.)
name = raw_input("please enter your first name :")
print name.isalpha()
Similarly, we may use isdigit to check whether a given string is made up of only digits:
year = raw_input("please enter your birth year")
print year.isdigit()
You may wonder why I did not use isinstance with the int class here. Remember, year is of type str, and I must check its contents before I can type cast it. Please note, however, that if you give a float value like 3.14, isdigit returns False because of the "." character.
There are other methods with similar uses, namely islower, isupper, istitle and isalnum (which stands for alphanumeric). They can be used as follows.
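Putting the check and the type cast together, here is a minimal sketch written for Python 3. parse_number is a hypothetical helper name of my own. It falls back to float when isdigit says no, and it uses try/except, which we have not covered yet.

```python
def parse_number(text):
    """Return an int or a float parsed from text, or None if neither conversion succeeds."""
    text = text.strip()  # ignore surrounding whitespace
    try:
        if text.isdigit():
            return int(text)  # only digits, so treat it as a whole number
        return float(text)    # handles values such as "3.14" or "-5"
    except ValueError:
        return None


print(parse_number("1990"))  # 1990
print(parse_number("3.14"))  # 3.14
print(parse_number("abc"))   # None
```

Note that a value like "-5" comes back as the float -5.0 here, because isdigit is False for the minus sign.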
a = "HELLO"
b = "hello"
c = "Hello There"
d = "hello123"
print a.isupper()
print b.islower()
print c.istitle()
print d.isalnum()
----
True
True
True
True
Please note, however, that istitle expects the value to be in title case, not just capitalised. To explain the difference, let us look at two more methods, capitalize and title.
a = "hello world"
b = a.capitalize()
c = a.title()
print b
print c
--
Hello world
Hello World
Title case is when every word in the sentence is capitalized, whereas capitalize upper-cases only the first character of the string (and lower-cases the rest).
__contains__ is another method; it can be used to check whether a value is present in the string. Here is an example where we take a sentence and check if there is an article (a, an, the) present in it.
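To tie this back to istitle, here is a short sketch (Python 3 syntax) that runs istitle on both results. The variable names are my own.

```python
a = "hello world"
capitalized = a.capitalize()  # "Hello world"
titled = a.title()            # "Hello World"

print(capitalized.istitle())  # False, "world" does not start with a capital
print(titled.istitle())       # True

# title() starts a new word after any non-letter, including an apostrophe
print("it's fine".title())    # It'S Fine
```

The last line is worth remembering before you run title on text that contains apostrophes.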
sentence = "this is a nice world"
article_a_found = sentence.__contains__(" a ")
article_an_found = sentence.__contains__(" an ")
article_the_found = sentence.__contains__(" the ")
article_found = article_a_found or article_an_found or article_the_found
print article_found
For those who are thinking, "Hey, but this is a terrible way to do it": I know, but I am limiting myself to the functions and operators that I have already taught. We will get to better methods soon, I promise.
You may be wondering about the cases where this might fail, for example:
input_value = "the girl looks fabulos"
For those wondering whether I spelled fabulous incorrectly, I got 'U'.
Here the check for " the " fails because the article sits at the very beginning of the sentence, with no space before it. To handle cases where the article occurs at the beginning or the end of a sentence, we can use startswith and endswith.
startswith
In [29]: string_value = "the girl looks fabulous"
In [30]: checking_article_the = string_value.startswith("the ")
In [31]: print checking_article_the
True
Similarly, we also have a method called endswith:
In [32]: filename = "testcase.txt"
In [33]: txt_file_check = filename.endswith(".txt")
In [34]: print txt_file_check
True
I think it is time to come back to a topic we touched on a while ago: operators.
When we were going through Lesson 3 : Operators & Precedence (see the list above), I mentioned that there were a few more operators we would go into later. We will look at some of those right now.
Python Membership Operators
We may use these operators rather than __contains__. The operators are in and not in.
In [30]: print "Alice" in "Alice in wonderland"
True
To avoid comparison issues with case sensitivity, we usually convert both strings to lower case before checking.
In [31]: print a.lower() in b.lower()
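With the in operator, lower, startswith and endswith in hand, the article check from earlier can be sketched more robustly. This is only an illustration written for Python 3. has_article and ARTICLES are hypothetical names of my own, and the for loop is something we have not formally covered yet.

```python
ARTICLES = ("a", "an", "the")


def has_article(sentence):
    """Return True if the sentence contains a, an or the as a separate word."""
    text = sentence.lower()  # compare in lower case to ignore capitalisation
    for article in ARTICLES:
        if (text == article                          # the whole string is the article
                or text.startswith(article + " ")    # article at the beginning
                or text.endswith(" " + article)      # article at the end
                or " " + article + " " in text):     # article in the middle
            return True
    return False


print(has_article("The girl looks fabulous"))  # True
print(has_article("this is a nice world"))     # True
print(has_article("python rocks"))             # False
```

This still assumes words are separated by single spaces and ignores punctuation, so it is a sketch rather than a complete solution.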
There is another method very similar to the index method which we saw a while ago: find. It can be used to check whether a substring is present in a string. It returns the index if the value is found, else it returns -1, whereas index raises an exception. This is the major difference between the two. Let me explain.
In [32]: a = "abcdefg"
In [33]: a.index("z")
ValueError: substring not found
In the above case, as you can see, it raises an exception. Let us look at find.
In [34]: a.find("z")
Out[34]: -1
index and find both take two optional parameters, start and end, which work the same way as in slicing.
The syntax of both follows something like this :
start_index_where_substring_is_found = string_obj.find(substring, start_index, end_index)
However, please note that the index returned is relative to the original string and not the slice. For example:
a = "hello there"
b = a.find("e", 5, 10)
c = a[5:10].find("e")
In the above case, b and c will have different results.
In [43]: b
Out[43]: 8
In [45]: c
Out[45]: 3
This is because c is the index within the sliced substring a[5:10], whereas b is the index within the original string.
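The start parameter is handy when you want every position of a substring and not just the first. Below is a minimal sketch for Python 3. find_all is a hypothetical helper name of my own, and it uses a while loop and a list, which we have not covered yet.

```python
def find_all(text, sub):
    """Return the index of every occurrence of sub in text."""
    positions = []
    start = text.find(sub)
    while start != -1:                     # find returns -1 when nothing is left
        positions.append(start)
        start = text.find(sub, start + 1)  # keep searching after the last hit
    return positions


text = "hello there"
print(find_all(text, "e"))  # [1, 8, 10]

# Adding the slice offset converts a slice index back to an index in text
offset = 5
print(text[offset:10].find("e") + offset)  # 8
```

The last two lines show how the slice result c relates to b: adding the slice's start offset gives the index in the original string.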
format
Often I've come across scenarios where I've had to use a template, on to which I'd add parameters to vary responses. To do this in Python we have a neat little method called format.
It replaces the {} placeholders in the string with the arguments we pass in. For example:
template_value = "Hi {}, Good {} . I {} to inform you that you have {}."
passed_candidate_morning = template_value.format("Jack", "morning", "am pleased", "passed")
passed_candidate_later = template_value.format("Jane", "evening", "am pleased", "passed")
failed_candidate_later = template_value.format("Jim", "evening", "regret", "failed")
print passed_candidate_morning
print passed_candidate_later
print failed_candidate_later
------------------------
Hi Jack, Good morning . I am pleased to inform you that you have passed.
Hi Jane, Good evening . I am pleased to inform you that you have passed.
Hi Jim, Good evening . I regret to inform you that you have failed.
The above cases show clearly how we can use a string as a template using format. We will later come across true Template class objects and their uses, but for now this should do.
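format also accepts named and numbered placeholders, which can make a long template easier to read. Here is a brief sketch for Python 3. The placeholder names (name, time_of_day, feeling, result) and the variable names are my own choices for illustration.

```python
named_template = "Hi {name}, Good {time_of_day}. I {feeling} to inform you that you have {result}."

message = named_template.format(
    name="Jack",
    time_of_day="morning",
    feeling="am pleased",
    result="passed",
)
print(message)
# Hi Jack, Good morning. I am pleased to inform you that you have passed.

# Numbered placeholders let you reuse the same argument more than once
indexed = "{0} and {1}, then {0} again".format("Jack", "Jane")
print(indexed)
# Jack and Jane, then Jack again
```

With named placeholders the order of the arguments no longer matters, which helps when a template has many fields.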
I hope we learnt a lot today. I'll get back to you soon with more information on strings in Lesson 6 : Data structures - Strings Part 3.
Thank you.