Python Interview Questions Set 3
Intermediate Level
What is Pandas?
Pandas is an open-source Python library that provides a rich set of data structures for data operations. Its features suit nearly every kind of data work, from academic analysis to complex business problems. Pandas can read a wide variety of file formats and is one of the most important tools to master.
What are data frames?
A pandas dataframe is a mutable, two-dimensional data structure. It supports heterogeneous data arranged along two axes (rows and columns). Reading files into pandas:
import pandas as pd
df = pd.read_csv("mydata.csv")
Here, df is a pandas dataframe. read_csv() reads a comma-delimited file into a dataframe.
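As a minimal sketch of the points above, the snippet below reads CSV text into a dataframe and then adds a column to show that the dataframe is mutable. The CSV content and column names are made up for illustration, and io.StringIO stands in for a real file such as "mydata.csv".

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = "name,age,city\nAsha,31,Pune\nRavi,27,Delhi\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Columns can hold different types: text in "name", integers in "age".
print(df.dtypes)

# Dataframes are mutable: a new column can be added in place.
df["age_next_year"] = df["age"] + 1
print(df.shape)  # (number of rows, number of columns)
```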
What is a Pandas Series?
A Series is a one-dimensional pandas data structure that can hold data of almost any type. It resembles an Excel column. It supports many operations and is used for one-dimensional data. Creating a series from data:
Code:
import pandas as pd
data = ["1", 2, "three", 4.0]
series = pd.Series(data)
print(series)
print(type(series))
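To illustrate the "multiple operations" a Series supports, here is a small sketch with a numeric Series and a custom index. The values and index labels are invented for the example.

```python
import pandas as pd

# Hypothetical values with string labels as the index.
prices = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

doubled = prices * 2    # arithmetic is applied element by element
total = prices.sum()    # aggregation over the whole Series
second = prices["b"]    # lookup by index label

print(doubled)
print(total, second)
```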
What do you understand about pandas groupby?
Pandas groupby is a feature that splits an object into groups. Like GROUP BY in SQL databases such as MySQL and Oracle, it groups data by category so that each group can then be aggregated. A dataframe can be grouped by one or more columns.
Code:
import pandas as pd
df = pd.DataFrame({'Vehicle': ['Etios', 'Lamborghini', 'Apache200', 'Pulsar200'], 'Type': ["car", "car", "motorcycle", "motorcycle"]})
df
To perform the groupby, run the following code, which counts the non-null values in each remaining column for every group:
df.groupby('Type').count()
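The aggregation step mentioned above could look like the sketch below. It reuses the Vehicle and Type columns from the example and adds a Price column whose numbers are invented purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Vehicle": ["Etios", "Lamborghini", "Apache200", "Pulsar200"],
    "Type": ["car", "car", "motorcycle", "motorcycle"],
    "Price": [10, 300, 2, 3],  # made-up numbers, not real prices
})

# Number of vehicles in each group.
counts = df.groupby("Type")["Vehicle"].count()

# Several aggregations of one column per group.
summary = df.groupby("Type")["Price"].agg(["min", "max", "mean"])

print(counts)
print(summary)
```

Passing a list of column names to groupby() groups by more than one column.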
How to create a dataframe from lists?
To create a dataframe from lists: 1) create an empty dataframe, 2) add each list as an individual column of the dataframe. The lists must all have the same length.
Code:
import pandas as pd
df = pd.DataFrame()
bikes = ["bajaj", "tvs", "herohonda", "kawasaki", "bmw"]
cars = ["lamborghini", "masserati", "ferrari", "hyundai", "ford"]
df["cars"] = cars
df["bikes"] = bikes
df
How to create a data frame from a dictionary?
A dictionary can be passed directly as an argument to the DataFrame() function to create the dataframe. Each key becomes a column name and each list becomes that column's values.
Code:
import pandas as pd
bikes = ["bajaj", "tvs", "herohonda", "kawasaki", "bmw"]
cars = ["lamborghini", "masserati", "ferrari", "hyundai", "ford"]
d = {"cars": cars, "bikes": bikes}
df = pd.DataFrame(d)
df
How to combine dataframes in pandas?
Dataframes can be stacked vertically or horizontally with the concat(), append(), and join() functions in pandas. concat() works best when the dataframes have the same columns; by default it stacks them vertically (axis=0) into a single dataframe, and with axis=1 it places them side by side. append() also stacks vertically, adding the rows of one dataframe to the end of another; it was deprecated in pandas 1.4 and removed in pandas 2.0, so current code should use concat() instead. join() combines columns from different dataframes, aligning rows on the index by default or on a key column. The stacking is horizontal in this case.
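A minimal sketch of vertical and horizontal stacking follows. The dataframe names and the color column are invented for the example, and append() is left out because it is not available in pandas 2.0 and later.

```python
import pandas as pd

first = pd.DataFrame({"cars": ["ferrari", "ford"], "bikes": ["tvs", "bmw"]})
second = pd.DataFrame({"cars": ["hyundai"], "bikes": ["bajaj"]})

# Vertical stacking: same columns, rows of "second" placed under "first".
stacked = pd.concat([first, second], ignore_index=True)

# Horizontal stacking: a dataframe with a different column placed beside "first".
colors = pd.DataFrame({"color": ["red", "blue"]})
side_by_side = pd.concat([first, colors], axis=1)

# join() also adds columns, matching rows by index (a left join by default).
joined = first.join(colors)

print(stacked.shape, side_by_side.shape, joined.shape)
```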
What kind of joins does pandas offer?
Pandas supports left, right, inner, and outer joins. The join type is selected with the how parameter of merge() and join().
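The difference between the four join types is easiest to see in the row counts. The sketch below merges two small, made-up tables on a shared owner_id column, where only ids 2 and 3 appear in both.

```python
import pandas as pd

# Hypothetical tables: ids 2 and 3 appear in both, 1 and 4 in only one.
owners = pd.DataFrame({"owner_id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
vehicles = pd.DataFrame({"owner_id": [2, 3, 4], "vehicle": ["Etios", "Pulsar200", "Apache200"]})

row_counts = {}
for how in ["left", "right", "inner", "outer"]:
    result = pd.merge(owners, vehicles, on="owner_id", how=how)
    row_counts[how] = len(result)

# left keeps all owners, right keeps all vehicles,
# inner keeps only matching ids, outer keeps every id from both sides.
print(row_counts)
```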
How to merge dataframes in pandas?
How dataframes are merged depends on their structure and fields. If the dataframes have the same fields, their rows are combined along axis 0, typically with concat(). Otherwise their columns are combined along axis 1, typically with merge() on a common key column.
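One way to read the rule of thumb above is sketched here. The monthly sales tables and the lookup table are invented for the example: two tables with the same fields are combined along axis 0, and a table with different fields is then merged on the shared vehicle column.

```python
import pandas as pd

# Same fields: combine rows along axis 0.
jan = pd.DataFrame({"vehicle": ["Etios"], "units": [5]})
feb = pd.DataFrame({"vehicle": ["Pulsar200"], "units": [8]})
same_fields = pd.concat([jan, feb], axis=0, ignore_index=True)

# Different fields with a common key: merge() adds the new column.
types = pd.DataFrame({"vehicle": ["Etios", "Pulsar200"], "type": ["car", "motorcycle"]})
with_types = same_fields.merge(types, on="vehicle")

print(with_types)
```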
Given a dataframe, drop all rows that contain NaN.
The dropna() function can be used to do that. With inplace=True it modifies df directly instead of returning a new dataframe.
df.dropna(inplace=True)
df
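Here is a small sketch of dropna() on a made-up dataframe with missing values. It uses the default behaviour, which returns a new dataframe and leaves the original untouched, and also shows the subset parameter for checking only certain columns.

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "cars": ["lamborghini", None, "ferrari"],
    "price": [1.0, 2.0, float("nan")],
})

# Drop every row that has at least one missing value.
cleaned = df.dropna()

# Drop rows only when "price" is missing.
only_price = df.dropna(subset=["price"])

print(len(df), len(cleaned), len(only_price))
```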
How to access the first five entries of a dataframe?
The head(5) function returns the first five entries of a dataframe. By default, df.head() returns the first 5 rows. To get the first n rows, use df.head(n).
How to access the last five entries of a dataframe?
The tail(5) function returns the last five entries of a dataframe. By default, df.tail() returns the last 5 rows. To get the last n rows, use df.tail(n).
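A quick sketch of head() and tail() on a made-up ten-row dataframe:

```python
import pandas as pd

# Hypothetical dataframe with the numbers 1 to 10 in a single column.
df = pd.DataFrame({"n": range(1, 11)})

first_five = df.head()     # default n=5: rows holding 1..5
last_five = df.tail()      # default n=5: rows holding 6..10
last_three = df.tail(3)    # rows holding 8..10

print(first_five["n"].tolist())
print(last_five["n"].tolist())
print(last_three["n"].tolist())
```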
How to fetch a data entry from a pandas dataframe using a given value in index?
To fetch a row from a dataframe by its index value x, use loc: df.loc[10], where 10 is the value of the index.
Code:
import pandas as pd
bikes = ["bajaj", "tvs", "herohonda", "kawasaki", "bmw"]
cars = ["lamborghini", "masserati", "ferrari", "hyundai", "ford"]
d = {"cars": cars, "bikes": bikes}
df = pd.DataFrame(d)
a = [10, 20, 30, 40, 50]
df.index = a
df.loc[10]
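Because loc looks up index labels rather than positions, it can help to compare it with iloc, which is position-based. iloc is not covered in the answer above, so treat this as a supplementary sketch on a shortened version of the same data.

```python
import pandas as pd

df = pd.DataFrame(
    {"cars": ["lamborghini", "ferrari"], "bikes": ["bajaj", "tvs"]},
    index=[10, 20],
)

by_label = df.loc[10]        # row whose index label is 10
by_position = df.iloc[0]     # first row, regardless of its label
cell = df.loc[20, "bikes"]   # single value: row label 20, column "bikes"

# There is no row labelled 0, so loc raises KeyError for it.
try:
    df.loc[0]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(by_label["cars"], cell, missing_label_raises)
```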