Data Analysis Project




Analysing the Placement dataset and drawing insights from the data.

Importing the Libraries.

Reading Dataset
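The article does not show the loading code. A minimal sketch, assuming pandas and a CSV file; the file name below is a hypothetical placeholder, not the author's path:

```python
import pandas as pd

# Hypothetical file name: the article does not state which file it loaded.
CSV_PATH = "Placement_Data_Full_Class.csv"


def load_placement_data(path=CSV_PATH):
    """Read the placement CSV into a DataFrame and report its shape."""
    dataset = pd.read_csv(path)
    print("Rows, columns:", dataset.shape)
    return dataset
```

Usage: `dataset = load_placement_data()`; the later steps refer to this DataFrame as `dataset`.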

Dataset

First, we check whether the dataset contains any null (NaN) values.
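A minimal sketch of this check with pandas, counting the missing values in every column (the helper name is illustrative):

```python
import pandas as pd


def count_missing(df):
    """Return the number of null/NaN values in each column."""
    return df.isnull().sum()
```

Usage: `count_missing(dataset)` lists one count per column; in the article's data this is where the missing salary values show up.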


1. Replace the NaN values with appropriate ones and justify your choice.

In the placement dataset, the salary column has 67 NaN values.

Students who were not placed receive no salary, so we replaced their missing salary values with 0 using fillna.
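A minimal sketch of the replacement, assuming the column is named `salary` as in the rest of the article. It works on a copy so the original DataFrame is left unchanged:

```python
import pandas as pd


def fill_unplaced_salary(df):
    """Return a copy where missing salaries (students not placed) become 0."""
    out = df.copy()
    out["salary"] = out["salary"].fillna(0)
    return out
```

Usage: `dataset = fill_unplaced_salary(dataset)`.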


2. How many students are not placed?

The number of Not Placed students is 67.
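A minimal sketch of the count, assuming the placement outcome is stored in a `status` column with the label "Not Placed" (both names are assumptions; the article does not show its code):

```python
import pandas as pd


def count_not_placed(df, label="Not Placed"):
    """Count students whose status equals `label` (assumed column: "status")."""
    return int((df["status"] == label).sum())
```

`dataset["status"].value_counts()` gives the same number alongside the count of placed students.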


3. Find the reason for non-placement from the dataset.

- We find the medians for not-placed and placed students and then compare them to find the reason.

The reason for non-placement:

We computed the median scores of non-placed and placed students and compared them. The median shows the typical (middle) score of each group at every stage.

We found that the median scores of Not Placed students from ssc_p to mba_p are below 68%, while the median scores of Placed students are above 68% from ssc_p to etest_p.

From this, we conclude that students scoring below about 68% tend not to be placed, while those scoring above 68% tend to be placed.
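A minimal sketch of the median comparison. The article names only the range "ssc_p to mba_p", so the intermediate columns `hsc_p` and `degree_p` and the `status` column are assumptions:

```python
import pandas as pd

# Assumed score columns between ssc_p and mba_p.
SCORE_COLS = ["ssc_p", "hsc_p", "degree_p", "etest_p", "mba_p"]


def median_scores_by_status(df, cols=SCORE_COLS):
    """Median of each score column, one row per placement status."""
    return df.groupby("status")[cols].median()
```

Usage: `median_scores_by_status(dataset) > 68` marks which group medians clear the 68% line discussed above.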


4. What kind of relationship exists between salary and mba_p?

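Using correlation, we measured the relationship between the two columns, salary and mba_p. The correlation coefficient is about 0.13, which is a weak positive correlation: higher mba_p scores go with slightly higher salaries, but the two are not directly proportional.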
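A minimal sketch of this measurement with pandas, which uses the Pearson coefficient by default:

```python
import pandas as pd


def salary_mba_correlation(df):
    """Pearson correlation between the salary and mba_p columns."""
    return df["salary"].corr(df["mba_p"])
```

The result depends on whether the 0-filled salaries of non-placed students are included; the article does not say which rows it used.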


5. Which specialization gets the minimum salary?

Both the Mkt&HR and Mkt&Fin specializations get the minimum salary among placed students, which is 2,00,000.
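A minimal sketch of this lookup. It keeps only rows with a positive salary, since the 0 values filled in earlier belong to non-placed students. The column name `specialisation` is an assumption:

```python
import pandas as pd

SPEC_COL = "specialisation"  # assumed name of the specialization column


def min_salary_by_specialization(df, spec_col=SPEC_COL):
    """Minimum salary per specialization, counting only placed students (salary > 0)."""
    placed = df[df["salary"] > 0]
    return placed.groupby(spec_col)["salary"].min()
```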


6. How many students get a salary above 500000?

Ans: 3 students in the dataset get a salary above 500000: 2 male and 1 female.
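A minimal sketch of the filter and the gender breakdown, assuming a `gender` column (its name and values are assumptions):

```python
import pandas as pd


def high_earners_by_gender(df, threshold=500000):
    """Count students earning above `threshold`, broken down by gender."""
    high = df[df["salary"] > threshold]
    return high["gender"].value_counts()
```

The total number of such students is the sum of the returned counts.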


7. Test the analysis of variance between etest_p and mba_p at a significance level of 5% (make the decision using hypothesis testing).

ANOVA: Analysis of Variance

H0: There is no significant difference between these columns. H1: There is a significant difference between these columns.

The p-value is greater than 0.05, so we accept (strictly, fail to reject) H0 and reject H1: there is no significant difference between etest_p and mba_p.
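A minimal sketch of this test with SciPy's one-way ANOVA, treating the two columns as two groups; the 0.05 threshold is the significance level stated above:

```python
import pandas as pd
from scipy import stats

ALPHA = 0.05  # significance level used in the article


def anova_etest_vs_mba(df, alpha=ALPHA):
    """One-way ANOVA comparing the means of etest_p and mba_p."""
    f_stat, p_value = stats.f_oneway(df["etest_p"], df["mba_p"])
    decision = "fail to reject H0" if p_value > alpha else "reject H0"
    return f_stat, p_value, decision
```

With only two groups, one-way ANOVA gives the same p-value as a pooled-variance independent t-test.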


8. Test the similarity between degree_t (Sci&Tech) and specialization (Mkt&HR) with respect to salary at a significance level of 5% (make the decision using hypothesis testing).

To test the similarity, we use a t-test.

Independent-samples (unpaired) t-test: different groups (degree_t, specialization) but the same measure (salary).

The p-value is less than 0.05, so we reject the null hypothesis and accept the alternative hypothesis: salary differs significantly between the two groups.
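A minimal sketch of the unpaired test with SciPy. The group labels come from the question; the column name `specialisation` is an assumption:

```python
import pandas as pd
from scipy import stats

SPEC_COL = "specialisation"  # assumed name of the specialization column


def ttest_scitech_vs_mkthr(df, alpha=0.05, spec_col=SPEC_COL):
    """Independent-samples t-test on salary: degree_t == Sci&Tech vs specialization == Mkt&HR."""
    sci_tech = df.loc[df["degree_t"] == "Sci&Tech", "salary"]
    mkt_hr = df.loc[df[spec_col] == "Mkt&HR", "salary"]
    t_stat, p_value = stats.ttest_ind(sci_tech, mkt_hr)
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return t_stat, p_value, decision
```

`stats.ttest_ind` assumes equal variances by default; passing `equal_var=False` switches to Welch's t-test.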


9. Convert the salary column's normal distribution to the standard normal distribution.

stdNBgraph(dataset["salary"])        
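stdNBgraph is the author's own helper, and its body is not shown. A minimal sketch of the standardization step it presumably performs, z = (x - mean) / std, using pandas' sample standard deviation; the plotting part is left out:

```python
import pandas as pd


def standardize(series):
    """Convert values to z-scores: z = (x - mean) / std (sample std, ddof=1)."""
    return (series - series.mean()) / series.std()
```

Usage: `z = standardize(dataset["salary"])`. The result has mean 0 and standard deviation 1.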

10. Using the probability density function, what is the probability that the salary falls in the range 700000 to 900000?


get_pdf_probability(dataset["salary"],700000,900000)        

The probability of a salary in this range = 0.0005
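get_pdf_probability is also the author's helper, and its implementation is not shown. The following is only a sketch under a stated assumption: salary is modelled as a normal distribution with the sample mean and standard deviation, and the probability of the range is the area under that PDF, computed as a difference of two CDF values. Its result may differ from the article's figure:

```python
import pandas as pd
from scipy.stats import norm


def normal_range_probability(series, low, high):
    """P(low <= X <= high) for a normal fitted with the sample mean and std (assumption)."""
    dist = norm(loc=series.mean(), scale=series.std())
    return dist.cdf(high) - dist.cdf(low)
```

Usage: `normal_range_probability(dataset["salary"], 700000, 900000)`.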


11. Test the similarity between etest_p and mba_p for degree_t (Sci&Tech) at a significance level of 5% (make the decision using hypothesis testing).

Dependent samples (paired t-test): the same group (degree_t = Sci&Tech) but different measures (etest_p, mba_p).

Ans: We accept (strictly, fail to reject) the null hypothesis and reject the alternative hypothesis, so there is no significant difference between the etest_p and mba_p scores of Sci&Tech students.
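A minimal sketch of the paired test with SciPy, restricted to Sci&Tech students; the two scores come from the same students, which is why a paired (related-samples) test applies:

```python
import pandas as pd
from scipy import stats


def paired_ttest_scitech(df, alpha=0.05):
    """Paired t-test of etest_p vs mba_p for students with degree_t == Sci&Tech."""
    sci_tech = df[df["degree_t"] == "Sci&Tech"]
    t_stat, p_value = stats.ttest_rel(sci_tech["etest_p"], sci_tech["mba_p"])
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return t_stat, p_value, decision
```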


12. Which parameter is highly correlated with salary?


ssc_p and salary have the strongest relationship, with a correlation of 0.538090; all other correlations are smaller.

Ans: ssc_p is the parameter most highly correlated with salary.
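A minimal sketch of how such a ranking can be produced with pandas: correlate every numeric column with salary and sort from strongest to weakest positive correlation. Non-numeric columns are skipped:

```python
import pandas as pd


def correlations_with_salary(df):
    """Correlation of every other numeric column with salary, highest first."""
    corr = df.select_dtypes(include="number").corr()["salary"].drop("salary")
    return corr.sort_values(ascending=False)
```

Usage: `correlations_with_salary(dataset).idxmax()` returns the top column name. To rank strong negative correlations too, sort by `corr.abs()` instead.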


13. Plot any useful graph and explain it.

Thank You!

That's about it for this article.

I am always interested and eager to connect with like-minded people and explore new opportunities. Feel free to follow, connect, and interact with me on LinkedIn, Twitter, and YouTube, or reach out to me on my other social media handles. I am here to help you, so ask me any questions about AI and your career.

Wishing you good health and a prosperous journey into the world of AI!

Best regards,

Heerthi Raja H
