Descriptive Data Analysis of Sales Data

Descriptive Data Analysis of Sales Data

This project/case study is part of my data analytics portfolio. I conducted descriptive data analysis of sales data. I conducted this analysis using R programming language. I also did an analysis of the same data using SQL but i will write a separate article for that, this article contains analysis only using R. Data consisted of 3 CSV files. Entire data analysis was done in the RStudio.

Setting up Environment

Installing Packages

install.packages("tidyverse")
install.packages("dplyr")
install.packages("lubridate")        

Loading Libraries

library(tidyverse)
library(dplyr)
library(lubridate)
library(ggplot2)        

Loading Data Files

Orders_List <- read.csv("ListOfOrders.csv")
Order_Details <- read.csv("OrderDetails.csv")
Sales_Target <- read.csv("SalesTarget.csv")        

Data Cleaning

Checking Duplicates in all data files

sum(duplicated(Orders_List))
sum(duplicated(Order_Details))
sum(duplicated(Sales_Target))        

When i was exploring data in the excel file i noticed that total number of rows in "List of Orders" are 500 but when i checked total number of rows in RStudio it showed 560 rows this means there were 60 blank rows in "List of Orders" file.

No alt text provided for this image

So i removed blank rows from "List of Orders" data file.

blankrows <- apply(Orders_List == "", 1, all)
Orders_List <- Orders_List[!blankrows, ]        

Then i checked duplicate values in each data file.

sum(duplicated(Orders_List))
sum(duplicated(Order_Details))
sum(duplicated(Sales_Target))        

No duplicate values were found in any of the files.

No alt text provided for this image

Then i converted "Order.Date" column of the List of Orders table to the default data format of R language because later on while analyzing and digging down the data it will create problems.

Orders_List$Order.Date <- as.Date(Orders_List$Order.Date, "%d-%m-%Y")        

Analysis

I created a new data frame "Combined" by performing inner join on Orders_List and Order_Details as none of the files have all the required columns needed for data analysis in one single file.

Combined <- merge(Orders_List, Order_Details, by = "Order.ID", all = FALSE)        

1 - Finding top 10 cities with highest profit along with total products sold and total customers in those cities.

A <- Combined %>% group_by(City) %>% 
summarize(Profit = sum(Profit),
Products_Sold = sum(Quantity),
Total_Customers = n_distinct(CustomerName)) %>% 
arrange(desc(Profit)) %>% 
head(n=10)        

Output

No alt text provided for this image

Plotting

Then i created visualization in R using ggplot showing top 10 cities with most profit generated.

ggplot(data = B %>% slice(1:5), aes(x=reorder(City, -Profit), y=Profit, fill=City)) + 
geom_col() + 
geom_line(aes(group=1), color = "white") +
geom_text(aes(label = Profit), position = position_stack(vjust = 0.5)) +
labs(title = "Top 10 Cities With Most Profit", x = "Cities") +
theme(
? ? panel.background = element_rect(fill = "black"),
? ? plot.title = element_text(hjust = 0.5, face = "bold"),
? ? panel.grid = element_blank(),
? ? plot.background = element_rect(),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold"),
? ? legend.position = "none")        

Data Viz


No alt text provided for this image

2 - Top 10 Cities with most revenue generated

Top10CitiesWithMostRevenueGenerated <- Combined %>% group_by(City) %>%
? summarise(Revenue = sum(Amount)) %>%?
? arrange(desc(Revenue)) %>% head(n=10)?        

Output

No alt text provided for this image

Plotting

ggplot(data = Top10CitiesWithMostRevenueGenerated, aes(x=reorder(City, -Revenue), y=Revenue, fill = City)) +
? geom_col() + labs(title = "Top 10 Cities With Most Revenue Generated", x = "Cities", y = "Revenue") +
? theme(
? ? legend.position = "none",
? ? panel.grid = element_blank(),
? ? panel.background = element_blank(),
? ? plot.title = element_text(face = "bold", hjust = 0.5),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold")
? )        

Data Viz

No alt text provided for this image

3 - Total Sales and Total Orders by day of the week

C <- Combined %>% 
mutate(Day = weekdays(Order.Date)) %>%
group_by(Day) %>% 
summarize(Total_Orders = n(), 
Total_Sales = sum(Amount))?        

Output

No alt text provided for this image

Plotting

For plotting first of all i gave a proper order to the days of the week so they appear in an order in visualization.

C$Day <- factor(C$Day, levels=c("Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Saturday", "Sunday"))        

Now plotting days of the week against total orders using ggplot.

ggplot(data = C, aes(x= Day, y=Total_Orders, fill = Total_Orders)) +
geom_col() +
geom_text(aes(label = Total_Orders), position = position_stack(vjust = 0.5), color = "white") +
labs(title = "Total Orders by Day of the Week", x = "Days", y = "Total Orders") +
theme(
? ? panel.background = element_blank(),
? ? plot.title = element_text(hjust = 0.5, face = "bold"),
? ? panel.grid = element_blank(),
? ? #plot.background = element_rect(fill = "skyblue"),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold"),
? ? legend.position = "none")        

1st data viz

No alt text provided for this image

Now plotting days of the week against total sales using ggplot.

ggplot(data = C, aes(x= Day, y=Total_Sales, fill = Total_Sales)) +
geom_col() +
geom_text(aes(label = Total_Sales), position = position_stack(vjust = 0.5), color = "white") +
labs(title = "Total Sales by Day of the Week", x = "Days", y = "Total Sales") +
theme(
? ? panel.background = element_blank(),
? ? plot.title = element_text(hjust = 0.5, face = "bold"),
? ? panel.grid = element_blank(),
? ? #plot.background = element_rect(fill = "skyblue"),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold"),
? ? legend.position = "none")?        

2nd data viz

No alt text provided for this image

4 - Calculating How Many Times the Sales Target was HIT and How Many Times the Sales Target was not HIT

Step-1

Dataframe1 <- Combined %>% mutate(Month.of.Order.Date = format(Order.Date, "%b-%y")) %>%?
? group_by(Month.of.Order.Date, Category) %>% summarize(Total = sum(Amount))        

Step-2

Dataframe2 <- merge(Dataframe1, Sales_Target, by = c("Month.of.Order.Date", "Category"), all.y = TRUE)        

Step-3

Dataframe2 <- Dataframe2 %>% mutate(Hit_or_Fail = ifelse(Dataframe2$Total < Dataframe2$Target, "Fail", "Hit"))        

Step-4

Dataframe2$Category <- as.factor(Dataframe2$Category)
        

Step-5

HIT_OR_FAIL <- Dataframe2 %>% group_by(Category) %>%?
? mutate(Hit = sum(case_when(Hit_or_Fail == "Hit" ~ 1, TRUE ~ 0 )),
? ? ? ? ? ?Fail = sum(case_when(Hit_or_Fail == "Fail" ~ 1, TRUE ~ 0 ))) %>%
? ? ? ? ? ? ? ? ? ? select(Hit, Fail) %>% distinct()        

Output

No alt text provided for this image

Plotting

For this plot, i created a stacked bar chart using ggplot. But to create stacked bar chart long-form data is more suitable so first converted output to long format from wide format.

Long_Form <- gather(HIT_OR_FAIL, Variable, Value, -Category)        

Now Plotting

ggplot(data = Long_Form, aes(x = Category, y = Value, fill = Variable)) + 
geom_col() +
geom_text(aes(label = Value), position = position_stack(vjust = 0.5), col="white") +
labs(title = "No. of Times Sales Target was Hit and Failed", y = "No. of Times") +
scale_fill_manual(values = c("#0000FF", "#006400")) +
? theme(
? ? panel.grid = element_blank(),
? ? plot.title = element_text(face = "bold", hjust = 0.5),
? ? panel.background = element_rect(fill = "white")
? ? ) +
? guides(fill = guide_legend(title = "Target"))        

Data viz

No alt text provided for this image

5 - Calculating profit and quantity sold by each month

D<-Combined %>% 
mutate(Month_Name = format(Order.Date, "%B")) %>% group_by(Month_Name) %>%
? summarise(Profit = sum(Profit),
? ? ? ? ? Quantity_Sold = sum(Quantity)) %>% arrange(desc(Profit))?        

Output

No alt text provided for this image

Plotting

Now to plot data first i ordered months, i gave the same order as months appear in the calendar. This data starts from April-2018 and ends at March-2019.

D$Month_Name <- factor(D$Month_Name, levels = c("April", "May", "June", "July", "August", "September", 
"October", "November", "December", "January", "February", 
"March"))        

Now plotting months against total quantity sold

ggplot(data = D, aes(x = Month_Name, y = Quantity_Sold)) +
geom_area(aes(group = 1), size = 1.5, fill = "steelblue") +?
geom_text(aes(label=Quantity_Sold), color = "black", face = "bold") +?
labs(title = "Total Quantity Sold Each Month", x = "Month", y = "Total Quantity Sold") +
theme(
? ? plot.title = element_text(hjust = 0.5, face = "bold"),
? ? panel.background = element_rect(fill = "white"),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold")
? )        

1st data viz

No alt text provided for this image

Now plotting total profit against months

ggplot(data = D, aes(x = Month_Name, y = Profit)) +
? geom_col(fill = "purple") +?
? geom_text(aes(label=Profit), color = "black", position = position_stack(vjust = 0.5)) +?
? labs(title = "Total Profit Each Month", x = "Month", y = "Total Profit") +
? theme(
? ? plot.title = element_text(hjust = 0.5, face = "bold"),
? ? panel.background = element_rect(fill = "white"),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold")
? )        

2nd data viz

No alt text provided for this image

6 - Calculating the percentage contribution of each month in total profit.

D2<- Combined %>% mutate(Month_Name = format(Order.Date, "%B")) %>% group_by(Month_Name) %>%
? ? ? ? summarise(Profit = sum(Profit),
? ? ? ? ? ? Quantity_Sold = sum(Quantity)) %>% filter(Profit >= 0) %>%?
? ? ? ? ? ? ? ? ?mutate (Percentage_of_Sales = round((Profit/sum(Profit))*100,2)) %>%?
? ? ? ? ? ? ? ? ? ? ? arrange(desc(Profit))??        

Output

No alt text provided for this image

Now Plotting

In this viz, i am making a pie chart.

pie(D2$Percentage_of_Sales, labels = paste(D2$Month_Name,","
 ,round(D2$Percentage_of_Sales,2), "%"),
? ? col = rainbow(length(D2$Month_Name)), main = "Percentage of Profit by Month")        

Data viz

No alt text provided for this image

7 - Calculating the percentage contribution of each month in total loss.

D4<- Combined %>% mutate(Month_Name = format(Order.Date, "%B")) %>% 
group_by(Month_Name) %>
? ? ? ? summarise(Profit = sum(Profit),
? ? ? ? ? ? Quantity_Sold = sum(Quantity)) %>% filter(Profit <= 0) %>%?
? ? ? ? ? ? ? ? ?mutate (Percentage_of_Loss = round((Profit/sum(Profit))*100,2)) %>%?
? ? ? ? ? ? ? ? ? ? ? arrange(desc(Profit))%        

Output

No alt text provided for this image

Now Plotting

pie(D4$Percentage_of_Loss, labels = paste(D4$Month_Name,"," ,round(D4$Percentage_of_Loss,2), "%")
? ? col = rainbow(length(D4$Month_Name)), main = "Percentage of Loss shared by Each Month"),        

Data viz

No alt text provided for this image

8 - Heat Map of the Number of Orders with respect to Month and Day of the Week

Step-1

Adding "Month" column in the Combined data frame.

Combined <- Combined %>% mutate(Month = format(Order.Date, format = "%B"))?        

Step-2

Adding "Day" column in the Combined data frame.

Combined <- Combined %>% mutate(Day = weekdays(Order.Date))        

Step-3

Now creating the data frame HeatMap which will be used to make data viz.

HeatMap <- Combined %>% group_by(Month, Day) %>% 
summarise(Orders = length(unique(Order.ID)))        

Step-4, step-5 and step-6 are performed to make the data viz appearance understandable.

Step-4

Months are ordered in this way because data starts from April-2018 and ends on March-2019

HeatMap$Month <- factor(HeatMap$Month, levels = c("April", "May", "June", "July", "August", "September", "October", "November", "December", "January", "Februaru", "March"))        

Step-5

Rearranging days of the week so they appear in proper order in data viz.

HeatMap$Day <- factor(HeatMap$Day, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))        

Step-6

I am checking the maximum and minimum values in the orders column because maximum and minimum values will be used in scale_fill_gradient for the intensity of the colour in heatmap.

max(HeatMap$Orders)
min(HeatMap$Orders)        

Now Plotting

ggplot(data = HeatMap, aes(x=Month, y=Day, fill=Orders)) +
geom_tile() +
? scale_fill_gradient(low = "#90EE90", high = "#006400", limits = c(1,16)) +
? labs(title = "Heat Map", x = "Months", y = "Days") +
? theme(
? ? plot.title = element_text(face = "bold", hjust = 0.5),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold")
? )        

Data Viz

No alt text provided for this image


9 - Top 5 Customers who brought in the most revenue and the total number of orders they placed

Top5CustomersWithMostRevenue <- Combined %>% 
group_by(CustomerName) %>%
summarise(Revenue = sum(Amount),
Orders = length(unique(Order.ID))) %>% 
arrange(desc(Revenue)) %>%
head(n=5)        

Output

No alt text provided for this image

Plotting

First, i converted data from wide format to long format because to create a dodge bar or column chart long format data is more suitable.

Long_Form_Top5CustomersWithMostRevenue <- pivot_longer(Top5CustomersWithMostRevenue, 
col = -CustomerName, 
names_to = "Revenue_AND_Orders", 
values_to = "Values")        

Now code for creating the plot.

ggplot(data = Long_Form_Top5CustomersWithMostRevenue, aes(x = reorder(CustomerName, ifelse(Revenue_AND_Orders == "Revenue", -Values, Values), FUN = min), y = Values, fill = Revenue_AND_Orders)) 
? geom_col(position = "dodge") +
? geom_text(aes(label = Values), position = position_dodge(width = 0.9), vjust = -0.2) +
? scale_fill_manual(values = c("#000080", "#FF0000")) + ylim(0, 9178) +
? labs(title = "Top 5 Customers who brought in most revenue and No of orders placed by them", x = "Customer Names",
? ? ? ?y = "Values(Revenue AND Orders Placed)") +
? theme(
? ? panel.background = element_blank(),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold"),
? ? panel.grid = element_blank(),
? ? plot.title = element_text(face = "bold", hjust = 0.5)
? ) +
? guides(fill = guide_legend(title = NULL))

+        

Data Viz

No alt text provided for this image


You must have noticed that the columns for "Orders" are very small and are hardly visible. Reason behind it is that values of orders are very low than values of revenue. Colour of columns for orders is blue.

10 - Top 5 customers who placed the most number of orders along with revenue generated by them

Top5CustomersWithMostNumberofOrders <- Combined %>% 
group_by(CustomerName) %>$
summarise(Orders = length(unique((Order.ID))), 
Revenue = sum(Amount)) %>%?
arrange(desc(Orders)) %>% 
head(n=5)        

Output

No alt text provided for this image

Plotting

First, i converted data from wide format to long format because to create a dodge bar or column chart long format data is more suitable.

Long_Form_Top5CustomersWithMostNumberofOrders <- pivot_longer(Top5CustomersWithMostNumberofOrders, 
cols = -CustomerName, 
names_to = "Orders_AND_Revenue", 
values_to = "Values")        

Now code for creating the plot.

ggplot(data = Long_Form_Top5CustomersWithMostNumberofOrders, aes(x = reorder(CustomerName, ifelse(Orders_AND_Revenue == "Orders", -Values, Values), FUN = min), y = Values, fill = Orders_AND_Revenue)) 
? geom_col(position = "dodge") +
? geom_text(aes(label = Values), position = position_dodge(width = 0.9), vjust = -0.2) +
? scale_fill_manual(values = c("#A020F0", "#8B4000")) + ylim(0, 9178) +
? labs(title = "Top 5 Customers with most orders and revenue generated by them", x = "Customer Names",
? ? ? ?y = "Values(Orders AND Revenue)") +
? theme(
? ? panel.background = element_blank(),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold"),
? ? panel.grid = element_blank(),
? ? plot.title = element_text(face = "bold", hjust = 0.5)
? ) +
? guides(fill = guide_legend(title = NULL))+        

Data Viz

No alt text provided for this image

You must have noticed that the columns for "Orders" are very small and are hardly visible. Reason behind it is that values of orders are very low than the values of revenue. Colour of the columns for orders is purple.

11 - Top 5 Customers who ordered the most number of items and revenue generated by them

Top5CustomersWithMostQuantity <- Combined %>%
? group_by(CustomerName) %>%?
? summarise(Total_Quantity = sum(Quantity), Revenue = sum(Amount)) %>%
? arrange(desc(Total_Quantity)) %>%
? head(n=5)?        

Output

No alt text provided for this image

Plotting

First, i converted data from wide format to long format because to create a dodge bar or column chart long format data is more suitable.

Long_Form_Top5CustomersWithMostQuantity <- pivot_longer(Top5CustomersWithMostQuantity
? ? ? ? ? ? cols = -CustomerName, names_to = "TotalQuantity_AND_Revenue",
? ? ? ?     values_to = "Values"),        

Now code for creating the plot.

ggplot(data = Long_Form_Top5CustomersWithMostQuantity, aes(x = reorder(CustomerName, ifelse(TotalQuantity_AND_Revenue == "Total_Quantity", -Values, Values), FUN = min), y = Values, fill = TotalQuantity_AND_Revenue)) 
? geom_col(position = "dodge") +
? geom_text(aes(label = Values), position = position_dodge(width = 0.9), vjust = -0.2) +
? scale_fill_manual(values = c("#0000FF", "#006400")) +
? labs(title = "Top 5 Customers who ordered the most number of items and revenue generated by them", x = "Customer Names",
? ? ? ?y = "Values(Total Quantity AND Revenue)") +
? theme(
? ? panel.background = element_blank(),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold"),
? ? panel.grid = element_blank(),
? ? plot.title = element_text(face = "bold", hjust = 0.5)
? ) +
? guides(fill = guide_legend(title = NULL))+        

Data Viz

No alt text provided for this image

You must have noticed that the columns for "Total_Quantity" are very small and are hardly visible. The reason behind it is that values of "Total_Quantity are very low than the values of revenue. Colour of the columns for Total_Quantity is green.

12 - Revenue and Percentage of Revenue Generated by Each Category

RevenueByCategory <- Combined %>% group_by(Category) %>%
? summarise(Revenue = sum(Amount)) %>%
? mutate(Revenue_Percentage = round(Revenue/sum(Revenue)*100,2)) %>% arrange(desc(Revenue_Percentage))?        

Output

No alt text provided for this image

Plotting

pie(RevenueByCategory$Revenue_Percentage, 
paste(RevenueByCategory$Category,",",RevenueByCategory$Revenue_Percentage,"%"), 
col = rainbow(length(RevenueByCategory$Category)), 
main = "Percentage of Revenue Generated by Each Category")        

Data Viz

No alt text provided for this image

13 - Revenue, Profit, Loss and Profit Percentage contribution by each sub-category of Clothing

Clothing_Category_Revenue_Profit_ProfitPercentage <- Combined %>% 
filter(Category == "Clothing") %>% 
group_by(Sub.Category) %>%
summarise(Revenue = sum(Amount),
Profit = sum(Profit)) %>% 
mutate(Profit_Percentage = round(Profit/sum(Profit)*100,2)) %>% 
arrange(desc(Revenue))        

Output

No alt text provided for this image

Plotting

ggplot(data = Clothing_Category_Revenue_Profit_ProfitPercentage,
aes(x = reorder(Sub.Category, Profit_Percentage), y = Profit_Percentage, fill = Sub.Category)) +
geom_col() + coord_flip() +
labs(title = "Percentage of Profit generated by each Sub-Category of Clothing from 32.23% original
contribution of Clothing in overall profit",?
y = "Profit Percentage", x = "Sub-Categories of Clothing") +
geom_text(aes(label = paste(Profit_Percentage,"%"), hjust = -0.25, vjust = 0.5)) +
theme(
? ? legend.position = "none",
? ? panel.grid = element_blank(),
? ? panel.background = element_blank(),
? ? plot.title = element_text(face = "bold", hjust = 0.5),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold")
? )?        

Data Viz

No alt text provided for this image


14 - Revenue, Profit, Loss and Profit Percentage contribution by each sub-category of Electronics

Electronics_Category_Revenue_Profit_ProfitPercentage <-?Combined %>% 
filter(Category == "Electronics") %>% 
group_by(Sub.Category) %>%
summarise(Revenue = sum(Amount),
Profit = sum(Profit)) %>% 
mutate(Profit_Percentage = round(Profit/sum(Profit)*100,2)) %>% 
arrange(desc(Revenue))        

Output

No alt text provided for this image

Here we have a negative value of profit which means this category is in loss. So to calculate the accurate profit percentage for each category i eliminated negative values of profit and then calculated the profit percentage for each subcategory again (values in other columns are correct).

Electronics_Profit_Percentge <- Electronics_Category_Revenue_Profit_ProfitPercentage %>% 
filter(Profit >= 0) %>% 
mutate(Profit_Percentage = round(Profit/sum(Profit)*100,2)) %>%
arrange(desc(Profit_Percentage))        

Output

No alt text provided for this image

Plotting

ggplot(data = Electronics_Profit_Percentge,
? ? ? ?aes(x = reorder(Sub.Category, Profit_Percentage), y = Profit_Percentage, fill = Sub.Category)) +
? geom_col() + coord_flip() +
? labs(title = "Percentage of Profit generated by each Sub-Category
 of Electronics from 38.3% original contribution of Electronics in overall profit",?
? ? ? ?y = "Profit Percentage", x = "Sub-Categories of Electronics") +
? geom_text(aes(label = paste(Profit_Percentage,"%"), hjust = -0.10, vjust = 0.5)) +
? theme(
? ? legend.position = "none",
? ? panel.grid = element_blank(),
? ? panel.background = element_blank(),
? ? plot.title = element_text(face = "bold", hjust = 0.5),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold")
? )?        

Data Viz

No alt text provided for this image

15 - Revenue, Profit, Loss and Profit Percentage contribution by each sub-category of Furniture

Furniture_Category_Revenue_Profit_ProfitPercentage <- Combined %>% 
filter(Category == "Furniture") %>% 
group_by(Sub.Category) %>
summarise(Revenue = sum(Amount),
Profit = sum(Profit)) %>% 
mutate(Profit_Percentage = round(Profit/sum(Profit)*100,2)) %>% 
arrange(desc(Revenue))%        

Output

No alt text provided for this image

Here we have a negative value of profit which means this category is in loss. So to calculate the accurate profit percentage for each category i eliminated negative values of profit and then calculated the profit percentage for each subcategory (values in other columns are correct).

Furniture_Profit_Percentge <- Furniture_Category_Revenue_Profit_ProfitPercentage %>% 
filter(Profit >= 0) %>% 
mutate(Profit_Percentage = round(Profit/sum(Profit)*100,2)) %>% 
arrange(desc(Profit_Percentage))        

Output

No alt text provided for this image

Plotting

ggplot(data = Furniture_Profit_Percentge,
? ? ? ?aes(x = reorder(Sub.Category, Profit_Percentage), y = Profit_Percentage, fill = Sub.Category)) +
? geom_col() + coord_flip() +
? labs(title = "Percentage of Profit generated by each Sub-Category of Furniture from 29.4% original contribution of Furniture in overall profit",?
? ? ? ?y = "Profit Percentage", x = "Sub-Categories of Furniture" ) +
? geom_text(aes(label = paste(Profit_Percentage,"%"), hjust = -0.15, vjust = 0.5)) +
? theme(
? ? legend.position = "none",
? ? panel.grid = element_blank(),
? ? panel.background = element_blank(),
? ? plot.title = element_text(face = "bold", hjust = 0.5),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold")
? )?        

Data Viz

No alt text provided for this image

16 - Top 5 States With Most Revenue along with their Profit, Total Orders and No of Customers.

Top_5_States <- Combined %>% group_by(State) %>% 
summarise(Revenue = sum(Amount),
Profit = sum(Profit),
Total_Orders = length(unique(Order.ID)),
No_of_Customers = length(unique(CustomerName))) %>%
arrange(desc(Revenue)) %>% head(n=5)        

Output

No alt text provided for this image

This data is in wide form and i want to create a dodge column chart and to create that long format data is more suitable so i am converting this wide form data to long form.

Long_Form_Top_5_States <- pivot_longer(Top_5_States, cols = -State, 
names_to = "Variables", 
values_to = "Values")        

Plotting

As data has been converted to long-form so now i will create the plot.

ggplot(data = Long_Form_Top_5_States, aes(x = reorder(State, -Values), y = Values, fill = Variables)) 
geom_col(position = "dodge") + labs(title = "Top 5 States With Most Revenue along with their Profit, Total Orders and No of Custmers", x = "States", y = "Values") +
geom_text(aes(label = Values), position = position_dodge(width = 0.9), vjust = -0.75, hjust = 0.50) +
guides(fill = guide_legend(title = NULL)) +?
? scale_fill_manual(labels = c("No of Customers", "Profit", "Revenue", "Total Orders"),?
? ? ? ? ? ? ? ? ? ? values = c("red", "navyblue", "darkgreen", "purple")) +
? theme(
? ? panel.grid = element_blank(),
? ? panel.background = element_blank(),
? ? plot.title = element_text(face = "bold", hjust = 0.5),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold")
? ? )+        

Data Viz

No alt text provided for this image

As you can see in the data viz columns for No of Customers, and Total Orders are very small and are hardly visible, reason behind it is that values for these are very small as compared to values of other columns (Revenue and Profit).

17 - Comparing Sales of Current Month with sales of Last Month

Sales_Comparison <- Combined %>%
? mutate(Month = factor(Month, levels = c("April", "May", "June", "July", "August", "September", "October", "November", "December", "January", "February", "March"))) %>%
? group_by(Month) %>% summarise(Total_Sales = sum(Amount)) %>%
? mutate(Previous_Month_Sales = lag(Total_Sales))?        



No alt text provided for this image

Notice that months are starting from April and are ending at March it's reason is that data is starting from April-2018 and is ending at March-2019, also notice that the month of April does not have any value in "Previous Month Sales" column because data is starting from April-2018.

This data is in wide format and to make dodge bar chart i am converting this data to long format.

Long_Form_Sales_Comparison <- pivot_longer(Sales_Comparison, 
cols = -Month, names_to = "CurrentMonth_VS_PreviousMonth", 
values_to = "Sales_Values")        

Output

No alt text provided for this image

Plotting

ggplot(data = Long_Form_Sales_Comparison, aes(x = Month, y = Sales_Values, fill = CurrentMonth_VS_PreviousMonth)) +
? geom_col(position = "dodge") +
? scale_fill_manual(labels = c("Previous Month Sales", "Current Month Sales"), values = c("navyblue", "maroon")) +?
? labs(title = "Current Month and Previous Month Sales Comparison", x = "Months", y = "Total Sales") +
? theme(
? ? panel.background = element_blank(),
? ? panel.grid = element_blank(),
? ? plot.title = element_text(face = "bold", hjust = 0.5),
? ? axis.title.x = element_text(face = "bold"),
? ? axis.title.y = element_text(face = "bold")
? ) +
? guides(fill = guide_legend(title = NULL))        

Data Viz

No alt text provided for this image

18 - Displaying the details (in terms of “order_date”, “order_id”, “State”, and “CustomerName”) for the first order in each state and ordering the result by “order_id”

B <- Combined %>% 
group_by(State) %>% 
select(Order.ID, Order.Date) %>%
mutate(Row_Number = row_number()) %>% 
filter(Row_Number == 1) %>% 
arrange(Order.ID)?%>%
select(-c(Row_Number))        

Output

No alt text provided for this image

19 - Top 5 Dates on Which Most Revenue was Generated

Top5DatesOnWhichHighestAmountWasSpent <- Combined %>%
? group_by(Order.Date) %>%?
? summarise(Total_Amount = sum(Amount)) %>%
? arrange(desc(Total_Amount)) %>%
? head(n=5)?        

Output

No alt text provided for this image

20 - Correlation and Scatter plot (showing correlation between variables) between Revenue(Amount) and Profit

cor(Combined$Amount, Combined$Profit)        

Output

0.2420446        

Correlation between Revenue(Amount) and Profit came out to be "0.2420446" which indicates that relation between Revenue(Amount) and Profit is not strong. Relation is positive but it is not a strong relation, it is a weak relation which indicates that increasing the revenue will not necessarily increase the profit.

Scatter Plot

plot(Combined$Amount, Combined$Profit, main = "Revenue vs Profit", xlab = "Revenue", ylab = "Profit") 
? abline(lm(Combined$Profit ~ Combined$Amount), col = "red")+        
No alt text provided for this image

21 - Correlation and Scatter plot (showing correlation between variables) between Revenue(Amount) and Quantity Ordered

cor(Combined$Amount, Combined$Quantity)        

Output

0.3520518        

Correlation between Revenue(Amount) and Quantity Ordered came out to be "0.3520518" which indicates that the relation between Revenue(Amount) and Profit is not strong. Relation is positive but it is not a strong relation, it is a weak relation which indicates that increasing the quantity ordered will not necessarily increase the profit.

Scatter Plot

plot(Combined$Quantity, Combined$Amount, main = "Quantity Ordered vs Revenue", xlab = "Quantity Ordered", ylab = "Revenue") 
? abline(lm(Combined$Amount ~ Combined$Quantity), col = "red")+        
No alt text provided for this image

22 - Correlation and Scatter plot (showing correlation between variables) between Profit and Quantity Ordered

cor(Combined$Profit, Combined$Quantity)        

Output

0.00244136        

Correlation between Profit and Quantity Ordered came out to be "0.00244136" which indicates that the relation between Quantity Ordered and Profit is not strong. Relation is positive but it is not a strong relation, rather it is a very weak relation which indicates that increasing the quantity ordered will not necessarily increase the profit.

Scatter Plot

plot(Combined$Quantity, Combined$Profit, main = "Quantity Ordered vs Profit", xlab = "Quantity Ordered", ylab = "Profit") 
? abline(lm(Combined$Quantity ~ Combined$Profit), col = "red")+        
No alt text provided for this image

要查看或添加评论,请登录

Ahmad Nawaz的更多文章

社区洞察

其他会员也浏览了