The Year Gone By


“The year that changed everything” finally comes to an end. We now look forward to new beginnings and, hopefully, to more positivity as well. But before we launch into 2021, we would like to look back and draw some perspective from 2020. Yes, this year has been defined by the horrors of the COVID-19 pandemic, which has caused enormous economic and social damage, yet it has also been a year of remarkable innovation and learning. As a platform rooted in the fields of Data Science and Finance, Null Space would like to recollect a few key aspects that might prove useful in one's quest to learn Data Science.


  1. Walk before you run: The powerful analysis tools at hand, in the form of countless Python/R libraries, provide great convenience and ease of use. It is tempting to take a dataset and dive straight into complex model designs involving SVMs, Decision Trees, Multivariate Regression and so forth. However, it often pays to take a step back and "get to know the data" first. Simplicity is key here: we cannot stress enough that extremely valuable information can be derived from simple measures like the average, the variance, the range and the correlation, and from simple data visualization techniques like the bar chart, the histogram, the scatterplot and the boxplot (a minimal sketch of such a first pass follows this list).
  2. What is the data telling us?: It is instructive to then develop theories and possibilities from the simple measures enumerated in the first point. We might want to pose questions like: "I thought X and Y would be intimately related, but they show no correlation whatsoever. Why might that be?"; "X and Y are almost perfectly in sync with each other; should I drop one of them, or combine them into a single comprehensive measure?"; "The averages of various variables are vastly different in scale; should I consider standardizing these variables?"; "I am interested in what exactly impacts Z, because certain Z values can help the business; which factors should I consider?"; "Why are certain values so vastly different from the rest of the data (outliers)? Should I remove them, or should I first find out the reason for such abnormality?" (the second sketch below turns a few of these questions into code).
  3. An appropriate model: Certain datasets respond quite nicely to simple yet elegant models like Logistic Regression and Naive Bayes; it may not be necessary to employ a sequential ANN or a boosted Decision Tree in the first place. It is also handy to build models with techniques like parameter regularization and cross-validation checks, to develop a fair degree of confidence in the model. Lastly, before charging head-on into classification and quantitative prediction, one might also want to take a good look at clustering techniques, to unearth inherent patterns in the data and perhaps identify some key variables as well (see the third sketch after this list).
  4. Interpret: An often ignored aspect is actually spending significant time interpreting the numbers that algorithms throw out. It is easy to lose track of the problem at hand and get embroiled in the nitty-gritty of hyperparameter tuning and model building in Python; we must snap out of such phases. The problem at hand, "what are we trying to solve?", must firmly anchor our model building process. After obtaining, say, regression results, we must be able to extract meaningful insights from the estimated coefficients: "A beta of 1.2 suggests that a Rs.100 increase in advertising expenditure leads to an additional revenue of Rs.120; let us think about increasing the ad spend budget this time around." Or, in a classification setting: "Whether the borrower is a student has a significant effect on the likelihood that the borrower will, in fact, default; we might want to factor in various risk-mitigating measures because of this relation." (A small interpretation sketch follows the list.)
  5. Know thy Math: We live in an age where one can use and apply an algorithm without knowing what it even does, owing to the extreme convenience of Python's ML libraries. Such ill-informed application of algorithms can result in bad strategies and incorrect insights, thereby reducing people's confidence in data analysis methods. Developing basic intuition and the requisite mathematical underpinnings behind popular algorithms provides a thorough understanding not only of what the algorithms do, but also of how they can be applied effectively (the fifth sketch below recovers a library result from first principles).
  6. Reporting with Clarity: In any business presentation concerning data-generated insights, the aim should not be to present a raw Jupyter notebook full of intimidating code; rather, it should be to present the key insights, backed up with technical evidence and packaged in a neat, tidy, friendly-looking document that a layperson can also understand. One might want to get familiar with popular report-compiling tools like LaTeX, and Markdown in R and Python.
  7. Read the Background: Every analysis is motivated by a problem statement, every problem statement comes from a business situation, and the business situation arises from the business and its external environment. It is therefore imperative to make a special effort to understand the external business environment, the internal functioning of the business and the situation itself extremely thoroughly. One mustn't try to force business problems into "template" models; rather, the attempt should be to craft customized models based on the unique situation the business faces. It is also important to be able to relate the theory of finance and economics to real-world business scenarios.
  8. An assortment of topics to focus on: the workings of classification algorithms like SVM, Decision Trees and Logistic Regression; the principles of dimensionality reduction using Principal Component Analysis; quantitative prediction using Multivariate Regression and the associated hypothesis tests; fundamentals of Linear Algebra; fundamentals of Probability and Statistics; fundamentals of Optimization; the theories of Microeconomics and Finance; and programming with Python (in the raw sense, not just with libraries).
  9. Special mention: Bayesian Statistics provides a foundation for powerful models and analysis methods. One might want to delve deeper into this area (a tiny Bayesian updating sketch closes the examples below).
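
A few hedged code sketches follow, illustrating some of the points above. First, for point 1, a minimal "get to know the data" pass with pandas and matplotlib. The file name data.csv and the column names sales, ad_spend and region are purely hypothetical placeholders; the calls themselves are standard pandas/matplotlib usage.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")              # hypothetical file and columns

# Simple measures: the average, the variance/std, the range, the correlation
print(df.describe())                      # mean, std, min/max (range), quartiles
print(df[["sales", "ad_spend"]].corr())   # pairwise correlation

# Simple visualizations: histogram, scatterplot, boxplot
df["sales"].plot.hist(bins=30)
plt.show()
df.plot.scatter(x="ad_spend", y="sales")
plt.show()
df.boxplot(column="sales", by="region")
plt.show()
```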
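
Second, the questions in point 2 can often be answered with a few more lines on the same dataframe. The column names below (height_cm, height_in) are again made up for illustration; the pattern of checking correlations, rescaling and flagging outliers is what matters.

```python
import pandas as pd

df = pd.read_csv("data.csv")                      # hypothetical file and columns

# "X and Y are almost perfectly in sync" -> consider keeping only one of them
if df["height_cm"].corr(df["height_in"]) > 0.99:
    df = df.drop(columns=["height_in"])

# "Averages are vastly different in scale" -> standardize to z-scores
numeric = df.select_dtypes("number")
z = (numeric - numeric.mean()) / numeric.std()

# "Why are some values so different (outliers)?" -> flag and inspect first
suspects = df[(z.abs() > 3).any(axis=1)]          # crude 3-sigma rule
print(suspects)                                   # investigate before dropping anything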
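
Third, a sketch of point 3: a regularized Logistic Regression with a 5-fold cross-validation check, followed by a quick K-means pass to look for inherent structure. It uses scikit-learn's bundled breast-cancer dataset only so that the example is self-contained.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# A simple, regularized baseline: C is the inverse L2 penalty strength
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))
scores = cross_val_score(model, X, y, cv=5)       # 5-fold cross-validation
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# A quick clustering pass to look for inherent groupings in the features
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
```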
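
Fourth, the advertising example in point 4 can be reproduced on simulated data. The numbers below are synthetic (the true slope is deliberately set to 1.2); the point is the last line, which translates the estimated coefficient back into the business question.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data: revenue regressed on advertising spend, both in rupees
rng = np.random.default_rng(0)
ad_spend = rng.uniform(1_000, 10_000, size=200)
revenue = 5_000 + 1.2 * ad_spend + rng.normal(0, 500, size=200)

X = sm.add_constant(ad_spend)            # adds the intercept column
results = sm.OLS(revenue, X).fit()
beta = results.params[1]                 # estimated slope on ad spend

print(results.summary())
print(f"A Rs.100 increase in ad spend is associated with roughly Rs.{100 * beta:.0f} of extra revenue")
```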
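
Fifth, for "Know thy Math", one small exercise is to recover a library's answer from the underlying formula. The sketch below solves the ordinary-least-squares normal equations, beta_hat = (X'X)^(-1) X'y, directly with NumPy and checks that scikit-learn's LinearRegression returns the same coefficients on simulated data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# The normal equations: beta_hat = (X'X)^(-1) X'y  (with an intercept column added)
X1 = np.column_stack([np.ones(len(X)), X])
beta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)

# The one-line library call should agree up to floating-point error
lib = LinearRegression().fit(X, y)
print(beta_hat[1:])   # hand-computed slopes
print(lib.coef_)      # scikit-learn's slopes
```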
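
Finally, on the Bayesian special mention in point 9: the classic beta-binomial update is a tiny illustration of how a prior belief and observed data combine into a posterior. The Beta(2, 2) prior and the counts below are assumptions chosen purely for illustration.

```python
from scipy import stats

a, b = 2, 2        # prior Beta(a, b): weakly informative, assumed for illustration
k, n = 27, 100     # hypothetical data: 27 "successes" observed in 100 trials

posterior = stats.beta(a + k, b + n - k)     # conjugate update: Beta(a + k, b + n - k)
print(posterior.mean())                      # posterior mean of the success probability
print(posterior.interval(0.95))              # central 95% credible interval
```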

As a student-driven platform, Null Space remains committed to building thorough and reliable notes on Data Science and Finance, now with renewed vigor. We sincerely hope to stay engaged with you all, sharing and propagating knowledge. Wishing everyone a Happy New Year!







