A Three-Step Summary of How I Approach Data

Jacqueline Rollins

Data Scientist, M.S. | Fraud Detection | Payment Services

Published: December 1, 2022

I’m writing this to briefly summarize how I pragmatically approach data science problems. I don’t believe intractable problems exist, although I’ll admit we currently lack the tools and insight to solve all of them. Creativity is the ingredient needed to ameliorate this deficiency, but I digress.

First, I identify the atomic unit of data. For images, this would be the pixel. For financial data, this would be the transaction. For language processing, this would be the character. For social network analysis, this would be the node. These atomic units don’t carry much information in isolation, and I have trouble coming up with business problems that hinge on a single pixel (outside of, say, Facebook ad campaigns). These atomic data units form local structures with other similar data units in local proximity. The exact definitions of “similar” and “local” change based on the questions being asked, but I conceptualize them as informational analogs to elements and molecules. When these local structures are aggregated, we have some information of substance.
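To make the idea of atomic units forming local structures concrete, here is a minimal sketch using the language-processing case. It assumes that “local” means adjacent characters and that a fixed-length run (an n-gram) is the local structure. The function name `char_ngrams` and the sample text are illustrative choices, not something taken from a particular project.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count every run of n adjacent characters in the text.

    A single character (the atomic unit) says little on its own;
    a run of neighbouring characters is one possible local structure.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Aggregating the local structures yields something informative:
# which character pairs dominate this (tiny) corpus.
counts = char_ngrams("banana", n=2)
for ngram, count in sorted(counts.items()):
    print(ngram, count)
```

Changing `n`, or swapping adjacency for some other notion of proximity, is the kind of choice that depends on the question being asked.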

The second step is to build data structures with feature engineering. While machine learning algorithms are fantastic at finding anomalies and classifying data, it is the job of the data scientist to create the initial structures on which the algorithms will operate. This is where art meets science, and experience in the field reduces development time.
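As a rough illustration of this step for financial data, the sketch below rolls individual transactions up into per-account features. Everything in it is hypothetical: the sample transactions, the ten-minute window, and the feature names (`txn_count`, `mean_amount`, `max_to_mean_ratio`, `txns_in_last_window`) are assumptions chosen for the example, not a recommended feature set.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical transactions: (account_id, timestamp, amount)
transactions = [
    ("acct_1", datetime(2022, 12, 1, 9, 0), 20.0),
    ("acct_1", datetime(2022, 12, 1, 9, 5), 25.0),
    ("acct_1", datetime(2022, 12, 1, 9, 7), 400.0),
    ("acct_2", datetime(2022, 12, 1, 10, 0), 15.0),
]


def build_features(rows, window=timedelta(minutes=10)):
    """Aggregate atomic transactions into one feature row per account."""
    by_account = defaultdict(list)
    for account, ts, amount in rows:
        by_account[account].append((ts, amount))

    features = {}
    for account, txns in by_account.items():
        txns.sort()  # chronological order
        amounts = [amount for _, amount in txns]
        last_ts = txns[-1][0]
        # Transactions within `window` of the account's latest one
        recent = [amount for ts, amount in txns if last_ts - ts <= window]
        features[account] = {
            "txn_count": len(txns),
            "mean_amount": mean(amounts),
            "max_to_mean_ratio": max(amounts) / mean(amounts),
            "txns_in_last_window": len(recent),
        }
    return features


features = build_features(transactions)
for account, row in features.items():
    print(account, row)
```

The output of a function like this, rather than the raw transactions, is what a downstream algorithm would operate on.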

Which structures to create is determined through in-depth conversations with subject matter experts and key stakeholders. In the CRISP-DM model, this falls under business understanding. Rapid prototyping to explore the design space is paramount: experience can inform which direction to head in initially, but only iterative development will get you to an optimized answer. The best solution always depends on what executive leadership wants to solve. In the future, I’ll elaborate on which structures I have a proclivity for and what I’ve found works for the problems I’ve worked on.

The third step is algorithm selection and tuning. The type of algorithm chosen depends on the business problem, feature engineering, and desired output. The exact algorithm selected needs to meet a variety of technical requirements, such as model size, execution time, maintainability, community support, and available documentation. Performance on the relevant metrics is also important but should be considered in tandem with other constraints. For example, a business problem that requires optimizing for single-class precision may not be best solved by the model with the highest ROC AUC.
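The precision versus ROC AUC point can be shown with a small, made-up example. The labels and the two sets of scores below are invented for illustration, and the 0.5 decision threshold is an assumption. ROC AUC is computed directly from its pairwise definition (the share of positive/negative pairs in which the positive is scored higher) so the sketch needs no external library.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at(labels, scores, threshold=0.5):
    """Precision for the positive class at a fixed score threshold."""
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0


# Invented data: 4 positives followed by 6 negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Model A ranks well overall but gives two negatives a high score.
model_a = [0.9, 0.8, 0.7, 0.6, 0.95, 0.55, 0.3, 0.2, 0.1, 0.05]

# Model B misses two positives entirely but never flags a negative.
model_b = [0.9, 0.8, 0.2, 0.1, 0.4, 0.35, 0.3, 0.25, 0.15, 0.05]

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name,
          "ROC AUC:", round(roc_auc(labels, scores), 3),
          "precision@0.5:", round(precision_at(labels, scores), 3))
```

On this toy data, model A has the higher ROC AUC while model B has the higher precision at the chosen threshold, at the cost of recall. Which one is preferable depends on the business constraint, not on a single headline metric.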

In summary, I’m echoing what experienced data scientists reiterate: the solution is found in understanding the data, and cleaning the data is foundational to constructing a robust and performant solution to business problems.
