A Three-Step Summary of How I Approach Data

I’m writing this to briefly summarize how I pragmatically approach data science problems. I don’t believe intractable problems exist, although I’ll admit we currently lack the tools and insight to solve every problem. Creativity is the ingredient needed to ameliorate this deficiency, but I digress.

First, I identify the atomic unit of data. For images, this would be the pixel. For financial data, this would be the transaction. For language processing, this would be the character. For social network analysis, this would be the node. These atomic units don’t carry much information in isolation, and I have trouble coming up with business problems that hinge on a single pixel (outside of, say, Facebook ad campaigns). Instead, atomic data units form local structures with other similar data units in close proximity. The exact definitions of “similar” and “local” change based on the questions being asked, but I conceptualize them as informational analogs to elements and molecules. When these local structures are aggregated, we have some information of substance.
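To make the idea of aggregating atomic units concrete, here is a minimal sketch using the financial example. It assumes that "local" means "belonging to the same account"; the transaction records, the field layout, and the `aggregate_by_account` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transactions: (account_id, day, amount).
# Each tuple is one atomic unit of data.
transactions = [
    ("acct_a", 1, 20.0),
    ("acct_a", 1, 35.0),
    ("acct_a", 2, 500.0),
    ("acct_b", 1, 12.5),
    ("acct_b", 3, 7.5),
]


def aggregate_by_account(rows):
    """Group transactions by account (the assumed notion of 'local')
    and summarize each group into a small record."""
    groups = defaultdict(list)
    for account_id, _day, amount in rows:
        groups[account_id].append(amount)
    return {
        account_id: {
            "count": len(amounts),
            "total": sum(amounts),
            "mean": mean(amounts),
            "max": max(amounts),
        }
        for account_id, amounts in groups.items()
    }


summary = aggregate_by_account(transactions)
for account_id, stats in summary.items():
    print(account_id, stats)
```

A single transaction says little, but the per-account summary already hints at something worth asking about, such as one unusually large amount relative to the account's mean. A different question (for example, activity per day) would call for a different grouping key.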

The second step is to build data structures with feature engineering. While machine learning algorithms are fantastic at finding anomalies and classifying data, it is the job of the data scientist to create the initial structures on which the algorithms will operate. This is where art meets science, and experience in the field reduces development time.

Which structures to create is determined through in-depth conversations with subject matter experts and key stakeholders. In the CRISP-DM model, this qualifies as business understanding. Rapid prototyping to explore the design space is paramount: experience can inform which direction to head in initially, but only iterative development will get you to an optimized answer. The best solution always depends on what executive leadership wants to solve. In the future, I’ll elaborate on which structures I have a proclivity for and what I’ve found works for the problems I’ve worked on.
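As one small illustration of step two, the sketch below builds a structure from the language-processing example, where the atomic unit is the character. It assumes character trigrams as the local structure and a tiny hand-picked vocabulary; the function names and the vocabulary are hypothetical, and a real project would choose them with subject matter experts.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Slide a window of n characters over the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_features(text, vocabulary, n=3):
    """Count how often each vocabulary n-gram occurs in the lowercased text,
    returning a fixed-length feature vector an algorithm can consume."""
    counts = Counter(char_ngrams(text.lower(), n))
    return [counts[gram] for gram in vocabulary]


# Hypothetical vocabulary; in practice it would come from the data
# and from conversations about what matters to the business problem.
vocabulary = ["dat", "ata", "sci"]
features = ngram_features("Data science uses data", vocabulary)
print(features)
```

Changing `n` or the vocabulary is cheap, which is what makes this kind of structure convenient for rapid prototyping.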

The third step is algorithm selection and tuning. The type of algorithm chosen depends on the business problem, feature engineering, and desired output. The exact algorithm selected needs to meet a variety of technical requirements, such as model size, execution time, maintainability, community support, and available documentation. Performance on the relevant metrics is also important but should be considered in tandem with other constraints. For example, a business problem that requires optimizing for single-class precision may not be best solved by the model with the highest ROC AUC, since ROC AUC summarizes ranking quality across all thresholds rather than precision at the threshold actually used.
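The precision versus ROC AUC point can be shown with a toy comparison. The labels and scores below are invented for illustration, the 0.5 decision threshold is an assumption, and both metrics are implemented by hand in plain Python so the sketch has no dependencies.

```python
def precision_at_threshold(labels, scores, threshold=0.5):
    """Fraction of predicted positives (score >= threshold) that are true positives."""
    predicted = [y for y, s in zip(labels, scores) if s >= threshold]
    if not predicted:
        return 0.0
    return sum(predicted) / len(predicted)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as half), computed over all positive/negative pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented data: four positives followed by four negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Model A ranks well overall but gives one negative the top score.
model_a = [0.9, 0.8, 0.7, 0.6, 0.95, 0.3, 0.2, 0.1]
# Model B is confident about only two positives, and both are correct.
model_b = [0.9, 0.8, 0.2, 0.1, 0.4, 0.35, 0.3, 0.25]

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name, roc_auc(labels, scores), precision_at_threshold(labels, scores))
```

On this toy data, model A has the higher ROC AUC (0.75 versus 0.5), while model B has the higher precision at the assumed threshold (1.0 versus 0.8). Which one is "better" depends on whether the business problem cares about overall ranking or about how often a flagged case is correct.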

In summary, I’m echoing what experienced data scientists reiterate: the solution is found in understanding the data, and cleaning the data is foundational to constructing a robust and performant solution to business problems.
