12 tricks to crack your data science problems

Swathi Young

AI/ML Strategy Architect & Technology Innovator | CTO | Georgetown MBA |

Published: September 4, 2019

1. Understand the business use case:

Projects often jump straight into planning or implementation without pausing to consider the organization's goals and business drivers. With AI and data science projects, thinking through those goals is imperative. For example, if your goal is customer retention, mining your customer service logs for insights into customer feedback would be a good place to start.

Starting with these mission-critical attributes goes a long way toward a successful implementation:

Business drivers and mission of the organization
Current Customers and their feedback
Portfolio of products and services
Determine your success criteria

2. Think broader about the solution:

Whenever we think of a project, we tend to think in terms of the trio of time, money, and scope. Thinking only in terms of these project management factors often leads to contrived solutions.

I suggest thinking more broadly at first, as if you had no constraints of time, money, scope, or resources, and then narrowing down by adding constraints one at a time. This encourages innovative thinking and brings about inventive solutions.

3. Deal with data the right way

How do you think about data? Do you think of a spreadsheet with columns, rows, and cells? Or do you think of a database fed by multiple data sources?

Whatever the format of your data, its role in your solution is crucial, because machine learning algorithms detect patterns in it.

Artificial intelligence depends on data mining methods that extract useful information from large data sets; data is the driver of the algorithm.

The 5 Vs of data (Value, Velocity, Variability, Veracity, and Volume) will determine the solution that you employ.

4. Plan with human-centric design:

One of the things that separate machine learning from traditional software development is that we are coding for the likelihood of an outcome based on past outcomes. That makes human-centric design crucial.

For example, when we recently designed and developed an intelligent bot for a doctor's office, we took time to meet with the doctor and identify their customer personas, the types of issues they address, and the outcomes they achieve for their patients.

We walked through the steps of prospecting, customer on-boarding, and finally customer results. This helped us create an intelligent agent that feels seamless to customers and whose conversations flow naturally.

Other things to keep in mind are:

What metrics are you measuring?
What are the inputs and outputs of this system and where do they come from?
What is the ROI?

5. Engage business users during the process

This seems self-explanatory, but you would be surprised at how often we forget to do it. We might be the experts at implementing machine learning algorithms, gathering insights from data, and implementing complex technologies, but the business users are the subject matter experts.

If you are working in the healthcare sector, you would spend a lot of time with physicians and other team members so that they can explain the business process, what the data elements mean, and how they measure outcomes.

While technologists can work across diverse industry sectors, it is only the business users who are in the know about their industry, their metrics as well as their data.

6. Understand the process workflow:

Today's organizations are running a mile a minute and often do not discuss, document, or debate their business processes.

Walking through the workflow with the business usually leads to interesting discoveries, since it might be the first time they are thinking through their processes.

For example, if the algorithm is used for medical image analysis and diagnostics, talk with the business users about when the images are taken, what steps they undergo before arriving at a physician's desk, and what determines the diagnosis. In this case, the business users are the radiologists, physicians, and technicians.

7. Simple pilot implementation:

Enterprises have several legacy systems that have been stitched together over the years and form a complex web that makes the data difficult to figure out.

You can overcome this complexity of interconnected legacy systems with an assessment of the current technology infrastructure. A technology architecture gap evaluation can help all stakeholders understand what systems are in place, the synergies between these systems, and what meaningful data exists.

Armed with this knowledge and a clearer understanding of the successes and shortcomings of how the organization currently operates, you can develop small pilot initiatives that can be validated before you proceed with an overall solution.

8. Break departmental silos:

In large organizations, the left hand usually does not talk to the right. Bringing multiple departments, teams, and sometimes vendors together would help break the silos and move you towards developing a holistic solution.

Reaching out to relevant stakeholders to ensure that misunderstandings or corporate policies don’t impede successful execution is a must.

Ensuring that all parties involved are informed about the inputs, outputs, and the success criteria will lead to an effective and efficient solution.

9. Select the right tool and the algorithm:

Select the language: R or Python
Select and research the algorithm: You can find plenty of open-source implementations of algorithms that you can code review, diagram, internalize and reimplement in another language.

10. Keep updated about the latest tools:

Machine learning and data science are evolving fields: every day there are new discoveries, new tools, and more open-source projects available for reference.

Various research publications, books, blogs, and GitHub repositories can help you be on top of the learning curve and avoid any rework in the long run.

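11. Unit test:

Always test that your machine learning model works. One of the easiest checks is to train on a small subset of your data and confirm that the model can overfit it: a model that cannot fit even a handful of examples usually points to a bug in the data pipeline or the training code. This is a quick sanity check that your model is sound, not a substitute for evaluating it on held-out data.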
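Here is a minimal sketch of that overfitting check in plain Python. It assumes a tiny hand-rolled logistic regression trained with gradient descent; the data in `SMALL_SUBSET` and the names `train`, `training_accuracy`, and `can_overfit` are hypothetical and only there for illustration. In a real project you would run the same check against your own model and a small slice of your own data.

```python
import math

# Hypothetical toy subset: (feature_1, feature_2) -> label. Not real project data.
SMALL_SUBSET = [
    ((0.0, 0.2), 0), ((0.1, 0.4), 0), ((0.3, 0.1), 0), ((0.2, 0.3), 0),
    ((0.9, 0.8), 1), ((0.8, 0.7), 1), ((1.0, 0.6), 1), ((0.7, 0.9), 1),
]


def predict_proba(weights, bias, features):
    """Logistic regression probability that the label is 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(weights, bias, rows):
    """Average cross-entropy loss over the rows."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for features, label in rows:
        p = predict_proba(weights, bias, features)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps)
    return total / len(rows)


def train(rows, learning_rate=0.5, epochs=5000):
    """Full-batch gradient descent, starting from all-zero parameters."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in rows:
            error = predict_proba(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def training_accuracy(weights, bias, rows):
    """Share of rows whose thresholded prediction matches the label."""
    hits = sum(
        1 for features, label in rows
        if (predict_proba(weights, bias, features) >= 0.5) == bool(label)
    )
    return hits / len(rows)


def can_overfit(rows):
    """Sanity check: the model should fit a tiny subset perfectly."""
    weights, bias = train(rows)
    return training_accuracy(weights, bias, rows) == 1.0
```

If `can_overfit(SMALL_SUBSET)` came back `False` on data this easy, I would look for a bug in the data loading or the training loop before tuning anything else.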

You can also use test-driven model development, which helps you test in small modules. Tests can be written for functions and methods, whole classes, programs, web services, whole machine learning pipelines, neural networks, random forests, mathematical implementations, and more.
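As a small illustration of testing one module at a time, here is a sketch that uses Python's built-in `unittest`. The function `min_max_scale` is a hypothetical preprocessing step that I made up for this example; the point is the shape of the tests, not the function itself.

```python
import unittest


def min_max_scale(values):
    """Scale numbers into the range [0, 1]; constant input maps to all zeros."""
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid dividing by zero when every value is the same.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class MinMaxScaleTest(unittest.TestCase):
    def test_output_spans_zero_to_one(self):
        self.assertEqual(min_max_scale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input_does_not_divide_by_zero(self):
        self.assertEqual(min_max_scale([3, 3, 3]), [0.0, 0.0, 0.0])

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            min_max_scale([])
```

Saved in a file, these tests can be run with `python -m unittest`. Writing the edge cases first (constant columns, empty input) is usually where this style pays off for data pipelines.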

12. Present business results in a business language:

Almost all business users are concerned with the business outcomes rather than the magic behind the scenes (in this case, the machine learning models). A good business presentation covers the representative data elements that were used (e.g., customer engagement, customer feedback, and customers' buying patterns) and the outcomes the model provided. Also, make sure to include the accuracy levels, since AI models do not produce exact results; they produce predictions about the results.
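One way to report an accuracy level in plain language is sketched below. It assumes you already have the actual outcomes and the model's predictions for a set of held-out cases; the function names and the sample labels are hypothetical.

```python
def accuracy(actual, predicted):
    """Fraction of cases where the prediction matched the actual outcome."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def business_summary(actual, predicted, outcome_name):
    """Turn the accuracy figure into a sentence a business user can read."""
    total = len(actual)
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    rate = accuracy(actual, predicted)
    return (
        f"On {total} held-out cases, the model predicted {outcome_name} "
        f"correctly {correct} times ({rate:.0%})."
    )


# Hypothetical example labels: 1 = customer left, 0 = customer stayed.
actual_outcomes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
model_predictions = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(business_summary(actual_outcomes, model_predictions, "customer churn"))
```

Stating the number of cases alongside the percentage helps the audience judge how much weight to give the figure.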

Anthony Gatlin

Graph Database Ultra-Enthusiast

5 years ago

I loved the article--especially the mind map of machine learning algorithms in item 9. I am going to save that off as a handy reference. I would also recommend adding Spark MLlib to your list of tools. R is great but cumbersome and slow. Python is flexible but slow. Spark MLlib is blazingly fast. Yes, you can wrap Spark MLlib and access it with Python, but it is better with Scala--or even Java or Kotlin. I would also recommend adding an item in the article about choosing the right tool to feed your algorithms. Graph databases, like Neo4j, offer the opportunity to provide a much richer data input to Machine Learning algorithms than other data sources. Even before passing data into a Machine Learning algorithm, one can run a variety of clustering, community detection, classification, and natural language processing algorithms on the data to further enrich it. If you want to turn feature engineering from a chore into a joy, just try doing feature engineering from a graph. Wow! You will love it!

Anthony Gatlin

Graph Database Ultra-Enthusiast

5 years ago

Very nice article! I love it!

Amy Hodler

Helping people ?? Graph Analytics | B2B Marketing | Advisor | Author & Speaker // Want to understand why seeing connections matter? Let's talk.

5 years ago

Nice article Swathi! Your first few points tie in with something I've been mulling over a lot lately: How to make sure you have a clean/responsible "AI Supply Chain." I'd love to chat over ideas with you sometime.
