
3 Simple Ways to Improve AI Right Now

Monika Wahi

Epidemiology & Biostatistics Consultant a/k/a Data Scientist | Exclusive and innovative solutions for data science challenges in public health, research and education

Published: June 3, 2023

*I may be compensated when you click on my links, but you will not be charged.

We’ve been hearing a lot about the perils of AI recently. Actually, there are a few management tricks that we can apply right now to improve AI without any fancy technology. I’ll explain them here.

1. Improve Training Data

I cringe whenever I read another headline about how ChatGPT has been integrated into something. That’s because the training data for ChatGPT was mostly text scraped from the internet, including Wikipedia and publicly posted code.

So how do you improve training data? You build it yourself. I recently received an invite to join “The Remotasks Community”. Here is some of the wording from their e-mail invite:

Do you have a background in STEM (Computer Science, Mathematics, Physics, Biology or Chemistry)? Are you interested in helping to train AI models to become better writers? We have several open projects where we are looking for talented writers to help train generative artificial intelligence models to become better writers.

If this sounds like a good deal to you – and it did to me – it probably isn’t. I found a site with only eight reviews of it, but the comments on the reviews were revealing. They basically suggested that there was a lot of boring work for little pay.

Image caption: Building a solid training dataset unfortunately involves a lot of boring tasks.

Building training data is necessary, but it consists of many boring tasks. Even if you pick a relatively narrow domain and seed it with a good corpus (e.g., a scientific domain seeded with scientific literature), you still have to do validation exercises à la CAPTCHA – basically the “tasks” part of the Remotasks Community.

I feel we need to get working on this problem. In other words, I don’t think the Remotasks Community solution will work. We need to come up with better ways of specifying what training data we need for what model, and how to generate it. I think it will be easier if we develop AI tightly within a target domain rather than designing it to be broadly applicable (like ChatGPT).

2. Involve “Diverse” Teams in AI Development

This actually is NOT a plug for the traditional “diversity, equity, and inclusion”, so please don’t accuse me of being part of the “woke mob” (even though I probably am). By “diverse”, I mean broadly representative of the population who will use the AI.

I will give an example I read about in a business magazine to illustrate exactly what I mean. There is a nail polish company that is famous not only for having fabulous colors and coming up with new color combinations all the time, but also for giving the colors zany names. For example, they might give a name like “My coffee is cold” to a dark, radiant coffee color.


When the CEO was asked in a magazine interview how they were able to pull off this marketing ploy successfully over and over, they responded that they actually didn’t do anything special. They just deliberately hired individuals in their marketing department who were – first – their nail polish customers, and – second – all different ages, from very young to very old. I’m assuming the group was heavy on women, since more women use nail polish than men, but they definitely should have male representation if they want to accurately reflect their customers.

Then, each month, they’d have a meeting where the marketing department invented and named colors, and came up with the marketing plan for the next month. By simply employing members of their customer segments, they saved themselves a lot of trouble with market research and with testing color names or marketing messages.

In the same way, having a diverse AI development team can save you a lot of trouble, because different people will spot different things going wrong. You won’t sink into groupthink. I am reminded of a presentation I saw from a team at DataRobot, where they discussed how different members of their team weighed in on their AI bias-testing protocol.

3. Improve Explainability

Just like you have to be able to explain the covariates in your regression model to interpret it, you have to also be able to explain how the independent variables (“features”) in your AI model actually contribute to the outcome you are modeling. I first learned of the problem of AI explainability by taking a short but interesting course on LinkedIn Learning.
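One common way to check how much each feature contributes to a model is permutation importance: shuffle one feature at a time and see how much the model's error grows. The sketch below is illustrative only. The toy data, the `predict` function standing in for a trained model, and the feature names `age` and `shoe_size` are all assumptions made up for this example, not anything from a real project.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_repeats=20, seed=0):
    """Return the average increase in error when each feature is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(r) for r in rows])
    importances = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column, breaking its link to the outcome.
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            error = mean_squared_error(y, [predict(r) for r in permuted])
            increases.append(error - baseline)
        importances[name] = sum(increases) / n_repeats
    return importances


# Hypothetical toy data: the outcome depends on age, not on shoe size.
data_rng = random.Random(42)
rows = [
    {"age": data_rng.uniform(20, 80), "shoe_size": data_rng.uniform(5, 13)}
    for _ in range(200)
]
y = [0.5 * r["age"] + data_rng.gauss(0, 1) for r in rows]


def predict(row):
    """Stand-in for a trained model (an assumption); it uses age only."""
    return 0.5 * row["age"]


importances = permutation_importance(predict, rows, y)
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: average error increase {value:.2f}")
```

A feature whose shuffling barely changes the error is contributing little to the predictions, which is a prompt to ask why it is in the model at all.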

I was surprised that this was a problem, because I could not imagine throwing all my covariates into a model without creating a priori hypotheses about how they were associated with the outcome I’m modeling. But then I realized that AI is being taught in a way that is kind of kludgy. You are supposed to just throw features in willy nilly – smoke ‘em if you got ‘em – if they don’t fit, transform ‘em – until you are happy with your metrics. You can imagine how an epidemiologist like me would be horrified by this Wild West attitude toward introducing covariates in models! If I had pearls, I’d clutch them.
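The a priori habit described above can be enforced with very simple tooling. As a minimal sketch, assume a small data dictionary in which every feature must carry a written hypothesis before it is allowed into a model. The `FeatureSpec` class, the `undocumented_features` helper, and the column names are all hypothetical, invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class FeatureSpec:
    name: str
    hypothesis: str          # why we expect it to relate to the outcome
    expected_direction: str  # "positive", "negative", or "unknown"


def undocumented_features(candidate_columns, data_dictionary):
    """Return candidate columns that have no written hypothesis."""
    documented = {
        spec.name for spec in data_dictionary if spec.hypothesis.strip()
    }
    return [c for c in candidate_columns if c not in documented]


# Hypothetical data dictionary for a health outcome model.
data_dictionary = [
    FeatureSpec("age", "Risk of the outcome rises with age", "positive"),
    FeatureSpec("smoker", "Smoking is an established risk factor", "positive"),
]

# Hypothetical columns someone wants to throw into the model.
candidate_columns = ["age", "smoker", "zip_code", "row_id"]

missing = undocumented_features(candidate_columns, data_dictionary)
print("Features with no stated hypothesis:", missing)
```

Here `zip_code` and `row_id` are flagged, so someone has to explain why they belong in the model before they go in.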

This mistake – just throwing everything into the model without really understanding it – was at the root of the inaccurate results coming out of IBM Watson and some other failed AI efforts. The solution to this is data curation. The problem is that a lot of people don’t know what data curation is.

Below is a very short video explaining data curation. Data curation uses very simple tools, but I’m finding that you have to have kind of a forensic and artistic knack for it. It’s easy to identify good, useful curation – but it’s hard to start generating it yourself.

If you want to go deeper into data curation, first try my LinkedIn Learning course, which is a gentle, introductory course. If you still want more after that, then take my new Data Close-out Boot Camp for a more hardcore experience!

Want to kick your health analytics career up a notch? Then click here to sign up for a 30-minute market research Zoom meeting with me, where I will explain a new, exclusive group mentoring program for health data professionals, and get your feedback.

Monika M. Wahi, MPH, CPH is a LinkedIn Learning author of data science courses, a book on how to design and build SAS data warehouses, and the co-author of many peer-reviewed publications. Sign up for her weekly e-newsletter, and follow her blog and YouTube channel for learning resources!
