3 Simple Ways to Improve AI Right Now
Monika Wahi
Epidemiology & Biostatistics Consultant a/k/a Data Scientist | Exclusive and innovative solutions for data science challenges in public health, research and education
*I may be compensated when you click on my links, but you will not be charged.
We’ve been hearing a lot about the perils of AI recently. Actually, there are a few management tricks that we can apply right now to improve AI without any fancy technology. I’ll explain them here.
1. Improve Training Data
I cringe whenever I read another headline about how ChatGPT has been integrated into something. That’s because the training data for ChatGPT was mostly text scraped from the internet, including Wikipedia and public code.
So how do you improve training data? You build it yourself. I recently received an invite to join “The Remotasks Community”. Here is some of the wording from their e-mail invite:
Do you have a background in STEM (Computer Science, Mathematics, Physics, Biology or Chemistry)? Are you interested in helping to train AI models to become better writers? We have several open projects where we are looking for talented writers to help train generative artificial intelligence models to become better writers.
If this sounds like a good deal to you – and it did to me – it probably isn’t. I found a site with only eight reviews on it, but the comments to the reviews were revealing. They basically suggested that there was a lot of boring work for little pay.
Building training data is necessary, but it consists of many boring tasks. Even if you pick a relatively narrow domain and seed it with a good corpus (e.g., a scientific domain seeded with scientific literature), you still have to do validation exercises à la CAPTCHA – basically the “tasks” part of the Remotasks Community.
I feel we need to get working on this problem. In other words, I don’t think the Remotasks Community solution will work. We need to come up with better ways of specifying what training data we need for what model, and how to generate it. I think it will be easier if we develop AI tightly within a target domain rather than designing it to be broadly applicable (like ChatGPT).
2. Involve “Diverse” Teams in AI Development
This actually is NOT a plug for the traditional “diversity, equity, and inclusion”, so please don’t accuse me of being part of the “woke mob” (even though I probably am). By “diverse”, I mean broadly representative of the population who will use the AI.
I will give an example I read about in a business magazine to illustrate exactly what I mean. There is a nail polish company that is famous not only for having fabulous colors and coming up with new color combinations all the time, but also for giving the colors zany names. For example, they might give a name like “My coffee is cold” to a dark, radiant coffee color.
When the CEO was asked in a magazine interview how they were able to pull off this marketing ploy successfully over and over, they responded that they actually didn’t do anything special. They just deliberately hired individuals in their marketing department who were, first, their nail polish customers and, second, of all different ages, from very young to very old. I’m assuming the group was heavy on women, since more women use nail polish than men, but they definitely should have male representation if they want to accurately reflect their customers.
Then, each month, they’d have a meeting where the marketing department invented and named colors, and came up with the marketing plan for the next month. By simply hiring members of their customer segments as actual employees, they saved a lot of trouble with market research and with testing color names or marketing messages.
In the same way, having a diverse AI development team can save you a lot of trouble, because different people will spot different things going wrong. You won’t sink into groupthink. I am reminded of a presentation I saw from a team at DataRobot, where they discussed how different members of their team weighed in on their AI bias-testing protocol.
3. Improve Explainability
Just like you have to be able to explain the covariates in your regression model to interpret it, you have to also be able to explain how the independent variables (“features”) in your AI model actually contribute to the outcome you are modeling. I first learned of the problem of AI explainability by taking a short but interesting course on LinkedIn Learning.
I was surprised that this was a problem, because I could not imagine throwing all my covariates into a model without creating a priori hypotheses about how they were associated with the outcome I’m modeling. But then I realized that AI is being taught in a way that is kind of kludgy. You are supposed to just throw features in willy nilly – smoke ‘em if you got ‘em – if they don’t fit, transform ‘em – until you are happy with your metrics. You can imagine how an epidemiologist like me would be horrified by this Wild West attitude toward introducing covariates in models! If I had pearls, I’d clutch them.
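One common way to check what each feature actually contributes, rather than judging a model only by its overall metric, is permutation importance: shuffle one feature at a time and see how much the error grows. The sketch below is a minimal, standard-library illustration of that idea, not a method described in the course mentioned above. The data, the feature names (age, junk), and the stand-in model function are all hypothetical and made up for this example.

```python
import random


def mean_squared_error(model, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, seed=0):
    """Return, per feature, the increase in error after shuffling that feature."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the outcome
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_squared_error(model, shuffled, targets) - baseline)
    return importances


# Hypothetical data: the outcome depends on age, not on the "junk" feature.
data_rng = random.Random(42)
rows = [(data_rng.uniform(20, 80), data_rng.uniform(0, 1)) for _ in range(200)]
targets = [0.5 * age + data_rng.gauss(0, 1) for age, _ in rows]


def model(row):
    """Hypothetical stand-in for a fitted model that was handed both features."""
    age, junk = row
    return 0.5 * age + 0.01 * junk


importances = permutation_importance(model, rows, targets, n_features=2)
for name, value in zip(["age", "junk"], importances):
    print(f"{name}: error increase after shuffling = {value:.4f}")
```

A feature whose shuffling barely changes the error is one the model is not really using, which is a prompt to ask why it was included in the first place. This check does not replace a priori hypotheses about covariates; it only helps reveal when a feature was thrown in without one.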
This mistake – just throwing everything into the model without really understanding it – was at the root of the inaccurate results coming out of IBM Watson and some other failed AI efforts. The solution to this is data curation. The problem is that a lot of people don’t know what data curation is.
Below is a very short video explaining data curation. Data curation uses very simple tools, but I’m finding that you have to have kind of a forensic and artistic knack for it. It’s easy to identify good, useful curation – but it’s hard to start generating it yourself.
If you want to go deeper into data curation, first try my LinkedIn Learning course, which is a gentle, introductory course. If you still want more after that, then take my new Data Close-out Boot Camp for a more hardcore experience!
Want to kick your health analytics career up a notch? Then click here to sign up for a 30-minute market research Zoom meeting with me, where I will explain a new, exclusive group mentoring program for health data professionals, and get your feedback.
Monika M. Wahi, MPH, CPH is a LinkedIn Learning author of data science courses, a book on how to design and build SAS data warehouses, and the co-author of many peer-reviewed publications. Sign up for her weekly e-newsletter, and follow her blog and YouTube channel for learning resources!