Machines in the Conversation: The Case for a more Data-Centric AI
Janet Dwyer
CEO/Cofounder/Inventor | 3 U.S. Patents | DataScava - Unstructured Data Miner | Domain-Specific Language Processing #DSLP | Weighted Topic Scoring #WTS | TalentBrowser - Automated Skills Analytics/Talent Matching
This is one of a series of articles (excerpts of which were reprinted in CDO Magazine) commissioned by DataScava and TalentBrowser from Scott Spangler, former IBM Watson Chief Data Scientist, named IBM Engineer, and author of the book Mining the Talk: Unlocking the Business Value in Unstructured Information.
In this article, Scott discusses his views on the latest developments in generative AI, and argues that too much focus on generative AI distracts from the important value a more Data-Centric AI approach can provide to business applications. He then discusses the key technologies we use that enable such an approach within the organization.
By Scott Spangler
Seventeen years ago, I co-authored an article published in the IBM Systems Journal called "Machines in the Conversation" (reference below). It was about using data mining of informal communication streams to detect themes and trends in human conversation.
I argued at the time that whereas the early AI pioneers such as Alan Turing had envisioned computers and humans having intelligent conversations, the real power of the technology was not in speaking to us, but in helping us better understand what we were saying to each other. I believe this is more true today than ever.
The Human-Machine Partnership
Recently, AI research prototypes such as ChatGPT have captured the public’s imagination and also stoked some fears about where AI technology is heading. I believe that these programs, while valuable in certain narrowly defined roles, are nowhere close to doing some of the high-level creative tasks that they are proposed for.
In fact, such efforts misunderstand the true nature of the human-machine partnership. The value of machines to us is not in what they say, but in what they hear . . . and in how they hear it.
To put it bluntly, in most business contexts, the ROI of a machine learning application is directly proportional to the amount of text it enables us not to read, not the text it enables us not to write. We should fear AI, but not because it might replace us. We should worry that AI may instead steal our lives away, moment by moment, one plausible but misleading answer at a time.
We are momentarily enamored (and maybe a little apprehensive) of the creative, generative possibilities of AI. As a partner in the creative process, AI could indeed help us be far more productive writers, researchers, and even artists. But as a generator of content, all AI can really do is summarize and mimic already existing content. Sometimes this can be impressive, sometimes laughable, but it’s still just a gimmick: a “What” without a “Why.”
The Necessity for a more Data-Centric AI
The proper and most effective use of AI is as a reader first and foremost, and as a writer only as a distant afterthought. AI can consume as much content as we care to throw at it, sift through it with endless patience and thoroughness, and come up with the key relevant data that we care about.
But we, as savvy AI consumers, always need to take the results that AI presents with a healthy dose of skepticism. Being able to read more data doesn’t always make you smarter, because crowds are not always wise, and the results (at best) are only as good as the data that went into them. In fact, selecting the right data for your ML application is probably more important than selecting the right model.
This brings us to the necessity for a more Data-Centric AI, and by that I mean a machine learning process that focuses first and foremost on partitioning, selecting, and accurately labeling the data that we use to train these systems. In order for any AI system to add value both immediately and in an ever-changing environment, it must have high-quality data from which to build its models. This is the power of the DataScava approach, which I will explain here.
Unlock the Value of Unstructured Text Data
DataScava is an unstructured data miner with patented weighted matching that uses your business and domain language to pinpoint high-value data for use in AI, machine learning, robotic process automation (RPA), business intelligence, research, talent, and business-as-usual (BAU) applications. It helps data scientists, data analysts, BI and operations specialists, researchers, subject matter experts (SMEs), talent professionals, and IT precisely index, measure, curate, filter, match, classify, and label heterogeneous textual content automatically.
DataScava complements existing approaches to unlocking the value of unstructured text data in three ways: by helping companies model the higher-level intents and purposes behind the labeling and classification of data, by capturing the abstract topics and themes that represent their own business and subject matter expertise, and by applying both to big data sets in real time.
Less Time Spent by Human Experts doing Manual Labeling
DataScava employs three methodologies in mining unstructured text data: Tailored Topics Taxonomies (TTT), Weighted Topic Scoring (WTS), and Domain-Specific Language Processing (DSLP). Together they generate value-added metadata about raw text for use in other systems and charting. They work as an alternative or adjunct to Natural Language Processing (NLP) and Natural Language Understanding (NLU).
All of this means far less time spent by human experts doing manual labeling of individual documents, and more time spent identifying the key emerging problems that need to be addressed in the data taxonomy as the business environment changes, or unforeseen problems emerge with AI document classification.
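As a rough illustration of the general idea behind weighted topic scoring, the Python sketch below scores a piece of text against a small topic taxonomy. It is not DataScava's patented method, which this article does not spell out. The taxonomy, its terms and weights, and the `count_term` and `score_document` functions are all hypothetical, and the scoring rule (weight times term count, summed per topic) is simply an assumed, minimal choice.

```python
import re

# Hypothetical taxonomy: topic -> {term: weight}. All terms and weights are
# invented for illustration; a real taxonomy would come from domain experts.
TAXONOMY = {
    "machine_learning": {"model": 1.0, "training data": 2.0, "classifier": 1.5},
    "data_quality": {"label": 1.5, "duplicate": 1.0, "missing value": 2.0},
}


def count_term(text: str, term: str) -> int:
    """Count whole-word, case-insensitive matches of a term (optional plural 's')."""
    pattern = r"\b" + re.escape(term) + r"s?\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def score_document(text: str, taxonomy: dict = TAXONOMY) -> dict:
    """Return {topic: score}, where score = sum of (weight * term count)."""
    return {
        topic: sum(weight * count_term(text, term) for term, weight in terms.items())
        for topic, terms in taxonomy.items()
    }


sample = (
    "The classifier needs better training data. "
    "Labels are missing; the model overfits."
)
print(score_document(sample))
# {'machine_learning': 4.5, 'data_quality': 1.5}
```

Per-topic scores like these are the kind of metadata that could be attached to each document for filtering, matching, or labeling. In practice the terms and weights would be supplied and tuned by subject matter experts rather than hard-coded.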
High Accuracy and a Low Level of Expert Intervention
By focusing on the quality of the incoming training data stream, the business can ensure that the machine learning algorithm used continues to perform with high accuracy and a low level of expert intervention. This reduces wasted time for both the business and its customers.
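One way to picture how expert intervention can stay low is a simple triage step: documents whose topic scores are clear-cut are labeled automatically, and only the ambiguous ones are queued for a human expert. The sketch below assumes each document already has per-topic scores. The `triage` function, its thresholds, and the sample scores are hypothetical illustrations, not a description of how DataScava works.

```python
from typing import Dict, List, Tuple


def triage(
    scored_docs: Dict[str, Dict[str, float]],
    min_score: float = 2.0,   # assumed threshold, chosen arbitrarily
    min_margin: float = 1.0,  # assumed threshold, chosen arbitrarily
) -> Tuple[Dict[str, str], List[str]]:
    """Split documents into auto-labeled ones and ones needing expert review.

    A document is auto-labeled with its top topic only if that topic's score
    is at least `min_score` and beats the runner-up by at least `min_margin`.
    """
    auto_labeled: Dict[str, str] = {}
    needs_review: List[str] = []
    for doc_id, scores in scored_docs.items():
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        if not ranked:
            # No scores at all: nothing to decide on, so ask an expert.
            needs_review.append(doc_id)
            continue
        top_topic, top_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        if top_score >= min_score and top_score - runner_up >= min_margin:
            auto_labeled[doc_id] = top_topic
        else:
            needs_review.append(doc_id)
    return auto_labeled, needs_review


sample_scores = {
    "doc1": {"machine_learning": 4.5, "data_quality": 1.5},  # clear winner
    "doc2": {"machine_learning": 2.0, "data_quality": 1.5},  # too close to call
    "doc3": {"machine_learning": 0.0, "data_quality": 0.0},  # no signal
}
auto_labeled, needs_review = triage(sample_scores)
print(auto_labeled)  # {'doc1': 'machine_learning'}
print(needs_review)  # ['doc2', 'doc3']
```

Raising either threshold sends more documents to the experts and lowers the risk of a wrong automatic label. Lowering them does the opposite, so the right setting depends on how costly a mislabeled training example is for the application.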
AI research organizations would do well to spend fewer resources generating new content with AI and more resources on figuring out how to accurately and sustainably ingest existing content in a way that makes us all able to do our jobs better.
Tools like DataScava help provide a platform for human-machine partnership which furthers creativity rather than just mimicking it.
"Machines in the Conversation" [W. S. Spangler, J. T. Kreulen, and J. F. Newswanger, "Machines in the conversation: Detecting themes and trends in informal communication streams," IBM Systems Journal, vol. 45, no. 4, pp. 785-799, 2006, doi: 10.1147/sj.454.0785]