The Double-Edged Sword of AI in Professional Services: BCG GPT-4 Findings More Cautionary Than Reported
Image created by Courtlin Holt-Nguyen with artsmart.ai - All Rights Reserved.

You’ve probably seen articles about the recent Harvard study on the use of GPT-4 by Boston Consulting Group’s (BCG) consultants, most concluding that LLMs are an incredible productivity boost for knowledge workers. Many articles used variations of the same headline and a single graphic from the research paper. However, if you look closely, you’ll find a few articles (including the main one from BCG) that warn of the opposite effect.

Background

In a working paper titled Navigating the Jagged Technological Frontier (https://social.koleaconsulting.com/I3FV) published on September 22, 2023, Harvard Business School researchers reported on their experimental study of GPT-4 use by 750 Boston Consulting Group consultants. The research is the first large-scale academic study of generative AI in a professional services setting. Major news outlets quickly picked up the story and ignored the details.

Who to believe?

Venture Beat headline: “Enterprise workers gain 40 percent performance boost from GPT-4, Harvard study finds”

Fortune headline: “BCG consultants solving business problems with OpenAI’s GPT-4 performed 23% worse than those without it, new study finds”

BCG’s blog post interpreting the findings: “How People Can Create – And Destroy – Value with Generative AI”

So, who is correct? Did GPT-4 increase the productivity of Boston Consulting Group’s (BCG) elite consultants, or did it decrease it? The answer is more nuanced (and interesting) than the mainstream media reported.

BCG’s Interpretation of Results

According to BCG’s interpretation of the study’s findings described in their blog post, GPT-4 is helpful for creative product ideation and content creation (e.g., conceptualizing a footwear idea for niche markets and delineating every step involved, from prototype description to market segmentation to market entry). However, they found that using GPT-4 is harmful for solving business problems (e.g., offering actionable strategic recommendations to a hypothetical company). Interestingly, they also warned that using GPT-4 fosters groupthink and substantially reduces a team’s overall idea diversity.

“When using generative AI (in our experiment, OpenAI’s GPT-4) for creative product innovation, a task involving ideation and content creation, around 90% of our participants improved their performance. What’s more, they converged on a level of performance that was 40% higher than that of those working on the same task without GPT-4. People best captured this upside when they did not attempt to improve the output that the technology generated.

Creative ideation sits firmly within GenAI’s current frontier of competence. When our participants used the technology for business problem solving, a capability outside this frontier, they performed 23% worse than those doing the task without GPT-4. And even participants who were warned about the possibility of wrong answers from the tool did not challenge its output.

Chart: High and low performers did WORSE when using GPT-4 for business problem-solving

Our findings describe a paradox: People seem to mistrust the technology in areas where it can contribute massive value and to trust it too much in areas where the technology isn’t competent. This is concerning on its own. But we also found that even if organizations change these behaviors, leaders must watch for other potential pitfalls: Our study shows that the technology’s relatively uniform output can reduce a group’s diversity of thought by 41%.”

Chart: GPT-4 increases individual performance while reducing overall team idea diversity

Is Prompt Engineering Training the Answer?

Apparently not. In the study, providing prompt engineering training actually decreased performance with GPT-4. Prompt engineering “crash courses” may do more harm than good.

“The strong connection between performance and the context in which generative AI is used raises an important question about training: Can the risk of value destruction be mitigated by helping people understand how well-suited the technology is for a given task? It would be rational to assume that if participants knew the limitations of GPT-4, they would know not to use it, or would use it differently, in those situations.

Our findings suggest that it may not be that simple. The negative effects of GPT-4 on the business problem-solving task did not disappear when subjects were given an overview of how to prompt GPT-4 and of the technology’s limitations. (See “Our Use of Training in the Experiment.”)

Chart: “Crash course”-style training in prompt engineering and LLM limitations reduced performance

Even more puzzling, they did considerably worse on average than those who were not offered this simple training before using GPT-4 for the same task. (See Exhibit 3.) This result does not imply that all training is ineffective. But it has led us to consider whether this effect was the result of participants’ overconfidence in their own abilities to use GPT-4—precisely because they’d been trained.”

Conclusion

Be careful when using GPT-4 and other LLMs in your professional work. LLMs are helpful in some situations and harmful in others. Current LLMs do not warn users when asked to perform tasks outside their competence; it is up to the user to know when to trust an LLM and when not to.

Sources:

BCG Blog Post: How People Can Create – And Destroy – Value with Generative AI https://social.koleaconsulting.com/I3FC

Harvard Business School Research Paper: Navigating the Jagged Technological Frontier https://social.koleaconsulting.com/I3FV

Manan Aggarwal

Researcher | Speaker | Writer | featured on CBC Radio | Tech-a-thon '22 Winner | Start-Up Hackathon '22 Winner

1y

looks like GPT-4 is a love-hate relationship. It's like having a brilliant but unpredictable coworker

Fernando Coto-Yglesias

Physician / Geriatrician and Gerontologist / AI Consultant / Husband/ Dad / Weekend Cyclist

1y

This is something remarkable to dig and discuss, but definitely part of our learning curve regarding “applied AI” in professional settings.

CHESTER SWANSON SR.

Realtor Associate @ Next Trend Realty LLC | HAR REALTOR, IRS Tax Preparer

1y

Thanks for Sharing.
