Start Your Generative AI Transformation Without a Development Team

Chris Pappalardo

Senior Director at Alvarez & Marsal | Software Engineer and FinTech Innovator | CPA, AWS Solutions Architect

Published: June 12, 2023

Following the release of OpenAI’s Large Language Model (“LLM”) chatbot to the public in November of last year, the top search term in the “Finance” category on Google Trends has been “chatgpt”.

[Image. Source: Google Trends]

The same is true for the Business & Industrial, Computers & Electronics, and People & Society categories. In some categories, ChatGPT even holds both of the top two spots under different spellings (“chatgpt” and “chat gpt”).

It’s a good bet that you, the reader, have either been directly affected by ChatGPT in your workplace or know someone who has. Just about everyone I know is at least aware of ChatGPT or the underlying technology.

This isn’t surprising given the novelty of this technology. What is surprising is the speed at which this technology is transforming the business world.

Even the Big 4 accounting firms, which typically move at a slow pace when adopting new technology, have recently announced plans to invest in and implement generative AI technology. KPMG just announced their plans last month:

Certain professions will be disproportionately affected, such as accounting and law, so it makes sense that some organizations are reacting faster than others. However, all organizations need to at least consider the impact of generative AI on their business, and some need to get started with incorporating this technology now.

So how does an organization outside of Big Tech get started with generative AI?

One of my favorite diagrams of late is this one, inspired by Fred Brooks’s 1975 book on software engineering and project management, The Mythical Man-Month:

The diagram depicts the impact on time-to-completion of adding people to a development team responsible for a large and complex software project. The point Fred is making is that the relationship between the number of resources and time is not necessarily linear, and the “sweet spot” of the efficiency curve tends to be towards the smaller end of the spectrum. A great example of this is WhatsApp, which built an app used by over 1 billion people with just 50 developers.
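
The shape of that curve can be reproduced with a toy model. The sketch below assumes the work divides evenly across people while coordination cost grows with the number of pairwise communication channels, n(n-1)/2. The function names and both constants are illustrative assumptions, not figures taken from Brooks.

```python
def completion_time(people, total_work=100.0, overhead_per_channel=0.5):
    """Toy estimate of time-to-completion for a team of a given size.

    Assumptions (illustrative only): the work splits evenly across people,
    and every pair of people adds a fixed coordination cost.
    """
    if people < 1:
        raise ValueError("people must be at least 1")
    channels = people * (people - 1) // 2  # pairwise communication channels
    return total_work / people + overhead_per_channel * channels


def best_team_size(max_people=30):
    """Return the team size with the lowest estimated completion time."""
    return min(range(1, max_people + 1), key=completion_time)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(n, round(completion_time(n), 1))
    print("best team size:", best_team_size())
```

With these made-up constants the estimate bottoms out at a team of six and climbs again after that, which is the nonlinear relationship the diagram describes. Different constants move the minimum, but the curve keeps its shape.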

The “old way” of IT projects and software development suffers from this problem. The “old way” includes long planning cycles with large groups of diverse stakeholders, Project Management Office teams with big, frequent status meetings, and lengthy, costly development cycles that often result in delayed “big production” system launches.

The truth is you don’t need any of this. You can get started with just a single developer and open-source software.

The topic of generative AI and effective agile development is too broad for a single article, so let’s narrow things down to something simple.

In April, The Economist published an article on LLMs entitled “Large, creative AI models will transform lives and labour markets,” which made two insightful points. First, large generative models have probably reached their peak:

And second, future improvements will likely come from private as opposed to public data:

In other words, the game is no longer about who can build the best big model; it’s about who can leverage existing technology on private data faster to create an advantage in their marketplace.

For those of you who know me or follow my content, I work at a consulting firm specializing in the valuation of financial instruments, business interests, and other tangible and intangible assets. We have a lot of data related to valuation, mostly numeric and financial in nature, going back many years and across sectors and asset types.

That’s the good news. The bad news is that it’s all tucked away in Excel spreadsheets and other file types that are not well suited to searching and aggregation.

Since building intelligent systems starts with (lots and lots of) data, and the future of generative AI is in data that is not public (and not just textual data, but numeric data such as financial data[1]), one of the first jobs for any organization looking to make a generative AI transformation is to take stock of and begin to harvest their own internal data.

For my organization, that meant:

Crawling local data storages looking for Excel spreadsheets.
Extracting that data in a systematic way.
Organizing extracted data so it can be meaningfully aggregated with itself and other data.
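
The first step above, crawling for spreadsheets, can be sketched with the Python standard library alone. This is a minimal illustration and not how eparse is implemented; the function name `find_excel_files` and the list of extensions are assumptions.

```python
import tempfile
from pathlib import Path

# File extensions treated as Excel workbooks (an assumption; adjust as needed).
EXCEL_SUFFIXES = {".xls", ".xlsx", ".xlsm"}


def find_excel_files(root):
    """Recursively collect Excel workbooks under root, sorted by path."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file()
        and path.suffix.lower() in EXCEL_SUFFIXES
        and not path.name.startswith("~$")  # skip Excel lock files
    )


if __name__ == "__main__":
    # Build a small throwaway directory tree to crawl.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "deals" / "2021").mkdir(parents=True)
        (base / "deals" / "model.xlsx").write_bytes(b"")
        (base / "deals" / "2021" / "OLD.XLS").write_bytes(b"")
        (base / "notes.txt").write_text("not a workbook")
        for path in find_excel_files(base):
            print(path.relative_to(base))
```

Because the search is recursive and matches on the file extension only, it does not matter how deeply the workbooks are nested or how the folders are named.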

We accomplished this with a single developer using Python and open-source libraries.

The tool is called “eparse” (short for Excel Parser) and I released it as open-source on GitHub:

Anyone can download it and use it freely under the associated MIT license.

The README file explains how to use the tool, so I won’t rehash that in this article, except to point out the following features of eparse that make it a viable extraction tool for these purposes:

It crawls recursively, so directory structure is irrelevant.
It identifies tabular data in flexible ways, so table structure and location are irrelevant.
It indexes all data with meaningful labels, such as sheet, name, row, and column headers.
It retains the original Excel row/column source for absolute referencing.
It connects to a database and inserts a standardized version of the extracted data.
It provides a query interface for searching data once extracted.
It is a Python library as well as a command-line tool, so it can be used in other Python projects.
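
To make the labeling and database features concrete, here is a minimal sketch of one way to flatten a table into labeled cell records and load them into SQLite. It uses only the standard library and does not use eparse’s API or schema; the `cells` table, its column names, the helper functions, and the sample figures are all hypothetical.

```python
import sqlite3


def column_letter(index):
    """Convert a 1-based column index to an Excel column letter (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def flatten_table(file_name, sheet, rows, top_row=1, left_col=1):
    """Yield one labeled record per data cell.

    rows[0] holds the column headers, and the first item of every later
    row is that row's header. top_row and left_col locate the table's
    top-left cell on the sheet (both 1-based).
    """
    headers = rows[0][1:]
    for r, row in enumerate(rows[1:], start=1):
        row_header = row[0]
        for c, value in enumerate(row[1:], start=1):
            excel_ref = f"{column_letter(left_col + c)}{top_row + r}"
            yield (file_name, sheet, row_header, headers[c - 1], excel_ref, value)


def load_cells(conn, records):
    """Create the hypothetical cells table if needed and insert the records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells ("
        "file TEXT, sheet TEXT, row_header TEXT, "
        "column_header TEXT, excel_ref TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO cells VALUES (?, ?, ?, ?, ?, ?)", records)
    conn.commit()


# Made-up sample table whose top-left cell sits at B3 on the sheet.
SAMPLE = [
    ["", "2021", "2022"],
    ["Revenue", 120.0, 150.0],
    ["EBITDA", 30.0, 42.0],
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    records = flatten_table("deal.xlsx", "Summary", SAMPLE, top_row=3, left_col=2)
    load_cells(conn, records)
    for row in conn.execute("SELECT * FROM cells WHERE row_header = 'EBITDA'"):
        print(row)
```

Storing one row per cell, with the sheet, headers, and original cell reference alongside the value, is what lets tables of different shapes share a single schema and still be traced back to their source workbook.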

The interface is flexible, and data can also be streamed to the user, so tabular Excel data can be visually inspected during the extraction process. This is a GIF from a demo of the tool I gave last week:

There are other tools that do this, particularly in VBA. However, I wanted something lightweight that would work in a headless Linux environment and that had a simple and native database interface.

The point here isn’t to promote any particular tool, but to stress the fact that you need your data in a format that you can query.

The purpose of the exercise is to explore your data: to look at distinct column headings, find shared fields between groups of data, and join subsets of your data. As with making a tasty glaze, your goal should be to start with “big” data and reduce it down to something of higher quality.
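
As a sketch of that kind of exploration, assume the extracted cells already sit in a SQLite table. The table name `cells`, its columns, and the sample rows are hypothetical and are not eparse’s schema. The queries list distinct column headings, find the headings two files share, and join the two files on those shared fields.

```python
import sqlite3

# Made-up extracted records: (file, row_header, column_header, value).
SAMPLE_CELLS = [
    ("fund_a.xlsx", "Acme Co", "Revenue", 120.0),
    ("fund_a.xlsx", "Acme Co", "EBITDA", 30.0),
    ("fund_b.xlsx", "Acme Co", "Revenue", 125.0),
    ("fund_b.xlsx", "Acme Co", "Discount Rate", 0.12),
]


def build_database(cells):
    """Load the sample records into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cells (file TEXT, row_header TEXT, column_header TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO cells VALUES (?, ?, ?, ?)", cells)
    return conn


def distinct_headers(conn):
    """List every column heading that appears anywhere in the data."""
    sql = "SELECT DISTINCT column_header FROM cells ORDER BY column_header"
    return [row[0] for row in conn.execute(sql)]


def shared_headers(conn, file_a, file_b):
    """List the column headings that two files have in common."""
    sql = (
        "SELECT column_header FROM cells WHERE file = ? "
        "INTERSECT "
        "SELECT column_header FROM cells WHERE file = ? "
        "ORDER BY column_header"
    )
    return [row[0] for row in conn.execute(sql, (file_a, file_b))]


def join_on_shared(conn, file_a, file_b):
    """Pair up values from two files where row and column headers match."""
    sql = (
        "SELECT a.row_header, a.column_header, a.value, b.value "
        "FROM cells AS a JOIN cells AS b "
        "ON a.row_header = b.row_header AND a.column_header = b.column_header "
        "WHERE a.file = ? AND b.file = ? "
        "ORDER BY a.row_header, a.column_header"
    )
    return conn.execute(sql, (file_a, file_b)).fetchall()


if __name__ == "__main__":
    conn = build_database(SAMPLE_CELLS)
    print(distinct_headers(conn))
    print(shared_headers(conn, "fund_a.xlsx", "fund_b.xlsx"))
    print(join_on_shared(conn, "fund_a.xlsx", "fund_b.xlsx"))
```

Queries like these show quickly which fields recur across workbooks and which are one-offs, which is a practical starting point for deciding what to keep in a curated dataset.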

Again, from the April Economist article:

Once you have a high-quality, curated dataset that is private and specific to your organization, you are ready to move on to experimenting with generative AI and discovering how it can give you an edge.

Where to go from here?

Over the course of my career in both finance and accounting and now in software development, I have never seen the business world react so quickly to an emerging technology as I have seen with generative AI over the past few months. I believe that many organizations are at an inflection point that is playing out as you are reading these words.

It is clear to me from my research in and experimentation with generative AI that the technology in and of itself is no longer a competitive advantage (I will write another article soon talking about open-source LLMs, which appear to perform nearly as well as the proprietary models). The key to creating an advantage is to leverage your own data and processes with these tools.

As we’re told in the Zen of Python, “now is better than never.” My advice is to get started with exploring your data sooner rather than later. Things are moving too fast to spend time building a development team. You’ll be surprised by what you can accomplish with a single developer and some free software.

[1] To be clear, LLMs are language models and primarily learn relationships between words to generate text. However, the underlying technology is a neural network, which trains on numeric data and can learn the relationships between any kind of numeric data, whether it is vectorized word and embedding data or financial statement and valuation data.

