Generative AI Application Retrofit – part 4: Using an LLM as an NLP to Execute Commands
In my previous posts, I shared how we've been using Natural Language Processing (NLP) to enhance user navigation within our application. Today, I'm thrilled to take you a step further into our journey.
We're now leveraging NLP not just for navigation but also for executing complex commands within the application. This development goes beyond basic keyword matching. Now, user inputs, in any language supported by ChatGPT, are intelligently interpreted to understand the underlying intent. This means our system can now discern the specific command a user intends, whether it's creating new records or navigating through multiple layers of the application.
This advancement is a game-changer in how we interact with our technology, making it more intuitive and responsive than ever.
Mapping Commands to their Context
We're taking user experience to the next level. Initially, we used user input to navigate to specific screens or layouts based on a set of Prompt Navigation records. Once on the desired layout, a second API call to ChatGPT helps map the user input to the correct command.
Using natural language input, users can now bypass traditional approaches like sifting through documentation, help sections, or tutorials. Instead, they can rely on intuitive, natural language commands to navigate and use the application effectively.
To make this possible, we've created 31 ‘navigation’ prompts, ensuring users can seamlessly switch contexts to the right screen or layout. Furthermore, I developed 53 ‘command’ prompts, essentially teaching ChatGPT how to use our application.
This innovation is all about simplifying and enhancing the user experience, making our application more intuitive and accessible than ever. Here are just a few of the inputs I used to test this approach.
Sample User Inputs
After implementing this new framework, I discovered its remarkable extensibility and tunability. Initially, as I experimented with various inputs, I noticed that the system would occasionally navigate to the wrong screen or execute an incorrect command. However, by refining the prompts, I've been able to cleanly differentiate between different locations and commands within the application. This precision has significantly enhanced the user experience.
It's important to note that while our system covers a broad range of commands, it's not designed to replace every possible command with a Command Line Interface (CLI) equivalent. Instead, our approach is to guide users to the right starting point in the application, allowing them to complete operations using the standard user interface. This method ensures a balance between innovative AI interaction and familiar user interface navigation.
Imagine asking ChatGPT to create a drawing from your textual description. Similarly, this integration enables the AI to emply the application for interacting with the user for data input and other operations, making the user experience more intuitive and engaging.
Behind the Scenes
I wrote the function 'Get Command by LayoutID' to implement running commands in the application. This addressed several areas:
Firstly, it enabled us to focus on a subset of the total commands available, making our system more efficient and streamlined. This targeted approach not only keeps our token usage to a minimum but also significantly boosts performance.
领英推荐
Moreover, this function allows for enhanced differentiation between commands. By mirroring the commands available on the screens within a specific context, we've made the user experience more intuitive and aligned with the application's layout.
This software application has the following Commands that may be applied within this context. Each Command is delimited by |-|. The CmdID is delimited by |X|. The Description delimited by |+|. [List of Commands for a Layout per the $CommandBlock taken from the database] I want determine the command to for this context in my software application. Give me the CmdID that best matches the following input phrase: [User input]
Prompts for ChatGPT to map the User Input to the commands
The code for the above prompt is shown in the function below in lines 33 to 37, which in turn is what I use to call the ChatGPT API.
And then here is the screen where I manage the Prompts for the prompt administrator.
Lessons Learned
Initially, I considered a keyword-based approach for implementing this capability. However, I quickly realized that such a method would be fragile, challenging to maintain, and not scalable. This is particularly true for users who would benefit the most from NLP, matching their diverse vocabulary would be a daunting task. The solution? Leveraging the power of Large Language Models (LLMs) for more effective matching.
One of the intriguing aspects of using NLP is the complexity of testing. Crafting a set of tests that encompass the vast range of potential inputs is a formidable challenge. This is where capturing each LLM interaction and encouraging user feedback becomes crucial. This feedback allows us to refine our application in real-time, adjusting prompts without altering the core code. However, caution is key – changes to prompts must be carefully managed to avoid disrupting other functionalities.
Reflecting on my career, the dream of solving this problem seemed distant until now. The integration of a robust NLP front-end and the ability to map NLP outputs to specific commands has been a game-changer. While this required adding a new layer of code, the user interface remains clean and easily adaptable.
In working with ChatGPT, I also pondered the question: What level of complexity can these models effectively handle? This approach, which involves navigating between 31 screen and managing up to 19 commands per screen, has been very successful. This clarity in choice has made it relatively straightforward to differentiate commands, even using the capabilities of ChatGPT-3.5.
As applications grow in size and complexity, crafting precise prompts to accurately identify the correct screen or command becomes more challenging. To streamline this process, I strategically directed users to the Edit mode on many screens, even though commands are available in both the View and Listing variations. This decision not only simplifies user interaction but also enhances the overall efficiency of the application. I can imagine reaching a point where the complexity is such that I would use the LLM for more than just NLP, but also to better differentiate the screens and commands through other AI techniques.
Looking ahead, I envision a future where the traditional "application" layer might be replaced, allowing direct conversation with the AI to execute business processes. Yet, capturing structured data to support these processes remains vital. My ultimate goal would be to feed the AI with a comprehensive XML definition of every screen, database schema, and function. This, however, is a significant challenge, requiring extensive token usage and training on numerous fully debugged applications.
Stay tuned for my next post, where I'll delve into how ChatGPT can handle two common human-centric tasks in our application: writing a query to a publisher and managing the publisher's response. For more insights and updates, view the accompanying movie, and don't forget to follow me on LinkedIn!