Building Applications with LLMs
When a small team of us at Microsoft got access to what the world now knows as GPT-4, we learned that it has pretty amazing capabilities! We immediately set about building whatever we could with it.
But we quickly ran into a challenge: as great as it is, at some level it “just” takes an array (of text, or increasingly, binary data) and “rearranges” it. That’s a really great function call to have, but it’s “just” a stochastic, pure function: there are no side effects, no state, no callouts. It’s hard to build a complex program with just one function!
So we decided to start looking at programming tools that would give us some of those capabilities: memory (state), procedural control where we want it, and seamless interaction with native code so we could do RPCs and have other kinds of side effects. This was the beginning of Semantic Kernel, software we released as open source on March 1 (you can find it here).
The SK has some basic behaviors that you would expect: we organize things into Skills and Commands that are very “unix-like”. They can be written either in a programming language like C#, Python, or TypeScript, or in a prompt-template format. You can pass input and parameters in and out of them, and chain them as you’d expect. Individual templates can be configured to use different models and settings, which comes in handy when building larger and more complex behaviors.
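To make the shape of that concrete, here is a minimal sketch of the pattern in Python. It is illustrative only, not the actual Semantic Kernel API: a “semantic” function is just a prompt template plus model settings, a native function is plain code, and chaining pipes one output into the next input. All names below are made up for the example.

```python
from dataclasses import dataclass

def call_model(prompt: str, model: str, temperature: float) -> str:
    """Stub for an LLM completion call; swap in your provider's client here."""
    return f"[{model} @ {temperature}] completion for: {prompt[:40]}..."

@dataclass
class SemanticFunction:
    """A prompt template plus its own model settings."""
    template: str                      # e.g. "Summarize: {{input}}"
    model: str = "gpt-4"
    temperature: float = 0.0

    def __call__(self, text: str, **params: str) -> str:
        prompt = self.template.replace("{{input}}", text)
        for name, value in params.items():
            prompt = prompt.replace("{{" + name + "}}", value)
        return call_model(prompt, self.model, self.temperature)

def chain(text: str, *functions) -> str:
    """Run functions left to right, feeding each output into the next input."""
    for fn in functions:
        text = fn(text)
    return text

summarize = SemanticFunction("Summarize the following text:\n{{input}}")
translate = SemanticFunction("Translate the following into French:\n{{input}}", temperature=0.2)

print(chain("A long article about vector databases...", summarize, translate))
```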
We also built a memory store that uses the same vector embeddings that OpenAI publishes. This lets us store and retrieve semantically rich memories. The whole point of this kind of programming is to get into the “messy” realm of semantics (meaning and intent), so we needed a memory store that worked that way.
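Under the hood the idea is simple: every memory is stored alongside its embedding vector, and recall is a nearest-neighbor lookup rather than a keyword match. Here is a toy sketch of that shape. The embed() stand-in below is just a character histogram; a real store would call an actual embedding model and use a proper vector index.

```python
import math

def embed(text: str) -> list[float]:
    """Toy embedding: a character histogram. Replace with a real embedding call."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class MemoryStore:
    """Stores (text, vector) pairs and recalls by embedding similarity."""
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def save(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def recall(self, query: str, top_k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]

store = MemoryStore()
store.save("The user prefers explanations with analogies.")
store.save("The project deadline is the end of March.")
print(store.recall("how does the user like to learn?"))  # nearest memories first
```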
That quickly led us to an interesting idea: GPT-4 itself is so capable that it can actually be a partner in this system, a programming component in its own right. If we describe all of the skills in the system in natural language and store those descriptions in the same kind of vector database, then we can also write “planner” skills that introspect over the capabilities of the system and build their own programs. One of the first ones we built is called “contextQuery”; it simply categorizes questions in terms of the command or action needed to resolve them, which can reduce the tendency toward “hallucinations” in some cases. There are others we’ve built and will continue to build.
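To give a feel for the routing half of that idea, here is a rough sketch: each skill’s natural-language description is embedded, and an incoming question is matched against those descriptions before anything is generated. The skill names, descriptions, and helpers below are made up for illustration; this is not the actual contextQuery implementation.

```python
import math

def embed(text: str) -> list[float]:
    """Toy embedding: a character histogram. Replace with a real embedding call."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical skills, each described in plain language.
SKILLS = {
    "calendar.lookup": "Find meetings, events, and availability on the user's calendar.",
    "email.summarize": "Summarize recent email threads for the user.",
    "docs.search": "Search the user's documents for relevant passages.",
}

def route_question(question: str) -> str:
    """Pick the skill whose description is semantically closest to the question."""
    q = embed(question)
    return max(SKILLS, key=lambda name: cosine(q, embed(SKILLS[name])))

print(route_question("Am I free on Thursday afternoon?"))
```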
We’re starting to think about what a common language or library of skills would look like. Many problems seem to break down into “hierarchical planning”, where there are successive levels of planning and resolution; can we make this a general enough pattern that the models can make use of it? (A rough sketch of the pattern is below.) There are common patterns in how prompt templates get written that seem to make them behave better: ways to break a task down, performance optimizations, things like that. Some of those are already in the repo, but we expect more to emerge as we and the developer community learn together. As a side note, this era reminds me of the early “Web 2.0” era, when we were all trying to figure out the right patterns for building apps in the browser, building distributed services, and building mobile apps when the iPhone appeared. I think we are in the middle of the same kind of industry-wide conversation now about AI programming: not just building and training models, but how we work with them as application and service builders. It’s fascinating now, just like it was then.
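Here is that rough sketch of the hierarchical-planning pattern: a goal is either resolved directly by a skill, or decomposed into sub-goals by a planner and resolved recursively. The plan() and resolve() functions stand in for LLM calls, and all of the names are mine for illustration, not anything shipped in the repo.

```python
def plan(goal: str) -> list[str]:
    """Stub for a planner prompt that breaks a goal into sub-goals."""
    if "textbook" in goal:
        return ["write the table of contents", "write each chapter", "write the teacher guide"]
    return []  # no decomposition needed; treat as a leaf

def resolve(goal: str) -> str:
    """Stub for a skill or prompt that handles a leaf-level goal directly."""
    return f"(result of: {goal})"

def execute(goal: str, depth: int = 0) -> str:
    """Plan, then recurse into sub-goals until each one can be resolved directly."""
    sub_goals = plan(goal)
    if not sub_goals or depth >= 3:   # leaf goal, or recursion limit reached
        return resolve(goal)
    results = [execute(g, depth + 1) for g in sub_goals]
    return "\n".join(results)

print(execute("produce an eighth-grade history textbook"))
```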
We are also beginning to build out what we think of as “connector skills” for tools and services in the Microsoft ecosystem. We want to be able to build richer and more complex experiences out of the familiar tools we use, and we know other developers do as well. Over time, we will continue to add to this collection.
We’ve been having a lot of fun building larger, longer-running projects with the SK. I wrote something a few weeks back that takes a short prompt like “eighth grade history in a friendly style, taught to someone who likes analogies” and turns it into a full-length textbook and curriculum, including a teacher guide, table of contents, the whole thing. There’s an art to combining procedural code (loops and such) with LLM prompts. Each is good at something the other is bad at, and crossing the boundary successfully is a new discipline. But it’s worth it! We can build really interesting things with just one prompt call; imagine what you can build with 1,000...
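For a sense of what that boundary-crossing looks like, here is a simplified sketch along the lines of the textbook project: one prompt produces a table of contents, then an ordinary loop expands each entry with a second prompt. The complete() function is a stand-in for whatever model client you use; this is not the actual project code.

```python
def complete(prompt: str) -> str:
    """Stub for an LLM completion call; replace with a real client."""
    return f"(model output for: {prompt[:60]}...)"

def build_textbook(brief: str) -> str:
    # One prompt call to get the overall structure...
    toc_prompt = f"Write a numbered table of contents for: {brief}"
    toc_entries = complete(toc_prompt).splitlines()

    # ...then a plain procedural loop, with one prompt call per chapter.
    chapters = []
    for entry in toc_entries:
        chapter_prompt = (
            f"Write a full chapter for '{entry}', "
            f"in the style described by: {brief}"
        )
        chapters.append(complete(chapter_prompt))
    return "\n\n".join(chapters)

print(build_textbook("eighth grade history in a friendly style, "
                     "taught to someone who likes analogies"))
```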
Looking forward to building and learning together. Have a look, give some feedback, and make some cool stuff!