A Few Use Cases More ... Perhaps!
Sanjay Pande
Chief of Marketing and Product Strategy at Data Vault Alliance, Data Vault 2.0 Authorized Instructor, DV 2.0 Certified Master
Are you pondering what I'm pondering Pinky?
So, I've been thinking about this for some time now. To be honest, it started well before ChatGPT, which is a Large Language Model (LLM), became so popular.
First, let's ask it, what's an LLM?
Hey, algorithmically programmed brainless drone, what's an LLM?
Apparently, it's some law degree, and it spewed a lot of related nonsense that might make a legal person drool, but I was unimpressed as it completely misread my intent.
Not its fault. It's just a program. It needs better input.
What is a Large Language Model?
A Large Language Model (LLM) refers to a sophisticated artificial intelligence (AI) system designed to understand and generate human-like text. It is trained on vast amounts of text data to learn patterns, language structure, and context. LLMs utilize deep learning techniques, such as transformer neural networks, to process and generate natural language.
These models can be trained on diverse sources of information, including books, articles, websites, and other textual data. They learn to predict the next word in a sentence based on the preceding words and build an internal representation of language.
Large Language Models like GPT-3 (Generative Pre-trained Transformer 3) have billions of parameters, enabling them to capture complex relationships and generate coherent and contextually relevant text. They can perform a variety of language-related tasks, such as answering questions, summarizing text, translating languages, and even engaging in conversation with users.
By leveraging their extensive training and learned knowledge, LLMs like ChatGPT can provide responses and information on a wide range of topics, making them versatile tools for both practical applications and research purposes.
Ok. So, that's more in line with what I was asking about.
Now, let's ask it what a Domain Specific Language (DSL) is according to what it has learnt.
What is a Domain Specific Language?
A Domain-Specific Language (DSL) is a programming language specifically designed to address the requirements and challenges of a particular domain or problem space. Unlike general-purpose programming languages (such as Python or Java), DSLs are tailored to a specific area of application and aim to provide concise, expressive, and intuitive syntax and semantics.
DSLs are created to enhance productivity and improve communication within a specific domain by providing abstractions and specialized constructs that closely align with the domain's concepts and terminology. They allow domain experts (non-programmers) to express their ideas and solve problems using a language that is closer to their natural way of thinking.
There are two main types of DSLs: external DSLs, which have their own standalone syntax and tooling (SQL and regular expressions are classic examples), and internal (embedded) DSLs, which live inside a general-purpose host language.
DSLs are useful for addressing complex problems within a specific domain, as they provide a higher level of abstraction and reduce the cognitive load of the users. They can increase productivity, improve code readability, and enable better collaboration between domain experts and developers.
Hmm, I have to admit that's not bad. Perhaps I should've asked for examples. Anyway, someone recently talked about something called SLMs, or Small Language Models, for specific tasks.
Looking at the way ChatGPT is being used by most people, the SLM idea is actually quite a creative one. If you train on just the subset of data required and drop the need for "large", the model can be rather focused.
Some examples are image generation, video generation, certain types of data analysis, creating menus, creating shopping lists, etc., where there are already several specialized applications that leverage the API, connect to other services, and use an LLM prompt as an interface to describe the outcomes.
Being built for specific tasks helps these models stay narrow. Each is either a subset of an LLM, or an interface from an LLM to something else.
So, I decided, let's look at our tasks in a typical analytics project and see how we can leverage some of these things. But, this is such a niche domain which is why I had already asked it about both LLMs and DSLs.
What if we combine those two concepts and create our own Domain Specific Language Model, or DSLM?
Let's see what our AI program thinks of this idea.
Can the concept of LLMs and DSLs be combined to create a smaller and more niche Domain Specific Language model?
Yes, the concepts of Large Language Models (LLMs) and Domain-Specific Languages (DSLs) can indeed be combined to create smaller and more niche DSL models. This combination allows for the development of language models that are specifically trained and optimized for a particular domain or problem space.
The general approach would involve training a language model using domain-specific data and fine-tuning it to cater to the specific needs of the domain. By training on a narrower dataset, the resulting model can become more specialized and focused on understanding and generating text relevant to that domain.
The benefits of using a DSL model over a larger general-purpose LLM include: better accuracy on domain terminology, lower computational and hosting costs, faster responses, and less risk of off-topic or irrelevant output.
Developing a DSL model involves defining the specific language constructs, designing the syntax and semantics, preparing or curating domain-specific training data, and training the model using techniques like transfer learning and fine-tuning.
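To get an intuition for why a narrower corpus yields a more focused model, here's a deliberately tiny toy "language model" in Python. It's a word-level bigram counter, nothing like a real transformer, and the Data Vault "corpus" is invented purely for the sketch, but it shows the core idea: a model trained only on domain text can only ever talk about that domain.

```python
import random
from collections import defaultdict

def train_bigram_model(corpus):
    """Count word-to-next-word transitions in a (tiny) training corpus."""
    model = defaultdict(list)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model

def generate(model, start, length=5, seed=0):
    """Sample a short continuation by walking the bigram transitions."""
    random.seed(seed)
    out = [start]
    for _ in range(length):
        options = model.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)

# A deliberately narrow, domain-specific "training set":
dv_corpus = (
    "hub stores business keys "
    "link stores relationships between hubs "
    "satellite stores descriptive attributes"
)
model = train_bigram_model(dv_corpus)
print(generate(model, "hub"))
```

Everything it can say is Data Vault vocabulary, because that's all it has ever seen; scale that intuition up and you have the "smaller, more niche" model the answer above describes.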
Combining the strengths of LLMs and DSLs can lead to powerful and specialized language models that excel in specific domains, enabling more efficient and accurate text processing and generation for targeted applications.
Even the AI said so itself.
So, the program likes the idea and has given us the go ahead to think about DSLMs or Domain Specific Language Models. Granted, there is a fine line between this and SLMs or Small Language Models and perhaps even some overlap.
Let's start thinking about what tasks we can assign to this futuristic lawn mower that would help us reduce our time, increase our productivity, and leverage technology the way it should be leveraged.
Or, as we say in the Data Vault world, anything that can be automated, should be automated.
We know it can't really do business requirements, but can it help us prioritize them? We don't know the answer to that one yet. Let's assume we know what we want to build.
Some Additional Thoughts
The first exercise we do in the Data Vault 2.0 System of Business Intelligence is to identify the taxonomy of the business keys that are in scope. Can a DSLM help here? It's highly likely that it can, speeding up the identification of the business keys and getting us to the extended taxonomy exercise sooner.
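As a rough sketch of what "help with the taxonomy" could mean, here's a simple rule-based classifier in Python. The rules, branch names, and sample columns are all hypothetical; a real DSLM would presumably learn these patterns rather than have them hard-coded, but the output shape (candidate column mapped to taxonomy branch) is the same.

```python
import re

# Hypothetical rules mapping column-name patterns to taxonomy branches.
TAXONOMY_RULES = [
    (re.compile(r".*customer.*(id|number|no)$"), "Customer"),
    (re.compile(r".*(invoice|order).*(id|number|no)$"), "Sales Document"),
    (re.compile(r".*(product|sku).*"), "Product"),
]

def classify_keys(columns):
    """Assign each candidate column to a taxonomy branch, or 'Unclassified'."""
    taxonomy = {}
    for col in columns:
        name = col.lower()
        for pattern, branch in TAXONOMY_RULES:
            if pattern.match(name):
                taxonomy[col] = branch
                break
        else:
            taxonomy[col] = "Unclassified"
    return taxonomy

print(classify_keys(["CustomerID", "InvoiceNo", "SKU", "LoadDate"]))
```

The human still reviews the "Unclassified" bucket; the drone just does the first pass.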
The next thing we need to do is identify the locations in which these business keys exist and find their peers, which will drive any master data work. If that's already been done upfront, great: that work can be leveraged. This is another area where a DSLM can accelerate the work and make the humans more efficient, because instead of searching through things manually, they have a draft to sit down with and agree on with the business.
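A sketch of that "where does this key live" scan, in Python. The schemas and alias sets here are invented; in practice the aliases are exactly the peers you'd agree with the business, and a DSLM could propose them instead of a human typing them in.

```python
def find_key_locations(schemas, key_aliases):
    """Map each business key to the tables/columns where it (or a peer alias) appears."""
    locations = {key: [] for key in key_aliases}
    for table, columns in schemas.items():
        for col in columns:
            for key, aliases in key_aliases.items():
                if col.lower() in aliases:
                    locations[key].append((table, col))
    return locations

# Hypothetical source schemas and alias lists agreed with the business:
schemas = {
    "crm.customers": ["CustomerID", "Name"],
    "erp.orders": ["OrderNo", "CustNo"],
}
key_aliases = {"Customer": {"customerid", "custno"}, "Order": {"orderno"}}
print(find_key_locations(schemas, key_aliases))
```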
Once those are finalized, the build can start. A DSLM would probably be fairly capable of converting these into the main entities required to be built.
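"Converting these into the main entities" is very pattern-based, which is exactly why a machine can do it. Here's a minimal, hypothetical generator for one such entity, a Data Vault-style hub; the column names and types are illustrative conventions, not the official standard.

```python
def hub_ddl(business_key, taxonomy_branch):
    """Emit a minimal Data Vault-style hub table for one business key."""
    hub = f"hub_{taxonomy_branch.lower()}"
    return (
        f"CREATE TABLE {hub} (\n"
        f"  {hub}_hk CHAR(32) PRIMARY KEY,  -- hash key over the business key\n"
        f"  {business_key} VARCHAR(100) NOT NULL,\n"
        f"  load_date TIMESTAMP NOT NULL,\n"
        f"  record_source VARCHAR(50) NOT NULL\n"
        f");"
    )

print(hub_ddl("customer_id", "Customer"))
```

Links and satellites follow equally predictable templates, which is the whole point: finalized keys in, buildable entities out.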
Then, assuming there's some modeling that maps from the wanted outcomes back to the sources via these entities, we'd have a fairly decent picture of how to build this, especially if some of these capabilities can plug in to modeling tools (you'd expect the modeling tool vendors are already looking at this). The nice thing is you can build models with multi-structured sets, since we're talking to a machine here. It can easily identify important factors, such as potential business keys in deeply nested JSON sets, which would take a human more time.
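That nested-JSON point can be sketched too. Here's a naive recursive walk in Python that flags identifier-looking fields; the suffix heuristic and the sample document are made up, and a DSLM would apply learned judgment rather than a suffix list, but it shows how mechanical the first pass is.

```python
def find_candidate_keys(obj, path=""):
    """Recursively walk nested JSON, flagging fields that look like identifiers."""
    candidates = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            p = f"{path}.{k}" if path else k
            if isinstance(v, (dict, list)):
                candidates += find_candidate_keys(v, p)
            elif k.lower().endswith(("id", "number", "code", "key")):
                candidates.append(p)
    elif isinstance(obj, list):
        for item in obj:
            candidates += find_candidate_keys(item, path + "[]")
    return candidates

doc = {"orderId": 1, "customer": {"customerId": 9, "name": "A"},
       "lines": [{"productCode": "X1", "qty": 2}]}
print(find_candidate_keys(doc))
```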
Then the planning. If the build follows a predictable, pattern-based Agile Build Cycle, as taught in the Certified Data Vault 2.0 Practitioner (CDVP2) course, it should be possible to generate the project plan from the entities, and then several forward-engineering steps can follow. Of course, all of this would need human intervention at different points, but the human can do the intelligent work while the machine does the boring, churning, drone work it's supposed to do and pretends it's smarter than the human.
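Because the dependency order in a pattern-based build is fixed (hubs before links before satellites), even plan generation can be sketched mechanically. This toy Python version infers the entity type from a naming prefix and chunks the work into sprints; the naming convention and sprint size are assumptions for the sketch, not CDVP2 doctrine.

```python
def build_plan(entities, per_sprint=2):
    """Order entities hubs -> links -> satellites, then chunk into sprints."""
    order = {"hub": 0, "link": 1, "sat": 2}
    ranked = sorted(entities, key=lambda e: order[e.split("_")[0]])
    return [ranked[i:i + per_sprint] for i in range(0, len(ranked), per_sprint)]

entities = ["sat_customer", "hub_customer", "link_order_customer", "hub_order"]
for n, sprint in enumerate(build_plan(entities), 1):
    print(f"Sprint {n}: {sprint}")
```

The human still decides what's in scope; the ordering and chunking is exactly the churn a machine should own.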
While this is all speculative, there appears to be merit in the idea of Domain Specific Language Models especially for pattern based builds and methodologies like DV 2.0 that reduce project risk.
After all, it makes sense to have a Domain Specific Language Model for the Data Vault 2.0 System anyway.
It actually already exists, but at the moment it lives at the human/team level, and it should be relatively easy to extend it so machines do the repetitive drudge work while the humans on the Analytics teams actually add value and intelligence to the enterprise ... like they're supposed to.
And, you can get access to all the information related to the latest language and standards for Data Vault 2.0 for zilch right over here, just by creating an account.