A Few Use Cases More ... Perhaps!
Sanjay Pande
Chief of Marketing and Product Strategy at Data Vault Alliance, Data Vault 2.0 Authorized Instructor, DV 2.0 Certified Master
Are you pondering what I'm pondering Pinky?
So, I've been thinking about this for some time now. To be honest, it started well before ChatGPT, which is a Large Language Model (LLM), became so popular.
First, let's ask it, what's an LLM?
Hey, algorithmically programmed brainless drone, what's an LLM?
Apparently, it's some law degree, and it spewed a lot of related nonsense that might make a legal person drool, but I was unimpressed as it completely misread my intent.
Not its fault. It's just a program. It needs better input.
What is a Large Language Model?
A Large Language Model (LLM) refers to a sophisticated artificial intelligence (AI) system designed to understand and generate human-like text. It is trained on vast amounts of text data to learn patterns, language structure, and context. LLMs utilize deep learning techniques, such as transformer neural networks, to process and generate natural language.
These models can be trained on diverse sources of information, including books, articles, websites, and other textual data. They learn to predict the next word in a sentence based on the preceding words and build an internal representation of language.
Large Language Models like GPT-3 (Generative Pre-trained Transformer 3) have billions of parameters, enabling them to capture complex relationships and generate coherent and contextually relevant text. They can perform a variety of language-related tasks, such as answering questions, summarizing text, translating languages, and even engaging in conversation with users.
By leveraging their extensive training and learned knowledge, LLMs like ChatGPT can provide responses and information on a wide range of topics, making them versatile tools for both practical applications and research purposes.
Ok. So, that's more in line with what I was asking about.
Now, let's ask it what a Domain Specific Language (DSL) is according to what it has learnt.
What is a Domain Specific Language?
A Domain-Specific Language (DSL) is a programming language specifically designed to address the requirements and challenges of a particular domain or problem space. Unlike general-purpose programming languages (such as Python or Java), DSLs are tailored to a specific area of application and aim to provide concise, expressive, and intuitive syntax and semantics.
DSLs are created to enhance productivity and improve communication within a specific domain by providing abstractions and specialized constructs that closely align with the domain's concepts and terminology. They allow domain experts (non-programmers) to express their ideas and solve problems using a language that is closer to their natural way of thinking.
There are two main types of DSLs: external DSLs, which have their own standalone syntax and tooling (SQL and regular expressions are classic examples), and internal (embedded) DSLs, which live inside a general-purpose host language.
DSLs are useful for addressing complex problems within a specific domain, as they provide a higher level of abstraction and reduce the cognitive load of the users. They can increase productivity, improve code readability, and enable better collaboration between domain experts and developers.
Hmm, I have to admit that's not bad. Perhaps I should've asked for examples. Anyway, someone recently talked about something called SLMs, or Small Language Models, for specific tasks.
Looking at the way ChatGPT is being used by most people, the SLM idea is actually quite a creative one. If you train on just the subset of data required and drop the need for "large", the model can be rather focused.
Some examples are image generation, video generation, certain types of data analysis, creating menus, creating shopping lists, etc., where there are already several specialized applications that leverage the API, connect to other services, and use an LLM prompt as an interface to describe the outcomes.
Being built for specific tasks helps these models stay narrow. Each is either a subset of an LLM, or an interface from an LLM to something else.
So, I decided, let's look at our tasks in a typical analytics project and see how we can leverage some of these things. But, this is such a niche domain which is why I had already asked it about both LLMs and DSLs.
What if we combine those two concepts and create our own Domain Specific Language Model, or DSLM?
Let's see what our AI program thinks of this idea.
Can the concept of LLMs and DSLs be combined to create a smaller and more niche Domain Specific Language model?
Yes, the concepts of Large Language Models (LLMs) and Domain-Specific Languages (DSLs) can indeed be combined to create smaller and more niche DSL models. This combination allows for the development of language models that are specifically trained and optimized for a particular domain or problem space.
The general approach would involve training a language model using domain-specific data and fine-tuning it to cater to the specific needs of the domain. By training on a narrower dataset, the resulting model can become more specialized and focused on understanding and generating text relevant to that domain.
The benefits of using a DSL model over a larger general-purpose LLM include: better accuracy on domain terminology, lower computational and hosting costs, faster responses, and less risk of off-topic or irrelevant output.
Developing a DSL model involves defining the specific language constructs, designing the syntax and semantics, preparing or curating domain-specific training data, and training the model using techniques like transfer learning and fine-tuning.
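To get an intuition for why a narrower corpus yields a more focused model, here's a deliberately tiny toy "language model" in Python. It's a word-level bigram counter, nothing like a real transformer, and the Data Vault "corpus" is invented purely for the sketch, but it shows the core idea: a model trained only on domain text can only ever talk about that domain.

```python
import random
from collections import defaultdict

def train_bigram_model(corpus):
    """Count word-to-next-word transitions in a (tiny) training corpus."""
    model = defaultdict(list)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model

def generate(model, start, length=5, seed=0):
    """Sample a short continuation by walking the bigram transitions."""
    random.seed(seed)
    out = [start]
    for _ in range(length):
        options = model.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)

# A deliberately narrow, domain-specific "training set":
dv_corpus = (
    "hub stores business keys "
    "link stores relationships between hubs "
    "satellite stores descriptive attributes"
)
model = train_bigram_model(dv_corpus)
print(generate(model, "hub"))
```

Everything it can say is Data Vault vocabulary, because that's all it has ever seen; scale that intuition up and you have the "smaller, more niche" model the answer above describes.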
Combining the strengths of LLMs and DSLs can lead to powerful and specialized language models that excel in specific domains, enabling more efficient and accurate text processing and generation for targeted applications.
Even the AI said so itself.
So, the program likes the idea and has given us the go ahead to think about DSLMs or Domain Specific Language Models. Granted, there is a fine line between this and SLMs or Small Language Models and perhaps even some overlap.
Let's start thinking about what tasks we can assign to this futuristic lawn mower that would help us reduce our time, increase our productivity, and leverage technology the way it should be leveraged.
Or, as we say in the Data Vault world, anything that can be automated, should be automated.
We know it can't really do business requirements, but can it help us prioritize them? We don't know the answer to that one yet. Let's assume we know what we want to build.
Some Additional Thoughts
The first exercise we do in the Data Vault 2.0 System of Business Intelligence is to identify the taxonomy of the business keys that are in scope. Can a DSLM help here? It's highly likely that it can, speeding up the identification of the business keys and getting us to the extended taxonomy exercise sooner.
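As a rough sketch of what "help with the taxonomy" could mean, here's a simple rule-based classifier in Python. The rules, branch names, and sample columns are all hypothetical; a real DSLM would presumably learn these patterns rather than have them hard-coded, but the output shape (candidate column mapped to taxonomy branch) is the same.

```python
import re

# Hypothetical rules mapping column-name patterns to taxonomy branches.
TAXONOMY_RULES = [
    (re.compile(r".*customer.*(id|number|no)$"), "Customer"),
    (re.compile(r".*(invoice|order).*(id|number|no)$"), "Sales Document"),
    (re.compile(r".*(product|sku).*"), "Product"),
]

def classify_keys(columns):
    """Assign each candidate column to a taxonomy branch, or 'Unclassified'."""
    taxonomy = {}
    for col in columns:
        name = col.lower()
        for pattern, branch in TAXONOMY_RULES:
            if pattern.match(name):
                taxonomy[col] = branch
                break
        else:
            taxonomy[col] = "Unclassified"
    return taxonomy

print(classify_keys(["CustomerID", "InvoiceNo", "SKU", "LoadDate"]))
```

The human still reviews the "Unclassified" bucket; the drone just does the first pass.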
The next thing we need to do is identify the locations in which these business keys exist and find their peers, which will drive any master data work. If that's already been done upfront, great: that work can be leveraged. This is another area where a DSLM can accelerate the work and make the humans more efficient, because instead of searching through things manually, they have a draft to sit down with and agree on with the business.
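A sketch of that "where does this key live" scan, in Python. The schemas and alias sets here are invented; in practice the aliases are exactly the peers you'd agree with the business, and a DSLM could propose them instead of a human typing them in.

```python
def find_key_locations(schemas, key_aliases):
    """Map each business key to the tables/columns where it (or a peer alias) appears."""
    locations = {key: [] for key in key_aliases}
    for table, columns in schemas.items():
        for col in columns:
            for key, aliases in key_aliases.items():
                if col.lower() in aliases:
                    locations[key].append((table, col))
    return locations

# Hypothetical source schemas and alias lists agreed with the business:
schemas = {
    "crm.customers": ["CustomerID", "Name"],
    "erp.orders": ["OrderNo", "CustNo"],
}
key_aliases = {"Customer": {"customerid", "custno"}, "Order": {"orderno"}}
print(find_key_locations(schemas, key_aliases))
```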
Once those are finalized, the build can start. A DSLM would probably be fairly capable of converting these into the main entities required to be built.
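"Converting these into the main entities" is very pattern-based, which is exactly why a machine can do it. Here's a minimal, hypothetical generator for one such entity, a Data Vault-style hub; the column names and types are illustrative conventions, not the official standard.

```python
def hub_ddl(business_key, taxonomy_branch):
    """Emit a minimal Data Vault-style hub table for one business key."""
    hub = f"hub_{taxonomy_branch.lower()}"
    return (
        f"CREATE TABLE {hub} (\n"
        f"  {hub}_hk CHAR(32) PRIMARY KEY,  -- hash key over the business key\n"
        f"  {business_key} VARCHAR(100) NOT NULL,\n"
        f"  load_date TIMESTAMP NOT NULL,\n"
        f"  record_source VARCHAR(50) NOT NULL\n"
        f");"
    )

print(hub_ddl("customer_id", "Customer"))
```

Links and satellites follow equally predictable templates, which is the whole point: finalized keys in, buildable entities out.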
Then, assuming there's some modeling that maps from the wanted outcomes back to the sources via these entities, we'd have a fairly decent picture of how to build this, especially if some of these capabilities can plug in to modeling tools (you'd expect the modeling tool vendors are already looking at this). The nice thing is you can build models with multi-structured sets, since we're talking to a machine here. It can easily identify important factors, such as potential business keys in deeply nested JSON sets, which would take a human more time.
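That nested-JSON point can be sketched too. Here's a naive recursive walk in Python that flags identifier-looking fields; the suffix heuristic and the sample document are made up, and a DSLM would apply learned judgment rather than a suffix list, but it shows how mechanical the first pass is.

```python
def find_candidate_keys(obj, path=""):
    """Recursively walk nested JSON, flagging fields that look like identifiers."""
    candidates = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            p = f"{path}.{k}" if path else k
            if isinstance(v, (dict, list)):
                candidates += find_candidate_keys(v, p)
            elif k.lower().endswith(("id", "number", "code", "key")):
                candidates.append(p)
    elif isinstance(obj, list):
        for item in obj:
            candidates += find_candidate_keys(item, path + "[]")
    return candidates

doc = {"orderId": 1, "customer": {"customerId": 9, "name": "A"},
       "lines": [{"productCode": "X1", "qty": 2}]}
print(find_candidate_keys(doc))
```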
Then the planning. If the build follows a predictable, pattern-based Agile Build Cycle, as taught in the Certified Data Vault 2.0 Practitioner (CDVP2) course, it should be possible to generate the project plan from the entities, and then several forward-engineering steps can follow. Of course, all of this would need human intervention at different points, but the human can do the intelligent work while the machine does the boring, churning, drone work it's supposed to do and pretends it's smarter than the human.
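Because the dependency order in a pattern-based build is fixed (hubs before links before satellites), even plan generation can be sketched mechanically. This toy Python version infers the entity type from a naming prefix and chunks the work into sprints; the naming convention and sprint size are assumptions for the sketch, not CDVP2 doctrine.

```python
def build_plan(entities, per_sprint=2):
    """Order entities hubs -> links -> satellites, then chunk into sprints."""
    order = {"hub": 0, "link": 1, "sat": 2}
    ranked = sorted(entities, key=lambda e: order[e.split("_")[0]])
    return [ranked[i:i + per_sprint] for i in range(0, len(ranked), per_sprint)]

entities = ["sat_customer", "hub_customer", "link_order_customer", "hub_order"]
for n, sprint in enumerate(build_plan(entities), 1):
    print(f"Sprint {n}: {sprint}")
```

The human still decides what's in scope; the ordering and chunking is exactly the churn a machine should own.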
While this is all speculative, there appears to be merit in the idea of Domain Specific Language Models especially for pattern based builds and methodologies like DV 2.0 that reduce project risk.
After all, it makes sense to have a Domain Specific Language Model for the Data Vault 2.0 System anyway.
It actually already exists, but at the moment it lives at the human/team level, and it should be relatively easy to extend it so machines do the repetitive drudge work while the humans on the Analytics teams actually add value and intelligence to the enterprise ... like they're supposed to.
And, you can get access to all the information related to the latest language and standards for Data Vault 2.0 for zilch right over here, just by creating an account.