Enhancing Analytics Engineering Productivity with ChatGPT

There was a time not too long ago when I thought I’d never use artificial intelligence (AI). I remember hearing about it during my undergrad and thinking, “this will never be available in my lifetime”. Well, the time has come and here we are.

With that being said, there are things about my day-to-day role that I choose to automate with the help of generative AI tools like ChatGPT, regardless of my skill with SQL and dbt. After five years in the industry, here are my tips:

You don’t have to be great at everything

I don’t know about you, but working with regular expressions (REGEXP) is one of my least favourite parts of my career choice. Being able to use ChatGPT to help write regular expressions has been invaluable for moving through these tasks quickly, and with less of a frown on my face.

  1. Define the task in natural language: identify the specific use case where REGEXP is needed in your SQL query. Is it for extracting data or for pattern matching? The answer determines how you prompt.
  2. Prompt ChatGPT: adjust your prompt to match the use case you identified in step 1. For most of my use cases it’s text extraction, and that shapes my prompt. One thing I’ve learned is that if you don’t mention the SQL dialect, you’ll usually end up with a syntax error. Using ChatGPT to write REGEXP-compatible Snowflake SQL has taught me that you need to keep reminding it which repetition operators Snowflake actually supports: its regex functions follow POSIX ERE, so PCRE-style constructs such as the lazy (non-greedy) ? modifier will not work (see the sketch after this list). Even with the constant reminders, it is still faster than working through the expression on my own.
  3. Document and test: the warning at the bottom of ChatGPT is real; it is not always right. Do your due diligence and test the query it produces against your specific data. If you’re using dbt in your stack, you also have the luxury of testing the generated expression against each row of your data with the dbt-expectations package, so the assertion keeps holding as your data evolves over time. When the test fails, find the row that breaks the assertion and use it to re-prompt ChatGPT and edit your original expression. Iteration is key, even when using AI.
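To make step 2 concrete, here is a minimal sketch of the kind of extraction I usually prompt for, assuming a hypothetical Snowflake table called analytics.events with a raw_url column (the table, column, and pattern are illustrative, not from a real project):

```sql
-- Hypothetical example: pull an order id out of a raw URL string in Snowflake.
-- Table and column names (analytics.events, raw_url) are placeholders.
-- Snowflake's regex functions follow POSIX ERE, so PCRE-style lazy quantifiers
-- (e.g. .*?) that ChatGPT often suggests will need to be rewritten.
select
    raw_url,
    regexp_substr(raw_url, 'order_id=([0-9]+)', 1, 1, 'e', 1) as order_id
from analytics.events
where raw_url is not null;
```

For step 3, dbt-expectations ships a test called expect_column_values_to_match_regex that you can attach to the derived column, so the expression keeps being asserted as new rows arrive.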

Automate what you can and adapt as needed

I am a huge fan of dbt (data build tool) and all the things it has allowed analytics engineers to do, but creating .yml files and documenting your data can be tedious. Some data models are too specific to your business context to automate; ChatGPT is, however, great for automating staging model documentation, which is typically geared toward your data sources.

  1. Prompt ChatGPT: using natural-language descriptions, you can automate tasks such as asserting that the primary key (id) is never null and always unique, and that the updated_at column is never null. It is also useful for generating basic documentation for simpler models such as users, orders, and sales. A common basic prompt I use is as follows (a sample of the generated file follows this list):
     - Generate a dbt .yml file for a model called users and describe the model accordingly.
     - Add the following columns to the .yml (try running a DESC VIEW/TABLE in your IDE to get a list of all columns quickly).
     - Ensure the primary key column (id) is unique and not_null.
     - Ensure the updated_at column is not_null.
     - Infer documentation for all columns, or otherwise leave it blank.
     It doesn’t need to stop there: you can also add more advanced dbt data tests by including prompts such as “create a test on the ‘profit margin’ column to assert it is always positive”.
  2. Fill in the blanks: the .yml generation won’t handle every column, but it will complete about 90% of the file with a comfortable level of accuracy. Just as with the regular expressions in my previous example, some human review is needed to get it over the finish line, but it takes a large portion of the work off your plate. You can also add links to source documentation and ask ChatGPT to include them in your .yml where applicable.
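To give a sense of the output, here is a sketch of the sort of file that prompt produces for a hypothetical users model (the column list and descriptions are illustrative):

```yaml
# Illustrative output only; model and column names are placeholders.
version: 2

models:
  - name: users
    description: "One row per user, loaded from the application database."
    columns:
      - name: id
        description: "Primary key for the users model."
        tests:
          - unique
          - not_null
      - name: updated_at
        description: "Timestamp of the most recent change to the record."
        tests:
          - not_null
      - name: email
        description: ""  # could not be inferred, left for a human to fill in
```

One way to express the more advanced “always positive” assertion is a test such as dbt_utils.accepted_range with min_value: 0 on the relevant column, assuming the dbt-utils package is installed.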

It’s also important to remember that you can prompt ChatGPT to complete your .yml file either as traditional .yml (i.e. with string descriptions) or using doc blocks via a Markdown file, as sketched below. Both methods of documentation are great, but the larger your project becomes, the more it pays to migrate to doc blocks, where you have the luxury of pulling documentation from upstream models into downstream resources. Stay tuned for future posts on the benefits of migrating from traditional .yml to doc blocks.
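As a rough sketch of the destination, a doc block is defined once in a Markdown file, for example `{% docs user_id %} Surrogate key for a user. {% enddocs %}` in models/docs.md, and any .yml in the project can then reference it, so upstream and downstream models share one description (the file and names below are illustrative):

```yaml
# Illustrative .yml referencing a doc block defined in models/docs.md
version: 2

models:
  - name: users
    columns:
      - name: id
        description: '{{ doc("user_id") }}'
```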

Integrating ChatGPT into my analytics engineering workflows for tasks such as automating REGEXP and dbt .yml files has significantly improved my efficiency and effectiveness with data manipulation and documentation. While I love my career choice, I’m a huge believer that you don’t have to love every part of your job. These two areas in particular are things I prefer to let AI assist me with, especially with multiple projects on the go and SLAs to meet.
