Week 0 of #100WeeksofAzureDataAI

"Maybe stories are just data with a soul.” ―?Brené Brown

Welcome to the first week of my #100WeeksofAzureDataAI! I'm very glad you're here and part of my learning journey. I started this learning journal to crystallise what I learn and build each week as a junior Data & AI Cloud Solution Architect, and I hope my learnings also enhance your understanding of the world of Data and AI.

At the beginning of this week, I was unsure where to start learning: from a data engineering, data analysis, or data science perspective. Initially, I thought about learning EVERYTHING from the bottom up, back to front, starting with databases and data engineering because that's where the data pipeline begins. But I quickly realised that plan wasn't effective because 1) the shadowing/learning opportunities might go beyond those topics, 2) it's so easy to pointlessly spend time down a rabbit hole, and 3) I'm not alone and I don't need to know everything; I have a team full of expertise that I can lean on. I'm still experimenting with my learning plan, using Agile methodology and Azure DevOps to track it. It's still too early and too unrefined for me to share, but I will.

Just to make it easier to read, there are 3 key learnings this week, with more details under each section below:

  • 1. SQL is everywhere
  • 2. Loading CSV files into a relational database isn't just CTRL+C and CTRL+V
  • 3. We can experience Internet of Things (IoT) + Artificial Intelligence (AI) + Edge Computing all in one box


Key learnings

1. SQL is everywhere

The first question that came to mind when I was putting the learning plan together was: what is the fundamental skill I should have to work in the data realm? SQL, of course! SQL (Structured Query Language) is the language used to communicate with databases. It's an expected skill for data professionals and among the top 5 most in-demand skills for ICT jobs according to ACS's 2021 Australia Digital Pulse. I learnt SQL at uni but I'm rusty now, as I haven't used it for 3 years. To get back into it, I started #100DaysofSQL, a challenge where I hold myself publicly accountable to practise SQL (almost) every day on Twitter for 100 days. I worked through the common SQL commands such as SELECT, UPDATE, CREATE, DROP, DELETE, COUNT, MIN/MAX, AVG, ORDER BY, GROUP BY and JOINs to write my own SQL statements.
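
To make that concrete, here's a small practice sketch in the style of the queries I've been writing; the customers table and its data are made up purely for illustration:

    -- made-up practice table
    CREATE TABLE customers (
        id      INT PRIMARY KEY,
        country VARCHAR(50),
        spend   DECIMAL(10, 2)
    );

    INSERT INTO customers (id, country, spend) VALUES
        (1, 'Australia',   120.50),
        (2, 'Australia',    80.00),
        (3, 'New Zealand', 150.25);

    -- aggregate and sort: average spend per country
    SELECT country,
           COUNT(*)   AS customer_count,
           AVG(spend) AS avg_spend
    FROM customers
    GROUP BY country
    ORDER BY avg_spend DESC;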

Things that were clear in my mind:


  1. Commands like SELECT, UPDATE, CREATE, DROP, DELETE, COUNT, MIN/MAX, AVG, ORDER BY and GROUP BY are intuitive ways to manipulate tables and data.
  2. The 5 types of SQL commands: DDL, DML, DCL, TCL and DQL (one example of each is sketched after this list).
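
As a memory aid, here's one example statement per category; the races table and reporting_user are made up for illustration:

    -- DDL (Data Definition Language): define structure
    CREATE TABLE races (race_id INT PRIMARY KEY, name VARCHAR(100));

    -- DML (Data Manipulation Language): change data
    INSERT INTO races (race_id, name) VALUES (1, 'Australian Grand Prix');

    -- DQL (Data Query Language): read data
    SELECT name FROM races WHERE race_id = 1;

    -- DCL (Data Control Language): manage permissions
    GRANT SELECT ON races TO reporting_user;

    -- TCL (Transaction Control Language): group changes into transactions
    BEGIN TRANSACTION;
    UPDATE races SET name = 'Melbourne Grand Prix' WHERE race_id = 1;
    COMMIT;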


Things that were still muddy in my mind:


  1. JOINs... I found a great visualisation, but off the top of my head I still don't fully understand or remember the differences between LEFT JOIN, RIGHT JOIN, FULL JOIN, SELF JOIN, INNER JOIN, OUTER JOIN... sooo many joins. Part of it is that I haven't practised JOINs enough (a small sketch follows this list).
  2. The many dialects of SQL... T-SQL, MySQL, PL/SQL, SQLite, PostgreSQL. The syntaxes aren't all the same, so how do I know which one to use? Why are there so many variations? Is one better than the others? Can I use T-SQL in non-Microsoft databases?
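
Here's the JOIN sketch mentioned above, assuming made-up drivers and results tables loosely modelled on the F1 dataset:

    -- INNER JOIN: only drivers that have at least one matching result
    SELECT d.surname, r.points
    FROM drivers AS d
    INNER JOIN results AS r ON r.driver_id = d.driver_id;

    -- LEFT JOIN: every driver, with NULL points where there is no matching result
    SELECT d.surname, r.points
    FROM drivers AS d
    LEFT JOIN results AS r ON r.driver_id = d.driver_id;

A RIGHT JOIN is just the mirror image (every result, even without a matching driver), and a FULL (OUTER) JOIN keeps unmatched rows from both sides.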

Resources I used for SQL:

  • W3Schools - basic commands and interactive querying of a dummy database
  • SQLBolt - another set of interactive SQL exercises


2. Loading CSV files into a relational database isn't just CTRL+C and CTRL+V

How did I get to this second point? I wanted to practise SQL in my own environment with an exciting Formula 1 World Championship dataset that I found on Kaggle. I thought, why don't I just download the files and QUICKLY chuck them into my database? That way I could learn how to create a relational database of my choice, in this case Azure SQL, and practise SQL. Feeding two birds with one seed, you see. I was too innocent: it actually took me 3 days to get through "chucking it into my database". Thankfully, my colleagues Rivaaj, Richard, Sergio and Liesel helped point me in the right direction. I discovered that there are many ways to load CSV files into a relational database. Copy-pasting wasn't one of them, because a schema needs to be defined before the data can be loaded into Azure SQL.

  1. Load CSV files into Azure Blob Storage, then into Azure SQL using Azure Data Factory
  2. Load CSV files into Azure Blob Storage, then into Azure SQL using a SQL notebook in Azure Data Studio
  3. Load CSV files into Azure Blob Storage, then into Azure SQL using a Shared Access Signature (a sketch of this one follows the list)
  4. Import flat files in SQL Server Management Studio (SSMS)
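
For the record, I haven't run #3 yet, but from the documentation it looks roughly like the T-SQL below; the credential name, storage account, container, SAS token and file/table names are all placeholders:

    -- Requires a database master key to already exist in the Azure SQL database.
    -- Credential wrapping the Shared Access Signature (placeholder secret).
    CREATE DATABASE SCOPED CREDENTIAL BlobSasCredential
    WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
         SECRET = 'sv=...';  -- the SAS token, without the leading '?'

    -- External data source pointing at the container that holds the CSV files.
    CREATE EXTERNAL DATA SOURCE F1BlobStorage
    WITH (TYPE = BLOB_STORAGE,
          LOCATION = 'https://<storageaccount>.blob.core.windows.net/<container>',
          CREDENTIAL = BlobSasCredential);

    -- Bulk load one CSV file into an existing table, skipping the header row.
    BULK INSERT dbo.drivers
    FROM 'drivers.csv'
    WITH (DATA_SOURCE = 'F1BlobStorage', FORMAT = 'CSV', FIRSTROW = 2);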

I managed to load CSV files using #1 and #4 this week. It was slower than I expected because I got stuck. #1 required me to provision a storage account, a SQL database, a SQL server and an Azure Data Factory. Excuse my Azure governance and naming standards, as well as putting resources all over the place. I first got stuck trying to log in to the SQL database's Query Editor; the error was "Your local network setting might be preventing the QE from issuing queries", even though I had configured firewall rules and enabled ports 443 and 1443. I later found out the issue was with the storage account: I had set the access level to "private", and it worked when I changed it to "public".

[Image: resources required to load CSV files into Blob Storage and then into Azure SQL using Data Factory]
[Image: error when logging in to the SQL database Query Editor]

The second place I got stuck was in Azure Data Factory. I was creating a pipeline that loops through 13 CSV files in Blob Storage (the storage account) and loads them into Azure SQL. Looping through each file and passing dynamic content and parameters didn't go well: it loaded 1 CSV file, and the table had no data. I went to Sergio for help, and the most valuable takeaway was watching him debug the pipeline. He validated and debugged each activity from the beginning, one by one, until he found the problem.

[Image: one CSV file loaded incorrectly]
[Image: my first Azure Data Factory pipeline, full of errors]

I had chosen the wrong activity to begin with: it was supposed to be "Get Metadata" instead of "Lookup". The parameters were incorrect, and the database size was too small. Sergio's thought process was enlightening; I had never thought of it that way. I had just built the whole pipeline and run it all at the end, failed miserably, and had no idea where it had gone wrong. This is the kind of thinking I must adopt to be an adaptable individual contributor moving forward.

After multiple rounds of validating and debugging, I could see green ticks all the way down. But the third thing I got stuck on was removing the .csv from the table names.

[Image: files loaded correctly into Azure SQL, but with .csv in the table names]
[Image: the .csv removal failing]

Although the pipeline worked and loaded all the files, the .csv was irritating to see, so I dropped (deleted) all the tables in SSMS, modified the dynamic content to add the string function replace(), and re-ran the pipeline. I read and followed the documentation. Guess what: it DIDN'T WORK. The output is in the screenshot; only one file loaded, and its name was literally "replace(item().name, ".csv", "")".



I walked over to Brian, my Irish colleague and an Azure specialist. He came around and got me to see the dumbest error: the expression was supposed to start with "@", i.e. @replace(item().name, '.csv', ''). In the end, I got all 13 files loaded correctly into the Azure SQL database (however, I still need to define the table schemas, which is TBC!).

I also successfully loaded files using #4, the import flat file wizard in SSMS. It's not as exciting, but it simplifies the load... just 8-10 clicks per file and nothing more.


Things that were clear in my mind:

  1. Connecting all required Azure resources together.
  2. The use of user-defined parameters, dynamic content, and functions in Data Factory. This is handy when I want to load multiple files into a target destination (sink).
  3. The steps I needed to take to load the files into Azure SQL, the relational database.

Things that were still muddy in my mind:

  1. Database schema vs. table schema in SSMS. This is in regards to dbo. and the data types of each column in a table. What is dbo? What is the fundamental concept I should learn? (My rough first guess is sketched after this list.)
  2. What better practices around performance, reliability, security, cost and operational processes should I be thinking of to refine what I did? And where do I start?
  3. I haven't tried methods #2 and #3, so I'm not sure yet which approach is optimised for what.
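
From what I understand so far (still to be confirmed), dbo stands for "database owner" and is simply the default schema that objects go into when no schema is specified, while a table's schema in the other sense is its column definitions and data types. A rough sketch, with made-up column types for one of the F1 tables:

    -- The dbo. prefix = the default database schema the table belongs to.
    -- The column list and data types are the "table schema" I still need to define.
    CREATE TABLE dbo.drivers (
        driver_id   INT           NOT NULL PRIMARY KEY,
        forename    NVARCHAR(50)  NOT NULL,
        surname     NVARCHAR(50)  NOT NULL,
        dob         DATE          NULL,
        nationality NVARCHAR(50)  NULL
    );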

Resources/tools I used:


3. Experiencing Internet of Things (IoT) + Artificial Intelligence (AI) + Edge Computing all in one box

Thursday was our learning day! This was the first time I got to learn in a group with other colleagues about Microsoft's latest technology, Azure Percept. It is a hardware and software platform that brings Azure services (including IoT Hub, Cognitive Services and Azure Machine Learning) together at the edge.


Sergio and Rivaaj brought their Azure Percept for us to play with. We started by assembling the kit and completing the setup and configuration, including connecting to Wi-Fi, setting up SSH, creating an IoT Hub to use with Azure Percept, and connecting the kit to the IoT Hub and our Azure account. All of this was done in less than 30 minutes (minus the time we spent trying to connect to our Microsoft network).





With the Development Kit, we have a developer board (the compute), Azure Percept Vision (the eye), and Azure Percept Audio (the ear). Check out the Azure Percept DK datasheet | Microsoft Docs for full details on the specs.


We started off by trying the out-of-the-box vision model that detects general things like a person, a bottle, and a chair. It's pretty accurate, which is very impressive given that we didn't have to do any training or testing work at all. Then we tried to find an object the model couldn't detect. We tried a pack of Nobby's salt and vinegar nuts and our Microsoft badge. The model didn't seem to understand what either of these was, so we chose to build a model that detects our badge (instead of a bag of peanuts, because of the sample size, and has there ever been a business case to detect a bag of peanuts?).


Building our own custom AI model was simple. I just captured images of different variations of our badges using the Percept camera, which I controlled from Percept Studio. All the images were stored directly in Custom Vision, waiting to be tagged. For the custom model to work, there must be at least 15 tagged images to train the model, and at least 50 if you're really after model performance. We took 21 photos and had 21 tags. The model returned 100% precision (True Positives / (True Positives + False Positives)) and 50% recall (True Positives / (True Positives + False Negatives)). I would say our model's performance still needs improvement, and it needs way more images of our badge. Is it 100% precision just because there's only one tag... and at 50% recall, is the model maybe no better than random guessing?
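
Here's my tentative attempt to make sense of those two numbers, with purely illustrative counts (not our actual validation results): suppose the test images contain 10 badges and the model flags 5 objects as badges, all of them correctly. Then True Positives = 5, False Positives = 0 and False Negatives = 5, so precision = 5 / (5 + 0) = 100% (everything it called a badge really was one), while recall = 5 / (5 + 5) = 50% (it only found half of the badges that were actually there). High precision with low recall would mean the model is conservative: it rarely raises a false alarm, but it misses a lot.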



Despite the model's performance, we had a model we wanted to deploy to the edge device. That could be achieved with a single click in Percept Studio. Voilà! Vanessa, Sergio, and Rivaaj tested the detection using their badges. It seems the model recognises the circular photo more than anything else... also, the closer the badge, the higher the confidence level.


While we were viewing the web stream video, we also had live telemetry in JSON format in real time in Percept Studio. Live telemetry outputs a log that is automatically sent to Azure IoT Hub, which can then be consumed by an app or connected to Stream Analytics or Event Grid.

Although it was simple for us to get our prototype up and running at the speed of light, we still need to understand the fundamental concepts of networking, IoT, machine learning, and cloud computing. Otherwise, it wouldn't be as simple as starting a proof of concept within minutes. The day after our learning day, Vanessa and I also spent some time explaining to each other the steps to reproduce our AI model at the edge using Azure Percept. We didn't have much time for the Audio (ear) module, and we ran into a bug that is still pending a resolution. I think Azure Percept is fascinating and deserves a tutorial article of its own, so I'll try to write something up next week.

[Image: Vanessa and I recapping what we learnt with Azure Percept]


Things that were clear in my mind:

  1. The purpose of Azure Percept. With where the world is evolving, there's often a need for an instantaneous, real-time response from and back to the physical world, which demands the intelligent edge: extending computing power and AI to the edge. In the past, this has always been a time-consuming project to develop, deploy and operate, but Azure Percept lets us start prototyping our best idea in less than an hour.
  2. The process of training a Custom Vision model and deploying it to the edge device.
  3. The process of enabling the voice module and changing the custom keywords and commands of the voice assistant.
  4. Azure Percept has to be on the same network; otherwise, the live stream can't be viewed.

Things that were still muddy in my mind:

  1. I still can't confidently explain in simple, plain English why recall is 50% and precision is 100% (my rough attempt above still needs verifying).
  2. In Custom Vision, why is the Compact domain recommended for real-time classification on edge devices when the General domain works just fine? Does domain selection noticeably affect model performance?
  3. My vision of end-to-end solutions using Azure Percept is very blurry. It's more a question of what comes next after getting live telemetry. I get that it can be consumed by other Azure services such as Stream Analytics or Event Grid, but what's next and why? Can we go directly to Power BI if we want to create a dashboard? Perhaps the next thing we should do is work on an end-to-end solution: the "after live telemetry" story. (I've sketched one possible route after this list.)
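
One route I want to try is IoT Hub input -> Stream Analytics -> Power BI output, sketched here as a Stream Analytics query. The input/output aliases and the telemetry field names below are my guesses, not the actual Percept schema:

    -- Stream Analytics query: IoT Hub input -> filter -> Power BI output.
    -- [percept-iothub] and [powerbi-dashboard] are alias names I made up,
    -- and label/confidence are assumed field names in the Percept telemetry JSON.
    SELECT
        label,
        confidence,
        System.Timestamp() AS eventTime
    INTO
        [powerbi-dashboard]
    FROM
        [percept-iothub]
    WHERE
        confidence > 0.7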

Resources I used:


Thank you for spending time reading this. Don't hold back on your constructive feedback; it will be greatly appreciated. For the muddy things, I'll come back with more explanation after next week's research. #Azure. Invent with purpose.

That’s a great read! Well done!

Trevor Drummond

Principal Consultant | Applied AI & ML Specialist | Data nerd

3y

That's an amazing amount of ground to cover in a week, congratulations!

Daniella Mathews

Learning 3D Animation @ Animation Mentor, learning Animation and Game Design @ Curtin University. Taking a break from being a Cloud Solution Architect @ Microsoft

3y

Love your work Jia! Really enjoyed the blurry bits. I agree that W3Schools is a great resource. Isn't it amazing how much you can learn in a week!?

Jonathan Wade

Executive Leader | Modern Work and Security Lead at Microsoft | Dyslexic Thinker

3y

What a great start to your CSA journey and a fantastic way to kick off your #100weeksofAzure
