Week 0 of #100WeeksofAzureDataAI

"Maybe stories are just data with a soul.” ―?Brené Brown

Welcome to the first week of my #100WeeksofAzureDataAI! I'm very glad you're here and part of my learning journey. I started this learning journal to crystallise what I learn and build each week as a junior Data & AI Cloud Solution Architect, and I hope my learnings also enhance your understanding of the world of Data and AI.

At the beginning of this week, I was unsure where to start learning: from a data engineering, data analysis, or data science perspective. Initially, I thought about learning EVERYTHING from the bottom up, back to front, starting with databases and data engineering because that's where the data pipeline begins. But I quickly realised that plan wasn't effective because 1) the shadowing/learning opportunities might go beyond those topics, 2) it's so easy to pointlessly spend time down a rabbit hole, and 3) I'm not alone and I don't need to know everything; I have a team full of expertise that I can lean on. I'm still experimenting with my learning plan, using Agile methodology and Azure DevOps to track it. It's still too early and too unrefined for me to share, but I will.

Just to make it easier to read, there are 3 key learnings this week, with more details under each section below:

  • 1. SQL is everywhere
  • 2. Loading CSV files into a relational database isn't just CTRL+C and CTRL+V
  • 3. We can experience Internet of Things (IoT) + Artificial Intelligence (AI) + Edge Computing all in one box


Key learnings

1. SQL is everywhere

The first question that came to mind when I was putting the learning plan together was: what is the fundamental skill I should have to work in the data realm? SQL, of course! SQL (Structured Query Language) is the language used to communicate with databases. It's an expected skill for data professionals and among the top 5 most in-demand skills for ICT jobs according to ACS's 2021 Australia Digital Pulse. I learnt SQL at uni but I'm rusty now, as I haven't used it for 3 years. To get back into it, I started #100DaysofSQL, a challenge where I hold myself publicly accountable to practise SQL (almost) every day on Twitter for 100 days. I worked through the common SQL commands such as SELECT, UPDATE, CREATE, DROP, DELETE, COUNT, MIN/MAX, AVG, ORDER BY, GROUP BY and JOINs to write my own SQL statements.
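
To make that concrete, here's a small practice sketch in the style of the queries I've been writing; the customers table and its data are made up purely for illustration:

    -- made-up practice table
    CREATE TABLE customers (
        id      INT PRIMARY KEY,
        country VARCHAR(50),
        spend   DECIMAL(10, 2)
    );

    INSERT INTO customers (id, country, spend) VALUES
        (1, 'Australia',   120.50),
        (2, 'Australia',    80.00),
        (3, 'New Zealand', 150.25);

    -- aggregate and sort: average spend per country
    SELECT country,
           COUNT(*)   AS customer_count,
           AVG(spend) AS avg_spend
    FROM customers
    GROUP BY country
    ORDER BY avg_spend DESC;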

Things that were clear in my mind:


  1. Commands like SELECT, UPDATE, CREATE, DROP, DELETE, COUNT, MIN/MAX, AVG, ORDER BY and GROUP BY are intuitive ways to manipulate tables and data.
  2. The 5 types of SQL commands: DDL, DML, DCL, TCL and DQL (one example of each is sketched after this list).
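
As a memory aid, here's one example statement per category; the races table and reporting_user are made up for illustration:

    -- DDL (Data Definition Language): define structure
    CREATE TABLE races (race_id INT PRIMARY KEY, name VARCHAR(100));

    -- DML (Data Manipulation Language): change data
    INSERT INTO races (race_id, name) VALUES (1, 'Australian Grand Prix');

    -- DQL (Data Query Language): read data
    SELECT name FROM races WHERE race_id = 1;

    -- DCL (Data Control Language): manage permissions
    GRANT SELECT ON races TO reporting_user;

    -- TCL (Transaction Control Language): group changes into transactions
    BEGIN TRANSACTION;
    UPDATE races SET name = 'Melbourne Grand Prix' WHERE race_id = 1;
    COMMIT;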


Things that were still muddy in my mind:


  1. JOINs... I found a great visualisation, but off the top of my head I still don't fully understand or remember the differences between LEFT JOIN, RIGHT JOIN, FULL JOIN, SELF JOIN, INNER JOIN, OUTER JOIN... sooo many joins. Part of it is that I haven't practised JOINs enough (a small sketch follows this list).
  2. The many dialects of SQL... T-SQL, MySQL, PL/SQL, SQLite, PostgreSQL. The syntaxes aren't all the same, so how do I know which one to use? Why are there so many variations? Is one better than the others? Can I use T-SQL in non-Microsoft databases?
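
Here's the JOIN sketch mentioned above, assuming made-up drivers and results tables loosely modelled on the F1 dataset:

    -- INNER JOIN: only drivers that have at least one matching result
    SELECT d.surname, r.points
    FROM drivers AS d
    INNER JOIN results AS r ON r.driver_id = d.driver_id;

    -- LEFT JOIN: every driver, with NULL points where there is no matching result
    SELECT d.surname, r.points
    FROM drivers AS d
    LEFT JOIN results AS r ON r.driver_id = d.driver_id;

A RIGHT JOIN is just the mirror image (every result, even without a matching driver), and a FULL (OUTER) JOIN keeps unmatched rows from both sides.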

Resources I used for SQL:

  • W3Schools - basic commands and interactive querying of a dummy database
  • SQLBolt - another set of interactive SQL exercises


2. Loading CSV files into a relational database isn't just CTRL+C and CTRL+V

How did I get to this second point? I wanted to practise SQL in my own environment with an exciting Formula 1 World Championship dataset that I found on Kaggle. I thought, why don't I just download the files and QUICKLY chuck them into my database? That way I could learn how to create a relational database of my choice, in this case Azure SQL, and practise SQL. Feeding two birds with one seed, you see. I was too innocent: it actually took me 3 days to get through "chucking it into my database". Thankfully, my colleagues Rivaaj, Richard, Sergio and Liesel helped point me in the right direction. I discovered that there are many ways to load CSV files into a relational database. Copy-pasting wasn't one of them, because a schema needs to be defined before the data can be loaded into Azure SQL.

  1. Load CSV files into Azure Blob Storage, then into Azure SQL using Azure Data Factory
  2. Load CSV files into Azure Blob Storage, then into Azure SQL using a SQL notebook in Azure Data Studio
  3. Load CSV files into Azure Blob Storage, then into Azure SQL using a Shared Access Signature (a sketch of this one follows the list)
  4. Import flat files in SQL Server Management Studio (SSMS)
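
For the record, I haven't run #3 yet, but from the documentation it looks roughly like the T-SQL below; the credential name, storage account, container, SAS token and file/table names are all placeholders:

    -- Requires a database master key to already exist in the Azure SQL database.
    -- Credential wrapping the Shared Access Signature (placeholder secret).
    CREATE DATABASE SCOPED CREDENTIAL BlobSasCredential
    WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
         SECRET = 'sv=...';  -- the SAS token, without the leading '?'

    -- External data source pointing at the container that holds the CSV files.
    CREATE EXTERNAL DATA SOURCE F1BlobStorage
    WITH (TYPE = BLOB_STORAGE,
          LOCATION = 'https://<storageaccount>.blob.core.windows.net/<container>',
          CREDENTIAL = BlobSasCredential);

    -- Bulk load one CSV file into an existing table, skipping the header row.
    BULK INSERT dbo.drivers
    FROM 'drivers.csv'
    WITH (DATA_SOURCE = 'F1BlobStorage', FORMAT = 'CSV', FIRSTROW = 2);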

I managed to load CSV files using #1 and #4 this week. It was slower than I expected because I got stuck. #1 required me to provision a storage account, a SQL database, a SQL server and an Azure Data Factory. Excuse my Azure governance and naming standards, as well as putting resources all over the place. I first got stuck trying to log in to the SQL database's Query Editor; the error was "Your local network setting might be preventing the QE from issuing queries", even though I had configured firewall rules and enabled ports 443 and 1443. I later found out the issue was with the storage account: I had set the access level to "private", and it worked when I changed it to "public".

[Image: resources required to load CSV files into Blob Storage and then into Azure SQL using Data Factory]
[Image: error when logging in to the SQL database Query Editor]

The second place I got stuck was in Azure Data Factory. I was creating a pipeline that loops through 13 CSV files in Blob Storage (the storage account) and loads them into Azure SQL. Looping through each file and passing dynamic content and parameters didn't go well: it loaded 1 CSV file, and the table had no data. I went to Sergio for help, and the most valuable takeaway was watching him debug the pipeline. He validated and debugged each activity from the beginning, one by one, until he found the problem.

[Image: one CSV file loaded incorrectly]
[Image: my first Azure Data Factory pipeline, full of errors]

I had chosen the wrong activity to begin with: it was supposed to be "Get Metadata" instead of "Lookup". The parameters were incorrect, and the database size was too small. Sergio's thought process was enlightening; I had never thought of it that way. I had just built the whole pipeline and run it all at the end, failed miserably, and had no idea where it had gone wrong. This is the kind of thinking I must adopt to be an adaptable individual contributor moving forward.

After multiple rounds of validating and debugging, I could see green ticks all the way down. But the third thing I got stuck on was removing the .csv from the table names.

[Image: files loaded correctly into Azure SQL, but with .csv in the table names]
[Image: the .csv removal failing]

Although the pipeline worked and loaded all the files, the .csv was irritating to see, so I dropped (deleted) all the tables in SSMS, modified the dynamic content to add the string function replace(), and re-ran the pipeline. I read and followed the documentation. Guess what: it DIDN'T WORK. The output is in the screenshot; only one file loaded, and its name was literally "replace(item().name, ".csv", "")".



I walked over to Brian, my Irish colleague and an Azure specialist. He came around and got me to see the dumbest error: the expression was supposed to start with "@", i.e. @replace(item().name, '.csv', ''). In the end, I got all 13 files loaded correctly into the Azure SQL database (however, I still need to define the table schemas, which is TBC!).

I also successfully loaded files using #4, the import flat file wizard in SSMS. It's not as exciting, but it simplifies the load... just 8-10 clicks per file and nothing more.


Things that were clear in my mind:

  1. Connecting all required Azure resources together.
  2. The use of user-defined parameters, dynamic content, and functions in Data Factory. This is handy when I want to load multiple files into a target destination (sink).
  3. The steps I needed to take to load the files into Azure SQL, the relational database.

Things that were still muddy in my mind:

  1. Database schema vs. table schema in SSMS. This is in regards to dbo. and the data types of each column in a table. What is dbo? What is the fundamental concept I should learn? (My rough first guess is sketched after this list.)
  2. What better practices around performance, reliability, security, cost and operational processes should I be thinking of to refine what I did? And where do I start?
  3. I haven't tried methods #2 and #3, so I'm not sure yet which approach is optimised for what.
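
From what I understand so far (still to be confirmed), dbo stands for "database owner" and is simply the default schema that objects go into when no schema is specified, while a table's schema in the other sense is its column definitions and data types. A rough sketch, with made-up column types for one of the F1 tables:

    -- The dbo. prefix = the default database schema the table belongs to.
    -- The column list and data types are the "table schema" I still need to define.
    CREATE TABLE dbo.drivers (
        driver_id   INT           NOT NULL PRIMARY KEY,
        forename    NVARCHAR(50)  NOT NULL,
        surname     NVARCHAR(50)  NOT NULL,
        dob         DATE          NULL,
        nationality NVARCHAR(50)  NULL
    );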

Resources/tools I used:


3. Experiencing Internet of Things (IoT) + Artificial Intelligence (AI) + Edge Computing all in one box

Thursday was our learning day! This was the first time I got to learn in a group with other colleagues about Microsoft's latest technology, Azure Percept. It is a hardware and software platform that brings Azure services (including IoT Hub, Cognitive Services and Azure Machine Learning) together at the edge.


Sergio and Rivaaj brought their Azure Percept for us to play with. We started by assembling the kit and completing the setup and configuration, including connecting to Wi-Fi, setting up SSH, creating an IoT Hub to use with Azure Percept, and connecting the kit to the IoT Hub and our Azure account. All of this was done in less than 30 minutes (minus the time we spent trying to connect to our Microsoft network).





With the Development Kit, we have a developer board (the compute), Azure Percept Vision (the eye), and Azure Percept Audio (the ear). Check out the Azure Percept DK datasheet | Microsoft Docs for full details on the specs.


We started off by trying the out-of-the-box vision model that detects general things like a person, a bottle, and a chair. It's pretty accurate, which is very impressive given that we didn't have to do any training or testing work at all. Then we tried to find an object the model couldn't detect. We tried a pack of Nobby's salt and vinegar nuts and our Microsoft badge. The model didn't seem to understand what either of these was, so we chose to build a model that detects our badge (instead of a bag of peanuts, because of the sample size, and has there ever been a business case to detect a bag of peanuts?).


Building our own custom AI model was simple. I just captured images of different variations of our badges using the Percept camera, which I controlled from Percept Studio. All the images were stored directly in Custom Vision, waiting to be tagged. For the custom model to work, there must be at least 15 tagged images to train the model, and at least 50 if you're really after model performance. We took 21 photos and had 21 tags. The model returned 100% precision (True Positives / (True Positives + False Positives)) and 50% recall (True Positives / (True Positives + False Negatives)). I would say our model's performance still needs improvement, and it needs way more images of our badge. Is it 100% precision just because there's only one tag... and at 50% recall, is the model maybe no better than random guessing?
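
Here's my tentative attempt to make sense of those two numbers, with purely illustrative counts (not our actual validation results): suppose the test images contain 10 badges and the model flags 5 objects as badges, all of them correctly. Then True Positives = 5, False Positives = 0 and False Negatives = 5, so precision = 5 / (5 + 0) = 100% (everything it called a badge really was one), while recall = 5 / (5 + 5) = 50% (it only found half of the badges that were actually there). High precision with low recall would mean the model is conservative: it rarely raises a false alarm, but it misses a lot.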



Despite the model's performance, we had a model we wanted to deploy to the edge device. That could be achieved with a single click in Percept Studio. Voilà! Vanessa, Sergio, and Rivaaj tested the detection using their badges. It seems the model recognises the circular photo more than anything else... also, the closer the badge, the higher the confidence level.


While we were viewing the web stream video, we also had live telemetry in JSON format in real time in Percept Studio. Live telemetry outputs a log that is automatically sent to Azure IoT Hub, which can then be consumed by an app or connected to Stream Analytics or Event Grid.

Although it was simple for us to get our prototype up and running at the speed of light, we still need to understand the fundamental concepts of networking, IoT, machine learning, and cloud computing. Otherwise, it wouldn't be as simple as starting a proof of concept within minutes. The day after our learning day, Vanessa and I also spent some time explaining to each other the steps to reproduce our AI model at the edge using Azure Percept. We didn't have much time for the Audio (ear) module, and we ran into a bug that is still pending a resolution. I think Azure Percept is fascinating and deserves a tutorial article of its own, so I'll try to write something up next week.

[Image: Vanessa and I recapping what we learnt with Azure Percept]


Things that were clear in my mind:

  1. The purpose of Azure Percept. With where the world is evolving, there's often a need for an instantaneous, real-time response from and back to the physical world, which demands the intelligent edge: extending computing power and AI to the edge. In the past, this has always been a time-consuming project to develop, deploy and operate, but Azure Percept lets us start prototyping our best idea in less than an hour.
  2. The process of training a Custom Vision model and deploying it to the edge device.
  3. The process of enabling the voice module and changing the custom keywords and commands of the voice assistant.
  4. Azure Percept has to be on the same network; otherwise, the live stream can't be viewed.

Things that were still muddy in my mind:

  1. I still can't confidently explain in simple, plain English why recall is 50% and precision is 100% (my rough attempt above still needs verifying).
  2. In Custom Vision, why is the Compact domain recommended for real-time classification on edge devices when the General domain works just fine? Does domain selection noticeably affect model performance?
  3. My vision of end-to-end solutions using Azure Percept is very blurry. It's more a question of what comes next after getting live telemetry. I get that it can be consumed by other Azure services such as Stream Analytics or Event Grid, but what's next and why? Can we go directly to Power BI if we want to create a dashboard? Perhaps the next thing we should do is work on an end-to-end solution: the "after live telemetry" story. (I've sketched one possible route after this list.)
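
One route I want to try is IoT Hub input -> Stream Analytics -> Power BI output, sketched here as a Stream Analytics query. The input/output aliases and the telemetry field names below are my guesses, not the actual Percept schema:

    -- Stream Analytics query: IoT Hub input -> filter -> Power BI output.
    -- [percept-iothub] and [powerbi-dashboard] are alias names I made up,
    -- and label/confidence are assumed field names in the Percept telemetry JSON.
    SELECT
        label,
        confidence,
        System.Timestamp() AS eventTime
    INTO
        [powerbi-dashboard]
    FROM
        [percept-iothub]
    WHERE
        confidence > 0.7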

Resources I used:


Thank you for spending time reading this. Don't hold back on your constructive feedback; it will be greatly appreciated. For the muddy things, I'll come back with more explanation after next week's research. #Azure. Invent with purpose.

That’s a great read! Well done!

Trevor Drummond

Principal Consultant | Applied AI & ML Specialist | Data nerd

3y

That's an amazing amount of ground to cover in a week, congratulations!

Daniella Mathews

Learning 3D Animation @ Animation Mentor, learning Animation and Game Design @ Curtin University. Taking a break from being a Cloud Solution Architect @ Microsoft

3y

Love your work Jia! Really enjoyed the blurry bits. I agree that W3Schools is a great resource. Isn't it amazing how much you can learn in a week!?

Jonathan Wade

Executive Leader | Modern Work and Security Lead at Microsoft | Dyslexic Thinker

3y

What a great start to your CSA journey and a fantastic way to kick off your #100weeksofAzure
