Google Sheets, KNIME ETL, CSVs, Tigers, Bears, Excel, Tableau, oh my.
A brief working demo, you can follow along too. Use Case; Tableau Users need different calculations on the front end, previous Tableau developer nested everything in the same workbook, we want to put the complexities of the ETL in an ETL product.
Depending on your feedback, I will generate a series of video tutorials explaining everything, to be consumed at your pace.
Let's see what you will get when you're done with this blog.
Today, I'm excited to use a lightweight ETL process in KNIME.
Tableau can handle this problem... However, tableau is not a data warehousing tool.
I'm going to show you the entire setup, which will enable more advanced analytics capabilities and increase your collaborations skills.
I'm going to cover Google Sheets, KNIME ETL, CSVs, Snakes, and Spreadsheets.
And. Tableau. Tigers, bears, etc.
The first time I heard about google sheets?...
A non-technical major giggled directly in my face and was quick to say; "have you been living under a rock."
You wake up to an email, it's a link to google sheets, but you've never used google sheets...
The first time I saw Google sheets... Boy howdy. Here we go...
The capstone course at UNT, group project, summer semester, we get to the computer lab and I'm very familiar.
I was the tutor for databases in this lab, it was how I paid the bills.
Here's a girl laughing at me.
Here I am, coming across as MR.Technical guy, "i will do the powerpoint for us."
She said, "have you heard of Slides." And I said, "no."
That was all.
And I was paired into the group as the technology person.
She taught me something we technical and non-technical people do.
She poked me for an answer to a question, to see if I was "ramped up and technical enough to work with her collaboration skills."
And we do this with SQL, Tableau consulting, EDWs, join logic, string logic...
We leave obvious or clearly bad logic, just to see if anyone is watching.
We like to add little buzz words, tricks, and always learning about the people we interact with because deep down we want to help them learn every aspect of technology.
You wake up to a .sql file, you have no idea how to open the file, and when you do, it's gibberish.
You ask them for the data, not the SQL queries... lol
If you're like me, you learned about each file extension as you go.
Let's blog about the essentials of getting work done. And working with 'other data inputs' like other people.
Because sometimes, you need to share a data source.
KEY: You need gmail to start, but you don't need gmail to use google sheets!
- Use https://docs.google.com/spreadsheets/
- (beginner) Download knime on my website and read blogs by me @ https://knime.dev
- https://knime.com the official website
- (advanced) https://www.knime.com/learning/cheatsheets cheat sheets :)
- Export to CSV in google sheets
- Use as data source in KNIME
- Push to front-end tool
When getting stuff done, everyone reverts to "what's easy" and trello, slack, etc... not easy every time, so I resort to Google sheets.
People jump on spreadsheets without much ramp-up, Google sheets is a great product.
Let's discuss.
Using Google Sheets.
- you need a gmail account
- visit https://docs.google.com/spreadsheets/
- update the title; Data_Tables
- build the following table in Google sheets.
- Click SHARE to get a link, or share with ANYONE... (so easy fam)
6. click file/download/csv...
7. find your download, put your CSV data source in an easy directory, or a new folder.
8. We will use this file in KNIME.
NOTICE: Don't stress about this manual process, in the biggest corporations around the world, people are still using powerpoint slides, so don't stress out about the data input.
I see everyone hit the brakes because 1 simple step is not accomplished, "we need to automate connecting to...." and I say, "NO WE DON'T, give me the CSV."
Stopping because 1 thing isn't perfect means you have a lack of understanding with modular based programming and that thought process will help you with solving in multi step projects.
Most don't care about this part, if someone worries about it being automated, don't let it get to you because it's not as important as it appears. That stuff is "developed" and "ready"... but that's probably an entirely too advanced blog to walk through when you're just learning the WORDS.
ETL.. *
Extract - extract data from a source table, file, api, database, etc...
Transform - transform data in an application like KNIME, excel, SSIS, alteryx, etc..
Load - load data into the same file format, on a shared drive, in an email, database, or all of these locations at the same time.
Using KNIME Analytics Platform
New? Welcome to a free *ETL product, data science, and lots of fun stuff.
But don't want to intimidate you because these apps can be scary, they are advanced, and you can connect to databases, files, build your own custom API, webscrape...
Excited yet?
Is knime free? Yes, knime is free.
Knime is awesome, I'm working on automating my taxes in Knime.
You need to learn "what to search" to find the right stuff in KNIME.
It's a lot like alteryx, ssis, sap data services, etc.
- download knime on my website https://knime.dev
- walk through the install process
- open KNIME! (triangle)
- start a new workflow, next next next. You know the drill.
Finding "knime file reader" node is relatively easy. There are other "knime csv reader" nodes but these nodes are different from each other. File reader can read files, in my experience, better than csv reader. Which is great because I like things that work easily. However, if you're advanced, please visit csv reader for advanced solving.
Also, if you want to "write csvs in knime" please check out the blog - or follow our Reddit @ analytics automation.
Open file reader
- double click or right click, navigate to configure, i made a macro to open "E"
- find your file/csv!
- open it...
Don't get distracted, however, this is where you find most things you need per node.
Let's continue.
BROWSE to the CSV :)
Now, we see our data.
You're done.
This is an automated pull.
If the CSV file updates, when you execute the workflow, you will automatically see the changes. We can stress about complex google sheets connection in a future world where we get our hair cut and shave exactly the same time every week, concurrently, lol.
What I just explained, is what KNIME does for free. Batch scheduling, and you can pick the frequency. Which is pretty sweet.
Sorry. Did you press OK?
Press OK.
Find Column Rename and String Manipulation in the Node repository, utilizing the search functionality we showed in the screenshot above.
You can do it :)
Automate renaming the headers in KNIME
Let's automate renaming the headers because who wants to click buttons.
I don't want to tell someone to "change your headers" when I can fix it in less than the time it takes me to email them, get them on slack, etc.
Okay... Not scary...
Double click Full Name. Let's change it.
Adding an underscore, is helpful. Did you know R can't handle a lot of regular column headers, which makes data science difficult, then they need to make the change. Let's not dig into this on this blog but I want to show you how you can "help your data science person." Because this requires no code.
We can do this in SQL...
- Select
- Region,
- Key,
- "Full Name" as "Full_Name"
- from Data_table.hr_table.csv
But doing this, to do something easy, is a lot of work when the SQL turns into 300k lines of code, which not a tall glass of water for people like myself, and other data architects...
Which is why we move towards these new products, that are user friendly, we need user friendly solutions because explaining the SQL above may take longer than this blog.
Above... I want your screen to look like the above!
Automate concatenating columns in KNIME
Double click the string manipulation tool. Like the workflow above, make sure they are connected to each node. You can drag and drop the input to the output.
Open string manipulation tool, if you haven't already.
Okay, not that intimidating. Some of these functions will do everything you expect other products to do, you just need to learn the functions, similar to any product. Today we will work through "joining two columns."
Joining two columns in Knime or concatenating string characters.
To begin, we need to find JOIN()...
Okay; here's the screen cap.
Join(X, Y) = XY... Rad.
Name the new column rad.
If you're following along, you can copy paste the code; join($Key$,$Full_Name$)
Dollar signs?! Don't stress. This is how columns are stored in KNIME.
Pretend it's Tableau... contains([Key],[Full_Name]) - notice brackets break up the text. Similar to the dollar signs, nothing crazy.
Shall we continue?
Be sure to name that column rad. Click OKAY.
Making your string become a number in knime
Woops... We are learning.
I'm reading a lot of jibberish too.
But the key jibber is "Integer, String"...
You have $key$ as an integer, doesn't work.
You can't join a number and a string.
Let's wrap STRING() around our key.
Running your first workflow in KNIME
It's time. Press that button.
Your traffic lights turn green.
Right click String manipulation.
This button opens a data table.
Or in the top menu next to the play button, once selected on "most nodes," you will be able to access the data.
Far right button with the magnifying glass... Opens data...
As you hit play, the software auto-caches the data, you don't need to "run it again." Which is a big plus.
Here's a quick high level view. You're using the CSV from the Google Sheets.
Most will jump to "let's connect to google sheets"...
However, if you can start the work now, and show value today, you don't need to learn to connect to a complex database or source. You can learn that tomorrow.
Excel writer XLS
Okay, let's write to excel using KNIME.
We will set it to "overwrite existing file" and know that it will delete any file with the same naming condition.
We call it output1. I like adding a number, to start a version system and no fuss.
Click OK and run the workflow.
Before we leave this screen...
I want to break apart this 1 step or it's a "got you."
- Header-less CSVs are helpful when the opportunity arises.
- Remember - you know it's default set unchecked.
- If you missed the "add column header step"... Don't delete the file.
- Remember - we se the configuration to "overwrite..."
- It will automatically do the processing for us. Forever.
- Check for the file.
- Open the excel file.
If you're keeping up, your screen should look similar.
Moving into Tableau Desktop...
You open the spreadsheet/excel as a data source.
Click Connect to data. Select Microsoft Excel.
Navigate to your spreadsheet. Output1.
With tableau open, data loaded, make a basic viz, double click Rad, and Key.
You've accomplished a lot :)
Drag and drop [key] from dimensions into measures.
Add [key] to the visualization.
Congrats, you're now doing ETL, transition data, and likely a jedi.
Fun times. SQL would have been rather easy but fixing it, or supporting it, or delegating that to someone who doesn't know the language, is a BIG negative.
Thanks for your time,
Tyler Garrett, rogue tableau consultant.
Future Tree Trimming expert in Austin Texas.
Want to see my most recent blogs on LinkedIn?
Co-founder of Kids in Data ? Head of Technology ? STEM Ambassador ? Data Literacy Advocate ? Public Speaker
5 年Once again another great blog. Keep up the great work.