How to get your data into Google BigQuery
Orkhan Atesh Isazade
Digital Marketing Expert | Web Analyst | Driving Data-Driven Growth | Transforming Data into Insights | Accelerating ROI | Innovating Strategies
This article covers my eleventh-week review of studying the Digital Analytics Minidegree at CXL Institute. What is CXL Institute?
CXL Institute is a paid training institute that provides Minidegrees and online certification courses in marketing. All of its programs are taught by industry leaders and top marketers, which makes it one of the best places for marketing teams to learn.
CXL Institute provides Minidegrees in five areas of Digital Marketing:
- Conversion Optimization
- Customer Acquisition
- Digital Analytics
- Digital Psychology and Persuasion
- Growth Marketing
CXL Institute currently has 48 online courses divided into CRO & UX, Analytics, and Marketing. In this article, I am going to talk about how to get your data into BigQuery.
To begin with, I would like to thank Khrystyna Grynko, Head of Data at Better & Stronger, for letting me use her presentation in this article and for helping me out all along.
So let's see how to get your data into BigQuery. Everything lives inside a project, and inside the project you create datasets. I have already created a sales dataset; now I'm going to create another one so you can see how it's done. Imagine you want a dataset that only receives CRM data.
Let's call it CRM. Next, choose a location: if you're in Europe, you may want to choose the European Union location, which means your data is stored on EU servers. You can also set a table expiration, so that every table inside the dataset is deleted after a certain number of days; we will not set one here. That's it. Let's create the dataset, and here it is.
Now let's try to create a table. There are several options: you can start from scratch and create an empty table that will then receive data, or you can load data from Google Cloud Storage, Google Drive, your desktop, or Cloud Bigtable. We will not cover Cloud Bigtable here; it's another Google Cloud service, a NoSQL database service, and outside the scope of this course.

So let's start with an empty table. To create one, let's name it table1 and provide a schema. I'll supply a JSON schema so that BigQuery knows which fields we want to include in this table. Here, we say that we want a field named Region, with the description "Region". We can add some more information: this will be a REQUIRED field (otherwise we would mark it NULLABLE), and its type is STRING, i.e. text. The second field is Total Revenue; it is also REQUIRED, and its type is FLOAT. For every field in the schema there are only four options to set, so it's quite simple, not rocket science. I will provide this file so you can use it as an example. Okay, let's click Create table and see if it's okay.
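Based on the fields described above, the schema file would look roughly like this (the underscore in `Total_Revenue` is my assumption, since BigQuery field names cannot contain spaces):

```json
[
  {
    "name": "Region",
    "type": "STRING",
    "mode": "REQUIRED",
    "description": "Region"
  },
  {
    "name": "Total_Revenue",
    "type": "FLOAT",
    "mode": "REQUIRED",
    "description": "Total Revenue"
  }
]
```

Those four properties per field (name, type, mode, description) are exactly the "four options" mentioned above.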
So we've created our first empty table. Let's check out a second option for getting your data into BigQuery: creating a table by loading data from Google Cloud Storage. If we browse here, we see we've got nothing in our Cloud Storage yet, so let's go to Cloud Storage and create a bucket. Here is my empty Cloud Storage. First, we need to create a bucket; let's say it is going to hold the CRM data. Bucket names are unique on a global scale, so you'll need to give your bucket a globally unique name. Yeah, that's perfect. Let's say the data only needs to be available in one region, so I'll choose a European one, the closest to France: Belgium. Here we can also keep the default access control, Fine-grained. That's it. Click the Create button, and once the bucket exists, upload the sales file into it.
Here's our Sales file. Let's go back to BigQuery, open our dataset, click Create table, choose Google Cloud Storage as the source, and browse to our bucket. Here it is: our bucket contains only one file, called Sales Records. We select it and name the table. We will ask BigQuery to auto-detect the schema of the table. If BigQuery cannot define the schema correctly (for example, if it reads numbers as text, or if you have some unusual data and want to be explicit), you can specify the schema with JSON, as we did with the empty table; otherwise, just click Auto-detect. No partitioning for now; we will see what partitioning means in future lessons. Let's click Create table. Okay, so here it is. Let's see what we have in this table.
And let's see if BigQuery detected the types of the fields correctly. The Cost, Profit, Revenue, and Price fields are all defined as FLOAT; Units_Sold and the ID field as INTEGER; Ship_Date and Order_Date as DATE; and the other fields as STRING, i.e. text. Looks perfect. So we created our table using Cloud Storage. Now let's try other options for loading your data into BigQuery.
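To make auto-detect less of a black box, here is a deliberately simplified, hypothetical illustration of the kind of per-value inference it performs. The real detector samples many rows and handles far more types (TIMESTAMP, BOOLEAN, and so on); this sketch only covers the types we just saw.

```python
import re

def infer_bigquery_type(value: str) -> str:
    """Rough sketch of per-value type inference, BigQuery style."""
    if re.fullmatch(r"-?\d+", value):
        return "INTEGER"
    if re.fullmatch(r"-?\d+\.\d+", value):
        return "FLOAT"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "DATE"
    return "STRING"

print(infer_bigquery_type("9925"))        # INTEGER
print(infer_bigquery_type("1593.05"))     # FLOAT
print(infer_bigquery_type("2020-05-01"))  # DATE
print(infer_bigquery_type("Europe"))      # STRING
```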
So here's our dataset again; let's click Create table. You can simply upload a file from your desktop computer: click Browse and pick a file previously saved to your computer, such as Sales Records. BigQuery automatically detects the format, and note that several formats are accepted for upload: CSV, JSON, Avro, Parquet, and others. Let's name this table, click Auto-detect schema to see if BigQuery understands our schema automatically, and click Create table.
Oh, here it is. Let me check the Preview mode. It seems okay, and the data looks right: Total_Revenue is in FLOAT format, Ship_Date and Order_Date are in DATE format, and Item_Type is a STRING. Cool, it worked.
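Stepping back from the walkthrough for a moment: the format auto-detection mentioned above can be pictured as a mapping from file extension to the source-format value BigQuery expects. The enum names below are real BigQuery API values; the helper function itself is a hypothetical sketch, not how BigQuery is actually implemented.

```python
from pathlib import Path

# Real BigQuery source-format enum values, keyed by file extension.
SOURCE_FORMATS = {
    ".csv": "CSV",
    ".json": "NEWLINE_DELIMITED_JSON",  # newline-delimited JSON
    ".avro": "AVRO",
    ".parquet": "PARQUET",
}

def detect_source_format(filename: str) -> str:
    """Hypothetical helper: guess a load format from the extension."""
    ext = Path(filename).suffix.lower()
    return SOURCE_FORMATS.get(ext, "CSV")  # fall back to CSV

print(detect_source_format("Sales Records.csv"))   # CSV
print(detect_source_format("sales_2020.parquet"))  # PARQUET
```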
So let's learn how to load your data into BigQuery using a Google Cloud Function. We will need our dataset and a table that will receive the data, so let's create that table. I've got a file with the data structure: it contains the header row and a single first row, just so that BigQuery can automatically detect the schema and understand the data type of every field. I will create this table: click Create table, choose Upload, click Browse, and here it is. It will automatically detect the schema. I will call the table sales2020.
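A structure file like this is just a header row plus one sample data row. The exact column names in the course file aren't shown, so the following is an illustrative reconstruction using the fields mentioned earlier; your real file would use your own columns.

```csv
Region,Country,Item_Type,Order_Date,Order_ID,Ship_Date,Units_Sold,Unit_Price,Unit_Cost,Total_Revenue,Total_Cost,Total_Profit
Europe,France,Office Supplies,2020-05-01,100001,2020-05-10,500,10.50,6.20,5250.00,3100.00,2150.00
```

The single data row gives auto-detect something to infer types from: dates become DATE, counts INTEGER, prices and totals FLOAT, and the rest STRING.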
BigQuery automatically detected all the fields and all the data types. We can see that the table contains only one row for now; this is the table that will receive the data.
Today, we learned about different options and possibilities for importing your data into BigQuery. We also touched on some related tools: Google Cloud Dataflow, Cloud Functions, and Pub/Sub.