BigQuery Chapter 2A: Interacting with the Platform
Kunal Mehta
Global Data Platform Head | Product Owner | Associate Director | Data Analytics | Google Analytics | Adobe Analytics | Google Cloud Platform | Machine Learning | Data Science | Data Engineering | Speaker
Hello All, in this edition of Learning BQ, we will focus on Interacting with BigQuery console, and creating the building blocs within BQ. You can say that this is where the practical exposure would start, and in the next article, it will gain momentum. In this article we are going to talk about:
- Costing of BQ
- Building Blocs of BQ:
- Querying within BQ
As always, before you proceed any further, remember that this is 5th article of the series, and it would really make a lot of sense if you follow the series from article 1, rather than from anywhere in between.
Now that you are all caught up, let’s understand how do we interact with the BQ platform. As we discussed earlier, there are 3 basic ways of accessing BigQuery, which if you remember is nothing but a way to interact with Dremel:
- Web UI
- Command Line tool
- APIs
For our learning we will only focus on Web UI. To start accessing the BQ UI, first we need to create a project in GCP, Google Cloud Platform, by clicking here. Once you have created the account, we need to enable billing here, and before we do that, let me tell you about the billing details of BQ:
1) As and when you sign up with your account, Google very generously adds $300 in your account for trial purposes. But you will still have to enter your credit card details to avail the offer
2) Now let’s understand the pricing structure, since you are putting your card details:
a. $0.2 for every 1 GB of data, where first 10 GB is free (Active Storage)
b. $0.1 for every 1 GB of data, where first 10 GB is free (Long term storage)
c. $0.1 for every 200 MB of data, in case of streaming inserts, where you will be charged only for successful insertion of data
d. $5 for every 1 TB of data for querying, where first 1 TB is free.
3) For more details please go through this link
Considering you have just created a project, you will have to enable billing for your project:
1) Click on top left corner menu button to open a navigation panel and then click on billing-
2) Create a new billing account, enter your details, including the card details and finally click on Submit and Enable billing for your GCP project.
Since BQ is already enabled on the new accounts, so you wouldn’t have to worry about enabling the BQ services.
Now that you are ready to jump into the console, again click on top left corner menu button, scroll down on the left navigation panel and click on BigQuery
And welcome to your BQ interface
Now there’s a probability that you might land in a different UI, the classic BQ UI, worry not, just click on the new UI button on top right corner and you will be there in the same console as above:
Building Blocs:
1) Dataset: These are the top Level containers used to organize and control access to your data. Tables and Views are created within datasets for us to play around with the data. You are never charged to create or update the datasets.
2) Tables: BQ contains all records organized in rows. Each record is composed of columns, also known as fields. Every table is defined by schema, column names, data types and other information. As a best practice specify the schema of the table when it is created.
3) Views: These are virtual tables defined by SQL queries. It contains data only from tables and fields as specified in the query.
Now let’s learn about how to use them:
Datasets: Whenever you create a dataset, you need to specify it’s location. After you create the dataset, you can’t change the location, so please be very sure of the location. If you are querying an external dataset, like Google Cloud Storage, make sure that your location of both, GCS and BQ are the same.
Click on Create Dataset as in the picture above, put up the dataset ID, location, and expiration time. For learning purpose you can keep it as default, 60 days after table creation. Once all the values are set, click on the Create Dataset button as below:
Tables: Create Table within a Dataset by first selecting your dataset in the left panel, and then clicking on Create Table as shown below:
Now create a Table as you want, it could be an Empty Table, Uploaded from your local system, from Google Cloud Storage, or your Google Drive. Remember to specify the Table Schema by adding fields, their Data Types before you click on Create Table button as shown below:
Views: Creating and using views is extremely simple, just run the query that you might want to, for example, as below, you can write a simple query:
SELECT name, station_id as id, location FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations` LIMIT 1000
Here what you have to notice is that ‘bigquery-public-data’ is the project ID, ‘austin_bikeshare’ is the dataset, and ‘bikeshare_stations’ is the Table name from where we are querying.
Once you have your result, just click on saving the view as shown above. This will open a window for you to save the view in the desired location.
Querying within BQ:
Now this is the easy part, all you have to do is, know a bit of SQL, and then fly away. For example I have written a small query to join two tables as below:
Table A: ID_Table
Table B: Title
All you have to do is, upload the tables as CSV as described in the above sections, and then write a query like below:
SELECT a.id, a.first_name, a.last_name, b.title FROM `sampleproject-220310.Checkagain.id_table` a LEFT JOIN `sampleproject-220310.Checkagain.title` b on a.id = b.id
To get a LEFT OUTER Join output as:
For those who have never worked on SQL, I did create a document long time back for some friends who had no idea about SQL but wanted to learn basics. I am putting the link here, and I truly do believe that it will help you scale up.
Click here to view the SQL Doc
This is where we will take a break for this week, and next week we will write some complex queries within BQ, and will try to integrate the same with Google Analytics, SFDC and maybe your CRM data files, so that we can create a complete customer profile within the BQ platform itself for further analysis.
Hope you are liking the series, in case you want me to expand on some aspects, please do feel free to let me know through comments.