Google cloud bigquery
Darshika Srivastava
Associate Project Manager @ HuQuo | MBA,Amity Business School
Introduction
It is ideal for use cases that require ad-hoc analysis and data exploration, such as business intelligence, data warehousing, and data science. BigQuery is highly scalable, cost-effective, and provides a pay-as-you-go pricing model. Ever since the dawn of computing, and subsequently the internet, the amount of data in the world has been increasing continuously. However, in recent years, the rate of increase has shot up. According to estimates, the total data in the world today is in zettabytes, with 2.5 quintillion bytes being added to it every day! Taking that into consideration, we need something to handle this massive amount of data. Google’s offering BigQuery does just that – giving us a tool to help us work with this huge amount of data. BigQuery can be used to run very large-scale SQL queries – spanning up to billions of rows – and generate results in almost real time.
FREE Google Cloud Beginners MasterClass
?
Kickstart Your Google Cloud Journey: Free Architect MasterClass For Beginners!
BigQuery Working
The working of BigQuery can be divided into some parts like – ingestion, storage and preparation, and analysis. Here we will elaborate upon these parts.
Overall, BigQuery is a highly scalable, cost-effective, and powerful cloud data warehouse service that allows users to store and analyze large datasets using SQL-like queries.
Pre-requisite: Creating a Table
To start working with BigQuery, one needs to have a table. BigQuery tables can be of three types – Native tables, External tables, and Views. Once you have a table, you can run the various BQ operations on it, including:
These tasks can be performed by using Console, the bq command line tool, using client libraries, or by using the API method.
Once a table is created, we can get to the further steps to make use of BQ’s functionalities.
1. Ingestion (Loading Data)
Once a table has been created, the next step is to load data into the table. There are several ways to ingest data into BigQuery:
Ingestion can be done manually, as explained above, or automatically, by using Data Transfer Service.
2. Preparation & Storage
In this process, the data is stored in BigQuery, and prepared for analysis. BigQuery takes the raw data gotten in the previous step, and refines it so that it is ready for further analysis.
3. Analysis (Exporting Data)
Finally, once the data is ready in the table, it can be exported. Data can be exported from BigQuery in several ways, with a maximum of 1GB per file. The available options here are to either manually export the data or to use a service like Dataflow to automate the process. Once exported, various services like BigQuery ML and Google Data Studio can be used to analyze the data.
Also Check:?Our blog post on Google Cloud VPC.
Features of BigQuery
BigQuery is a fast, serverless data warehouse that is designed for organizations dealing with a high amount of data. From getting detailed insights from the data using its built-in Machine Learning, to analyzing petabytes of data using ANSI SQL, BigQuery offers a wide variety of features in a cost-effective pricing model. Here we look at some of those features.
1. Multicloud Functionality (BQ Omni)
BigQuery is an analytics solution that allows for data analysis across multiple cloud platforms. The USP of BigQuery is providing a novel way of analyzing data present in more than one cloud, without it costing an arm and a leg. This is in contrast to other solutions, where this always involved high egress costs to migrate data from the source. BigQuery achieves this by separating Compute and Storage components. This means that BigQuery can run the computation on the data right where it is located, without the need for moving it to a different zone for processing.
BigQuery Omni runs on Anthos clusters which are managed by Google Cloud. This allows for the secure execution of queries, even on foreign cloud platforms. BigQuery Omni comes with the following functionalities:
Also Check:?Our blog post on Google Compute Engine.
2. Built-in ML Integration (BQ ML)
BigQuery ML is used for creating and executing Machine Learning models in BigQuery using simple SQL queries. Prior to the introduction of BigQuery ML, Machine Learning on large datasets required ML-specific knowledge and programming skills. BigQuery Ml eliminated the need for that, by allowing SQL practitioners to build ML models using their existing skills. BigQuery ML can be accessed in the following four manners:
Machine Learning in BigQuery works on models, which are representations of what the ML system has learned from the data. Some of the models used in BigQuery ML include Linear regression, Binary and Multiclass Logistic regression, Matrix Factorization, Time Series, and Deep Neural Network models.
There are a number of reasons to use BigQuery ML for using Machine Learning with a cloud-based data warehouse. Some of these are:
3. Foundation for BI (BQ BI Engine)
BigQuery BI engine is an in-memory analysis solution. It is used to analyze the data stored in BigQuery with high concurrency, and response times of under a second. Being a part of the BigQuery family, it is no surprise that it comes with an SQL Interface as well. This helps it to interact with other BI tools like Looker, Tableau, Power BI, etc. It can also integrate with custom applications, and help with data exploration and analysis.
?
领英推荐
BigQuery BI Engine has the following advantages:
BigQuery BI Engine comes with a free tier that allows up to 1GB of free capacity for Data Studio users, and on-demand capacity at $0.0416 per GB afterward. Organizations with higher needs can choose to opt for flat-rate capacity pricing, which comes with monthly charges.
4. Geospatial Analysis (BQ GIS)
BigQuery Geographic Information Systems (GIS) provides information about location and mapping, which is important in a data-warehouse like BigQuery.
BigQuery GIS functions by converting latitudes and longitudes columns into geographical points. There are four types of objects when working with geospatial data:
The final visualization of the BigQuery data can be done by using one of the following:
5. Automated Data Transfer (BQ Data Transfer Service)
BigQuery Data Transfer is a service that automates the movement of data into BigQuery on a regular basis. This schedule can be managed by the analytics team in a simple manner, without the need for any coding. One can also add data backfills to make up for any gaps or outages during ingestion.
The BigQuery Data Transfer Service can be accessed in three ways:
As of now, BigQuery Data Transfer allows for data to be imported from a variety of Google-owned services like Cloud Storage, Google Play, Google Ads, and YouTube Channel reports. One can also import the data from external cloud storage services, like Amazon S3. Data warehouses like Teradata and Amazon Redshift can also be used to load data into BigQuery using the Data Transfer service.
6. Free Access (BQ Sandbox)
While all the features of BigQuery mentioned above are very useful, sometimes a user might want more experience with a said feature before investing in it. For that purpose, Google has the BigQuery Sandbox – a place for experiencing BigQuery and the Cloud Console without any commitment. This means no need to create a billing account or a project, or even provide credit card details. All the applications run in a separate environment, which emulates the one provided by BigQuery in Google Cloud Platform. Once the user tries the features and is satisfied, they can easily upgrade to the full BigQuery experience.
BigQuery Sandbox also comes with a set of limitations, that are applicable until the upgrade. These are:
Also Check:?Our blog post on GCP IAM.?
BigQuery Pricing
Pricing in BigQuery is done for two main components – Analysis and Storage. Apart from these, charges can be levied for streaming inserts and using the BigQuery Storage API. Let’s look at these in a little more detail.
1. Analysis Pricing
Analysis pricing includes the charges levied by Google for running queries on BigQuery. This can be set up in two ways:
1.1 On-demand Analysis This pricing model comes with a free tier of 1TB per month, following which you are charged on the basis of usage (bytes processed per query). This query may be done on data stored in BigQuery, or in an external source like Cloud Storage, Google Drive, or Bigtable. The rate after the free tier for Analysis pricing is $5.00 per TB.
1.2 Flat-rate Pricing In this model, instead of paying per byte used, the user pays a fixed sum to buy slots. These slots are essentially vCPUs, and options vary based on how long the service is meant to be used:
It goes without saying that the cost per unit of time decreases as the commitment period is increased. For instance, the cost for one month for 100 slots with a Monthly commitment would be $2000, while the same with an Annual commitment would be $1700 (per month).
2. Storage Pricing
Storage pricing is the charge levied by Google for storing all the data that is loaded into BigQuery. It can be classified into two types:
2.1 Active Storage This type of storage is for any and all data that has been modified in the past 90 days. It comes with 10 GB of free storage initially, with subsequent data charged at $0.020 per GB. This is primarily used for frequently used data.
2.2 Long-term Storage This is for data that is stored for a longer period of time. Any BigQuery table that hasn’t been accessed in 90 days is automatically put under Long-term storage, and charged accordingly. The free tier for this type is the same as Active Storage, but the charges after the free tier are lower, at $0.010 per GB.
3. Data Ingestion & Extraction
Two other BigQuery components incur charges. These are data ingestion and extraction.
3.1 Ingestion Data ingestion in BigQuery can be done in two ways – through batch loading, and streaming. While the former is free of cost as long as a shared slot pool is used, the latter is billed on usage at $0.010 per 200MB of data.
3.2 Extraction Similar to data ingestion, data extraction can also be done in two ways – batch exports, and streaming. Again, the batch exports are free as long as a shared slot pool is used, but streaming is charged at $1.10 per TB of data.
Cloud Data Warehouse
A Cloud Data Warehouse is a type of data storage and management system that is hosted in the cloud and designed to handle large amounts of data from various sources. Google Cloud Platform provides a fully-managed, cloud-native data warehouse service called Google BigQuery.
Google BigQuery is a cloud data warehouse service that allows organizations to store, process, and analyze large amounts of data in real time. It is a serverless, fully-managed service that provides a scalable and cost-effective solution for data warehousing needs. BigQuery allows users to store data in tables and provides a range of tools to analyze data, including SQL-like queries, data visualization tools, and machine learning APIs.