Google Cloud BigQuery

Introduction

Ever since the dawn of computing, and subsequently the internet, the amount of data in the world has been growing continuously, and in recent years the rate of growth has shot up. According to estimates, the total data in the world today is measured in zettabytes, with roughly 2.5 quintillion bytes added every day. Handling data at that scale calls for purpose-built tooling, and Google's BigQuery does just that. BigQuery can run very large-scale SQL queries – spanning up to billions of rows – and return results in near real time. It is ideal for use cases that require ad-hoc analysis and data exploration, such as business intelligence, data warehousing, and data science, and it is highly scalable, cost-effective, and offered on a pay-as-you-go pricing model.



How BigQuery Works

BigQuery's workflow can be broken down into a few stages – ingestion, storage and preparation, and analysis. Here we will elaborate on each of them.

  1. Data ingestion: Data can be ingested into BigQuery from a variety of sources such as Google Cloud Storage, streaming data, and direct uploads.
  2. Data storage: Once data is ingested, it is stored in tables in a columnar format. BigQuery stores data in a distributed manner, automatically sharding it across multiple nodes in the cloud.
  3. Query processing: Users can write SQL-like queries to retrieve data from the tables. BigQuery uses a distributed architecture to parallelize the query processing across multiple nodes in the cloud.
  4. Cost management: BigQuery provides various features to manage costs such as automatic scaling, caching, and on-demand pricing. This allows users to control their costs while still analyzing large amounts of data.
  5. Results output: The results of a query can be written to various destinations such as Google Sheets, BigQuery tables, and other data stores.
  6. Data visualization and reporting: Users can use various tools such as Google Data Studio to visualize and report on the data in BigQuery.

Overall, BigQuery is a highly scalable, cost-effective, and powerful cloud data warehouse service that allows users to store and analyze large datasets using SQL-like queries.
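
As a minimal illustration of the query-processing and results steps above, the short sketch below runs a standard SQL query against a public dataset with the google-cloud-bigquery Python client. It assumes the library is installed and that default credentials and a project are configured in the environment:

  # pip install google-cloud-bigquery
  from google.cloud import bigquery

  # The client picks up credentials and the default project from the environment.
  client = bigquery.Client()

  # A public dataset is used so the sketch runs without loading any data first.
  query = """
      SELECT name, SUM(number) AS total
      FROM `bigquery-public-data.usa_names.usa_1910_2013`
      WHERE state = 'TX'
      GROUP BY name
      ORDER BY total DESC
      LIMIT 5
  """

  # query() submits the job; result() waits for completion and returns the rows.
  for row in client.query(query).result():
      print(row["name"], row["total"])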


Prerequisite: Creating a Table

To start working with BigQuery, one needs to have a table. BigQuery tables can be of three types – Native tables, External tables, and Views. Once you have a table, you can run the various BQ operations on it, including:

  • Listing all tables in a dataset
  • Getting information about the table(s)
  • Controlling access to the table data
  • Getting the table metadata

These tasks can be performed through the Cloud Console, the bq command-line tool, the client libraries, or the REST API.
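
As a hedged sketch of a couple of these operations (listing tables in a dataset and reading table metadata), the snippet below uses the Python client library with placeholder dataset and table names; the rough bq command-line equivalents are `bq ls` and `bq show`:

  from google.cloud import bigquery

  client = bigquery.Client()
  dataset_id = f"{client.project}.my_dataset"   # placeholder dataset name

  # List all tables in the dataset.
  for item in client.list_tables(dataset_id):
      print(item.table_id)

  # Get one table's metadata: schema, row count, and size in bytes.
  table = client.get_table(f"{dataset_id}.my_table")  # placeholder table name
  print(table.schema)
  print(table.num_rows, table.num_bytes)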

Once a table is created, we can move on to the further steps and make use of BQ's functionality.

1. Ingestion (Loading Data)

Once a table has been created, the next step is to load data into the table. There are several ways to ingest data into BigQuery:

  • Batch loading a set of data records
  • Streaming individual records
  • Using queries to generate new data to append to the table
  • Using a third-party application or service

Ingestion can be done manually, using any of the methods above, or automatically, by using the Data Transfer Service.
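
As a rough sketch of the first two options, the snippet below batch-loads a CSV file from Cloud Storage and then streams an individual record into the same table; the bucket, dataset, and table names are placeholders:

  from google.cloud import bigquery

  client = bigquery.Client()
  table_id = f"{client.project}.my_dataset.events"   # placeholder table

  # Batch load: pull a CSV file from Cloud Storage into the table.
  job_config = bigquery.LoadJobConfig(
      source_format=bigquery.SourceFormat.CSV,
      skip_leading_rows=1,
      autodetect=True,
  )
  load_job = client.load_table_from_uri(
      "gs://my-bucket/events.csv",   # placeholder Cloud Storage URI
      table_id,
      job_config=job_config,
  )
  load_job.result()   # wait for the load job to finish

  # Streaming: insert individual records as JSON rows.
  errors = client.insert_rows_json(table_id, [{"user": "alice", "action": "click"}])
  if errors:
      print("Streaming insert errors:", errors)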

2. Preparation & Storage

In this stage, the data is stored in BigQuery and prepared for analysis. BigQuery takes the raw data ingested in the previous step and refines it so that it is ready for further analysis.

3. Analysis (Exporting Data)

Finally, once the data is ready in the table, it can be exported. Data can be exported from BigQuery in several ways, with a limit of 1 GB of table data per exported file. The options are to either export the data manually or to use a service like Dataflow to automate the process. Once exported, various services like BigQuery ML and Google Data Studio can be used to analyze the data.
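
Below is a minimal sketch of a manual export to Cloud Storage with the Python client; the wildcard in the destination URI lets BigQuery split the output into multiple files when the table exceeds the 1 GB-per-file limit (bucket and table names are placeholders):

  from google.cloud import bigquery

  client = bigquery.Client()

  # Export a table to Cloud Storage as compressed CSV shards.
  job_config = bigquery.ExtractJobConfig(
      destination_format=bigquery.DestinationFormat.CSV,
      compression=bigquery.Compression.GZIP,
  )
  extract_job = client.extract_table(
      f"{client.project}.my_dataset.events",        # placeholder source table
      "gs://my-bucket/exports/events-*.csv.gz",     # wildcard => multiple files if > 1 GB
      job_config=job_config,
  )
  extract_job.result()   # wait for the export to finish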

Also Check: Our blog post on Google Cloud VPC.

Features of BigQuery

BigQuery is a fast, serverless data warehouse designed for organizations dealing with high volumes of data. From getting detailed insights using its built-in Machine Learning to analyzing petabytes of data with ANSI SQL, BigQuery offers a wide variety of features under a cost-effective pricing model. Here we look at some of those features.

1. Multicloud Functionality (BQ Omni)

BigQuery Omni is an analytics solution that allows for data analysis across multiple cloud platforms. Its USP is providing a novel way of analyzing data that lives in more than one cloud without it costing an arm and a leg. This is in contrast to other solutions, which typically involve high egress costs to migrate data from the source. BigQuery achieves this by separating the Compute and Storage components, meaning it can run the computation on the data right where it is located, without having to move it to a different zone for processing.

BigQuery Omni runs on Anthos clusters which are managed by Google Cloud. This allows for the secure execution of queries, even on foreign cloud platforms. BigQuery Omni comes with the following functionalities:

  • Break down silos and gain insights into the data
  • Get consistent data experience across clouds
  • Enable flexibility, courtesy of Anthos


Also Check: Our blog post on Google Compute Engine.

2. Built-in ML Integration (BQ ML)

BigQuery ML is used for creating and executing Machine Learning models in BigQuery using simple SQL queries. Prior to the introduction of BigQuery ML, Machine Learning on large datasets required ML-specific knowledge and programming skills. BigQuery ML eliminates that need by allowing SQL practitioners to build ML models using their existing skills. BigQuery ML can be accessed in four ways:

  1. Google Cloud Console
  2. bq command-line-tool
  3. BigQuery REST API
  4. An external tool (for example Jupyter)

Machine Learning in BigQuery works on models, which are representations of what the ML system has learned from the data. Some of the models used in BigQuery ML include Linear regression, Binary and Multiclass Logistic regression, Matrix Factorization, Time Series, and Deep Neural Network models.

There are a number of reasons to use BigQuery ML for Machine Learning in a cloud-based data warehouse. Some of these are:

  • No need to export data, which leads to a higher speed of model development
  • No need to program an ML solution in Python or Java
  • Ability to build and run ML models using existing BI tools & spreadsheets
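
To make the SQL-only workflow concrete, here is a rough sketch of training and evaluating a linear regression model with BigQuery ML through the Python client; the dataset, table, column, and model names are placeholders invented for this example:

  from google.cloud import bigquery

  client = bigquery.Client()

  # CREATE MODEL trains a BigQuery ML model using nothing but SQL.
  # All dataset/table/column names below are placeholders.
  train_sql = """
      CREATE OR REPLACE MODEL `my_dataset.fare_model`
      OPTIONS (model_type = 'linear_reg', input_label_cols = ['fare']) AS
      SELECT trip_distance, passenger_count, fare
      FROM `my_dataset.taxi_trips`
  """
  client.query(train_sql).result()

  # ML.EVALUATE returns standard regression metrics for the trained model.
  evaluate_sql = "SELECT * FROM ML.EVALUATE(MODEL `my_dataset.fare_model`)"
  for row in client.query(evaluate_sql).result():
      print(dict(row))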

3. Foundation for BI (BQ BI Engine)

BigQuery BI Engine is an in-memory analysis service used to analyze data stored in BigQuery with high concurrency and sub-second response times. Being part of the BigQuery family, it naturally comes with a SQL interface as well, which lets it work with BI tools like Looker, Tableau, and Power BI. It can also integrate with custom applications to help with data exploration and analysis.


BigQuery BI Engine has the following advantages:

  • Speed – Very fast response times and load times help with data analysis over streaming data
  • Simplicity – Performs in-place analysis within BigQuery which eliminates the need for ETL pipelines
  • Ease of use – The smart tuning design ensures that there are very few configuration settings from the user's end

BigQuery BI Engine comes with a free tier that allows up to 1GB of free capacity for Data Studio users, and on-demand capacity at $0.0416 per GB afterward. Organizations with higher needs can choose to opt for flat-rate capacity pricing, which comes with monthly charges.


4. Geospatial Analysis (BQ GIS)

BigQuery Geographic Information Systems (GIS) brings location and mapping capabilities to the data warehouse, allowing geospatial data to be stored and analyzed directly in BigQuery.

BigQuery GIS works by converting latitude and longitude columns into geographical points (a short query sketch follows the list below). The main types of objects when working with geospatial data are:

  • geometry – This represents an area on the surface of the earth
  • spatial feature – This represents a logical spatial object
  • spatial feature collection – This is a set of spatial features
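
Here is a rough query sketch of the point conversion mentioned above, run through the Python client; the table, column names, and reference coordinates are placeholders:

  from google.cloud import bigquery

  client = bigquery.Client()

  # ST_GEOGPOINT turns longitude/latitude columns into GEOGRAPHY points;
  # ST_DWITHIN keeps only rows within 5 km of a reference point.
  # Table and column names are placeholders.
  sql = """
      SELECT station_id,
             ST_GEOGPOINT(longitude, latitude) AS location
      FROM `my_dataset.stations`
      WHERE ST_DWITHIN(
          ST_GEOGPOINT(longitude, latitude),
          ST_GEOGPOINT(-122.4194, 37.7749),   -- reference point (San Francisco)
          5000                                -- distance in meters
      )
  """
  for row in client.query(sql).result():
      print(row["station_id"], row["location"])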


The final visualization of the BigQuery data can be done by using one of the following:

  • BigQuery Geo Viz
  • Google Earth Engine
  • Jupyter notebooks (using extension)

5. Automated Data Transfer (BQ Data Transfer Service)

BigQuery Data Transfer is a service that automates the movement of data into BigQuery on a regular basis. This schedule can be managed by the analytics team in a simple manner, without the need for any coding. One can also add data backfills to make up for any gaps or outages during ingestion.

The BigQuery Data Transfer Service can be accessed in three ways:

  • Cloud Console
  • bq command-line-tool
  • BigQuery Data Transfer Service API

As of now, BigQuery Data Transfer allows for data to be imported from a variety of Google-owned services like Cloud Storage, Google Play, Google Ads, and YouTube Channel reports. One can also import the data from external cloud storage services, like Amazon S3. Data warehouses like Teradata and Amazon Redshift can also be used to load data into BigQuery using the Data Transfer service.
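
As a hedged sketch of setting up a recurring transfer programmatically, the snippet below uses the BigQuery Data Transfer Python client to register a daily scheduled query; source-specific transfers (Cloud Storage, Amazon S3, Google Ads, and so on) follow the same pattern with a different data_source_id and params. The project, dataset, and query here are placeholders:

  # pip install google-cloud-bigquery-datatransfer
  from google.cloud import bigquery_datatransfer

  client = bigquery_datatransfer.DataTransferServiceClient()
  project_id = "my-project"   # placeholder project ID

  transfer_config = bigquery_datatransfer.TransferConfig(
      destination_dataset_id="my_dataset",    # placeholder dataset
      display_name="Daily events rollup",
      data_source_id="scheduled_query",
      params={
          "query": "SELECT CURRENT_DATE() AS day, COUNT(*) AS n FROM `my_dataset.events`",
          "destination_table_name_template": "daily_rollup_{run_date}",
          "write_disposition": "WRITE_TRUNCATE",
      },
      schedule="every 24 hours",
  )

  # Register the transfer; the service then runs it on schedule with no further coding.
  created = client.create_transfer_config(
      parent=client.common_project_path(project_id),
      transfer_config=transfer_config,
  )
  print("Created transfer:", created.name)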


6. Free Access (BQ Sandbox)

While all the features of BigQuery mentioned above are very useful, sometimes a user might want hands-on experience with them before investing. For that purpose, Google has the BigQuery Sandbox – a place for experiencing BigQuery and the Cloud Console without any commitment. There is no need to create a billing account or provide credit card details; you work in the regular BigQuery environment on Google Cloud Platform, subject to the Sandbox limits listed below. Once the user has tried the features and is satisfied, they can easily upgrade to the full BigQuery experience.

BigQuery Sandbox also comes with a set of limitations that apply until the upgrade. These are:

  • All BigQuery quotas and limits are applicable
  • Free usage limits are the same as BQ free tier, and can’t be upgraded
  • All datasets have the default expiration time (60 days)
  • Streaming data, DML statements, and Data Transfer Services are not supported

Also Check: Our blog post on GCP IAM.

BigQuery Pricing

Pricing in BigQuery is done for two main components – Analysis and Storage. Apart from these, charges can be levied for streaming inserts and using the BigQuery Storage API. Let’s look at these in a little more detail.

1. Analysis Pricing

Analysis pricing includes the charges levied by Google for running queries on BigQuery. This can be set up in two ways:

1.1 On-demand Analysis – This pricing model comes with a free tier of 1 TB per month, following which you are charged on the basis of usage (bytes processed per query). Queries may run on data stored in BigQuery or in an external source like Cloud Storage, Google Drive, or Bigtable. The rate after the free tier is $5.00 per TB.

1.2 Flat-rate Pricing – In this model, instead of paying per byte processed, the user pays a fixed sum to buy slots. These slots are essentially vCPUs, and the options vary based on how long the service is meant to be used:

  • Flex – Commitment of initial 60 seconds
  • Monthly – Commitment of initial 30 days
  • Annual – Commitment of initial 365 days

It goes without saying that the cost per unit of time decreases as the commitment period is increased. For instance, the cost for one month for 100 slots with a Monthly commitment would be $2000, while the same with an Annual commitment would be $1700 (per month).

2. Storage Pricing

Storage pricing is the charge levied by Google for storing all the data that is loaded into BigQuery. It can be classified into two types:

2.1 Active Storage – This type of storage is for any and all data that has been modified in the past 90 days. It comes with 10 GB of free storage initially, with subsequent data charged at $0.020 per GB. This is primarily used for frequently used data.

2.2 Long-term Storage – This is for data that is stored for a longer period of time. Any BigQuery table that hasn't been modified in 90 consecutive days is automatically moved to long-term storage and charged accordingly. The free tier is the same as for Active Storage, but the charge after the free tier is lower, at $0.010 per GB.

3. Data Ingestion & Extraction

Two other BigQuery components incur charges. These are data ingestion and extraction.

3.1 Ingestion – Data ingestion in BigQuery can be done in two ways – through batch loading, and streaming. While the former is free of cost as long as a shared slot pool is used, the latter is billed on usage at $0.010 per 200 MB of data.

3.2 Extraction – Similar to data ingestion, data extraction can also be done in two ways – batch exports, and streaming. Again, the batch exports are free as long as a shared slot pool is used, but streaming is charged at $1.10 per TB of data.
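
Purely as an illustration of how the rates quoted above combine, here is a tiny back-of-the-envelope estimate for a hypothetical on-demand workload (all the usage figures are made up):

  # Rough monthly estimate using the on-demand and storage rates quoted above.
  # All workload figures are hypothetical.
  tb_queried = 5        # TB scanned by queries this month
  active_gb = 500       # GB in active storage
  long_term_gb = 2000   # GB in long-term storage

  query_cost = max(tb_queried - 1, 0) * 5.00       # first 1 TB per month is free
  storage_cost = max(active_gb - 10, 0) * 0.020    # first 10 GB of storage is free
  storage_cost += long_term_gb * 0.010             # lower long-term rate

  print(f"Estimated monthly cost: ${query_cost + storage_cost:.2f}")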


Cloud Data Warehouse

A Cloud Data Warehouse is a type of data storage and management system that is hosted in the cloud and designed to handle large amounts of data from various sources. Google Cloud Platform provides a fully-managed, cloud-native data warehouse service called Google BigQuery.

Google BigQuery is a cloud data warehouse service that allows organizations to store, process, and analyze large amounts of data in real time. It is a serverless, fully-managed service that provides a scalable and cost-effective solution for data warehousing needs. BigQuery allows users to store data in tables and provides a range of tools to analyze data, including SQL-like queries, data visualization tools, and machine learning APIs.
