ASP.NET Core API + Google BigQuery
Hello everyone!
In this article I’m going to talk about how to create an ASP.NET Core API that connects to Google BigQuery to add and read data.
But before we dive into the code, let’s see what that is:
“Google BigQuery is an Enterprise Data Warehouse”, according to https://cloud.google.com/bigquery/what-is-bigquery.
Still according to it, you can store and query massive data sets through “super-fast SQL queries with the power of Google’s infrastructure processing. Just pass your data to BigQuery, and it will do the hard work. Control access to the project and your data according to the needs of your business, such as granting access to other users to view or query data.
You can access BigQuery using the Web UI or the command line tool or by calling the BigQuery REST API with several client libraries such as Java, .NET, or Python. There are also several third-party tools that you can use to interact with BigQuery to view or load data.
BigQuery is fully managed. To get started, you do not need to deploy any features, such as disks and virtual machines.”
You can find more information at: https://cloud.google.com/bigquery/.
Now that you have a brief notion of BigQuery, which I’m going to call just “BQ” from now on, let’s get started and see how to access it from an ASP.NET Core API.
Starting with BQ
You can find several tutorials on https://cloud.google.com/bigquery/docs/tutorials about BQ, how to use it, configure it, etc.
Once you have it configured and want to access it from a .NET application, you will need to use a client library called Google.Cloud.BigQuery.V2.
At this link: https://cloud.google.com/bigquery/docs/reference/libraries, you can find how to configure authentication and download a JSON file with all the authentication data. You can use an environment variable, as that link describes, or you can use the JSON file “manually” for authentication. We are going to use the second approach; the first one is covered in the link!
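For reference, that first approach boils down to pointing the GOOGLE_APPLICATION_CREDENTIALS environment variable at the downloaded JSON file; the client library then picks the credentials up on its own. A minimal sketch (the file path is just an example):
// Environment-variable approach (not used in this article):
//   set GOOGLE_APPLICATION_CREDENTIALS=C:\keys\bq-secrets.json
// With that variable set, Create uses the application default credentials.
var client = BigQueryClient.Create("YOUR_PROJECT_ID");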
Starting the project
Create a basic API project. In this case I’m using .NET Core 2.0 and Visual Studio.
Go to the “Manage NuGet Packages…” window and add the package Google.Cloud.BigQuery.V2.
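If you prefer the Package Manager Console, the same package can be installed with:
Install-Package Google.Cloud.BigQuery.V2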
The idea of the example is to have a place to store data about some files: information like Id, Category, and Status. So create a class File.cs:
public class File
{
    public string Id { get; set; }
    public string Category { get; set; }
    public string Status { get; set; }
}
We need a base class to contain the base methods for accessing BQ, so create an interface IBQ and a class BQ, and configure the dependency injection inside the ConfigureServices method in the class Startup.cs:
services.AddTransient<IBQ, BQ>();
Also, create a BQController and configure its route to be “api/big-query”.
And put that JSON file for authentication inside the folder “wwwroot” and rename it to “bq-secrets.json”.
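Before we move on, here is a sketch of how those pieces will end up looking; the two IBQ methods are implemented in the next sections:
public interface IBQ
{
    BigQueryClient GetBigqueryClient();
    List<TableRow> GetRows(string query);
}

[Route("api/big-query")]
public class BQController : Controller
{
    // The POST and GET endpoints are added below.
}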
Adding data to BQ
The first thing we need is to authenticate and create a BigQueryClient object. So add a new method “BigQueryClient GetBigqueryClient();” to the interface IBQ and implement it in the class BQ.
Besides that, you will need to add a constructor to that class and inject IHostingEnvironment, so you can access the JSON file:
// Requires: using System.IO; using Google.Apis.Auth.OAuth2;
//           using Google.Cloud.BigQuery.V2; using Microsoft.AspNetCore.Hosting;
const string PROJECT_ID = "YOUR_PROJECT_ID";
readonly IHostingEnvironment _hostingEnvironment;

public BQ(IHostingEnvironment hostingEnvironment)
{
    _hostingEnvironment = hostingEnvironment;
}

public BigQueryClient GetBigqueryClient()
{
    // Build a credential from the service account key in wwwroot
    // and use it to create the authenticated client.
    var config = Path.Combine(_hostingEnvironment.WebRootPath, "bq-secrets.json");
    GoogleCredential credential = null;
    using (var jsonStream = new FileStream(config, FileMode.Open, FileAccess.Read, FileShare.Read))
        credential = GoogleCredential.FromStream(jsonStream);
    return BigQueryClient.Create(PROJECT_ID, credential);
}
You can get the value for “YOUR_PROJECT_ID” from the project_id field of your JSON file.
In your BQController, inject the interface IBQ and create a POST endpoint that receives a list of your files.
Then call that “GetBigqueryClient” method and, with the client instance you get back, build the desired table structure in memory.
For each file, add a “BigQueryInsertRow” with the given information and fields to that “in-memory table”, following the defined structure.
And finally, you have two options:
a- add row by row by using the method “table.InsertRow”.
b- add several rows at once by using the method “table.InsertRows”.
That’s nice because in scenarios where you have a big amount of rows to add, you can control the insertion in batches of X rows, which avoids crashing something and improves performance as well (see the batching sketch after the code below). The example shows both cases:
private readonly IBQ _bigQuery;

public BQController(IBQ bigQuery)
{
    _bigQuery = bigQuery;
}

[HttpPost]
public IActionResult Post([FromBody]List<File> files)
{
    var client = _bigQuery.GetBigqueryClient();

    // Create the dataset and table if they don't exist yet,
    // describing the table schema in memory.
    var dataset = client.GetOrCreateDataset("YOUR DATASET");
    var table = dataset.GetOrCreateTable("YOUR TABLE", new TableSchemaBuilder
    {
        { "fileid", BigQueryDbType.String },
        { "category", BigQueryDbType.String },
        { "status", BigQueryDbType.String }
    }.Build());

    // Build one BigQueryInsertRow per file, following the schema above.
    var bqRows = new List<BigQueryInsertRow>();
    files.ForEach(f => bqRows.Add(
        new BigQueryInsertRow
        {
            { "fileid", Guid.NewGuid().ToString() },
            { "category", f.Category },
            { "status", f.Status }
        }));

    // Option b: several rows at once; option a: a single row.
    if (bqRows.Count > 1)
        table.InsertRows(bqRows);
    else if (bqRows.Count == 1)
        table.InsertRow(bqRows[0]);

    return Ok($"{bqRows.Count} rows have been added.");
}
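As a sketch of that batching idea (the chunk size of 500 below is an arbitrary choice for illustration, not a library requirement), you could split the inserts like this:
// Hypothetical batching sketch: insert the rows in chunks
// instead of all at once. Requires using System.Linq.
const int batchSize = 500;
for (var i = 0; i < bqRows.Count; i += batchSize)
{
    table.InsertRows(bqRows.Skip(i).Take(batchSize));
}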
You can use Postman (https://www.getpostman.com/), or any other tool, to send data to the API, and then check your data in BQ via the Web UI.
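For example, a request body like this one would match the endpoint above (the values are made up for illustration):
[
    { "category": "Invoice", "status": "Processed" },
    { "category": "Receipt", "status": "Pending" }
]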
It may be necessary to create the dataset and table in BQ manually before the code above runs successfully.
Querying data on BQ
Now, we want to get data from BQ: the data we have added before.
Add a new method “List<TableRow> GetRows(string query);” into the interface IBQ and implement it in the class BQ.
The first part of this method receives a query, gets the BQ client, gets a job resource, and executes the query. Those steps alone would be enough to get results from BQ in a simple example like this, but I wanted to take it a step further.
If I’m not wrong, this first execution will wait around 10 seconds. If your query took longer than that to return results, no results would be returned: the execution would be stopped and you would not get what you intended!
How can we solve this? The example shows how. After you execute the query, you check whether the job has completed. If not, you use the job id to fetch the query results and check again whether the job has completed. You can repeat this over and over until it completes, and only then return the result. Remember, BQ is about really big amounts of data!
// Requires: using Google.Apis.Bigquery.v2.Data; (QueryRequest, TableRow)
//           using System.Linq; using System.Threading;
public List<TableRow> GetRows(string query)
{
    var bqClient = GetBigqueryClient();
    var response = new List<TableRow>();

    // Submit the query through the underlying jobs resource.
    var jobResource = bqClient.Service.Jobs;
    var qr = new QueryRequest() { Query = query };
    var queryResponse = jobResource.Query(qr, PROJECT_ID).Execute();

    // If the job finished within the initial wait, return its rows right away.
    if (queryResponse.JobComplete == true)
    {
        return queryResponse.Rows == null
            ? new List<TableRow>()
            : queryResponse.Rows.ToList();
    }

    // Otherwise, poll for completion using the job id, up to 50 times.
    var jobId = queryResponse.JobReference.JobId;
    var retry = true;
    var retryCounter = 0;
    while (retry && retryCounter < 50)
    {
        Thread.Sleep(1000);
        var queryResults = bqClient.Service.Jobs.GetQueryResults(PROJECT_ID, jobId).Execute();
        if (queryResults.JobComplete != true)
        {
            retryCounter++;
            continue;
        }
        if (queryResults.Rows != null)
            response = queryResults.Rows.ToList();
        retry = false;
    }
    return response;
}
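As a side note, the V2 wrapper also exposes a higher-level BigQueryClient.ExecuteQuery method that handles this polling for you. A minimal sketch, with an illustrative query matching the schema we created earlier:
// Alternative sketch: ExecuteQuery submits the job and blocks
// until the results are available, so no manual polling is needed.
var client = GetBigqueryClient();
var results = client.ExecuteQuery(
    "SELECT fileid, category, status FROM YOUR_DATASET.YOUR_TABLE",
    parameters: null);
foreach (var row in results)
{
    // BigQueryRow supports lookup by field name.
    Console.WriteLine($"{row["fileid"]} | {row["category"]} | {row["status"]}");
}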
In the BQController, add a new GET endpoint where you define your query, call the method “GetRows”, and return the data. Note that the library has a particular structure for its table rows, so we use some constants as indexes to make the code more readable.
[HttpGet]
public IActionResult Get()
{
    // Column indexes, following the field order of the SELECT in the query.
    const int ORDER_ID = 0;
    const int ORDER_CATEGORY = 1;
    const int ORDER_STATUS = 2;

    // e.g. "SELECT fileid, category, status FROM YOUR_DATASET.YOUR_TABLE"
    var query = "YOUR QUERY HERE, FOLLOWING THE FIELDS ORDER DEFINED ABOVE";
    var rows = _bigQuery.GetRows(query);

    var result = new List<File>();
    rows.ForEach(row => result.Add(new File
    {
        // TableRow exposes its cells through F (fields) and values through V.
        Id = row.F[ORDER_ID].V.ToString(),
        Category = row.F[ORDER_CATEGORY].V.ToString(),
        Status = row.F[ORDER_STATUS].V.ToString()
    }));
    return Ok(result);
}
You can use Postman or any other tool to test it!
Conclusion
BigQuery is really powerful! And it’s very easy to interact with from C#, thanks to the client library I mentioned above.
I have seen BQ used in scenarios where a company has several relational databases, some Oracle and some SQL Server, and a new system needs data from all of them. So they imported the data from those databases into BQ, giving a single place to look at. Okay, in some situations this approach wouldn’t be a good idea, but in the cases I mention here it worked well.
And the sky’s the limit!
You can check the entire project I created for this example at: https://github.com/gabrielfbarros/aspnetcore-bigquery
If you have other approaches to this combination of technologies, please let us know and leave your notes!
I hope I could help you!
Bye!