Google is wrong; BigQuery is SaaS (not PaaS)
I work for Google. Google is wrong.

BigQuery isn’t PaaS. It’s SaaS.

Classifying BigQuery as SaaS or PaaS may seem like irrelevant semantics. Far from it. The classification is fundamental to how organisations evaluate their data solution needs. Incorrectly treating BigQuery as PaaS may deprive your organisation of best-in-class capabilities for your business objective of insight generation (and so business impact).

Consider the definition used by our competitor, Snowflake (emphasis added):

“Familiar SaaS platforms include CRM, marketing automation, storage solutions, and of course, cloud data warehousing. SaaS apps reduce total-cost-of-ownership (TCO) by eliminating most software maintenance resources and upgrade costs. In addition, SaaS solutions hit the OpEx budget, not the CapEx”

BigQuery is cloud data warehousing. BigQuery eliminates software maintenance resources and upgrade costs. Also, BigQuery is typically OpEx. By their definition, BigQuery is SaaS.

Some may object to BigQuery being SaaS. Whilst BigQuery has zero customer provisioning of resources (whether compute or storage or connectivity) and no upgrade activities, some would class it as PaaS. Even we (Google) sometimes call BigQuery PaaS!

This is incorrect. It stems from misunderstanding the nature of the end user. In business applications (e.g. HubSpot, Salesforce or Zendesk), end users do not need technical skills. Cloud data warehouse end users, by contrast, often do (although tools like BigQuery Data Canvas reduce this). But having technical end users does not automatically make the software PaaS.

Furthermore, BigQuery has two high-level categories of usage: the preparation of data and the analysis of that data. Users undertaking the former (e.g. data engineers) are responsible for ingestion, cleansing and transformation of data. They are akin to the specialists who configure SaaS business applications - preparing them for end users to derive value. Users responsible for the latter (e.g. data analysts) may write SQL or use visual tools (like Looker) to derive value - akin to business end users executing processes in SaaS applications. In all cases, those individuals are using the software - not developing it. SaaS, not PaaS.

Whether in BigQuery or other SaaS applications, a key constraint is the predefined end user interface. A new application is not developed; rather, processes are configured for organisational specifics. BigQuery focuses on data-to-insight workflows, other SaaS applications on business process workflows. No new application or software is created.

To review: BigQuery has all the characteristics of a SaaS application. It is instance-less and version-less (automatic upgrades), requires no infrastructure administration and is used via a constrained user interface (including APIs) - albeit one that allows technical code-based interaction. BigQuery is SaaS. Unconvinced? Access our SaaS BigQuery Sandbox now.


So BigQuery is SaaS. Why care?

Because you must evaluate and decide on data solutions as you would any other business application. Like HubSpot, Salesforce, Shopify, Workday or Zendesk. You should focus on critical factors like functionality, business impact, performance, security, ease of implementation, productivity gains and total cost of ownership.
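One way to make such an evaluation concrete is a weighted scoring matrix. The sketch below is only an illustration: the criteria are the factors listed above, but the weights, the candidate names and every score are hypothetical placeholders, not a recommended weighting or a real comparison.

```python
# Criteria come from the paragraph above; the weights are hypothetical
# and should reflect your own organisation's priorities. They sum to 1.0.
WEIGHTS = {
    "functionality": 0.20,
    "business_impact": 0.20,
    "performance": 0.15,
    "security": 0.15,
    "ease_of_implementation": 0.10,
    "productivity_gains": 0.10,
    "total_cost_of_ownership": 0.10,
}


def weighted_score(scores):
    """Return the weighted average of 1-5 scores across all criteria."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weight * scores[criterion] for criterion, weight in WEIGHTS.items())


def rank(candidates):
    """Sort candidate names by weighted score, highest first."""
    return sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)


# Hypothetical scores (1 = poor, 5 = excellent) for two unnamed candidates.
candidates = {
    "Solution A": {
        "functionality": 5, "business_impact": 4, "performance": 4, "security": 4,
        "ease_of_implementation": 3, "productivity_gains": 4, "total_cost_of_ownership": 3,
    },
    "Solution B": {
        "functionality": 3, "business_impact": 3, "performance": 4, "security": 4,
        "ease_of_implementation": 5, "productivity_gains": 3, "total_cost_of_ownership": 4,
    },
}

for name in rank(candidates):
    print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

Note what is absent from the criteria: the infrastructure provider that each candidate happens to run on.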

Conversely, you should not constrain your options to your existing infrastructure provider (whether on-premises or cloud hyperscaler). I have yet to see an organisation constrain a SaaS shortlist to vendors aligned with their cloud infrastructure service provider. Indeed, the underlying infrastructure provider for the SaaS solution is nearly always irrelevant.

BigQuery happens to be built on Google Cloud Platform (GCP). Like many other third-party SaaS solutions. You wouldn’t eliminate those for running on GCP. Nor should you eliminate BigQuery for the same reason.

Yet I also see three common - but invalid - justifications for not considering BigQuery when a non-GCP cloud hyperscaler is the infrastructure provider of choice: data gravity (centralisation), integration and egress (costs).

Data gravity (centralisation) assumes all of an organisation’s data is already present in its preferred infrastructure cloud hyperscaler. This is rarely the case. First, businesses use a variety of SaaS offerings - each in a different tenant. Second, on-premises systems and data silos abound (including standalone spreadsheets and similar reference sources). Third, firms increasingly gain insights from harnessing third-party data sets (e.g. credit bureau, satellite imagery, weather). In my experience, more of the data an organisation analyses sits outside its infrastructure cloud hyperscaler tenant than inside it. Even where that is not so, the underlying assumption is that centralisation makes things much easier. It does not, as we will show.

Integration assumes added complexity when moving data from a source within the preferred infrastructure cloud hyperscaler to a sink outside it. Again, this is mistaken. Data integration (like CDC, ELT or ETL) sits above the infrastructure layer. That is, the complexity of retrieving data from a business application (e.g. SAP) or database (e.g. Oracle) is very similar whether the source runs within the same or another cloud hyperscaler. Granted, there may be some (normally minor) one-time cloud-to-cloud connectivity work during implementation, but this is not an evaluation blocker and does not recur for every data source. Integration is to be expected, as in any other SaaS project.

Egress is the final invalid justification. It refers to the charges levied when sending data out of a cloud hyperscaler. These are often far lower than expected, because data is typically moved once (an initial load followed by subsequent changes). It is also a simple calculation: a factor in your total cost of ownership. But it does not change solution complexity. Its presence should not predetermine your decision - especially when other costs are usually far more substantive.
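To show how simple the calculation is, here is a minimal sketch. It assumes a flat per-GB rate and a steady daily change volume; the function name, the data volumes and the rate are all hypothetical, so substitute your provider's published pricing and your own measurements.

```python
def egress_cost_usd(initial_load_gb, daily_change_gb, days, rate_usd_per_gb):
    """Estimate egress charges: one initial load plus incremental changes.

    Assumes a single flat rate per GB; real price lists are often tiered.
    """
    total_gb = initial_load_gb + daily_change_gb * days
    return total_gb * rate_usd_per_gb


# Hypothetical inputs for illustration only (not a quoted price).
first_year = egress_cost_usd(
    initial_load_gb=10_000,   # one-off historical load
    daily_change_gb=20,       # subsequent changes per day
    days=365,
    rate_usd_per_gb=0.09,     # assumed flat rate
)
print(f"Estimated first-year egress: ${first_year:,.2f}")
```

Under these made-up inputs the first-year figure comes to roughly $1,557, which then sits alongside every other line item in the total cost of ownership.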

Focus on securing best-in-class technology for your organisation. Don’t let invalid justifications constrain your acquisition of capabilities for maximal business value.


To simplify: if you undertake a thorough evaluation of our data solution and decide our business impact, functionality and total cost of ownership don’t work for your organisation, we should all respect that outcome. Just don’t let your choice of infrastructure cloud hyperscaler impede your ability to adopt our best-in-class data solutions.

Zubin Limbuvala

Head of Data Strategy - GFT Technologies UK

6 months ago

I agree with you Duncan

Trevor Stratton CISSP, CRISC, CISM, CASP, SEC

Vice President, SecurityRiskAssessment.com, Risk Assessments for Canada and USA

6 months ago

As much as I would like to agree with you, I think in this case I will need to side with Google! As much as Snowflake is a leading competitor in the data warehousing and data analytics space, I don't believe they are an authority on defining cloud computing service models for the entire industry. NIST SP 800-145 from 2011 has definitions for both SaaS and PaaS, and from your article I think Google's BigQuery Studio aligns more closely with a PaaS service. I'm not an expert in the Google product, and it's very possible there's a key piece of information I'm leaving out that would bring it closer to the SaaS definition.

Juan Urrego

Leadership, Data Engineering, Reactive Systems, AI

6 months ago

I think it started as a PaaS and now, with all the services on top of it, it has become a SaaS solution. So I do agree with you, and I also believe that being a SaaS data warehousing / lakehousing / ML solution is in fact what makes it unique.

Yacine Ahbar

Data & AI Expert

6 months ago

Interesting discussion... do you also consider Database as a Service offerings SaaS? In my opinion, SaaS products natively deliver some kind of business process implementation.

Anouar Hnini

Solution Architecture @ dbt Labs

6 months ago

Totally agree.

More articles by Duncan Foster

  • The Error of Data Gravity Dictums

    For over a decade, Data Gravity has been a byword for the agglomeration of data stores (primarily analytic stores, like…

  • Federate before you Replicate

    Far more organisations bring data into a central data location (whether data warehouse, data lake or data lakehouse)…

  • Data must break free from IT chains

    Often, Data is misconstrued as a subset of IT. This is wrong.

  • Bad Fashion: Open Data Lakehouses

    A great number of companies I speak with are enthusiastic about adopting the open data lakehouse pattern. Some already…

  • Suicidal AGI: Truly Terrifying

    Recent AI advances revitalised interest in AI Alignment and control, with Existential Risk from AGI capturing…

  • The best LLM? The platform

    The best LLM today no longer matters. ChatGPT went viral in November 2022 and captured the world’s imagination.

  • Over-Building: The Tech Firm Failure

    TLDR; Just because you can, doesn't mean you should. Startups face many constraints. Time, money, resource.
