
Big Data Science is not Targeting the Right Audience

Erik Tromp

Future Club makes AI work for you

Published: January 17, 2016

Big data science (yes, I am merging two fields into one here) is a booming business and an exciting field for many companies to step into. Tech startups are popping up like mushrooms, and the big tech vendors and consultancies are rapidly integrating with all the popular big data and data science tools too. Most of these big data science tools and consultancies are focusing on the wrong thing though – let me explain why.

Big Data Science Status Quo

It is safe to say that big data owes most of its success to the introduction of distributed compute platforms such as Hadoop and Spark, as well as distributed data stores such as HDFS, Cassandra and MongoDB – to name some of the most popular ones. Getting your data in place properly is usually just the beginning, however. Analysis platforms and languages such as R, Spark MLlib and Python’s scikit-learn – again, just to name a few – made (and are still making) the data science field increasingly popular.

The aforementioned technologies all have one thing in common: they require (technical) experts to use and operate them. While one might argue that R can be operated by less technically educated people than those writing Spark jobs in Scala, R is still just a programming language and not usable by the majority of business users. And this is wrong!

What About Other Fields?

Big data science is often regarded as a natural sequel to BI (business intelligence). With the volumes of data we have and the speed at which we want to query them, BI is hitting its limits, and big data science comes to the rescue to help us overcome them. There is a great deal we can learn from BI, though, that big data science tool vendors and developers seem to overlook.

Big data science tools and software are largely open source and free to use, though without any warranties. In contrast, most BI tools are commercially backed by big vendors that license their software for fees that many regard as expensive. This, however, is not a bad thing as such. The current landscape of BI tooling is such that actual analysis – be it dashboarding, modeling or other forms of gaining insights from data – is accessible even to business users. In fact, I have even experienced projects where a Chief Marketing Officer was using Tableau himself to inform his strategy. And he was able to do this correctly too. This is largely due to the fact that the commercial vendors focus on having businesses adopt their technology at large.

Big Data Science’s Flaw

BI got adopted by so many businesses partly because it has matured into a field that can be understood by a range of people far broader than the technical experts who once controlled the domain. Business users who understand the application of their own data analyses well, but lack the skills required to dive into the technical aspects of getting to the result of such an analysis, are still capable of going from input to output thanks to the ease of use of BI tooling. This exact aspect is sorely lacking in big data science.

As mentioned, big data science tools and technology are first and foremost driven by open source communities, which have recently been backed more and more by commercial third parties. Tooling developed in open source settings focuses mainly on functionality and on getting things working and right in the first place, which is a sensible thing to do. The downside is that user experience and ease of use are not the first priority. The result is that while big data science technology possesses huge potential, the secrets to unlocking it remain with the few who are capable of using it.

Separating Implementation from Logic

Dealing with implementation details – a technical affair more than anything – means that as a big data scientist, you are not able to focus on the thing that is most important to the job you are performing: getting the logic right. Why would we want to be bothered with writing a map and a reduce function? All that matters is that things get done the way we want them to – preferably in a decently fast way.

A key element in getting rid of constantly dealing with implementation details is to separate them from the actual logic. This is exactly what BI tools like Tableau and QlikView are doing and what should be done in a big data science setting too – the field is mature enough for it by now. Allowing business users and decision makers to truly understand what is happening in a data analysis or, even better, giving them the tools to define the logic themselves is what will make big data science available to the masses rather than to a small audience of technical experts.
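To make the contrast concrete, here is a minimal Python sketch of the same revenue-per-region analysis written twice: once with hand-written map and reduce functions, and once as a small declarative description of the logic. Everything in it is an assumption made for illustration: the sample records, the `spec` format and the `run_spec` helper are hypothetical and do not represent the API of Tableau, QlikView, Tuktu or any other tool.

```python
from functools import reduce

# Hypothetical sample data, invented for this illustration.
records = [
    {"region": "north", "revenue": 120.0},
    {"region": "south", "revenue": 80.0},
    {"region": "north", "revenue": 30.0},
]


# Implementation-heavy version: the analyst writes map and reduce by hand.
def map_record(record):
    """Turn one record into a (key, value) pair."""
    return (record["region"], record["revenue"])


def reduce_pairs(totals, pair):
    """Fold one (key, value) pair into the running totals."""
    region, revenue = pair
    totals[region] = totals.get(region, 0.0) + revenue
    return totals


manual_totals = reduce(reduce_pairs, map(map_record, records), {})

# Logic-only version: the user states WHAT is wanted, not HOW to compute it.
# A tool could collect this spec through a form or a drag-and-drop interface.
spec = {"group_by": "region", "aggregate": "sum", "field": "revenue"}

AGGREGATES = {"sum": sum, "max": max, "min": min}


def run_spec(spec, rows):
    """Hypothetical engine that turns a declarative spec into a computation."""
    groups = {}
    for row in rows:
        groups.setdefault(row[spec["group_by"]], []).append(row[spec["field"]])
    aggregate = AGGREGATES[spec["aggregate"]]
    return {key: aggregate(values) for key, values in groups.items()}


declared_totals = run_spec(spec, records)

print(manual_totals)
print(declared_totals)
```

Both versions produce the same totals, but only the second one can be read and changed by someone who knows the business question and nothing about the execution model. The engine behind `run_spec` could equally be swapped for a distributed one without the spec changing.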

Providing the necessary functionality to business users in a platform that hides the technical details in such a way that big data science aspects like machine learning and concurrent processing become readily available is not easy though.

At UnderstandLing, we focus on making decision makers, who are often non-technical, aware of and able to understand analyses on their own data. This creates livelier discussions and makes business cases succeed more easily because they are understood. Our technology focuses on ease of use, time to action and understandability, allowing for the rapid prototyping that decision makers are demanding from big data science. Not a single line of actual code has to be written to apply complex concepts such as predictive modeling – all the way to deep learning – or to integrate your own internal data sources with new, opportunity-rich, external data sources in a matter of seconds.

For those ready to take a leap forward: check out our technology, Tuktu, or contact us to help you implement and understand big data science.
