Data.world: The importance of linking data and people
George Anadiotis
Analyst, Consultant, Engineer, Founder, Researcher, Writer | Data, AI, Technology, Media, Content, Knowledge Graphs
Notepads, graphs, data lakes, collaboration, and data manifestos. Data.world has an interesting blend of philosophy and technology going on, and it all converges around one thing: facilitating data-driven analysis by making it a team sport.
Getting your head around data.world is not very easy.
That's because data.world seems to be working at the intersection of a number of things. What exactly does it do, how, and why?
As of today, data.world is officially releasing the enterprise version of its platform, and ZDNet had an in-depth discussion with data.world's team to address those questions.
FROM MAGIC SPREADSHEETS TO MASSIVE GRAPH DATABASE AS A SERVICE
To understand where data.world is coming from, we took a step back to discuss data-related issues the team has regularly had to deal with. CEO Brett Hurt and CPO Jon Loyens, who co-founded data.world, referred to their experience in previous roles at companies such as Bazaarvoice and HomeAway. Loyens, for example, referred to his magic spreadsheet, which should ring a bell:
"We had a data lake, and a data warehouse, and SAS, and we also had to integrate data from external sources to get our quarterly targets and forecasts. And it all went into this epically huge spreadsheet that would tell us the magic numbers we'd have to hit.
"When I had to pitch that to a team of engineers, oftentimes some would want to question my data and assumptions. And all I could tell them was, 'Well, here's the magic spreadsheet... good luck.'"
We've all had our share of magic spreadsheets, but there is a better way. This is data.world's message. (Image: James Kendrick/ZDNet)
Loyens added that, at the same time, his ex-colleague and data.world CTO Bryon Jacob was struggling with data management. This eventually motivated them to join forces and form the team now led by Hurt. The team has been working since 2015 and has raised a total of US$33 million.
Data.world describes its mission as democratizing access to data and helping tap into more of your team's collective brainpower to achieve anything with data faster. That should also ring some bells, as it sounds like something a data science notebook, a self-service BI tool, a Hadoop vendor, or a data lake platform provider could just as well be pitching.
So, what is data.world again, and how is it different from those? Loyens said there is a lot of emphasis on infrastructure and a lot of emphasis on analytics, but how to get from one to the other is not clear. Their answer was to build a massive graph database as a service, add layers on top of it, and focus on collaboration and social aspects.
TIM BERNERS-LEE INSIDE
This sounds pretty generic, except maybe for the graph part. But for Hurt, this was the biggest strategic unlock for their business model and how they work with communities:
"The secret sauce in what we do is we're built on top of the Semantic Web and Linked Data. This is how the network effect of what we do kicks in. We are able to connect people to datasets they may not have even thought of, and it makes the world smaller," Hurt said.
Data.world is vocal about its use of this technology, but it also keeps a pragmatic stance. While it points out that Linked Data technology lends itself very well to data integration and breaking down silos, it also acknowledges the two most common criticisms of Linked Data: accessibility and scale.
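Why does Linked Data help with integration and breaking down silos? A minimal Python sketch, under stated assumptions, shows the core mechanism: when two independently published datasets use the same IRI for the same thing, they can be connected simply by matching on that IRI, with no bespoke mapping step. The dataset names, IRIs and values below are hypothetical, triples are plain tuples rather than a real RDF library, and treating any object that starts with `http://` or `https://` as an IRI is a simplification. This is an illustration of the general idea, not data.world's implementation.

```python
from collections import defaultdict

# A triple is (subject, predicate, object). Simplification: objects that start
# with http(s):// are treated as IRIs, everything else as a literal.
Triple = tuple[str, str, str]


def is_iri(term: str) -> bool:
    return term.startswith(("http://", "https://"))


def shared_iris(datasets: dict[str, list[Triple]]) -> dict[str, list[str]]:
    """Return each IRI that appears in two or more datasets, mapped to the
    sorted names of the datasets that mention it. These are the "links"."""
    seen: dict[str, set[str]] = defaultdict(set)
    for name, triples in datasets.items():
        for subject, _predicate, obj in triples:
            seen[subject].add(name)
            if is_iri(obj):
                seen[obj].add(name)
    return {iri: sorted(names) for iri, names in seen.items() if len(names) > 1}


def describe(datasets: dict[str, list[Triple]], iri: str) -> list[tuple[str, str]]:
    """Merge all datasets and collect every (predicate, object) pair about iri."""
    return sorted(
        (p, o) for triples in datasets.values() for s, p, o in triples if s == iri
    )


# Hypothetical example: a sales dataset and a weather dataset that both
# refer to the same made-up city IRI.
CITY = "http://example.org/id/city/1"
datasets = {
    "sales": [
        ("http://example.org/id/store/7", "http://example.org/vocab/locatedIn", CITY),
        ("http://example.org/id/store/7", "http://example.org/vocab/revenue", "100"),
    ],
    "weather": [
        (CITY, "http://example.org/vocab/avgRainfallMm", "42"),
    ],
}

print(shared_iris(datasets))
# {'http://example.org/id/city/1': ['sales', 'weather']}
print(describe(datasets, CITY))
# [('http://example.org/vocab/avgRainfallMm', '42')]
```

In miniature, this is the kind of effect Hurt describes: a store in one dataset ends up connected to rainfall figures in another dataset its owner may never have looked at, purely because both reuse the same identifier.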
Part of data.world's mission is to make data discoverable, and while Linked Data may be a good match for this, it's not really considered accessible by data scientists or analysts.
Tim Berners-Lee's vision for Linked Data underpins data.world's approach
"We've heard about Linked Data: great promise, but it's hard to use, and hard to annotate." This is something data.world heard from users over and over, and its way of dealing with that was to abstract away as much as possible of the specifics of using Linked Data when ingesting and publishing datasets.
When datasets are ingested or published, data.world introspects them and annotates them using Linked Data standards and vocabularies (most prominently RDF, SKOS and CSVW). Loyens said they make it easy for people to work with data in the tabular formats they are familiar with, and have built things such as a SQL-to-SPARQL bridge to democratize access.
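How tabular data can become a graph, and how a SQL query can be answered against that graph, can be sketched in a few lines of Python. `csv_to_ntriples` is loosely modeled on the W3C CSVW CSV-to-RDF "minimal mode" (one blank node per row, one predicate per column, empty cells skipped), and `sql_to_sparql` is a toy translator for a single-table `SELECT ... FROM ... WHERE col = 'value'` subset. Both functions, the `http://example.org/data` base URL and the `<table>.csv#column` predicate convention are hypothetical choices made for illustration; this is not how data.world's bridge is actually implemented.

```python
import csv
import io
import re
from urllib.parse import quote


def csv_to_ntriples(text: str, table_url: str) -> list[str]:
    """Convert CSV text to N-Triples: one blank node per row, one predicate
    per column (<table_url#column>), non-empty cells as string literals."""
    lines = []
    for rownum, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        for column, value in row.items():
            if column is None or not value:
                continue  # skip surplus fields and empty or missing cells
            literal = (
                value.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r")
            )
            lines.append(f'_:row{rownum} <{table_url}#{quote(column)}> "{literal}" .')
    return lines


# Hypothetical SQL subset: SELECT col[, col ...] FROM table [WHERE col = 'value']
SQL_SUBSET = re.compile(
    r"^\s*SELECT\s+(?P<cols>\w+(?:\s*,\s*\w+)*)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<wcol>\w+)\s*=\s*'(?P<wval>[^']*)')?\s*;?\s*$",
    re.IGNORECASE,
)


def sql_to_sparql(sql: str, base: str) -> str:
    """Translate the SQL subset into SPARQL, assuming table T was ingested
    from <base>/T.csv with one predicate per column, as above."""
    m = SQL_SUBSET.match(sql)
    if m is None:
        raise ValueError(f"unsupported SQL: {sql!r}")
    table_url = f"{base}/{m['table']}.csv"
    cols = [c.strip() for c in m["cols"].split(",")]
    # Each selected column becomes a triple pattern binding a variable.
    patterns = [f"  ?row <{table_url}#{quote(c)}> ?{c} ." for c in cols]
    if m["wcol"]:
        # The WHERE equality becomes a pattern with a fixed literal object.
        value = m["wval"].replace("\\", "\\\\").replace('"', '\\"')
        patterns.append(f'  ?row <{table_url}#{quote(m["wcol"])}> "{value}" .')
    head = "SELECT " + " ".join(f"?{c}" for c in cols)
    return head + "\nWHERE {\n" + "\n".join(patterns) + "\n}"


BASE = "http://example.org/data"  # hypothetical base URL
for triple in csv_to_ntriples("region,revenue\nNorth,100\nSouth,\n", f"{BASE}/sales.csv"):
    print(triple)
print(sql_to_sparql("SELECT region, revenue FROM sales WHERE region = 'North'", BASE))
```

Because both functions derive predicate IRIs the same way, the generated SPARQL query lines up with the generated triples: evaluated against them, it would match only the `North` row, since the `South` row has no revenue triple. A real bridge would also need joins, datatype mapping and a proper SQL parser rather than a regular expression.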
At the same time, data.world provides access to the underlying formats and technology for those who want it. Hurt said this aligns with the vision of Linked Data that Tim Berners-Lee has been promoting, and added that he met Berners-Lee, who "loved what we do, and now has our sticker on his laptop wherever he goes."
IS THIS ANOTHER DATA LAKE?
Celebrity endorsement is always good, but it won't get you very far if you have scale issues. Loyens said their answer was to adopt Apache Jena, and more specifically to pick up a part of it that had been an abandoned academic project. Loyens added that, having hardened it, they intend to re-release it as open source soon.