The Signals Repository: Enabling & Accelerating the Citizen Data Scientist


We live in the world of the ‘citizen data scientist’, where any analyst with a computer is empowered to generate insight from data. In a world where Excel can handle 1M rows, pivot tables are sufficient to summarize decades of data. Alteryx or SAS will let you build a logistic regression or train a tree model with a few points and clicks, while Tableau and Power BI beautifully visualize your results on interactive dashboards. Coursera, edX, Udemy or Codecademy (take your pick) bring the ability to train an NLP model into our living rooms, and if it is processing power you are after, GCP or AWS will let the average citizen spin up a cluster to analyze petabytes of data, again with a few clicks and without a single line of code.

But for all the problems these tools solve, our citizen data scientist still cannot generate insight without data. To predict the best place to put a new restaurant, bank branch or shopping mall, one needs data on where current structures are and who lives where. How could one build a localized demand model without knowing how much income and liquidity there is in an area? Couldn’t a city plan better if it could analyze local commuting patterns and identify busy intersections? And wouldn’t revenue forecasting be more accurate if it incorporated hundreds of macroeconomic indices from around the globe?

The good news is that all this data is out there for all of us in the open-source realm. But unlike a Bayes model, the data you need is rarely a couple of clicks away. In the wild west of the Internet, 10 websites will have 14 different data formats for you to parse. Once you get the data, you realize it is full of code values and you now need to download a lookup file (if you can find it). And once you are done cleaning the nulls and transposing the data into a time series, you might find there is a new data update to download. Little wonder, then, that the old adage says 86.4% of data science is data acquisition and cleaning, and the science is the few buttons you press at the end with what little energy the data prep left you.
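The prep chores just described (decoding code values with a lookup file, dropping nulls and transposing a wide table into a time series) can be sketched in a few lines of standard-library Python. The sample rows, the `region_code` column and the `LOOKUP` mapping below are hypothetical stand-ins, not an S-R format; this only illustrates the kind of work involved, not how the S-R performs it.

```python
# Hypothetical raw extract: one row per coded region, one column per year,
# with missing observations arriving as empty strings or None.
RAW_ROWS = [
    {"region_code": "R01", "2019": "120.5", "2020": "", "2021": "131.0"},
    {"region_code": "R02", "2019": "98.0", "2020": "101.2", "2021": None},
]

# Hypothetical lookup file mapping code values to readable names.
LOOKUP = {"R01": "North", "R02": "South"}


def to_time_series(rows, lookup):
    """Decode region codes, drop nulls and reshape wide rows into
    (region, year, value) records sorted by region, then year."""
    series = []
    for row in rows:
        # Codes missing from the lookup are flagged rather than dropped.
        name = lookup.get(row["region_code"], "UNKNOWN")
        for column, raw in row.items():
            if column == "region_code":
                continue
            if raw is None or raw.strip() == "":
                continue  # skip missing observations
            series.append((name, int(column), float(raw)))
    # Sort so each region's observations appear in chronological order.
    series.sort(key=lambda record: (record[0], record[1]))
    return series


if __name__ == "__main__":
    for record in to_time_series(RAW_ROWS, LOOKUP):
        print(record)
```

Keeping the output as long-format `(region, year, value)` records makes it easy to append a new data update later without re-transposing the whole table.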

We designed the Signals Repository (S-R) to minimize these worries, and let our ‘citizen data scientist’ concentrate on the insight and not the prep.


With the S-R, local points of interest (POI), demographics, tax statistics, commuting patterns and commodity prices can indeed be a couple of clicks away. Along with those, the S-R can tell you the weather in Australia, how many pockets were picked in a London borough and how rents have grown in zip code 67212 in Wichita, KS. Among our growing catalogue of 60,000+ signals are global cell tower density, schedules of NBA games, what cars are being driven in São Paulo and the topography of Japan. Geo-coded for spatial analysis and stored in standardized formats, this wealth of data is indexed and easily explored through our web portal. Our cloud API and connectors make the data accessible at scale through common tools, ranging from Excel and Alteryx for analysts to R and Python for hardened data scientists.

And the S-R is more than just data: it is a growing platform for developing data-enabled applications. Our Functions menu lets users insert proprietary transforms such as Richness (which assesses the buying power of an area) or Gravity (which scores the optimality of a location relative to competitors) into their own workflows. The insights our data scientists generate each time they develop a new use case, ranging from auto insurance pricing to online customer churn, are fed back into our Signals Archive, giving other users a reference manual of the most relevant signals for a given problem. Our S-R community keeps finding new use cases, be it our company news analyzer, which measures a firm’s reputation based on news stories, or our forecast simulation tool, which can give you over 50,000 potential projections of an economic series. All of this makes the S-R a vibrant, growing platform built to enable and accelerate the citizen data scientist’s journey from the wide web of information around us to insight.
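The article does not say how the proprietary Gravity transform is computed. As one publicly known way to score a location relative to competitors, the sketch below uses the Huff gravity model: each site's pull on a demand point is its attractiveness divided by distance raised to a decay exponent, and a candidate captures that point's population in proportion to its share of the total pull. The function names, the attractiveness and population inputs, and the decay value of 2.0 are all hypothetical assumptions, and this is a swapped-in textbook technique, not the S-R's method.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def huff_capture(candidate, competitors, demand_points, decay=2.0, min_km=0.1):
    """Hypothetical Huff-model score: expected demand captured by a candidate site.

    candidate, competitors: (lat, lon, attractiveness) tuples
    demand_points: (lat, lon, population) tuples
    decay: assumed distance-decay exponent (normally calibrated on real visits)
    min_km: distance floor so a site on top of a demand point does not divide by zero
    """
    sites = [candidate] + list(competitors)
    captured = 0.0
    for lat, lon, population in demand_points:
        # Pull of each site = attractiveness / distance ** decay.
        pulls = [
            attract / max(haversine_km((lat, lon), (s_lat, s_lon)), min_km) ** decay
            for s_lat, s_lon, attract in sites
        ]
        # The candidate (index 0) wins its share of this point's population.
        captured += population * pulls[0] / sum(pulls)
    return captured


if __name__ == "__main__":
    rival = [(0.0, -0.2, 1.0)]
    homes = [(0.0, 0.0, 100.0), (0.05, 0.05, 40.0)]
    print(huff_capture((0.0, 0.05, 1.0), rival, homes))
```

Ranking several candidate sites by `huff_capture` against the same competitors and demand points gives a simple, comparable location score; richer variants replace the single attractiveness number with signals such as floor space or brand strength.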

We would love to tell you more, or better yet, show you.

More articles by Shreedhar Sasikumar