The Signals Repository: Enabling & Accelerating the Citizen Data Scientist

Shreedhar Sasikumar

Senior Director, AI & Machine Learning

Published: January 20, 2020

We live in the world of the ‘citizen data scientist’, where any analyst with a computer is empowered to generate insight from data. In a world where Excel can handle 1M rows, pivot tables are sufficient to summarize decades of data. Alteryx or SAS will let you build a logistic regression or train a tree model with a few point-and-clicks, while Tableau & Power BI beautifully visualize your results on interactive dashboards. Coursera, edX, Udemy or Codecademy (take your pick) bring the ability to train an NLP model into our living rooms, and if it is processing power you are after, GCP or AWS will let the average citizen spin up a cluster to analyze petabytes of data, again with a few clicks and without a single line of code.

But for all the problems these tools solve, our citizen data scientist still cannot generate insight without data. To predict the best place to put a new restaurant, bank branch or shopping mall, one needs data on where current structures are and who lives where. How could one build a localized demand model without knowing how much income and liquidity there is in an area? Couldn’t a city plan better if it could analyze local commuting patterns and identify busy intersections? And wouldn’t revenue forecasting be more accurate if it incorporated hundreds of macroeconomic indices from around the globe?

The good news is that all this data is out there for all of us in the open-source realm. But unlike a Bayes model, the data you need is rarely a couple of clicks away. In the wild west of the Internet, 10 websites will have 14 different data formats for you to parse. Once you get the data, you realize it is full of code values and you now need to download a lookup file (if you can find it). And once you are done cleaning the nulls and transposing the data into a time series, you might find there is a new data update to download. Little wonder the old adage holds that 86.4% of data science is data acquisition and cleaning, and the science is the few buttons you press at the end with what little energy the data prep has left you.
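To make that prep work concrete, here is a minimal Python sketch of two of the steps described above: decoding code values with a lookup file and reshaping a wide extract (one column per year) into a null-free time series. The sample rows, the region codes and the `to_time_series` helper are hypothetical and made up for illustration; a real source would need its own parsing first.

```python
# Hypothetical raw extract: one row per region, one column per year,
# with regions stored as codes and missing values as None.
raw_rows = [
    {"region": "R01", "2017": 51.2, "2018": 53.0, "2019": None},
    {"region": "R02", "2017": 47.8, "2018": None, "2019": 49.1},
]

# Hypothetical lookup file mapping code values to readable names.
region_lookup = {"R01": "North", "R02": "South"}


def to_time_series(rows, lookup, year_cols):
    """Decode codes, drop nulls and reshape wide rows into (name, year, value) records."""
    series = []
    for row in rows:
        # Fall back to the raw code if the lookup file has no entry for it.
        name = lookup.get(row["region"], row["region"])
        for year in year_cols:
            value = row.get(year)
            if value is None:  # drop missing observations
                continue
            series.append((name, int(year), value))
    # Sort by region, then year, so each region reads as an ordered time series.
    return sorted(series)


records = to_time_series(raw_rows, region_lookup, ["2017", "2018", "2019"])
for rec in records:
    print(rec)
```

Every new data update would have to be pushed through the same steps again, which is exactly the repetitive work the S-R aims to take off the analyst's plate.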

We designed the Signals Repository (S-R) to minimize these worries, and let our ‘citizen data scientist’ concentrate on the insight and not the prep.

With the S-R, local points of interest (POI), demographics, tax statistics, commuting patterns and commodity prices can indeed be a couple of clicks away. Along with those, the S-R can tell you the weather in Australia, how many pockets were picked in a London borough and how rents have grown in ZIP code 67212 in Wichita, KS. Among our growing catalogue of 60,000+ signals are global cell tower density, schedules of NBA games, what cars are being driven in São Paulo and the topography of Japan. Geo-coded for spatial analysis and stored in standardized data formats, this wealth of data is indexed and easily explored through our web portal. Our cloud API and connectors make the data accessible at scale through common tools, ranging from Excel and Alteryx for analysts to R and Python for hardened data scientists.

And more than just data, the S-R is a growing platform for developing data-enabled applications. Our Functions menu lets users insert proprietary transforms such as Richness (which assesses the buying power of an area) or Gravity (which scores the optimality of a location relative to competitors) into their own workflows. The insights our data scientists generate each time they develop a new use case, ranging from auto insurance pricing to online customer churn, are fed back into our Signals Archive, giving other users a reference manual of the most relevant signals for a given problem. Our S-R community continues to find new use cases, be it our company news analyzer, which measures a firm’s reputation based on news stories, or our forecast simulation tool, which can give you over 50,000 potential projections of an economic series. Together, these make the S-R a vibrant and growing platform built to enable and accelerate a citizen data scientist’s journey to generate insight from the wide web of information all around us.
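The S-R's Gravity transform is proprietary and its formula is not described in this article. As a hedged illustration of the general idea of scoring a location against competitors, the sketch below uses a different, well-known technique: the classic Huff gravity model, in which each origin's demand is split across stores in proportion to attractiveness^alpha / distance^beta. The `huff_capture` function, the default exponents and the toy coordinates are assumptions for illustration only.

```python
import math


def _utility(origin_xy, store, alpha, beta):
    """Huff utility of one store for one origin: attractiveness^alpha / distance^beta."""
    ox, oy = origin_xy
    sx, sy, attractiveness = store
    dist = max(math.hypot(ox - sx, oy - sy), 1e-6)  # guard against a zero distance
    return attractiveness ** alpha / dist ** beta


def huff_capture(origins, competitors, site, alpha=1.0, beta=2.0):
    """Expected demand captured by `site` under a basic Huff gravity model.

    Illustrative only, not the S-R Gravity transform.
    origins:     list of (x, y, demand) tuples
    competitors: list of (x, y, attractiveness) tuples
    site:        (x, y, attractiveness) of the candidate location
    """
    captured = 0.0
    for ox, oy, demand in origins:
        site_u = _utility((ox, oy), site, alpha, beta)
        total = site_u + sum(_utility((ox, oy), c, alpha, beta) for c in competitors)
        captured += demand * site_u / total  # candidate's share of this origin's demand
    return captured


# Toy example: two equal demand centres on a line, one competitor at x = 8.
origins = [(0.0, 0.0, 100.0), (10.0, 0.0, 100.0)]
competitors = [(8.0, 0.0, 1.0)]
score_a = huff_capture(origins, competitors, (2.0, 0.0, 1.0))
score_b = huff_capture(origins, competitors, (6.0, 0.0, 1.0))
print(round(score_a, 1), round(score_b, 1))  # 100.0 84.0
```

In this toy setup, the candidate site farther from the competitor captures more of the two origins' demand, which is the kind of relative ranking a gravity-style score is meant to produce.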

We would love to tell you more… or, even better, show you.
