The Lens of Time
As an industrial engineering student at Stanford in the '80s (they didn't yet have an undergraduate computer science degree!), I was taught about time and motion studies and the pioneering work of Frederick Taylor. Assessing and improving industrial efficiency depends on capturing detailed time-based measurements. For my final project, I wrote a computer simulation of a local restaurant built entirely on probability distributions (like the Poisson) derived from detailed measurements: guest arrival times, seating wait times, food preparation times, and many more.
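That kind of simulation is easy to sketch today. Here is a minimal, hypothetical version in Python: a single seating queue with Poisson guest arrivals (exponential inter-arrival times) and exponentially distributed table turnover. The rates and guest counts are illustrative, not from the original project.

```python
import random

random.seed(42)

def simulate_queue(n_guests, arrival_rate, service_rate):
    """Single-server queue: Poisson arrivals (exponential inter-arrival
    times) and exponential service times. Returns the average wait
    before service begins, in the same time unit as the rates."""
    clock = 0.0           # running arrival time
    server_free_at = 0.0  # time the server next becomes free
    total_wait = 0.0
    for _ in range(n_guests):
        clock += random.expovariate(arrival_rate)  # next guest arrives
        start = max(clock, server_free_at)         # wait if table busy
        total_wait += start - clock
        server_free_at = start + random.expovariate(service_rate)
    return total_wait / n_guests

# Guests arrive ~10/hour, a table turns over ~12/hour.
avg_wait = simulate_queue(n_guests=10_000, arrival_rate=10, service_rate=12)
```

For these rates, queueing theory predicts an average wait of roughly 0.42 hours, and the simulated average converges toward that as the guest count grows.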
Today, these same basic principles of industrial engineering are being applied to a wide variety of problems. Modern applications leverage container/microservice designs and rely on many third-party cloud APIs and services. Yes, IT has always emphasized systems monitoring, but as software has become more distributed, the scale and scope have ballooned. Many of the early, large internet companies built internal systems to handle these problems -- but what about a general solution for everyone else?
It is not just in IT or DevOps that real-time monitoring is critical -- companies across many industry verticals have woken up to the opportunity. GE is making a big bet by providing the Predix cloud platform to augment industrial operational technologies. From personal health and fitness devices to the connected home to verticals like healthcare and utilities -- sensors are being embedded everywhere to emit timestamped data.
Why is all this happening now? The cost of storage and compute has plummeted, allowing companies to adopt much more aggressive strategies for storing and analyzing all this data. Advanced processing capabilities like machine learning are becoming mainstream within software companies as open source solutions like TensorFlow emerge.
So why not just store all this in a relational database? The hegemony of the relational religion is being challenged in the database market -- and "polyglot persistence" appears to be the new norm. Most modern applications are built using multiple types of databases. Yes, relational still exists, but now there are specialized databases for key-value/caching (Redis), JSON documents (MongoDB), wide-column storage (Cassandra), search (Elasticsearch), and even graph (Neo4j) -- and some, like Couchbase, support multiple models (key-value/caching, JSON documents, search, and SQL capabilities).
Time series data deserves its own purpose-built platform -- and that is exactly what InfluxData is delivering. This is not a new problem; there are legacy proprietary solutions such as kdb+, built for even more specialized problems like complex financial data analysis.
The combined TICK stack from InfluxData includes:
- T = Telegraf - collection
- I = InfluxDB - storage
- C = Chronograf - visualization
- K = Kapacitor - processing, monitoring & alerting
This provides a base platform upon which to build solutions -- but it can also be used out-of-the-box to replace legacy IT monitoring solutions like Graphite. The community has already contributed integrations with dozens of important adjacent technologies ranging from Apache to Zookeeper.
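As a concrete illustration of how metrics reach the storage layer, InfluxDB accepts data points in a simple text "line protocol": a measurement name, tags, fields, and a timestamp. Below is a minimal Python sketch that builds such a point -- the `cpu` measurement, tag, and values are made up for illustration, and real usage would also escape special characters.

```python
import time

def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Build an InfluxDB line-protocol point:
    measurement,tag=val field=val timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    ts = ts_ns if ts_ns is not None else time.time_ns()
    return f"{measurement},{tag_str} {field_str} {ts}"

point = to_line_protocol("cpu", {"host": "web01"}, {"usage": 64.2},
                         ts_ns=1465839830100400200)
# → "cpu,host=web01 usage=64.2 1465839830100400200"
```

In practice, Telegraf produces lines like this from its input plugins and ships them to InfluxDB over HTTP, so most users never construct them by hand.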
We are proud to be involved with Paul Dix, Todd Persen and the entire InfluxData team as they embark on this mission to help companies gain value from analyzing their metrics data.