Data Types, the Internet of Things, and the Illusion of “Big Data”
Caveat: While I'm posting this in 2022, the original document it came from was written around 2018, as I recall.
Introduction
The hype cycle is in full swing, which is not unusual for new technologies, but the IoT is all, and only, about connected devices and their growing popularity. This paper is a stream-of-consciousness amalgamation of personal experience coupled with light research. Where I could, I’ve credited the original source.
The Internet of Things (IoT) is the network of physical objects that contain embedded technology to communicate and sense or interact with internal states or the external environment.
It is well known by now that Big Data and the Internet of Things (IoT) will require the enterprise to push data processing and storage to the cloud, and to the edge. With potentially billions of devices streaming data 24/7, there is no practical way to shore up centralized resources in on-premises data centers to handle the load.
When many executives think of Big Data, they think of large volumes of data. A common notion is that bigger is often better when it comes to data and analytics, but this is not always the case. In their 2012 article, “Big Data: The Management Revolution,” MIT Professor Erik Brynjolfsson and principal research scientist Andrew McAfee spoke of the “three V’s” of Big Data (volume, velocity, and variety), noting that “2.5 exabytes of data are created every day, and that number is doubling every 40 months or so. A petabyte is one quadrillion bytes, or the equivalent of about 20 million filing cabinets’ worth of text. An exabyte is 1,000 times that amount, or 1 billion gigabytes.” This focus on the rate of data proliferation has sometimes obscured an appreciation of the value of data and analytics.
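To make the quoted growth rate concrete, the short Python calculation below projects the daily volume forward. It is only a sketch: it assumes the 2.5-exabytes-per-day baseline and the 40-month doubling period from the quote hold steadily, which is an extrapolation rather than a forecast, and the function name `daily_volume_eb` is hypothetical.

```python
# Decimal (SI) byte units, as used in the quote above.
GIGABYTE = 10**9
PETABYTE = 10**15            # "one quadrillion bytes"
EXABYTE = 1_000 * PETABYTE   # "1,000 times that amount"


def daily_volume_eb(months_from_now: float,
                    baseline_eb_per_day: float = 2.5,
                    doubling_months: float = 40.0) -> float:
    """Projected exabytes created per day, assuming steady doubling.

    volume(t) = baseline * 2 ** (t / doubling_period)
    """
    return baseline_eb_per_day * 2 ** (months_from_now / doubling_months)


if __name__ == "__main__":
    print(f"1 EB = {EXABYTE // GIGABYTE:,} GB")  # 1,000,000,000 GB
    for months in (0, 40, 80, 120):
        print(f"month {months:>3}: {daily_volume_eb(months):.1f} EB/day")
```

Under these assumptions the daily volume grows eightfold in ten years (120 months, or three doublings), from 2.5 to 20 EB/day. Numbers like that are exactly what make volume the headline, even though volume alone says nothing about value.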
The result is a myth about Big Data — that Big Data is synonymous with large volumes of data.
Tapping Into the “Long Tail” of Big Data
When asked about drivers of Big Data success, 69% of corporate executives named greater data variety as the most important factor, followed by volume (25%), with velocity (6%) trailing. In the corporate world, the big opportunity is to be found in integrating more sources of data, not bigger amounts. Variety, not volume, is king. Variety is driven by the array of sources, or things, from which data is collected.
Types of Data
Forbes published The 13 Types of Data, by Adrian Bridgwater, in July of 2018 (https://www.forbes.com/sites/adrianbridgwater/2018/07/05/the-13-types-of-data/?sh=6c583d0d3362). Here are core excerpts that identify the 13 types of data we must consider.
Big Data
A core favorite, big data are very large, diverse data sets that include structured, semi-structured and unstructured data, from different sources and in different volumes, from terabytes to zettabytes. In this author’s opinion, Big Data is an illusory catchphrase that encompasses all the data types described here.
Structured, unstructured, semi-structured data
All data has structure of some sort. Delineating between structured and unstructured data comes down to whether the data has a predefined data model (schema) and whether it’s organized in a predefined way.
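The schema distinction can be illustrated with a toy classifier. This is a deliberate simplification, assuming a hypothetical predefined schema for a sensor reading (`READING_SCHEMA`): a record that matches it exactly counts as structured, a key-value record that carries its own field names but does not conform counts as semi-structured, and anything else (for example free text) counts as unstructured.

```python
from typing import Any

# Hypothetical predefined schema (data model) for a sensor reading:
# field name -> expected Python type.
READING_SCHEMA: dict[str, type] = {
    "device_id": str,
    "timestamp": float,
    "temperature_c": float,
}


def classify(record: Any, schema: dict[str, type] = READING_SCHEMA) -> str:
    """Roughly classify a record relative to a predefined schema."""
    if not isinstance(record, dict):
        # No key-value structure at all, e.g. a free-text note.
        return "unstructured"
    conforms = set(record) == set(schema) and all(
        isinstance(record[field], expected) for field, expected in schema.items()
    )
    # Self-describing keys that do not match the schema: semi-structured.
    return "structured" if conforms else "semi-structured"


if __name__ == "__main__":
    print(classify({"device_id": "th-01", "timestamp": 1.0e9, "temperature_c": 21.5}))
    print(classify({"device_id": "th-01", "battery": "low"}))
    print(classify("Technician note: unit rattles on startup"))
```

Real systems draw the line less crisply (a JSON document can be validated against a schema, and text can be mined for structure), but the test is the same one the paragraph describes: is there a predefined model the data conforms to?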
Time-stamped data
Time-stamped data is a dataset with a concept of time ordering that defines the sequence in which each data point was either captured (event time) or collected (processed time). In particular, computerized system logs, used in the analysis of outages or security breaches, are a prime example.
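The difference between event time and processed time matters as soon as data arrives late, which is common with log collection. The sketch below assumes a hypothetical `LogEvent` record carrying both timestamps as epoch seconds. Sorting by each field gives two different sequences, and the gap between the two timestamps is the collection lag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    message: str
    event_time: float      # when the event happened at the source (epoch seconds)
    processed_time: float  # when the collector received/processed it (epoch seconds)


def in_event_order(events: list[LogEvent]) -> list[str]:
    """Reconstruct what actually happened, in order."""
    return [e.message for e in sorted(events, key=lambda e: e.event_time)]


def in_processed_order(events: list[LogEvent]) -> list[str]:
    """The order in which the collector saw the events."""
    return [e.message for e in sorted(events, key=lambda e: e.processed_time)]


def max_lag_seconds(events: list[LogEvent]) -> float:
    """Largest delay between an event occurring and being processed."""
    return max(e.processed_time - e.event_time for e in events)


if __name__ == "__main__":
    events = [
        LogEvent("user login", event_time=100.0, processed_time=101.0),
        LogEvent("failed auth", event_time=95.0, processed_time=110.0),  # arrived late
        LogEvent("user logout", event_time=105.0, processed_time=106.0),
    ]
    print(in_event_order(events))      # ['failed auth', 'user login', 'user logout']
    print(in_processed_order(events))  # ['user login', 'user logout', 'failed auth']
    print(max_lag_seconds(events))     # 15.0
```

For a breach investigation, the event-time order is the one that tells the story. Read in processed order, the late-arriving failed authentication would appear to follow the logout.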
Machine data
Simply put, machine data is the digital exhaust created by the systems, technologies and infrastructure powering modern businesses.
Spatiotemporal data
Spatiotemporal data describes both location and time for the same event, and it can show us how phenomena in a physical location change over time. Tracking information, widely used in IoT solutions, is a great example.
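As an illustration of what pairing location with time enables, the sketch below derives speeds from a GPS-style track. It assumes a hypothetical `Fix` record (latitude and longitude in degrees, time in seconds) and uses the standard haversine great-circle formula with a mean Earth radius of 6,371 km, which is a spherical approximation. Neither the record type nor the function names come from any particular IoT platform.

```python
import math
from typing import NamedTuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; haversine treats Earth as a sphere


class Fix(NamedTuple):
    lat: float  # degrees
    lon: float  # degrees
    t: float    # seconds


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in meters."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))  # clamp float error


def segment_speeds_mps(track: list[Fix]) -> list[float]:
    """Average speed (m/s) per consecutive pair of fixes; skips non-increasing times."""
    return [
        haversine_m(a, b) / (b.t - a.t)
        for a, b in zip(track, track[1:])
        if b.t > a.t
    ]


if __name__ == "__main__":
    track = [Fix(0.0, 0.0, 0.0), Fix(0.0, 0.01, 60.0), Fix(0.0, 0.02, 180.0)]
    print([round(s, 1) for s in segment_speeds_mps(track)])
```

Neither position alone nor time alone yields speed. It is only the combination, for the same event, that shows how the tracked object's situation changes over time.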
Open data
Open data is data that is freely available to anyone in terms of its use (the chance to apply analytics to it) and rights to republish without restrictions from copyright, patents or other mechanisms of control.
Dark data
Dark data is digital information that is not being used and lies dormant in some form. Dark data represents the largest missed opportunity for enterprise business. More than one enterprise has begun the journey to creating a data lake, where all corporate data is stored. There is a grave danger the data lake will devolve into a data swamp of unused or unusable data, degrading into a data tomb, where data goes to die.
Real-time data
One of the most explosive trends in analytics is the ability to stream and act on real-time data. Some people argue that the term itself is something of a misnomer: data can only travel as fast as the underlying communications allow, so it never arrives truly instantaneously.
Genomics data
Genomics data involves analyzing the DNA of patients to identify new drugs and improve care with personalized treatments.
Operational data
Companies have big data, they have application logs and metrics, they have event data, and they have information from microservices applications and third parties. This operational data is often time-stamped data.
High-dimensional data
High-dimensional data is a term being popularized in relation to facial recognition technologies.
Unverified outdated data
Mike Bursell of Red Hat points to what he calls unverified outdated data. This is data that has been collected, but nobody has any idea whether it's relevant, accurate or even of the right type.
Translytic data
An amalgam of ‘transact’ and ‘analyze’, translytic data is argued to enable on-demand real-time processing and reporting with new metrics not previously available at the point of action.
Capture Legacy Data Sources
Many firms see the opportunity in Big Data resulting from the capture of traditional legacy data sources that have gone untapped in the past. These are data sets that typically sat outside the purview of traditional data warehouses — what we now call the “long tail” data.
One of the greatest benefits of Big Data is that businesses can now dig deeper into their own data before they turn to new sources or things. Many things already exist, but the data has remained an untapped resource.
Integrate Unstructured Data
Businesses have been inhibited in their ability to mine and analyze the vast amounts of information residing in text and documents. Until recently, our ability to capture data far surpassed our ability to analyze it. Traditional data environments were designed to maintain and process structured data (numbers and variables), not words and pictures.
A growing number of organizations are now focusing on integrating this unstructured data, for purposes ranging from customer sentiment analysis to analysis of regulatory documents to insurance claim adjudication. The ability to integrate unstructured data is broadening traditional analytics to combine quantitative metrics with qualitative content.
Many data warehouses today are largely data tombs, where data goes to die.
Things: What Exactly are Things In Our Universe?
There are really multiple universes of things. For consumers, wearables, appliances, lightbulbs, cameras, and Alexa or Google Home devices are widely known and accepted.
The phrase IoT is new, but the reality is that it is an evolution of something that’s been around for years. If you think of telemetry systems, we’ve been monitoring devices, from flow meters and power grids to factory controls, for years. Perhaps a better way to think of things is as sensors. In a network context, an SNMP MIB is simply another form of sensor. Things are everywhere.
If it uses power, it can incorporate sensors and become a thing to monitor.
Things run a variety of network technologies. In our market we can expect to see five primary communications methods for things:
Far beyond the corporate or enterprise data and things discussed here, the healthcare universe brings exponentially more things and more data. Patient records, research papers, X-rays, MRI/CT scans, and output from laboratory devices such as urinalysis or blood gas analyzers all produce volumes of data for research and analysis.
Cloud Computing is Evolving to Fog Computing
This should be a simple concept for us to grasp. We’ve drawn the network, whether it’s the PSTN, the Internet or a global network, as a cloud for decades.
The growth of cloud computing was initially viewed as a balance of private, public and hybrid cloud services. With the growth in Cloud Service Providers (CSPs), cloud computing is now frequently being referred to as fog computing for two reasons. First, we are seeing cross-cloud integration at levels not originally anticipated. Companies aren’t putting all their eggs in one basket, or all their data in one cloud. Second, we’re able to push the cloud closer to the end-user, or ground, layer. This goes beyond simply connecting cloud resources to the edge, because it pushes more of the processing load to the cloud gateway, where services can be dispensed more quickly and cheaply.
The impending rollout and growth of 5G networking brings an increased focus on Mobile Edge Computing (MEC), which moves the edge further into the “fog.”
Since the value of IoT data will diminish over time, and in many cases quite rapidly, the faster an enterprise can capture and analyze data, the more it will be able to capitalize on opportunities before they vanish. Processing at the edge also offers a number of security advantages, because potential threats can be detected long before they get close to critical systems.
To gain this level of functionality, we have to push not just processing power but full intelligence to all three tiers of the IoT infrastructure: devices, gateways and the data center. At the device layer, we have to cope with a wide variety of devices, operating environments and data types, while the data center will need new backend cloud connectivity to handle the data that still requires centralized processing. A more intelligent gateway, however, acts as the hub of the IoT environment, not only providing analytics and other services but supporting device management and automation as well.
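One way to picture the gateway's role is as a filter between the device and data-center tiers: it receives every raw reading but forwards only compact summaries, plus anything that needs immediate attention. The sketch below is a hypothetical illustration of that pattern, not any vendor's gateway API; the `(device_id, value)` input shape and the single `alert_threshold` are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable


def summarize_at_gateway(
    readings: Iterable[tuple[str, float]],
    alert_threshold: float,
) -> tuple[dict[str, dict[str, float]], list[str]]:
    """Aggregate raw (device_id, value) readings at a hypothetical edge gateway.

    Returns per-device summaries (to forward to the data center instead of
    every raw reading) and a sorted list of devices whose maximum reading
    exceeded the threshold (to act on locally and immediately).
    """
    by_device: dict[str, list[float]] = defaultdict(list)
    for device_id, value in readings:
        by_device[device_id].append(value)

    summaries = {
        device_id: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for device_id, values in by_device.items()
    }
    alerts = sorted(d for d, values in by_device.items() if max(values) > alert_threshold)
    return summaries, alerts


if __name__ == "__main__":
    raw = [("pump-1", 10.0), ("pump-1", 12.0), ("pump-2", 38.0), ("pump-2", 51.5)]
    summaries, alerts = summarize_at_gateway(raw, alert_threshold=50.0)
    print(summaries)
    print("alert:", alerts)
```

The data center receives one summary per device per batch rather than every reading, while the threshold check happens at the gateway without a round trip to centralized resources.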
An intelligent edge is the key to deriving value from Big Data. By pushing processors as close to users as possible, the enterprise will be able to deliver the highest quality results in the shortest amount of time, all without flooding centralized resources with data.
IoT and Big Data Trends for 2016-2017 (a past look)
Some industry highlights of trends we should watch closely are:
Smarter, faster Bluetooth
New Bluetooth Smart tech will supercharge the IoT. How IoT devices communicate is something of a battleground, but for the smart home it's going to be Wi-Fi and Bluetooth, with perhaps a mix of Zigbee in some corner cases.
Most excitingly, mesh networking will see Bluetooth devices connecting together in networks that can cover an entire building or home. Bluetooth also wants a slice of industrial automation (the so-called IIoT), location-based services and smart infrastructure, too.
The connected office
The IoT will cover offices – and office workers, too.
Automating a factory or production process using cutting-edge IoT products is one thing, but what about the employees? Since each of them already has a smartphone – and with smart wearables becoming more popular – it's only a matter of time before the office becomes a major connected ecosystem.
As well as location and data harvesting apps that tell staff about traffic conditions on their way to meetings, you can soon expect to be able to 'see' colleagues' movements around the office on custom-made apps that facilitate quick face-to-face meetings. The IoT includes people, too.
The spread of beacons
Beacons and beacon-fueled apps will become much more visible in 2016. Is this the physical web? The installation of small, low-power Bluetooth beacons across cities and stores has, until recently, been talked up only as a canny way for retailers to push timed and targeted adverts to smartphones. That's so 2014.
"Beacons mean that apps are able to open in the right place," says Mike Crooks, Head of Mubaloo Innovation Lab, who gives the example of an engineer approaching an asset in the field. "The app would open on the right page to bring up the information and tools the employee needs for that specific asset."
Apple's iBeacon works better for proximity awareness on iOS devices, but Google's new Eddystone means that developers or companies can use a single SDK in their apps. 2016 will see the debut of beacon-based interactive discovery apps for events, such as Disqovr.
Sensors become the core focus
The IoT is all about sensors, not things. Should we be calling it the Internet of Sensors rather than the Internet of Things? "The IoT is, to a large extent, a solution looking for a problem, rather than the other way round," says Steve Taylor, a senior consultant at The Technology Partnership (TTP). "There's simply no point in objects talking to each other just for the sake of it and the IoT only provides the communications backbone. An Internet of Sensors looks more like the roots of a tree, with sensors of all types at the extremities, capturing and feeding data upwards to the main trunk – the internet."
The IoT powers down for health
Embedded sensors will become all the rage in healthcare. Wearable devices on patients that measure vitals is an obvious use for the IoT in hospitals, and as similar technology is used by us all at home the traditional 'doctor-patient' model will change. But there's a trend for putting sensors inside the human body, too.
This is where the innovation kicks in, because anything inside the body must use battery-free, ultra-low-power wireless sensor tech. Powered by biological energy sources, radio waves, vibration and heat, such sensors are now being used to monitor orthopedic implants wirelessly from Bluetooth- or NFC-connected phones.
Be Alert for Slow Growth and a Narrative Shift
The IoT could bloom a lot slower than most analysts are predicting.
It's predicted that there will be billions of devices on the IoT, but will the future really be like this? "There are good reasons to think that it won't," says Michael Barkway, consultant at The Technology Partnership (TTP), who thinks it's the web's cheap 'many-to-one' model that has driven revolutionary change. "For the IoT, the cost model is many-to-many, and the issue of who pays makes a pivotal difference." With many suppliers, standards and technologies in a complicated market, the IoT will grow, but slowly.
We know that he who ignores the past is doomed to repeat it. We would be wise not to forget the optimistic euphoria that preceded the dot-com bubble's burst. The IoT is contingent on our ability to analyze the data collected, not on the collection of massive datasets. The complexity of this task may well take longer to mature fully than many predictions indicate.
The Battle for Platforms
Building the IoT is all about creating robust sensors to collect data, then using reliable and cost effective wireless technologies to deliver it – most likely into the cloud.
However, there's another battle going on, and that's for the platform. Not a day goes by without a new player claiming that its cloud-hosted platform is best, from Apple's HomeKit, Google's Brillo and Intel's IoTivity to Qualcomm's AllJoyn, the UPnP Forum and ARM mbed, and many, many more.
Expect myriad global alliances, forums, partner programs and development kits to appear as the small players chum up and the big names push their own proprietary platforms. May the best platform(s) win… and be completely open-source and interoperable (yeah, right!).
Chief IoT Officers – Is the IoT Officer the Latest Fancy Job Title?
Until recently it's been right up there with the 'Chief Evangelist' in the roster of comedy job titles in tech organizations, but expect the IoT Officer to become a new hybrid of the Chief Data Officer in many organizations.
It's in the same vein as bland self-important statements like 'every company is now a tech company', but in a lot of industries companies are selling not standalone products, but those infused with connectivity and automation software. Everyone will look for an edge in how they position their goods and services.
The IoT Data Life Cycle
Datasets have a shelf life. If kept too long, data can pose a security threat to organizations. Addressing that threat could help stave off the next Office of Personnel Management-type hack, according to military officials.
"We have to start measuring the toxicity of data over time because…the longer we retain it, the more and more threat it represents from a compromise perspective," said David Tillman, the Department of the Navy's cybersecurity director.
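A retention policy is the simplest way to act on the idea that data grows more "toxic" with age. The sketch below assumes each record carries a hypothetical `collected_at` UTC timestamp and uses an illustrative 90-day limit. The right limit is a policy decision, not something the code can determine.

```python
from datetime import datetime, timedelta, timezone

# Illustrative limit only; real retention periods come from policy and regulation.
MAX_AGE = timedelta(days=90)


def split_by_retention(
    records: list[dict], now: datetime, max_age: timedelta = MAX_AGE
) -> tuple[list[dict], list[dict]]:
    """Partition records into (keep, purge) by age relative to `now`.

    Each record is assumed to be a dict with a timezone-aware
    `collected_at` datetime (a hypothetical field name).
    """
    keep: list[dict] = []
    purge: list[dict] = []
    for record in records:
        age = now - record["collected_at"]
        (purge if age > max_age else keep).append(record)
    return keep, purge


if __name__ == "__main__":
    now = datetime(2018, 6, 1, tzinfo=timezone.utc)
    records = [
        {"id": 1, "collected_at": now - timedelta(days=10)},
        {"id": 2, "collected_at": now - timedelta(days=400)},
    ]
    keep, purge = split_by_retention(records, now)
    print("keep:", [r["id"] for r in keep], "purge:", [r["id"] for r in purge])
```

Records exactly at the limit are kept here (the comparison is a strict `>`). Whether the boundary is inclusive is another detail to settle in the policy itself.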
Data toxicity from aging quickly is one concern. Data poisoning due to immature security in data gathering at IoT scale presents another. See the AT&T paper “The CEO’s Guide to Securing the Internet of Things” (https://www.corp.att.com/cybersecurity/docs/exploringiotsecurity.pdf) for another perspective.
Summary
If we’ve learned one thing, it’s that the number of sources providing data is immense and growing exponentially. The danger, both real and formidable, is that it’s far too easy and shortsighted to fall into the trap of collecting all the data that can be collected. Digital hoarding can quickly become a debilitating corporate disease.
User experience is the key driver for success. The analyst or researcher needs useful, ready access, driving a couple of factors:
The future is bright. The opportunities are immeasurable. The key is a thoughtful and methodical strategy that’s also adaptable as lessons are learned. Adapting quickly and ensuring users get the greatest potential value will ensure user success, which will, in turn, drive return in value.