Drowning in Data? Challenges and Opportunities
Jeffrey Woodard, Certified Financial Fiduciary®
Tax-efficient Retirement Income Planning Specialist
The "digital universe" is growing 40% a year into the next decade, expanding to include not only the increasing number of people and enterprises doing everything online, but also all the “things” – smart devices – connected to the Internet, unleashing a new wave of challenges and opportunities for businesses and people around the world.
Like the physical universe, the digital universe is large: by 2020 it will contain nearly as many digital bits as there are stars in the universe. It is doubling in size every two years, and by 2020 the digital universe (the data we create and copy annually) will reach 44 zettabytes, or 44 trillion gigabytes (IDC).
Many people do not grasp the scale of this growth, or the challenges that this data deluge poses.
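A quick arithmetic check, sketched here in Python, shows how the quoted figures fit together: 40% annual growth compounds to roughly a doubling every two years, and 44 zettabytes equals 44 trillion gigabytes when units are counted in powers of 1,000. The variable names are illustrative only and do not come from IDC.

```python
# Illustrative check of the growth figures quoted above.
ANNUAL_GROWTH = 0.40                          # 40% a year, as quoted
two_year_factor = (1 + ANNUAL_GROWTH) ** 2    # compound growth over two years

ZB = 10 ** 21    # bytes in a zettabyte (decimal definition)
GB = 10 ** 9     # bytes in a gigabyte (decimal definition)
gigabytes_in_44_zb = 44 * ZB // GB

print(f"Two-year growth factor: {two_year_factor:.2f}")   # 1.96, close to doubling
print(f"44 ZB in gigabytes: {gigabytes_in_44_zb:,}")      # 44,000,000,000,000
```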
Data scientists break big data into three dimensions: volume, velocity, and variety.
Volume
The main attraction of big data analytics is the benefit gained from processing large amounts of information. Having more data can beat having better models: simple bits of math can be unreasonably effective given large amounts of data. Take a demand forecast as an example: if you could run it taking into account 300 factors rather than 6, could you predict demand better?
Velocity
The importance of data's velocity, the increasing rate at which data flows into an organization, has followed a similar pattern to that of volume. Problems previously restricted to segments of industry are now presenting themselves in a much broader setting. Specialized companies such as financial traders have long turned systems that cope with fast-moving data to their advantage.
Variety
Rarely does data present itself in a form perfectly ordered and ready for processing. A common theme in big data systems is that the source data is diverse and doesn't fall into neat relational structures. It could be text from social networks, image data, or a raw feed directly from a sensor. None of these things come ready for integration into an application.
Implications for business and IT professionals:
- Data warehouses will need to be upgraded or swapped out for more flexible data repositories that can handle various data types, automatic tagging, autonomous data “check-in,” and many terabytes. These warehouses must be able to store the vast amount of data on the most efficient infrastructure, bowing to the reality that only a fraction of stored data is actually engaged at any given moment.
- Data analytic output will need to be driven to more parts of the organization, including real-time input to operational decision making.
- Big data is messy: most Data Hub projects are essentially data cleanup projects. Time spent on cleaning that yields no tangible output will doom a good number of these ambitious projects.
- The far-reaching nature of big data analytics projects can have uncomfortable aspects: data must be broken out of silos in order to be mined, and the organization must learn how to communicate and interpret the results of analysis. This is a cultural problem, and one which will require a rethinking of the interface between multiple (often siloed and geographically dispersed) business organizations and the IT services group. The role of the CIO will change, and all executives must be engaged in the initiatives.
- Real transformation to a data-driven or software-defined enterprise is an all-hands-on-deck imperative. IT alone will never be able to make the transition.
______________________________________________________
WHAT ARE BITS AND BYTES?
A "bit" (binary digit) is the smallest unit of information that can be stored in a computer: either a 1 or a 0 (an on or off state). All computer calculations are carried out in bits.
A "byte" is a collection of 8 bits. Bytes are convenient units because 8 bits can take 256 different values, enough to represent any single letter, digit, or symbol in a basic character set.
Byte quantities are typically expressed in multiples of 1,000: kilobyte (1,000 bytes), megabyte, gigabyte, and so on. The progression is as follows:
Bit (b)           1 or 0
Byte (B)          8 bits
Kilobyte (KB)     1,000 bytes
Megabyte (MB)     1,000 KB
Gigabyte (GB)     1,000 MB
Terabyte (TB)     1,000 GB
Petabyte (PB)     1,000 TB
Exabyte (EB)      1,000 PB
Zettabyte (ZB)    1,000 EB
This seems simple enough, except sometimes multiples of bytes are considered as powers of 2, since the original machine language only has two states, 1 or 0. So a kilobyte is 2^10 bytes, or 1,024 bytes. A megabyte would be 2^20 bytes, or 1,024 kilobytes, and so on.
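To see how far the two conventions drift apart, here is a small Python sketch that computes each unit both ways. The function name unit_sizes and the prefix list are made up for this illustration. The gap starts at 2.4% for a kilobyte and widens with every step up the scale.

```python
# Compare decimal (powers of 1,000) and binary (powers of 1,024) multiples.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]

def unit_sizes(prefix):
    """Return (decimal_bytes, binary_bytes) for a prefix such as 'giga'."""
    power = PREFIXES.index(prefix) + 1
    return 1000 ** power, 1024 ** power

for prefix in PREFIXES:
    decimal, binary = unit_sizes(prefix)
    gap_percent = (binary / decimal - 1) * 100
    print(f"{prefix}byte: binary value is {gap_percent:.1f}% larger than decimal")
```

This is why a storage figure can differ noticeably depending on which convention the person quoting it has in mind.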
Put it into Context
A short novel                                                     1 MB
A meter of shelved books                                          100 MB
A stack of tablets reaching 2/3 of the way to the moon            4.4 ZB (today)
A stack of tablets reaching 6.6 times the distance to the moon    44 ZB (2020)
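The tablet comparison can be roughly reproduced with a few lines of Python. This is only a sketch: the tablet capacity and thickness below are hypothetical values chosen for illustration (the source does not say which tablet is meant), and the Earth-moon distance is the commonly cited average of about 384,400 km. Whatever tablet is assumed, the two rows differ by exactly a factor of ten, because 44 ZB is ten times 4.4 ZB.

```python
# All tablet figures are assumptions for illustration, not values from the source.
TABLET_CAPACITY_BYTES = 128 * 10 ** 9   # hypothetical 128 GB tablet
TABLET_THICKNESS_M = 0.0075             # hypothetical 7.5 mm thick tablet
EARTH_MOON_DISTANCE_M = 384_400_000     # average Earth-moon distance, about 384,400 km

def stacks_to_moon(zettabytes):
    """Return how many Earth-moon distances a stack of tablets holding the data spans."""
    tablets = zettabytes * 10 ** 21 / TABLET_CAPACITY_BYTES
    return tablets * TABLET_THICKNESS_M / EARTH_MOON_DISTANCE_M

for zb in (4.4, 44):
    print(f"{zb} ZB -> {stacks_to_moon(zb):.2f} times the distance to the moon")
```

A thinner or higher-capacity tablet would shorten both stacks in proportion, so the result should be read as an order-of-magnitude picture, not a measurement.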