What Technology Infrastructure Do You Need For Big Data?

In this article, I want to explore what sort of infrastructure and tech framework you need to put in place to work effectively with big data (large, fast-moving datasets) that hold so much value for organizations today. To do this, I continue my conversation with Ivo Koerner, VP of IBM Systems. (Our previous conversation focused on tech infrastructure for AI.)

Data is the fuel for the “fourth industrial revolution,” and our ability to collect, store and analyze it in huge volumes is the power behind advances like artificial intelligence (AI) and the Internet of Things (IoT) that are driving digital transformation today.

Because the volume of data that organizations generate and collect is growing so rapidly, we need to be able to decide which data has value and which is just meaningless “noise.”

From my own work with businesses all over the world, it's clear that some are not collecting enough data, so valuable insights are missed. Others take the opposite approach of storing absolutely everything, which causes its own problems: higher costs, heavier compliance burdens and, fairly often, confusion.

“It’s a difficult decision,” says Ivo, “You don’t know exactly what data you will need in the future and if you delete it or throw it away … you cannot recover it.”

Ivo’s take is that it’s often better to collect and store too much rather than too little. “Deleting is far easier than recreating data, so I'd rather store and archive more data. Modern technology is becoming so cost-efficient that at the end of the day if you install another terabyte or two or even a petabyte, it's not such a cost problem.”

This raises the next challenge. If you are going to collect and store data in bulk, you need a framework in place to keep it accessible and organized; otherwise, it can be difficult to keep track of exactly what information you have.

When collecting and storing huge volumes of data, it's important to consider how the data will be used. In particular: which data will you need instant access to at any time, and which data can probably be stored safely in an archive until you have a use for it?

Many people are surprised to learn that even at the most cutting-edge tech companies, vast volumes of data are still stored on tape, a medium that many consider antiquated. This is because, for very large volumes of data that need to be archived but may not be accessed very often, tape is still very reliable and cost-effective.

On the other hand, the (probably far smaller) share of data that you know will be frequently accessed or updated is better suited to flash storage, a much more modern and flexible, but more expensive, solution.
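To make the tape-versus-flash split concrete, here is a minimal Python sketch of a tiering rule, assuming access frequency is the only criterion. The `DataObject` record, the `choose_tier`, `plan_placement` and `capacity_by_tier` functions, the sample datasets and the 10-reads-per-30-days threshold are all hypothetical illustrations, not IBM tooling or any product's API; real tiering policies would also weigh latency, retention and compliance rules.

```python
from dataclasses import dataclass

# Hypothetical threshold: datasets read at least this often in the last
# 30 days count as "hot". A real policy would tune this to cost and workload.
HOT_READS_PER_30_DAYS = 10


@dataclass
class DataObject:
    """Hypothetical inventory record for one dataset."""
    name: str
    size_gb: float
    reads_last_30_days: int


def choose_tier(obj: DataObject) -> str:
    """Send frequently read data to flash and rarely read data to tape."""
    if obj.reads_last_30_days >= HOT_READS_PER_30_DAYS:
        return "flash"
    return "tape"


def plan_placement(objects: list[DataObject]) -> dict[str, list[str]]:
    """Group dataset names by the tier the policy assigns them to."""
    plan: dict[str, list[str]] = {"flash": [], "tape": []}
    for obj in objects:
        plan[choose_tier(obj)].append(obj.name)
    return plan


def capacity_by_tier(objects: list[DataObject]) -> dict[str, float]:
    """Total storage (GB) each tier would need under the policy."""
    totals = {"flash": 0.0, "tape": 0.0}
    for obj in objects:
        totals[choose_tier(obj)] += obj.size_gb
    return totals


if __name__ == "__main__":
    inventory = [
        DataObject("daily_sales", 50, 240),
        DataObject("sensor_logs_2015", 8000, 0),
        DataObject("customer_profiles", 120, 35),
        DataObject("old_email_archive", 2500, 1),
    ]
    print(plan_placement(inventory))
    print(capacity_by_tier(inventory))
```

With this invented inventory, the flash tier needs 170 GB and the tape tier 10,500 GB, which mirrors the pattern described above: the frequently used share of data is usually much smaller than the archive.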

"Data that you don't use a lot you put on tape … it's very cheap," says Ivo. "And data that you use day-to-day you put on a flash system. Then you need to find a way to really understand the data that you collect, so you need a management layer above it."

This is the software layer that handles data operations such as cleansing, extraction and transformation, which turn the information you’ve collected into real business value: insights that can be used to drive growth and transformation.
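The cleansing, extraction and transformation steps such a layer performs can be illustrated with a small, self-contained Python sketch. It assumes a hypothetical CSV export with `date`, `region` and `revenue` columns; the sample rows and the `extract`, `cleanse` and `transform` functions are invented for illustration and do not represent any particular product. In this sketch extraction runs first (parsing the raw text), then cleansing, then a transformation into per-region revenue totals.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export: duplicated, padded, incomplete and malformed rows,
# as they might arrive from a production system.
RAW_CSV = """date,region,revenue
2021-03-01, North ,1200
2021-03-01, North ,1200
2021-03-02,South,950
2021-03-02,,400
2021-03-03,south,not_a_number
2021-03-03,North,300
"""


def extract(raw: str) -> list[dict[str, str]]:
    """Extraction: parse the raw text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(raw)))


def cleanse(rows: list[dict[str, str]]) -> list[tuple[str, str, float]]:
    """Cleansing: normalize text, drop incomplete or malformed rows, remove duplicates."""
    seen: set[tuple[str, str, float]] = set()
    clean: list[tuple[str, str, float]] = []
    for row in rows:
        date = row["date"].strip()
        region = row["region"].strip().title()
        try:
            revenue = float(row["revenue"])
        except ValueError:
            continue  # non-numeric revenue: skip the row
        if not date or not region:
            continue  # incomplete record: skip the row
        record = (date, region, revenue)
        if record not in seen:  # exact duplicates are kept only once
            seen.add(record)
            clean.append(record)
    return clean


def transform(records: list[tuple[str, str, float]]) -> dict[str, float]:
    """Transformation: aggregate revenue per region, a simple business-level view."""
    totals: dict[str, float] = defaultdict(float)
    for _date, region, revenue in records:
        totals[region] += revenue
    return dict(totals)


if __name__ == "__main__":
    print(transform(cleanse(extract(RAW_CSV))))
```

Run as a script, this prints `{'North': 1500.0, 'South': 950.0}`: the duplicate row, the row without a region and the row with a non-numeric revenue are all dropped before aggregation.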

“There’re various trends, but at the end of the day, it's still fed by the data you collect and build in your production environment," Ivo tells me. 

Today’s trends are generally moving in the direction of the “data lake,” a concept aimed at overcoming some of the problems caused by the traditional “siloed” approach to storing data. The problem with silos is that data is often held in isolation by whichever operational branch of the organization collected it. This can make it difficult for other branches to access, and there may be little consistency in data formats, metadata, and how the information is stored.

In a data lake, on the other hand, all of the information is stored together – generally in a raw, unedited format – meaning it's accessible to everyone, and anyone can use their own tools and strategies to unlock value from it.
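Keeping raw data together and letting each team apply its own parsing when it reads the data is often called “schema-on-read.” The Python sketch below illustrates the idea under simple assumptions: an in-memory list stands in for the lake, and the `ingest`, `headcount_by_department` and `desks_by_department` functions, the `hr` and `facilities` sources and their payload formats are all hypothetical.

```python
import json

# Toy data lake: every payload is kept exactly as its source sent it,
# tagged only with minimal metadata. No schema is enforced on write.
lake: list[dict[str, str]] = []


def ingest(source: str, fmt: str, payload: str) -> None:
    """Store a raw payload unmodified, whatever its format."""
    lake.append({"source": source, "format": fmt, "payload": payload})


def headcount_by_department() -> dict[str, int]:
    """Consumer A (schema-on-read): parse only HR's JSON records."""
    result: dict[str, int] = {}
    for item in lake:
        if item["source"] == "hr" and item["format"] == "json":
            record = json.loads(item["payload"])
            result[record["department"]] = int(record["headcount"])
    return result


def desks_by_department() -> dict[str, int]:
    """Consumer B: parse Facilities' CSV lines of the form "department,desks"."""
    result: dict[str, int] = {}
    for item in lake:
        if item["source"] == "facilities" and item["format"] == "csv":
            department, desks = item["payload"].split(",")
            result[department.strip()] = int(desks)
    return result


if __name__ == "__main__":
    ingest("hr", "json", '{"department": "Sales", "headcount": 42}')
    ingest("facilities", "csv", "Sales, 30")
    ingest("hr", "json", '{"department": "IT", "headcount": 12}')
    ingest("facilities", "csv", "IT, 15")
    heads = headcount_by_department()
    desks = desks_by_department()
    # A third view combines both sources without either team changing its format.
    shortfall = {dept: heads[dept] - desks.get(dept, 0) for dept in heads}
    print(shortfall)
```

Running it prints `{'Sales': 12, 'IT': -3}`, the headcount minus available desks per department. Because nothing is validated on write, every consumer has to cope with the formats it cares about, which hints at why very large lakes can be hard to keep consistent.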

“It’s a very interesting trend,” says Ivo. “My personal opinion is [whatever approach you take] should support the project and the business that you want to help with the data. I don’t think that, generally, a super big data lake can be kept consistent for a long time.

“The bigger the data lake you create, the more complex it gets. I would rather start with the first business process – the first decision you want to take – and then enrich it from there. But never start generating and architecting an enterprise-wide data lake.”

For more insights, you can watch my full interview with Ivo Koerner below, and you can read our previous conversation on what technology infrastructure you need for artificial intelligence and machine learning here.


Thank you for reading my post. Here at LinkedIn and at Forbes I regularly write about management and technology trends. I have also written a new book about AI; click here for more information. To read my future posts, simply join my network here or click 'Follow'. Also feel free to connect with me via Twitter, Facebook, Instagram, Slideshare or YouTube.

About Bernard Marr

Bernard Marr is an internationally best-selling author, popular keynote speaker, futurist, and a strategic business & technology advisor to governments and companies. He helps organisations improve their business performance, use data more intelligently, and understand the implications of new technologies such as artificial intelligence, big data, blockchains, and the Internet of Things.

LinkedIn has ranked Bernard as one of the world’s top 5 business influencers. He is a frequent contributor to the World Economic Forum and writes a regular column for Forbes. Every day Bernard actively engages his 1.5 million social media followers and shares content that reaches millions of readers.

For more on AI and technology trends, see Bernard Marr’s book Artificial Intelligence in Practice: How 50 Companies Used AI and Machine Learning To Solve Problems and his forthcoming book Tech Trends in Practice: The 25 Technologies That Are Driving The 4th Industrial Revolution, which is available to pre-order now.

Manerep Pasaribu

Author & Speaker: Big Data Strategy, KM, Innovation, Entrepreneurship | Lecturer: FEB Universitas Indonesia | Diver

4 years ago

I like the conversation so much... thank you, Bernard and Ivo!

Peace be upon you. May your blessed night be auspicious for you too and a means to goodness. May Almighty God grant us peace, give us health, bestow well-being upon us, protect us from oppressors, the sinful and the hypocrites, and crown us with His justice. Have a blessed night.

Trad. Ifezuruoha Nnamdi Jnr

Traditional Evangelism / IT Service & HelpDesk

4 years ago

In fact, the world is yet to embrace the potential of big data. It is important and classified to accept the fact that most organisations and individuals do not understand data, let alone big data. Effective and efficient data management is the way to go and will bring to fulfilment the meaning of human existence and associated activities.

Breta Bishop, Assoc. AIA

Project Manager | Space Management & Construction Facilities | Sustainability | BIM Champion

4 years ago

Bernard, this methodology/approach would greatly assist Corp RE decision-making. HR and Corp RE use very different systems for data management, each focused on how it uses information, which typically doesn't translate to other departments and prevents the secondary use Ivo mentioned.
1. Select decisions/processes that happen quarterly or annually, e.g. future space requirements/forecasting.
2. Focus on the information required to make those decisions. Continuing with my example:
   A. Headcount projections
   B. Current open/available space
   C. Promotions/job level (to provide an estimate of the number of offices)
   D. Open requisitions
   E. Number of consultants/temporary employees
3. Who contributes to the data lake to provide the necessary information?
   A. HR
   B. Procurement, or whatever arm of the business obtains specialized temp employees
   C. Corp RE, i.e. whoever manages occupancy
   D. Department leads, who provide headcount projections, potentially in conjunction with HR depending on how your company works
   E. Facilities
   F. IT
4. A management layer that allows all information to be integrated for analysis, ideally IWMS software.
