Data in itself is a Cognitive Problem
Image source: IBM Research


Ok folks, let me put this in perspective. At a time when everyone is talking about the Cognitive Era and every business is embracing it to solve business challenges or gain new insights, it is almost self-evident that data is the raw asset with which each of these journeys will start. And nobody debates that data will be the fuel these applications run on to derive insights.

However, what I think is less obvious is that “one of the problems to be solved is data in itself”.

Now, why do I say so?

Let us first look at what defines a problem as a cognitive problem (as in, something that would not have been realistically solvable with the non-cognitive means at hand). One obvious set of problems is of course the Millennium Prize Problems :) for obvious reasons, but I digress.

https://en.wikipedia.org/wiki/Millennium_Prize_Problems

Coming back to the point:

In my view, for a problem to be defined as a cognitive problem, it needs to have three fundamental dimensions:

1.    At least the hypothesis of resolution is based on data. (Basically, if there is no data, it is not cognitive; a new theoretical physics problem, for instance, is not. That is where the human brain has a huge advantage called the power of imagination. And evidently, we are losing it fast :))

2.    The scale of resolution, and the churning through data it requires, is huge enough to make it almost unrealistic for an individual EXPERT human being to solve.

3.    The nature of the problem is volatile. That is, the problem needs to be resolved within a specific time frame; if it is not, it may become irrelevant, or the resolution may have diminishing returns. So speed in reaching a resolution, along with accuracy, is of utmost importance.

Having said this, let us now go back to what the cognitive problem in data is. The all-encompassing problem statement would be: “How do we support the growing capacity and performance needs of the underlying data storage, on flat or even shrinking IT budgets, when organic technology cost reductions are not enough?”

This problem statement can then be broken down into multiple smaller problems, but you will agree that any such sub-problem will need a lot of understanding of the current data layout. Some of the questions that beg for answers are below (a small sketch after the list shows how a few of them could be computed):

1.    What is the production capacity?

2.    What is the performance need of each application (and what is the gap between projected and consumed)?

3.    What capacity is orphaned?

4.    How much gets backed up? If there is a gap between production and backup, which production applications do not consider backup a strategy ;)?

5.    How many copies of each production instance?

6.    What are the file/object workloads? Average file sizes? Access patterns?

7.    What are the hotspots?

8.    How much data lies in tape/backup pools?

9.    What is the trend of growth for all that is already deployed?

10. What is coming? What are the key IT projects the business aspires to drive, and what is their impact on capacity and performance?
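To make this a bit more concrete, here is a minimal sketch of how a few of these questions (3, 4 and 5) could be answered programmatically from an inventory export. Everything here is hypothetical: the file volumes.csv and its column names (mapped_host, backed_up, is_copy, etc.) are illustrative stand-ins, not the schema of any real monitoring tool.

```python
# Minimal sketch: answering questions 3, 4 and 5 from a hypothetical
# per-volume inventory export. Column names are illustrative only.
import csv
from collections import Counter

def load_volumes(path):
    """Load a hypothetical per-volume inventory export (CSV)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def storage_health_report(volumes):
    """Compute a few of the metrics listed above from a volume inventory."""
    total_tb = sum(float(v["capacity_tb"]) for v in volumes)
    # Q3: orphaned capacity = volumes not mapped to any host
    orphaned_tb = sum(float(v["capacity_tb"]) for v in volumes
                      if v["mapped_host"] == "")
    # Q4: production volumes with no backup policy attached
    unprotected = [v["name"] for v in volumes
                   if v["tier"] == "production" and v["backed_up"] == "no"]
    # Q5: how many copies exist of each production instance
    copies = Counter(v["source_instance"] for v in volumes
                     if v["is_copy"] == "yes")
    return {
        "total_tb": total_tb,
        "orphaned_tb": orphaned_tb,
        "unprotected_production": unprotected,
        "copies_per_instance": dict(copies),
    }

if __name__ == "__main__":
    report = storage_health_report(load_volumes("volumes.csv"))
    print(report)
```

The point of the sketch is not the code itself but the prerequisite it exposes: none of these answers are possible unless the inventory data about your data already exists and is queryable.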

Now, these questions are based on a lot of monitoring/analytics data about the data itself, which satisfies the first criterion of a cognitive problem.

It is humanly impossible to do this analysis in a respectable time frame. We need a cognitive program that can churn through this information and give us actionable output, backed by reasonable, data-based logic. This confirms the second criterion.

Third, if we do not do this now and resort to traditional ways of growing capacity and performance, we will overrun the IT budget, fail to meet expectations and, worse, lose to competitors who do it the right way. So a delay in resolution will result in a loss of business, and we will be left with neither the means nor the motivation to do it later. That is the third criterion of a cognitive problem.

And hence the title: data in itself is a cognitive problem.

The first step can be to instrument the data in a way that aids decision making. A few tools that can help are:

1.    IBM Spectrum Control: This can help monitor trends in performance/capacity per application, per host. It can figure out orphaned capacity and cold capacity, and apply predictive analytics to project near-future hotspots that will need immediate attention (see the trend-projection sketch after this list). It can also standardize and simplify capacity provisioning and, in the process, make it more predictable from a capacity-planning perspective.

2.    Copy Data Management: This tool can help centralize the entire copy process for all applications, giving tighter control of and visibility into the number of copies, the performance need of each copy, the resiliency of each copy, etc. This can streamline the balance between copy consolidation and performance needs, the adoption of data reduction such as deduplication/compression where feasible, and the reclamation of capacity from orphaned or unused copies.

3.    Butterfly Study: This study can provide some good insights into how much capacity is protected vs. what is in production, protection schedules, the correctness of backups, the workload patterns of unstructured data, etc. It can also project an optimized configuration for the entire infrastructure, and give insights into the TCO/ROI if the optimized solution is deployed using IBM technologies.
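To illustrate the kind of predictive trend analysis mentioned for the first tool, here is a minimal sketch that fits a least-squares line to capacity samples and projects when a pool will fill. This is my own illustration of the idea, not Spectrum Control's actual method or API, and the pool size and samples are made up.

```python
# Minimal sketch: project when a storage pool fills up by fitting a
# linear trend to (day, used_tb) capacity samples. Illustrative only.

def days_until_full(samples, pool_size_tb):
    """samples: list of (day_index, used_tb) observations, oldest first.
    Returns estimated days from the last sample until the pool is full,
    or None if usage is flat or shrinking."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Ordinary least-squares slope: TB of growth per day
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in samples) /
             sum((x - mean_x) ** 2 for x, _ in samples))
    if slope <= 0:
        return None
    last_x, last_y = samples[-1]
    return (pool_size_tb - last_y) / slope

# Example: weekly usage samples for a hypothetical 500 TB pool
samples = [(0, 310), (7, 322), (14, 335), (21, 349), (28, 361)]
remaining = days_until_full(samples, pool_size_tb=500)
print(f"Pool projected to fill in about {remaining:.0f} days")
```

In practice a tool would use far richer models (seasonality, per-application trends, workload mix), but even this simple projection turns raw capacity samples into an actionable “act before day X” signal.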

Knowledge about our data is going to be the first step when we embark on the cognitive journey. I hope this article gives some pointers on how to do it the right way.
