Data in itself is a Cognitive Problem
Shalaka Verma
Technical Executive Leadership | Quantum Computing | Presales | Startup Advisor
Ok folks, let me put this in perspective. At a time when everyone is talking about the Cognitive Era, and businesses are embracing it to solve different business challenges or gain new insights, it is almost obvious that data is the raw asset with which each of these journeys starts. Nobody is debating that data will be the fuel from which these applications drive insights.
However, what I think is less obvious is that one of the problems to be solved is data in itself.
Now, why do I say so?
Let us first look at what defines a problem as a cognitive problem (as in, something that would not be realistically solvable with the non-cognitive means at hand). One obvious set of problems is, of course, the Millennium Prize Problems :) for their obvious advantages, but I digress.
https://en.wikipedia.org/wiki/Millennium_Prize_Problems
Coming back to the point
In my view, for a problem to qualify as a cognitive problem, it needs to have three fundamental dimensions:
1. At least the hypothesis of resolution is based on data. (Basically, if there is no data, as with a new theoretical physics problem, it is not cognitive. That is where the human brain has a huge advantage called the power of imagination. And evidently, we are losing it fast :) )
2. The scale of resolution, and the churning of data it requires, is large enough to make it almost unrealistic for an individual expert human being to solve.
3. The nature of the problem is volatile. That is, the problem needs to be resolved within a specific time frame; if it is not, it may become irrelevant, or the resolution may have diminishing returns. So speed and accuracy of resolution are of utmost importance.
Having said this, let us now go back to what the cognitive problem in data is. The all-encompassing problem statement would be: “How do we support the growing capacity and performance needs of the underlying data storage, on flat or even shrinking IT budgets, when organic technology cost reductions are not enough?”
This problem statement can then be broken into multiple smaller problems, but you will agree that any such sub-problem needs a thorough understanding of the current data layout. Some of the questions that beg for answers are listed below (a toy sketch after the list shows what churning through this kind of data might look like):
1. What is the production capacity?
2. What is the performance need of each application (and what is the gap between projected and consumed)?
3. What capacity is orphaned?
4. How much gets backed up? If there is a gap between production and backup, which production applications do not consider backup a strategy ;)?
5. How many copies exist of each production instance?
6. What are the file/object workloads? Average file sizes? Access patterns?
7. What are the hotspots?
8. How much data lies in tape/backup pools?
9. What is the trend of growth for all that is already deployed?
10. What is coming? What are key IT projects that the business aspires to drive? What is the impact on capacity and performance?
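To make the amount of churning these questions imply a bit more concrete, here is a minimal, made-up Python sketch that answers three of them from a hypothetical export of volume-level monitoring data. The field names, volumes and values are all invented for illustration; in reality this data would come from your monitoring tools, not a hard-coded list.

```python
# Toy sketch: answering a few of the questions above from a (hypothetical)
# export of volume-level monitoring data. Field names and values are invented.

volumes = [
    {"volume": "vol01", "app": "billing",   "allocated_gb": 500, "used_gb": 420, "mapped": True,  "backed_up": True},
    {"volume": "vol02", "app": "billing",   "allocated_gb": 500, "used_gb": 10,  "mapped": False, "backed_up": False},
    {"volume": "vol03", "app": "analytics", "allocated_gb": 800, "used_gb": 640, "mapped": True,  "backed_up": False},
    {"volume": "vol04", "app": "hr",        "allocated_gb": 200, "used_gb": 150, "mapped": True,  "backed_up": True},
]

# Q1: what is the production capacity?
production_gb = sum(v["allocated_gb"] for v in volumes if v["mapped"])

# Q3: what capacity is orphaned (allocated but no longer mapped to any host)?
orphaned_gb = sum(v["allocated_gb"] for v in volumes if not v["mapped"])

# Q4: which production applications have volumes with no backup at all?
unprotected_apps = sorted({v["app"] for v in volumes if v["mapped"] and not v["backed_up"]})

print(f"Production capacity : {production_gb} GB")
print(f"Orphaned capacity   : {orphaned_gb} GB")
print(f"Apps with unprotected production volumes: {unprotected_apps}")
```

Even this toy version makes the point: the answers fall out of the data quickly once the data about the data is in one place.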
Now, these questions are based on a lot of monitoring and analytics data about the data itself, which satisfies the first criterion of a cognitive problem.
It is humanly impossible to do this analysis in a respectable time frame. We need a cognitive program that can churn through this information and give us actionable output, with reasonable logic based on the data. This confirms the second criterion.
Third, if we do not do this now and resort to traditional ways of growing capacity and performance, we will overrun the IT budget, fail to meet expectations and, worse, lose to competition that does this the right way. So a delay in resolution will result in a loss of business, and we will be left with neither the means nor the motivation to do it later. That is the third criterion of the cognitive problem.
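Just to illustrate what “actionable output with reasonable logic based on the data” could look like, here is another toy sketch that turns findings into a ranked to-do list. The findings, urgency levels and weights are entirely made up; the point is only that the triage itself can be automated.

```python
# Toy sketch of "reasonable logic based on data": turn raw findings into a
# ranked list of actions. The findings, rules and scores are all invented.

findings = [
    {"issue": "orphaned capacity",      "gb_affected": 500,  "urgency": 1},
    {"issue": "unprotected production", "gb_affected": 800,  "urgency": 3},
    {"issue": "pool nearing full",      "gb_affected": 1200, "urgency": 2},
]

def score(f):
    # Weight urgency heavily and capacity impact lightly (arbitrary weights).
    return f["urgency"] * 100 + f["gb_affected"] / 100

for f in sorted(findings, key=score, reverse=True):
    print(f"{score(f):7.1f}  ->  {f['issue']} ({f['gb_affected']} GB)")
```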
And hence the title: data in itself is a cognitive problem.
The first step can be to instrument the data in a way that aids decision making. A few tools that can help (a small sketch after this list illustrates the kind of analysis these tools automate):
1. IBM Spectrum Control: This can help monitor trends in performance and capacity per application, per host. It can identify orphaned capacity and cold capacity, and apply predictive analytics to project near-future hotspots that will need immediate attention. It can also standardize and simplify capacity provisioning, and in the process make capacity planning more predictable.
2. Copy Data Management: This tool can help centralize the entire copy process for all applications, giving tighter control and visibility into the number of copies, the performance need of each copy, the resiliency of each copy, and so on. This can streamline the balance between copy consolidation and performance needs, the adoption of data reduction such as deduplication/compression where feasible, and the reclamation of capacity from orphaned or unused copies.
3. Butterfly Study: This study can provide good insights into how much capacity is protected versus what is in production, protection schedules, correctness of backups, workload patterns of unstructured data, and so on. It can also project an optimized configuration for the entire infrastructure and give insights into the TCO/ROI if the optimized solution is deployed using IBM technologies.
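To give a flavor of the kind of analysis these tools automate, here is a small, self-contained Python sketch. It is emphatically not the output or API of Spectrum Control, Copy Data Management, or the Butterfly Study; the sample data, the 2:1 reduction ratio and the 90% coverage threshold are all assumptions I made up, used only to illustrate trend projection, copy visibility and the protection gap.

```python
from collections import Counter

# --- Trend projection (the idea behind capacity/hotspot forecasting) -------
# Fit a straight line to weekly used-capacity samples and project when the
# pool hits its limit. Samples and limit are invented.
history_gb = [610, 640, 665, 700, 730, 760]
pool_limit_gb = 1000

n = len(history_gb)
mean_x = (n - 1) / 2
mean_y = sum(history_gb) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history_gb)) / \
        sum((x - mean_x) ** 2 for x in range(n))        # GB of growth per week
weeks_left = (pool_limit_gb - history_gb[-1]) / slope if slope > 0 else float("inf")
print(f"Pool growing ~{slope:.0f} GB/week; full in ~{weeks_left:.0f} weeks")

# --- Copy visibility (the idea behind copy data management) ----------------
# Count copies per production instance and estimate what an assumed 2:1
# data-reduction ratio could reclaim from the copy footprint.
copies = [
    {"source": "billing-db",   "purpose": "backup", "size_gb": 400},
    {"source": "billing-db",   "purpose": "dev",    "size_gb": 400},
    {"source": "billing-db",   "purpose": "report", "size_gb": 400},
    {"source": "analytics-db", "purpose": "backup", "size_gb": 640},
]
copy_counts = Counter(c["source"] for c in copies)
copy_gb = sum(c["size_gb"] for c in copies)
reclaimable_gb = copy_gb * (1 - 1 / 2.0)                # assumed 2:1 reduction
print(f"Copies per instance: {dict(copy_counts)}; ~{reclaimable_gb:.0f} GB reclaimable")

# --- Protection gap (the idea behind a protected-vs-production study) ------
production_gb = {"billing": 900, "analytics": 800, "hr": 200}
protected_gb  = {"billing": 900, "analytics": 0,   "hr": 150}
for app, prod in production_gb.items():
    coverage = protected_gb.get(app, 0) / prod
    flag = "OK" if coverage >= 0.9 else "REVIEW"
    print(f"{app:10s} coverage {coverage:4.0%} -> {flag}")
```

The real tools, of course, work at far larger scale and with far richer telemetry; the sketch is only meant to show that each answer is a straightforward computation once the data about the data is collected.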
Knowledge about our data is going to be the first step when we embark on the cognitive journey. I hope this article gives some pointers on how to do it the right way.