Questions from Webinar – Designing a Knowledge Repository

Many of you attended our webinar, Big Data Analytics: Designing the Knowledge Repository (https://www.brillio.com/1115/designing-the-knowledge-repository-webinar), but we ran out of time to answer all of your questions during the session. So we wanted to take the time to answer some of those questions in this blog post.

Q: How does the security governance for the data layer work within this framework?

A: Overall, security governance is part of the data governance function within the data management layer. However, physical/network security, as well as identity and access management, are handled by the foundation layer. Fine-grained security, encryption, and role-based access controls are implemented in the other layers.

While data security is often (and rightly) top of mind for most companies, a fine balance needs to be struck from a governance perspective. A platform that is too tightly controlled will result in data silos (exactly the type of thing we are trying to avoid) and will limit usability and access, which is detrimental to driving high-confidence, insights-based decision-making.
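As a rough illustration only (this is not part of the framework itself, and the role, dataset, and column names are hypothetical), a fine-grained, role-based access rule might look something like this sketch:

```python
# Hypothetical sketch of fine-grained, role-based access rules.
# Role names, datasets, and column names are illustrative only.
POLICIES = {
    "data_scientist": {
        "customer_transactions": {
            "allow_columns": ["txn_id", "amount", "region"],
            "mask_columns": ["customer_email"],  # masked rather than exposed
        },
    },
    "marketing_analyst": {
        "customer_transactions": {
            "allow_columns": ["region", "amount"],
            "mask_columns": [],
        },
    },
}

def authorized_columns(role: str, dataset: str) -> list[str]:
    """Return the columns a role may read from a dataset, or an empty list."""
    policy = POLICIES.get(role, {}).get(dataset)
    return policy["allow_columns"] if policy else []

print(authorized_columns("marketing_analyst", "customer_transactions"))
# ['region', 'amount']
```

The point of keeping the rules this granular, rather than locking whole datasets away, is exactly the balance described above: access is constrained at the column or attribute level without walling off the data entirely.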

Q: Why do you say the discovery layer should be temporary?

A: The discovery layer itself should not be temporary; rather, we are saying that the sandbox environment, which allows organizations to hypothesize, build, and test, should be temporary.

This sandbox environment resides within the discovery layer and is typically provisioned for a period of 4 to 12 weeks, during which time data scientists are able to prove or disprove their ideas. After that point, the sandbox is reclaimed for another rapid experimentation cycle.
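To make the time-boxing concrete, here is a minimal sketch (assuming a simple in-memory registry; the class and helper names are hypothetical, not part of the framework) of how a sandbox could carry an expiry window and be reclaimed once the experiment period has passed:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Sandbox:
    owner: str
    purpose: str
    created: date
    weeks: int  # typically 4 to 12

    @property
    def expires(self) -> date:
        return self.created + timedelta(weeks=self.weeks)

def still_active(sandboxes: list[Sandbox], today: date) -> list[Sandbox]:
    """Return only the sandboxes still inside their experiment window;
    the rest are reclaimed for the next experimentation cycle."""
    return [s for s in sandboxes if s.expires > today]

registry = [Sandbox("ana", "churn hypothesis", date(2016, 1, 4), weeks=8)]
print(still_active(registry, today=date(2016, 4, 1)))  # [] once the window has passed
```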

Q: With some classes of data, I would imagine there is a "completeness" factor that needs to be tracked, especially if a proven experiment gets "mechanized" into the Enterprise Knowledge Layer. Any best practices for establishing sound "data completeness" controls?

A: Good question. This makes me think about how we need to move from the traditional way of handling data to the new way of working with data. In the past, the core focus of data teams was to bring complete, clean data into the platform. But in today's world all data is important, even data that is incomplete, as it may contain valuable insights. This is where metadata comes in: organizations can tag relevant metadata, bring that data in, and use it for various analyses. This allows us to find wisdom in the entire data set.
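As an illustration only (the field names and scoring approach are assumptions, not part of the framework), incoming data could be tagged with a completeness score per field rather than rejected, so analysts can decide later whether it is fit for their purpose:

```python
def completeness_tags(records: list[dict], required_fields: list[str]) -> dict:
    """Compute a simple per-field completeness score and return it as metadata."""
    total = len(records) or 1
    scores = {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / total
        for field in required_fields
    }
    return {"completeness": scores, "record_count": len(records)}

batch = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": ""}]
print(completeness_tags(batch, ["id", "email"]))
# {'completeness': {'id': 1.0, 'email': 0.5}, 'record_count': 2}
```

A control like this becomes most valuable at the point a proven experiment is "mechanized": the completeness thresholds the experiment tolerated can then be checked automatically on every new load.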

Q: You mention that data analysts and data scientists trying to set up an experiment can spend a large amount of their time finding, collecting, and preparing the data. Seems to me like automation is key here. How much automation do you bring – after all, you need some human insight? And where do you bring in automation?

A: Automation in the data platform context is the ability to automatically capture metadata at all data lifecycle stages (ingestion, preparation, consolidation, wrangling, etc.). Equally important is exposing that metadata through a searchable data catalogue user interface. Adding a few social features to the data catalogue brings in the human insight factor, allowing analysts to comment on their experience with the underlying data. Together, these allow for automation while also adding a human element and providing additional "context" that can improve the usability of data for future analysis.
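A minimal sketch of what such a catalogue entry might hold, combining automatically captured metadata with analyst comments (the class and field names here are assumptions for illustration, not a reference to any specific catalogue product):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CatalogEntry:
    dataset: str
    source: str
    ingested_at: datetime                               # captured automatically at ingestion
    lineage: list[str] = field(default_factory=list)    # preparation / wrangling steps, also automated
    comments: list[str] = field(default_factory=list)   # the human, social layer

entry = CatalogEntry("web_clickstream", "cdn_logs", datetime(2016, 3, 1, 8, 0))
entry.lineage.append("deduplicated by session_id")
entry.comments.append("Timestamps before 2015 are unreliable -- ana")
```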

Q: How long does it take to deploy this type of platform or deploy this type of architecture?

A: With the right expertise and a head start from prebuilt accelerators, which we have developed for numerous types of analysis and for numerous industry-specific functions, this type of platform can be deployed in 8 to 12 weeks.

Q: The elements or blocks you describe look very linear. Is this correct, or are some elements done simultaneously?

A: Yes, you are right that it looks linear; the goal here was to abstract the complexity and present it as a conceptual framework. In reality, while deploying the architecture, multiple components are implemented together or in parallel. This allows analysts and data scientists to adjust information and data as needed.

Q: Is this type of structure only useful for companies involved in a rapid experimentation approach to big data?

A: This type of structure can be used by any organization, whether they are following an ROI-based data project approach or a rapid experimentation approach. In our view, the current data world is complex and getting more complicated. Oftentimes we wander mostly in the 'unknown unknown' space, and the rapid experimentation approach helps companies move forward on the path to becoming more data driven. So while we are a little biased toward the rapid experimentation approach, this architecture has very broad applicability.

Got more questions on developing a robust data platform that can act as your organization’s knowledge repository, and how you can implement it in your own organization? If so, we’d love to hear from you. Contact us to talk specifics or explore this idea further.
