DATA vs DATA
Introduction
With the advent of Generative AI and the explosion of data generated by large language models (LLMs), it is time to revisit the data frameworks that the industry has established to ensure data quality, privacy, compliance, and ownership.
For the sake of argument, let us refer to user-generated data as UGD and Generative AI-generated data as GGD.
UGD World
With UGD, data management is a complex process involving the handling of data from various sources to multiple destinations, incorporating both simple and complex transformations along the way. We ensure data quality through continuous observability, maintaining completeness, comprehensiveness, consistency, and accuracy - at a minimum.
We collaborate with privacy and security teams to understand data classifications and the relevant regulations for Personally Identifiable Information (PII), confidential data, and internal data. These regulations and policies are applied to both data in motion and data at rest. Role-based access control (RBAC) is adopted to regulate data access based on user roles.
In all enterprises dealing with UGD, emphasis is placed on training data teams, consumers, and producers of data in data security and data privacy. This training is supplemented with regular internal audits and scheduled external audits to verify responsible data management. Ultimately, data ownership means managing UGD diligently and securely, in compliance with GDPR and other data security and privacy regulations.
GGD World
Now, let’s explore the world of Generative AI-generated Data (GGD). This domain is similar to the wild west, full of untapped potential and emerging challenges. I categorize GGD into four main sections.
As a Data Leader with many years of experience handling user-generated data (UGD), I am now trying to understand the new world of Generative AI-generated Data (GGD) with the same data governance mindset. I realize that this comparison isn't entirely straightforward, even though UGD and GGD may appear similar in look and feel. If any of you reading this are also seeking answers or have already solved part of the puzzle, please DM me. I would love to learn from your experiences and contribute to the discussion.
Fun Fact: The article is UGD and the image above is GGD
#data #security #data privacy #data compliance
Director - Customer Success, Tech Advisory, Online and Platform SBU | Brillio - A Bain Company | Ex Oracle
8 months ago
A great and very pertinent discussion, Ishita Majumdar. Privacy and security of GGD is a concern that almost all enterprises carry as they continue to invest in it. The fundamental difference between approaching privacy and security for non-GenAI and GenAI apps is that the former is a design-time process/framework, whereas the latter is a run-time process in which token weights are modeled largely on the data used in training. Once PII data is used to train an LLM, there is no way someone (even OpenAI) can tell how this impacts the outputs. With PII data used in prompt engineering, the challenges and the biases are generally more pronounced. In the industry we have started seeing the impact of these undesired issues, and obfuscating too much also takes away data lineage and the utility of the training data. So I am not sure I am providing any solutions here, but this is something that's very worth discussing. Any other views?