My Berkeley co-instructor David Steier and I have been talking about large transformer and generative models for a while. We have seen many students use these large models and apply them in very interesting and creative ways, enabling meaningful use cases with big market opportunities. For example: how to predict the next dance move? How to generate art based on the "mood," "color," and "taste" of a room? How to generate a conversational response to an initial prompt (or a real-world trigger) to help train couples communicate with care and empathy? Many examples exist, and we often ask our students what guardrails are, or can be, put in place when working with these generative models, given that their outputs span a range and are not highly predictable.
It seems that the next chapter has opened for powerful generative models, with DALL·E 2 and others (e.g., Midjourney) emerging in full force over the last few months. While it is truly amazing what these models can generate from short natural language prompts, it is also pretty scary to consider what happens if these models become weaponized and no guardrails (and dare I say regulations) are put in place. I have tried a number of these platforms; many innovative and good attributes exist, but many questions are emerging as well.
What is different this time with the new releases of these generative models?
- These are no longer just massive models for data scientists and machine learning/AI enthusiasts. These models are being put in production for the masses and productized in a way that encourages high-frequency use and experimentation (and let me tell you, it is addictive). The point is that these are not just research models anymore. These are what I call generative model platforms.
- Related to the first point, elegant business models and pricing are being put in place. Based on the pricing I see after the initial free usage, I suspect these platforms can become quite profitable with the economies of scale they will gain over time. This strong market orientation has many commercial and legal implications.
- Some platforms allow users to register anonymously. If humans can generate any image without their names attached to it, human curiosity will probably run wild! At the extreme end of the spectrum, I have seen some very explicit images generated by users on these platforms. While moderators appear to be in place, the pace at which images are being generated all over the world, 24/7, will make moderation unsustainable and nearly impossible. Many questions can and should be raised here, including freedom of speech, ethics (which David and I have raised in our classes), and content safety for children and young adults. Answering these questions will require a multi-disciplinary approach.
- Scale, virality, and ownership. Generated images can be downloaded, shared, and re-shared. This raises questions of creative control and ownership, as well as where society should draw the boundary between what should and should not be shared (e.g., a machine-generated, adult-rated image of a respected celebrity or person). And I surmise that deepfake detection programs will probably never catch up at this point.
- New industries and automation? The creative art industry and adjacent fields are being disrupted by these generative model platforms as I write this. Synthetic data is opening another chapter.
Professionals in all relevant fields should consider the positive and negative implications of what these powerful models are bringing us. AI generative models are eating the world.
Vice President, Data and AI @ First American | Masters in Data Science | Modernizing and transforming using AI & Technologies
Yes, it is impressive to see how well these models generate good images from meaningful sentences, and how close they come even for non-realistic sentences like "man flying over the ocean." For small/mid-size organizations, these models are expensive to deploy and use for inference. I am pretty sure prominent software (cloud) vendors are thinking about offering them as PaaS (Platform as a Service) or SaaS (Software as a Service), so that you pay only when you use them, as they did with OCR/vision and classification models.
Sr. Manager, Data Science at Google
There seems to be an increasing volume of discussion around copyright and consent among artists and associated communities and organizations, regarding the inclusion of scraped art in the training data for these large generative image models. It calls into question the marginalization of artists who often struggle to make a living from their work, while large companies potentially produce large profits without attribution, consent, or compensation. We've definitely heard some rumblings about these concerns around LLMs, but something about the medium seems to be really driving a lot of chatter with models like DALL·E and Stable Diffusion...