AI Avatar - A Brief Analysis of Photo-to-Image AI Models
Chenxi Wang, Ph.D.
Investor, Cyber expert, Fortune 500 board member, Venturebeat Women-in-AI award winner. I talk about #cybersecurity #venturecapital #diversity #womenintech #boardgovernance
Recently I used Meta AI to generate an avatar image with the prompt: "Imagine me as a rock chick with wind-tossed hair."
As soon as I posted the image on Facebook, my DMs blew up.
"This is fantastic!" my friends exclaimed.
"Which service did you use to generate this?" many asked.
Meta AI was indeed easy to use. You upload a selfie and seconds later, you can start to use prompts to generate images.
This image is probably my favorite. The background and the apparel are on point, down to the details of the hair. Of course my arms are thinner and the body is leaner than in reality, but hey, who is checking?
I tried a few other prompts, like "Imagine me as a conference speaker", which generated the image below.
Hmm... It does look like I am giving a talk, but why is the audience facing away from me?
And here is another interesting one. The prompt is: "Imagine me as Carrie Bradshaw". Carrie is perhaps my favorite TV show character. And we got this:
I was actually fairly impressed by this one. With the signature curls and the whimsical outfit, the image captured the essence of being Carrie, albeit with my face.
This got me thinking: what else is out there that can generate images based on a photo and prompts? So I went on a bit of a research mission, and this is what I found:
Microsoft Designer and DALL·E both fall into the prompt-only category. These apps, many of which are built on diffusion models such as Stable Diffusion, generate images from prompts alone; there is no option to input an image, so they cannot generate images with your likeness.
A second group of apps takes images as input but only lets you modify certain aspects of the image, such as the background or your eye color. Some of them are scenario based: they can create professional headshots, you in a Christmas photo, you in an anime setting, and so on. But there is no general prompt flexibility. Examples include Aragon.ai, fotor.com, aiavatar.com, lightxeditor.com, and Canva.com. In my opinion, these are AI for editing, not true image generation.
Lensa.ai is an example of another category: apps that generate many images from an uploaded photo but don't allow prompts. In other words, you can't tweak the images they generate.
For my purposes, I was targeting apps that take both images and prompts. I also avoided paid services, trying only free trials or free services. I did find that many apps, like Photoleapapp.com, advertise free trials, but when you click through, they ask you to buy credits before you can generate even a single image.
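For readers who think in code, the grouping above can be sketched as a small lookup. This is only an illustration: the category labels (`PROMPT_ONLY`, `PHOTO_EDITING`, `PHOTO_ONLY`, `PHOTO_AND_PROMPT`) are names I made up for this sketch, and the assignments simply mirror the descriptions in this article rather than any official feature list.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # Hypothetical labels for the groups described in the article.
    PROMPT_ONLY = "prompt in, no photo input"
    PHOTO_EDITING = "photo in, fixed edits or scenarios only"
    PHOTO_ONLY = "photo in, many images out, no prompt"
    PHOTO_AND_PROMPT = "photo in, free-form prompt"


@dataclass(frozen=True)
class App:
    name: str
    category: Category


# Assignments follow the article's descriptions, nothing more.
APPS = [
    App("Microsoft Designer", Category.PROMPT_ONLY),
    App("DALL·E", Category.PROMPT_ONLY),
    App("Aragon.ai", Category.PHOTO_EDITING),
    App("fotor.com", Category.PHOTO_EDITING),
    App("aiavatar.com", Category.PHOTO_EDITING),
    App("lightxeditor.com", Category.PHOTO_EDITING),
    App("Canva.com", Category.PHOTO_EDITING),
    App("Lensa.ai", Category.PHOTO_ONLY),
    App("Meta AI", Category.PHOTO_AND_PROMPT),
]


def photo_plus_prompt(apps):
    """Return the names of apps that accept both a photo and a free-form prompt."""
    return [app.name for app in apps if app.category is Category.PHOTO_AND_PROMPT]


print(photo_plus_prompt(APPS))
```

Of the apps named so far, only Meta AI passes the filter, which is why I kept looking for something comparable.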
The one that caught my eye was Imagineme.ai. From the website, it looks like a service comparable to Meta AI. The image quality looks superb. Its tagline: "Generating stunning images of yourself with one line of text."
Imagineme.ai is a paid service, but a friend of mine had some credits and he let me use them. Here is my analysis of Imagineme vs. Meta AI.
- Meta AI works on a single selfie. Of course, since Meta has access to many photos of you, the output might not be based on that single photo alone. Imagineme, on the other hand, asks you to upload 20 photos.
- Meta AI works almost instantaneously after you upload the photo. Imagineme, however, takes 10-20 hours to train a model on your photos. Fine-tuning takes a bit less time, but can still take several hours.
Here are some side by side comparisons between Meta AI's generations and those of Imagineme.
Prompt: Imagine me as a rock chick with wind-tossed hair. Left: Meta AI, Right: Imagineme
While Meta AI smoothed the lines on my face and gave me a more youthful look, Imagineme accentuated the wrinkles on my face and made me look like an aging Asian Bon Jovi. "At least it gave me some serious guns on the arms," I chuckled.
Prompt: Imagine me at a cocktail party with a fancy dress. Left: Meta AI, Right: Imagineme
The left is a slightly younger version of me, in a 1920s dress and headpiece. Why the 1920s? It was not in the prompt. I guess the AI decided a 1920s-style dress is desirable. The right is ... my aunt!
Prompt: Imagine me very sad about something. Left: Meta AI, Right: Imagineme
Meta AI didn't quite get "sadness": my sad face was more of a "subdued" face. Imagineme's sad face, on the other hand, was frightening. Am I 80?
I also asked Imagineme to generate me as Carrie Bradshaw. I fully expected an aging version of me as Carrie. However, I did not quite get that.
What I got was something rather interesting -- instead of my face (aged or not) with Carrie's hair and clothing, the service had combined Asian features with Sarah Jessica Parker's distinct facial structure. It is neither I nor Sarah Jessica Parker.
At this point, I was determined to find out why Imagineme's models, after getting 20 of my photos, were this far off the mark. So I looked around in the app and eventually found that I could change the number of fine-tuning steps the model goes through to improve the image quality.
By default, the model fine-tunes for 2000 steps, but the user can manually configure a different number. I experimented with it: below 1500 steps, the images bore no resemblance to me. Above 1500, the images started to take on some of my features. After quite a few tweaks, 1800 steps seemed to be the sweet spot, where I don't look too old and the images still look somewhat like me.
Here are some of the images Imagineme generated after I changed the fine-tuning steps.
Imagine me as Carrie Bradshaw:
This version of the generated Carrie Bradshaw is better than the previous one. It still does not look like me, but at least it is not an Asian Sarah Jessica Parker.
I repeated the "Imagine me as a rock chick with wind-tossed hair" prompt with the 1800-step model. The result was slightly better and less frightening than the 2000-step model's.
The rocker image the 1800-step model generated was more youthful and looked more like me than the 2000-step model's, but it was not as attractive as the Meta AI one.
Another prompt I tried was "Imagine me as a high school student", to see how well the models could generate a younger version of me.
Both models rendered a younger woman in a classroom setting. The Meta AI image looked more like me, while the Imagineme image looked like someone else, though that smile was definitely mine.
It's been interesting experimenting with the different photo+prompt image generation apps. I have to say that between the Imagineme and Meta AI models, there is a clear winner. With soft edges and gentle lighting, Meta AI's images have a more sophisticated overall feel and quality. They also highlight the subject's good features and smooth out the sharp edges. I did like the fact that Imagineme lets you customize the number of fine-tuning steps. However, its images overall were hit and miss.
While AI can generate all kinds of scenarios at the beck and call of a prompt, I have not found a model that is a match for the real thing. Below is a photo I took three weeks ago in Zurich: no filters, no AI touch-ups, wrinkles and all. No AI can generate that sparkle in the eyes of the real photo. Well, not yet anyway.