Welcome back to issue #4 of the Synthetic Thought: AI Digest newsletter. As you might have noticed, I have renamed this newsletter from "AI Matters" to "Synthetic Thought: AI Digest", mainly to avoid name collisions with other newsletters out there.
I took some time off after the last publication and did not publish an issue last week. However, I have included all the events, projects, papers, and notable topics that I would have covered last week in this week's issue.
- OpEd: An attempt at an AI-generated newsletter
- Cool Projects
- Notable Papers
I have been using various AI image generation tools (Lexica and DreamStudio) to generate the image that goes with each issue. I wanted to see if I could get an AI assist to make this newsletter generation process easier, maybe even fully generate it with an AI. Here's the process I wanted to use:
- Generate a prompt using ChatGPT 4
- Feed the generated prompt back into ChatGPT 4 to generate the actual newsletter.
- Use the same generated prompt with Bard and Bing.
- Lastly, try step #2 using "Browse with Bing" feature that allows ChatGPT 4 to surf the internet, click on links, and gather knowledge.
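The two-stage idea in steps 1 and 2 can be sketched as a tiny pipeline. Here, `complete` is a hypothetical stand-in for whatever chat-model call you use (it is not a real API); the point is just the shape of the meta-prompt flow:

```python
# Hypothetical stand-in for a chat-model call (e.g., ChatGPT 4); swap in a real client.
def complete(prompt):
    return f"[model output for: {prompt[:40]}...]"

def generate_newsletter(topic, week):
    # Stage 1: ask the model to write the prompt itself.
    meta_prompt = (f"Write a detailed prompt that would make an AI produce a weekly "
                   f"{topic} newsletter covering the week of {week}.")
    generated_prompt = complete(meta_prompt)
    # Stage 2: feed the generated prompt back in to produce the actual newsletter.
    return complete(generated_prompt)

draft = generate_newsletter("AI news", "May 17")
print(draft)
```

Step 3 is then just running `generated_prompt` through Bard and Bing instead of the same model.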
Overall, I have to say I was pretty disappointed with the results from all of them (ChatGPT 4 with and without the Browse with Bing plugin, Bing, and Bard). Here are some reasons why:
I started with the following prompt:
First of all, they all failed at the prompt-generation step and jumped straight into writing the newsletter. Their responses included items from before May 17th, contained some inaccurate information, and were too thin for a newsletter. I spent a bit more time tweaking the prompt to see if I could coax a reasonable response out of them. I probably gave up too quickly, but it feels like they're not ready to generate an entire newsletter on their own at this point. In the future, I'd like to prompt-engineer further, or fine-tune using previous editions of this newsletter as examples, and see if that helps.
- Skybox AI generates a 3D world based on a text prompt. I asked it to generate "A Hindu temple on a mountain floating above a beautiful beach in concave shape" and it created this world. Pretty amazing!
- Genmo is a text-to-3D model and image-to-3D model generator. I saw the demo but had to get on a waitlist to try it out.
- Confused about all the different LLMs out there and not sure which one to pick? I ran into this Open LLM Leaderboard. According to their page, this board "aims to track, rank and evaluate LLMs and chatbots as they are released". Hopefully, this makes the process a bit easier.
- Meta announced support for an impressive 1,100 languages in speech-to-text and text-to-speech capabilities as part of its Massively Multilingual Speech (MMS) project. In addition, they've announced language identification capabilities for 4,000 languages.
- RedPajama-3B now runs on a range of consumer devices, including iPhones.
- Every research paper on arXiv is now available as an embedding! This is significant. Check out this tweet by Will Depue announcing it. Through his tweet, I also learned about the Alexandria Index, an attempt to embed all of the Internet. Embeddings let us search, cluster, recommend, detect anomalies, classify, measure diversity, and more. Creating embeddings of large datasets accelerates AI workflows and pipelines. Thank you, Will!
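To make the "search over embeddings" idea concrete, here's a minimal sketch using toy 3-dimensional vectors standing in for real model embeddings: rank documents by cosine similarity to a query vector and keep the top matches. (Real systems use high-dimensional embeddings and approximate nearest-neighbor indexes, not a linear scan.)

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, doc_vecs, top_k=2):
    # Score every document against the query, highest similarity first.
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])[:top_k]

# Toy "embeddings"; in practice these come from an embedding model.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]
print(search(query, docs))  # → [0, 1]: the two documents nearest the query
```

The same similarity scores drive clustering and anomaly detection; only what you do with the rankings changes.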
- Voyager plays Minecraft autonomously using GPT-4 and excels at it. According to their website, "It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA".
- Want to renovate a room and need ideas? Try Reimagine XL. Of course, you are not limited to just furniture. You can upload any image and generate a number of variations. Try it out, and let me know how it goes.
- The Generative Fill feature in Adobe Photoshop is just mindblowing. Watch the intro video on their page. It's very cool.
- MindEye is another mind-blowing project (I know, I already said that about the previous one :) ). It reconstructs an image from brain activity! Look at the examples on their page. Another fascinating project.
- Another Brain+AI project that's amazing: this one helps a paralyzed man walk again. They're calling it the Brain-Spine Interface. The end of that article has videos showing a man walking with the help of this Brain-Spine Interface.
- This tweet explores how specifying the lens type in your prompt impacts the generated image in Midjourney. Another type of prompt "engineering".
- Gorilla integrates 1,600+ APIs into LLMs so that natural language queries are translated into accurate API calls. A general challenge with LLMs is that they sometimes hallucinate, which doesn't work well when calling APIs. The Gorilla team claims to reduce hallucinations substantially.
- Break-A-Scene extracts multiple items/concepts from a single image and generates variations of those concepts in other images. Most current methods extract only a single concept, and even then require multiple images, so this is significant progress.
- This professor (I'm guessing he's a professor) asked his undergrad students to use ChatGPT for an assignment, then had them grade the output by looking for hallucinated information. Apparently, all 63 essays contained hallucinations. The lesson here, in my opinion: don't trust the AI blindly. Always double-check.
- DINOv2 is a computer vision model that can perform high-quality segmentation, depth estimation, classification, and image retrieval using the Self-Supervised Learning approach. Try out the demo. More on the history and evolution of these models at InfoQ.
- If you enjoy reading about advances in hardware architectures, here's the spec for Nvidia's Grace Hopper architecture. This Superchip, as Nvidia calls it, uses an NVLink interconnect running at 900GB/s, 7x the bandwidth of x16 PCIe Gen5 lanes! If you do read it, let me know what other features stood out for you. There are too many to mention here.
- Neuralangelo reconstructs high-fidelity 3D surfaces from just video captures. Its reconstruction quality is much higher than the current state of the art.
- SoundStorm generates realistic voices and dialogues from an introductory voice prompt. I couldn't tell the difference between the original voice and the synthesized one(s). I can see some very cool applications for this, and can also see how it could be misused, for example to bypass voice-based biometric identification systems. The authors acknowledge some of these safety issues at the end of the linked page.
- There were a number of sessions at the Microsoft Build conference that were interesting. I did manage to watch Andrej Karpathy's State of GPT. Highly recommend watching it.
- This paper discusses LLMs' bilingual capabilities. They apparently translate very well. However, translation quality seems to correlate with model size: larger models perform better, according to this paper.
- DragGAN allows users to manipulate generated images. Users click and drag features of an image and adjust various parameters to regenerate images until desired properties are achieved. Their website has some cool demos. I encourage checking out those demos to get a sense of how this works.
- FrugalGPT claims to reduce costs by up to 98% by picking the right LLM for each query, optimizing for both cost and accuracy.
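One of the strategies behind this kind of cost reduction is an LLM cascade: try the cheapest model first and escalate only when a scorer deems the answer unreliable. The sketch below is my own toy illustration of that idea, not FrugalGPT's actual code; the `score` function and the stub "models" are hypothetical.

```python
# Toy LLM cascade: models ordered cheapest-first; escalate on low-confidence answers.
def cascade(query, models, score, threshold=0.8):
    for name, call, cost in models:
        answer = call(query)
        if score(query, answer) >= threshold:
            return name, answer, cost  # confident enough: stop early and save cost
    return name, answer, cost  # fall back to the last (most capable) model's answer

# Stub "models" and scorer for illustration only.
cheap  = ("small-llm", lambda q: "short answer", 0.001)
strong = ("large-llm", lambda q: "detailed answer", 0.03)
score  = lambda q, a: 0.9 if "detailed" in a else 0.5

print(cascade("What is RLHF?", [cheap, strong], score))
# → ('large-llm', 'detailed answer', 0.03): the cheap model scored 0.5, below threshold
```

In practice, queries the cheap model handles confidently never touch the expensive model, which is where the bulk of the savings comes from.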
- Current LLMs are resource-intensive because they take the full historical context into account; this is also part of why they are so good at NLP. However, a new architecture offers an approach "that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs".
- Sophia helps train AI models 2x faster than the traditional Adam optimizer.
- Prompt engineering is necessary to steer Large Language Models (LLMs) and get the best out of them. Automatic Prompt Engineer (APE) demonstrates a way to use LLMs themselves to generate the required prompt. The paper shows that automatically generated instructions can match or outperform human-written prompts on many tasks.
Reminder: please subscribe to the Synthetic Thought: AI Digest newsletter (renamed from AI Matters) and share it in your network. Thank you!
Please let me know your thoughts on this edition in the comments section. Did you like it? Too much info in one article? Did I miss anything you encountered in the last week?