AI Code Generators, Misinformation in AI models, Javascript Agents library, Model editing, and current news
Image generated using OpenAI GPT-4, Lexica.art, Instagram filters, and Stability.ai Uncrop


Dear Readers,

Welcome to the 9th issue of Synthetic Thought: AI Digest. I use many sources to keep up with what's happening in the industry, and sometimes that includes information from other content creators. To be fair to them, starting with this issue, I will attribute content to its original authors whenever I use their information.

Most of the developer-related content is towards the end of this issue.


OpEd: How this week's cover image was generated. Hint: using a series of AI tools

As you might have noticed, this week's cover image was generated using several AI tools. Here's roughly what the process was.

  1. Used GPT-4 to generate a prompt for Midjourney. The prompt was "Generate a mid journey prompt to generate an image representing future, technology, intelligence, and creativity".
  2. Pasted the prompt into Midjourney, Lexica.art, and DreamStudio. I generated several images and tweaked the prompt for each tool until I got something reasonably interesting.
  3. Uploaded the generated image to Instagram, applied Instagram filters, and played with the filters until the image got a bit more interesting.
  4. I didn't find an easy way to export the image from Instagram, so I screenshotted the refined image, pasted it into Stability.ai's Uncrop tool, and resampled it to half the size and resolution of LinkedIn's cover image.

That, roughly, was the process; it took a bit of tweaking and experimentation at each step, and it is just one way to do it. If you want to do something similar, I'd recommend exploring various tools, understanding their strengths and weaknesses, and settling on a process that works for you.


This week's updates:

  • A thread on some cool image-to-video clips using Runway. The easiest approach seems to be to prompt Midjourney with text and use the generated image(s) to feed Runway to generate a video.
  • Using the Interpolating Images approach, a series of images can be generated by specifying source and target images. The generated images form a continuum as you progress from source to target. Source.
  • The Brain2Music paper identifies or reconstructs music that a human subject hears, using fMRI. Source.
  • An update to ChatGPT allows custom instructions that persist across multiple chats. Folks who have had to specify the same prompts/instructions repeatedly can now set them once and forget them.
  • Llama-2 was announced since our last edition. Its license allows commercial use, but it is not open source. A couple of other caveats to be mindful of: 1) you can't use it to train other language models; 2) organizations with >700M users have to apply for a special license. Covered in this post.
  • llama2.ai lets you play with the latest Llama model from Meta using a ChatBot-like web interface.
  • llama-playground allows you to experiment with the latest Llama update on your MacBook. This is for folks who can drop to a shell prompt and execute commands.
  • InstructPix2Pix takes an original image (left) and regenerates it based on a provided text instruction. Unfortunately, the model took a while to execute and I lost track of the exact prompt; it was something along the lines of "modify this image to represent lava spewing out of the mountain".

Original image on left regenerated using a text prompt and InstructPix2Pix model. Result on the right.
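The image-interpolation item above boils down to a simple idea: walk in small steps from a source representation to a target representation and decode an image at each step. Real tools do this in a model's latent/embedding space; the sketch below uses toy 3-dimensional vectors in plain Python, and the function names (`lerp`, `interpolation_path`) are illustrative, not from any specific tool.

```python
def lerp(a, b, t):
    """Linearly interpolate between vectors a and b at fraction t in [0, 1]."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def interpolation_path(source, target, steps):
    """Return `steps` vectors forming a continuum from source to target."""
    return [lerp(source, target, i / (steps - 1)) for i in range(steps)]

# Toy 3-dimensional "latents" standing in for real image embeddings.
src, tgt = [0.0, 0.0, 0.0], [1.0, 2.0, 3.0]
path = interpolation_path(src, tgt, steps=5)
# In a real system, each vector in `path` would be decoded back into an image.
```

In practice, tools often use spherical rather than linear interpolation so intermediate latents stay on the distribution the decoder expects, but the step-by-step structure is the same.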

  • This paper helps with watermarking language model-generated content so its provenance can be established later. This is one aspect of AI safety that is of great current interest. Source.
  • Meta open-sourced AudioCraft, a suite of tools for music, sound generation, and compression. I had to navigate a few links to get to their GitHub repo. Here's a link to it.
  • Music To Image generates an image representing a piece of music. I found this very interesting. The app sends audio to another model (LP-Music-Caps) that generates a text caption from the audio, then feeds that caption to Stable Diffusion XL to create an image. I decided to extend this one step further: I took the first prompt from the AudioCraft blog ("Whistling with wind blowing"), used AudioCraft to generate audio from it, and fed that audio to this model, which internally routes it through LP-Music-Caps and on to Stable Diffusion. Why? Because I can. :) So here are all the hops in short form: text -> audio -> text -> image. The image it generated was:


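To give a feel for how the watermarking item above can work, here is a minimal sketch of one well-known idea from that line of research: deterministically partition the vocabulary into a "green" subset seeded by the previous token, bias generation toward green tokens, and later detect the watermark by measuring the green fraction of a text. The toy vocabulary, function names, and hash-based partition are my illustrative assumptions, not the paper's exact scheme.

```python
import hashlib

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]

def green_list(prev_token, fraction=0.5):
    # Deterministically rank the vocabulary by a hash seeded on the
    # previous token, and take the top `fraction` as the "green" subset.
    ranked = sorted(
        VOCAB,
        key=lambda w: hashlib.sha256((prev_token + "|" + w).encode()).hexdigest())
    return set(ranked[: int(len(VOCAB) * fraction)])

def watermarked_sample(start, length):
    # Toy generator that always emits a green token; a real scheme only
    # biases the model's logits toward the green list.
    tokens = [start]
    for _ in range(length):
        tokens.append(sorted(green_list(tokens[-1]))[0])
    return tokens

def green_fraction(tokens):
    # Detector: what fraction of tokens fall in their predecessor's green list?
    hits = sum(1 for prev, cur in zip(tokens, tokens[1:]) if cur in green_list(prev))
    return hits / max(len(tokens) - 1, 1)
```

Unwatermarked text lands near a green fraction of ~0.5, while watermarked output sits well above it; a statistical test on that gap is what establishes provenance.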

  • Hugging Face released Agents.js, which can be thought of as a Javascript-based orchestration tool with tight integration into the HF ecosystem of models. It can run in browser and Node environments.
  • This tweet (or should I call them Xs now?) reports some new models on the leaderboard.
  • In the last edition, we discussed some architectural patterns. Eugene Yan has an interesting and different representation of patterns for building LLM Systems & Products. Check it out.
  • Med-Flamingo is an LLM fine-tuned on medical textbooks, images, and a Biomed dataset. This tweet summarizes it well.
  • PointOdyssey provides a dataset, and a way to synthesize data, for fine-grained tracking in lengthy video clips.
  • Found this interesting comparison of two AI app generators built on LLMs. My initial attempt while writing this issue led to errors, and I had to refocus on completing this article. I will be trying out both, and if I find anything worth reporting, you will find it in the next issue.
  • Similarly, MetaGPT takes a one-line mission statement for a software product and generates code. Internally, it uses agents representing Product Managers, Architects, and the various other team members you would typically find in a software dev team. One more tool on my list to try.
  • Rift is another code generator that can now edit code on the fly based on prompts.
  • A summary of Llama2 with a nice infographic based on the original paper.
  • Given how focused the whole world has been on supply chains and provenance ever since COVID struck, LLMs should also be looked at from a supply chain perspective. Many models published on Hugging Face and elsewhere start from some popular base model, and folks build on top of it (fine-tuning, etc.). When this is done repeatedly, we can lose track of all the changes that went into a model. This article talks about how that chain of models can be poisoned with misinformation. It also refers to a tool called ROME that enables editing a model's facts by identifying the specific weights related to the knowledge in question; this can be used to correct or to misinform the model, depending on the user's intent.
  • Along the same lines, this paper discusses the ripple effects that editing a model has on other, related knowledge, and how current practices aren't sufficient for consistent model updates. The authors conclude that in-context editing is likely the best approach. Source.
  • Robotic Transformer 2 combines vision and natural language commands. The model can reason and perform actions, even on previously unseen data.
  • Found this leaderboard for Embedding Models and an associated video explaining tradeoffs when choosing one.
  • An article that discusses design trade-offs involved in building your own ChatBot.
  • A guide on fine-tuning Llama 2 using your own set of instructions.
  • Two new models from Stability.ai called Stable Beluga 1 (Llama 1-based) and Stable Beluga 2 (Llama 2-based) were announced. Both models were released under a non-commercial license.
  • A new benchmark in the field of medicine to evaluate multi-modal capabilities (text, images, genetics, etc) called MultiMedBench was introduced. Source.
  • Prompt Engineering is increasingly becoming an essential skill. PromptsRoyale allows you to generate multiple prompts and run a battle to test which ones are the best.
  • Nvidia outlines developing a Pallet Detection Model and generating synthetic data for it. Pallet detection can be important in a range of manufacturing use cases.
  • Generally, language models are prompted using text. This paper discusses prompting using speech in different languages. While many have built apps that accomplish this, the paper discusses doing it natively using model constructs. Source.
  • This article outlines training a language model from scratch using TensorFlow and TPUs.
  • AzureML now has direct integration into Hugging Face Hub models. Models can be deployed directly without leaving the Azure web interface.
  • Much recent research and development effort has gone into quantization techniques that make models lighter for inference. This paper discusses scaling laws around quantization: according to the paper, if you compare a 30B 8-bit model with a 60B 4-bit model, the 4-bit model has better (zero-shot) accuracy. Source.
  • Andrej Karpathy has a 500-line repo that can train and run inference on a model, including Llama2. Very impressive.
  • Retentive Network is being positioned as a successor to the Transformer architecture, and it looks great on paper, though the authors acknowledge that more work needs to be done. Transformers have been successful because of their reasoning abilities and emergent properties, and I hope future studies will explore those areas for this architecture.
  • llm-toys repo has quantized models that are fine-tuned for various language tasks such as paraphrasing, changing tone, summarization, etc.
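At its core, the ROME tool mentioned above performs a rank-one update on an MLP weight matrix so that a chosen "key" vector (standing for the fact being edited) maps to a new "value" (the edited fact), while directions orthogonal to the key are left untouched. Below is a minimal sketch of just that linear-algebra step; the 2x2 matrix and the key/value vectors are toy assumptions, and none of ROME's actual fact-localization machinery is shown.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rank_one_edit(W, k, v_new):
    """Return W' = W + (v_new - W k) k^T / (k^T k).

    After the edit, W' maps key k to v_new, and any vector orthogonal
    to k is mapped exactly as before."""
    Wk = matvec(W, k)
    kk = sum(x * x for x in k)
    return [[W[i][j] + (v_new[i] - Wk[i]) * k[j] / kk for j in range(len(k))]
            for i in range(len(W))]

# Toy 2x2 "MLP weight": the key encodes a fact lookup, the value its answer.
W = [[1.0, 0.0], [0.0, 1.0]]
k = [1.0, 0.0]          # hypothetical key vector for the fact being edited
v_new = [2.0, 3.0]      # the new "fact" we want the layer to return
W_edited = rank_one_edit(W, k, v_new)
```

The ripple-effects paper in the next item is essentially asking what happens to all the *other* keys that are not quite orthogonal to k after many such edits.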
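A quick back-of-the-envelope check shows why the quantization comparison above is apples-to-apples: a 30B-parameter model at 8 bits and a 60B-parameter model at 4 bits occupy the same weight memory, so the accuracy difference is purely about how that fixed budget is spent. (The helper below ignores real-world overheads such as quantization scales and activation memory.)

```python
def model_bytes(n_params, bits_per_param):
    """Approximate weight-memory footprint in bytes."""
    return n_params * bits_per_param / 8

GB = 1e9
m30b_8bit = model_bytes(30e9, 8) / GB   # 30.0 GB
m60b_4bit = model_bytes(60e9, 4) / GB   # 30.0 GB
# Same memory budget, but the paper finds the 60B 4-bit model wins on
# zero-shot accuracy: at a fixed footprint, more parameters at lower
# precision beat fewer parameters at higher precision.
```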


Please subscribe to the Synthetic Thought: AI Digest newsletter and share this with your network. Thank you!

And I encourage you to let me know your thoughts on this edition in the comments section.

#innovation #artificialintelligence #technology #news #ai #datascience #machinelearning #deeplearning #technews #techcommunity #aiinsights #digitaltransformation #techupdates #futuretech #subscribe #stayinformed #aiknowledge #techdiscoveries #techrevolution #aicommunity #techenthusiasts #techinfluencers
