AI News Bytes: The First Open-Source 1.7-Billion-Parameter Text2video Diffusion Model; Meet Instruct-NeRF2NeRF; Memoji on Steroids...
Asif Razzaq
AI Research Editor | CEO @ Marktechpost | 1 Million Monthly Readers and 56k+ ML Subreddit
Sponsor | Join Discord | Join 16K+ ML SubReddit
The first open-source Text2video 1.7-billion-parameter diffusion model has been released, and you can try it now on HuggingFace. The model comes from ModelScope, which is built upon the notion of “Model-as-a-Service” (MaaS). ModelScope seeks to bring together the most advanced machine learning models from the AI community and to streamline the process of using AI models in real-world applications. The open-sourced core ModelScope library provides the interfaces and implementations that allow developers to perform model inference, training, and evaluation.
Meet Instruct-NeRF2NeRF: A new AI method for editing 3D scenes with text instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, this method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction. The research team demonstrated that the proposed method can edit large-scale, real-world scenes and accomplish more realistic, targeted edits than prior work.
Memoji on Steroids: This AI model can reconstruct 3D avatars from videos. Time to meet Vid2Avatar, a tool that learns high-fidelity 3D human avatars from videos captured in the wild. It does not need ground-truth supervision, priors extracted from large datasets, or any external segmentation modules. You just give it a video of someone, and it will generate a robust 3D avatar for you.
Runway announces Gen-2: A multimodal AI system that can generate realistic videos from text. It's like filming something new without filming anything at all. Gen-2 offers several modes: (1) Text to Video, (2) Text + Image to Video, (3) Image to Video, (4) Stylization, (5) Storyboard, (6) Mask, (7) Render, and (8) Customization.
GPT-4 Passes Medical Exams: Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation across various domains, including medicine. A new study presents a comprehensive evaluation of GPT-4, a state-of-the-art LLM, on medical competency examinations and benchmark datasets. The study shows that GPT-4 exceeds the passing score on the US Medical Licensing Exam by more than 20 points. The authors were unable to find evidence of training data memorization. GPT-4 also outperforms LLMs fine-tuned on medical data, and it is much better than GPT-3.5 at predicting the likelihood that its answers are correct.
LLMs Can Outperform Humans on Data Annotation: University of Zurich researchers used 2,382 tweets to compare the performance of ChatGPT with crowd-workers and trained annotators on several annotation tasks. ChatGPT outperformed crowd-workers in relevance, stance, topics, and frames detection, with its zero-shot accuracy higher than that of crowd-workers for four out of five tasks. The intercoder agreement of ChatGPT was also higher than that of both crowd-workers and trained annotators for all tasks. Additionally, the cost per annotation with ChatGPT was less than $0.003, making it about twenty times cheaper than using MTurk. These findings demonstrate the potential of large language models to significantly improve the efficiency of text classification.
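For readers unfamiliar with the two metrics mentioned above, here is a minimal sketch of how accuracy and intercoder agreement can be computed as simple match rates. The labels are made up for illustration, the function name `match_rate` is hypothetical, and the study's exact scoring procedure may differ.

```python
from typing import Sequence


def match_rate(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Return the fraction of positions where the two label lists agree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equally long")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Toy data (not from the study): reference labels from trained annotators
# and two independent annotation runs over the same four tweets.
gold = ["relevant", "irrelevant", "relevant", "relevant"]
run_1 = ["relevant", "irrelevant", "relevant", "irrelevant"]
run_2 = ["relevant", "irrelevant", "relevant", "irrelevant"]

# Accuracy: how often one run matches the reference labels.
accuracy_run_1 = match_rate(run_1, gold)

# Intercoder agreement: how often two runs match each other,
# regardless of whether either one is correct.
agreement = match_rate(run_1, run_2)
```

Note that the two numbers answer different questions: in this toy case the two runs agree with each other on every tweet while still missing one reference label.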
Meet ALOHA: A Low-cost Open-source Hardware System for Bimanual Teleoperation. With a $20k budget, it is capable of teleoperating precise tasks such as threading a zip tie, dynamic tasks such as juggling a ping pong ball, and contact-rich tasks such as assembling the chain in the NIST board #2. ALOHA has two leader and two follower arms, and syncs the joint positions from leaders to followers at 50 Hz. The user teleoperates by simply moving the leader robots. This takes about 10 lines of code to implement, yet it is intuitive and responsive anywhere within the joint limits.
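To give a feel for why leader-to-follower syncing can be so short, here is a rough sketch of a joint-position mirroring loop running at 50 Hz. This is not ALOHA's actual code: `SimArm`, `sync_once`, `teleop_loop`, and the joint limits are hypothetical stand-ins for real robot drivers.

```python
import time
from dataclasses import dataclass
from typing import List

CONTROL_HZ = 50  # sync rate mentioned in the article


@dataclass
class SimArm:
    """Hypothetical simulated arm; a real system would talk to motor drivers."""
    joints: List[float]
    lower: float = -3.14  # assumed joint limits in radians
    upper: float = 3.14

    def read_joints(self) -> List[float]:
        return list(self.joints)

    def command_joints(self, targets: List[float]) -> None:
        # Clamp each target into the joint limits before applying it.
        self.joints = [min(max(t, self.lower), self.upper) for t in targets]


def sync_once(leaders: List[SimArm], followers: List[SimArm]) -> None:
    """Copy each leader's joint positions to its paired follower."""
    for leader, follower in zip(leaders, followers):
        follower.command_joints(leader.read_joints())


def teleop_loop(leaders: List[SimArm], followers: List[SimArm],
                steps: int, hz: int = CONTROL_HZ) -> None:
    """Run the sync for a fixed number of steps at roughly `hz` iterations per second."""
    period = 1.0 / hz
    for _ in range(steps):
        start = time.monotonic()
        sync_once(leaders, followers)
        # Sleep off the remainder of the period to hold the target rate.
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```

Calling `teleop_loop(leaders, followers, steps=50)` with two leader and two follower arms would mirror the leaders for about one second.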
Do you know Marktechpost has a community of 1.5 Million+ AI professionals and engineers?