Watch #1: About LLMs Browsing on Your Phone, Unit Tests and Code Reviews

Pascal Biese

Daily AI highlights for 60k+ experts | AI/ML Engineer

Published: September 11, 2023

In this issue:

LLMs taking control of your smartphone
Never writing unit tests again (I wish)
Training LLMs to look at your code and say “LGTM!”

1. Empowering LLMs to use Smartphones for Intelligent Task Automation

Watching: AutoDroid (paper/code)

What problem does it solve? Traditional bot software is rather limited: rule-based systems are hard to scale and often lead to behavior that will quickly get you flagged as a bad actor - even if you aren't doing anything bad - because any excessive non-organic usage pattern will look shady to app owners.

How does it solve the problem? AutoDroid tackles smartphone task automation with the power of LLMs. It supports popular cloud-based models, such as GPT-3.5 and GPT-4, as well as local 7B models that can be finetuned to work with specific apps. App data is fed into the LLM’s context in order to simulate memory. Just like in a chat, the model enters a dialogue with the app. This “dialogue” is more complex, though, and consists of several intermediary steps - think of two people who can't communicate directly. The LLM acts as a guide for the Task Executor, and the app sends back feedback after every action.
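To make the guide-and-feedback loop a bit more concrete, here is a minimal sketch in Python. It is not AutoDroid's code: `FakeApp`, `choose_action` and `run_task` are hypothetical names, and the "LLM" is a hard-coded placeholder so the example runs without any API. The point is only the shape of the loop: describe the UI, pick an action, execute it, feed the result back.

```python
from dataclasses import dataclass, field


@dataclass
class FakeApp:
    """Hypothetical stand-in for a phone app: a current screen plus tappable elements."""
    screen: str = "home"
    # Maps (current screen, tapped element) to the screen that follows.
    transitions: dict = field(default_factory=lambda: {
        ("home", "Settings"): "settings",
        ("settings", "Dark mode"): "dark_mode_on",
    })

    def describe(self) -> str:
        """Text description of the UI; this is what would go into the model's context."""
        elements = [e for (s, e) in self.transitions if s == self.screen]
        return f"screen={self.screen}; elements={elements}"

    def tap(self, element: str) -> str:
        """Execute one action and return feedback for the next step."""
        nxt = self.transitions.get((self.screen, element))
        if nxt is None:
            return f"error: no element {element!r} on {self.screen}"
        self.screen = nxt
        return f"ok: now on {nxt}"


def choose_action(task: str, ui: str, history: list) -> str:
    """Placeholder for the LLM call (assumption: a real system would prompt a model
    with the task, the UI description and the action history)."""
    if "screen=home" in ui:
        return "Settings"
    if "screen=settings" in ui:
        return "Dark mode"
    return "DONE"


def run_task(task: str, app: FakeApp, max_steps: int = 5) -> list:
    """Task executor: ask the guide for an action, run it, record the feedback."""
    history = []
    for _ in range(max_steps):
        action = choose_action(task, app.describe(), history)
        if action == "DONE":
            break
        feedback = app.tap(action)
        history.append((action, feedback))
    return history


if __name__ == "__main__":
    for step in run_task("enable dark mode", FakeApp()):
        print(step)
```

The `max_steps` cap is a design choice for the sketch: with a non-trivial failure rate, you want the loop to stop on its own rather than tap around forever.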

What’s next? The barrier to entry for this technology is certainly still high and the failure rate is non-trivial. But I'm excited to see smartphone automation making big strides and, personally, I’m looking forward to automating some things on my phone.

2. An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

Watching: TestPilot (paper)

What problem does it solve? Coding LLMs have been all the rage lately, and reasonably so! But with coding assistants evolving, there’s a demand for assistants that can handle additional tasks, such as debugging and designing tests. In terms of unit testing, current automated test generation suffers from a lack of readability, e.g., due to crude variable naming, and from a lack of assertions.

How does it solve the problem? Most previous methods relied on conventional techniques based on symbolic execution, evolutionary methods or search. LLMs are trained with human or human-like instructions, so they excel at mimicking natural language and code. This mitigates the two main problems mentioned above: the lack of readability and the lack of assertions.
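As a rough illustration of what "asking an LLM for tests" can look like, the sketch below assembles a prompt from a function's signature and docstring. This is not TestPilot's actual prompt format; `build_test_prompt` and the example function `slugify` are made up for this post, and the model call itself is left out.

```python
import inspect


def build_test_prompt(func, framework: str = "pytest") -> str:
    """Assemble a test-generation prompt from a function's signature and docstring."""
    sig = inspect.signature(func)
    doc = inspect.getdoc(func) or "(no docstring)"
    return (
        f"Write {framework} unit tests for the function below.\n"
        "Use descriptive test names and at least one assertion per test.\n\n"
        f"def {func.__name__}{sig}:\n"
        f'    """{doc}"""\n'
    )


def slugify(title: str) -> str:
    """Lowercase the title and join its words with hyphens."""
    return "-".join(title.lower().split())


if __name__ == "__main__":
    # The resulting string is what you would send to the model of your choice.
    print(build_test_prompt(slugify))
```

Asking explicitly for descriptive names and assertions targets exactly the two weaknesses mentioned above.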

What’s next? Tests are often seen as binary - either they pass or they fail. But there’s more to it from a developer's point of view. What if a test fails because it’s the wrong test for your code? Which generated tests are useful and only need a little fixing? Which ones are simply bad generations? Having a more fine-grained evaluation method will be crucial to further improve the user experience.

3. LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models through Parameter-Efficient Fine-Tuning

Watching: LLaMA-Reviewer (paper)

What problem does it solve? While code reviewing can be an effective way to learn collaborative coding, it’s also often perceived as tedious. Current SOTA methods, such as CodeReviewer, are based on pre-trained Transformer models that take up a lot of space (~850MB) and have a lot of parameters (220M). In times of GPT-4 this might not seem like much, but keep in mind that for coding, we’d ideally want to finetune the model for each of our code bases.

How does it solve the problem? LLaMA itself is way too big with its ~7B parameters. But luckily, Parameter-Efficient Fine-Tuning (PEFT) is getting better and better. For LLaMA-Reviewer, the researchers explored Prefix-Tuning (PT) and LoRA. The latter performed significantly better - on par with the current SOTA with 26x fewer parameters and 50x less storage space.
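To get a feeling for why LoRA is so cheap, here is a small back-of-the-envelope calculation. LoRA freezes a weight matrix W and trains a low-rank update B·A instead, so only the two small factors need to be trained and stored. The 4096x4096 layer size and rank 8 below are illustrative assumptions, not the configuration from the paper.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when finetuning the full weight matrix W (d_out x d_in)."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters with LoRA: B is (d_out x rank), A is (rank x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d, r = 4096, 8  # assumed layer size and rank, for illustration only
    full = full_params(d, d)
    lora = lora_params(d, d, r)
    print(full)         # 16777216
    print(lora)         # 65536
    print(full / lora)  # 256.0
```

The same logic applies to storage: per finetuned code base you only need to keep the small factors, not another copy of the base model.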

What’s next? As this was only done with the smallest version of LLaMA and before LLaMA-2 even existed, there’s still a lot of room for quick improvements. This might be a good time to develop consumer-grade code review software. Sounds like an awesome VS Code plugin (or Cursor if you’re feeling hip).

Thanks for reading LLM Watch! Subscribe for free to receive new posts and support my work - here on LinkedIn or on my substack.
