Synthetic data generation for email summarization tool testing
I am currently working with some students on a project (let's call it tool-1), which is being designed to have the following features:
If done right, it would be a very attractive tool for senior executives.
The problem we were facing was how to test the solution. Then a friend of mine suggested an idea: we develop a second tool (let's call it tool-2) that takes summarized emails with the above qualities (especially the themes/projects part) and splits them into multiple emails.
We will then use tool-2 to generate lots of emails and test tool-1 on them. If tool-1 arrives at the same or a similar summary, then we have a reasonable way to check its quality.
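To make the round-trip idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `split_into_emails` and `summarize` are toy stand-ins for tool-2 and tool-1 (the real tools would presumably be model-based), the "Update on ..." subject format is made up, and the character-level similarity from `difflib` is just one simple way to compare summaries.

```python
import random
from difflib import SequenceMatcher

SUBJECT_PREFIX = "Update on "  # hypothetical subject format used by the toy tools


def split_into_emails(summary, seed=0):
    """Toy stand-in for tool-2: turn a {theme: [points]} summary into emails."""
    emails = [
        {"subject": SUBJECT_PREFIX + theme, "body": point}
        for theme, points in summary.items()
        for point in points
    ]
    # Shuffle so the emails do not arrive neatly grouped by theme.
    random.Random(seed).shuffle(emails)
    return emails


def summarize(emails):
    """Toy stand-in for tool-1: group email bodies by the theme in the subject."""
    summary = {}
    for email in emails:
        theme = email["subject"][len(SUBJECT_PREFIX):]
        summary.setdefault(theme, []).append(email["body"])
    return summary


def flatten(summary):
    """Render a summary as sorted text so ordering does not affect the score."""
    return "\n".join(
        f"{theme}: {point}"
        for theme in sorted(summary)
        for point in sorted(summary[theme])
    )


def round_trip_score(reference, seed=0):
    """Split the reference summary, re-summarize, and return similarity in [0, 1]."""
    recovered = summarize(split_into_emails(reference, seed))
    return SequenceMatcher(None, flatten(reference), flatten(recovered)).ratio()


reference = {
    "Project Alpha": ["Budget approved.", "Launch moved to Q3."],
    "Hiring": ["Two offers accepted."],
}
score = round_trip_score(reference)
print(f"round-trip similarity: {score:.2f}")
```

Because the two toy functions are exact inverses of each other, the score here is 1.0. With real tools the score would drop whenever tool-1 loses or misfiles a point, and a threshold on that score (which we would have to pick ourselves) could serve as the pass/fail check.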
This shows the power of synthetic data, just as the AlphaGo team got their machine learning algorithm to play against a different version of the same algorithm. This way they were able to play these algorithms against each other millions of times, coming up with moves that even the grandmasters did not know about.
Do reach out if you want to discuss more ideas around synthetic data generation.
Comment (4 months ago): Are you going to use two different LLMs, one for each tool, to avoid bias?