AI2’s AllenNLP, Grover, and GPT-2 For Practical Content Generation
The world of natural language generation (NLG) is getting increasingly exciting, with a small community of developers and groups like Hugging Face, OpenAI, and the Allen Institute for AI making huge strides every couple of months. It is incredibly accessible, especially with Hugging Face’s Transformers repository, which allows a layperson to work with the most powerful NLP models using a couple of lines of Python code in a Google Colab notebook.
In my opinion, the greatest potential lies in NLG-assisted writing, which lets a thought leader frame out an original idea in a structured way (argument, supporting statement one, supporting statement two, supporting statement three, conclusion) and then use a model like GPT-2 to fill in the rest.
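To make the “frame out, then fill in” idea concrete, here is a minimal sketch. It assumes a hypothetical `generate(prompt)` callable that returns a model's continuation (for example, a thin wrapper around a GPT-2 endpoint); `Outline` and `expand_outline` are illustrative names, not part of any library. Each of the author's sentences opens its own paragraph, and the model only writes the text that follows it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Outline:
    """Hypothetical container for the author's structured outline."""
    argument: str
    supporting: list[str] = field(default_factory=list)
    conclusion: str = ""

    def sections(self) -> list[str]:
        """Return the non-empty sections in reading order."""
        ordered = [self.argument, *self.supporting, self.conclusion]
        return [s.strip() for s in ordered if s and s.strip()]


def expand_outline(outline: Outline, generate: Callable[[str], str]) -> str:
    """Expand each outline section into a paragraph.

    `generate` takes a prompt and returns the model's continuation.
    The author's sentence always opens the paragraph; the model fills in
    the supporting text after it.
    """
    paragraphs: list[str] = []
    for section in outline.sections():
        # Earlier paragraphs are passed as context so later ones stay on topic.
        context = "\n\n".join(paragraphs)
        prompt = f"{context}\n\n{section}".strip()
        continuation = generate(prompt).strip()
        paragraphs.append(f"{section} {continuation}".strip())
    return "\n\n".join(paragraphs)
```

Feeding the growing draft back in as context is a design choice, not a requirement: it helps later paragraphs stay consistent with earlier ones, at the cost of longer prompts.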
You can fine-tune the model on a corpus of past emails or other writing by the thought leader to maintain their style and tone. Or you can train it on an author like Michael Lewis (though the sample size may still be small) or on the transcribed podcasts of someone who has popular appeal in the chosen genre.
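Preparing that corpus is mostly text cleanup. The sketch below assumes the writing arrives as plain-text strings (e.g. exported email bodies); it drops quoted reply lines so the model learns the author's voice rather than their correspondents', normalizes whitespace, and separates documents with `<|endoftext|>`, the end-of-text token GPT-2 uses. The function names are hypothetical, and the resulting text would then be saved to a file and handed to whatever fine-tuning script you use.

```python
import re

END_OF_TEXT = "<|endoftext|>"  # GPT-2's end-of-text token, used here as a document separator


def clean_email(body: str) -> str:
    """Strip quoted replies and normalize whitespace in one email body."""
    lines = [
        line.rstrip()
        for line in body.splitlines()
        if not line.lstrip().startswith(">")  # skip quoted text from other people
    ]
    text = re.sub(r"[ \t]+", " ", "\n".join(lines))  # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # allow at most one blank line
    return text.strip()


def build_training_corpus(bodies: list[str]) -> str:
    """Join cleaned documents into one training text, skipping empty ones."""
    docs = [clean_email(b) for b in bodies]
    return "".join(f"{doc}\n{END_OF_TEXT}\n" for doc in docs if doc)
```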
Here is a way to actually operationalize the technology and create an interface anyone in an organization can use. The focus here is operationalization: getting it out of the lab and into production.
- Using Retool (shout out to David Reinfeld, David Hsu, David Dworsky) or possibly Bubble.is as a UI: Retool is great and more focused on data processing and manipulation, but its power comes from its ability to hook into all the databases you would want (BigQuery, Redshift, Snowflake, PostgreSQL, Google Sheets) and to make sophisticated API queries with authentication against apps and services such as GitHub, GraphQL, Salesforce, Basecamp, and Twilio. Most importantly, it is fast and flexible for building and testing something.
- Set up a new app with form entry fields for “argument”, “supporting argument one”, “supporting argument two”, and “supporting argument three”. Add a submit button that makes a POST call to the API we are going to set up.
- Then drag and drop a new text box and set its data source to {{query1.data}} (or whatever you named the POST query).
- Use Cortext.ai (GitHub repository here) to containerize DistilGPT-2 on AWS and launch it as a Flask app, which gives you an API endpoint to call from Retool (or Zapier, Integromat, IFTTT, a Chrome extension, a Google Forms entry POST; you name it). DistilGPT-2 is based on OpenAI's famous GPT-2 but is faster while retaining most of its strengths. You can check out and experiment with a number of different models in Hugging Face's "Write with Transformer" here.
- The Retool front end is a simple web page with form entries for Argument, Supporting Argument One, Supporting Argument Two, Supporting Argument Three, and Conclusion, plus a submit button. Users can also input settings such as article length and type of content (blog article, white paper, tweet, etc.).
- After the user hits submit, the generated article is returned to the Retool canvas, where you can add the buttons below as options to refine the content, send it to someone else to refine, or approve it.
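The API behind those steps can be sketched as follows. This is a hypothetical, minimal stand-in for the Flask app: it uses Python's standard-library `http.server` so it runs without dependencies, and it assumes the Retool form posts JSON with keys `argument`, `supporting_1` to `supporting_3`, `conclusion`, `content_type`, and `max_words` (all key names and default lengths are illustrative). `generate(prompt, max_words)` stands in for the actual DistilGPT-2 call.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

# Hypothetical default lengths (in words) per content type.
DEFAULT_WORDS = {"blog article": 600, "white paper": 1500, "tweet": 40}
SECTION_KEYS = ("supporting_1", "supporting_2", "supporting_3", "conclusion")

Generator = Callable[[str, int], str]


def handle_generate(payload: object, generate: Generator) -> tuple[int, dict]:
    """Validate one form submission and return (HTTP status, JSON body)."""
    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}
    argument = str(payload.get("argument") or "").strip()
    if not argument:
        return 400, {"error": "'argument' is required"}
    content_type = payload.get("content_type") or "blog article"
    if not isinstance(content_type, str) or content_type not in DEFAULT_WORDS:
        return 400, {"error": f"unknown content_type {content_type!r}"}
    raw_words = payload.get("max_words")
    try:
        max_words = DEFAULT_WORDS[content_type] if raw_words in (None, "") else int(raw_words)
    except (TypeError, ValueError):
        return 400, {"error": "'max_words' must be an integer"}
    max_words = max(10, min(max_words, 3000))  # clamp to a sane range

    # Build the prompt from the non-empty form fields, in outline order.
    sections = [argument] + [str(payload.get(k) or "").strip() for k in SECTION_KEYS]
    prompt = "\n\n".join(s for s in sections if s)
    text = generate(prompt, max_words)
    return 200, {"text": text, "content_type": content_type, "max_words": max_words}


def make_handler(generate: Generator) -> type[BaseHTTPRequestHandler]:
    """Create a request handler class bound to a generation function."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length") or 0)
            try:
                payload = json.loads(self.rfile.read(length) or b"{}")
            except ValueError:  # malformed JSON or undecodable bytes
                status, body = 400, {"error": "body must be valid JSON"}
            else:
                status, body = handle_generate(payload, generate)
            data = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return GenerateHandler


def serve(generate: Generator, port: int = 8000) -> None:
    """Block forever, serving POST requests on the given port."""
    HTTPServer(("0.0.0.0", port), make_handler(generate)).serve_forever()
```

Keeping validation in `handle_generate`, separate from the HTTP plumbing, makes it easy to test and to move into a Flask route later. With this response shape, the Retool text box would read `{{query1.data.text}}` rather than `{{query1.data}}`. To run it, call `serve(my_generate)` with your own generation function.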
BUTTONS / OPTIONS:
- Edit it themselves
- Hit submit again to get a brand-new version
- Select the sentences they like and hit submit to generate new text based on those sentences
- Post to Asana or Slack, or send as an email to someone on the editing staff to clean up
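The “select sentences and resubmit” option can be sketched like this. It assumes the article comes back as plain text and that Retool sends the indices of the sentences the user kept; `generate(prompt)` is again a hypothetical stand-in for the model call, and the regex sentence splitter is deliberately naive. When the model is decoded with sampling rather than greedy search, calling `generate` again with the same prompt is also how the “brand-new version” button would get different text.

```python
import re
from typing import Callable, Iterable

# Naive sentence boundary: whitespace after ".", "!" or "?". Abbreviations such
# as "U.S. jobs" will be split wrongly; use a real tokenizer in production.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split an article into sentences so the UI can show them as choices."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text.strip()) if s.strip()]


def regenerate_from_selection(
    article: str, selected: Iterable[int], generate: Callable[[str], str]
) -> str:
    """Keep the sentences the user selected and let the model continue from them."""
    sentences = split_sentences(article)
    keep = sorted(set(selected))  # preserve the article's original order
    if not keep:
        raise ValueError("select at least one sentence")
    if keep[0] < 0 or keep[-1] >= len(sentences):
        raise IndexError("sentence index out of range")
    prompt = " ".join(sentences[i] for i in keep)
    return f"{prompt} {generate(prompt).strip()}".strip()
```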
FINAL STEP: From Retool, add a “Finalize” button that, when clicked, sends the finalized content on the canvas as a POST request. That request can be caught by Zapier or Integromat, or sent directly to WordPress via hooks, and the same webhook can syndicate the post to social media through Hootsuite or the social media management (SMM) tool of your choice.
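A minimal sketch of the “Finalize” POST, assuming a Zapier or Integromat catch-hook URL that accepts arbitrary JSON. The payload keys (`title`, `content`, `status`, `syndicate_to`) are hypothetical; `title`, `content`, and `status` happen to match fields of the WordPress REST API's posts endpoint, but posting there directly also requires authentication, which is not shown. The request is built separately from sending so it can be inspected first.

```python
import json
import urllib.request


def build_finalize_request(
    webhook_url: str, title: str, content: str, channels: list[str]
) -> urllib.request.Request:
    """Build (but do not send) the POST fired by the hypothetical Finalize button."""
    if not webhook_url.startswith("https://"):
        raise ValueError("use an https:// webhook URL")
    payload = {
        "title": title,
        "content": content,
        "status": "publish",
        "syndicate_to": channels,  # hypothetical key for a downstream Zap/scenario
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```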
You now have a tool and process that can take a couple of sentences and generate a full article in a matter of minutes. It retains the author's originality by using their core ideas, but fills in supporting text that is “predicted” from those ideas.
Just as importantly, it is user-friendly and can be distributed to everyone in the organization, turning them into content-producing machines.
AllenAI's Grover - A live example
As an example of the power of current models, I highly recommend checking out AI2’s Grover. Developed to detect fake news, it can also generate phenomenally convincing articles, thanks to its training set of articles from the top 5,000 news sites (as indexed by Google) from 2016 to 2019. Given a title, website, and author, Grover generates a full article in the style of those authors and that publication, on that topic.
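Grover's trick is conditioning generation on article metadata. As a rough, hypothetical analogue for a general model like GPT-2 (which was not trained on such fields, so results will likely be weaker), you can prefix the prompt with a header laid out like the one Grover's demo displays. The `ArticleSpec` class and the header layout are assumptions for illustration, not Grover's actual input format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ArticleSpec:
    """Hypothetical metadata for a conditioned prompt."""
    domain: str
    published: date
    headline: str
    authors: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render a header mirroring the layout Grover's demo displays."""
        # %B gives English month names in Python's default C locale.
        when = f"{self.published:%B} {self.published.day}, {self.published.year}"
        byline = ", ".join(self.authors)
        header = f"{when} – {byline}" if byline else when
        return f"{self.domain}\n{self.headline}\n{header}\n\n"
```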
I gave it the following inputs:
- Site: NewYorker.com
- Date: January 31st, 2020
- Authors: Ian Parker, David Remnick
- Headline: “Why the Santa Monica technology community is winning the war for talent”
Look at what it came up with! In 30 seconds, on the first try! This is basically publishable.
newyorker.com
Why the Santa Monica technology community is winning the war for talent
January 31, 2020 – Ian Parker, David Remnick
What’s up? Over the last couple of years, the Los Angeles office of the McKinsey & Company has been ranking US cities in terms of what it calls city advantages. What’s up is that Los Angeles has been bucking the trend of cities losing ground, or even regressing, in economic competitiveness, according to McKinsey. Its most recent ranking puts L.A. No. 1. Why the growth? Hardly a surprise to anyone with a stake in the L.A. economic ecosystem: A flood of new, mostly high-paying tech jobs.
The city’s digital sector (mostly companies working in content creation, commerce, and advertising) employs more than 500,000 people (in a city of more than 12 million) and, according to McKinsey, is worth $622 billion (out of a total $4.5 trillion), 3 percent of all jobs in the L.A. economy. Not long ago, the supply of workers like this was in short supply, so companies shuttled their workers to Amazon’s Seattle campus. More recently, however, Los Angeles has garnered the influx of new tech talent, thanks, in large part, to innovations that some outsiders thought silly at the time. Namely: including housing stock and amenities in rent-control laws; creating incubators and coworking spaces; building a network of local colleges; and subsidizing construction costs by using public money.
Some of this contributes to why the Bay Area is a less appealing, high-cost environment for high-skill digital jobs. In 2015, Google and Apple paid less than half the average for L.A. workers ($58,237) than they did in Oakland ($128,896). What’s up in L.A. is that it’s also getting a high return on local investment. The biggest investment of all was about $1 billion for office space on the downtown waterfront. And with that in mind, meet Three Seconds, a new coworking space in Culver City that has quickly become a hot destination. It opened in October 2017, but soon offered spaces for free to all start-ups with at least $2 million in annual revenue. (Founded by Paul Quirk and Tom Waits, the media and tech financier-turned-tech-entrepreneur has an office in New York City as well.)
Much of this success has to do with collaboration across the community, which lets newcomers have a smooth and quick connection with established talents in that space. Talk to folks like Walter Carmona, who runs audio post-production company Brutally Professional Post; Janna Carr, who runs the Tequila and Tequila Garden artisanal mezcal bar; and boardmember Ben Sender, a digital strategist at consulting firm Avenue 51, who all work together to build businesses and companies. As Quirk told me, “I’ve been working for years as an independent, freelancer tech-related consultant.” One day, he’s working for hip talent, the next for corporate clients.
Then there’s photographer Denise DeAnda, who manages to complete Instagram deals and shoots portraits of farmers and activists (with the help of her business partner, Raphael Andrews). And chefs like Alex Torres, who owns Rafael’s Gastro Mexican food truck and is cofounder of the culinary school Appetite for Innovation.
The city’s financial systems, too, have benefitted from innovations that have made it easier to establish businesses, grow money, and manage investments: Small Business Administration loans have nearly tripled since 2000, and non-mortgage consumer credit has expanded by 11 percent, according to an SBA review. Even the affordable housing market, which was notoriously scorned by Silicon Valley companies, has boomed, by McKinsey’s own reckoning, thanks to multifamily dwellings: 4.1 million units have been built between 1996 and 2014.