spaCy IRL 2019 Conference in Overview
General Impression
This was the first conference dedicated to spaCy, and from my perspective, as well as judging by the general sentiment of the crowd, everyone loved it. Although most of the talks concentrated on spaCy, there was still plenty to see and learn even if you have never used spaCy before. The conference was, of course, aimed at Natural Language Processing (NLP) enthusiasts, and in that respect it excelled: all talks were filled with great hands-on NLP content as well as stories about how NLP is used within various organizations.
In this overview, I want to go over each of the talks, share some of my thoughts on what was discussed, explain the more technical content in places, and point to further reading so you can grow your knowledge even more. I can't wait to re-watch all of the talks when they are available online. For those who attended the conference, this overview should provide a lot of additional material to look into, and for those who are waiting for the talks to go live, I hope I can spark your interest in one talk or another.
The Amazing NLP Talks
"Transfer Learning in Open-Source Natural Language Processing" by Sebastian Ruder
Speaker Profile
>>> Name: Sebastian Ruder
>>> Occupation: Research Scientist at DeepMind (https://deepmind.com)
>>> Website: http://ruder.io
>>> Runs an NLP Newsletter at http://ruder.io/nlp-news/
>>> Twitter: https://twitter.com/seb_ruder
Slides
>>> Slides are not yet available online.
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
The first talk was Sebastian Ruder's "Transfer Learning in Open-Source Natural Language Processing". Sebastian summarized the current state of affairs in transfer learning, which is now one of the main ways of attaining the best results on various NLP tasks: you start from a model that has already been pre-trained on a large dataset, usually on a powerful GPU cluster. Of course, we are talking about BERT and similar approaches, whose outcome is most of the time a pre-trained language model. General BERT models are available right now, but pre-training them on your specific domain, for example, is not an easy task due to the computing power needed. There are also environmental considerations to take into account, given the enormous carbon footprint that using this much computing power for pre-training entails (a reference to the recent paper "Energy and Policy Considerations for Deep Learning in NLP").
What can we, the NLP practitioners who generally don't have access to a big and powerful cluster of GPUs, do? A step in the right direction for our community is the creation of Hubs. For now, that means the TensorFlow Hub and the PyTorch Hub, which are a great way to store pre-trained models for everyone to access. For example, a pre-trained version of BERT from Hugging Face can be found on the PyTorch Hub, and a more detailed repository of pre-trained BERT models is available as well. In addition to releasing models, we can also release model checkpoints so that others can continue training the weights from those checkpoints. This works similarly to the approach in Computer Vision where we freeze certain layers of a ResNet or VGG-19 model and continue training from there. Of course, Hubs have their own shortcomings: the models are a black box that we usually can't change ourselves, and they could have been trained on datasets that promote some form of bias or could even leak private information. Still, Hubs are definitely a good start, since the need for sharing our pre-trained models will grow.
Further reading and links:
1. NAACL Presentation on Transfer Learning for NLP
(by Sebastian Ruder, Thomas Wolf, Matthew Peters and Swabha Swayamdipta)
Slides: http://tiny.cc/NAACLTransfer
Colab: http://tiny.cc/NAACLTransferColab
Code: http://tiny.cc/NAACLTransferCode
2. Awesome-BERT, a list of resources related to BERT:
https://github.com/Jiakui/awesome-bert
3. This talk was mostly about BERT, but we also have a new model that is poised
to overshadow BERT, and it's XLNet. Read up more on it here:
https://arxiv.org/abs/1906.08237
And Thomas Wolf from Hugging Face has already tweeted about including XLNet
in the next release. Follow him for more info at: https://twitter.com/Thom_Wolf
Sebastian also advertised the next EurNLP Conference, something you
should definitely not miss: https://www.eurnlp.org
" 'So when will spaCy support BERT?' Improving sparse transformer models for efficient self-attention" by Giannis Dara
Speaker Profile
>>> Name: Giannis Dara
>>> Occupation: Computer Engineer at Explosion AI (https://explosion.ai)
>>> Twitter: https://twitter.com/giannis_daras
Slides
>>> https://github.com/giannisdaras/spaCyIRL_slides
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
The task Giannis is trying to solve is making BERT faster through changing the internal Transformer structure, namely making the Transformer sparse. Attaining this goal would allow adding BERT pre-training functionality to spaCy. Giannis gave a brief overview of how attention works within the Transformer architecture before explaining how we could leverage sparsity to improve computation time of BERT.
If you want to brush up on how the Transformer and self-attention work before we get to the main points of the presentation, some great resources are "The Annotated Transformer" and "The Illustrated Transformer". If you prefer a video format, you could watch a presentation I gave on how the Transformer works, called "Understanding and Applying Self-attention for NLP", or a presentation by one of the authors of the original self-attention paper, called "Attention is All You Need" (same name as the paper).
As described in the Google AI Blog and the original "Attention is All You Need" paper, the main approach to computing attention in the encoder is to compare the word at the current position n to every other word in the sentence. This is exactly the part that makes the Transformer so expensive to train: comparing everything to everything costs time and memory that grow quadratically with the length of the sequence.
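To make the "everything to everything" cost concrete, here is a minimal sketch of dense self-attention in plain Python. It is a simplification I wrote for illustration, not code from the talk: the queries, keys and values are simply the input vectors themselves (no learned projections, a single head), and the toy vectors are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_self_attention(vectors):
    """Toy single-head self-attention without learned projections.

    Returns the attended vectors and the number of pairwise comparisons.
    """
    dim = len(vectors[0])
    outputs, comparisons = [], 0
    for query in vectors:
        # Every position is scored against every position: n * n dot products.
        scores = []
        for key in vectors:
            dot = sum(q * k for q, k in zip(query, key))
            scores.append(dot / math.sqrt(dim))
            comparisons += 1
        weights = softmax(scores)
        # The output is the attention-weighted average of all value vectors.
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ])
    return outputs, comparisons


# Hypothetical 2-dimensional "embeddings" for a four-token sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
outputs, comparisons = dense_self_attention(tokens)
print(comparisons)  # 4 tokens -> 16 comparisons
```

Doubling the number of tokens quadruples the number of comparisons, which is the cost that sparse attention tries to avoid.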
Some of the approaches to solving this problem Giannis has looked into were to apply attention in discrete steps and use closure (as I understood it, to group tokens into bins of a certain window length). Giannis gave an extensive overview of his experiments on applying Information Flow Graphs and Strided Patterns to help each attention head learn information in a more sparse way by introducing new masking techniques to help the heads decide which words to attend to.
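As a rough illustration of what a sparse attention mask can look like, the sketch below builds a generic "local window plus stride" pattern. This is my own simplified assumption of a strided pattern and not the exact masks Giannis presented; the function names and the window and stride values are made up for the example.

```python
def strided_mask(n, window, stride):
    """Build an n x n boolean mask.

    mask[i][j] is True when position i is allowed to attend to position j.
    """
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            local = abs(i - j) < window        # tokens close to position i
            strided = (i - j) % stride == 0    # every stride-th token
            row.append(local or strided)
        mask.append(row)
    return mask


def count_allowed(mask):
    """Number of (query, key) pairs that still have to be computed."""
    return sum(cell for row in mask for cell in row)


n = 16
mask = strided_mask(n, window=2, stride=4)
print(count_allowed(mask), "of", n * n, "pairs are computed")
```

Only the pairs marked True need an attention score, so the work per head drops well below the full n * n comparisons while distant tokens stay reachable through the strided positions.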
I think this talk should definitely be experienced in full when the recording comes out, and I know I will go over it a couple of times to get a better understanding of the proposed techniques.
Further reading and links:
1. Approaches to pruning attention heads in the Transformer architecture:
https://github.com/lena-voita/the-story-of-heads
2. fast-bert library: https://github.com/kaushaltrivedi/fast-bert
"Applied NLP: Lessons from the Field" by Peter Baumgartner
Speaker Profile
>>> Name: Peter Baumgartner
>>> Occupation: Data Scientist at RTI International (https://www.rti.org)
>>> Twitter: https://twitter.com/pmbaumgartner
>>> Website: https://pmbaumgartner.github.io/
Slides
>>> Slides:
>>> https://tinyurl.com/y524oj3w
>>> Summary of his talk from Peter himself is available at:
https://pmbaumgartner.github.io/blog/applied-nlp-lessons/
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
Peter gave a great talk on how he deals with customers who want to use A.I. in their company and how some of those interactions are quite challenging and, in some ways, funny. I really encourage you to read his summary of the talk. It gives a very nice breakdown of the kinds of clients you can expect to see when working in our field and how to deal with them effectively. For me, one particular point that Peter made really stood out: when you use machine learning or anything related to it in your project, the project's outcome becomes probabilistic rather than deterministic, meaning that there is always a possibility for it to fail. You are responsible for communicating this either to your product manager or, when applicable, directly to the client, in order to prepare them for such an outcome. In short, putting A.I. into your project amplifies the need for expectation management.
Peter also gave a nice comparison between NLP researchers and NLP practitioners (Applied NLP vs Research NLP). In short, NLP researchers have many opportunities to share their findings at conferences through papers, posters, and presentations, while a large part of the knowledge about how those findings are actually applied in the industry gets lost. Peter encourages everyone who uses NLP in production to share their knowledge by blogging. Everyone can start with a short blog post about something interesting they worked on recently.
Some recommendations from Peter:
1. Data is Plural (https://tinyletter.com/data-is-plural), a newsletter
that sends you information about new datasets.
2. List of all kinds of datasets from the Newsletter above:
https://tinyurl.com/newsletter-datasets
"Lessons learned in helping developers ship conversational AI assistants to production" by Justina Petraityt?
Speaker Profile
>>> Name: Justina Petraityt?
>>> Occupation: Data Scientist and Head of Developer Relations
>>> at Rasa (https://rasa.com/)
>>> Twitter: https://twitter.com/juste_petr
Slides
>>> Slides are not yet available online.
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
Justina talked about the problems that data scientists working on conversational AI face and how a lot of them can already be alleviated by using Rasa, a machine learning framework that automates various steps in building chatbots. She showed how you can already extract entities and intents using Rasa and how Rasa makes use of supervised word embeddings for these tasks. For inspiration on how to work with supervised word embeddings, she recommended looking into the WSABIE and StarSpace papers. If you work on conversational AI, give Rasa a try, and definitely watch the full presentation when it is available.
"The missing elements in NLP" by Yoav Goldberg
Speaker Profile
>>> Name: Yoav Goldberg
>>> Occupation: Research Scientist at Allen NLP (https://allennlp.org/)
>>> Website: https://www.cs.bgu.ac.il/~yoavg/uni/
Slides
>>> Slides are not yet available online.
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
Yoav gave a great overview of the differences between using NLP in the industry and the current state of NLP in research. Why do people in the industry still use rule-based approaches and regular expressions? Well, because they are very fast compared to training a new language model and keeping it up to date. Current A.I. approaches are also not interpretable, whereas you can always explain a certain output with a rule you wrote in your rule-based system.
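The interpretability argument is easy to see in a small sketch. The rules, names and example sentence below are hypothetical and only serve as an illustration: every extracted value carries the name of the rule that produced it, so any output can be traced back to a line someone wrote.

```python
import re

# Hypothetical rules: each pattern has a name so a match can be traced back.
RULES = [
    ("invoice_id", re.compile(r"\bINV-(\d{4,})\b")),
    ("iso_date", re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")),
]


def apply_rules(text):
    """Return (rule_name, matched_value) pairs for every rule that fires."""
    hits = []
    for name, pattern in RULES:
        for match in pattern.finditer(text):
            hits.append((name, match.group(1)))
    return hits


hits = apply_rules("Invoice INV-20391 was issued on 2019-07-06.")
print(hits)  # [('invoice_id', '20391'), ('iso_date', '2019-07-06')]
```

If a result is wrong, you fix or add a rule and the behaviour changes immediately, with no re-training involved.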
Modern NLP approaches also often produce meaningless and wrong results for some very easy cases while being able to deal with very complex ones. This makes them rather inconsistent and gives the data scientists who work with them the impression that they have no control over them. Yoav also gave a lot of great examples where NLP solutions are still lagging behind, in particular the problem of numeric fused-heads, which his team is currently working on. For example, in sentences like "She just turned 50" and "I will give you 50 for that.", the number 50 refers to different things, which we humans can easily understand (50 years and 50 dollars), but current NLP solutions are still far from solving this problem entirely. I am very excited to see what comes out of his team's research on this problem.
Yoav also talked about how we should start thinking about fusing the two approaches, writing rules and building models, into one in which they essentially supplement each other. He also introduced a tool his team is working on called "spike" that will, in the future, help us bring the two worlds closer together.
"Entity linking functionality in spaCy: grounding textual mentions to knowledge base concepts" by Sofie Van Landeghem
Speaker Profile
>>> Name: Sofie Van Landeghem
>>> Occupation: Freelance Software Engineer
>>> Founder of: OxyKodit (http://www.oxykodit.com/)
>>> Twitter: https://twitter.com/OxyKodit
Slides
>>> https://tinyurl.com/entity-linking-spacy
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
Currently, the only input spaCy takes into account when analyzing a text is the text itself. Sofie wants to extend spaCy so that it can connect to a Knowledge Base (KB) of your own, in order to do Entity Linking more efficiently and to give you more control over the process. The general idea is to first use named entity recognition to find the entity alias and then to select the entity class from the knowledge base. However, there are many cases where the right entity for an alias depends on its context, so Sofie wants to explore how we could also use the context to generate probability values for potential entities.
Generally, when this functionality is available in spaCy, you will be able to plug in any type of KB. For now, all the experiments on dealing with context were done on Wikipedia. Based on the evaluation of around 1.1 million entities, Sofie was able to get up to 79.0% accuracy using the context probabilities. This is definitely very impressive work, and I personally can't wait to have this functionality available. If you are interested in more technical details of the implementation, there is a first Pull Request to spaCy available.
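The two steps described above, looking up candidates for an alias and then using the context to choose between them, can be sketched in a few lines of plain Python. This is only a toy illustration under my own assumptions and not spaCy's implementation: the knowledge base, the entity IDs, the prior probabilities and the word-overlap score are all made up.

```python
# Hypothetical knowledge base: alias -> list of (entity_id, prior, description).
KB = {
    "Paris": [
        ("PARIS_CITY", 0.7, "capital city of France"),
        ("PARIS_HILTON", 0.3, "American socialite and media personality Hilton"),
    ],
}


def context_score(description, context):
    """Fraction of description words that also occur in the mention context."""
    desc_words = set(description.lower().split())
    ctx_words = set(context.lower().split())
    return len(desc_words & ctx_words) / len(desc_words)


def link(mention, context, prior_weight=0.5):
    """Pick the candidate with the best mix of prior and context overlap."""
    candidates = KB.get(mention, [])
    if not candidates:
        return None  # the alias is not in the knowledge base
    scored = [
        (
            prior_weight * prior
            + (1 - prior_weight) * context_score(description, context),
            entity_id,
        )
        for entity_id, prior, description in candidates
    ]
    return max(scored)[1]


print(link("Paris", "the capital of France is Paris"))
print(link("Paris", "Paris Hilton is an American media personality"))
```

Without helpful context the prior wins and the city is chosen; with enough overlapping context words the less frequent candidate can overtake it. A real system would replace the naive word overlap with learned context representations.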
"Rethinking rule-based lemmatization" by Guadalupe Romero
Speaker Profile
>>> Name: Guadalupe Romero
>>> Occupation: Research Assistant at Explosion AI (https://explosion.ai/)
Slides
>>> https://tinyurl.com/lemmatization-spacy
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
Guadalupe walked us through the existing English lemmatizer in spaCy and outlined her plans for improving the Spanish and German lemmatizers, which are currently based on a dictionary lookup for the most part. The idea is to extend the lemmatizers to use Universal Dependencies morphological schemas, which can then be extended for each language separately with specific cases. Originally, Guadalupe worked on creating a Spanish lemmatizer using Deep Neural Networks, but it was too slow at runtime, and that is the main reason she is proposing to build lemmatizers with morphological rules in mind. There are, however, many details in the implementation that still need to be worked on. For example, in Spanish there are about 1,000 morphological rules for lemmatization, depending on the context the word is used in and the part-of-speech tag it gets. A solution for that would be to cluster similar rules into collections of rules. You can see an overview of the whole pipeline for lemmatization within spaCy below.
I think the work Guadalupe is currently doing to improve lemmatizers within spaCy is great and really needed by the community. Hope to see the changes live soon.
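To give a feeling for what lemmatization driven by part-of-speech-specific rules means, here is a minimal sketch: an exception table is consulted first, then suffix rules selected by the POS tag. The tiny Spanish rule tables are hypothetical examples of mine, not Guadalupe's rules or spaCy's data.

```python
# Hypothetical, tiny rule tables for Spanish; real rule sets are far larger.
EXCEPTIONS = {("VERB", "fui"): "ir", ("VERB", "soy"): "ser"}
SUFFIX_RULES = {
    # (suffix to strip, replacement), tried in order, most specific first.
    "VERB": [("ando", "ar"), ("iendo", "er"), ("amos", "ar")],
    "NOUN": [("ces", "z"), ("es", ""), ("s", "")],
}


def lemmatize(word, pos):
    """Return the lemma for a word, given its part-of-speech tag."""
    word = word.lower()
    # Irregular forms are handled by a direct lookup.
    if (pos, word) in EXCEPTIONS:
        return EXCEPTIONS[(pos, word)]
    # Otherwise apply the first matching suffix rule for this POS tag.
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word  # no rule applies: the word is its own lemma


for word, pos in [("hablando", "VERB"), ("luces", "NOUN"), ("fui", "VERB")]:
    print(word, "->", lemmatize(word, pos))
```

Even this toy version shows why so many rules are needed: the "iendo" rule is right for "comiendo" (comer) but wrong for "viviendo" (vivir), so more context or additional rules are required to get such cases right.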
"ScispaCy: A full spaCy pipeline and models for scientific and biomedical text" by Mark Neumann
Speaker Profile
>>> Name: Mark Neumann
>>> Occupation: NLP & ML Research Engineer at Allen AI (https://allenai.org)
>>> Website: http://markneumann.xyz
>>> Twitter: https://twitter.com/MarkNeumannnn
Slides
>>> https://tinyurl.com/scispacy-talk
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
Next, we had a great talk about scispaCy, a library that adds models and pipeline steps to spaCy that are specific to the biomedical field. You may ask, why not simply use spaCy for this? Well, according to Mark and his team, the existing spaCy models were trained mainly on web content and are therefore not suitable for the specific kind of language and formulations that are typically used in the biomedical field.
Existing spaCy models often produced wrong POS tags and wrong dependency parse trees, and they were also rather confused about what should be tagged as a named entity. So re-training spaCy models on biomedical data was the only way to achieve good performance. Additionally, if you mix a bit of web data into the biomedical dataset used for training, you can still use scispaCy models to process non-biomedical text without too much of a loss in accuracy. Mark also compared scispaCy to existing software solutions used to process biomedical text, many of which have a great number of features but ultimately have a huge flaw: they are incredibly slow at processing data compared to scispaCy. Of course, not all features already available in other tools are fully ported to scispaCy, but Mark and his team are hard at work on extending the library. For instance, he showed how he managed to add Entity Linking for almost 2.4 million existing entities in the biomedical knowledge bases (UMLS KB) by clustering similar or nearly identical terms into bigger groups and extracting them from a ranked list of candidates.
Further reading and links:
1. BioBERT library: https://github.com/naver/biobert-pretrained
2. sciBERT and bioBERT compared: https://towardsdatascience.com/how-to-apply-bert-in-scientific-domain-2d9db0480bd9
3. Semantic Scholar to search for papers: https://www.semanticscholar.org
"Financial NLP at S&P Global" by Patrick Harrison
Speaker Profile
>>> Name: Patrick Harrison
>>> Occupation: Director of Artificial Intelligence Engineering at S&P Global
>>> (https://www.spglobal.com/en/)
Slides
>>> Slides are not yet available online.
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
Patrick showed us a lot of the inner workings of S&P Global, a big corporation, and how NLP fits into their whole data infrastructure. From my own experience, I know that Precision is very important to the client, but at S&P, where Patrick works, they are mandated to provide 100% Precision with 100% Recall on the financial data they collect and analyze for millions of companies in the world. That is not an easy undertaking, so Patrick's team uses state-of-the-art NLP techniques to extract as much data as possible and then passes the generated data on to an army of human annotators. This becomes a never-ending cycle: the models produce suggestions for the annotators, the annotators improve the quality, and the models are then re-trained on the new training data from the annotators.
Patrick also talked about the fact that their company has a ton of amazing datasets in the financial domain and how much fun it is to work on them. He also promised that anyone who finds and reports an error or inconsistency in the datasets they produce will get a $50 reward for each one.
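For readers who want a reminder of what the two metrics measure, here is a small sketch. Precision is the share of extracted facts that are correct, and recall is the share of correct facts that were actually extracted. The extracted and reference facts below are invented for illustration and have nothing to do with S&P's data.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall of predicted items against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical facts extracted by a model versus the verified reference.
model_output = {"revenue=10M", "ceo=Jane Doe", "founded=1999"}
reference = {"revenue=10M", "ceo=Jane Doe", "hq=Berlin", "employees=250"}

precision, recall = precision_recall(model_output, reference)
print(round(precision, 2), recall)  # 0.67 0.5

# After annotators remove the wrong fact and add the missing ones:
corrected = (model_output - {"founded=1999"}) | {"hq=Berlin", "employees=250"}
print(precision_recall(corrected, reference))  # (1.0, 1.0)
```

The model alone falls short on both metrics in this toy case; only the combination of model suggestions and human correction reaches the 100% on both that the paragraph above describes.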
"NLP in Asset Management" by McKenzie Marshall
Speaker Profile
>>> Name: McKenzie Marshall
>>> Occupation: Data Scientist at Barings (https://www.barings.com)
Slides
>>> Slides are not yet available online.
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
In this talk, McKenzie described how her small team applied NLP within their active financial management system. She used the Prodigy annotation tool to create a dataset for a Named Entity Recognition model that was very specific to their use case. In particular, her team needed to find all mentions of various companies and to get the general sentiment of the text they were used in. However, she encountered a very specific type of Word Sense Disambiguation problem, where some company names actually refer to a product of the company (as in the case of "Twitter" or "Apple iPhone"), so the model they used had to learn to differentiate those cases based on the sentence and document context.
Additionally, McKenzie mentioned that the clients generally didn't understand the idea of sentiment scores, sentiment polarity or even the notion of a neutral sentiment, so, in the end, they were just happy to get two labels: "Positive" and "Negative". I wish all clients were that easy to please.
"spaCy in the News: Quartz's NLP pipeline" by David Dodson
Speaker Profile
>>> Name: David Dodson
>>> Occupation: Researcher at Quartz (https://qz.com/)
Slides
>>> Slides are not yet available online.
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
David had a lot of interesting stories to tell from his career in the NLP field, such as the time he worked on a team that had to transcribe the live speech of conference presenters, send it to a stenographer, process it using spaCy, store all extracted information and then make it all available through a chatbot, or the time he worked at a public library doing semantic search over scanned books. Currently, he works at Quartz, and he showed us some of the inner workings of the data analysis pipelines used in the company. I particularly like the idea that Quartz has a centralized topic graph that is updated in real time. It tracks all interesting named entities that pop up on the Internet and defines dependency graphs between all of those tokens. A really great use case for this graph is to constantly uncover newly emerging entities and, based on those, re-train the NLP models to include them. The example David gave was that they originally had 3G and 4G news articles automatically tracked, and when 5G came around and generated enough buzz on the Internet, their topic graph picked it up automatically and passed on training examples that included articles on 5G to subsequent NLP pipelines. A really interesting approach and a great talk from David.
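The 5G example can be reduced to a very small sketch: count mentions of entities that are not tracked yet and flag the ones that cross a threshold. This is purely my own simplification with a hypothetical function name, threshold and mention stream; Quartz's actual topic graph is certainly more involved.

```python
from collections import Counter


def emerging_entities(mentions, known, min_count=3):
    """Return untracked entities seen at least min_count times, most frequent first."""
    counts = Counter(m for m in mentions if m not in known)
    return [entity for entity, count in counts.most_common() if count >= min_count]


# Hypothetical stream of entity mentions extracted from incoming articles.
known = {"3G", "4G"}
stream = ["4G", "5G", "3G", "5G", "6G", "5G", "4G"]

print(emerging_entities(stream, known))  # ['5G']
```

Entities flagged this way could then be added to the tracked set, and the articles mentioning them passed on as new training examples.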
"Closing: spaCy and Explosion, present and future" by Matthew Honnibal & Ines Montani
Speaker Profile 1.
>>> Name: Matthew Honnibal
>>> Occupation: Founder of Explosion AI (https://explosion.ai/)
>>> Twitter: https://twitter.com/honnibal
Speaker Profile 2.
>>> Name: Ines Montani
>>> Occupation: Founder of Explosion AI (https://explosion.ai/)
>>> Twitter: https://twitter.com/_inesmontani
Slides
>>> Slides are not yet available online.
Video recording:
>>> https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
The last talk was from the spaCy founders, who walked us through their journey with spaCy so far and gave us a sneak peek into the future of spaCy. I hope to see all the changes that were discussed implemented soon, and I am also glad that the team is hard at work on making the existing pipeline steps and models more customizable, to give as much power and control as possible to the NLP specialists who use spaCy.
Overall, an amazing conference. Thank you to Matthew Honnibal and Ines Montani, as well as everyone else involved in the conference preparation, for a great time. Also, big thanks to the speakers: without you, such a great event wouldn't have happened!
Thank you for reading and see you next time!
P.S. Huge thanks to TrustYou, my current employer, for believing in personal and professional development and always eagerly financing conference attendance for their employees, including yours truly.
* About the Author *
Name: Ivan Bilan
Educational Background: Computational Linguistics, Computer Science,
Applied Linguistics, Business and Management
Occupational Background: Data Scientist, Data Engineer
Research Interests: Machine Translation, Relation Extraction,
Opinion Mining, Self-attention / Transformer and more
Other social media on which you can also follow me:
Twitter: https://twitter.com/DemiourgosUA
Github: https://github.com/ivan-bilan
Medium: https://medium.com/@demiourgosua