The Day Trajectory Data Met Language Models
"... As I was reading through Google AI's BERT paper, I started thinking of GPS points as words, trajectories as sentences, and road networks as grammar systems. I even thought of H3 as a Spatial tokenizer to convert the space of GPS points into a finite set of tokens."
When I was interviewing with Meta back in 2020 (what better could one do during a covid lockdown?), and after passing six interviews (1x screening, 2x coding, 2x system design, and 1x behavioural), the recruiter came back with a rather weird ask: they wanted me to go through one last, specialised, interview focused on NLP (Natural Language Processing).
This was quite surprising given that I had clearly mentioned, when he first reached out, that I wasn't interested in an NLP role (sorry, but I had spent the previous four years in the company of GPS coordinates, trajectory data, satellite images, and maps). But then I thought this was a good opportunity to catch up on the latest advances in NLP… and just like that, I committed to a seventh interview taking place in just under 48 hours!
Before my NLP friends roar at me, I agree it was quite pretentious to think one could skim through the whole NLP field in such a short time. However, I had a good strategy and a strong belief that a miracle would come my way!
On the strategy side, I decided to focus on representations. After all, if you are to build a good model for any NLP task (sentence classification, Q&A, tagging, etc.), you'd better start from good representations. In 2020's interviews, TF-IDF and Word2Vec didn't sell for much.
So I took it first to Google, where I spent an hour firing queries to make sure 'representations' were a good bet. I had bookmarked perhaps a hundred blog posts and research paper links to read later (how realistic of me!) when suddenly the miracle occurred: one of the tabs presented me with the exact content I needed, with the right, hard-to-achieve balance of technical detail (not too sugary :) ) and the most self-explanatory visual illustrations ever! The title of the post read:
By Jay Alammar! Man, you were definitely my hero on that boring lockdown afternoon of November 2020.
I spent the next few hours devouring that post and branching out to the links it provided to cover Word2Vec, ELMo, LSTMs, Transformers, transfer learning, the attention mechanism, etc.
The Link to Trajectory Data
The last piece was about BERT and how the authors trained a stack of transformer encoders. The two tasks used to pre-train BERT were masked token prediction (the 'masked language model') and next sentence prediction.
As I was reading through the details, I couldn't stop thinking about another problem, related to trajectory imputation, that I was trying to solve with some colleagues at QCRI (remember how the objective was to prepare for tomorrow's interview?).
In a nutshell, depending on the sampling rate used to generate trajectory data, two successive GPS points in the same trajectory can be quite far away from each other, which makes it hard to interpolate the intermediary points, i.e., to know which areas of the space the travelling object covered.
Note that map matching won't help you here, as we assume the structure of the underlying road network (map) is unknown. This is the case in fast-growing cities where the shape of the roads is frequently updated.
At this point, I put my interview preparation aside and decided to read the paper in which Google AI presented BERT to the world: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. The paper was such a great read.
As I was progressing, I started thinking of trajectories as sentences, GPS points as words, and road networks as grammar systems. I even thought of H3 as a spatial tokenizer to convert the continuous space of GPS points (words) into a finite set of tokens.
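To make the spatial-tokenizer idea concrete, here is a minimal sketch of my own (not code from the project), using the h3-py library (v3 API): every GPS point is mapped to one of a finite set of hexagonal cell IDs, which become the 'vocabulary' of the trajectory language.

```python
# Minimal illustration of H3 as a "spatial tokenizer" (h3-py, v3 API).
# The cell resolution and the de-duplication step are my own choices.
import h3

def tokenize_point(lat, lon, resolution=9):
    """Map a continuous GPS point to a discrete H3 cell ID (a 'word')."""
    return h3.geo_to_h3(lat, lon, resolution)

def tokenize_trajectory(points, resolution=9):
    """Turn a trajectory (list of (lat, lon)) into a 'sentence' of cell tokens,
    collapsing consecutive duplicates so a slow-moving object doesn't repeat itself."""
    tokens = [tokenize_point(lat, lon, resolution) for lat, lon in points]
    return [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]

# A short trajectory in Doha becomes a handful of hex-cell tokens.
trajectory = [(25.2854, 51.5310), (25.2861, 51.5322), (25.2875, 51.5340)]
print(tokenize_trajectory(trajectory))
```

The choice of resolution trades off spatial precision against vocabulary size, much like sub-word granularity in NLP tokenizers.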
When I finished reading the paper, all I could think about was BERTrip, the language model I was about to create for trajectory data. This would be huge if it worked, as we might be able to replace graphs with ML models.
But first things first, I had to go back to finishing some details for my interview, and the only additional model I read about was RoBERTa from Facebook. The system I was asked to design in the interview was unsurprisingly NLP intensive, so the solution I offered to build was 'obviously' to fine-tune BERT on a downstream Question Answering task, which I argued could help with the problem at hand (sorry that I can't disclose more details about the interview question itself).
The interview went well and the whole experience left me in a good mood. So, the very next day, I went back to Jay's blog post, in which he provided a link to a Google Colab for playing with BERT fine-tuning on Google's Cloud TPUs. I read through the code to understand the expected format of the input training data, then started converting some 800k trajectories into sentences so I could train BERTrip on masked GPS point and next trajectory segment prediction tasks.
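For a feel of what that conversion might look like, here is a hedged sketch of my own (not the original preprocessing script): each trajectory becomes a 'document' whose 'sentences' are consecutive segments of H3 tokens, written in the plain-text format that BERT's create_pretraining_data.py expects (one sentence per line, a blank line between documents).

```python
# Hedged sketch: write tokenized trajectories in the plain-text format used by
# BERT's create_pretraining_data.py (one sentence per line, blank line between
# documents). The segment length and file name are illustrative choices.
def trajectory_to_sentences(tokens, segment_len=10):
    """Split one tokenized trajectory into consecutive 'sentences' of H3 tokens."""
    return [
        " ".join(tokens[i:i + segment_len])
        for i in range(0, len(tokens), segment_len)
    ]

def write_corpus(tokenized_trajectories, path="bertrip_corpus.txt"):
    with open(path, "w") as f:
        for tokens in tokenized_trajectories:
            for sentence in trajectory_to_sentences(tokens):
                f.write(sentence + "\n")
            f.write("\n")  # blank line separates trajectories ("documents")
```

From such a corpus, the standard pipeline would presumably mask random H3 tokens for the masked-GPS-point task and pair consecutive versus random segments for the next-trajectory-segment task, with a vocabulary file listing the H3 cell IDs alongside BERT's special tokens ([CLS], [SEP], [MASK]).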
The main result is shown in this figure from January 2021.
I took a trajectory with a bunch of GPS points and removed three key points (see the black circle in the left figure). I then asked BERTrip to predict the missing tokens (GPS points) between tokens #4 and #5, and repeated this 15 times, feeding each iteration's predictions back into the input of the next. The results are plotted in green in the bottom picture. Remember that the map is added only as an overlay to help visualise the results and wasn't used in any way during the training/prediction processes. I stared at the rendered trajectory for 5 minutes, then spent an hour going over the code to make sure what I saw was real.
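For the curious, here is a rough reconstruction of that iterative infilling loop (my own sketch, not the original notebook), assuming a hypothetical BERTrip checkpoint whose vocabulary contains the H3 cell tokens, and using the Hugging Face transformers fill-mask pipeline:

```python
# Hedged sketch of the iterative infilling loop described above, using the
# Hugging Face `transformers` fill-mask pipeline on a hypothetical BERTrip
# checkpoint. The model path and gap position are illustrative.
from transformers import pipeline

fill = pipeline("fill-mask", model="path/to/bertrip")  # hypothetical checkpoint

def infill_gap(tokens, gap_index, steps=15):
    """Iteratively insert predicted H3 tokens into the gap after `gap_index`,
    feeding each prediction back into the input of the next iteration."""
    seq = list(tokens)
    insert_at = gap_index + 1
    for _ in range(steps):
        masked = seq[:insert_at] + [fill.tokenizer.mask_token] + seq[insert_at:]
        prediction = fill(" ".join(masked), top_k=1)[0]["token_str"]
        seq.insert(insert_at, prediction)
        insert_at += 1
    return seq

# e.g. infill_gap(tokens, gap_index=3) grows the gap between tokens #4 and #5.
```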
In my head, if a language model trained on trajectories is capable of producing such a beautiful completion for a sparse trajectory (hopefully it wasn't just memorising), then it becomes possible to imagine a new paradigm in which maps are represented as Q&A models instead of graphs, in which case we'd no longer need all those expensive graph-based procedures for shortest paths, route recommendation, and travel time prediction!
This idea seemed like a good opportunity for a research project, but that required a significant investment, and I was already packing for London. Luckily, my brave and bright friend and colleague Mashaal Musleh, who is doing a PhD in spatial computing, took on the challenge of fostering that embryonic idea and managed to get outstanding publications in prestigious venues such as VLDB, SIGSPATIAL (SRC Winner), SIGMOD (Best Demo Award), and TSAS.