Europython 2018 - Day 4
Fourth day of the conference. Fatigue starts to take its toll, with some people taking naps wherever they can find a quiet place in the conference center! Here are my daily takeaways.
Deep Learning with PyTorch for Fun and Profit
Alexander Hendorf is a senior developer. After attending a presentation about AI in the arts, he decided to play with the idea. He chose PyTorch as his development framework because of:
- "research friendliness"
- accessibility to "Pythonistas"
- presence of a lively community
- support by Facebook
He started by applying "style transfer" between comics, or between comics and photographs. The results he showed were impressive.
He then started a project around "Die drei", a German audio series for teenagers, which has 200+ episodes with audio and transcripts available. He set out to create new "Die drei"-like episodes.
He first tried to generate realistic texts based on the style of the actual scripts. His results were mixed at best. He then tried voice synthesis. Same story: the results were not of commercial quality. When he tried to create illustrations, his results were much better. He has not tried to generate a plot yet, but sees this as a difficult exercise.
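To get a feel for what "generating text in the style of a corpus" means, here is a minimal sketch using a word-level Markov chain. This is a deliberately simple stand-in, not the deep-learning approach he used; the tiny corpus below is invented for illustration.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    chain = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length=10, seed=42):
    """Walk the chain, picking a random successor at each step."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:
            break  # dead end: the last word never had a successor
        out.append(rng.choice(successors))
    return " ".join(out)

# Toy corpus -- a real experiment would use the actual transcripts.
corpus = "the detectives solve the case and the detectives find the clue"
chain = build_chain(corpus)
print(generate(chain, "the", length=6))
```

The output is locally plausible but globally meaningless, which matches his observation that generated text rarely reaches commercial quality.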
His remarks:
- do not hesitate to "play" with a toy project
- beware of cherry picking: usually, only the best results are shown. There can be a lot of chaff, which is rarely shown.
- there is a lot of material available on the Internet, but code quality is "not uniform": there is still a lot of Python 2.7 code (to be retired at the end of 2019), closures, ...
- Beware of hype
When to use Machine Learning: Tips, Tricks and Warnings
Pascal van Kooten is a senior data scientist working for Jibes Data Analytics. Over five years, he has contributed to a dozen projects in as many companies.
To him, machine learning is a subset of Artificial Intelligence. The objective is to generalise from observations.
He explained some projects, mostly personal, and drew the following conclusions:
- simpler is better than complex
- analyse before starting to apply machine learning
- machine learning is probably not very suitable in environments with strict rules/strong compliance requirements
- when you start using machine learning:
  - build a domain-specific platform
  - don't start with the most complex problems
  - do not over-optimise
  - do not underestimate the work to be done outside the models proper: data preparation, exploitation of results
  - use cross-validation and anomaly detection
  - the models you build must be able to go to production
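The cross-validation he recommends can be illustrated with a minimal hand-rolled k-fold loop. This sketch uses a trivial "predict the training mean" model purely to keep the example self-contained; in practice you would plug in a real model (e.g. via scikit-learn's `cross_val_score`).

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(ys, k=5):
    """K-fold cross-validation of a trivial mean predictor.

    Returns the mean absolute error averaged over the k folds."""
    folds = k_fold_indices(len(ys), k)
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_ys = [y for i, y in enumerate(ys) if i not in held_out]
        prediction = sum(train_ys) / len(train_ys)  # the "model": predict the mean
        fold_error = sum(abs(ys[i] - prediction) for i in test_idx) / len(test_idx)
        errors.append(fold_error)
    return sum(errors) / len(errors)

ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(round(cross_validate(ys, k=5), 2))
```

The point is that every observation is used for testing exactly once, giving a more honest error estimate than a single train/test split.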
Alisa Dammer works as a developer for Joblift. She presented a toy project: predicting the sleep cycle of a typical IT student. One of her friends recorded his sleep cycle (about 1,000 observations), and she tried to model the data.
Treating the dataset as a one-dimensional time series, she had little success: about 30% accuracy. She was able to improve the score with feature engineering: she added categorical variables linked to time, such as day/night, season, meal times, and finally the academic calendar (exams, holidays). She eventually reached 95% accuracy.
Her conclusions:
- feature engineering is important for small datasets/low dimensionality
- feature engineering is complex, time consuming and requires domain-specific expertise.
- she tested different kinds of models, and it appeared that feature engineering had more impact on the end result than the choice of model.
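The kind of time-based categorical features she describes can be derived directly from a timestamp. This is a small illustrative sketch; the exact hour boundaries and meal times are assumptions, not her actual feature definitions.

```python
from datetime import datetime

SEASONS = ["winter", "winter", "spring", "spring", "spring",
           "summer", "summer", "summer", "autumn", "autumn",
           "autumn", "winter"]  # month 1..12 -> season

def time_features(ts):
    """Derive simple categorical features from a timestamp,
    in the spirit of the day/night and season variables described above.
    Hour boundaries and meal times are illustrative guesses."""
    hour = ts.hour
    return {
        "is_night": hour < 7 or hour >= 22,
        "season": SEASONS[ts.month - 1],
        "is_mealtime": hour in (8, 12, 13, 19, 20),
        "weekday": ts.strftime("%A"),
    }

print(time_features(datetime(2018, 7, 26, 23, 30)))
```

Each raw timestamp thus becomes several low-cardinality categorical columns, which is exactly the sort of enrichment that can lift a small, low-dimensional dataset.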
More Than You Ever Wanted To Know About Python Functions
Mark Smith works as a developer advocate for Nexmo. He guided us through the subtleties of function definitions, closures and methods. The talk was full of practical tips, but difficult to summarise. His slides and code will be made available.
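One classic closure subtlety of the kind such talks cover (I cannot reproduce his exact examples, so this is a well-known one): closures capture variables, not values, so every lambda created in a loop sees the loop variable's final value.

```python
# Late binding: the loop variable is looked up when the closure is
# *called*, not when it is defined.
fns = [lambda: i for i in range(3)]
print([f() for f in fns])  # [2, 2, 2], not [0, 1, 2]

# The usual fix: bind the current value as a default argument,
# which is evaluated at definition time.
fns_fixed = [lambda i=i: i for i in range(3)]
print([f() for f in fns_fixed])  # [0, 1, 2]
```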
Heid Thorpe works as a data scientist for the Australian government.
This is probably the strangest talk I have seen so far. She explained how she "invents" data with a deep learning (LSTM) system. To me, with a scientific background, inventing data is a cardinal sin. But her objective is, in fact, to invent fake but realistic test data. This can be used to test applications, but also to test security. Testing, and creating test datasets, are boring activities that deserve to be automated.
She uses an LSTM neural network. She needs some examples of "real" data and, from there, she can create huge test datasets, which can be fed into automated test systems.
She first fed Shakespeare sonnets into the system and, after a few epochs, the system generated realistic (although meaningless) texts. She then showed how she could use it to generate the XML part of .docx files.
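The idea of fake-but-plausible test data can be conveyed with a far simpler template-based generator (no LSTM involved). The field names and value pools below are invented for illustration; her system learns such patterns from example data instead of hard-coding them.

```python
import random

# Hypothetical value pools -- a learned generator would infer these
# patterns from real example records instead of hard-coding them.
FIRST_NAMES = ["Anna", "Ben", "Carla", "David"]
LAST_NAMES = ["Meyer", "Smith", "Nguyen", "Rossi"]

def fake_record(rng):
    """Produce one fake but plausible-looking test record."""
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "age": rng.randint(18, 90),
        "email": f"user{rng.randint(1000, 9999)}@example.com",
    }

rng = random.Random(7)  # seeded for reproducible test datasets
dataset = [fake_record(rng) for _ in range(100)]
print(dataset[0])
```

Even this crude version shows the payoff: thousands of structurally valid records on demand, with no real (and possibly sensitive) data involved.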