MLConf: The Machine Learning Conference 2015
MLConf, the Machine Learning Conference, was hosted on Friday at the 230 Fifth rooftop nightclub in New York. Despite the distinctly un-geeky venue, it was nevertheless the most wonderfully nerdy day I've had yet this year.
While the execution was not flawless (e.g., the resolution of the primary presentation screen was frustratingly low, few activities were provided during lecture breaks, the vendor fair was minuscule, and no time was allotted for audience questions), the event, thanks to the high quality of its speakers, was both entertaining and highly informative.
Each speaker gave a polished presentation at a good pace and was comfortable providing real-world technical examples, diving into specific machine learning packages and snippets of code. On top of that, they thankfully kept blatant pushing of products or services to a minimum. Remarkably, given the topic area, many even managed a healthy amount of humour.
My notes on the talks, far from comprehensive or balanced, are below. My understanding is that slides and videos from each will become available from the conference page, so if some content piques your interest, the full dollop should be available there. All photos were pulled from the MLConf website.
Finding Structured Data at Scale and Scoring its Quality
Corinna Cortes, Head of Research at Google
- described the machine learning (ML) techniques developed at Google to enable Structured Snippets, the quick (and overwhelmingly accurate) facts automatically provided when you use their search engine (or any other, as Dr. Cortes noted, "self-respecting search engine")
- the data within these Snippets is cleverly harvested from a broad range of websites, and may have originally been provided in tabular or free-text format
- publicly-available Biperpedia provides a comprehensive ontology of values and attributes of information that could make it into Structured Snippets
- ideally, you want the bulk of the classifier's predicted probabilities to sit at, or near, 0 or 1; ambiguity would make for potentially irrelevant or inaccurate Snippets (see the sketch after this list)
- lattice regression recommended for models having up to 20 features (input variables)
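As a rough illustration of that point, here is a minimal sketch of checking how much of a classifier's probability mass sits near 0 or 1. This is not Google's pipeline: the data are synthetic and the model is a plain logistic regression rather than the lattice regression mentioned above.

```python
# Minimal, hypothetical sketch: fit a classifier on synthetic data and check how
# many of its predicted probabilities are "confident" (close to 0 or 1).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
p = clf.predict_proba(X_te)[:, 1]          # predicted probability of class 1

# Fraction of predictions within 0.1 of either extreme.
confident = np.mean((p < 0.1) | (p > 0.9))
print(f"fraction of confident predictions: {confident:.2%}")
print("histogram of probabilities:", np.histogram(p, bins=10, range=(0, 1))[0])
```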
You Thought What?! The Promise of Real-Time Brain Decoding
Ted Willke, Senior Principal Engineer at Intel Labs
- much to the joy of my functional MRI-scanning research background, this was the first of two talks focused on neuroscience experiments that leverage the powerful brain imaging technology
- provided fun, audience-engaging inattentional blindness demonstrations
- discussed real-time neurofeedback: the viewer is rewarded for maintaining attention on either the face or the place in a display where two images (one of a face, one of a place) overlap. The more attention the viewer allocates to the face image, the more the fusiform face area (FFA) of their brain is engaged, the more oxygen those neurons require, and the more activity the fMRI scanner picks up in the FFA; this signals the software to iteratively increase the on-screen visibility of the face image while decreasing that of the place image, creating a quantifiable, reinforcing positive feedback loop between a human's thoughts and a machine
- to evaluate face vs. place attention, results are cleaner if you compare (V4-FFA) to (V4-(parahippocampal place area)) than just FFA vs. PPA
- the Intel Math Kernel Library apparently enables much faster computation of whole-brain Pearson correlations and z-scoring by leveraging matrix operations (see the sketch after this list)
- the first of a number of talks to mention upcoming Xeon Phi highly-parallel processors as a pivotal tool for applying ML techniques to very large data sets such as those produced by fMRI studies
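For the curious, here is a minimal NumPy sketch of the matrix-operation trick behind fast whole-brain correlations. The data are synthetic and the sizes made up, and this is plain NumPy rather than the Math Kernel Library itself; the point is only that z-scoring plus one matrix multiply replaces an enormous nested loop.

```python
# Sketch (synthetic data, arbitrary sizes): z-score each voxel's time series, then a
# single matrix multiply yields every pairwise Pearson correlation at once.
import numpy as np

n_voxels, n_timepoints = 1000, 200          # a real whole-brain scan has far more voxels
data = np.random.randn(n_voxels, n_timepoints)

# z-score each voxel's time series (mean 0, standard deviation 1)
z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)

# Pearson correlation matrix: one dense matrix multiply instead of ~500k voxel-pair loops
corr = (z @ z.T) / n_timepoints             # corr[i, j] = correlation of voxel i and voxel j

print(corr.shape)                           # (1000, 1000)
print(np.allclose(np.diag(corr), 1.0))      # each voxel correlates perfectly with itself
```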
Hacking GPUs for Deep Learning
Jeff Johnson, Research Engineer at Facebook
- covered the recent, accelerating history of deep (convolutional) neural networks, which are approaching human accuracy on some classification tasks
- best methods are still all supervised, e.g., just image categorisation or just text; unsupervised is much trickier
- deep nets are large "flop eaters", e.g., due to the serial dependency of multi-layer networks (see the rough arithmetic after this list)
- ...but deep nets are also small in a sense because inputs (like a 512x512 image) are broken down into a relatively small number of "important" features before downstream processing within the series of operations
- CPUs can be as fast as GPUs for deep nets, but require significantly more work to optimise to obtain that comparable speed
- like Willke before him, touted Xeon Phi processors
- drastically different data structures are required for optimising problems of drastically different data sizes
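To make the "flop eater" point concrete, here is a back-of-the-envelope sketch with entirely made-up layer sizes (not Facebook's models): even a few stacked convolutional layers on a 512x512 input cost billions of multiply-adds, and each layer must wait for the one before it.

```python
# Rough multiply-add count for a hypothetical stack of convolutional layers.
def conv_flops(h, w, c_in, c_out, k):
    """Approximate multiply-add count for one conv layer with a k x k kernel."""
    return 2 * h * w * c_out * c_in * k * k

layers = [
    # (height, width, in_channels, out_channels, kernel)
    (512, 512,   3,  64, 3),
    (256, 256,  64, 128, 3),
    (128, 128, 128, 256, 3),
    ( 64,  64, 256, 256, 3),
]

total = 0
for h, w, c_in, c_out, k in layers:
    flops = conv_flops(h, w, c_in, c_out, k)
    total += flops
    print(f"{h}x{w}x{c_in} -> {c_out} channels: {flops / 1e9:.2f} GFLOPs")
print(f"total for one forward pass: {total / 1e9:.2f} GFLOPs")
```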
Learning Through Exploration
Alina Beygelzimer, Senior Research Scientist at Yahoo Labs
- evaluating a new machine learning system on data collected by an existing, deployed system can yield misleading, suboptimal conclusions, e.g., about click-through rate for ads
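One standard correction in this area is inverse propensity scoring for off-policy evaluation; I cannot say it was the exact estimator presented, but the synthetic sketch below shows the idea of reweighting logged data so a new policy can be evaluated offline without deploying it.

```python
# Minimal sketch of inverse propensity scoring (IPS); all data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n, n_actions = 100_000, 5

# Logged data from the deployed system, which chose actions with known probabilities.
logging_probs = np.full(n_actions, 1.0 / n_actions)       # e.g., uniform exploration
actions = rng.integers(0, n_actions, size=n)               # action actually shown
rewards = rng.binomial(1, 0.02 + 0.03 * (actions == 2))    # clicks; action 2 is truly best

# New policy we want to evaluate offline: it deterministically picks action 2.
new_policy_probs = np.zeros((n, n_actions))
new_policy_probs[:, 2] = 1.0

# IPS estimate: reweight each logged reward by (new prob of that action) / (logged prob).
weights = new_policy_probs[np.arange(n), actions] / logging_probs[actions]
ips_estimate = np.mean(weights * rewards)

print(f"naive average of logged rewards: {rewards.mean():.4f}")
print(f"IPS estimate of new policy's reward: {ips_estimate:.4f}")   # ~0.05
```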
Graph Traversal at 30 billion edges per second with NVIDIA GPUs
Bryan Thompson, Chief Scientist and Founder at SYSTAP
- algorithms run on graphs can counterintuitively end up running more slowly as more CPU cores are added because they're a bandwidth-intensive process
- the solution is parallel processing with GPUs
- "curing cancer may be a billion edge problem"; in comparison, facebook has a more than trillion edge network to perform calculations with; "takes 20 min for them to solve" problems within; Mr. Thompson believes that with SYSTAP's technology, this 20-minute operation could be reduced to seconds
- performing any operations on disk is speed and cost-efficiency suicide; using the CPU is the cheapest option but much slower; using the GPU is intermediate in cost but fastest
- GTEPS: giga traversed edges per second
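For a sense of what that metric measures, here is a toy single-core sketch (random graph, made-up sizes, nothing like a GPU implementation) that counts traversed edges per second during a breadth-first search.

```python
# Toy GTEPS measurement: BFS over a random graph, edges traversed divided by time.
import time
import random
from collections import deque, defaultdict

random.seed(0)
n_vertices, n_edges = 50_000, 500_000

# Build a random undirected adjacency list.
adj = defaultdict(list)
for _ in range(n_edges):
    u, v = random.randrange(n_vertices), random.randrange(n_vertices)
    adj[u].append(v)
    adj[v].append(u)

# Breadth-first search from vertex 0, counting every edge inspected.
visited = [False] * n_vertices
visited[0] = True
queue = deque([0])
edges_traversed = 0

start = time.perf_counter()
while queue:
    u = queue.popleft()
    for v in adj[u]:
        edges_traversed += 1
        if not visited[v]:
            visited[v] = True
            queue.append(v)
elapsed = time.perf_counter() - start

print(f"traversed {edges_traversed} edges in {elapsed:.3f}s "
      f"~ {edges_traversed / elapsed / 1e9:.4f} GTEPS on a single CPU core")
```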
Building Machine Learning Applications with Sparkling Water
Michal Malohlava, Software Engineer, H2O.ai
- the product Dr. Malohlava demonstrated, Sparkling Water, combines H2O with Spark
- opined that scalable applications must be distributed, able to process large amounts of data from different sources, easy to develop and experiment with, and built on a powerful ML engine
- 500 people regularly commit code to Spark, making it Apache's most-committed-to project
- provided a live demo of a Sparkling Water ML workflow to categorise email into spam or "ham" (a generic sketch of the same idea appears below)
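The sketch below is not the Sparkling Water or H2O API; it is a generic scikit-learn stand-in, with a tiny invented dataset, showing the shape of such a spam/ham workflow: tokenise the messages, fit a model, score new text.

```python
# Generic spam/"ham" classifier sketch (invented data, scikit-learn rather than H2O).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "WINNER! Claim your free prize now",       # spam
    "Lowest price pills, click here",          # spam
    "Are we still meeting for lunch today?",   # ham
    "Please review the attached report",       # ham
]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["free prize, click now", "see you at lunch"]))
# -> ['spam' 'ham']
```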
Analytics Communication: Re-Introducing Complex Modeling
Dan Mallinger, Data Science Practice Manager, Think Big Analytics
- it is the responsibility of data scientists, not the executives they report to, to make analytics understandable and therefore confidently executable
- models should not need to be re-fit every month; underlying process that produces the data in the real world likely does not change that regularly
- highlighted the usefulness of bootstrapping samples (e.g., with the R bootstrap package), particularly when working with non-parametric distributions
- if shuffling (permuting) the values of a variable has no impact on its utility in a model, then it's not important (this can be evaluated easily with the R caret package; see the sketch after this list)
- always aim to provide confidence bands, model sensitivities and the impact of context changes to decision makers
- R sensitivity package enables further sensitivity and robustness testing
- a black-box model, e.g., a neural net, can be converted to a white box using prototype selection; this can sometimes improve upon the black-box model fit and might be met with less skepticism by non-technical management
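The talk pointed to R packages (bootstrap, caret, sensitivity); the sketch below shows the same permutation-importance idea in Python with scikit-learn and synthetic data, shuffling one feature at a time and measuring how much the held-out score drops.

```python
# Permutation importance sketch: features whose shuffling barely moves the held-out
# score are unimportant to the model. Data and model choice are illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=5, n_informative=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
baseline = model.score(X_te, y_te)                     # R^2 on held-out data

rng = np.random.default_rng(0)
for j in range(X_te.shape[1]):
    X_perm = X_te.copy()
    rng.shuffle(X_perm[:, j])                          # destroy feature j's information
    drop = baseline - model.score(X_perm, y_te)
    print(f"feature {j}: score drop when shuffled = {drop:.3f}")
```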
Mobile Network Fraud Analysis and Detection
Ilona Murynets, Senior Member of Technical Staff at AT&T Security Research Center
- discussed several types of SIM card fraud that can be perpetrated to enable international phone calls
- provided examples of features she looks at and strategies she employs to identify and eliminate these "stolen" calls, such as looking at number of SIM cards used per device
- so much data that, as with many ML applications, initial filtering or sampling of data is required
- SMS spam grows 500% annually but can be reported in the US by contacting 7726, assisting in elimination of this annoyance
- attributes of spammer accounts can be teased apart from normal behaviour: for example, genuine users have a geographical footprint limited to a few areas (e.g., concentrated where they grew up, where they went to university, and where they work), whereas spammers send messages broadly across all inhabited areas (see the sketch below)
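Here is a hypothetical sketch of that geographic-footprint feature; the accounts, regions, and threshold are all invented for illustration.

```python
# Count the distinct destination regions each account messages, and flag accounts
# whose footprint is implausibly broad. All values below are made up.
from collections import defaultdict

# (sender, destination_region) pairs, e.g., derived from message logs
messages = [
    ("alice", "NYC"), ("alice", "NYC"), ("alice", "Boston"),
    ("spam_bot", "NYC"), ("spam_bot", "LA"), ("spam_bot", "Chicago"),
    ("spam_bot", "Miami"), ("spam_bot", "Seattle"), ("spam_bot", "Denver"),
]

regions_per_sender = defaultdict(set)
for sender, region in messages:
    regions_per_sender[sender].add(region)

FOOTPRINT_THRESHOLD = 4    # arbitrary cut-off for illustration
for sender, regions in regions_per_sender.items():
    flag = "suspicious" if len(regions) > FOOTPRINT_THRESHOLD else "looks genuine"
    print(f"{sender}: {len(regions)} distinct regions -> {flag}")
```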
Learning About Brain: Sparse Modeling and Beyond
Irina Rish, Research Staff, IBM T.J. Watson Research Center
- the overarching premise of Dr. Rish's talk was that, while ML typically focuses on improving computational capabilities in silico, ML can also be leveraged to improve the in vivo cognitive capabilities of the human brain
- fMRI traditionally involved univariate analysis, ignoring interactions between voxels, despite the brain being a highly interactive net of cognitive modules
- use sparse modelling (e.g., LASSO, elastic net, as deployed in a genomics paper I co-authored) and predictive feature selection to identify the most important voxels and voxel interactions (see the sketch after this list)
- elastic net provided better results than LASSO in this particular neuroimaging application
- some other "structured LASSO" techniques available ("group", "fused")
- trying out several sparse methods can yield the best result overall
- prior probabilities can be defined to include scan-specific information (such as that schizophrenia phenotypes may have network irregularities)
- computational psychiatry: schizophrenics, manic depressives and controls can be distinguished by analysing story text; so can drunk vs. sober, and high on MDMA vs. not
- inexpensive NeuroSky (single sensor electroencephalograph) can be used to identify when vehicle driver is not paying sufficient attention to task at hand
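A minimal scikit-learn sketch of the voxel-selection idea follows; the data are synthetic and the sizes arbitrary, so treat it as an illustration of LASSO and elastic net zeroing out uninformative features, not as Dr. Rish's pipeline.

```python
# Sparse models select a handful of informative "voxels" out of many by driving
# the coefficients of uninformative features to exactly zero. Synthetic data.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n_scans, n_voxels = 200, 1000
X = rng.standard_normal((n_scans, n_voxels))

# Only 10 voxels actually carry signal about the behavioural/clinical target.
true_voxels = rng.choice(n_voxels, size=10, replace=False)
y = X[:, true_voxels].sum(axis=1) + 0.1 * rng.standard_normal(n_scans)

for name, model in [("LASSO", Lasso(alpha=0.1)),
                    ("elastic net", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
    model.fit(X, y)
    selected = np.flatnonzero(model.coef_)
    recovered = len(set(selected) & set(true_voxels))
    print(f"{name}: kept {len(selected)} voxels, recovered {recovered}/10 true ones")
```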
All the Data and Still Not Enough!
Claudia Perlich, Chief Scientist at Dstillery
- we have more and more data but little more of the data we want; the art of data science is making do with the next best data
- too much time spent optimising percentage points and not enough spent thinking creatively about how to solve problems: are there different data we need? Should we be approaching the data or problem from a different angle entirely?
- if we try to use click-through data to optimise an advertising campaign, we're in big trouble: 90% of clicks may be accidental, so our ML algorithms may simply be optimising for people with vision difficulties or a motor disorder (ha!)
- conversion rate in a retargeting campaign (e.g., serving car ads to someone who's visited an auto site) can be >10% while ordinary conversion rate is closer to 1%
- geographical data are very noisy, making hyperlocal advertising challenging; according to geo data from mobile phones, 30% of Americans travel at the speed of sound every day and tens of thousands of people are often (stacked?) at the exact same point (one simple plausibility filter is sketched below)
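One simple plausibility filter for that artefact is to compute the implied speed between consecutive location pings and discard physically impossible jumps; the coordinates and timestamps in this sketch are invented.

```python
# Discard location pings that would require travelling faster than the speed of sound.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

SPEED_OF_SOUND_KMH = 1235

# (timestamp in hours, latitude, longitude) pings for one device
pings = [(0.0, 40.71, -74.01),   # New York
         (0.5, 40.73, -74.00),   # still New York: plausible
         (1.0, 34.05, -118.24)]  # "Los Angeles" 30 minutes later: not plausible

for (t1, la1, lo1), (t2, la2, lo2) in zip(pings, pings[1:]):
    speed = haversine_km(la1, lo1, la2, lo2) / (t2 - t1)
    verdict = "discard" if speed > SPEED_OF_SOUND_KMH else "keep"
    print(f"{speed:8.0f} km/h between pings -> {verdict}")
```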
Recommendation Architecture: Understanding the Components of a Personalized Recommendation System
Jeremy Schiff, Senior Manager, Data Science at OpenTable
- most A/B testing ideas end up not working, so the key to success is having engineering solutions that facilitate the easy creation and iteration of novel A/B test ideas (a minimal single-test sketch appears below)
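For completeness, here is a minimal sketch of evaluating a single A/B test with a two-proportion z-test on invented conversion counts; Mr. Schiff's point was less about the statistics than about the engineering that makes running many such tests cheap.

```python
# Two-proportion z-test on conversion counts from an A/B test (counts are invented).
from math import sqrt, erf

def ab_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # normal approximation
    return z, p_value

z, p = ab_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")    # a small p suggests variant B genuinely differs
```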