MLConf: The Machine Learning Conference 2015
MLConf, the Machine Learning Conference, was hosted on Friday at the 230 Fifth rooftop nightclub in New York. Despite the distinctly un-geeky venue, it was nevertheless the most wonderfully nerdy day I've had yet this year.
While the execution was not flawless (e.g., the resolution of the primary presentation screen was frustratingly low, few activities were provided during lecture breaks, the vendor fair was minuscule, and no time was allotted for audience questions), the event, thanks to the high quality of its speakers, was both entertaining and highly informative.
Each speaker gave a polished presentation at a good pace and was comfortable providing real-world technical examples, diving into specific machine learning packages and snippets of code. On top of that, they thankfully kept blatant pushing of products or services to a minimum. Remarkably, given the topic area, many even managed a healthy amount of humour.
My notes on the talks, far from comprehensive or balanced, are below. My understanding is that slides and videos from each will become available from the conference page, so if some content piques your interest, the full dollop should be available there. All photos were pulled from the MLConf website.
Finding Structured Data at Scale and Scoring its Quality
Corinna Cortes, Head of Research at Google
- described the machine learning (ML) techniques developed at Google to enable Structured Snippets, the quick (and overwhelmingly accurate) facts automatically provided when you use their search engine (or any other, as Dr. Cortes noted, "self-respecting search engine")
- the data within these Snippets is cleverly harvested from a broad range of websites, and may have originally been provided in tabular or free-text format
- publicly-available Biperpedia provides a comprehensive ontology of values and attributes of information that could make it into Structured Snippets
- ideally, you want the bulk of the classifier's predicted probabilities to sit at, or near, 0 or 1; ambiguity would make for potentially irrelevant or inaccurate Snippets (see the sketch after this list)
- lattice regression recommended for models having up to 20 features (input variables)
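As a rough illustration of that point, here is a minimal sketch of checking how much of a classifier's probability mass sits near 0 or 1. This is not Google's pipeline: the data are synthetic and the model is a plain logistic regression rather than the lattice regression mentioned above.

```python
# Minimal, hypothetical sketch: fit a classifier on synthetic data and check how
# many of its predicted probabilities are "confident" (close to 0 or 1).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
p = clf.predict_proba(X_te)[:, 1]          # predicted probability of class 1

# Fraction of predictions within 0.1 of either extreme.
confident = np.mean((p < 0.1) | (p > 0.9))
print(f"fraction of confident predictions: {confident:.2%}")
print("histogram of probabilities:", np.histogram(p, bins=10, range=(0, 1))[0])
```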
You Thought What?! The Promise of Real-Time Brain Decoding
Ted Willke, Senior Principal Engineer at Intel Labs
- much to the joy of my functional MRI-scanning research background, this was the first of two talks focused on neuroscience experiments that leverage the powerful brain imaging technology
- provided fun, audience-engaging inattentional blindness demonstrations
- discussed real-time neurofeedback: the viewer is rewarded for maintaining attention on either the face or the place in a display where two images (one of a face, one of a place) overlap. The more attention the viewer allocates to the face image, the more the fusiform face area (FFA) of their brain is engaged, the more oxygen those neurons require, and the more activity the fMRI scanner picks up in the FFA; this signals the software to iteratively increase the on-screen visibility of the face image while decreasing that of the place image, creating a quantifiable, reinforcing positive feedback loop between a human's thoughts and a machine
- to evaluate face vs. place attention, results are cleaner if you compare (V4-FFA) to (V4-(parahippocampal place area)) than just FFA vs. PPA
- the Intel Math Kernel Library apparently enables much faster computation of whole-brain Pearson correlations and z-scoring by leveraging matrix operations (see the sketch after this list)
- the first of a number of talks to mention upcoming Xeon Phi highly-parallel processors as a pivotal tool for applying ML techniques to very large data sets such as those produced by fMRI studies
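For the curious, here is a minimal NumPy sketch of the matrix-operation trick behind fast whole-brain correlations. The data are synthetic and the sizes made up, and this is plain NumPy rather than the Math Kernel Library itself; the point is only that z-scoring plus one matrix multiply replaces an enormous nested loop.

```python
# Sketch (synthetic data, arbitrary sizes): z-score each voxel's time series, then a
# single matrix multiply yields every pairwise Pearson correlation at once.
import numpy as np

n_voxels, n_timepoints = 1000, 200          # a real whole-brain scan has far more voxels
data = np.random.randn(n_voxels, n_timepoints)

# z-score each voxel's time series (mean 0, standard deviation 1)
z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)

# Pearson correlation matrix: one dense matrix multiply instead of ~500k voxel-pair loops
corr = (z @ z.T) / n_timepoints             # corr[i, j] = correlation of voxel i and voxel j

print(corr.shape)                           # (1000, 1000)
print(np.allclose(np.diag(corr), 1.0))      # each voxel correlates perfectly with itself
```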
Hacking GPUs for Deep Learning
Jeff Johnson, Research Engineer at Facebook
- covered the recent, accelerating history of deep (convolutional) neural networks, which are approaching human accuracy on some classification tasks
- best methods are still all supervised, e.g., just image categorisation or just text; unsupervised is much trickier
- deep nets are large "flop eaters", e.g., due to the serial dependency of multi-layer networks (see the rough arithmetic after this list)
- ...but deep nets are also small in a sense because inputs (like a 512x512 image) are broken down into a relatively small number of "important" features before downstream processing within the series of operations
- CPUs can be as fast as GPUs for deep nets, but require significantly more work to optimise to obtain that comparable speed
- like Willke before him, touted Xeon Phi processors
- drastically different data structures are required for optimising problems of drastically different data sizes
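To make the "flop eater" point concrete, here is a back-of-the-envelope sketch with entirely made-up layer sizes (not Facebook's models): even a few stacked convolutional layers on a 512x512 input cost billions of multiply-adds, and each layer must wait for the one before it.

```python
# Rough multiply-add count for a hypothetical stack of convolutional layers.
def conv_flops(h, w, c_in, c_out, k):
    """Approximate multiply-add count for one conv layer with a k x k kernel."""
    return 2 * h * w * c_out * c_in * k * k

layers = [
    # (height, width, in_channels, out_channels, kernel)
    (512, 512,   3,  64, 3),
    (256, 256,  64, 128, 3),
    (128, 128, 128, 256, 3),
    ( 64,  64, 256, 256, 3),
]

total = 0
for h, w, c_in, c_out, k in layers:
    flops = conv_flops(h, w, c_in, c_out, k)
    total += flops
    print(f"{h}x{w}x{c_in} -> {c_out} channels: {flops / 1e9:.2f} GFLOPs")
print(f"total for one forward pass: {total / 1e9:.2f} GFLOPs")
```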
Learning Through Exploration
Alina Beygelzimer, Senior Research Scientist at Yahoo Labs
- evaluating a new machine learning system on data collected by an existing, deployed system can yield misleading, suboptimal conclusions, e.g., about click-through rate for ads
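One standard correction in this area is inverse propensity scoring for off-policy evaluation; I cannot say it was the exact estimator presented, but the synthetic sketch below shows the idea of reweighting logged data so a new policy can be evaluated offline without deploying it.

```python
# Minimal sketch of inverse propensity scoring (IPS); all data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n, n_actions = 100_000, 5

# Logged data from the deployed system, which chose actions with known probabilities.
logging_probs = np.full(n_actions, 1.0 / n_actions)       # e.g., uniform exploration
actions = rng.integers(0, n_actions, size=n)               # action actually shown
rewards = rng.binomial(1, 0.02 + 0.03 * (actions == 2))    # clicks; action 2 is truly best

# New policy we want to evaluate offline: it deterministically picks action 2.
new_policy_probs = np.zeros((n, n_actions))
new_policy_probs[:, 2] = 1.0

# IPS estimate: reweight each logged reward by (new prob of that action) / (logged prob).
weights = new_policy_probs[np.arange(n), actions] / logging_probs[actions]
ips_estimate = np.mean(weights * rewards)

print(f"naive average of logged rewards: {rewards.mean():.4f}")
print(f"IPS estimate of new policy's reward: {ips_estimate:.4f}")   # ~0.05
```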
Graph Traversal at 30 billion edges per second with NVIDIA GPUs
Bryan Thompson, Chief Scientist and Founder at SYSTAP
- algorithms run on graphs can counterintuitively end up running more slowly as more CPU cores are added because they're a bandwidth-intensive process
- the solution is parallel processing with GPUs
- "curing cancer may be a billion edge problem"; in comparison, facebook has a more than trillion edge network to perform calculations with; "takes 20 min for them to solve" problems within; Mr. Thompson believes that with SYSTAP's technology, this 20-minute operation could be reduced to seconds
- performing any operations on disk is speed and cost-efficiency suicide; using the CPU is the cheapest option but much slower; using the GPU is intermediate in cost but fastest
- GTEPS: giga traversed edges per second
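For a sense of what that metric measures, here is a toy single-core sketch (random graph, made-up sizes, nothing like a GPU implementation) that counts traversed edges per second during a breadth-first search.

```python
# Toy GTEPS measurement: BFS over a random graph, edges traversed divided by time.
import time
import random
from collections import deque, defaultdict

random.seed(0)
n_vertices, n_edges = 50_000, 500_000

# Build a random undirected adjacency list.
adj = defaultdict(list)
for _ in range(n_edges):
    u, v = random.randrange(n_vertices), random.randrange(n_vertices)
    adj[u].append(v)
    adj[v].append(u)

# Breadth-first search from vertex 0, counting every edge inspected.
visited = [False] * n_vertices
visited[0] = True
queue = deque([0])
edges_traversed = 0

start = time.perf_counter()
while queue:
    u = queue.popleft()
    for v in adj[u]:
        edges_traversed += 1
        if not visited[v]:
            visited[v] = True
            queue.append(v)
elapsed = time.perf_counter() - start

print(f"traversed {edges_traversed} edges in {elapsed:.3f}s "
      f"~ {edges_traversed / elapsed / 1e9:.4f} GTEPS on a single CPU core")
```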
Building Machine Learning Applications with Sparkling Water
Michal Malohlava, Software Engineer, H2O.ai
- the product Dr. Malohlava demonstrated, Sparkling Water, combines H2O with Spark
- opined that scalable applications must be distributed, able to process large amounts of data from different sources, easy to develop and experiment with, and built on a powerful ML engine
- 500 people regularly commit code to Spark, making it Apache's most-committed-to project
- provided a live demo of a Sparkling Water ML workflow to categorise email into spam or "ham" (a generic sketch of the same idea appears below)
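The sketch below is not the Sparkling Water or H2O API; it is a generic scikit-learn stand-in, with a tiny invented dataset, showing the shape of such a spam/ham workflow: tokenise the messages, fit a model, score new text.

```python
# Generic spam/"ham" classifier sketch (invented data, scikit-learn rather than H2O).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "WINNER! Claim your free prize now",       # spam
    "Lowest price pills, click here",          # spam
    "Are we still meeting for lunch today?",   # ham
    "Please review the attached report",       # ham
]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["free prize, click now", "see you at lunch"]))
# -> ['spam' 'ham']
```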
Analytics Communication: Re-Introducing Complex Modeling
Dan Mallinger, Data Science Practice Manager, Think Big Analytics
- it is the responsibility of data scientists, not the executives they report to, to make analytics understandable and therefore confidently executable
- models should not need to be re-fit every month; underlying process that produces the data in the real world likely does not change that regularly
- highlighted the usefulness of bootstrapping samples (e.g., with the R bootstrap package), particularly when working with non-parametric distributions
- if shuffling (permuting) the values of a variable has no impact on its utility in a model, then it's not important (this can be evaluated easily with the R caret package; see the sketch after this list)
- always aim to provide confidence bands, model sensitivities and the impact of context changes to decision makers
- R sensitivity package enables further sensitivity and robustness testing
- a black-box model, e.g., a neural net, can be converted to a white box using prototype selection; this can sometimes improve upon the black-box model fit and might be met with less skepticism by non-technical management
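The talk pointed to R packages (bootstrap, caret, sensitivity); the sketch below shows the same permutation-importance idea in Python with scikit-learn and synthetic data, shuffling one feature at a time and measuring how much the held-out score drops.

```python
# Permutation importance sketch: features whose shuffling barely moves the held-out
# score are unimportant to the model. Data and model choice are illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=5, n_informative=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
baseline = model.score(X_te, y_te)                     # R^2 on held-out data

rng = np.random.default_rng(0)
for j in range(X_te.shape[1]):
    X_perm = X_te.copy()
    rng.shuffle(X_perm[:, j])                          # destroy feature j's information
    drop = baseline - model.score(X_perm, y_te)
    print(f"feature {j}: score drop when shuffled = {drop:.3f}")
```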
Mobile Network Fraud Analysis and Detection
Ilona Murynets, Senior Member of Technical Staff at AT&T Security Research Center
- discussed several types of SIM card fraud that can be perpetrated to enable international phone calls
- provided examples of features she looks at and strategies she employs to identify and eliminate these "stolen" calls, such as looking at number of SIM cards used per device
- so much data that, as with many ML applications, initial filtering or sampling of data is required
- SMS spam grows 500% annually but can be reported in the US by contacting 7726, assisting in elimination of this annoyance
- attributes of spammer accounts can be teased apart from normal behaviour: for example, genuine users have a geographical footprint limited to a few areas (e.g., concentrated where they grew up, where they went to university, and where they work), whereas spammers send messages broadly across all inhabited areas (see the sketch below)
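Here is a hypothetical sketch of that geographic-footprint feature; the accounts, regions, and threshold are all invented for illustration.

```python
# Count the distinct destination regions each account messages, and flag accounts
# whose footprint is implausibly broad. All values below are made up.
from collections import defaultdict

# (sender, destination_region) pairs, e.g., derived from message logs
messages = [
    ("alice", "NYC"), ("alice", "NYC"), ("alice", "Boston"),
    ("spam_bot", "NYC"), ("spam_bot", "LA"), ("spam_bot", "Chicago"),
    ("spam_bot", "Miami"), ("spam_bot", "Seattle"), ("spam_bot", "Denver"),
]

regions_per_sender = defaultdict(set)
for sender, region in messages:
    regions_per_sender[sender].add(region)

FOOTPRINT_THRESHOLD = 4    # arbitrary cut-off for illustration
for sender, regions in regions_per_sender.items():
    flag = "suspicious" if len(regions) > FOOTPRINT_THRESHOLD else "looks genuine"
    print(f"{sender}: {len(regions)} distinct regions -> {flag}")
```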
Learning About Brain: Sparse Modeling and Beyond
Irina Rish, Research Staff, IBM T.J. Watson Research Center
- the overarching premise of Dr. Rish's talk was that, while ML typically focuses on improving computational capabilities in silico, ML can also be leveraged to improve the in vivo cognitive capabilities of the human brain
- fMRI traditionally involved univariate analysis, ignoring interactions between voxels, despite the brain being a highly interactive net of cognitive modules
- use sparse modelling (e.g., LASSO, elastic net, as deployed in a genomics paper I co-authored) and predictive feature selection to identify the most important voxels and voxel interactions (see the sketch after this list)
- elastic net provided better results than LASSO in this particular neuroimaging application
- some other "structured LASSO" techniques available ("group", "fused")
- trying out several sparse methods can yield the best result overall
- prior probabilities can be defined to include scan-specific information (such as that schizophrenia phenotypes may have network irregularities)
- computational psychiatry: schizophrenics, manic depressives and controls can be distinguished by analysing story text; so can drunk vs. sober, and high on MDMA vs. not
- inexpensive NeuroSky (single sensor electroencephalograph) can be used to identify when vehicle driver is not paying sufficient attention to task at hand
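A minimal scikit-learn sketch of the voxel-selection idea follows; the data are synthetic and the sizes arbitrary, so treat it as an illustration of LASSO and elastic net zeroing out uninformative features, not as Dr. Rish's pipeline.

```python
# Sparse models select a handful of informative "voxels" out of many by driving
# the coefficients of uninformative features to exactly zero. Synthetic data.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n_scans, n_voxels = 200, 1000
X = rng.standard_normal((n_scans, n_voxels))

# Only 10 voxels actually carry signal about the behavioural/clinical target.
true_voxels = rng.choice(n_voxels, size=10, replace=False)
y = X[:, true_voxels].sum(axis=1) + 0.1 * rng.standard_normal(n_scans)

for name, model in [("LASSO", Lasso(alpha=0.1)),
                    ("elastic net", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
    model.fit(X, y)
    selected = np.flatnonzero(model.coef_)
    recovered = len(set(selected) & set(true_voxels))
    print(f"{name}: kept {len(selected)} voxels, recovered {recovered}/10 true ones")
```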
All the Data and Still Not Enough!
Claudia Perlich, Chief Scientist at Dstillery
- we have more and more data but little more of the data we want; the art of data science is making do with the next best data
- too much time spent optimising percentage points and not enough spent thinking creatively about how to solve problems: are there different data we need? Should we be approaching the data or problem from a different angle entirely?
- if we try to use click-through data to optimise an advertising campaign, we're in big trouble: 90% of clicks may be accidental, so our ML algorithms may simply be optimising for people with vision difficulties or a motor disorder (ha!)
- conversion rate in a retargeting campaign (e.g., serving car ads to someone who's visited an auto site) can be >10% while ordinary conversion rate is closer to 1%
- geographical data are very noisy, making hyperlocal advertising challenging; according to geo data from mobile phones, 30% of Americans travel at the speed of sound every day and tens of thousands of people are often (stacked?) at the exact same point (one simple plausibility filter is sketched below)
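One simple plausibility filter for that artefact is to compute the implied speed between consecutive location pings and discard physically impossible jumps; the coordinates and timestamps in this sketch are invented.

```python
# Discard location pings that would require travelling faster than the speed of sound.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

SPEED_OF_SOUND_KMH = 1235

# (timestamp in hours, latitude, longitude) pings for one device
pings = [(0.0, 40.71, -74.01),   # New York
         (0.5, 40.73, -74.00),   # still New York: plausible
         (1.0, 34.05, -118.24)]  # "Los Angeles" 30 minutes later: not plausible

for (t1, la1, lo1), (t2, la2, lo2) in zip(pings, pings[1:]):
    speed = haversine_km(la1, lo1, la2, lo2) / (t2 - t1)
    verdict = "discard" if speed > SPEED_OF_SOUND_KMH else "keep"
    print(f"{speed:8.0f} km/h between pings -> {verdict}")
```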
Recommendation Architecture: Understanding the Components of a Personalized Recommendation System
Jeremy Schiff, Senior Manager, Data Science at OpenTable
- most A/B testing ideas end up not working, so the key to success is having engineering solutions that facilitate the easy creation and iteration of novel A/B test ideas (a minimal single-test sketch appears below)
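For completeness, here is a minimal sketch of evaluating a single A/B test with a two-proportion z-test on invented conversion counts; Mr. Schiff's point was less about the statistics than about the engineering that makes running many such tests cheap.

```python
# Two-proportion z-test on conversion counts from an A/B test (counts are invented).
from math import sqrt, erf

def ab_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # normal approximation
    return z, p_value

z, p = ab_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")    # a small p suggests variant B genuinely differs
```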