Solving the ML App interoperability issue with Deep Learning Frameworks - 2/2
In part 1 of this blog, the need for app interoperability, the offerings in the market today, and their shortcomings were discussed. In the remainder of this blog, I will argue that Deep Learning Frameworks (such as TensorFlow and its higher-level abstraction Keras) provide an alternative. Before getting into the details, let us state the larger goals - what does the proposed solution need to offer?
- separate data and data flows from model spec
- human readability
- scalability
- low barrier to translation
- low barrier to adoption
We also need to look at the problem not just from a technology PoV (schemas, frameworks, readers, writers, languages, etc.), but also from a math/science PoV, and even from the perspective of market trends. We raise the following questions:
- Is there an oracle of sorts, like a universal algorithm, that subsumes all other algorithms?
- Is there an engineering framework that can realize this?
- Is there enough business interest in it to make it commercially viable?
The answer is an obvious no (largely due to the first question above), but it allows us to search for a larger class of models, and a corresponding enabler. We know we cannot find an exact solution, but is there an approximate one? Deep Learning is a strong contender.
Deep Learning is a super-algorithm. Its expressive, representational power comes from its modular, lego-block architecture. Essentially, we see modeling (elicitation and inference) not as a monolithic block, but as the coming together of many reusable, atomic building blocks: layers, activations, losses, regularizers, initializers, optimizers, and so on - each block comes with its own set of options. As a result, the collective number of modeling possibilities is enormous. Sampling the models/algorithms that can be expressed via DNNs (unless otherwise mentioned, the equivalence may not be exact, but holds in principle) shows the possibility landscape (a small sketch follows the list below):
- Many regression and classification models, exactly (with different loss functions)
- SVMs and Kernel machines (with combinations of activation functions, layer freezing, loss functions)
- Matrix Factorizations and other bilinear regressions (via graph compositions and merging)
- non-linear State Space Models (RNNs)
- Markov Random Fields, Graphical Models (RBMs)
- Data Reduction Techniques (Auto Encoders)
- Variational Bayes and Deep Generative Models
- Many more
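To make the lego-block idea concrete, here is a minimal sketch (using standalone Keras with a TensorFlow backend; the toy data, shapes, and hyperparameters are illustrative assumptions) of how a classic model, logistic regression, is expressed as a one-layer network, and how swapping blocks changes the model family.

```python
# A minimal sketch: logistic regression expressed as a one-layer Keras network.
# The toy data, shapes, and hyperparameters are illustrative assumptions.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(100, 20)                   # 100 samples, 20 features
y = np.random.randint(0, 2, size=(100, 1))    # binary labels

model = Sequential()
model.add(Dense(1, activation='sigmoid', input_shape=(20,)))  # = logistic regression
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, verbose=0)

# Swapping the blocks changes the model family:
#   activation='linear' + loss='mse'  -> linear regression
#   extra Dense layers + 'relu'       -> a standard feed-forward network
```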
The same DNN framework can handle a variety of data (a sketch follows the list below):
- multiple inputs - multiple outputs
- variable input size and variable output size
- tensors, text, speech, image, video and their combinations
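As an illustration of the multi-input / multi-output point above, here is a rough sketch using the Keras functional API; the input names, shapes, and vocabulary size are assumptions made up for the example.

```python
# A rough sketch: one model, two inputs (text + numeric), two outputs.
from keras.models import Model
from keras.layers import Input, Dense, Embedding, LSTM, concatenate

text_in = Input(shape=(100,), name='text')            # sequence of token ids
num_in  = Input(shape=(5,), name='numeric_features')  # small numeric vector

x = Embedding(input_dim=10000, output_dim=32)(text_in)
x = LSTM(16)(x)
merged = concatenate([x, num_in])

class_out = Dense(1, activation='sigmoid', name='class')(merged)  # classification head
value_out = Dense(1, activation='linear', name='value')(merged)   # regression head

model = Model(inputs=[text_in, num_in], outputs=[class_out, value_out])
model.compile(optimizer='adam',
              loss={'class': 'binary_crossentropy', 'value': 'mse'})
```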
Let us also understand the limitations. While not every algorithm may have a representation via DNNs, my bet is that 80% of the useful model landscape can be addressed with one DNN framework. A popular misconception is that DNNs need lots of data; DNNs can work with both small and big data. Together with Prof. Lord and Dr. Ali, I am working on some exciting problems in this area. Quite a bit of work needs to be done to address such gaps. There is a lot more (to be elaborated in subsequent blogs), but let us move to the next point.
Deep Learning Frameworks are an engineering reality. Google released TensorFlow, Facebook released PyTorch, and many other established labs have released their tools (Theano, MXNet, Caffe). That means Deep Learning can become commoditized sooner rather than later.
In more concrete terms, let us take Keras with a TensorFlow backend as a reference and see how it realizes the stated goals.
- Keras is an abstraction on top of TensorFlow and Theano, allowing rapid experimentation without getting too deep into atomic computations. Keras's author, François Chollet, announced support for MXNet as a backend last year. MXNet is in incubation at Apache, and Google is backing Keras. This lowers the barrier to adoption.
- Keras models can be saved in different formats - JSON and YAML (even better). It is one shot, many birds: models are readable by machines and humans alike, and due to the expressiveness of DNNs, many models can be subsumed. Many models - a single exchange format (see the serialization sketch after this list).
- Content and form are separated. DNN weights (which can be long and boring) are saved separately from the structure. I like to inspect the structure first and investigate the details later. Tools like TensorBoard, though they leave a lot to be desired, help in that direction. Data scientists would be happy.
- DNNs are fundamentally computational graphs. That means all models within a DNN framework can be stored as graphs (as a data structure). A rich, well-established, well-studied, mature data structure like the graph brings lots of benefits. For example, I can immediately check whether a model has valid syntax - essentially, programmatically, whether the graph is a DAG or not (see the graph sketch after this list). The model is a rich data structure, not an arbitrarily imposed syntax.
- TensorFlow can work against any infrastructure - CPUs, GPUs, GPU farms, CPU farms, and even embedded systems. Essentially, write code once, scale anyhow. [Update:] In March 2018, TensorFlow Hub was announced. It pitches the same ideas.
- Writing a translator between DNN frameworks is easier than doing it between algorithm implementations. Better yet, can we define a standard for DNNs? It is hard, but it can be done with a few developers - the whole community is not needed. Adoption asymmetry is prevented. [Update:] Sowmya pointed out that the Open Neural Network Exchange (ONNX) is one such format. Happy to note that translations between different frameworks are already supported (see the ONNX sketch after this list).
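Here is a minimal sketch of the serialization and content-vs-form points from the list above: the architecture is exported as human-readable JSON (YAML works the same way via to_yaml), while the weights live in a separate file. The file names and the toy model are illustrative assumptions.

```python
# A minimal sketch: structure saved as readable JSON, weights kept in a separate file.
from keras.models import Sequential, model_from_json
from keras.layers import Dense

model = Sequential([Dense(4, activation='relu', input_shape=(8,)),
                    Dense(1, activation='sigmoid')])

arch_json = model.to_json()              # structure only: readable by humans and machines
with open('model.json', 'w') as f:
    f.write(arch_json)
model.save_weights('model_weights.h5')   # the long, boring weights, kept separate

# Later, possibly in a different application:
with open('model.json') as f:
    restored = model_from_json(f.read())
restored.load_weights('model_weights.h5')
```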
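And a rough sketch of the model-as-graph point: read the layer connectivity out of the Keras config and check programmatically that it forms a DAG. The config layout differs between Keras versions, so treat the parsing below as an assumption rather than a stable API; networkx is used purely for the graph check.

```python
# A rough sketch: treat a functional-API Keras model as a graph and verify it is a DAG.
import networkx as nx
from keras.models import Model
from keras.layers import Input, Dense

inp = Input(shape=(8,), name='inp')
hidden = Dense(4, activation='relu', name='hidden')(inp)
out = Dense(1, activation='sigmoid', name='out')(hidden)
model = Model(inputs=inp, outputs=out)

def model_to_digraph(m):
    # Build a directed graph from the layer connectivity in the model config.
    # Assumption: each layer entry carries its name and its inbound layer names;
    # this layout varies across Keras versions.
    g = nx.DiGraph()
    for layer in m.get_config()['layers']:
        g.add_node(layer['name'])
        for node in layer.get('inbound_nodes', []):
            for inbound in node:                      # inbound = [layer_name, ...]
                g.add_edge(inbound[0], layer['name'])
    return g

print(nx.is_directed_acyclic_graph(model_to_digraph(model)))  # True for a valid model
```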
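Finally, a heavily hedged sketch of the ONNX point: exporting a Keras model so that other frameworks and runtimes can import it. The converter shown (tf2onnx) and its API are assumptions that change across versions; other converters exist, so check the documentation of whichever one you use.

```python
# A hedged sketch: export a Keras model to the ONNX exchange format.
import tf2onnx                 # pip install tf2onnx (assumed converter)
from tensorflow import keras   # tf2onnx expects a tf.keras model

model = keras.Sequential([keras.layers.Dense(4, activation='relu', input_shape=(8,)),
                          keras.layers.Dense(1, activation='sigmoid')])

# Convert the model and write an .onnx file that ONNX-compatible tools can load.
onnx_model, _ = tf2onnx.convert.from_keras(model, output_path='model.onnx')
```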
Deep Learning Frameworks can ride on the AI wave. Organizations big and small are betting on riding the AI wave; they have a commercial incentive to gain an early-mover advantage and/or capture or hold on to their market share. NVIDIA and IBM, for example, are training thousands of people on Deep Learning technologies. MOOC vendors such as Udacity and Coursera are offering Deep Learning and Machine Learning courses, and they are big hits - essentially tapping into the market potential and workforce aspirations. There is both supply and demand.
So it seems that DNNs and Deep Learning Frameworks, together, can address interoperability. This does not prevent the co-existence of other interoperable algorithms; DNNs can simply be a nice default to work with.
Overall, I think Deep Learning Frameworks have the potential to commoditize and democratize machine learning applications at large. Interoperability is one tiny but important aspect, which they can address with the right application, as argued above.