Why is Julia a Better Framework for AI?
Birendra Kumar Sahu
Senior Director Of Engineering | Head of Data Engineering and Science & integration platform, Ex-Razorpay, Ex-Teradata, Ex-CTO
What is Julia?
Julia is a promising language focused mainly on the scientific computing domain. It provides execution speeds comparable to C/C++ with high-level abstractions comparable to MATLAB. It also has parallelism built into the language model, which is missing in most AI/ML/DL programming languages.
These facts make it a good candidate for writing deep learning code: current deep learning frameworks mostly use C++ at the backend, where performance matters, and languages like Python at the frontend, where ease of use matters, while parallelism is a big part of writing non-trivial deep learning code. Julia should be able to do all of this. Its MATLAB-like syntax ensures that many people will be able to transition into Julia easily.
The language and its ecosystem are being actively developed by people at MIT and around the world to make it more performant and user-friendly, so it's definitely a language worth investing your time in.
Helping realize that goal is Flux, a machine-learning software library for Julia that's designed to make ML code easier to write, to simplify the training process, and to offer certain performance benefits over rival frameworks on hardware accelerators such as GPUs and Google's TPUs [Tensor Processing Units].
Today the Python and R languages typically dominate machine learning, with Python still the fastest-growing programming language in terms of developer popularity, driven in large part by the strength of its machine-learning frameworks and libraries. In comparison, only a relatively small proportion of developers use the fledgling Julia.
Being designed from the ground up for mathematical and numerical computing, Julia is unusually well-suited for expressing ML algorithms. Meanwhile, its mix of modern design and new ideas in the compiler makes it easier to address the high-performance needs of cutting-edge ML.
Coding with Julia:
Model Building:
#The core concept in Flux is the model. A model (or "layer") is simply a
#function with parameters. For example, in plain Julia code, we could define
#the following function to represent a logistic regression:
using Flux # provides softmax
W = randn(3,5)
b = randn(3)
affine(x) = W * x + b
x1 = rand(5)
y1 = softmax(affine(x1))
#affine is simply a function which takes some vector x1 and outputs
#a new one, y1. For example, x1 could be data from an image and y1 could be
#predictions about the content of that image. However, affine isn't static.
#It has parameters W and b, and if we tweak those parameters we'll tweak
#the result – hopefully to make the predictions more accurate.
#Flux's core feature is taking gradients of Julia code. The gradient function
#takes another Julia function f and a set of arguments, and returns the
#gradient with respect to each argument.
using Flux.Tracker
f(x) = 3x^2 + 2x + 1
df(x) = Tracker.gradient(f, x)[1]
df(2) # 14.0 (tracked)
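#Gradients can also be taken with respect to several arguments at once; each
#returned element is the gradient for the corresponding argument. A small
#sketch along the same lines (not from the original article):
g(a, b) = a*b + 3a
da, db = Tracker.gradient(g, 2, 3) # da == 6.0 (b + 3), db == 2.0 (a)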
#Creating a Model:
#Consider a simple linear regression, which tries to predict an output array y
#from an input x.
W = rand(2, 5)
b = rand(2)
predict(x) = W*x .+ b
function loss(x, y)
ŷ = predict(x)
sum((y .- ŷ).^2)
end
x, y = rand(5), rand(2) # Dummy data
loss(x, y) # ~ 3
#To improve the prediction we can take the gradients of the loss with respect
#to W and b and perform gradient descent. Let's tell Flux that W and b are
#parameters, just like we did above.
using Flux.Tracker
W = param(W)
b = param(b)
gs = Tracker.gradient(() -> loss(x, y), Params([W, b]))
#Now that we have gradients, we can pull them out and update W to train the
#model. The update!(W, Δ) function applies W = W + Δ, which we can use for
#gradient descent.
using Flux.Tracker: update!
Δ = gs[W]
# Update the parameter and reset the gradient
update!(W, -0.1Δ)
loss(x, y)
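#For completeness, b can be updated from its gradient in exactly the same way;
#repeating these update steps in a loop is plain gradient descent.
update!(b, -0.1gs[b])
loss(x, y) # should keep decreasing as we iterate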
#Building Layers
#It's common to create more complex models than the linear regression above.
#For example, we might want to have two linear layers
W1 = param(rand(3, 5))
b1 = param(rand(3))
layer1(x) = W1 * x .+ b1
W2 = param(rand(2, 3))
b2 = param(rand(2))
layer2(x) = W2 * x .+ b2
model(x) = layer2(layer1(x))
model(rand(5)) # => 2-element vector
#This works but is fairly unwieldy, with a lot of repetition – especially as
#we add more layers. One way to factor this out is to create a function
#that returns linear layers.
function linear(in, out)
W = param(randn(out, in))
b = param(randn(out))
x -> W * x .+ b
end
linear1 = linear(5, 3)
linear2 = linear(3, 2)
model(x) = linear2(linear1(x))
model(rand(5)) # => 2-element vector
#Another (equivalent) way is to create a struct that explicitly represents the
#affine layer.
struct Affine
W
b
end
Affine(in::Integer, out::Integer) =
Affine(param(randn(out, in)), param(randn(out)))
# Overload call, so the object can be used as a function
(m::Affine)(x) = m.W * x .+ m.b
a = Affine(10, 5)
a(rand(10)) # => 5-element vector
#Training a Model
using Flux
x = rand(784)
y = rand(10)
data = [(x, y)]
m = Chain( Dense(784, 32, σ), Dense(32, 10), softmax)
loss(x, y) = Flux.mse(m(x), y)
ps = Flux.params(m)
opt = ADAM() # the optimiser; the parameters ps are passed separately to train!
# later
Flux.train!(loss, ps, data, opt)
using Flux
model = Flux.Chain(Dense(14*16+4, 64, relu),Dense(64, 16, relu),
Dense(16, 1, relu));
x = rand(Bool, 14*16+4)
y = 100
loss(x,y) = sum((model(x) .- y).^2)
opt = ADAM()
Flux.@epochs 100 Flux.train!(loss, Flux.params(model), [(x,y)], opt)
#Combining Models
mymodel1(x) = softmax(affine(x))
mymodel1(x1)
#mymodel2 is exactly equivalent to mymodel1 because it simply calls the
#provided functions in sequence.
mymodel2 = Chain(affine, softmax)
mymodel2(x1)
mymodel3 = Chain( Affine(5, 5), σ, Affine(5, 5), softmax)
m = Chain( Affine(128, 128), relu,
Affine(128, 64), relu,
Affine(64, 10), softmax)
Visualization with Julia - Plots:
using Plots
x = 1:10; y = rand(10); # These are the plotting data
plot(x,y)
x = 1:10; y = rand(10,2) # 2 columns means two lines
plot(x,y)
z = rand(10)
plot!(x,z)
x = 1:10; y = rand(10,2) # 2 columns means two lines
p = plot(x,y)
z = rand(10)
plot!(p,x,z)
x = 1:10; y = rand(10,2) # 2 columns means two lines
plot(x,y,title="Two Lines",label=["Line 1" "Line 2"],lw=3)
plot(x,y,seriestype=:scatter,title="My Scatter Plot")
y = rand(10,4)
plot(x,y,layout=(4,1))
p1 = plot(x,y) # Make a line plot
p2 = scatter(x,y) # Make a scatter plot
p3 = plot(x,y,xlabel="This one is labelled",lw=3,title="Subtitle")
p4 = histogram(x,y) # Four histograms each with 10 points? Why not!
plot(p1,p2,p3,p4,layout=(2,2),legend=false)
using Distributions, StatsPlots # StatsPlots adds recipes for distributions, @df, violin, boxplot, etc.
plot(Normal(3,5),lw=3)
regX = rand(100)
regY = 100 * regX + rand(Normal(0, 10), 100)
pyplot() # optional: switch Plots to the PyPlot backend (requires PyPlot.jl)
scatter(regX, regY)
using RDatasets
iris = dataset("datasets","iris")
@df iris marginalhist(:PetalLength, :PetalWidth)
y = rand(100,4) # Four series of 100 points each
violin(["Series 1" "Series 2" "Series 3" "Series 4"],y,leg=false)
boxplot!(["Series 1" "Series 2" "Series 3" "Series 4"],y,leg=false)
Deep learning with Julia - Flux:
#The purpose of the regression is to recover the coefficient, which is 100 here.
#Before building the model, the data need to be arranged as (input, target)
#pairs in the form Flux expects.
regData = []
for i in 1:length(regX)
push!(regData, ([regX[i]], [regY[i]])) # wrap scalars in 1-element vectors for Dense layers
end
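#The regression model itself is not shown in the original snippet; the following
#is a minimal sketch (an assumption, not the author's code): a single Dense(1, 1)
#layer trained with mean squared error, whose weight should approach 100.
using Flux
regModel = Dense(1, 1)
regLoss(x, y) = Flux.mse(regModel(x), y)
regOpt = Descent(0.001)
Flux.@epochs 100 Flux.train!(regLoss, Flux.params(regModel), regData, regOpt)
println(Flux.params(regModel)) # the weight should be close to 100, the bias near 0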
#With Flux it is easy to build a model and train it. Here, a model, a loss, and
#an optimizer are defined, and the model is trained on the data.
using Flux
function main()
train_X=rand(5,5,3,10)
train_Y=rand(3,10)
model=Chain(Flux.Conv((2,2),3=>2,relu),
x -> maxpool(x, (1,1)),
x -> reshape(x, :, size(x, 4)),
Dense(32, 3))
model(train_X)
loss(x,y)=Flux.mse(model(x),y)
opt=Descent(0.001) # plain gradient descent (SGD) with learning rate 0.001
Flux.train!(loss,Flux.params(model),[(train_X,train_Y)],opt)
println(Flux.params(model))
end
main()
#Classification
using Distributions
function makeData()
groupOne = rand(MvNormal([10.0, 10.0], 10.0 * 2), 100)
groupTwo = rand(MvNormal([0.0, 0.0], 10 * 2), 100)
groupThree = rand(MvNormal([15.0, 0.0], 10.0 * 2), 100)
return hcat(groupOne, groupTwo, groupThree)'
end
x = makeData()
xTest = makeData()
y = []
for i in 1:300
if 1 <= i <= 100
push!(y, [1, 0, 0])
elseif 101 <= i <= 200
push!(y, [0, 1, 0])
else
push!(y, [0, 0, 1])
end
end
clusterData = []
for i in 1:length(y)
push!(clusterData, (x[i, :], y[i]))
end
#By visualizing the data, we can see that they form three clusters. The goal is
#to build a model that classifies them.
scatter(x[1:100, 1], x[1:100, 2], color="blue")
scatter!(x[101:200, 1], x[101:200, 2], color="red")
scatter!(x[201:300, 1], x[201:300, 2], color="green")
#Next, make the model. Its structure is a bit more complex than the regression's.
#The softmax function is used on the output layer, and crossentropy is used as
#the loss function.
modelClassify = Chain(Dense(2, 5),
Dense(5, 3),
softmax)
loss(x, y) = Flux.crossentropy(modelClassify(x), y)
opt = Descent(0.01)
Flux.@epochs 100 Flux.train!(loss, Flux.params(modelClassify), clusterData, opt)
#To the test data, the code below does prediction and checks the accuracy.
predicted = modelClassify(xTest').data
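#The accuracy check is not shown in the original snippet; a minimal sketch
#(assuming the columns of xTest follow the same class order as y) compares the
#index of the largest predicted probability against the true class.
using Statistics
trueClass = [argmax(y[i]) for i in 1:300]
predClass = [argmax(predicted[:, i]) for i in 1:300]
println("accuracy = ", mean(trueClass .== predClass))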
Major Advantages with Julia:
Julia is fast!
Julia is designed for parallelism, and provides built-in primitives for parallel computing at every level: instruction level parallelism, multi-threading and distributed computing.
The Julia compiler can also generate native code for various hardware accelerators, such as GPUs and Xeon Phis. Packages such as DistributedArrays.jl and Dagger.jl provide higher levels of abstraction for parallelism.
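As a small illustrative sketch (not from the article), the multi-threading primitives shipped with the language let you parallelize a loop with a single macro, provided Julia is started with several threads (e.g. julia -t 4):
using Base.Threads
xs = rand(1_000_000)
ys = similar(xs)
@threads for i in eachindex(xs) # iterations are split across the available threads
ys[i] = sin(xs[i])^2
end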
Easy to code and easy to use:
Julia has high level syntax, making it an accessible language for programmers from any background or experience level.
Julia uses multiple dispatch as a paradigm, making it easy to express many object-oriented and functional programming patterns. The standard library provides asynchronous I/O, process control, logging, profiling, a package manager, etc.
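For example (an illustrative sketch, not from the article), the same function name can have different methods, chosen by the types of all of its arguments:
combine_values(a::Number, b::Number) = a + b
combine_values(a::String, b::String) = string(a, b)
combine_values(a::Vector, b::Vector) = vcat(a, b)
combine_values(1, 2) # 3
combine_values("ab", "cd") # "abcd"
combine_values([1], [2, 3]) # [1, 2, 3]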
Julia is dynamically-typed, feels like a scripting language, and has good support for interactive use.
Data Visualization and Plotting:
Data visualization has a complicated history. Plotting software makes trade-offs between features and simplicity, speed and beauty, and a static and dynamic interface. Some packages make a display and never change it, while others make updates in real-time.
Plots.jl is a visualization interface and toolset. It provides a common API across various backends, like GR.jl, PyPlot.jl, and PlotlyJS.jl. Users who prefer a grammar-of-graphics style API might like the pure-Julia Gadfly.jl plotting package.
Interact with your Data:
The Julia data ecosystem lets you load multidimensional datasets quickly, perform aggregations, joins and preprocessing operations in parallel, and save them to disk in efficient formats. You can also perform online computations on streaming data with OnlineStats.jl. Whether you're looking for the convenient and familiar DataFrames, or a new approach with JuliaDB, Julia provides you a rich variety of tools. The Queryverse package acts as a meta-package through which you can access these tools with Julian APIs. In addition to working with tabular data, the JuliaGraphs packages make it easy to work with combinatorial data.
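As a brief sketch (not from the article) of what this workflow can look like with DataFrames, assuming the DataFrames package is installed:
using DataFrames, Statistics
df = DataFrame(group = ["a", "a", "b", "b"], value = [1.0, 2.0, 3.0, 4.0])
combine(groupby(df, :group), :value => mean => :mean_value) # group-wise aggregation
prices = DataFrame(group = ["a", "b"], price = [10, 20])
innerjoin(df, prices, on = :group) # join two tables on a key column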
Julia can work with almost all databases using JDBC.jl and ODBC.jl drivers. In addition, it also integrates with the Hadoop ecosystem using Spark.jl, HDFS.jl, and Hive.jl.
Scalable Machine Learning:
Julia provides powerful tools for deep learning (Flux.jl and Knet.jl), machine learning, and AI. Julia's mathematical syntax makes it an ideal way to express algorithms just as they are written in papers, to build trainable models with automatic differentiation and GPU acceleration, and to work with terabytes of data through JuliaDB.
Source: https://julialang.org