Why is Julia a Better Framework for AI?
Birendra Kumar Sahu
Senior Director Of Engineering | Head of Data Engineering and Science & integration platform, Ex-Razorpay, Ex-Teradata, Ex-CTO
What is Julia?
Julia is a promising language focused mainly on the scientific computing domain. It provides execution speeds comparable to C/C++ with high-level abstractions comparable to MATLAB. It also has parallelism built into the language model, which is missing in most AI/ML/DL programming languages.
These facts make it a good candidate for writing deep learning code: current deep learning frameworks mostly use C++ at the backend, where performance matters, and languages like Python at the frontend, where ease of use matters, while parallelism is a big part of writing non-trivial deep learning code. Julia should be able to do all of this. Its MATLAB-like syntax ensures that many people will be able to transition into Julia easily.
The language and its ecosystem are being actively developed by people at MIT and around the world to make it more performant and user-friendly, so it's definitely a language worth investing your time in.
Helping realize that goal is Flux, a machine-learning software library for Julia that's designed to make ML code easier to write, to simplify the training process, and to offer certain performance benefits over rival frameworks on hardware accelerators such as GPUs and Google's TPUs [Tensor Processing Units].
Today the Python and R languages typically dominate machine learning, with Python still the fastest-growing programming language in terms of developer popularity, driven in large part by the strength of its machine-learning frameworks and libraries. In comparison, only a relatively small proportion of developers use the fledgling Julia.
Being designed from the ground up for mathematical and numerical computing, Julia is unusually well-suited for expressing ML algorithms. Meanwhile, its mix of modern design and new ideas in the compiler makes it easier to address the high-performance needs of cutting-edge ML.
Coding with Julia:
Model Building:
#The core concept in Flux is the model. A model (or "layer") is simply a
#function with parameters. For example, in plain Julia code, we could define
#the following function to represent a logistic regression:
using Flux # provides softmax
W = randn(3,5)
b = randn(3)
affine(x) = W * x + b
x1 = rand(5)
y1 = softmax(affine(x1))
#affine is simply a function which takes some vector x1 and outputs
#a new one, y1. For example, x1 could be data from an image and y1 could be
#predictions about the content of that image. However, affine isn't static.
#It has parameters W and b, and if we tweak those parameters we'll tweak
#the result – hopefully to make the predictions more accurate.
#Flux's core feature is taking gradients of Julia code. The gradient function
#takes another Julia function f and a set of arguments, and returns the
#gradient with respect to each argument.
using Flux.Tracker
f(x) = 3x^2 + 2x + 1
df(x) = Tracker.gradient(f, x)[1]
df(2) # 14.0 (tracked)
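#Gradients can also be taken with respect to several arguments at once; each
#returned element is the gradient for the corresponding argument. A small
#sketch along the same lines (not from the original article):
g(a, b) = a*b + 3a
da, db = Tracker.gradient(g, 2, 3) # da == 6.0 (b + 3), db == 2.0 (a)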
#Creating a Model:
#Consider a simple linear regression, which tries to predict an output array y
#from an input x.
W = rand(2, 5)
b = rand(2)
predict(x) = W*x .+ b
function loss(x, y)
ŷ = predict(x)
sum((y .- ŷ).^2)
end
x, y = rand(5), rand(2) # Dummy data
loss(x, y) # ~ 3
#To improve the prediction we can take the gradients of the loss with respect
#to W and b and perform gradient descent. Let's tell Flux that W and b are
#parameters, just like we did above.
using Flux.Tracker
W = param(W)
b = param(b)
gs = Tracker.gradient(() -> loss(x, y), Params([W, b]))
#Now that we have gradients, we can pull them out and update W to train the
#model. The update!(W, Δ) function applies W = W + Δ, which we can use for
#gradient descent.
using Flux.Tracker: update!
Δ = gs[W]
# Update the parameter and reset the gradient
update!(W, -0.1Δ)
loss(x, y)
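#For completeness, b can be updated from its gradient in exactly the same way;
#repeating these update steps in a loop is plain gradient descent.
update!(b, -0.1gs[b])
loss(x, y) # should keep decreasing as we iterate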
#Building Layers
#It's common to create more complex models than the linear regression above.
#For example, we might want to have two linear layers
W1 = param(rand(3, 5))
b1 = param(rand(3))
layer1(x) = W1 * x .+ b1
W2 = param(rand(2, 3))
b2 = param(rand(2))
layer2(x) = W2 * x .+ b2
model(x) = layer2(layer1(x))
model(rand(5)) # => 2-element vector
#This works but is fairly unwieldy, with a lot of repetition – especially as
#we add more layers. One way to factor this out is to create a function
#that returns linear layers.
function linear(in, out)
W = param(randn(out, in))
b = param(randn(out))
x -> W * x .+ b
end
linear1 = linear(5, 3)
linear2 = linear(3, 2)
model(x) = linear2(linear1(x))
model(rand(5)) # => 2-element vector
#Another (equivalent) way is to create a struct that explicitly represents the
#affine layer.
struct Affine
W
b
end
Affine(in::Integer, out::Integer) =
Affine(param(randn(out, in)), param(randn(out)))
# Overload call, so the object can be used as a function
(m::Affine)(x) = m.W * x .+ m.b
a = Affine(10, 5)
a(rand(10)) # => 5-element vector
#Training a Model
using Flux
x = rand(784)
y = rand(10)
data = [(x, y)]
m = Chain( Dense(784, 32, σ), Dense(32, 10), softmax)
loss(x, y) = Flux.mse(m(x), y)
ps = Flux.params(m)
opt = ADAM() # the optimiser; the parameters ps are passed separately to train!
# later
Flux.train!(loss, ps, data, opt)
using Flux
model = Flux.Chain(Dense(14*16+4, 64, relu),Dense(64, 16, relu),
Dense(16, 1, relu));
x = rand(Bool, 14*16+4)
y = 100
loss(x,y) = sum((model(x) .- y).^2)
opt = ADAM()
Flux.@epochs 100 Flux.train!(loss, Flux.params(model), [(x,y)], opt)
#Combining Models
mymodel1(x) = softmax(affine(x))
mymodel1(x1)
#mymodel2 is exactly equivalent to mymodel1 because it simply calls the
#provided functions in sequence.
mymodel2 = Chain(affine, softmax)
mymodel2(x1)
mymodel3 = Chain( Affine(5, 5), σ, Affine(5, 5), softmax)
m = Chain( Affine(128, 128), relu,
Affine(128, 64), relu,
Affine(64, 10), softmax)
Visualization with Julia - Plots:
using Plots
x = 1:10; y = rand(10); # These are the plotting data
plot(x,y)
x = 1:10; y = rand(10,2) # 2 columns means two lines
plot(x,y)
z = rand(10)
plot!(x,z)
x = 1:10; y = rand(10,2) # 2 columns means two lines
p = plot(x,y)
z = rand(10)
plot!(p,x,z)
x = 1:10; y = rand(10,2) # 2 columns means two lines
plot(x,y,title="Two Lines",label=["Line 1" "Line 2"],lw=3)
plot(x,y,seriestype=:scatter,title="My Scatter Plot")
y = rand(10,4)
plot(x,y,layout=(4,1))
p1 = plot(x,y) # Make a line plot
p2 = scatter(x,y) # Make a scatter plot
p3 = plot(x,y,xlabel="This one is labelled",lw=3,title="Subtitle")
p4 = histogram(x,y) # Four histograms each with 10 points? Why not!
plot(p1,p2,p3,p4,layout=(2,2),legend=false)
using Distributions, StatsPlots # StatsPlots adds recipes for distributions, @df, violin, boxplot, etc.
plot(Normal(3,5),lw=3)
regX = rand(100)
regY = 100 * regX + rand(Normal(0, 10), 100)
pyplot() # optional: switch Plots to the PyPlot backend (requires PyPlot.jl)
scatter(regX, regY)
using RDatasets
iris = dataset("datasets","iris")
@df iris marginalhist(:PetalLength, :PetalWidth)
y = rand(100,4) # Four series of 100 points each
violin(["Series 1" "Series 2" "Series 3" "Series 4"],y,leg=false)
boxplot!(["Series 1" "Series 2" "Series 3" "Series 4"],y,leg=false)
Deep learning with Julia - Flux:
#The purpose of the regression is to recover the coefficient, which is 100 here.
#Before building the model, the data need to be arranged as (input, target)
#pairs in the form Flux expects.
regData = []
for i in 1:length(regX)
push!(regData, ([regX[i]], [regY[i]])) # wrap scalars in 1-element vectors for Dense layers
end
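#The regression model itself is not shown in the original snippet; the following
#is a minimal sketch (an assumption, not the author's code): a single Dense(1, 1)
#layer trained with mean squared error, whose weight should approach 100.
using Flux
regModel = Dense(1, 1)
regLoss(x, y) = Flux.mse(regModel(x), y)
regOpt = Descent(0.001)
Flux.@epochs 100 Flux.train!(regLoss, Flux.params(regModel), regData, regOpt)
println(Flux.params(regModel)) # the weight should be close to 100, the bias near 0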
#With Flux it is easy to build a model and train it. Here, a model, a loss, and
#an optimizer are defined, and the model is trained on the data.
using Flux
function main()
train_X=rand(5,5,3,10)
train_Y=rand(3,10)
model=Chain(Flux.Conv((2,2),3=>2,relu),
x -> maxpool(x, (1,1)),
x -> reshape(x, :, size(x, 4)),
Dense(32, 3))
model(train_X)
loss(x,y)=Flux.mse(model(x),y)
opt=Descent(0.001) # plain gradient descent (SGD) with learning rate 0.001
Flux.train!(loss,Flux.params(model),[(train_X,train_Y)],opt)
println(Flux.params(model))
end
main()
#Classification
using Distributions
function makeData()
groupOne = rand(MvNormal([10.0, 10.0], 10.0 * 2), 100)
groupTwo = rand(MvNormal([0.0, 0.0], 10 * 2), 100)
groupThree = rand(MvNormal([15.0, 0.0], 10.0 * 2), 100)
return hcat(groupOne, groupTwo, groupThree)'
end
x = makeData()
xTest = makeData()
y = []
for i in 1:300
if 1 <= i <= 100
push!(y, [1, 0, 0])
elseif 101 <= i <= 200
push!(y, [0, 1, 0])
else
push!(y, [0, 0, 1])
end
end
clusterData = []
for i in 1:length(y)
push!(clusterData, (x[i, :], y[i]))
end
#By visualizing the data, we can see that they form three clusters. The goal is
#to build a model that classifies them.
scatter(x[1:100, 1], x[1:100, 2], color="blue")
scatter!(x[101:200, 1], x[101:200, 2], color="red")
scatter!(x[201:300, 1], x[201:300, 2], color="green")
#Next, make the model. Its structure is a bit more complex than the regression's.
#The softmax function is used on the output layer, and crossentropy is used as
#the loss function.
modelClassify = Chain(Dense(2, 5),
Dense(5, 3),
softmax)
loss(x, y) = Flux.crossentropy(modelClassify(x), y)
opt = Descent(0.01)
Flux.@epochs 100 Flux.train!(loss, Flux.params(modelClassify), clusterData, opt)
#To the test data, the code below does prediction and checks the accuracy.
predicted = modelClassify(xTest').data
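#The accuracy check is not shown in the original snippet; a minimal sketch
#(assuming the columns of xTest follow the same class order as y) compares the
#index of the largest predicted probability against the true class.
using Statistics
trueClass = [argmax(y[i]) for i in 1:300]
predClass = [argmax(predicted[:, i]) for i in 1:300]
println("accuracy = ", mean(trueClass .== predClass))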
Major Advantages with Julia:
Julia is fast!
Julia is designed for parallelism, and provides built-in primitives for parallel computing at every level: instruction level parallelism, multi-threading and distributed computing.
The Julia compiler can also generate native code for various hardware accelerators, such as GPUs and Xeon Phis. Packages such as DistributedArrays.jl and Dagger.jl provide higher levels of abstraction for parallelism.
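As a small illustrative sketch (not from the article), the multi-threading primitives shipped with the language let you parallelize a loop with a single macro, provided Julia is started with several threads (e.g. julia -t 4):
using Base.Threads
xs = rand(1_000_000)
ys = similar(xs)
@threads for i in eachindex(xs) # iterations are split across the available threads
ys[i] = sin(xs[i])^2
end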
Easy to code and easy to use:
Julia has high level syntax, making it an accessible language for programmers from any background or experience level.
Julia uses multiple dispatch as a paradigm, making it easy to express many object-oriented and functional programming patterns. The standard library provides asynchronous I/O, process control, logging, profiling, a package manager, etc.
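For example (an illustrative sketch, not from the article), the same function name can have different methods, chosen by the types of all of its arguments:
combine_values(a::Number, b::Number) = a + b
combine_values(a::String, b::String) = string(a, b)
combine_values(a::Vector, b::Vector) = vcat(a, b)
combine_values(1, 2) # 3
combine_values("ab", "cd") # "abcd"
combine_values([1], [2, 3]) # [1, 2, 3]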
Julia is dynamically-typed, feels like a scripting language, and has good support for interactive use.
Data Visualization and Plotting:
Data visualization has a complicated history. Plotting software makes trade-offs between features and simplicity, speed and beauty, and a static and dynamic interface. Some packages make a display and never change it, while others make updates in real-time.
Plots.jl is a visualization interface and toolset. It provides a common API across various backends, like GR.jl, PyPlot.jl, and PlotlyJS.jl. Users who prefer a grammar-of-graphics style API might like the pure-Julia Gadfly.jl plotting package.
Interact with your Data:
The Julia data ecosystem lets you load multidimensional datasets quickly, perform aggregations, joins and preprocessing operations in parallel, and save them to disk in efficient formats. You can also perform online computations on streaming data with OnlineStats.jl. Whether you're looking for the convenient and familiar DataFrames, or a new approach with JuliaDB, Julia provides you a rich variety of tools. The Queryverse package acts as a meta-package through which you can access these tools with Julian APIs. In addition to working with tabular data, the JuliaGraphs packages make it easy to work with combinatorial data.
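As a brief sketch (not from the article) of what this workflow can look like with DataFrames, assuming the DataFrames package is installed:
using DataFrames, Statistics
df = DataFrame(group = ["a", "a", "b", "b"], value = [1.0, 2.0, 3.0, 4.0])
combine(groupby(df, :group), :value => mean => :mean_value) # group-wise aggregation
prices = DataFrame(group = ["a", "b"], price = [10, 20])
innerjoin(df, prices, on = :group) # join two tables on a key column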
Julia can work with almost all databases using JDBC.jl and ODBC.jl drivers. In addition, it also integrates with the Hadoop ecosystem using Spark.jl, HDFS.jl, and Hive.jl.
Scalable Machine Learning:
Julia provides powerful tools for deep learning (Flux.jl and Knet.jl), machine learning, and AI. Julia's mathematical syntax makes it an ideal way to express algorithms just as they are written in papers, to build trainable models with automatic differentiation and GPU acceleration, and to work with terabytes of data through JuliaDB.
Source: https://julialang.org