Ray is a fast and simple framework for building and running distributed applications
Ray is packaged with the following libraries for accelerating machine learning workloads:
- Tune: Scalable Hyperparameter Tuning
- RLlib: Scalable Reinforcement Learning
- Distributed Training
Install Ray with: pip install ray. For nightly wheels, see the Installation page.
Why Ray?
For distributed computation over clusters for AI applications, especially reinforcement learning
Evolution of ML Applications
1. Supervised Learning
2. RL Applications
Supervised vs. RL Applications
RL Application Pattern
Process inputs from sensors in parallel and in real time, execute a large number of simulations (in the range of hundreds of millions), and use the rollout outcomes to update the policy
Policy is managed by a DNN (Deep Neural Network)
RL Application Requirements
1. Handle dynamic task graphs (heterogeneous durations and heterogeneous computations)
2. Schedule millions of tasks per second
3. Make it easy to parallelize ML algorithms
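A rough sketch of what "dynamic task graph" means in practice: new tasks are spawned based on the results of earlier ones, so the graph is not known up front. This sketch uses only the standard library's concurrent.futures; the task names (rollout, refine) are hypothetical, not Ray APIs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def rollout(seed):
    # hypothetical simulation task with data-dependent cost
    return seed * seed

def refine(result):
    # hypothetical follow-up task, spawned only for some results
    return result + 1

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(rollout, s) for s in range(8)]
    results = []
    for fut in as_completed(futures):  # tasks finish in any order
        r = fut.result()
        if r % 2 == 0:
            # the graph grows dynamically: a new task depends on this result
            results.append(pool.submit(refine, r).result())
        else:
            results.append(r)
```

A static scheduler would need the whole graph before execution; here, half the tasks only exist after earlier results arrive, which is the pattern the requirements above describe.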
Ray API (Remote Functions):
Remote functions do not allow users to share mutable state between tasks, which matters because many tasks are stateful.
import ray
import numpy as np

ray.init()

@ray.remote
def zeros(shape):
    return np.zeros(shape)

@ray.remote
def dot(a, b):
    return np.dot(a, b)

id1 = zeros.remote([5, 5])
id2 = zeros.remote([5, 5])
id3 = dot.remote(id1, id2)
ray.get(id3)
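The key property of this API: dot.remote(id1, id2) does not block; it takes object IDs, and Ray resolves them to values before the task runs. A minimal single-machine imitation of that behavior using only the standard library (the remote helper here is a hypothetical stand-in, not Ray's implementation):

```python
from concurrent.futures import Future, ThreadPoolExecutor

import numpy as np

pool = ThreadPoolExecutor()

def remote(fn):
    # Return a wrapper whose calls run asynchronously; any Future passed
    # as an argument is resolved to its value before fn runs, the way
    # Ray resolves object IDs before executing a task.
    def submit(*args):
        def run():
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return fn(*resolved)
        return pool.submit(run)
    return submit

zeros = remote(np.zeros)
dot = remote(np.dot)

id1 = zeros((5, 5))
id2 = zeros((5, 5))
id3 = dot(id1, id2)        # submitted immediately; waits internally for id1, id2
result = id3.result()      # analogous to ray.get(id3)
```

The calls return futures immediately, so independent tasks overlap; only the final .result() (like ray.get) blocks.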
To maintain state across tasks, we use the Ray actor API:
@ray.remote
class Counter(object):
    def __init__(self):
        self.value = 0

    def postinc(self):
        self.value += 1
        return self.value

c = Counter.remote()
id1 = c.postinc.remote()
id2 = c.postinc.remote()
id3 = c.postinc.remote()
ray.get([id1, id2, id3])
State is shared between actor methods, and actor methods return object IDs
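The actor pattern itself is independent of Ray: one dedicated thread (in Ray, one process) owns the state, and method calls are messages that return futures. A minimal sketch with the standard library (ActorHandle is a hypothetical name, not a Ray class):

```python
import queue
import threading
from concurrent.futures import Future

class ActorHandle:
    """Runs all method calls of one object on a single dedicated thread,
    so its state is mutated sequentially -- the essence of an actor."""

    def __init__(self, obj):
        self._obj = obj
        self._mailbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            fut, name, args = self._mailbox.get()
            fut.set_result(getattr(self._obj, name)(*args))

    def call(self, name, *args):
        fut = Future()
        self._mailbox.put((fut, name, args))
        return fut  # like an object ID: redeem it later

class Counter:
    def __init__(self):
        self.value = 0

    def postinc(self):
        self.value += 1
        return self.value

c = ActorHandle(Counter())
futs = [c.call("postinc") for _ in range(3)]
values = [f.result() for f in futs]  # analogous to ray.get([...])
```

Because one thread processes the mailbox in order, the three increments cannot race, which is why the results come back as 1, 2, 3.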
Ray also supports GPUs; resource requirements are declared in the decorator:
@ray.remote(num_gpus=3)
Ray Architecture
- Bottom-up distributed scheduler
- Can be thought of as similar to Spark drivers and workers
- Shared object store using the Apache Arrow data layout, which brings down deserialization costs
- In contrast to Spark, workers can also submit tasks, which can be assigned to other workers
- The Global Control Store takes metadata out of the rest of the system and centralizes it in a single sharded database (easy to replicate, enables profiling tools, serves as a message bus)
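One way to picture the shared object store: immutable values keyed by opaque IDs, with put/get, where tasks pass the small IDs around instead of the values. A toy in-process sketch (not Ray's implementation; real Ray keeps objects in shared memory in Apache Arrow format so workers on a node read them without copies):

```python
import uuid

class ObjectStore:
    # Toy stand-in for the shared object store: put returns an ID,
    # get resolves an ID back to its (immutable) value.
    def __init__(self):
        self._store = {}

    def put(self, value):
        oid = uuid.uuid4().hex  # opaque object ID
        self._store[oid] = value
        return oid

    def get(self, oid):
        return self._store[oid]

store = ObjectStore()
oid = store.put([1, 2, 3])  # tasks exchange this small ID, not the data
value = store.get(oid)      # the value is fetched only when needed
```

This is also why the Arrow layout matters: since many readers map the same stored bytes, a zero-copy columnar format removes the deserialization step the bullet above mentions.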
Ray's performance
As of today Ray is not available on Windows; it is only available on Linux and Mac.
Evolution Strategies
Try a lot of different policies and see which one works best
class Worker(object):
    def do_simulation(self, policy, seed):
        # perform simulation
        return reward

workers = [Worker() for i in range(50)]
policy = initial_policy()
for i in range(200):
    seeds = generate_seeds(i)
    rewards = [workers[j].do_simulation(policy, seeds[j]) for j in range(50)]
    policy = compute_update(policy, rewards, seeds)
Using Ray
@ray.remote
class Worker(object):
    def do_simulation(self, policy, seed):
        # perform simulation
        return reward

workers = [Worker.remote() for i in range(50)]
policy = initial_policy()
for i in range(200):
    seeds = generate_seeds(i)
    reward_ids = [workers[j].do_simulation.remote(policy, seeds[j]) for j in range(50)]
    policy = compute_update(policy, ray.get(reward_ids), seeds)
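The pseudocode above can be made concrete without Ray. A minimal, runnable evolution-strategies loop on one machine: perturb the policy with random noise, score each perturbation, and move toward the better-scoring ones. The toy reward (peaking at a known target vector) and all hyperparameters here are illustrative, not from the source.

```python
import numpy as np

def reward(policy):
    # toy reward: highest when the policy equals the target vector
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((policy - target) ** 2)

rng = np.random.default_rng(0)
policy = np.zeros(3)
sigma, lr, n_workers = 0.1, 0.02, 50

initial = reward(policy)
for i in range(200):
    # each "worker" evaluates one perturbed copy of the policy
    noises = [rng.standard_normal(3) for _ in range(n_workers)]
    rewards = np.array([reward(policy + sigma * n) for n in noises])
    # normalize rewards, then step toward the better perturbations
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad = sum(r * n for r, n in zip(rewards, noises)) / (n_workers * sigma)
    policy = policy + lr * grad
```

Each of the n_workers evaluations is independent, which is exactly what the @ray.remote version above parallelizes across actors.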
Ray + Spark:
Complementary: Spark handles data processing and classic ML algorithms, while Ray is for RL
Interoperability: via the object store (Apache Arrow)