GPU renting for Kaggle and work. My experience
Ivan Isaev
ML tech-lead and senior engineer | Ex-Head of ML & DS | Ex-Head of Engineering | Kaggle Competitions Master
Recently I wrote an article about the advantages and drawbacks of GCE (Google Compute Engine).
This article is about Vast.ai, a service I use for a similar purpose: renting GPUs to train ML models for Kaggle and for work.
What is Vast.ai?
Vast is a cost-effective peer-to-peer (P2P) GPU rental marketplace.
A brief description of the Vast functionality and features available:
Once I had an issue with a GPU and was able to reach the actual owner of that GPU in the Discord chat, and he helped me fix it (I had about 1 TB of data on the instance, and migrating it to another GPU would have been quite time-consuming).
Worth mentioning: SSH access is closed after you stop an instance, so you can no longer run bash scripts on it, but downloading data from the instance over SSH remains available even after stopping.
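As an illustration, here is a minimal sketch of pulling data from an instance over SSH with rsync. The host, port, and paths below are placeholders, not real values — Vast shows the actual SSH address and port on the instance card — so the command is only assembled and printed, not executed:

```python
# Placeholder connection details -- copy the real ones from the Vast.ai
# instance card; these values are illustrative only.
vast_host = "ssh5.vast.ai"
vast_port = 12345
remote_dir = "/workspace/data"
local_dir = "./backup"

# rsync resumes partial transfers, which matters for ~1 TB of data.
# We only build and print the command here, since the host is hypothetical;
# on a real machine you would pass it to subprocess.run().
cmd = (
    f'rsync -avP -e "ssh -p {vast_port}" '
    f'root@{vast_host}:{remote_dir}/ {local_dir}/'
)
print(cmd)
```

Using `-avP` keeps permissions, shows progress, and lets an interrupted transfer resume, which is the sensible default for large datasets.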
Vast advantages and drawbacks
Vast advantages:
$0.25 hourly for a GPU with 24 GB of VRAM, storage costs nearly nothing, no hidden costs. If you use it 24/7, this is just $180 per month. Similar resources on GCE cost more than $2K per month, including all the unclear hidden costs for networking, disk operations, etc. (I tried it).
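The monthly figure above is easy to verify with the rates quoted in the text (assuming a 30-day month):

```python
# Quoted Vast.ai rate: $0.25/hour for a 24 GB GPU, running 24/7.
hourly_rate = 0.25
hours_per_month = 24 * 30  # assuming a 30-day month

monthly_cost = hourly_rate * hours_per_month
print(monthly_cost)  # 180.0
```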
Vast drawbacks:
A couple of words about Colab + Gdrive for large data
In addition to Vast and GCE, I tried Colab for working with large amounts of data (the RSNA competition, with nearly 1 TB of data).
The Colab VM disk for a GPU runtime is only 100 GB (about $20/day), and a few hundred GB for a TPU runtime (about $30/day). So to work with 1 TB of data in Colab, I also rented 2 TB of Gdrive storage (about $10 per month). But even with all of that, I couldn't make it work. Possibly it is feasible, but you are still limited by the 100 GB, or at most a few hundred GB, of GPU/TPU VM storage.
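For reference, attaching Gdrive storage to a Colab VM is done with `google.colab.drive.mount`, which makes the drive contents appear under `/content/drive`. The `google.colab` module only exists inside a Colab runtime, so the sketch below guards the import and falls back gracefully elsewhere:

```python
# Mounting Google Drive inside a Colab notebook.
# google.colab is only importable in a Colab runtime, so guard the import.
try:
    from google.colab import drive
    drive.mount("/content/drive")  # data appears under /content/drive/MyDrive
    mounted = True
except ImportError:
    mounted = False
    print("Not running inside Colab; drive.mount is unavailable.")
```

Note that mounting does not change the VM's own disk size: reading 1 TB from the mounted drive still means staging files through the 100 GB local disk, which is exactly the bottleneck described above.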
Additional drawbacks of Colab and Gdrive you should consider when using them together:
I hope sharing this experience will be helpful for those looking for a suitable service to rent GPUs for training ML models on large amounts of data. Good luck!