How fast is your Data Loader...?
Pankaj Mishra (Ph.D)
Democratizing AI and ML in the best interest of industries
One of the main bottlenecks in Computer Vision (CV) tasks is image loading, and it may be the culprit behind the lag you see in your execution.
This matters most when implementing a Dataloader class in any CNN training framework. Image loading has to be fast; otherwise, the training procedure becomes CPU-bound and wastes precious GPU time.
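To make the CPU-bound point concrete, here is a minimal sketch of a loader that reads a batch of files in parallel so the next batch is not waiting on one file at a time. `SimpleDataLoader` and `read_bytes` are hypothetical names, not the API of any framework, and the default loader only reads raw bytes; in practice you would pass a real decoder (Pillow, OpenCV, TurboJpeg, and so on) as `load_fn`.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterator, List, Sequence


def read_bytes(path: str) -> bytes:
    """Hypothetical default loader: return the raw (still encoded) file contents."""
    return Path(path).read_bytes()


class SimpleDataLoader:
    """Hypothetical minimal loader that loads each batch of files in parallel.

    `load_fn` stands in for whatever image decoder you use.
    """

    def __init__(self, paths: Sequence[str], batch_size: int = 4,
                 num_workers: int = 4,
                 load_fn: Callable[[str], object] = read_bytes) -> None:
        self.paths = list(paths)
        self.batch_size = batch_size
        self.num_workers = num_workers
        self.load_fn = load_fn

    def __len__(self) -> int:
        # Number of batches, counting a final partial batch.
        return (len(self.paths) + self.batch_size - 1) // self.batch_size

    def __iter__(self) -> Iterator[List[object]]:
        # One thread pool is reused for every batch of this pass.
        with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
            for start in range(0, len(self.paths), self.batch_size):
                batch = self.paths[start:start + self.batch_size]
                # map() keeps the original file order within the batch.
                yield list(pool.map(self.load_fn, batch))
```

Threads help mainly when the work is I/O-bound or the decoder releases the GIL; when decoding is pure-Python CPU work, a process pool is the usual choice, which is what PyTorch's `DataLoader` does when `num_workers > 0`.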
Among the best-known Python libraries for reading images efficiently are Pillow, its faster fork Pillow-SIMD, and TurboJpeg.
Two storage formats that are also widely used for image datasets are LMDB and TFRecords.
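LMDB and TFRecords, the two formats compared in the conclusion, both keep many images inside one large file instead of thousands of small ones: TFRecords is a sequential record file format used by TensorFlow, and LMDB is a memory-mapped key-value database. The sketch below is neither of them. It is a hypothetical, simplified length-prefixed record file built with the standard library only, meant to show the idea of sequential reading versus reading by stored offset.

```python
import struct
from typing import BinaryIO, Iterable, Iterator, List

# Each record is an 8-byte little-endian length followed by the payload.
# The real TFRecord format also stores masked CRC32C checksums; they are
# omitted here to keep the sketch short.
_LEN = struct.Struct("<Q")


def write_records(f: BinaryIO, payloads: Iterable[bytes]) -> List[int]:
    """Append payloads to `f` and return the byte offset of each record."""
    offsets = []
    for data in payloads:
        offsets.append(f.tell())
        f.write(_LEN.pack(len(data)))
        f.write(data)
    return offsets


def read_records(f: BinaryIO) -> Iterator[bytes]:
    """Yield every payload in order, like a sequential TFRecord-style reader."""
    while True:
        header = f.read(_LEN.size)
        if len(header) < _LEN.size:
            return  # end of file (a truncated header is also treated as the end)
        (length,) = _LEN.unpack(header)
        yield f.read(length)


def read_at(f: BinaryIO, offset: int) -> bytes:
    """Read one record by its stored offset, loosely like a key lookup in LMDB."""
    f.seek(offset)
    (length,) = _LEN.unpack(f.read(_LEN.size))
    return f.read(length)
```

The payloads would normally be the encoded JPEG bytes, so decoding still has to happen after reading; the container only changes how the bytes are fetched from disk.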
This article is heavily inspired by this article [Link], so I won't be reproducing its code here. Instead, some intuitive visualizations and tables should serve the purpose.
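For readers who want to run a comparison of their own, here is a small timing harness. It is an assumption about how such a benchmark can be set up, not the referenced article's code: `benchmark` and `print_table` are hypothetical helpers that report the mean wall-clock time per image for any reader functions you plug in.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def benchmark(readers: Dict[str, Callable[[str], object]],
              paths: Sequence[str], repeats: int = 3) -> Dict[str, float]:
    """Return the mean wall-clock seconds per image for each reader.

    `readers` maps a label to a function that loads one image path.
    """
    if not paths:
        raise ValueError("paths must not be empty")
    results = {}
    for name, read in readers.items():
        per_run = []
        for _ in range(repeats):
            start = time.perf_counter()
            for p in paths:
                read(p)
            per_run.append((time.perf_counter() - start) / len(paths))
        results[name] = statistics.mean(per_run)
    return results


def print_table(results: Dict[str, float]) -> None:
    """Print readers from fastest to slowest, in milliseconds per image."""
    for name, secs in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name:<12} {secs * 1000:8.3f} ms/image")
```

A reader could be, for example, `cv2.imread` or `lambda p: Image.open(p).load()` with Pillow; note that `Image.open` alone is lazy, so `.load()` is needed to time the actual decode. Running each reader more than once also keeps a cold file-system cache from penalizing whichever reader happens to go first.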
Conclusion
The comparison results on JPEG images are really interesting. We can see that TurboJpeg is the fastest library. Another important point is that Pillow-SIMD is faster than the original Pillow: in our task, loading speed increased by nearly 40%. For image databases, TFRecords shows better mean results than LMDB, in particular because of its built-in decoder function. On the other hand, LMDB allows us to read images faster.
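A "40% faster" claim can be read two ways, so it helps to be explicit about the arithmetic. A 40% increase in loading speed (images per second) is not the same as a 40% cut in time per image. The helpers below are hypothetical, and the timings in the example are made-up illustration values, not measurements from this comparison.

```python
def speedup_percent(old_seconds: float, new_seconds: float) -> float:
    """Percent increase in loading speed (images per second)."""
    return (old_seconds / new_seconds - 1.0) * 100.0


def time_saved_percent(old_seconds: float, new_seconds: float) -> float:
    """Percent reduction in time per image."""
    return (1.0 - new_seconds / old_seconds) * 100.0


# Hypothetical timings, not measurements: 1.4 ms vs 1.0 ms per image.
print(round(speedup_percent(1.4, 1.0), 1))     # 40.0
print(round(time_saved_percent(1.4, 1.0), 1))  # 28.6
```

In other words, a loader that is 40% faster in throughput spends roughly 28.6% less time per image.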
Thanks for following this article to the end!
Note: Some of the content here is from third-party websites and content platforms.
Happy learning!
Pankaj Mishra