Libraries for large datasets in Python
People say the best way to learn something is to teach it. It's obviously a bit of a cliché, but I have to admit it's one of those things which really does ring true. When I'm researching the markets or coding something, I'm generally pretty focused on the task at hand. I'll learn what I need to get the task done. However, when I'm teaching (or when I've been coauthoring The Book of Alternative Data), the objective is to go over a subject in a wide-ranging way, rather than in snippets here and there. Inevitably, the whole process of preparing a course or writing a book means I learn lots of stuff which I haven't yet picked up on the job.
Over the past few weeks I've been preparing a course on Python, alt data/NLP and large datasets, which I'll be teaching at Queen Mary University of London (and if you'd like me to teach it at your firm, let me know!). I'm also going to be teaching a more general Python course, Python for finance, for QDC (sign up here). I recently tweeted about some of the Python libraries I've found useful for working with large datasets (in particular time series). The thread got a lot of interest and lots of great suggestions, so I've elaborated on it below. Next week, I'll write another column, inspired by many of the replies, on tips and tricks you can use to reduce the file size of large datasets and speed up their computation with tools like Numba or Cython. It will include a few ideas which were tweeted by @ewankirk, so stay tuned for that.
For time series that fit in memory, Pandas is...
To read the rest of the article on Cuemacro, please click here!
All Excel Functions Specialist at Norman Harker & Associates
4 years ago: Saeed! No doubt in my mind that the best way to learn is to try to teach. To teach, you first have to understand. It is that understanding that brings the knowledge.
Vice President - Bank of America | J P Morgan | IIT Kanpur | Artificial Intelligence I Strategy
4 years ago: Very essential for HFT quants and traders.
Delivering Data-Driven Business Solutions | AI, ML, Data Science & Digital Transformation
4 years ago: I had good experiences using h2o.ai too; it has both Python and R APIs.