Python for big data computation on a single computer

Is it possible to process big data on a laptop? Most people would answer with a flat no. In this post, I would like to convince you that not only is it possible but, in some cases, it can also be convenient.

But first, let’s go back to the basics. What does big data analysis entail? Well, according to one of the simplest and most accepted operational definitions, big data computation happens whenever you need to process a dataset that doesn’t completely fit in the RAM of a single computer.
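Taken literally, this definition gives a simple test you can run on your own machine. The sketch below is a minimal illustration, not a standard tool: `physical_ram_bytes` and `is_big_data` are hypothetical helper names, total RAM is read through POSIX `os.sysconf` (available on Linux and macOS, not on Windows), and the on-disk file size is used only as a proxy, since a dataset parsed into Python objects usually takes more memory than its raw file.

```python
import os
import tempfile


def physical_ram_bytes():
    """Total physical memory in bytes via POSIX sysconf (Linux/macOS only, an assumption)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def is_big_data(path, ram_bytes=None):
    """True if the file at `path` is larger than the available RAM.

    File size on disk is only a proxy: the in-memory representation is often larger.
    """
    if ram_bytes is None:
        ram_bytes = physical_ram_bytes()
    return os.path.getsize(path) > ram_bytes


if __name__ == "__main__":
    # Demo: a 1 KB throwaway file compared against a hypothetical 512-byte "RAM".
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"x" * 1024)
        demo_path = tmp.name
    try:
        print(is_big_data(demo_path, ram_bytes=512))  # True
    finally:
        os.remove(demo_path)
```

Passing `ram_bytes` explicitly keeps the check portable and makes it easy to ask "would this fit in a machine with N GB?" without touching the current machine's settings.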

By this definition, big data computation doesn’t have to be distributed among many machines. If you have only one machine and your dataset fits on its hard disk but not in its RAM, you are facing a big data challenge. This situation calls for techniques from the area of out-of-core algorithms. The name stems from the fact that the only possible approach in this case is to load one chunk of data into memory, do something with it, store the (intermediate) results to disk, load another chunk into memory, rinse, repeat.
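To make the load/process/store/repeat loop concrete, here is a minimal out-of-core sketch in plain Python. It assumes a hypothetical input format (a text file with one number per line) and computes its mean; `chunked_lines` and `out_of_core_mean` are illustrative names, not an existing API. For a mean, the per-chunk results are tiny and could stay in memory; they are written to disk here only to mirror the pattern described above.

```python
import json
import os
import tempfile
from itertools import islice


def chunked_lines(path, chunk_size):
    """Yield lists of at most `chunk_size` lines; only one chunk is held in memory."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = list(islice(f, chunk_size))
            if not chunk:
                break
            yield chunk


def out_of_core_mean(path, chunk_size=100_000):
    """Mean of a text file holding one number per line, processed chunk by chunk."""
    with tempfile.TemporaryDirectory() as workdir:
        partial_paths = []
        # Pass 1: load a chunk, reduce it, write the intermediate result to disk.
        for i, chunk in enumerate(chunked_lines(path, chunk_size)):
            values = [float(line) for line in chunk if line.strip()]
            part_path = os.path.join(workdir, f"partial_{i}.json")
            with open(part_path, "w", encoding="utf-8") as out:
                json.dump({"sum": sum(values), "count": len(values)}, out)
            partial_paths.append(part_path)

        # Pass 2: read back the small intermediate results and combine them.
        total, count = 0.0, 0
        for part_path in partial_paths:
            with open(part_path, encoding="utf-8") as f:
                partial = json.load(f)
            total += partial["sum"]
            count += partial["count"]
    return total / count if count else float("nan")


if __name__ == "__main__":
    # Demo on a small throwaway file: the numbers 1..1000, one per line.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("\n".join(str(n) for n in range(1, 1001)))
        demo_path = tmp.name
    try:
        print(out_of_core_mean(demo_path, chunk_size=128))  # 500.5
    finally:
        os.remove(demo_path)
```

Peak memory is governed by `chunk_size`, not by the file size, which is the whole point of the out-of-core approach. If you already use pandas, the `chunksize` argument of `pandas.read_csv` gives you an iterator of DataFrames that follows the same load-a-chunk pattern.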

“But wait!”, you might say, “writing and reading intermediate results to disk is very expensive and time-consuming.” You are 100% right. In fact, a comparison of the time needed to read 1000 MB of data from different media goes as follows...

To keep reading, go here.

David Ray

Co-owner/Design/Development at SVOPower Technologies

6 yr

“Well, according to one of the simplest and most accepted operational definitions, big data computation happens whenever you need to process a dataset that doesn’t completely fit in the RAM of a single computer.” Apparently, I’ve been working in Big Data for decades and didn’t even know it. Only recently have memory and disk storage become so big that neither is a problem!