Reading Large Files in C

To read an entire file efficiently you need to know the file size FIRST.

So what's the fastest way to programmatically find the file size? If you read Stack Overflow you'll find a number of suggestions, but many use fseek to jump to the end of the file, read your position there, and use that as the file size…

This works, but it's about 30% slower than using stat or fstat.
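A minimal sketch of that fseek/ftell approach, for reference. The helper name and error handling are mine, not the benchmark's actual code:

```c
#include <stdio.h>

/* Hypothetical helper: find a file's size by seeking to the end and
 * reading the position there. This is the approach that benchmarked
 * ~30% slower than stat/fstat. Returns -1 on error. */
long file_size_seek(const char *path) {
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    long size = ftell(fp);   /* byte offset at the end == file size */
    fclose(fp);
    return size;
}
```

Note it costs an extra seek (and a seek back to the start if you then want to read the file), on top of the open you already paid for.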

I just ran a bunch of tests comparing fseek, lseek, stat, and fstat, and also comparing file streams against file descriptors, to see which is fastest. For the test I created a 100MB file.

TL;DR: using file descriptors, fstat, and read was the fastest, and using file streams and fseek was the slowest. Go to the bottom to see the real slowest.


For the test environment I used a small Linux box running as a headless Arch Linux server. Each test cycle was: check the file size, malloc a buffer, read the entire file into the buffer, close the file, and free the buffer.

I ran the test 3 times with 1000 cycles each, using clock_gettime to calculate the elapsed time.
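The timing pattern looks roughly like this. This is my own sketch of the clock_gettime idiom, not the actual test harness, and `run_one_cycle` is a hypothetical placeholder:

```c
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
double elapsed_sec(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Typical use around a benchmark loop:
 *
 *   struct timespec t0, t1;
 *   clock_gettime(CLOCK_MONOTONIC, &t0);
 *   for (int i = 0; i < 1000; i++)
 *       run_one_cycle();              // hypothetical: one full test cycle
 *   clock_gettime(CLOCK_MONOTONIC, &t1);
 *   printf("%.6f s\n", elapsed_sec(t0, t1));
 */
```

CLOCK_MONOTONIC is the usual choice here because it isn't affected by wall-clock adjustments during the run.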

Comparing just the time it takes to get the file size, stat and fstat were at least 30% faster than fseek or lseek.
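For comparison with the seek version above, here are the stat/fstat equivalents. Again, the helper names are mine; both just read `st_size` out of the kernel's stat structure in a single call:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size via stat(): one syscall on the path, no open required. */
long file_size_stat(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Size via fstat(): same idea on an already-open descriptor. */
long file_size_fstat(int fd) {
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return (long)st.st_size;
}
```

If you're about to read the file anyway, fstat on the descriptor you already opened avoids a second path lookup.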

Comparing just the speed of file streams vs. file descriptors, they were nearly the same; descriptors were about 1–3% faster.

Comparing the whole sequence of opening the file, getting the file size, mallocing a buffer, reading the entire 100MB, closing the file, and freeing the buffer, file descriptors with fstat were 6–8% faster than fseek or lseek. That's probably because the bulk of the time is spent reading the file rather than getting its size, which dilutes the overall benefit.
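Putting the winning combination together, here is a sketch of the descriptor + fstat + read sequence. The function name and the short-read loop are my own additions; the post doesn't show its benchmark code:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read an entire file into a malloc'd, NUL-terminated buffer using a
 * file descriptor and fstat (the fastest combination in the benchmark).
 * Caller frees the buffer. Returns NULL on any failure. */
char *read_whole_file(const char *path, size_t *out_len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return NULL;
    }

    char *buf = malloc((size_t)st.st_size + 1);   /* +1 for the NUL */
    if (!buf) {
        close(fd);
        return NULL;
    }

    /* read() may return fewer bytes than requested, so loop until done. */
    off_t total = 0;
    while (total < st.st_size) {
        ssize_t n = read(fd, buf + total, (size_t)(st.st_size - total));
        if (n <= 0) {
            free(buf);
            close(fd);
            return NULL;
        }
        total += n;
    }
    close(fd);

    buf[total] = '\0';
    if (out_len)
        *out_len = (size_t)total;
    return buf;
}
```

The read loop matters even when you know the size up front: POSIX allows read() to return fewer bytes than asked for, especially on large requests.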

BTW: do not use fgetc to read the file one character at a time. This is crazy inefficient and really, really slow. Like 1700% slower!
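For completeness, this is roughly what that anti-pattern looks like (my sketch, counting bytes rather than storing them): one stdio call per byte, which is why it loses so badly to a single bulk read.

```c
#include <stdio.h>

/* The slow way: pull the file one byte at a time with fgetc().
 * Every single byte pays the stdio call overhead, which is how you
 * end up ~1700% slower than one big read(). Returns bytes seen,
 * or -1 if the file can't be opened. */
long count_bytes_fgetc(const char *path) {
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;
    long n = 0;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```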
