cURL multiple URLs using a config file

Problem

Today I needed to fetch about 30,000 URLs and save the resulting JSON from each one to its own file. Initially I called curl once per URL, but that was too slow because the connection was closed and reopened every time.

I wanted curl to use one session, so I needed to call it only once. I couldn’t use -O, because the URLs had no filename; they ended with a slash:

https://domain.com/customer/1/
https://domain.com/customer/4/
https://domain.com/customer/32/

Solution part 1

The solution that ended up working was to create a config file to pass to curl, containing the output filenames and the URLs.

-o customer.1.json
url = "https://domain.com/customer/1/"
-o customer.2.json
url = "https://domain.com/customer/2/"
-o customer.3.json
url = "https://domain.com/customer/3/"

Then tell curl to use the config file.

curl -K configfile.txt

This helped, but it was still too slow.
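A config file like the one above does not have to be written by hand. Here is a minimal sketch of generating it, assuming the customer IDs sit one per line in a hypothetical ids.txt (the sample IDs and that filename are illustrative, not from my actual setup):

```shell
# Hypothetical input: ids.txt with one customer ID per line (sample data).
printf '%s\n' 1 4 32 > ids.txt

# Write one "-o" line and one "url" line per ID, keeping each pair adjacent.
while read -r id; do
  printf '%s\n' "-o customer.$id.json"
  printf '%s\n' "url = \"https://domain.com/customer/$id/\""
done < ids.txt > configfile.txt
```

Each ID contributes exactly two lines, so the resulting file always has an even line count.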

Solution part 2

I wanted to run multiple curl sessions, each getting some part of the dataset.

Linux includes the command split that I could use to split the config file into parts, but I needed to be careful so that each pair of lines stayed together. The easiest way was to count the lines in the file, and then split it into a number of parts where:

  1. The number of lines in each resulting configfile was even.
  2. I ended up with about 8 files. This seems a reasonable number of curls to run concurrently.

My file had 65088 lines, which is 8 × 8136. I split using this number of lines, which gave me 8 files.

split -l 8136 configfile.txt configfile.part
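The line count passed to split can also be computed instead of worked out by hand. This is a rough sketch, assuming two lines per URL and a target of at most 8 parts; the variable names are illustrative:

```shell
total=65088   # line count of the config file; in practice: total=$(wc -l < configfile.txt)
parts=8       # how many curls to run concurrently

pairs=$(( total / 2 ))   # each URL takes two lines (-o and url)

# Round the pairs per part up so no more than $parts files are produced,
# then double it so every part holds an even number of lines.
lines_per_part=$(( (pairs + parts - 1) / parts * 2 ))

echo "$lines_per_part"   # prints 8136
```

Because the result is always even, an -o line is never separated from its url line.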

I then ran curl 8 times concurrently, once per part file. This seems to be fast enough. I’m running it inside a screen session in case my connection dies.
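One way to start the concurrent curls is a small loop with background jobs. This is only a sketch, assuming the part files produced by split are in the current directory; run_parts is a hypothetical helper name:

```shell
# Start one background curl per config part, then wait for all of them.
run_parts() {
  for part in "$@"; do
    [ -f "$part" ] || continue                 # skip an unmatched glob
    curl --silent --show-error -K "$part" &    # one curl per part file
  done
  wait                                         # block until every curl has exited
}

run_parts configfile.part*
```

Each curl still reuses its own connection for all URLs in its part, so the speedup comes from running several such sessions side by side.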

Kent Riboe

2 years ago

You can also do: curl URL1 URL2 URL3 etc., and it will just go over them in order, reusing the same connection. This is a lot faster than reconnecting.

Stas Kozlov

4 years ago

I'm wondering why I do not see Perl snippets here B-)

Elliott Spira

4 years ago

Nice. But screen over tmux?
