cURL multiple URLs using a config file
Brock Henry
Senior Backend Developer | Python/Perl Developer | Databases | Cloud Technology - AWS, Azure | Networking
Problem
Today I needed to fetch about 30,000 URLs and save the resulting JSON to their respective files. Initially I called curl once per URL, but that was too slow because every call opened and closed its own connection.
I wanted curl to use one session, so I needed to call it once. I couldn’t use -O because the URLs had no filename; they ended with a slash:
https://domain.com/customer/1/
https://domain.com/customer/4/
https://domain.com/customer/32/
Solution part 1
The solution that ended up working was to create a config file to pass to curl, containing the output filenames and the URLs.
-o customer.1.json
url = "https://domain.com/customer/1/"
-o customer.2.json
url = "https://domain.com/customer/2/"
-o customer.3.json
url = "https://domain.com/customer/3/"
And then tell curl to use the config file:
curl -K configfile
This helped, but it still was too slow.
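A config file like the one above can be generated rather than written by hand. The following is a minimal sketch, assuming the customer IDs are available one per line; the function name make_config and the input file ids.txt are hypothetical, and the base URL and filename pattern simply follow the examples in this post.

```shell
#!/bin/sh
# Sketch: print a curl config with one "-o"/"url" pair per customer ID.
# Assumption: IDs arrive one per line on stdin. The base URL and the
# customer.<id>.json naming mirror the examples in this post.
make_config() {
    base="https://domain.com/customer"
    while IFS= read -r id; do
        [ -n "$id" ] || continue                      # skip blank lines
        printf '%s\n' "-o customer.$id.json"          # output filename
        printf 'url = "%s/%s/"\n' "$base" "$id"       # URL to fetch
    done
}
```

It could be used as: make_config < ids.txt > configfile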
Solution part 2
I wanted to run multiple curl sessions, each getting some part of the dataset.
Linux includes the command split that I could use to split the config file into parts, but I needed to be careful so that each pair of lines stayed together. The easiest way was to count the lines in the file, and then split it into a number of parts where:
- The number of lines in each resulting configfile was even.
- I ended up with about 8 files. This seems a reasonable number of curls to run concurrently.
My file had 65088 lines, which is 8 × 8136. I split using this number of lines, which gave me 8 files.
split -l 8136 configfile.txt configfile.part
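The line count per part does not have to be worked out by hand. As a rough sketch (the function name even_chunk_lines is made up for illustration), the total can be divided by the number of parts, rounded up, and then bumped to the next even number so that no -o/url pair is split across two files:

```shell
#!/bin/sh
# Sketch: choose a line count for `split -l` so each part holds whole
# "-o"/"url" pairs. Function name and arguments are illustrative only.
even_chunk_lines() {
    total=$1    # total number of lines in the config file
    parts=$2    # desired number of parts
    per=$(( (total + parts - 1) / parts ))   # ceiling division
    if [ $(( per % 2 )) -ne 0 ]; then
        per=$(( per + 1 ))                   # round up to an even number
    fi
    echo "$per"
}
```

For the file in this post, even_chunk_lines 65088 8 gives 8136. Because the value is only ever rounded up, split produces at most the requested number of parts, and the last one may be shorter than the rest.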
I then ran curl 8 times concurrently. This seems to be fast enough. And I’m running inside a screen in case my connection dies.
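One way to start the concurrent runs is a small loop that backgrounds one curl per part file and then waits for all of them. This is a sketch, not necessarily how it was done here: the function name run_parts and the CURL override variable are assumptions added so the loop can be dry-run without touching the network.

```shell
#!/bin/sh
# Sketch: start one curl per part file in the background, then wait.
# CURL is a hypothetical override so the loop can be tested offline;
# when it is unset, the real curl is used.
run_parts() {
    for part in "$@"; do
        "${CURL:-curl}" -s -K "$part" &   # -s: silent, -K: read config file
    done
    wait    # block until every background curl has exited
}
```

With the files produced by split, this would be called as: run_parts configfile.part*
Note that the sketch does not check the exit status of the individual curl processes.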
2 years ago: You can also do curl URL1 URL2 URL3 etc., and it will go through them in order over the same connection. This is a lot faster than reconnecting.
4 years ago: I'm surprised not to see any Perl snippets here B-)
4 years ago: Nice. But screen over tmux?