Today I had to do some advanced curling and I thought I would share what I did
to really get the most out of it. Essentially I had a huge CSV of values that I
wanted to go through and get the status code for. One possible option was to
bust out Ruby or Elixir and write some quick software to accomplish the task,
but bash once again came out on top with a one-liner that seems ideal.
To start with, dealing with CSVs can be a pain; however, most of the time cut has
your back. Let's say you only want the second column of a CSV. To do that, use
the following command…
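A sketch of it, with data.csv standing in for whatever your file is actually called:

```bash
# Split each line on commas and keep only the second field
cut -d , -f 2 data.csv
```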
This splits each line of the input file using , as the delimiter and takes the
second field, denoted with -f 2. If you wanted to grab, say, the first and
third fields, it would look like this…
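Again with data.csv as a stand-in:

```bash
# Same idea, but keep the first and third fields
cut -d , -f 1,3 data.csv
```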
In my case I only wanted the first field, which contains the URLs that I wanted to check. Here is the command, which I will break down:
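Something along these lines; data.csv is a placeholder again, and the curl flags shown (-s -o /dev/null -w) are just one way to print only the status code:

```bash
# Take the first column (the URLs), fan out to 10 parallel curls,
# and write "<status code> <url>" for each into results.txt
cut -d , -f 1 data.csv |
  xargs -n 1 -P 10 -I URL \
    curl -s -o /dev/null -w "%{http_code} URL\n" URL \
    > results.txt
```

Worth noting: with -I, most xargs implementations already hand the command one line per invocation, so the -n 1 may be redundant there.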
This takes the URLs from the CSV, runs 10 curls at a time, and writes the
HTTP status code and the URL tried to a results.txt file. I don't want to go
into too much detail on the curl part of this call, but I would like to explain
more about what xargs is doing. The -n part says to run the command with a
single argument at a time, so each line gets its own invocation. The -P part is
how many commands you want to run in parallel, in our case ten at a time. The
-I URL part takes our single argument and substitutes it anywhere URL appears
in the command. Lastly, the output of all of these results is aggregated into a
results.txt file with the following format:
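With a -w format like the one sketched above, each line is a status code followed by the URL it came from; the URLs below are just placeholder examples:

```
200 https://example.com/good-page
301 https://example.com/moved-page
404 https://example.com/missing-page
```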
If you have what seems like a complicated task, look again. It may be as simple
as the one-liner I found! A big thanks to Lee Jones
for showing me that xargs has a parallel option. I haven't looked into
it yet; however, it looks like the GNU parallel command
has a bit more firepower for this kind of processing.