
Sunday, August 29, 2010

wget: downloader extraordinaire

I have always found browsers too bothersome and intrusive for downloading files. Downloading through a browser always requires too much clicking and configuring. I'd rather spend my time on the task at hand than on configuring the crap out of some unintuitive graphical user interface.

I always have a shell handy when I'm browsing, so I always download files using wget.

I use an alias in $HOME/.bashrc to predefine the arguments I usually need for downloading, like this:


alias wget="$(which wget) -m -U 'Mozilla/5.0 (compatible; Konqueror/3.2; Linux)' \
-e robots=off --wait 1 -c -nd -nH"


I put a which in there because I'm too lazy to type /usr/bin/wget. Its purpose is to resolve the full path of the binary when the alias is defined, which sidesteps any other alias or script that might otherwise get called under the same invocation name, namely wget.

The arguments are crafted to make wget ignore robots.txt and pretend to be a browser. This is generally frowned upon; people call it bad netiquette. As I said, I'd rather get the job done...
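For reference, here is the same alias spelled with long option names, which makes the role of each argument easier to see. This is only a sketch and assumes GNU wget; the behaviour should match the short form shown earlier.

```shell
# Long-option spelling of the alias (assumes GNU wget):
#   --mirror               recursion plus timestamping, unlimited depth (-m)
#   --user-agent           identify as a browser instead of Wget (-U)
#   --execute robots=off   ignore robots.txt (-e)
#   --wait 1               pause one second between retrievals
#   --continue             resume partially downloaded files (-c)
#   --no-directories       save everything into the current directory (-nd)
#   --no-host-directories  no hostname-prefixed directories (-nH)
alias wget="$(which wget) --mirror \
--user-agent 'Mozilla/5.0 (compatible; Konqueror/3.2; Linux)' \
--execute robots=off --wait 1 --continue \
--no-directories --no-host-directories"
```

Note that --mirror already implies recursive retrieval, so every download made through this alias follows links unless you restrict it further.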

If you decide to download recursively, you'll need the following options:

  • Download recursively with -r --level n, where n is the maximum recursion depth;
  • To reject files of a certain type, you'll need -R<.ext>, where .ext is some type of file extension, for instance .mp3;
  • To allow only a certain file extension, you'll need -A<.ext>.
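As a rough sketch of how these options combine, the lines below print two recursive invocations instead of running them. The preview helper and the example.com URL are placeholders of mine, not part of wget; drop the preview prefix to actually download.

```shell
# preview is a hypothetical dry-run helper: it prints the command line
# it is given instead of executing it, so nothing is downloaded here.
preview() {
    printf '%s\n' "$*"
}

# Follow links at most two levels deep and keep only .mp3 files.
preview wget -r --level 2 -A .mp3 http://example.com/music/

# Same depth, but skip .mp3 files and keep everything else.
preview wget -r --level 2 -R .mp3 http://example.com/music/
```

If the alias from earlier is active, -m has already switched on recursion with unlimited depth, so a --level n given on the command line should cap it.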

Have fun!
