Monday, June 12, 2006

Recursive Download

To download only files of a particular type (say, *.pdf) from an HTTP link, I used the following commands in sequence. First, I downloaded the page containing the links to all the PDF files (a point to note here: I was not able to download recursively because of the robots.txt restriction).


wget http://my.requiredsite.com/page1/download/

This created an index.html file in the current directory, with relative links to all the PDF files. Then,

wget -r -nd -np -l4 -A "*.pdf" -F -i ./index.html --base=http://my.requiredsite.com/page1/download/

to download only the PDF files from the site.
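
The two steps above could be wrapped in a small shell function, roughly like the sketch below. This is only an illustration: `fetch_by_type` is a made-up helper name (not part of wget), and I have added `-O index.html` to the first step, which the original command did not use, so that the saved page always has a predictable name. The comments describe each wget option as I understand it from the manual.

```shell
#!/bin/sh
# Sketch only: fetch_by_type is a hypothetical helper, not a wget feature.
# Arguments: $1 = URL of the page that lists the files, $2 = extension to keep.

fetch_by_type() {
    base_url=$1   # e.g. http://my.requiredsite.com/page1/download/
    ext=$2        # e.g. pdf

    # Step 1: save the listing page as index.html in the current directory.
    # (-O is an addition to the original command, to fix the output name.)
    wget -O index.html "$base_url" || return 1

    # Step 2: read the links from the saved page and keep only matching files.
    #   -r      retrieve recursively
    #   -nd     do not recreate the remote directory tree locally
    #   -np     never ascend to the parent directory
    #   -l4     recurse at most 4 levels deep
    #   -A      accept only file names matching the pattern
    #   -F      treat the input file as HTML
    #   -i      read the URLs from the given file
    #   --base  resolve relative links in that file against this URL
    wget -r -nd -np -l4 -A "*.$ext" -F -i ./index.html --base="$base_url"
}

# Example call (commented out so that nothing is downloaded on load):
# fetch_by_type http://my.requiredsite.com/page1/download/ pdf
```

Keeping the extension as a parameter means the same function can be reused for other file types by changing only the second argument.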
