host Performs a simple lookup of an internet address using the Domain Name System (DNS). Simply type host followed by the name or address you want to look up.
dig The "domain information groper" tool. More advanced than host. Give a host name as an argument to output information about that host, including its IP address, host name and various other details.
For example, to look up information about "www.amazon.com" type:
dig www.amazon.com
To find the host name for a given IP address (i.e. a reverse lookup), use dig with the `-x' option followed by the address.
dig looks up the address (which may or may not have a host name registered) and returns the name of the host; for example, if the address belonged to http://slashdot.org, dig would return that host name.
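A reverse lookup is really a query for a PTR record: `dig -x` reverses the four parts of an IPv4 address and appends `in-addr.arpa`, so `dig -x 192.0.2.1` asks for the same record as `dig 1.2.0.192.in-addr.arpa PTR`. The sketch below only builds that query name locally and sends nothing over the network; `reverse_name` is a hypothetical helper, and it assumes a plain dotted-quad IPv4 address (IPv6 uses `ip6.arpa` instead).

```shell
# reverse_name: hypothetical helper that builds the PTR query name
# that dig -x would use for a dotted-quad IPv4 address.
# No DNS query is sent.
reverse_name() {
    # Split the address on the dots into four octets
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    # Reverse the octets and append the reverse-lookup domain
    printf '%s.%s.%s.%s.in-addr.arpa\n' "$4" "$3" "$2" "$1"
}

reverse_name 192.0.2.1     # prints: 1.2.0.192.in-addr.arpa
```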
dig takes a huge number of options (to the point of being too many); refer to the manual page for more information.
whois (now BW whois) is used to look up contact information in the "whois" databases; the servers are only likely to hold major sites. Note that contact
information is likely to be hidden or restricted, as it is often abused by crackers and others looking for a way to cause malicious damage to organisations.
wget (GNU Web get) used to download files from the World Wide Web.
To archive a single web site, use the -m or --mirror (mirror) option.
Use the -nc (no clobber) option to stop wget from overwriting a file if you already have it.
Use the -c or --continue option to resume a partially downloaded file, whether it was started by wget or by another program.
Simple usage example:
This would simply get a file from a site.
wget can also retrieve multiple files using standard wildcards, the same type used in bash, such as *, [ ] and ?. Simply use wget as normal but put single quotation marks (' ') around the URL to prevent bash from expanding the wildcards. There are complications if you are retrieving from an http site (see below...).
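To see why the quotes matter, the sketch below uses `printf` (through a hypothetical `count_args` helper) in place of wget, so nothing is downloaded, inside a scratch directory that contains two `.gif` files. Unquoted, bash replaces `*.gif` with the matching local file names before the program ever sees it; in single quotes the pattern arrives untouched. If nothing local matches, bash by default leaves the pattern as it is, which is why an unquoted URL sometimes appears to work; quoting makes the result predictable.

```shell
# printf stands in for wget here, so nothing is downloaded.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.gif b.gif                # two local files that match *.gif

# count_args: hypothetical helper that reports how many arguments it received
count_args() {
    printf '%s\n' "$#"
}

unquoted=$(count_args *.gif)     # bash expands this to: a.gif b.gif
quoted=$(count_args '*.gif')     # bash passes the literal text *.gif
echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"

cd - >/dev/null || exit 1        # return to the previous directory
rm -r "$demo_dir"                # clean up the scratch directory
```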
Advanced usage example (taken from the wget manual page):
wget --spider --force-html -i bookmarks.html
This will parse the file bookmarks.html and check that all the links exist.
Advanced usage: this is how you can download multiple files over http using a wildcard.
Note: http does not support downloading with standard wildcards; ftp does, so wildcards work fine with ftp. A work-around for this http limitation is shown below:
wget -r -l1 --no-parent -A.gif http://www.website.com
This will download recursively to a depth of one, in other words only the files linked from that page and nothing below that. The --no-parent option stops it from following references to the parent directory, and -A.gif makes it keep only files that
end in ".gif". If you wanted to download, say, anything ending in ".pdf" as well, add -A.pdf before the website address. Simply change the website address and the file type to
download something else. Note that -A.gif is the same as -A "*.gif"; when using the pattern form, quote it (single or double quotes both work) so that bash does not expand the wildcard itself.
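The accept list works on the end of each file name. The sketch below roughly approximates that suffix test in shell, so you can check which names a command using -A.gif -A.pdf would keep. `accepts` is a hypothetical helper, not part of wget; it ignores wget's wildcard-pattern form and other details of the real matching.

```shell
# accepts: hypothetical helper that roughly approximates wget's
# suffix-style -A check. Returns 0 (true) if the file name ends with
# any of the given suffixes, 1 (false) otherwise.
accepts() {
    name=$1
    shift
    for suffix in "$@"; do
        case $name in
            *"$suffix") return 0 ;;   # quoted suffix is matched literally
        esac
    done
    return 1
}

# Which of these names would -A.gif -A.pdf keep?
for file in logo.gif manual.pdf index.html; do
    if accepts "$file" .gif .pdf; then
        echo "keep: $file"
    else
        echo "skip: $file"
    fi
done
```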
wget has many more options; refer to the examples section of the manual page, as this tool is very well documented.
Alternative website downloaders
You may like to try alternatives such as httrack, a full website downloader with a graphical interface, available for GNU/Linux.
curl Another remote downloader. It is designed to work without user interaction, supports a variety of protocols, can upload as well as download, and has a large number
of tricks and work-arounds for various things. It can access dictionary servers (dict), LDAP servers, ftp, http and gopher; see the manual page for full details.
To access the full manual (which is huge) for this command, type:
curl -M
For general usage you can use it like wget. You can also log in with a user name by using the -u option and giving your user name and password like this:
curl -u username:password http://www.placetodownload/file
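Over http, -u is sent using Basic authentication by default: curl joins the user name and password with a colon and base64-encodes the result into an `Authorization` header. Base64 is an encoding, not encryption, so anyone who can see the request can recover the password unless the connection uses https. The sketch below reproduces only the encoding step locally, assuming the GNU coreutils `base64` tool; `username:password` is the same placeholder as above. If you give only -u username, curl prompts for the password instead, which also keeps it out of your shell history.

```shell
# Build the value that curl -u username:password would place in the
# http header "Authorization: Basic <value>". No request is sent.
credentials='username:password'
encoded=$(printf '%s' "$credentials" | base64)
echo "Authorization: Basic $encoded"

# Decoding shows the password is recoverable, not encrypted
printf '%s' "$encoded" | base64 -d
echo
```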
To upload using ftp, use the -T option:
curl -T file_name ftp://ftp.uploadsite.com
To continue an interrupted download, use the -C option; the - after it tells curl to work out where to resume from by itself:
curl -C - -o file http://www.site.com
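With -C -, curl works out the resume point from the size of the output file already on disk and, for http, asks the server only for the remaining bytes using a `Range` request header. The sketch below computes that offset locally for a stand-in partial file; `resume_range` is a hypothetical helper, and the exact request curl sends may differ in detail.

```shell
# resume_range: hypothetical helper showing the byte range a resumed
# download would request, based on how much of the file already exists.
resume_range() {
    if [ -f "$1" ]; then
        # bytes already downloaded (tr strips any padding wc adds)
        size=$(wc -c < "$1" | tr -d ' ')
    else
        size=0                   # nothing yet: start from the beginning
    fi
    printf 'Range: bytes=%s-\n' "$size"
}

partial=$(mktemp)
printf 'hello' > "$partial"      # pretend 5 bytes arrived before the cut
resume_range "$partial"          # prints: Range: bytes=5-
rm -f "$partial"
```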