Mastering the wget Command: A Comprehensive Guide


wget is a command-line utility for downloading files from the web, commonly found on Linux and other Unix-like systems. It supports HTTP, HTTPS, and FTP and runs non-interactively, which makes it well suited to scripts and scheduled jobs. This post walks through its most commonly used options, with a practical example for each.

Basic Usage

To download a file from a specific URL, simply execute:

wget https://example.com/foo

This command downloads the contents of the URL and saves it as “foo” in your current directory.

Customizing Output File Names

If you want to save the downloaded contents under a different name, use the -O option:

wget -O bar https://example.com/foo

This saves the contents of the URL to a file named “bar”. Passing -O - writes the contents to standard output instead.

Downloading Complete Web Pages

To download a single web page along with the resources it needs to display (scripts, stylesheets, and images), while pausing between requests, use:

wget -p -k -w 3 https://example.com/somepage.html

Here:

  • -p (--page-requisites) downloads the files needed to display the page.
  • -k (--convert-links) rewrites links in the downloaded files so they work for offline viewing.
  • -w 3 introduces a 3-second wait between requests to avoid overwhelming the server.

Mirroring Websites

To mirror an entire directory and its sub-directories (without -p, so page resources that live outside that directory are not fetched), use:

wget -m -np https://example.com/somepath/

Here:

  • -m enables mirroring: recursive download with unlimited depth and timestamp checking, which replicates the site structure locally.
  • -np prevents wget from ascending to the parent directory.

Managing Download Speed and Retries

To limit your download speed and specify a number of retries in case of failures, use:

wget --limit-rate=300k -t 100 https://example.com/somepath/

This caps the download speed at about 300 KB/s and lets wget try each download up to 100 times.
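As a rough sketch, the same two options can be wrapped in a small shell function so the rate and retry count become parameters. The function name polite_fetch and the DRY_RUN variable are made up for this illustration; only --limit-rate and -t come from wget itself.

```shell
# Hypothetical helper: build a rate-limited, retrying wget command.
# With DRY_RUN=1 it prints the command instead of running it.
# Note: url, rate, and tries are plain (global) shell variables here.
polite_fetch() {
  url=$1
  rate=${2:-300k}    # value handed to --limit-rate
  tries=${3:-100}    # value handed to -t
  set -- wget --limit-rate="$rate" -t "$tries" "$url"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Preview the command without touching the network.
DRY_RUN=1 polite_fetch https://example.com/somepath/ 300k 100
```

Running the function without DRY_RUN set performs the actual download.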

Downloading with Authentication

For resources protected by HTTP Basic Authentication, you can specify the username and password:

wget --user username --password password https://example.com

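This gives wget access to the protected resource. Keep in mind that a password given on the command line can be seen by other users in the process list and may end up in your shell history; the --ask-password option prompts for it instead.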
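One way to keep credentials off the command line is a .netrc file, which the wget manual mentions as a place to store passwords. The sketch below appends an entry to a netrc-style file, creating it readable only by its owner if it does not exist yet. The function name write_netrc_entry and the sample host, login, and password are assumptions for illustration; check how your wget version handles .netrc before relying on it.

```shell
# Hypothetical helper: append one credentials entry to a netrc-style file.
#   $1 = path to the file, $2 = host, $3 = login, $4 = password
# The subshell sets umask 077 so a newly created file is private (mode 600);
# an already existing file keeps whatever permissions it has.
write_netrc_entry() {
  (
    umask 077
    printf 'machine %s login %s password %s\n' "$2" "$3" "$4" >> "$1"
  )
}

# Usage (placeholder values):
#   write_netrc_entry "$HOME/.netrc" example.com username password
#   wget https://example.com
```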

Continuing Incomplete Downloads

If your download was interrupted, you can resume it using:

wget -c https://example.com

The -c option tells wget to continue from where the download left off, provided the partial file is still in the current directory and the server supports resuming.
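For flaky connections, resuming can be scripted. The sketch below reruns wget -c while it reports a network failure, and prints a readable summary of the final exit status. The status meanings follow the "Exit Status" section of the GNU wget manual; the function names, the default of 5 attempts, and the 5-second pause are assumptions chosen for this example.

```shell
# Map a wget exit status to a short description.
# The numeric codes follow the "Exit Status" section of the GNU wget manual.
describe_wget_status() {
  case "$1" in
    0) echo "success" ;;
    1) echo "generic error" ;;
    2) echo "parse error" ;;
    3) echo "file I/O error" ;;
    4) echo "network failure" ;;
    5) echo "SSL verification failure" ;;
    6) echo "authentication failure" ;;
    7) echo "protocol error" ;;
    8) echo "server error response" ;;
    *) echo "unknown status $1" ;;
  esac
}

# Hypothetical helper: resume a download with -c, trying again only while
# wget reports a network failure (status 4), up to max_attempts times.
resume_download() {
  url=$1
  max_attempts=${2:-5}
  attempt=1
  status=1
  while [ "$attempt" -le "$max_attempts" ]; do
    wget -c "$url"
    status=$?
    [ "$status" -eq 4 ] || break
    attempt=$((attempt + 1))
    sleep 5   # brief pause before the next attempt
  done
  echo "wget finished: $(describe_wget_status "$status")" >&2
  return "$status"
}

# Usage (needs network access):
#   resume_download https://example.com
```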

Downloading Multiple Files from a Text File

To download all URLs listed in a text file, directing them to a specific directory, use:

wget -P path/to/directory -i URLs.txt

The -P option sets the directory where the files are saved, while -i names the input file containing the URLs, one per line.
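If the URL list is maintained by hand, it may contain blank lines or notes. A minimal sketch, assuming lines starting with # are meant as comments (a convention chosen here, not something wget defines), filters the list and feeds it to wget on standard input, which -i accepts when the file name is given as a single dash. The function name clean_url_list is hypothetical.

```shell
# Hypothetical helper: print a URL list without blank lines and
# without lines whose first non-space character is '#'.
clean_url_list() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Usage (needs network access); "-i -" makes wget read URLs from stdin:
#   clean_url_list URLs.txt | wget -P path/to/directory -i -
```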

Conclusion

wget is a dependable tool for anyone who needs to download files, whether you’re a seasoned developer or a casual user. Its options let you tailor downloads to your needs, from fetching a single file to mirroring a directory tree with rate limits and retries. For more information, visit the official GNU wget documentation. Happy downloading!
