Make Offline Copy of a Site with Wget on Windows and Linux

Sometimes you need a browsable copy of a web site so you can access it offline, put it on a USB stick, or even upload it to your smartphone and read it while flying or traveling. Modern browsers and operating systems make it easy to save a single web page as a PDF, but processing every page of a site that way quickly gets annoying. This is where wget comes in.

Wget is an open-source download manager. It is a console app developed primarily for Linux, but it has been successfully ported to other OSes, including Windows and macOS.

If you are not familiar with wget, you should definitely give it a try. It is very powerful. It allows fetching files from web sites using HTTP, HTTPS and FTP, the Internet protocols we are using these days. Its behavior is controlled by command line arguments.

Wget supports a variety of options for retrieving files over slow or unstable connections, including retries, continuing where it left off, and more. It supports the "robots.txt" file, so it can work like a web crawler. It can retrieve only modified files, and it supports wildcards, file type limits, and regular expressions.
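As a rough illustration of the options mentioned above, here is a minimal sketch. The helper names (run, resume_download, fetch_new_pdfs) and the example.com URLs are made up for this example, and the specific values (10 tries, 5 seconds) are arbitrary. By default the script only prints the commands, so you can review them before downloading anything.

```shell
#!/bin/sh
# Preview mode by default: commands are printed, not executed.
# Run the script with DRY_RUN=0 to really download.
DRY_RUN=${DRY_RUN:-1}

# run: hypothetical helper. Prints the command in preview mode,
# otherwise executes it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unstable connection: up to 10 attempts, pauses of at most 5 seconds
# between retries, and resuming of a partially downloaded file.
resume_download() {
    run wget --tries=10 --waitretry=5 --continue "$1"
}

# Recursive fetch limited to PDF files (wildcard filter), skipping
# files whose local copy is already up to date.
fetch_new_pdfs() {
    run wget --recursive --no-parent --timestamping --accept '*.pdf' "$1"
}

# Placeholder URLs, not taken from the article.
resume_download "https://example.com/files/big-file.iso"
fetch_new_pdfs "https://example.com/docs/"
```

Recent wget versions also accept --accept-regex and --reject-regex, which match a regular expression against the complete URL instead of a file name pattern.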

Wget supports the recursive retrieval of HTML web sites and FTP servers, allowing you to make a web site mirror. Here is how it can be done.

Before proceeding, you need to get the wget app.

Get Wget on Windows

I usually use binaries from these sources:

Both do their work.

Get Wget on Linux

Use your distro's package manager. Some examples (run them as root):

Debian/Ubuntu/Mint:

apt install wget

CentOS/Red Hat:

yum install wget

Arch Linux:

pacman -Sy wget

Void Linux:

xbps-install -S wget
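After installing, it may be worth checking that wget is actually reachable from your shell. The following is a small sketch for a POSIX shell on Linux; have_cmd is a helper name invented for this example.

```shell
#!/bin/sh
# have_cmd: hypothetical helper that succeeds if the named program
# can be found by the shell.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd wget; then
    # Show only the first line of the version banner.
    wget --version | head -n 1
else
    echo "wget not found; install it with your package manager" >&2
fi
```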

To Make an Offline Copy of a Site with Wget,

  1. Open command prompt / terminal.
  2. On Windows, type the full path to the wget.exe file.
  3. On Linux, type just wget.
  4. Add the following arguments so that the complete command looks like this: wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://site-to-download.com
  5. Replace the https://site-to-download.com portion with the URL of the site you want to mirror.

You are done!
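If you mirror sites often on Linux, the command from step 4 can be wrapped in a small shell function. This is only a sketch: mirror_site and the DRY_RUN variable are names made up for this example, and the function defaults to printing the command so that nothing is downloaded until you set DRY_RUN=0.

```shell
#!/bin/sh
# Preview mode by default; run with DRY_RUN=0 to really download.
DRY_RUN=${DRY_RUN:-1}

# mirror_site: hypothetical wrapper around the command from step 4.
mirror_site() {
    if [ -z "$1" ]; then
        echo "usage: mirror_site URL" >&2
        return 2
    fi
    # Build the argument list once, then either print or run it.
    set -- wget --mirror --convert-links --adjust-extension \
        --page-requisites --no-parent "$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Placeholder URL from the steps above.
mirror_site "https://site-to-download.com"
```

Wget saves the mirror into a directory named after the host, created inside the directory you run the command from.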

Here are the switches we use:

  • --mirror - turns on recursive download with unlimited depth and timestamping, the settings suitable for mirroring.
  • --no-parent - never ascends to the parent directory, so starting from a subdirectory downloads only that portion of the site.
  • --convert-links - rewrites the links in downloaded files so they work properly in the offline copy.
  • --page-requisites - downloads everything needed to display each page, such as images, CSS and JS files, so the local mirror retains the original page style.
  • --adjust-extension - adds the appropriate extension (e.g. .html, .css) to files that were retrieved without one.
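For reference, each of these switches also has a short form, and the wget manual describes --mirror as currently equivalent to -r -N -l inf --no-remove-listing. The sketch below only prints the three equivalent spellings side by side; the variable names are made up for this example, and the URL is the same placeholder as above.

```shell
#!/bin/sh
url="https://site-to-download.com"   # placeholder URL

# Long form, as used in this article.
long="wget --mirror --convert-links --adjust-extension --page-requisites --no-parent $url"

# The same command with short option names.
short="wget -m -k -E -p -np $url"

# --mirror written out as the options the manual says it enables.
expanded="wget -r -N -l inf --no-remove-listing -k -E -p -np $url"

printf '%s\n' "$long" "$short" "$expanded"
```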

That's it.

1 thought on “Make Offline Copy of a Site with Wget on Windows and Linux”

  1. Dave

    Very interesting. I’m going to try it out later on a game wiki. Often these game sites vanish when the games get old, even though they contain information practically required to be able to use the software.
