Make Offline Mirror Copy of a Site with Wget on Windows and Linux
Sometimes you need a browsable copy of a web site so you can access it offline, put it on a USB stick, or even upload it to your smartphone and read it while flying or traveling. While modern browsers and operating systems make it easy to save a single web page to PDF, doing that for every page quickly gets tedious. This is where wget comes in.
Wget is an open-source download manager. It is a console app developed primarily for Linux, but it has been successfully ported to other operating systems, including Windows and macOS.
If you are not familiar with wget, you should definitely give it a try. It is very powerful. It can fetch files from web sites using HTTP, HTTPS, and FTP, the most common transfer protocols in use today. Its behavior is controlled by command-line arguments.
Wget supports a variety of options for retrieving files over slow or unstable connections, including retries, resuming where a download left off, and more. It honors the "robots.txt" file, so it can behave like a well-mannered web crawler. It can retrieve only files that have changed, and it supports wildcards, file-type limits, and regular expressions.
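As a quick illustration of the retry and resume options mentioned above (the URL and file name below are placeholders, not from the original article):

```shell
# Resume a partially downloaded file (-c / --continue),
# retrying up to 5 times with up to a 10-second pause
# between attempts (--waitretry).
wget -c --tries=5 --waitretry=10 https://example.com/large-file.iso
```

Running the same command again after an interruption picks the download up from where it stopped instead of starting over.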
Wget supports the recursive retrieval of HTML web sites and FTP servers, allowing you to make a web site mirror. Here is how it can be done.
Before proceeding, you need to get the wget app.
Get Wget on Windows
I usually use binaries from these sources:
Both do the job.
Get Wget on Linux
Use your distro's package manager. Some examples (run them as root):
# Debian, Ubuntu, Mint
apt install wget
# RHEL, CentOS, Fedora (older releases)
yum install wget
# Arch Linux
pacman -Sy wget
# Void Linux
xbps-install -S wget
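Whichever package manager you use, you can confirm the installation afterwards:

```shell
# Print the installed wget version; the first line of the
# output identifies the build.
wget --version
```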
To Make an Offline Copy of a Site with Wget,
- Open command prompt / terminal.
- On Windows, type the full path to the wget.exe file.
- On Linux, type just wget.
- Now, append the following arguments so the full command looks like this:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://site-to-download.com
- Replace the https://site-to-download.com portion with the actual URL of the site you want to mirror.
You are done!
Here are the switches we use:
--mirror - a shorthand that applies a number of options (recursion, time-stamping, infinite depth) to make the download a full mirror.
--no-parent - do not crawl above the starting directory, so you get only that portion of the site.
--convert-links - converts the links in downloaded pages so they work properly in the offline copy.
--page-requisites - downloads supporting files such as images, CSS, and JS so pages keep their original look when browsed locally.
--adjust-extension - adds the appropriate extension (e.g. .html, .css, .js) to files that were retrieved without one.
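If you want to be gentler on the target server, wget also offers throttling options that combine well with the mirror switches above. Here is a sketch; the pause length, rate cap, and URL are placeholder values, not recommendations from the original article:

```shell
# Same mirror command as above, but pausing 1 second between
# requests (--wait) and capping bandwidth at 200 KB/s
# (--limit-rate) to avoid hammering the server.
wget --mirror --convert-links --adjust-extension --page-requisites \
     --no-parent --wait=1 --limit-rate=200k https://site-to-download.com
```

For large sites this makes the mirror take longer, but it is far less likely to get your IP address blocked.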