Wget Mirror Site – Download JS and CSS Files by Ignoring robots.txt – Wget Tutorial

By | November 10, 2020

Wget is a powerful tool for mirroring a site. However, you may find that it does not download JS or CSS files when mirroring. This tutorial explains why that happens and how to fix it.

If you are using Windows 10, you can read this tutorial to learn how to use wget:

Best Practice to Use wget in Windows 10 – Wget Tips

How to download js or css files when mirroring a site using wget?

To download JS and CSS files, make wget ignore robots.txt.

Some sites use robots.txt to tell crawlers, including wget, not to fetch certain resources. For example:

User-agent: *
Disallow: /Js/
Disallow: /Css/
Disallow: /Math/
Disallow: /CodeMirror/
Disallow: /App_Themes/

With these rules, wget will not download resources in the /Js/ or /Css/ directories, because it obeys robots.txt by default during recursive downloads.
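To see which directories a site blocks before you mirror it, you can list the Disallow rules in its robots.txt. The following is a minimal sketch, assuming the file has already been saved locally as robots.txt. The function name list_disallowed is a hypothetical helper for illustration, not part of wget, and it ignores which User-agent group each rule belongs to.

```shell
#!/bin/sh
# list_disallowed FILE
# Print the path from every non-empty "Disallow:" rule in FILE.
# Hypothetical helper: it does not separate rules by User-agent group.
list_disallowed() {
    awk -F': *' 'tolower($1) == "disallow" && $2 != "" { print $2 }' "$1"
}

# Example usage (assumes robots.txt was saved beforehand, for instance with
# wget -O robots.txt http://www.example.com/robots.txt):
#   list_disallowed robots.txt
```

If /Js/ or /Css/ appears in the output, that is likely why the mirrored pages are missing their scripts and styles.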

Use the option below to make wget ignore robots.txt:

-e robots=off

Then wget can download JS and CSS files. For example:

wget --mirror --convert-links --page-requisites --no-parent -e robots=off -P E:\site http://www.example.com/
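On Linux or macOS, the same command can be wrapped in a small shell function so the options do not have to be retyped. This is only a sketch: the function name mirror_site and the ./site directory are assumptions for illustration, and the --wait=1 option (a one-second pause between requests) is an optional addition that is not in the command above.

```shell
#!/bin/sh
# mirror_site URL DEST_DIR
# Mirror URL into DEST_DIR, ignoring robots.txt so that JS and CSS
# files in disallowed directories are downloaded too.
# Hypothetical wrapper; --wait=1 is an optional politeness delay.
mirror_site() {
    url=$1
    dest=$2
    wget --mirror --convert-links --page-requisites --no-parent \
         --wait=1 -e robots=off -P "$dest" "$url"
}

# Example usage:
#   mirror_site http://www.example.com/ ./site
```

Because robots.txt is ignored, the pause helps keep the load on the site being mirrored low; remove --wait=1 if you do not need it.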
