Create a temporary external Web mirror
13 Apr 2018
There are occasions when a website is relaying live information about ongoing events but its server cannot handle the number of visitors. This causes long load times and server errors, and everyone starts impatiently refreshing the page, which only makes things worse. You can provide a Web mirror to help reduce the server load, but it requires a bit of thought. The solution described here is simple and can be applied every time, without having to discuss the installation of a load balancer or other more complicated solutions with the original website's administrator.
Context: the Notre-Dame-des-Landes ZAD is under government attack and is relaying live information about the cops' progress and the various evictions and destruction of homes. Its website is at times unavailable and needs some support.
If you have a Web server with large bandwidth available, it is possible to build a mirror of the original website and update it every now and then to have a more or less up to date version. We are going to use HTTrack to build this mirror.
You need a VPS or any computer reachable from the Internet, access to its shell, and httrack installed. Go to a publicly available folder served by your Web server and build an initial mirror of the website. There are a few options to consider here, which are discussed below.
httrack "https://zad.nadir.org/" -v -K0 -r2 -x
- Replace the URL with that of the site you want to mirror
-v: verbose mode to see if any error occurs
-K0: forces links to be rewritten as relative links
-r2: sets the mirror depth to 2
-x: replaces all external links with a local HTML proxy page
The -K0 option changes all HTML links in <a> tags to relative links, so they
point to your local copy instead of the original website. It does not, however,
handle stylesheet or script links in the HTML head.
The -r2 option limits the mirror to a depth of 2, which means only 1 level of
link traversal is done. If you clone the home page of the website, the pages
linked from the home page will be mirrored as well, but the links on those
"subpages" will not be fetched. Most of the time a limit of 2 is enough
and lets you mirror only the most interesting content rather than the whole
website, which would contribute to overloading the original website.
The -x option changes all external links, even those in the HTML head, to an
external.html proxy page. This removes all direct connections to the
original website, so your mirrored pages won't contain links that
would continue to be loaded from the original server. Note that most of the
time this will remove the styling and scripts of the mirrored pages!
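Since the point of -K0 and -x is to stop the mirrored pages from pointing back at the original server, one way to double-check is to search the mirrored files for leftover URLs. The following is only a minimal sketch: the function name find_origin_refs is made up for this example, and searching for "//host" rather than the bare hostname is an assumption meant to skip comments or file paths that mention the host without being links.

```shell
#!/bin/sh
# Hypothetical helper (not part of httrack): list mirrored HTML/CSS/JS files
# that still contain a URL pointing at the original host.
# Usage: find_origin_refs MIRROR_DIR ORIGIN_HOST
find_origin_refs() {
    mirror_dir=$1
    origin_host=$2
    # Only look at page assets; other files in the folder are ignored.
    # grep -l prints matching file names, -F treats the pattern as a fixed string.
    find "$mirror_dir" -type f \
        \( -name '*.html' -o -name '*.css' -o -name '*.js' \) \
        -exec grep -lF -- "//$origin_host" {} +
}
```

For example, running find_origin_refs . zad.nadir.org from the folder where the mirror was built prints nothing if no such URL remains. Any file it lists is worth opening in a browser with the network inspector to see what is still being fetched from the original server.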
Once the initial mirror has been built, check that everything looks OK and that
your Web browser does not try to connect to the original website. Then you can
update the mirror every few minutes with the --update option, either directly
in a shell loop (consider using tmux so it keeps running when you are not
connected to the shell) or from your crontab.
while true ; do httrack "https://zad.nadir.org/" -v -K0 -r2 -x --update ; sleep 8m ; done
This will update only modified files every 8 minutes. Choose the update interval wisely: if it is too short, you will just end up overloading the site as well. Then spread the word about your mirror and hope people start using it instead of the original website. Maybe contact the original admins so they can put a link to your mirror somewhere.
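If you go the crontab route instead of the shell loop, a slow update could still be running when the next one starts, which would double the load on the original site. Here is a minimal sketch of a wrapper that skips a run in that case. The function name run_locked, the lock path, the script path, the mirror folder and the 10-minute schedule are all assumptions chosen for the example, not something httrack provides.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command only if the previous run has finished.
# mkdir either creates the directory or fails, so it works as a simple lock.
# Usage: run_locked LOCK_DIR COMMAND [ARGS...]
run_locked() {
    lock_dir=$1
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still in progress, skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    # Release the lock and report the command's own exit status.
    rmdir "$lock_dir"
    return "$status"
}

# Assumed usage, with this file saved as /usr/local/bin/update-mirror.sh
# and the last line below uncommented:
#   crontab entry:  */10 * * * * /usr/local/bin/update-mirror.sh
# cd /var/www/mirror && run_locked /tmp/zad-mirror.lock httrack "https://zad.nadir.org/" -v -K0 -r2 -x --update
```

One limitation of this sketch: if the script is killed mid-run, the lock directory stays behind and has to be removed by hand before updates resume.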