Date: Fri, 12 Aug 2016 10:04:31 +0200
From: Polytropon <freebsd@edvax.de>
To: galtsev@kicp.uchicago.edu
Cc: freebsd-questions@freebsd.org
Subject: Re: script to make webpage snapshot
Message-ID: <20160812100431.8af84eeb.freebsd@edvax.de>
In-Reply-To: <33717.128.135.52.6.1470954497.squirrel@cosmo.uchicago.edu>
References: <33717.128.135.52.6.1470954497.squirrel@cosmo.uchicago.edu>
On Thu, 11 Aug 2016 17:28:17 -0500 (CDT), Valeri Galtsev wrote:
> Dear Experts,
>
> Could someone recommend a script or utility one can run from command line
> on Linux or UNIX machine to make a snapshot image of webpage?

When you say "snapshot", what exactly do you mean? I'm not sure
I understand your description correctly. Is a snapshot (a) a
_visual_ snapshot (image format or PDF) of how the web page
renders inside a web browser, or (b) an exact local _copy_
(files and directories) on your disk?

For option (a), lang/phantomjs has been suggested. Check the
mailing list archives - I've been asking that kind of question
some years ago, but I cannot remember (or even find) the
answers I got. ;-)

For option (b), wget probably isn't bad, as long as you add
some options to avoid unneeded traffic, such as

	% wget -r -l 0 -k -nc <source>

If you are interested only in a specific sub-path, or a subset
of file types (or want to reject certain types), use the -A or
-R options. Use -U to set the user agent string to that of a
"real" web browser if needed. See "man wget" for details.

This set of options should provide the ability to only "snapshot"
those elements of the web page content that have changed. Things
you already have on your local disk won't be downloaded again.
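To make option (a) a bit more concrete, a minimal PhantomJS
script could look like the following. This is an untested
sketch; the URL, viewport size and output file name are just
placeholders you'd replace with your own:

	// snapshot.js - load a page and render it to an image file
	var page = require('webpage').create();
	page.viewportSize = { width: 1280, height: 1024 };
	page.open('http://www.example.com/', function(status) {
		if (status !== 'success') {
			console.log('Failed to load page');
			phantom.exit(1);
		}
		// render() picks the format from the file
		// extension; .pdf works too if you want a PDF
		page.render('snapshot.png');
		phantom.exit(0);
	});

Run it with:

	% phantomjs snapshot.js

And for option (b), a combined wget invocation might look like
this (again just an illustration with a placeholder URL), here
mirroring only HTML pages and images while pretending to be a
common browser:

	% wget -r -l 0 -k -nc -A 'html,png,jpg,gif' \
		-U 'Mozilla/5.0' http://www.example.com/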
-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...