From: Polytropon
To: galtsev@kicp.uchicago.edu
Cc: freebsd-questions@freebsd.org
Date: Fri, 12 Aug 2016 10:04:31 +0200
Subject: Re: script to make webpage snapshot

On Thu, 11 Aug 2016 17:28:17 -0500 (CDT), Valeri Galtsev wrote:
> Dear Experts,
>
> Could someone recommend a script or utility one can run from command line
> on Linux or UNIX machine to make a snapshot image of webpage?

When you say "snapshot", what exactly do you mean? I'm not sure
I understand your description correctly. Is a snapshot

(a) a _visual_ snapshot (image format or PDF) of how the web
    page renders inside a web browser, or

(b) an exact local _copy_ (files and directories) on your disk?

For option (a), lang/phantomjs has been suggested. Check the
mailing list archives - I asked that kind of question some
years ago, but I cannot remember (or even find) the answers
I got. ;-)

For option (b), wget probably isn't bad, as long as you add
some options to avoid unneeded traffic, such as

	% wget -r -l 0 -k -nc <URL>

If you are interested only in a specific sub-path, or a subset
of file types (or want to reject them), use the -A or -R
options. Use -U to set the user agent string to that of a
"real" web browser if needed. See "man wget" for details.

With this set of options, repeated runs only fetch what is
missing locally: -nc skips every file that already exists on
your disk, so things you already have won't be downloaded
again. Note that -nc does not check whether the remote file
has changed in the meantime; if you need that, have a look at
-N (timestamping) instead, which cannot be combined with -nc.

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
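
The wget invocation for option (b) could be wrapped in a small shell
function so that the optional -A and -U settings are only added when
needed. This is just a sketch, not a tested mirroring tool: the function
name "snapshot", the DRY_RUN variable and the example.org URL are made
up for illustration, and only the wget options already mentioned in this
message are used.

```shell
#!/bin/sh
# Hypothetical helper: assemble the wget call for a local copy.
#   $1  start URL (required)
#   $2  optional comma-separated accept list for -A, e.g. "html,css"
#   $3  optional user agent string for -U
# If DRY_RUN is set to a non-empty value, the command is printed
# instead of being executed.
snapshot() {
    url=$1
    accept=${2:-}
    agent=${3:-}

    # Base options: recursive, unlimited depth, convert links,
    # skip files that already exist locally.
    set -- wget -r -l 0 -k -nc

    if [ -n "$accept" ]; then
        set -- "$@" -A "$accept"
    fi
    if [ -n "$agent" ]; then
        set -- "$@" -U "$agent"
    fi
    set -- "$@" "$url"

    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Setting DRY_RUN=1 before calling, for example,
snapshot http://example.org/docs/ 'html,css'
only prints the assembled command line, so the options can be checked
before any traffic is generated.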