From: Wayne Pascoe
To: freebsd-questions@freebsd.org
Date: Wed, 19 Mar 2003 11:18:48 +0000
Subject: Checking an out of date website
Message-ID: <20030319111848.GA76626@marvin.penguinpowered.org.uk>

Hi all,

I'm wondering if someone can recommend a tool to help me do the following:

I have a very large website (2.4GB), much of whose content is out of date or no longer used. What I want is a tool that will download a copy of the website, so that I can see which parts are still linked to and part of the site hierarchy. If I match the list of files from this tool against the list of files on the server, the difference should be all the files that are not linked to from the site and cannot be navigated to. In theory, I could then safely delete these files.
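
The comparison described above could be sketched roughly as follows. This is only an illustration, assuming the crawler can export the URLs it reached as a plain list and that URL paths map directly onto files under the document root. The function names and the `index.html` default are hypothetical and do not come from any particular tool.

```python
import os
from urllib.parse import urlsplit, unquote


def server_files(docroot):
    """Return every file under docroot as a '/'-separated path relative to docroot."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(docroot):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), docroot)
            found.add(rel.replace(os.sep, "/"))
    return found


def crawled_files(urls, index_name="index.html"):
    """Map crawled URLs to docroot-relative paths.

    Assumption: directory URLs are served from index_name, and the
    query string and fragment do not select a different file.
    """
    found = set()
    for url in urls:
        path = unquote(urlsplit(url).path).lstrip("/")
        if path == "" or path.endswith("/"):
            path += index_name
        found.add(path)
    return found


def unlinked_files(docroot, urls):
    """Files present on the server that the crawl never reached."""
    return sorted(server_files(docroot) - crawled_files(urls))
```

The result is only a list of candidates for deletion: it is exactly as complete as the crawler's URL list, so anything the crawler could not reach will show up as unlinked even if it is still in use.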

I've found a couple of tools that can mostly do this. My problem is that much of the navigation is done in Flash, so most link checkers and spiders can't follow these links. This could mean that what I end up with is not the whole of the site that is still active.

Does anyone have any advice for a solution to this problem?

Thanks in advance,

--
Wayne Pascoe