Date: Sat, 28 Jul 2012 22:14:11 +0100
From: "Simon L. B. Nielsen" <simon@FreeBSD.org>
To: Glen Barber <gjb@glenbarber.us>
Cc: webmaster@freebsd.org, Glen Barber <gjb@FreeBSD.org>, World Wide Web Owner <www@FreeBSD.org>, FreeBSD Documentation Masters <doceng@FreeBSD.org>
Subject: Re: Removal of old/outdated files from www.FreeBSD.org site
Message-ID: <48CC53EB-AB26-4F52-99AA-7D4ED0B8F85F@FreeBSD.org>
In-Reply-To: <40854dbc-f4c1-4609-9f48-791a1886c0c9@email.android.com>
References: <20120728041732.GH1485@glenbarber.us> <9B7CD8B1-42CB-487D-9C27-C9F6D39CD600@FreeBSD.org> <40854dbc-f4c1-4609-9f48-791a1886c0c9@email.android.com>

On 28 Jul 2012, at 18:08, Glen Barber wrote:

> "Simon L. B. Nielsen" <simon@FreeBSD.org> wrote:
>
>> On 28 Jul 2012, at 05:17, Glen Barber wrote:
>>
>> [Stale files]
>>
>>> http://www.freebsd.org/doc/en/books/porters-handbook/x5798.html
>>> http://www.freebsd.org/doc/en/books/porters-handbook/x5802.html
>>> http://www.freebsd.org/doc/en/books/porters-handbook/x5834.html
>>>
>>> If someone from clusteradm@ (or someone else with access to the machine
>>> on which the documentation build output exists) can remove these old
>>> files so they are not archived by search engines (since in some cases,
>>> old file names can indicate very old files, and worse, very old
>>> information that could be potentially dangerous to a user looking for
>>> specific information), I would greatly appreciate it.
>>
>> The problem is that it's not a simple thing to do. Our build installs
>> with the option to not install if files are identical, so timestamps
>> can't be used alone.
>
> Ah, I did forget about that.
>
>> E.g.
>> http://www.freebsd.org/doc/en/books/porters-handbook/TRADEMARKS.html :
>>
>> -r--r--r-- 1 www wwwadm 4494 Jan 9 2010 TRADEMARKS.html
>>
>> I have removed the mentioned files but I don't have time to do a full
>> sweep as I might end up deleting too much.
>
> Ok, thank you. My big concern is if someone "accidentally" finds an
> old document and does something potentially dangerous to their system.

I agree they should be removed.

As a reference, the build script is at:

http://svnweb.freebsd.org/doc/head/share/tools/webupdate

So anyone wanting to try and fix that can start reading that. The simple
brute force solution would e.g. be a weekly install to a separate dir
and then check which files should not be in the dir we serve
www.freebsd.org off.

Another solution might be to make the weekly full build install to a
different dir and switch the clean and the old dir...
but I slightly worry that any error in the script will result in no
content on www.

>>> Furthermore, if someone with the appropriate access can provide a list
>>> of similarly-named files (which also likely can be fixed with adding a
>>> section id to the source), I will personally fix the section id so these
>>> files do not occur again. (It would be even more helpful if the files
>>> could be provided as an attachment so I can view the source to track
>>> down from where they are being generated.)
>>
>> There is no need for special access to do that. Just build all the docs
>> in html-split and find xNNNN.html files. It's a regular thing which has
>> to be done as people forget when adding new content. I also remember
>> hunting down those files when I was more active in doc.
>
> I will look into this for a permanent solution then. It is difficult
> to spot unless local changes are made though. But, 'make clean'
> followed by 'svn stat' will reveal these edge cases.

Hmm, how is it difficult to spot? A build of a document should never
ever produce an xNNNNN.html file. If it does, a sect1 is missing an id.
Or am I missing something here?

If you don't want to build everything, you could also just

-- 
Simon L. B. Nielsen
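
The two checks discussed in this message could be sketched roughly as below: finding xNNNN.html files in an html-split build, and listing files that exist in the served directory but not in a fresh install to a separate directory. This is only an illustrative sketch and is not taken from the webupdate script. The function names and any directory paths are hypothetical, and the filename pattern ("x" followed by digits, then ".html") is an assumption based on the x5798.html examples in the thread.

```python
import os
import re

# Assumption: auto-generated split-HTML names look like x1234.html
# ("x" followed by digits), as in the x5798.html examples in the thread.
AUTO_NAME = re.compile(r"^x\d+\.html$")


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def find_auto_named(build_root):
    """Return sorted relative paths of xNNNN.html files under build_root.

    Each hit suggests a section that is missing an id in the source.
    """
    return sorted(
        path for path in relative_files(build_root)
        if AUTO_NAME.match(os.path.basename(path))
    )


def find_stale(served_root, fresh_root):
    """Return sorted files present in served_root but absent from fresh_root.

    Compares names only, so it does not depend on timestamps.
    """
    return sorted(relative_files(served_root) - relative_files(fresh_root))
```

Calling find_stale() with the served directory and a fresh install directory only reports candidates; nothing is deleted. Reviewing that list by hand before removing anything is one way to limit the risk of deleting too much.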