Date:      Sat, 28 Jul 2012 22:14:11 +0100
From:      "Simon L. B. Nielsen" <simon@FreeBSD.org>
To:        Glen Barber <gjb@glenbarber.us>
Cc:        webmaster@freebsd.org, Glen Barber <gjb@FreeBSD.org>, World Wide Web Owner <www@FreeBSD.org>, FreeBSD Documentation Masters <doceng@FreeBSD.org>
Subject:   Re: Removal of old/outdated files from www.FreeBSD.org site
Message-ID:  <48CC53EB-AB26-4F52-99AA-7D4ED0B8F85F@FreeBSD.org>
In-Reply-To: <40854dbc-f4c1-4609-9f48-791a1886c0c9@email.android.com>
References:  <20120728041732.GH1485@glenbarber.us> <9B7CD8B1-42CB-487D-9C27-C9F6D39CD600@FreeBSD.org> <40854dbc-f4c1-4609-9f48-791a1886c0c9@email.android.com>

On 28 Jul 2012, at 18:08, Glen Barber wrote:

> "Simon L. B. Nielsen" <simon@FreeBSD.org> wrote:
>
>> On 28 Jul 2012, at 05:17, Glen Barber wrote:
>>
>> [Stale files]
>>
>>> http://www.freebsd.org/doc/en/books/porters-handbook/x5798.html
>>> http://www.freebsd.org/doc/en/books/porters-handbook/x5802.html
>>> http://www.freebsd.org/doc/en/books/porters-handbook/x5834.html
>>>
>>> If someone from clusteradm@ (or someone else with access to the machine
>>> on which the documentation build output exists) can remove these old
>>> files so they are not archived by search engines (since in some cases,
>>> old file names can indicate very old files, and worse, very old
>>> information that could be potentially dangerous to a user looking for
>>> specific information), I would greatly appreciate it.
>>
>> The problem is that it's not a simple thing to do. Our build installs
>> with the option to not install if files are identical, so timestamps
>> can't be used alone.
>
> Ah, I did forget about that.
>
>> E.g.
>> http://www.freebsd.org/doc/en/books/porters-handbook/TRADEMARKS.html :
>>
>> -r--r--r--  1 www  wwwadm  4494 Jan  9  2010 TRADEMARKS.html
>>=20
>> I have removed the mentioned files but I don't have time to do a full
>> sweep as I might end up deleting too much.
>
> Ok, thank you.  My big concern is if someone "accidentally" finds an
> old document and does something potentially dangerous to their system.

I agree they should be removed.

As a reference, the build script is at:
http://svnweb.freebsd.org/doc/head/share/tools/webupdate

So anyone wanting to try and fix that can start by reading that. The
simple brute-force solution would be, e.g., a weekly install to a
separate dir, followed by a check for files that should not be in the
dir we serve www.freebsd.org from.
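
To make the brute-force idea concrete, here is a minimal sketch of the
comparison step. It is not taken from webupdate; the function names are
made up for illustration, and it assumes the weekly install has already
been written to its own directory:

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def stale_files(served_root, fresh_root):
    """List files in the served tree that a fresh install did not produce.

    Only names are compared, not timestamps, because the regular install
    skips identical files and so leaves old mtimes on current content.
    """
    return sorted(relative_files(served_root) - relative_files(fresh_root))
```

Calling stale_files() with the served directory and the fresh install
directory would give a list of candidates to review before deleting
anything.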

Another solution might be to make the weekly full build install to a
different dir and switch the clean and the old dir... but I slightly
worry that any error in the script will result in no content on www.
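
One way to reduce that risk would be to refuse the switch when the
fresh tree looks wrong. The following is only a sketch under my own
assumptions: the function name, the min_files threshold and the backup
directory are hypothetical, and the two renames are not atomic, so
there is a short window where the live dir does not exist.

```python
import os


def swap_in_new_tree(live, fresh, backup, min_files=1):
    """Move live to backup and fresh to live, unless fresh looks empty."""
    count = sum(len(files) for _d, _s, files in os.walk(fresh))
    if count < min_files:
        # A failed build should never replace the content we serve.
        raise RuntimeError("refusing to swap: only %d files in %s" % (count, fresh))
    if os.path.exists(backup):
        raise RuntimeError("backup dir already exists: %s" % backup)
    os.rename(live, backup)
    try:
        os.rename(fresh, live)
    except OSError:
        # Put the old content back if the second rename fails.
        os.rename(backup, live)
        raise
```

A sensible min_files would have to be picked from the size of a known
good build; the default of 1 only guards against a completely empty
tree.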

>>> Furthermore, if someone with the appropriate access can provide a list
>>> of similarly-named files (which also likely can be fixed with adding a
>>> section id to the source), I will personally fix the section id so these
>>> files do not occur again.  (It would be even more helpful if the files
>>> could be provided as an attachment so I can view the source to track
>>> down from where they are being generated.)
>>
>> There is no need for special access to do that. Just build all the docs
>> in html-split and find xNNNN.html files. It's a regular thing which has
>> to be done as people forget when adding new content. I also remember
>> hunting down those files when I was more active in doc.
>
> I will look into this for a permanent solution then.  It is difficult
> to spot unless local changes are made though.  But, 'make clean'
> followed by 'svn stat' will reveal these edge cases.

Hmm, how is it difficult to spot? A build of a document should never
ever produce an xNNNN.html file. If it does, a sect1 is missing an id.

Or am I missing something here?
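
For illustration, the check for auto-generated names after an
html-split build could look roughly like this. The function name and
the exact pattern (an "x" followed only by digits) are my assumptions,
based on the file names quoted earlier in this thread:

```python
import os
import re

# Matches names like x5798.html, but not xorg.html or index.html.
AUTO_NAME = re.compile(r"^x\d+\.html$")


def auto_named_pages(build_root):
    """Return paths of xNNNN.html files found anywhere under build_root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(build_root):
        for name in filenames:
            if AUTO_NAME.match(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Any path it returns would point at a document where a section still
needs an id.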

If you don't want to build everything, you could also just

-- 
Simon L. B. Nielsen



