Date: Sun, 4 Mar 2001 10:56:18 +0000
From: Nik Clayton <nik@freebsd.org>
To: Stefan `Sec` Zehl <sec@42.org>
Cc: doc@FreeBSD.ORG
Subject: Re: cvs commit: www/en Makefile
Message-ID: <20010304105618.A300@canyon.nothing-going-on.org>
In-Reply-To: <20010303173639.B25057@matrix.42.org>; from sec@42.org on Sat, Mar 03, 2001 at 05:36:39PM +0100
References: <200102241031.f1OAVTZ82598@freefall.freebsd.org> <20010225064044.A68105@canyon.nothing-going-on.org> <20010227122027.A2079@paula.panke.de.freebsd.org> <20010227121401.A2631@canyon.nothing-going-on.org> <20010228224508.A2745@paula.panke.de.freebsd.org> <20010228233653.A1692@canyon.nothing-going-on.org> <20010303173639.B25057@matrix.42.org>
On Sat, Mar 03, 2001 at 05:36:39PM +0100, Stefan `Sec` Zehl wrote:
> On Wed, Feb 28, 2001 at 11:36:53PM +0000, Nik Clayton wrote:
> > On Wed, Feb 28, 2001 at 10:45:08PM +0100, Wolfram Schneider wrote:
> > > Symlinks on a web server are evil. It hurts. Don't do that! Period!!!
> >
> > References? Granted, it's been about 18 months since I was doing web
> > work professionally, but that's not a view I've heard espoused with such
> > vehemence. It's true that depending on the server configuration you
> > might incur an extra lstat(2) call, but that's about it.
> >
> > If it's a huge problem, we can always make .../{FAQ, handbook, tutorials}
> > be real directories, and then populate them with hardlinks instead.
> >
> > Either way, the content is only on the disk once, rather than in multiple
> > places.
>
> Symlinks _are_ evil. The alternate paths will (eventually) get linked
> somewhere. This will induce more load by the Webspiders which find
> everything twice. These alternate locations will pollute the caches, too.
> The pages will show up in duplicate.
robots.txt solves all these problems.
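To spell that out: a robots.txt at the document root can exclude the legacy locations from indexing, so spiders and caches only ever see the canonical doc/ tree. A minimal sketch (the exact set of excluded paths here is illustrative):

```
# Hypothetical robots.txt: keep crawlers out of the symlinked
# legacy locations so only the canonical tree is indexed.
User-agent: *
Disallow: /handbook/
Disallow: /FAQ/
Disallow: /tutorials/
```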
> And last of all, you can't tell it's
> a symlink, which means this breaks down when mirroring via wget/webcopy.
You shouldn't be mirroring like that, you should be pulling down the
www/ and doc/ repositories, and building the site locally.
<snip>
> If you really must do it, put rules in the webserver config to disallow
> access to all the alternate paths except one.
We can't do that. The whole point is to put *all* the documentation
somewhere central, whilst maintaining support for legacy URLs like
/handbook/ and /FAQ/. I think we can do this in one of three ways:
1. Use Alias or similar in the webserver config file.
Pro: Uses very little disk space.
Con: Has all the problems you outline above, in terms of the same
content being available from multiple URLs.
Con: Means that our mirrors have to know what our web server
config file looks like.
Con: Means that if you try to test the website locally you need to
be running a webserver in order to check everything.
2. Install the same content multiple times in the web tree.
Pro: Very simple to do.
Pro: Means content will work when testing locally.
Con: Additional disk space taken up by duplicated content.
3. Use symlinks.
Pro: All the advantages of (2), without the extra disk space.
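For what it's worth, option (1) under Apache would be a couple of mod_alias directives; the filesystem paths below are made up for illustration, not our actual layout:

```
# Hypothetical httpd.conf fragment for option (1): map the legacy
# URLs onto the canonical doc/ tree in the server config.
Alias /handbook/ /usr/local/www/data/doc/books/handbook/
Alias /FAQ/      /usr/local/www/data/doc/books/faq/
```

Note this illustrates the Con above: a mirror can't reproduce these URLs without also copying our server config.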
The alternative is to continue kludging documentation into our existing
structure on the website. This structure boils down to:
1. "Important" documents are accessible from the document root
(/handbook, /FAQ).
2. Everything else comes under a tutorials/ section.
This categorisation doesn't work. It used to work when we only had the
FAQ, the Handbook, and a couple of other small documents, but the doc/
repo is growing. The second chapter of "The Design and Implementation
of 4.4BSD" shouldn't (IMHO) appear directly under the document root, nor
is it a tutorial. Ditto for the FDP Primer, the Porter's Handbook, the
Committer's Guide, the Developer's Handbook, ... . I hope to have a
chapter of "The FreeBSD Corporate Networker's Guide" up soon as well,
which won't fit into the existing structure.
We have a structure under doc/ that works and that is very easy to classify
documentation into. The paths are a little long, but most people are
never going to be typing those paths in -- they'll either be clicking on
links on our site, or links returned from a search engine, or entries in
their bookmarks, so I think this is a non-issue.
The only thing we absolutely *must* do is make sure that existing URLs
continue to work. And I think the best way to do this is with a
combination of symlinks and a robots.txt file that stops search engines
from indexing the linked content.
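As a sketch of how little machinery option (3) needs (the directory names here are illustrative, not the real layout of the FreeBSD web tree):

```shell
#!/bin/sh
# Sketch of option (3): one canonical copy of the content under doc/,
# with a symlink providing the legacy URL.  Illustrative paths only.
set -e
root=$(mktemp -d)

# The canonical location, the only place the content lives on disk.
mkdir -p "$root/doc/books/handbook"
echo "handbook content" > "$root/doc/books/handbook/index.html"

# The legacy URL /handbook/ is just a symlink to the canonical tree.
ln -s doc/books/handbook "$root/handbook"

# The same file is reachable from both paths, but stored only once.
cat "$root/handbook/index.html"    # prints: handbook content

rm -rf "$root"
```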
N
--
FreeBSD: The Power to Serve http://www.freebsd.org/
FreeBSD Documentation Project http://www.freebsd.org/docproj/
--- 15B8 3FFC DDB4 34B0 AA5F 94B7 93A8 0764 2C37 E375 ---
