Date:      Sun, 4 Mar 2001 10:56:18 +0000
From:      Nik Clayton <nik@freebsd.org>
To:        Stefan `Sec` Zehl <sec@42.org>
Cc:        doc@FreeBSD.ORG
Subject:   Re: cvs commit: www/en Makefile
Message-ID:  <20010304105618.A300@canyon.nothing-going-on.org>
In-Reply-To: <20010303173639.B25057@matrix.42.org>; from sec@42.org on Sat, Mar 03, 2001 at 05:36:39PM +0100
References:  <200102241031.f1OAVTZ82598@freefall.freebsd.org> <20010225064044.A68105@canyon.nothing-going-on.org> <20010227122027.A2079@paula.panke.de.freebsd.org> <20010227121401.A2631@canyon.nothing-going-on.org> <20010228224508.A2745@paula.panke.de.freebsd.org> <20010228233653.A1692@canyon.nothing-going-on.org> <20010303173639.B25057@matrix.42.org>

On Sat, Mar 03, 2001 at 05:36:39PM +0100, Stefan `Sec` Zehl wrote:
> On Wed, Feb 28, 2001 at 11:36:53PM +0000, Nik Clayton wrote:
> > On Wed, Feb 28, 2001 at 10:45:08PM +0100, Wolfram Schneider wrote:
> > > Symlinks on a web server are evil. It hurt. Don't do that! Period!!!
> >
> > References?  Granted, it's been about 18 months since I was doing web
> > work professionally, but that's not a view I've heard espoused with such
> > vehemence.  It's true that depending on the server configuration you
> > might incur an extra lstat(2) call, but that's about it.
> >
> > If it's a huge problem, we can always make .../{FAQ, handbook, tutorials}
> > be real directories, and then populate them with hardlinks instead.
> >
> > Either way, the content is only on the disk once, rather than in multiple
> > places.
>
> Symlinks _are_ evil. The alternate paths will (eventually) get linked
> somewhere. This will induce more load by the Webspiders which find
> everything twice. These alternate locations will pollute the caches, too.
> The pages will show up in duplicate.

robots.txt solves all these problems.

> And last of all, you can't tell it's
> a symlink, which means this breaks down when mirroring via wget/webcopy.

You shouldn't be mirroring like that; you should be pulling down the
www/ and doc/ repositories, and building the site locally.

<snip>

> If you really must do it, put rules in the webserver config to disallow
> access to all the alternate paths except one.

We can't do that.  The whole point is to put *all* the documentation
somewhere central, whilst maintaining support for legacy URLs like
/handbook/ and /FAQ/.  I think we can do this in one of three ways:

  1.  Use Alias or similar in the webserver config file.

      Pro: Uses very little disk space.

      Con: Has all the problems you outline above, in terms of the same
      content being available from multiple URLs.

      Con: Means that our mirrors have to know what our web server
      config file looks like.

      Con: Means that if you try to test the website locally you need to
      be running a webserver in order to check everything.

  2.  Install the same content multiple times in the web tree.

      Pro: Very simple to do.

      Pro: Means content will work when testing locally.

      Con: Additional disk space taken up by duplicated content.

  3.  Use symlinks.

      Pro: All the advantages of (2), without the Con (the content is
      only on the disk once).
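
As a rough illustration of option (3), and of the hardlink variant
mentioned earlier in the thread, here is a minimal Python sketch.  It is
not taken from the www/ Makefile.  The directory names (doc/handbook,
handbook-hard) and the helper build_tree() are hypothetical, and it
assumes a POSIX filesystem that supports both symlinks and hard links.

```python
import os
import tempfile


def build_tree(root):
    """Create a tiny stand-in web tree: one real page plus two legacy aliases."""
    # Hypothetical layout; the real doc/ paths are not given in this message.
    real_dir = os.path.join(root, "doc", "handbook")
    os.makedirs(real_dir)
    page = os.path.join(real_dir, "index.html")
    with open(page, "w") as f:
        f.write("<html>handbook</html>\n")

    # Symlink variant: a relative link, so the tree still works if it is moved.
    symlink_dir = os.path.join(root, "handbook")
    os.symlink(os.path.join("doc", "handbook"), symlink_dir)

    # Hardlink variant: a real directory populated with hard links to the files.
    hard_dir = os.path.join(root, "handbook-hard")
    os.mkdir(hard_dir)
    os.link(page, os.path.join(hard_dir, "index.html"))
    return page, symlink_dir, hard_dir


with tempfile.TemporaryDirectory() as root:
    page, symlink_dir, hard_dir = build_tree(root)
    via_symlink = os.path.join(symlink_dir, "index.html")
    via_hardlink = os.path.join(hard_dir, "index.html")

    # All three names refer to the same file, so the content is stored once.
    same_via_symlink = os.path.samefile(page, via_symlink)
    same_via_hardlink = os.path.samefile(page, via_hardlink)

    # islink() inspects the link itself (lstat); stat() follows it.
    is_link = os.path.islink(symlink_dir)
    link_count = os.stat(page).st_nlink  # original name + one hard link
```

Both variants keep a single copy of the data.  The difference is that a
symlink is visible as such to anything that calls lstat(2), whereas the
hard-linked file is indistinguishable from the original.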

The alternative is to continue kludging documentation into our existing
structure on the website.  This structure boils down to:

  1.  "Important" documents are accessible from the document root
      (/handbook, /FAQ).

  2.  Everything else comes under a tutorials/ section.

This categorisation doesn't work.  It used to work when we only had the
FAQ, the Handbook, and a couple of other small documents, but the doc/
repo is growing.  The second chapter of "The Design and Implementation
of 4.4BSD" shouldn't (IMHO) appear directly under the document root, nor
is it a tutorial.  Ditto for the FDP Primer, the Porter's Handbook, the
Committer's Guide, the Developer's Handbook, and so on.  I hope to have a
chapter of "The FreeBSD Corporate Networker's Guide" up soon as well,
which won't fit into the existing structure.

We have a structure under doc/ that works and that is very easy to classify
documentation into.  The paths are a little long, but most people are
never going to be typing those paths in -- they'll either be clicking on
links on our site, or links returned from a search engine, or entries in
their bookmarks, so I think this is a non-issue.

The only thing we absolutely *must* do is make sure that existing URLs
continue to work.  And I think the best way to do this is with a
combination of symlinks and a robots.txt file that stops search engines
from indexing the linked content.
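
To make the robots.txt half of that concrete, here is a small Python
sketch that checks a candidate rule set with the standard library's
urllib.robotparser.  The rules themselves are an assumption: which copy
to hide from crawlers is a policy choice, and this example hides the
legacy /handbook/ and /FAQ/ paths while leaving a hypothetical /doc/
tree crawlable.  The helper crawlable() is also just for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: hide the symlinked legacy paths from all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /handbook/
Disallow: /FAQ/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path):
    """Return True if a well-behaved crawler may fetch this path."""
    return parser.can_fetch("*", "http://www.freebsd.org" + path)


for path in ("/handbook/index.html", "/FAQ/", "/doc/handbook/index.html"):
    print(path, "allowed" if crawlable(path) else "blocked")
```

The legacy URLs still resolve for people following old links or
bookmarks; robots.txt only asks crawlers that honour it not to fetch
them.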

N
-- 
FreeBSD: The Power to Serve             http://www.freebsd.org/
FreeBSD Documentation Project           http://www.freebsd.org/docproj/

          --- 15B8 3FFC DDB4 34B0 AA5F  94B7 93A8 0764 2C37 E375 ---
