Date:      Mon, 08 Dec 1997 20:48:21 -0700
From:      Duane Wessels <wessels@nlanr.net>
To:        John Fieber <jfieber@indiana.edu>
Cc:        www@FreeBSD.ORG, kostas@nlanr.net
Subject:   Re: URNs, Mirror sites, and Squid 
Message-ID:  <199712090348.UAA28336@surf>
In-Reply-To: Your message of Mon, 08 Dec 1997 20:04:07 -0500

John Fieber writes:

>On Mon, 8 Dec 1997, Duane Wessels wrote:
>
>> For more details, see http://squid.nlanr.net/Squid/urn-support.html,
>> or please ask us for clarification.
>
>www.freebsd.org has the "FreeBSD" pages, which are mirrored
>around the world, but it also has quite a few pages that are not
>mirrored, and requests for those should not go to a mirror.
>Personal home pages (/~foobar/...) are one example, but there
>are others.  Do you have a canned script that has a
>relatively simple framework for handling these sorts of
>exceptions?

I do now.  It makes the script a bit more complex, but not too much
I hope.  

In fact, here's a script which I think will work for the FreeBSD site.
I just copied your list of mirrors from your home page.

Interestingly, your mirrors list brings out a small problem.  Some
of the entries end with 'index.html' or 'freebsd.html'.  This simple
script assumes that you can do straight mappings and substring
replacements.  We don't want to map 'urn:www.freebsd.org:/foo' to
'http://www.xx.freebsd.org/index.html/foo'.  I guess we'll have to
see how much of a problem that really becomes.
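
To make the concern concrete, here is a minimal sketch of a straight
prefix substitution and of one possible workaround.  The names map_urn
and dir_prefix and the mirror entry 'http://www.xx.freebsd.org/index.html'
are made up for illustration; the script below contains neither helper
and does not trim file names.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): swap the URN prefix for a
# URL prefix, i.e. a straight substring replacement.
sub map_urn {
	my ($request, $urn_prefix, $url_prefix) = @_;
	return unless index($request, $urn_prefix) == 0;	# must match at the start
	return $url_prefix . substr($request, length($urn_prefix));
}

# Hypothetical workaround: drop a trailing '*.html' file name so that only
# the directory part of a mirror entry is used as the URL prefix.
sub dir_prefix {
	my ($url) = @_;
	$url =~ s{[^/]+\.html$}{};
	return $url;
}

my $urn   = 'urn:www.freebsd.org:/foo';
my $entry = 'http://www.xx.freebsd.org/index.html';	# made-up mirror entry

# Used as-is, the entry produces a broken URL.
print map_urn($urn, 'urn:www.freebsd.org:/', $entry), "\n";

# With the file name trimmed, the mapping comes out as intended.
print map_urn($urn, 'urn:www.freebsd.org:/', dir_prefix($entry)), "\n";
```

Trimming only '*.html' names is an assumption that happens to fit the
two cases mentioned above; a real mirrors list might need other rules.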

Duane W.

==============================================================================
#!/usr/local/bin/perl

print "content-type: text/plain\r\n";
print "Expires: ", &http_time(time+3600), "\r\n";
print "\r\n";

if ($ENV{'REQUEST_METHOD'} eq "POST") {
	read(STDIN, $request, $ENV{'CONTENT_LENGTH'});
} elsif ($ENV{'REQUEST_METHOD'} eq "GET" ) {
	$request = $ENV{'QUERY_STRING'};
}
$request = &url_decode($request);

#
# special hack; turn 'urn:foo' into 'urn:foo:/'
# but this doesn't yet work with Squid (1.2.beta9), i.e. Squid
# won't call this script unless the second colon is present.
#
$request .= ':/' unless ($request =~ /([^:]+):([^:]+):/);

$state = 0;
while (<DATA>) {
	chomp;			# remove only the trailing newline
	s/#.*//;
	next unless (/./);
	if ($state == 0) {
		next if (/^\s/);	# skip indented lines
		$URN = $_;
		$state = 1 if (index($request, $URN) == 0);	# URN must match as a prefix
	}
	if ($state == 1) {
		next unless (/^\s/);	# skip non-indented lines
		$state = 2;
	}
	if ($state == 2) {
		last unless (/^\s/);	# exit on next non-indented line
		s/^\s+//;
		$URL = $_;
		print $URL . substr($request, length($URN)) . "\n";
	}
}

exit 0;

sub url_decode {
	local($_) = @_;
	tr/+/ /;
	s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;	# decode %XX as an unsigned byte
	$_;
}

sub http_time {
	local($t) = @_;
	local(@T) = gmtime($t);
	local(@WD) = ('Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat');
	local(@MO) = ( 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec');
	sprintf "%s, %02d %s %d %02d:%02d:%02d GMT",
		$WD[$T[6]],
		$T[3],
		$MO[$T[4]],
		$T[5] + 1900,	# gmtime() returns the year minus 1900
		$T[2],$T[1],$T[0];
}


# The following lines are read as <DATA> above.  The data consists of any
# number of "sections."  Each "section" consists of two parts:  first URN
# prefixes, then URL prefixes indented with whitespace.  e.g.:
#
#	URN1
#	URN2
#		URL1
#		URL2
#		URL3
#		URL4
#
# If one of the specified URN prefixes matches the requested URN,
# then, for every listed URL prefix, the URL prefix is substituted for
# the URN prefix in the request.  We only process one "section" for each
# URN request.  Thus, more-specific subsets of URN-space should be
# specified before less-specific ones.
__END__
#
# Twiddle directories are not mirrored
urn:www.freebsd.org:/~
urn:www.freebsd.org:/%7e
	http://www.freebsd.org/%7e
#
# mirrors for our Web/HTTP site.
#
urn:www.freebsd.org:/
	http://www.ar.freebsd.org/
	http://www.au.freebsd.org/FreeBSD/
	http://www2.au.freebsd.org/
	http://www3.au.freebsd.org/
	http://www.br.freebsd.org/www.freebsd.org/
	http://www2.br.freebsd.org/www.freebsd.org/
	http://www3.br.freebsd.org/
	http://www.br.freebsd.org/
	http://www2.br.freebsd.org/
	http://www.ca.freebsd.org/
	http://www.cz.freebsd.org/
	http://sunsite.auc.dk/www.freebsd.org/
	http://www.ee.freebsd.org/
	http://www.fi.freebsd.org/
	http://www.fr.freebsd.org/
	http://www.de.freebsd.org/
	http://www.de.freebsd.org/de/
	http://www.hu.freebsd.org/
	http://www.hu.freebsd.org/hu/
	http://www.is.freebsd.org/
	http://www.ie.freebsd.org/
	http://www.it.freebsd.org/
	http://www.jp.freebsd.org/www.freebsd.org/
	http://www.jp.freebsd.org/
	http://www.kr.freebsd.org/
	http://www.lv.freebsd.org/
	http://www.nl.freebsd.org/
	http://www.pl.freebsd.org/
	http://www.pt.freebsd.org/
	http://www2.pt.freebsd.org/
	http://www3.pt.freebsd.org/
	http://www.ru.freebsd.org/
	http://www2.ru.freebsd.org/
	http://www3.ru.freebsd.org/
	http://www.za.freebsd.org/
	http://www2.za.freebsd.org/
	http://www.se.freebsd.org/www.freebsd.org/
	http://www.tw.freebsd.org/
	http://www.ua.freebsd.org/
	http://www2.ua.freebsd.org/
	http://www.uk.freebsd.org/
	http://www.freebsd.org/
	http://www6.freebsd.org/
	http://www7.freebsd.org/
	http://www2.freebsd.org/


