Date:      Mon, 07 Apr 2008 20:46:17 -0300
From:      João Carlos Mendes Luís <jonny@jonny.eng.br>
To:        pav@FreeBSD.org
Cc:        hubs@FreeBSD.org
Subject:   Re: package distribution crisis - CDN needed
Message-ID:  <47FAB249.2020001@jonny.eng.br>
In-Reply-To: <1207605059.1031.38.camel@ikaros.oook.cz>
References:  <1207605059.1031.38.camel@ikaros.oook.cz>

Pav Lucistnik wrote:
> Okay the situation recently was that the mirrors had no chance keeping
> up with all the package sets I've been uploading to ftp-master.
>
> We clearly need to move beyond rsync/cvsup synced ftp mirrors. This does
> not scale.
>
> I do propose a creation of a CDN (Content Delivery Network), having
> these features:
>
> - no mirroring of a complete package set! (Also no directory listings.)
>   When client requests the file, and the file is not in the local cache,
>   the file is downloaded from the upstream server and while it's being
>   obtained, it's already being sent to the client. This is basically
>   squid.
>
> - if the file is present in the local cache, it's returned from local
>   cache.
>
> - local cache is invalidated when a new package set is available on
>   an upstream server. Invalidating mechanism:
>   option a) cronjob that polls upstream server every 5 minutes for a
>             file that gives current package set IDs (pull)
>   option b) master server sends notification to all mirrors to
>             invalidate a package set (push)
>   optimization: when package set was invalidated, don't delete old
>   files, instead on next hit, verify timestamp against upstream server
>
> - atomic package set uploads to master from pointyhat (probably having
>   two directories that are switched over on master)
>
> - everything runs over http
>
> - default source of files for "pkg_add -r" command
>
> The goal is to refresh a package set on a daily basis.
>
>
> I don't know if we can use some existing software for this (Squid?
> Apache mod_proxy?) or if we will need to put something new together.
> Ideas?
>   
I am not sure if this would solve anything, but if we go further in this 
direction, I'd like to see some architecture with prefetch capability.
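
To make the prefetch idea concrete, here is a minimal sketch, assuming a
pull-through cache that is told when a new package set appears and that
warms itself with a list of popular files before clients ask for them.
Everything in it is hypothetical: the class name, the `fetch_upstream`
callable (a stand-in for an HTTP GET against the master), and the idea of
a "popular" list are illustrations, not existing software.  For brevity it
drops the old files on invalidation instead of revalidating them by
timestamp as the proposal suggests.

```python
from typing import Callable, Dict, Iterable, Optional


class PrefetchingCache:
    """Hypothetical pull-through cache with prefetch on package set change."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        # fetch_upstream stands in for an HTTP GET against the master server.
        self.fetch_upstream = fetch_upstream
        self.set_id: Optional[str] = None   # ID of the package set being served
        self.files: Dict[str, bytes] = {}   # local cache: file name -> contents
        self.upstream_hits = 0              # how many times we went upstream

    def _pull(self, name: str) -> bytes:
        """Fetch one file from upstream and store it in the local cache."""
        self.upstream_hits += 1
        data = self.fetch_upstream(name)
        self.files[name] = data
        return data

    def get(self, name: str) -> bytes:
        """Serve from the local cache if present, otherwise pull through."""
        if name in self.files:
            return self.files[name]
        return self._pull(name)

    def new_package_set(self, set_id: str, popular: Iterable[str]) -> None:
        """Invalidate the cache for a new set, then prefetch popular files."""
        if set_id == self.set_id:
            return  # same package set, nothing to do
        self.set_id = set_id
        self.files.clear()
        for name in popular:
            self._pull(name)


if __name__ == "__main__":
    master = {"bash.tbz": b"bash", "perl.tbz": b"perl"}
    cache = PrefetchingCache(master.__getitem__)
    cache.new_package_set("set-1", ["bash.tbz"])  # prefetched before any request
    cache.get("bash.tbz")                         # served locally
    cache.get("perl.tbz")                         # pulled through on first hit
    print(cache.upstream_hits)
```

The point of the prefetch step is that the first client after a new
package set does not pay for the upstream transfer of commonly requested
files; how the popular list would be built is left open.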

Note also that a real CDN would hide the actual data location from the
end user; the serving location would be selected by some sort of
proximity and/or load information.  Some CDNs do use proxy caching
against a central server as a means of populating their own data, but
proxy caching is only a small part of the solution.
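
As a rough illustration of selection by proximity and load, here is a
small sketch.  It assumes each mirror reports a round-trip time and a
load figure between 0 and 1; the `Mirror` record, the `pick_mirror`
function, the 0.9 load cutoff, and the scoring formula are all
assumptions made up for this example, not how any particular CDN works.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Mirror:
    host: str      # hypothetical host name
    rtt_ms: float  # proximity: round-trip time measured from the client side
    load: float    # 0.0 (idle) to 1.0 (saturated), as reported by the mirror


def pick_mirror(mirrors: Iterable[Mirror], max_load: float = 0.9) -> Mirror:
    """Pick the mirror with the lowest load-weighted round-trip time.

    Mirrors at or above max_load are skipped unless every mirror is that
    busy.  Raises ValueError if no mirrors are given.
    """
    all_mirrors = list(mirrors)
    usable = [m for m in all_mirrors if m.load < max_load]
    candidates = usable or all_mirrors
    # Assumed score: RTT inflated by load, so a busy nearby mirror can lose
    # to a slightly farther idle one.
    return min(candidates, key=lambda m: m.rtt_ms * (1.0 + m.load))


if __name__ == "__main__":
    mirrors = [
        Mirror("near-busy.example.org", rtt_ms=10, load=0.95),
        Mirror("near.example.org", rtt_ms=20, load=0.2),
        Mirror("far.example.org", rtt_ms=200, load=0.0),
    ]
    print(pick_mirror(mirrors).host)
```

In a real deployment this decision would sit in DNS or in an HTTP
redirector, so the user only ever sees one name.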

I did not follow the recent situation, but in the past I had some
trouble with late announcements to mirror administrators.  Sometimes I
received the announcement at the same time as any other FreeBSD user.
And even in those cases, packages were distributed long before the
final release.

                                        Jonny

-- 
João Carlos Mendes Luís - Networking Engineer - jonny@jonny.eng.br



