Date: Sun, 9 Nov 2003 11:31:59 +0100 (CET)
From: Barry Bouwsma <freebsd-misuser@remove-NOSPAM-to-reply.NOSPAM.dyndns.dk>
To: hubs@freebsd.org
Subject: RE: cvsup server operation
Message-ID: <200311091031.hA9AVw734616@NOSPAM.spam.NOSPAM.spam.NOSPAM.dyndns.dk>
References: <20031010060149.GA3707@math.uic.edu> <XFMail.20031010132106.jdp@polstra.com>
[Oooh, this is an old one, but I had to chime in. Don't reply, or at least drop the hostname part of the valid IPv6-only address above to obtain some theoretically IPv4-capable e-mail which will probably bounce anyway]

Boy howdy! Youse'all said:

> But I check the log files periodically. Any time I notice somebody
> abusing a mirror (e.g., with cronjob updates more frequently than once
> an hour) I simply blacklist them in the cvsupd.access file. I feel no
> remorse at all about denying access to greedy jerks. Likewise, when

Ooooh, yeah! My sort of sysadminning! I have a new hero.

> I catch people doing simultaneous updates from multiple machines at
> their site, I add a rule to cvsupd.access that limits them to 1 update
> at a time from their subnet. I always have a great big smile on my
> face when I do that. No guilt whatsoever. :-)

Ah, mmmm. A sadist after my own heart. Wonderful.

Now, not to rain on your parade or anything, but let me turn the tables and allow me to play the victim. Just help me into those handcuffs (a little tighter please, oh yesss), let's negotiate a safeword (or not), and put that ball gag into my mouth and I'll now attempfmft tmm mmf gmfmmm rrmmmmfmffgg...

As you (of all people) are well aware, there are several different behavioural patterns that a CVSup session may follow, with different load patterns (CPU, bandwidth). Not only do I play a masochist on TV but I also suffer a dial-in connection a hundred times slower than your average broadband abuser, a CPU 50 times slower than your typical gamer, and disk bandwidth that, well, matches my online bandwidth, so I really and truly am a masochist (oh portblock me harder, oooh).

At one extreme is the checkout mode of CVSup, where one is initially populating a repository, or appending to the mail archives, and such. This saturates the download bandwidth, but generally leaves the CPU idle along with the upstream path.

At the other extreme is updating an up-to-date repository, with few changes, where the upstream bandwidth generally is pegged with the list file contents, with a comparable amount of data being returned, but little for the CPU to do. The clever user will keep the disk idle with the `-s' option.

A third extreme is tagging of a repository, where practically every file is touched. The upstream bandwidth from the list file roughly matches the downstream bandwidth with the tags needed, but here these tags need to be merged into pretty much every file, checksums generated, and compared for errors. This is heavily CPU-intensive (at least on my double-digit MHz machine where I mirror the repositories), and keeps the disk doing something, but if run alone, one sees vast quantities of idle bandwidth in both directions. At least on my machine. Probably not for a Normal human, though.

But your normal CVSup session will be a mixture of dumping the list file upstream, getting some new files in checkout, adding deltas and calculating checksums, and generally keeping busy (more busy the longer one goes without updating), yet still with a scattering of all the above described idle scenarios.

So in order to maximize the amount of data in both directions, and thus minimize online time, it helps greatly to combine a download-bandwidth-heavy cvsup session (say, updating the mail archives, or checking out DragonFly like I've been doing for a few weeks off and on), with an upload/CPU-intensive update (pretty much anything else that I've done somewhat recently, or when a new tag appears).

That means I almost always have two if not three cvsup processes at the same time running -- a virginal checkout, plus something upstream-bandwidth intensive -- usually a freshly-tagged update, or www rsync, or gnats update, in order to keep the pipes in both directions as full as possible and my CPU in a sweat.
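
To put rough numbers on why overlapping helps, here is a minimal back-of-the-envelope sketch in Python. Everything in it is an assumption made up for illustration: the `Session` type, the helper names, and the hour figures are hypothetical, not measurements. It supposes a session run alone takes as long as its busiest resource, and that sessions run together can overlap perfectly, so the finish time is set by whichever resource has the most total work queued.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical model: hours each resource would be busy if used flat out."""
    name: str
    down_hours: float  # download bandwidth
    up_hours: float    # upload bandwidth
    cpu_hours: float   # checksums, merging tags and deltas

    def alone(self) -> float:
        # Run by itself, a session is limited by its busiest resource;
        # the other resources sit idle for the remainder of the time.
        return max(self.down_hours, self.up_hours, self.cpu_hours)


def sequential_hours(sessions):
    """Total online time when sessions run one after another."""
    return sum(s.alone() for s in sessions)


def concurrent_hours(sessions):
    """Idealized lower bound when sessions overlap perfectly:
    the resource with the most total work sets the finish time."""
    if not sessions:
        return 0.0
    return max(
        sum(s.down_hours for s in sessions),
        sum(s.up_hours for s in sessions),
        sum(s.cpu_hours for s in sessions),
    )


# Made-up figures: a CPU-bound tag update and a download-bound checkout.
tagging = Session("tag update", down_hours=1.0, up_hours=1.0, cpu_hours=5.0)
checkout = Session("fresh checkout", down_hours=3.0, up_hours=0.1, cpu_hours=0.5)

print(sequential_hours([tagging, checkout]))  # 8.0
print(concurrent_hours([tagging, checkout]))  # 5.5
```

Under these assumed numbers the pair finishes in about five and a half hours together instead of eight back to back. Real sessions will not overlap this cleanly, so treat the concurrent figure as a best case.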

I can spend five hours getting tags added to a repository, or six hours to get the same tags and complete updates of a handful of other repositories that alone would require a few hours, so I do that.

For reasons of topology, I'd prefer to do all the updates from the nearest site, but due to access restrictions, I invariably need to split the updates among three sites. (I see your sneer. Hrmph) Also, since I try to mirror all the repositories I can get my manacled hands on, I must naturally use different sites, so the per-site one-connection limit seldom affects me except when I forget I can't update mail-archive and www for FreeBSD from the same site at the same time.

Watching `netstat -w 1' shows me how much more effective this is too, unless all of the sessions end up checking out heaps of new files. But with my infrequent online activity, I'm usually downloading fresh tags somewhere (4.9-RELEASE heading my way Real Soon Now[tm], honest) as well as a brand new repository.

This isn't quite the case where I'm using one of them thar newfangled download-accelerators to open 50 FTP connections for one file to get so much download bandwidth it's all over your screen, but instead trying to make more effective use of both directions, pipelining, and so on.

Your logs will also give a clue as to the relative activity: comparing the ratio of data in to data out hints at the type of update, along with the time required, though that really doesn't distinguish between tagging over a 2400 baud dial-in and tagging a dog-slow machine over broadband with idle bandwidth up the wazoo.

Of course, no MODERN machine is going to see idle bandwidth when hit with new tags on every file, so I'm not going to claim that anyone else will see any benefit from multiple sessions, but now you know quite well who it is you're tightening the thumbscrews upon with each tweak to your access files. Happy yet?

Thank you sir, may I have another.

Barry Bouwsma
whip me, beat me, make me buildworld without -DNOCLEAN

(of course, I'm not leeching off any servers you admin, but I just wanted to point out there could be reasons other than greed to be running several cvsup sessions at once, at least that I attempt to justify...)