Date: Mon, 13 May 2013 19:44:11 +0100
From: Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
To: Adrian Chadd <adrian@freebsd.org>
Cc: freebsd-hackers@freebsd.org, Eugen-Andrei Gavriloaie <shiretu@gmail.com>
Subject: Re: Managing userland data pointers in kqueue/kevent
Message-ID: <20130513194411.5a2dfa2e@shy.leonerd.org.uk>
In-Reply-To: <CAJ-VmonjOS7Axq18VxL6B53T4DVgAYdY3LaT8nd7afFj2dq3cA@mail.gmail.com>
References: <CCE4FFC4-F846-4F81-85EE-776B753C63C6@gmail.com>
 <20130513185357.1c552be5@shy.leonerd.org.uk>
 <CAJ-VmomQmPjtUhUo2%2BK=0Ychw-=qgawrZt3hnYeCPNNhA9T50A@mail.gmail.com>
 <CAJ-VmonKC_7J=aNgRntub9DN%2BEfQxrhMjstXHSJ634%2BaFemcLg@mail.gmail.com>
 <20130513191513.786f4f02@shy.leonerd.org.uk>
 <8A02C28F-89CB-4AE3-A91A-89565F041FDE@gmail.com>
 <CAJ-VmonjOS7Axq18VxL6B53T4DVgAYdY3LaT8nd7afFj2dq3cA@mail.gmail.com>

On Mon, 13 May 2013 11:23:45 -0700
Adrian Chadd <adrian@freebsd.org> wrote:

> Just as a data point, I managed 50,000 + connections, at 5,000 + a
> second, doing a gigabit + of traffic, mid-2000s, with the userland
> management of all of the socket/disk FD stuff.
>
> The biggest overhead at the time was actually the read/write
> copyin/copyout, NOT the locking overhead of managing this stuff. Why?
> Because I architected the HTTP side of things to specifically pin FDs
> to threads, and not allow arbitrary threads to deal with arbitrary
> FDs. This removed the need for almost all of the state locking that
> people are concerned about here.

I think then this comes from different experiences. I'm guessing this
application:

 a) Was written in C

 b) Was entirely filled with identically-typed, identical-purpose
    file descriptors

 c) Didn't really use any EV_ONESHOT events

 d) Didn't close sockets apart from when it received EOF

and perhaps most importantly

 e) Was entirely self-contained - did everything from one unified
    block of source code.

I.e. a very simple set of semantics.

I'll explain the situation that I had.

I ran into the problem that needs EV_DROPWATCH/EV_DROPPED because I was
trying to fix Perl's IO::KQueue. IO::KQueue tries to wrap kqueue/kevent
for Perl, allowing the userland Perl code to store an arbitrary Perl
data pointer in the udata field. This data is reference-counted.
Userland might let the kernel store the only reference to that data,
because it comes back in event notifications anyway. Because of this,
the reference count has to be artificially incremented to account for
the extra pointer in the kernel. Without knowing when the kernel will
decide to drop that pointer, I never know when I should decrement the
refcount myself.

The wrapper module has no knowledge of what userland is doing with
this. It can't know when userland might be EV_ONESHOT'ing.
It doesn't really know which events will be oneshot anyway (such as the
process exit watches). Finally, it has no idea what other modules are
going to call close() on a watched file descriptor.

This final problem was the real killer - while the first two -could- be
worked around with more complex code structures, not knowing what other
CPAN modules will ever call close() makes it impossible to handle.
Simply asking every CPAN module to "please just call fd_close() instead
of close()" doesn't work here.

By comparison, having the kernel tell userland when it calls
knote_drop() is much simpler. The kernel knows exactly when it is doing
this, so pushing an event up to userland to say that it did so is
simple. If any more cases than the three known ones (EV_ONESHOT or
other single-shot events, EV_DELETE, and close()) are added, userland -
and in particular the IO::KQueue module - will not need updating. It
will continue to decrement refcounts and free data perfectly happily
when the kernel has dropped the watch.

I've used this pattern before in C libraries + higher-level language
wrappers, and found it to be nicely simple to both implement and use.
Because it follows the -same- event notification path that userland is
already using, it manages to avoid quite a number of the race
conditions that a secondary, separate data structure and locking often
run into; e.g. if userland is trying to add a new thing into it just at
the time there's a notification "in-flight" from the kernel about an
old thing that it used to have.

Principally - the fact that the kernel tells -userland- about the
delete means that it can atomically *guarantee* that this *will* be the
last event about this particular item. Userland must not delete its own
data structure about it until this notification happens. If userland
follows that rule, lots of semantics become a lot simpler.
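
To make that ownership rule concrete, here is a minimal sketch in C. It
is only a toy model and does not call kevent() at all: struct toy_kq,
toy_add(), toy_fire(), toy_drop(), toy_dispatch() and the TOY_* flags
are all hypothetical names invented for illustration, and TOY_DROPPED
merely stands in for the EV_DROPPED notification proposed in this
thread, which is not an existing kernel flag. The assumption is that
the "kernel" side queues exactly one drop event, after all other events
for that watch, whatever caused the drop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Reference-counted userland payload, standing in for a Perl value. */
struct payload {
    int refcount;
    int *freed;              /* set to 1 on release, for demonstration */
};

static struct payload *payload_new(int *freed_flag)
{
    struct payload *p = malloc(sizeof *p);
    if (p == NULL)
        abort();
    p->refcount = 1;         /* the creator's own reference */
    p->freed = freed_flag;
    *freed_flag = 0;
    return p;
}

static void payload_ref(struct payload *p) { p->refcount++; }

static void payload_unref(struct payload *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        *p->freed = 1;
        free(p);
    }
}

/* Hypothetical flags; TOY_DROPPED models the proposed EV_DROPPED. */
enum { TOY_READ = 1, TOY_DROPPED = 2 };

struct toy_event { int ident; int flags; void *udata; };

#define TOY_QLEN 16

/* Toy "kernel": a single watch plus a FIFO of pending events. */
struct toy_kq {
    int watched, ident;
    void *udata;
    struct toy_event q[TOY_QLEN];
    size_t head, count;
};

static void toy_push(struct toy_kq *kq, int ident, int flags, void *udata)
{
    assert(kq->count < TOY_QLEN);
    size_t slot = (kq->head + kq->count) % TOY_QLEN;
    kq->q[slot] = (struct toy_event){ ident, flags, udata };
    kq->count++;
}

/* Wrapper side: take an extra reference for the pointer the kernel holds. */
static void toy_add(struct toy_kq *kq, int ident, struct payload *p)
{
    assert(!kq->watched);
    payload_ref(p);
    kq->watched = 1;
    kq->ident = ident;
    kq->udata = p;
}

/* Kernel side: an ordinary readiness event on the watch. */
static void toy_fire(struct toy_kq *kq)
{
    if (kq->watched)
        toy_push(kq, kq->ident, TOY_READ, kq->udata);
}

/* Kernel side: the watch goes away (oneshot, delete or close alike);
 * the drop notification is queued behind any earlier events. */
static void toy_drop(struct toy_kq *kq)
{
    if (!kq->watched)
        return;
    kq->watched = 0;
    toy_push(kq, kq->ident, TOY_DROPPED, kq->udata);
}

/* Wrapper side: deliver one event; returns its flags, or 0 if none.
 * Only the drop event releases the reference taken in toy_add(). */
static int toy_dispatch(struct toy_kq *kq)
{
    if (kq->count == 0)
        return 0;
    struct toy_event ev = kq->q[kq->head];
    kq->head = (kq->head + 1) % TOY_QLEN;
    kq->count--;
    if (ev.flags & TOY_DROPPED)
        payload_unref(ev.udata);
    return ev.flags;
}
```

In this model a caller can register a payload with toy_add(), release
its own reference straight away, and the payload stays alive until
toy_dispatch() sees TOY_DROPPED. The wrapper never needs to know
whether toy_drop() stood in for a oneshot, an explicit delete or a
close() from some unrelated module.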

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/