Date:      Mon, 13 May 2013 19:44:11 +0100
From:      Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        freebsd-hackers@freebsd.org, Eugen-Andrei Gavriloaie <shiretu@gmail.com>
Subject:   Re: Managing userland data pointers in kqueue/kevent
Message-ID:  <20130513194411.5a2dfa2e@shy.leonerd.org.uk>
In-Reply-To: <CAJ-VmonjOS7Axq18VxL6B53T4DVgAYdY3LaT8nd7afFj2dq3cA@mail.gmail.com>
References:  <CCE4FFC4-F846-4F81-85EE-776B753C63C6@gmail.com> <20130513185357.1c552be5@shy.leonerd.org.uk> <CAJ-VmomQmPjtUhUo2%2BK=0Ychw-=qgawrZt3hnYeCPNNhA9T50A@mail.gmail.com> <CAJ-VmonKC_7J=aNgRntub9DN%2BEfQxrhMjstXHSJ634%2BaFemcLg@mail.gmail.com> <20130513191513.786f4f02@shy.leonerd.org.uk> <8A02C28F-89CB-4AE3-A91A-89565F041FDE@gmail.com> <CAJ-VmonjOS7Axq18VxL6B53T4DVgAYdY3LaT8nd7afFj2dq3cA@mail.gmail.com>


On Mon, 13 May 2013 11:23:45 -0700
Adrian Chadd <adrian@freebsd.org> wrote:

> Just as a data point, I managed 50,000 + connections, at 5,000 + a
> second, doing a gigabit + of traffic, mid-2000s, with the userland
> management of all of the socket/disk FD stuff.
>
> The biggest overhead at the time was actually the read/write
> copyin/copyout, NOT the locking overhead of managing this stuff. Why?
> Because I architected the HTTP side of things to specifically pin FDs
> to threads, and not allow arbitrary threads to deal with arbitrary
> FDs. This removed the need for almost all of the state locking that
> people are concerned about here.

I think, then, that this comes down to different experiences.

I'm guessing this application:

  a) Was written in C
  b) Was entirely filled with identically-typed identical-purpose file
     descriptors
  c) Didn't really use any EV_ONESHOT events
  d) Didn't close sockets except when it received EOF
and perhaps most importantly
  e) Was entirely self-contained - did everything from one unified
     block of source code.

I.e. a very simple set of semantics. I'll explain the situation that I
had.

I ran into the problem that needs EV_DROPWATCH/EV_DROPPED because I was
trying to fix Perl's IO::KQueue.

IO::KQueue tries to wrap kqueue/kevent for Perl, allowing the userland
Perl code to store an arbitrary Perl data pointer in the udata field.
This data is reference-counted. Userland might let the kernel store the
only copy of that data, because it comes back in event notifications
anyway. Because of this, the reference count has to be artificially
incremented to account for the extra pointer in the kernel. Without
knowing when the kernel will decide to drop that pointer, I never know
when I should decrement the refcount myself.
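To make the refcount problem concrete, here is a minimal sketch in Python
rather than Perl. The names WrapperSketch, register and release are made up
for illustration and are not IO::KQueue's API. Keeping the object in a
dictionary stands in for the artificial refcount increment, and the kevent()
call itself is left out.

```python
import gc
import weakref


class Udata:
    """Stand-in for an arbitrary userland object stored in the udata field."""

    def __init__(self, name):
        self.name = name


class WrapperSketch:
    """Hypothetical wrapper; not IO::KQueue's real API."""

    def __init__(self):
        # (ident, filter) -> udata. Holding the object here plays the role
        # of the artificial refcount increment for the kernel's pointer.
        self._pinned = {}

    def register(self, ident, filt, udata):
        # A real wrapper would also submit an EV_ADD change to kevent() here.
        self._pinned[(ident, filt)] = udata

    def release(self, ident, filt):
        # The matching decrement. The open question is *when* it is safe
        # to call this.
        self._pinned.pop((ident, filt), None)


def demo():
    wrapper = WrapperSketch()
    data = Udata("conn-1")
    probe = weakref.ref(data)

    wrapper.register(5, "read", data)
    del data                      # userland keeps no copy of its own
    gc.collect()
    alive_while_registered = probe() is not None

    wrapper.release(5, "read")
    gc.collect()
    alive_after_release = probe() is not None
    return alive_while_registered, alive_after_release
```

The object survives only because the wrapper pins it, and it is freed only
when release() runs. Nothing in the sketch says who calls release(), which
is exactly the gap.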

IO::KQueue has no knowledge of what userland is doing with this data. It
can't know when userland might be EV_ONESHOT'ing. It doesn't really know
which events will be oneshot anyway (such as the process exit watches).
Finally, it has no idea which other modules are going to call close() on
the file descriptor. This final problem was the real killer: while the
first two -could- be worked around with more complex code structures, not
knowing which other CPAN modules will ever call close() makes it
impossible to handle. Simply asking every CPAN module to "please just call
fd_close() instead of close()" doesn't work here.

By comparison, having the kernel tell userland when it calls knote_drop()
is much simpler. The kernel knows exactly when it is doing this, so
pushing an event up to userland to say that it did so is easy. If any
more cases than the three known ones (EV_ONESHOT or other single-shot
events, EV_DELETE, and close()) are added, userland, and in particular
the IO::KQueue module, will not need updating. It will continue to
decrement refcounts and free data perfectly happily whenever the kernel
has dropped the watch.
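As a rough illustration of why a single drop notification covers all three
cases, the following Python sketch models the kernel side with a toy class.
SimKernel, Wrapper and the "READY" event kind are invented for this example.
EV_DROPPED is the flag proposed in this thread, not an existing kevent flag,
and the behaviour shown is an assumption about how it would work.

```python
from collections import deque

EV_DROPPED = "EV_DROPPED"  # proposed in this thread; not an existing flag


class SimKernel:
    """Toy model of the kernel side; it is not real kqueue behaviour."""

    def __init__(self):
        self.knotes = {}        # ident -> (udata, oneshot)
        self.events = deque()   # queue that userland reads from

    def add(self, ident, udata, oneshot=False):
        self.knotes[ident] = (udata, oneshot)

    def _knote_drop(self, ident):
        # Single choke point: every removal path ends here and reports it.
        udata, _ = self.knotes.pop(ident)
        self.events.append((ident, EV_DROPPED, udata))

    def fire(self, ident):
        udata, oneshot = self.knotes[ident]
        self.events.append((ident, "READY", udata))
        if oneshot:
            self._knote_drop(ident)

    def delete(self, ident):      # models EV_DELETE
        self._knote_drop(ident)

    def close(self, ident):       # models close() called by any code at all
        if ident in self.knotes:
            self._knote_drop(ident)


class Wrapper:
    """Hypothetical userland wrapper that pins udata until it is dropped."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.pinned = {}          # ident -> udata

    def watch(self, ident, udata, oneshot=False):
        self.pinned[ident] = udata
        self.kernel.add(ident, udata, oneshot)

    def poll(self):
        delivered = []
        while self.kernel.events:
            ident, kind, udata = self.kernel.events.popleft()
            if kind == EV_DROPPED:
                self.pinned.pop(ident, None)   # the only release site
            else:
                delivered.append((ident, udata))
        return delivered


def demo():
    kernel = SimKernel()
    w = Wrapper(kernel)
    w.watch(3, "a", oneshot=True)
    w.watch(4, "b")
    w.watch(5, "c")
    kernel.fire(3)      # single-shot event
    kernel.delete(4)    # explicit EV_DELETE
    kernel.close(5)     # close() from some unrelated module
    return w.poll(), dict(w.pinned)
```

The wrapper never needs to know which of the three paths was taken. A
fourth removal path added to the toy kernel would only have to go through
_knote_drop() for the wrapper to keep working unchanged.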

I've used this pattern before in C libraries + higher-level language
wrappers, and found it to be nicely simple to both implement and use.
Because it follows the -same- event notification path that userland is
already using, it manages to avoid quite a number of the race
conditions that a secondary, separate data structure with its own
locking often runs into; e.g. if userland is trying to add a new thing
into it just at the time there's a notification "in-flight" from the
kernel about an old thing that it used to have.

Principally, the fact that the kernel tells -userland- about the delete
means that it can atomically *guarantee* that this *will* be the last
event about this particular item. Userland must not delete its own data
structure for the item until this notification arrives. If userland
follows that rule, a lot of the semantics become much simpler.
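The ordering rule can be shown with a small replay in Python. This is only
a sketch: the policy names "eager" and "deferred", the event kinds and the
table layout are all made up here, and the queue stands in for events the
kernel has already generated but userland has not yet read.

```python
from collections import deque


def run(policy):
    """Replay one fixed event sequence under a given cleanup policy.

    policy is "eager" (free on the delete request) or "deferred" (free
    only when the drop notification arrives).
    """
    table = {7: "state for fd 7"}      # userland's per-watch records
    queue = deque([(7, "READY")])      # an event already in flight

    # Userland asks for the watch to be removed. The simulated kernel
    # answers with a final DROPPED event queued behind anything pending.
    if policy == "eager":
        del table[7]
    queue.append((7, "DROPPED"))

    handled = 0     # events that found their record
    stale = 0       # events that arrived after their record was freed
    while queue:
        ident, kind = queue.popleft()
        if kind == "DROPPED":
            table.pop(ident, None)     # last event for this item
        elif ident in table:
            handled += 1
        else:
            stale += 1
    return handled, stale, len(table)
```

Under the eager policy the in-flight event finds no record, which in a C
wrapper holding a raw udata pointer would be a use-after-free. Under the
deferred policy the same event is handled normally and the record goes away
on the final notification.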

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/
