Date:      Wed, 15 May 2013 02:14:55 -0400
From:      Julian Elischer <julian@freebsd.org>
To:        "Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk>
Cc:        freebsd-hackers@freebsd.org, Adrian Chadd <adrian@freebsd.org>, Eugen-Andrei Gavriloaie <shiretu@gmail.com>
Subject:   Re: Managing userland data pointers in kqueue/kevent
Message-ID:  <519327DF.6060002@freebsd.org>
In-Reply-To: <20130513194411.5a2dfa2e@shy.leonerd.org.uk>
References:  <CCE4FFC4-F846-4F81-85EE-776B753C63C6@gmail.com> <20130513185357.1c552be5@shy.leonerd.org.uk> <CAJ-VmomQmPjtUhUo2%2BK=0Ychw-=qgawrZt3hnYeCPNNhA9T50A@mail.gmail.com> <CAJ-VmonKC_7J=aNgRntub9DN%2BEfQxrhMjstXHSJ634%2BaFemcLg@mail.gmail.com> <20130513191513.786f4f02@shy.leonerd.org.uk> <8A02C28F-89CB-4AE3-A91A-89565F041FDE@gmail.com> <CAJ-VmonjOS7Axq18VxL6B53T4DVgAYdY3LaT8nd7afFj2dq3cA@mail.gmail.com> <20130513194411.5a2dfa2e@shy.leonerd.org.uk>

On 5/13/13 2:44 PM, Paul LeoNerd Evans wrote:
> On Mon, 13 May 2013 11:23:45 -0700
> Adrian Chadd <adrian@freebsd.org> wrote:
>
>> Just as a data point, I managed 50,000 + connections, at 5,000 + a
>> second, doing a gigabit + of traffic, mid-2000s, with the userland
>> management of all of the socket/disk FD stuff.
>>
>> The biggest overhead at the time was actually the read/write
>> copyin/copyout, NOT the locking overhead of managing this stuff. Why?
>> Because I architected the HTTP side of things to specifically pin FDs
>> to threads, and not allow arbitrary threads to deal with arbitrary
>> FDs. This removed the need for almost all of the state locking that
>> people are concerned about here.
> I think then this comes from different experiences.
>
> I'm guessing this application was:
>
>    a) Written in C
>    b) Entirely filled with identically-typed identical-purpose file
>       descriptors
>    c) Didn't really use any EV_ONESHOT events
>    d) Didn't close sockets apart from when it received EOF
> and perhaps most importantly
>    e) Was entirely self-contained - did everything from one unified
>       block of source code.
>
> I.e. a very simple set of semantics. I'll explain the situation that I
> had.
>
> The reason I ran into the problem needing EV_DROPWATCH/EV_DROPPED was
> because I was trying to fix Perl's IO::KQueue.
>
> IO::KQueue tries to wrap kqueue/kevent for Perl, allowing the userland
> Perl code to store an arbitrary Perl data pointer in the udata field.
> This data is reference-counted. Userland might let the kernel store the
> only copy of that data, because it comes back in event notifications
> anyway. Because of this, the reference count has to be artificially
> incremented to account for the extra pointer in the kernel. Without
> knowing when the kernel will decide to drop that pointer, I never know
> when I should decrement the refcount myself.
>
> It has no knowledge of what userland is doing with this. It can't know
> when userland might be EV_ONESHOT'ing. It doesn't really know what
> events will be oneshot anyway (such as the process exit watches).
> Finally, it has no idea what other modules are going to call close() on
> it. This final problem was the real killer - while the first two
> -could- be worked around with more complex code structures, not knowing
> what other CPAN modules will ever call close() makes it impossible to
> handle. Simply asking every CPAN module to "please just call fd_close()
> instead of close()" doesn't work here.
>
> As compared: having the kernel tell userland when it calls knote_drop()
> is much simpler. It knows exactly when it is doing this, so simply
> pushing an event up to userland to tell it it did so is simple. If any
> more cases than the three known (EV_ONESHOT or other single-shot events;
> EV_DELETE, close()) are added, userland - and in particular, the
> IO::KQueue module, will not need updating. It will continue to
> decrement refcounts and free data perfectly happily when kernel has
> dropped the watch.
>
> I've used this pattern before in C libraries + higher-level language
> wrappers, and found it to be nicely simple to both implement and use.
> Because it follows the -same- event notification path that userland is
> already using, it manages to avoid quite a number of the
> race-conditions that a secondary, separate data structure and locking
> often runs into; e.g. if userland is trying to add a new thing into it
> just at the time there's a notification "in-flight" from the kernel
> about an old thing that it used to have.
>
> Principally - the fact that kernel tells -userland- about the delete,
> means that it can atomically *guarantee* that this *will* be the last
> event about this particular item. Userland must not delete its own data
> structure about it until this notification happens. If it does this,
> lots of semantics become a lot simpler.
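
To make the quoted proposal concrete, here is a minimal sketch of the refcounting pattern it describes. It is a pure userland simulation, not real kqueue code: `FakeKernel`, `Binding`, `Payload` and the numeric flag value are all made up for illustration, and `EV_DROPPED` is the flag proposed in this thread, not something an existing kernel is known to provide. The wrapper takes one extra reference when it hands a payload to the "kernel" and gives it back only when the final drop notification arrives on the same event path.

```python
from collections import deque

EV_DROPPED = 0x1000  # hypothetical flag from this thread; the value is illustrative only


class Payload:
    """Stands in for a reference-counted value stored in udata."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1   # the reference held by the caller
        self.freed = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


class FakeKernel:
    """Simulated kernel side: queues a final EV_DROPPED event per dropped watch."""

    def __init__(self):
        self.knotes = {}        # ident -> (payload, oneshot)
        self.pending = deque()  # events waiting to be read by userland

    def add(self, ident, payload, oneshot):
        self.knotes[ident] = (payload, oneshot)

    def trigger(self, ident):
        payload, oneshot = self.knotes[ident]
        self.pending.append((ident, 0, payload))
        if oneshot:             # single-shot watches are dropped after firing
            self._drop(ident)

    def close(self, ident):
        if ident in self.knotes:
            self._drop(ident)

    def _drop(self, ident):
        payload, _ = self.knotes.pop(ident)
        self.pending.append((ident, EV_DROPPED, payload))

    def drain(self):
        while self.pending:
            yield self.pending.popleft()


class Binding:
    """Simulated language binding that owns the extra reference."""

    def __init__(self, kernel):
        self.kernel = kernel

    def watch(self, ident, payload, oneshot=False):
        payload.incref()        # account for the pointer the kernel now holds
        self.kernel.add(ident, payload, oneshot)

    def poll(self):
        delivered = []
        for ident, flags, payload in self.kernel.drain():
            if flags & EV_DROPPED:
                payload.decref()  # kernel has let go, so release its reference
            else:
                delivered.append((ident, payload.name))
        return delivered


kernel = FakeKernel()
binding = Binding(kernel)
sock = Payload("socket handler")
binding.watch(5, sock)
sock.decref()      # caller forgets the payload; the kernel's copy is the only one
kernel.trigger(5)  # a normal event is queued
kernel.close(5)    # some unrelated module closes the descriptor
events = binding.poll()
```

In this sketch the binding never needs to know who called `close()`: the payload stays alive until the drop notification has been read, and is released exactly once afterwards.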

I was responsible for the udata field. It was not in the original design
that was proposed, and I suggested it to Jonathan. I was thinking purely of
a simple way for an event to supply added information to its handler that
would obviate the need for the app to keep complicated tracking structures.
I was not thinking in terms of "badly behaved" (sic) third-party high-level
ops using it through a language binding. I admit that I did not think about
the close issue at that time.

Your suggested changes are not unreasonable; however, we could do with more
discussion. The point about tracking objects that may be arbitrarily
destroyed without the framework being notified is valid and aligns well
with general robustness principles.

I would suggest that one answer would be to create an extension to register
a kevent to catch these events (the knote_drop() calls).

The returned event could carry all the appropriate information about the
event being dropped.
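
One possible shape for such an extension, sketched as a userland simulation rather than as a kernel interface: everything here (`FakeKqueue`, `register_drop_catcher`, `DropEvent` and its `reason` field) is hypothetical and only illustrates the idea of an opt-in registration whose returned events describe the watch that was dropped. Queues that never register the catcher see no new events, so existing programs would behave as before.

```python
from collections import namedtuple

# Hypothetical record describing one dropped watch.
DropEvent = namedtuple("DropEvent", "ident filter udata reason")


class FakeKqueue:
    """Simulated queue; not the real kqueue API."""

    def __init__(self):
        self.knotes = {}          # (ident, filter) -> udata
        self.catch_drops = False  # drop reporting is opt-in
        self.pending = []

    def register_drop_catcher(self):
        # Stands in for registering a special kevent that catches drops.
        self.catch_drops = True

    def add(self, ident, filt, udata):
        self.knotes[(ident, filt)] = udata

    def delete(self, ident, filt):
        self._knote_drop((ident, filt), "EV_DELETE")

    def close(self, ident):
        # Closing a descriptor drops every watch attached to it.
        for key in [k for k in self.knotes if k[0] == ident]:
            self._knote_drop(key, "close")

    def _knote_drop(self, key, reason):
        udata = self.knotes.pop(key)
        if self.catch_drops:
            self.pending.append(DropEvent(key[0], key[1], udata, reason))

    def kevent(self):
        # Return and clear everything queued so far.
        out, self.pending = self.pending, []
        return out


kq = FakeKqueue()
kq.register_drop_catcher()
kq.add(7, "read", "reader state")
kq.add(7, "write", "writer state")
kq.add(9, "read", "other state")
kq.close(7)           # drops both watches on descriptor 7
kq.delete(9, "read")  # explicit removal is reported the same way
drops = kq.kevent()
```

Each returned record carries the original ident, filter and udata, which is what a language binding would need in order to release whatever it had attached to that watch.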
