Date: Fri, 03 Oct 2008 12:17:44 +0300
From: Danny Braniss <danny@cs.huji.ac.il>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: freebsd-hackers@freebsd.org, Jeremy Chadwick <koitsu@freebsd.org>, freebsd-stable@freebsd.org, Claus Guttesen <kometen@gmail.com>
Subject: Re: bad NFS/UDP performance
Message-ID: <E1KlgnA-000F6w-NT@cs1.cs.huji.ac.il>
In-Reply-To: <alpine.BSF.1.10.0810031003440.41647@fledge.watson.org>
References: <E1Kj7NA-000FXz-3F@cs1.cs.huji.ac.il> <20080926081806.GA19055@icarus.home.lan> <E1Kj9bR-000H7t-0g@cs1.cs.huji.ac.il> <20080926095230.GA20789@icarus.home.lan> <E1KjEZw-000KkH-GP@cs1.cs.huji.ac.il> <alpine.BSF.1.10.0809271114450.20117@fledge.watson.org> <E1KjY2h-0008GC-PP@cs1.cs.huji.ac.il> <b41c75520809290140i435a5f6dge5219cd03cad55fe@mail.gmail.com> <E1Klfac-000DzZ-Ie@cs1.cs.huji.ac.il> <alpine.BSF.1.10.0810030910351.41647@fledge.watson.org> <E1KlgYe-000Es2-8u@cs1.cs.huji.ac.il> <alpine.BSF.1.10.0810031003440.41647@fledge.watson.org>
>
> On Fri, 3 Oct 2008, Danny Braniss wrote:
>
> >> OK, so it looks like this was almost certainly the rwlock change. What
> >> happens if you pretty much universally substitute the following in
> >> udp_usrreq.c:
> >>
> >>   Currently           Change to
> >>   ---------           ---------
> >>   INP_RLOCK           INP_WLOCK
> >>   INP_RUNLOCK         INP_WUNLOCK
> >>   INP_RLOCK_ASSERT    INP_WLOCK_ASSERT
> >
> > I guess you were almost certainly correct :-) I did the global subst. on the
> > udp_usrreq.c from 19/08, __FBSDID("$FreeBSD: src/sys/netinet/udp_usrreq.c,v
> > 1.218.2.3 2008/08/18 23:00:41 bz Exp $"); and now udp is fine again!
>
> OK. This is a change I'd rather not back out since it significantly improves
> performance for many other UDP workloads, so we need to figure out why it's
> hurting us so much here so that we know if there are reasonable alternatives.
>
> Would it be possible for you to do a run of the workload with both kernels
> using LOCK_PROFILING around the benchmark, and then we can compare lock
> contention in the two cases? What we often find is that relieving contention
> at one point causes new contention at another point, and if the primitive used
> at that point handles contention less well for whatever reason, performance
> can be reduced rather than improved. So maybe we're looking at an issue in
> the dispatched UDP code from so_upcall? Another less satisfying (and
> fundamentally more difficult) answer might be "something to do with the
> scheduler", but a bit more analysis may shed some light.
Gladly, but I have no idea how to do LOCK_PROFILING, so some pointers would be
helpful.
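From a quick look at the LOCK_PROFILING(9) man page my guess is something along
these lines (the sysctl names are from memory, so please correct me if I'm off):

    # in the kernel config, then rebuild and reinstall the kernel
    options LOCK_PROFILING

    # around the benchmark run
    sysctl debug.lock.prof.reset=1
    sysctl debug.lock.prof.enable=1
    ... run the NFS/UDP test ...
    sysctl debug.lock.prof.enable=0
    sysctl debug.lock.prof.stats > lock-prof.out

and then send you the lock-prof.out from each of the two kernels?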
As a side note: many years ago I tried NFS over TCP and it was really bad (I
even remember NetApp telling us to drop TCP), but now things look rather
better. I wonder what changed.
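For the record, the substitution I applied was just the mechanical replace from
your table, i.e. something like this, followed by a kernel rebuild:

    cd /usr/src/sys/netinet
    sed -i.orig \
        -e 's/INP_RLOCK_ASSERT/INP_WLOCK_ASSERT/g' \
        -e 's/INP_RUNLOCK/INP_WUNLOCK/g' \
        -e 's/INP_RLOCK/INP_WLOCK/g' \
        udp_usrreq.c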
danny
