Date: Fri, 03 Oct 2008 12:17:44 +0300
From: Danny Braniss <danny@cs.huji.ac.il>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: freebsd-hackers@freebsd.org, Jeremy Chadwick <koitsu@freebsd.org>, freebsd-stable@freebsd.org, Claus Guttesen <kometen@gmail.com>
Subject: Re: bad NFS/UDP performance
Message-ID: <E1KlgnA-000F6w-NT@cs1.cs.huji.ac.il>
In-Reply-To: <alpine.BSF.1.10.0810031003440.41647@fledge.watson.org>
References: <E1Kj7NA-000FXz-3F@cs1.cs.huji.ac.il>
 <20080926081806.GA19055@icarus.home.lan>
 <E1Kj9bR-000H7t-0g@cs1.cs.huji.ac.il>
 <20080926095230.GA20789@icarus.home.lan>
 <E1KjEZw-000KkH-GP@cs1.cs.huji.ac.il>
 <alpine.BSF.1.10.0809271114450.20117@fledge.watson.org>
 <E1KjY2h-0008GC-PP@cs1.cs.huji.ac.il>
 <b41c75520809290140i435a5f6dge5219cd03cad55fe@mail.gmail.com>
 <E1Klfac-000DzZ-Ie@cs1.cs.huji.ac.il>
 <alpine.BSF.1.10.0810030910351.41647@fledge.watson.org>
 <E1KlgYe-000Es2-8u@cs1.cs.huji.ac.il>
 <alpine.BSF.1.10.0810031003440.41647@fledge.watson.org>
> On Fri, 3 Oct 2008, Danny Braniss wrote:
>
> >> OK, so it looks like this was almost certainly the rwlock change.  What
> >> happens if you pretty much universally substitute the following in
> >> udp_usrreq.c:
> >>
> >> Currently             Change to
> >> ---------             ---------
> >> INP_RLOCK             INP_WLOCK
> >> INP_RUNLOCK           INP_WUNLOCK
> >> INP_RLOCK_ASSERT      INP_WLOCK_ASSERT
> >
> > I guess you were almost certainly correct :-) I did the global subst. on
> > the udp_usrreq.c from 19/08, __FBSDID("$FreeBSD: src/sys/netinet/udp_usrreq.c,v
> > 1.218.2.3 2008/08/18 23:00:41 bz Exp $"); and now udp is fine again!
>
> OK.  This is a change I'd rather not back out since it significantly
> improves performance for many other UDP workloads, so we need to figure out
> why it's hurting us so much here so that we know if there are reasonable
> alternatives.
>
> Would it be possible for you to do a run of the workload with both kernels
> using LOCK_PROFILING around the benchmark, and then we can compare lock
> contention in the two cases?  What we often find is that relieving
> contention at one point causes new contention at another point, and if the
> primitive used at that point handles contention less well for whatever
> reason, performance can be reduced rather than improved.  So maybe we're
> looking at an issue in the dispatched UDP code from so_upcall?  Another
> less satisfying (and fundamentally more difficult) answer might be
> "something to do with the scheduler", but a bit more analysis may shed
> some light.

Gladly, but I have no idea how to do LOCK_PROFILING, so some pointers would
be helpful.

As a side note: many years ago I checked out NFS/TCP and it was really bad
(I even remember NetApp telling us to drop TCP), but now things look rather
better. I wonder what caused the change.

danny
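For anyone wanting to reproduce the LOCK_PROFILING run Robert asks for, a
minimal sketch of the usual LOCK_PROFILING(9) workflow on a FreeBSD kernel of
that era follows; the kernel config name LOCKPROF and the output file name
are placeholders only, not anything taken from this thread.

    # Compile lock profiling into the kernel by adding this line to your
    # kernel config (called LOCKPROF here purely as an example):
    #     options LOCK_PROFILING
    cd /usr/src
    make buildkernel KERNCONF=LOCKPROF
    make installkernel KERNCONF=LOCKPROF
    shutdown -r now

    # Around the benchmark: clear old samples, enable profiling, run the
    # workload, then disable profiling again.
    sysctl debug.lock.prof.reset=1
    sysctl debug.lock.prof.enable=1
    # ... run the NFS/UDP benchmark here ...
    sysctl debug.lock.prof.enable=0

    # Dump the per-lock-point statistics (acquisition counts, hold and
    # wait times) to a file for later comparison.
    sysctl debug.lock.prof.stats > lock-prof-output.txt

Doing this once with the stock INP_RLOCK kernel and once with the
INP_WLOCK-substituted kernel, then comparing the two dumps, should show
where the lock contention moved.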