Date: Sun, 11 Nov 2007 20:26:27 +0300
From: Alexey Popov <lol@chistydom.ru>
To: Kris Kennaway <kris@FreeBSD.org>
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: amrd disk performance drop after running under high load
Message-ID: <47373B43.9060406@chistydom.ru>
In-Reply-To: <47349A17.3080806@FreeBSD.org>
References: <47137D36.1020305@chistydom.ru> <47140906.2020107@FreeBSD.org>
 <47146FB4.6040306@chistydom.ru> <47147E49.9020301@FreeBSD.org>
 <47149E6E.9000500@chistydom.ru> <4715035D.2090802@FreeBSD.org>
 <4715C297.1020905@chistydom.ru> <4715C5D7.7060806@FreeBSD.org>
 <471EE4D9.5080307@chistydom.ru> <4723BF87.20302@FreeBSD.org>
 <47344E47.9050908@chistydom.ru> <47349A17.3080806@FreeBSD.org>

Hi.

Kris Kennaway wrote:

>>> In the "good" case you are getting a much higher interrupt rate but
>>> with the data you provided I can't tell where from. You need to run
>>> vmstat -i at regular intervals (e.g. every 10 seconds for a minute)
>>> during the "good" and "bad" times, since it only provides counters
>>> and an average rate over the uptime of the system.
>>
>> Now I'm running 10-process lighttpd and the problem has become not so big.
>>
>> I collected interrupt stats and they show no relation between
>> interrupts and slowdowns. Here it is:
>> http://83.167.98.162/gprof/intr-graph/
>>
>> I also have similar statistics from mutex profiling, and they show
>> there's no problem with mutexes.
>> http://83.167.98.162/gprof/mtx-graph/mtxgifnew/
>>
>> I have no idea what else to check.

> I don't know what this graph is showing me :) When precisely is the
> system behaving poorly?

Take a look at the "Disk Load %" picture at
http://83.167.98.162/gprof/intr-graph/

At ~17:00, 03:00-04:00, 13:00-14:00, 00:30-01:30 and 11:00-13:00 it shows
peaks of disk activity which never actually happen. As I said at the
beginning of the thread, at these "peak" moments the disk becomes slow and
vmstat shows 100% disk load while performing < 10 tps. The other graphs on
this page show that there's no relation to the interrupt rate of the amr or
em devices, which is what you advised me to check.

When I was using single-process lighttpd the problem was much worse, as
you can see at http://83.167.98.162/gprof/graph/ . In the first picture on
that page you can see disk load peaks at 18:00 and 15:00, which led to
a decrease in network output because the disk was too slow.

Earlier in this thread we suspected UMA mutexes. To check that, I
collected mutex profiling stats and drew graphs over time, and they also
didn't show anything interesting. All mutex graphs stayed smooth during the
disk load peaks.
http://83.167.98.162/gprof/mtx-graph/mtxgifnew/

With best regards,
Alexey Popov
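
P.S. For anyone repeating the measurement Kris describes above: since
vmstat -i only reports cumulative counters and an average over the whole
uptime, the per-interval rate has to be computed from the difference
between two samples. Below is a minimal sketch of that arithmetic in
Python. The function names and the sample numbers are made up for
illustration, and the parser assumes the usual three-column
"interrupt / total / rate" layout; it is not the script used to produce
the graphs linked in this message.

```python
def parse_vmstat_i(text):
    """Return {interrupt name: cumulative total} from `vmstat -i` text.

    Assumes each data line ends with two numeric columns (total, rate);
    the header line and anything else that does not match is skipped.
    """
    totals = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue
        name, total, _uptime_rate = parts
        totals[name.strip()] = int(total)
    return totals


def interval_rates(before, after, seconds):
    """Interrupts per second between two samples taken `seconds` apart."""
    return {
        name: (count - before.get(name, 0)) / seconds
        for name, count in after.items()
    }


# Hypothetical samples taken 10 seconds apart (numbers are invented).
SAMPLE_BEFORE = """interrupt                          total       rate
irq16: em0                          1000         50
irq18: amr0                          200         10
Total                               1200         60
"""
SAMPLE_AFTER = """interrupt                          total       rate
irq16: em0                          3000         51
irq18: amr0                          250         10
Total                               3250         61
"""

if __name__ == "__main__":
    rates = interval_rates(
        parse_vmstat_i(SAMPLE_BEFORE), parse_vmstat_i(SAMPLE_AFTER), 10
    )
    for name, rate in sorted(rates.items()):
        print(f"{name:<20} {rate:8.1f} intr/s")
```

In practice the two text samples would be captured by running vmstat -i
every 10 seconds during both the "good" and the "bad" periods, then
comparing the resulting per-interval rates for the amr and em devices.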