Date: Wed, 17 Jul 2013 22:18:52 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Mark Johnston <markj@freebsd.org> Cc: freebsd-stable@freebsd.org Subject: Re: syncer causing latency spikes Message-ID: <20130717191852.GS5991@kib.kiev.ua> In-Reply-To: <20130717180720.GA8289@charmander> References: <20130717180720.GA8289@charmander>
next in thread | previous in thread | raw e-mail | index | archive | help
--lx6+J6gHrzwEpYNy Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jul 17, 2013 at 02:07:55PM -0400, Mark Johnston wrote: > Hello, >=20 > I'm trying to investigate and solve some postgres latency spikes that > I'm seeing as a result of some behaviour in the syncer. This is with > FreeBSD 8.2 (with some local modifications and backports, r231160 in > particular). The system has an LSI 9261-8i RAID controller (backed by > mfi(4)) and the database and WALs are on separate volumes, a RAID 6 and > a RAID 1 respectively. It has about 96GB of RAM installed. >=20 > What's happening is that the syncer tries to fsync a large database file > and goes to sleep in getpbuf() with the corresponding vnode lock held > and the following stack: >=20 > #3 0xffffffff805fceb5 in _sleep (ident=3D0xffffffff80ca8e20, lock=3D0xff= ffffff80d6bc20, priority=3D-2134554464, wmesg=3D0xffffffff80a4fe43 "wswbuf0= ", timo=3D0) at /d2/usr/src/sys/kern/kern_synch.c:234 > #4 0xffffffff808780d5 in getpbuf (pfreecnt=3D0xffffffff80ca8e20) at /d2/= usr/src/sys/vm/vm_pager.c:339 > #5 0xffffffff80677a00 in cluster_wbuild (vp=3D0xffffff02ea3d7ce8, size= =3D16384, start_lbn=3D20869, len=3D2) at /d2/usr/src/sys/kern/vfs_cluster.c= :801 > #6 0xffffffff808477ed in ffs_syncvnode (vp=3D0xffffff02ea3d7ce8, waitfor= =3DVariable "waitfor" is not available.) at /d2/usr/src/sys/ufs/ffs/ffs_vno= ps.c:306 > #7 0xffffffff808488cf in ffs_fsync (ap=3D0xffffff9b0cd27b00) at /d2/usr/= src/sys/ufs/ffs/ffs_vnops.c:190 > #8 0xffffffff8096798a in VOP_FSYNC_APV (vop=3D0xffffffff80ca5300, a=3D0x= ffffff9b0cd27b00) at vnode_if.c:1267 > #9 0xffffffff8068bade in sync_vnode (slp=3D0xffffff002ab8e758, bo=3D0xff= ffff9b0cd27bc0, td=3D0xffffff002ac89460) at vnode_if.h:549 > #10 0xffffffff8068bdcd in sched_sync () at /d2/usr/src/sys/kern/vfs_subr.= c:1841 >=20 > (kgdb) frame 6 > #6 0xffffffff808477ed in ffs_syncvnode (vp=3D0xffffff02ea3d7ce8, waitfor= =3DVariable "waitfor" is not available.) at /d2/usr/src/sys/ufs/ffs/ffs_vno= ps.c:306 > 306 vfs_bio_awrite(bp); > (kgdb) vpath vp > 0xffffff02ea3d7ce8: 18381 > 0xffffff02eab1dce8: 16384 > 0xffffff02eaaf0588: base > 0xffffff01616d8b10: data > 0xffffff01616d8ce8: pgsql > 0xffffff002af9f588: <mount point> > 0xffffff002af853b0: d3 > 0xffffff002abd6b10: / > (kgdb) >=20 > During such an fsync, DTrace shows me that syncer sleeps of 50-200ms are > happening up to 8 or 10 times a second. When this happens, a bunch of > postgres threads become blocked in vn_write() waiting for the vnode lock > to become free. It looks like the write-clustering code is limited to > using (nswbuf / 2) pbufs, and FreeBSD prevents one from setting nswbuf > to anything greater than 256. Syncer is probably just a victim of profiling. Would postgres called fsync(2), you then blame the fsync code for the pauses. Just add a tunable to allow the user to manually-tune the nswbuf, regardless of the buffer cache sizing. And yes, nswbuf default max probably should be bumped to something like 1024, at least on 64bit architectures which do not starve for kernel memory. >=20 > Since the sleeps are happening in the write-clustering code, I tried > disabling write clustering on the mountpoint (with the noclusterw mount > option) and found that this replaces my problem with another one: the > syncer periodically generates large bursts of writes that create a > backlog in the mfi(4) bioq. Then postgres' reads take a few seconds to > return, causing more or less the same end result. Well, this is exactly what clustering code is for, merge a lot of short i/o requests into the bigger i/o ? >=20 > Does anyone have any suggestions on what I can do to help reduce the > impact of the syncer on my systems? I can't just move to a newer version > of FreeBSD, but I'm willing to backport changes if anyone can point me > to something that might help. As a side note, syncer code for UFS was redone in HEAD and 9 to only iterate over the active vnodes. This is mostly irrelevant to your problems which happen when syncer does perform the i/o, but still it saves a lot of CPU. --lx6+J6gHrzwEpYNy Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iQIcBAEBAgAGBQJR5u4bAAoJEJDCuSvBvK1BIicQAIX9KpI4uwHfTpcdzry0pBjO g5jl7r23MH2Y/34snX1oNfW0TWIP9dSypMZnmioswLLEVa3G/CFL+dr+A1tCaznc 4WctmktebGlyhjN1qZus/xLhNA1xeiXvSjiPccFTBnd53ce79+FrKvpsQICLXJwh hzPfHFG4x3VCkbRP/tKj/F000MlYaZm2QUXsp7uSJahd/L0E1UfdRuw/oxUG9dG8 KqtUuq8i2uBZMCR8b/AsRZ9LNOymjc6DKcDEeuQi4xTp51GvMMqy/mspkb0LJDBj YjsbdhV9fGOArq1TehknwTps0ApdK10++mv0r2QKj5IDOsYhOKDJMg9giYa81bbb jD3qdMMPVeLqjQxqeGOx1jbKwV0NXvHonOU2b41Bo3xJL8WkdtT3RxeyPNYeLhUZ 2UYTr1k+zCDUnSckK1RH1ezLLiFxfOczpODoWecQI1pq3LNHI3LoH3zblsqzugiJ nO/eMes2P4szPXTYy8pFI6q8m+XFeOsdJi6j+C+GC56O95Gfzp3I8gaPIKSaidoS hXuCAXzqZRW1A6ByaMMl5eZIJdXwDWCT+OEWZh0jfwyB2IXO/FmAgqfLTnCRvdJK uiV1Uvofm325EXldTHT5HeAZW6SHrJHZj6m9It1Tjw13JuuZkgT2w+/n9O3kVYD3 Fv8y4JZIk4/cne0TQo6Z =8U9c -----END PGP SIGNATURE----- --lx6+J6gHrzwEpYNy--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130717191852.GS5991>