Date: Fri, 20 Jan 2023 11:18:25 +0100 From: Peter Dambier <peter@peter-dambier.de> To: stable@freebsd.org Subject: Re: close() taking minutes Message-ID: <20230120111825.46d0e337.peter@peter-dambier.de> In-Reply-To: <Y8oUemqHiPilKCK7@disp.intra.daemon.contact> References: <Y8oUemqHiPilKCK7@disp.intra.daemon.contact>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 20 Jan 2023 05:11:38 +0100 Peter <pmc@citylink.dinoex.sub.org> wrote: > Does this look familiar with anybody? ("truss -dD ls -l something") > > 0.047760369 0.000017548 > open("/etc/nsswitch.conf",O_RDONLY|O_CLOEXEC,0666) = 4 (0x4) > 0.047845099 0.000011290 > fstat(4,{ mode=-rw-r--r-- ,inode=13886,size=272,blksize=4096 }) = 0 > (0x0) 0.047932166 0.000034182 read(4,"#\n# nsswitch.conf(5) - name > ser"...,4096) = 272 (0x110) 0.048007308 0.000010661 > read(4,0x801848000,4096) = 0 (0x0) 45.827173802 45.779056767 > close(4) = 0 (0x0) ^^^^^^^^^^^^ ... 45.866272643 > 0.000020608 open("/etc/localtime",O_RDONLY,077) = 6 (0x6) > 45.866398048 0.000017731 > fstat(6,{ mode=-r--r--r-- ,inode=41572,size=2309,blksize=4096 }) = 0 > (0x0) 45.866505596 0.000054084 > read(6,"TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0"...,41448) = 2309 (0x905) > 49.511709215 3.645026276 close(6) = 0 (0x0) ^^^^^^^^^^^ > > (The longest I've seen was more than 5 minutes) > > There is otherwise nothing wrong with the machine, only I am running > a postgres table merge for some 10 mio. rows. And only during that the > effect happens, reproducibly. > > The concerned disk is a mirror of two consumer SATA-SSD. > > "gstat -pod" doesn't look bad, 1-2 items in the queue. > > zfs says it knows nothing of these delays, it just shows two > consumer SSD usually busy with their own garbage collection: > > # zpool iostat -vlq im 3 > capacity operations bandwidth > total_wait disk_wait syncq_wait asyncq_wait scrub trim > syncq_read syncq_write asyncq_read asyncq_write scrubq_read > trimq_write pool alloc free read write read > write read write read write read write read write > wait wait pend activ pend activ pend activ pend activ > pend activ pend activ ---------------- ----- ----- ----- > ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- > ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- > ----- ----- ----- ----- ----- im 30.6G 14.9G > 0 4.84K 0 111M - 2ms - 670us - - > - 2ms - - 0 0 0 0 0 0 > 47 2 0 0 0 0 mirror-0 30.6G > 14.9G 0 4.84K 0 111M - 2ms - 670us > - - - 2ms - - 0 0 0 0 > 0 0 47 2 0 0 0 0 ada5p9.elip1 > - - 0 1.65K 0 55.7M - 3ms - 927us > - - - 3ms - - 0 0 0 0 > 0 0 0 0 0 0 0 0 ada1p9.elip1 > - - 0 3.19K 0 55.8M - 2ms - 536us > - - - 1ms - - 0 0 0 0 > 0 0 47 2 0 0 0 0 ---------------- > ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- > ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- > ----- ----- ----- ----- ----- ----- ----- ----- capacity > operations bandwidth total_wait disk_wait syncq_wait > asyncq_wait scrub trim syncq_read syncq_write asyncq_read > asyncq_write scrubq_read trimq_write pool alloc > free read write read write read write read write read > write read write wait wait pend activ pend activ pend > activ pend activ pend activ pend activ ---------------- > ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- > ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- > ----- ----- ----- ----- ----- ----- ----- ----- > im 30.6G 14.9G 0 5.69K 0 103M - > 3ms - 544us - - - 2ms - - > 0 0 0 0 0 0 84 2 0 0 > 0 0 mirror-0 30.6G 14.9G 0 5.69K 0 > 103M - 3ms - 544us - - - 2ms > - - 0 0 0 0 0 0 98 4 > 0 0 0 0 ada5p9.elip1 - - 0 2.05K > 0 51.4M - 5ms - 837us - - - 4ms > - - 0 0 0 0 0 0 93 2 > 0 0 0 0 ada1p9.elip1 - - 0 3.64K > 0 51.6M - 1ms - 379us - - - 1ms > - - 0 0 0 0 0 0 0 0 > 0 0 0 0 ---------------- ----- ----- ----- ----- > ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- > ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- > ----- ----- ----- ----- > > So where does it happen? (13.1-RELEASE-p5) > > The root, /usr and /var are on ufs, on the same(!) disk (unmirrored), > and have no problem. > > The next 100-some filesystems are on the "im" pool, and there is the > problem, so it is somehow related to ZFS. (The above truss output > was in a jail, which is on the ZFS.) > > I had other things saturating these disks, with no such effect. It > seems postgres does something different here. > > It does also not really look like a congestion problem, because I > can have two or three tasks get stuck in this (and all of them will), > and they continue at very much different times. Looks rather like > some lock wait thing. > > Any clues, anybody? > Maybe unrelated or ot. Both BSD and linux, firefox in partcular or mplayer, kplayer xine ... the program has many windows or tabs open and slowly behaves like molasses. Closing or killing it takes a lot of time and cpu and sometimes never ends. "ps" will show ghosts that go away by killing "X" I have a feeling that I have seen it only on 32 bit os not on 64 bit. I am running "twm" as my window manager that is why logging out is not a good idea on my FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64 cheers Peter
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20230120111825.46d0e337.peter>