Date: Tue, 23 Aug 2022 09:17:22 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 266000] noticable higher i/o and cpu usage in 13.1 zfs on root (virtualized)
Message-ID: <bug-266000-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266000

            Bug ID: 266000
           Summary: noticable higher i/o and cpu usage in 13.1 zfs on root (virtualized)
           Product: Base System
           Version: 13.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: olle@dalnix.se

I've noticed, on a couple of Digital Ocean VMs, that after upgrading to
13.1-RELEASE-p1 the hypervisor graphs show much higher I/O and CPU usage
than normal.

So far I've upgraded a couple of boxes. All are out-of-the-box ZFS-on-root
installs.

prison03:
It's a web server with about a dozen jails for websites, one jail for the
DB, and one for a proxy server. The jail roots and "web" data are on
separate block storage volumes. The storage volumes also use ZFS.

First I noticed it was behaving a bit sluggishly when doing simple tasks
such as running find.
Second, I noticed the hypervisor graphs showed much higher CPU usage than
normal.

I talked to DO support a bit, tried moving it to another hypervisor, etc.
That didn't help. Then I rebooted into the "old" kernel, and CPU and I/O
went down (although I couldn't actually test this, since it's a production
box and pf wouldn't work when booting the old kernel). But running find
etc. was as snappy as before.

I have some annotated screenshots of the hypervisor graphs here:
https://nextcloud.dalnix.se/index.php/s/8C9yrQqgGbSoQ37

After downgrading, it's snappy fast and the graphs are back to normal.

prison04:
Same here, much higher I/O after the upgrade. Graphs:
https://nextcloud.dalnix.se/index.php/s/r2A8JXcRJF97rZW

It only hosts one website and normally isn't doing much.

prison08:
Sluggish. The graphs are way off for what it actually does.
It's an old web server with no traffic. The only things running are normal
system stuff and offloading some ZFS snapshots.
Since this is a box scheduled for destruction, I never noticed the high
CPU, so I have no before and after graphs.
https://nextcloud.dalnix.se/index.php/s/dRiF94ED2oYDCmt

last pid: 14617;  load averages:  3.24,  2.96,  2.85    up 30+21:30:47  09:02:23

Very weird: it doesn't do anything, yet it's still busy =D.

*******01db03:
This is a dedicated database server.
With this one, the I/O and CPU usage got so bad that it became unusable
when people actually started to use it.

It's a DB server for a GIS-type app. Normally it doesn't have *that* much
load. But the (I'm guessing) I/O wait caused the DB server to stop
responding.

I did some troubleshooting at the DB level and "fixed" the thing that was
causing it. I looked at slow queries, and one of them took longer and
longer, to the point of no return. So I added an index to a table, and
that "fixed" it.

But, obviously, this application error wasn't a problem prior to 13.1.

Graph screenshots are available here:
https://nextcloud.dalnix.se/index.php/s/9pNsyaJa62wRaiW

I have a bunch of physical hardware servers as well, but they do not
appear to have any issues.
That includes two upgraded (physical hardware) storage servers. They all
seem fine, with no increased load.

Is this an OpenZFS 2.1.4 + KVM (I *think* DO uses KVM) bug?

-- 
You are receiving this mail because:
You are the assignee for the bug.