Date: Thu, 5 Apr 2018 09:00:15 -0600
From: Warner Losh <imp@bsdimp.com>
To: "Eugene M. Zheganin" <eugene@zhegan.in>
Cc: FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject: Re: TRIM, iSCSI and %busy waves
Message-ID: <CANCZdfrQBqcb5ASPFGjLN=nif4bQ9vTjMtBuDaOvvUKkM7u3XA@mail.gmail.com>
In-Reply-To: <92b92a3d-3262-c006-ed5a-dc2f9f4a5cb9@zhegan.in>
References: <92b92a3d-3262-c006-ed5a-dc2f9f4a5cb9@zhegan.in>
On Thu, Apr 5, 2018 at 8:08 AM, Eugene M. Zheganin <eugene@zhegan.in> wrote:
> Hi,
>
> I have a production iSCSI system (on zfs of course) with 15 ssd disks and
> it's often suffering from TRIMs.
>
> Well, I know what TRIM is for, and I know it's a good thing, but sometimes
> (actually often) I see in gstat that my disks are overwhelmed by TRIM
> "waves": a wave of 20K 100%-busy delete operations starts on the first
> pool disk, then reaches the second, then the third, ... - by the time it
> reaches the 15th disk the first one is freed from TRIM operations, and in
> 20-40 seconds the wave begins again.
>

There are two issues here. First, %busy doesn't necessarily mean what you
think it means. Back in the days of one operation at a time, it might have
been a reasonable indicator that the drive is busy. Today, with queueing, a
100%-busy disk can often still take additional load.

The second problem is that TRIMs suck for a lot of reasons. FFS (I don't
know about ZFS) sends lots of TRIMs at once when you delete a file. These
TRIMs are UFS-block sized, so they need to be combined in the ada/da layer.
The combining in the ada and da drivers isn't optimal: it implements a
'greedy' method where we pack as much as possible into each TRIM, which
makes each TRIM take longer. Plus, TRIMs are non-NCQ commands, so they
force a drain of all the other queued commands before they can run. And we
don't have any throttling in 11.x (at the moment), so when there are a lot
of them they tend to flood the device and starve out other traffic. Not all
controllers support NCQ TRIM (LSI doesn't at the moment, I don't think);
with NCQ TRIM we only queue one at a time, and that helps.

I'm working on TRIM shaping in -current right now. It's focused on NVMe,
but since I'm doing the bulk of it in cam_iosched.c, it will eventually be
available for ada and da. The notion is to measure how long the TRIMs take,
and only send them at 80% of that rate when there's other traffic in the
queue (so if TRIMs are taking 100ms, send them no faster than 8/s); there's
a toy sketch of that arithmetic at the end of this mail. While this allows
better read/write traffic, it does slow the TRIMs down, which slows down
whatever they may be blocking in the upper layers. I can't speak to ZFS
much, but for UFS that's the freeing of blocks, so things like new block
allocation may be delayed if we're almost out of disk (which we have no
signal for, so there's no way for the lower layers to prioritize TRIMs or
not).

> I'm also having a couple of iSCSI issues that I'm dealing with through a
> bounty, so maybe this is related somehow. Or maybe not. Due to some issues
> in the iSCSI stack my system sometimes reboots, and then these "waves"
> stop for some time.
>
> So, my question is - can I fine-tune TRIM operations? So they don't
> consume the whole disk at 100%. I see several sysctl oids, but they aren't
> well-documented.
>

You might be able to set the delete method; see the sketch after my
signature.

> P.S. This is 11.x, disks are Toshibas, and they are attached via LSI HBA.
>

Which LSI HBA?

Warner
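A toy sketch of the pacing arithmetic mentioned above, as a standalone
userland program rather than the actual cam_iosched.c code (the 100ms
latency is just the example figure from the text; the real work measures
it per device):

    /*
     * Illustration only: given a measured TRIM completion time, compute
     * the rate the device demonstrated and throttle new TRIMs to 80% of
     * it so reads and writes can interleave.
     */
    #include <stdio.h>

    int
    main(void)
    {
            double trim_latency_ms = 100.0;                 /* example measured TRIM time */
            double device_rate = 1000.0 / trim_latency_ms;  /* 10 TRIMs/s sustained */
            double issue_rate = 0.8 * device_rate;          /* shape to 8 TRIMs/s */

            printf("device sustains %.1f TRIMs/s; issue at most %.1f TRIMs/s\n",
                device_rate, issue_rate);
            return (0);
    }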
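As for the delete method: the knob I mean is the per-device delete_method
sysctl that da(4)/ada(4) expose. A rough, untested sketch of reading and
changing it from a program follows; the OID name, the "da0" unit and the
"DISABLE" value are assumptions for illustration - run
"sysctl -a | grep delete_method" on your box for the real names, see da(4)
for the methods your drives support, and you need root to change it:

    /*
     * Untested sketch: read and set kern.cam.da.0.delete_method via
     * sysctlbyname(3).  OID name, unit number and method value are
     * assumptions, not a recommendation for your setup.
     */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
            const char *oid = "kern.cam.da.0.delete_method";
            const char *want = "DISABLE";   /* e.g. stop sending deletes entirely */
            char cur[64];
            size_t len = sizeof(cur);

            if (sysctlbyname(oid, cur, &len, NULL, 0) == -1) {
                    perror("sysctlbyname(read)");
                    return (1);
            }
            printf("current delete method: %.*s\n", (int)len, cur);

            if (sysctlbyname(oid, NULL, NULL, want, strlen(want)) == -1) {
                    perror("sysctlbyname(write)");
                    return (1);
            }
            printf("delete method set to %s\n", want);
            return (0);
    }

The usual way is of course just sysctl(8) as root, along the lines of
"sysctl kern.cam.da.0.delete_method=UNMAP"; the program form is only to
show which OID is involved.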