From: Warner Losh <wlosh@bsdimp.com>
Date: Thu, 5 Apr 2018 09:00:15 -0600
Subject: Re: TRIM, iSCSI and %busy waves
To: "Eugene M. Zheganin"
Cc: FreeBSD-STABLE Mailing List

On Thu, Apr 5, 2018 at 8:08 AM, Eugene M. Zheganin wrote:

> Hi,
>
> I have a production iSCSI system (on ZFS, of course) with 15 SSD disks,
> and it's often suffering from TRIMs.
> Well, I know what TRIM is for, and I know it's a good thing, but
> sometimes (actually often) I see in gstat that my disks are overwhelmed
> by TRIM "waves": a wave of 20K 100%-busy delete operations starts on the
> first pool disk, then reaches the second, then the third... By the time
> it reaches the 15th disk, the first one is freed from TRIM operations,
> and in 20-40 seconds the wave begins again.

There are two issues here. First, %busy doesn't necessarily mean what you
think it means. Back in the days of one operation at a time, it was a
reasonable indicator that the drive was busy. Today, with command queueing,
a disk showing 100% busy can often take additional load. (gstat can also
show the delete traffic separately from reads and writes; see the example
at the end.)

The second problem is that TRIMs suck, for a lot of reasons. FFS (I don't
know about ZFS) sends lots of TRIMs at once when you delete a file. These
TRIMs are UFS-block sized, so they need to be combined in the ada/da layer.
The combining in the ada and da drivers isn't optimal: it implements a
"greedy" method where we pack as much as possible into each TRIM, which
makes each TRIM take longer. Plus, TRIMs are non-NCQ commands, so they
force a drain of all the other queued commands before they can run. And we
don't have any throttling in 11.x (at the moment), so they tend to flood
the device and starve out other traffic when there are a lot of them. Not
all controllers support NCQ TRIM (LSI doesn't at the moment, I don't
think). With NCQ we only queue one at a time, and that helps.

I'm working on TRIM shaping in -current right now. It's focused on NVMe,
but since I'm doing the bulk of it in cam_iosched.c, it will eventually be
available for ada and da as well. The notion is to measure how long the
TRIMs take and, when there's other traffic in the queue, send them at only
80% of that rate: if TRIMs are taking 100ms each, that's at most 10 per
second back-to-back, so send them no faster than 8 per second. While this
allows better read/write traffic, it does slow the TRIMs down, which slows
down whatever they may be blocking in the upper layers. I can't speak to
ZFS much, but for UFS that's the freeing of blocks, so things like new
block allocation may be delayed if we're almost out of disk (which we have
no signal for, so there's no way for the lower layers to know whether to
prioritize TRIMs).

> I'm also having a couple of iSCSI issues that I'm dealing with through a
> bounty, so maybe this is related somehow, or maybe not. Due to some
> issues in the iSCSI stack my system sometimes reboots, and then these
> "waves" stop for some time.
>
> So, my question is: can I fine-tune TRIM operations so they don't
> consume the whole disk at 100%? I see several sysctl OIDs, but they
> aren't well documented.

You might be able to set the delete method (see the sysctl examples at the
end).

> P.S. This is 11.x, the disks are Toshibas, and they are attached via an
> LSI HBA.

Which LSI HBA?

Warner
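For watching the waves themselves, gstat can break delete traffic out of
the generic operation counts: per gstat(8), the -d flag adds columns for
BIO_DELETE operations, so TRIM activity shows up on its own rather than
inflating the read/write numbers. A minimal invocation (the exact flags
accepted may vary by release, so check gstat(8) on the system in question):

    # Show per-provider statistics including delete (BIO_DELETE) ops,
    # physical providers only, refreshing once per second.
    gstat -dp -I 1s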
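On the tuning question, a sketch of the kind of knobs that existed on 11.x
(names from the da(4) driver and the legacy ZFS TRIM code; treat them as
starting points and verify what your exact version exposes with sysctl -d,
since these are not guaranteed to match every 11.x build):

    # Per-disk delete method chosen by da(4); it can be overridden with
    # values such as UNMAP, ATA_TRIM, WS16, or DISABLE (stop issuing
    # deletes for this device entirely).
    sysctl kern.cam.da.0.delete_method
    sysctl kern.cam.da.0.delete_method=DISABLE

    # Upper bound on how many bytes da(4) packs into a single delete
    # request; lowering it makes each TRIM shorter at the cost of
    # issuing more of them.
    sysctl kern.cam.da.0.delete_max

    # ZFS-side TRIM knobs on 11.x: disable TRIM, delay freed blocks by
    # extra transaction groups, or bound per-vdev TRIM activity.
    # (vfs.zfs.trim.enabled is a loader tunable, so it takes effect at
    # boot via /boot/loader.conf.)
    sysctl vfs.zfs.trim.enabled
    sysctl vfs.zfs.trim.txg_delay
    sysctl vfs.zfs.vdev.trim_max_active

Note that disabling deletes outright (delete_method=DISABLE or
vfs.zfs.trim.enabled=0) trades the waves for the SSDs never being told
about freed space, so it is more useful for confirming that TRIM is the
culprit than as a permanent fix.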