From: Mateusz Guzik
Date: Fri, 10 Jul 2020 05:55:50 +0200
Subject: Re: Right-sizing the geli thread pool
To: Alan Somers
Cc: FreeBSD Hackers

On 7/9/20, Alan Somers wrote:
> Currently, geli creates a separate thread pool for each provider, and by
> default each thread pool contains one thread per cpu.  On a large server
> with many encrypted disks, that can balloon into a very large number of
> threads!
> I have a patch in progress that switches from per-provider thread
> pools to a single thread pool for the entire module.  Happily, I see read
> IOPs increase by up to 60%.  But to my surprise, write IOPs _decreases_ by
> up to 25%.  dtrace suggests that the CPU usage is dominated by the
> vmem_free call in biodone, as in the below stack.
>
> kernel`lock_delay+0x32
> kernel`biodone+0x88
> kernel`g_io_deliver+0x214
> geom_eli.ko`g_eli_write_done+0xf6
> kernel`g_io_deliver+0x214
> kernel`md_kthread+0x275
> kernel`fork_exit+0x7e
> kernel`0xffffffff8104784e
>
> I only have one idea for how to improve things from here.  The geli thread
> pool is still fed by a single global bio queue.  That could cause cache
> thrashing, if bios get moved between cores too often.  I think a superior
> design would be to use a separate bio queue for each geli thread, and use
> work-stealing to balance them.  However,
>
> 1) That doesn't explain why this change benefits reads more than writes,
> and
> 2) work-stealing is hard to get right, and I can't find any examples in the
> kernel.
>
> Can anybody offer tips or code for implementing work stealing?  Or any
> other suggestions about why my write performance is suffering?  I would
> like to get this change committed, but not without resolving that issue.

I can't comment on revamping the design, but:

void
vmem_free(vmem_t *vm, vmem_addr_t addr, vmem_size_t size)
{
        qcache_t *qc;
        MPASS(size > 0);

        if (size <= vm->vm_qcache_max &&
            __predict_true(addr >= VMEM_ADDR_QCACHE_MIN)) {
                qc = &vm->vm_qcache[(size - 1) >> vm->vm_quantum_shift];
                uma_zfree(qc->qc_cache, (void *)addr);
        } else
                vmem_xfree(vm, addr, size);
}

What sizes are being passed here?  Or, more to the point, is it feasible to
bump the qcache limit so that this call sticks to uma?  If the lock
contention is indeed coming from vmem_xfree, that change would get rid of
the problem without having to rework anything.

As for read performance, while it is nice that there is a win, it may still
be smaller than it should be.  I think it is prudent to get a flamegraph
from both cases.

-- 
Mateusz Guzik
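
For reference, a minimal sketch of the per-thread-queue-plus-stealing
arrangement the question asks about.  Every name below (eli_worker,
eli_enqueue, eli_dequeue) is hypothetical and not part of geom_eli; the
locking is deliberately simplistic and this is only meant to show the shape
of the steal path, not a drop-in implementation.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/lock.h>
#include <sys/mutex.h>

/* One queue per worker thread, padded to avoid false sharing. */
struct eli_worker {
        struct mtx              w_lock;         /* protects w_queue */
        struct bio_queue_head   w_queue;        /* this worker's private bios */
} __aligned(CACHE_LINE_SIZE);

static struct eli_worker        *eli_workers;   /* array of eli_nworkers entries */
static u_int                     eli_nworkers;

static void
eli_worker_init(struct eli_worker *w)
{

        mtx_init(&w->w_lock, "eli worker queue", NULL, MTX_DEF);
        bioq_init(&w->w_queue);
}

/* Producer: pick a worker (e.g. by hashing the provider) and queue the bio. */
static void
eli_enqueue(struct bio *bp, u_int target)
{
        struct eli_worker *w;

        w = &eli_workers[target % eli_nworkers];
        mtx_lock(&w->w_lock);
        bioq_insert_tail(&w->w_queue, bp);
        mtx_unlock(&w->w_lock);
        wakeup_one(&w->w_queue);
}

/* Consumer: drain the local queue first, then try to steal from the others. */
static struct bio *
eli_dequeue(u_int self)
{
        struct eli_worker *w;
        struct bio *bp;
        u_int i, victim;

        /* Fast path: our own queue, no cross-core traffic while it is busy. */
        w = &eli_workers[self];
        mtx_lock(&w->w_lock);
        bp = bioq_takefirst(&w->w_queue);
        mtx_unlock(&w->w_lock);
        if (bp != NULL)
                return (bp);

        /* Steal path: scan the other queues, starting just after ourselves. */
        for (i = 1; i < eli_nworkers; i++) {
                victim = (self + i) % eli_nworkers;
                w = &eli_workers[victim];
                mtx_lock(&w->w_lock);
                bp = bioq_takefirst(&w->w_queue);
                mtx_unlock(&w->w_lock);
                if (bp != NULL)
                        return (bp);
        }
        return (NULL);  /* nothing anywhere; caller sleeps and retries */
}

Starting the steal scan at self + 1 spreads idle workers across different
victims instead of having them all contend on queue 0.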
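
On the qcache question itself, the knob lives at arena creation time.  A
sketch follows, with "example_arena" entirely made up (it says nothing about
which arena biodone() is actually freeing into); also note that vmem clamps
qcache_max internally, so whether a given value takes effect would need to
be checked against subr_vmem.c.

#include <sys/param.h>
#include <sys/malloc.h>
#include <sys/vmem.h>

static vmem_t *example_arena;

static void
example_arena_init(vmem_addr_t base, vmem_size_t size)
{

        /*
         * The fifth argument is qcache_max: frees of up to that many bytes
         * (above the qcache address floor) take the uma_zfree() branch
         * quoted above, while anything larger falls through to vmem_xfree()
         * and the arena lock.  8 * PAGE_SIZE is an arbitrary example value,
         * not a recommendation.
         */
        example_arena = vmem_create("example arena", base, size,
            PAGE_SIZE, 8 * PAGE_SIZE, M_WAITOK);
}

Whether bumping the limit helps depends on the sizes actually showing up in
the profile, which is what the question above is trying to establish.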