From: Alan Somers
Date: Thu, 9 Jul 2020 15:26:41 -0600
Subject: Right-sizing the geli thread pool
To: FreeBSD Hackers

Currently, geli creates a separate thread pool for each provider, and by default each thread pool contains one thread per CPU. On a large server with many encrypted disks, that can balloon into a very large number of threads! I have a patch in progress that switches from per-provider thread pools to a single thread pool for the entire module. Happily, I see read IOPS increase by up to 60%. But to my surprise, write IOPS _decrease_ by up to 25%.
dtrace suggests that the CPU usage is dominated by the vmem_free call in biodone, as in the below stack.

    kernel`lock_delay+0x32
    kernel`biodone+0x88
    kernel`g_io_deliver+0x214
    geom_eli.ko`g_eli_write_done+0xf6
    kernel`g_io_deliver+0x214
    kernel`md_kthread+0x275
    kernel`fork_exit+0x7e
    kernel`0xffffffff8104784e

I only have one idea for how to improve things from here. The geli thread pool is still fed by a single global bio queue. That could cause cache thrashing, if bios get moved between cores too often. I think a superior design would be to use a separate bio queue for each geli thread, and use work-stealing to balance them. However, 1) That doesn't explain why this change benefits reads more than writes, and 2) work-stealing is hard to get right, and I can't find any examples in the kernel.

Can anybody offer tips or code for implementing work stealing? Or any other suggestions about why my write performance is suffering? I would like to get this change committed, but not without resolving that issue.

-Alan
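
For illustration, here is a minimal user-space sketch of the per-thread-queue idea with work stealing. It is not taken from the geli patch and is not kernel code: every name in it (ws_queue, ws_push, ws_pop, ws_steal, ws_next, WS_NQUEUES, WS_QCAP) is hypothetical, items are opaque pointers rather than struct bio, and pthread mutexes stand in for whatever kernel locking a real implementation would use. Each worker takes the oldest item from its own queue; only when that queue is empty does it scan the other queues and steal the newest item from the first non-empty one.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define WS_NQUEUES 4    /* hypothetical: one queue per worker thread */
#define WS_QCAP    64   /* hypothetical: fixed capacity of each queue */

struct ws_queue {
    pthread_mutex_t lock;   /* protects every field below */
    void *items[WS_QCAP];   /* ring buffer of pending work items */
    size_t head;            /* index of the oldest item */
    size_t count;           /* number of queued items */
};

static struct ws_queue ws_queues[WS_NQUEUES];

static void
ws_init(void)
{
    for (int i = 0; i < WS_NQUEUES; i++) {
        pthread_mutex_init(&ws_queues[i].lock, NULL);
        ws_queues[i].head = 0;
        ws_queues[i].count = 0;
    }
}

/* Append an item to the tail of queue q.  Returns 0, or -1 if full. */
static int
ws_push(int q, void *item)
{
    struct ws_queue *wq;
    int rc = -1;

    assert(q >= 0 && q < WS_NQUEUES);
    wq = &ws_queues[q];
    pthread_mutex_lock(&wq->lock);
    if (wq->count < WS_QCAP) {
        wq->items[(wq->head + wq->count) % WS_QCAP] = item;
        wq->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&wq->lock);
    return (rc);
}

/* Owner path: remove the oldest item from the head of queue q. */
static void *
ws_pop(int q)
{
    struct ws_queue *wq = &ws_queues[q];
    void *item = NULL;

    pthread_mutex_lock(&wq->lock);
    if (wq->count > 0) {
        item = wq->items[wq->head];
        wq->head = (wq->head + 1) % WS_QCAP;
        wq->count--;
    }
    pthread_mutex_unlock(&wq->lock);
    return (item);
}

/* Thief path: remove the newest item from the tail of the victim. */
static void *
ws_steal(int victim)
{
    struct ws_queue *wq = &ws_queues[victim];
    void *item = NULL;

    pthread_mutex_lock(&wq->lock);
    if (wq->count > 0) {
        wq->count--;
        item = wq->items[(wq->head + wq->count) % WS_QCAP];
    }
    pthread_mutex_unlock(&wq->lock);
    return (item);
}

/* Next item for worker `self`: own queue first, then steal from others. */
static void *
ws_next(int self)
{
    void *item = ws_pop(self);

    for (int i = 1; item == NULL && i < WS_NQUEUES; i++)
        item = ws_steal((self + i) % WS_NQUEUES);
    return (item);
}
```

A worker loop would call ws_next() with its own index and sleep when it returns NULL; the sleep and wakeup logic is omitted here. Note that a thief takes items out of submission order, so this sketch assumes the queued work has no ordering requirements, and that only one lock is ever held at a time, so there is no lock-order concern between queues.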