From: Mel Pilgrim <list_freebsd@bluerosetech.com>
To: Garrett Wollman, freebsd-fs@freebsd.org
Subject: Re: ZFS/NVMe layout puzzle
Date: Sat, 6 Oct 2018 08:42:10 -0700

On 2018-10-04 11:43, Garrett Wollman wrote:
> Say you're using an all-NVMe zpool with PCIe switches to multiplex
> drives (e.g., 12 4-lane NVMe drives on one side, 1 PCIe x8 slot on the
> other). Does it make more sense to spread each vdev across switches
> (and thus CPU sockets) or to have all of the drives in a vdev on the
> same switch? I have no intuition about this at all, and it may not
> even matter. (You can be sure I'll be doing some benchmarking.)
>
> I'm assuming the ZFS code doesn't have any sort of CPU affinity that
> would allow it to take account of the PCIe topology even if that
> information were made available to it.

In this scenario, the PCIe switch takes the role of an HBA in terms of
fault vulnerability.
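
For illustration only, here is a minimal sketch of the two layouts being
weighed, assuming the 12 drives attach as nvd0-nvd11 and that nvd0-nvd5
sit behind one switch while nvd6-nvd11 sit behind the other (the device
names and which drive lands behind which switch are hypothetical):

  # Layout 1: each vdev confined to one switch
  # (hypothetical nvd(4) device names; actual placement behind each
  #  switch would need to be confirmed, e.g. from pciconf -lv output)
  zpool create tank \
      raidz2 nvd0 nvd1 nvd2 nvd3 nvd4  nvd5 \
      raidz2 nvd6 nvd7 nvd8 nvd9 nvd10 nvd11

  # Layout 2: each vdev spread evenly across both switches
  zpool create tank \
      raidz2 nvd0 nvd2 nvd4 nvd6 nvd8  nvd10 \
      raidz2 nvd1 nvd3 nvd5 nvd7 nvd9  nvd11

Note that with 6-wide raidz2 neither sketch survives a switch failure:
layout 1 loses an entire vdev, and layout 2 loses three drives from each
vdev, which exceeds raidz2's two-disk tolerance. Only a narrower
redundancy unit split across switches (e.g. mirrors with one leg behind
each switch) keeps a switch failure from becoming a pool failure, which
is the sense in which the switch plays the HBA's role here.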