From: Alexander Motin
Date: Sat, 29 Jun 2013 01:15:19 +0300
To: Konstantin Belousov
Cc: Adrian Chadd, hackers@freebsd.org
Subject: Re: b_freelist TAILQ/SLIST
Message-ID: <51CE0AF7.6090906@FreeBSD.org>
In-Reply-To: <20130628065732.GL91021@kib.kiev.ua>
References: <51CCAE14.6040504@FreeBSD.org> <20130628065732.GL91021@kib.kiev.ua>
List-Id: Technical Discussions relating to FreeBSD

On 28.06.2013 09:57, Konstantin Belousov wrote:
> On Fri, Jun 28, 2013 at 12:26:44AM +0300, Alexander Motin wrote:
>> While doing some profiles of GEOM/CAM IOPS scalability, on some test
>> patterns I've noticed serious congestion with spinning on global
>> pbuf_mtx mutex inside getpbuf() and relpbuf(). Since that code is
>> already very simple, I've tried to optimize probably the only thing
>> possible there: switch bswlist from TAILQ to SLIST. As I can see,
>> b_freelist field of struct buf is really used as TAILQ in some other
>> places, so I've just added another SLIST_ENTRY field. And result
>> appeared to be surprising -- I can no longer reproduce the issue at all.
>> May be it was just unlucky synchronization of specific test, but I've
>> seen in on two different systems and rechecked results with/without
>> patch three times.
> This is too unbelievable. Could it be, e.g. some cache line conflicts
> which cause the trashing, in fact ?

I think it may indeed be cache thrashing. I've done some profiling of
getpbuf()/relpbuf() and found interesting results. With the patched kernel
using SLIST, profiling shows mostly one hot spot for RESOURCE_STALLS.ANY in
relpbuf(): the first lock acquisition causes 78% of them. Later memory
accesses, including the lock release, hit the same cache line and are almost
free. With the "clean" kernel using TAILQ, I see RESOURCE_STALLS.ANY spread
almost equally between lock acquisition, bswlist access and lock release. It
looks like the cache line is constantly being invalidated by something.

My guess was that the patch somehow changed cache line sharing. But several
checks with nm showed that, while memory layout did change slightly, in both
cases the content of the cache line in question is exactly the same, just
shifted in memory by 128 bytes.

I guess the cache line could be thrashed by threads doing adaptive spinning
on the lock after a collision has happened. That thrashing increases lock
hold time, which in turn increases the chance of additional collisions even
more. Maybe the switch from TAILQ to SLIST slightly reduces lock hold time,
reducing the chance of that cumulative effect. The difference is not big, but
in this test the global lock is acquired 1.5M times per second by 256 threads
on 24 CPUs (12xL2 and 2xL3 caches).

Another guess was that we have some bad case of false cache line sharing, but
I don't know how that can be either checked or avoided.

At the last moment, mostly for luck, I tried switching pbuf_mtx from mtx to
mtx_padalign on the "clean" kernel. To my surprise, that also seems to have
fixed the congestion problem, but I can't explain why. RESOURCE_STALLS.ANY
still shows that there is cache thrashing, but the lock spinning is gone.

Any ideas about what is going on there?

--
Alexander Motin
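
For readers following the padding discussion above, here is a minimal
user-space sketch of the idea behind putting a lock alone on its own cache
line. It is only an illustration under stated assumptions, not the kernel's
mtx_padalign code: the 64-byte CACHE_LINE value, the struct names and the
share_cache_line() helper are all hypothetical, and the "lock" is just a
plain word standing in for a real mutex.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed line size for this sketch */

/* Hypothetical stand-in for a lock word; not the kernel's struct mtx. */
struct plain_lock {
    uintptr_t word;
};

/* Same lock word, aligned to a cache line. Because the struct's size is
 * rounded up to its alignment, it also fills the whole line. */
struct padded_lock {
    _Alignas(CACHE_LINE) uintptr_t word;
};

/* A lock followed directly by the data it protects (e.g. a list head). */
struct shared_plain {
    struct plain_lock lock;
    void *list_head;
};

struct shared_padded {
    struct padded_lock lock;
    void *list_head;
};

/* Returns 1 if two member offsets fall into the same cache line,
 * assuming the enclosing structure itself starts on a line boundary. */
static int
share_cache_line(size_t off_a, size_t off_b)
{
    return (off_a / CACHE_LINE) == (off_b / CACHE_LINE);
}
```

With the plain layout the lock word and the list head sit in one line, so
CPUs spinning on the lock also contend for the line that the lock holder
needs for the list update. With the padded layout the list head starts at
offset 64, in the next line. This only shows the layout effect; it does not
by itself explain the measurements reported above.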