From: Martin Simmons
To: Konstantin Belousov
Cc: freebsd-threads@freebsd.org, jack.ren@intel.com
Date: Tue, 24 Apr 2012 14:43:40 +0100
Subject: Re: About the memory barrier in BSD libc
Message-Id: <201204241343.q3ODhe2C032683@higson.cam.lispworks.com>
In-reply-to: <20120423130343.GT2358@deviant.kiev.zoral.com.ua> (message from Konstantin Belousov on Mon, 23 Apr 2012 16:03:43 +0300)
References: <20120423084120.GD76983@zxy.spb.ru> <20120423094043.GS32749@zxy.spb.ru> <20120423113838.GT32749@zxy.spb.ru> <20120423120720.GS2358@deviant.kiev.zoral.com.ua> <20120423130343.GT2358@deviant.kiev.zoral.com.ua>
List-Id: Threading on FreeBSD

>>>>> On Mon, 23 Apr 2012 16:03:43 +0300,
Konstantin Belousov said:
>
> On Mon, Apr 23, 2012 at 08:33:05PM +0800, Fengwei yin wrote:
> > On Mon, Apr 23, 2012 at 8:07 PM, Konstantin Belousov wrote:
> > > On Mon, Apr 23, 2012 at 07:44:34PM +0800, Fengwei yin wrote:
> > >> On Mon, Apr 23, 2012 at 7:38 PM, Slawa Olhovchenkov wrote:
> > >> > On Mon, Apr 23, 2012 at 07:26:54PM +0800, Fengwei yin wrote:
> > >> >
> > >> >> On Mon, Apr 23, 2012 at 5:40 PM, Slawa Olhovchenkov wrote:
> > >> >> > On Mon, Apr 23, 2012 at 05:32:24PM +0800, Fengwei yin wrote:
> > >> >> >
> > >> >> >> On Mon, Apr 23, 2012 at 4:41 PM, Slawa Olhovchenkov wrote:
> > >> >> >> > On Mon, Apr 23, 2012 at 02:56:03PM +0800, Fengwei yin wrote:
> > >> >> >> >
> > >> >> >> >> Hi list,
> > >> >> >> >> If this is not correct question on the list, please let me know and
> > >> >> >> >> sorry for noise.
> > >> >> >> >>
> > >> >> >> >> I have a question regarding the BSD libc for SMP arch. I didn't see
> > >> >> >> >> memory barrier used in libc.
> > >> >> >> >> How can we make sure it's safe on SMP arch?
> > >> >> >> >
> > >> >> >> > /usr/include/machine/atomic.h:
> > >> >> >> >
> > >> >> >> > #define mb()    __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > >> >> >> > #define wmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > >> >> >> > #define rmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > >> >> >> >
> > >> >> >>
> > >> >> >> Thanks for the information. But it looks nobody use it in libc.
> > >> >> >
> > >> >> > I think nobody in libc need memory barrier: libc don't work with
> > >> >> > peripheral, for atomic operations used different macros.
> > >> >>
> > >> >> If we check the usage of __sinit(), it is a typical singleton pattern which
> > >> >> needs memory barrier to make sure no potential SMP issue.
> > >> >>
> > >> >> Or did I miss something here?
> > >> >
> > >> > What architecture with cache incoherency and FreeBSD support?
> > >>
> > >> I suppose it's not related with cache incoherency (I could be wrong).
> > >> It's related with reorder of instruction by CPU.
> > >>
> > >> Here is the link talking about why need memory barrier for singleton:
> > >> http://www.oaklib.org/docs/oak/singleton.html
> > >>
> > >> x86 has strict memory model and may not suffer this kind of issue. But
> > >> ARM need to take care of it IMHO.
> > >
> > > Please note that __sinit is idempotent, so double-initialization is not
> > > an issue there. The only possible problematic case would be other thread
> > > executing exit and not noticing non-NULL value for __cleanup while current
> > > thread just set it.
> > >
> > > I am not sure how much real this race is. Each call to _sinit() is immediately
> > > followed by a lock acquire, typically FLOCKFILE(), which enforces full barrier
> > > semantic due to pthread_mutex_lock call. The exit() performs __cxa_finalize()
> > > call before checking __cleanup value, and __cxa_finalize() itself locks
> > > atexit_mutex. So the race is tiny and probably possible only for somewhat
> > > buggy applications which perform exit() while there are stdio operations
> > > in progress.
> > >
> > > Also note that some functions assign to __cleanup unconditionally.
> > >
> > > Do you see any real issue due to non-synchronized access to __cleanup ?
> >
> > No. I didn't see real issue. I am just reviewing the code.
> >
> > If you don't think __sinit has issue, let's check another code:
> > line 68 in libc/stdio/fclose.c
> > line 133 in libc/stdio/findfp.c (function __sfp())
> >
> > Which is trying to free a fp slot by assign 0 to fp->_flags. But if
> > the instruction could be re-ordered, another CPU could see fp->_flags
> > is assigned to 0 before the cleanup from line 57 to 67.
> >
> > Let's say, if another CPU is in line 133 of __sfp(), it could see
> > fp->_flags become 0 before it's aware of the cleanup (Line 57 to line 67
> > in libc/stdio/fclose.c) happen.
> >
> > Note: the mutex of FUNLOCKFILE(fp) in line 69 of libc/stdio/fclose.c
> > just could make sure line 70 happen after line 68. It can't impact the
> > re-order of line 57 ~ line 68 by CPU.
>
> Yes, FUNLOCKFILE() there would have no effect on the potential CPU reordering
> of the writes. But does the order of these writes matter at all ?
>
> Please note that __sfp() reinitializes all fields written by fclose().
> Only if CPU executing fclose() is allowed to reorder operations so that
> the external effect of _flags = 0 assignment can be observed before that
> CPU executes other operations from fclose(), there could be a problem.
>
> This is definitely impossible on Intel, and I indeed do not know about
> other architectures enough to reject such possibility. The _flags member
> is short, so atomics cannot be used there. The easier solution, if this
> is indeed an issue, is to lock thread_lock around _flags = 0 assignment
> in fclose().

This can be a problem, even on Intel, because the compiler can reorder the
stores.

E.g. if I compile the following with gcc -O4 on amd64:

struct foo {
  int x, y;
};

int foo(struct foo *p)
{
  int x = bar();
  p->y = baz();
  p->x = x;
}

then I get the following assembly language, which sets p->x before p->y:

	movq	%rdi, %rbx
	call	bar
	movl	%eax, %ebp
	xorl	%eax, %eax
	call	baz
	movl	%ebp, (%rbx)
	movl	%eax, 4(%rbx)

__Martin
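
The compiler reordering shown above is the kind of thing a release store is
meant to rule out. As a minimal sketch, assuming a C11 compiler with
<stdatomic.h> (nothing in this thread uses it, and libc's FILE is not laid
out this way), the hypothetical struct slot below writes its payload first
and then publishes flags = 0 with memory_order_release, so neither the
compiler nor the CPU may move the earlier store past it. All names here
(slot, slot_init, slot_release, slot_is_free) are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a stdio slot; NOT the real libc FILE layout. */
struct slot {
    int fd;              /* payload that must be cleaned up first */
    atomic_short flags;  /* 0 means "slot is free", as with _flags */
};

/* Mark a slot as in use with the given descriptor. */
void slot_init(struct slot *s, int fd)
{
    assert(s != NULL);
    s->fd = fd;
    atomic_init(&s->flags, 1);
}

/*
 * Free a slot. The release store orders the plain store to fd before
 * the store of 0 to flags, for both the compiler and the CPU.
 */
void slot_release(struct slot *s)
{
    assert(s != NULL);
    s->fd = -1;
    atomic_store_explicit(&s->flags, 0, memory_order_release);
}

/*
 * Check whether a slot is free. The acquire load pairs with the release
 * store above: a caller that sees 0 also sees the cleaned-up fd.
 */
int slot_is_free(struct slot *s)
{
    assert(s != NULL);
    return atomic_load_explicit(&s->flags, memory_order_acquire) == 0;
}
```

A thread that sees slot_is_free() return true is then also guaranteed to see
fd == -1. On x86 a compiler-only barrier would likely be enough for this
store ordering, but weaker architectures such as ARM need the hardware
ordering as well, which the release/acquire pair requests portably.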