From owner-svn-src-all@FreeBSD.ORG Wed Dec 22 18:02:28 2010 Return-Path: Delivered-To: svn-src-all@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 24621106566C; Wed, 22 Dec 2010 18:02:28 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail09.syd.optusnet.com.au (mail09.syd.optusnet.com.au [211.29.132.190]) by mx1.freebsd.org (Postfix) with ESMTP id B18798FC17; Wed, 22 Dec 2010 18:02:27 +0000 (UTC) Received: from c122-107-113-54.carlnfd1.nsw.optusnet.com.au (c122-107-113-54.carlnfd1.nsw.optusnet.com.au [122.107.113.54]) by mail09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id oBMI2O7n014446 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 23 Dec 2010 05:02:25 +1100 Date: Thu, 23 Dec 2010 05:02:24 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Jung-uk Kim In-Reply-To: <201012221202.07774.jkim@FreeBSD.org> Message-ID: <20101223041210.B2346@besplex.bde.org> References: <201012220018.oBM0IgOg079632@svn.freebsd.org> <201012220803.12023.jhb@freebsd.org> <201012221202.07774.jkim@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, John Baldwin Subject: Re: svn commit: r216634 - in head/sys/amd64: amd64 ia32 include linux32 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Dec 2010 18:02:28 -0000 On Wed, 22 Dec 2010, Jung-uk Kim wrote: > On Wednesday 22 December 2010 08:03 am, John Baldwin wrote: >> On Tuesday, December 21, 2010 7:18:42 pm Jung-uk Kim wrote: >>> Log: >>> Improve PCB flags handling and make it more robust. Add two >>> new functions for manipulating pcb_flags. These inline functions >>> are very similar to atomic_set_char(9) and atomic_clear_char(9) >>> but without unnecessary LOCK prefix for SMP. Add comments about >>> the rationale[1]. Use these functions wherever possible. >>> Although there are some places where it is not strictly necessary >>> (e.g., a PCB is copied to create a new PCB), it is done across >>> the board for sake of consistency. Turn pcb_full_iret into a PCB >>> flag as it is safe now. Move rarely used fields before pcb_flags >>> and reduce size of pcb_flags to one byte. Fix some style(9) nits >>> in pcb.h while I am in the neighborhood. >>> >>> Reviewed by: kib >>> Submitted by: kib[1] >>> MFC after: 2 months >> >> Is there really a need to have the flags field be a char instead of >> an int or long? It seems to me that flags fields in general should >> be an int unless there is a strong need otherwise (e.g. >> hardware-defined flag). 'orl' will work just as well as 'orb'. > > Actually, there was little disagreement between kib and me and > committed version was fifth reincarnation of original patch. I I would have a big disagreement :-). I don't like using macros or functions to obfuscate setting and clearing of boolean flags. Here the function may be needed for locking, but... I already asked kib not to use a function for (partially) managing 2 FPU flags (there should be only one of these and then the management becomes simple boolean again). Putting pcb_full_iret in with the other flags gives a tiny pessimization -- it now takes a load-modify-store to set it, where it only took a store. I think this and any other flags that need special management should be in a separate struct member (possibly in a separate cache line) like pcb_full_iret was. This corresponds to td_flags being separate from td_pflags. Accesses to these td flags are not obfuscated. Many of the accesses still aren't atomic unless there is higher-level locking, since they are of the form { set one flag; clear another; }. It's unclear if there is any higher-level locking for the FPU flags, whose management has become quite complicated: - fpudna(), fpudrop(): everything is locked by a critical section. This was needed only for the fpucurthread switch. It is unclear if it is now also needed for the flags management. Here the flags must be managed inside the critical section due to the ordering. - fpugetregs(): does all flags managament outside of its critical sections - fpuuserinited(): in critical section iff caller provided one (same for fpudrop(), but it asserts that the caller provided one) - fpusetregs(): carefully exits from its critical section before managing flags. Now the ordering permits this. - fpu_kern_enter(): no critical sections in sight (but when it abuses fpuexit() to enter, fpuexit() will use one. - fpuexit(): doesn't manage flags. Probably not needed, especially since it is never used to exit (thread exit abuses fpudrop() instead), and when it is used to enter as above, its caller manages the flags. - fpu_kern_leave(): carefully exits from its critical section before managing flags. This is the main fpu place that has to set one flag and clear another. - fpu_kern_thread(): no critical section in sight for flags management. - is_fpu_kern_thread(): only reads flags. Since there is no locking in sight, the flags are presumably non-volatile. The one in td_pflags certainly is, as is made clear by documented locking for td_pflags. The one in pcb_flags is presumably the same, so it very naturally belongs in a pcb analog of td_pflags. > originally used 'u_int pcb_flags' but used byte byte opcodes for > assembly to make it consistent with compiler generated code. kib Unfortuneatly its easier for the compiler to maintain the optimization of doing byte accesses if that is good. > disliked the inconsistencies and told me to reconsider. After > several versions, I proposed we use char and move forward for now, > then we increase the size if it ever grows more than 8 bits. So, we > settled. I have no problem with int version if there is no > measurable performance change. I'll test it soon. I've never seen any performance loss from using 32-bit accesses, except with the original i386 since it had no cache and a worse instruction fetcher so that even the extra length of 32-bit immediate operands slowed it down. With not-so-old CPUs it is the 8-bit accesses that are more likely to be slower, since they may be misaligned and/or cause store-to-load mismatch penalties. There are a couple of (read-only) accesses to td_flags and td_pflags in asm code very need the pcb_full_iret ones. The former use simple testl's. Bruce