From: Christoph Mallon <christoph.mallon@gmx.de>
Date: Sun, 29 Jun 2008 10:36:29 +0200
To: Bruce Evans
Cc: cvs-src@freebsd.org, src-committers@freebsd.org, cvs-all@freebsd.org, Marius Strobl
Subject: Re: cvs commit: src/sys/sparc64/include in_cksum.h
Message-ID: <4867498D.5050409@gmx.de>
In-Reply-To: <20080629121025.K92490@delplex.bde.org>

Bruce Evans wrote:
>>> Right.  Though I've never seen unnecessary __volatiles significantly
>>> affecting i386 code.  This is because the code in the asms can't be
>>> removed completely, and can't be moved much either.  With out of order
>>> execution, the type of moves that are permitted (not across
>>> dependencies)
>
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> are precisely the type of moves that the CPU's scheduler can do or
>>> undo no matter how the compiler orders the code.
>>
>> I disagree. For example look at the use of in_addword() in
>> dev/sk/if_sk.c at line 2819:
>> 	csum1 = htons(csum & 0xffff);
>> 	csum2 = htons((csum >> 16) & 0xffff);
>> 	ipcsum = in_addword(csum1, ~csum2 & 0xffff);
>> 	/* checksum fixup for IP options */
>> 	len = hlen - sizeof(struct ip);
>> 	if (len > 0) {
>> 		return;
>> 	}
>>
>> The calculation will be executed even if the following if (len > 0)
>> leaves the function and the value of ipcsum is unused.
>> If in_addword() is not marked volatile, it can be moved after the if
>> and not be executed in all cases. csum1 and csum2 can be moved after
>> the if, too.
>
> No, volatile has no effect on whether the above calculation will be
> executed, since the early return has no dependencies on the calculation.

The volatile induces a dependency: a volatile asm statement has to be
executed on every path that reaches it, whether or not its result is used.
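To illustrate the point, here is a minimal sketch. The asm is only a
reduced stand-in for in_addword() (the names and the exact instruction
sequence are mine, not the FreeBSD implementation):

static inline unsigned short add16(unsigned short sum, unsigned short b)
{
	/* 16-bit add with end-around carry, as used by IP checksums */
	__asm__("addw %1, %0\n\tadcw $0, %0"
	    : "+r" (sum) : "r" (b) : "cc");
	return sum;
}

static inline unsigned short add16_v(unsigned short sum, unsigned short b)
{
	/* identical, but marked volatile */
	__asm__ __volatile__("addw %1, %0\n\tadcw $0, %0"
	    : "+r" (sum) : "r" (b) : "cc");
	return sum;
}

unsigned short f(unsigned short a, unsigned short b)
{
	unsigned short s = add16(a, b);		/* result is dead */
	(void)s;
	return a;
}

unsigned short f_v(unsigned short a, unsigned short b)
{
	unsigned short s = add16_v(a, b);	/* result is dead, too */
	(void)s;
	return a;
}

Compiled with gcc -O2 -S, f() is reduced to a single move and the asm is
gone, while f_v() still contains the addw/adcw pair, even though s is
never used in either function.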
> Old versions of gcc used to handle volatile like that, but this changed
> in gcc-3 or earlier.  gcc.info now says:
>
> % The `volatile' keyword indicates that the instruction has important
> % side-effects.  GCC will not delete a volatile `asm' if it is reachable.
>                                                        ^^^^^^^^^^^^^^^^^^^

This is not about whether the code is reachable or not (it is
reachable); it is about whether the result is used, i.e. whether the
code is dead.

> % (The instruction can still be deleted if GCC can prove that
> % control-flow will never reach the location of the instruction.)  Note
> % that even a volatile `asm' instruction can be moved relative to other
>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> % code, including across jump instructions.  For example, on many targets
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

jump != conditional jump. If the compiler moved the *volatile* asm
statement across the if, it would only be executed on *some* paths
through the function, which is wrong. It is perfectly fine to move it
when the statement is not volatile, though.

> Even if gcc didn't move the calculation, then CPUs with out of order
> execution might schedule it so that it is effectively never executed
> (most likely by executing it in otherwise-unused pipelines while the
> main pipeline returns).  This is valid for the same reasons that gcc
> can move the volatile asms -- the return doesn't depend on the result
> of the calculation.

This is for the CPU to decide. If the assembler block really contains
"important" stuff like memory barriers, writes to machine control
registers etc., the CPU will not "effectively never execute" the code.
The compiler does not know this; all it sees is the word "volatile".

> The above C code is fairly bad, but generates not so bad code on i386:
>
> %
> % 	movl	%esi, %eax
> % #APP
> % 	xchgb	%ah, %al	# byte operations can be slow; this one not
> 				# too bad, but I wonder if rorw $8 is better
> 				# (rorl $16 is already used for corresponding
> 				# 32-bit operations) where there is no xchg
> 				# alternative
> % #NO_APP

And this again is an example of why one should not use inline
assembler, but let the compiler decide:

unsigned short swap16(unsigned short x)
{
	return x >> 8 | x << 8;
}

is compiled to

swap16:
	movl	4(%esp), %eax
	rolw	$8, %ax
	ret

The compiler is even able to do optimisations which are absolutely
impossible when inline assembler is used. Example:

unsigned short id(unsigned short x)
{
	return swap16(swap16(x));
}

results in

id:
	movl	4(%esp), %eax
	ret

Maybe the MD htons() macros should be replaced by MI code.

> % 	shrl	$16, %esi
> % 	movl	%esi, %edx
> % #APP
> % 	xchgb	%dh, %dl	# as above
> % #NO_APP
> % 	notl	%edx		# poor asm code -- the top 16 bits are unused
> 				# except here to stall for merging them with
> 				# the previous byte operation

The compiler simply does not know that the inline assembler only
operates on parts of the register. Another reason not to use inline
assembler:

u_int g(u_short x)
{
	return ~swap16(x);
}

g:
	movl	4(%esp), %eax
	rolw	$8, %ax
	movzwl	%ax, %eax	# avoid stall
	notl	%eax
	ret

Regards
	Christoph
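P.S.: For anyone who wants to reproduce the swap16() observations, here
is a self-contained version of the experiment (main() is only my test
harness, not part of any proposal):

#include <stdio.h>

static unsigned short swap16(unsigned short x)
{
	/* MI byte swap; gcc recognises the idiom and emits rolw $8 */
	return x >> 8 | x << 8;
}

static unsigned short id(unsigned short x)
{
	/* the double swap folds away completely */
	return swap16(swap16(x));
}

int main(void)
{
	printf("%04x %04x\n", swap16(0x1234), id(0x1234));	/* 3412 1234 */
	return 0;
}

Compile with gcc -O2 -S and inspect the assembler output: swap16()
contains a single rolw $8 and id() no rotate at all, as shown above.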