Date: Sun, 25 May 2003 07:54:27 +1000
From: Peter Jeremy <peterjeremy@optushome.com.au>
To: "David G. Lawrence" <dg@dglawrence.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: 4GB limit with netstat
Message-ID: <20030524215427.GA27340@cirb503493.alcatel.com.au>
In-Reply-To: <20030522094306.GD60352@nexus.dglawrence.com>
References: <20030521104442.G65751@prg.traveller.cz> <20030522092501.GZ10795@imhotep.yuckfou.org> <20030522094306.GD60352@nexus.dglawrence.com>
On Thu, May 22, 2003 at 02:43:06AM -0700, David G. Lawrence wrote:
> I've forgotten the original discussion last year - just how expensive
> is it again to do a locked 64bit update on x86? If it is less than say
> 8x the time to do a 32bit increment, then we should probably just bite
> the bullet and do it for the few counters where it makes sense (input
> and output bytes and packets).

On a 386 or 486, you can't do it - but since we don't support SMP,

	addl	%eax,counter
	adcl	%edx,counter+4

would work as long as the counter was not updated or referenced at
interrupt level (and a cli/sti pair would fix the interrupt problem
without spoiling interrupt latency too much).

On Pentium-and-above, you need to use a locked cmpxchg8b in a loop:

	movl	update,%esi
1:	movl	%esi,%ebx
	xorl	%ecx,%ecx
	movl	counter,%eax
	movl	counter+4,%edx
	addl	%eax,%ebx
	adcl	%edx,%ecx
	lock cmpxchg8b counter
	jnz	1b

I'm not sure how much slower this is than

	lock addl %esi,counter

but it's definitely a lot more code. Ignoring the 'lock', 8x slower
would seem very optimistic, but I think Matt once pointed out that the
lock prefix is incredibly expensive on SMP, so the overall performance
degradation might not be so bad.

Of course, there's the second-order effect of needing 4 additional
work registers - which leaves only a single register unused by the
operation. This means the compiler has to spill virtually all
temporaries, further degrading performance.

Of course, both approaches suffer from the problem that there's no
easy way to atomically load a 64-bit value. This means you need to
lock reads as well as writes - which will significantly increase
overall complexity and reduce performance.

Peter
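For comparison, the compare-and-swap retry loop above can be sketched in portable C11. This is only an illustration, not kernel code: the type and function names (stat_counter64, stat_counter64_add, stat_counter64_read) are hypothetical, and how the compiler implements the 64-bit atomic operations on a 32-bit x86 target (cmpxchg8b, a library call, or something else) depends on the compiler and target options.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit statistics counter; the name is illustrative only. */
typedef struct {
    _Atomic uint64_t value;
} stat_counter64;

/*
 * Add a 32-bit delta using a compare-and-swap retry loop, the same shape
 * as the cmpxchg8b loop: read the old value, compute old + delta, and
 * retry if another CPU changed the counter in between.
 */
static void stat_counter64_add(stat_counter64 *c, uint32_t delta)
{
    assert(c != 0);
    uint64_t old = atomic_load_explicit(&c->value, memory_order_relaxed);
    uint64_t desired;
    do {
        desired = old + delta;
        /* On failure, 'old' is refreshed with the current counter value. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &c->value, &old, desired,
                 memory_order_relaxed, memory_order_relaxed));
}

/* Readers must also go through an atomic load to avoid a torn 64-bit read. */
static uint64_t stat_counter64_read(stat_counter64 *c)
{
    assert(c != 0);
    return atomic_load_explicit(&c->value, memory_order_relaxed);
}
```

Adding 0xFFFFFFFF and then 1 to a zeroed counter exercises the carry into the upper 32 bits, which is the case a plain 32-bit counter gets wrong.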