Date:      Sun, 25 May 2003 07:54:27 +1000
From:      Peter Jeremy <peterjeremy@optushome.com.au>
To:        "David G. Lawrence" <dg@dglawrence.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: 4GB limit with netstat
Message-ID:  <20030524215427.GA27340@cirb503493.alcatel.com.au>
In-Reply-To: <20030522094306.GD60352@nexus.dglawrence.com>
References:  <20030521104442.G65751@prg.traveller.cz> <20030522092501.GZ10795@imhotep.yuckfou.org> <20030522094306.GD60352@nexus.dglawrence.com>

On Thu, May 22, 2003 at 02:43:06AM -0700, David G. Lawrence wrote:
>   I've forgotten the original discussion last year - just how expensive
>is it again to do a locked 64bit update on x86? If it is less than say
>8x the time to do a 32bit increment, then we should probably just bite
>the bullet and do it for the few counters where it makes sense (input
>and output bytes and packets).

On a 386 or 486, you can't do it - but since we don't support SMP there,
"addl %eax,counter; adcl %edx,counter+4" would work as long as the
counter was not updated or referenced at interrupt level (and a cli/sti
pair would fix the interrupt problem without spoiling interrupt
latency too much).
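
For anyone who doesn't read x86 assembler, the addl/adcl pair above
amounts to roughly the following C.  This is only a sketch: the
struct counter64 layout and the function names are made up for
illustration, and nothing here is atomic, so it has the same
restriction about interrupt-level access.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a 64-bit counter kept as two 32-bit words,
 * low word first, matching counter and counter+4 above. */
struct counter64 {
    uint32_t lo;
    uint32_t hi;
};

static_assert(sizeof(struct counter64) == 8, "counter64 must be 8 bytes");

/* Non-atomic 64-bit add done one 32-bit half at a time.
 * Mirrors: addl %eax,counter; adcl %edx,counter+4 */
static void counter64_add(struct counter64 *c, uint32_t add_lo, uint32_t add_hi)
{
    uint32_t old_lo = c->lo;

    c->lo = old_lo + add_lo;            /* addl: low half, may wrap */
    uint32_t carry = (c->lo < old_lo);  /* carry out of the low half */
    c->hi = c->hi + add_hi + carry;     /* adcl: high half plus carry */
}

/* Reassemble the full value.  Only safe when nothing can update the
 * counter between the two loads. */
static uint64_t counter64_value(const struct counter64 *c)
{
    return ((uint64_t)c->hi << 32) | c->lo;
}
```

The window between the two stores in counter64_add() is exactly where
an interrupt handler touching the counter would see or create an
inconsistent value.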

On Pentium-and-above, you need to use a locked cmpxchg8b in a loop:
     movl update,%esi
  1: movl %esi,%ebx
     xorl %ecx,%ecx
     movl counter,%eax
     movl counter+4,%edx
     addl %eax,%ebx
     adcl %edx,%ecx
     lock cmpxchg8b counter
     jnz 1b
I'm not sure how much slower this is than
     lock addl %esi,counter
but it's definitely a lot more code.  Ignoring the 'lock', 8x slower
would seem very optimistic but I think Matt once pointed out that the
lock prefix is incredibly expensive on SMP so the overall performance
degradation might not be so bad.  Of course, there's the second-order
effect of needing 4 additional work registers - which leaves only a
single register unused by the operation.  This means the compiler has
to spill virtually all temporaries, further degrading performance.
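
The cmpxchg8b loop has the same shape as a compare-and-swap loop
written with C11 atomics, which may be easier to follow.  This is a
sketch, not what the kernel does: cas_counter and the function names
are invented here, and I'm assuming (not guaranteeing) that a
compiler targeting a Pentium or later turns the 64-bit
compare-exchange into a lock cmpxchg8b.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit counter, standing in for "counter" above. */
static _Atomic uint64_t cas_counter;

/* Add a 32-bit update to the 64-bit counter using a CAS loop. */
static void cas_counter_add(uint32_t update)
{
    /* Snapshot of the current value (the two movl loads). */
    uint64_t old = atomic_load_explicit(&cas_counter, memory_order_relaxed);
    uint64_t new_val;

    do {
        /* addl/adcl performed on a private copy. */
        new_val = old + update;
        /* If the counter changed since the snapshot, the exchange
         * fails, old is refreshed with the current value, and we
         * go round again (the "jnz 1b"). */
    } while (!atomic_compare_exchange_weak_explicit(
                 &cas_counter, &old, new_val,
                 memory_order_relaxed, memory_order_relaxed));
}

/* Atomic 64-bit read of the counter. */
static uint64_t cas_counter_read(void)
{
    return atomic_load_explicit(&cas_counter, memory_order_relaxed);
}
```

Relaxed ordering is used because a statistics counter doesn't need to
order any other memory accesses; that is a choice made for this
sketch only.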

Of course, both approaches suffer from the problem that there's no
easy way to atomically load a 64-bit value.  This means you need to
lock reads as well as writes - which will significantly increase
overall complexity and reduce performance.
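
To make the "lock reads as well as writes" point concrete, here is a
minimal sketch using a C11 atomic_flag as a spinlock.  All the names
are invented for illustration, and a userland spinlock is only a
stand-in for whatever locking primitive the kernel would really use.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock and counter; the counter itself is a plain
 * 64-bit variable and is only touched with the lock held. */
static atomic_flag locked_counter_lock = ATOMIC_FLAG_INIT;
static uint64_t locked_counter;

static void locked_counter_acquire(void)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&locked_counter_lock,
                                             memory_order_acquire))
        ;
}

static void locked_counter_release(void)
{
    atomic_flag_clear_explicit(&locked_counter_lock, memory_order_release);
}

/* Writer: the 64-bit add may compile to two 32-bit operations,
 * so it is done with the lock held. */
static void locked_counter_add(uint32_t update)
{
    locked_counter_acquire();
    locked_counter += update;
    locked_counter_release();
}

/* Reader: takes the same lock so it can never see a new low word
 * paired with an old high word. */
static uint64_t locked_counter_read(void)
{
    locked_counter_acquire();
    uint64_t value = locked_counter;
    locked_counter_release();
    return value;
}
```

Every reader now pays for a locked operation too, which is the
complexity and performance cost mentioned above.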

Peter


