From owner-freebsd-arch  Tue Jan 15  1:10:24 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from prg.traveller.cz (prg.traveller.cz [193.85.2.77])
	by hub.freebsd.org (Postfix) with ESMTP
	id AD12F37B402; Tue, 15 Jan 2002 01:09:54 -0800 (PST)
Received: from prg.traveller.cz (localhost [127.0.0.1])
	by prg.traveller.cz (8.12.1[KQ-CZ](1)/8.12.1/pukvis) with ESMTP id g0F99qle001391;
	Tue, 15 Jan 2002 10:09:52 +0100 (CET)
Received: from localhost (mime@localhost)
	by prg.traveller.cz (8.12.1[KQ-CZ](1)/pukvis) with ESMTP id g0F99qJe001384;
	Tue, 15 Jan 2002 10:09:52 +0100 (CET)
Date: Tue, 15 Jan 2002 10:09:52 +0100 (CET)
From: Michal Mertl <mime@traveller.cz>
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: "James E. Housley" <jeh@FreeBSD.ORG>,
	Terry Lambert <tlambert2@mindspring.com>,
	Thomas Hurst <tom.hurst@clara.net>, <arch@FreeBSD.ORG>
Subject: Re: 64 bit counters again
In-Reply-To: <20020114114911.A24990@technokratis.com>
Message-ID: <Pine.BSF.4.41.0201141848330.82342-100000@prg.traveller.cz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-arch.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-arch>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-arch>
X-Loop: FreeBSD.ORG

The more important part of the code (I should probably have used a different
Subject) is to have all (or some) kernel counters use an API which implements
different operations for counters that are modified in different situations.
If SYSCNTR_T is defined as 32 and the default branch of the switch, which
decides what operation to do, does a regular addition, the kernel operates
the same as before, with no performance hit, and is slightly less prone to
problems because the operation isn't read-modify-write.
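
A minimal sketch of the kind of API described above, not the code from the
patch. All names here (SYSCNTR_WIDTH, syscntr_add, the enum values) are
hypothetical; SYSCNTR_WIDTH only stands in for the SYSCNTR_T define mentioned
in the text. It assumes GCC or Clang, whose __atomic builtins replace the
inline assembly a kernel implementation would more likely use.

```c
#include <stdint.h>

/* One knob selects the width of every counter (hypothetical name). */
#define SYSCNTR_WIDTH 64

#if SYSCNTR_WIDTH == 64
typedef uint64_t syscntr_val_t;
#else
typedef uint32_t syscntr_val_t;
#endif

/* How a counter may be modified; illustrative names only. */
enum syscntr_kind {
	SYSCNTR_SIMPLE,	/* single writer, or already under a lock */
	SYSCNTR_ATOMIC	/* several CPUs or interrupt handlers may write */
};

struct syscntr {
	volatile syscntr_val_t val;
	enum syscntr_kind kind;
};

static inline void
syscntr_add(struct syscntr *c, syscntr_val_t n)
{
	switch (c->kind) {
	case SYSCNTR_ATOMIC:
		/* Indivisible add; GCC/Clang builtin, assumed available. */
		(void)__atomic_fetch_add(&c->val, n, __ATOMIC_RELAXED);
		break;
	case SYSCNTR_SIMPLE:
	default:
		/* Regular addition, left to the compiler. */
		c->val += n;
		break;
	}
}
```

A caller would declare, say, struct syscntr ipackets = { 0, SYSCNTR_ATOMIC }
and call syscntr_add(&ipackets, 1); changing SYSCNTR_WIDTH to 32 changes
every counter at once.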

I wrote it so that it's just a matter of changing one #define to have all
counters switch to 32 or 64 bit. It would also be possible to have something
like the atomic functions - different counter functions for various counter
sizes, maybe even something like "use a counter at least 32 bits wide" (if
we need atomicity on <586 we can't simply implement the 64 bit ops). I agree
that some counters would have sufficient precision and range in 32 bit.

You could have a counter which can be modified by multiple CPUs or ISRs at
the same time - then you would need atomic ops. You could also have a
counter which is guaranteed to be modified only from a single point in the
code, where there's no possibility that anything else would try to modify it
at the same time.

If I understand correctly (and I won't be surprised if I am wrong, that's
why I asked some questions about that here a week or so ago), on STABLE
almost everything in the kernel is heavily protected (Giant Mutex) - so we
can get away with a simple implementation. Remember - our current STABLE
(and CURRENT) implementation uses just compiler-generated operations
(i+=10;). In some cases (SMP in some unlocked part of the kernel) that
probably isn't right even in 32 bit. If the compiler generates separate
read, modify and write instructions, it can probably go wrong even on UP.
I am giving you a tool to have these "bugs" cleanly fixed.
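
To make the read-modify-write problem concrete, here is a small
single-threaded sketch that replays one bad interleaving by hand. The
function names and the two contexts A and B are made up for illustration;
nothing here is taken from the kernel or from the patch.

```c
#include <stdint.h>

/* The shared counter, as two interleaved writers would see it. */
static uint32_t counter;

/*
 * Replays "counter += 10" done as three separate steps by two contexts,
 * A and B (for example mainline code and an interrupt handler on UP, or
 * two CPUs on SMP).  Returns the final counter value.
 */
static uint32_t
lost_update_demo(void)
{
	uint32_t reg_a, reg_b;

	counter = 0;
	reg_a = counter;	/* A: read */
	reg_b = counter;	/* B: read, A has not written yet */
	reg_a += 10;		/* A: modify */
	reg_b += 10;		/* B: modify */
	counter = reg_a;	/* A: write */
	counter = reg_b;	/* B: write, overwrites A's update */
	return counter;
}

/* The same two additions when each one is a single indivisible step. */
static uint32_t
indivisible_demo(void)
{
	counter = 0;
	counter += 10;		/* A, start to finish */
	counter += 10;		/* B, start to finish */
	return counter;
}
```

The first function ends with 10 in the counter although 20 was added in
total; the second ends with 20.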

There can be a performance penalty for this fix - if we need atomic ops,
don't use them now and then start to use them, it's inevitable. But as I
wrote in my original post, I tried to roughly measure the 32 bit
implementation (equivalent to what we have now), the 32 bit atomic and the
64 bit (atomic/non-atomic) implementations in the real world - pushing
traffic through an interface - and saw no difference. Things can probably
get quite a bit worse if the atomic implementations include lock
instructions (SMP) and multiple simultaneous modifications are attempted,
but I don't believe that will occur often in the real world. I'll measure
SMP atomic lock;cmpxchg8b anyway soon.

I kind of measured the speed of just the different addition implementations
(dual PIII Coppermine, ServerWorks and 440BX dual boards with SMP kernel):

64 bit simple (addl change,(mem); adcl 0,4(mem)): 8.22 clocks
64 bit atomic UP (cmpxchg8b): 34.92 clocks
64 bit atomic SMP (lock; cmpxchg8b): 49.68 clocks
32 bit simple (compiler generates read-modify-write - wrong, but used
  throughout the kernel!!! Only when the counter variable is declared
  volatile does the compiler generate addl change,(mem)): 3.00 clocks
32 bit atomic UP (addl change,(mem)): 3.00 clocks
32 bit atomic SMP (lock; addl change,(mem)): 20.94 clocks
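
For readers who don't read i386 assembly, the two 64 bit variants measured
above can be sketched roughly in portable C. This is an illustration under
assumptions, not the code from the patch: the function and struct names are
invented, and C11 <stdatomic.h> stands in for the cmpxchg8b loop.

```c
#include <stdatomic.h>
#include <stdint.h>

/* A 64 bit counter stored as two 32 bit halves, as on i386. */
struct cntr64_halves {
	uint32_t lo;
	uint32_t hi;
};

/* "64 bit simple": add to the low word, then propagate the carry. */
static void
cntr64_add_simple(struct cntr64_halves *c, uint32_t n)
{
	uint32_t old_lo = c->lo;

	c->lo = old_lo + n;		/* may wrap around */
	c->hi += (c->lo < old_lo);	/* carry is 1 exactly when it wrapped */
}

/* "64 bit atomic": retry a compare-and-swap until nobody raced with us. */
static void
cntr64_add_cas(_Atomic uint64_t *c, uint64_t n)
{
	uint64_t old = atomic_load_explicit(c, memory_order_relaxed);

	/* On failure, old is reloaded with the current value and we retry. */
	while (!atomic_compare_exchange_weak_explicit(c, &old, old + n,
	    memory_order_relaxed, memory_order_relaxed))
		;
}
```

The simple version touches the two halves in separate steps, so a reader
that runs in between can see the low word already updated and the high word
not yet; the compare-and-swap version replaces all 64 bits at once, which is
what the extra clocks buy.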

It seems you and others were right. SMP atomic implementations are a bit
expensive. But even the worst case, 50 clocks for locked cmpxchg8b, isn't
that bad, but YMMV.

Anyway - again, what's there now is '32 bit simple' - it's either bad
(could be wrong, esp. on SMP) or good enough. If it's bad, it should be
changed; an easy way is to use something like what I proposed. If it's
good, then '64 bit simple' is at least as good (it isn't read-modify-write)
and only a bit more expensive (8 clocks against 3 - both very small numbers).


---

I totally disagree with Terry that a 64 bit counter is only slightly more
difficult to overflow than a 32 bit one. We will never (in the near future -
several years) have terabit interfaces, nor wirespeed 10Gbit (and 10Gbit
full duplex would overflow in 500 years). If you don't like 64 bit, your fix
(to count in KB) only adds 10 bits of range; a switch to 64 bit adds 32
bits. I understand that the global counter could be made 64 bit and the
counter you're actually counting on 32 bit, so you would add 42 bits. That
won't overflow for sure :-).
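
The overflow arithmetic can be checked with a few lines of C. This is only a
back-of-the-envelope sketch: it assumes a counter that counts bytes in one
direction at a constant line rate, and the function name is made up.

```c
#define SECONDS_PER_YEAR (365.25 * 24 * 3600)

/* Seconds until a byte counter of the given width wraps at a line rate. */
static double
seconds_to_wrap(unsigned counter_bits, double bits_per_second)
{
	double bytes_per_second = bits_per_second / 8.0;
	double capacity = 1.0;
	unsigned i;

	for (i = 0; i < counter_bits; i++)
		capacity *= 2.0;	/* 2^counter_bits, exact in a double */
	return capacity / bytes_per_second;
}
```

At 10 Gbit/s, seconds_to_wrap(32, 10e9) comes to roughly 3.4 seconds, 42 bits
(32 bits counting in KB) to roughly an hour, and 64 bits to roughly 470 years
after dividing by SECONDS_PER_YEAR, the same order as the 500 year figure
above.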


---

I deeply apologise that the patch I posted was badly broken. I revamped
the code quite a lot before doing the diff, and that resulted in lots of
cut-n-paste and similar errors :-(. Kernel and world built without
problems, but the code was wrong (in atomic.h and syscntr.h). A new patch
will be available soon.


-- 
Michal Mertl
mime@traveller.cz

