Date: Sat, 19 Jan 2002 01:57:56 +0100 (CET)
From: Michal Mertl
To: Terry Lambert
Cc: arch@FreeBSD.ORG
Subject: Re: 64 bit counters again
In-Reply-To: <3C48A0E7.F97BC01@mindspring.com>

On Fri, 18 Jan 2002, Terry Lambert wrote:

> > > The additional locks required for i386 64 bit atomicity will,
> > > if the counter is accessed by more than one CPU, result in
> > > bus contention for inter-CPU coherency.
> >
> > What additional locks? The lock prefix for cmpxchg8b? It's required for 32
> > bit too and it increases time spent on operation from 3 to 21 clocks
> > making the difference between 32 and 64 bit "only" 29 clocks instead of
> > 47.
>
> The additional locks on PPC, SPARC, and Alpha.

Do I understand correctly that 64 bit atomic operations are significantly
more expensive/impossible without a lock on these platforms? That sounds
strange for 64 bit platforms.

> The lock also is a barrier instruction. You need to read the
> Intel programming guide on barrier instructions. On a P4, it

Yes. That's what I described with "(memory bandwidth with lock
operations?) will suffer".

> > > > What do you mean by that? Zero-copy operation? Like sendfile?
> > > > Is Apache 1.x zero-copy?
> > >
> > > Yes, zero copy. Sendfile isn't ideal, but works. Apache is
> > > not zero copy. The idea is to not include a lot of CPU work
> > > on copies between the user space and the kernel, which aren't
> > > going to happen in an extremely optimized application.
> >
> > An "extremely optimized" application is a thing which would have
> > an administrator who doesn't enable costly counters.
>
> No. If we are talking a BSD-based embedded system, then it's just
> one written by someone who was not playing at being an engineer
> (assuming the performance requirements were there; otherwise,
> they're just an engineer who went after the low hanging fruit, and
> it's a legitimate design decision).

Yes, if the default mode of operation of the mentioned counters is going
to stay the same as today.

I'd like to stop this thread, because you're still explaining to me why
64 bit is expensive when I have already switched the subject. I'm not
pushing anyone to 64 bit. What I'm offering now is:

1) 64 bit atomic ops, which could probably be added easily, probably even
   to the base tree (/machine/atomic.h).

2) An "API" for counters, which I think can help people change most/all
   counter accesses to use the right operation. What's right depends on
   the actual counter and/or the point in the kernel code path where the
   access occurs. I don't know what's right; you say we always need
   atomicity. It can be atomic, "simple", per-CPU (this may be hard to
   implement, but with inlining it should be possible) or whatever.

It's everyone's decision to #define what to use. I would keep the default
32 bit "simple": same performance, same (potential) problems.

I'll polish my patch to STABLE and post a link to hackers.

> > > Well, you probably should collect *all* statistics you can,
> > > in the most "this is the only thing I'm doing with the box"
> > > way you can, before and after the code change, and then plot
> > > the ones that get worse (or better) as a result of the change.
> >
> > Will do eventually, but unfortunately don't have the time to devote to it
> > at the moment.
>
> I think it's a requirement to advocate this change.

Not if it doesn't go into the standard tree, and not if the actual
functionality/performance isn't affected by default.

> > > I think the answer is "yes, we need atomic counters". Whether they
> > > need to be 64 bit or just 32 bit is really application dependent
> > > (we have all agreed to that, I think).
> >
> > Thanks. Do you think it's always true (STABLE/CURRENT, network device
> > ISRs, /sys/netinet routines)?
>
> I think it's true of all open-ended counters, where there is
> a risk of overflow if they are 32 bit, and some application
> could be bitten by the overflow, and still be considered to
> be "well written"... in other words, anywhere overflow is
> *expected*.

Again, I don't understand.

> > > See Bruce's posting about atomicity; I think it speaks very
> > > eloquently on the issue (much more brief than what I'd write
> > > to say the same thing ;^)).
> >
> > If you mean the email where he talks about atomic_t ("atomic_t would be
> > "int" if anything") it doesn't fully apply. I am not inventing atomic_t
> > anymore anyway :-). Isn't there a platform, which better works with 64 bit
> > ints than with 32 bits (a-la 32/16 bits on modern i386)?
>
> Yes. IA64. SPARC 9b (SPARC64) and Alpha, which are 64
> bits, require locks, since they don't have the ability to
> do an atomic "lock; cmpxchg8b".

Can they do "lock; add const,(mem)" in 32 or 64 bit? I suppose not. I
don't care about the cmp part; we need addition. cmpxchg8b is used because
that's the only way on an x86 CPU to access (read/modify/write) 64 bit
memory in one operation (without using the FPU).

-- 
Michal Mertl
mime@traveller.cz
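
A minimal sketch of what points 1) and 2) above could look like. This is
not the patch mentioned in the message. It assumes C11 <stdatomic.h> as a
stand-in for kernel primitives, and the names counter64_add_cas,
counter_t, COUNTER_ADD, COUNTER_READ and COUNTER_ATOMIC64 are all
hypothetical. The 64 bit add is written as a compare-and-swap retry loop,
which is roughly the role "lock; cmpxchg8b" plays on i386, and a #define
picks the counter flavour at compile time, with 32 bit "simple" as the
default.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical names throughout; C11 atomics stand in for whatever
 * a kernel header would provide.
 */

/*
 * 64 bit atomic add built from compare-and-swap. The loop re-reads
 * the counter and retries whenever another CPU changed it in between.
 */
static inline void
counter64_add_cas(_Atomic uint64_t *p, uint64_t v)
{
	uint64_t old = atomic_load_explicit(p, memory_order_relaxed);

	/* On failure, 'old' is refreshed with the current value. */
	while (!atomic_compare_exchange_weak_explicit(p, &old, old + v,
	    memory_order_relaxed, memory_order_relaxed))
		;
}

/* Compile-time choice of counter flavour (assumed knob name). */
#if defined(COUNTER_ATOMIC64)
typedef _Atomic uint64_t counter_t;
#define COUNTER_ADD(c, v)	counter64_add_cas(&(c), (v))
#define COUNTER_READ(c)		atomic_load_explicit(&(c), memory_order_relaxed)
#else
/* Default: plain 32 bit, not atomic, wraps around at 2^32. */
typedef uint32_t counter_t;
static_assert(sizeof(counter_t) == 4, "default counter stays 32 bit");
#define COUNTER_ADD(c, v)	((void)((c) += (v)))
#define COUNTER_READ(c)		(c)
#endif
```

Call sites would use only COUNTER_ADD() and COUNTER_READ(), so switching
a counter between flavours needs no change beyond the #define. A per-CPU
flavour would need more than this sketch shows.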