Date: Thu, 31 Aug 1995 06:32:27 -0700 (PDT)
From: "Rodney W. Grimes" <rgrimes@gndrsh.aac.dev.com>
To: marino.ladavac@aut.alcatel.at
Cc: hardware@freebsd.org
Subject: Re: Upgrade to my machine
Message-ID: <199508311332.GAA11309@gndrsh.aac.dev.com>
In-Reply-To: <9508311224.AA23659@atuhc16.aut.alcatel.at> from "marino.ladavac@aut.alcatel.at" at Aug 31, 95 02:24:49 pm
> > Rod Grimes wrote:
> > > Rod Grimes wrote:
> > > > [ is the L1 cache static? ]
> >
> > Not positive on how truly static it is.  Though the 486 cannot
> > change clock rates, the SL-enhanced 486 can, as can all Pentiums.
> > This is the SL-enhancement clock-stop feature.  To change clock
> > frequencies you do a stop clock: shut the clock clear down, then start
> > the clock up at the new frequency.
>
> > > Furthermore, since the L1 cache is on the same chip with the rest of
> > > the CPU, refresh can be done completely transparently in the cycles
> > > when the cache is not being read or written.  However, I do not know
> > > if there are such spare cycles in the Pentium's case; there should be
> > > some on the '486.
> >
> > I am pretty sure they are not fully dynamic, and thus they do not
> > require refresh, as fully dynamic would mean having to do a write-back
> > cycle after the destructive read that dynamic memory cells use.  (You
> > dump the stored charge of the cell into the bit line when you read a
> > DRAM cell.)
>
> Since I do not know much about chip innards and manufacturing
> technologies, this question is really a shot in the dark, but:
>
> Would it be possible to implement the cache in the following manner: the
> cache itself is dynamic.  The line that is presently being read is
> buffered in some fully static memory, and refreshed after the read
> completes.  Basically, this implies an L0 cache, fully static but very
> small.  The likelihood that the same L1 cache line will be read in the
> next cycle is small, and if it occurs, the data is still available in the
> L0 cache.  It should be sufficient to have only a few lines of L0.  If
> the L1 cache is being refreshed and the requested lines are not available
> in L0, the read is stalled (this cannot be noticed by the user, as cycle
> counting has been progressively impossible since the introduction of
> caches and pipelines.)

That is the standard implementation of static column dynamic memory.
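[For illustration only, the scheme proposed above can be sketched as a toy
Python model.  Every name, size, and behavior here is invented for the
sketch; no real part works this way, and the point is only to show where the
stall in the proposal comes from.]

```python
# Toy model of the proposal: a dynamic L1 array fronted by a few fully
# static "L0" line latches.  Reads during an L1 refresh hit L0 or stall.

class ToyCache:
    L0_LINES = 4  # "only a few lines of L0", as suggested above

    def __init__(self, l1_lines=256):
        self.l1 = {i: f"data{i}" for i in range(l1_lines)}  # dynamic array
        self.l0 = {}      # line -> data; fully static, survives refresh
        self.stalls = 0

    def read(self, line, l1_refreshing=False):
        if line in self.l0:            # hit in the static buffer:
            return self.l0[line]       # no stall, even during refresh
        if l1_refreshing:              # L1 busy refreshing and the line
            self.stalls += 1           # is not in L0 -> the read stalls
        data = self.l1[line]
        self.l0[line] = data           # latch the line into L0 ...
        if len(self.l0) > self.L0_LINES:
            self.l0.pop(next(iter(self.l0)))  # ... evicting the oldest
        return data

cache = ToyCache()
cache.read(7)                      # normal read, line latched into L0
cache.read(7, l1_refreshing=True)  # same line during refresh: L0 hit
cache.read(8, l1_refreshing=True)  # different line during refresh: stall
print(cache.stalls)                # -> 1
```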
You have what are known as the static column data latches, used to refresh
the DRAM cells during the precharge period.  The thing is, you cannot run
any other read or write cycle even if you had a second set of L0 latches,
as the bit lines are shared.  You would have to have 2 bit lines and 2
transfer transistors per storage cell, and that would double the size of
your array.

The cache array in the Pentium is already extremely complex on the data
side, as it can dual-issue data (one item to each pipe) and do a store, all
in 1 clock cycle.  This is known as tri-ported memory, and from a closer
look at the Pentium die photograph I suspect the cache is using close to
1.5M transistors: 1M for the data cache and .5M for the instruction cache.

> > My cited data was for the 486DX2, not the 486DX4; sorry for leaving
> > that detail out (I thought I had it in there, but above it appears
> > missing :-().
>
> My mistake.  You did, in the previous mail.

Okay, I am not losing my mind then :-), though I did make an order of
magnitude error in million vs. billion.

> > > Well, a rough guess can be made from the die size and the
> > > manufacturing process.  This way one could get the high limit
> > > (connections ignored in favor of transistors.)
> > >
> > > The die is, what, 11 mm by 10 mm?
> >
> > Not noted in the data books; these parts are not available in die
> > form :-(.
>
> > > Process is .35 micron?
> >
> > The A80502-60 and -66 are 5V 0.8 micron technology sporting ``3.1M
> > transistors''; the A80502-75/90/100 are 3.3V 0.6 micron technology
> > sporting ``3.3M transistors''.
>
> > > Let's say that a 3x3 grid can house a transistor (can it?  I have no
> > > idea); then you can put cca. 1 million of them onto 1 square mm.
> > > There is about 110 square mm of area available.
> >
> > The smallest transistor I could build with 0.6 micron technology is
> > 1.8 micron by 1.2 micron, and that is assuming multiple transistors in
> > the same well, quite common in CMOS logic design.
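[A quick editorial sanity check of the density guesses above.  The .35 and
0.6 micron figures and the 1.8 x 1.2 micron transistor are the ones quoted
in the thread; the assumption that the "3x3 grid" guess was made against
the .35 micron number is ours, inferred from the "cca. 1 million" result.]

```python
# A 3x3 grid of .35 micron features per transistor, in square microns:
guess = (3 * 0.35) ** 2        # ~1.10 um^2 per transistor
print(round(1e6 / guess))      # -> 907029, i.e. "cca. 1 million" per sq. mm

# Rod's smallest 0.6-micron-technology transistor, 1.8 x 1.2 microns:
smallest = 1.8 * 1.2           # 2.16 um^2
print(round(1e6 / smallest))   # -> 462963 transistors per sq. mm
```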
> So, it would seem that one needs at least a 4x3 grid to house a
> transistor (we need some room between transistors, right?

Actually no.  When building something like a 3-input CMOS NAND gate, the
source of one transistor is the drain of the next, so I need a 3-gate
P-channel stack and a 3-gate N-channel stack.  That takes 3 * 0.8 for the
gate widths and 4 * 0.8 for the source, the drain, and the shared S/Ds in
the middle.  So the width is 7 * 0.8, and the length is fixed at 2 * 0.8
(I don't have an Intel process spec, but am assuming from typical
high-density BiCMOS process work I have done in the past that cell lengths
are typically 2 to 4 times the feature size.)  So my 6-transistor 3-input
NAND gate is 2 * ((7 * 0.8) + (2 * 0.8)), or 14.4 square microns, which is
2.4 square microns per transistor.  We should probably assume this to be a
reasonable areal usage rate.

> Or did you include that gap in the numbers above?)

See above; it is not that simple :-)

> This gives about 230,000 transistors per sq. mm.,
> for a total of cca. 25 million on an 11x10 mm die.
>
> This would be the upper theoretical limit.  I would guess that the cache
> is really implemented (at least partially) dynamically.

Nope, I don't think so.  I can do a tri-ported quasi-static
6-transistor/cell 8Kx8 (really it is a 1Kx64) static cache in about
1 million transistors using TGMX-style read columns and decoders.

Wish I had my DN5500 and Mentor tools up and running; I would go lay one
out and run a gate/transistor count report on it, just as a final sanity
check on my back-of-the-envelope calculations :-).

-- 
Rod Grimes                                      rgrimes@gndrsh.aac.dev.com
Accurate Automation Company                   Reliable computers for FreeBSD
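[Editorial note: the back-of-the-envelope figures traded above replay
mechanically like this.  Only numbers quoted in the thread go in; the
script itself is an illustration, not part of the original exchange.]

```python
# Rod's 3-input CMOS NAND gate on an assumed 0.8 micron cell grid:
um = 0.8
width = 7 * um                       # 3 gates + 4 source/drain regions
length = 2 * um                      # assumed cell length, 2x feature size
nand_figure = 2 * (width + length)   # the quoted 14.4 figure
print(round(nand_figure, 1), round(nand_figure / 6, 1))  # -> 14.4 2.4

# marino's density estimate: a 4x3 grid of 0.6 micron features
cell = (4 * 0.6) * (3 * 0.6)         # 4.32 square microns per transistor
per_mm2 = 1e6 / cell                 # "about 230,000" per sq. mm
print(round(per_mm2), round(per_mm2 * 110 / 1e6, 1))     # -> 231481 25.5

# 6-transistor cells in an 8Kx8 array, before decoders and read columns:
print(8 * 1024 * 8 * 6)              # -> 393216
```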