From owner-freebsd-current  Thu Nov  2 12:41:52 1995
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.6.12/8.6.6) id MAA15042
          for current-outgoing; Thu, 2 Nov 1995 12:41:52 -0800
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19])
          by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id MAA15030
          for <current@FreeBSD.ORG>; Thu, 2 Nov 1995 12:41:48 -0800
Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.9/8.6.9) id HAA02262; Fri, 3 Nov 1995 07:41:19 +1100
Date: Fri, 3 Nov 1995 07:41:19 +1100
From: Bruce Evans <bde@zeta.org.au>
Message-Id: <199511022041.HAA02262@godzilla.zeta.org.au>
To: current@FreeBSD.ORG, uhclem%nemesis@fw.ast.com
Subject: Re: Time problems
Sender: owner-current@FreeBSD.ORG
Precedence: bulk

>[3]Perhaps the 8254 clock is being accessed too fast.  Try adding some delays
>[3]before each inb() and outb() in clock.c:getit().  Count to 100 or so to
>[3]get at least 1 usec delay.

>Uh, I don't think this will work as you expect on a Pentium or a P6. It is
>too easy for the parallel integer unit(s) to execute the inb/outbs in one
>unit and do the nice delay loop in the other, thus wrecking your timing delay.
>On the Pentium and up you must force these types of "timed" instruction
>sequences to be done sequentially.

It worked :-).  Perhaps the compiler allocated the same register for
the loop counter as for the inb.  But even if the same register is
used, there's nothing to stop the following optimization:

	movl	$1000,%eax
1:	decl	%eax
	jne	1b
	inb	$0x43,%al

to:
	unit 1				unit 2
	------				------
	movl	$1000,%eax		inb	$0x43,%hiddenreg
1:	decl	%eax			/* stall */
	jne	1b			/* stall */
					movb	%hiddenreg,%al

except that a lot of code might break.

>I am chasing a bug in the printer driver (-STROBE pulse timing is out of spec)
>as we write that is probably caused by someone assuming that *all* the
>instructions would execute in the order they were coded.  Not anymore.

I/O instructions must be executed sequentially.  Someone assumed that
back to back i/o instructions take longer than 0.5 usec.  They do
take much longer than that for 8MHz ISA buses, and code to check whether
a further delay is necessary would make ISA systems even slower.
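
To put numbers on that, here is a rough sketch of the arithmetic.  The
helper names and the clocks-per-cycle figures are assumptions made for
illustration, not measurements; the only fixed points are that an 8MHz
bus clock lasts 125 nsec, so 0.5 usec is 4 bus clocks.

```c
#include <stdbool.h>

/*
 * Duration in nanoseconds of one I/O cycle that lasts `clocks` bus
 * clocks on a bus running at `bus_khz` kHz.  Both arguments are
 * caller-supplied assumptions about the hardware.
 */
static unsigned long
io_cycle_ns(unsigned long bus_khz, unsigned long clocks)
{
	return (clocks * 1000000UL / bus_khz);
}

/*
 * True if a single I/O cycle is shorter than the required minimum, so
 * that back-to-back I/O alone cannot be relied on for the delay.
 */
static bool
needs_extra_delay(unsigned long bus_khz, unsigned long clocks,
    unsigned long min_ns)
{
	return (io_cycle_ns(bus_khz, clocks) < min_ns);
}
```

With an assumed 6 bus clocks per cycle at 8MHz, io_cycle_ns(8000, 6)
gives 750 nsec, so no extra delay is needed for a 500 nsec minimum.  A
hypothetical 33MHz bus doing the cycle in 3 clocks gives about 90 nsec,
and there the assumption fails.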

>In *general* you are guaranteed that IN and OUT instructions will generate
>-IOR and -IOW cycles in the order they were coded, but any code that has no
>dependencies/effects on the IN/OUT opcodes can be executed out of order in
>relation to the IN/OUT opcodes.  So if the purpose of that code is to

I think ordering is guaranteed for i/o to uncached memory too.

>If the inbs and outbs were actually calls to inbs and outbs rather
>than being inline, the serialization problems tend to go away.  At least
>on the Pentium.  This may not be the case on the P6 and other processors
>with multiple execution units.

Perhaps both jmp and call synchronize everything?  Then the loop would
work.  I'm also depending on the gcc feature of not optimizing away
delay loops.
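
For what it's worth, a delay loop need not depend on that compiler
behaviour.  The following is a minimal sketch, not the code in clock.c;
spin_delay() is a hypothetical name.  Declaring the counter volatile
obliges the compiler to emit every load and store, so the loop survives
optimization.  It does nothing about the CPU reordering the loop against
neighbouring instructions, which is a separate question.

```c
/*
 * Busy-wait for `n` loop iterations and return the number of iterations
 * actually executed.  The counter is volatile, so the compiler may not
 * delete or collapse the loop.  This constrains only the compiler; it
 * does not serialize the loop against surrounding instructions.
 */
static unsigned long
spin_delay(unsigned long n)
{
	volatile unsigned long i;
	unsigned long iterations = 0;

	for (i = 0; i < n; i++)
		iterations++;
	return (iterations);
}
```

The time taken per iteration depends on the CPU, so a count such as
spin_delay(1000) would have to be calibrated against a real clock before
it could be trusted for a 1 usec delay.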

Bruce