Date:      Thu, 27 Feb 2003 03:44:11 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Dag-Erling Smorgrav <des@ofug.org>
Cc:        current@FreeBSD.ORG, <sos@FreeBSD.ORG>
Subject:   Re: ata dumps broken again
Message-ID:  <20030227031649.T15538-100000@gamplex.bde.org>
In-Reply-To: <xzp7kbnz3d9.fsf@flood.ping.uio.no>

On Wed, 26 Feb 2003, Dag-Erling Smorgrav wrote:

> Top-of-tree -CURRENT:
>
> db> call doadump
> Dumping 639 MB
> ata1: resetting devices ..
> mi_switch(c4fad9ec,f,f,1c,5f74e) at mi_switch+0x21b
> ithread_schedule(c48fb380,1,c4faea50,e99cf84c,c025850c) at ithread_schedule+0xf6
> sched_ithd(f) at sched_ithd+0x38
> Xintr15() at Xintr15+0x6c
> --- interrupt, eip = 0xc017388b, esp = 0xe99cf830, ebp = 0xe99cf84c ---
> critical_exit(0,c489f900,c489f92c,e99cf884,c0128324) at critical_exit+0x2b
> DELAY(a,256c,82,40267d87,0) at DELAY+0x47
> ata_wait(c489f92c,40,0,0,0) at ata_wait+0x84
> ata_command(c489f92c,c6,0,0,10) at ata_command+0x2c5
> ad_reinit(c489f92c,c489f92c,ec) at ad_reinit+0x30
> ata_reinit(c489f900,c489f900,1,e99cf960,e99cf9a8) at ata_reinit+0x265
> addump(c48f3764,c02f67c0,0,18003c00,0,200) at addump+0xe8
> dumpsys(c02cee20,c02cee40,b,e99cf9f8,c016eec0) at dumpsys+0x28b
> doadump(0,0,0,0,0,0,0,0,0,0) at doadump+0x20
> db_fncall(0,0,e99cfaa8,e99cfa60,0) at db_fncall+0x7c
> db_command(c02a3380,c02a31a0,c029de74,c029de78,c028024d) at db_command+0xfb
> db_command_loop(0,0,e99cfc28,c02c1ec8,e99cfb4c) at db_command_loop+0x5c
> db_trap(c,0,1,10,e99cfbe0) at db_trap+0x5e
> kdb_trap(c,0,e99cfbe0) at kdb_trap+0xe6
> trap_fatal(e99cfbe0,c4,c4faea50,12ab9a0,0) at trap_fatal+0x1cc
> trap_pfault(e99cfbe0,0,c4) at trap_pfault+0x154
> trap(18,10,10,c7886300,c4caf500) at trap+0x38b
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc01e94fb, esp = 0xe99cfc20, ebp = 0xe99cfc60 ---
> in6_pcbbind(c4bc1390,c7886300,c4faea50) at in6_pcbbind+0x1fb
> tcp6_usr_bind(c4caf500,c7886300,c4faea50) at tcp6_usr_bind+0x9f
> sobind(c4caf500,c7886300,c4faea50,c4caf500,e99cfd14) at sobind+0x16
> kern_bind(c4faea50,3,c7886300,c7886300,0) at kern_bind+0x70
> bind(c4faea50) at bind+0x30
> syscall(2f,2f,2f,804a3e0,0) at syscall+0x310
> Xint0x80_syscall() at Xint0x80_syscall+0x1d
> --- syscall (104), eip = 0x280b1a63, esp = 0xbfbffa2c, ebp = 0xbfbffa88 ---
> Context switches not allowed in the debugger.
>
> (kgdb) l *(ad_reinit+0x30)
> 0xc0133770 is in ad_reinit (../../../dev/ata/ata-disk.c:874).
> 869
> 870         /* reinit disk parameters */
> 871         ad_invalidatequeue(atadev->driver, NULL);
> 872         ata_command(atadev, ATA_C_SET_MULTI, 0,
> 873                     adp->transfersize / DEV_BSIZE, 0, ATA_WAIT_READY);
> 874         atadev->setmode(atadev, adp->device->mode);
> 875     }
> 876
> 877     void
> 878     ad_print(struct ad_softc *adp)
> (kgdb) l *(ata_command+0x2c5)
> 0xc01287a5 is in ata_command (../../../dev/ata/ata-all.c:1126).
> 1121            break;
> 1122
> 1123        case ATA_WAIT_READY:
> 1124            atadev->channel->active |= ATA_WAIT_READY;
> 1125            ATA_OUTB(atadev->channel->r_io, ATA_CMD, command);
> 1126            if (ata_wait(atadev, ATA_S_READY) < 0) {
> 1127                ata_prtdev(atadev, "timeout waiting for cmd=%02x s=%02x e=%02x\n",
> 1128                           command, atadev->channel->status,atadev->channel->error);
> 1129                error = -1;
> 1130            }

This seems to be caused by a known bug in ddb itself.  Try the following fix.

%%%
Index: db_interface.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/db_interface.c,v
retrieving revision 1.70
diff -u -2 -r1.70 db_interface.c
--- db_interface.c	22 Feb 2003 23:41:27 -0000	1.70
+++ db_interface.c	23 Feb 2003 09:51:52 -0000
@@ -78,4 +78,5 @@
 kdb_trap(int type, int code, struct i386_saved_state *regs)
 {
+	u_int ef;
 	volatile int ddb_mode = !(boothowto & RB_GDB);

@@ -97,4 +98,8 @@
 	}

+	/* XXX is this correctly placed?  SMP stop/start doesn't seem to be. */
+	ef = read_eflags();
+	disable_intr();
+
 	switch (type) {
 	    case T_BPTFLT:	/* breakpoint */
@@ -217,4 +222,7 @@
 	regs->tf_cs     = ddb_regs.tf_cs & 0xffff;
 	regs->tf_ds     = ddb_regs.tf_ds & 0xffff;
+
+	write_eflags(ef);
+
 	return (1);
 }
%%%

The ata driver apparently wants to wait (without sleeping), but an
interrupt occurs and the scheduler wants to switch.  The patch stops
interrupts from occurring within ddb when ddb is entered for most fatal
traps (entering ddb via a ddb trap doesn't have this bug).  The only
obvious bug in the driver is the syntax error in the resetting message.
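
For illustration only, the save/disable/restore pattern in the patch can
be modelled in user space roughly as follows.  This is a sketch under
stated assumptions, not kernel code: model_eflags, pending_switches,
raise_interrupt() and model_kdb_trap() are hypothetical names, and
read_eflags(), disable_intr() and write_eflags() here are stand-ins that
touch a plain variable instead of the CPU's EFLAGS register.

```c
#include <assert.h>

/* Interrupt-enable bit in the i386 EFLAGS register. */
#define PSL_I 0x00000200u

/* Hypothetical stand-in for the CPU's EFLAGS; interrupts start enabled. */
static unsigned int model_eflags = PSL_I;

/* Hypothetical counter: context switches requested by interrupt delivery. */
static int pending_switches;

/* User-space stand-ins for the kernel primitives used in the patch. */
static unsigned int read_eflags(void) { return (model_eflags); }
static void write_eflags(unsigned int ef) { model_eflags = ef; }
static void disable_intr(void) { model_eflags &= ~PSL_I; }

/*
 * Model a device interrupt.  It is delivered (and asks the scheduler to
 * switch) only while PSL_I is set.  Real hardware would keep a masked
 * interrupt pending and deliver it later; this model simply drops it.
 */
static void
raise_interrupt(void)
{
	if (model_eflags & PSL_I)
		pending_switches++;
}

/* Model of the patched kdb_trap(): mask interrupts for the whole visit. */
static int
model_kdb_trap(void)
{
	unsigned int ef;

	ef = read_eflags();	/* remember the caller's interrupt state */
	disable_intr();

	raise_interrupt();	/* e.g. the ata interrupt during a dump */
	assert((read_eflags() & PSL_I) == 0);

	write_eflags(ef);	/* restore exactly what the caller had */
	return (1);
}
```

Restoring the saved value instead of unconditionally re-enabling
interrupts means a caller that entered with interrupts already disabled
gets them back disabled.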

I don't understand why the scheduler wants to switch.  kern_bind()
holds Giant, the interrupt is for ata, and the ata interrupt handler
is not INTR_MPSAFE, so it shouldn't be switched to.  Maybe the interrupt
is shared and is attached to an INTR_MPSAFE handler.  Any active
interrupt attached to an INTR_MPSAFE handler would cause this problem,
but the trace doesn't show any others.

Bruce

