Date: Thu, 14 Dec 2000 03:27:19 GMT From: Tor Egge <tegge@cvsup.no.freebsd.org> To: FreeBSD-gnats-submit@freebsd.org Subject: kern/23538: ata device driver fails to abort queued commands when device disappears Message-ID: <200012140327.eBE3RJ501463@c2h5oh.idi.ntnu.no> Resent-Message-ID: <200012140330.eBE3U1R02094@freefall.freebsd.org>
next in thread | raw e-mail | index | archive | help
>Number: 23538
>Category: kern
>Synopsis: ata device driver fails to abort queued commands when device disappears
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Wed Dec 13 19:30:01 PST 2000
>Closed-Date:
>Last-Modified:
>Originator: Tor Egge
>Release: FreeBSD 4.2-RELEASE i386
>Organization:
Fast Search & Transfer ASA
>Environment:
FreeBSD c2h5oh.idi.ntnu.no 4.2-RELEASE FreeBSD 4.2-RELEASE #0: Fri Nov 24 15:04:56 GMT 2000 root@c2h5oh.idi.ntnu.no:/usr/src/sys/compile/VINUM i386
atapci0: <Intel PIIX4 ATA33 controller> port 0xffa0-0xffaf at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ad0: 29314MB <IBM-DTLA-307030> [59560/16/63] at ata1-master UDMA33
ad2: 29314MB <IBM-DTLA-307030> [59560/16/63] at ata1-master UDMA33
>Description:
When a drive completely hangs, it apparently partially disappears from the
ata configuration while resetting the devices.
ad0: READ command timeout tag=0 serv=0 - resetting
ata0: resetting devices .. done
ad0: READ command timeout tag=0 serv=0 - resetting
ata0: resetting devices .. done
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
ata0-master: timeout waiting for command=ef s=00 e=00
ad0: trying fallback to PIO mode
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
vinum2.p0.s0: fatal read I/O error
vinum: vinum2.p0.s0 is crashed by force
vinum: vinum2.p0 is faulty
(kgdb) print *(struct ata_softc *) (((device_t) ata_devclass->devices[0])->softc)
$47 = {dev = 0xc3286780, channel = 0, r_io = 0xc3282700, r_altio = 0xc3282680,
r_bmio = 0xc3286804, r_irq = 0xc3282600, ih = 0xc0e6e920, ioaddr = 496,
altioaddr = 1014, bmaddr = 65440, chiptype = 1896972422, alignment = 1,
dev_param = {0xc3284200, 0x0}, dev_softc = {0xc331b400, 0x0}, mode = {0, 0},
flags = 16, devices = 0, status = 0 '\000', error = 0 '\000', active = 0,
ata_queue = {tqh_first = 0x0, tqh_last = 0xc32866d8}, atapi_queue = {
tqh_first = 0x0, tqh_last = 0xc32866e0}, running = 0x0}
(kgdb) print *((struct ad_softc *) ((struct ata_softc *) (((device_t) ata_devclass->devices[0])->softc))->dev_softc[0])
$48 = {controller = 0xc3286680, unit = 0, lun = 0, total_secs = 60036480,
heads = 16 '\020', sectors = 63 '?', transfersize = 8192, num_tags = 0,
flags = 2, tags = {0x0 <repeats 32 times>}, outstanding = -41300580,
queue = {queue = {tqh_first = 0xc4d5ae20, tqh_last = 0xc4d59638},
last_pblkno = 21131160, insert_point = 0x0, switch_point = 0xcb30a2d8},
stats = {dev_links = {stqe_next = 0xc331b0b8}, device_number = 1,
device_name = "ad", '\000' <repeats 13 times>, unit_number = 0,
bytes_read = 176262452224, bytes_written = 14137288704, bytes_freed = 0,
num_reads = 40563316, num_writes = 737265, num_frees = 0, num_other = 0,
busy_count = 7, block_size = 512, tag_types = {0, 0, 0},
dev_creation_time = {tv_sec = 0, tv_usec = 30618}, busy_time = {
tv_sec = 204575, tv_usec = 389409}, start_time = {tv_sec = 1625444,
tv_usec = 220943}, last_comp_time = {tv_sec = 1626835,
tv_usec = 368335}, flags = DEVSTAT_NO_ORDERED_TAGS,
device_type = DEVSTAT_TYPE_IF_IDE, priority = DEVSTAT_PRIORITY_DISK},
disk = {d_flags = 0, d_dsflags = 0, d_devsw = 0xc02d8720,
d_dev = 0xc3318380, d_slice = 0xc3349800, d_label = {d_magic = 0,
d_type = 0, d_subtype = 0, d_typename = '\000' <repeats 15 times>,
d_un = {un_d_packname = '\000' <repeats 15 times>, un_b = {
un_d_boot0 = 0x0, un_d_boot1 = 0x0}}, d_secsize = 512,
d_nsectors = 63, d_ntracks = 16, d_ncylinders = 59560,
d_secpercyl = 1008, d_secperunit = 60036480, d_sparespertrack = 0,
d_sparespercyl = 0, d_acylinders = 0, d_rpm = 0, d_interleave = 0,
d_trackskew = 0, d_cylskew = 0, d_headswitch = 0, d_trkseek = 0,
d_flags = 0, d_drivedata = {0, 0, 0, 0, 0}, d_spare = {0, 0, 0, 0, 0},
d_magic2 = 0, d_checksum = 0, d_npartitions = 0, d_bbsize = 0,
d_sbsize = 0, d_partitions = {{p_size = 0, p_offset = 0, p_fsize = 0,
p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0,
sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0,
p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0,
sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0,
p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0,
sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0,
p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0,
sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0,
p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0,
sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0,
p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0,
sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0,
p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0,
sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0,
p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0,
sgs = 0}}}}}, dev1 = 0xc3318480, dev2 = 0xc3318380}
Note that devices is now 0 on ata0, thus queued commands for ad0 are
never removed from the queue. This is bad, since access to other
vinum drives on the same physical disk will now never fail, just block
infinitely.
bash-2.04$ ps axl -N kernel.4 -M vmcore.4
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
0 238 1 1 3 0 928 0 ttyin Is+ #C5 0:00.01 (getty)
0 245 1 1 3 0 928 0 ttyin Is+ #C2 0:00.01 (getty)
0 244 1 0 3 0 928 0 ttyin Is+ #C2 0:00.01 (getty)
0 243 1 0 3 0 928 0 ttyin Is+ #C2 0:00.01 (getty)
0 242 1 1 3 0 928 0 ttyin Is+ #C2 0:00.02 (getty)
0 241 1 1 3 0 928 0 ttyin Is+ #C2 0:00.02 (getty)
0 240 1 0 3 0 928 0 ttyin Is+ #C2 0:00.02 (getty)
0 239 1 0 3 0 928 0 ttyin Is+ #C2 0:00.02 (getty)
0 0 0 0 -18 0 0 0 sched DLs ?? 0:02.45 (swapper)
0 1 0 0 10 0 528 0 wait ILs ?? 0:00.43 (init)
0 2 0 0 -18 0 0 0 psleep DL ?? 13:22.04 (pagedaemon)
0 3 0 0 18 0 0 0 psleep DL ?? 0:00.00 (vmdaemon)
0 4 0 0 -18 0 0 0 psleep DL ?? 0:04.34 (bufdaemon)
0 5 0 0 -2 0 0 0 getblk DL ?? 105:11.54 (syncer)
0 63 1 0 -6 0 596 0 biowr DLs ?? 0:00.01 (vinum)
0 166 1 0 -2 0 924 0 ffsfsn Ds ?? 0:26.47 (syslogd)
0 173 1 0 -6 -12 1260 0 biord D<s ?? 3:38.62 (ntpd)
0 194 1 0 -6 0 972 0 biord Ds ?? 0:19.50 (cron)
0 197 1 0 -18 0 2096 0 spread DLs ?? 0:19.77 (sshd)
1001 232 1 0 -6 0 1636 0 biord Ds ?? 0:35.23 (cvsupd)
1001 53151 232 28 -6 0 2740 0 biord D ?? 2:12.81 (cvsupd)
1001 53155 232 1 -6 0 3352 0 biord D ?? 0:34.44 (cvsupd)
1001 53159 232 2 -14 0 3084 0 inode D ?? 0:05.40 (cvsupd)
>How-To-Repeat:
Use 2 IDE disks (one slightly bad), use disk partitioning to get
more than one vinum logical drive on each physical disk.
>Fix:
Change ata_reinit to check for scp->devices being changed during the
ata_reset call and flush the request queues for the 'gone' devices by
setting b_error to ENXIO, setting the B_ERROR bit in b_flags and calling
biodone.
Change adstrategy to check for the device being 'gone' and return
ENXIO at once if so.
>Release-Note:
>Audit-Trail:
>Unformatted:
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200012140327.eBE3RJ501463>
