Date: Thu, 14 Dec 2000 03:27:19 GMT
From: Tor Egge <tegge@cvsup.no.freebsd.org>
To: FreeBSD-gnats-submit@freebsd.org
Subject: kern/23538: ata device driver fails to abort queued commands when device disappears
Message-ID: <200012140327.eBE3RJ501463@c2h5oh.idi.ntnu.no>
Resent-Message-ID: <200012140330.eBE3U1R02094@freefall.freebsd.org>
>Number: 23538
>Category: kern
>Synopsis: ata device driver fails to abort queued commands when device disappears
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Wed Dec 13 19:30:01 PST 2000
>Closed-Date:
>Last-Modified:
>Originator: Tor Egge
>Release: FreeBSD 4.2-RELEASE i386
>Organization:
Fast Search & Transfer ASA
>Environment:

FreeBSD c2h5oh.idi.ntnu.no 4.2-RELEASE FreeBSD 4.2-RELEASE #0: Fri Nov 24 15:04:56 GMT 2000 root@c2h5oh.idi.ntnu.no:/usr/src/sys/compile/VINUM i386

atapci0: <Intel PIIX4 ATA33 controller> port 0xffa0-0xffaf at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ad0: 29314MB <IBM-DTLA-307030> [59560/16/63] at ata1-master UDMA33
ad2: 29314MB <IBM-DTLA-307030> [59560/16/63] at ata1-master UDMA33

>Description:

When a drive hangs completely, it apparently partially disappears from the ata configuration while the devices are being reset.

ad0: READ command timeout tag=0 serv=0 - resetting
ata0: resetting devices .. done
ad0: READ command timeout tag=0 serv=0 - resetting
ata0: resetting devices .. done
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
ata0-master: timeout waiting for command=ef s=00 e=00
ad0: trying fallback to PIO mode
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
ad0: READ command timeout tag=0 serv=0 - resetting
vinum2.p0.s0: fatal read I/O error
vinum: vinum2.p0.s0 is crashed by force
vinum: vinum2.p0 is faulty

(kgdb) print *(struct ata_softc *) (((device_t) ata_devclass->devices[0])->softc)
$47 = {dev = 0xc3286780, channel = 0, r_io = 0xc3282700, r_altio = 0xc3282680, r_bmio = 0xc3286804, r_irq = 0xc3282600, ih = 0xc0e6e920, ioaddr = 496, altioaddr = 1014, bmaddr = 65440, chiptype = 1896972422, alignment = 1, dev_param = {0xc3284200, 0x0}, dev_softc = {0xc331b400, 0x0}, mode = {0, 0}, flags = 16, devices = 0, status = 0 '\000', error = 0 '\000', active = 0, ata_queue = {tqh_first = 0x0, tqh_last = 0xc32866d8}, atapi_queue = { tqh_first = 0x0, tqh_last = 0xc32866e0}, running = 0x0}
(kgdb) print *((struct ad_softc *) ((struct ata_softc *) (((device_t) ata_devclass->devices[0])->softc))->dev_softc[0])
$48 = {controller = 0xc3286680, unit = 0, lun = 0, total_secs = 60036480, heads = 16 '\020', sectors = 63 '?', transfersize = 8192, num_tags = 0, flags = 2, tags = {0x0 <repeats 32 times>}, outstanding = -41300580, queue = {queue = {tqh_first = 0xc4d5ae20, tqh_last = 0xc4d59638}, last_pblkno = 21131160, insert_point = 0x0, switch_point = 0xcb30a2d8}, stats = {dev_links = {stqe_next = 0xc331b0b8}, device_number = 1, device_name = "ad", '\000' <repeats 13 times>, unit_number = 0, bytes_read = 176262452224, bytes_written = 14137288704, bytes_freed = 0, num_reads = 40563316, num_writes = 737265, num_frees = 0, num_other = 0, busy_count = 7, block_size = 512, tag_types = {0, 0, 0}, dev_creation_time = {tv_sec = 0, tv_usec = 30618}, busy_time = { tv_sec = 204575, tv_usec = 389409}, start_time
= {tv_sec = 1625444, tv_usec = 220943}, last_comp_time = {tv_sec = 1626835, tv_usec = 368335}, flags = DEVSTAT_NO_ORDERED_TAGS, device_type = DEVSTAT_TYPE_IF_IDE, priority = DEVSTAT_PRIORITY_DISK}, disk = {d_flags = 0, d_dsflags = 0, d_devsw = 0xc02d8720, d_dev = 0xc3318380, d_slice = 0xc3349800, d_label = {d_magic = 0, d_type = 0, d_subtype = 0, d_typename = '\000' <repeats 15 times>, d_un = {un_d_packname = '\000' <repeats 15 times>, un_b = { un_d_boot0 = 0x0, un_d_boot1 = 0x0}}, d_secsize = 512, d_nsectors = 63, d_ntracks = 16, d_ncylinders = 59560, d_secpercyl = 1008, d_secperunit = 60036480, d_sparespertrack = 0, d_sparespercyl = 0, d_acylinders = 0, d_rpm = 0, d_interleave = 0, d_trackskew = 0, d_cylskew = 0, d_headswitch = 0, d_trkseek = 0, d_flags = 0, d_drivedata = {0, 0, 0, 0, 0}, d_spare = {0, 0, 0, 0, 0}, d_magic2 = 0, d_checksum = 0, d_npartitions = 0, d_bbsize = 0, d_sbsize = 0, d_partitions = {{p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, p_offset = 0, p_fsize = 0, p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}}}}, dev1 = 0xc3318480, dev2 = 0xc3318380}

Note that devices is now 0 on ata0, so queued commands for ad0 are never removed from the
queue. This is bad, since accesses to other vinum drives on the same physical disk will now never fail; they just block indefinitely.

bash-2.04$ ps axl -N kernel.4 -M vmcore.4
  UID   PID  PPID CPU PRI  NI  VSZ RSS WCHAN  STAT TT       TIME COMMAND
    0   238     1   1   3   0  928   0 ttyin  Is+  #C5   0:00.01 (getty)
    0   245     1   1   3   0  928   0 ttyin  Is+  #C2   0:00.01 (getty)
    0   244     1   0   3   0  928   0 ttyin  Is+  #C2   0:00.01 (getty)
    0   243     1   0   3   0  928   0 ttyin  Is+  #C2   0:00.01 (getty)
    0   242     1   1   3   0  928   0 ttyin  Is+  #C2   0:00.02 (getty)
    0   241     1   1   3   0  928   0 ttyin  Is+  #C2   0:00.02 (getty)
    0   240     1   0   3   0  928   0 ttyin  Is+  #C2   0:00.02 (getty)
    0   239     1   0   3   0  928   0 ttyin  Is+  #C2   0:00.02 (getty)
    0     0     0   0 -18   0    0   0 sched  DLs  ??    0:02.45 (swapper)
    0     1     0   0  10   0  528   0 wait   ILs  ??    0:00.43 (init)
    0     2     0   0 -18   0    0   0 psleep DL   ??   13:22.04 (pagedaemon)
    0     3     0   0  18   0    0   0 psleep DL   ??    0:00.00 (vmdaemon)
    0     4     0   0 -18   0    0   0 psleep DL   ??    0:04.34 (bufdaemon)
    0     5     0   0  -2   0    0   0 getblk DL   ??  105:11.54 (syncer)
    0    63     1   0  -6   0  596   0 biowr  DLs  ??    0:00.01 (vinum)
    0   166     1   0  -2   0  924   0 ffsfsn Ds   ??    0:26.47 (syslogd)
    0   173     1   0  -6 -12 1260   0 biord  D<s  ??    3:38.62 (ntpd)
    0   194     1   0  -6   0  972   0 biord  Ds   ??    0:19.50 (cron)
    0   197     1   0 -18   0 2096   0 spread DLs  ??    0:19.77 (sshd)
 1001   232     1   0  -6   0 1636   0 biord  Ds   ??    0:35.23 (cvsupd)
 1001 53151   232  28  -6   0 2740   0 biord  D    ??    2:12.81 (cvsupd)
 1001 53155   232   1  -6   0 3352   0 biord  D    ??    0:34.44 (cvsupd)
 1001 53159   232   2 -14   0 3084   0 inode  D    ??    0:05.40 (cvsupd)

>How-To-Repeat:

Use 2 IDE disks (one slightly bad), and use disk partitioning to get more than one vinum logical drive on each physical disk.

>Fix:

Change ata_reinit to check whether scp->devices changed during the ata_reset call, and flush the request queues of the 'gone' devices: for each queued buffer, set b_error to ENXIO, set the B_ERROR bit in b_flags, and call biodone.

Change adstrategy to check whether the device is 'gone' and, if so, complete the request with ENXIO at once.
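The two changes proposed under Fix can be modelled outside the kernel. What follows is a minimal user-space sketch under stated assumptions, not a patch against the driver: every mock_* name and MOCK_B_ERROR is hypothetical (standing in for struct buf, struct ata_softc, ata_reinit, adstrategy, biodone and B_ERROR), the layout of the devices mask (bit n for unit n) is assumed, and the interrupt blocking the real driver would need (splbio) is left out. It shows both halves of the idea: diff the device mask across the reset and drain the queues of vanished units with ENXIO, and refuse new requests for a vanished unit instead of queueing them forever.

```c
/*
 * User-space model of the proposed fix (not kernel code). mock_buf,
 * mock_channel, mock_reinit, mock_strategy and MOCK_B_ERROR are
 * hypothetical stand-ins for struct buf, struct ata_softc, ata_reinit,
 * adstrategy and B_ERROR. Locking (splbio) is deliberately omitted.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_B_ERROR 0x1        /* stand-in for B_ERROR in b_flags */
#define MOCK_NDEV    2          /* assumed: master = unit 0, slave = unit 1 */

struct mock_buf {
    int b_flags;
    int b_error;
    int done;                   /* set once the request has been completed */
    struct mock_buf *next;
};

struct mock_queue {
    struct mock_buf *head, *tail;
};

struct mock_channel {
    int devices;                /* assumed layout: bit n set => unit n present */
    struct mock_queue queue[MOCK_NDEV];   /* per-device request queues */
};

/* Stand-in for biodone(): each request must be completed exactly once. */
static void mock_biodone(struct mock_buf *bp)
{
    assert(!bp->done);
    bp->done = 1;
}

/* Complete a request with an error: set b_error and B_ERROR, then biodone. */
static void mock_fail(struct mock_buf *bp, int error)
{
    bp->b_error = error;
    bp->b_flags |= MOCK_B_ERROR;
    mock_biodone(bp);
}

static void mock_enqueue(struct mock_queue *q, struct mock_buf *bp)
{
    bp->next = NULL;
    if (q->tail != NULL)
        q->tail->next = bp;
    else
        q->head = bp;
    q->tail = bp;
}

/* Drain a queue, failing every pending request with ENXIO. */
static void mock_flush(struct mock_queue *q)
{
    struct mock_buf *bp;

    while ((bp = q->head) != NULL) {
        q->head = bp->next;
        mock_fail(bp, ENXIO);
    }
    q->tail = NULL;
}

/*
 * Models the ata_reinit change. devices_after_reset is what the reset
 * left in the devices mask; any unit whose bit disappeared is "gone".
 */
void mock_reinit(struct mock_channel *ch, int devices_after_reset)
{
    int gone = ch->devices & ~devices_after_reset;

    ch->devices = devices_after_reset;
    for (int unit = 0; unit < MOCK_NDEV; unit++)
        if (gone & (1 << unit))
            mock_flush(&ch->queue[unit]);
}

/* Models the adstrategy change: fail at once instead of queueing forever. */
void mock_strategy(struct mock_channel *ch, int unit, struct mock_buf *bp)
{
    if (!(ch->devices & (1 << unit))) {
        mock_fail(bp, ENXIO);
        return;
    }
    mock_enqueue(&ch->queue[unit], bp);
}
```

For example, starting with both units present (devices = 0x3), queue two requests for unit 0, then call mock_reinit(&ch, 0x2) to simulate the master vanishing during the reset: both requests complete with ENXIO and B_ERROR instead of waiting forever, requests for unit 1 stay queued, and any later mock_strategy call for unit 0 fails immediately.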
>Release-Note:
>Audit-Trail:
>Unformatted: