Date: Mon, 24 Jun 2002 15:11:30 -0700 (PDT) From: Don Lewis <dl-freebsd@catspoiler.org> To: scsi@FreeBSD.ORG Subject: SCSI timeouts under extreme load Message-ID: <200206242211.g5OMBMM1084295@gw.catspoiler.org>
next in thread | raw e-mail | index | archive | help
I'm seeing SCSI timeouts when my disks are under extreme load. When this happens, iostat reports the activity in the range of 100 to 200 transfers per second or even more, which I suspect is mostly hitting the disk cache. I suspect that the problem is that the disk gets so busy that it defers some commands indefinitely, and it looks like CAM attempts to recover by resetting the device. Typically something gets wedged waiting for a page to get swapped in and the system becomes totally unresponsive until CAM breaks out the big hammer. I'm currently running 4.6-STABLE, but I've been seeing this problem for the last several months. I think things might work more smoothly if the I/O service times were tracked and I/O requests were throttled back if the service times started getting unreasonably long or if any I/O requests were outstanding an excessively long time. From dmesg.boot: ahc0: <Adaptec aic7890/91 Ultra2 SCSI adapter> port 0xd000-0xd0ff mem 0xe0000000 -0xe0000fff irq 5 at device 6.0 on pci0 aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs sa0 at ahc0 bus 0 target 4 lun 0 sa0: <TANDBERG SLR5 4/8GB =09:> Removable Sequential Access SCSI-2 device sa0: 3.300MB/s transfers da1 at ahc0 bus 0 target 1 lun 0 da1: <SEAGATE ST336737LW 0105> Fixed Direct Access SCSI-3 device da1: 80.000MB/s transfers (40.000MHz, offset 31, 16bit), Tagged Queueing Enabled da1: 35242MB (72176566 512 byte sectors: 255H 63S/T 4492C) da0 at ahc0 bus 0 target 0 lun 0 da0: <SEAGATE ST39173LW 4292> Fixed Direct Access SCSI-2 device da0: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled da0: 8683MB (17783240 512 byte sectors: 255H 63S/T 1106C) Jun 24 02:15:20 mousie /kernel: swap_pager: indefinite wait buffer: device: #da/ 0x20001, blkno: 67392, size: 4096 Jun 24 02:15:52 mousie /kernel: swap_pager: indefinite wait buffer: device: #da/ 0x20001, blkno: 67392, size: 4096 Jun 24 02:15:52 mousie /kernel: (da0:ahc0:0:0:0): SCB 0x4 - timed out Jun 24 02:15:52 mousie /kernel: ahc0: Dumping Card State while idle, at SEQADDR 0x9 Jun 24 02:15:52 mousie /kernel: ACCUM = 0x4, SINDEX = 0x64, DINDEX = 0x65, ARG_2 = 0x3 Jun 24 02:15:53 mousie /kernel: HCNT = 0x0 SCBPTR = 0x10 Jun 24 02:15:53 mousie /kernel: SCSISEQ = 0x12, SBLKCTL = 0xa Jun 24 02:15:53 mousie /kernel: DFCNTRL = 0x0, DFSTATUS = 0x89 Jun 24 02:15:53 mousie /kernel: LASTPHASE = 0x1, SCSISIGI = 0x0, SXFRCTL0 = 0x80 Jun 24 02:15:53 mousie /kernel: SSTAT0 = 0x0, SSTAT1 = 0xa Jun 24 02:15:53 mousie /kernel: STACK == 0x3, 0x17b, 0x165, 0x0 Jun 24 02:15:53 mousie /kernel: SCB count = 130 Jun 24 02:15:53 mousie /kernel: Kernel NEXTQSCB = 84 Jun 24 02:15:53 mousie /kernel: Card NEXTQSCB = 84 Jun 24 02:15:53 mousie /kernel: QINFIFO entries: Jun 24 02:15:53 mousie /kernel: Waiting Queue entries: Jun 24 02:15:53 mousie /kernel: Disconnected Queue entries: 16:14 23:2 2:3 15:9 24:47 20:59 0:109 31:49 25:16 22:4 Jun 24 02:15:53 mousie /kernel: QOUTFIFO entries: Jun 24 02:15:53 mousie /kernel: Sequencer Free SCB List: 17 21 18 27 8 11 30 28 1 14 3 7 9 5 6 13 10 29 12 19 4 26 Jun 24 02:15:53 mousie /kernel: Sequencer SCB Info: 0(c 0x64, s 0x7, l 0, t 0x6d ) 1(c 0x60, s 0x7, l 0, t 0xff) 2(c 0x66, s 0x7, l 0, t 0x3) 3(c 0x60, s 0x7, l 0, t 0xff) 4(c 0x60, s 0x7, l 0, t 0xff) 5(c 0x60, s 0x7, l 0, t 0xff) 6(c 0x60, s 0x7, l 0, t 0xff) 7(c 0x60, s 0x7, l 0, t 0xff) 8(c 0x60, s 0x7, l 0, t 0xff) 9(c 0x60, s 0x7, l 0, t 0xff) 10(c 0x60, s 0x7, l 0, t 0xff) 11(c 0x60, s 0x7, l 0, t 0xff) 12(c 0x60, s 0x7, l 0, t 0xff) 13(c 0x60, s 0x7, l 0, t 0xff) 14(c 0x60, s 0x7, l 0, t 0xff) 15(c 0x64, s 0x7, l 0, t 0x9) 16(c 0x64, s 0x7, l 0, t 0xe) 17(c 0x60, s 0x7, l 0, t 0xff) 18(c 0x60, s 0x7, l 0, t 0xff) 19(c 0x60, s 0x7, l 0, t 0xff) 20(c 0x64, s 0x7, l 0, t 0x3b) 21(c 0x60, s 0x7, l 0, t 0xff) 22(c 0x64, s 0x7, l 0, t 0x4) 23(c 0x64, s 0x7, l 0, t 0x2) 24(c 0x64, s 0x7, l 0, t 0x2f) 25(c 0x66, s 0x7, l 0, t 0x10) 26(c 0x60, s 0x7, l 0, t 0xff) 27(c 0 x60, s 0x7, l 0, t 0xff) 28(c 0x60, s 0x7, l 0, t 0xff) 29(c 0x60, s 0x7, l 0, t 0xff) 30(c 0x60, s 0x7, l 0, t 0xff) 31(c 0x64, s 0x7, l 0, t 0x31) Jun 24 02:15:54 mousie /kernel: Pending list: 14(c 0x60, s 0x7, l 0), 2(c 0x60, s 0x7, l 0), 3(c 0x62, s 0x7, l 0), 9(c 0x60, s 0x7, l 0), 47(c 0x60, s 0x7, l 0 ), 59(c 0x60, s 0x7, l 0), 109(c 0x60, s 0x7, l 0), 49(c 0x60, s 0x7, l 0), 16(c 0x62, s 0x7, l 0), 4(c 0x60, s 0x7, l 0) Jun 24 02:15:54 mousie /kernel: Kernel Free SCB list: 66 93 0 57 43 83 99 17 96 29 10 56 35 42 65 15 117 19 31 67 115 33 51 101 21 37 39 103 40 13 50 52 94 7 18 91 11 102 58 81 119 24 20 98 116 44 6 85 110 114 26 118 22 113 30 80 129 36 104 97 92 5 38 105 12 54 107 28 46 112 108 1 100 82 48 95 27 90 32 45 55 8 25 68 41 111 34 69 106 53 23 86 87 88 89 70 71 72 73 74 75 76 77 78 79 60 61 62 63 64 12 8 127 126 125 124 123 122 121 120 Jun 24 02:15:54 mousie /kernel: sg[0] - Addr 0x70ec000 : Length 4096 Jun 24 02:15:54 mousie /kernel: sg[1] - Addr 0x56ed000 : Length 4096 Jun 24 02:15:54 mousie /kernel: (da0:ahc0:0:0:0): Queuing a BDR SCB Jun 24 02:15:54 mousie /kernel: (da0:ahc0:0:0:0): Bus Device Reset Message Sent Jun 24 02:15:54 mousie /kernel: (da0:ahc0:0:0:0): no longer in timeout, status = 34b Jun 24 02:15:54 mousie /kernel: ahc0: Bus Device Reset on A:0. 10 SCBs aborted To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-scsi" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200206242211.g5OMBMM1084295>
