Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 24 Jun 2002 15:11:30 -0700 (PDT)
From:      Don Lewis <dl-freebsd@catspoiler.org>
To:        scsi@FreeBSD.ORG
Subject:   SCSI timeouts under extreme load
Message-ID:  <200206242211.g5OMBMM1084295@gw.catspoiler.org>

next in thread | raw e-mail | index | archive | help

I'm seeing SCSI timeouts when my disks are under extreme load.  When
this happens, iostat reports the activity in the range of 100 to 200
transfers per second or even more, which I suspect is mostly hitting the
disk cache.  I suspect that the problem is that the disk gets so busy
that it defers some commands indefinitely, and it looks like CAM
attempts to recover by resetting the device.  Typically something gets
wedged waiting for a page to get swapped in and the system becomes
totally unresponsive until CAM breaks out the big hammer.  I'm currently
running 4.6-STABLE, but I've been seeing this problem for the last
several months.

I think things might work more smoothly if the I/O service times were
tracked and I/O requests were throttled back if the service times
started getting unreasonably long or if any I/O requests were
outstanding an excessively long time.

From dmesg.boot:

ahc0: <Adaptec aic7890/91 Ultra2 SCSI adapter> port 0xd000-0xd0ff mem 0xe0000000
-0xe0000fff irq 5 at device 6.0 on pci0
aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs
sa0 at ahc0 bus 0 target 4 lun 0
sa0: <TANDBERG SLR5 4/8GB =09:> Removable Sequential Access SCSI-2 device
sa0: 3.300MB/s transfers
da1 at ahc0 bus 0 target 1 lun 0
da1: <SEAGATE ST336737LW 0105> Fixed Direct Access SCSI-3 device
da1: 80.000MB/s transfers (40.000MHz, offset 31, 16bit), Tagged Queueing Enabled
da1: 35242MB (72176566 512 byte sectors: 255H 63S/T 4492C)
da0 at ahc0 bus 0 target 0 lun 0
da0: <SEAGATE ST39173LW 4292> Fixed Direct Access SCSI-2 device
da0: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 8683MB (17783240 512 byte sectors: 255H 63S/T 1106C)



Jun 24 02:15:20 mousie /kernel: swap_pager: indefinite wait buffer: device: #da/
0x20001, blkno: 67392, size: 4096
Jun 24 02:15:52 mousie /kernel: swap_pager: indefinite wait buffer: device: #da/
0x20001, blkno: 67392, size: 4096
Jun 24 02:15:52 mousie /kernel: (da0:ahc0:0:0:0): SCB 0x4 - timed out
Jun 24 02:15:52 mousie /kernel: ahc0: Dumping Card State while idle, at SEQADDR
0x9
Jun 24 02:15:52 mousie /kernel: ACCUM = 0x4, SINDEX = 0x64, DINDEX = 0x65, ARG_2
 = 0x3
Jun 24 02:15:53 mousie /kernel: HCNT = 0x0 SCBPTR = 0x10
Jun 24 02:15:53 mousie /kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Jun 24 02:15:53 mousie /kernel: DFCNTRL = 0x0, DFSTATUS = 0x89
Jun 24 02:15:53 mousie /kernel: LASTPHASE = 0x1, SCSISIGI = 0x0, SXFRCTL0 = 0x80
Jun 24 02:15:53 mousie /kernel: SSTAT0 = 0x0, SSTAT1 = 0xa
Jun 24 02:15:53 mousie /kernel: STACK == 0x3, 0x17b, 0x165, 0x0   
Jun 24 02:15:53 mousie /kernel: SCB count = 130
Jun 24 02:15:53 mousie /kernel: Kernel NEXTQSCB = 84
Jun 24 02:15:53 mousie /kernel: Card NEXTQSCB = 84
Jun 24 02:15:53 mousie /kernel: QINFIFO entries:
Jun 24 02:15:53 mousie /kernel: Waiting Queue entries:
Jun 24 02:15:53 mousie /kernel: Disconnected Queue entries: 16:14 23:2 2:3 15:9
24:47 20:59 0:109 31:49 25:16 22:4
Jun 24 02:15:53 mousie /kernel: QOUTFIFO entries:
Jun 24 02:15:53 mousie /kernel: Sequencer Free SCB List: 17 21 18 27 8 11 30 28
1 14 3 7 9 5 6 13 10 29 12 19 4 26
Jun 24 02:15:53 mousie /kernel: Sequencer SCB Info: 0(c 0x64, s 0x7, l 0, t 0x6d
) 1(c 0x60, s 0x7, l 0, t 0xff) 2(c 0x66, s 0x7, l 0, t 0x3) 3(c 0x60, s 0x7, l 
0, t 0xff) 4(c 0x60, s 0x7, l 0, t 0xff) 5(c 0x60, s 0x7, l 0, t 0xff) 6(c 0x60,
 s 0x7, l 0, t 0xff) 7(c 0x60, s 0x7, l 0, t 0xff) 8(c 0x60, s 0x7, l 0, t 0xff)
 9(c 0x60, s 0x7, l 0, t 0xff) 10(c 0x60, s 0x7, l 0, t 0xff) 11(c 0x60, s 0x7,
l 0, t 0xff) 12(c 0x60, s 0x7, l 0, t 0xff) 13(c 0x60, s 0x7, l 0, t 0xff) 14(c 
0x60, s 0x7, l 0, t 0xff) 15(c 0x64, s 0x7, l 0, t 0x9) 16(c 0x64, s 0x7, l 0, t
 0xe) 17(c 0x60, s 0x7, l 0, t 0xff) 18(c 0x60, s 0x7, l 0, t 0xff) 19(c 0x60, s
 0x7, l 0, t 0xff) 20(c 0x64, s 0x7, l 0, t 0x3b) 21(c 0x60, s 0x7, l 0, t 0xff)
 22(c 0x64, s 0x7, l 0, t 0x4) 23(c 0x64, s 0x7, l 0, t 0x2) 24(c 0x64, s 0x7, l
 0, t 0x2f) 25(c 0x66, s 0x7, l 0, t 0x10) 26(c 0x60, s 0x7, l 0, t 0xff) 27(c 0
x60, s 0x7, l 0, t 0xff) 28(c 0x60, s 0x7, l 0, t 0xff) 29(c 0x60, s 0x7, l 0, t
 0xff) 30(c 0x60, s 0x7, l 0, t 0xff) 31(c 0x64, s 0x7, l 0, t 0x31)
Jun 24 02:15:54 mousie /kernel: Pending list: 14(c 0x60, s 0x7, l 0), 2(c 0x60, 
s 0x7, l 0), 3(c 0x62, s 0x7, l 0), 9(c 0x60, s 0x7, l 0), 47(c 0x60, s 0x7, l 0
), 59(c 0x60, s 0x7, l 0), 109(c 0x60, s 0x7, l 0), 49(c 0x60, s 0x7, l 0), 16(c
 0x62, s 0x7, l 0), 4(c 0x60, s 0x7, l 0)
Jun 24 02:15:54 mousie /kernel: Kernel Free SCB list: 66 93 0 57 43 83 99 17 96 
29 10 56 35 42 65 15 117 19 31 67 115 33 51 101 21 37 39 103 40 13 50 52 94 7 18
 91 11 102 58 81 119 24 20 98 116 44 6 85 110 114 26 118 22 113 30 80 129 36 104
 97 92 5 38 105 12 54 107 28 46 112 108 1 100 82 48 95 27 90 32 45 55 8 25 68 41
 111 34 69 106 53 23 86 87 88 89 70 71 72 73 74 75 76 77 78 79 60 61 62 63 64 12
8 127 126 125 124 123 122 121 120
Jun 24 02:15:54 mousie /kernel: sg[0] - Addr 0x70ec000 : Length 4096
Jun 24 02:15:54 mousie /kernel: sg[1] - Addr 0x56ed000 : Length 4096
Jun 24 02:15:54 mousie /kernel: (da0:ahc0:0:0:0): Queuing a BDR SCB
Jun 24 02:15:54 mousie /kernel: (da0:ahc0:0:0:0): Bus Device Reset Message Sent
Jun 24 02:15:54 mousie /kernel: (da0:ahc0:0:0:0): no longer in timeout, status =
 34b
Jun 24 02:15:54 mousie /kernel: ahc0: Bus Device Reset on A:0. 10 SCBs aborted 



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200206242211.g5OMBMM1084295>