Date:      Mon, 7 Dec 2009 16:42:29 -0500
From:      Alexander Sack <pisymbol@gmail.com>
To:        freebsd-current@freebsd.org
Subject:   aac(4) resource FIB starvation on BUS scan revisited
Message-ID:  <3c0b01820912071342u1c722b2clf9c8413e40097279@mail.gmail.com>


[-- Attachment #1 --]
Folks:

I posted a similar thread on freebsd-scsi, only to realize that scottl had
already fixed my first issue (a race during resource allocation) as part of
some MP CAM cleanup, in a later version of the driver than the one we are
using.  (I believe we did the same thing to resolve a lock issue on bootup.)

However, on my RELENG_8 box with two Adaptec 5085s connected to some JBODs
(9TB each), I still have a FIB starvation issue during the LUN scan.

The number of FIBs allocated to this card is 512 (older cards get 256).  The
max_target per bus is 287.  On a six-channel controller with the bus scan
done in parallel, I see a lot of this:

...
(probe501:aacp1:0:214:0): Request Requeued
(probe501:aacp1:0:214:0): Retrying Command
(probe520:aacp1:0:233:0): Request Requeued
(probe520:aacp1:0:233:0): Retrying Command
(probe528:aacp1:0:241:0): Request Requeued
(probe528:aacp1:0:241:0): Retrying Command
(probe540:aacp1:0:253:0): Request Requeued
(probe540:aacp1:0:253:0): Retrying Command
(probe541:aacp1:0:254:0): Request Requeued
(probe541:aacp1:0:254:0): Retrying Command
....

I think the driver is much happier with the attached patch (dmesg also
attached).  The CAM probeXXX process is now much faster, with ZERO retries.
Is there anything bad about adding PIM_SEQSCAN to hba_misc?  What's the
downside?  At a minimum, it ensures you don't run out of FIBs during a scan.

The patch also bumps the number of preallocated FIBs to the maximum, since I
think it's good to have that pool preallocated and it's not that much memory
on modern systems (this also helps if you have a controller that supports
512).  It's 2 per page (FIBs are 2k), so it's either 256 or 512 FIBs, i.e. a
maximum 1MB pool of FIBs.  Perhaps that is not really necessary but again,
why not?  (If I get shot down, so be it!)

Anybody?  Is this PR worthy?

-aps

[-- Attachment #2 --]
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-STABLE #2: Sun Dec  6 21:19:10 EST 2009
    root@watchmen.localdomain:/usr/home/asack/Development/freebsd/RELENG_8/src/sys/amd64/compile/GENERIC-DDB amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU           E5410  @ 2.33GHz (2327.52-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x1067a  Stepping = 10
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x40ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 17179869184 (16384 MB)
avail memory = 16526032896 (15760 MB)
ACPI APIC Table: <INTEL  S5000PAL>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 8 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
lapic0: Forcing LINT1 to edge trigger
kbd1 at kbdmux0
acpi0: <INTEL S5000PAL> on motherboard
acpi0: [ITHREAD]
ACPI Error: Package List length (6) larger than NumElements count (2), truncated
 20090521 dsobject-590
ACPI Error: Package List length (6) larger than NumElements count (2), truncated
 20090521 dsobject-590
ACPI Error: Package List length (6) larger than NumElements count (2), truncated
 20090521 dsobject-590
ACPI Error: Package List length (6) larger than NumElements count (2), truncated
 20090521 dsobject-590
ACPI Error: Package List length (6) larger than NumElements count (2), truncated
 20090521 dsobject-590
ACPI Error: Package List length (6) larger than NumElements count (2), truncated
 20090521 dsobject-590
ACPI Error: Package List length (6) larger than NumElements count (2), truncated
 20090521 dsobject-590
ACPI Error: Package List length (6) larger than NumElements count (2), truncated
 20090521 dsobject-590
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xca2,0xca3,0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 0.0 on pci3
pci4: <ACPI PCI bus> on pcib4
mfi0: <LSI MegaSAS 1064R> mem 0xb9000000-0xb900ffff,0xb8900000-0xb891ffff irq 18 at device 14.0 on pci4
mfi0: Megaraid SAS driver Ver 3.00 
mfi0: 1804 (313511129s/0x0020/info) - Shutdown command received from host
mfi0: 1805 (boot + 0s/0x0020/info) - Firmware initialization started (PCI ID 0411/1000/3501/8086)
mfi0: 1806 (boot + 0s/0x0020/info) - Firmware version 1.12.230-0598
mfi0: 1807 (boot + 0s/0x0020/info) - Firmware initialization started (PCI ID 0411/1000/3501/8086)
mfi0: 1808 (boot + 0s/0x0020/info) - Firmware version 1.12.230-0598
mfi0: 1809 (boot + 71s/0x0008/info) - Battery temperature is normal
mfi0: 1810 (boot + 71s/0x0008/info) - Battery Present
mfi0: 1811 (boot + 71s/0x0020/info) - Board Revision 
mfi0: 1812 (boot + 100s/0x0004/info) - Enclosure (SES) discovered on PD 0c(c None/p1)
mfi0: 1813 (boot + 100s/0x0002/info) - Inserted: Encl PD 0c
mfi0: 1814 (boot + 100s/0x0002/info) - Inserted: PD 0c(c None/p1) Info: enclPd=0c, scsiType=d, portMap=09, sasAddr=500150796b8c0000,0000000000000000
mfi0: 1815 (boot + 100s/0x0002/info) - Inserted: PD 0a(e0x0c/s0)
mfi0: 1816 (boot + 100s/0x0002/info) - Inserted: PD 0a(e0x0c/s0) Info: enclPd=0c, scsiType=0, portMap=00, sasAddr=71903a26a4948e89,0000000000000000
mfi0: 1817 (boot + 100s/0x0002/info) - Inserted: PD 0b(e0x0c/s1)
mfi0: 1818 (boot + 100s/0x0002/info) - Inserted: PD 0b(e0x0c/s1) Info: enclPd=0c, scsiType=0, portMap=01, sasAddr=71903a27a68d958a,0000000000000000
mfi0: [ITHREAD]
pcib5: <PCI-PCI bridge> at device 0.2 on pci3
pci5: <PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci2
pci6: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> irq 16 at device 2.0 on pci2
pci7: <ACPI PCI bus> on pcib7
em0: <Intel(R) PRO/1000 Network Connection 6.9.14> port 0x2020-0x203f mem 0xb8820000-0xb883ffff,0xb8400000-0xb87fffff irq 18 at device 0.0 on pci7
em0: Using MSI interrupt
em0: [FILTER]
em0: Ethernet address: 00:15:17:96:b8:c0
em1: <Intel(R) PRO/1000 Network Connection 6.9.14> port 0x2000-0x201f mem 0xb8800000-0xb881ffff,0xb8000000-0xb83fffff irq 19 at device 0.1 on pci7
em1: Using MSI interrupt
em1: [FILTER]
em1: Ethernet address: 00:15:17:96:b8:c1
pcib8: <ACPI PCI-PCI bridge> at device 0.3 on pci1
pci8: <ACPI PCI bus> on pcib8
pcib9: <PCI-PCI bridge> at device 3.0 on pci0
pci9: <PCI bus> on pcib9
pcib10: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci10: <ACPI PCI bus> on pcib10
aac0: <Adaptec RAID 5085> mem 0xb8e00000-0xb8ffffff irq 16 at device 0.0 on pci10
aac0: Enabling 64-bit address support
aac0: Enable Raw I/O
aac0: Enable 64-bit array
aac0: New comm. interface enabled
aac0: [ITHREAD]
aac0: Adaptec 5085, aac driver 2.0.0-1
aacp0: <SCSI Passthrough Bus> on aac0
aacp1: <SCSI Passthrough Bus> on aac0
aacp2: <SCSI Passthrough Bus> on aac0
pcib11: <ACPI PCI-PCI bridge> at device 5.0 on pci0
pci11: <ACPI PCI bus> on pcib11
aac1: <Adaptec RAID 5085> mem 0xb8c00000-0xb8dfffff irq 18 at device 0.0 on pci11
aac1: Enabling 64-bit address support
aac1: Enable Raw I/O
aac1: Enable 64-bit array
aac1: New comm. interface enabled
aac1: [ITHREAD]
aac1: Adaptec 5085, aac driver 2.0.0-1
aacp3: <SCSI Passthrough Bus> on aac1
aacp4: <SCSI Passthrough Bus> on aac1
aacp5: <SCSI Passthrough Bus> on aac1
pcib12: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci12: <ACPI PCI bus> on pcib12
pci12: <network> at device 0.0 (no driver attached)
pcib13: <ACPI PCI-PCI bridge> at device 7.0 on pci0
pci13: <ACPI PCI bus> on pcib13
pci13: <network> at device 0.0 (no driver attached)
pci0: <base peripheral> at device 8.0 (no driver attached)
pcib14: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci14: <ACPI PCI bus> on pcib14
vgapci0: <VGA-compatible display> port 0x1000-0x10ff mem 0xb0000000-0xb7ffffff,0xb9100000-0xb910ffff irq 17 at device 12.0 on pci14
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 63XXESB2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x3040-0x304f irq 20 at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
atapci1: <Intel 63XXESB2 SATA300 controller> port 0x3058-0x305f,0x3074-0x3077,0x3050-0x3057,0x3070-0x3073,0x3020-0x303f mem 0xb9400000-0xb94003ff irq 20 at device 31.2 on pci0
atapci1: [ITHREAD]
atapci1: AHCI called from vendor specific driver
atapci1: AHCI v1.10 controller with 6 3Gbps ports, PM supported
ata2: <ATA channel 0> on atapci1
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci1
ata3: [ITHREAD]
ata4: <ATA channel 2> on atapci1
ata4: [ITHREAD]
ata5: <ATA channel 3> on atapci1
ata5: [ITHREAD]
ata6: <ATA channel 4> on atapci1
ata6: [ITHREAD]
ata7: <ATA channel 5> on atapci1
ata7: [ITHREAD]
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
atrtc0: <AT realtime clock> port 0x70-0x71,0x74-0x77 irq 8 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
uart1: [FILTER]
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse, device ID 3
cpu0: <ACPI CPU> on acpi0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
est2: <Enhanced SpeedStep Frequency Control> on cpu2
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
est3: <Enhanced SpeedStep Frequency Control> on cpu3
p4tcc3: <CPU Frequency Thermal Control> on cpu3
cpu4: <ACPI CPU> on acpi0
est4: <Enhanced SpeedStep Frequency Control> on cpu4
p4tcc4: <CPU Frequency Thermal Control> on cpu4
cpu5: <ACPI CPU> on acpi0
est5: <Enhanced SpeedStep Frequency Control> on cpu5
p4tcc5: <CPU Frequency Thermal Control> on cpu5
cpu6: <ACPI CPU> on acpi0
est6: <Enhanced SpeedStep Frequency Control> on cpu6
p4tcc6: <CPU Frequency Thermal Control> on cpu6
cpu7: <ACPI CPU> on acpi0
est7: <Enhanced SpeedStep Frequency Control> on cpu7
p4tcc7: <CPU Frequency Thermal Control> on cpu7
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc8fff,0xc9000-0xcf7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
Timecounters tick every 1.000 msec
acd0: CDROM <CD-224E-R/1.CA> at ata0-slave UDMA33
mfi0: 1819 (313511305s/0x0020/info) - Time established as 12/07/09 14:28:25; (102 seconds since power on)
mfid0: <MFI Logical Disk> on mfi0
mfid0: 238418MB (488280064 sectors) RAID volume '' is optimal
aacd0: <RAID 5> on aac1
aacd0: 9533430MB (19524464640 sectors)
aacd1: <RAID 5> on aac1
aacd1: 9533430MB (19524464640 sectors)
ses0 at aacp5 bus 0 scbus5 target 0 lun 0
ses0: <Newisys SA2120 T033> Fixed Enclosure Services SCSI-5 device 
ses0: 3.300MB/s transfers
ses0: SCSI-3 SES Device
ses1 at aacp5 bus 0 scbus5 target 1 lun 0
ses1: <Newisys SA2120 T033> Fixed Enclosure Services SCSI-5 device 
ses1: 3.300MB/s transfers
ses1: SCSI-3 SES Device
lapic3: Forcing LINT1 to edge trigger
SMP: AP CPU #3 Launched!
lapic1: Forcing LINT1 to edge trigger
SMP: AP CPU #1 Launched!
lapic2: Forcing LINT1 to edge trigger
SMP: AP CPU #2 Launched!
lapic4: Forcing LINT1 to edge trigger
SMP: AP CPU #4 Launched!
lapic7: Forcing LINT1 to edge trigger
SMP: AP CPU #7 Launched!
lapic5: Forcing LINT1 to edge trigger
SMP: AP CPU #5 Launched!
lapic6: Forcing LINT1 to edge trigger
SMP: AP CPU #6 Launched!
Trying to mount root from ufs:/dev/mfid0s1a
em0: link state changed to UP

[-- Attachment #3 --]
Index: aac.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/aac/aac.c,v
retrieving revision 1.143.2.4
diff -u -r1.143.2.4 aac.c
--- aac.c	5 Nov 2009 18:34:01 -0000	1.143.2.4
+++ aac.c	7 Dec 2009 21:23:43 -0000
@@ -604,7 +604,7 @@
 	TAILQ_INIT(&sc->aac_fibmap_tqh);
 	sc->aac_commands = malloc(sc->aac_max_fibs * sizeof(struct aac_command),
 				  M_AACBUF, M_WAITOK|M_ZERO);
-	while (sc->total_fibs < AAC_PREALLOCATE_FIBS) {
+	while (sc->total_fibs < sc->aac_max_fibs) {
 		if (aac_alloc_commands(sc) != 0)
 			break;
 	}
Index: aac_cam.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/aac/aac_cam.c,v
retrieving revision 1.31.2.2
diff -u -r1.31.2.2 aac_cam.c
--- aac_cam.c	5 Nov 2009 18:34:01 -0000	1.31.2.2
+++ aac_cam.c	7 Dec 2009 21:23:43 -0000
@@ -261,7 +261,7 @@
 		cpi->target_sprt = 0;
 
 		/* Resetting via the passthrough causes problems. */
-		cpi->hba_misc = PIM_NOBUSRESET;
+		cpi->hba_misc = PIM_NOBUSRESET | PIM_SEQSCAN;
 		cpi->hba_eng_cnt = 0;
 		cpi->max_target = camsc->inf->TargetsPerBus;
 		cpi->max_lun = 8;	/* Per the controller spec */
Index: aacvar.h
===================================================================
RCS file: /home/ncvs/src/sys/dev/aac/aacvar.h,v
retrieving revision 1.52.2.2
diff -u -r1.52.2.2 aacvar.h
--- aacvar.h	2 Nov 2009 16:54:23 -0000	1.52.2.2
+++ aacvar.h	7 Dec 2009 21:23:44 -0000
@@ -57,13 +57,6 @@
 #define AAC_ADAPTER_FIBS	8
 
 /*
- * FIBs are allocated in page-size chunks and can grow up to the 512
- * limit imposed by the hardware.
- */
-#define AAC_PREALLOCATE_FIBS	128
-#define AAC_NUM_MGT_FIB		8
-
-/*
  * The controller reports status events in AIFs.  We hang on to a number of
  * these in order to pass them out to user-space management tools.
  */
