Date: Sat, 2 Sep 2006 04:23:21 -0500
From: "Alex Salazar" <umbilical.blisters@gmail.com>
To: freebsd-stable@freebsd.org
Cc: freebsd-current@freebsd.org
Subject: Several issues on Dell 1950/2950 servers (6-STABLE and 7-CURRENT)
Message-ID: <40c4bb930609020223h50c43537n1c8b32081ef5c1bf@mail.gmail.com>

Apologies for the long message, and thanks in advance for any response.

I've just bought one of the new-generation Dell servers, specifically the PowerEdge 1950. It is a dual Intel Dual Core Xeon 5050 system (3.0 GHz, 667MHz FSB) with 1GB of 533MHz RAM. The server has an LSI Logic SAS 5/i integrated adapter and dual embedded Broadcom NetXtreme II 5708 Gigabit Ethernet NICs.

When I tried to install from a FreeBSD 6.0-RELEASE i386 CD I had at hand, no hard disk was detected. After finding out that the SAS controller was not supported in that release, I grabbed the most recent 6.1-STABLE i386 snapshot (200608) and tried again. This time the hard disk was detected properly. The installation succeeded and, after the post-install configuration, I restarted the system.

The OS booted up and the SAS controller was now detected and supported by the mpt(4) driver:

---
mpt0: <LSILogic SAS Adapter> port 0xec00-0xecff mem 0xfc4fc000-0xfc4fffff, 0xfc4e0000-0xfc4effff irq 64 at device 8.0 on pci2
mpt0: Reserved 0x100 bytes for rid 0x10 type 4 at 0xec00
mpt0: Reserved 0x4000 bytes for rid 0x14 type 3 at 0xfc4fc000
mpt0: [GIANT-LOCKED]
mpt0: MPI Version=1.5.12.0
---

And the related errors showed up immediately, for the first time:

---
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: MPI_EVENT_SAS_DEVICE_STATUS_CHANGE
mpt0: mpt_cam_event: MPI_EVENT_SAS_DEVICE_STATUS_CHANGE
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
---

When the boot process reached the SCSI probe, there was no activity on the screen for about five minutes, so I was forced to use the power button. After rebooting, the same symptoms appeared, so I rebooted the machine once again, this time in verbose mode.
This debug information was being printed on the screen, one character at a time, at about 1 char/sec:

---
(probe8:mpt0:0:8:0): error 22
(probe8:mpt0:0:8:0): Unretryable Error
(probe8:mpt0:0:8:0): error 22
(probe8:mpt0:0:8:0): Unretryable Error
(probe0:mpt0:0:0:1): error 22
(probe0:mpt0:0:0:1): Unretryable Error
(probe1:mpt0:0:8:1): Unexpected Bus Free
(probe1:mpt0:0:8:1): Retrying Command
...
(probe0:mpt0:0:8:7): Unexpected Bus Free
(probe0:mpt0:0:8:7): Retrying Command
(probe0:mpt0:0:8:7): Unexpected Bus Free
(probe0:mpt0:0:8:7): Retrying Command
(probe0:mpt0:0:8:7): Unexpected Bus Free
(probe0:mpt0:0:8:7): Retrying Command
(probe0:mpt0:0:8:7): Unexpected Bus Free
(probe0:mpt0:0:8:7): Retrying Command
(probe0:mpt0:0:8:7): Unexpected Bus Free
(probe0:mpt0:0:8:7): error 5
(probe0:mpt0:0:8:7): Retries Exausted
---

After 18 (eighteen) minutes, the error messages ceased, and the boot process continued as usual:

---
pass0 at mpt0 bus 0 target 0 lun 0
pass0: <MAXTOR ATLAS15K2_073SAS BP00> Fixed Direct Access SCSI-5 device
pass0: Serial Number K40C1Q5K
pass0: 300.000MB/s transfers, Tagged Queueing Enabled
pass1 at mpt0 bus 0 target 8 lun 0
pass1: <DP BACKPLANE 1.00> Fixed Enclosure Services SCSI-5 device
pass1: 300.000MB/s transfers, Tagged Queueing Enabled
ses0 at mpt0 bus 0 target 8 lun 0
ses0: <DP BACKPLANE 1.00> Fixed Enclosure Services SCSI-5 device
ses0: 300.000MB/s transfers, Tagged Queueing Enabled
ses0: SCSI-3 SES Device
GEOM: new dida0 at mpt0 bus 0 target 0 lun 0
da0: <MAXTOR ATLAS15K2_073SAS BP00> Fixed Direct Access SCSI-5 device
da0: Serial Number K40C1Q5K
da0: 300.000MB/s transfers, Tagged Queueing Enabled
da0: 70007MB (143374650 512 byte sectors: 255H 63S/T 8924C)
---

As a workaround, I disabled the APIC (hint.apic.0.disabled), and that ~15 minute delay at boot was gone. Fine.
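
For reference, here is a minimal sketch of how that hint could be made persistent across reboots. It assumes the usual /boot/loader.conf location and the customary ="1" value for the tunable, neither of which is spelled out above; add_apic_hint is just a hypothetical helper name used for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append the APIC-disable hint to the given file,
# but only if that exact line is not already present.
add_apic_hint() {
    conf="$1"
    hint='hint.apic.0.disabled="1"'
    # grep -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF "$hint" "$conf" 2>/dev/null || printf '%s\n' "$hint" >> "$conf"
}
```

It would be run as root with something like `add_apic_hint /boot/loader.conf`, followed by a reboot. After the reboot, `kenv hint.apic.0.disabled` should show whether the loader picked the value up.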

(BTW, 7-CURRENT has the same problem, but without that huge delay.)

Once I was logged in to the server, I proceeded to populate my ports tree using portsnap(8). When I extracted the tarball (portsnap extract), the following error message appeared repeatedly, about once per second:

---
mpt0: Unhandled Event Notify Frame. Event 0xe (ACK not required).
---

Once in a while, an error message like the one below showed up:

---
(da0:mpt0:0:0:0): WRITE(10). CDB: 2a 0 1 55 6f 5f 0 0 20 0
(da0:mpt0:0:0:0): CAM Status: SCSI Status Error
(da0:mpt0:0:0:0): SCSI Status: Check Condition
(da0:mpt0:0:0:0): UNIT ATTENTION asc:29,2
(da0:mpt0:0:0:0): Scsi bus reset occurred
---

After running some diagnostics included on the utility CDs shipped with this server, I concluded this was a software issue:

---
Device Name : SAS Disk 0:0
Description : SAS MAXTOR ATLAS15K2_073SAS
Test Name : Disk Self Test
Passes : 1
Result : passed
Start Time : Mon Aug 21 02:04:06 2006
Completion Time : Mon Aug 21 02:26:12 2006
Result Event : The test operation completed successfully
---

(In order to perform those diagnostics, I had to install SUSE Linux Enterprise Server 9, which was also shipped with this machine.)

After reinstalling FreeBSD, I logged into the server remotely via ssh, fetched the ports snapshot again, and extracted it once more. Suddenly, the screen activity ceased and the network connection timed out. Locally, on the server, there were a lot of mpt(4) errors and warnings:

---
(da0:mpt0:0:0:0): CAM Status 0x18
(da0:mpt0:0:0:0): Retrying Command
(... and about 500 more lines like those ...)
---

Then came some bce(4) errors, which caused the network interface to be shut down:

---
bce0: ../../../dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting
bce0: link state changed to DOWN
bce0: Gigabit link up
bce0: link state changed to UP
bce0: ../../../dev/bce/if_bce.c(5032): Watchdog timeout occurred, resetting
---

And finally, these errors from mpt(4):

---
request 0xc4c4a080:44717 timed out for ccb 0xc4e41400 (req->ccb 0xc4e41400)
request 0xc4c4b430:44718 timed out for ccb 0xc4ca5800 (req->ccb 0xc4ca5800)
request 0xc4c4cd80:44719 timed out for ccb 0xc4c52800 (req->ccb 0xc4c52800)
(... and about 300 more lines like those ...)
---

which were followed by the same number of lines like these:

---
mpt0: completing timedout/aborted req 0xc4c4a080:44717
mpt0: completing timedout/aborted req 0xc4c4b430:44718
mpt0: completing timedout/aborted req 0xc4c4cd80:44719
---

and finishing with this line:

---
mpt0: Timedout requests already complete. Interrupts may not be functioning.
---

After an hour and a half, the system was still unstable and I was forced to reboot it.

Those are the main issues (mpt(4) and bce(4)) with this hardware configuration under FreeBSD (6-STABLE and 7-CURRENT). However, two other problems showed up as well.

1. The first network interface (labeled "Gb 1" on the server's case) was detected as bce1, and the second one (Gb 2) as bce0.

---
bce0: <Broadcom NetXtreme II BCM5708 1000Base-T (B1), v0.9.6> mem 0xf4000000-0xf5ffffff irq 16 at device 0.0 on pci8
bce0: Reserved 0x2000000 bytes for rid 0x10 type 3 at 0xf4000000
bce0: ASIC ID 0x57081010; Revision (B1); PCI-X 64-bit 133MHz
miibus0: <MII bus> on bce0
brgphy0: <BCM5708C 10/100/1000baseTX PHY> on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bce0: bpf attached
bce0: Ethernet address: 00:13:72:f9:xx:xx
bce0: [MPSAFE]
...
bce1: <Broadcom NetXtreme II BCM5708 1000Base-T (B1), v0.9.6> mem 0xf8000000-0xf9ffffff irq 16 at device 0.0 on pci4
bce1: Reserved 0x2000000 bytes for rid 0x10 type 3 at 0xf8000000
bce1: ASIC ID 0x57081010; Revision (B1); PCI-X 64-bit 133MHz
miibus1: <MII bus> on bce1
brgphy1: <BCM5708C 10/100/1000baseTX PHY> on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bce1: bpf attached
bce1: Ethernet address: 00:13:72:f9:xx:xx
bce1: [MPSAFE]
---

According to this log, bce0 is on pci8, while bce1 is on pci4.

2. Sometimes, the server refuses to be halted or rebooted via shutdown(8) or reboot(8).

Any hint will be appreciated :)

/var/run/dmesg.boot (verbose log)
http://bsdero.tripod.com/dmesg.boot.txt

/var/log/messages (with remarks)
http://bsdero.tripod.com/messages.txt

pciconf -lv output
http://bsdero.tripod.com/pciconf.txt

--
Alex Salazar