Date: Fri, 1 Sep 2006 18:42:38 +0200
From: "Patrick M. Hausen" <hausen@punkt.de>
To: freebsd-stable@freebsd.org
Subject: LSI/amr driver controller cache problem?
Message-ID: <20060901164238.GA66726@hugo10.ka.punkt.de>

Hi, all!

We just put a brand new Intel SSR212CC box into production. This is
basically a standard server with 2 LSI SATA RAID controllers and
12 drive bays in 2 rack units of height. Intel sells it as a storage
product. There's a variant of Windows 2003 Server that turns this
box into an iSCSI target.

We want to use it for disk-based backup with Amanda. The system
runs 6-STABLE at the moment.

amr0: <LSILogic MegaRAID 1.53> mem 0xfbef0000-0xfbefffff, 0xfcd00000-0xfcdfffff irq 72 at device 14.0 on pci6
amr0: <LSILogic Intel(R) RAID Controller SRCS28X> Firmware 814C, BIOS H431, 128MB RAM
amr1: <LSILogic MegaRAID 1.53> mem 0xfbff0000-0xfbffffff, 0xfcf00000-0xfcffffff irq 96 at device 14.0 on pci8
amr1: <LSILogic Intel(R) RAID Controller SRCS28X> Firmware 814C, BIOS H431, 128MB RAM
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 1907348MB (3906248704 sectors) RAID 5 (optimal)

Since the two RAID controllers come with a battery backup for their
cache memory, I configured the logical drive with write-back cache
policy and turned the individual disk drives' write caches off.

After cvsup and build/installworld, I noticed strange Sendmail
failures (signal 11) on the box. Reinstalling Sendmail fixed
the problem. Just to make sure, I did installworld again and
rebooted - Sendmail signal 11.

Then it dawned on me that Sendmail is the last binary installed
and written to the logical drive in the installworld process.

I can reproduce the problem any time: installworld, reboot, Sendmail
broken. Installworld or just reinstall Sendmail, don't reboot,
everything's fine. It doesn't matter whether I use "reboot" or
"shutdown -r".

Is it possible that the amr driver does not issue the necessary
flush command to the controller (probably the first part of the
problem) and, additionally, that the controller loses its cache
content at the following system reset despite its BBU (second part
of the problem - iir controllers by ICP Vortex handle a system
reset just fine, syncing the drives during boot)?

Any ideas?

I don't have a different explanation. A coworker suggested a
possible, as yet unknown UFS2 problem with large filesystems, but
/usr is not large on this box. /var is.

The last couple of writes before a system reboot are lost. Reliably.

I will set the controller's cache policy back to "write through",
but I'm still not sleeping well ...

Thanks,
Patrick

P.S. As a side note: no problems at all with the em(4) driver so far
on this one.
-- 
punkt.de GmbH         Internet - Dienstleistungen - Beratung
Vorholzstr. 25        Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe       http://punkt.de
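
The "write, reboot, check" cycle described in this message can be
narrowed down to a single file instead of a full installworld. The
following is only a minimal sketch under stated assumptions, not
something from the original report: the function names write_probe and
verify_probe, the file paths, and the JSON manifest layout are all
hypothetical, and Python is used purely for illustration. The manifest
should ideally live on storage that is not behind the suspect
controller, so that it survives even if the probe file does not.

```python
import hashlib
import json
import os
import sys


def write_probe(probe_path, manifest_path, size=1 << 20):
    """Write random data, fsync it, and record its SHA-256 in a manifest.

    Hypothetical helper: run this immediately before the reboot.
    """
    data = os.urandom(size)
    with open(probe_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    digest = hashlib.sha256(data).hexdigest()
    with open(manifest_path, "w") as f:
        json.dump({"path": probe_path, "size": size, "sha256": digest}, f)
        f.flush()
        os.fsync(f.fileno())
    return digest


def verify_probe(manifest_path):
    """Return True if the probe file still matches the recorded digest.

    Hypothetical helper: run this after the reboot.
    """
    with open(manifest_path) as f:
        expected = json.load(f)
    try:
        with open(expected["path"], "rb") as f:
            data = f.read()
    except OSError:
        return False  # probe file is missing or unreadable
    return (len(data) == expected["size"]
            and hashlib.sha256(data).hexdigest() == expected["sha256"])


if __name__ == "__main__":
    # Usage (assumed): script write PROBE MANIFEST   or   script verify MANIFEST
    if len(sys.argv) == 4 and sys.argv[1] == "write":
        print(write_probe(sys.argv[2], sys.argv[3]))
    elif len(sys.argv) == 3 and sys.argv[1] == "verify":
        ok = verify_probe(sys.argv[2])
        print("probe intact" if ok else "probe lost or corrupted")
        sys.exit(0 if ok else 1)
    else:
        print("usage: write PROBE MANIFEST | verify MANIFEST", file=sys.stderr)
```

A failed verification after a clean "shutdown -r" would point in the
same direction as the Sendmail symptom: data the OS considered written
never reached the disks. A passing run proves little by itself, so the
check is best repeated several times and with both cache policies.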