Date: Thu, 11 Nov 2010 13:12:55 -0500
From: Michael Boers <michaelscotttech@gmail.com>
To: freebsd-questions@freebsd.org
Subject: zfs mirrors and high availability
Message-ID: <0C3B7D09-CF38-40D9-A483-F5860DE16652@gmail.com>
I am running a 100% zfs based FreeBSD 8.0 system with 4 disks: two zfs mirrored boot drives and two zfs mirrored data drives.

This morning the server went down with the following errors in the log file:

Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): CAM Status: SCSI Status Error
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SCSI Status: Check Condition
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): ABORTED COMMAND asc: 0,0
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): No additional sense information
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 timed out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c5110:2839 timed out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)
Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req 0xffffff80003c87a0:2838 function 0
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bef30:2840 timed out for ccb 0xffffff0007986800 (req->ccb 0xffffff0007986800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c8560:2841 timed out for ccb 0xffffff032d985000 (req->ccb 0xffffff032d985000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bf320:2842 timed out for ccb 0xffffff0103af2000 (req->ccb 0xffffff0103af2000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cbda0:2843 timed out for ccb 0xffffff0103b0b000 (req->ccb 0xffffff0103b0b000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bfd40:2844 timed out for ccb 0xffffff00102bf800 (req->ccb 0xffffff00102bf800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cad50:2845 timed out for ccb 0xffffff01e6f33000 (req->ccb 0xffffff01e6f33000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003caf00:2846 timed out for ccb 0xffffff01e6f24800 (req->ccb 0xffffff01e6f24800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003ccd60:2847 timed out for ccb 0xffffff01308a4000 (req->ccb 0xffffff01308a4000)

Why didn't zfs stop talking to the disk that was clearly having issues? Are there sysctl or other variables that I can set that will allow zfs to mark a disk as failed more aggressively?

Is there a way that I could have prevented the crash? The system was "up" and pingable, but not accessible via ssh. My guess is that all disk-related requests were queueing/stuck.

A few more notes on my setup:

Hardware: Dell PowerEdge 2970, 1 CPU, 16 GB RAM

  pool: Storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        Storage     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da3     ONLINE       0     0     0

errors: No known data errors

  pool: zboot
 state: ONLINE
 scrub: scrub in progress for 0h22m, 72.03% done, 0h8m to go
config:

        NAME           STATE     READ WRITE CKSUM
        zboot          ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0

--
Thanks!