Date:      Wed, 20 Oct 2010 11:27:38 -0400
From:      Sean Thomas Caron <scaron@umich.edu>
To:        freebsd-stable@freebsd.org
Cc:        scaron@umich.edu
Subject:   Spurious reboot in 8.1-RELEASE when reading from ZFS pool with > 9 disks
Message-ID:  <20101020112738.12467cvfvvh4zb0g@web.mail.umich.edu>

Hi folks,

I've been playing with ZFS in 8.1-RELEASE (amd64) on a Sun Fire X4500  
with 16 GB RAM and in general it seems to work well when used as  
recommended.

BUT...

In spite of the suggestion of Sun and FreeBSD developers to the
contrary, I have been trying to create raidz pools with more than 9
member disks, and that seems to cause trouble.
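
For reference, the pool creation itself is nothing out of the
ordinary, e.g. for a 10-disk raidz2 (the device names here are just
placeholders for whatever the controller actually presents):

zpool create mybigpool raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9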

If I try to create a pool containing more than 9 [500 GB] disks, it
doesn't matter whether it is raidz1 or raidz2: the system reboots
under any amount of sustained reading from the pool (I haven't tested
mirrored pools). Now, I am not sure of:

- Whether the reboot in the raidz1 case is caused by exactly the same
issue as the reboot in the raidz2 case

- Whether this is an issue of total number of member disks, or total  
amount of disk space in the pool. All I have to work with at the  
moment is 500 GB drives.

I am not doing any sysctl tuning in normal operation; I just run with
the defaults, or whatever the system sizes automatically. I did try
playing with a few tunables, including setting vfs.zfs.arc_max very
small, and it didn't seem to help at all; pools of more than 9 disks
are always unstable.
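
The ARC experiment was along the lines of the following in
/boot/loader.conf (the exact value is arbitrary, just something far
below what the system would size on its own), followed by a reboot:

vfs.zfs.arc_max="512M"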

Writes seem to work OK; I can, say, pull stuff from over the network  
and save it to the pool, or I can do something like,

dd if=/dev/random of=/mybigpool/bigfile bs=1m count=10240

and it will write data all day pretty happily. But if I try to read  
back from the pool, for example,

dd if=/mybigpool/bigfile of=/dev/null bs=1024

or even to just do something like,

cp /mybigpool/bigfile /mybigpool/bigfile_2

the system reboots pretty much immediately. I never see anything on  
the console at all; it just reboots.

Even if I build a new kernel with debugging options:

options KDB
options DDB

the system still just reboots; I never see anything on the console and  
I never get to the debugger.
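
I suppose the next thing to try is capturing a crash dump instead,
i.e. something like this in /etc/rc.conf:

dumpdev="AUTO"

and then seeing whether savecore leaves anything useful in /var/crash
after the reboot, assuming the machine gets far enough into the panic
to write a dump at all.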

So, as I say, the problem is very easy to reproduce: create a raidz
pool of either type with more than 9 member disks, dump some data to
it, then try to read it back, and the machine will reboot.

If I create a pool with 9 or fewer disks, the system seems perfectly
stable. I was never able to reproduce the reboot behavior as long as
the pools contained 9 or fewer drives, even beating on them fairly
hard with iozone and multiple concurrent dd operations streaming
large files to and from memory.
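
For what it's worth, that testing looked roughly like the following,
with /testpool standing in for whichever pool was under test and the
sizes being nothing special:

iozone -a -s 4g -r 128k -f /testpool/iozone.tmp

dd if=/dev/zero of=/testpool/out1 bs=1m count=8192 &
dd if=/testpool/out0 of=/dev/null bs=1m &

where out0 is a file written in an earlier pass.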

Just wondering whether anyone has seen this problem before, and
whether it is a known bug that may already have been fixed in STABLE
or CURRENT. Should I report it as a bug? Or should I just stick to
pools of 9 or fewer drives? I'm not sure my customer will want to run
STABLE or CURRENT in production, but I wanted to run this by the list
to see.

Best,

-Sean


