From: "Clayton Milos" <clay@milos.co.za>
To: freebsd-stable@freebsd.org
Date: Mon, 23 Jul 2012 14:08:12 +0100
Subject: ZFS causing panic

Hi guys

I've had an issue for some time now: when I copy a lot of files over to
ZFS, usually via SMB, the machine panics and locks up. I'm running
FreeBSD 9.0-RELEASE with a custom kernel; I've just pulled unnecessary
drivers out of the config and added:

cpu HAMMER
device pf
device pflog
options DEVICE_POLLING
options HZ=1000

For full disclosure, I am getting errors in the syslog which indicate an
ECC error occurring somewhere, and I am still trying to locate it. I have
replaced both CPUs and all of the RAM and am still getting the errors, so
perhaps the northbridge has bought the farm. I don't think this is the
real issue, though, because I was getting these panics on other hardware
before.

Current hardware is an 80GB OS drive, 2x Opteron 285s and 16GB (8x2GB) of
RAM on a Tyan 2892 motherboard. The RAID card is an Areca 1120.

I am running 2 pools, both of them 4-drive hardware RAID5 arrays. The one
I'm having issues with is 4x3TB drives, seen as a single ~9TB SCSI drive:

da0 at arcmsr0 bus 0 scbus6 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 166.666MB/s transfers (83.333MHz, offset 32, 16bit)
da0: Command Queueing enabled
da0: 8583068MB (17578123776 512 byte sectors: 255H 63S/T 1094187C)

This drive is encrypted with GELI to produce /dev/da0.eli, on which the
pool is created. It looks like the pool has been lost since the last
panic:

  pool: homer
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-72
  scan: scrub repaired 0 in 7h0m with 0 errors on Mon Jul 23 05:25:27 2012
config:

        NAME       STATE     READ WRITE CKSUM
        homer      FAULTED      0     0     2
          da0.eli  ONLINE       0     0     8

I was also running a script to check the kernel memory every 2 seconds
(roughly the snippet below).
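For reference, the script is essentially the kmem-usage snippet from the
FreeBSD wiki's ZFS tuning page, wrapped in a loop. I'm reproducing it
from memory here, so treat it as approximate:

#!/bin/sh
# Kernel text size (hex Size column of kldstat, summed via dc) plus
# kernel malloc usage (MemUse column of vmstat -m, in KB), every 2s.
while :; do
    TEXT=$(kldstat | awk 'BEGIN {print "16i 0"} NR>1 {print toupper($4) "+"} END {print "p"}' | dc)
    DATA=$(vmstat -m | sed -Ee '1s/.*/0/;s/.* ([0-9]+)K.*/\1+/;$s/$/1024*p/' | dc)
    TOTAL=$((TEXT + DATA))
    echo "TOTAL=$TOTAL, $(echo "scale=3; $TOTAL / 1048576" | bc) MB"
    sleep 2
done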
It appears that it stayed well within the 1G I have assigned it in
/boot/loader.conf:

TOTAL=695217852, 663.011 MB
TOTAL=695217852, 663.011 MB
TOTAL=695217852, 663.011 MB
TOTAL=695219900, 663.013 MB
TOTAL=695219900, 663.013 MB
TOTAL=695345852, 663.133 MB
TOTAL=695412412, 663.197 MB
TOTAL=695228092, 663.021 MB
TOTAL=695228092, 663.021 MB
TOTAL=695226044, 663.019 MB

My /boot/loader.conf contains:

ng_bpf_load="YES"
amdtemp_load="YES"
ubsec_load="YES"
vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="600M"
vfs.zfs.vdev.cache.size="8M"
vfs.zfs.txg.timeout="5"
kern.maxvnodes="250000"

This system is a home server, so I can run a debug kernel and crash it
again if required. My first question: am I doing something wrong? I
think the values I've put in are sufficient, but I could well have got
them wrong.

By the looks of it, the server is also not writing the crash dump out:
it hung at 1% and I had to power cycle it. This is the panic:

panic: solaris assert: 0 == zap_increment_int(os, (-1ULL), user, delta, tx)
(0x0 == 0x7a), file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c,
line: 1224
cpuid = 3
KDB: stack backtrace:
#0 0xffffffff8055b74e at kdb_backtrace+0x5e
#1 0xffffffff80525c47 at panic+0x187
#2 0xffffffff80e71b9d at do_userquota_update+0xad
#3 0xffffffff80e71dae at dmu_objset_do_userquota_updates+0x1de
#4 0xffffffff80e882af at dsl_pool_sync+0x11f
#5 0xffffffff80e976e4 at spa_sync+0x334
#6 0xffffffff80ea7ed3 at txg_sync_thread+0x253
#7 0xffffffff804f89ee at fork_exit+0x11e
#8 0xffffffff8075847e at fork_trampoline+0xe
Uptime: 14h31m10s
Dumping 2489 out of 16370 MB:..1%

Thanks for any help.

//Clay
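P.S. In case it's relevant to the dump hanging at 1%: my dump setup is
just the stock rc.conf knobs, something like the lines below (quoted
from memory, so don't take them as gospel):

dumpdev="AUTO"        # use the first suitable swap device for crash dumps
dumpdir="/var/crash"  # where savecore writes the dump on the next boot

And if a debug kernel would help, I assume the usual debugging options
are what's wanted; happy to rebuild with something like:

options KDB                # kernel debugger framework
options DDB                # interactive kernel debugger
options INVARIANTS         # extra run-time consistency checks
options INVARIANT_SUPPORT  # support code required by INVARIANTS
options WITNESS            # lock order verification
options WITNESS_SKIPSPIN   # skip spin-lock checks to reduce overhead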