From: "Reed A. Cartwright"
To: Rick Macklem
Cc: freebsd-fs@freebsd.org, "freebsd-stable@freebsd.org", olivier olivier
Date: Tue, 4 Dec 2012 10:54:46 -0700
Subject: Re: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE
In-Reply-To: <1769356561.1118634.1354631170432.JavaMail.root@erie.cs.uoguelph.ca>
References: <1769356561.1118634.1354631170432.JavaMail.root@erie.cs.uoguelph.ca>

I'm having similar issues after upgrading to 9.1-RC2 and RC3. I'm not
using either NFS or a ZIL.

On Tue, Dec 4, 2012 at 7:26 AM, Rick Macklem wrote:
> Olivier wrote:
>> Hi all
>> After upgrading from 9.0-RELEASE to 9.1-PRERELEASE #0 r243679 I'm having
>> severe problems with NFS sharing of a ZFS volume. nfsd appears to hang at
>> random times (between once every couple hours to once every two days) while
>> accessing a ZFS volume, and the only way I have found of resolving the
>> problem is to reboot. The server console is sometimes still responsive
>> during the nfsd hang, and I can read and write files to the same ZFS volume
>> while nfsd is hung. I am pasting below the output of procstat -kk on nfsd,
>> and details of my pool (nfsstat on the server gets hung when the problem
>> has started occurring, and does not produce any output). The pool is v28
>> and was created from a bunch of volumes attached over Fibre Channel using
>> the mpt driver. My system has a Supermicro board and 4 AMD Opteron 6274
>> CPUs.
>>
>> I did not experience any nfsd hangs with 9.0-RELEASE (same machine,
>> essentially same configuration, same usage pattern).
>>
>> I would greatly appreciate any help to resolve this problem!
>> Thank you
>> Olivier
>>
>> PID TID COMM TDNAME KSTACK
>> 1511 102751 nfsd nfsd: master
>>   mi_switch+0x186
>>   sleepq_wait+0x42
>>   __lockmgr_args+0x5ae
>>   vop_stdlock+0x39
>>   VOP_LOCK1_APV+0x46
>>   _vn_lock+0x47
>>   zfs_fhtovp+0x338
>>   nfsvno_fhtovp+0x87
>>   nfsd_fhtovp+0x7a
>>   nfsrvd_dorpc+0x9cf
>>   nfssvc_program+0x447
>>   svc_run_internal+0x687
>>   svc_run+0x8f
>>   nfsrvd_nfsd+0x193
>>   nfssvc_nfsd+0x9b
>>   sys_nfssvc+0x90
>>   amd64_syscall+0x540
>>   Xfast_syscall+0xf7
>> 1511 102752 nfsd nfsd: service
>>   mi_switch+0x186
>>   sleepq_wait+0x42
>>   __lockmgr_args+0x5ae
>>   vop_stdlock+0x39
>>   VOP_LOCK1_APV+0x46
>>   _vn_lock+0x47
>>   zfs_fhtovp+0x338
>>   nfsvno_fhtovp+0x87
>>   nfsd_fhtovp+0x7a
>>   nfsrvd_dorpc+0x9cf
>>   nfssvc_program+0x447
>>   svc_run_internal+0x687
>>   svc_thread_start+0xb
>>   fork_exit+0x11f
>>   fork_trampoline+0xe
>> 1511 102753 nfsd nfsd: service
>>   mi_switch+0x186
>>   sleepq_wait+0x42
>>   _cv_wait+0x112
>>   zio_wait+0x61
>>   zil_commit+0x764
>>   zfs_freebsd_write+0xba0
>>   VOP_WRITE_APV+0xb2
>>   nfsvno_write+0x14d
>>   nfsrvd_write+0x362
>>   nfsrvd_dorpc+0x3c0
>>   nfssvc_program+0x447
>>   svc_run_internal+0x687
>>   svc_thread_start+0xb
>>   fork_exit+0x11f
>>   fork_trampoline+0xe
>> 1511 102754 nfsd nfsd: service
>>   mi_switch+0x186
>>   sleepq_wait+0x42
>>   _cv_wait+0x112
>>   zio_wait+0x61
>>   zil_commit+0x3cf
>>   zfs_freebsd_fsync+0xdc
>>   nfsvno_fsync+0x2f2
>>   nfsrvd_commit+0xe7
>>   nfsrvd_dorpc+0x3c0
>>   nfssvc_program+0x447
>>   svc_run_internal+0x687
>>   svc_thread_start+0xb
>>   fork_exit+0x11f
>>   fork_trampoline+0xe
>> 1511 102755 nfsd nfsd: service
>>   mi_switch+0x186
>>   sleepq_wait+0x42
>>   __lockmgr_args+0x5ae
>>   vop_stdlock+0x39
>>   VOP_LOCK1_APV+0x46
>>   _vn_lock+0x47
>>   zfs_fhtovp+0x338
>>   nfsvno_fhtovp+0x87
>>   nfsd_fhtovp+0x7a
>>   nfsrvd_dorpc+0x9cf
>>   nfssvc_program+0x447
>>   svc_run_internal+0x687
>>   svc_thread_start+0xb
>>   fork_exit+0x11f
>>   fork_trampoline+0xe
>> 1511 102756 nfsd nfsd: service
>>   mi_switch+0x186
>>   sleepq_wait+0x42
>>   _cv_wait+0x112
>>   zil_commit+0x6d
>>   zfs_freebsd_write+0xba0
>>   VOP_WRITE_APV+0xb2
>>   nfsvno_write+0x14d
>>   nfsrvd_write+0x362
>>   nfsrvd_dorpc+0x3c0
>>   nfssvc_program+0x447
>>   svc_run_internal+0x687
>>   svc_thread_start+0xb
>>   fork_exit+0x11f
>>   fork_trampoline+0xe
>>
> These threads are either waiting for a vnode lock or waiting inside
> zil_commit() { at 3 different locations in zil_commit() }. A guess
> would be that the ZIL hasn't completed a write for some reason, so
> 3 threads are waiting for it when one of them is holding a lock on
> the vnode being written and the remaining threads are waiting for
> that vnode lock.
>
> I am not a ZFS guy, so I cannot help further, except to suggest
> that you try and determine what might cause a write to the ZIL to
> stall. (Different device, different device driver...)
>
> Good luck with it, rick
>
>>
>> PID TID COMM TDNAME KSTACK
>> 1507 102750 nfsd -
>>   mi_switch+0x186
>>   sleepq_catch_signals+0x2e1
>>   sleepq_wait_sig+0x16
>>   _cv_wait_sig+0x12a
>>   seltdwait+0xf6
>>   kern_select+0x6ef
>>   sys_select+0x5d
>>   amd64_syscall+0x540
>>   Xfast_syscall+0xf7
>>
>>
>>   pool: tank
>>  state: ONLINE
>> status: The pool is formatted using a legacy on-disk format. The pool can
>>         still be used, but some features are unavailable.
>> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>>         pool will no longer be accessible on software that does not support
>>         feature flags.
>>   scan: scrub repaired 0 in 45h37m with 0 errors on Mon Dec 3 03:07:11 2012
>> config:
>>
>>         NAME          STATE     READ WRITE CKSUM
>>         tank          ONLINE       0     0     0
>>           raidz1-0    ONLINE       0     0     0
>>             da19      ONLINE       0     0     0
>>             da31      ONLINE       0     0     0
>>             da32      ONLINE       0     0     0
>>             da33      ONLINE       0     0     0
>>             da34      ONLINE       0     0     0
>>           raidz1-1    ONLINE       0     0     0
>>             da20      ONLINE       0     0     0
>>             da36      ONLINE       0     0     0
>>             da37      ONLINE       0     0     0
>>             da38      ONLINE       0     0     0
>>             da39      ONLINE       0     0     0
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

-- 
Reed A. Cartwright, PhD
Assistant Professor of Genomics, Evolution, and Bioinformatics
School of Life Sciences
Center for Evolutionary Medicine and Informatics
The Biodesign Institute
Arizona State University
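
P.S. Rick's reading of the quoted traces (some nfsd threads asleep inside
zil_commit(), the others queued on a vnode lock) can be checked mechanically
when comparing hangs across machines. What follows is only a rough Python
sketch, not an existing FreeBSD tool: parse_threads(), classify() and
summarize() are hypothetical helper names, SAMPLE is an abbreviated excerpt
of the frames quoted above, and the classification is an assumption that
simply keys on whether zil_commit or __lockmgr_args appears in a stack.

```python
import re
from collections import Counter

# A kernel stack frame as printed by procstat -kk, e.g. "zil_commit+0x764".
FRAME = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-f]+")
# A thread record starts with two integers (PID, TID) followed by COMM.
THREAD = re.compile(r"^\s*(\d+)\s+(\d+)\s+\S+")

# Abbreviated excerpt of the traces in this thread (frames trimmed).
SAMPLE = """\
PID TID COMM TDNAME KSTACK
1511 102751 nfsd nfsd: master mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0x5ae vop_stdlock+0x39 zfs_fhtovp+0x338
1511 102753 nfsd nfsd: service mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x112 zio_wait+0x61 zil_commit+0x764
1507 102750 nfsd - mi_switch+0x186 sleepq_catch_signals+0x2e1 kern_select+0x6ef
"""


def parse_threads(text):
    """Map each TID to the function names on its kernel stack.

    Accepts both the one-line-per-thread layout and the mail-wrapped,
    '>'-quoted layout with one frame per line.
    """
    threads = {}
    current = None
    for line in text.splitlines():
        line = line.lstrip("> ")  # drop mail quote markers and indentation
        match = THREAD.match(line)
        if match:
            current = int(match.group(2))
            threads[current] = []
        if current is not None:
            threads[current].extend(FRAME.findall(line))
    return threads


def classify(frames):
    """Assumed heuristic: label a thread by what it appears to sleep on."""
    if "zil_commit" in frames:
        return "zil_commit"
    if "__lockmgr_args" in frames:
        return "vnode lock"
    return "other"


def summarize(text):
    """Count threads per wait category."""
    return Counter(classify(frames) for frames in parse_threads(text).values())


if __name__ == "__main__":
    for category, count in sorted(summarize(SAMPLE).items()):
        print(f"{category}: {count}")
```

Feeding it the saved output of procstat -kk for the nfsd PID during a hang
should show whether the threads split the same way on another system; it says
nothing about why a ZIL write might stall.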