From owner-freebsd-fs@FreeBSD.ORG  Wed May 20 15:43:26 2015
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id E35EBDFC;
 Wed, 20 May 2015 15:43:25 +0000 (UTC)
Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id A73F61A87;
 Wed, 20 May 2015 15:43:25 +0000 (UTC)
Received: from rack1.digiware.nl (unknown [127.0.0.1])
 by smtp.digiware.nl (Postfix) with ESMTP id 2926D16A406;
 Wed, 20 May 2015 17:43:22 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.nl
Received: from smtp.digiware.nl ([127.0.0.1])
 by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id B8Dvz59DQ2Gf; Wed, 20 May 2015 17:43:11 +0200 (CEST)
Received: from [192.168.101.176] (vpn.ecoracks.nl [31.223.170.173])
 by smtp.digiware.nl (Postfix) with ESMTPA id 8658116A404;
 Wed, 20 May 2015 17:43:11 +0200 (CEST)
Message-ID: <555CAB90.7070506@digiware.nl>
Date: Wed, 20 May 2015 17:43:12 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64;
 rv:31.0) Gecko/20100101 Thunderbird/31.6.0
MIME-Version: 1.0
To: fs@freebsd.org, Edward Tomasz Napierala <trasz@FreeBSD.org>
Subject: ZFS / NFS deadlock??? (Was: Re: Unexpected reboot after ctld run
 into trouble.)
References: <55573756.9070503@digiware.nl>
In-Reply-To: <55573756.9070503@digiware.nl>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 May 2015 15:43:26 -0000

On 16/05/2015 14:25, Willem Jan Withagen wrote:
> Hi,
> 
> Found the following in my logs:
> Lots of
> ----
> (0:4:0/0): Task Action: LUN Reset
> (0:4:0/0): CTL Status: Command Completed Successfully
> sonewconn: pcb 0xfffff8004e69e930: Listen queue overflow: 8 already in
> queue awaiting acceptance (740688 occurrences)
> (0:4:0/0): Task Action: LUN Reset
> (0:4:0/0): CTL Status: Command Completed Successfully
> sonewconn: pcb 0xfffff8004e69e930: Listen queue overflow: 8 already in
> queue awaiting acceptance (713721 occurrences)
> (0:4:0/0): Task Action: LUN Reset
> (0:4:0/0): CTL Status: Command Completed Successfully
> sonewconn: pcb 0xfffff8004e69e930: Listen queue overflow: 8 already in
> queue awaiting acceptance (691776 occurrences)
> ----
> 
> Which then ends in:
> ----
> panic: deadlkres: possible deadlock detected for 0xfffff8001ee94920,
> blocked for 1801009 ticks
> 
> 
> cpuid = 1
> Uptime: 14d13h13m47s
> Dumping 7557 out of 8175
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%Table 'FACP' at
> 0xcfedbcf8
> ----

> The system is running ZFS with ZFS-on-root: FreeBSD zfs.digiware.nl 
> 10.1-STABLE FreeBSD 10.1-STABLE #221 r282282: Fri May  1 06:51:41 
> CEST 2015
> 
> This could stem from the fact that I woke up my Win8 PC, which has an
> iSCSI volume mounted. It is used to store security camera captures and
> carries somewhat heavier traffic.
> 
> Suggestions or questions on what to look at are welcome.

> I do have a core in /var/crash, but will need some guidance to 
> retrieve stuff from it.

A follow-up to this story, after some discussion with and debugging by
Edward (trasz@):

>> Now, the bad news: I don't think I'll be able to help you with this
>> one. It looks like the problem is actually NFS-related. Using the
>> hex address from the deadlock message in dmesg:
>> 
>> % kgdb boot/kernel/kernel vmcore.3
>>
>> (kgdb) p ((struct thread *)0xfffff8001ee94920)->td_proc->p_comm
>> $6 = "nfsd", '\0' <repeats 15 times>
>> (kgdb) p ((struct thread *)0xfffff8001ee94920)->td_wmesg
>> $7 = 0xffffffff80edcfc3 "zfs"
>> 
>> So it might actually be a ZFS deadlock the nfsd thread tripped on.

> The panic was triggered by deadlkres; it noticed that a thread had
> spent far too long waiting for something, so presumably it became
> "hung" due to a deadlock.

> The 0xfffff8001ee94920 in dmesg is the address of the "struct thread"
> of the problematic thread.

> The first print shows the "command name" (p_comm) of the process the
> thread belongs to. The second print shows the "wait message"
> (td_wmesg) describing the wait channel on which the thread slept.

So now the questions are:

1) Is this indeed a ZFS / NFS deadlock problem?
2) Who can/will help to get this worked out?

Thanx,
--WjW