From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 211990] iscsi fails to reconnect and does not release devices
Date: Fri, 19 Aug 2016 22:19:35 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.3-RELEASE
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: ben.rubson@gmail.com
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #4 from Ben RUBSON ---
One strange thing I noticed. (I am including everything from my troubleshooting that could be of interest.)

As soon as I bring the network interface down, I get the following message on the target side, one per target:

17:01:00 srv2 kernel: WARNING: 192.168.2.1 (iqn.1994-09.org.freebsd:srv1): no ping reply (NOP-Out) after 5 seconds; dropping connection

Then, on the initiator side, I get these messages for each target:

Aug 19 17:01:07 srv1 kernel: iscsi_maintenance_thread_reconnect: 192.168.2.2 (iqn.2012-06.srv2:hm4): connection failed, destroying devices
Aug 19 17:01:07 srv1 kernel: iscsi_session_cleanup: 192.168.2.2 (iqn.2012-06.srv2:hm4): freezing
Aug 19 17:01:07 srv1 kernel: iscsi_session_cleanup: 192.168.2.2 (iqn.2012-06.srv2:hm4): deregistering SIM

At this moment, on the initiator side, one iscsid process per target appears.

10 seconds later, on the initiator side, I get these messages for each target:

Aug 19 17:01:18 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:hm4): login timed out after 11 seconds; reconnecting
Aug 19 17:01:18 srv1 kernel: iscsi_maintenance_thread_reconnect: 192.168.2.2 (iqn.2012-06.srv2:hm4): connection failed, destroying devices

At the same time, a second iscsid process per target appears, so that I get 2 iscsid processes per target:

# ps auxxw | grep iscsid:
root 866 0.0 0.0 16632 2144 - I 4:58pm 0:00.00 iscsid: 192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
root 881 0.0 0.0 16632 2144 - I 4:58pm 0:00.00 iscsid: 192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
(...)

However, there seems to be a limit of 30 processes: for 17 targets I would have expected 34 processes, but I only get 30.

If I bring the NIC up before the second process is created, I only get one reconnection message per target in the target logs.
If I bring the NIC up after the second process is created, I get many more reconnection messages in the target logs, between 40 and 50 for 17 targets.
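To check the per-target process count described above, a small script can tally iscsid processes by target from the ps output. This is only a sketch: it assumes the process title format shown in the ps listing above ("iscsid: <address> (<IQN>)"), and the function name count_iscsid_per_target and the SAMPLE text are made up for illustration.

```python
import re
from collections import Counter

# Assumed process title format, taken from the ps listing in this report:
#   iscsid: 192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
TITLE_RE = re.compile(r"iscsid: (\S+) \(([^)\s]+)\)")


def count_iscsid_per_target(ps_output: str) -> Counter:
    """Count iscsid processes per (address, IQN) pair in ps output text."""
    counts = Counter()
    for line in ps_output.splitlines():
        match = TITLE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Illustrative input: the two ps lines quoted in the report.
SAMPLE = """\
root 866 0.0 0.0 16632 2144 - I 4:58pm 0:00.00 iscsid: 192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
root 881 0.0 0.0 16632 2144 - I 4:58pm 0:00.00 iscsid: 192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
"""

if __name__ == "__main__":
    counts = count_iscsid_per_target(SAMPLE)
    for (address, iqn), n in sorted(counts.items()):
        print(f"{address} {iqn}: {n} iscsid process(es)")
    print(f"total: {sum(counts.values())}")
```

Fed the full ps output for 17 targets, the total would show whether the count really stops at 30 instead of the expected 34, and which targets are missing their second process.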
Do we expect these additional processes?
I think we would expect only one process and one reconnection message per target.
It seems strange to have all these "duplicated" connection retries.

Another question, related to the 30 processes found: is there a limit of 30 targets?
I found a maxproc option in ctl.conf (default 30), but I don't know exactly what it means (I tested values from 1 to 50 without seeing any change).
I found no such option on the initiator side, however.

I noticed that the bug is easier to reproduce when we "stress" the devices: disconnect the network as soon as the targets are reconnected, and reconnect it as soon as they are disconnected.

In addition, I had 8 kernel crashes, on the initiator or the target, each time with the same address / pointer:

kernel: Fatal trap 12: page fault while in kernel mode
kernel: fault virtual address = 0x1e8
kernel: instruction pointer = 0x20:0xffffffff80936933

I also got a stack trace, but did not get its pointer address.
http://img4.hostingpics.net/pics/707217211990.png
I'm also trying to get a full dump.

However, I'm not sure this kernel crash is related to the reconnection issue; perhaps there are 2 separate issues.

# uname -v
FreeBSD 10.3-RELEASE-p7 #0: Thu Aug 11 18:38:15 UTC 2016 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC

A lot of info! I hope we will be able to correct these issues.

Many thanks,

Ben