Date:      Thu, 06 Sep 2018 15:15:30 +0200
From:      "Kristof Provost" <kp@FreeBSD.org>
To:        "Bjoern A. Zeeb" <bz@freebsd.org>
Cc:        "FreeBSD Net" <freebsd-net@freebsd.org>
Subject:   vnet shutdown / ifnet_departure_event
Message-ID:  <3F130DF9-40CA-45C9-944E-91E1CA2BF445@FreeBSD.org>

Hi Bjoern,

I’m running into an issue with vnet shutdown. It manifests 
consistently with pfsync, but if I understand the problem fully it’s 
not really related to pfsync.

The issue is that we end up with a use-after-free of the struct ifnet of 
the pfsync interface.
When the jail shuts down, the pfsync interface is destroyed, but because 
this happens during vnet shutdown we skip a lot of the cleanup, 
including the `EVENTHANDLER_INVOKE(ifnet_departure_event, ifp);` call. 
That means pf doesn’t get notified that the interface went away, so it 
keeps its struct pfi_kif for that interface and tries to clean it up 
when we get round to doing the vnet shutdown for pf. At that point it 
tries to clear the if_pf_kif and pfg_pf_kif pointers for an ifp which 
has already been freed.

Invoking the event handler from the ‘if (shutdown)’ code in 
if_detach_internal() fixes the problem, but I’m not totally confident 
that won’t have any unexpected side effects.
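
To make the ordering problem concrete, here is a minimal userland sketch of it. This is a toy model, not the kernel code: every `toy_` name is hypothetical, the structures carry only the fields needed for the illustration, and "freed" is modelled with a flag rather than a real free().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_pfi_kif;

/* Toy stand-in for struct ifnet (hypothetical, heavily reduced). */
struct toy_ifnet {
	struct toy_pfi_kif *if_pf_kif;	/* back-pointer owned by pf */
	bool freed;			/* models the ifp having been freed */
};

/* Toy stand-in for pf's per-interface state. */
struct toy_pfi_kif {
	struct toy_ifnet *ifp;		/* pf's reference to the interface */
};

/* pf starts tracking an interface. */
static void
toy_pf_attach(struct toy_pfi_kif *kif, struct toy_ifnet *ifp)
{
	kif->ifp = ifp;
	ifp->if_pf_kif = kif;
}

/* What pf would do when it is told the interface is departing. */
static void
toy_pf_departure(struct toy_pfi_kif *kif, struct toy_ifnet *ifp)
{
	if (kif->ifp == ifp) {
		ifp->if_pf_kif = NULL;
		kif->ifp = NULL;
	}
}

/*
 * Toy detach. With shutdown set and notify_on_shutdown clear, the
 * departure notification is skipped, as described above. Setting
 * notify_on_shutdown models invoking the handler from the shutdown path.
 */
static void
toy_if_detach(struct toy_ifnet *ifp, struct toy_pfi_kif *kif,
    bool shutdown, bool notify_on_shutdown)
{
	assert(ifp != NULL && kif != NULL);
	if (!shutdown || notify_on_shutdown)
		toy_pf_departure(kif, ifp);
	ifp->freed = true;
}

/* True if pf's later cleanup would dereference a freed interface. */
static bool
toy_pf_cleanup_is_stale(const struct toy_pfi_kif *kif)
{
	return (kif->ifp != NULL && kif->ifp->freed);
}
```

In this model, detaching with shutdown set and no notification leaves pf holding a stale pointer, while notifying from the shutdown path clears it first. It says nothing about the side effects I’m worried about in the real code.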

Best regards,
Kristof

Date:      Thu, 6 Sep 2018 14:17:20 +0100
From:      Kaya Saman <kayasaman@optiplex-networks.com>
To:        Eugene Grosbein <eugen@grosbein.net>, Ryan Moeller <ryan@ixsystems.com>, freebsd-net@freebsd.org
Subject:   Re: iSCSI issues after upgrading to 11.2 x64 RELEASE
Message-ID:  <351b8574-d936-7efc-782c-bd50e28fa784@optiplex-networks.com>
In-Reply-To: <fe9b8786-171e-9639-6b64-686530d54492@grosbein.net>

So, the system semi-hung again... Ctrl+Alt+F(n) was the only thing that 
was still working. :-(


On 9/4/18 6:08 PM, Eugene Grosbein wrote:
> 04.09.2018 23:57, Ryan Moeller wrote:
>
>>> The NIC's are Intel based using igb kernel driver:
>>>
>>> igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
>>> options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
>> I see your MTU is 9000, and as described by the other thread you linked to, there are issues with 9k jumbo cluster allocation.
>> Some detailed notes are here, but the quick summary is: set MTU < 4096
>> https://gist.github.com/freqlabs/eba9b755f17a223260246becfbb150a1


Yes, MTU 9000, though it seems the 9k issues are specific to FreeBSD? 
My other OSes (OpenBSD and Linux based) seem to handle the setting 
fine, as I haven't experienced any issues with them. However, their 
driver implementation or handling of things may be quite different, so 
I cannot make a direct comparison.


Taking your advice and reading through the link, I reset the MTU to 4000 
after the 'hang' mentioned above; so far no issues:


24652/3428/28080 mbufs in use (current/cache/total)
0/1358/1358/1525810 mbuf clusters in use (current/cache/total/max)
0/1081 mbuf+clusters out of packet secondary zone in use (current/cache)
24648/129/24777/762905 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/226045 9k jumbo clusters in use (current/cache/total/max)
0/0/0/127150 16k jumbo clusters in use (current/cache/total/max)
104755K/4089K/108844K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
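
The output above shows the receive buffers now coming from the 4k (page size) zone rather than the 9k one. A rough sketch of why the MTU decides this follows. It is a simplified model and not the igb driver's actual code: the `TOY_`/`toy_` names are hypothetical, the page size is assumed to be 4096, the per-frame overhead is assumed to be Ethernet header plus CRC plus one VLAN tag, and the idea is simply "pick the smallest cluster size that holds one full frame".

```c
#include <assert.h>
#include <stddef.h>

/* Cluster sizes, named after the zones listed by netstat -m (assumed values). */
#define TOY_MCLBYTES		2048		/* standard mbuf cluster */
#define TOY_MJUMPAGESIZE	4096		/* 4k (page size) jumbo cluster */
#define TOY_MJUM9BYTES		(9 * 1024)	/* 9k jumbo cluster */
#define TOY_MJUM16BYTES		(16 * 1024)	/* 16k jumbo cluster */

/* Per-frame overhead assumed by this sketch. */
#define TOY_ETHER_HDR_LEN	14
#define TOY_ETHER_CRC_LEN	4
#define TOY_VLAN_TAG_LEN	4

/* Largest frame the NIC has to receive for a given MTU. */
static size_t
toy_max_frame(size_t mtu)
{
	return (mtu + TOY_ETHER_HDR_LEN + TOY_ETHER_CRC_LEN +
	    TOY_VLAN_TAG_LEN);
}

/* Smallest cluster size that holds one full frame; 0 if none fits. */
static size_t
toy_rx_cluster_size(size_t mtu)
{
	size_t frame = toy_max_frame(mtu);

	assert(mtu > 0);
	if (frame <= TOY_MCLBYTES)
		return (TOY_MCLBYTES);
	if (frame <= TOY_MJUMPAGESIZE)
		return (TOY_MJUMPAGESIZE);
	if (frame <= TOY_MJUM9BYTES)
		return (TOY_MJUM9BYTES);
	if (frame <= TOY_MJUM16BYTES)
		return (TOY_MJUM16BYTES);
	return (0);
}
```

Under these assumptions toy_rx_cluster_size(4000) selects the 4096-byte zone and toy_rx_cluster_size(9000) selects the 9k zone, which is consistent with the counters above.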


>>
>>> Can anyone suggest anything to stop my system from completely locking up and becoming unresponsive?
>>> At the moment I'm not sure if switching to 'Stable' or 'Current' branches is a good solution?
>> The problem has been mitigated for a while on 12-CURRENT, so that might be worth trying. Otherwise I’ve been hoping a committer will put this fix in 11-STABLE, but in the meantime you could manually apply the patch:
>> https://reviews.freebsd.org/D16534 <https://reviews.freebsd.org/D16534>
> Intel NIC users also should be aware of chip hardware problems while dealing with 9k MTU, like documented here:
> https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/i218-i219-ethernet-connection-spec-update.pdf
>
> In short, Intel does not recommend MTU over 8500.


That's really interesting!


The card in the system is one of these 4 port ones:
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/ethernet-controller-i350-datasheet.pdf


For now I'll keep the MTU at 4000. When 12 becomes RELEASE, I'll try 
cranking it up again to see if the problem has been fixed. In the 
meantime I've set up a cron job to mail me the output of 'netstat -m' 
so I can keep track of the mbufs, though it's probably going to be more 
useful at full whack (meaning 9k) than now, when the issue seems to 
have been temporarily alleviated.
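
If the mailed output ever needs checking automatically, the line worth watching is probably the "requests for jumbo clusters denied" one. A minimal sketch of parsing it, assuming the line format shown in the output above stays the same; the struct and function names are hypothetical and this is not part of any existing tool:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Denied jumbo cluster requests, in netstat -m order (4k/9k/16k). */
struct toy_jumbo_denied {
	unsigned long k4;
	unsigned long k9;
	unsigned long k16;
};

/*
 * Parse a line such as
 *   "0/0/0 requests for jumbo clusters denied (4k/9k/16k)"
 * Returns true and fills *out only if the line is the "denied" line.
 */
static bool
toy_parse_jumbo_denied(const char *line, struct toy_jumbo_denied *out)
{
	int end = 0;

	assert(line != NULL && out != NULL);
	/* %n is only reached if the literal text matched in full. */
	if (sscanf(line, "%lu/%lu/%lu requests for jumbo clusters denied%n",
	    &out->k4, &out->k9, &out->k16, &end) != 3)
		return (false);
	return (end > 0);
}

/* True if any jumbo zone has refused an allocation. */
static bool
toy_any_denied(const struct toy_jumbo_denied *d)
{
	return (d->k4 != 0 || d->k9 != 0 || d->k16 != 0);
}
```

A wrapper could feed each line of the 'netstat -m' output through toy_parse_jumbo_denied() and only send mail when toy_any_denied() reports a non-zero counter.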




Kaya



