Date: Wed, 28 Mar 2018 00:29:51 +0530
From: Reshad Patuck
To: freebsd-net@freebsd.org, "Bjoern A. Zeeb", Kristof Provost
CC: FreeBSD Net, Reshad Patuck
Subject: Re: [vnet] [epair] epair interface stops working after some time

Hi,

@Kristof: The current value of 'net.link.epair.netisr_maxqlen' is 2100; I will make it 210.
Will this require a reboot, or can I just change the sysctl and reload the epair module?
@Bjoern: here is the output of 'netstat -Q':

```
# netstat -Q
Configuration:
Setting                 Current  Limit
Thread count                  1      1
Default queue limit         256  10240
Dispatch policy          direct    n/a
Threads bound to CPUs  disabled    n/a

Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1    256   flow  default   ---
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        4    256 source  default   ---
ether      5    256 source   direct   ---
ip6        6    256   flow  default   ---
epair      8   2100    cpu  default   CD-

Workstreams:
WSID CPU Name   Len WMark    Disp'd HDisp'd  QDrops    Queued   Handled
   0   0 ip       0    30  11409267       0       0  13574317  24983409
   0   0 igmp     0     0         0       0       0         0         0
   0   0 rtsock   0     1         0       0       0        42        42
   0   0 arp      0     0  61109751       0       0         0  61109751
   0   0 ether    0     0 115098020       0       0         0 115098020
   0   0 ip6      0    10  36157577       0       0   4273274  40430846
   0   0 epair    0  2100         0       0  210972 303785724 303785724
```

I still have access to a machine in this state, but will need to reset it to a working state soon.

Please let me know if there is any information you would like me to get from this machine before I reset it.

Best,
Reshad

On 27 March 2018 8:18:29 PM IST, "Bjoern A. Zeeb" wrote:
>On 27 Mar 2018, at 14:40, Kristof Provost wrote:
>
>> (Re-cc freebsd-net, because this is useful information)
>>
>> On 27 Mar 2018, at 13:07, Reshad Patuck wrote:
>>> The epair crash occurred again today running the epair module code
>>> with the added dtrace sdt providers.
>>>
>>> Running the same command as last time, 'dtrace -n ::epair\*:' returns
>>> the following:
>>> ```
>>> CPU ID FUNCTION:NAME
>> …
>>> 0 66499 epair_transmit_locked:enqueued
>>> ```
>>
>>> Looks like it's filled up a queue somewhere and is dropping
>>> connections after that.
>>>
>>> The value of 'error' is 55. I can see both the ifp and m structs,
>>> but don't know what to look for in them.
>>>
>> That's useful. Error 55 is ENOBUFS, which in IFQ_ENQUEUE() means
>> we're hitting _IF_QFULL().
>> There don't seem to be counters for that drop though, so that makes
>> it hard to diagnose without these extra probe points.
>> It also explains why you don't really see any drop counters
>> incrementing.
>>
>> The fact that this queue is full presumably means that the other side
>> is not reading packets off it any more.
>> That's supposed to happen in epair_start_locked() (look for the
>> IFQ_DEQUEUE() calls).
>>
>> It's not at all clear to me how, but it looks like the receive side
>> is not doing its work.
>>
>> It looks like the IFQ code is already a fallback for when the netisr
>> queue is full.
>> That code might be broken, or there might be a different issue that
>> will just mean you'll always end up in the same situation,
>> regardless of queue size.
>>
>> It's probably worth trying to play with
>> 'net.route.netisr_maxqlen'. I'd recommend *lowering* it, to see
>> if the problem happens more frequently that way. If it does, it'll be
>> helpful in reproducing and trying to fix this. If it doesn't, the
>> full queue is probably a consequence rather than a cause/trigger.
>> (Of course, once you've confirmed that lowering the netisr_maxqlen
>> makes the problem more frequent, go ahead and increase it.)
>
>netstat -Q will be useful
>_______________________________________________
>freebsd-net@freebsd.org mailing list
>https://lists.freebsd.org/mailman/listinfo/freebsd-net
>To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"