From: Cassiano Peixoto
Date: Tue, 18 Oct 2016 14:23:18 -0200
Subject: Re: FreeBSD10.3-RELEASE. Kernel panic.
To: Donald Baud
Cc: "net@freebsd.org"

Hi guys,

I have an update on this issue. Since my last email I have had 3 crashes.
Two of them showed the same location in the kernel debugger:

(kgdb) list *0xffffffff8228c918
0xffffffff8228c918 is in trim_map_seg_compare
    (/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:108).
103     trim_map_seg_compare(const void *x1, const void *x2)
104     {
105             const trim_seg_t *s1 = x1;
106             const trim_seg_t *s2 = x2;
107
108             if (s1->ts_start < s2->ts_start) {
109                     if (s1->ts_end > s2->ts_start)
110                             return (0);
111                     return (-1);
112             }
Current language:  auto; currently minimal
(kgdb) bt
#0  doadump (textdump=) at pcpu.h:221
#1  0xffffffff80ad8e69 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80ad941b in vpanic (fmt=, ap=)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80ad9253 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80fa0d31 in trap_fatal (frame=0xfffffe02374957f0, eva=4294967343)
    at /usr/src/sys/amd64/amd64/trap.c:841
#5  0xffffffff80fa0f23 in trap_pfault (frame=0xfffffe02374957f0, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:691
#6  0xffffffff80fa04cc in trap (frame=0xfffffe02374957f0)
    at /usr/src/sys/amd64/amd64/trap.c:442
#7  0xffffffff80f84141 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff8228c918 in trim_map_seg_compare (x1=0xfffffe0237495920, x2=0x100000007)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:108
#9  0xffffffff821a98e1 in avl_find (tree=, value=, where=0x0)
    at /usr/src/sys/cddl/contrib/opensolaris/common/avl/avl.c:268
#10 0xffffffff8228ce9e in trim_map_write_start (zio=)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:363
#11 0xffffffff822592df in zio_vdev_io_start (zio=0xfffff802191ea000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2866
#12 0xffffffff82255b26 in zio_execute (zio=)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1556
#13 0xffffffff822551e9 in zio_nowait (zio=0xfffff802191ea000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1610
#14 0xffffffff8223c738 in vdev_queue_io_done (zio=)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:884
#15 0xffffffff822594a9 in zio_vdev_io_done (zio=0xfffff8006daad000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2895
#16 0xffffffff82255b26 in zio_execute (zio=)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1556
#17 0xffffffff80b363ca in taskqueue_run_locked (queue=)
    at /usr/src/sys/kern/subr_taskqueue.c:449
#18 0xffffffff80b372d8 in taskqueue_thread_loop (arg=)
    at /usr/src/sys/kern/subr_taskqueue.c:703
#19 0xffffffff80a90055 in fork_exit (callout=0xffffffff80b371f0,
    arg=0xfffff8001006b920, frame=0xfffffe0237495c00)
    at /usr/src/sys/kern/kern_fork.c:1038
#20 0xffffffff80f8467e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#21 0x0000000000000000 in ?? ()
(kgdb) up 8
#8  0xffffffff8228c918 in trim_map_seg_compare (x1=0xfffffe0237495920, x2=0x100000007)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:108
108             if (s1->ts_start < s2->ts_start) {

But my last crash had a different message:

(kgdb) list *0xffffffff80b3a89c
0xffffffff80b3a89c is in turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:837).
832
833             /*
834              * Transfer the blocked list to the pending list.
835              */
836             mtx_lock_spin(&td_contested_lock);
837             TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], td_lockq);
838             mtx_unlock_spin(&td_contested_lock);
839
840             /*
841              * Give a turnstile to each thread.  The last thread gets
Current language:  auto; currently minimal
(kgdb) bt
#0  doadump (textdump=) at pcpu.h:221
#1  0xffffffff80ad8e69 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80ad941b in vpanic (fmt=, ap=)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80ad9253 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80fa0d31 in trap_fatal (frame=0xfffffe0237384870, eva=48)
    at /usr/src/sys/amd64/amd64/trap.c:841
#5  0xffffffff80fa0f23 in trap_pfault (frame=0xfffffe0237384870, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:691
#6  0xffffffff80fa04cc in trap (frame=0xfffffe0237384870)
    at /usr/src/sys/amd64/amd64/trap.c:442
#7  0xffffffff80f84141 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff80b3a89c in turnstile_broadcast (ts=0x0, queue=1)
    at /usr/src/sys/kern/subr_turnstile.c:837
#9  0xffffffff80ad48cf in __rw_wunlock_hard (c=0xfffff8024f3c2960, tid=, file=, line=)
    at /usr/src/sys/kern/kern_rwlock.c:1027
#10 0xffffffff80e1a75c in vm_map_delete (map=, start=, end=)
    at /usr/src/sys/vm/vm_map.c:2960
#11 0xffffffff80e1828e in vmspace_exit (td=)
    at /usr/src/sys/vm/vm_map.c:3077
#12 0xffffffff80a88686 in exit1 (td=0xfffff80015533a00, rval=268849920, signo=0)
    at /usr/src/sys/kern/kern_exit.c:398
#13 0xffffffff80a87e1d in sys_sys_exit (td=0x0, uap=)
    at /usr/src/sys/kern/kern_exit.c:178
#14 0xffffffff80fa168e in amd64_syscall (td=, traced=0)
    at subr_syscall.c:135
#15 0xffffffff80f8442b in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:396
#16 0x0000000800b661aa in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) up 8
#8  0xffffffff80b3a89c in turnstile_broadcast (ts=0x0, queue=1)
    at /usr/src/sys/kern/subr_turnstile.c:837
837             TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], td_lockq);

As you can see we are dealing with random crashes. I feel I'm not moving
forward here.
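
In both traces the faulting address reported by trap_fatal (eva) equals the
pointer seen in frame #8 plus a small offset, which is what a field access
through a bad pointer looks like. A minimal sketch of that arithmetic in POSIX
shell, using only numbers taken from the two backtraces; the helper name
fault_offset is made up for illustration and is not a kgdb command:

```sh
#!/bin/sh
# fault_offset EVA POINTER: print how far the faulting address lies past
# the pointer that was being dereferenced (hypothetical helper).
fault_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

# Crash 1: eva=4294967343, x2=0x100000007 in trim_map_seg_compare()
fault_offset 4294967343 0x100000007   # prints 0x28

# Crash 2: eva=48, ts=0x0 in turnstile_broadcast()
fault_offset 48 0x0                   # prints 0x30
```

Read this way, the first crash dereferences x2=0x100000007, which does not
look like the kernel addresses elsewhere in the trace (0xfffff...), and the
second dereferences ts=0x0. Which structure fields sit at those offsets is not
shown in the thread, so that part is left open.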
It's not a hardware problem, because I have 3 different servers with the same
issue.

Donald, did you have a chance to try 11-RELEASE? Any other behavior?

Does anyone have an idea that could help?

Thanks.

On Thu, Oct 13, 2016 at 12:24 PM, Cassiano Peixoto <peixotocassiano@gmail.com> wrote:

> Hi guys,
>
> First of all, thanks for sharing your thoughts about this issue. I think it's
> really important to find a solution for this issue together.
>
> I can see two related behaviors, but for me the root cause is the same:
>
> 1- mpd5 process stuck in the umtxn state
> 2- system crash
>
> I've tested recently on FreeBSD 10.3 and FreeBSD-11-RC3. I've tried all
> suggested tunings with no success.
>
> My environment is:
> - About 430 clients connected (but I can add more)
> - Using ZFS
> - igb NICs
> - Generic kernel
>
> Two days ago I updated my system to FreeBSD 11-RELEASE-p1 and since then
> my system has seemed stable for almost 3 days. No crashes anymore. I need more
> days to feel confident that something has changed. But anyway, my crashes
> before happened every day.
>
> If it crashes again I'll apply Donald's recommendation and let you guys know.
>
> Let's keep in touch, to try to finally fix it.
>
> Thanks.
>
> On Wed, Oct 12, 2016 at 8:24 PM, Donald Baud via freebsd-net <
> freebsd-net@freebsd.org> wrote:
>
>> On 10/12/16 3:24 PM, Zaphod Beeblebrox wrote:
>>
>>> While my mpd5 servers are possibly less busy (I haven't had common
>>> crashes), I have noticed a "group" of problems.
>>>
>>> 1. The carrier dropping communication (i.e. fiber cut or L2 switch
>>> breakage) of the L2TP streams can leave mpd5 in a state where it will not
>>> die and will not destroy interfaces (requires reboot to clear).
>>>
>> I've encountered that once on 10.3 and I had tweaked some sysctl values
>> while monitoring:
>> > vmstat -z | head -1; vmstat -z | grep -i netgraph
>>
>> You might want to search for other people's experience with the following
>> values:
>> # net.graph.maxdgram   # this is set in /etc/sysctl.conf
>> # net.graph.recvspace  # this is set in /etc/sysctl.conf
>> # net.graph.maxdata    # this is set in /boot/loader.conf
>> # net.graph.maxalloc   # this is set in /boot/loader.conf
>>
>> I'll leave it to others to comment on the best values to set, based on
>> their experience with FreeBSD 10.3.
>> In my case, as I had explained, one of the recipes that worked for me is
>> to comment out those settings and leave the kernel values at their defaults.
>>
>> I've read on the mpd5 mailing list that FreeBSD-11 has had
>> upgrades to the netgraph modules.
>> I am now using FreeBSD-11 and it looks like I don't need any of the
>> kernel tweaks that I've described.
>>
>> Also, may I suggest you troubleshoot the fiber-cut or L2 switch breakage
>> by playing with some ipfw rules to simulate a fiber cut:
>> ex: ipfw add 100 deny ip from 10.10.10.10 to me
>>
>>> 2. There are race conditions between quagga and mpd5 for adding/dropping
>>> routes.
>>>
>> While troubleshooting the crashes of mpd5, I removed net/quagga
>> and installed net/bird instead.
>> I am now using net/bird. I've written a little howto to get you started
>> with net/bird;
>> see: https://forums.freebsd.org/threads/56988/
>>
>>> 3. If A is a PPPoE client and B is the mpd5 server, A cannot access TCP
>>> services on B. It can access TCP services _beyond_ B, but not on B (there
>>> is a ticket open for this).
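
The ipfw suggestion quoted above (a deny rule to imitate a fiber cut) could be
wrapped in a small script so the rule is always removed again. This is only a
sketch: the rule number 100 and the address 10.10.10.10 are the ones from the
quoted example, while HOLD, DRY_RUN and the function names are assumptions
added for illustration.

```sh
#!/bin/sh
# Sketch only: temporarily block one peer with ipfw to imitate a fiber cut.
# Rule 100 and 10.10.10.10 come from the thread; HOLD and DRY_RUN are
# illustrative additions, not something the thread prescribes.
PEER="${PEER:-10.10.10.10}"
RULE="${RULE:-100}"
HOLD="${HOLD:-60}"          # seconds to keep the "cut" in place (assumed value)

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

simulate_cut() {
    run ipfw add "$RULE" deny ip from "$PEER" to me  # start dropping the peer's packets
    run sleep "$HOLD"                                # keep the "cut" in place for a while
    run ipfw delete "$RULE"                          # remove the rule again
}
```

Calling `DRY_RUN=1 simulate_cut` only prints the three commands. Without
DRY_RUN it needs root and a kernel with ipfw available. Note that
`ipfw delete 100` removes every rule numbered 100, so an otherwise unused rule
number is the safer choice.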
>>>
>>> On Wed, Oct 12, 2016 at 10:51 AM, Donald Baud via freebsd-net <
>>> freebsd-net@freebsd.org> wrote:
>>>
>>> On 10/12/16 1:13 AM, Julian Elischer wrote:
>>>
>>> On 11/10/2016 8:56 PM, Donald Baud via freebsd-net wrote:
>>>
>>> I've been plagued with these =daily= panics until I tried
>>> the following recipes and the server has been up for 30
>>> days so far:
>>>
>>> Normally I should experiment more to see which one of the
>>> recipes is really the fix, but I'm just glad that the
>>> server is stable for now.
>>>
>>> This is really great information.
>>> It makes debugging a lot more possible.
>>> I know it is a hard question, but do you have a way to
>>> simulate this workload?
>>>
>>> I have no real way to simulate this kind of workload.
>>>
>>> Sadly, I don't have a way to simulate the workload, but I am very
>>> interested in helping to fix these crashes since, as Cassiano said, this
>>> makes mpd5/freebsd useless for pppoe/l2tp termination.
>>>
>>> At this point, I would suggest that Cassiano and Андрей confirm
>>> that they don't get panics when they apply the recipes that I am
>>> using.
>>>
>>> I am still running many other cisco-vpdn gateways that I would
>>> convert to mpd5/freebsd, but my plan was stalled by the daily
>>> crashes.
>>> I'll wait a couple of weeks to be sure that my recipes are a valid
>>> workaround before converting my remaining cisco gateways to mpd5.
>>>
>>> -Dbaud
>>>
>>> recipe-1: Don't let mpd5 start automatically when the server
>>> boots:
>>> i.e. in /etc/rc.conf:
>>> mpd5_enable="NO"
>>> and wait about 5 minutes after the server boots, then issue:
>>> /usr/local/etc/rc.d/mpd5 onestart
>>>
>>> recipe-2: recompile the kernel with the NETGRAPH_DEBUG
>>> option:
>>> options NETGRAPH
>>> options NETGRAPH_DEBUG
>>> options NETGRAPH_KSOCKET
>>> options NETGRAPH_L2TP
>>> options NETGRAPH_SOCKET
>>> options NETGRAPH_TEE
>>> options NETGRAPH_VJC
>>> options NETGRAPH_PPP
>>> options NETGRAPH_IFACE
>>> options NETGRAPH_MPPC_COMPRESSION
>>> options NETGRAPH_MPPC_ENCRYPTION
>>> options NETGRAPH_TCPMSS
>>> options IPFIREWALL
>>>
>>> recipe-3: recompile the kernel and disable the IPv6 and
>>> SCTP options:
>>> nooptions INET6
>>> nooptions SCTP
>>>
>>> recipe-4: Don't use any of the sysctl optimizations;
>>> in other words, I commented out all values in sysctl.conf:
>>> # net.graph.maxdgram=20480 (this is the default)
>>> # net.graph.recvspace=20480 (this is the default)
>>>
>>> recipe-5: Don't use any of the loader.conf optimizations;
>>> in other words, I commented out all values in loader.conf:
>>> # net.graph.maxdata=4096 (this is the default)
>>> # net.graph.maxalloc=4096 (this is the default)
>>>
>>> ================================
>>> In my case, I had the panics with 10.3 and 11-PRERELEASE
>>> 11.0-PRERELEASE FreeBSD 11.0-PRERELEASE #2 r305587
>>>
>>> With those recipes, I have been running without any crash
>>> for a month and counting. That's 300 l2tp tunnels and
>>> 1400 l2tp sessions generating 700Mbit/s.
>>>
>>> -DBaud
>>>
>>> On Tuesday, October 11, 2016 7:30 AM, Cassiano Peixoto wrote:
>>> Hi,
>>>
>>> There are many users complaining about this:
>>>
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=186114
>>>
>>> I've been dealing with this issue for one year with no
>>> solution. mpd5 as a pppoe server on FreeBSD is useless with this bug.
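
Recipe-1 quoted above (keep mpd5 out of the normal boot sequence and start it
about 5 minutes later) could be scripted roughly as follows. This is a sketch
under assumptions: it expects mpd5_enable="NO" in /etc/rc.conf as in the
recipe, and the variable names DELAY and MPD5_RC and the function name
delayed_start are made up for illustration.

```sh
#!/bin/sh
# Sketch of recipe-1: start mpd5 a few minutes after boot instead of from rc.conf.
# Assumes mpd5_enable="NO" in /etc/rc.conf, as the recipe describes.
DELAY="${DELAY:-300}"                           # "about 5 minutes", in seconds
MPD5_RC="${MPD5_RC:-/usr/local/etc/rc.d/mpd5}"  # rc script path from the recipe

delayed_start() {
    sleep "$DELAY"          # wait for the rest of the system to settle
    "$MPD5_RC" onestart     # onestart works even though mpd5_enable="NO"
}
```

One possible way to use it (an assumption, not something from the thread) is
to call `delayed_start &` from a boot-time script such as /etc/rc.local, so
the wait does not hold up the rest of the boot.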
>>>
>>> I really would like to see it working again. I think it's
>>> quite important to both the project and many users.
>>>
>>> Thanks.
>>>
>>> On Tue, Oct 11, 2016 at 3:24 AM, Eugene Grosbein wrote:
>>>
>>> On 11.10.2016 11:02, Андрей Леушкин wrote:
>>>
>>> Hello. I have a problem with "FreeBSD nas
>>> 10.3-RELEASE FreeBSD 10.3-RELEASE
>>> #0: Fri Oct 7 21:12:56 YEKT 2016
>>> nas@nas:/usr/obj/usr/src/sys/nasv3
>>> amd64"
>>>
>>> The kernel panic repeats at intervals of 2-3 days.
>>> At first I thought that the problem was in the hardware, but the
>>> problem did not go away after replacing the server platform.
>>>
>>> Coredumps and more info at this link:
>>> https://drive.google.com/open?id=0BxciMy2q7ZjTTkIxem9wTE1tM2M
>>>
>>> Sorry for my English.
>>> I'll wait for an answer.
>>>
>>> This is a known and long-standing problem in the FreeBSD
>>> network stack.
>>> It shows up when you have lots of network interfaces
>>> created/removed frequently, as in your case of a Network Access
>>> Server (PPTP, PPPoE etc).
>>>
>>> Generally, people run into this problem using the mpd5
>>> network daemon.
>>> mpd5 uses the NETGRAPH kernel subsystem to process traffic, and
>>> if an interface disappears (e.g., a user disconnected)
>>> while the kernel is still processing traffic obtained from
>>> this interface, it panics.
>>>
>>> There have been lots of reports of this problem. No one
>>> seems to be working on it at the moment.
>>> You should file a PR using Bugzilla and attach your
>>> logs to it.
>>>
>>> Eugene Grosbein