From owner-freebsd-stable@freebsd.org Sun Jul 23 00:31:54 2017
Date: Sat, 22 Jul 2017 17:31:26 -0700
From: "G. Paul Ziemba" <paul@ziemba.us>
To: freebsd-stable@FreeBSD.org
Subject: Re: stable/11 r321349 crashing immediately
Message-ID: <20170723003126.GA83786@hairball.ziemba.us>
In-Reply-To: <201707222012.v6MKCT95070706@gw.catspoiler.org>
References: <201707222012.v6MKCT95070706@gw.catspoiler.org>
List-Id: Production branch of FreeBSD source code

On Sat, Jul 22, 2017 at 01:12:29PM -0700, Don Lewis wrote:
> On 21 Jul, G. Paul Ziemba wrote:
> >>Your best bet for a quick workaround for the stack overflow would be to
> >>rebuild the kernel with a larger value of KSTACK_PAGES.
> >>You can find
> >>the default in /usr/src/sys//conf/NOTES.

I bumped it from the default 4 to 5 in /boot/loader.conf:

    kern.kstack_pages=5

and that prevented this crash. Uptime 5.5 hours at this point (instead of
1.5 minutes).

So what's the down-side of increasing kstack_pages? What if I made it 10?
I see comments elsewhere about reducing space for user-mode threads, but
I'm not sure what that means in practical terms, or whether there is some
other overarching tuning parameter that should also be increased.

> Page size is 4096.

Ah, I forgot to count the 2^0 bit.

> It's interesting that you are running into this on amd64. Usually i386
> is the problem child.

Maybe stack frames are bigger due to 64-bit variables? (And of course we
get paid mostly for adding code, not so much for removing it.)

> >>It would probably be a good idea to compute the differences in the stack
> >>pointer values between adjacent stack frames to see if any of them are
> >>consuming an excessive amount of stack space.

For our collective amusement, I noted the stack pointer for each frame and
calculated frame size and cumulative stack consumption.
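In Python terms, the per-frame calculation is just adjacent stack-pointer
subtraction (the stack grows downward, so a callee's frame size is the
caller's SP minus the callee's SP). A minimal sketch using a few values
from the trace below; the frame names and pointers come from the
backtrace, everything else is illustrative:

```python
# Frames 41..38 from the backtrace: (function, stack pointer).
frames = [
    ("kern_execve", 0xFFFFFE085CFA87C0),
    ("do_execve",   0xFFFFFE085CFA8090),
    ("namei",       0xFFFFFE085CFA7EC0),
    ("lookup",      0xFFFFFE085CFA7D40),
]

# Each callee's frame size = caller SP - callee SP (stack grows down).
sizes = []
for (_caller, caller_sp), (callee, callee_sp) in zip(frames, frames[1:]):
    sizes.append(caller_sp - callee_sp)

for (name, _sp), sz in zip(frames[1:], sizes):
    print(f"{name:12s} frame size {sz:#x}")
```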
If there is some other stack overhead not shown in the trace, I can see it
going over 0x4000:

Frame  Stack Pointer         sz   cumu  function
-----  ------------------   ---   ----  ----------------
  44   0xfffffe085cfa8a10               amd64_syscall
  43   0xfffffe085cfa88b0   160    160  syscallenter
  42   0xfffffe085cfa87f0   220    180  sys_execve
  41   0xfffffe085cfa87c0    30    1B0  kern_execve
  40   0xfffffe085cfa8090   730    8E0  do_execve
  39   0xfffffe085cfa7ec0   1D0    AB0  namei
  38   0xfffffe085cfa7d40   180    C30  lookup
  37   0xfffffe085cfa7cf0    50    C80  VOP_LOOKUP
  36   0xfffffe085cfa7c80    70    CF0  VOP_LOOKUP_APV
  35   0xfffffe085cfa7650   630   1320  nfs_lookup
  34   0xfffffe085cfa75f0    60   1380  VOP_ACCESS
  33   0xfffffe085cfa7580    70   13F0  VOP_ACCESS_APV
  32   0xfffffe085cfa7410   170   1560  nfs_access
  31   0xfffffe085cfa7240   1D0   1730  nfs34_access_otw
  30   0xfffffe085cfa7060   1E0   1910  nfsrpc_accessrpc
  29   0xfffffe085cfa6fb0    B0   19C0  nfscl_request
  28   0xfffffe085cfa6b20   490   1E50  newnfs_request
  27   0xfffffe085cfa6980   1A0   1FF0  clnt_reconnect_call
  26   0xfffffe085cfa6520   460   2450  clnt_vc_call
  25   0xfffffe085cfa64c0    60   24B0  sosend
  24   0xfffffe085cfa6280   240   26F0  sosend_generic
  23   0xfffffe085cfa6110   170   2860  tcp_usr_send
  22   0xfffffe085cfa5ca0   470   2CD0  tcp_output
  21   0xfffffe085cfa5900   3A0   3070  ip_output
  20   0xfffffe085cfa5880    80   30F0  looutput
  19   0xfffffe085cfa5800    80   3170  if_simloop
  18   0xfffffe085cfa57d0    30   31A0  netisr_queue
  17   0xfffffe085cfa5780    50   31F0  netisr_queue_src
  16   0xfffffe085cfa56f0    90   3280  netisr_queue_internal
  15   0xfffffe085cfa56a0    50   32D0  swi_sched
  14   0xfffffe085cfa5620    80   3350  intr_event_schedule_thread
  13   0xfffffe085cfa55b0    70   33C0  sched_add
  12   0xfffffe085cfa5490   120   34E0  sched_pickcpu
  11   0xfffffe085cfa5420    70   3550  sched_lowest
  10   0xfffffe085cfa52a0   180   36D0  cpu_search_lowest
   9   0xfffffe085cfa52a0     0   36D0  cpu_search
   8   0xfffffe085cfa5120   180   3850  cpu_search_lowest
   7   0xfffffe085cfa5120     0   3850  cpu_search
   6   0xfffffe085cfa4fa0   180   39D0  cpu_search_lowest
   5   0xfffffe0839778f40               signal handler

-- 
G. Paul Ziemba
FreeBSD unix:
 4:36PM  up  5:28, 8 users, load averages: 6.53, 7.79, 7.94
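P.S. For anyone following along, the relationship between
kern.kstack_pages and that 0x4000 ceiling is just pages times page size.
A minimal sketch (the kstack_bytes helper is hypothetical, not a kernel
API), assuming the 4096-byte page size mentioned above:

```python
# amd64 page size, per the "Page size is 4096" comment above.
PAGE_SIZE = 4096

def kstack_bytes(pages: int) -> int:
    """Kernel stack size for a given kern.kstack_pages setting."""
    return pages * PAGE_SIZE

print(hex(kstack_bytes(4)))  # default 4 pages: 0x4000, the ceiling overrun here
print(hex(kstack_bytes(5)))  # kern.kstack_pages=5: 0x5000
```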