From owner-freebsd-stable@freebsd.org Mon Jul 17 09:54:18 2017
Date: Mon, 17 Jul 2017 11:54:06 +0200
From: "Vlad K."
To: freebsd-stable@freebsd.org
Subject: stack_guard hardening bsdinstall option in STABLE and 11.1
Organization: Acheron Media
List-Id: Production branch of FreeBSD source code

Hello list,

the stack_guard hardening option in bsdinstall is now setting 512 pages of it in CURRENT, as of r320674. It's said to MFC after 1 day (on Jul 5th), but STABLE hasn't got it yet.
Is this simply an omission (understandable, as the RELEASE is being prepared so things are a bit hectic I guess), or is there another reason?

Can we assume that in 11.1 the sysctl is an integer, and can we safely set it to more than 1 page, say 512 like the installer in CURRENT suggests?

Thanks!

--
Vlad K.

From owner-freebsd-stable@freebsd.org Mon Jul 17 10:25:07 2017
Date: Mon, 17 Jul 2017 13:24:59 +0300
From: Konstantin Belousov
To: "Vlad K."
Cc: freebsd-stable@freebsd.org
Subject: Re: stack_guard hardening bsdinstall option in STABLE and 11.1
Message-ID: <20170717102459.GJ1935@kib.kiev.ua>

On Mon, Jul 17, 2017 at 11:54:06AM +0200, Vlad K. wrote:
> Hello list,
>
> the stack_guard hardening option in bsdinstall is now setting 512 pages
> of it in CURRENT, as of r320674. It's said to MFC after 1 day (on Jul
> 5th), but STABLE hasn't got it yet. Is this simply an omission
> (understandable as the RELEASE is being prepared so things are a bit
> hectic I guess), or is there another reason?
>
> Can we assume that in 11.1 the sysctl is integer and can we safely set
> >1 number of pages, say 512 like the installer in CURRENT suggests?

Default stack size on 32bit platforms is 2M. I leave it to you as an exercise to guess what happens with the setting applied. For 64bit machines, default stack size is 4M, so there the failure mode is somewhat more involved.

Anyway, this option is almost equivalent to executing 'rm /lib/libthr.so.3'; perhaps rm is even better.

SECURITY ! HARDENING !
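The exercise in the reply above comes down to simple arithmetic, which can be sketched as follows. This is only an illustration under stated assumptions: a 4 KiB page size and a guard region that is carved out of the stack rather than added on top of it are both assumed here, not stated in the message, and `usable_stack` is a hypothetical helper, not part of any FreeBSD tool. The stack sizes and the 512-page figure are the ones quoted in the thread.

```python
# Rough arithmetic sketch, not FreeBSD code.
PAGE_SIZE = 4096    # assumption: 4 KiB pages
GUARD_PAGES = 512   # value the CURRENT installer sets, per the thread

# Default stack sizes as quoted in the reply above.
DEFAULT_STACK = {
    "32-bit": 2 * 1024 * 1024,
    "64-bit": 4 * 1024 * 1024,
}


def usable_stack(stack_bytes, guard_pages, page_size=PAGE_SIZE):
    """Bytes left over, assuming the guard is taken out of the stack itself."""
    return max(stack_bytes - guard_pages * page_size, 0)


def report():
    """Usable stack per platform with GUARD_PAGES guard pages."""
    return {name: usable_stack(size, GUARD_PAGES)
            for name, size in DEFAULT_STACK.items()}


if __name__ == "__main__":
    for name, left in report().items():
        total = DEFAULT_STACK[name]
        print(f"{name}: {left // 1024} KiB usable of {total // 1024} KiB")
```

Under these assumptions, 512 pages is 2 MiB, so nothing is left of a 2M stack and only half of a 4M stack remains.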
From owner-freebsd-stable@freebsd.org Mon Jul 17 13:34:02 2017
Date: Mon, 17 Jul 2017 13:33:59 +0000
From: Glen Barber
To: "Vlad K."
Cc: freebsd-stable@freebsd.org
Subject: Re: stack_guard hardening bsdinstall option in STABLE and 11.1
Message-ID: <20170717133359.GP16843@FreeBSD.org>

On Mon, Jul 17, 2017 at 11:54:06AM +0200, Vlad K. wrote:
> Hello list,
>
> the stack_guard hardening option in bsdinstall is now setting 512 pages of
> it in CURRENT, as of r320674. It's said to MFC after 1 day (on Jul 5th), but
> STABLE hasn't got it yet. Is this simply an omission (understandable as the
> RELEASE is being prepared so things are a bit hectic I guess), or is there
> another reason?
>
> Can we assume that in 11.1 the sysctl is integer and can we safely set >1
> number of pages, say 512 like the installer in CURRENT suggests?
>

No, this is not available in the 11.1 installer.
Glen

From owner-freebsd-stable@freebsd.org Mon Jul 17 13:47:14 2017
Date: Mon, 17 Jul 2017 15:47:08 +0200
From: "Vlad K."
To: freebsd-stable@freebsd.org
Subject: Re: stack_guard hardening bsdinstall option in STABLE and 11.1
Organization: Acheron Media
In-Reply-To: <20170717133359.GP16843@FreeBSD.org>
References: <20170717133359.GP16843@FreeBSD.org>
Message-ID: <61f79801976fab6770471cd3e2359652@acheronmedia.com>

On 2017-07-17 15:33, Glen Barber wrote:
>
> No, this is not available in the 11.1 installer.
>
> Glen

Thanks, but that's why I asked why that is. r320674 said MFC after 1 day. Is it too late for 11.1-RELEASE, so it'll be applied to 11-STABLE, or is there another reason?

If it's too late, does that mean it's too late for the installer, but the new stack_guard code is there in STABLE and, I am guessing, will be part of 11.1, so we can assume the sysctl to be an integer (as opposed to the enable/disable semantics of the sysctl in 11.0)? In other words, is it safe to ramp up the gap size in 11.1?

--
Vlad K.
From owner-freebsd-stable@freebsd.org Mon Jul 17 14:11:17 2017
Date: Mon, 17 Jul 2017 14:11:15 +0000
From: Glen Barber
To: "Vlad K."
Cc: freebsd-stable@freebsd.org
Subject: Re: stack_guard hardening bsdinstall option in STABLE and 11.1
Message-ID: <20170717141115.GQ16843@FreeBSD.org>
References: <20170717133359.GP16843@FreeBSD.org> <61f79801976fab6770471cd3e2359652@acheronmedia.com>
In-Reply-To: <61f79801976fab6770471cd3e2359652@acheronmedia.com>

On Mon, Jul 17, 2017 at 03:47:08PM +0200, Vlad K. wrote:
> On 2017-07-17 15:33, Glen Barber wrote:
> >
> > No, this is not available in the 11.1 installer.
> >
>
> Thanks but that's why I asked why's that. r320674 said MFC after 1 day. Is
> it too late for 11.1-RELEASE, so it'll be applied to 11-STABLE, or is there
> another reason?
>
> If its' too late, does that mean it's too late for the installer, but the
> new stack_guard code is there in STABLE and I am guessing will be part of
> 11.1, so we can assume the sysctl to be an integer (as opposed to
> enable/disable semantics of the sysctl in 11.0)? In other words, is it safe
> to ramp up the gap size in 11.1?
>

kib gave feedback on this in an earlier reply (which I missed before replying myself).
Glen

From owner-freebsd-stable@freebsd.org Mon Jul 17 15:03:11 2017
Date: Mon, 17 Jul 2017 17:03:05 +0200
From: "Vlad K."
To: freebsd-stable@freebsd.org
Subject: Re: stack_guard hardening bsdinstall option in STABLE and 11.1
Organization: Acheron Media
In-Reply-To: <20170717141115.GQ16843@FreeBSD.org>
References: <20170717133359.GP16843@FreeBSD.org> <61f79801976fab6770471cd3e2359652@acheronmedia.com> <20170717141115.GQ16843@FreeBSD.org>
Message-ID: <54f7e918d4507b50e64ea766b2b04035@acheronmedia.com>

On 2017-07-17 16:11, Glen Barber wrote:
>
> kib gave feedback on this in an earlier reply (which I missed before
> replying myself).
>

Neither of which answered my questions, I'm sorry. My question was not about stack sizes in 32- or 64-bit installations, nor about the quality of the fix (if I parse the "rm libthr" comment correctly).

I simply asked if it's safe to assume the sysctl to be an integer in 11.1 (I'm guessing yes, looking at the commits to STABLE, but wanted to be sure), and I also asked why wasn't the bsdinstall-er option change MFC'd after 1 day, two weeks ago, whether it's by omission, simply ENOTIME, or something else...

--
Vlad K.
From owner-freebsd-stable@freebsd.org Mon Jul 17 22:09:34 2017
From: Mark Millard
Subject: Re: stack_guard hardening bsdinstall option in STABLE and 11.1
Message-Id: <047E43D8-9F99-4855-8AAC-882AFBC891C9@dsl-only.net>
Date: Mon, 17 Jul 2017 15:09:30 -0700
To: FreeBSD-STABLE Mailing List

Vlad K. vlad-fbsd at acheronmedia.com wrote on
Mon Jul 17 15:03:11 UTC 2017 :

> I also asked why wasn't the bsdinstall-er option change
> MFC'd after 1 day, two weeks ago, whether it's by omission, simply
> ENOTIME, or something else...

Given what Konstantin Belousov described (default stack space sizes and apparently guard pages eat into stack space instead of the overall space being bigger by the guard size), I think that would explain not moving from CURRENT: it was known to be a problem.

(Although I expect Konstantin Belousov's note here is the first public description of the problem's details.)

I agree that you did not get an answer for the other part:

> I simply asked if it's safe to assume the sysctl to be an integer in
> 11.1

I've not gone through any draft 11.1-release code to check.

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-stable@freebsd.org Mon Jul 17 23:01:23 2017
Date: Tue, 18 Jul 2017 01:01:16 +0200
From: Mark Martinec
To: freebsd-stable@freebsd.org
Cc: freebsd-hackers@freebsd.org
Subject: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
Organization: Jozef Stefan Institute

Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual freebsd-update upgrade method I ended up with a system which gets stuck while trying to attach the second set of disks.
This happened already after the first phase of the upgrade procedure (installing and rebooting with a new kernel).

The first set of disks (ada0 .. ada2) is attached successfully, as is cd0, but then when the first of the set of four (a regular spinning disk) on an LSI controller is to be attached, the boot procedure just gets stuck there:

kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
kernel: ada1: Command Queueing enabled
kernel: ada1: 305245MB (625142448 512 byte sectors)
kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0
kernel: ada2: ATA8-ACS SATA 3.x device
kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8
kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
kernel: ada2: Command Queueing enabled
kernel: ada2: 114473MB (234441648 512 byte sectors)
kernel: ada2: quirks=0x1<4K>
kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0

(stuck here, keyboard not responding, fans rising in pitch, presumably the CPU is spinning)

(instead of the normal continuation like:

kernel: da0: Fixed Direct Access SPC-4 SCSI device
kernel: da0: Serial Number ....
kernel: da0: 600.000MB/s transfers
kernel: da0: Command Queueing enabled
kernel: da0: 1907729MB (3907029168 512 byte sectors)
)

The controller for da0 .. da3 is an LSI:

kernel: mps0: port 0x4000-0x40ff mem 0xd1740000-0xd1743fff,0xd1300000-0xd133ffff irq 16 at device 0.0 on pci1
kernel: mps0: Firmware: 14.00.01.00, Driver: 21.02.00.00-fbsd
kernel: mps0: IOCCapabilities: 185c
[...]
kernel: mps0: SAS Address for SATA device = a4a4843003d0cf79
kernel: mps0: SAS Address from SATA device = a4a4843003d0cf79
kernel: mps0: SAS Address for SATA device = d3d48904eddff0d5
kernel: mps0: SAS Address from SATA device = d3d48904eddff0d5
[...]
kernel: mps0: SAS Address for SATA device = 2a021c07585c665b
kernel: mps0: SAS Address from SATA device = 2a021c07585c665b
kernel: mps0: SAS Address for SATA device = 2a021c0758637b7c
kernel: mps0: SAS Address from SATA device = 2a021c0758637b7c

This host in this configuration worked perfectly well with 11.0 and many older versions of the OS.

After some frustration I found out that the system boots fine if the boot loader option "Safe mode" is set. This way I successfully finished the upgrade procedure (installing world).

Playing with the loader options that "Safe mode" turns on (/boot/menu-commands.4th), it seems that kern.smp.disabled=1 is the crucial option, although my attempts at ruling out the remaining "Safe mode" options turned out inconclusive; perhaps there is some randomness or a race involved. Anyway, in "Safe mode" the machine always boots normally and attaches all disks.

This experience is much like the one described in:
  https://forums.freebsd.org/threads/56524/
where the poster ended up disabling SMP to be able to have a working host.

It is also somewhat similar to:
  https://lists.freebsd.org/pipermail/freebsd-hackers/2017-July/051258.html
where a FreeBSD 11.1 prerelease only boots on a single-CPU AWS host, but fails to boot on a 2-core CPU, with various symptoms, including:
  (https://lists.freebsd.org/pipermail/freebsd-hackers/2017-July/051260.html)

Feeding entropy: .
spin lock 0xffffffff80db45c0 (smp rendezvous) held by 0xfffff80004378560 (tid 100074) too long
timeout stopping cpus
panic: spin lock held too long

Please advise, thanks

  Mark

From owner-freebsd-stable@freebsd.org Mon Jul 17 23:17:45 2017
Date: Tue, 18 Jul 2017 01:17:38 +0200
From: "Vlad K."
To: freebsd-stable@freebsd.org Subject: Re: stack_guard hardening bsdinstall option in STABLE and 11.1 Organization: Acheron Media In-Reply-To: <047E43D8-9F99-4855-8AAC-882AFBC891C9@dsl-only.net> References: <047E43D8-9F99-4855-8AAC-882AFBC891C9@dsl-only.net> Message-ID: X-Sender: vlad-fbsd@acheronmedia.com User-Agent: Roundcube Webmail/1.2.5 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Jul 2017 23:17:45 -0000

On 2017-07-18 00:09, Mark Millard wrote:

> (Although I expect Konstantin Belousov's note here is
> the first public description of the problem's details.)

Thanks for explaining the problem. I guess this was the reason why I failed to parse kib's reply: this was the first bit of info I encountered on that patch being effectively "broken" that way.

> I agree that you did not get an answer for the other
> part:
>
> >> I simply asked if it's safe to assume the sysctl to be an integer in
> >> 11.1
>
> I've not gone through any draft 11.1-release code to
> check.

It appears to be; the code was MFC'd with (if I'm correct) r320666. I've run some tests in -RC3 and indeed it works. However, probably for the reason you explained above (the guard pages eating into the stack), raising stack_guard_pages sufficiently high (e.g. 512 pages, which the bsdinstall option in CURRENT defaults to) crashes threaded programs.

If that is so, though, I wonder why it's not reverted, or at least why the sysctl is not temporarily patched to remain boolean (or turned off completely). And the bsdinstall option in CURRENT now essentially enables buggy and unstable behavior. If this is a known issue, why default to it in CURRENT?

Anyway, thanks for taking the time to explain; this answers my questions.

-- 
Vlad K.
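
The remark above about guard pages eating into the stack can be illustrated with a little arithmetic. The following is only a rough sketch: the 4 KiB page size and the 2 MiB thread stack are assumptions chosen for illustration, not values taken from this thread, and the function names are made up.

```shell
#!/bin/sh
# Sketch only: compare the size of the stack guard with a thread stack.
# The page size (4096) and the 2 MiB stack used below are assumptions
# for illustration; they are not taken from the thread.

# guard_bytes PAGES PAGE_SIZE -> prints the number of bytes the guard covers
guard_bytes() {
    echo $(( $1 * $2 ))
}

# guard_fits PAGES PAGE_SIZE STACK_BYTES -> exit status 0 if the guard is
# strictly smaller than the stack, 1 otherwise
guard_fits() {
    [ "$(guard_bytes "$1" "$2")" -lt "$3" ]
}

stack=$(( 2 * 1024 * 1024 ))   # assumed thread stack size: 2 MiB
for pages in 1 512; do
    if guard_fits "$pages" 4096 "$stack"; then
        echo "$pages page(s): guard leaves usable stack"
    else
        echo "$pages page(s): guard covers the whole stack"
    fi
done
```

Under these assumptions a single guard page is negligible, while 512 pages add up to exactly 2 MiB, so a stack of that size would have no usable room left. Whether this is what actually happens to threaded programs depends on how the kernel places the guard, which the thread only describes indirectly.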
From owner-freebsd-stable@freebsd.org Mon Jul 17 23:23:48 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 65A14DA3F83 for ; Mon, 17 Jul 2017 23:23:48 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-qk0-x233.google.com (mail-qk0-x233.google.com [IPv6:2607:f8b0:400d:c09::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2650E76EC7 for ; Mon, 17 Jul 2017 23:23:48 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by mail-qk0-x233.google.com with SMTP id d136so3650679qkg.3 for ; Mon, 17 Jul 2017 16:23:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=Sx486zNzyUz9HiN67hzLTy64G5Qx1g1t5LEr/dBfWNE=; b=IthJmGg8lSGOKPwEoQD5q84EFkQaoxM6wQlfo9HJqWYvQABS2xbC+FM/ZWv/zSUKd5 VrDM5hfzMtkwFeTHayMvKxB9RDlhoWvGdEG/XqWpoxfTJ1FllAksDPOwW2DUllrrpwLZ e7rJpZbEp07bUoKbJOG+v1I8gd4/c7TdllEQvwL0OpzlRiRYO0lgmSykIYAVeiZm//nM flWigMUVtwxnGgRSj7AZqeqE0PTZ1V/V3fEYqiaDmr02aElsPiWiQCk5dCbtCBDihBLk q9D/ROjSxnHB7aJ0JvBWnhB0o8I9s9xL9dIml2WWECRbAE9YiiAA5rWgVigSIbyc2yI8 zVYw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to:user-agent; bh=Sx486zNzyUz9HiN67hzLTy64G5Qx1g1t5LEr/dBfWNE=; b=YQd9qbmJ3v/M/JCfu3AQiVRJloAaO46sKD63ioHLwbb/yq3a2eIJybYlK9a0ZWKN6d xT4ur3gzDpVGX4VFjpYRu4/7Eu/z+ivfrgrayf1nUBSxef1OhL6hOELh0GlGcaUAWctz RkYVpHfP2SDkan4Pkf4JlBOf+wppp7xMDLkJZKn85ACDa+kQk0LyGSsGgTQkdR9I80Zz 0QKruXlJzEuAiCQOsMrwLBynfDRi0YBQw+RPobp/qOogFny8kLwb67ZCykMuf/r9dvsm 
sFhApS83CBDFUFZGcHPTrFwrT4WQ7+KhQXF6nC92bJpCvjfhdhO9Qxf9wo/b3PU0JlsQ 2cJw== X-Gm-Message-State: AIVw113eD9eqmxIxorKy86L7inwY2QSyL+WzgKYgkHDI/E+ooBxNgFVA occBU2Bszs8xUg== X-Received: by 10.55.191.7 with SMTP id p7mr23976934qkf.223.1500333827168; Mon, 17 Jul 2017 16:23:47 -0700 (PDT) Received: from wkstn-mjohnston.west.isilon.com (c-76-104-201-218.hsd1.wa.comcast.net. [76.104.201.218]) by smtp.gmail.com with ESMTPSA id c5sm405281qkd.27.2017.07.17.16.23.46 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 17 Jul 2017 16:23:46 -0700 (PDT) Sender: Mark Johnston Date: Mon, 17 Jul 2017 16:24:34 -0700 From: Mark Johnston To: Mark Martinec Cc: freebsd-stable@freebsd.org Subject: Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching Message-ID: <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.8.3 (2017-05-23) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Jul 2017 23:23:48 -0000 On Tue, Jul 18, 2017 at 01:01:16AM +0200, Mark Martinec wrote: > Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual freebsd-update > upgrade > method I ended up with a system which gets stuck while trying to attach > the second set of disks. This happened already after the first phase of > the upgrade procedure (installing and re-booting with a new kernel). > > The first set of disks (ada0 .. 
ada2) are attached successfully, also a > cd0, but then when the first of the set of four (a regular spinning > disk) > on an LSI controller is to be attached, the boot procedure just gets > stuck there: > > kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes) > kernel: ada1: Command Queueing enabled > kernel: ada1: 305245MB (625142448 512 byte sectors) > kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0 > kernel: ada2: ATA8-ACS SATA 3.x device > kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8 > kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes) > kernel: ada2: Command Queueing enabled > kernel: ada2: 114473MB (234441648 512 byte sectors) > kernel: ada2: quirks=0x1<4K> > kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0 > > (stuck here, keyboard not responding, fans rising their pitch, > presumably CPU is spinning) Are you able to break into the debugger at this point? Try setting debug.kdb.break_to_debugger=1 and debug.kdb.alt_break_to_debugger=1 at the loader prompt, and hit the break key, or the key sequence ~ ctrl-b once the hang occurs. At the debugger prompt, try "bt" and "show allpcpu" to start. 
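
For reference, the two tunables named above can also be kept across reboots instead of being typed at the loader prompt each time. A minimal sketch, assuming the usual name="value" syntax of loader.conf; the function name is hypothetical and the file location in the comment is an assumption:

```shell
#!/bin/sh
# Sketch: print loader.conf lines for the two debugger tunables named in
# the message above. The function name is hypothetical.
kdb_loader_conf() {
    for name in debug.kdb.break_to_debugger debug.kdb.alt_break_to_debugger; do
        printf '%s="1"\n' "$name"
    done
}

kdb_loader_conf

# To persist the settings (assumption: the standard /boot/loader.conf
# location), one could run as root:
#   kdb_loader_conf >> /boot/loader.conf
```

The settings only help if the running kernel actually contains a debugger to break into; in the follow-ups to this message the kernel is rebuilt with options KDB and DDB before a break key has any effect.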
From owner-freebsd-stable@freebsd.org Tue Jul 18 16:43:23 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CDBCAD99779 for ; Tue, 18 Jul 2017 16:43:23 +0000 (UTC) (envelope-from stephane@dupille.org) Received: from mail.nospam.fr.eu.org (saloon.dalton-brothers.org [212.129.29.51]) by mx1.freebsd.org (Postfix) with ESMTP id 62D73746EF for ; Tue, 18 Jul 2017 16:43:20 +0000 (UTC) (envelope-from stephane@dupille.org) Received: from [192.168.1.25] (LStLambert-658-1-7-84.w193-248.abo.wanadoo.fr [193.248.42.84]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mail.nospam.fr.eu.org (Postfix) with ESMTPSA id 183BD103A for ; Tue, 18 Jul 2017 16:33:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dupille.org; s=default; t=1500395581; bh=U2Yxi5CUUYChLkR/YjscjdbUgzM5vRGQ5Vhq3Cas7kQ=; h=From:Subject:Date:To; b=gCDUXMoUbGU4/jDoLVTW3D9bW41XskJ6qBFu+CbfcFRqExTem2AGVLzYj3OGvC5Ug yVbxL8asTvML7B7Zz+g3LWa1+4iPqWz7RmWV0UnGuUGs6BCYbCtkcL72vRpudqZhMv T06hSm/hrl6PDUjonHD22Sj6Qqr3RWcHx+BgRolc= From: =?utf-8?Q?St=C3=A9phane_Dupille?= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Kernel Panic of 10.2-RELEASE Message-Id: Date: Tue, 18 Jul 2017 18:33:02 +0200 To: freebsd-stable@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) X-Mailer: Apple Mail (2.3124) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,SHORTCIRCUIT shortcircuit=ham autolearn=disabled version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on spamd X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Jul 2017 16:43:23 -0000 Hello, My 
server is running 10.2-RELEASE (yes, I need to upgrade it, but it works like a charm). Today, I launched this command, as root :
# zfs destroy -r zroot@attic
and the machine crashed :

Jul 18 18:09:40 penitencier syslogd: kernel boot file is /boot/kernel/kernel
Jul 18 18:09:40 penitencier kernel: vputx: negative ref count
Jul 18 18:09:40 penitencier kernel: 0xfffff8023037f000: tag zfs, type VDIR
Jul 18 18:09:40 penitencier kernel: usecount 0, writecount 0, refcount 0 mountedhere 0
Jul 18 18:09:40 penitencier kernel: flags (VI_FREE)
Jul 18 18:09:40 penitencier kernel: VI_LOCKed lock type zfs: EXCL by thread 0xfffff8014f242940 (pid 60698, zfs, tid 100747)
Jul 18 18:09:40 penitencier kernel: panic: vputx: negative ref cnt
Jul 18 18:09:40 penitencier kernel: cpuid = 1
Jul 18 18:09:40 penitencier kernel: KDB: stack backtrace:
Jul 18 18:09:40 penitencier kernel: #0 0xffffffff80984ef0 at kdb_backtrace+0x60
Jul 18 18:09:40 penitencier kernel: #1 0xffffffff80948aa6 at vpanic+0x126
Jul 18 18:09:40 penitencier kernel: #2 0xffffffff80948973 at panic+0x43
Jul 18 18:09:40 penitencier kernel: #3 0xffffffff809eb7d5 at vputx+0x2d5
Jul 18 18:09:40 penitencier kernel: #4 0xffffffff809e4f59 at dounmount+0x689
Jul 18 18:09:40 penitencier kernel: #5 0xffffffff81a5fdd4 at zfs_unmount_snap+0x114
Jul 18 18:09:40 penitencier kernel: #6 0xffffffff81a62fc1 at zfs_ioc_destroy_snaps+0xc1
Jul 18 18:09:40 penitencier kernel: #7 0xffffffff81a61ae0 at zfsdev_ioctl+0x5f0
Jul 18 18:09:40 penitencier kernel: #8 0xffffffff80830019 at devfs_ioctl_f+0x139
Jul 18 18:09:40 penitencier kernel: #9 0xffffffff8099cde5 at kern_ioctl+0x255
Jul 18 18:09:40 penitencier kernel: #10 0xffffffff8099cae0 at sys_ioctl+0x140
Jul 18 18:09:40 penitencier kernel: #11 0xffffffff80d4b3e7 at amd64_syscall+0x357
Jul 18 18:09:40 penitencier kernel: #12 0xffffffff80d30acb at Xfast_syscall+0xfb
Jul 18 18:09:40 penitencier kernel: Uptime: 5d6h0m11s

This is all I found in logs.
I have only a remote access to this machine so I have no clue of what was printed on console.

I use zfs on top of geom_eli.

Here is a uname -v :
FreeBSD penitencier.dalton-brothers.org 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

After rebooting, the machine works well, as far as I can see :
root@penitencier:/var/log # zpool status
  pool: zboot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Nov 12 11:20:33 2014
config:

        NAME           STATE     READ WRITE CKSUM
        zboot          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            gpt/boot0  ONLINE       0     0     0
            gpt/boot1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: resilvered 6,56M in 0h0m with 0 errors on Tue Jul 18 18:13:23 2017
config:

        NAME           STATE     READ WRITE CKSUM
        zroot          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            da0p4.eli  ONLINE       0     0     0
            da1p4.eli  ONLINE       0     0     0

errors: No known data errors

(the pool has been resilvered because I boot once, but put a wrong passphrase in geli for one of the two drives, so it booted with only one disk)

What should I do now ? launch a zfs scrub ? I’m a bit afraid of making it panic again. Should I consider that I got unlucky once ?
(please don’t tell me to upgrade it : I’m currently trying to install a new server, and I will migrate to it very soon).

Thanks.
From owner-freebsd-stable@freebsd.org Tue Jul 18 16:54:41 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 021F1D99E19 for ; Tue, 18 Jul 2017 16:54:41 +0000 (UTC) (envelope-from daniel@byte.nl) Received: from mail-out.s1.byte.nl (mail-out.s1.byte.nl [46.21.235.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A634F74F17 for ; Tue, 18 Jul 2017 16:54:40 +0000 (UTC) (envelope-from daniel@byte.nl) Received: from localhost (localhost [127.0.0.1]) by mail-out.s1.byte.nl (Postfix) with ESMTP id 20B9A80AFE; Tue, 18 Jul 2017 18:47:50 +0200 (CEST) Received: from mail-out.s1.byte.nl ([127.0.0.1]) by localhost (mail-out6.c1.internal [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gcmIzcYHcOyB; Tue, 18 Jul 2017 18:47:47 +0200 (CEST) Received: from [100.77.135.32] (unknown [89.200.5.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) (Authenticated sender: byte0030) by mail-out.s1.byte.nl (Postfix) with ESMTPSA id 1758080BDE; Tue, 18 Jul 2017 18:47:47 +0200 (CEST) Date: Tue, 18 Jul 2017 18:47:44 +0200 User-Agent: K-9 Mail for Android In-Reply-To: References: MIME-Version: 1.0 Subject: Re: Kernel Panic of 10.2-RELEASE To: =?ISO-8859-1?Q?St=E9phane_Dupille?= , =?ISO-8859-1?Q?St=E9phane_Dupille_via_freebsd-stable?= , freebsd-stable@freebsd.org From: Daniel Genis Message-ID: <651B3F5F-0856-44B0-A31D-3B5BA4E81F41@byte.nl> X-Byte-Mail-Received-Via: smtp-auth X-Byte-SASL-User: byte0030 X-Byte-Domain-ID: 1 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Jul 2017 16:54:41 -0000

Hello,

Take a look at this commit: https://github.com/freebsd/freebsd/commit/d99ba5c
It might be the issue you're encountering.

With kind regards,
Daniel

On 18 July 2017 18:33:02 CEST, "Stéphane Dupille via freebsd-stable" wrote:
>Hello,
>
>My server is running 10.2-RELEASE (yes, I need to upgrade it, but it
>works like a charm). Today, I launched this command, as root :
># zfs destroy -r zroot@attic
>and the machine crashed :
>
>Jul 18 18:09:40 penitencier syslogd: kernel boot file is
>/boot/kernel/kernel
>Jul 18 18:09:40 penitencier kernel: vputx: negative ref count
>Jul 18 18:09:40 penitencier kernel: 0xfffff8023037f000: tag zfs, type
>VDIR
>Jul 18 18:09:40 penitencier kernel: usecount 0, writecount 0, refcount
>0 mountedhere 0
>Jul 18 18:09:40 penitencier kernel: flags (VI_FREE)
>Jul 18 18:09:40 penitencier kernel: VI_LOCKed lock type zfs: EXCL by
>thread 0xfffff8014f242940 (pid 60698, zfs, tid 100747)
>Jul 18 18:09:40 penitencier kernel: panic: vputx: negative ref cnt
>Jul 18 18:09:40 penitencier kernel: cpuid = 1
>Jul 18 18:09:40 penitencier kernel: KDB: stack backtrace:
>Jul 18 18:09:40 penitencier kernel: #0 0xffffffff80984ef0 at
>kdb_backtrace+0x60
>Jul 18 18:09:40 penitencier kernel: #1 0xffffffff80948aa6 at
>vpanic+0x126
>Jul 18 18:09:40 penitencier kernel: #2 0xffffffff80948973 at panic+0x43
>Jul 18 18:09:40 penitencier kernel: #3 0xffffffff809eb7d5 at
>vputx+0x2d5
>Jul 18 18:09:40 penitencier kernel: #4 0xffffffff809e4f59 at
>dounmount+0x689
>Jul 18 18:09:40 penitencier kernel: #5 0xffffffff81a5fdd4 at
>zfs_unmount_snap+0x114
>Jul 18 18:09:40 penitencier kernel: #6 0xffffffff81a62fc1 at
>zfs_ioc_destroy_snaps+0xc1
>Jul 18 18:09:40 penitencier kernel: #7 0xffffffff81a61ae0 at
>zfsdev_ioctl+0x5f0
>Jul 18 18:09:40 penitencier kernel: #8 0xffffffff80830019 at
>devfs_ioctl_f+0x139
>Jul 18 18:09:40 penitencier kernel: #9 0xffffffff8099cde5 at
>kern_ioctl+0x255
>Jul 18 18:09:40 penitencier kernel: #10 0xffffffff8099cae0 at
>sys_ioctl+0x140
>Jul 18 18:09:40 penitencier kernel: #11 0xffffffff80d4b3e7 at
>amd64_syscall+0x357
>Jul 18 18:09:40 penitencier kernel: #12 0xffffffff80d30acb at
>Xfast_syscall+0xfb
>Jul 18 18:09:40 penitencier kernel: Uptime: 5d6h0m11s
>
>This is all I found in logs. I have only a remote access to this
>machine so I have no clue of what was printed on console.
>
>I use zfs on top of geom_eli.
>
>Here is a uname -v :
>FreeBSD penitencier.dalton-brothers.org 10.2-RELEASE-p9 FreeBSD
>10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016
>root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
>
>After rebooting, the machine works well, as far as I can see :
>root@penitencier:/var/log # zpool status
>  pool: zboot
> state: ONLINE
>scan: scrub repaired 0 in 0h0m with 0 errors on Wed Nov 12 11:20:33
>2014
>config:
>
>        NAME           STATE     READ WRITE CKSUM
>        zboot          ONLINE       0     0     0
>          mirror-0     ONLINE       0     0     0
>            gpt/boot0  ONLINE       0     0     0
>            gpt/boot1  ONLINE       0     0     0
>
>errors: No known data errors
>
>  pool: zroot
> state: ONLINE
>scan: resilvered 6,56M in 0h0m with 0 errors on Tue Jul 18 18:13:23
>2017
>config:
>
>        NAME           STATE     READ WRITE CKSUM
>        zroot          ONLINE       0     0     0
>          mirror-0     ONLINE       0     0     0
>            da0p4.eli  ONLINE       0     0     0
>            da1p4.eli  ONLINE       0     0     0
>
>errors: No known data errors
>
>
>(the pool has been resilvered because I boot once, but put a wrong
>passphrase in geli for one of the two drives, so it booted with only
>one disk)
>
>What should I do now ? launch a zfs scrub ? I’m a bit afraid of making
>it panic again. Should I consider that I got unlucky once ?
>(please don’t tell me to upgrade it : I’m currently trying to install a
>new server, and I will migrate to it very soon).
>
>Thanks.
>
>_______________________________________________
>freebsd-stable@freebsd.org mailing list
>https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>To unsubscribe, send any mail to
>"freebsd-stable-unsubscribe@freebsd.org"

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
From owner-freebsd-stable@freebsd.org Tue Jul 18 22:57:01 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1B6C3DA1C74 for ; Tue, 18 Jul 2017 22:57:01 +0000 (UTC) (envelope-from stephane@dupille.org) Received: from mail.nospam.fr.eu.org (saloon.dalton-brothers.org [212.129.29.51]) by mx1.freebsd.org (Postfix) with ESMTP id D488B2B8B for ; Tue, 18 Jul 2017 22:57:00 +0000 (UTC) (envelope-from stephane@dupille.org) Received: from [192.168.1.25] (LStLambert-658-1-7-84.w193-248.abo.wanadoo.fr [193.248.42.84]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mail.nospam.fr.eu.org (Postfix) with ESMTPSA id 42CD11065; Tue, 18 Jul 2017 22:56:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dupille.org; s=default; t=1500418587; bh=7Zl3NWVPtHiL5RH+xDaQ8sVTiWk4Nx2KUIl4QbGsgm0=; h=Subject:From:In-Reply-To:Date:Cc:References:To; b=aiQO1q6riDsXKJHF9Ne02IPe/Aiwk/QvRjLPEk/T+OihBhj3yWjfTYMHNZJ/ifD9j Zf2jeNNWpCGjt+MZvyMV7tbXx2fiipToBORbWW1zeNV45EMD3DL+8SBRW0w86svWDr PTGnfuSWhPIxYN14ghi6KONN42FaEWSq5sBYoG88= Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: Kernel Panic of 10.2-RELEASE From: =?utf-8?Q?St=C3=A9phane_Dupille?= In-Reply-To: <651B3F5F-0856-44B0-A31D-3B5BA4E81F41@byte.nl> Date: Wed, 19 Jul 2017 00:56:27 +0200 Cc: 
=?utf-8?Q?St=C3=A9phane_Dupille_via_freebsd-stable?= Content-Transfer-Encoding: quoted-printable Message-Id: References: <651B3F5F-0856-44B0-A31D-3B5BA4E81F41@byte.nl> To: Daniel Genis X-Mailer: Apple Mail (2.3124) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,SHORTCIRCUIT shortcircuit=ham autolearn=disabled version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on spamd X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Jul 2017 22:57:01 -0000

> On 18 Jul 2017, at 18:47, Daniel Genis wrote:
>
> Hello,

Hello,

> Take a look at this commit: https://github.com/freebsd/freebsd/commit/d99ba5c
> It might be the issue you're encountering.

Yes, it is. Here's the corresponding PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207464

If I understand the comments correctly, we have the same issue in 10.3 as well. So the solution is to avoid destroying snapshots, or to upgrade to 11.0. Or to patch the kernel myself.

Thanks.
From owner-freebsd-stable@freebsd.org Tue Jul 18 23:18:58 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 54C83DA224B for ; Tue, 18 Jul 2017 23:18:58 +0000 (UTC) (envelope-from Mark.Martinec+freebsd@ijs.si) Received: from mail.ijs.si (mail.ijs.si [IPv6:2001:1470:ff80::25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E05DD35F2; Tue, 18 Jul 2017 23:18:57 +0000 (UTC) (envelope-from Mark.Martinec+freebsd@ijs.si) Received: from amavis-ori.ijs.si (localhost [IPv6:::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.ijs.si (Postfix) with ESMTPS id 3xBx1V1Sjpzjx; Wed, 19 Jul 2017 01:18:54 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ijs.si; h= user-agent:message-id:references:in-reply-to:organization :subject:subject:from:from:date:date:content-transfer-encoding :content-type:content-type:mime-version:received:received :received:received; s=jakla4; t=1500419930; x=1503011931; bh=zc/ mUQK8FCWRtJEBuJYtsOA4Ad+NDcW9nuNAPMJgfwM=; b=WZ70jUsDwmzpWpDOxqi zUsAPSMhpJh/D6pvW9YlNUij5rAljpbprtfDwm0fOTAmv6c3Gmbzjy1OsQcyj6He ARkwdp4xLuKBrfXbENHjxH/LWQ3z/Fkg4N8HT+4rNcZNN38OVE3fguLE+QUFWK/i POaPX+6qz6trOW7U7Sx+avY8= X-Virus-Scanned: amavisd-new at ijs.si Received: from mail.ijs.si ([IPv6:::1]) by amavis-ori.ijs.si (mail.ijs.si [IPv6:::1]) (amavisd-new, port 10026) with LMTP id WY3jZoXr4A9Q; Wed, 19 Jul 2017 01:18:50 +0200 (CEST) Received: from mildred.ijs.si (mailbox.ijs.si [IPv6:2001:1470:ff80::143:1]) by mail.ijs.si (Postfix) with ESMTP id 3xBx1Q3Zg3zjk; Wed, 19 Jul 2017 01:18:50 +0200 (CEST) Received: from nabiralnik.ijs.si (nabiralnik.ijs.si [IPv6:2001:1470:ff80::80:16]) by mildred.ijs.si (Postfix) with ESMTP id 3xBx1Q3Jgfz14X; Wed, 19 
Jul 2017 01:18:50 +0200 (CEST) Received: from sleepy.ijs.si (2001:1470:ff80:e001::76) by nabiralnik.ijs.si with HTTP (HTTP/2.0 POST); Wed, 19 Jul 2017 01:18:50 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Wed, 19 Jul 2017 01:18:50 +0200 From: Mark Martinec To: freebsd-stable@freebsd.org Cc: Mark Johnston Subject: Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching Organization: Jozef Stefan Institute In-Reply-To: <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com> References: <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com> Message-ID: X-Sender: Mark.Martinec+freebsd@ijs.si User-Agent: Roundcube Webmail/1.2.4 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Jul 2017 23:18:58 -0000 2017-07-18 01:24, Mark Johnston wrote: > Are you able to break into the debugger at this point? Try setting > debug.kdb.break_to_debugger=1 and debug.kdb.alt_break_to_debugger=1 at > the loader prompt, and hit the break key, or the key sequence > ~ ctrl-b once the hang occurs. At the debugger prompt, try > "bt" and "show allpcpu" to start. Thank you for a prompt and good suggestion! I spent an afternoon fiddling with the machine, with mixed results. Your suggestion to break into debugger did not work, there was no reaction to or to ~ ctrl-b. So I embarked on rebuilding the RC3 kernel with options KDB options DDB options BREAK_TO_DEBUGGER options ALT_BREAK_TO_DEBUGGER options INVARIANTS options INVARIANT_SUPPORT options WITNESS options WITNESS_SKIPSPIN but then I realized the key is mapped-to by: alt ctrl , which now does break into debugger - but not so early where the holdup occurs. The WITNESS produced some LOR warnings, but that is probably ok. 
I came across a trace just before the problem area, but it flows by so fast on a vt console and only the last 40 or so lines remain on the screen (I have a photo), which do not look like revealing much. Unfortunately this machine does not have a serial interface. So in my last attempt I rebuilt a kernel with INVARIANTS but without WITNESS - and now I cannot reproduce the problem, with or without a "safe mode". What is interesting here that now the da0..da3 disks are attached first, and only then the ada disks - and even within the group of disks on the same controller their order has been shuffled - no idea what could have caused it - and it may have avoided the problem by doing so. Will play some more with this tomorrow... Mark > On Tue, Jul 18, 2017 at 01:01:16AM +0200, Mark Martinec wrote: >> Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual freebsd-update >> upgrade >> method I ended up with a system which gets stuck while trying to >> attach >> the second set of disks. This happened already after the first phase >> of >> the upgrade procedure (installing and re-booting with a new kernel). >> >> The first set of disks (ada0 .. 
ada2) are attached successfully, also >> a >> cd0, but then when the first of the set of four (a regular spinning >> disk) >> on an LSI controller is to be attached, the boot procedure just gets >> stuck there: >> kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes) >> kernel: ada1: Command Queueing enabled >> kernel: ada1: 305245MB (625142448 512 byte sectors) >> kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0 >> kernel: ada2: ATA8-ACS SATA 3.x device >> kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8 >> kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes) >> kernel: ada2: Command Queueing enabled >> kernel: ada2: 114473MB (234441648 512 byte sectors) >> kernel: ada2: quirks=0x1<4K> >> kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0 >> >> (stuck here, keyboard not responding, fans rising their pitch, >> presumably CPU is spinning) [...] From owner-freebsd-stable@freebsd.org Wed Jul 19 23:46:40 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C45E2D9A0FE for ; Wed, 19 Jul 2017 23:46:40 +0000 (UTC) (envelope-from Mark.Martinec+freebsd@ijs.si) Received: from mail.ijs.si (mail.ijs.si [IPv6:2001:1470:ff80::25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 66EA971710 for ; Wed, 19 Jul 2017 23:46:40 +0000 (UTC) (envelope-from Mark.Martinec+freebsd@ijs.si) Received: from amavis-ori.ijs.si (localhost [IPv6:::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.ijs.si (Postfix) with ESMTPS id 3xCYb137Fcz1Vx for ; Thu, 20 Jul 2017 01:46:37 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ijs.si; h= user-agent:message-id:references:in-reply-to:organization 
:subject:subject:from:from:date:date:content-transfer-encoding :content-type:content-type:mime-version:received:received :received:received; s=jakla4; t=1500507993; x=1503099994; bh=fIh 5qigYWYLRxq1TZDkeJYWt9HaQNCUVDTlPaNWM3TU=; b=IvULuadbQS2ROwwzjNJ F7ZtT4vpODcjqdslaY3gw3V6nVB57AQVIYz9HYVs/xcYj8nKBUZNKD9oCoRmF1m7 XZuRCeolElPfWfiiT2D6YaSQBZLZ4blNU0yfcBG8YeT3ObCh/FZgdDjpUsS2n4V0 O6WkuiCjzC8y5AS0ymtrdpbM= X-Virus-Scanned: amavisd-new at ijs.si Received: from mail.ijs.si ([IPv6:::1]) by amavis-ori.ijs.si (mail.ijs.si [IPv6:::1]) (amavisd-new, port 10026) with LMTP id A4OLfgqqBoyK for ; Thu, 20 Jul 2017 01:46:33 +0200 (CEST) Received: from mildred.ijs.si (mailbox.ijs.si [IPv6:2001:1470:ff80::143:1]) by mail.ijs.si (Postfix) with ESMTP id 3xCYZx5n2Cz1Vw for ; Thu, 20 Jul 2017 01:46:33 +0200 (CEST) Received: from nabiralnik.ijs.si (nabiralnik.ijs.si [IPv6:2001:1470:ff80::80:16]) by mildred.ijs.si (Postfix) with ESMTP id 3xCYZx5YHjz6s for ; Thu, 20 Jul 2017 01:46:33 +0200 (CEST) Received: from www-proxy.ijs.si (2001:1470:ff80::3128:1) by webmail.ijs.si with HTTP (HTTP/1.1 POST); Thu, 20 Jul 2017 01:46:33 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Thu, 20 Jul 2017 01:46:33 +0200 From: Mark Martinec To: freebsd-stable@freebsd.org Subject: Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching Organization: Jozef Stefan Institute In-Reply-To: References: <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com> Message-ID: <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si> X-Sender: Mark.Martinec+freebsd@ijs.si User-Agent: Roundcube Webmail/1.2.4 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Jul 2017 23:46:40 -0000 More news on the matter. 
As reported yesterday, the locally built kernel with options INVARIANTS and DDB works fine and somehow avoids the trouble when attaching the da (mps) disks on an LSI controller. So today I wanted to get back to a reproducible hang, and sure enough, reverting to the generic kernel as distributed brings back the hang.

So I tried rebuilding the kernel while experimenting with options like DDB and INVARIANTS.

A locally built GENERIC kernel behaves the same as the original kernel from the distribution (as installed by freebsd-update), so no surprises there. It hangs trying to attach the first of the da disks (after first successfully attaching all the ada disks). The alt ctrl esc key combination is unable to enter the debugger when the hang occurs (possibly due to an unresponsive USB keyboard at that time), even though debug.kdb.break_to_debugger was set to 1 at the loader prompt. It needs the loader "Safe mode" to be able to boot.

Next, a locally built kernel with DDB and INVARIANTS works well (the remaining options come from an included GENERIC).

Now the funny part: a locally built kernel with just the DDB option (and the rest included from GENERIC) *also* works well. Somehow the DDB option makes a difference, even though the kernel debugger is never activated.

To re-assert: at the time of a hang the CPU fan starts revving up, and the USB keyboard is unresponsive (scroll mode cannot be entered, caps lock and num lock do not toggle their LED indicators, alt ctrl esc does not activate the kernel debugger). Loader "Safe mode" avoids the problem (presumably by disabling SMP).

Meanwhile I have successfully upgraded two other similar hosts from 11.0 to 11.1-RC3, no surprises there (but they do not have the same disk controller).

Not sure what to try next.

Mark

2017-07-19 01:18, Mark Martinec wrote:
> 2017-07-18 01:24, Mark Johnston wrote:
>> Are you able to break into the debugger at this point?
Try setting >> debug.kdb.break_to_debugger=1 and debug.kdb.alt_break_to_debugger=1 at >> the loader prompt, and hit the break key, or the key sequence >> ~ ctrl-b once the hang occurs. At the debugger prompt, try >> "bt" and "show allpcpu" to start. > > Thank you for a prompt and good suggestion! I spent an afternoon > fiddling with the machine, with mixed results. Your suggestion to > break into debugger did not work, there was no reaction to > or to ~ ctrl-b. > > So I embarked on rebuilding the RC3 kernel with > options KDB > options DDB > options BREAK_TO_DEBUGGER > options ALT_BREAK_TO_DEBUGGER > options INVARIANTS > options INVARIANT_SUPPORT > options WITNESS > options WITNESS_SKIPSPIN > but then I realized the key is mapped-to by: alt ctrl , > which now does break into debugger - but not so early where the > holdup occurs. > > The WITNESS produced some LOR warnings, but that is probably ok. > I came across a trace just before the problem area, but it flows > by so fast on a vt console and only the last 40 or so lines > remain on the screen (I have a photo), which do not look like > revealing much. Unfortunately this machine does not have a serial > interface. > > So in my last attempt I rebuilt a kernel with INVARIANTS but > without WITNESS - and now I cannot reproduce the problem, with > or without a "safe mode". What is interesting here that now > the da0..da3 disks are attached first, and only then the ada > disks - and even within the group of disks on the same > controller their order has been shuffled - no idea what could > have caused it - and it may have avoided the problem by doing so. > > Will play some more with this tomorrow... > > Mark > > >> On Tue, Jul 18, 2017 at 01:01:16AM +0200, Mark Martinec wrote: >>> Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual freebsd-update >>> upgrade >>> method I ended up with a system which gets stuck while trying to >>> attach >>> the second set of disks. 
This happened already after the first phase >>> of >>> the upgrade procedure (installing and re-booting with a new kernel). >>> >>> The first set of disks (ada0 .. ada2) are attached successfully, also >>> a >>> cd0, but then when the first of the set of four (a regular spinning >>> disk) >>> on an LSI controller is to be attached, the boot procedure just gets >>> stuck there: >>> kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO >>> 8192bytes) >>> kernel: ada1: Command Queueing enabled >>> kernel: ada1: 305245MB (625142448 512 byte sectors) >>> kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0 >>> kernel: ada2: ATA8-ACS SATA 3.x device >>> kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8 >>> kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO >>> 8192bytes) >>> kernel: ada2: Command Queueing enabled >>> kernel: ada2: 114473MB (234441648 512 byte sectors) >>> kernel: ada2: quirks=0x1<4K> >>> kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0 >>> >>> (stuck here, keyboard not responding, fans rising their pitch, >>> presumably CPU is spinning) > [...] 
> _______________________________________________ > freebsd-stable@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to > "freebsd-stable-unsubscribe@freebsd.org" From owner-freebsd-stable@freebsd.org Thu Jul 20 00:02:28 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B3D72D9AA2F for ; Thu, 20 Jul 2017 00:02:28 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-qk0-x22f.google.com (mail-qk0-x22f.google.com [IPv6:2607:f8b0:400d:c09::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6A4E071F69 for ; Thu, 20 Jul 2017 00:02:28 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by mail-qk0-x22f.google.com with SMTP id t2so6787707qkc.1 for ; Wed, 19 Jul 2017 17:02:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=1Xi/C80sSlSxlchwRm6s1+Xz/vNhgkkHvVbkK7A3Opw=; b=l3J5vGPZNHTfffYeQV+e4Oo97H9uZs+QIWfu14jM5ZMfx4Ur+vUp5OMnD6rAWBSqVZ IB+kWbF9x4bb33eQqrlHXsxvH2c/bkTWpGNvKO91UqAhV8FhCxysMWq1hz1umBmZEGfS GSP0OM+jrkt4bf+FHDehUJAhAddoTNQcrp+UQPc30ar7PobGcc+aV+8JxL8s/GjP9TeZ 5j0Tcbltnfvu92iTXJ+XKumHPCnuBD9TuFX/gIMrfVv2WQKn6cH2zRiDksW+ll2IvBcy L2o18TR/xj5NG+BtJZrUWp3FyJ6MU9K/EKfSZG59kmQC5Gcj668275KOAu2ky42n3C8W x1cA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to:user-agent; bh=1Xi/C80sSlSxlchwRm6s1+Xz/vNhgkkHvVbkK7A3Opw=; 
b=fxNVS7PeVFqAh3ifGs6B7olMxrsOcbfpm450mABM8gqs8hxMyoj0KGKL2RpLw5mN+m FwdnXUKQITvomrwn92gHpZQ/BfmbtmAZQciLRSPypkuPLC8go4qC70GH/KOcIe6BC4rD 4/pD8fUjBSiULVj9TGIz4VhscdWIFC8/p6r8FeYa+6qEVWxnbx54FCZPQ18fz4Tq/xH3 eL7E4yjZvL4d4hVfHHLwcijJljNAOC4fyDrcg7+LU+fy9O7DI6L4Vqc0I0M94NWQfnpU Fp0+ZJ1IfTmdbTDBWGq4cJv7KuSuxAFEDiaW0EXtSYRp1i2sj98LsDrpOJkvCfrXHB2g lhNw== X-Gm-Message-State: AIVw113XVOBuzCMm78RqyHhlnhxaa3Q2K23RFo5y5nZLjgUpX7xsd9R1 byC7kwbTYGzY5+Y+ X-Received: by 10.55.207.199 with SMTP id v68mr2561008qkl.142.1500508947332; Wed, 19 Jul 2017 17:02:27 -0700 (PDT) Received: from wkstn-mjohnston.west.isilon.com (c-76-104-201-218.hsd1.wa.comcast.net. [76.104.201.218]) by smtp.gmail.com with ESMTPSA id c4sm936675qtc.1.2017.07.19.17.02.26 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 19 Jul 2017 17:02:26 -0700 (PDT) Sender: Mark Johnston Date: Wed, 19 Jul 2017 17:03:25 -0700 From: Mark Johnston To: Mark Martinec Cc: freebsd-stable@freebsd.org Subject: Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching Message-ID: <20170720000325.GB9198@wkstn-mjohnston.west.isilon.com> References: <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com> <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si> User-Agent: Mutt/1.8.3 (2017-05-23) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Jul 2017 00:02:28 -0000 On Thu, Jul 20, 2017 at 01:46:33AM +0200, Mark Martinec wrote: > More news on the matter. 
As reported yesterday the locally built > kernel with options INVARIANTS and DDB works fine and somehow avoids > the trouble at attaching the da (mps) disks on an LSI controller, so > today I wanted to get back to a reproducible hang - and sure enough, > reverting to the generic kernel as distributed brings back the hang. > > So I tried rebuilding the kernel while experimenting with options > like DDB and INVARIANTS. > > A locally built GENERIC kernel behaves the same as the original > kernel from the distribution (as installed by freebsd-upgrade), > so no surprises there. It hangs trying to attach the first of the > da disks (after first successfully attaching all the ada disks). > The alt ctrl esc is unable to enter debugger when the hang occurs > (possibly due to an unresponsive USB keyboard at that time), > even though the debug.kdb.break_to_debugger was set to 1 at a > loader prompt. It needs loader "Safe mode" to be able to boot. > > Next, a locally built kernel with DDB and INVARIANTS works well > (the remaining options come from an included GENERIC). > > Now the funny part: a locally built kernel with just the DDB > option (and the rest included from GENERIC) *also* works well. > Somehow the DDB option makes a difference, even though kernel > debugger is never activated. One thing to try at this point would be to disable EARLY_AP_STARTUP in the kernel config. That is, take a configuration with which you're able to reproduce the hang during boot, and remove "options EARLY_AP_STARTUP". This feature has a fairly large impact on the bootup process and has had a few problems that manifested as hangs during boot. There was at least one other case where an innocuous change to the kernel configuration "fixed" the problem by introducing some second-order effect (causing kernel threads to be scheduled in a different order, for instance). 
Regardless of whether the suggestion above makes a difference, it would be helpful to see verbose dmesgs from both a clean boot and a boot that hangs. If disabling EARLY_AP_STARTUP helps, then we can try adding some assertions that will cause the system to panic when the hang occurs, making it easier to see what's going on. > > To re-assert: at the time of a hang the CPU fan starts revving up, > and the USB keyboard is unresponsive ( does not enter scroll > mode, caps lock and num lock do not toggle their LED indicators, > alt ctrl esc do not activate kernel debugger. Loader "Safe mode" > avoids the problem (presumably by disabling SMP). > > Meanwhile I have successfully upgraded two other similar > hosts from 11.0 to 11.1-RC3, no surprises there (but they do not > have the same disk controller). > > Not sure what to try next. > > Mark
From owner-freebsd-stable@freebsd.org Thu Jul 20 13:45:46 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B204AC78ECF for ; Thu, 20 Jul 2017 13:45:46 +0000 (UTC) (envelope-from Mark.Martinec+freebsd@ijs.si) Received: from mail.ijs.si (mail.ijs.si [IPv6:2001:1470:ff80::25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5F41A6AF05; Thu, 20 Jul 2017 13:45:45 +0000 (UTC) (envelope-from Mark.Martinec+freebsd@ijs.si) Received: from amavis-ori.ijs.si (localhost [IPv6:::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.ijs.si (Postfix) with ESMTPS id 3xCwCB4wjTz70; Thu, 20 Jul 2017 15:45:42 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256;
c=relaxed/simple; d=ijs.si; h= user-agent:message-id:references:in-reply-to:organization :subject:subject:from:from:date:date:content-transfer-encoding :content-type:content-type:mime-version:received:received :received:received; s=jakla4; t=1500558339; x=1503150340; bh=Ou8 wNqc3yX51cKvl0LONbvUSszcCWBC7D7kSSiEWp5w=; b=hgxiaVVZvGNfHtcDLf3 QkO/jwceEXrSTvb3hOXB8/BW4Mx8kC/8jDIJOD3nD/jLk5OdtHqWwAE6pDIu2YgQ MKHW/pZLrlezBmgxQpEH6CYgWm/FOCeGpGJpeZ/ybVFjR4dQC6rossuomt+jhhD1 8ZijHfqyYHX2GwuRVKCmBhAQ= X-Virus-Scanned: amavisd-new at ijs.si Received: from mail.ijs.si ([IPv6:::1]) by amavis-ori.ijs.si (mail.ijs.si [IPv6:::1]) (amavisd-new, port 10026) with LMTP id HCy0JavCX1Gt; Thu, 20 Jul 2017 15:45:39 +0200 (CEST) Received: from mildred.ijs.si (mailbox.ijs.si [IPv6:2001:1470:ff80::143:1]) by mail.ijs.si (Postfix) with ESMTP id 3xCwC73KGdz6x; Thu, 20 Jul 2017 15:45:39 +0200 (CEST) Received: from nabiralnik.ijs.si (nabiralnik.ijs.si [IPv6:2001:1470:ff80::80:16]) by mildred.ijs.si (Postfix) with ESMTP id 3xCwC72GCYzRm; Thu, 20 Jul 2017 15:45:39 +0200 (CEST) Received: from neli.ijs.si (2001:1470:ff80:88:21c:c0ff:feb1:8c91) by nabiralnik.ijs.si with HTTP (HTTP/1.1 POST); Thu, 20 Jul 2017 15:45:39 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Thu, 20 Jul 2017 15:45:39 +0200 From: Mark Martinec To: freebsd-stable@freebsd.org Cc: Mark Johnston Subject: Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching Organization: Jozef Stefan Institute In-Reply-To: <20170720000325.GB9198@wkstn-mjohnston.west.isilon.com> References: <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com> <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si> <20170720000325.GB9198@wkstn-mjohnston.west.isilon.com> Message-ID: <81295bcacd7c44813de8d346c88cbb65@ijs.si> X-Sender: Mark.Martinec+freebsd@ijs.si User-Agent: Roundcube Webmail/1.2.4 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 
Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Jul 2017 13:45:46 -0000

2017-07-20 02:03, Mark Johnston wrote:
> One thing to try at this point would be to disable EARLY_AP_STARTUP in
> the kernel config. That is, take a configuration with which you're able
> to reproduce the hang during boot, and remove "options
> EARLY_AP_STARTUP".

Done. And it avoids the problem altogether! Thanks. I tried rebooting several times and it succeeds every time.

Here is everything I had in the config file for building the kernel, i.e. I took out the 'options DDB' line, which had also seemingly avoided the problem:

include GENERIC
ident NELI
nooptions EARLY_AP_STARTUP

> This feature has a fairly large impact on the bootup process and has
> had a few problems that manifested as hangs during boot. There was at
> least one other case where an innocuous change to the kernel
> configuration "fixed" the problem by introducing some second-order
> effect (causing kernel threads to be scheduled in a different
> order, for instance).

> Regardless of whether the suggestion above makes a difference, it would
> be helpful to see verbose dmesgs from both a clean boot and a boot that
> hangs. If disabling EARLY_AP_STARTUP helps, then we can try adding some
> assertions that will cause the system to panic when the hang occurs,
> making it easier to see what's going on.

Hmmm. I have now saved a couple of versions of /var/run/dmesg.boot (in boot_verbose mode) from boots with EARLY_AP_STARTUP disabled, where the boot is successful. However, I don't know how to capture such a log when booting hangs, as I have no serial interface and the boot never completes. All I have is a screen photo of the last state when a hang occurs (showing the ada disks successfully attached, followed immediately by the attempt to attach a da disk, which hangs).
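
The three-line config file above works because kernel config directives are applied in order: "include GENERIC" pulls in every option from GENERIC, and the later "nooptions EARLY_AP_STARTUP" removes that one option again. The Python sketch below mimics that ordering with a toy parser, purely to illustrate the semantics. It is not how config(8) is implemented, the name effective_options is made up for this example, and the GENERIC text is a three-option stand-in rather than the real file.

```python
def effective_options(config_name, files):
    """Return the kernel options left after processing a config file.

    `files` maps a config name to its text. It is a hypothetical stand-in
    for the real files in the kernel conf directory, so the sketch runs
    without a source tree.
    """
    options = {}

    def process(name):
        for raw in files[name].splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
            if not line:
                continue
            parts = line.split(None, 1)
            directive = parts[0]
            rest = parts[1].strip() if len(parts) > 1 else ""
            if directive == "include":
                process(rest)  # included file is read in place
            elif directive == "options":
                for item in rest.split(","):
                    key, _, value = item.strip().partition("=")
                    options[key] = value
            elif directive == "nooptions":
                for item in rest.split(","):
                    options.pop(item.strip(), None)  # remove an earlier option
            # other directives (ident, device, ...) are ignored in this sketch

    process(config_name)
    return options


# Illustrative stand-ins; the real GENERIC has many more lines.
FILES = {
    "GENERIC": """
options     SCHED_ULE           # illustrative subset, not the real file
options     SMP
options     EARLY_AP_STARTUP
""",
    "NELI": """
include     GENERIC
ident       NELI
nooptions   EARLY_AP_STARTUP
""",
}

if __name__ == "__main__":
    print(sorted(effective_options("NELI", FILES)))  # prints ['SCHED_ULE', 'SMP']
```

Because the removal comes after the include, the order of the lines in the config file matters: placing "nooptions" before "include GENERIC" in this sketch would leave EARLY_AP_STARTUP enabled.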
Mark

From owner-freebsd-stable@freebsd.org Fri Jul 21 18:24:48 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 259F9DAE28C for ; Fri, 21 Jul 2017 18:24:48 +0000 (UTC) (envelope-from stb@lassitu.de) Received: from gilb.zs64.net (gilb.zs64.net [IPv6:2a00:14b0:4200:32e0::1ea]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gilb.zs64.net", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BDD127ECFB for ; Fri, 21 Jul 2017 18:24:47 +0000 (UTC) (envelope-from stb@lassitu.de) Received: by gilb.zs64.net (Postfix, from stb@lassitu.de) id 7AC39183869 for ; Fri, 21 Jul 2017 18:24:37 +0000 (UTC) From: Stefan Bethke Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Trouble with PM961 in SuperMicro X11 Message-Id: Date: Fri, 21 Jul 2017 20:24:19 +0200 To: freebsd-stable X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Jul 2017 18:24:48 -0000

I have a fresh SuperMicro SYS-5019S-M and I’ve installed a Samsung SM961 128GB, which I want to use as a ZFS cache.

After installing 11.1-RC3, I’m getting the errors below during boot, and trying to read from or write to it produces the same message.

Is there a tunable to make this work? A quick test with Windows did not show any issues.

Stefan

-- 
Stefan Bethke Fon +49 151 14070811

Copyright (c) 1992-2017 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 11.1-RC3 #0 r320976: Fri Jul 14 02:20:44 UTC 2017 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64 FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0) VT(vga): resolution 640x480 CPU: Intel(R) Xeon(R) CPU E3-1240 v6 @ 3.70GHz (3696.16-MHz K8-class CPU) Origin="GenuineIntel" Id=0x906e9 Family=0x6 Model=0x9e Stepping=9 Features=0xbfebfbff Features2=0x7ffafbff AMD Features=0x2c100800 AMD Features2=0x121 Structured Extended Features=0x29c6fbf XSAVE Features=0xf VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID TSC: P-state invariant, performance statistics real memory = 68719476736 (65536 MB) avail memory = 66668195840 (63579 MB) Event timer "LAPIC" quality 600 ACPI APIC Table: < > FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads random: unblocking device. ioapic0 irqs 0-23 on motherboard SMP: AP CPU #1 Launched! SMP: AP CPU #2 Launched! SMP: AP CPU #3 Launched! SMP: AP CPU #4 Launched! SMP: AP CPU #5 Launched! SMP: AP CPU #6 Launched! SMP: AP CPU #7 Launched! Timecounter "TSC-low" frequency 1848080854 Hz quality 1000 random: entropy device external interface kbd1 at kbdmux0 netmap: loaded module module_register_init: MOD_LOAD (vesa, 0xffffffff80f5b220, 0) error 19 random: registering fast source Intel Secure Key RNG random: fast provider: "Intel Secure Key RNG" nexus0 vtvga0: on motherboard cryptosoft0: on motherboard acpi0: on motherboard acpi0: Power Button (fixed) cpu0: on acpi0 cpu1: on acpi0 cpu2: on acpi0 cpu3: on acpi0 cpu4: on acpi0 cpu5: on acpi0 cpu6: on acpi0 cpu7: on acpi0 hpet0: iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 24000000 Hz quality 950 Event timer "HPET" frequency 24000000 Hz quality 550 atrtc0: port 0x70-0x77 irq 8 on acpi0 atrtc0: Warning: Couldn't map I/O.
Event timer "RTC" frequency 32768 Hz quality 0 attimer0: port 0x40-0x43,0x50-0x53 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 Timecounter "ACPI-safe" frequency 3579545 Hz quality 850 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pci0: at device 19.0 (no driver attached) xhci0: mem 0xdf400000-0xdf40ffff irq 16 at device 20.0 on pci0 xhci0: 32 bytes context size, 64-bit DMA usbus0 on xhci0 usbus0: 5.0Gbps Super Speed USB v3.0 pci0: at device 22.0 (no driver attached) pci0: at device 22.1 (no driver attached) ahci0: port 0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xdf410000-0xdf411fff,0xdf41e000-0xdf41e0ff,0xdf41d000-0xdf41d7ff irq 16 at device 23.0 on pci0 ahci0: AHCI v1.31 with 8 6Gbps ports, Port Multiplier not supported ahcich0: at channel 0 on ahci0 ahcich1: at channel 1 on ahci0 ahcich2: at channel 2 on ahci0 ahcich3: at channel 3 on ahci0 ahcich4: at channel 4 on ahci0 ahcich5: at channel 5 on ahci0 ahcich6: at channel 6 on ahci0 ahcich7: at channel 7 on ahci0 ahciem0: on ahci0 pcib1: irq 16 at device 28.0 on pci0 pci1: on pcib1 igb0: port 0xe000-0xe01f mem 0xdf300000-0xdf37ffff,0xdf380000-0xdf383fff irq 16 at device 0.0 on pci1 igb0: Using MSIX interrupts with 5 vectors igb0: Ethernet address: ac:1f:6b:18:06:6e igb0: Bound queue 0 to cpu 0 igb0: Bound queue 1 to cpu 1 igb0: Bound queue 2 to cpu 2 igb0: Bound queue 3 to cpu 3 igb0: netmap queues/slots: TX 4/1024, RX 4/1024 pcib2: irq 17 at device 28.1 on pci0 pci2: on pcib2 igb1: port 0xd000-0xd01f mem 0xdf200000-0xdf27ffff,0xdf280000-0xdf283fff irq 17 at device 0.0 on pci2 igb1: Using MSIX interrupts with 5 vectors igb1: Ethernet address: ac:1f:6b:18:06:6f igb1: Bound queue 0 to cpu 4 igb1: Bound queue 1 to cpu 5 igb1: Bound queue 2 to cpu 6 igb1: Bound queue 3 to cpu 7 igb1: netmap queues/slots: TX 4/1024, RX 4/1024 pcib3: irq 16 at device 28.4
on pci0 pci3: on pcib3 nvme0: mem 0xdf100000-0xdf103fff irq 16 at device 0.0 on pci3 pcib4: irq 18 at device 28.6 on pci0 pci4: on pcib4 pcib5: at device 0.0 on pci4 pci5: on pcib5 vgapci0: port 0xc000-0xc07f mem 0xde000000-0xdeffffff,0xdf000000-0xdf01ffff irq 18 at device 0.0 on pci5 vgapci0: Boot video device isab0: at device 31.0 on pci0 isa0: on isab0 pci0: at device 31.2 (no driver attached) acpi_button0: on acpi0 acpi_button1: on acpi0 acpi_tz0: on acpi0 acpi_tz1: on acpi0 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0 orm0: at iomem 0xc0000-0xc7fff on isa0 ppc0: cannot reserve I/O port range est0: on cpu0 est1: on cpu1 est2: on cpu2 est3: on cpu3 est4: on cpu4 est5: on cpu5 est6: on cpu6 est7: on cpu7 ZFS filesystem version: 5 ZFS storage pool version: features support (5000) Timecounters tick every 1.000 msec nvme cam probe device init ugen0.1: <0x8086 XHCI root HUB> at usbus0 uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0 nvd0: NVMe namespace nvd0: 122104MB (250069680 512 byte sectors) ses0 at ahciem0 bus 0 scbus8 target 0 lun 0 ses0: SEMB S-E-S 2.00 device ses0: SEMB SES Device ada0 at ahcich4 bus 0 scbus4 target 0 lun 0 ada0: ACS-2 ATA SATA 3.x device ada0: Serial Number W1F50MGQ ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 2861588MB (5860533168 512 byte sectors) ada0: quirks=0x1<4K> ada1 at ahcich5 bus 0 scbus5 target 0 lun 0 ada1: ATA8-ACS SATA 3.x device ada1: Serial Number W1F29WMP ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes) ada1: Command Queueing enabled ada1: 2861588MB (5860533168 512 byte sectors) ada1: quirks=0x1<4K> Trying to mount root from zfs:zroot/ROOT/default []...
Root mount waiting for: usbus0 uhub0: 26 ports with 26 removable, self powered Root mount waiting for: usbus0 ugen0.2: at usbus0 uhub1 on uhub0 uhub1: on usbus0 Root mount waiting for: usbus0 uhub1: 3 ports with 2 removable, bus powered ugen0.3: at usbus0 ukbd0 on uhub1 ukbd0: on usbus0 kbd2 at ukbd0 Root mount waiting for: usbus0 Root mount waiting for: usbus0 ugen0.4: at usbus0 umass0 on uhub0 umass0: on usbus0 umass0: SCSI over Bulk-Only; quirks = 0x8100 umass0:9:0: Attached to scbus9 da0 at umass-sim0 bus 0 scbus9 target 0 lun 0 da0: Removable Direct Access SCSI device da0: Serial Number 05E3MIRCU1FD9Y8U da0: 40.000MB/s transfers da0: 15296MB (31326208 512 byte sectors) da0: quirks=0x12 GEOM: da0: the secondary GPT header is not in the last LBA. ugen0.5: at usbus0 uhub2 on uhub0 uhub2: on usbus0 uhub2: 4 ports with 3 removable, self powered Root mount waiting for: usbus0 ugen0.6: at usbus0 ukbd1 on uhub2 ukbd1: on usbus0 kbd3 at ukbd1 nvme0: resetting controller nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:127 nsid:1 lba:2080 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:127 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:126 nsid:1 lba:2592 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:126 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:125 nsid:1 lba:250068000 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:125 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:124 nsid:1 lba:250068512 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:124 cdw0:0 nvme0: resetting controller nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:127 nsid:1 lba:2080 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:127 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:126 nsid:1 lba:2592 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:126 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:125 nsid:1 lba:250068000 len:224 nvme0: ABORTED - BY REQUEST
(00/07) sqid:8 cid:125 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:124 nsid:1 lba:250068512 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:124 cdw0:0 nvme0: resetting controller nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:127 nsid:1 lba:2080 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:127 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:126 nsid:1 lba:2592 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:126 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:125 nsid:1 lba:250068000 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:125 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:124 nsid:1 lba:250068512 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:124 cdw0:0 nvme0: resetting controller nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:127 nsid:1 lba:2080 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:127 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:126 nsid:1 lba:2592 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:126 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:125 nsid:1 lba:250068000 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:125 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:124 nsid:1 lba:250068512 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:124 cdw0:0 nvme0: resetting controller nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:127 nsid:1 lba:2080 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:127 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:126 nsid:1 lba:2592 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:126 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:125 nsid:1 lba:250068000 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:125 cdw0:0 nvme0: aborting outstanding i/o nvme0: READ sqid:8 cid:124 nsid:1 lba:250068512 len:224 nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:124 cdw0:0 igb0: link state 
changed to UP uhid0 on uhub1 uhid0: on usbus0 ums0 on uhub2 ums0: on usbus0 ums0: 3 buttons and [Z] coordinates ID=0
From owner-freebsd-stable@freebsd.org Fri Jul 21 18:26:19 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 28974DAE401 for ; Fri, 21 Jul 2017 18:26:19 +0000 (UTC) (envelope-from stb@lassitu.de) Received: from gilb.zs64.net (gilb.zs64.net [212.12.50.234]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gilb.zs64.net", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CCE337EE55 for ; Fri, 21 Jul 2017 18:26:18 +0000 (UTC) (envelope-from stb@lassitu.de) Received: by gilb.zs64.net (Postfix, from stb@lassitu.de) id B93DA183889 for ; Fri, 21 Jul 2017 18:26:15 +0000 (UTC) From: Stefan Bethke Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: Trouble with SM961 in SuperMicro X11 Date: Fri, 21 Jul 2017 20:25:57 +0200 References: To: freebsd-stable In-Reply-To: Message-Id: <7C88AB94-17EB-4C66-832F-30D67C064F47@lassitu.de> X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Jul 2017 18:26:19 -0000

It’s an SM961, not PM961.

> On 21.07.2017 at 20:24, Stefan Bethke wrote:
>
> I have a fresh SuperMicro SYS-5019S-M and I’ve installed a Samsung SM961 128GB, which I want to use as a ZFS cache.
>
> After installing 11.1-RC3, I’m getting the below errors during boot, and trying to read or write to it produces the same message.
>
> Is there a tunable to make this work?
A quick test with Windows did = not show any issues. >=20 >=20 > Stefan >=20 > --=20 > Stefan Bethke Fon +49 151 14070811 >=20 >=20 >=20 > Copyright (c) 1992-2017 The FreeBSD Project. > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, = 1994 > The Regents of the University of California. All rights = reserved. > FreeBSD is a registered trademark of The FreeBSD Foundation. > FreeBSD 11.1-RC3 #0 r320976: Fri Jul 14 02:20:44 UTC 2017 > root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64 > FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on = LLVM 4.0.0) > VT(vga): resolution 640x480 > CPU: Intel(R) Xeon(R) CPU E3-1240 v6 @ 3.70GHz (3696.16-MHz K8-class = CPU) > Origin=3D"GenuineIntel" Id=3D0x906e9 Family=3D0x6 Model=3D0x9e = Stepping=3D9 > = Features=3D0xbfebfbff > = Features2=3D0x7ffafbff > AMD Features=3D0x2c100800 > AMD Features2=3D0x121 > Structured Extended = Features=3D0x29c6fbf > XSAVE Features=3D0xf > VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID > TSC: P-state invariant, performance statistics > real memory =3D 68719476736 (65536 MB) > avail memory =3D 66668195840 (63579 MB) > Event timer "LAPIC" quality 600 > ACPI APIC Table: < > > FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs > FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads > random: unblocking device. > ioapic0 irqs 0-23 on motherboard > SMP: AP CPU #1 Launched! > SMP: AP CPU #2 Launched! > SMP: AP CPU #3 Launched! > SMP: AP CPU #4 Launched! > SMP: AP CPU #5 Launched! > SMP: AP CPU #6 Launched! > SMP: AP CPU #7 Launched! 
> Timecounter "TSC-low" frequency 1848080854 Hz quality 1000 > random: entropy device external interface > kbd1 at kbdmux0 > netmap: loaded module > module_register_init: MOD_LOAD (vesa, 0xffffffff80f5b220, 0) error 19 > random: registering fast source Intel Secure Key RNG > random: fast provider: "Intel Secure Key RNG" > nexus0 > vtvga0: on motherboard > cryptosoft0: on motherboard > acpi0: on motherboard > acpi0: Power Button (fixed) > cpu0: on acpi0 > cpu1: on acpi0 > cpu2: on acpi0 > cpu3: on acpi0 > cpu4: on acpi0 > cpu5: on acpi0 > cpu6: on acpi0 > cpu7: on acpi0 > hpet0: iomem 0xfed00000-0xfed003ff on = acpi0 > Timecounter "HPET" frequency 24000000 Hz quality 950 > Event timer "HPET" frequency 24000000 Hz quality 550 > atrtc0: port 0x70-0x77 irq 8 on acpi0 > atrtc0: Warning: Couldn't map I/O. > Event timer "RTC" frequency 32768 Hz quality 0 > attimer0: port 0x40-0x43,0x50-0x53 irq 0 on acpi0 > Timecounter "i8254" frequency 1193182 Hz quality 0 > Event timer "i8254" frequency 1193182 Hz quality 100 > Timecounter "ACPI-safe" frequency 3579545 Hz quality 850 > acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0 > pcib0: port 0xcf8-0xcff on acpi0 > pci0: on pcib0 > pci0: at device 19.0 (no driver = attached) > xhci0: mem = 0xdf400000-0xdf40ffff irq 16 at device 20.0 on pci0 > xhci0: 32 bytes context size, 64-bit DMA > usbus0 on xhci0 > usbus0: 5.0Gbps Super Speed USB v3.0 > pci0: at device 22.0 (no driver attached) > pci0: at device 22.1 (no driver attached) > ahci0: port = 0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem = 0xdf410000-0xdf411fff,0xdf41e000-0xdf41e0ff,0xdf41d000-0xdf41d7ff irq 16 = at device 23.0 on pci0 > ahci0: AHCI v1.31 with 8 6Gbps ports, Port Multiplier not supported > ahcich0: at channel 0 on ahci0 > ahcich1: at channel 1 on ahci0 > ahcich2: at channel 2 on ahci0 > ahcich3: at channel 3 on ahci0 > ahcich4: at channel 4 on ahci0 > ahcich5: at channel 5 on ahci0 > ahcich6: at channel 6 on ahci0 > ahcich7: at channel 7 on 
ahci0 > ahciem0: on ahci0 > pcib1: irq 16 at device 28.0 on pci0 > pci1: on pcib1 > igb0: port = 0xe000-0xe01f mem 0xdf300000-0xdf37ffff,0xdf380000-0xdf383fff irq 16 at = device 0.0 on pci1 > igb0: Using MSIX interrupts with 5 vectors > igb0: Ethernet address: ac:1f:6b:18:06:6e > igb0: Bound queue 0 to cpu 0 > igb0: Bound queue 1 to cpu 1 > igb0: Bound queue 2 to cpu 2 > igb0: Bound queue 3 to cpu 3 > igb0: netmap queues/slots: TX 4/1024, RX 4/1024 > pcib2: irq 17 at device 28.1 on pci0 > pci2: on pcib2 > igb1: port = 0xd000-0xd01f mem 0xdf200000-0xdf27ffff,0xdf280000-0xdf283fff irq 17 at = device 0.0 on pci2 > igb1: Using MSIX interrupts with 5 vectors > igb1: Ethernet address: ac:1f:6b:18:06:6f > igb1: Bound queue 0 to cpu 4 > igb1: Bound queue 1 to cpu 5 > igb1: Bound queue 2 to cpu 6 > igb1: Bound queue 3 to cpu 7 > igb1: netmap queues/slots: TX 4/1024, RX 4/1024 > pcib3: irq 16 at device 28.4 on pci0 > pci3: on pcib3 > nvme0: mem 0xdf100000-0xdf103fff irq 16 at = device 0.0 on pci3 > pcib4: irq 18 at device 28.6 on pci0 > pci4: on pcib4 > pcib5: at device 0.0 on pci4 > pci5: on pcib5 > vgapci0: port 0xc000-0xc07f mem = 0xde000000-0xdeffffff,0xdf000000-0xdf01ffff irq 18 at device 0.0 on pci5 > vgapci0: Boot video device > isab0: at device 31.0 on pci0 > isa0: on isab0 > pci0: at device 31.2 (no driver attached) > acpi_button0: on acpi0 > acpi_button1: on acpi0 > acpi_tz0: on acpi0 > acpi_tz1: on acpi0 > uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on = acpi0 > uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0 > orm0: at iomem 0xc0000-0xc7fff on isa0 > ppc0: cannot reserve I/O port range > est0: on cpu0 > est1: on cpu1 > est2: on cpu2 > est3: on cpu3 > est4: on cpu4 > est5: on cpu5 > est6: on cpu6 > est7: on cpu7 > ZFS filesystem version: 5 > ZFS storage pool version: features support (5000) > Timecounters tick every 1.000 msec > nvme cam probe device init > ugen0.1: <0x8086 XHCI root HUB> at usbus0 > uhub0: <0x8086 XHCI root HUB, 
class 9/0, rev 3.00/1.00, addr 1> on usbus0 > nvd0: NVMe namespace > nvd0: 122104MB (250069680 512 byte sectors) > ses0 at ahciem0 bus 0 scbus8 target 0 lun 0 > ses0: SEMB S-E-S 2.00 device > ses0: SEMB SES Device > ada0 at ahcich4 bus 0 scbus4 target 0 lun 0 > ada0: ACS-2 ATA SATA 3.x device > ada0: Serial Number W1F50MGQ > ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 2861588MB (5860533168 512 byte sectors) > ada0: quirks=0x1<4K> > ada1 at ahcich5 bus 0 scbus5 target 0 lun 0 > ada1: ATA8-ACS SATA 3.x device > ada1: Serial Number W1F29WMP > ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes) > ada1: Command Queueing enabled > ada1: 2861588MB (5860533168 512 byte sectors) > ada1: quirks=0x1<4K> > Trying to mount root from zfs:zroot/ROOT/default []... > Root mount waiting for: usbus0 > uhub0: 26 ports with 26 removable, self powered > Root mount waiting for: usbus0 > ugen0.2: at usbus0 > uhub1 on uhub0 > uhub1: on usbus0 > Root mount waiting for: usbus0 > uhub1: 3 ports with 2 removable, bus powered > ugen0.3: at usbus0 > ukbd0 on uhub1 > ukbd0: on usbus0 > kbd2 at ukbd0 > Root mount waiting for: usbus0 > Root mount waiting for: usbus0 > ugen0.4: at usbus0 > umass0 on uhub0 > umass0: on usbus0 > umass0: SCSI over Bulk-Only; quirks = 0x8100 > umass0:9:0: Attached to scbus9 > da0 at umass-sim0 bus 0 scbus9 target 0 lun 0 > da0: Removable Direct Access SCSI device > da0: Serial Number 05E3MIRCU1FD9Y8U > da0: 40.000MB/s transfers > da0: 15296MB (31326208 512 byte sectors) > da0: quirks=0x12 > GEOM: da0: the secondary GPT header is not in the last LBA.
> ugen0.5: at usbus0 > uhub2 on uhub0 > uhub2: on usbus0 > uhub2: 4 ports with 3 removable, self powered > Root mount waiting for: usbus0 > ugen0.6: at usbus0 > ukbd1 on uhub2 > ukbd1: on usbus0 > kbd3 at ukbd1 > nvme0: resetting controller > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:127 nsid:1 lba:2080 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:127 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:126 nsid:1 lba:2592 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:126 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:125 nsid:1 lba:250068000 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:125 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:124 nsid:1 lba:250068512 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:124 cdw0:0 > nvme0: resetting controller > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:127 nsid:1 lba:2080 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:127 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:126 nsid:1 lba:2592 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:126 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:125 nsid:1 lba:250068000 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:125 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:124 nsid:1 lba:250068512 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:124 cdw0:0 > nvme0: resetting controller > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:127 nsid:1 lba:2080 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:127 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:126 nsid:1 lba:2592 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:126 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:125 nsid:1 lba:250068000 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:125 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 
cid:124 nsid:1 lba:250068512 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:124 cdw0:0 > nvme0: resetting controller > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:127 nsid:1 lba:2080 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:127 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:126 nsid:1 lba:2592 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:126 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:125 nsid:1 lba:250068000 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:125 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:124 nsid:1 lba:250068512 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:124 cdw0:0 > nvme0: resetting controller > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:127 nsid:1 lba:2080 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:127 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:126 nsid:1 lba:2592 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:126 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:125 nsid:1 lba:250068000 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:125 cdw0:0 > nvme0: aborting outstanding i/o > nvme0: READ sqid:8 cid:124 nsid:1 lba:250068512 len:224 > nvme0: ABORTED - BY REQUEST (00/07) sqid:8 cid:124 cdw0:0 > igb0: link state changed to UP > uhid0 on uhub1 > uhid0: on usbus0 > ums0 on uhub2 > ums0: on usbus0 > ums0: 3 buttons and [Z] coordinates ID=0 > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" -- Stefan Bethke Fon +49 151 14070811 From owner-freebsd-stable@freebsd.org Sat Jul 22 05:08:34 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by
mailman.ysv.freebsd.org (Postfix) with ESMTP id 83D65C7F5C9 for ; Sat, 22 Jul 2017 05:08:34 +0000 (UTC) (envelope-from paul@ziemba.us) Received: from osmtp.ziemba.us (osmtp.ziemba.us [208.106.105.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E74B16C1D1 for ; Sat, 22 Jul 2017 05:08:33 +0000 (UTC) (envelope-from paul@ziemba.us) Received: from hairball.ziemba.us (localhost.ziemba.us [127.0.0.1]) by hairball.ziemba.us (8.15.2/8.15.2) with ESMTP id v6M4peWt007735 for ; Fri, 21 Jul 2017 21:51:40 -0700 (PDT) (envelope-from paul@hairball.ziemba.us) Received: (from paul@localhost) by hairball.ziemba.us (8.15.2/8.15.2/Submit) id v6M4peqh007734 for freebsd-stable@FreeBSD.org; Fri, 21 Jul 2017 21:51:40 -0700 (PDT) (envelope-from paul) Date: Fri, 21 Jul 2017 21:51:40 -0700 From: "G. Paul Ziemba" To: freebsd-stable@FreeBSD.org Subject: stable/11 r321349 crashing immediately Message-ID: <20170722045140.GA5680@hairball.ziemba.us> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.6.1 (2016-04-27) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 05:08:34 -0000 GENERIC kernel r321349 results in the following about a minute after multiuser boot completes. What additional information should I provide to assist in debugging? Many thanks! 
[Extracted from /var/crash/core.txt.NNN] KDB: stack backtrace: #0 0xffffffff810f6ed7 at kdb_backtrace+0xa7 #1 0xffffffff810872a9 at vpanic+0x249 #2 0xffffffff81087060 at vpanic+0 #3 0xffffffff817d9aca at dblfault_handler+0x10a #4 0xffffffff817ae93c at Xdblfault+0xac #5 0xffffffff810cf76e at cpu_search_lowest+0x35e #6 0xffffffff810cf76e at cpu_search_lowest+0x35e #7 0xffffffff810d5b36 at sched_lowest+0x66 #8 0xffffffff810d1d92 at sched_pickcpu+0x522 #9 0xffffffff810d2b03 at sched_add+0xd3 #10 0xffffffff8101df5c at intr_event_schedule_thread+0x18c #11 0xffffffff8101ddb0 at swi_sched+0xa0 #12 0xffffffff81261643 at netisr_queue_internal+0x1d3 #13 0xffffffff81261212 at netisr_queue_src+0x92 #14 0xffffffff81261677 at netisr_queue+0x27 #15 0xffffffff8123da5a at if_simloop+0x20a #16 0xffffffff8123d83b at looutput+0x22b #17 0xffffffff8131c4c6 at ip_output+0x1aa6 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 298 dumptid = curthread->td_tid; (kgdb) #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 #1 0xffffffff810867e8 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366 #2 0xffffffff810872ff in vpanic (fmt=0xffffffff81e5f7e0 "double fault", ap=0xfffffe0839778ec0) at /usr/src/sys/kern/kern_shutdown.c:759 #3 0xffffffff81087060 in panic (fmt=0xffffffff81e5f7e0 "double fault") at /usr/src/sys/kern/kern_shutdown.c:690 #4 0xffffffff817d9aca in dblfault_handler (frame=0xfffffe0839778f40) at /usr/src/sys/amd64/amd64/trap.c:828 #5 #6 0xffffffff810cf422 in cpu_search_lowest ( cg=0xffffffff826ccd98 , low=) at /usr/src/sys/kern/sched_ule.c:782 #7 0xffffffff810cf76e in cpu_search (cg=0xffffffff826cccb8 , low=0xfffffe085cfa53b8, high=0x0, match=1) at /usr/src/sys/kern/sched_ule.c:710 #8 cpu_search_lowest (cg=0xffffffff826cccb8 , low=0xfffffe085cfa53b8) at /usr/src/sys/kern/sched_ule.c:783 #9 0xffffffff810cf76e in cpu_search (cg=0xffffffff826ccc80 , low=0xfffffe085cfa5430, high=0x0, match=1) at /usr/src/sys/kern/sched_ule.c:710 #10 
cpu_search_lowest (cg=0xffffffff826ccc80 , low=0xfffffe085cfa5430) at /usr/src/sys/kern/sched_ule.c:783 #11 0xffffffff810d5b36 in sched_lowest (cg=0xffffffff826ccc80 , mask=..., pri=28, maxload=2147483647, prefer=4) at /usr/src/sys/kern/sched_ule.c:815 #12 0xffffffff810d1d92 in sched_pickcpu (td=0xfffff8000a3a9000, flags=4) at /usr/src/sys/kern/sched_ule.c:1292 #13 0xffffffff810d2b03 in sched_add (td=0xfffff8000a3a9000, flags=4) at /usr/src/sys/kern/sched_ule.c:2447 #14 0xffffffff8101df5c in intr_event_schedule_thread (ie=0xfffff80007e7ae00) at /usr/src/sys/kern/kern_intr.c:917 #15 0xffffffff8101ddb0 in swi_sched (cookie=0xfffff8000a386880, flags=0) at /usr/src/sys/kern/kern_intr.c:1163 #16 0xffffffff81261643 in netisr_queue_internal (proto=1, m=0xfffff80026d00500, cpuid=0) at /usr/src/sys/net/netisr.c:1022 #17 0xffffffff81261212 in netisr_queue_src (proto=1, source=0, m=0xfffff80026d00500) at /usr/src/sys/net/netisr.c:1056 #18 0xffffffff81261677 in netisr_queue (proto=1, m=0xfffff80026d00500) at /usr/src/sys/net/netisr.c:1069 #19 0xffffffff8123da5a in if_simloop (ifp=0xfffff800116eb000, m=0xfffff80026d00500, af=2, hlen=0) at /usr/src/sys/net/if_loop.c:358 #20 0xffffffff8123d83b in looutput (ifp=0xfffff800116eb000, m=0xfffff80026d00500, dst=0xfffff80026ed6550, ro=0xfffff80026ed6530) at /usr/src/sys/net/if_loop.c:265 #21 0xffffffff8131c4c6 in ip_output (m=0xfffff80026d00500, opt=0x0, ro=0xfffff80026ed6530, flags=0, imo=0x0, inp=0xfffff80026ed63a0) at /usr/src/sys/netinet/ip_output.c:655 #22 0xffffffff8142e1c7 in tcp_output (tp=0xfffff80026eb2820) at /usr/src/sys/netinet/tcp_output.c:1447 #23 0xffffffff81447700 in tcp_usr_send (so=0xfffff80011ec2360, flags=0, m=0xfffff80026d14d00, nam=0x0, control=0x0, td=0xfffff80063ba1000) at /usr/src/sys/netinet/tcp_usrreq.c:967 #24 0xffffffff811776f1 in sosend_generic (so=0xfffff80011ec2360, addr=0x0, uio=0x0, top=0xfffff80026d14d00, control=0x0, flags=0, td=0xfffff80063ba1000) at /usr/src/sys/kern/uipc_socket.c:1360 #25 
0xffffffff811779bd in sosend (so=0xfffff80011ec2360, addr=0x0, uio=0x0, top=0xfffff80026d14d00, control=0x0, flags=0, td=0xfffff80063ba1000) at /usr/src/sys/kern/uipc_socket.c:1405 #26 0xffffffff815276a2 in clnt_vc_call (cl=0xfffff80063ca0980, ext=0xfffffe085cfa6e38, proc=4, args=0xfffff80026c3bc00, resultsp=0xfffffe085cfa7110, utimeout=...) at /usr/src/sys/rpc/clnt_vc.c:413 #27 0xffffffff8152391c in clnt_reconnect_call (cl=0xfffff80063ca0c00, ext=0xfffffe085cfa6e38, proc=4, args=0xfffff80026c3bc00, resultsp=0xfffffe085cfa7110, utimeout=...) at /usr/src/sys/rpc/clnt_rc.c:271 #28 0xffffffff80e75628 in newnfs_request (nd=0xfffffe085cfa7110, nmp=0xfffff80007e79c00, clp=0x0, nrp=0xfffff80007e79d28, vp=0xfffff80011d9b588, td=0xfffff80063ba1000, cred=0xfffff800118c0100, prog=100003, vers=3, retsum=0x0, toplevel=1, xidp=0x0, dssep=0x0) at /usr/src/sys/fs/nfs/nfs_commonkrpc.c:760 #29 0xffffffff80ee87f1 in nfscl_request (nd=0xfffffe085cfa7110, vp=0xfffff80011d9b588, p=0xfffff80063ba1000, cred=0xfffff800118c0100, stuff=0x0) at /usr/src/sys/fs/nfsclient/nfs_clport.c:952 #30 0xffffffff80ea865c in nfsrpc_accessrpc (vp=0xfffff80011d9b588, mode=63, cred=0xfffff800118c0100, p=0xfffff80063ba1000, nap=0xfffffe085cfa72e0, attrflagp=0xfffffe085cfa73c0, rmodep=0xfffffe085cfa73b4, stuff=0x0) at /usr/src/sys/fs/nfsclient/nfs_clrpcops.c:243 #31 0xffffffff80ed9ec9 in nfs34_access_otw (vp=0xfffff80011d9b588, wmode=63, td=0xfffff80063ba1000, cred=0xfffff800118c0100, retmode=0xfffffe085cfa7540) at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:283 #32 0xffffffff80ecfb64 in nfs_access (ap=0xfffffe085cfa75f8) at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:426 #33 0xffffffff81a539d4 in VOP_ACCESS_APV ( vop=0xffffffff822ff8b8 , a=0xfffffe085cfa75f8) at vnode_if.c:601 #34 0xffffffff80eda726 in VOP_ACCESS (vp=0xfffff80011d9b588, accmode=64, cred=0xfffff800118c0100, td=0xfffff80063ba1000) at ./vnode_if.h:254 #35 0xffffffff80ecb925 in nfs_lookup (ap=0xfffffe085cfa7cf8) at 
/usr/src/sys/fs/nfsclient/nfs_clvnops.c:1064 #36 0xffffffff81a52a44 in VOP_LOOKUP_APV ( vop=0xffffffff822ff8b8 , a=0xfffffe085cfa7cf8) at vnode_if.c:127 #37 0xffffffff811c6aad in VOP_LOOKUP (dvp=0xfffff80011d9b588, vpp=0xfffffe085cfa8708, cnp=0xfffffe085cfa8730) at ./vnode_if.h:54 #38 0xffffffff811c5b64 in lookup (ndp=0xfffffe085cfa86a8) at /usr/src/sys/kern/vfs_lookup.c:886 #39 0xffffffff811c4aa2 in namei (ndp=0xfffffe085cfa86a8) at /usr/src/sys/kern/vfs_lookup.c:448 #40 0xffffffff810050f0 in do_execve (td=0xfffff80063ba1000, args=0xfffffe085cfa8838, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:446 #41 0xffffffff810047fa in kern_execve (td=0xfffff80063ba1000, args=0xfffffe085cfa8838, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:347 #42 0xffffffff810041e2 in sys_execve (td=0xfffff80063ba1000, uap=0xfffff80063ba1538) at /usr/src/sys/kern/kern_exec.c:221 #43 0xffffffff817da5ed in syscallenter (td=0xfffff80063ba1000) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:131 #44 0xffffffff817d9d0b in amd64_syscall (td=0xfffff80063ba1000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:903 #45 #46 0x0000000800d5285a in ?? () Backtrace stopped: Cannot access memory at address 0x7fffffffe7d8 (kgdb) -- G. 
Paul Ziemba FreeBSD unix: 9:31PM up 24 mins, 3 users, load averages: 0.10, 0.22, 0.25 From owner-freebsd-stable@freebsd.org Sat Jul 22 05:42:51 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1A40ACFC2EA for ; Sat, 22 Jul 2017 05:42:51 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id CFA786D1BF for ; Sat, 22 Jul 2017 05:42:50 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id v6M5ggtP052112; Fri, 21 Jul 2017 22:42:46 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201707220542.v6M5ggtP052112@gw.catspoiler.org> Date: Fri, 21 Jul 2017 22:42:42 -0700 (PDT) From: Don Lewis Subject: Re: stable/11 r321349 crashing immediately To: pz-freebsd-stable@ziemba.us cc: freebsd-stable@FreeBSD.org In-Reply-To: <20170722045140.GA5680@hairball.ziemba.us> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 05:42:51 -0000 On 21 Jul, G. Paul Ziemba wrote: > GENERIC kernel r321349 results in the following about a minute after > multiuser boot completes. > > What additional information should I provide to assist in debugging? > > Many thanks! 
> > [Extracted from /var/crash/core.txt.NNN] > > KDB: stack backtrace: > #0 0xffffffff810f6ed7 at kdb_backtrace+0xa7 > #1 0xffffffff810872a9 at vpanic+0x249 > #2 0xffffffff81087060 at vpanic+0 > #3 0xffffffff817d9aca at dblfault_handler+0x10a > #4 0xffffffff817ae93c at Xdblfault+0xac > #5 0xffffffff810cf76e at cpu_search_lowest+0x35e > #6 0xffffffff810cf76e at cpu_search_lowest+0x35e > #7 0xffffffff810d5b36 at sched_lowest+0x66 > #8 0xffffffff810d1d92 at sched_pickcpu+0x522 > #9 0xffffffff810d2b03 at sched_add+0xd3 > #10 0xffffffff8101df5c at intr_event_schedule_thread+0x18c > #11 0xffffffff8101ddb0 at swi_sched+0xa0 > #12 0xffffffff81261643 at netisr_queue_internal+0x1d3 > #13 0xffffffff81261212 at netisr_queue_src+0x92 > #14 0xffffffff81261677 at netisr_queue+0x27 > #15 0xffffffff8123da5a at if_simloop+0x20a > #16 0xffffffff8123d83b at looutput+0x22b > #17 0xffffffff8131c4c6 at ip_output+0x1aa6 > > doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 > 298 dumptid = curthread->td_tid; > (kgdb) #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 > #1 0xffffffff810867e8 in kern_reboot (howto=260) > at /usr/src/sys/kern/kern_shutdown.c:366 > #2 0xffffffff810872ff in vpanic (fmt=0xffffffff81e5f7e0 "double fault", > ap=0xfffffe0839778ec0) at /usr/src/sys/kern/kern_shutdown.c:759 > #3 0xffffffff81087060 in panic (fmt=0xffffffff81e5f7e0 "double fault") > at /usr/src/sys/kern/kern_shutdown.c:690 > #4 0xffffffff817d9aca in dblfault_handler (frame=0xfffffe0839778f40) > at /usr/src/sys/amd64/amd64/trap.c:828 > #5 > #6 0xffffffff810cf422 in cpu_search_lowest ( > cg=0xffffffff826ccd98 , > low= ff8>) at /usr/src/sys/kern/sched_ule.c:782 > #7 0xffffffff810cf76e in cpu_search (cg=0xffffffff826cccb8 , > low=0xfffffe085cfa53b8, high=0x0, match=1) > at /usr/src/sys/kern/sched_ule.c:710 > #8 cpu_search_lowest (cg=0xffffffff826cccb8 , > low=0xfffffe085cfa53b8) at /usr/src/sys/kern/sched_ule.c:783 > #9 0xffffffff810cf76e in cpu_search (cg=0xffffffff826ccc80 , > 
low=0xfffffe085cfa5430, high=0x0, match=1) > at /usr/src/sys/kern/sched_ule.c:710 > #10 cpu_search_lowest (cg=0xffffffff826ccc80 , low=0xfffffe085cfa5430) > at /usr/src/sys/kern/sched_ule.c:783 > #11 0xffffffff810d5b36 in sched_lowest (cg=0xffffffff826ccc80 , > mask=..., pri=28, maxload=2147483647, prefer=4) > at /usr/src/sys/kern/sched_ule.c:815 > #12 0xffffffff810d1d92 in sched_pickcpu (td=0xfffff8000a3a9000, flags=4) > at /usr/src/sys/kern/sched_ule.c:1292 > #13 0xffffffff810d2b03 in sched_add (td=0xfffff8000a3a9000, flags=4) > at /usr/src/sys/kern/sched_ule.c:2447 > #14 0xffffffff8101df5c in intr_event_schedule_thread (ie=0xfffff80007e7ae00) > at /usr/src/sys/kern/kern_intr.c:917 > #15 0xffffffff8101ddb0 in swi_sched (cookie=0xfffff8000a386880, flags=0) > at /usr/src/sys/kern/kern_intr.c:1163 > #16 0xffffffff81261643 in netisr_queue_internal (proto=1, > m=0xfffff80026d00500, cpuid=0) at /usr/src/sys/net/netisr.c:1022 > #17 0xffffffff81261212 in netisr_queue_src (proto=1, source=0, > m=0xfffff80026d00500) at /usr/src/sys/net/netisr.c:1056 > #18 0xffffffff81261677 in netisr_queue (proto=1, m=0xfffff80026d00500) > at /usr/src/sys/net/netisr.c:1069 > #19 0xffffffff8123da5a in if_simloop (ifp=0xfffff800116eb000, > m=0xfffff80026d00500, af=2, hlen=0) at /usr/src/sys/net/if_loop.c:358 > #20 0xffffffff8123d83b in looutput (ifp=0xfffff800116eb000, > m=0xfffff80026d00500, dst=0xfffff80026ed6550, ro=0xfffff80026ed6530) > at /usr/src/sys/net/if_loop.c:265 > #21 0xffffffff8131c4c6 in ip_output (m=0xfffff80026d00500, opt=0x0, > ro=0xfffff80026ed6530, flags=0, imo=0x0, inp=0xfffff80026ed63a0) > at /usr/src/sys/netinet/ip_output.c:655 > #22 0xffffffff8142e1c7 in tcp_output (tp=0xfffff80026eb2820) > at /usr/src/sys/netinet/tcp_output.c:1447 > #23 0xffffffff81447700 in tcp_usr_send (so=0xfffff80011ec2360, flags=0, > m=0xfffff80026d14d00, nam=0x0, control=0x0, td=0xfffff80063ba1000) > at /usr/src/sys/netinet/tcp_usrreq.c:967 > #24 0xffffffff811776f1 in sosend_generic 
(so=0xfffff80011ec2360, addr=0x0, > uio=0x0, top=0xfffff80026d14d00, control=0x0, flags=0, > td=0xfffff80063ba1000) at /usr/src/sys/kern/uipc_socket.c:1360 > #25 0xffffffff811779bd in sosend (so=0xfffff80011ec2360, addr=0x0, uio=0x0, > top=0xfffff80026d14d00, control=0x0, flags=0, td=0xfffff80063ba1000) > at /usr/src/sys/kern/uipc_socket.c:1405 > #26 0xffffffff815276a2 in clnt_vc_call (cl=0xfffff80063ca0980, > ext=0xfffffe085cfa6e38, proc=4, args=0xfffff80026c3bc00, > resultsp=0xfffffe085cfa7110, utimeout=...) > at /usr/src/sys/rpc/clnt_vc.c:413 > #27 0xffffffff8152391c in clnt_reconnect_call (cl=0xfffff80063ca0c00, > ext=0xfffffe085cfa6e38, proc=4, args=0xfffff80026c3bc00, > resultsp=0xfffffe085cfa7110, utimeout=...) > at /usr/src/sys/rpc/clnt_rc.c:271 > #28 0xffffffff80e75628 in newnfs_request (nd=0xfffffe085cfa7110, > nmp=0xfffff80007e79c00, clp=0x0, nrp=0xfffff80007e79d28, > vp=0xfffff80011d9b588, td=0xfffff80063ba1000, cred=0xfffff800118c0100, > prog=100003, vers=3, retsum=0x0, toplevel=1, xidp=0x0, dssep=0x0) > at /usr/src/sys/fs/nfs/nfs_commonkrpc.c:760 > #29 0xffffffff80ee87f1 in nfscl_request (nd=0xfffffe085cfa7110, > vp=0xfffff80011d9b588, p=0xfffff80063ba1000, cred=0xfffff800118c0100, > stuff=0x0) at /usr/src/sys/fs/nfsclient/nfs_clport.c:952 > #30 0xffffffff80ea865c in nfsrpc_accessrpc (vp=0xfffff80011d9b588, mode=63, > cred=0xfffff800118c0100, p=0xfffff80063ba1000, nap=0xfffffe085cfa72e0, > attrflagp=0xfffffe085cfa73c0, rmodep=0xfffffe085cfa73b4, stuff=0x0) > at /usr/src/sys/fs/nfsclient/nfs_clrpcops.c:243 > #31 0xffffffff80ed9ec9 in nfs34_access_otw (vp=0xfffff80011d9b588, wmode=63, > td=0xfffff80063ba1000, cred=0xfffff800118c0100, > retmode=0xfffffe085cfa7540) at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:283 > #32 0xffffffff80ecfb64 in nfs_access (ap=0xfffffe085cfa75f8) > at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:426 > #33 0xffffffff81a539d4 in VOP_ACCESS_APV ( > vop=0xffffffff822ff8b8 , a=0xfffffe085cfa75f8) > at vnode_if.c:601 > #34 
0xffffffff80eda726 in VOP_ACCESS (vp=0xfffff80011d9b588, accmode=64, > cred=0xfffff800118c0100, td=0xfffff80063ba1000) at ./vnode_if.h:254 > #35 0xffffffff80ecb925 in nfs_lookup (ap=0xfffffe085cfa7cf8) > at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:1064 > #36 0xffffffff81a52a44 in VOP_LOOKUP_APV ( > vop=0xffffffff822ff8b8 , a=0xfffffe085cfa7cf8) > at vnode_if.c:127 > #37 0xffffffff811c6aad in VOP_LOOKUP (dvp=0xfffff80011d9b588, > vpp=0xfffffe085cfa8708, cnp=0xfffffe085cfa8730) at ./vnode_if.h:54 > #38 0xffffffff811c5b64 in lookup (ndp=0xfffffe085cfa86a8) > at /usr/src/sys/kern/vfs_lookup.c:886 > #39 0xffffffff811c4aa2 in namei (ndp=0xfffffe085cfa86a8) > at /usr/src/sys/kern/vfs_lookup.c:448 > #40 0xffffffff810050f0 in do_execve (td=0xfffff80063ba1000, > args=0xfffffe085cfa8838, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:446 > #41 0xffffffff810047fa in kern_execve (td=0xfffff80063ba1000, > args=0xfffffe085cfa8838, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:347 > #42 0xffffffff810041e2 in sys_execve (td=0xfffff80063ba1000, > uap=0xfffff80063ba1538) at /usr/src/sys/kern/kern_exec.c:221 > #43 0xffffffff817da5ed in syscallenter (td=0xfffff80063ba1000) > at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:131 > #44 0xffffffff817d9d0b in amd64_syscall (td=0xfffff80063ba1000, traced=0) > at /usr/src/sys/amd64/amd64/trap.c:903 > #45 > #46 0x0000000800d5285a in ?? () > Backtrace stopped: Cannot access memory at address 0x7fffffffe7d8 > (kgdb) The double fault is a pretty good indication that you overflowed the kernel stack. Having ~40 frames on the stack when the fault happened is consistent with that. It looks like you are trying to execute a program from an NFS file system that is exported by the same host. This isn't exactly optimal ... Your best bet for a quick workaround for the stack overflow would be to rebuild the kernel with a larger value of KSTACK_PAGES. You can find the default in /usr/src/sys//conf/NOTES.
It would probably be a good idea to compute the differences in the stack pointer values between adjacent stack frames to see if any of them are consuming an excessive amount of stack space. From owner-freebsd-stable@freebsd.org Sat Jul 22 05:51:12 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 711D5CFC4BB for ; Sat, 22 Jul 2017 05:51:12 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from hz.grosbein.net (hz.grosbein.net [78.47.246.247]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hz.grosbein.net", Issuer "hz.grosbein.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 0CB2E6D4D6; Sat, 22 Jul 2017 05:51:11 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from eg.sd.rdtc.ru (root@eg.sd.rdtc.ru [62.231.161.221]) by hz.grosbein.net (8.15.2/8.15.2) with ESMTPS id v6M5p5cB026390 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 22 Jul 2017 07:51:06 +0200 (CEST) (envelope-from eugen@grosbein.net) X-Envelope-From: eugen@grosbein.net X-Envelope-To: truckman@FreeBSD.org Received: from [10.58.0.4] (dadv@[10.58.0.4]) by eg.sd.rdtc.ru (8.15.2/8.15.2) with ESMTPS id v6M5p17Q058876 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Sat, 22 Jul 2017 12:51:01 +0700 (+07) (envelope-from eugen@grosbein.net) Subject: Re: stable/11 r321349 crashing immediately To: Don Lewis , pz-freebsd-stable@ziemba.us References: <201707220542.v6M5ggtP052112@gw.catspoiler.org> Cc: freebsd-stable@FreeBSD.org From: Eugene Grosbein Message-ID: <5972E7C5.6070102@grosbein.net> Date: Sat, 22 Jul 2017 12:51:01 +0700 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2 MIME-Version: 1.0 In-Reply-To: <201707220542.v6M5ggtP052112@gw.catspoiler.org> Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_00, DATE_IN_FUTURE_96_Q, LOCAL_FROM autolearn=no autolearn_force=no version=3.4.1 X-Spam-Report: * 3.3 DATE_IN_FUTURE_96_Q Date: is 4 days to 4 months after Received: date * -2.3 BAYES_00 BODY: Bayes spam probability is 0 to 1% * [score: 0.0000] * 2.6 LOCAL_FROM From my domains X-Spam-Level: *** X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on hz.grosbein.net X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 05:51:12 -0000 22.07.2017 12:42, Don Lewis wrote: > The double fault is a pretty good indication that you overflowed the > kernel stack. Having ~40 frames on the stack when the fault happened is > consistent with that. > > It looks like you are trying to execute a program from an NFS file > system that is exported by the same host. This isn't exactly optimal > ... > > Your best bet for a quick workaround for the stack overflow would be to > rebuild the kernel with a larger value of KSTACK_PAGES. You can find > teh default in /usr/src/sys//conf/NOTES. > > It would probably be a good idea to compute the differences in the stack > pointer values between adjacent stack frames to see of any of them are > consuming an excessive amount of stack space. 
Also, there is https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219476 Eugene Grosbein From owner-freebsd-stable@freebsd.org Sat Jul 22 07:00:53 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 22DC8CFD684 for ; Sat, 22 Jul 2017 07:00:53 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id ACCCC6EF26; Sat, 22 Jul 2017 07:00:52 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id v6M70hgL036417 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 22 Jul 2017 10:00:43 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua v6M70hgL036417 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id v6M70gd2036415; Sat, 22 Jul 2017 10:00:42 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Jul 2017 10:00:42 +0300 From: Konstantin Belousov To: Don Lewis Cc: pz-freebsd-stable@ziemba.us, freebsd-stable@FreeBSD.org Subject: Re: stable/11 r321349 crashing immediately Message-ID: <20170722070042.GO1935@kib.kiev.ua> References: <20170722045140.GA5680@hairball.ziemba.us> <201707220542.v6M5ggtP052112@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201707220542.v6M5ggtP052112@gw.catspoiler.org> User-Agent: Mutt/1.8.3 (2017-05-23) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no 
autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 07:00:53 -0000 On Fri, Jul 21, 2017 at 10:42:42PM -0700, Don Lewis wrote: > Your best bet for a quick workaround for the stack overflow would be to > rebuild the kernel with a larger value of KSTACK_PAGES. You can find > teh default in /usr/src/sys//conf/NOTES. Or set the tunable kern.kstack_pages to the desired number of pages from the loader prompt. From owner-freebsd-stable@freebsd.org Sat Jul 22 07:05:35 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0A0B8CFD88C for ; Sat, 22 Jul 2017 07:05:35 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9649B6F358; Sat, 22 Jul 2017 07:05:34 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id v6M75TNQ037409 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 22 Jul 2017 10:05:29 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua v6M75TNQ037409 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id v6M75T2G037407; Sat, 22 Jul 2017 10:05:29 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Jul 2017 10:05:29 +0300 From: Konstantin Belousov 
To: Eugene Grosbein Cc: Don Lewis , pz-freebsd-stable@ziemba.us, freebsd-stable@FreeBSD.org Subject: Re: stable/11 r321349 crashing immediately Message-ID: <20170722070529.GP1935@kib.kiev.ua> References: <201707220542.v6M5ggtP052112@gw.catspoiler.org> <5972E7C5.6070102@grosbein.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5972E7C5.6070102@grosbein.net> User-Agent: Mutt/1.8.3 (2017-05-23) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 07:05:35 -0000 On Sat, Jul 22, 2017 at 12:51:01PM +0700, Eugene Grosbein wrote: > Also, there is https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219476 I strongly disagree with the idea of increasing the default kernel stack size; it will cause systematic problems for all users instead of the current state where some workloads are problematic. Finding contiguous KVA ranges for larger stacks on KVA-starved architectures is not going to work. The real solution is to move allocations from stack to heap, one by one. You claimed that vm/vm_object.o consumes 1.5K of stack; can you show the ddb backtrace of this situation?
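
As an illustration of the "move allocations from stack to heap" idea in the message above, here is a minimal userland sketch in C. It is not taken from the FreeBSD tree: the function names, the SCRATCH_SIZE constant, and the byte-counting task are all hypothetical, and real kernel code would use malloc(9) with a malloc type rather than the libc malloc() shown here. It only contrasts a function that keeps a large scratch buffer in its own stack frame with an equivalent one that borrows the buffer from the heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scratch size, chosen to echo the 1.5K figure in the thread. */
#define SCRATCH_SIZE 1536

/*
 * Before: the scratch buffer is a local array, so each call adds about
 * SCRATCH_SIZE bytes to the stack frame. In a deep call chain these
 * per-frame costs add up.
 */
size_t
count_nonzero_stack(const unsigned char *src, size_t len)
{
	unsigned char scratch[SCRATCH_SIZE];
	size_t i, n;

	assert(len <= SCRATCH_SIZE);
	memcpy(scratch, src, len);
	for (n = 0, i = 0; i < len; i++)
		if (scratch[i] != 0)
			n++;
	return (n);
}

/*
 * After: the same logic with the buffer on the heap. The frame now holds
 * only a pointer and two counters. Returns 0 on success and stores the
 * count in *out; returns -1 if len is too large or the allocation fails.
 */
int
count_nonzero_heap(const unsigned char *src, size_t len, size_t *out)
{
	unsigned char *scratch;
	size_t i, n;

	if (len > SCRATCH_SIZE)
		return (-1);
	scratch = malloc(SCRATCH_SIZE);
	if (scratch == NULL)
		return (-1);
	memcpy(scratch, src, len);
	for (n = 0, i = 0; i < len; i++)
		if (scratch[i] != 0)
			n++;
	free(scratch);
	*out = n;
	return (0);
}
```

The heap variant trades stack depth for an allocation that can fail, so its callers need an error path that the stack variant did not require.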
From owner-freebsd-stable@freebsd.org Sat Jul 22 07:41:24 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A3039CFE20A for ; Sat, 22 Jul 2017 07:41:24 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from hz.grosbein.net (hz.grosbein.net [78.47.246.247]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hz.grosbein.net", Issuer "hz.grosbein.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 3A2FC7020C; Sat, 22 Jul 2017 07:41:23 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from eg.sd.rdtc.ru (root@eg.sd.rdtc.ru [62.231.161.221]) by hz.grosbein.net (8.15.2/8.15.2) with ESMTPS id v6M7fEB6027054 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 22 Jul 2017 09:41:15 +0200 (CEST) (envelope-from eugen@grosbein.net) X-Envelope-From: eugen@grosbein.net X-Envelope-To: kostikbel@gmail.com Received: from [10.58.0.4] ([10.58.0.4]) by eg.sd.rdtc.ru (8.15.2/8.15.2) with ESMTPS id v6M7f4tI091237 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Sat, 22 Jul 2017 14:41:04 +0700 (+07) (envelope-from eugen@grosbein.net) Subject: Re: stable/11 r321349 crashing immediately To: Konstantin Belousov References: <201707220542.v6M5ggtP052112@gw.catspoiler.org> <5972E7C5.6070102@grosbein.net> <20170722070529.GP1935@kib.kiev.ua> Cc: Don Lewis , pz-freebsd-stable@ziemba.us, freebsd-stable@FreeBSD.org From: Eugene Grosbein Message-ID: <5973018B.2050505@grosbein.net> Date: Sat, 22 Jul 2017 14:40:59 +0700 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2 MIME-Version: 1.0 In-Reply-To: <20170722070529.GP1935@kib.kiev.ua> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_00,LOCAL_FROM autolearn=no autolearn_force=no 
version=3.4.1 X-Spam-Report: * -2.3 BAYES_00 BODY: Bayes spam probability is 0 to 1% * [score: 0.0000] * 2.6 LOCAL_FROM From my domains X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on hz.grosbein.net X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 07:41:24 -0000 22.07.2017 14:05, Konstantin Belousov wrote: > On Sat, Jul 22, 2017 at 12:51:01PM +0700, Eugene Grosbein wrote: >> Also, there is https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219476 > > I strongly disagree with the idea of increasing the default kernel > stack size, it will cause systematic problems for all users instead of > current state where some workloads are problematic. Finding contig > KVA ranges for larger stacks on KVA-starved architectures is not going > to work. My practice shows that increasing the default kernel stack size for an i386 system using IPSEC and ZFS with compression and KVA_PAGES=512/KSTACK_PAGES=4 does work. No stack-related problems have been observed with such parameters. On the contrary, problems quickly arise if one does not increase the default kernel stack size for such an i386 system. I have used several such systems for years. We have src/UPDATING entries 20121223 and 20150728 stating the same. Those are linked from the Errata Notes of every release since 10.2 as open issues. How many releases are we going to keep this "open"? Also, I've always wondered what load pattern one should have to exhibit real kernel stack problems due to KVA fragmentation and KSTACK_PAGES>2 on i386? > The real solution is to move allocations from stack to heap, one by one. That has not been done since 10.2-RELEASE, and I see that this is only getting worse. > You claimed that vm/vm_object.o consumes 1.5K of stack, can you show > the ddb backtrace of this situation ?
These data were collected with machine object code inspection and only some of numbers were verified by hand. I admit there may be some false positives. How can I get ddb backtrace you asked for? I'm not very familiar with ddb. I have serial console to such i386 system. From owner-freebsd-stable@freebsd.org Sat Jul 22 08:00:18 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7C3D8CFE8A4 for ; Sat, 22 Jul 2017 08:00:18 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1462170842; Sat, 22 Jul 2017 08:00:17 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id v6M80CV2049307 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 22 Jul 2017 11:00:12 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua v6M80CV2049307 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id v6M80CqY049305; Sat, 22 Jul 2017 11:00:12 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Jul 2017 11:00:12 +0300 From: Konstantin Belousov To: Eugene Grosbein Cc: Don Lewis , pz-freebsd-stable@ziemba.us, freebsd-stable@FreeBSD.org Subject: Re: stable/11 r321349 crashing immediately Message-ID: <20170722080012.GR1935@kib.kiev.ua> References: <201707220542.v6M5ggtP052112@gw.catspoiler.org> <5972E7C5.6070102@grosbein.net> <20170722070529.GP1935@kib.kiev.ua> <5973018B.2050505@grosbein.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline In-Reply-To: <5973018B.2050505@grosbein.net> User-Agent: Mutt/1.8.3 (2017-05-23) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 08:00:18 -0000 On Sat, Jul 22, 2017 at 02:40:59PM +0700, Eugene Grosbein wrote: > Also, I've always wondered what load pattern one should have > to exhibit real kernel stack problems due to KVA fragmentation > and KSTACK_PAGES>2 on i386? In fact each stack consumes 3 contiguous pages because there is also the guard, which catches the double faults. You need to use the machine, e.g. to run something that creates and destroys kernel threads, while doing something that consumes kernel_arena KVA. Plain malloc/zmalloc is enough. In other words, any non-static load would cause fragmentation preventing allocations of the kernel stacks for new threads. > How can I get ddb backtrace you asked for? I'm not very familiar with ddb. > I have serial console to such i386 system. The bt command for the given thread provides the backtrace. I have no idea how you obtained the numbers that you show.
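The fragmentation argument above can be illustrated with a toy model, a sketch only and not how the kernel's KVA arena is actually implemented. A byte array stands in for a range of pages (0 = free, 1 = in use), and a stack of KSTACK_PAGES pages plus one guard page needs that many contiguous free pages. The function names `longest_free_run`, `free_pages` and `can_place_kstack` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of the longest run of free (zero) pages in map[0..n). */
size_t
longest_free_run(const unsigned char *map, size_t n)
{
	size_t best = 0, cur = 0;

	assert(map != NULL);
	for (size_t i = 0; i < n; i++) {
		if (map[i] == 0) {
			cur++;
			if (cur > best)
				best = cur;
		} else {
			cur = 0;
		}
	}
	return (best);
}

/* Count free pages regardless of where they are. */
size_t
free_pages(const unsigned char *map, size_t n)
{
	size_t count = 0;

	assert(map != NULL);
	for (size_t i = 0; i < n; i++) {
		if (map[i] == 0)
			count++;
	}
	return (count);
}

/*
 * A kernel stack of stack_pages pages plus one guard page fits only if
 * there is a contiguous free run of stack_pages + 1 pages.
 */
int
can_place_kstack(const unsigned char *map, size_t n, size_t stack_pages)
{
	return (longest_free_run(map, n) >= stack_pages + 1);
}
```

With the map {1,0,1,0,1,0,1,0}, four of eight pages are free, yet a 2-page stack with its guard cannot be placed because no free run is longer than one page. With {1,1,1,1,0,0,0,0} the same number of free pages is sufficient.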
From owner-freebsd-stable@freebsd.org Sat Jul 22 08:37:40 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 64148CFF277 for ; Sat, 22 Jul 2017 08:37:40 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from hz.grosbein.net (hz.grosbein.net [78.47.246.247]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hz.grosbein.net", Issuer "hz.grosbein.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id E8D6A7189E; Sat, 22 Jul 2017 08:37:39 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from eg.sd.rdtc.ru (root@eg.sd.rdtc.ru [62.231.161.221]) by hz.grosbein.net (8.15.2/8.15.2) with ESMTPS id v6M8bYA8027416 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 22 Jul 2017 10:37:35 +0200 (CEST) (envelope-from eugen@grosbein.net) X-Envelope-From: eugen@grosbein.net X-Envelope-To: kostikbel@gmail.com Received: from eg.sd.rdtc.ru (eugen@localhost [127.0.0.1]) by eg.sd.rdtc.ru (8.15.2/8.15.2) with ESMTP id v6M8bU0H007977; Sat, 22 Jul 2017 15:37:30 +0700 (+07) (envelope-from eugen@grosbein.net) Subject: Re: stable/11 r321349 crashing immediately To: Konstantin Belousov References: <201707220542.v6M5ggtP052112@gw.catspoiler.org> <5972E7C5.6070102@grosbein.net> <20170722070529.GP1935@kib.kiev.ua> <5973018B.2050505@grosbein.net> <20170722080012.GR1935@kib.kiev.ua> Cc: Don Lewis , pz-freebsd-stable@ziemba.us, freebsd-stable@FreeBSD.org From: Eugene Grosbein X-Enigmail-Draft-Status: N1110 Message-ID: <59730ECA.7030309@grosbein.net> Date: Sat, 22 Jul 2017 15:37:30 +0700 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: <20170722080012.GR1935@kib.kiev.ua> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_00, 
DATE_IN_FUTURE_96_Q, LOCAL_FROM autolearn=no autolearn_force=no version=3.4.1 X-Spam-Report: * 3.3 DATE_IN_FUTURE_96_Q Date: is 4 days to 4 months after Received: date * -2.3 BAYES_00 BODY: Bayes spam probability is 0 to 1% * [score: 0.0000] * 2.6 LOCAL_FROM From my domains X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on hz.grosbein.net X-Spam-Level: *** X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 08:37:40 -0000 On 22.07.2017 15:00, Konstantin Belousov wrote: > On Sat, Jul 22, 2017 at 02:40:59PM +0700, Eugene Grosbein wrote: >> Also, I've always wondered what load pattern one should have >> to exhibit real kernel stack problems due to KVA fragmentation >> and KSTACK_PAGES>2 on i386? > In fact each stack consumes 3 contigous pages because there is also > the guard, which catches the double faults. > > You need to use the machine, e.g. to run something that creates and destroys > kernel threads, while doing something that consumes kernel_arena KVA. > Plain malloc/zmalloc is enough. Does an i386 box running a PPPoE connection to an ISP (mpd5) plus several IPSEC tunnels plus a PPtP tunnel plus a WiFi access point plus the "transmission" torrent client with a 2TB UFS volume over GEOM_CACHE over GEOM_JOURNAL over USB qualify? There are ospfd, racoon, sendmail, ssh and several periodic cron jobs too. > In other words, any non-static load would cause fragmentation preventing > allocations of the kernel stacks for new threads. > >> How can I get ddb backtrace you asked for? I'm not very familiar with ddb. >> I have serial console to such i386 system. > > bt command for the given thread provides the backtrace. I have no idea > how did you obtained the numbers that you show. Not sure which kernel thread I ought to trace...
If you just need a name of the function: $ objdump -d vm_object.o | grep -B 8 'sub .*0x...,%esp' |less 00003b30 : 3b30: 55 push %ebp 3b31: 89 e5 mov %esp,%ebp 3b33: 53 push %ebx 3b34: 57 push %edi 3b35: 56 push %esi 3b36: 83 e4 f8 and $0xfffffff8,%esp 3b39: 81 ec 30 05 00 00 sub $0x530,%esp It uses stack for pretty large struct kinfo_vmobject (includes char kvo_path[PATH_MAX]) and several others. From owner-freebsd-stable@freebsd.org Sat Jul 22 09:57:32 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A6052D7C109 for ; Sat, 22 Jul 2017 09:57:32 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 34E797416F; Sat, 22 Jul 2017 09:57:31 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id v6M9vQJU075133 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 22 Jul 2017 12:57:26 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua v6M9vQJU075133 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id v6M9vPkh075132; Sat, 22 Jul 2017 12:57:25 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Jul 2017 12:57:25 +0300 From: Konstantin Belousov To: Eugene Grosbein Cc: Don Lewis , pz-freebsd-stable@ziemba.us, freebsd-stable@FreeBSD.org Subject: Re: stable/11 r321349 crashing immediately Message-ID: <20170722095725.GS1935@kib.kiev.ua> References: <201707220542.v6M5ggtP052112@gw.catspoiler.org> <5972E7C5.6070102@grosbein.net> 
<20170722070529.GP1935@kib.kiev.ua> <5973018B.2050505@grosbein.net> <20170722080012.GR1935@kib.kiev.ua> <59730ECA.7030309@grosbein.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <59730ECA.7030309@grosbein.net> User-Agent: Mutt/1.8.3 (2017-05-23) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 09:57:32 -0000 On Sat, Jul 22, 2017 at 03:37:30PM +0700, Eugene Grosbein wrote: > On 22.07.2017 15:00, Konstantin Belousov wrote: > > On Sat, Jul 22, 2017 at 02:40:59PM +0700, Eugene Grosbein wrote: > >> Also, I've always wondered what load pattern one should have > >> to exhibit real kernel stack problems due to KVA fragmentation > >> and KSTACK_PAGES>2 on i386? > > In fact each stack consumes 3 contigous pages because there is also > > the guard, which catches the double faults. > > > > You need to use the machine, e.g. to run something that creates and destroys > > kernel threads, while doing something that consumes kernel_arena KVA. > > Plain malloc/zmalloc is enough. > > Does an i386 box running PPPoE connection to an ISP (mpd5) plus several > IPSEC tunnels plus PPtP tunnel plus WiFi access point plus > "transmission" torrent client with 2TB UFS volume over GEOM_CACHE > over GEOM_JOURNAL over USB qualify? There are ospfd, racoon, > sendmail, ssh and several periodic cron jobs too. I doubt that any tunnels activity causes creation and destruction of threads. Same for hostapd or routing daemons or UFS over really weird geom classes. 
Sendmail and cron do cause process creation, but the processing overhead of these programs typically prevents a high turnover of new processes. No idea about your torrent client. From this description, I would not even be surprised if your machine's load fits into the kstacks cache, despite the cache's quite conservative settings. In other words, your machine is almost definitely not representative of the problematic load. Something that creates a lot of short- and medium-lived threads would be. > > > In other words, any non-static load would cause fragmentation preventing > > allocations of the kernel stacks for new threads. > > > >> How can I get ddb backtrace you asked for? I'm not very familiar with ddb. > >> I have serial console to such i386 system. > > > > bt command for the given thread provides the backtrace. I have no idea > > how did you obtained the numbers that you show. > > Not sure what kernel thread I too to trace... If you just need a name of the function: > > $ objdump -d vm_object.o | grep -B 8 'sub .*0x...,%esp' |less > > 00003b30 : > 3b30: 55 push %ebp > 3b31: 89 e5 mov %esp,%ebp > 3b33: 53 push %ebx > 3b34: 57 push %edi > 3b35: 56 push %esi > 3b36: 83 e4 f8 and $0xfffffff8,%esp > 3b39: 81 ec 30 05 00 00 sub $0x530,%esp > > It uses stack for pretty large struct kinfo_vmobject (includes char kvo_path[PATH_MAX]) > and several others. I see. That is enough information to address your observation for vm_object.o. The patch below reduces the frame size for sysctl_vm_object_list from 1.3K to 200 bytes. This function is only executed on an explicit user query.
diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c index 6c6137d5fb2..b92d31c3e60 100644 --- a/sys/vm/vm_object.c +++ b/sys/vm/vm_object.c @@ -2275,7 +2315,7 @@ vm_object_vnode(vm_object_t object) static int sysctl_vm_object_list(SYSCTL_HANDLER_ARGS) { - struct kinfo_vmobject kvo; + struct kinfo_vmobject *kvo; char *fullpath, *freepath; struct vnode *vp; struct vattr va; @@ -2300,6 +2340,7 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS) count * 11 / 10)); } + kvo = malloc(sizeof(*kvo), M_TEMP, M_WAITOK); error = 0; /* @@ -2317,13 +2358,13 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS) continue; } mtx_unlock(&vm_object_list_mtx); - kvo.kvo_size = ptoa(obj->size); - kvo.kvo_resident = obj->resident_page_count; - kvo.kvo_ref_count = obj->ref_count; - kvo.kvo_shadow_count = obj->shadow_count; - kvo.kvo_memattr = obj->memattr; - kvo.kvo_active = 0; - kvo.kvo_inactive = 0; + kvo->kvo_size = ptoa(obj->size); + kvo->kvo_resident = obj->resident_page_count; + kvo->kvo_ref_count = obj->ref_count; + kvo->kvo_shadow_count = obj->shadow_count; + kvo->kvo_memattr = obj->memattr; + kvo->kvo_active = 0; + kvo->kvo_inactive = 0; TAILQ_FOREACH(m, &obj->memq, listq) { /* * A page may belong to the object but be @@ -2335,46 +2376,46 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS) * approximation of the system anyway. 
*/ if (vm_page_active(m)) - kvo.kvo_active++; + kvo->kvo_active++; else if (vm_page_inactive(m)) - kvo.kvo_inactive++; + kvo->kvo_inactive++; } - kvo.kvo_vn_fileid = 0; - kvo.kvo_vn_fsid = 0; - kvo.kvo_vn_fsid_freebsd11 = 0; + kvo->kvo_vn_fileid = 0; + kvo->kvo_vn_fsid = 0; + kvo->kvo_vn_fsid_freebsd11 = 0; freepath = NULL; fullpath = ""; vp = NULL; switch (obj->type) { case OBJT_DEFAULT: - kvo.kvo_type = KVME_TYPE_DEFAULT; + kvo->kvo_type = KVME_TYPE_DEFAULT; break; case OBJT_VNODE: - kvo.kvo_type = KVME_TYPE_VNODE; + kvo->kvo_type = KVME_TYPE_VNODE; vp = obj->handle; vref(vp); break; case OBJT_SWAP: - kvo.kvo_type = KVME_TYPE_SWAP; + kvo->kvo_type = KVME_TYPE_SWAP; break; case OBJT_DEVICE: - kvo.kvo_type = KVME_TYPE_DEVICE; + kvo->kvo_type = KVME_TYPE_DEVICE; break; case OBJT_PHYS: - kvo.kvo_type = KVME_TYPE_PHYS; + kvo->kvo_type = KVME_TYPE_PHYS; break; case OBJT_DEAD: - kvo.kvo_type = KVME_TYPE_DEAD; + kvo->kvo_type = KVME_TYPE_DEAD; break; case OBJT_SG: - kvo.kvo_type = KVME_TYPE_SG; + kvo->kvo_type = KVME_TYPE_SG; break; case OBJT_MGTDEVICE: - kvo.kvo_type = KVME_TYPE_MGTDEVICE; + kvo->kvo_type = KVME_TYPE_MGTDEVICE; break; default: - kvo.kvo_type = KVME_TYPE_UNKNOWN; + kvo->kvo_type = KVME_TYPE_UNKNOWN; break; } VM_OBJECT_RUNLOCK(obj); @@ -2382,29 +2423,30 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS) vn_fullpath(curthread, vp, &fullpath, &freepath); vn_lock(vp, LK_SHARED | LK_RETRY); if (VOP_GETATTR(vp, &va, curthread->td_ucred) == 0) { - kvo.kvo_vn_fileid = va.va_fileid; - kvo.kvo_vn_fsid = va.va_fsid; - kvo.kvo_vn_fsid_freebsd11 = va.va_fsid; + kvo->kvo_vn_fileid = va.va_fileid; + kvo->kvo_vn_fsid = va.va_fsid; + kvo->kvo_vn_fsid_freebsd11 = va.va_fsid; /* truncate */ } vput(vp); } - strlcpy(kvo.kvo_path, fullpath, sizeof(kvo.kvo_path)); + strlcpy(kvo->kvo_path, fullpath, sizeof(kvo->kvo_path)); if (freepath != NULL) free(freepath, M_TEMP); /* Pack record size down */ - kvo.kvo_structsize = offsetof(struct kinfo_vmobject, kvo_path) + - 
strlen(kvo.kvo_path) + 1; - kvo.kvo_structsize = roundup(kvo.kvo_structsize, + kvo->kvo_structsize = offsetof(struct kinfo_vmobject, kvo_path) + + strlen(kvo->kvo_path) + 1; + kvo->kvo_structsize = roundup(kvo->kvo_structsize, sizeof(uint64_t)); - error = SYSCTL_OUT(req, &kvo, kvo.kvo_structsize); + error = SYSCTL_OUT(req, kvo, kvo->kvo_structsize); mtx_lock(&vm_object_list_mtx); if (error) break; } mtx_unlock(&vm_object_list_mtx); + free(kvo, M_TEMP); return (error); } SYSCTL_PROC(_vm, OID_AUTO, objects, CTLTYPE_STRUCT | CTLFLAG_RW | CTLFLAG_SKIP | From owner-freebsd-stable@freebsd.org Sat Jul 22 10:15:04 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D9430D7C9F5 for ; Sat, 22 Jul 2017 10:15:04 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from hz.grosbein.net (hz.grosbein.net [78.47.246.247]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hz.grosbein.net", Issuer "hz.grosbein.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7526C749E2; Sat, 22 Jul 2017 10:15:03 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from eg.sd.rdtc.ru (root@eg.sd.rdtc.ru [62.231.161.221]) by hz.grosbein.net (8.15.2/8.15.2) with ESMTPS id v6MAEq10028043 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 22 Jul 2017 12:14:52 +0200 (CEST) (envelope-from eugen@grosbein.net) X-Envelope-From: eugen@grosbein.net X-Envelope-To: kostikbel@gmail.com Received: from eg.sd.rdtc.ru (eugen@localhost [127.0.0.1]) by eg.sd.rdtc.ru (8.15.2/8.15.2) with ESMTP id v6MAElx3036128; Sat, 22 Jul 2017 17:14:48 +0700 (+07) (envelope-from eugen@grosbein.net) Subject: Re: stable/11 r321349 crashing immediately To: Konstantin Belousov References: <201707220542.v6M5ggtP052112@gw.catspoiler.org> <5972E7C5.6070102@grosbein.net> <20170722070529.GP1935@kib.kiev.ua> 
<5973018B.2050505@grosbein.net> <20170722080012.GR1935@kib.kiev.ua> <59730ECA.7030309@grosbein.net> <20170722095725.GS1935@kib.kiev.ua> Cc: Don Lewis , pz-freebsd-stable@ziemba.us, freebsd-stable@FreeBSD.org From: Eugene Grosbein X-Enigmail-Draft-Status: N1110 Message-ID: <59732597.4010603@grosbein.net> Date: Sat, 22 Jul 2017 17:14:47 +0700 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: <20170722095725.GS1935@kib.kiev.ua> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_00,LOCAL_FROM autolearn=no autolearn_force=no version=3.4.1 X-Spam-Report: * -2.3 BAYES_00 BODY: Bayes spam probability is 0 to 1% * [score: 0.0000] * 2.6 LOCAL_FROM From my domains X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on hz.grosbein.net X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 10:15:05 -0000 On 22.07.2017 16:57, Konstantin Belousov wrote: >>From this description, I would be not even surprised if your machine > load fits into the kstacks cache, despite cache' quite conservative > settings. In other words, almost definitely your machine is not > representative for the problematic load. Something that creates a lot of > short- and middle- lived threads would be. I'm having trouble to imagine a real-world task for today's i386 hardware or virtual machine involving heavy usage of many short/middle-lived threads. Perhaps, you have at least a synthetic benchmark so I could try to reproduce kstack/KVA fragmentation-related problem using my i386 hardware? It has 1G RAM, local 16G CompactFlash (13 GB unpartitioned) as ada0, over 200GB free within local IDE HDD (ada1) and mentioned 2TB USB 2.0 HDD. 
Pretty many resources for 2007 year AMD Geode system, eh? :-) It runs 11.1-PRERELEASE r318642 currently. From owner-freebsd-stable@freebsd.org Sat Jul 22 12:08:50 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EE9ECD7EFC8 for ; Sat, 22 Jul 2017 12:08:50 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from elf.hq.norma.perm.ru (mail.norma.perm.ru [IPv6:2a00:7540:1::5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.norma.perm.ru", Issuer "Vivat-Trade UNIX Root CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 5AEAF77BB9 for ; Sat, 22 Jul 2017 12:08:49 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from [IPv6:2a02:2698:26:9b0e:71b3:748:aac9:67e1] (dynamic-2a02-2698-26-0-0.perm.ertelecom.ru [IPv6:2a02:2698:26:9b0e:71b3:748:aac9:67e1] (may be forged)) (authenticated bits=0) by elf.hq.norma.perm.ru (8.15.2/8.15.2) with ESMTPSA id v6MC8hit081070 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO) for ; Sat, 22 Jul 2017 17:08:44 +0500 (YEKT) (envelope-from emz@norma.perm.ru) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=norma.perm.ru; s=key; t=1500725324; bh=aT8LS8Sm2zRXOuv3zYYf48vJ+PooRUMwEgiNZ1wZ+XY=; h=To:From:Subject:Date; b=RSl2cHVRlBMuMeOkat9oAN1Xq+HUKJz4y6hBlYPzbkNIHcl/pJMxOgvhrwB0EANNB cBQbkYjebTij/u/QlPZrmle4Tp/pZTgNdZNgZVxrSnPlJpaLPXeGMXXA3ZH5SUNsB1 t1NNSzAwIVFz5BBOyddmxGjKp2+b24JtCnYqQk7w= To: freebsd-stable From: "Eugene M. 
Zheganin" Subject: cannot destroy faulty zvol Message-ID: <73ed4a4d-a156-457a-37d3-12d7ef4f89b9@norma.perm.ru> Date: Sat, 22 Jul 2017 17:08:29 +0500 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-GB X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 12:08:51 -0000 Hi, I cannot destroy a zvol for a reason that I don't understand: [root@san1:~]# zfs list -t all | grep worker182 zfsroot/userdata/worker182-bad 1,38G 1,52T 708M - [root@san1:~]# zfs destroy -R zfsroot/userdata/worker182-bad cannot destroy 'zfsroot/userdata/worker182-bad': dataset already exists [root@san1:~]# Also notice that this zvol is faulty: pool: zfsroot state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://illumos.org/msg/ZFS-8000-8A scan: scrub in progress since Sat Jul 22 15:01:37 2017 18,7G scanned out of 130G at 75,0M/s, 0h25m to go 0 repaired, 14,43% done config: NAME STATE READ WRITE CKSUM zfsroot ONLINE 0 0 4 mirror-0 ONLINE 0 0 8 gpt/zroot0 ONLINE 0 0 8 gpt/zroot1 ONLINE 0 0 8 errors: Permanent errors have been detected in the following files: zfsroot/userdata/worker182-bad:<0x1> <0xc7>:<0x1> Is this weird error, "cannot destroy: dataset already exists", related to the fact that the zvol is faulty? Does it indicate that the metadata is probably faulty too? Anyway, is there a way to destroy this dataset? Thanks.
From owner-freebsd-stable@freebsd.org Sat Jul 22 15:56:07 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0BEC2DA4B98 for ; Sat, 22 Jul 2017 15:56:07 +0000 (UTC) (envelope-from pz-freebsd-stable@ziemba.us) Received: from osmtp.ziemba.us (osmtp.ziemba.us [208.106.105.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 622DA822C9 for ; Sat, 22 Jul 2017 15:56:06 +0000 (UTC) (envelope-from pz-freebsd-stable@ziemba.us) Received: from hairball.ziemba.us (localhost.ziemba.us [127.0.0.1]) by hairball.ziemba.us (8.15.2/8.15.2) with ESMTP id v6MFtE7I011169 for ; Sat, 22 Jul 2017 08:55:14 -0700 (PDT) (envelope-from pz-freebsd-stable@ziemba.us) Received: (from mailnull@localhost) by hairball.ziemba.us (8.15.2/8.15.2/Submit) id v6MFtEUQ011168 for freebsd-stable@FreeBSD.org; Sat, 22 Jul 2017 08:55:14 -0700 (PDT) (envelope-from pz-freebsd-stable@ziemba.us) X-Authentication-Warning: hairball.ziemba.us: mailnull set sender to pz-freebsd-stable@ziemba.us using -f Received: (from news@localhost) by usenet.ziemba.us (8.14.5/8.14.5/Submit) id v6L4rIP1007347 for treehouse-mail-freebsd-stable@hairball.ziemba.us; Thu, 20 Jul 2017 21:53:18 -0700 (PDT) (envelope-from news) From: "G. Paul Ziemba" To: freebsd-stable@FreeBSD.org Subject: Re: stable/11 r321349 crashing immediately Date: Fri, 21 Jul 2017 04:53:18 +0000 (UTC) Message-id: References: <201707220542.v6M5ggtP052112@gw.catspoiler.org> Reply-to: unp@ziemba.us Errors-to: "G. 
Paul Ziemba" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 15:56:07 -0000 truckman@freebsd.org (Don Lewis) writes: >On 21 Jul, G. Paul Ziemba wrote: >> GENERIC kernel r321349 results in the following about a minute after >> multiuser boot completes. >> >> What additional information should I provide to assist in debugging? >> >> Many thanks! >> >> [Extracted from /var/crash/core.txt.NNN] >> >> KDB: stack backtrace: >> #0 0xffffffff810f6ed7 at kdb_backtrace+0xa7 >> #1 0xffffffff810872a9 at vpanic+0x249 >> #2 0xffffffff81087060 at vpanic+0 >> #3 0xffffffff817d9aca at dblfault_handler+0x10a >> #4 0xffffffff817ae93c at Xdblfault+0xac >> #5 0xffffffff810cf76e at cpu_search_lowest+0x35e >> #6 0xffffffff810cf76e at cpu_search_lowest+0x35e >> #7 0xffffffff810d5b36 at sched_lowest+0x66 >> #8 0xffffffff810d1d92 at sched_pickcpu+0x522 >> #9 0xffffffff810d2b03 at sched_add+0xd3 >> #10 0xffffffff8101df5c at intr_event_schedule_thread+0x18c >> #11 0xffffffff8101ddb0 at swi_sched+0xa0 >> #12 0xffffffff81261643 at netisr_queue_internal+0x1d3 >> #13 0xffffffff81261212 at netisr_queue_src+0x92 >> #14 0xffffffff81261677 at netisr_queue+0x27 >> #15 0xffffffff8123da5a at if_simloop+0x20a >> #16 0xffffffff8123d83b at looutput+0x22b >> #17 0xffffffff8131c4c6 at ip_output+0x1aa6 >> >> doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 >> 298 dumptid = curthread->td_tid; >> (kgdb) #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 >> #1 0xffffffff810867e8 in kern_reboot (howto=260) >> at /usr/src/sys/kern/kern_shutdown.c:366 >> #2 0xffffffff810872ff in vpanic (fmt=0xffffffff81e5f7e0 "double fault", >> ap=0xfffffe0839778ec0) at /usr/src/sys/kern/kern_shutdown.c:759 >> #3 0xffffffff81087060 in panic (fmt=0xffffffff81e5f7e0 "double fault") >> at 
/usr/src/sys/kern/kern_shutdown.c:690 >> #4 0xffffffff817d9aca in dblfault_handler (frame=0xfffffe0839778f40) >> at /usr/src/sys/amd64/amd64/trap.c:828 >> #5 >> #6 0xffffffff810cf422 in cpu_search_lowest ( >> cg=0xffffffff826ccd98 , >> low=> ff8>) at /usr/src/sys/kern/sched_ule.c:782 >> #7 0xffffffff810cf76e in cpu_search (cg=0xffffffff826cccb8 , >> low=0xfffffe085cfa53b8, high=0x0, match=1) >> at /usr/src/sys/kern/sched_ule.c:710 >> #8 cpu_search_lowest (cg=0xffffffff826cccb8 , >> low=0xfffffe085cfa53b8) at /usr/src/sys/kern/sched_ule.c:783 >> #9 0xffffffff810cf76e in cpu_search (cg=0xffffffff826ccc80 , >> low=0xfffffe085cfa5430, high=0x0, match=1) >> at /usr/src/sys/kern/sched_ule.c:710 >> #10 cpu_search_lowest (cg=0xffffffff826ccc80 , low=0xfffffe085cfa5430) >> at /usr/src/sys/kern/sched_ule.c:783 >> #11 0xffffffff810d5b36 in sched_lowest (cg=0xffffffff826ccc80 , >> mask=..., pri=28, maxload=2147483647, prefer=4) >> at /usr/src/sys/kern/sched_ule.c:815 >> #12 0xffffffff810d1d92 in sched_pickcpu (td=0xfffff8000a3a9000, flags=4) >> at /usr/src/sys/kern/sched_ule.c:1292 >> #13 0xffffffff810d2b03 in sched_add (td=0xfffff8000a3a9000, flags=4) >> at /usr/src/sys/kern/sched_ule.c:2447 >> #14 0xffffffff8101df5c in intr_event_schedule_thread (ie=0xfffff80007e7ae00) >> at /usr/src/sys/kern/kern_intr.c:917 >> #15 0xffffffff8101ddb0 in swi_sched (cookie=0xfffff8000a386880, flags=0) >> at /usr/src/sys/kern/kern_intr.c:1163 >> #16 0xffffffff81261643 in netisr_queue_internal (proto=1, >> m=0xfffff80026d00500, cpuid=0) at /usr/src/sys/net/netisr.c:1022 >> #17 0xffffffff81261212 in netisr_queue_src (proto=1, source=0, >> m=0xfffff80026d00500) at /usr/src/sys/net/netisr.c:1056 >> #18 0xffffffff81261677 in netisr_queue (proto=1, m=0xfffff80026d00500) >> at /usr/src/sys/net/netisr.c:1069 >> #19 0xffffffff8123da5a in if_simloop (ifp=0xfffff800116eb000, >> m=0xfffff80026d00500, af=2, hlen=0) at /usr/src/sys/net/if_loop.c:358 >> #20 0xffffffff8123d83b in looutput 
(ifp=0xfffff800116eb000, >> m=0xfffff80026d00500, dst=0xfffff80026ed6550, ro=0xfffff80026ed6530) >> at /usr/src/sys/net/if_loop.c:265 >> #21 0xffffffff8131c4c6 in ip_output (m=0xfffff80026d00500, opt=0x0, >> ro=0xfffff80026ed6530, flags=0, imo=0x0, inp=0xfffff80026ed63a0) >> at /usr/src/sys/netinet/ip_output.c:655 >> #22 0xffffffff8142e1c7 in tcp_output (tp=0xfffff80026eb2820) >> at /usr/src/sys/netinet/tcp_output.c:1447 >> #23 0xffffffff81447700 in tcp_usr_send (so=0xfffff80011ec2360, flags=0, >> m=0xfffff80026d14d00, nam=0x0, control=0x0, td=0xfffff80063ba1000) >> at /usr/src/sys/netinet/tcp_usrreq.c:967 >> #24 0xffffffff811776f1 in sosend_generic (so=0xfffff80011ec2360, addr=0x0, >> uio=0x0, top=0xfffff80026d14d00, control=0x0, flags=0, >> td=0xfffff80063ba1000) at /usr/src/sys/kern/uipc_socket.c:1360 >> #25 0xffffffff811779bd in sosend (so=0xfffff80011ec2360, addr=0x0, uio=0x0, >> top=0xfffff80026d14d00, control=0x0, flags=0, td=0xfffff80063ba1000) >> at /usr/src/sys/kern/uipc_socket.c:1405 >> #26 0xffffffff815276a2 in clnt_vc_call (cl=0xfffff80063ca0980, >> ext=0xfffffe085cfa6e38, proc=4, args=0xfffff80026c3bc00, >> resultsp=0xfffffe085cfa7110, utimeout=...) >> at /usr/src/sys/rpc/clnt_vc.c:413 >> #27 0xffffffff8152391c in clnt_reconnect_call (cl=0xfffff80063ca0c00, >> ext=0xfffffe085cfa6e38, proc=4, args=0xfffff80026c3bc00, >> resultsp=0xfffffe085cfa7110, utimeout=...) 
>> at /usr/src/sys/rpc/clnt_rc.c:271 >> #28 0xffffffff80e75628 in newnfs_request (nd=0xfffffe085cfa7110, >> nmp=0xfffff80007e79c00, clp=0x0, nrp=0xfffff80007e79d28, >> vp=0xfffff80011d9b588, td=0xfffff80063ba1000, cred=0xfffff800118c0100, >> prog=100003, vers=3, retsum=0x0, toplevel=1, xidp=0x0, dssep=0x0) >> at /usr/src/sys/fs/nfs/nfs_commonkrpc.c:760 >> #29 0xffffffff80ee87f1 in nfscl_request (nd=0xfffffe085cfa7110, >> vp=0xfffff80011d9b588, p=0xfffff80063ba1000, cred=0xfffff800118c0100, >> stuff=0x0) at /usr/src/sys/fs/nfsclient/nfs_clport.c:952 >> #30 0xffffffff80ea865c in nfsrpc_accessrpc (vp=0xfffff80011d9b588, mode=63, >> cred=0xfffff800118c0100, p=0xfffff80063ba1000, nap=0xfffffe085cfa72e0, >> attrflagp=0xfffffe085cfa73c0, rmodep=0xfffffe085cfa73b4, stuff=0x0) >> at /usr/src/sys/fs/nfsclient/nfs_clrpcops.c:243 >> #31 0xffffffff80ed9ec9 in nfs34_access_otw (vp=0xfffff80011d9b588, wmode=63, >> td=0xfffff80063ba1000, cred=0xfffff800118c0100, >> retmode=0xfffffe085cfa7540) at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:283 >> #32 0xffffffff80ecfb64 in nfs_access (ap=0xfffffe085cfa75f8) >> at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:426 >> #33 0xffffffff81a539d4 in VOP_ACCESS_APV ( >> vop=0xffffffff822ff8b8 , a=0xfffffe085cfa75f8) >> at vnode_if.c:601 >> #34 0xffffffff80eda726 in VOP_ACCESS (vp=0xfffff80011d9b588, accmode=64, >> cred=0xfffff800118c0100, td=0xfffff80063ba1000) at ./vnode_if.h:254 >> #35 0xffffffff80ecb925 in nfs_lookup (ap=0xfffffe085cfa7cf8) >> at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:1064 >> #36 0xffffffff81a52a44 in VOP_LOOKUP_APV ( >> vop=0xffffffff822ff8b8 , a=0xfffffe085cfa7cf8) >> at vnode_if.c:127 >> #37 0xffffffff811c6aad in VOP_LOOKUP (dvp=0xfffff80011d9b588, >> vpp=0xfffffe085cfa8708, cnp=0xfffffe085cfa8730) at ./vnode_if.h:54 >> #38 0xffffffff811c5b64 in lookup (ndp=0xfffffe085cfa86a8) >> at /usr/src/sys/kern/vfs_lookup.c:886 >> #39 0xffffffff811c4aa2 in namei (ndp=0xfffffe085cfa86a8) >> at /usr/src/sys/kern/vfs_lookup.c:448 >> #40 
0xffffffff810050f0 in do_execve (td=0xfffff80063ba1000,
>> args=0xfffffe085cfa8838, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:446
>> #41 0xffffffff810047fa in kern_execve (td=0xfffff80063ba1000,
>> args=0xfffffe085cfa8838, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:347
>> #42 0xffffffff810041e2 in sys_execve (td=0xfffff80063ba1000,
>> uap=0xfffff80063ba1538) at /usr/src/sys/kern/kern_exec.c:221
>> #43 0xffffffff817da5ed in syscallenter (td=0xfffff80063ba1000)
>> at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:131
>> #44 0xffffffff817d9d0b in amd64_syscall (td=0xfffff80063ba1000, traced=0)
>> at /usr/src/sys/amd64/amd64/trap.c:903
>> #45
>> #46 0x0000000800d5285a in ?? ()
>> Backtrace stopped: Cannot access memory at address 0x7fffffffe7d8
>> (kgdb)

>The double fault is a pretty good indication that you overflowed the
>kernel stack. Having ~40 frames on the stack when the fault happened is
>consistent with that.

First, thank you for this answer. I've been tearing my hair out for several hours (crash, fsck, tweak, repeat). I haven't done intensive work with this kernel so hope you'll entertain my simple questions below.

>It looks like you are trying to execute a program from an NFS file
>system that is exported by the same host. This isn't exactly optimal
>...

Perhaps not optimal for the implementation, but I think it's a common NFS scenario: define a set of NFS-provided paths for files and use those path names on all hosts, regardless of whether they happen to be serving the files in question or merely clients.

>Your best bet for a quick workaround for the stack overflow would be to
>rebuild the kernel with a larger value of KSTACK_PAGES. You can find
>the default in /usr/src/sys//conf/NOTES.

By the way, my host is amd64, not i386.
I don't know if that matters to the larger discussion, but I'm not running in a resource-constrained situation (i.e., the host has Xeon E3-1231 (4 cores, 8 threads) and 32GB ram)

It seems (from /usr/include/machine/param.h) that PAGE_SIZE is 2048, so the default KSTACK_PAGES=4 yields a stack size of 8192. Is that right?

It's not clear to me how to determine the correct value of KSTACK_PAGES from the stack trace above. If I merely subtract frame 44 from frame 0, I get 817d9d0b - 810f6ed7 = 6e2e34 = 7,220,788 decimal. That can't be right. I must be misunderstanding something.

>It would probably be a good idea to compute the differences in the stack
>pointer values between adjacent stack frames to see if any of them are
>consuming an excessive amount of stack space.

Hmm. I am assuming the first number for each frame is the stack pointer. Yet, sometimes that value increases with call depth (see, e.g., when if_simloop calls netisr_queue above, but also other places). What is going on?

By the way, I have net.isr.dispatch=deferred (set earlier while I was chasing this crash).

thanks!

-- G.
Paul Ziemba FreeBSD unix: 8:51AM up 1:05, 4 users, load averages: 0.53, 0.42, 0.35 From owner-freebsd-stable@freebsd.org Sat Jul 22 16:02:41 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 05A02DA4F6F for ; Sat, 22 Jul 2017 16:02:41 +0000 (UTC) (envelope-from david@catwhisker.org) Received: from albert.catwhisker.org (mx.catwhisker.org [198.144.209.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id ADD1482713 for ; Sat, 22 Jul 2017 16:02:39 +0000 (UTC) (envelope-from david@catwhisker.org) Received: from albert.catwhisker.org (localhost [127.0.0.1]) by albert.catwhisker.org (8.15.2/8.15.2) with ESMTP id v6MG2Xg2091937; Sat, 22 Jul 2017 16:02:33 GMT (envelope-from david@albert.catwhisker.org) Received: (from david@localhost) by albert.catwhisker.org (8.15.2/8.15.2/Submit) id v6MG2XV7091936; Sat, 22 Jul 2017 09:02:33 -0700 (PDT) (envelope-from david) Date: Sat, 22 Jul 2017 09:02:33 -0700 From: David Wolfskill To: unp@ziemba.us Cc: freebsd-stable@FreeBSD.org Subject: Re: stable/11 r321349 crashing immediately Message-ID: <20170722160233.GY20018@albert.catwhisker.org> Mail-Followup-To: David Wolfskill , unp@ziemba.us, freebsd-stable@FreeBSD.org References: <201707220542.v6M5ggtP052112@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="p+oKr87AhZpDrZg0" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.8.3 (2017-05-23) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 16:02:41 -0000 --p+oKr87AhZpDrZg0 Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Fri, Jul 21, 2017 at 04:53:18AM +0000, G. Paul Ziemba wrote:
> ...
> >It looks like you are trying to execute a program from an NFS file
> >system that is exported by the same host. This isn't exactly optimal
> >...
>
> Perhaps not optimal for the implementation, but I think it's a
> common NFS scenario: define a set of NFS-provided paths for files
> and use those path names on all hosts, regardless of whether they
> happen to be serving the files in question or merely clients.

Back when I was doing sysadmin stuff for a group of engineers, my usual approach for that sort of thing was to use amd (this was late 1990s - 2001) to have maps so it would set up NFS mounts if the file system being served was from a different host (from the one running amd), but instantiating a symlink instead if the file system resided on the current host.

IIRC, this was a fairly common practice with amd (and the like).

> ....

Peace,
david
--
David H. Wolfskill david@catwhisker.org
What kind of "investigation" would it be if it didn't "follow the money?"

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
--p+oKr87AhZpDrZg0 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQF8BAEBCgBmBQJZc3cZXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRDQ0I3Q0VGOTE3QTgwMUY0MzA2NEQ3N0Ix NTM5Q0M0MEEwNDlFRTE3AAoJEBU5zECgSe4X+CcIAKH2uxkgyK6FuUp9V+p7snZ4 JDOpfawPFm5kjeK8dBrSmpEhT4y4qAh618Oo+I3S+/7poBk0QgXmDoxioF9wcKEO igYJ4+z583PLurYJ2Bzld4R7ZlddABBNRzwtJMXeInTvqHJwzmySpAXLRDMkrmqQ 6NPqIjQTWlgAO8sGlkMKJsK6KzD3BXWS7kgo9BnBBUrNY06xE7xwyPEs4nXWvFE4 tnAJMBkkmOpWWzVjy14TUSi1uqUOfLkSw4omL1yW2dZzX502u1iPYX7xST0vA/ZN 3Ibdd0MTCPr/Ox0oswytJ7tTKL+E5gWoj7HlsXlpiPKdWjwKFXpCMbJAUAHXj0k= =PWMv -----END PGP SIGNATURE----- --p+oKr87AhZpDrZg0-- From owner-freebsd-stable@freebsd.org Sat Jul 22 19:28:23 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A377DDAB6E8 for ; Sat, 22 Jul 2017 19:28:23 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from elf.hq.norma.perm.ru (mail.norma.perm.ru [IPv6:2a00:7540:1::5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.norma.perm.ru", Issuer "Vivat-Trade UNIX Root CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 1633D3B63 for ; Sat, 22 Jul 2017 19:28:22 +0000 (UTC) (envelope-from emz@norma.perm.ru) Received: from [IPv6:2a02:2698:26:9b0e:71b3:748:aac9:67e1] (dynamic-2a02-2698-26-0-0.perm.ertelecom.ru [IPv6:2a02:2698:26:9b0e:71b3:748:aac9:67e1] (may be forged)) (authenticated bits=0) by elf.hq.norma.perm.ru (8.15.2/8.15.2) with ESMTPSA id v6MJSFD0098892 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO) for ; Sun, 23 Jul 2017 00:28:17 +0500 (YEKT) (envelope-from emz@norma.perm.ru) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=norma.perm.ru; s=key; t=1500751697; bh=FO4XLJ1haa/yTNxW5CR/gyiVnawxUnuhRpilFXpmyiE=; 
h=Subject:To:References:From:Date:In-Reply-To; b=DrGYhICgwdTjsBiQtDFnTpK8j2ZxLLJY13c0+4RqZ27UmM7gOFOVLpf0I6/4Not/X eBKHkJh6uCzQDj8uWJcs/OzU8N91oc8Yulv95Dpixs50co/c7a3dm6/zaF8hbmIzEa aWKI4htu1tVaNWAFJ45mEW0uW1p9zHTcpyhCSg14= Subject: Re: cannot destroy faulty zvol To: freebsd-stable@freebsd.org References: <73ed4a4d-a156-457a-37d3-12d7ef4f89b9@norma.perm.ru> From: "Eugene M. Zheganin" Message-ID: Date: Sun, 23 Jul 2017 00:28:01 +0500 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <73ed4a4d-a156-457a-37d3-12d7ef4f89b9@norma.perm.ru> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-GB X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 19:28:23 -0000

Hi,

On 22.07.2017 17:08, Eugene M. Zheganin wrote:
>
> is this weird error "cannot destroy: already exists" related to the
> fact that the zvol is faulty ? Does it indicate that metadata is
> probably faulty too ? Anyway, is there a way to destroy this dataset ?

Follow-up: I sent a similar zvol of exactly the same size into the faulty one; the zpool errors are gone, but I still cannot destroy the zvol. Is this a ZFS bug?

Eugene.
From owner-freebsd-stable@freebsd.org Sat Jul 22 20:12:37 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B9855DAC3D1 for ; Sat, 22 Jul 2017 20:12:37 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 6FD1E641BE for ; Sat, 22 Jul 2017 20:12:37 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id v6MKCT95070706; Sat, 22 Jul 2017 13:12:33 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201707222012.v6MKCT95070706@gw.catspoiler.org> Date: Sat, 22 Jul 2017 13:12:29 -0700 (PDT) From: Don Lewis Subject: Re: stable/11 r321349 crashing immediately To: unp@ziemba.us cc: freebsd-stable@FreeBSD.org In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 20:12:37 -0000 On 21 Jul, G. Paul Ziemba wrote: > truckman@freebsd.org (Don Lewis) writes: > >>On 21 Jul, G. Paul Ziemba wrote: >>> GENERIC kernel r321349 results in the following about a minute after >>> multiuser boot completes. >>> >>> What additional information should I provide to assist in debugging? >>> >>> Many thanks! 
>>> >>> [Extracted from /var/crash/core.txt.NNN] >>> >>> KDB: stack backtrace: >>> #0 0xffffffff810f6ed7 at kdb_backtrace+0xa7 >>> #1 0xffffffff810872a9 at vpanic+0x249 >>> #2 0xffffffff81087060 at vpanic+0 >>> #3 0xffffffff817d9aca at dblfault_handler+0x10a >>> #4 0xffffffff817ae93c at Xdblfault+0xac >>> #5 0xffffffff810cf76e at cpu_search_lowest+0x35e >>> #6 0xffffffff810cf76e at cpu_search_lowest+0x35e >>> #7 0xffffffff810d5b36 at sched_lowest+0x66 >>> #8 0xffffffff810d1d92 at sched_pickcpu+0x522 >>> #9 0xffffffff810d2b03 at sched_add+0xd3 >>> #10 0xffffffff8101df5c at intr_event_schedule_thread+0x18c >>> #11 0xffffffff8101ddb0 at swi_sched+0xa0 >>> #12 0xffffffff81261643 at netisr_queue_internal+0x1d3 >>> #13 0xffffffff81261212 at netisr_queue_src+0x92 >>> #14 0xffffffff81261677 at netisr_queue+0x27 >>> #15 0xffffffff8123da5a at if_simloop+0x20a >>> #16 0xffffffff8123d83b at looutput+0x22b >>> #17 0xffffffff8131c4c6 at ip_output+0x1aa6 >>> >>> doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 >>> 298 dumptid = curthread->td_tid; >>> (kgdb) #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 >>> #1 0xffffffff810867e8 in kern_reboot (howto=260) >>> at /usr/src/sys/kern/kern_shutdown.c:366 >>> #2 0xffffffff810872ff in vpanic (fmt=0xffffffff81e5f7e0 "double fault", >>> ap=0xfffffe0839778ec0) at /usr/src/sys/kern/kern_shutdown.c:759 >>> #3 0xffffffff81087060 in panic (fmt=0xffffffff81e5f7e0 "double fault") >>> at /usr/src/sys/kern/kern_shutdown.c:690 >>> #4 0xffffffff817d9aca in dblfault_handler (frame=0xfffffe0839778f40) >>> at /usr/src/sys/amd64/amd64/trap.c:828 >>> #5 >>> #6 0xffffffff810cf422 in cpu_search_lowest ( >>> cg=0xffffffff826ccd98 , >>> low=>> ff8>) at /usr/src/sys/kern/sched_ule.c:782 >>> #7 0xffffffff810cf76e in cpu_search (cg=0xffffffff826cccb8 , >>> low=0xfffffe085cfa53b8, high=0x0, match=1) >>> at /usr/src/sys/kern/sched_ule.c:710 >>> #8 cpu_search_lowest (cg=0xffffffff826cccb8 , >>> low=0xfffffe085cfa53b8) at 
/usr/src/sys/kern/sched_ule.c:783 >>> #9 0xffffffff810cf76e in cpu_search (cg=0xffffffff826ccc80 , >>> low=0xfffffe085cfa5430, high=0x0, match=1) >>> at /usr/src/sys/kern/sched_ule.c:710 >>> #10 cpu_search_lowest (cg=0xffffffff826ccc80 , low=0xfffffe085cfa5430) >>> at /usr/src/sys/kern/sched_ule.c:783 >>> #11 0xffffffff810d5b36 in sched_lowest (cg=0xffffffff826ccc80 , >>> mask=..., pri=28, maxload=2147483647, prefer=4) >>> at /usr/src/sys/kern/sched_ule.c:815 >>> #12 0xffffffff810d1d92 in sched_pickcpu (td=0xfffff8000a3a9000, flags=4) >>> at /usr/src/sys/kern/sched_ule.c:1292 >>> #13 0xffffffff810d2b03 in sched_add (td=0xfffff8000a3a9000, flags=4) >>> at /usr/src/sys/kern/sched_ule.c:2447 >>> #14 0xffffffff8101df5c in intr_event_schedule_thread (ie=0xfffff80007e7ae00) >>> at /usr/src/sys/kern/kern_intr.c:917 >>> #15 0xffffffff8101ddb0 in swi_sched (cookie=0xfffff8000a386880, flags=0) >>> at /usr/src/sys/kern/kern_intr.c:1163 >>> #16 0xffffffff81261643 in netisr_queue_internal (proto=1, >>> m=0xfffff80026d00500, cpuid=0) at /usr/src/sys/net/netisr.c:1022 >>> #17 0xffffffff81261212 in netisr_queue_src (proto=1, source=0, >>> m=0xfffff80026d00500) at /usr/src/sys/net/netisr.c:1056 >>> #18 0xffffffff81261677 in netisr_queue (proto=1, m=0xfffff80026d00500) >>> at /usr/src/sys/net/netisr.c:1069 >>> #19 0xffffffff8123da5a in if_simloop (ifp=0xfffff800116eb000, >>> m=0xfffff80026d00500, af=2, hlen=0) at /usr/src/sys/net/if_loop.c:358 >>> #20 0xffffffff8123d83b in looutput (ifp=0xfffff800116eb000, >>> m=0xfffff80026d00500, dst=0xfffff80026ed6550, ro=0xfffff80026ed6530) >>> at /usr/src/sys/net/if_loop.c:265 >>> #21 0xffffffff8131c4c6 in ip_output (m=0xfffff80026d00500, opt=0x0, >>> ro=0xfffff80026ed6530, flags=0, imo=0x0, inp=0xfffff80026ed63a0) >>> at /usr/src/sys/netinet/ip_output.c:655 >>> #22 0xffffffff8142e1c7 in tcp_output (tp=0xfffff80026eb2820) >>> at /usr/src/sys/netinet/tcp_output.c:1447 >>> #23 0xffffffff81447700 in tcp_usr_send (so=0xfffff80011ec2360, flags=0, 
>>> m=0xfffff80026d14d00, nam=0x0, control=0x0, td=0xfffff80063ba1000) >>> at /usr/src/sys/netinet/tcp_usrreq.c:967 >>> #24 0xffffffff811776f1 in sosend_generic (so=0xfffff80011ec2360, addr=0x0, >>> uio=0x0, top=0xfffff80026d14d00, control=0x0, flags=0, >>> td=0xfffff80063ba1000) at /usr/src/sys/kern/uipc_socket.c:1360 >>> #25 0xffffffff811779bd in sosend (so=0xfffff80011ec2360, addr=0x0, uio=0x0, >>> top=0xfffff80026d14d00, control=0x0, flags=0, td=0xfffff80063ba1000) >>> at /usr/src/sys/kern/uipc_socket.c:1405 >>> #26 0xffffffff815276a2 in clnt_vc_call (cl=0xfffff80063ca0980, >>> ext=0xfffffe085cfa6e38, proc=4, args=0xfffff80026c3bc00, >>> resultsp=0xfffffe085cfa7110, utimeout=...) >>> at /usr/src/sys/rpc/clnt_vc.c:413 >>> #27 0xffffffff8152391c in clnt_reconnect_call (cl=0xfffff80063ca0c00, >>> ext=0xfffffe085cfa6e38, proc=4, args=0xfffff80026c3bc00, >>> resultsp=0xfffffe085cfa7110, utimeout=...) >>> at /usr/src/sys/rpc/clnt_rc.c:271 >>> #28 0xffffffff80e75628 in newnfs_request (nd=0xfffffe085cfa7110, >>> nmp=0xfffff80007e79c00, clp=0x0, nrp=0xfffff80007e79d28, >>> vp=0xfffff80011d9b588, td=0xfffff80063ba1000, cred=0xfffff800118c0100, >>> prog=100003, vers=3, retsum=0x0, toplevel=1, xidp=0x0, dssep=0x0) >>> at /usr/src/sys/fs/nfs/nfs_commonkrpc.c:760 >>> #29 0xffffffff80ee87f1 in nfscl_request (nd=0xfffffe085cfa7110, >>> vp=0xfffff80011d9b588, p=0xfffff80063ba1000, cred=0xfffff800118c0100, >>> stuff=0x0) at /usr/src/sys/fs/nfsclient/nfs_clport.c:952 >>> #30 0xffffffff80ea865c in nfsrpc_accessrpc (vp=0xfffff80011d9b588, mode=63, >>> cred=0xfffff800118c0100, p=0xfffff80063ba1000, nap=0xfffffe085cfa72e0, >>> attrflagp=0xfffffe085cfa73c0, rmodep=0xfffffe085cfa73b4, stuff=0x0) >>> at /usr/src/sys/fs/nfsclient/nfs_clrpcops.c:243 >>> #31 0xffffffff80ed9ec9 in nfs34_access_otw (vp=0xfffff80011d9b588, wmode=63, >>> td=0xfffff80063ba1000, cred=0xfffff800118c0100, >>> retmode=0xfffffe085cfa7540) at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:283 >>> #32 0xffffffff80ecfb64 in 
nfs_access (ap=0xfffffe085cfa75f8) >>> at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:426 >>> #33 0xffffffff81a539d4 in VOP_ACCESS_APV ( >>> vop=0xffffffff822ff8b8 , a=0xfffffe085cfa75f8) >>> at vnode_if.c:601 >>> #34 0xffffffff80eda726 in VOP_ACCESS (vp=0xfffff80011d9b588, accmode=64, >>> cred=0xfffff800118c0100, td=0xfffff80063ba1000) at ./vnode_if.h:254 >>> #35 0xffffffff80ecb925 in nfs_lookup (ap=0xfffffe085cfa7cf8) >>> at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:1064 >>> #36 0xffffffff81a52a44 in VOP_LOOKUP_APV ( >>> vop=0xffffffff822ff8b8 , a=0xfffffe085cfa7cf8) >>> at vnode_if.c:127 >>> #37 0xffffffff811c6aad in VOP_LOOKUP (dvp=0xfffff80011d9b588, >>> vpp=0xfffffe085cfa8708, cnp=0xfffffe085cfa8730) at ./vnode_if.h:54 >>> #38 0xffffffff811c5b64 in lookup (ndp=0xfffffe085cfa86a8) >>> at /usr/src/sys/kern/vfs_lookup.c:886 >>> #39 0xffffffff811c4aa2 in namei (ndp=0xfffffe085cfa86a8) >>> at /usr/src/sys/kern/vfs_lookup.c:448 >>> #40 0xffffffff810050f0 in do_execve (td=0xfffff80063ba1000, >>> args=0xfffffe085cfa8838, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:446 >>> #41 0xffffffff810047fa in kern_execve (td=0xfffff80063ba1000, >>> args=0xfffffe085cfa8838, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:347 >>> #42 0xffffffff810041e2 in sys_execve (td=0xfffff80063ba1000, >>> uap=0xfffff80063ba1538) at /usr/src/sys/kern/kern_exec.c:221 >>> #43 0xffffffff817da5ed in syscallenter (td=0xfffff80063ba1000) >>> at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:131 >>> #44 0xffffffff817d9d0b in amd64_syscall (td=0xfffff80063ba1000, traced=0) >>> at /usr/src/sys/amd64/amd64/trap.c:903 >>> #45 >>> #46 0x0000000800d5285a in ?? () >>> Backtrace stopped: Cannot access memory at address 0x7fffffffe7d8 >>> (kgdb) > >>The double fault is a pretty good indication that you overflowed the >>kernel stack. Having ~40 frames on the stack when the fault happened is >>consistent with that. > > First, thank you for this answer. 
I've been tearing my hair out for
> several hours (crash, fsck, tweak, repeat). I haven't done intensive
> work with this kernel so hope you'll entertain my simple questions
> below.
>
>>It looks like you are trying to execute a program from an NFS file
>>system that is exported by the same host. This isn't exactly optimal
>>...
>
> Perhaps not optimal for the implementation, but I think it's a
> common NFS scenario: define a set of NFS-provided paths for files
> and use those path names on all hosts, regardless of whether they
> happen to be serving the files in question or merely clients.
>
>>Your best bet for a quick workaround for the stack overflow would be to
>>rebuild the kernel with a larger value of KSTACK_PAGES. You can find
>>the default in /usr/src/sys//conf/NOTES.
>
> By the way, my host is amd64, not i386. I don't know if that matters
> to the larger discussion, but I'm not running in a resource-constrained
> situation (i.e., the host has Xeon E3-1231 (4 cores, 8 threads) and
> 32GB ram)
>
> It seems (from /usr/include/machine/param.h) that PAGE_SIZE is 2048,
> so the default KSTACK_PAGES=4 yields a stack size of 8192. Is that
> right?

Page size is 4096. There is a read-only sysctl that shows the current value of kstack_pages:

# sysctl kern.kstack_pages
kern.kstack_pages: 4

It's interesting that you are running into this on amd64. Usually i386 is the problem child. Since the problem crops up in the scheduler, which is preparing to switch to another process (with a stack switch), I suspect that you aren't missing the limit by much. Try using the tunable to bump kstack_pages by one. I didn't see the tunable earlier since it is handled in a separate .c file from where KSTACK_PAGES and the kstack_pages sysctl appear in the source.

> It's not clear to me how to determine the correct value of KSTACK_PAGES
> from the stack trace above. If I merely subtract frame 44 from frame 0,
> I get 817d9d0b - 810f6ed7 = 6e2e34 = 7,220,788 decimal. That can't be
> right.
I must be misunderstanding something.
>
>>It would probably be a good idea to compute the differences in the stack
>>pointer values between adjacent stack frames to see if any of them are
>>consuming an excessive amount of stack space.
>
> Hmm. I am assuming the first number for each frame is the stack pointer.
> Yet, sometimes that value increases with call depth (see, e.g., when
> if_simloop calls netisr_queue above, but also other places). What is
> going on?

Nope, it's the pc value at each call location. Point kgdb at the vmcore file and execute "i r rsp" (short for "info registers rsp") in each stack frame to see the stack pointer. When the stack overflows, I think there is a trap that causes a switch to an alternate stack for the double fault handler.

> By the way, I have net.isr.dispatch=deferred (set earlier while I was
> chasing this crash).
>
> thanks!

From owner-freebsd-stable@freebsd.org Sat Jul 22 20:16:36 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 85BC6DAC501 for ; Sat, 22 Jul 2017 20:16:36 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 65D3964318 for ; Sat, 22 Jul 2017 20:16:36 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id v6MKGMPa070777; Sat, 22 Jul 2017 13:16:26 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201707222016.v6MKGMPa070777@gw.catspoiler.org> Date: Sat, 22 Jul 2017 13:16:22 -0700 (PDT) From: Don Lewis Subject: Re: stable/11 r321349 crashing immediately To: david@catwhisker.org cc: unp@ziemba.us,
freebsd-stable@FreeBSD.org In-Reply-To: <20170722160233.GY20018@albert.catwhisker.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Jul 2017 20:16:36 -0000 On 22 Jul, David Wolfskill wrote: > On Fri, Jul 21, 2017 at 04:53:18AM +0000, G. Paul Ziemba wrote: >> ... >> >It looks like you are trying to execute a program from an NFS file >> >system that is exported by the same host. This isn't exactly optimal >> >... >> >> Perhaps not optimal for the implementation, but I think it's a >> common NFS scenario: define a set of NFS-provided paths for files >> and use those path names on all hosts, regardless of whether they >> happen to be serving the files in question or merely clients. > > Back when I was doing sysadmin stuff for a group of engineers, my > usual approach for that sort of thing was to use amd (this was late > 1990s - 2001) to have maps so it would set up NFS mounts if the > file system being served was from a different host (from the one > running amd), but instantiating a symlink instead if the file system > resided on the current host. Same here. It's a bit messy to do this manually, but you could either use a symlink or a nullfs mount for the filesystems that are local.
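
The choice between a local mount and an NFS mount could be sketched roughly as follows. This is only an illustration under assumptions, not something taken from this thread: the function name plan_mount, the host names fs1 and web2, and the paths /export/tools and /net/tools are all hypothetical. The function only prints the mount command it would pick, so it can be tried without root.

```shell
#!/bin/sh
# Hypothetical helper: decide how a shared path should be provided on a host.
# Arguments: <serving host> <this host> <exported path> <common path>
# It prints the command instead of running it.
plan_mount() {
    pm_server=$1
    pm_this_host=$2
    pm_export=$3
    pm_common=$4

    if [ "$pm_server" = "$pm_this_host" ]; then
        # The file system is local: use a nullfs mount, no NFS loopback.
        echo "mount -t nullfs $pm_export $pm_common"
    else
        # The file system lives elsewhere: use a regular NFS mount.
        echo "mount -t nfs $pm_server:$pm_export $pm_common"
    fi
}
```

For example, `plan_mount fs1 "$(hostname -s)" /export/tools /net/tools` would print the nullfs form when run on fs1 itself and the NFS form on any other host. A symlink (`ln -s /export/tools /net/tools`) would serve the same purpose as the nullfs branch.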