From: Peter
Subject: Re: Nightly disk-related panic since upgrade to 10.3
Date: Thu, 20 Oct 2016 22:12:26 +0200
To: freebsd-stable@FreeBSD.ORG

Andrea Venturoli wrote:
> Hello.
>
> Last week I upgraded a 9.3/amd64 box to 10.3: since then, it crashed and
> rebooted at least once every night.

Hi,

I have a quite similar issue, with crash dumps every night, but my stack
trace is different (crashing mostly in cam/scsi/scsi.c) and my
environment is also quite different (old i386, individual disks,
extensive use of ZFS), so the cause here is very likely a different one.
Also, the upgrade is not the only change here: I also replaced a burnt
power supply recently and added an SSD cache.

Basically you have two options:

A) Fire up kgdb, go into the code and try to understand what exactly is
   happening. Whether this is feasible depends on whether you have
   enough of a clue to go that way; I found "man 4 gdb" and especially
   the "Debugging Kernel Problems" PDF by Greg Lehey quite helpful.

B) Systematically change parameters. Start by figuring out from the logs
   the exact time of the crash and what was happening then, and try to
   reproduce that. Then change things and isolate the cause.
   Having a RAID controller is a bit ugly in this regard, as it is more
   or less a black box, and it is difficult to change parameters or swap
   components.

> The only exception was on Friday, when it locked without rebooting: it
> still answered ping request and logins through HTTP would half work; I'm
> under the impression that the disk subsystem was hung, so ICMP would
> work since it does no I/O and HTTP too worked as far as no disk access
> was required.

Yep. That tends to happen. It doesn't give much of a clue, except that
there is a disk-related problem.
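
For option B, here is a rough sketch of how one might pull the approximate crash times out of the syslog. It is only an illustration under assumptions: it presumes the kernel copyright banner shows up in /var/log/messages at every boot, and that a clean shutdown leaves a "syslogd: exiting on signal" line right before it. The function name find_unclean_boots and both patterns are my own choices, not anything from the system, so adjust them to what your logs actually contain.

```python
import re

# Assumption: the kernel prints this banner at every boot and syslogd
# records it in the log; adjust the pattern if your logs look different.
BOOT_BANNER = re.compile(r"kernel: Copyright \(c\) 1992-\d{4} The FreeBSD Project")

# Assumption: a clean shutdown leaves this text in the last line
# written before the next boot.
CLEAN_STOP = "syslogd: exiting on signal"


def find_unclean_boots(lines):
    """Return (last_line_before_boot, boot_line) pairs for every boot
    banner that is not directly preceded by a clean-shutdown line."""
    results = []
    previous = None
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_BANNER.search(line):
            # A boot with no clean stop just before it suggests a crash;
            # the previous line carries the last timestamp seen alive.
            if previous is not None and CLEAN_STOP not in previous:
                results.append((previous, line))
        if line:
            previous = line
    return results
```

Feed it the lines of the log file (for example an open file object on /var/log/messages, plus the rotated copies if the crash is older). The first element of each pair is the last thing logged before the box went down, which gives the time window to compare against cron jobs, periodic scripts, backups and the like.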