Date: Wed, 12 Mar 2014 05:07:28 -0700 (PDT)
From: Anton Shterenlikht
Message-Id: <201403121207.s2CC7QJw076837@mech-cluster241.men.bris.ac.uk>
To: be@0x20.net, mexas@bris.ac.uk
Subject: Re: reproducible panic every day at 03:02, probably triggered by daily periodic scipts - help
In-Reply-To: <20140306215209.GA74933@e-new.0x20.net>
Cc: freebsd-current@freebsd.org
Reply-To: mexas@bris.ac.uk
List-Id: Discussions about the use of FreeBSD-current

>From be@0x20.net Thu Mar 6 22:02:56 2014
>
>On Thu, Mar 06, 2014 at 12:59:14AM -0800, Anton Shterenlikht wrote:
>> In my initial PR (sparc64 r261798),
>>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=187080
>>
>> I said that rsync was triggering this panic.
>> While true, I now see that there's more to it.
>> I disabled the rsync, and the cron jobs.
>> Still I get exactly the same panic every
>> night at 03:02:
>>
>> # grep Dumptime /var/crash/*
>> /var/crash/info.0: Dumptime: Wed Feb 26 10:10:51 2014
>> /var/crash/info.1: Dumptime: Thu Feb 27 03:02:14 2014
>> /var/crash/info.2: Dumptime: Fri Feb 28 03:02:29 2014
>> /var/crash/info.3: Dumptime: Sat Mar 1 03:02:25 2014
>> /var/crash/info.4: Dumptime: Tue Mar 4 03:02:01 2014
>> /var/crash/info.5: Dumptime: Wed Mar 5 03:02:05 2014
>> /var/crash/info.6: Dumptime: Thu Mar 6 03:02:11 2014
>> /var/crash/info.last: Dumptime: Thu Mar 6 03:02:11 2014
>> #
>>
>> This is likely triggered by one of
>> the daily periodic scripts,
>> after about 1 min from start:
>>
>> # grep daily /etc/crontab
>> # Perform daily/weekly/monthly maintenance.
>> 1 3 * * * root periodic daily
>> #
>>
>> but which one?
>
>Some time ago I had a similar problem with 8.x. Setting
>
>vm.kmem_size="512M"
>vm.kmem_size_max="512M"
>
>in loader.conf helped. It's just a wild guess but might help.

This didn't make any difference.

However, I noticed that the panics were
happening more and more often.
I started suspecting a disk failure,
so I decided to do a full read check
of the partition with dd, and got:

17054+0 records in
17054+0 records out
17882415104 bytes transferred in 1035.889679 secs (17262857 bytes/sec)
(ada1:ata2:0:1:0): READ_DMA. ACB: c8 00 80 93 9c 42 00 00 00 00 00 00
(ada1:ata2:0:1:0): CAM status: ATA Status Error
(ada1:ata2:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
(ada1:ata2:0:1:0): RES: 51 40 21 94 9c 02 02 00 00 00 00
(ada1:ata2:0:1:0): Retrying command
(ada1:ata2:0:1:0): READ_DMA. ACB: c8 00 80 93 9c 42 00 00 00 00 00 00
(ada1:ata2:0:1:0): CAM status: ATA Status Error
(ada1:ata2:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
(ada1:ata2:0:1:0): RES: 51 40 21 94 9c 02 02 00 00 00 00
(ada1:ata2:0:1:0): Retrying command
(ada1:ata2:0:1:0): READ_DMA. ACB: c8 00 80 93 9c 42 00 00 00 00 00 00
(ada1:ata2:0:1:0): CAM status: ATA Status Error
(ada1:ata2:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
(ada1:ata2:0:1:0): RES: 51 40 21 94 9c 02 02 00 00 00 00
(ada1:ata2:0:1:0): Retrying command
(ada1:ata2:0:1:0): READ_DMA. ACB: c8 00 80 93 9c 42 00 00 00 00 00 00
(ada1:ata2:0:1:0): CAM status: ATA Status Error
(ada1:ata2:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
(ada1:ata2:0:1:0): RES: 51 40 21 94 9c 02 02 00 00 00 00
(ada1:ata2:0:1:0): Retrying command
(ada1:ata2:0:1:0): READ_DMA. ACB: c8 00 80 93 9c 42 00 00 00 00 00 00
(ada1:ata2:0:1:0): CAM status: ATA Status Error
(ada1:ata2:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
(ada1:ata2:0:1:0): RES: 51 40 21 94 9c 02 02 00 00 00 00
(ada1:ata2:0:1:0): Error 5, Retries exhausted
dd: /dev/ada1b: Input/output error

I guess the disk is fucked, right?
Given that it's about 10 years old,
this is not surprising.

Anton
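
P.S. For anyone reading the CAM messages above: the "ATA status" and
"error" bytes are bit fields, and the names in parentheses are just the
bits that are set. Below is a rough sh sketch of that decoding. The
helper names (decode_bits, decode_status, decode_error) are made up for
illustration, and the bit names other than DRDY, SERV, ERR and UNC are
not in the log; they are an assumption based on the usual ATA register
layout, so treat them with care.

```sh
#!/bin/sh
# Decode an ATA register byte into the names of the bits that are set.
# Hypothetical helpers for illustration only; bit names other than
# DRDY, SERV, ERR and UNC are assumed, not taken from the log.

decode_bits() {
    # $1: register value as two hex digits (e.g. 51)
    # remaining arguments: names for bit 7 down to bit 0
    value=$((0x$1))
    shift
    mask=128
    out=""
    for name in "$@"; do
        if [ $((value & mask)) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
        mask=$((mask / 2))
    done
    echo "$out"
}

# Status register: bit 7 (BSY) ... bit 0 (ERR)
decode_status() { decode_bits "$1" BSY DRDY DF SERV DRQ CORR IDX ERR; }

# Error register: bit 7 (ICRC) ... bit 0 (ILI)
decode_error() { decode_bits "$1" ICRC UNC MC IDNF MCR ABRT NM ILI; }

decode_status 51   # prints: DRDY SERV ERR
decode_error 40    # prints: UNC
```

UNC is an uncorrectable data error reported by the drive itself, which
fits a failing sector rather than a cabling or controller problem,
though that reading is an interpretation and not something the log
proves on its own.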