From: Pete French
Date: Tue, 29 Dec 2009 11:07:48 +0000
To: freebsd-stable@freebsd.org, ivoras@freebsd.org
Subject: Re: Disc lock up on 8.0-STABLE

> When you say "lock up" and "can't login" (in your original mail) - are
> you sure this really is a lockup and not e.g. sshd dieing because of the
> attacks? E.g. can you ping the machine, can you leave something like
> "date >> /root/run.txt && vmstat 1 3 >> /root/run.txt" in crontab so you
> track the moment it dies more closely?

Yes, I can ping the machine, and connect to the SSH port and see the banner. On the console I can hit return and get a login prompt, and then get a password prompt.
Trying to log in doesn't work though - the symptoms are consistent with it not being able to read from the discs, but not panicking or dying either. I can, for example, connect to the mysql daemon and see it trying to execute queries, but never completing them.

I am currently running a kernel on that machine with DDB, KDB and WITNESS in it. It has annoyingly refused to hang since I did that though - I did have a hang with just DDB and KDB, which I regret not investigating more. At the time I thought "gah, forgot witness", and so recompiled the kernel expecting another lockup within a few hours.

I do think that the original "3am" thing is a red herring now - I have been getting lockups at other times of the day. Also it is not a runaway fork, as when I was in the debugger I did a 'ps' and there wasn't anything unusual going on - i.e. a reasonable number of processes, but not excessive.

What are the best traces to do when I get a debugger again? 'show locks' and 'ps' I know, but I am never sure quite what else is useful.

cheers,

-pete.