From: Ted Wisniewski
To: Ted Wisniewski
cc: Dan Nelson
cc: freebsd-questions@freebsd.org
Date: Thu, 15 Jan 2004 16:29:41 -0500 (EST)
Subject: Re: 5.2-RELEASE - Show stopper problem
Message-Id: <200401152129.i0FLTfBD023430@ness.plymouth.edu>
In-Reply-To: <200401112005.i0BK5mbY004869@ness.plymouth.edu>

I have discovered some follow-up details that may make a difference in
nailing this down.

I tried to duplicate the problem of processes hanging on disk I/O on an
older (hence slower) machine and was unable to. This got me thinking
about the problems I have experienced: it appears that the faster the
machine, the more likely the problem is to occur.

With my new dual 3.06 GHz server I could reproduce the disk-wait (getblk)
state at will.
On the slower 2.0 GHz machine I had to work at it a bit, but I could
reproduce it. On the 400 MHz machine (Dell PE 4300) I have not been able
to re-create it, at least not yet.

If someone has things to try, I will give it a whirl.

Ted

(* (* In the last episode (Jan 11), Ted Wisniewski said:
(* (* > Thanks for your response... As you can see in this output from the
(* (* > ps command you suggested, the processes are definitely waiting on the
(* (* > disk. BTW, the system in question was a fresh install from yesterday
(* (* > with no users other than myself (I did the cvsup to get it to
(* (* > 5.2-RELEASE). It did hang when I did that, with a similar result.
(* (* > One of the "install -s etc.." processes went into the same state.
(* (*
(* (* Are you seeing any errors in dmesg or /var/log/messages? I haven't
(* (* seen any other reports of I/O hanging, so it might still be something
(* (* to do with your hardware or kernel config.
(*
(* No messages at all in /var/log/messages. I am using the generic
(* kernel in one instance and a custom one in another. The machine I sent
(* the "ps" info for is a Dell PowerEdge 2650 running a generic kernel. The
(* disk configuration is a big RAID 5; memory is 2G. Since I can duplicate it
(* (seemingly at will) on a number of different systems, I doubt it is specific
(* to one machine's hardware (3 Dell servers of differing models, 1 Dell PC,
(* and 3 no-name brand PCs).
(*
(* (* > On my test system the machine will run for days with this happening,
(* (* > however, I have another system that is actually doing a lot of
(* (* > I/O.... eventually it crashes (well, locks up completely)... If
(* (* > there is any particular info you might need, I am willing to do what
(* (* > I can.
(* (*
(* (* If you can drop into ddb when it's locked up, I think there are some
(* (* commands you can run to print the kernel locks held by all the
(* (* processes, but I'm not sure what they are or how to interpret the
(* (* results.
(*
(* When it locks up...
(* It is literally frozen... Only a power-off will cure it. I have
(* occasionally seen a "page not present" panic. Most of the time the
(* processes just start to pile up accessing the same place(s) on disk,
(* none of them can be killed, and when I reboot the system after this
(* there is always a message about not being able to write buffers...
(* giving up...
(*
(* Ted

-- 
| Ted Wisniewski                    E-Mail: ted@mail.plymouth.edu      |
| Manager, Systems Group            WEB: http://oz.plymouth.edu/~ted/  |
| Information Technology Services                                      |
| Plymouth State University         Phone: (603) 535-2661              |
| Plymouth NH, 03264                Fax: (603) 535-2263                |