From owner-freebsd-current Wed Oct 1 13:57:12 1997
Return-Path:
Received: (from root@localhost)
        by hub.freebsd.org (8.8.7/8.8.7) id NAA07449
        for current-outgoing; Wed, 1 Oct 1997 13:57:12 -0700 (PDT)
Received: from hauki.clinet.fi (root@hauki.clinet.fi [194.100.0.1])
        by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id NAA07437
        for ; Wed, 1 Oct 1997 13:56:58 -0700 (PDT)
Received: from katiska.clinet.fi (root@katiska.clinet.fi [194.100.0.4])
        by hauki.clinet.fi (8.8.6/8.8.6) with ESMTP id WAA07082;
        Wed, 1 Oct 1997 22:56:33 +0200 (EET)
Received: (hsu@localhost)
        by katiska.clinet.fi (8.8.7/8.6.4) id XAA05552;
        Wed, 1 Oct 1997 23:56:33 +0300 (EEST)
Date: Wed, 1 Oct 1997 23:56:33 +0300 (EEST)
Message-Id: <199710012056.XAA05552@katiska.clinet.fi>
From: Heikki Suonsivu
To: Karl Denninger
Cc: freebsd-current@freebsd.org
In-reply-to: Karl Denninger's message of 27 Sep 1997 23:37:48 +0300
Subject: Re: WARNING! Builds from the last few days have BROKEN NFS
Organization: Clinet Ltd, Espoo, Finland
References: <19970927145131.64000@Mars.Mcs.Net>
        <13496.875390123@critter.freebsd.dk>
        <19970927150755.39452@Mars.Mcs.Net>
Sender: owner-freebsd-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Similar lockups happen under 2.2-STABLE also. I do not know whether the
cause is the same; it just happens to match your symptoms. Other symptoms
are panics that can be repeated with a specific pattern of commands (this
evening, generating a zip file larger than 10M reproducibly caused a
panic; see my last PR for traces).

In article <19970927150755.39452@Mars.Mcs.Net> Karl Denninger writes:

   Ok -- tomorrow evening I will check out another copy of the current
   sources, and give this another shot with the parameters you're
   referring to.

   The problem shows up only during fairly heavy I/O load -- my initial
   tests didn't show it, but putting the code on a reasonably-busy web
   server does, and it's easily reproduced in about 20-30 minutes.

Same with the shell systems here.

   The symptom is that a single disk I/O request will hang in a "D"
   state. Further attempts to access that same object then also hang,
   but others, even to the same disk pack, do NOT. That is, a "df" still
   works, but a "cat <object>" locks up. If "object" is a directory then
   a "ls" will freeze. If it's a file then you have to reference the
   specific file to see the behavior.

   Over a fairly short period of time once this starts you're in *big*
   trouble; you'll end up with thousands of processes hung in a disk
   wait for a specific file, and eventually run out of either process
   slots or page space (most people retry failed accesses, which makes
   the problem worse). Per-user process limits (which I have turned off
   on these machines) would stop some of the damage, but not all.

Exactly the same symptoms.

   I was trying to resolve cache inconsistency problems with NFS when I
   ran headfirst into this. There is a problem with V3 mounts (the
   default now) where you can "mv" a file on one client, and another
   client never sees the change. This is particularly distressing when
   you "mv" the access_log file from a web server (from another
   machine), kick the server to re-create the access_log file, and then
   find that it never shows up on the other system (or it does, but with
   zero length and no data in it -- ever).

Maybe far-fetched, but some of the directories which are locking up on
us are users' www access log directories. I do not know if this is
related.

   If you look on the other system, a "ls" doesn't show the errant file.
   But a "cat" does -- the data is still there.

   Needless to say this is pretty troublesome, and leads to lots of
   head-scratching.

   --
   Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
   http://www.mcs.net/~karl     | T1's from $600 monthly to FULL DS-3 Service
                                | NEW! K56Flex modem support is now available
   Voice: [+1 312 803-MCS1 x219]| 56kbps DIGITAL ISDN DOV on analog lines!
   Fax:   [+1 312 803-4929]     | 2 FULL DS-3 Internet links; 400Mbps B/W Internal

--
Heikki Suonsivu, Täysikuu 10 C 83/02210 Espoo/FINLAND,
hsu@clinet.fi  mobile +358-40-5519679  work +358-9-43542270  fax -4555276
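
For anyone trying to confirm the pile-up of processes stuck in the "D"
state described above, here is a minimal sketch, not something taken from
either poster's setup. It counts processes whose ps state starts with "D"
and groups them by wait channel. The function names and the exact ps
invocation (pid,state,wchan,command as output keywords) are assumptions;
check ps(1) on your system before relying on it.

```python
import subprocess
from collections import Counter


def disk_wait_by_wchan(ps_output):
    """Count processes in state 'D...' grouped by wait channel.

    Expects text with a header row followed by rows of
    PID, STATE, WCHAN, COMMAND separated by whitespace.
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # blank or malformed row
        state, wchan = fields[1], fields[2]
        if state.startswith("D"):
            counts[wchan] += 1
    return counts


def snapshot():
    """Run ps once and return its output (assumed invocation)."""
    result = subprocess.run(
        ["ps", "-axo", "pid,state,wchan,command"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling disk_wait_by_wchan(snapshot()) every minute or so and watching
whether one wait channel's count keeps growing would show the "thousands
of processes hung on one file" pattern well before process slots run out.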