Date: Wed, 1 Oct 1997 23:56:33 +0300 (EEST)
From: Heikki Suonsivu <hsu@mail.clinet.fi>
To: Karl Denninger <karl@mcs.net>
Cc: freebsd-current@freebsd.org
Subject: Re: WARNING! Builds from the last few days have BROKEN NFS
Message-ID: <199710012056.XAA05552@katiska.clinet.fi>
In-Reply-To: Karl Denninger's message of 27 Sep 1997 23:37:48 +0300
References: <19970927145131.64000@Mars.Mcs.Net> <13496.875390123@critter.freebsd.dk> <19970927150755.39452@Mars.Mcs.Net>
Similar lockups happen under 2.2-STABLE also. I do not know whether the
cause is the same; it just happens to match your symptoms. Other symptoms
are panics that are repeatable by running a specific pattern of commands
(this evening, generating a zip file larger than 10M reproducibly caused a
panic; see my last PR for traces).

In article <19970927150755.39452@Mars.Mcs.Net> Karl Denninger <karl@mcs.net> writes:

> Ok -- tomorrow evening I will check out another copy of the current
> sources, and give this another shot with the parameters you're
> referring to.
>
> The problem shows up only during fairly heavy I/O load -- my initial
> tests didn't show it, but putting the code on a reasonably busy web
> server does, and it's easily reproduced in about 20-30 minutes.

Same with the shell systems here.

> The symptom is that a single disk I/O request will hang in a "D" state.
> Further attempts to access that same object then also hang, but others,
> even to the same disk pack, do NOT. That is, a "df" still works, but a
> "cat <object>" locks up. If "object" is a directory then an "ls" will
> freeze. If it's a file then you have to reference the specific file to
> see the behavior.
>
> Once this starts, you're in *big* trouble within a fairly short period
> of time; you'll end up with thousands of processes hung in a disk wait
> for a specific file, and eventually run out of either process slots or
> page space (most people retry failed accesses, which makes the problem
> worse). Per-user process limits (which I have turned off on these
> machines) would stop some of the damage, but not all.

Exactly the same symptoms.

> I was trying to resolve cache inconsistency problems with NFS when I ran
> headfirst into this. There is a problem with V3 mounts (the default now)
> where you can "mv" a file on one client, and another client never sees
> the change.
>
> This is particularly distressing when you "mv" the access_log file from
> a web server (from another machine), kick the server to re-create the
> access_log file, and then find that it never shows up on the other
> system (or does, but with zero length and no data in it -- ever).

Maybe far-fetched, but the directories locking up on us include users'
www access log directories. I do not know if this is related.

> If you look on the other system, an "ls" doesn't show the errant file.
> But a "cat" does -- the data is still there.
>
> Needless to say this is pretty troublesome, and leads to lots of
> head-scratching.
>
> --
> Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
> http://www.mcs.net/~karl     | T1's from $600 monthly to FULL DS-3 Service
>                              | NEW! K56Flex modem support is now available
> Voice: [+1 312 803-MCS1 x219]| 56kbps DIGITAL ISDN DOV on analog lines!
> Fax:   [+1 312 803-4929]     | 2 FULL DS-3 Internet links; 400Mbps B/W Internal

--
Heikki Suonsivu, Täysikuu 10 C 83/02210 Espoo/FINLAND, hsu@clinet.fi
mobile +358-40-5519679 work +358-9-43542270 fax -4555276