Date: Mon, 15 Apr 1996 18:55:59 +1000
From: Stephen McKay <syssgm@devetir.qld.gov.au>
To: freebsd-current@freebsd.org
Cc: syssgm@devetir.qld.gov.au
Subject: Re: Just how stable is current
Message-ID: <199604150856.SAA14153@orion.devetir.qld.gov.au>
Ollivier Robert <roberto@keltia.freenix.fr> thinks:
>It seems that J Wunsch said:
>> > Yes, I know that this is a bad question to ask, but....
>>
>> Mine's from the Easter weekend, and i can't complain.
>
>Mine is from Tuesday and is running fine. -CURRENT has been very stable for me
>for at least 3 weeks (if not more).
Not all of us are happy campers. I have a -current kernel from January 9
which works well for me, and have had various problems with all kernels
built since. My hardware is modest: a 16 MHz 386SX with 4 MB of RAM, NFS for
all source and object files, and vnconfig swap plus real swap totalling 16 MB.
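(The vn swap is just a swap file behind the vn pseudo-device. From memory,
and with the file name and size both hypothetical and the exact vnconfig
syntax recalled from the 2.x man page rather than checked, the setup is
roughly:

    dd if=/dev/zero of=/usr/swap0 bs=1024k count=12   # a 12 MB swap file
    vnconfig -e vn0c /usr/swap0 swap                  # attach it and swap on it

)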
I have 3 problems:
1) NFS problem: My January 9 kernel works properly as a client with any
server over UDP mounts with the 8 KB maximum transfer size. More recent
kernels won't. I get severe performance degradation that I assume comes
from lots of retries and timeouts, even though I can't find them in
nfsstat. Many processes hang for long periods in sbwait, nfsrcvlk and
similar network wait states.
Ok, overruns are a common problem with PC network cards, especially in slow
machines. However, reducing the maximum transfer size to 1 KB does not cure
the problem (or maybe just moves it elsewhere). Switching to TCP transport
produced a total cure, but TCP mounts are not available on all servers.
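For reference, the transfer size and transport are chosen at mount time.
A minimal sketch, assuming the 4.4BSD-derived mount_nfs options -r/-w
(read/write size in bytes) and -T (use TCP); server name and paths here
are hypothetical:

    # UDP mount with 1 KB transfers instead of the 8 KB default:
    mount_nfs -r 1024 -w 1024 server:/usr/src /usr/src

    # the same mount over TCP (needs a server that supports NFS over TCP):
    mount_nfs -T server:/usr/src /usr/src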
2) Processes with negative resident size: Friday, I started a make all of
-current and snapped this: (some boring processes deleted)
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
0 0 0 0 -18 0 0 0 sched DLs ?? 0:03.33 (swapper)
0 1 0 12 10 0 392 0 wait IWs ?? 0:00.33 /sbin/init --
0 2 0 75 -18 0 0 12 psleep DL ?? 11:54.65 (pagedaemon)
0 3 0 32 28 0 0 12 psleep DL ?? 2:52.65 (vmdaemon)
0 4 0 5 29 0 0 12 update DL ?? 0:14.13 (update)
...
0 2177 2176 9 10 5 340 -4 wait IWN p0 0:02.70 make
0 2179 2177 38 10 5 452 0 wait IWN p0 0:00.36 /bin/sh -ec for entry in include lib bin games gnu libexec sbin
0 2190 2179 75 10 5 308 -4 wait IWN p0 0:02.29 make all DIRPRFX
0 2192 2190 107 10 5 452 -4 wait IWN p0 0:00.33 /bin/sh -ec for entry in csu/i386 libc libcompat libcom_err libc
0 2195 2192 32 10 5 2840 8 wait IWN p0 1:12.30 make all DIRPRFX
0 2233 2195 135 10 5 216 16 wait IWN p0 0:00.99 cc -O2 -DLIBC_RCS -DSYSLIBC_RCS -D__DBINTERFACE_PRIVATE -DPOSIX_
0 2238 2233 109 65 5 848 1004 - RN p0 0:17.92 /usr/libexec/cc1 /tmp/cc002233.i -quiet -dumpbase bt_open.c -O2
0 147 1 48 3 0 156 -4 ttyin IWs+ v0 0:00.49 /usr/libexec/getty Pc ttyv0
RSS < 0 may be a cosmetic flaw, or it may be seriously buggering the VM system.
I don't know yet, but I'm valiantly struggling through the VM code. :-)
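My current guess at the flavour of bug involved, as a sketch only (the
struct and functions below are invented for illustration, not the real
sys/vm code): if the resident-set counter is a plain signed integer that
is decremented on every unmap with no floor at zero, then any unmap that
gets accounted twice drives it negative.

    /*
     * Illustrative only -- invented names, not the actual FreeBSD VM
     * structures.  A signed per-process resident-page counter that is
     * decremented on every unmap, with no clamp at zero, goes negative
     * as soon as any page is accounted as unmapped twice.
     */
    struct rss_account {
            int     rss_pages;      /* resident set size, in pages */
    };

    static void
    page_mapped(struct rss_account *a)
    {
            a->rss_pages++;
    }

    static void
    page_unmapped(struct rss_account *a)
    {
            a->rss_pages--;         /* no floor: double-unmap => RSS < 0 */
    }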
3) Madly spinning processes: This morning the scene was:
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
0 4796 4399 131 10 5 308 -4 wait IWN ?? 0:01.85 make all DIRPRFX
0 4798 4796 87 10 5 452 -4 wait IWN ?? 0:00.72 /bin/sh -ec for entry in as awk bc cc cpio cvs dc dialog diff di
0 4990 4798 135 10 5 312 -4 wait IWN ?? 0:01.98 make all DIRPRFX
0 4992 4990 149 10 5 452 -4 wait IWN ?? 0:00.39 /bin/sh -ec for entry in libgroff libdriver libbib groff troff n
0 5011 4992 210 90 5 344 20 - RN ?? 3509:56.22 make all DIR
All but one process had reasonable amounts of time accrued. Some even had
normal resident memory. :-) vmstat -s revealed: (sorry, I don't know what's
irrelevant here)
3010564 cpu context switches
69486232 device interrupts
2658782 software interrupts
371029200 traps
1002815 system calls
86889 swap pager pageins
195866 swap pager pages paged in
57630 swap pager pageouts
82118 swap pager pages paged out
115789 vnode pager pageins
238148 vnode pager pages paged in
0 vnode pager pageouts
0 vnode pager pages paged out
41415 page daemon wakeups
27543608 pages examined by the page daemon
15642 pages reactivated
158113 copy-on-write faults
262888 zero fill pages zeroed
253 intransit blocking page faults
367919662 total VM faults taken
514357 pages freed
39851 pages freed by daemon
368305 pages freed by exiting processes
286 pages active
68 pages inactive
9 pages in VM cache
313 pages wired down
13 pages free
4096 bytes per page
550001 total name lookups
cache hits (77% pos + 2% neg) system 2% per-directory
deletions 0%, falsehits 4%, toolong 0%
367919662 VM faults over 2.5 days (about 216000 seconds) works out to some
1700 per second. This is far in excess of what the machine can fetch from
disk, so these can only be "soft" faults (where the pages really are there,
but the VM system was hoping you didn't need them any more and was going to
free them soon), or some total failure to provide the needed page at all,
causing make to fault again immediately on returning to user mode.
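To make the soft/hard distinction concrete, here is a grossly simplified
fault path in the 4.4BSD style; every name below is an invented stub, not
the real vm_fault():

    #include <stddef.h>

    struct vm_page { int active; };         /* stub page */

    /* trivial stubs standing in for the real VM routines: */
    static struct vm_page *page_lookup(void *obj, long off)
        { (void)obj; (void)off; return NULL; }
    static void page_activate(struct vm_page *m) { m->active = 1; }
    static int pager_get(void *obj, long off)
        { (void)obj; (void)off; return 0; }

    int
    fault_sketch(void *obj, long off)
    {
            struct vm_page *m = page_lookup(obj, off);

            if (m != NULL) {
                    /*
                     * Soft fault: the page is still resident (e.g. on
                     * the inactive or cache queue); reactivate and remap
                     * it.  No I/O, so thousands per second are possible.
                     */
                    page_activate(m);
                    return (0);
            }
            /*
             * Hard fault: the page is really gone; the pager must read
             * it back from swap or the filesystem at disk speed.
             */
            return (pager_get(obj, off));
    }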
That make process has only 5 resident pages (or is it 6 :-)), but lots of
memory was available for my shell, telnetd, etc when I logged in. It isn't
lack of real memory that caused this.
Now, for the final twist before the audience can return to the comfortable
normalcy of their own lives: I stopped the whole process group with
SIGSTOP, and noted that all processes went from RSS -4 to 8, presumably
because the u area had faulted in. I waited all day (just because I had
real work :-)), and found that the problem make process was eventually
reduced to 8 KB, like the others. Then I restarted them with SIGCONT, and
blow me down if they didn't just up and carry on like nothing had happened.
The problem make exited (presumably after finishing successfully), and the
compilation is proceeding normally as I write.
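(For anyone wanting to repeat the experiment, the stop and restart were just
signals to the process group; the pgid below is hypothetical, and the --
assumes a kill(1) that accepts it before a negative pid:

    kill -STOP -- -2177    # stop make and everything else in its group
    # ...watch in ps: RSS climbs from -4 to 8 as the u areas fault in
    kill -CONT -- -2177    # resume; they carry on as if nothing happened

)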
Thanks to all who have bothered to read this far. I shall be consulting the
special texts of the masters (sys/vm/*.[hc]) for enlightenment, but expect to
be beaten to the answer by more knowledgeable persons.
Stephen.
