Date: Sat, 14 Oct 2006 16:51:42 +0400 (MSD)
From: Sergey Zaharchenko <doublef-ctm@yandex.ru>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: kern/104406: Processes get stuck in "ufs" state under persistent CPU load
Message-ID: <20061014125142.C71931743B@shark>
Resent-Message-ID: <200610141300.k9ED0Vi8067662@freefall.freebsd.org>

>Number:         104406
>Category:       kern
>Synopsis:       Processes get stuck in "ufs" state under persistent CPU load
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Oct 14 13:00:30 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Sergey Zaharchenko
>Release:        FreeBSD 7.0-CURRENT i386
>Organization:
Volgograd State Technical University
>Environment:
System: FreeBSD shark.localdomain 7.0-CURRENT FreeBSD 7.0-CURRENT #5: Fri Oct 13 22:03:33 MSD 2006 root@shark.localdomain:/var/obj/src/usr.src/sys/GENERIC i386

The problem has also been observed on 7.0-CURRENT of August 2006.
FWIW 4.8-RELEASE didn't have the problem.

CPU: AMD Sempron(tm) 2500+ (1753.99-MHz 686-class CPU)

A UP system, GENERIC kernel, no RAID, etc.
>Description:
When a single process loads the CPU for a long(*) time, other processes
that want to access the filesystem get stuck in the "ufs" state when
trying to do so. Other processes which don't need to access the
filesystem (like top, etc.) proceed normally.

The owner UID, nice- and idprio- status of the offending (or offended)
process do not matter. It seems essential that a single process is
working all the time (e.g. two hours of compilation doesn't show any
errors, because there are many processes).

Example top outputs for this situation:

last pid:  9798;  load averages:  2.00,  1.96,  1.72    up 0+02:29:59  11:02:17
113 processes: 3 running, 109 sleeping, 1 zombie
CPU states:  0.0% user, 96.2% nice,  0.4% system,  3.4% interrupt,  0.0% idle
Mem: 134M Active, 193M Inact, 88M Wired, 316K Cache, 59M Buf, 70M Free
Swap: 4097M Total, 64M Used, 4033M Free, 1% Inuse

  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 2661 df         1 139   20 63532K 58468K RUN    107:21 97.46% generic_slave
 8035 df         1  96    0  7608K  2752K select   0:00  0.05% mc
  702 root       1  96    0  5200K   792K select   0:41  0.00% syslogd
 1245 root       1  -4    0  5152K   572K ufs      0:35  0.00% tail
  912 squid      1  -4    0 13928K  4288K ufs      0:31  0.00% squid
 1163 df         1  -4    0  9716K  2512K ufs      0:27  0.00% fetchmail
 2600 root       1 -32    0  5516K  2052K RUN      0:13  0.00% top
 1042 mysql      6  20    0 59128K  1956K kserel   0:06  0.00% mysqld
 9338 df         1  -4    0 15988K  9812K ufs      0:06  0.00% links
 1179 root       1  96    0  5200K   112K select   0:02  0.00% moused

last pid:  2739;  load averages:  2.00,  1.95,  1.57    up 0+00:31:39  16:46:48
91 processes:  3 running, 87 sleeping, 1 zombie
CPU states: 97.8% user,  0.0% nice,  0.4% system,  1.9% interrupt,  0.0% idle
Mem: 120M Active, 29M Inact, 32M Wired, 17M Cache, 15M Buf, 287M Free
Swap: 4097M Total, 4097M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 1352 df         1 132    0  2248K   896K RUN     25:58 97.41% testcase
 1580 df         1  96    0  7608K  2696K select   0:09  0.00% mc
  970 squid      1  -4    0 13928K  8108K ufs      0:03  0.00% squid
 1316 root       1 -32    0  5516K  1608K RUN      0:03  0.00% top
 1238 df         1  -4    0  7668K  1884K ufs      0:02  0.00% fetchmail
 1110 mysql      6  20    0 59128K 51600K kserel   0:01  0.00% mysqld
  754 root       1  96    0  5200K  1088K select   0:01  0.00% syslogd
 1322 root       1  96    0 22656K 16968K select   0:01  0.00% Xorg
  868 root       1   8    0  9232K  4748K nanslp   0:00  0.00% httpd
 1126 news       1   8    0  5484K  1308K wait     0:00  0.00% sh
 1317 root       1  -4    0  5152K   684K ufs      0:00  0.00% tail
 1144 news       1   4    4  7816K  3396K sbwait   0:00  0.00% perl5.8.8
 1120 news       1  -4    0   150M 12828K ufs      0:00  0.00% innd
 1351 df         1  20    0  4032K  1784K pause    0:00  0.00% csh
  993 squid      1  -4    0  5640K   824K msgwai   0:00  0.00% diskd
 1699 df         1   8    0  5244K  3104K ppwait   0:00  0.00% csh
 1182 root       1   8    0  5200K  1100K nanslp   0:00  0.00% cron

(*) for values of `long' from 10 minutes to 2 hours for me.
>How-To-Repeat:
A program to generate the necessary load can be quite simple, like

int main(void)
{
        /* Crunch some numbers (really meaningless) */
        unsigned u=1;
        while (1) {
                u*=0x8088405;
        }
}

Compile and run it, run `top', and wait for a long (see above) time.
Browse directories with `ls' from time to time on a different terminal.
See `ls' hang at some time. View the `top' terminal.
>Fix:
I don't know the fix, but an offending process can be stopped with
'kill -STOP' and continued with 'kill -CONT', which allows other
processes to access the filesystem (until another such failure occurs).
Periodic stopping and starting processes might count as a lousy
workaround.
>Release-Note:
>Audit-Trail:
>Unformatted:
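
The periodic stop/continue workaround mentioned under >Fix: could be automated roughly as follows. This is only a sketch and not part of the original report: the names pulse_process and sleep_ms and the millisecond intervals are made up for illustration, and it does nothing about the underlying problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/* Sleep for ms milliseconds (hypothetical helper for this sketch). */
static void sleep_ms(long ms)
{
        struct timespec ts;

        ts.tv_sec = ms / 1000;
        ts.tv_nsec = (ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);
}

/*
 * Let process `pid' run for run_ms, suspend it for stop_ms, then
 * resume it; repeat `cycles' times.  Equivalent to alternating
 * 'kill -STOP' and 'kill -CONT' by hand.
 * Returns 0 on success, -1 if kill(2) fails (errno is left set).
 */
int pulse_process(pid_t pid, long run_ms, long stop_ms, int cycles)
{
        for (int i = 0; i < cycles; i++) {
                sleep_ms(run_ms);
                if (kill(pid, SIGSTOP) == -1)
                        return -1;
                sleep_ms(stop_ms);
                if (kill(pid, SIGCONT) == -1)
                        return -1;
        }
        return 0;
}
```

A caller would pass the PID of the CPU-bound process (for example, the one shown as `testcase' in top) and pick intervals by experiment; the report does not say how long a stop has to last for the stuck processes to proceed.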