From owner-freebsd-current@FreeBSD.ORG Tue Oct 25 15:59:42 2005 Return-Path: X-Original-To: current@FreeBSD.org Delivered-To: freebsd-current@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B9E0816A41F for ; Tue, 25 Oct 2005 15:59:42 +0000 (GMT) (envelope-from kris@obsecurity.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.FreeBSD.org (Postfix) with ESMTP id AC43743D6A for ; Tue, 25 Oct 2005 15:59:37 +0000 (GMT) (envelope-from kris@obsecurity.org) Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196]) by elvis.mu.org (Postfix) with ESMTP id 7635C1A3C29 for ; Tue, 25 Oct 2005 08:59:37 -0700 (PDT) Received: by obsecurity.dyndns.org (Postfix, from userid 1000) id D47735369C; Tue, 25 Oct 2005 11:59:34 -0400 (EDT) Date: Tue, 25 Oct 2005 11:59:33 -0400 From: Kris Kennaway To: current@FreeBSD.org Message-ID: <20051025155930.GF93337@xor.obsecurity.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="xA/XKXTdy9G3iaIz" Content-Disposition: inline User-Agent: Mutt/1.4.2.1i Cc: Subject: syncer going nuts on 6.0 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Oct 2005 15:59:43 -0000 --xA/XKXTdy9G3iaIz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline I have 3 UP 6.0 machines that are currently stuck using 100-epsilon% of CPU in the syncer: last pid: 17621; load averages: 9.02, 7.21, 4.82 up 3+18:15:31 +08:49:50 73 processes: 2 running, 47 sleeping, 24 waiting CPU states: 0.0% user, 0.0% nice, 100% system, 0.0% interrupt, 0.0% idle Mem: 27M Active, 149M Inact, 137M Wired, 4K Cache, 109M Buf, 684M Free Swap: 2048M Total, 8K Used, 2048M Free PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND 38 root 1 -4 0 0K 8K getblk 180:21 431.30% syncer 17474 root 1 -4 0 1928K 1428K getblk 1:13 28.71% rm 11 root 1 171 52 0K 8K RUN 20.2H 8.94% idle rm -rf is just removing a single ufs directory tree, but it's taking *much* longer than it should be (order of 5 minutes instead of 30 seconds it took for the other 25 machines), and the machine was extremely unresponsive in the meantime (response times of 30 seconds to run commands like top). WITNESS is on, but it's also on on the other 25 machines that aren't seeing a problem under what should be analogous loads. I broke to ddb a couple of times, and syncer is here: --- interrupt, eip = 0xc068fc97, esp = 0xe502f7ec, ebp = 0xe502f7f0 --- spinlock_exit(c07906a0,1,c06c6d13,18b) at spinlock_exit+0x27 _mtx_unlock_spin_flags(c07906a0,0,c06cb459,34a,c06cf11c) at _mtx_unlock_spin_flags+0xbc witness_checkorder(c292d000,9,c06cf11c,d25,1) at witness_checkorder+0x35e _mtx_lock_flags(c292d000,0,c06cf11c,d25,c06cf11c) at _mtx_lock_flags+0x8a vfs_clean_pages(d63aead0,0,c06cf11c,3a0,d63aead0) at vfs_clean_pages+0x68 bdwrite(d63aead0,0,c06dbd1a,867,0) at bdwrite+0x420 softdep_setup_freeblocks(c44154a4,0,0,800,1) at softdep_setup_freeblocks+0x7bf ffs_truncate(c2cc3770,0,0,c00,0) at ffs_truncate+0x632 ufs_inactive(e502fbd4,c2cc37ec,c2cc3770,c2cc37ec,e502fbec) at ufs_inactive+0xe6 VOP_INACTIVE_APV(c070fb20,e502fbd4,c06d0e48,886,c071de20) at VOP_INACTIVE_APV+0xac vinactive(c2cc3770,c22c4180,c06d0e48,818,d64) at vinactive+0x8b vput(c2cc3770,0,c06dbd1a,d64,3b7) at vput+0x19e handle_workitem_remove(c72960c0,0,2,32e,0) at handle_workitem_remove+0x14a process_worklist_item(0,0,c06dbd1a,2de,435e5452) at process_worklist_item+0x20b softdep_process_worklist(0,0,c06d0e48,689,0) at softdep_process_worklist+0x130 sched_sync(0,e502fd38,c06c4e9f,30d,0) at sched_sync+0x2fe fork_exit(c05524f0,0,e502fd38) at fork_exit+0xc1 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe502fd6c, ebp = 0 --- db> --- interrupt, eip = 0xc068fc97, esp = 0xe502fbd0, ebp = 0xe502fbd4 --- spinlock_exit(c07906a0,1,c06c6d13,18b) at spinlock_exit+0x27 _mtx_unlock_spin_flags(c07906a0,0,c06cb459,34a,c070ef80) at _mtx_unlock_spin_flags+0xbc witness_checkorder(c07dfc20,9,c06dbd1a,362,c33cfdd0) at witness_checkorder+0x35e _mtx_lock_flags(c07dfc20,0,c06dbd1a,362,0) at _mtx_lock_flags+0x8a process_worklist_item(0,0,c06dbd1a,2de,435e545e) at process_worklist_item+0x356 softdep_process_worklist(0,0,c06d0e48,689,0) at softdep_process_worklist+0x130 sched_sync(0,e502fd38,c06c4e9f,30d,0) at sched_sync+0x2fe fork_exit(c05524f0,0,e502fd38) at fork_exit+0xc1 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe502fd6c, ebp = 0 --- They eventually completed, but I'm suspicious of why these 3 took so long. Kris --xA/XKXTdy9G3iaIz Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (FreeBSD) iD8DBQFDXlZhWry0BWjoQKURArPBAKDBKZeaLnl3NCr+MZYfyHQljzxVrACgtZu6 oCpwyzSFq7U1fVR1kDh1EjU= =tMsa -----END PGP SIGNATURE----- --xA/XKXTdy9G3iaIz--