Date: Tue, 18 Oct 2011 15:26:46 +1100
From: Peter Jeremy <peterjeremy@acm.org>
To: Marius Strobl <marius@alchemy.franken.de>
Cc: freebsd-sparc64@freebsd.org
Subject: Re: 'make -j16 universe' gives SIReset
Message-ID: <20111018042646.GA18863@server.vk2pj.dyndns.org>
In-Reply-To: <20111013184224.GG39118@alchemy.franken.de>
References: <20110816214820.GA35017@server.vk2pj.dyndns.org>
 <20110817094541.GJ48988@alchemy.franken.de>
 <20110830152725.GA28552@alchemy.franken.de>
 <20110831212458.GA25926@server.vk2pj.dyndns.org>
 <20110902153206.GR40781@alchemy.franken.de>
 <20111006120411.GA903@alchemy.franken.de>
 <20111011030529.GA4093@server.vk2pj.dyndns.org>
 <20111011205543.GA81376@alchemy.franken.de>
 <20111013035648.GA54190@server.vk2pj.dyndns.org>
 <20111013184224.GG39118@alchemy.franken.de>
On 2011-Oct-13 20:42:25 +0200, Marius Strobl <marius@alchemy.franken.de> wrote:
>On Thu, Oct 13, 2011 at 02:56:48PM +1100, Peter Jeremy wrote:
>> Unfortunately, I can't get a crashdump because dumpon(8) doesn't like
>> my Solaris swap partitions:
>> GEOM_PART: Partition 'da0b' not suitable for kernel dumps (wrong type?)
>> GEOM_PART: Partition 'da6b' not suitable for kernel dumps (wrong type?)
>> No suitable dump device was found.
>>
>> I did write a patch for that but took it out during some earlier
>> testing to get back to stock code. It looks like I didn't PR it
>> either so I will do that when I get some time.

I've resurrected that patch (and will send-pr it later).

>Hrm, this backtrace seems impossible as vmtotal() explicitly locks
>the object before calling vm_object_clear_flag(). A crash dump of
>this panic really would be interesting.

I've reproduced the same panic and got a crashdump (2 hours for the
dump and another hour for the savecore):

VNASSERT failed
panic: mutex vm object not owned at /usr/src/sys/vm/vm_object.c:281
cpuid = 7

#10 0x00000000c04ffbf4 in panic (fmt=0xc0a906d0 "mutex %s not owned at %s:%d") at /usr/src/sys/kern/kern_shutdown.c:599
#11 0x00000000c04eb1b8 in _mtx_assert (m=0xfffff8b29d750ca8, what=0x4, file=0xc0ac6c00 "/usr/src/sys/vm/vm_object.c", line=0x119) at /usr/src/sys/kern/kern_mutex.c:706
#12 0x00000000c07f4b0c in vm_object_clear_flag (object=0xfffff8b29d750ca8, bits=0x4) at /usr/src/sys/vm/vm_object.c:281
#13 0x00000000c07f1dac in vmtotal (oidp=0xc0ba9be8, arg1=0x0, arg2=0x30, req=0xef8a54e0) at /usr/src/sys/vm/vm_meter.c:121
#14 0x00000000c050c13c in sysctl_root (oidp=Variable "oidp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1509
#15 0x00000000c050c434 in userland_sysctl (td=0x0, name=0xef8a5628, namelen=0x2, old=0x0, oldlenp=Variable "oldlenp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1619
#16 0x00000000c050c858 in sys___sysctl (td=0xfffff8a2e3ef48c0, uap=0xef8a5768) at /usr/src/sys/kern/kern_sysctl.c:1545
#17 0x00000000c086ba00 in syscall (tf=Variable "tf" is not available.
) at subr_syscall.c:131
#18 0x00000000c0098e60 in tl0_intr ()
(kgdb) p *object
$1 = {
  mtx = {
    lock_object = {
      lo_name = 0xc0a9a308 "vm object",
      lo_flags = 0x1430000,
      lo_data = 0x0,
      lo_witness = 0xfff85180
    },
    mtx_lock = 0xfffff8a0112d75e0
  },
  ...
}
(kgdb) p *object->mtx->lock_object->lo_witness
$3 = {
  w_name = "standard object", '\0' <repeats 48 times>,
  w_index = 0xa3,
  w_class = 0xc0b82e88,
  w_list = {
    stqe_next = 0xfff85100
  },
  w_typelist = {
    stqe_next = 0xfff85100
  },
  w_hash_next = 0x0,
  w_file = 0xc0ac6388 "/usr/src/sys/vm/vm_meter.c",
  w_line = 0x71,
  w_refcount = 0x53718,
  w_num_ancestors = 0xe,
  w_num_descendants = 0xe,
  w_ddb_level = 0x0,
  w_displayed = 0x1,
  w_reversed = 0x0
}
(kgdb) p vm_object_list_mtx
$4 = {
  lock_object = {
    lo_name = 0xc0ac6e30 "vm object_list",
    lo_flags = 0x1030000,
    lo_data = 0x0,
    lo_witness = 0xfff81d80
  },
  mtx_lock = 0xfffff8a2e3ef48c2
}
(kgdb) p *vm_object_list_mtx.lock_object.lo_witness
$6 = {
  w_name = "vm object_list", '\0' <repeats 49 times>,
  w_index = 0x3b,
  w_class = 0xc0b82e88,
  w_list = {
    stqe_next = 0xfff81d00
  },
  w_typelist = {
    stqe_next = 0xfff81d00
  },
  w_hash_next = 0x0,
  w_file = 0xc0ac6388 "/usr/src/sys/vm/vm_meter.c",
  w_line = 0x6f,
  w_refcount = 0x1,
  w_num_ancestors = 0xf,
  w_num_descendants = 0x0,
  w_ddb_level = 0x0,
  w_displayed = 0x1,
  w_reversed = 0x0
}

The witness information looks correct but I notice that
vm_object_list_mtx is owned by a different thread from the one holding the
lock on the vm_object that triggers the panic. The panic says it occurred
on CPU 7:

(kgdb) p cpuid_to_pcpu[7]->pc_curthread
$21 = (struct thread *) 0xfffff8a2e3ef48c0

which matches the owner of vm_object_list_mtx (0xfffff8a2e3ef48c2 with the
low-order MTX_CONTESTED bit, 0x2, masked off). My initial thought was a
locking glitch, but looking through cpuid_to_pcpu[], the owner of the
vm_object's lock doesn't match any running thread:

(kgdb) p cpuid_to_pcpu[0]->pc_curthread
$14 = (struct thread *) 0xfffff8a2e3008000
(kgdb) p cpuid_to_pcpu[1]->pc_curthread
$15 = (struct thread *) 0xfffff8a2aae7c8c0
(kgdb) p cpuid_to_pcpu[2]->pc_curthread
$16 = (struct thread *) 0xfffff8a0112acd20
(kgdb) p cpuid_to_pcpu[3]->pc_curthread
$17 = (struct thread *) 0xfffff8a0112ac8c0
(kgdb) p cpuid_to_pcpu[4]->pc_curthread
$18 = (struct thread *) 0xfffff8a2aae7da40
(kgdb) p cpuid_to_pcpu[5]->pc_curthread
$19 = (struct thread *) 0xfffff8a2aa2a6460
(kgdb) p cpuid_to_pcpu[6]->pc_curthread
$20 = (struct thread *) 0xfffff8a2e3148d20
(kgdb) p cpuid_to_pcpu[7]->pc_curthread
$21 = (struct thread *) 0xfffff8a2e3ef48c0
(kgdb) p cpuid_to_pcpu[8]->pc_curthread
$22 = (struct thread *) 0xfffff8d32cfa0460
(kgdb) p cpuid_to_pcpu[9]->pc_curthread
$23 = (struct thread *) 0xfffff8a0112b3a40
(kgdb) p cpuid_to_pcpu[10]->pc_curthread
$24 = (struct thread *) 0xfffff8a2a8f77180
(kgdb) p cpuid_to_pcpu[11]->pc_curthread
$25 = (struct thread *) 0xfffff8a2e3ef1a40
(kgdb) p cpuid_to_pcpu[12]->pc_curthread
$26 = (struct thread *) 0xfffff8a2e319e8c0
(kgdb) p cpuid_to_pcpu[13]->pc_curthread
$27 = (struct thread *) 0xfffff8a2e3c30d20
(kgdb) p cpuid_to_pcpu[14]->pc_curthread
$28 = (struct thread *) 0xfffff8a0112b2460
(kgdb) p cpuid_to_pcpu[15]->pc_curthread
$29 = (struct thread *) 0xfffff8c1f78cb180

Some rummaging around says that the object is locked by yarrow:

(kgdb) p ((struct thread *) 0xfffff8a0112d75e0)->td_proc.p_comm
$35 = "yarrow", '\0' <repeats 13 times>

At this stage, I'm not sure where to go next.
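The owner comparison above depends on how a FreeBSD mutex records its owner: mtx_lock holds the owning struct thread pointer, with state flags packed into the low-order bits. A minimal userspace C sketch of that check follows. It assumes the flag values from sys/sys/mutex.h of this era (MTX_RECURSED = 0x1, MTX_CONTESTED = 0x2, MTX_UNOWNED = 0x4) and reuses the pc_curthread values printed above. mtx_owner_of() and cpu_running_owner() are hypothetical helpers that roughly mirror the kernel's mtx_owner() macro; they are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * State bits kept in the low-order bits of mtx_lock.  Values assumed to
 * match sys/sys/mutex.h of this era; verify against the source tree that
 * built the kernel being debugged.
 */
#define MTX_RECURSED  0x1ULL
#define MTX_CONTESTED 0x2ULL
#define MTX_UNOWNED   0x4ULL
#define MTX_FLAGMASK  (MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED)

/* pc_curthread for CPUs 0..15, copied from the kgdb output above. */
static const uint64_t curthread_by_cpu[] = {
	0xfffff8a2e3008000ULL, 0xfffff8a2aae7c8c0ULL, /* CPU 0, 1 */
	0xfffff8a0112acd20ULL, 0xfffff8a0112ac8c0ULL, /* CPU 2, 3 */
	0xfffff8a2aae7da40ULL, 0xfffff8a2aa2a6460ULL, /* CPU 4, 5 */
	0xfffff8a2e3148d20ULL, 0xfffff8a2e3ef48c0ULL, /* CPU 6, 7 */
	0xfffff8d32cfa0460ULL, 0xfffff8a0112b3a40ULL, /* CPU 8, 9 */
	0xfffff8a2a8f77180ULL, 0xfffff8a2e3ef1a40ULL, /* CPU 10, 11 */
	0xfffff8a2e319e8c0ULL, 0xfffff8a2e3c30d20ULL, /* CPU 12, 13 */
	0xfffff8a0112b2460ULL, 0xfffff8c1f78cb180ULL, /* CPU 14, 15 */
};

#define NCPU (sizeof(curthread_by_cpu) / sizeof(curthread_by_cpu[0]))

/*
 * Hypothetical helper: strip the flag bits from a mtx_lock word, leaving
 * the owning thread pointer (0 if the mutex is free, i.e. MTX_UNOWNED).
 */
static uint64_t
mtx_owner_of(uint64_t mtx_lock)
{
	return (mtx_lock & ~MTX_FLAGMASK);
}

/*
 * Hypothetical helper: return the CPU whose current thread owns the
 * given lock word, or -1 if the owner is not running on any CPU.
 */
static int
cpu_running_owner(uint64_t mtx_lock)
{
	uint64_t owner = mtx_owner_of(mtx_lock);

	assert(owner != 0 && "lock word must belong to a held mutex");
	for (size_t i = 0; i < NCPU; i++) {
		if (curthread_by_cpu[i] == owner)
			return ((int)i);
	}
	return (-1);
}
```

With these values, cpu_running_owner(0xfffff8a2e3ef48c2), the vm_object_list_mtx lock word, returns 7. cpu_running_owner(0xfffff8a0112d75e0), the vm object's lock word, returns -1, meaning the thread holding the object lock was not the current thread on any CPU when the dump was taken.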
-- 
Peter Jeremy