Date:      Tue, 18 Oct 2011 15:26:46 +1100
From:      Peter Jeremy <peterjeremy@acm.org>
To:        Marius Strobl <marius@alchemy.franken.de>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: 'make -j16 universe' gives SIReset
Message-ID:  <20111018042646.GA18863@server.vk2pj.dyndns.org>
In-Reply-To: <20111013184224.GG39118@alchemy.franken.de>
References:  <20110816214820.GA35017@server.vk2pj.dyndns.org> <20110817094541.GJ48988@alchemy.franken.de> <20110830152725.GA28552@alchemy.franken.de> <20110831212458.GA25926@server.vk2pj.dyndns.org> <20110902153206.GR40781@alchemy.franken.de> <20111006120411.GA903@alchemy.franken.de> <20111011030529.GA4093@server.vk2pj.dyndns.org> <20111011205543.GA81376@alchemy.franken.de> <20111013035648.GA54190@server.vk2pj.dyndns.org> <20111013184224.GG39118@alchemy.franken.de>


On 2011-Oct-13 20:42:25 +0200, Marius Strobl <marius@alchemy.franken.de> wrote:
>On Thu, Oct 13, 2011 at 02:56:48PM +1100, Peter Jeremy wrote:
>> Unfortunately, I can't get a crashdump because dumpon(8) doesn't like
>> my Solaris swap partitions:
>> GEOM_PART: Partition 'da0b' not suitable for kernel dumps (wrong type?)
>> GEOM_PART: Partition 'da6b' not suitable for kernel dumps (wrong type?)
>> No suitable dump device was found.
>>
>> I did write a patch for that but took it out during some earlier
>> testing to get back to stock code.  It looks like I didn't PR it
>> either so I will do that when I get some time.

I've resurrected that patch (and will send-pr it later).

>Hrm, this backtrace seems impossible as vmtotal() explicitly locks
>the object before calling vm_object_clear_flag(). A crash dump of
>this panic really would be interesting.

I've reproduced the same panic and got a crashdump (2 hours for
the dump and another hour for the savecore):
VNASSERT failed
panic: mutex vm object not owned at /usr/src/sys/vm/vm_object.c:281
cpuid = 7

#10 0x00000000c04ffbf4 in panic (fmt=0xc0a906d0 "mutex %s not owned at %s:%d") at /usr/src/sys/kern/kern_shutdown.c:599
#11 0x00000000c04eb1b8 in _mtx_assert (m=0xfffff8b29d750ca8, what=0x4, file=0xc0ac6c00 "/usr/src/sys/vm/vm_object.c", line=0x119) at /usr/src/sys/kern/kern_mutex.c:706
#12 0x00000000c07f4b0c in vm_object_clear_flag (object=0xfffff8b29d750ca8, bits=0x4) at /usr/src/sys/vm/vm_object.c:281
#13 0x00000000c07f1dac in vmtotal (oidp=0xc0ba9be8, arg1=0x0, arg2=0x30, req=0xef8a54e0) at /usr/src/sys/vm/vm_meter.c:121
#14 0x00000000c050c13c in sysctl_root (oidp=Variable "oidp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1509
#15 0x00000000c050c434 in userland_sysctl (td=0x0, name=0xef8a5628, namelen=0x2, old=0x0, oldlenp=Variable "oldlenp" is not available.) at /usr/src/sys/kern/kern_sysctl.c:1619
#16 0x00000000c050c858 in sys___sysctl (td=0xfffff8a2e3ef48c0, uap=0xef8a5768) at /usr/src/sys/kern/kern_sysctl.c:1545
#17 0x00000000c086ba00 in syscall (tf=Variable "tf" is not available.) at subr_syscall.c:131
#18 0x00000000c0098e60 in tl0_intr ()

(kgdb) p *object
$1 = {
  mtx = {
    lock_object = {
      lo_name = 0xc0a9a308 "vm object",
      lo_flags = 0x1430000,
      lo_data = 0x0,
      lo_witness = 0xfff85180
    },
    mtx_lock = 0xfffff8a0112d75e0
  },
...
}
(kgdb) p *object->mtx->lock_object->lo_witness
$3 = {
  w_name = "standard object", '\0' <repeats 48 times>,
  w_index = 0xa3,
  w_class = 0xc0b82e88,
  w_list = {
    stqe_next = 0xfff85100
  },
  w_typelist = {
    stqe_next = 0xfff85100
  },
  w_hash_next = 0x0,
  w_file = 0xc0ac6388 "/usr/src/sys/vm/vm_meter.c",
  w_line = 0x71,
  w_refcount = 0x53718,
  w_num_ancestors = 0xe,
  w_num_descendants = 0xe,
  w_ddb_level = 0x0,
  w_displayed = 0x1,
  w_reversed = 0x0
}
(kgdb) p vm_object_list_mtx
$4 = {
  lock_object = {
    lo_name = 0xc0ac6e30 "vm object_list",
    lo_flags = 0x1030000,
    lo_data = 0x0,
    lo_witness = 0xfff81d80
  },
  mtx_lock = 0xfffff8a2e3ef48c2
}
(kgdb) p *vm_object_list_mtx.lock_object.lo_witness
$6 = {
  w_name = "vm object_list", '\0' <repeats 49 times>,
  w_index = 0x3b,
  w_class = 0xc0b82e88,
  w_list = {
    stqe_next = 0xfff81d00
  },
  w_typelist = {
    stqe_next = 0xfff81d00
  },
  w_hash_next = 0x0,
  w_file = 0xc0ac6388 "/usr/src/sys/vm/vm_meter.c",
  w_line = 0x6f,
  w_refcount = 0x1,
  w_num_ancestors = 0xf,
  w_num_descendants = 0x0,
  w_ddb_level = 0x0,
  w_displayed = 0x1,
  w_reversed = 0x0
}

The witness information looks correct, but I notice that vm_object_list_mtx
is owned by a different thread from the one holding the mutex of the
vm_object that triggers the panic.

The panic says it occurred on CPU 7:
(kgdb) p cpuid_to_pcpu[7]->pc_curthread
$21 = (struct thread *) 0xfffff8a2e3ef48c0
which matches the vm_object_list_mtx.
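
For anyone following along, the comparison above depends on how mtx_lock
is encoded: the word holds the owning thread pointer with a few flag bits
in its low bits, which is why 0x...48c2 "matches" thread 0x...48c0. Below
is a rough sketch of that decoding. The flag values are my assumption,
taken from my reading of sys/sys/mutex.h from around this time, and
decode_mtx_lock() is just a helper name made up for this illustration, so
check both against your own tree.

```python
# Flag bits kept in the low bits of mtx_lock. These values are an
# assumption based on sys/sys/mutex.h of this era; verify locally.
MTX_RECURSED = 0x1
MTX_CONTESTED = 0x2
MTX_UNOWNED = 0x4
MTX_FLAGMASK = MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED


def decode_mtx_lock(value):
    """Split a raw mtx_lock word into (owner thread pointer, flag names).

    Hypothetical helper for illustration, not a kernel or kgdb function.
    """
    flags = [name
             for name, bit in (("MTX_RECURSED", MTX_RECURSED),
                               ("MTX_CONTESTED", MTX_CONTESTED),
                               ("MTX_UNOWNED", MTX_UNOWNED))
             if value & bit]
    # Clearing the flag bits leaves the struct thread pointer.
    return value & ~MTX_FLAGMASK, flags


# The two mtx_lock values printed by kgdb in this mail.
for label, raw in (("vm_object_list_mtx", 0xfffff8a2e3ef48c2),
                   ("object->mtx", 0xfffff8a0112d75e0)):
    owner, flags = decode_mtx_lock(raw)
    print("%s: owner=%#x flags=%s" % (label, owner, flags or "none"))
```

Under those assumptions, vm_object_list_mtx decodes to the panicking
thread with the contested bit set, while the object's mutex decodes to a
different thread with no flag bits set.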

My initial thought was a locking glitch but, looking through
cpuid_to_pcpu[], the owner of the vm_object's lock doesn't match any
running thread:

(kgdb) p cpuid_to_pcpu[0]->pc_curthread
$14 = (struct thread *) 0xfffff8a2e3008000
(kgdb) p cpuid_to_pcpu[1]->pc_curthread
$15 = (struct thread *) 0xfffff8a2aae7c8c0
(kgdb) p cpuid_to_pcpu[2]->pc_curthread
$16 = (struct thread *) 0xfffff8a0112acd20
(kgdb) p cpuid_to_pcpu[3]->pc_curthread
$17 = (struct thread *) 0xfffff8a0112ac8c0
(kgdb) p cpuid_to_pcpu[4]->pc_curthread
$18 = (struct thread *) 0xfffff8a2aae7da40
(kgdb) p cpuid_to_pcpu[5]->pc_curthread
$19 = (struct thread *) 0xfffff8a2aa2a6460
(kgdb) p cpuid_to_pcpu[6]->pc_curthread
$20 = (struct thread *) 0xfffff8a2e3148d20
(kgdb) p cpuid_to_pcpu[7]->pc_curthread
$21 = (struct thread *) 0xfffff8a2e3ef48c0
(kgdb) p cpuid_to_pcpu[8]->pc_curthread
$22 = (struct thread *) 0xfffff8d32cfa0460
(kgdb) p cpuid_to_pcpu[9]->pc_curthread
$23 = (struct thread *) 0xfffff8a0112b3a40
(kgdb) p cpuid_to_pcpu[10]->pc_curthread
$24 = (struct thread *) 0xfffff8a2a8f77180
(kgdb) p cpuid_to_pcpu[11]->pc_curthread
$25 = (struct thread *) 0xfffff8a2e3ef1a40
(kgdb) p cpuid_to_pcpu[12]->pc_curthread
$26 = (struct thread *) 0xfffff8a2e319e8c0
(kgdb) p cpuid_to_pcpu[13]->pc_curthread
$27 = (struct thread *) 0xfffff8a2e3c30d20
(kgdb) p cpuid_to_pcpu[14]->pc_curthread
$28 = (struct thread *) 0xfffff8a0112b2460
(kgdb) p cpuid_to_pcpu[15]->pc_curthread
$29 = (struct thread *) 0xfffff8c1f78cb180
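
Comparing sixteen pointers by eye is error-prone, so here is a small
sketch of the same check done mechanically. The table is copied from the
kgdb output above; cpu_running() is a name invented for this example and
the whole thing is only a cross-check of values already in this mail, not
something that reads the dump itself.

```python
# pc_curthread for each CPU, copied from the kgdb output above.
CURTHREAD = {
    0: 0xfffff8a2e3008000,
    1: 0xfffff8a2aae7c8c0,
    2: 0xfffff8a0112acd20,
    3: 0xfffff8a0112ac8c0,
    4: 0xfffff8a2aae7da40,
    5: 0xfffff8a2aa2a6460,
    6: 0xfffff8a2e3148d20,
    7: 0xfffff8a2e3ef48c0,
    8: 0xfffff8d32cfa0460,
    9: 0xfffff8a0112b3a40,
    10: 0xfffff8a2a8f77180,
    11: 0xfffff8a2e3ef1a40,
    12: 0xfffff8a2e319e8c0,
    13: 0xfffff8a2e3c30d20,
    14: 0xfffff8a0112b2460,
    15: 0xfffff8c1f78cb180,
}


def cpu_running(thread):
    """Return the CPU whose pc_curthread equals `thread`, or None.

    Hypothetical helper for illustration only.
    """
    for cpu, td in sorted(CURTHREAD.items()):
        if td == thread:
            return cpu
    return None


# Thread that owns vm_object_list_mtx (the panicking thread).
print(cpu_running(0xfffff8a2e3ef48c0))
# Thread recorded as owner in the vm_object's mtx_lock.
print(cpu_running(0xfffff8a0112d75e0))
```

The first lookup lands on CPU 7 and the second finds no CPU at all, which
is the mismatch described above.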

Some rummaging around says that the object is locked by yarrow:
(kgdb) p ((struct thread *) 0xfffff8a0112d75e0)->td_proc.p_comm
$35 = "yarrow", '\0' <repeats 13 times>

At this stage, I'm not sure where to go next.

-- 
Peter Jeremy



