Date: Wed, 20 Jan 2010 11:01:31 +0100
From: Giovanni Trematerra <giovanni.trematerra@gmail.com>
To: freebsd-current@freebsd.org
Subject: Re: Bug about sched_4bsd?
Message-ID: <4e6cba831001200201m2ff9def8i2b6f72091a91eeee@mail.gmail.com>
In-Reply-To: <3bbf2fe11001171858o4568fe38l9b2db54ec9856b50@mail.gmail.com>
References: <20100117.142200.321689433999177718.okuno.kohji@jp.panasonic.com> <20100117.152835.119882392487126976.okuno.kohji@jp.panasonic.com> <3bbf2fe11001171858o4568fe38l9b2db54ec9856b50@mail.gmail.com>

On Mon, Jan 18, 2010 at 3:58 AM, Attilio Rao <attilio@freebsd.org> wrote:
> 2010/1/17 Kohji Okuno <okuno.kohji@jp.panasonic.com>:
>> Hello,
>>
>> Could you check sched_4bsd.patch, please?
>
> I think, instead, that what needs to happen is to have sched_switch()
> to do a lock handover from sleepq/turnstile spinlock to schedlock.
> That way, if threads are willing to contest on td_lock they will be
> still inhibited.
> I'm not sure if this patch breaks any invariant, if you may test I
> would appreciate:
> http://www.freebsd.org/~attilio/sched_4bsd_schedlock.diff
>
> Reviews and comments are appreciated.
> BTW, nice catch.
>
> Attilio
>

I stressed an 8-core machine with pho's stress2 kernel stress suite, and
your patch seems to break the THREAD_LOCKPTR_ASSERT invariant in
turnstile_claim (subr_turnstile.c).

The relevant stack traces are:

Tracing command creat pid 79098 tid 100624 td 0xc8c59af0
kdb_enter(c0c9b0d1,c0c9b0d1,c0c9f546,e978a85c,1,...) at kdb_enter+0x3a
panic(c0c9f546,c9633af0,c0dfc024,c9d3e280,c0f5bd3c,...) at panic+0x136
turnstile_claim(c9d3e280,2,c0ca7376,1f0,4,...) at turnstile_claim+0x148
_rw_try_upgrade(c0f5bd3c,c0ca7376,1f0,e978a990,e978ab88,...) at _rw_try_upgrade+0xe6
cache_lookup(c99e9bb0,e978ab74,e978ab88,0,0,...) at cache_lookup+0x362
nfs_lookup(e978aa50,e978aa50,e978ab5c,200000,e978ab5c,...) at nfs_lookup+0xf6
VOP_LOOKUP_APV(c0da5fe0,e978aa50,e978ab88,1f1,e978ab74,...) at VOP_LOOKUP_APV+0xa5
lookup(e978ab5c,c0ca7aea,ea,c5,cc093d48,...) at lookup+0x66b
namei(e978ab5c,c08d3aab,c0c94c96,c0c92df9,3,...) at namei+0x55f
kern_statat_vnhook(c8c59af0,0,ffffff9c,bfbfdf78,0,...) at kern_statat_vnhook+0x72
kern_statat(c8c59af0,0,ffffff9c,bfbfdf78,0,...) at kern_statat+0x3c
kern_stat(c8c59af0,bfbfdf78,0,e978ac18,2,...) at kern_stat+0x36
stat(c8c59af0,e978acf8,8,c0ca19f2,c0d88230,...) at stat+0x2f
syscall(e978ad38) at syscall+0x2a3
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (188, FreeBSD ELF32, stat), eip = 0x2817d1a3, esp = 0xbfbfdecc, ebp = 0xbfbfe388 ---

Tracing command creat pid 79122 tid 100202 td 0xc9633af0
sched_switch(c9633af0,0,103,18c,48fb36ba,...) at sched_switch+0x1c5
mi_switch(103,0,c0c9fc0e,2e2,c9d3e280,...) at mi_switch+0x200
turnstile_wait(c9d3e280,0,0,c0f5bd3c,c0f2cb10,...) at turnstile_wait+0x495
_rw_wlock_hard(c0f5bd3c,c9633af0,c0ca7376,209,0,...) at _rw_wlock_hard+0x20c
_rw_wlock(c0f5bd3c,c0ca7376,209,e9147990,e9147b88,...) at _rw_wlock+0xae
cache_lookup(c99e9aa0,e9147b74,e9147b88,0,0,...) at cache_lookup+0x46f
nfs_lookup(e9147a50,e9147a50,e9147b5c,200000,e9147b5c,...) at nfs_lookup+0xf6
VOP_LOOKUP_APV(c0da5fe0,e9147a50,e9147b88,1f1,e9147b74,...) at VOP_LOOKUP_APV+0xa5
lookup(e9147b5c,c0ca7aea,ea,c5,c962caa0,...) at lookup+0x66b
namei(e9147b5c,c08d3aab,c0c94c96,c0c92df9,3,...) at namei+0x55f
kern_statat_vnhook(c9633af0,0,ffffff9c,bfbfdf78,0,...) at kern_statat_vnhook+0x72
kern_statat(c9633af0,0,ffffff9c,bfbfdf78,0,...) at kern_statat+0x3c
kern_stat(c9633af0,bfbfdf78,0,e9147c18,2,...) at kern_stat+0x36
stat(c9633af0,e9147cf8,8,c0c828f4,c0d88230,...) at stat+0x2f
syscall(e9147d38) at syscall+0x2a3
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (188, FreeBSD ELF32, stat), eip = 0x2817d1a3, esp = 0xbfbfdecc, ebp = 0xbfbfe388 ---

What seems to me to be happening is:
Thread 0xc8c59af0 is sleeping on the turnstile queue after an earlier
call to turnstile_wait.
Thread 0xc9633af0 calls turnstile_wait and does a voluntary switch.
The call to thread_lock_set that your patch adds to sched_switch wakes
up thread 0xc8c59af0, and while thread 0xc9633af0 is still in the middle
of sched_switch (so before cpu_switch), thread 0xc8c59af0 is running
turnstile_claim, which discovers through the THREAD_LOCKPTR_ASSERT
invariant assertion that thread 0xc9633af0 does not hold the turnstile
lock.

If needed, I have the coredump and the entire stack trace.
Hope this helps.

--
Gianni
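
The interleaving described above can be replayed in a small user-space model. This is only a sketch under stated assumptions, not kernel code: every `model_*` name is hypothetical, the two locks are plain flags, and the two CPUs are replaced by a fixed sequence of steps. The branch without the handover assumes that sched_switch simply drops the thread's lock and leaves td_lock untouched; that behaviour is not shown in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a spin lock and a thread; not kernel types. */
struct model_lock {
	bool held;
};

struct model_thread {
	struct model_lock *td_lock;	/* lock currently protecting the thread */
};

/* Models the idea of thread_lock_set(): repoint td_lock, drop the old lock. */
static void
model_thread_lock_set(struct model_thread *td, struct model_lock *new_lock)
{
	struct model_lock *old = td->td_lock;

	assert(new_lock->held);		/* caller already holds the new lock */
	td->td_lock = new_lock;
	old->held = false;		/* old lock is now free for another CPU */
}

/* Models the condition checked by THREAD_LOCKPTR_ASSERT(td, lock). */
static bool
model_lockptr_matches(const struct model_thread *td,
    const struct model_lock *lock)
{
	return (td->td_lock == lock);
}

/*
 * Replays the reported sequence. Returns true if the claimer's check
 * would pass, false if it would trip the assertion.
 */
static bool
model_replay(bool handover_in_switch)
{
	struct model_lock ts_lock = { false };
	struct model_lock sched_lock = { false };
	struct model_thread waiter = { NULL };

	/* turnstile_wait(): the blocking thread is tied to the turnstile lock. */
	ts_lock.held = true;
	waiter.td_lock = &ts_lock;

	/* sched_switch(): take the scheduler lock before picking a new thread. */
	sched_lock.held = true;
	if (handover_in_switch) {
		/* Patched path: td_lock moves to the scheduler lock. */
		model_thread_lock_set(&waiter, &sched_lock);
	} else {
		/* Assumed unpatched path: release only, td_lock unchanged. */
		ts_lock.held = false;
	}

	/* Other CPU, turnstile_claim(): take the turnstile lock, check waiter. */
	assert(!ts_lock.held);
	ts_lock.held = true;
	return (model_lockptr_matches(&waiter, &ts_lock));
}
```

In this model, `model_replay(true)` reports a failed check because the waiter's td_lock already points at the scheduler lock when the claimer inspects it, while `model_replay(false)` reports a passing check. The model says nothing about whether the unpatched path is safe in other respects.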