Date: Sat, 28 May 2022 09:02:23 +0200
From: Paul Floyd <paulf2718@gmail.com>
To: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Hang ast / pipelk / piperd
Message-ID: <d7b270c1-c375-3b52-c9e7-dcf5db2deeb6@gmail.com>
In-Reply-To: <84015bf9-8504-1c3c-0ba5-58d0d7824843@gmail.com>
References: <84015bf9-8504-1c3c-0ba5-58d0d7824843@gmail.com>
On 5/27/22 22:13, Paul Floyd wrote:
>
> Hi
>
> I'm debugging two issues with Valgrind on FreeBSD 13.1 and 14, one on
> amd64 and one on i386.
>
> ...
>
> Both hangs seem quite sensitive to timing - in both cases adding or
> changing nanosleep times seem to make them no longer hang.
>
> Adding debug statements to Valgrind can also change the behaviour
> (and is also unsafe when not holding the scheduler lock). Does this
> look like a kernel bug?

One important detail I missed out. Why is Valgrind releasing the scheduler lock?

To make a client syscall. This needs to be done in "client-like" circumstances - specifically, with the client signal mask (rather than the Valgrind mask, which masks all signals so that Valgrind has full control).

Two things can happen with a client syscall.

1/ it succeeds, and Valgrind will re-acquire the lock and continue.

2/ it gets interrupted, Valgrind re-acquires the lock, does a load of stuff to fix up the guest state and takes the appropriate action (restart, return EINTR, save carry etc).

I did think that 2/ might be prone to get into an infinite loop, especially with restart. But I don't see anything like that in the Valgrind logs.

PJF thread 14 making a client nanosleep syscall
SYSCALL[5379,14](240) sys_nanosleep ( 0x200890, 0x0 ) --> [async] ...

PJF thread 14 releases the scheduler lock
--5379-- SCHED[14]: releasing lock (VG_(client_syscall)[async]) -> VgTs_WaitSys

PJF thread 2 acquires the scheduler lock
--5379-- SCHED[2]: acquired lock (VG_(client_syscall)[async])

PJF thread 2 return from nanosleep
SYSCALL[5379,2](240) ... [async] --> Success(0x0)

PJF thread 2 making a client write syscall
SYSCALL[5379,2]( 4) sys_write ( 1, 0x4ea9000, 48 ) --> [async] ...

PJF thread 2 releases the scheduler lock
--5379-- SCHED[2]: releasing lock (VG_(client_syscall)[async]) -> VgTs_WaitSys

PJF this is the thread 2 printf from syscall write
tls_ptr: case "race" has mismatch: *ip=8 here=4

PJF thread 2 acquires the scheduler lock
--5379-- SCHED[2]: acquired lock (VG_(client_syscall)[async])

PJF thread 2 return from write (0x30 = 48 bytes written)
SYSCALL[5379,2]( 4) ... [async] --> Success(0x30)

PJF thread 2 making a client nanosleep syscall
SYSCALL[5379,2](240) sys_nanosleep ( 0x200890, 0x0 ) --> [async] ...

PJF thread 2 releases the scheduler lock
--5379-- SCHED[2]: releasing lock (VG_(client_syscall)[async]) -> VgTs_WaitSys

And that's it, it hangs making the client nanosleep syscall.

A+
Paul
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?d7b270c1-c375-3b52-c9e7-dcf5db2deeb6>