Date: Thu, 16 Feb 2012 18:19:06 -0800
From: Julian Elischer <julian@freebsd.org>
To: davidxu@freebsd.org
Cc: Alexander Kabaev <kan@freebsd.org>, threads@freebsd.org, David Xu <listlog2011@gmail.com>, FreeBSD Stable <freebsd-stable@freebsd.org>, Andriy Gapon <avg@freebsd.org>
Subject: Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)
Message-ID: <4F3DB91A.2090806@freebsd.org>
In-Reply-To: <4F3DB3DB.2060603@gmail.com>
References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> <4F3DB3DB.2060603@gmail.com>

On 2/16/12 5:56 PM, David Xu wrote:
> On 2012/2/17 8:42, Julian Elischer wrote:
>> Adding David Xu for his thoughts since he rewrote the code in
>> question in revision 213098
>>
>> On 2/16/12 2:57 PM, Julian Elischer wrote:
>>> On 2/16/12 1:06 PM, Julian Elischer wrote:
>>>> On 2/16/12 9:34 AM, Andriy Gapon wrote:
>>>>> on 15/02/2012 23:41 Julian Elischer said the following:
>>>>>> The program fio (an IO test in ports) uses pthreads.
>>>>>>
>>>>>> The following code (from fio-2.0.3, but it's in earlier code too)
>>>>>> has suddenly started misbehaving.
>>>>>>
>>>>>>     clock_gettime(CLOCK_REALTIME, &t);
>>>>>>     t.tv_sec += seconds + 10;
>>>>>>
>>>>>>     pthread_mutex_lock(&mutex->lock);
>>>>>>
>>>>>>     while (!mutex->value && !ret) {
>>>>>>         mutex->waiters++;
>>>>>>         ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
>>>>>>         mutex->waiters--;
>>>>>>     }
>>>>>>
>>>>>>     if (!ret) {
>>>>>>         mutex->value--;
>>>>>>         pthread_mutex_unlock(&mutex->lock);
>>>>>>     }
>>>>>>
>>>>>>
>>>>>> It turns out that 'ret' sometimes comes back instantly (on my
>>>>>> machine) with a value of 60 (ETIMEDOUT)
>>>>>> despite the fact that we set the timeout 10 seconds into the
>>>>>> future.
>>>>>>
>>>>>> Has anyone else seen anything like this?
>>>>>> (and yes, the condition variable attributes have been set to use
>>>>>> the REALTIME clock).
>>>>> But why?
>>>>>
>>>>> Just a hypothesis that maybe there is some issue with time
>>>>> keeping on that system.
>>>>> How would that code work out for you with MONOTONIC?
>>>>
>>>> Jens Axboe (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC,
>>>> and they both had the same problem,
>>>> i.e. random early returns with ETIMEDOUT.
>>>>
>>>> I think we will try moving our machine forward to a newer -stable
>>>> to see if that resolves it.
>>> Kan upgraded the machine today to today's 9.x branch tip and the
>>> problem still occurs.
>>> 8.x does not have this problem.
>>>
>>> I do not have a 9-RELEASE machine to test on, so I cannot tell
>>> if this came in with the burst of stuff
>>> that came in after the 9.x branch was unfrozen after the release
>>> of 9.0.
>>>
>>>
>>
> I am trying to reproduce the problem, do you have complete sample
> code to test ?

I'm still looking for the exact set, but on my machine (4 CPUs) the
program from ports sysutils/fio exhibits the problem when used with
kern.timecounter.hardware=TSC-low and with the following config file:

pu05 # cat config.fio
[global]
#clocksource=cpu
direct=1
rw=randread
bs=4096
fill_device=1
numjobs=16
iodepth=16
#ioengine=posixaio
#ioengine=psync
ioengine=psync
group_reporting
norandommap
time_based
runtime=60000
randrepeat=0

[file1]
filename=/dev/ada0
pu05 #
pu05 # fio config.fio
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
...
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
fio 2.0.3
Starting 15 threads and 1 process
fio: job startup hung? exiting.
fio: 5 jobs failed to start
Segmentation fault (core dumped)
pu05#

The reason 5 jobs failed to start is that the parent timed out on
them immediately. It apparently didn't time out on the other 10.

If I set the timer to ACPI-fast, it works as expected.

>
> Regards,
> David Xu
>
>