From owner-freebsd-threads@FreeBSD.ORG Mon Feb 13 11:08:14 2012 Return-Path: Delivered-To: freebsd-threads@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 50EB410656D8 for ; Mon, 13 Feb 2012 11:08:14 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 23F648FC1E for ; Mon, 13 Feb 2012 11:08:14 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1DB8EM3091055 for ; Mon, 13 Feb 2012 11:08:14 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1DB8Drr091052 for freebsd-threads@FreeBSD.org; Mon, 13 Feb 2012 11:08:13 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 13 Feb 2012 11:08:13 GMT Message-Id: <201202131108.q1DB8Drr091052@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-threads@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-threads@FreeBSD.org X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Feb 2012 11:08:14 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description
--------------------------------------------------------------------------------
o threa/163512 threads libc defaults to single threaded
o threa/160708 threads possible security problem with RLIMIT_VMEM
o threa/150959 threads [libc] Stub pthread_once in libc should call _libc_onc
o threa/148515 threads Memory / syslog strangeness in FreeBSD 8.x ( possible
o threa/141721 threads rtprio(1): (id|rt)prio priority resets when new thread
o threa/135673 threads databases/mysql50-server - MySQL query lock-ups on 7.2
o threa/128922 threads threads hang with xorg running
o threa/122923 threads 'nice' does not prevent background process from steali
o threa/121336 threads lang/neko threading ok on UP, broken on SMP (FreeBSD 7
o threa/116668 threads can no longer use jdk15 with libthr on -stable SMP
o threa/115211 threads pthread_atfork misbehaves in initial thread
o threa/110636 threads [request] gdb(1): using gdb with multi thread applicat
o threa/110306 threads apache 2.0 segmentation violation when calling gethost
o threa/103975 threads Implicit loading/unloading of libpthread.so may crash
o threa/101323 threads [patch] fork(2) in threaded programs broken.
s threa/84483  threads problems with devel/nspr and -lc_r on 4.x
o threa/80992  threads abort() sometimes not caught by gdb depending on threa
o threa/79683  threads svctcp_create() fails if multiple threads call at the
s threa/76694  threads fork cause hang in dup()/close() function in child (-l
s threa/48856  threads Setting SIGCHLD to SIG_IGN still leaves zombies under
s threa/34536  threads accept() blocks other threads
s threa/30464  threads [patch] pthread mutex attributes -- pshared

22 problems total.

From owner-freebsd-threads@FreeBSD.ORG Wed Feb 15 11:40:07 2012 Return-Path: Delivered-To: freebsd-threads@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A03CA106564A for ; Wed, 15 Feb 2012 11:40:07 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 5A1E58FC1A for ; Wed, 15 Feb 2012 11:40:07 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1FBe7Tk092767 for ; Wed, 15 Feb 2012 11:40:07 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1FBe7R1092766; Wed, 15 Feb 2012 11:40:07 GMT (envelope-from gnats) Resent-Date: Wed, 15 Feb 2012 11:40:07 GMT Resent-Message-Id: <201202151140.q1FBe7R1092766@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-threads@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Shane Ambler Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 239C8106564A for ; Wed, 15 Feb 2012 11:36:15 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22]) by mx1.freebsd.org (Postfix) with ESMTP id 0F3E88FC14 for ; Wed, 15 Feb 2012 11:36:15 +0000 (UTC) Received: from red.freebsd.org (localhost [127.0.0.1]) by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q1FBaEHO072987 for ; Wed, 15 Feb 2012 11:36:14 GMT (envelope-from nobody@red.freebsd.org) Received: (from nobody@localhost) by red.freebsd.org (8.14.4/8.14.4/Submit) id q1FBaEfj072986; Wed, 15 Feb 2012 11:36:14 GMT (envelope-from nobody) Message-Id: <201202151136.q1FBaEfj072986@red.freebsd.org> Date: Wed, 15 Feb 2012 11:36:14 GMT From: Shane Ambler To: 
freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: threads/165173: clang buildworld breaks libthr
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Wed, 15 Feb 2012 11:40:07 -0000

>Number: 165173
>Category: threads
>Synopsis: clang buildworld breaks libthr
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-threads
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Wed Feb 15 11:40:07 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator: Shane Ambler
>Release: 9.0-RELEASE
>Organization:
>Environment:
FreeBSD leader 9.0-RELEASE FreeBSD 9.0-RELEASE #1: Wed Feb 15 16:03:18 CST 2012 root@:/usr/obj/usr/src/sys/GENERIC amd64
>Description:
With a world built with clang a bus error is produced within libthr on a call to sigprocmask(), the common backtrace appears to be -

#0 0x00000008019001d5 in sigprocmask () from /lib/libthr.so.3
#1 0x0000000801b5b2ac in longjmp () from /lib/libc.so.7

Testing has shown that a world built with clang has this issue and replacing /lib/libthr.so.3 with one from a gcc built world fixes the issue.

More details can be found at http://forums.freebsd.org/showthread.php?t=29782
>How-To-Repeat:
The easiest way I have found is to build lang/perl5.12 within a clang built world.
>Fix: >Release-Note: >Audit-Trail: >Unformatted: From owner-freebsd-threads@FreeBSD.ORG Wed Feb 15 16:40:18 2012 Return-Path: Delivered-To: freebsd-threads@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 07BF0106564A for ; Wed, 15 Feb 2012 16:40:18 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id D0FAE8FC0A for ; Wed, 15 Feb 2012 16:40:17 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1FGeHAJ069513 for ; Wed, 15 Feb 2012 16:40:17 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q1FGeHRW069512; Wed, 15 Feb 2012 16:40:17 GMT (envelope-from gnats) Date: Wed, 15 Feb 2012 16:40:17 GMT Message-Id: <201202151640.q1FGeHRW069512@freefall.freebsd.org> To: freebsd-threads@FreeBSD.org From: Konstantin Belousov Cc: Subject: Re: threads/165173: clang buildworld breaks libthr X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Konstantin Belousov List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Feb 2012 16:40:18 -0000 The following reply was made to PR threads/165173; it has been noted by GNATS. From: Konstantin Belousov To: Shane Ambler Cc: freebsd-gnats-submit@freebsd.org Subject: Re: threads/165173: clang buildworld breaks libthr Date: Wed, 15 Feb 2012 18:02:12 +0200 This should have been fixed by r227023, merged to stable as r229008. The fix is not in release. 
From owner-freebsd-threads@FreeBSD.ORG Wed Feb 15 21:58:45 2012
Return-Path:
Delivered-To: threads@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9703E106566C for ; Wed, 15 Feb 2012 21:58:45 +0000 (UTC) (envelope-from julian@freebsd.org)
Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 661FF8FC13 for ; Wed, 15 Feb 2012 21:58:45 +0000 (UTC)
Received: from julian-mac.elischer.org (c-67-180-24-15.hsd1.ca.comcast.net [67.180.24.15]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id q1FLdZcI009369 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 15 Feb 2012 13:39:40 -0800 (PST) (envelope-from julian@freebsd.org)
Message-ID: <4F3C2671.3090808@freebsd.org>
Date: Wed, 15 Feb 2012 13:41:05 -0800
From: Julian Elischer
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.26) Gecko/20120129 Thunderbird/3.1.18
MIME-Version: 1.0
To: threads@freebsd.org, FreeBSD Stable
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Jens Axboe
Subject: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Wed, 15 Feb 2012 21:58:45 -0000

The program fio (an I/O test in ports) uses pthreads.

The following code (from fio-2.0.3, but it's in earlier code too) has suddenly started misbehaving.

        clock_gettime(CLOCK_REALTIME, &t);
        t.tv_sec += seconds + 10;

        pthread_mutex_lock(&mutex->lock);

        while (!mutex->value && !ret) {
                mutex->waiters++;
                ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
                mutex->waiters--;
        }

        if (!ret) {
                mutex->value--;
                pthread_mutex_unlock(&mutex->lock);
        }

It turns out that 'ret' sometimes comes back instantly (on my machine) with a value of 60 (ETIMEDOUT) despite the fact that we set the timeout 10 seconds into the future.

Has anyone else seen anything like this?
(And yes, the condition variable attribute has been set to use the REALTIME clock.)

From owner-freebsd-threads@FreeBSD.ORG Thu Feb 16 14:25:59 2012
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A78701065670 for ; Thu, 16 Feb 2012 14:25:58 +0000 (UTC) (envelope-from cs@sdata.de)
Received: from mailgw.sdata.de (mailgw.sdata.de [193.30.133.65]) by mx1.freebsd.org (Postfix) with ESMTP id 58BBE8FC0C for ; Thu, 16 Feb 2012 14:25:56 +0000 (UTC)
Received: from mailgw.sdata.de (mailgw.sdata.de [127.0.0.1]) by mailgw.sdata.de (iRedMail) with ESMTP id 6F93633C33 for ; Thu, 16 Feb 2012 15:10:53 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=sdata.de; h= mime-version:content-transfer-encoding:to:date:date:message-id :x-mailer:content-type:content-type:from:from:subject:subject; s=dkim; t=1329401447; x=1330265448; bh=OLoZvYO51cFo/4gisS8CoE5J WWSnvwktPWxppSgyDq0=; b=A1ZlFNONzxTN5R95aw/yVzn/rdMvJw6LuyKzA1fF ul5V5vxv7w5Ppi3ohC6NBjK8OJrSQqlGIr/LLM3l8REF/CGbYPzn16YQOSUkEGIE HrCf/0lKhLWfxbDuJVRk6IZcEStXA1vzWri3qmtpggpGRTykxmSkcoIL7SxAfaAS oow=
X-Virus-Scanned: amavisd-new at mailgw.sdata.de
Received: from mailgw.sdata.de ([127.0.0.1]) by mailgw.sdata.de (mailgw.sdata.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id swEd3rGnXzDx for ; Thu, 16 Feb 2012 15:10:47 +0100 (CET)
Received: from [192.168.133.100] (p4FF17669.dip.t-dialin.net
[79.241.118.105]) by mailgw.sdata.de (iRedMail) with ESMTPSA id DD74F33C2C for ; Thu, 16 Feb 2012 15:10:46 +0100 (CET)
From: Christoph Splittgerber
Content-Type: text/plain; charset=us-ascii
X-Mailer: iPad Mail (9A405)
Message-Id: <813D616D-66E4-41BB-9D4C-BB736268B88A@sdata.de>
Date: Thu, 16 Feb 2012 15:10:47 +0100
To: "freebsd-threads@freebsd.org"
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (1.0)
Subject: How to map a thread-id to a thread-address
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Thu, 16 Feb 2012 14:25:59 -0000

Hello,

I hope this is the correct mailing list for this:

I need a per-thread overview of CPU time used.
I found out that "ps -H -otdaddr,time" gives me exactly this. The question now is how to relate the thread address printed by the ps command to my threads. I had the program print the thread IDs and the threads' stack addresses, but none of them correlate with the address printed by ps.

I would be grateful for any pointers.
Thanks in advance, Christoph From owner-freebsd-threads@FreeBSD.ORG Thu Feb 16 17:45:38 2012 Return-Path: Delivered-To: threads@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A45DA106566C for ; Thu, 16 Feb 2012 17:45:38 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EC0878FC0C for ; Thu, 16 Feb 2012 17:45:37 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (beta-e.starpoint.kiev.ua [212.40.38.102]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA23892; Thu, 16 Feb 2012 19:34:38 +0200 (EET) (envelope-from avg@FreeBSD.org) Message-ID: <4F3D3E2D.9090100@FreeBSD.org> Date: Thu, 16 Feb 2012 19:34:37 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:10.0) Gecko/20120206 Thunderbird/10.0 MIME-Version: 1.0 To: Julian Elischer References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> In-Reply-To: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> X-Enigmail-Version: 1.3.5 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: threads@FreeBSD.org, FreeBSD Stable , Jens Axboe Subject: Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Feb 2012 17:45:38 -0000 on 15/02/2012 23:41 Julian Elischer said the following: > The program fio (an IO test in ports) uses pthreads > > the following code (from fio-2.0.3, but its in earlier code too) > has suddenly started misbehaving. 
> > clock_gettime(CLOCK_REALTIME, &t); > t.tv_sec += seconds + 10; > > pthread_mutex_lock(&mutex->lock); > > while (!mutex->value && !ret) { > mutex->waiters++; > ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t); > mutex->waiters--; > } > > if (!ret) { > mutex->value--; > pthread_mutex_unlock(&mutex->lock); > } > > > It turns out that 'ret' sometimes comes back instantly (on my machine) with a > value of 60 (ETIMEDOUT) > despite the fact that we set the timeout 10 seconds into the future. > > Has anyone else seen anything like this? > (and yes the condition variable attribute have been set to use the REALTIME clock). But why? Just a hypothesis that maybe there is some issue with time keeping on that system. How would that code work out for you with MONOTONIC? -- Andriy Gapon From owner-freebsd-threads@FreeBSD.ORG Thu Feb 16 21:05:09 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5EC7E106564A; Thu, 16 Feb 2012 21:05:09 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 2ECC08FC0A; Thu, 16 Feb 2012 21:05:09 +0000 (UTC) Received: from julian-mac.elischer.org (c-67-180-24-15.hsd1.ca.comcast.net [67.180.24.15]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id q1GL554j015543 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 16 Feb 2012 13:05:08 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <4F3D6FDD.9050808@freebsd.org> Date: Thu, 16 Feb 2012 13:06:37 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.26) Gecko/20120129 Thunderbird/3.1.18 MIME-Version: 1.0 To: Andriy Gapon References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> In-Reply-To: 
<4F3D3E2D.9090100@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Alexander Kabaev , threads@freebsd.org, FreeBSD Stable , Jens Axboe Subject: Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Feb 2012 21:05:09 -0000 On 2/16/12 9:34 AM, Andriy Gapon wrote: > on 15/02/2012 23:41 Julian Elischer said the following: >> The program fio (an IO test in ports) uses pthreads >> >> the following code (from fio-2.0.3, but its in earlier code too) >> has suddenly started misbehaving. >> >> clock_gettime(CLOCK_REALTIME,&t); >> t.tv_sec += seconds + 10; >> >> pthread_mutex_lock(&mutex->lock); >> >> while (!mutex->value&& !ret) { >> mutex->waiters++; >> ret = pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t); >> mutex->waiters--; >> } >> >> if (!ret) { >> mutex->value--; >> pthread_mutex_unlock(&mutex->lock); >> } >> >> >> It turns out that 'ret' sometimes comes back instantly (on my machine) with a >> value of 60 (ETIMEDOUT) >> despite the fact that we set the timeout 10 seconds into the future. >> >> Has anyone else seen anything like this? >> (and yes the condition variable attribute have been set to use the REALTIME clock). > But why? > > Just a hypothesis that maybe there is some issue with time keeping on that system. > How would that code work out for you with MONOTONIC? Jens Axboe, (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, and they both had the same problem.. i.e. random early returns with ETIMEDOUT. I think we will try move out machine forward to a newer -stable to see if it resolves. 
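For reference, a minimal sketch of the pattern being discussed: the condition variable is bound to a clock with pthread_condattr_setclock(), and the absolute deadline is computed from that same clock. The names timed_flag, timed_flag_init and timed_flag_wait are hypothetical and are not taken from fio; unlike the snippet as quoted, this version releases the mutex on every path. It is only an illustration under those assumptions, not a proposed fix.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() and pthread_condattr_setclock() */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical wrapper, loosely modelled on the quoted code; not fio's struct. */
struct timed_flag {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    clockid_t       clock;  /* clock the condition variable is bound to */
    int             value;
};

/* Bind the condition variable to `clock`; returns 0 or an errno value. */
static int timed_flag_init(struct timed_flag *f, clockid_t clock)
{
    pthread_condattr_t attr;
    int ret;

    assert(f != NULL);
    if ((ret = pthread_condattr_init(&attr)) != 0)
        return ret;
    ret = pthread_condattr_setclock(&attr, clock);
    if (ret == 0)
        ret = pthread_cond_init(&f->cond, &attr);
    pthread_condattr_destroy(&attr);
    if (ret != 0)
        return ret;
    f->clock = clock;
    f->value = 0;
    return pthread_mutex_init(&f->lock, NULL);
}

/* Wait up to `seconds` for value to become non-zero; returns 0 or ETIMEDOUT. */
static int timed_flag_wait(struct timed_flag *f, int seconds)
{
    struct timespec deadline;
    int ret = 0;

    assert(f != NULL);
    clock_gettime(f->clock, &deadline);  /* same clock as the condvar */
    deadline.tv_sec += seconds;

    pthread_mutex_lock(&f->lock);
    while (f->value == 0 && ret == 0)
        ret = pthread_cond_timedwait(&f->cond, &f->lock, &deadline);
    if (ret == 0)
        f->value--;
    pthread_mutex_unlock(&f->lock);      /* released on success and on timeout */
    return ret;
}
```

A caller would run timed_flag_init(&f, CLOCK_MONOTONIC) once and then compare the result of timed_flag_wait(&f, 10) against ETIMEDOUT.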
From owner-freebsd-threads@FreeBSD.ORG Thu Feb 16 21:33:06 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B79A1065670 for ; Thu, 16 Feb 2012 21:33:06 +0000 (UTC) (envelope-from JAxboe@fusionio.com) Received: from mx1.fusionio.com (mx1.fusionio.com [66.114.96.30]) by mx1.freebsd.org (Postfix) with ESMTP id 4F7748FC15 for ; Thu, 16 Feb 2012 21:33:06 +0000 (UTC) X-ASG-Debug-ID: 1329426816-03d6a50ee0138200001-TYnE9X Received: from mail1.int.fusionio.com (mail1.int.fusionio.com [10.101.1.21]) by mx1.fusionio.com with ESMTP id 6Wqqh7uHIwmH680t; Thu, 16 Feb 2012 14:13:36 -0700 (MST) X-Barracuda-Envelope-From: JAxboe@fusionio.com Received: from [192.168.0.212] (188.20.58.220) by mail.fusionio.com (10.101.1.19) with Microsoft SMTP Server (TLS) id 8.3.83.0; Thu, 16 Feb 2012 14:13:35 -0700 Message-ID: <4F3D717C.9040309@fusionio.com> Date: Thu, 16 Feb 2012 22:13:32 +0100 From: Jens Axboe MIME-Version: 1.0 To: Julian Elischer References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> X-ASG-Orig-Subj: Re: pthread_cond_timedwait() broken in 9-stable? 
(from JAN 10) In-Reply-To: <4F3D6FDD.9050808@freebsd.org> X-Enigmail-Version: 1.3.5 Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Barracuda-Connect: mail1.int.fusionio.com[10.101.1.21] X-Barracuda-Start-Time: 1329426816 X-Barracuda-URL: http://10.101.1.180:8000/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at fusionio.com X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using per-user scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=9.0 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.88735 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- Cc: Kabaev , "threads@freebsd.org" , FreeBSD Stable , Andriy Gapon , Alexander Subject: Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Feb 2012 21:33:06 -0000 On 2012-02-16 22:06, Julian Elischer wrote: > On 2/16/12 9:34 AM, Andriy Gapon wrote: >> on 15/02/2012 23:41 Julian Elischer said the following: >>> The program fio (an IO test in ports) uses pthreads >>> >>> the following code (from fio-2.0.3, but its in earlier code too) >>> has suddenly started misbehaving. 
>>> >>> clock_gettime(CLOCK_REALTIME,&t); >>> t.tv_sec += seconds + 10; >>> >>> pthread_mutex_lock(&mutex->lock); >>> >>> while (!mutex->value&& !ret) { >>> mutex->waiters++; >>> ret = pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t); >>> mutex->waiters--; >>> } >>> >>> if (!ret) { >>> mutex->value--; >>> pthread_mutex_unlock(&mutex->lock); >>> } >>> >>> >>> It turns out that 'ret' sometimes comes back instantly (on my machine) with a >>> value of 60 (ETIMEDOUT) >>> despite the fact that we set the timeout 10 seconds into the future. >>> >>> Has anyone else seen anything like this? >>> (and yes the condition variable attribute have been set to use the REALTIME clock). >> But why? >> >> Just a hypothesis that maybe there is some issue with time keeping on that system. >> How would that code work out for you with MONOTONIC? > > Jens Axboe, (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, and > they both had the same problem.. > i.e. random early returns with ETIMEDOUT. Yep indeed, using either MONOTONIC or REALTIME (and having set both with pthread_condattr_setclock()), no change in behaviour. 
-- Jens Axboe From owner-freebsd-threads@FreeBSD.ORG Thu Feb 16 21:58:09 2012 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 112F010657C3 for ; Thu, 16 Feb 2012 21:58:09 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) by mx1.freebsd.org (Postfix) with ESMTP id 7B0688FC13 for ; Thu, 16 Feb 2012 21:58:08 +0000 (UTC) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id 39C3A35B3F0; Thu, 16 Feb 2012 22:58:07 +0100 (CET) Received: by snail.stack.nl (Postfix, from userid 1677) id 1FF7A28468; Thu, 16 Feb 2012 22:58:07 +0100 (CET) Date: Thu, 16 Feb 2012 22:58:07 +0100 From: Jilles Tjoelker To: Christoph Splittgerber Message-ID: <20120216215806.GA65161@stack.nl> References: <813D616D-66E4-41BB-9D4C-BB736268B88A@sdata.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <813D616D-66E4-41BB-9D4C-BB736268B88A@sdata.de> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: "freebsd-threads@freebsd.org" Subject: Re: How to map a thread-id to a thread-address X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Feb 2012 21:58:09 -0000 On Thu, Feb 16, 2012 at 03:10:47PM +0100, Christoph Splittgerber wrote: > Hallo, I hope his is the correct maling list for his: > I need a per thread overview of CPU time used. > I found out that a "ps -H -otdaddr,time" gives me exactly this. The > question now is, how to relate the thread-address printed by the ps > command to my threads. I did let the program print the thread-ids, and > the threads stack-address but non of them correlate to the address > printed by ps. 
> I would be grateful for any pointers. The keyword for the thread ID is lwp, apparently for compatibility with other OSes. Alternatively, you can modify your code to set thread names using pthread_set_name_np() and use the tdnam keyword. -- Jilles Tjoelker From owner-freebsd-threads@FreeBSD.ORG Thu Feb 16 22:55:47 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 892DE1065677; Thu, 16 Feb 2012 22:55:47 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 3E9498FC14; Thu, 16 Feb 2012 22:55:47 +0000 (UTC) Received: from julian-mac.elischer.org (c-67-180-24-15.hsd1.ca.comcast.net [67.180.24.15]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id q1GMtj5P015873 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 16 Feb 2012 14:55:46 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <4F3D89CD.9050309@freebsd.org> Date: Thu, 16 Feb 2012 14:57:17 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.26) Gecko/20120129 Thunderbird/3.1.18 MIME-Version: 1.0 To: Andriy Gapon References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> In-Reply-To: <4F3D6FDD.9050808@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Alexander Kabaev , threads@freebsd.org, FreeBSD Stable , Jens Axboe Subject: Re: pthread_cond_timedwait() broken in 9-stable? 
(from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Feb 2012 22:55:47 -0000 On 2/16/12 1:06 PM, Julian Elischer wrote: > On 2/16/12 9:34 AM, Andriy Gapon wrote: >> on 15/02/2012 23:41 Julian Elischer said the following: >>> The program fio (an IO test in ports) uses pthreads >>> >>> the following code (from fio-2.0.3, but its in earlier code too) >>> has suddenly started misbehaving. >>> >>> clock_gettime(CLOCK_REALTIME,&t); >>> t.tv_sec += seconds + 10; >>> >>> pthread_mutex_lock(&mutex->lock); >>> >>> while (!mutex->value&& !ret) { >>> mutex->waiters++; >>> ret = >>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t); >>> mutex->waiters--; >>> } >>> >>> if (!ret) { >>> mutex->value--; >>> pthread_mutex_unlock(&mutex->lock); >>> } >>> >>> >>> It turns out that 'ret' sometimes comes back instantly (on my >>> machine) with a >>> value of 60 (ETIMEDOUT) >>> despite the fact that we set the timeout 10 seconds into the future. >>> >>> Has anyone else seen anything like this? >>> (and yes the condition variable attribute have been set to use the >>> REALTIME clock). >> But why? >> >> Just a hypothesis that maybe there is some issue with time keeping >> on that system. >> How would that code work out for you with MONOTONIC? > > Jens Axboe, (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, > and they both had the same problem.. > i.e. random early returns with ETIMEDOUT. > > I think we will try move out machine forward to a newer -stable to > see if it resolves. Kan upgraded the machine today to today's 9.x branch tip and the problem still occurs. 8.x does not have this problem. I have not got a 9-RELEASE machine to test on.. so I can not tell if this came in with the burst of stuff that came in after the 9.x branch was unfrozen after the release of 9.0. 
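One way to check for the symptom independently of fio might be a small probe like the sketch below. It waits on a condition variable that nobody signals and measures, on the same clock, how long the wait took before ETIMEDOUT came back. The function names (ts_diff_ns, timed_wait_elapsed_ns) are made up for this illustration, and the probe only assumes standard POSIX calls; an elapsed time noticeably shorter than the requested timeout would correspond to the early return described above.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() and pthread_condattr_setclock() */
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Nanoseconds elapsed from *a to *b. */
static long long ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
         + (b->tv_nsec - a->tv_nsec);
}

/*
 * Wait timeout_ms on a condition variable that is never signalled.
 * Returns the elapsed time in nanoseconds (measured on `clock`) once the
 * wait reports ETIMEDOUT, or -1 on any other error.
 */
static long long timed_wait_elapsed_ns(clockid_t clock, long timeout_ms)
{
    pthread_mutex_t lock;
    pthread_cond_t cond;
    pthread_condattr_t attr;
    struct timespec start, deadline, end;
    int ret;

    if (pthread_condattr_init(&attr) != 0)
        return -1;
    ret = pthread_condattr_setclock(&attr, clock);
    if (ret == 0)
        ret = pthread_cond_init(&cond, &attr);
    pthread_condattr_destroy(&attr);
    if (ret != 0)
        return -1;
    if (pthread_mutex_init(&lock, NULL) != 0) {
        pthread_cond_destroy(&cond);
        return -1;
    }

    /* Absolute deadline = now + timeout, normalised so tv_nsec < 1e9. */
    clock_gettime(clock, &start);
    deadline = start;
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&lock);
    do {
        ret = pthread_cond_timedwait(&cond, &lock, &deadline);
    } while (ret == 0);  /* a 0 return here can only be a spurious wakeup */
    pthread_mutex_unlock(&lock);
    clock_gettime(clock, &end);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&lock);
    return ret == ETIMEDOUT ? ts_diff_ns(&start, &end) : -1;
}
```

Calling timed_wait_elapsed_ns(CLOCK_REALTIME, 10000) and timed_wait_elapsed_ns(CLOCK_MONOTONIC, 10000) in a loop and printing any result well below ten seconds would show whether both clocks are affected.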
> > > _______________________________________________ > freebsd-threads@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-threads > To unsubscribe, send any mail to > "freebsd-threads-unsubscribe@freebsd.org" > From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 00:41:06 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 655FA106564A; Fri, 17 Feb 2012 00:41:06 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 1A9468FC14; Fri, 17 Feb 2012 00:41:05 +0000 (UTC) Received: from julian-mac.elischer.org (c-67-180-24-15.hsd1.ca.comcast.net [67.180.24.15]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id q1H0f2aF016187 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 16 Feb 2012 16:41:04 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <4F3DA27A.3090903@freebsd.org> Date: Thu, 16 Feb 2012 16:42:34 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.26) Gecko/20120129 Thunderbird/3.1.18 MIME-Version: 1.0 To: Andriy Gapon References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> In-Reply-To: <4F3D89CD.9050309@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Alexander Kabaev , threads@freebsd.org, FreeBSD Stable , David Xu Subject: Re: pthread_cond_timedwait() broken in 9-stable? 
(from JAN 10)
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Fri, 17 Feb 2012 00:41:06 -0000

Adding David Xu for his thoughts, since he rewrote the code in question in revision 213098.

On 2/16/12 2:57 PM, Julian Elischer wrote:
> On 2/16/12 1:06 PM, Julian Elischer wrote:
>> On 2/16/12 9:34 AM, Andriy Gapon wrote:
>>> on 15/02/2012 23:41 Julian Elischer said the following:
>>>> The program fio (an IO test in ports) uses pthreads
>>>>
>>>> the following code (from fio-2.0.3, but its in earlier code too)
>>>> has suddenly started misbehaving.
>>>>
>>>> clock_gettime(CLOCK_REALTIME,&t);
>>>> t.tv_sec += seconds + 10;
>>>>
>>>> pthread_mutex_lock(&mutex->lock);
>>>>
>>>> while (!mutex->value&& !ret) {
>>>> mutex->waiters++;
>>>> ret =
>>>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t);
>>>> mutex->waiters--;
>>>> }
>>>>
>>>> if (!ret) {
>>>> mutex->value--;
>>>> pthread_mutex_unlock(&mutex->lock);
>>>> }
>>>>
>>>>
>>>> It turns out that 'ret' sometimes comes back instantly (on my
>>>> machine) with a
>>>> value of 60 (ETIMEDOUT)
>>>> despite the fact that we set the timeout 10 seconds into the future.
>>>>
>>>> Has anyone else seen anything like this?
>>>> (and yes the condition variable attribute have been set to use
>>>> the REALTIME clock).
>>> But why?
>>>
>>> Just a hypothesis that maybe there is some issue with time keeping
>>> on that system.
>>> How would that code work out for you with MONOTONIC?
>>
>> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC,
>> and they both had the same problem..
>> i.e. random early returns with ETIMEDOUT.
>>
>> I think we will try move out machine forward to a newer -stable to
>> see if it resolves.
> Kan upgraded the machine today to today's 9.x branch tip and the
> problem still occurs.
> 8.x does not have this problem.
> > I have not got a 9-RELEASE machine to test on.. so I can not tell if > this came in with the burst of stuff > that came in after the 9.x branch was unfrozen after the release of > 9.0. > > From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 01:54:28 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CDE201065672; Fri, 17 Feb 2012 01:54:28 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 9B26F8FC08; Fri, 17 Feb 2012 01:54:28 +0000 (UTC) Received: from julian-mac.elischer.org (c-67-180-24-15.hsd1.ca.comcast.net [67.180.24.15]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id q1H1sQtv016431 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 16 Feb 2012 17:54:27 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <4F3DB3AE.5000109@freebsd.org> Date: Thu, 16 Feb 2012 17:55:58 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.26) Gecko/20120129 Thunderbird/3.1.18 MIME-Version: 1.0 To: Andriy Gapon References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> In-Reply-To: <4F3DA27A.3090903@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Alexander Kabaev , threads@freebsd.org, FreeBSD Stable , David Xu Subject: Re: pthread_cond_timedwait() broken in 9-stable? 
[possible answer] X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 01:54:28 -0000

kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950) ACPI-fast(900) dummy(-1000000)
kern.timecounter.hardware: ACPI-fast
kern.timecounter.stepwarnings: 0

Switching the machine from TSC-low to ACPI-fast fixes the problem.

In 8.x it used to default to ACPI,
but I used to switch it to "TSC" to get better performance.

I wonder why TSC-low is now bad to use..
Maybe the TSCs are not as well synchronised as they were in 8.x?
Maybe the pthreads code didn't get the memo about changing timers?

From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 01:56:48 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2835D106567C; Fri, 17 Feb 2012 01:56:48 +0000 (UTC) (envelope-from listlog2011@gmail.com) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id EDC238FC12; Fri, 17 Feb 2012 01:56:47 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1H1ujN2056748; Fri, 17 Feb 2012 01:56:46 GMT (envelope-from listlog2011@gmail.com) Message-ID: <4F3DB3DB.2060603@gmail.com> Date: Fri, 17 Feb 2012 09:56:43 +0800 From: David Xu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:10.0.1) Gecko/20120208 Thunderbird/10.0.1 MIME-Version: 1.0 To: Julian Elischer References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> In-Reply-To: <4F3DA27A.3090903@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit Cc: Alexander Kabaev , threads@freebsd.org, FreeBSD Stable , David Xu , Andriy Gapon Subject: Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: davidxu@freebsd.org List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 01:56:48 -0000 On 2012/2/17 8:42, Julian Elischer wrote: > Adding David Xu for his thoughts since he reqrote the code in quesiton > in revision 213098 > > On 2/16/12 2:57 PM, Julian Elischer wrote: >> On 2/16/12 1:06 PM, Julian Elischer wrote: >>> On 2/16/12 9:34 AM, Andriy Gapon wrote: >>>> on 15/02/2012 23:41 Julian Elischer said the following: >>>>> The program fio (an IO test in ports) uses pthreads >>>>> >>>>> the following code (from fio-2.0.3, but its in earlier code too) >>>>> has suddenly started misbehaving. >>>>> >>>>> clock_gettime(CLOCK_REALTIME,&t); >>>>> t.tv_sec += seconds + 10; >>>>> >>>>> pthread_mutex_lock(&mutex->lock); >>>>> >>>>> while (!mutex->value&& !ret) { >>>>> mutex->waiters++; >>>>> ret = >>>>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t); >>>>> mutex->waiters--; >>>>> } >>>>> >>>>> if (!ret) { >>>>> mutex->value--; >>>>> pthread_mutex_unlock(&mutex->lock); >>>>> } >>>>> >>>>> >>>>> It turns out that 'ret' sometimes comes back instantly (on my >>>>> machine) with a >>>>> value of 60 (ETIMEDOUT) >>>>> despite the fact that we set the timeout 10 seconds into the future. >>>>> >>>>> Has anyone else seen anything like this? >>>>> (and yes the condition variable attribute have been set to use the >>>>> REALTIME clock). >>>> But why? >>>> >>>> Just a hypothesis that maybe there is some issue with time keeping >>>> on that system. >>>> How would that code work out for you with MONOTONIC? 
>>> >>> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, >>> and they both had the same problem.. >>> i.e. random early returns with ETIMEDOUT. >>> >>> I think we will try move out machine forward to a newer -stable to >>> see if it resolves. >> Kan upgraded the machine today to today's 9.x branch tip and the >> problem still occurs. >> 8.x does not have this problem. >> >> I have not got a 9-RELEASE machine to test on.. so I can not tell if >> this came in with the burst of stuff >> that came in after the 9.x branch was unfrozen after the release of 9.0. >> >> > I am trying to reproduce the problem, do you have complete sample code to test ? Regards, David Xu From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 02:00:05 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2792B106566B; Fri, 17 Feb 2012 02:00:05 +0000 (UTC) (envelope-from listlog2011@gmail.com) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 04C7E8FC15; Fri, 17 Feb 2012 02:00:05 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1H2020l056811; Fri, 17 Feb 2012 02:00:03 GMT (envelope-from listlog2011@gmail.com) Message-ID: <4F3DB4A0.2080904@gmail.com> Date: Fri, 17 Feb 2012 10:00:00 +0800 From: David Xu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:10.0.1) Gecko/20120208 Thunderbird/10.0.1 MIME-Version: 1.0 To: Julian Elischer References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> <4F3DB3AE.5000109@freebsd.org> In-Reply-To: <4F3DB3AE.5000109@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Alexander Kabaev , 
threads@freebsd.org, FreeBSD Stable , David Xu , Andriy Gapon Subject: Re: pthread_cond_timedwait() broken in 9-stable? [possible answer] X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: davidxu@freebsd.org List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 02:00:05 -0000

On 2012/2/17 9:55, Julian Elischer wrote:
>
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950)
> ACPI-fast(900) dummy(-1000000)
> kern.timecounter.hardware: ACPI-fast
> kern.timecounter.stepwarnings: 0
>
> switching the machine from TSC-low to ACPI-fast fixes the problem.
>
> in 8.x it used to default to ACPI
> but I used to switch it to "TSC" to get better performance.
>
> I wonder why TSC-low is now bad to use..
> maybe the TSCs are not as well synchronised as they were in 8.x?
> maybe the pthreads code didn't get the memo about changing timers?
>
The pthread code does not know the timer setting, same as other code in the kernel.
;-) From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 02:17:37 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 38850106564A; Fri, 17 Feb 2012 02:17:37 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 00FF88FC0C; Fri, 17 Feb 2012 02:17:36 +0000 (UTC) Received: from julian-mac.elischer.org (c-67-180-24-15.hsd1.ca.comcast.net [67.180.24.15]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id q1H2HZGJ016535 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 16 Feb 2012 18:17:36 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <4F3DB91A.2090806@freebsd.org> Date: Thu, 16 Feb 2012 18:19:06 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.26) Gecko/20120129 Thunderbird/3.1.18 MIME-Version: 1.0 To: davidxu@freebsd.org References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> <4F3DB3DB.2060603@gmail.com> In-Reply-To: <4F3DB3DB.2060603@gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Alexander Kabaev , threads@freebsd.org, David Xu , FreeBSD Stable , Andriy Gapon Subject: Re: pthread_cond_timedwait() broken in 9-stable? 
(from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 02:17:37 -0000 On 2/16/12 5:56 PM, David Xu wrote: > On 2012/2/17 8:42, Julian Elischer wrote: >> Adding David Xu for his thoughts since he reqrote the code in >> quesiton in revision 213098 >> >> On 2/16/12 2:57 PM, Julian Elischer wrote: >>> On 2/16/12 1:06 PM, Julian Elischer wrote: >>>> On 2/16/12 9:34 AM, Andriy Gapon wrote: >>>>> on 15/02/2012 23:41 Julian Elischer said the following: >>>>>> The program fio (an IO test in ports) uses pthreads >>>>>> >>>>>> the following code (from fio-2.0.3, but its in earlier code too) >>>>>> has suddenly started misbehaving. >>>>>> >>>>>> clock_gettime(CLOCK_REALTIME,&t); >>>>>> t.tv_sec += seconds + 10; >>>>>> >>>>>> pthread_mutex_lock(&mutex->lock); >>>>>> >>>>>> while (!mutex->value&& !ret) { >>>>>> mutex->waiters++; >>>>>> ret = >>>>>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t); >>>>>> mutex->waiters--; >>>>>> } >>>>>> >>>>>> if (!ret) { >>>>>> mutex->value--; >>>>>> pthread_mutex_unlock(&mutex->lock); >>>>>> } >>>>>> >>>>>> >>>>>> It turns out that 'ret' sometimes comes back instantly (on my >>>>>> machine) with a >>>>>> value of 60 (ETIMEDOUT) >>>>>> despite the fact that we set the timeout 10 seconds into the >>>>>> future. >>>>>> >>>>>> Has anyone else seen anything like this? >>>>>> (and yes the condition variable attribute have been set to use >>>>>> the REALTIME clock). >>>>> But why? >>>>> >>>>> Just a hypothesis that maybe there is some issue with time >>>>> keeping on that system. >>>>> How would that code work out for you with MONOTONIC? >>>> >>>> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, >>>> and they both had the same problem.. >>>> i.e. random early returns with ETIMEDOUT. 
>>>>
>>>> I think we will try to move our machine forward to a newer -stable
>>>> to see if it resolves.
>>> Kan upgraded the machine today to today's 9.x branch tip and the
>>> problem still occurs.
>>> 8.x does not have this problem.
>>>
>>> I have not got a 9-RELEASE machine to test on.. so I can not tell
>>> if this came in with the burst of stuff
>>> that came in after the 9.x branch was unfrozen after the release
>>> of 9.0.
>>>
>>>
>>
> I am trying to reproduce the problem, do you have complete sample
> code to test ?

I'm still looking for the exact set,
but on my machine (4 cpus) the program from ports sysutils/fio
exhibits the problem when used with
kern.timecounter.hardware=TSC-low and with the following config file:

pu05 # cat config.fio

[global]
#clocksource=cpu
direct=1
rw=randread
bs=4096
fill_device=1
numjobs=16
iodepth=16
#ioengine=posixaio
#ioengine=psync
ioengine=psync
group_reporting
norandommap
time_based
runtime=60000
randrepeat=0

[file1]
filename=/dev/ada0

pu05 #
pu05 # fio config.fio
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
...
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
fio 2.0.3
Starting 15 threads and 1 process
fio: job startup hung? exiting.
fio: 5 jobs failed to start
Segmentation fault (core dumped)
pu05#

The reason 5 jobs failed to start is that the parent timed out on
them immediately.
It didn't time out on 10 of them, apparently.

If I set the timer to ACPI-fast it works as expected..
> > Regards, > David Xu > > From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 02:42:29 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C09241065673; Fri, 17 Feb 2012 02:42:29 +0000 (UTC) (envelope-from listlog2011@gmail.com) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A97D28FC12; Fri, 17 Feb 2012 02:42:29 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1H2gQDm002878; Fri, 17 Feb 2012 02:42:27 GMT (envelope-from listlog2011@gmail.com) Message-ID: <4F3DBE90.5030305@gmail.com> Date: Fri, 17 Feb 2012 10:42:24 +0800 From: David Xu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:10.0.1) Gecko/20120208 Thunderbird/10.0.1 MIME-Version: 1.0 To: Julian Elischer References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> <4F3DB3DB.2060603@gmail.com> <4F3DB91A.2090806@freebsd.org> In-Reply-To: <4F3DB91A.2090806@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Alexander Kabaev , threads@freebsd.org, FreeBSD Stable , davidxu@freebsd.org, Andriy Gapon Subject: Re: pthread_cond_timedwait() broken in 9-stable? 
(from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: davidxu@freebsd.org List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 02:42:29 -0000 On 2012/2/17 10:19, Julian Elischer wrote: > On 2/16/12 5:56 PM, David Xu wrote: >> On 2012/2/17 8:42, Julian Elischer wrote: >>> Adding David Xu for his thoughts since he reqrote the code in >>> quesiton in revision 213098 >>> >>> On 2/16/12 2:57 PM, Julian Elischer wrote: >>>> On 2/16/12 1:06 PM, Julian Elischer wrote: >>>>> On 2/16/12 9:34 AM, Andriy Gapon wrote: >>>>>> on 15/02/2012 23:41 Julian Elischer said the following: >>>>>>> The program fio (an IO test in ports) uses pthreads >>>>>>> >>>>>>> the following code (from fio-2.0.3, but its in earlier code too) >>>>>>> has suddenly started misbehaving. >>>>>>> >>>>>>> clock_gettime(CLOCK_REALTIME,&t); >>>>>>> t.tv_sec += seconds + 10; >>>>>>> >>>>>>> pthread_mutex_lock(&mutex->lock); >>>>>>> >>>>>>> while (!mutex->value&& !ret) { >>>>>>> mutex->waiters++; >>>>>>> ret = >>>>>>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t); >>>>>>> mutex->waiters--; >>>>>>> } >>>>>>> >>>>>>> if (!ret) { >>>>>>> mutex->value--; >>>>>>> pthread_mutex_unlock(&mutex->lock); >>>>>>> } >>>>>>> >>>>>>> >>>>>>> It turns out that 'ret' sometimes comes back instantly (on my >>>>>>> machine) with a >>>>>>> value of 60 (ETIMEDOUT) >>>>>>> despite the fact that we set the timeout 10 seconds into the >>>>>>> future. >>>>>>> >>>>>>> Has anyone else seen anything like this? >>>>>>> (and yes the condition variable attribute have been set to use >>>>>>> the REALTIME clock). >>>>>> But why? >>>>>> >>>>>> Just a hypothesis that maybe there is some issue with time >>>>>> keeping on that system. >>>>>> How would that code work out for you with MONOTONIC? 
>>>>> >>>>> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, >>>>> and they both had the same problem.. >>>>> i.e. random early returns with ETIMEDOUT. >>>>> >>>>> I think we will try move out machine forward to a newer -stable to >>>>> see if it resolves. >>>> Kan upgraded the machine today to today's 9.x branch tip and the >>>> problem still occurs. >>>> 8.x does not have this problem. >>>> >>>> I have not got a 9-RELEASE machine to test on.. so I can not tell >>>> if this came in with the burst of stuff >>>> that came in after the 9.x branch was unfrozen after the release of >>>> 9.0. >>>> >>>> >>> >> I am trying to reproduce the problem, do you have complete sample >> code to test ? > > I'm still looking the exact set > but on my machine (4 cpus) the program from ports sysutils/fio > exhibits the problem when used with > kern.timecounter.hardware=TSC-low and with the following config file: > > pu05 # cat config.fio > > [global] > #clocksource=cpu > direct=1 > rw=randread > bs=4096 > fill_device=1 > numjobs=16 > iodepth=16 > #ioengine=posixaio > #ioengine=psync > ioengine=psync > group_reporting > norandommap > time_based > runtime=60000 > randrepeat=0 > > [file1] > filename=/dev/ada0 > > pu05 # > pu05 # fio config.fio > fio: this platform does not support process shared mutexes, forcing > use of threads. Use the 'thread' option to get rid of this warning. > file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16 > ... > file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16 > fio 2.0.3 > Starting 15 threads and 1 process > fio: job startup hung? exiting. > fio: 5 jobs failed to start > Segmentation fault (core dumped) > pu05# > > > The reason 5 jobs failed to start is because the parent timed out on > them immediately. > It didn't time out on 10 of them apparently. > > > if I set the timer to ACPI-fast it works as expected.. maybe following code can check to see if TSC-LOW works by let the thread run on each cpu. 
gettimeofday(&prev, NULL);
int cpu = 0;
for (;;) {
    cpuset_t set;
    cpu = (cpu + 1) % 4;    /* rotate through CPUs 0..3 */
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    gettimeofday(&cur, NULL);
    if (timercmp(&prev, &cur, >=)) {
        abort();
    }
}

From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 02:44:59 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 713B11065673; Fri, 17 Feb 2012 02:44:59 +0000 (UTC) (envelope-from listlog2011@gmail.com) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 5A4488FC15; Fri, 17 Feb 2012 02:44:59 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1H2iu4L002945; Fri, 17 Feb 2012 02:44:56 GMT (envelope-from listlog2011@gmail.com) Message-ID: <4F3DBF26.2000306@gmail.com> Date: Fri, 17 Feb 2012 10:44:54 +0800 From: David Xu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:10.0.1) Gecko/20120208 Thunderbird/10.0.1 MIME-Version: 1.0 To: davidxu@freebsd.org References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> <4F3DB3DB.2060603@gmail.com> <4F3DB91A.2090806@freebsd.org> <4F3DBE90.5030305@gmail.com> In-Reply-To: <4F3DBE90.5030305@gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Alexander Kabaev , threads@freebsd.org, FreeBSD Stable , Andriy Gapon Subject: Re: pthread_cond_timedwait() broken in 9-stable?
(from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: davidxu@freebsd.org List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 02:44:59 -0000

On 2012/2/17 10:42, David Xu wrote:
> maybe following code can check to see if TSC-LOW works by letting the
> thread run
> on each cpu.
>
>
refresh:

gettimeofday(&prev, NULL);
int cpu = 0;
for (;;) {
    cpuset_t set;
    cpu = (cpu + 1) % 4;    /* rotate through CPUs 0..3 */
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    gettimeofday(&cur, NULL);
    if (timercmp(&prev, &cur, >)) {
        abort();    /* time went backwards between two readings */
    }
    prev = cur;
}

From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 07:40:10 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 273DC106564A; Fri, 17 Feb 2012 07:40:02 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 3A3E48FC08; Fri, 17 Feb 2012 07:40:01 +0000 (UTC) Received: from julian-mac.elischer.org (c-67-180-24-15.hsd1.ca.comcast.net [67.180.24.15]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id q1H7e0uw017537 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 16 Feb 2012 23:40:00 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <4F3E04AB.2000508@freebsd.org> Date: Thu, 16 Feb 2012 23:41:31 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.26) Gecko/20120129 Thunderbird/3.1.18 MIME-Version: 1.0 To: davidxu@freebsd.org References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> <4F3DB3DB.2060603@gmail.com>
<4F3DB91A.2090806@freebsd.org> <4F3DBE90.5030305@gmail.com> In-Reply-To: <4F3DBE90.5030305@gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: David Xu , Alexander Kabaev , Andriy Gapon , threads@freebsd.org, FreeBSD Stable , Jung-uk Kim Subject: Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 07:40:10 -0000 adding jkim as he seems to be the last person working with TSC. On 2/16/12 6:42 PM, David Xu wrote: > On 2012/2/17 10:19, Julian Elischer wrote: >> On 2/16/12 5:56 PM, David Xu wrote: >>> On 2012/2/17 8:42, Julian Elischer wrote: >>>> Adding David Xu for his thoughts since he reqrote the code in >>>> quesiton in revision 213098 >>>> >>>> On 2/16/12 2:57 PM, Julian Elischer wrote: >>>>> On 2/16/12 1:06 PM, Julian Elischer wrote: >>>>>> On 2/16/12 9:34 AM, Andriy Gapon wrote: >>>>>>> on 15/02/2012 23:41 Julian Elischer said the following: >>>>>>>> The program fio (an IO test in ports) uses pthreads >>>>>>>> >>>>>>>> the following code (from fio-2.0.3, but its in earlier code too) >>>>>>>> has suddenly started misbehaving. 
>>>>>>>> >>>>>>>> clock_gettime(CLOCK_REALTIME,&t); >>>>>>>> t.tv_sec += seconds + 10; >>>>>>>> >>>>>>>> pthread_mutex_lock(&mutex->lock); >>>>>>>> >>>>>>>> while (!mutex->value&& !ret) { >>>>>>>> mutex->waiters++; >>>>>>>> ret = >>>>>>>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t); >>>>>>>> mutex->waiters--; >>>>>>>> } >>>>>>>> >>>>>>>> if (!ret) { >>>>>>>> mutex->value--; >>>>>>>> pthread_mutex_unlock(&mutex->lock); >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> It turns out that 'ret' sometimes comes back instantly (on my >>>>>>>> machine) with a >>>>>>>> value of 60 (ETIMEDOUT) >>>>>>>> despite the fact that we set the timeout 10 seconds into the >>>>>>>> future. >>>>>>>> >>>>>>>> Has anyone else seen anything like this? >>>>>>>> (and yes the condition variable attribute have been set to >>>>>>>> use the REALTIME clock). >>>>>>> But why? >>>>>>> >>>>>>> Just a hypothesis that maybe there is some issue with time >>>>>>> keeping on that system. >>>>>>> How would that code work out for you with MONOTONIC? >>>>>> >>>>>> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and >>>>>> CLOCK_MONOTONIC, and they both had the same problem.. >>>>>> i.e. random early returns with ETIMEDOUT. >>>>>> >>>>>> I think we will try move out machine forward to a newer -stable >>>>>> to see if it resolves. >>>>> Kan upgraded the machine today to today's 9.x branch tip and the >>>>> problem still occurs. >>>>> 8.x does not have this problem. >>>>> >>>>> I have not got a 9-RELEASE machine to test on.. so I can not >>>>> tell if this came in with the burst of stuff >>>>> that came in after the 9.x branch was unfrozen after the release >>>>> of 9.0. >>>>> >>>>> >>>> >>> I am trying to reproduce the problem, do you have complete sample >>> code to test ? 
>> >> I'm still looking the exact set >> but on my machine (4 cpus) the program from ports sysutils/fio >> exhibits the problem when used with >> kern.timecounter.hardware=TSC-low and with the following config file: >> >> pu05 # cat config.fio >> >> [global] >> #clocksource=cpu >> direct=1 >> rw=randread >> bs=4096 >> fill_device=1 >> numjobs=16 >> iodepth=16 >> #ioengine=posixaio >> #ioengine=psync >> ioengine=psync >> group_reporting >> norandommap >> time_based >> runtime=60000 >> randrepeat=0 >> >> [file1] >> filename=/dev/ada0 >> >> pu05 # >> pu05 # fio config.fio >> fio: this platform does not support process shared mutexes, forcing >> use of threads. Use the 'thread' option to get rid of this warning. >> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16 >> ... >> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16 >> fio 2.0.3 >> Starting 15 threads and 1 process >> fio: job startup hung? exiting. >> fio: 5 jobs failed to start >> Segmentation fault (core dumped) >> pu05# >> >> >> The reason 5 jobs failed to start is because the parent timed out >> on them immediately. >> It didn't time out on 10 of them apparently. >> >> >> if I set the timer to ACPI-fast it works as expected.. > maybe following code can check to see if TSC-LOW works by let the > thread run > on each cpu. 
> > gettimeofday(&prev, NULL); > int cpu = 0; > for (;;) { > cpuset_t set; > cpu = ++cpu % 4; > CPU_ZERO(&set); > CPU_SET(cpu, &set); > pthread_setaffinity_np(pthread_self(), sizeof(set), &set); > gettimeofday(&cur, NULL); > if ( timercmp(&prev, &cur, >=)) { > abort(); > } > } > > From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 08:05:11 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7C413106566B; Fri, 17 Feb 2012 08:05:11 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 329F68FC0C; Fri, 17 Feb 2012 08:05:10 +0000 (UTC) Received: from julian-mac.elischer.org (c-67-180-24-15.hsd1.ca.comcast.net [67.180.24.15]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id q1H858gq017634 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 17 Feb 2012 00:05:09 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <4F3E0A90.6080400@freebsd.org> Date: Fri, 17 Feb 2012 00:06:40 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.26) Gecko/20120129 Thunderbird/3.1.18 MIME-Version: 1.0 To: davidxu@freebsd.org References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> <4F3DB3DB.2060603@gmail.com> <4F3DB91A.2090806@freebsd.org> <4F3DBE90.5030305@gmail.com> <4F3E04AB.2000508@freebsd.org> In-Reply-To: <4F3E04AB.2000508@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: David Xu , Alexander Kabaev , Andriy Gapon , threads@freebsd.org, FreeBSD Stable , Jung-uk Kim Subject: Re: pthread_cond_timedwait() broken in 9-stable? 
(from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 08:05:11 -0000 On 2/16/12 11:41 PM, Julian Elischer wrote: > adding jkim as he seems to be the last person working with TSC. > > > On 2/16/12 6:42 PM, David Xu wrote: >> On 2012/2/17 10:19, Julian Elischer wrote: >>> On 2/16/12 5:56 PM, David Xu wrote: >>>> On 2012/2/17 8:42, Julian Elischer wrote: >>>>> Adding David Xu for his thoughts since he reqrote the code in >>>>> quesiton in revision 213098 >>>>> >>>>> On 2/16/12 2:57 PM, Julian Elischer wrote: >>>>>> On 2/16/12 1:06 PM, Julian Elischer wrote: >>>>>>> On 2/16/12 9:34 AM, Andriy Gapon wrote: >>>>>>>> on 15/02/2012 23:41 Julian Elischer said the following: >>>>>>>>> The program fio (an IO test in ports) uses pthreads >>>>>>>>> >>>>>>>>> the following code (from fio-2.0.3, but its in earlier code >>>>>>>>> too) >>>>>>>>> has suddenly started misbehaving. >>>>>>>>> >>>>>>>>> clock_gettime(CLOCK_REALTIME,&t); >>>>>>>>> t.tv_sec += seconds + 10; >>>>>>>>> >>>>>>>>> pthread_mutex_lock(&mutex->lock); >>>>>>>>> >>>>>>>>> while (!mutex->value&& !ret) { >>>>>>>>> mutex->waiters++; >>>>>>>>> ret = >>>>>>>>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t); >>>>>>>>> mutex->waiters--; >>>>>>>>> } >>>>>>>>> >>>>>>>>> if (!ret) { >>>>>>>>> mutex->value--; >>>>>>>>> pthread_mutex_unlock(&mutex->lock); >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> It turns out that 'ret' sometimes comes back instantly (on >>>>>>>>> my machine) with a >>>>>>>>> value of 60 (ETIMEDOUT) >>>>>>>>> despite the fact that we set the timeout 10 seconds into the >>>>>>>>> future. >>>>>>>>> >>>>>>>>> Has anyone else seen anything like this? >>>>>>>>> (and yes the condition variable attribute have been set to >>>>>>>>> use the REALTIME clock). >>>>>>>> But why? 
>>>>>>>> >>>>>>>> Just a hypothesis that maybe there is some issue with time >>>>>>>> keeping on that system. >>>>>>>> How would that code work out for you with MONOTONIC? >>>>>>> >>>>>>> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and >>>>>>> CLOCK_MONOTONIC, and they both had the same problem.. >>>>>>> i.e. random early returns with ETIMEDOUT. >>>>>>> >>>>>>> I think we will try move out machine forward to a newer >>>>>>> -stable to see if it resolves. >>>>>> Kan upgraded the machine today to today's 9.x branch tip and >>>>>> the problem still occurs. >>>>>> 8.x does not have this problem. >>>>>> >>>>>> I have not got a 9-RELEASE machine to test on.. so I can not >>>>>> tell if this came in with the burst of stuff >>>>>> that came in after the 9.x branch was unfrozen after the >>>>>> release of 9.0. >>>>>> >>>>>> >>>>> >>>> I am trying to reproduce the problem, do you have complete >>>> sample code to test ? >>> >>> I'm still looking the exact set >>> but on my machine (4 cpus) the program from ports sysutils/fio >>> exhibits the problem when used with >>> kern.timecounter.hardware=TSC-low and with the following config file: >>> >>> pu05 # cat config.fio >>> >>> [global] >>> #clocksource=cpu >>> direct=1 >>> rw=randread >>> bs=4096 >>> fill_device=1 >>> numjobs=16 >>> iodepth=16 >>> #ioengine=posixaio >>> #ioengine=psync >>> ioengine=psync >>> group_reporting >>> norandommap >>> time_based >>> runtime=60000 >>> randrepeat=0 >>> >>> [file1] >>> filename=/dev/ada0 >>> >>> pu05 # >>> pu05 # fio config.fio >>> fio: this platform does not support process shared mutexes, >>> forcing use of threads. Use the 'thread' option to get rid of this >>> warning. >>> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16 >>> ... >>> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16 >>> fio 2.0.3 >>> Starting 15 threads and 1 process >>> fio: job startup hung? exiting. 
>>> fio: 5 jobs failed to start
>>> Segmentation fault (core dumped)
>>> pu05#
>>>
>>>
>>> The reason 5 jobs failed to start is because the parent timed out
>>> on them immediately.
>>> It didn't time out on 10 of them apparently.
>>>
>>>
>>> if I set the timer to ACPI-fast it works as expected..
>> maybe the following code can check whether TSC-low works by letting
>> the thread run on each cpu.
>>
>>     gettimeofday(&prev, NULL);
>>     int cpu = 0;
>>     for (;;) {
>>         cpuset_t set;
>>         cpu = ++cpu % 4;
>>         CPU_ZERO(&set);
>>         CPU_SET(cpu, &set);
>>         pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
>>         gettimeofday(&cur, NULL);
>>         if ( timercmp(&prev, &cur, >=)) {
>>             abort();
>>         }
>>     }
>>
>>

pu05# sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: ACPI-fast -> TSC-low
pu05# ./test
^C
pu05# cat test.c

#include
#include
#include
#include

#include

main()
{
    int cpu = 0;
    struct timeval prev, cur;

    gettimeofday(&prev, NULL);
    for (;;) {
        cpuset_t set;
        cpu = ++cpu % 4;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        gettimeofday(&cur, NULL);
        if ( timercmp(&prev, &cur, >)) {
            abort();
        }
        prev = cur;
    }
}

pu05# ./test

minutes pass.......

^C
pu05#

so it looks as if the TSC is working ok..
I'm just going to check that the program is actually moving CPU...
yes it is moving around but I can't tell at what speed. (according to top).

so we are still left with a question of "where is the problem?"

kernel TSC driver?
generic gettimeofday() code?
pthreads cond code?
the application?
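
One way to separate "pthreads cond code?" from "the application?" in the list above is a reproducer that does not involve fio at all. The following is a minimal sketch, not code from the thread: the names timedwait_elapsed_ms and elapsed_ms are hypothetical, and it assumes the default condition-variable clock (CLOCK_REALTIME), matching the clock_gettime() call in the fio snippet. It waits on a condition variable that is never signalled and reports how long the wait really took, so an ETIMEDOUT that arrives before the deadline would show up as an elapsed time smaller than the requested timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Whole milliseconds from *a to *b (hypothetical helper, b must not precede a). */
static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	long long ns = (long long)(b->tv_sec - a->tv_sec) * 1000000000LL +
	    (b->tv_nsec - a->tv_nsec);

	return (long)(ns / 1000000LL);
}

/*
 * Wait on a condition variable that nobody signals, using an absolute
 * deadline timeout_ms in the future.  Stores the final return value of
 * pthread_cond_timedwait() in *ret_out and returns the measured duration
 * of the wait in milliseconds.
 */
long timedwait_elapsed_ms(long timeout_ms, int *ret_out)
{
	pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
	struct timespec start, deadline, end;
	int ret = 0;

	assert(ret_out != NULL && timeout_ms >= 0);

	/* Same clock as the default condvar clock, as in the fio snippet. */
	clock_gettime(CLOCK_REALTIME, &start);
	deadline = start;
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&lock);
	/* A return of 0 can only be a spurious wakeup here, so wait again. */
	while (ret == 0)
		ret = pthread_cond_timedwait(&cond, &lock, &deadline);
	pthread_mutex_unlock(&lock);

	clock_gettime(CLOCK_REALTIME, &end);
	*ret_out = ret;
	return elapsed_ms(&start, &end);
}
```

On the affected machine one would call this with a timeout of 10000 ms, as in the fio case, and compare the returned elapsed time against the timeout; on a healthy system the result should never be smaller.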
From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 09:37:43 2012 Return-Path: Delivered-To: threads@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B4AF7106564A; Fri, 17 Feb 2012 09:37:43 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id F2D428FC14; Fri, 17 Feb 2012 09:37:41 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA03355; Fri, 17 Feb 2012 11:37:39 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1RyKG3-0007Xd-IM; Fri, 17 Feb 2012 11:37:39 +0200 Message-ID: <4F3E1FC5.2020103@FreeBSD.org> Date: Fri, 17 Feb 2012 11:37:09 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:10.0) Gecko/20120202 Thunderbird/10.0 MIME-Version: 1.0 To: Julian Elischer References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> <4F3DB3AE.5000109@freebsd.org> In-Reply-To: <4F3DB3AE.5000109@freebsd.org> X-Enigmail-Version: 1.3.5 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Alexander Kabaev , threads@FreeBSD.org, FreeBSD Stable , David Xu , Jung-uk Kim Subject: Re: pthread_cond_timedwait() broken in 9-stable? 
[possible answer] X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 09:37:43 -0000

on 17/02/2012 03:55 Julian Elischer said the following:
>
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950) ACPI-fast(900)
> dummy(-1000000)
> kern.timecounter.hardware: ACPI-fast
> kern.timecounter.stepwarnings: 0
>
> switching the machine from TSC-low to ACPI-fast fixes the problem.
>
> in 8.x it used to default to ACPI
> but I used to switch it to "TSC" to get better performance.
>
> I wonder why TSC-low is now bad to use..
> maybe the TSCs are not as well synchronised as they were in 8.x?
> maybe the pthreads code didn't get the memo about changing timers?

More useful information that you can provide:
- C-states configuration
- CPU identification

I see that you've already contacted jkim, that's useful too.

-- Andriy Gapon From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 11:28:52 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2C98B1065670; Fri, 17 Feb 2012 11:28:52 +0000 (UTC) (envelope-from listlog2011@gmail.com) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 0E56D8FC08; Fri, 17 Feb 2012 11:28:52 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1HBSn4v027239; Fri, 17 Feb 2012 11:28:49 GMT (envelope-from listlog2011@gmail.com) Message-ID: <4F3E39EF.3030209@gmail.com> Date: Fri, 17 Feb 2012 19:28:47 +0800 From: David Xu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Julian Elischer References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> <4F3DB3DB.2060603@gmail.com> <4F3DB91A.2090806@freebsd.org> <4F3DBE90.5030305@gmail.com> <4F3E04AB.2000508@freebsd.org> <4F3E0A90.6080400@freebsd.org> In-Reply-To: <4F3E0A90.6080400@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: FreeBSD Stable , Alexander Kabaev , Andriy Gapon , davidxu@freebsd.org, threads@freebsd.org, Jung-uk Kim Subject: Re: pthread_cond_timedwait() broken in 9-stable? 
(from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: davidxu@freebsd.org List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 11:28:52 -0000 On 2012/2/17 16:06, Julian Elischer wrote: > On 2/16/12 11:41 PM, Julian Elischer wrote: >> adding jkim as he seems to be the last person working with TSC. >> >> >> On 2/16/12 6:42 PM, David Xu wrote: >>> On 2012/2/17 10:19, Julian Elischer wrote: >>>> On 2/16/12 5:56 PM, David Xu wrote: >>>>> On 2012/2/17 8:42, Julian Elischer wrote: >>>>>> Adding David Xu for his thoughts since he reqrote the code in >>>>>> quesiton in revision 213098 >>>>>> >>>>>> On 2/16/12 2:57 PM, Julian Elischer wrote: >>>>>>> On 2/16/12 1:06 PM, Julian Elischer wrote: >>>>>>>> On 2/16/12 9:34 AM, Andriy Gapon wrote: >>>>>>>>> on 15/02/2012 23:41 Julian Elischer said the following: >>>>>>>>>> The program fio (an IO test in ports) uses pthreads >>>>>>>>>> >>>>>>>>>> the following code (from fio-2.0.3, but its in earlier code too) >>>>>>>>>> has suddenly started misbehaving. >>>>>>>>>> >>>>>>>>>> clock_gettime(CLOCK_REALTIME,&t); >>>>>>>>>> t.tv_sec += seconds + 10; >>>>>>>>>> >>>>>>>>>> pthread_mutex_lock(&mutex->lock); >>>>>>>>>> >>>>>>>>>> while (!mutex->value&& !ret) { >>>>>>>>>> mutex->waiters++; >>>>>>>>>> ret = >>>>>>>>>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t); >>>>>>>>>> mutex->waiters--; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> if (!ret) { >>>>>>>>>> mutex->value--; >>>>>>>>>> pthread_mutex_unlock(&mutex->lock); >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> It turns out that 'ret' sometimes comes back instantly (on my >>>>>>>>>> machine) with a >>>>>>>>>> value of 60 (ETIMEDOUT) >>>>>>>>>> despite the fact that we set the timeout 10 seconds into the >>>>>>>>>> future. >>>>>>>>>> >>>>>>>>>> Has anyone else seen anything like this? 
> [...]
> > I am running the fio test on my notebook which is using TSC-low, it is on 9.0-RC3, I can not reproduce the problem for minutes, then I interrupt it with ctrl-c: http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 16:17:55 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from [127.0.0.1] (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by hub.freebsd.org (Postfix) with ESMTP id ADFB41065674; Fri, 17 Feb 2012 16:17:54 +0000 (UTC) (envelope-from jkim@FreeBSD.org) From: Jung-uk Kim To: "davidxu@freebsd.org" Date: Fri, 17 Feb 2012 11:17:41 -0500 User-Agent: KMail/1.6.2 References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3E0A90.6080400@freebsd.org> <4F3E39EF.3030209@gmail.com> In-Reply-To: <4F3E39EF.3030209@gmail.com> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201202171117.46626.jkim@FreeBSD.org> Cc: Alexander Kabaev , "threads@freebsd.org" , FreeBSD Stable , Andriy Gapon Subject: Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 16:17:55 -0000 On Friday 17 February 2012 06:28 am, David Xu wrote: > On 2012/2/17 16:06, Julian Elischer wrote: > > On 2/16/12 11:41 PM, Julian Elischer wrote: > >> adding jkim as he seems to be the last person working with TSC. 
> > [...]
>
> I am running the fio test on my notebook which is using TSC-low,
> it is on 9.0-RC3, I can not reproduce the problem for
> minutes, then I interrupt it with ctrl-c:
>
> http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt

Your CPU is single-package, dual-core, and SMT-enabled. All cores
should be in perfect sync.

Jung-uk Kim

From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 16:39:31 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from [127.0.0.1] (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by hub.freebsd.org (Postfix) with ESMTP id 5457F106566B; Fri, 17 Feb 2012 16:39:30 +0000 (UTC) (envelope-from jkim@FreeBSD.org) From: Jung-uk Kim To: freebsd-stable@FreeBSD.org Date: Fri, 17 Feb 2012 11:39:21 -0500 User-Agent: KMail/1.6.2 References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3DA27A.3090903@freebsd.org> <4F3DB3AE.5000109@freebsd.org> In-Reply-To: <4F3DB3AE.5000109@freebsd.org> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201202171139.23610.jkim@FreeBSD.org> Cc: Alexander Kabaev , "threads@freebsd.org" , David Xu , Andriy Gapon Subject: Re: pthread_cond_timedwait() broken in 9-stable? [possible answer] X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 16:39:31 -0000

On Thursday 16 February 2012 08:55 pm, Julian Elischer wrote:
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950)
> ACPI-fast(900) dummy(-1000000)
> kern.timecounter.hardware: ACPI-fast
> kern.timecounter.stepwarnings: 0
>
> switching the machine from TSC-low to ACPI-fast fixes the problem.
>
> in 8.x it used to default to ACPI
> but I used to switch it to "TSC" to get better performance.
>
> I wonder why TSC-low is now bad to use..
> maybe the TSCs are not as well synchronised as they were in 8.x?

Can you please show us verbose dmesg output?

FYI, TSC and TSC-low are not very different. TSC-low is just a
lower-resolution version of TSC for SMP. The only difference is that
we have automated your timecounter choice, i.e., if the TSCs seem
reasonably well synchronized, TSC is selected by default but given a
lower resolution. In other words, if your TSC timecounter never went
backwards previously, the TSC-low timecounter won't either,
guaranteed. So the root cause should be somewhere else.

Jung-uk Kim

From owner-freebsd-threads@FreeBSD.ORG Fri Feb 17 18:04:51 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BAAC1106567A; Fri, 17 Feb 2012 18:04:51 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 8A4AC8FC21; Fri, 17 Feb 2012 18:04:51 +0000 (UTC) Received: from julian-mac.elischer.org (c-67-180-24-15.hsd1.ca.comcast.net [67.180.24.15]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id q1HI4na8027215 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 17 Feb 2012 10:04:50 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <4F3E971E.6070204@freebsd.org> Date: Fri, 17 Feb 2012 10:06:22 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.26) Gecko/20120129 Thunderbird/3.1.18 MIME-Version: 1.0 To: davidxu@freebsd.org References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3D3E2D.9090100@FreeBSD.org> <4F3D6FDD.9050808@freebsd.org> <4F3D89CD.9050309@freebsd.org> <4F3DA27A.3090903@freebsd.org> <4F3DB3DB.2060603@gmail.com>
<4F3DB91A.2090806@freebsd.org> <4F3DBE90.5030305@gmail.com> <4F3E04AB.2000508@freebsd.org> <4F3E0A90.6080400@freebsd.org> <4F3E39EF.3030209@gmail.com> In-Reply-To: <4F3E39EF.3030209@gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: David Xu , Alexander Kabaev , Andriy Gapon , threads@freebsd.org, FreeBSD Stable , Jung-uk Kim Subject: Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Feb 2012 18:04:51 -0000 On 2/17/12 3:28 AM, David Xu wrote: > On 2012/2/17 16:06, Julian Elischer wrote: >> On 2/16/12 11:41 PM, Julian Elischer wrote: >>> adding jkim as he seems to be the last person working with TSC. >>> >>> >>> On 2/16/12 6:42 PM, David Xu wrote: >>>> On 2012/2/17 10:19, Julian Elischer wrote: >>>>> On 2/16/12 5:56 PM, David Xu wrote: >>>>>> On 2012/2/17 8:42, Julian Elischer wrote: >>>>>>> Adding David Xu for his thoughts since he reqrote the code in >>>>>>> quesiton in revision 213098 >>>>>>> >>>>>>> On 2/16/12 2:57 PM, Julian Elischer wrote: >>>>>>>> On 2/16/12 1:06 PM, Julian Elischer wrote: >>>>>>>>> On 2/16/12 9:34 AM, Andriy Gapon wrote: >>>>>>>>>> on 15/02/2012 23:41 Julian Elischer said the following: >>>>>>>>>>> The program fio (an IO test in ports) uses pthreads >>>>>>>>>>> >>>>>>>>>>> the following code (from fio-2.0.3, but its in earlier >>>>>>>>>>> code too) >>>>>>>>>>> has suddenly started misbehaving. 
> [...]
>>>>
>>>>     gettimeofday(&prev, NULL);
>>>>     int cpu = 0;
>>>>     for (;;) {
>>>>         cpuset_t set;
>>>>         cpu = ++cpu % 4;
>>>>         CPU_ZERO(&set);
>>>>         CPU_SET(cpu, &set);
>>>>         pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
>>>>         gettimeofday(&cur, NULL);
>>>>         if ( timercmp(&prev, &cur, >=)) {
>>>>             abort();
>>>>         }
>>>>     }
>>>>
>>>>
>>
>> pu05# sysctl kern.timecounter.hardware=TSC-low
>> kern.timecounter.hardware: ACPI-fast -> TSC-low
>> pu05# ./test
>> ^C
>> pu05# cat test.c
>>
>> #include
>> #include
>> #include
>> #include
>>
>> #include
>>
>> main()
>> {
>>     int cpu = 0;
>>     struct timeval prev, cur;
>>
>>     gettimeofday(&prev, NULL);
>>     for (;;) {
>>         cpuset_t set;
>>         cpu = ++cpu % 4;
>>         CPU_ZERO(&set);
>>         CPU_SET(cpu, &set);
>>         pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
>>         gettimeofday(&cur, NULL);
>>         if ( timercmp(&prev, &cur, >)) {
>>             abort();
>>         }
>>         prev = cur;
>>     }
>> }
>>
>> pu05# ./test
>>
>> minutes pass.......
>>
>> ^C
>> pu05#
>>
>> so it looks as if the TSC is working ok..
>> I'm just going to check that the program is actually moving CPU...
>> yes it is moving around but I can't tell at what speed. (according
>> to top).
>>
>> so we are still left with a question of "where is the problem?"
>>
>> kernel TSC driver?
>> generic gettimeofday() code?
>> pthreads cond code?
>> the application?
>>
>>
> I am running the fio test on my notebook which is using TSC-low,
> it is on 9.0-RC3, I can not reproduce the problem for
> minutes, then I interrupt it with ctrl-c:
>
> http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt
>
>
looks normal to me..

I have not been able to test this on a 9-RELEASE machine..
just 9-stable..

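
The test program quoted above samples gettimeofday(), whereas the fio snippet builds its deadline from clock_gettime(). A complementary check might therefore sample clock_gettime() directly for the two clocks that were tried. This is only a sketch under assumptions: count_clock_regressions is a hypothetical name, and unlike test.c it does not pin the thread to each CPU in turn (pthread_setaffinity_np() is left out to keep the example portable), so on its own it says less about cross-CPU TSC skew.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Read clock `clk` `iterations` times in a tight loop and count how often
 * a reading is earlier than the one taken just before it.
 */
long count_clock_regressions(clockid_t clk, long iterations)
{
	struct timespec prev, cur;
	long regressions = 0;
	long i;

	clock_gettime(clk, &prev);
	for (i = 0; i < iterations; i++) {
		clock_gettime(clk, &cur);
		/* Compare seconds first, then nanoseconds within the same second. */
		if (cur.tv_sec < prev.tv_sec ||
		    (cur.tv_sec == prev.tv_sec && cur.tv_nsec < prev.tv_nsec))
			regressions++;
		prev = cur;
	}
	return regressions;
}
```

A non-zero result for CLOCK_MONOTONIC would point at the timecounter rather than at the condition-variable code; CLOCK_REALTIME can legitimately step backwards when the system time is adjusted, so a regression there is weaker evidence.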
From owner-freebsd-threads@FreeBSD.ORG Sat Feb 18 01:28:59 2012 Return-Path: Delivered-To: threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 999931065677; Sat, 18 Feb 2012 01:28:59 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) by mx1.freebsd.org (Postfix) with ESMTP id 53E258FC23; Sat, 18 Feb 2012 01:28:58 +0000 (UTC) Received: from julian-mac.elischer.org (c-67-180-24-15.hsd1.ca.comcast.net [67.180.24.15]) (authenticated bits=0) by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id q1I1Sv5F028778 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 17 Feb 2012 17:28:58 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <4F3EFF35.8040901@freebsd.org> Date: Fri, 17 Feb 2012 17:30:29 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.26) Gecko/20120129 Thunderbird/3.1.18 MIME-Version: 1.0 To: Jung-uk Kim References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org> <4F3E0A90.6080400@freebsd.org> <4F3E39EF.3030209@gmail.com> <201202171117.46626.jkim@FreeBSD.org> In-Reply-To: <201202171117.46626.jkim@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Alexander Kabaev , "threads@freebsd.org" , FreeBSD Stable , "davidxu@freebsd.org" , Andriy Gapon Subject: Re: pthread_cond_timedwait() broken in 9-stable? 
(from JAN 10) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Feb 2012 01:28:59 -0000 > On Friday 17 February 2012 06:28 am, David Xu wrote: >> On 2012/2/17 16:06, Julian Elischer wrote: >>> On 2/16/12 11:41 PM, Julian Elischer wrote: >>>> adding jkim as he seems to be the last person working with TSC. >>>> >>>> On 2/16/12 6:42 PM, David Xu wrote: >>>>> On 2012/2/17 10:19, Julian Elischer wrote: >>>>>> On 2/16/12 5:56 PM, David Xu wrote: >>>>>>> On 2012/2/17 8:42, Julian Elischer wrote: >>>>>>>> Adding David Xu for his thoughts since he reqrote the code >>>>>>>> in quesiton in revision 213098 >>>>>>>> >>>>>>>> On 2/16/12 2:57 PM, Julian Elischer wrote: >>>>>>>>> On 2/16/12 1:06 PM, Julian Elischer wrote: >>>>>>>>>> On 2/16/12 9:34 AM, Andriy Gapon wrote: >>>>>>>>>>> on 15/02/2012 23:41 Julian Elischer said the following: >>>>>>>>>>>> The program fio (an IO test in ports) uses pthreads >>>>>>>>>>>> >>>>>>>>>>>> the following code (from fio-2.0.3, but its in earlier >>>>>>>>>>>> code too) has suddenly started misbehaving. >>>>>>>>>>>> >>>>>>>>>>>> clock_gettime(CLOCK_REALTIME,&t); >>>>>>>>>>>> t.tv_sec += seconds + 10; >>>>>>>>>>>> >>>>>>>>>>>> pthread_mutex_lock(&mutex->lock); >>>>>>>>>>>> >>>>>>>>>>>> while (!mutex->value&& !ret) { >>>>>>>>>>>> mutex->waiters++; >>>>>>>>>>>> ret = >>>>>>>>>>>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t); >>>>>>>>>>>> mutex->waiters--; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> if (!ret) { >>>>>>>>>>>> mutex->value--; >>>>>>>>>>>> pthread_mutex_unlock(&mutex->lock); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> It turns out that 'ret' sometimes comes back instantly >>>>>>>>>>>> (on my machine) with a >>>>>>>>>>>> value of 60 (ETIMEDOUT) >>>>>>>>>>>> despite the fact that we set the timeout 10 seconds into >>>>>>>>>>>> the future. 
>>>>>>>>>>>>
>>>>>>>>>>>> Has anyone else seen anything like this?
>>>>>>>>>>>> (and yes the condition variable attributes have been set
>>>>>>>>>>>> to use the REALTIME clock).
>>>>>>>>>>> But why?
>>>>>>>>>>>
>>>>>>>>>>> Just a hypothesis that maybe there is some issue with
>>>>>>>>>>> time keeping on that system.
>>>>>>>>>>> How would that code work out for you with MONOTONIC?
>>>>>>>>>> Jens Axboe (CC'd) tried both CLOCK_REALTIME and
>>>>>>>>>> CLOCK_MONOTONIC, and they both had the same problem..
>>>>>>>>>> i.e. random early returns with ETIMEDOUT.
>>>>>>>>>>
>>>>>>>>>> I think we will try to move our machine forward to a newer
>>>>>>>>>> -stable to see if it resolves.
>>>>>>>>> Kan upgraded the machine today to today's 9.x branch tip
>>>>>>>>> and the problem still occurs.
>>>>>>>>> 8.x does not have this problem.
>>>>>>>>>
>>>>>>>>> I have not got a 9-RELEASE machine to test on.. so I can
>>>>>>>>> not tell if this came in with the burst of stuff
>>>>>>>>> that came in after the 9.x branch was unfrozen after the
>>>>>>>>> release of 9.0.
>>>>>>> I am trying to reproduce the problem, do you have complete
>>>>>>> sample code to test?
>>>>>> I'm still looking for the exact set
>>>>>> but on my machine (4 cpus) the program from ports sysutils/fio
>>>>>> exhibits the problem when used with
>>>>>> kern.timecounter.hardware=TSC-low and with the following
>>>>>> config file:
>>>>>>
>>>>>> pu05 # cat config.fio
>>>>>>
>>>>>> [global]
>>>>>> #clocksource=cpu
>>>>>> direct=1
>>>>>> rw=randread
>>>>>> bs=4096
>>>>>> fill_device=1
>>>>>> numjobs=16
>>>>>> iodepth=16
>>>>>> #ioengine=posixaio
>>>>>> #ioengine=psync
>>>>>> ioengine=psync
>>>>>> group_reporting
>>>>>> norandommap
>>>>>> time_based
>>>>>> runtime=60000
>>>>>> randrepeat=0
>>>>>>
>>>>>> [file1]
>>>>>> filename=/dev/ada0
>>>>>>
>>>>>> pu05 #
>>>>>> pu05 # fio config.fio
>>>>>> fio: this platform does not support process shared mutexes,
>>>>>> forcing use of threads.
>>>>>> Use the 'thread' option to get rid of this warning.
>>>>>> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
>>>>>> ...
>>>>>> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
>>>>>> fio 2.0.3
>>>>>> Starting 15 threads and 1 process
>>>>>> fio: job startup hung? exiting.
>>>>>> fio: 5 jobs failed to start
>>>>>> Segmentation fault (core dumped)
>>>>>> pu05#
>>>>>>
>>>>>>
>>>>>> The reason 5 jobs failed to start is that the parent timed
>>>>>> out on them immediately.
>>>>>> It didn't time out on 10 of them apparently.
>>>>>>
>>>>>>
>>>>>> if I set the timer to ACPI-fast it works as expected..
>>>>> maybe the following code can check whether TSC-low works by
>>>>> letting the thread run
>>>>> on each cpu.
>>>>>
>>>>> gettimeofday(&prev, NULL);
>>>>> int cpu = 0;
>>>>> for (;;) {
>>>>>     cpuset_t set;
>>>>>     cpu = ++cpu % 4;
>>>>>     CPU_ZERO(&set);
>>>>>     CPU_SET(cpu, &set);
>>>>>     pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
>>>>>     gettimeofday(&cur, NULL);
>>>>>     if (timercmp(&prev, &cur, >=)) {
>>>>>         abort();
>>>>>     }
>>>>> }
>>> pu05# sysctl kern.timecounter.hardware=TSC-low
>>> kern.timecounter.hardware: ACPI-fast -> TSC-low
>>> pu05# ./test
>>> ^C
>>> pu05# cat test.c
>>>
>>> #include
>>> #include
>>> #include
>>> #include
>>>
>>> #include
>>>
>>> main()
>>> {
>>>     int cpu = 0;
>>>     struct timeval prev, cur;
>>>
>>>     gettimeofday(&prev, NULL);
>>>     for (;;) {
>>>         cpuset_t set;
>>>         cpu = ++cpu % 4;
>>>         CPU_ZERO(&set);
>>>         CPU_SET(cpu, &set);
>>>         pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
>>>         gettimeofday(&cur, NULL);
>>>         if (timercmp(&prev, &cur, >)) {
>>>             abort();
>>>         }
>>>         prev = cur;
>>>     }
>>> }
>>>
>>> pu05# ./test
>>>
>>> minutes pass.......
>>>
>>> ^C
>>> pu05#
>>>
>>> so it looks as if the TSC is working ok..
>>> I'm just going to check that the program is actually moving
>>> CPU... yes it is moving around but I can't tell at what speed.
>>> (according to top).
>>>
>>> so we are still left with a question of "where is the problem?"
>>>
>>> kernel TSC driver?
>>> generic gettimeofday() code?
>>> pthreads cond code?
>>> the application?
>> I am running the fio test on my notebook which is using TSC-low,
>> it is on 9.0-RC3, I can not reproduce the problem for
>> minutes, then I interrupt it with ctrl-c:
>>
>> http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
>> http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
>> http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt
> Your CPU is single-package, dual-core, and SMT-enabled. All cores
> should be in perfect sync.
>
> Jung-uk Kim
>
mine is too, yet it still has problems..

CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2500.14-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x10676 Family = 6 Model = 17 Stepping = 6
  Features=0xbfebfbff
  Features2=0xce3bd
  AMD Features=0x20100800
  AMD Features2=0x1
  TSC: P-state invariant, performance statistics
real memory = 8589934592 (8192 MB)
avail memory = 8214368256 (7833 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard

From owner-freebsd-threads@FreeBSD.ORG Sat Feb 18 05:47:10 2012
Return-Path:
Delivered-To: threads@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
 by hub.freebsd.org (Postfix) with ESMTP id 01691106564A;
 Sat, 18 Feb 2012 05:47:10 +0000 (UTC)
 (envelope-from listlog2011@gmail.com)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
 by mx1.freebsd.org (Postfix) with ESMTP id C0A7A8FC14;
 Sat, 18 Feb 2012 05:47:09 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q1I5l5uh054728;
 Sat, 18 Feb 2012 05:47:07 GMT
 (envelope-from listlog2011@gmail.com)
Message-ID: <4F3F3B57.5070700@gmail.com>
Date: Sat, 18 Feb 2012 13:47:03 +0800
From: David Xu
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2
MIME-Version: 1.0
To: Julian Elischer
References: <4F3C2671.3090808__7697.00510795719$1329343207$gmane$org@freebsd.org>
 <4F3E0A90.6080400@freebsd.org> <4F3E39EF.3030209@gmail.com>
 <201202171117.46626.jkim@FreeBSD.org> <4F3EFF35.8040901@freebsd.org>
In-Reply-To: <4F3EFF35.8040901@freebsd.org>
Content-Type: multipart/mixed;
 boundary="------------040401090208010403080104"
Cc: FreeBSD Stable, Alexander Kabaev, Andriy Gapon, "davidxu@freebsd.org", "threads@freebsd.org", Jung-uk Kim
Subject: Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: davidxu@freebsd.org
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 18 Feb 2012 05:47:10 -0000

This is a multi-part message in MIME format.
--------------040401090208010403080104
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 2012/2/18 9:30, Julian Elischer wrote:
>>
>
> mine is too, yet it still has problems..
> CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2500.14-MHz
> K8-class CPU)
> Origin = "GenuineIntel" Id = 0x10676 Family = 6 Model = 17
> Stepping = 6
>
> Features=0xbfebfbff
>
> Features2=0xce3bd
> AMD Features=0x20100800
> AMD Features2=0x1
> TSC: P-state invariant, performance statistics
> real memory = 8589934592 (8192 MB)
> avail memory = 8214368256 (7833 MB)
> Event timer "LAPIC" quality 400
> ACPI APIC Table:
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> FreeBSD/SMP: 1 package(s) x 4 core(s)
> cpu0 (BSP): APIC ID: 0
> cpu1 (AP): APIC ID: 1
> cpu2 (AP): APIC ID: 2
> cpu3 (AP): APIC ID: 3
> ioapic0 irqs 0-23 on motherboard
> ioapic1 irqs 24-47 on motherboard
>
>
The attached file is a small patch. I don't know if it works for you;
it is the only thing I can find at the moment.

--------------040401090208010403080104
Content-Type: text/plain;
 name="thr_umtx.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="thr_umtx.c.diff"

Index: src/lib/libthr/thread/thr_umtx.c
===================================================================
--- src/lib/libthr/thread/thr_umtx.c	(revision 231637)
+++ src/lib/libthr/thread/thr_umtx.c	(working copy)
@@ -205,7 +205,7 @@
 	if (abstime != NULL) {
 		clock_gettime(clockid, &ts);
 		TIMESPEC_SUB(&ts2, abstime, &ts);
-		if (ts2.tv_sec < 0 || ts2.tv_nsec <= 0)
+		if (ts2.tv_sec < 0 || (ts2.tv_sec == 0 && ts2.tv_nsec <= 0))
 			return (ETIMEDOUT);
 		tsp = &ts2;
 	} else {
--------------040401090208010403080104--
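
The one-line change in the patch can be looked at in isolation. The
sketch below is not libthr code: timespec_sub(), expired_old() and
expired_new() are hypothetical helpers written for this illustration,
and timespec_sub() is assumed to normalize its result the way a
TIMESPEC_SUB-style macro usually does (0 <= tv_nsec < 1000000000).
Under that assumption tv_nsec is never negative, so the old test
"tv_nsec <= 0" fires whenever the nanosecond fields of the deadline
and of the current time happen to be equal, however many whole
seconds remain. Whether this is what the failing machine is actually
hitting has not been confirmed.

```c
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical stand-in for a TIMESPEC_SUB-style macro (an assumption,
 * not copied from libthr): dst = a - b, with the result normalized so
 * that 0 <= dst->tv_nsec < 1000000000. Inputs must be normalized too.
 */
static void
timespec_sub(struct timespec *dst, const struct timespec *a,
    const struct timespec *b)
{
	dst->tv_sec = a->tv_sec - b->tv_sec;
	dst->tv_nsec = a->tv_nsec - b->tv_nsec;
	if (dst->tv_nsec < 0) {
		dst->tv_sec--;
		dst->tv_nsec += 1000000000L;
	}
	assert(dst->tv_nsec >= 0 && dst->tv_nsec < 1000000000L);
}

/* The expiry test as it reads on the '-' line of the diff. */
static int
expired_old(const struct timespec *left)
{
	if (left->tv_sec < 0 || left->tv_nsec <= 0)
		return (ETIMEDOUT);
	return (0);
}

/* The expiry test as it reads on the '+' line of the diff. */
static int
expired_new(const struct timespec *left)
{
	if (left->tv_sec < 0 || (left->tv_sec == 0 && left->tv_nsec <= 0))
		return (ETIMEDOUT);
	return (0);
}
```

For example, with a made-up current time of 1000.250000000 s and a
deadline of 1010.250000000 s, the remaining time is exactly 10 s with
tv_nsec == 0. expired_old() reports ETIMEDOUT for that value while
expired_new() returns 0. Both report ETIMEDOUT once the deadline is
equal to or earlier than the current time.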