From owner-freebsd-bugs@FreeBSD.ORG Fri Sep 1 11:10:24 2006 Return-Path: X-Original-To: freebsd-bugs@hub.freebsd.org Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4965F16A503 for ; Fri, 1 Sep 2006 11:10:24 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id DA2B743D49 for ; Fri, 1 Sep 2006 11:10:22 +0000 (GMT) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.13.4/8.13.4) with ESMTP id k81BAM9E003074 for ; Fri, 1 Sep 2006 11:10:22 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.13.4/8.13.4/Submit) id k81BAMjf003070; Fri, 1 Sep 2006 11:10:22 GMT (envelope-from gnats) Resent-Date: Fri, 1 Sep 2006 11:10:22 GMT Resent-Message-Id: <200609011110.k81BAMjf003070@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Sergey Zaharchenko Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3345D16A4DE for ; Fri, 1 Sep 2006 11:03:19 +0000 (UTC) (envelope-from doublef-ctm@yandex.ru) Received: from mx18.yandex.ru (smtp2.yandex.ru [213.180.200.18]) by mx1.FreeBSD.org (Postfix) with ESMTP id 50C4B43D55 for ; Fri, 1 Sep 2006 11:03:17 +0000 (GMT) (envelope-from doublef-ctm@yandex.ru) Received: from [83.239.189.161] ([83.239.189.161]:8646 "EHLO shark" smtp-auth: "doublef-ctm" TLS-CIPHER: TLS-PEER-CN1: ) by mail.yandex.ru with ESMTP id S3375808AbWIALDP (ORCPT ); Fri, 1 Sep 2006 15:03:15 +0400 Received: by shark (Postfix, from userid 1000) id 416A89DB4C; Fri, 1 Sep 2006 15:02:40 +0400 (MSD) Message-Id: <20060901110240.416A89DB4C@shark> Date: Fri, 1 Sep 2006 15:02:40 +0400 (MSD) From: Sergey Zaharchenko To: FreeBSD-gnats-submit@FreeBSD.org X-Send-Pr-Version: 3.113 Cc: Brian Somers Subject: bin/102747: ppp timer subsystem stops under certain circumstances X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Sep 2006 11:10:24 -0000 >Number: 102747 >Category: bin >Synopsis: ppp timer subsystem stops under certain circumstances >Confidential: no >Severity: non-critical >Priority: medium >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Fri Sep 01 11:10:21 GMT 2006 >Closed-Date: >Last-Modified: >Originator: Sergey Zaharchenko >Release: FreeBSD 7.0-CURRENT i386 >Organization: Volgograd State Technical University >Environment: System: FreeBSD shark.localdomain 7.0-CURRENT FreeBSD 7.0-CURRENT #1: Mon Aug 28 15:41:24 UTC 2006 root@shark.localdomain:/var/obj/src/usr.src/sys/GENERIC i386 >Description: There are certain situations which cause the ppp timer subsystem to stop (that is, there are timers in the list, but none of them ever get triggered). This is caused by calling setitimer() with a zero it_value in timer_InitService(). Killing ppp with SIGALRM triggers the first timer and resumes normal operation. Here's a description of one of the situations when the offending setitimer() call could happen. If two timers (call them A and B) are near expiration (less than ~0.05 seconds left before triggering; A is first in the list; B->rest==0, because they trigger at roughly the same time), and, at the same time, ppp decides to stop A using StopTimerNoBlock(), the call adjusts TimerList->rest to the itimer's value ``so that it reflects what's really happening''. Now A->rest==RESTVAL(small time value)==0, B->rest==0. Then B->rest+=A->rest; 0+0==0; B->rest is still zero. A is dequeued, B is now first in the list, and TimerList==B. timer_InitService() is called, because the first timer was deleted. timer_InitService() calls setitimer() with it_value==(a conversion of)B->rest==0, which stops the itimer according to getitimer(2). If a timer is later added, B will still sit there with rest==0, preventing the itimer from launching. Neither B nor any other timer will ever be serviced, unless some magic intervenes. An example of such magic is the user killing ppp with SIGALRM. Then B will get serviced and dequeued, and itimer relaunched. This is not the only situation when TimerList->rest==0. >How-To-Repeat: The described situation appears only rarely in ordinary practice. They happen when a lot of timers are being started, stopped and expired (for example, just after the authentification succeeds). A way of detecting the situation would be to insert the following line into timer.c: if (setitimer(ITIMER_REAL, &itimer, NULL) == -1) log_Printf(LogERROR, "Unable to set itimer (%s)\n", strerror(errno)); + if (TimerList->rest==0) raise(SIGSTOP); } recompile and reinstall ppp, add "set log timer" to ppp.conf and run ppp in the following script: #!/bin/sh ppp -dd whatever # We assume 100 seconds is enough to connect and authenticate to the peer until ps -o state,comm | grep '^T *ppp$'; do killall -INT ppp; sleep 100;done The script should exit when the offending call has just been issued. You can then ``killall -CONT ppp'' and observe the absence of any timer activity in the log. You can ``killall -ALRM ppp'' to revive it. >Fix: Never calling timer_InitService() when TimerList->rest==0 seems to be the right thing to do. The following is a patch which avoids situations when TimerList->rest==0 and timer_InitService() is called. ---------------------------------------------------------------------cut here --- ppp-7.0-CURRENT.orig/timer.c Wed Jan 5 19:25:29 2005 +++ ppp-7.0-CURRENT/timer.c Fri Sep 1 13:37:08 2006 @@ -94,9 +94,16 @@ return; } - /* Adjust our first delta so that it reflects what's really happening */ + /* + * Adjust the tick count so that it roughly corresponds to reality. + * Changing TimerList->rest might cause it to become zero (if + * getitimer() returns a value close to zero), and the + * timer_InitService() call will call setitimer() with zero it_value, + * stopping the itimer. + */ + if (TimerList && getitimer(ITIMER_REAL, &itimer) == 0) - TimerList->rest = RESTVAL(itimer); + ticks=(RESTVAL(itimer)-(long)TimerList->rest); pt = NULL; for (t = TimerList; t; t = t->next) { @@ -156,14 +163,24 @@ timer_TermService(); /* Terminate Timer Service */ } if (t->next) { + unsigned t_rest=t->rest; if (!pt) { /* t (tp) was the first in the list */ struct itimerval itimer; if (getitimer(ITIMER_REAL, &itimer) == 0) - t->rest = RESTVAL(itimer); + t_rest = RESTVAL(itimer); } - t->next->rest += t->rest; - if (!pt) /* t->next is now the first in the list */ + t->next->rest += t_rest; + + /* + * It's still possible for t->next->rest to be zero (just + * because the next timer should trigger right after the + * one being stopped), in which case calling timer_InitService() + * would be a bad idea. Don't do that, then. In this case, the + * SIGALRM will trigger the next timer, so that's OK. + */ + + if (!pt&&(t->next->rest>0)) /* t->next is now the first in the list */ timer_InitService(1); } } else { @@ -235,11 +252,18 @@ { struct itimerval itimer; struct pppTimer *pt; - u_long rest = 0; + long rest = 0; /* should allow negative values */ + + /* + * Adjust the base time so that the deltas reflect what's really + * happening. Changing TimerList->rest might cause it to become zero + * (if getitimer() returns a value close to zero), and the + * timer_InitService() call will call setitimer() with zero it_value, + * stopping the itimer. + */ - /* Adjust our first delta so that it reflects what's really happening */ if (TimerList && getitimer(ITIMER_REAL, &itimer) == 0) - TimerList->rest = RESTVAL(itimer); + rest=(RESTVAL(itimer)-(long)TimerList->rest); #define SECS(val) ((val) / SECTICKS) #define HSECS(val) (((val) % SECTICKS) * 100 / SECTICKS) ---------------------------------------------------------------------cut here Another way to fix the problem is to simply check for TimerList->rest==0 after setitimer() in timer_InitService() and do a raise(SIGALRM) then. It's simpler, but some of the accuracy is lost. >Release-Note: >Audit-Trail: >Unformatted: