From owner-cvs-usrsbin Sun Mar 9 04:11:44 1997
Date: Sun, 9 Mar 1997 14:56:04 +0300 (MSK)
From: Андрей Чернов (Andrey Chernov)
To: Brian Somers
Cc: CVS-committers@freebsd.org, cvs-all@freebsd.org, cvs-usrsbin@freebsd.org
Subject: Re: cvs commit: src/usr.sbin/ppp timer.c
In-Reply-To: <199703082058.UAA24419@awfulhak.demon.co.uk>
Sender: owner-cvs-usrsbin@freebsd.org

On Sat, 8 Mar 1997, Brian Somers wrote:

> I don't understand. The idea is that if an interrupt occurs (calling
> the pending function), the select() is interrupted and the pending
> interrupt routine is immediately called. There should be very little
> latency.... unless there's some other tight loops in the code ? I don't
> know of any.

It seems that some other tight loops are present...

> You're forgetting about SIGHUP and SIGTERM. They call LogClose(), which
> ends up in a call to mballoc() in LogFlush(). Needless to say, mballoc()
> calls our friend malloc(). Also, TimerService calls logprintf() which
> calls vlogprintf() which calls LogFlush()......

These signals can't be the reason for the bug report, since the report
assumes a bug in normal mode, not a termination bug. And in normal mode
only SIGALRM can happen; other signals are impossible.
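
For readers following the thread, the "pending" scheme Brian describes can be sketched roughly as below. This is only an illustration under assumptions, not the actual ppp code: the names pend_signal(), run_pending(), catcher() and demo_handler() are hypothetical stand-ins, and the fixed table size MAX_SIG is an assumption. The catcher only records that a signal arrived; the real handler runs later from ordinary (non-signal) context.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define MAX_SIG 32                 /* assumption: covers the classic signals */

typedef void (*sig_fn)(int);

static volatile sig_atomic_t caught[MAX_SIG]; /* set in signal context */
static sig_fn real_handler[MAX_SIG];          /* run later, outside it */

/* Async-signal-safe: only notes that the signal arrived. */
static void catcher(int signo)
{
    if (signo > 0 && signo < MAX_SIG)
        caught[signo] = 1;
}

/* Hypothetical stand-in for a pending_signal()-style registration. */
static int pend_signal(int signo, sig_fn fn)
{
    struct sigaction sa;

    assert(signo > 0 && signo < MAX_SIG);
    real_handler[signo] = fn;
    sa.sa_handler = catcher;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;   /* no SA_RESTART, so a blocked select() returns EINTR */
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical stand-in for a handle_signals()-style dispatcher.
 * Runs every handler whose signal was caught; returns how many ran. */
static int run_pending(void)
{
    int signo, ran = 0;

    for (signo = 1; signo < MAX_SIG; signo++) {
        if (caught[signo]) {
            caught[signo] = 0;
            if (real_handler[signo] != NULL) {
                real_handler[signo](signo);   /* may call malloc() safely here */
                ran++;
            }
        }
    }
    return ran;
}

/* Demo handler used only to show the deferral. */
static volatile sig_atomic_t demo_hits;
static void demo_handler(int signo)
{
    (void)signo;
    demo_hits++;
}
```

In a main loop one would call run_pending() whenever select() returns -1 with errno set to EINTR, and also at other points in long-running code. The latency being argued about in this thread is exactly the time between catcher() running and the next run_pending() call.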

> > I consider having malloc() problem after two days of running is lesser bug
> > than having dead hang after 5minutes of running and carrier drop.
>
> Ah, but you're the first to complain of the "problem". Similar code was
> released in 2.2-GAMMA and nobody complained (AFAIK).

We have bad phone lines in Russia, so carrier drop is a common situation
here. As I hear, it almost never occurs in the USA.

> Let's not jump to conclusions. I agree with the SIGSEGV stuff (if it
> has to be trapped), and fork signals aren't broken. SIG_DFL & SIG_IGN
> pass right through the pending code.

As I said, SIGALRM is the only signal which can happen in normal running
mode. I don't see a reason to pend other signals (except maybe SIGTSTP
etc., which I left pending).

> > Proper fixing assumed not pending SIGALRM calls (true time is valuable
> > thing) but making all timer code recursion-safe.
>
> The original problem wasn't *just* with recursive malloc()s in the Timer
> code. 2.2-ALPHA (or was it GAMMA) went out with a pending SIGALRM, and
> still exhibited the problem.

You mean that signal pending does not fix the problem? Why does the bug
report stay closed in this case?

> IMHO, "proper fixing" entails not allowing any malloc() calls to recurse.
> AFAIK, POSIX doesn't say anything about malloc() needing to be re-entrant,
> therefore it's up to the program not to re-enter. As a signal may occur
> during malloc(), we must make sure that no handler that calls malloc()
> may be caused until it's safe (ie via handle_signals()).

Yes, the proper fix is not allowing malloc in signal handlers. But
pending alarm ticks is not acceptable in any case. They are alarm ticks
precisely because they must not be delayed. I.e. the alarm signal handler
must be executed immediately but must not call malloc (you can pend the
malloc call, not the signal handler itself).
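
The alternative argued for here (run the SIGALRM handler at once, defer only the part that needs malloc) might look roughly like the sketch below. It is an illustration under assumptions, not code from ppp: on_alarm(), install_alarm_handler(), drain_deferred_log() and the "timer tick" message are all hypothetical names chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

static volatile sig_atomic_t ticks;       /* advanced immediately on SIGALRM */
static volatile sig_atomic_t log_wanted;  /* malloc-using work owed to main loop */
static volatile sig_atomic_t in_handler;  /* nonzero while in signal context */

/* SIGALRM handler: runs right away, touches only sig_atomic_t variables. */
static void on_alarm(int signo)
{
    (void)signo;
    in_handler = 1;
    ticks++;          /* the time-critical part is not delayed */
    log_wanted = 1;   /* the malloc-using part is merely requested */
    in_handler = 0;
}

static int install_alarm_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGALRM, &sa, NULL);
}

/* Called from the main loop, never from a handler.
 * Returns a heap-allocated message, or NULL if nothing is owed
 * (or if malloc() fails). The caller frees the result. */
static char *drain_deferred_log(void)
{
    static const char msg[] = "timer tick";
    char *copy;

    assert(!in_handler);   /* malloc() must never run in signal context */
    if (!log_wanted)
        return NULL;
    log_wanted = 0;
    copy = malloc(sizeof msg);
    if (copy != NULL)
        memcpy(copy, msg, sizeof msg);
    return copy;
}
```

With this split the tick count stays accurate even if the main loop is slow to come back, and only the logging is late. Whether the real timer code can be divided this cleanly is a separate question that the thread does not settle.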

> You are not sure that all of the changes from pending_signal() to signal()
> are changes to calls that use handlers that don't call malloc() (as I
> pointed out above), so I will not agree with the changes. If you insist
> on leaving the code there, you can deal with the re-opened recursive
> malloc() pr.

Yes, I re-open it. But from what you say, does it happen even with
signals pending?

> Either way, I'd like to know where the code is when ppp loops. I've heard
> that this does happen from time to time, but nobody's ever identified where
> the code was at the time. I'd really appreciate if you could tell me
> where so that I can scatter a few more calls to handle_signals(). If you
> can reproduce the problem, could you remove the signal(SIGSEGV,...) call,
> -11 it when it's hung, and ask the ensuing core where it was at ? TIA.

I'll try to debug it and tell you the exact place (it is a bit hard to
debug a daemon with an unstable effect). Right now I can only say that
the problem disappears when I remove signal pending.

-- 
Andrey A. Chernov
http://www.nagual.ru/~ache/