From owner-freebsd-hackers Fri Oct 5 12:41:43 2001
Date: Fri, 5 Oct 2001 15:41:27 -0400
From: The Anarcat
To: gnats-admin@FreeBSD.org, freebsd-bugs@FreeBSD.org
Cc: "Crist J. Clark", freebsd-hackers@freebsd.org
Subject: Re: bin/31029: syslogd remote logging back down
Message-ID: <20011005154126.B7418@shall.anarcat.dyndns.org>
In-Reply-To: <200110040830.f948U1O13043@freefall.freebsd.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[cc'ed to -hackers since it is what I think is the relevant list,
please correct me if I'm wrong]

[[note that I welcome style and function comments]]

Note that this fix "works" in the sense that syslogd will eventually be
able to reach the remote server again, but the timing of the checks
depends on the frequency of the outgoing log lines.

For example, say the initial delay is 30 seconds.
If the algorithm were "clean", there would be some kind of check made
on the remote server at 30, 30+60=90, 90+120=210, etc., seconds after
the failure. But the check is only made when a log line has to be sent
out. So if no log is sent out for (say) 5 minutes, the delay (f_delay)
won't be adjusted to account for the time spent waiting for a new log
line to send.

This might not be an undesirable behavior, BTW. :) That way, if a
remote host doesn't usually receive many logs, it has a better chance
of being checked each time a new log is sent. On the other hand, if a
host receives many logs, most will be discarded, and the host will be
checked at regular, exponentially expanding intervals, until it
becomes available again.

Also, note that the patch inserts 2 more time_t fields in the filed
struct. I don't know exactly how the struct filed* pointers are
initialized in the fprintlog function. Actually, one must assume that
f_unreach is initialized to 0. If the field is filled with an arbitrary
value, the algorithm leads to undefined behavior. The worst that can
happen is that syslogd will not *try* to send the log to this host,
even if it is reachable, as the f_unreach field is also a "flag".

Also, all this won't work very well unless something like this is
applied afterwards (warning: cut'n'paste):

- --- /usr/src/usr.sbin/syslogd/syslogd.c Thu Oct 4 00:06:49 2001
+++ syslogd.c Fri Oct 5 15:31:28 2001
@@ -1064,6 +1064,8 @@
                     f->f_type = F_UNUSED;
                     break;
                 }
+            } else { /* no error from sendto */
+                f->f_unreach = 0; /* clear unreach error flag */
             }
         }
         break;

I could swear I saw my router's patched syslogd recover from "host
down" even without that extra patch (using only the original PR's
patch), but anyways..

For convenience, I also put here a full patch that has the dprintf's
removed:

- --- /usr/src/usr.sbin/syslogd/syslogd.c.orig Wed Oct 3 15:56:32 2001
+++ syslogd.c Fri Oct 5 15:37:57 2001
@@ -142,6 +142,9 @@
 #define MARK        0x008   /* this message is a mark */
 #define ISKERNEL    0x010   /* kernel generated message */
 
+#define DELAY_MUL   2       /* delay multiplier */
+#define DELAY_INIT  30      /* initial delay in seconds */
+
 /*
  * This structure represents the files that will have log
  * copies printed.
@@ -159,6 +162,9 @@
 #define PRI_EQ  0x2
 #define PRI_GT  0x4
     char    *f_program;     /* program this applies to */
+    /* should this be part of the union? */
+    time_t  f_unreach;      /* time since last unreach */
+    time_t  f_delay;        /* backoff time */
     union {
         char    f_uname[MAXUNAMES][UT_NAMESIZE+1];
         struct {
@@ -999,6 +1005,11 @@
             l = MAXLINE;
 
         if (finet) {
+            /* XXX: must make sure this is initialized to 0 */
+            if ((f->f_unreach) &&
+                ((now - f->f_unreach) < f->f_delay)) {
+                break; /* do not send */
+            }
             for (r = f->f_un.f_forw.f_addr; r; r = r->ai_next) {
                 for (i = 0; i < *finet; i++) {
 #if 0
@@ -1019,10 +1030,34 @@
                     }
                     if (lsent != l) {
                         int e = errno;
- -                        (void)close(f->f_file);
- -                        errno = e;
- -                        f->f_type = F_UNUSED;
                         logerror("sendto");
+                        errno = e;
+                        switch (errno) {
+                        case EHOSTUNREACH:
+                        case EHOSTDOWN:
+                            if (f->f_unreach)
+                                f->f_delay *= DELAY_MUL;
+                            else {
+                                f->f_unreach = now;
+                                f->f_delay = DELAY_INIT;
+                            }
+                            break;
+                        /* case EBADF: */
+                        /* case EACCES: */
+                        /* case ENOTSOCK: */
+                        /* case EFAULT: */
+                        /* case EMSGSIZE: */
+                        /* case EAGAIN: */
+                        /* case ENOBUFS: */
+                        /* case ECONNREFUSED: */
+                        default:
+                            (void)close(f->f_file);
+                            errno = e;
+                            f->f_type = F_UNUSED;
+                            break;
+                        }
+                    } else { /* no error from sendto */
+                        f->f_unreach = 0; /* clear unreach error flag */
                     }
                 }
                 break;

A.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (FreeBSD)
Comment: Pour information voir http://www.gnupg.org

iEYEARECAAYFAju+DOUACgkQttcWHAnWiGcsVgCeI7L2H3xD5GRN65mDdW4ZLvwe
sfkAnAjESC9zhuC7wlobnXpm14MZ00Ik
=h9I5
-----END PGP SIGNATURE-----