Date:      Fri, 5 Oct 2001 15:41:27 -0400
From:      The Anarcat <anarcat@anarcat.dyndns.org>
To:        gnats-admin@FreeBSD.org, freebsd-bugs@FreeBSD.org
Cc:        "Crist J. Clark" <cristjc@earthlink.net>, freebsd-hackers@freebsd.org
Subject:   Re: bin/31029: syslogd remote logging back down
Message-ID:  <20011005154126.B7418@shall.anarcat.dyndns.org>
In-Reply-To: <200110040830.f948U1O13043@freefall.freebsd.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[cc'ed to -hackers since it is what I think is the relevant list, please
correct me if I'm wrong]

[[note that I welcome style and function comments]]

Note that this fix "works" in the sense that syslogd will eventually be
able to reach the remote server again, but the timing of the checks
depends on the frequency of the outgoing log lines.

For example, say the initial delay is 30 seconds. If the algorithm were
"clean", the remote server would be checked at 30, 30+60=90, 90+120=210,
etc. seconds after the failure. But the check is only made when a log
has to be sent out. So if no log is sent out for (say) 5 minutes, the
delay (f_delay) is not adjusted to account for the time spent waiting
for a new log to send.
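
To make the "clean" schedule above concrete, here is a minimal sketch
that computes those offsets. The helper name ideal_retry_offset() is
hypothetical and is not part of syslogd; only DELAY_INIT and DELAY_MUL
come from the patch below.

```c
#include <time.h>

#define DELAY_MUL	2	/* delay multiplier, as in the patch */
#define DELAY_INIT	30	/* initial delay in seconds, as in the patch */

/*
 * Hypothetical helper (not in syslogd): seconds after the first failure
 * at which the n-th retry (n >= 1) of a "clean" schedule would happen,
 * each wait being DELAY_MUL times longer than the previous one.
 */
static time_t
ideal_retry_offset(int n)
{
	time_t offset = 0;
	time_t delay = DELAY_INIT;
	int i;

	for (i = 0; i < n; i++) {
		offset += delay;	/* wait out the current delay */
		delay *= DELAY_MUL;	/* then back off further */
	}
	return (offset);
}
```

With the values above, ideal_retry_offset(1), (2) and (3) give 30, 90
and 210, which is the sequence quoted in the example.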

This might not be undesirable behavior, BTW. :) That way, if a remote
host doesn't usually receive many logs, it is more likely to be
checked each time a new log is sent. On the other hand, if a host
receives many logs, most will be discarded, and the host will be checked
at predictable, exponentially expanding intervals until it becomes
available again.

Also, note that the patch adds two more time_t fields to the filed
struct.

I don't know exactly how the struct filed* pointers are initialized
before they reach the fprintlog function. The patch has to assume that
f_unreach is initialized to 0. If the field holds an arbitrary value,
the algorithm behaves unpredictably. The worst that can happen is that
syslogd will not even *try* to send the log to this host, even if it is
reachable, because the f_unreach field also serves as a "flag".
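
One way the zero-initialization assumption could be guaranteed is to
allocate each entry with calloc(). This is only a sketch: struct
filed_sketch and filed_sketch_alloc() are hypothetical stand-ins, and I
have not checked how syslogd really allocates its struct filed entries.

```c
#include <stdlib.h>
#include <time.h>

/* Reduced stand-in for struct filed; only the two new fields are shown. */
struct filed_sketch {
	time_t	f_unreach;	/* 0 means "not currently unreachable" */
	time_t	f_delay;	/* backoff time in seconds */
};

/*
 * Hypothetical allocator: calloc() zero-fills the memory, so f_unreach
 * starts out as 0 and the first send is never suppressed by accident.
 * Returns NULL if the allocation fails.
 */
static struct filed_sketch *
filed_sketch_alloc(void)
{
	return (calloc(1, sizeof(struct filed_sketch)));
}
```

An explicit memset(f, 0, sizeof(*f)) before the entry is filled in would
have the same effect.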

Also, all this won't work very well unless something like the following
is applied afterwards:

(warning: cut'n'paste)

- --- /usr/src/usr.sbin/syslogd/syslogd.c Thu Oct  4 00:06:49 2001
+++ syslogd.c   Fri Oct  5 15:31:28 2001
@@ -1064,6 +1064,8 @@
                                        f->f_type = F_UNUSED;
                                        break;
                                }
+                       } else { /* no error from sendto */
+                               f->f_unreach = 0; /* clear unreach error flag */
                        }
                }
                break;

I could swear I saw my router's patched syslogd recover from "host down"
even without that extra patch (using only the original PR's patch), but
anyway...

For convenience, I also put here a full patch that has the dprintf's
removed:

- --- /usr/src/usr.sbin/syslogd/syslogd.c.orig	Wed Oct  3 15:56:32 2001
+++ syslogd.c	Fri Oct  5 15:37:57 2001
@@ -142,6 +142,9 @@
 #define MARK		0x008	/* this message is a mark */
 #define ISKERNEL	0x010	/* kernel generated message */
 
+#define DELAY_MUL	2       /* delay multiplier */
+#define DELAY_INIT	30	/* initial delay in seconds */
+
 /*
  * This structure represents the files that will have log
  * copies printed.
@@ -159,6 +162,9 @@
 #define PRI_EQ	0x2
 #define PRI_GT	0x4
 	char	*f_program;		/* program this applies to */
+	/* should this be part of the union? */
+	time_t  f_unreach;		/* time of first unreach, 0 if none */
+	time_t  f_delay;		/* backoff time */
 	union {
 		char	f_uname[MAXUNAMES][UT_NAMESIZE+1];
 		struct {
@@ -999,6 +1005,11 @@
 			l = MAXLINE;
 
 		if (finet) {
+			/* XXX: must make sure this is initialized to 0 */
+			if ((f->f_unreach) && 
+			   ((now - f->f_unreach) < f->f_delay)) {
+				break; /* do not send */
+			}
 			for (r = f->f_un.f_forw.f_addr; r; r = r->ai_next) {
 				for (i = 0; i < *finet; i++) {
 #if 0 
@@ -1019,10 +1030,34 @@
 			}
 			if (lsent != l) {
 				int e = errno;
- -				(void)close(f->f_file);
- -				errno = e;
- -				f->f_type = F_UNUSED;
 				logerror("sendto");
+				errno = e;
+				switch (errno) {
+				case EHOSTUNREACH:
+				case EHOSTDOWN:
+					if (f->f_unreach)
+						f->f_delay *= DELAY_MUL;
+					else {
+						f->f_unreach = now;
+						f->f_delay = DELAY_INIT;
+					}
+					break;
+				/* case EBADF: */
+				/* case EACCES: */
+				/* case ENOTSOCK: */
+				/* case EFAULT: */
+				/* case EMSGSIZE: */
+				/* case EAGAIN: */
+				/* case ENOBUFS: */
+				/* case ECONNREFUSED: */
+				default:
+					(void)close(f->f_file);
+					errno = e;
+					f->f_type = F_UNUSED;
+					break;
+				}
+			} else { /* no error from sendto */
+				f->f_unreach = 0; /* clear unreach error flag */
 			}
 		}
 		break;
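
For anyone who wants to play with the timing without rebuilding
syslogd, here is a standalone sketch of the same gating and backoff
logic. The names struct fwd_state, fwd_should_send(), fwd_record() and
fwd_simulate() are hypothetical; only f_unreach, f_delay, DELAY_INIT
and DELAY_MUL mirror the patch, and the simulation assumes one log line
per second towards a host that stays down.

```c
#include <time.h>

#define DELAY_MUL	2	/* delay multiplier */
#define DELAY_INIT	30	/* initial delay in seconds */

struct fwd_state {
	time_t	f_unreach;	/* time of first unreach error, 0 if none */
	time_t	f_delay;	/* current backoff time in seconds */
};

/* Return 1 if a send should be attempted at time now, 0 if suppressed. */
static int
fwd_should_send(const struct fwd_state *f, time_t now)
{
	if (f->f_unreach && (now - f->f_unreach) < f->f_delay)
		return (0);
	return (1);
}

/* Record the outcome of an attempt: ok != 0 means sendto succeeded. */
static void
fwd_record(struct fwd_state *f, time_t now, int ok)
{
	if (ok) {
		f->f_unreach = 0;	/* clear unreach error flag */
		return;
	}
	if (f->f_unreach)
		f->f_delay *= DELAY_MUL;
	else {
		f->f_unreach = now;
		f->f_delay = DELAY_INIT;
	}
}

/*
 * Simulate one log line per second against a host that is always
 * unreachable, starting at time start (must be nonzero, since 0 means
 * "reachable"). Stores the offsets of the first n attempts, in seconds
 * since start, and returns how many were stored.
 */
static int
fwd_simulate(time_t start, time_t *attempts, int n)
{
	struct fwd_state f = { 0, 0 };
	time_t now;
	int count = 0;

	for (now = start; count < n; now++) {
		if (!fwd_should_send(&f, now))
			continue;	/* log line is dropped */
		attempts[count++] = now - start;
		fwd_record(&f, now, 0);
	}
	return (count);
}
```

If this sketch mirrors the patch faithfully, a steady stream of logs
gets attempts at 0, 30, 60 and 120 seconds, because f_unreach keeps the
time of the first failure while f_delay doubles. Resetting f_unreach to
now on every failed retry would be one way to get the 30, 90, 210
spacing instead.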

A.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (FreeBSD)
Comment: Pour information voir http://www.gnupg.org

iEYEARECAAYFAju+DOUACgkQttcWHAnWiGcsVgCeI7L2H3xD5GRN65mDdW4ZLvwe
sfkAnAjESC9zhuC7wlobnXpm14MZ00Ik
=h9I5
-----END PGP SIGNATURE-----
