From owner-freebsd-current@freebsd.org Mon Nov 19 09:27:07 2018 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E0216113A011 for ; Mon, 19 Nov 2018 09:27:06 +0000 (UTC) (envelope-from sg@efficientip.com) Received: from mail-ed1-x52e.google.com (mail-ed1-x52e.google.com [IPv6:2a00:1450:4864:20::52e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8E48E6D004 for ; Mon, 19 Nov 2018 09:27:05 +0000 (UTC) (envelope-from sg@efficientip.com) Received: by mail-ed1-x52e.google.com with SMTP id h15so19936451edb.4 for ; Mon, 19 Nov 2018 01:27:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=efficientip.com; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=7Xuoe2KTxGwU3L2oibGR44OPHQfP9SR6abjL6gKpjss=; b=JlLPVWGyNyIsju4UZoUnGkqYpQx7ee0YkCWsJexazucLW7O6xl3Lp3xD4Ryx3q5j5n 0YKLWE7L1UMGYEMxC91cfcjPVfPfpS74Eeu51wUPhzNoTsKN7kAlBOsCUmc/dmNAgTc2 9AvNpk3fWAfllk1lJ25D9oLXbgZdC5RyYQx1Q= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=7Xuoe2KTxGwU3L2oibGR44OPHQfP9SR6abjL6gKpjss=; b=kf7CjR4RaA5t5ZYh9AGtJeW6oV62bj3Pu5wC0LSCsBqQ24XDYo5F0Jw6gyofwlmnDj RZmDX41kWS8vapnQvWTb8IDnVKxah09lihxKaoQPZD/J2ww50OLNfyY7w4Vi3uBGr1s4 f0lMMEA8jnrN9rAE8/P5W4xwnVaRXOaNFesnsFume/4iwd6iAmezTTe7Kg7mPFVvSZ2a LlflUvpWoJ07g2RljDH2v2Nrw1v3IJWxOOPL5J4qm+nbeoObXFqGlBu4rEiHStIVV6o6 ieTGWNb7qapbcTcNrkaPiY/VFtk06LaL6fYqvIdJZ3pWA0CtCmGfTnZsDgMDWIQZOSY9 N2gg== X-Gm-Message-State: AGRZ1gKRU8HRY/u8pxllP5JJFOcQ+SeeWwKP9wTEe+FytDs9KKl8WSMI N/FHF3gTkQlsceQ64a7D+bZjolfj/ogTIB+qc+WZuAbdAjM= X-Google-Smtp-Source: AJdET5dusxvY+8CJoIOEoi+ZHOmyCxRGTmttEs6rwqa2naQjo7nCNGyp0sP4Gl5zA3lwzOam6s6XTngxo6EU6bQjNkI= X-Received: by 2002:a50:b68a:: with SMTP id d10mr19552231ede.16.1542619624111; Mon, 19 Nov 2018 01:27:04 -0800 (PST) MIME-Version: 1.0 References: <20181107043503.GB30861@raichu> <20181115221019.GA2514@raichu> <20181116154210.GB17379@raichu> <20181118003554.GC2799@raichu> In-Reply-To: <20181118003554.GC2799@raichu> From: Sylvain GALLIANO Date: Mon, 19 Nov 2018 10:26:51 +0100 Message-ID: Subject: Re: Panic on kern_event.c To: markj@freebsd.org Cc: freebsd-current@freebsd.org X-Rspamd-Queue-Id: 8E48E6D004 X-Spamd-Result: default: False [-4.54 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-0.99)[-0.992,0]; R_DKIM_ALLOW(-0.20)[efficientip.com]; FROM_HAS_DN(0.00)[]; NEURAL_HAM_SHORT(-0.97)[-0.974,0]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; TO_DN_NONE(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[freebsd-current@freebsd.org]; DMARC_NA(0.00)[efficientip.com]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[efficientip.com:+]; RCPT_COUNT_TWO(0.00)[2]; MIME_BASE64_TEXT(0.10)[]; RCVD_IN_DNSWL_NONE(0.00)[e.2.5.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.5.4.1.0.0.a.2.list.dnswl.org : 127.0.5.0]; R_SPF_NA(0.00)[]; MX_GOOD(-0.01)[alt1.aspmx.l.google.com,aspmx.l.google.com,aspmx5.googlemail.com,aspmx4.googlemail.com,aspmx3.googlemail.com,alt2.aspmx.l.google.com,aspmx2.googlemail.com]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; IP_SCORE(-2.37)[ip: (-8.31), ipnet: 2a00:1450::/32(-1.73), asn: 15169(-1.70), country: US(-0.09)]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]; RCVD_COUNT_TWO(0.00)[2] X-Rspamd-Server: mx1.freebsd.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Nov 2018 09:27:07 -0000 With this latest patch, after stressing syslog-ng few minutes, it do not log anymore and a simple kill do not work (I have to do kill -9) Le dim. 18 nov. 2018 =C3=A0 01:35, Mark Johnston a =C3= =A9crit : > On Fri, Nov 16, 2018 at 05:56:54PM +0100, Sylvain GALLIANO wrote: > > Le ven. 16 nov. 2018 =C3=A0 16:42, Mark Johnston a = =C3=A9crit > : > > > > > On Fri, Nov 16, 2018 at 03:47:39PM +0100, Sylvain GALLIANO wrote: > > > > Le jeu. 15 nov. 2018 =C3=A0 23:10, Mark Johnston a > =C3=A9crit > > > : > > > > > > > > > On Thu, Nov 08, 2018 at 05:05:03PM +0100, Sylvain GALLIANO wrote: > > > > > > Hi, > > > > > > > > > > > > I replaced > > > > > > << printf("XXX knote %p already in tailq status:%x kq_count:%d > [%p > > > %p] > > > > > > > > > > > > > > > %u\n",kn,kn->kn_status,kq->kq_count,kn->kn_tqe.tqe_next,kn->kn_tqe.tqe_pr= ev,__LINE__); > > > > > > by > > > > > > >> panic("XXX knote %p already in tailq status:%x kq_count:%d > [%p > > > %p] > > > > > > > > > > > > > > > %u\n",kn,kn->kn_status,kq->kq_count,kn->kn_tqe.tqe_next,kn->kn_tqe.tqe_pr= ev,__LINE__); > > > > > > > > > > > > Here is the stack during panic: > > > > > > panic: XXX knote 0xfffff801e1c6ddc0 already in tailq status:1 > > > kq_count:2 > > > > > > [0 0xfffff8000957a978] 2671 > > > > > > > > > > > Could you please give the following patch a try? > > > > > > > > > > If possible, could you also ktrace one of the active syslog-ng > > > processes > > > > > for some time, perhaps 15 seconds, and share the kdump? I have > been > > > > > trying to reproduce the problem without any luck. > > > > > > > > > Unfortunately patched kernel is not stable: > > > > - some processes run at 100% CPU (STOP state) and cannot be killed > > > > - sometime the system completely freeze (need a hard reboot) > > Could you please give this updated a patch a try? > > diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c > index d9c670e29d60..f989d612bac4 100644 > --- a/sys/kern/kern_event.c > +++ b/sys/kern/kern_event.c > @@ -224,6 +224,7 @@ SYSCTL_UINT(_kern, OID_AUTO, kq_calloutmax, CTLFLAG_R= W, > #define KQ_LOCK(kq) do { \ > mtx_lock(&(kq)->kq_lock); \ > } while (0) > +#define KQ_LOCKPTR(kq) (&(kq)->kq_lock) > #define KQ_FLUX_WAKEUP(kq) do { > \ > if (((kq)->kq_state & KQ_FLUXWAIT) =3D=3D KQ_FLUXWAIT) { = \ > (kq)->kq_state &=3D ~KQ_FLUXWAIT; = \ > @@ -687,8 +688,11 @@ filt_timerexpire(void *knx) > struct kq_timer_cb_data *kc; > > kn =3D knx; > + KQ_OWNED(kn->kn_kq); > kn->kn_data++; > - KNOTE_ACTIVATE(kn, 0); /* XXX - handle locking */ > + > + if (!kn_in_flux(kn) || (kn->kn_status & KN_SCAN) !=3D 0) > + KNOTE_ACTIVATE(kn, 1); > > if ((kn->kn_flags & EV_ONESHOT) !=3D 0) > return; > @@ -753,7 +757,7 @@ filt_timerattach(struct knote *kn) > kn->kn_flags |=3D EV_CLEAR; /* automatically set */ > kn->kn_status &=3D ~KN_DETACHED; /* knlist_add clears it= */ > kn->kn_ptr.p_v =3D kc =3D malloc(sizeof(*kc), M_KQUEUE, M_WAITOK)= ; > - callout_init(&kc->c, 1); > + callout_init_mtx(&kc->c, KQ_LOCKPTR(kn->kn_kq), 0); > filt_timerstart(kn, to); > > return (0); > @@ -772,8 +776,10 @@ filt_timerstart(struct knote *kn, sbintime_t to) > kc->next =3D to + sbinuptime(); > kc->to =3D to; > } > + KQ_LOCK(kn->kn_kq); > callout_reset_sbt_on(&kc->c, kc->next, 0, filt_timerexpire, kn, > PCPU_GET(cpuid), C_ABSOLUTE); > + KQ_UNLOCK(kn->kn_kq); > } > > static void > @@ -826,7 +832,6 @@ filt_timertouch(struct knote *kn, struct kevent *kev, > u_long type) > KQ_LOCK(kq); > if (kn->kn_status & KN_QUEUED) > knote_dequeue(kn); > - > kn->kn_status &=3D ~KN_ACTIVE; > kn->kn_data =3D 0; > KQ_UNLOCK(kq); > @@ -1843,12 +1848,13 @@ kqueue_scan(struct kqueue *kq, int maxevents, > struct kevent_copyops *k_ops, > } > > TAILQ_INSERT_TAIL(&kq->kq_head, marker, kn_tqe); > + marker->kn_status |=3D KN_QUEUED; > influx =3D 0; > while (count) { > KQ_OWNED(kq); > kn =3D TAILQ_FIRST(&kq->kq_head); > > - if ((kn->kn_status =3D=3D KN_MARKER && kn !=3D marker) || > + if (((kn->kn_status & KN_MARKER) !=3D 0 && kn !=3D marker= ) || > kn_in_flux(kn)) { > if (influx) { > influx =3D 0; > @@ -1861,24 +1867,21 @@ kqueue_scan(struct kqueue *kq, int maxevents, > struct kevent_copyops *k_ops, > } > > TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); > - if ((kn->kn_status & KN_DISABLED) =3D=3D KN_DISABLED) { > - kn->kn_status &=3D ~KN_QUEUED; > - kq->kq_count--; > + kn->kn_status &=3D ~KN_QUEUED; > + if ((kn->kn_status & KN_DISABLED) =3D=3D KN_DISABLED) > continue; > - } > if (kn =3D=3D marker) { > KQ_FLUX_WAKEUP(kq); > if (count =3D=3D maxevents) > goto retry; > goto done; > } > + kq->kq_count--; > KASSERT(!kn_in_flux(kn), > ("knote %p is unexpectedly in flux", kn)); > > if ((kn->kn_flags & EV_DROP) =3D=3D EV_DROP) { > - kn->kn_status &=3D ~KN_QUEUED; > kn_enter_flux(kn); > - kq->kq_count--; > KQ_UNLOCK(kq); > /* > * We don't need to lock the list since we've > @@ -1888,9 +1891,7 @@ kqueue_scan(struct kqueue *kq, int maxevents, struc= t > kevent_copyops *k_ops, > KQ_LOCK(kq); > continue; > } else if ((kn->kn_flags & EV_ONESHOT) =3D=3D EV_ONESHOT)= { > - kn->kn_status &=3D ~KN_QUEUED; > kn_enter_flux(kn); > - kq->kq_count--; > KQ_UNLOCK(kq); > /* > * We don't need to lock the list since we've > @@ -1910,10 +1911,9 @@ kqueue_scan(struct kqueue *kq, int maxevents, > struct kevent_copyops *k_ops, > if (kn->kn_fop->f_event(kn, 0) =3D=3D 0) { > KQ_LOCK(kq); > KQ_GLOBAL_UNLOCK(&kq_global, haskqglobal)= ; > - kn->kn_status &=3D ~(KN_QUEUED | KN_ACTIV= E | > - KN_SCAN); > + /* XXXMJ races with EVFILT_TIMER > activation. */ > + kn->kn_status &=3D ~(KN_ACTIVE | KN_SCAN)= ; > kn_leave_flux(kn); > - kq->kq_count--; > kn_list_unlock(knl); > influx =3D 1; > continue; > @@ -1937,11 +1937,10 @@ kqueue_scan(struct kqueue *kq, int maxevents, > struct kevent_copyops *k_ops, > } > if (kn->kn_flags & EV_DISPATCH) > kn->kn_status |=3D KN_DISABLED; > - kn->kn_status &=3D ~(KN_QUEUED | KN_ACTIV= E); > - kq->kq_count--; > - } else > - TAILQ_INSERT_TAIL(&kq->kq_head, kn, > kn_tqe); > - > + kn->kn_status &=3D ~KN_ACTIVE; > + } else if ((kn->kn_status & KN_QUEUED) =3D=3D 0) > + knote_enqueue(kn); > + > kn->kn_status &=3D ~KN_SCAN; > kn_leave_flux(kn); > kn_list_unlock(knl); >