From nobody Tue Nov 9 09:58:30 2021 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 7FD53185060D for ; Tue, 9 Nov 2021 09:58:33 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from smtp.freebsd.org (smtp.freebsd.org [IPv6:2610:1c1:1:606c::24b:4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4HpNj53FwQz3HTM; Tue, 9 Nov 2021 09:58:33 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from [192.168.0.88] (unknown [195.64.148.76]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) (Authenticated sender: avg/mail) by smtp.freebsd.org (Postfix) with ESMTPSA id DFF4B23AF9; Tue, 9 Nov 2021 09:58:32 +0000 (UTC) (envelope-from avg@FreeBSD.org) From: Andriy Gapon To: Konstantin Belousov , FreeBSD Current References: <20191022104434.GM73312@kib.kiev.ua> <3a67f9a9-31cf-5814-4a68-8bdd6063b21e@FreeBSD.org> <20191022131633.GN73312@kib.kiev.ua> <9c131a2a-cc94-4d93-1ba8-595c0151e366@FreeBSD.org> Subject: Re: thread on sleepqueue does not wake up after timeout Message-ID: <32fe4f76-155b-8b99-5782-4daafd4219d5@FreeBSD.org> Date: Tue, 9 Nov 2021 11:58:30 +0200 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:78.0) Gecko/20100101 Firefox/78.0 Thunderbird/78.14.0 List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 In-Reply-To: <9c131a2a-cc94-4d93-1ba8-595c0151e366@FreeBSD.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-ThisMailContainsUnwantedMimeParts: N Two years have passed since my original report, some things have changed, but we are still seeing the problem. To recap, the problem is that sometimes a callout for a sleepq timeout would fire to early. The code in sleepq_timeout would ignore such a wake-up. And, as there would not be another callout, there will never be a wake-up. This has been observed on VMWare only. We have seen it on different versions of ESXi (different environments, upgrades, etc) starting with 5.0, through 5.5 and 6.0, and now on 6.5. We use 12.2 now and are still seeing the problem occasionally. On advice from Kostik I added some diagnostic code. The code checks for the too early condition and panics when it happens. It also provides some additional information. Here is an example of such a panic. Unread portion of the kernel message buffer: panic: too early sleepq timeout, 190543869738008 > 190543869603412 (190543869798505) cpuid = 1 time = 1636195325 (kgdb) bt #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=153080096) at /usr/src/sys/kern/kern_shutdown.c:447 #2 0xffffffff8038f1cc in db_fncall_generic (addr=, nargs=0, args=, rv=) at /usr/src/sys/ddb/db_command.c:622 #3 db_fncall (dummy1=, dummy2=, dummy3=, dummy4=) at /usr/src/sys/ddb/db_command.c:670 #4 0xffffffff8038eaf6 in db_command (last_cmdp=, cmd_table=, dopager=0) at /usr/src/sys/ddb/db_command.c:494 #5 0xffffffff80393a68 in db_script_exec (scriptname=, warnifnotfound=) at /usr/src/sys/ddb/db_script.c:304 #6 0xffffffff80393892 in db_script_kdbenter (eventname=) at /usr/src/sys/ddb/db_script.c:326 #7 0xffffffff80391ac3 in db_trap (type=, code=) at /usr/src/sys/ddb/db_main.c:251 #8 0xffffffff807821e2 in kdb_trap (type=3, code=0, tf=0xfffffe01091fd410) at /usr/src/sys/kern/subr_kdb.c:700 #9 0xffffffff809d870e in trap (frame=0xfffffe01091fd410) at /usr/src/sys/amd64/amd64/trap.c:583 #10 #11 kdb_enter (why=0xffffffff80b17af9 "panic", msg=0xffffffff80b17af9 "panic") at /usr/src/sys/kern/subr_kdb.c:486 #12 0xffffffff8073ddce in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:975 #13 0xffffffff8073dc23 in panic (fmt=0xffffffff81178120 "") at /usr/src/sys/kern/kern_shutdown.c:909 #14 0xffffffff8078f09c in sleepq_timeout (arg=0xfffff8005b6a1000) at /usr/src/sys/kern/subr_sleepqueue.c:1042 #15 0xffffffff8075594a in softclock_call_cc (c=0xfffff8005b6a1470, cc=0xffffffff811a0580 , now=190543869798505, direct=1) at /usr/src/sys/kern/kern_timeout.c:768 #16 0xffffffff8075567c in callout_process (now=190543869798505) at /usr/src/sys/kern/kern_timeout.c:511 #17 0xffffffff80a3d455 in handleevents (now=190543869798505, fake=0) at /usr/src/sys/kern/kern_clocksource.c:213 #18 0xffffffff80a3da46 in timercb (et=0xffffffff81251238 , arg=) at /usr/src/sys/kern/kern_clocksource.c:357 #19 0xffffffff80a680cb in lapic_handle_timer (frame=0xfffffe01091fd7d0) at /usr/src/sys/x86/x86/local_apic.c:1339 #20 #21 0xffffffff81d90654 in buf_hash_find ... (kgdb) fr 18 #18 0xffffffff80a3da46 in timercb (et=0xffffffff81251238 , arg=) at /usr/src/sys/kern/kern_clocksource.c:357 357 /usr/src/sys/kern/kern_clocksource.c: No such file or directory. (kgdb) i loc state = next = 0xfffffe00015b7b80 now = 190543869798505 bcast = cpu = (kgdb) fr 15 #15 0xffffffff8075594a in softclock_call_cc (c=0xfffff8005b6a1470, cc=0xffffffff811a0580 , now=190543869798505, direct=1) at /usr/src/sys/kern/kern_timeout.c:768 768 /usr/src/sys/kern/kern_timeout.c: No such file or directory. (kgdb) p *c $1 = {c_links = {le = {le_next = 0x0, le_prev = 0xfffffe000029f398}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xfffffe000029f398}}, c_time = 190543869738008, c_precision = 2684354, c_arg = 0xfffff8005b6a1000, c_func = 0xffffffff8078f000 , c_lock = 0x0, c_flags = 2, c_iflags = 272, c_cpu = 1, c_exec_time = 190543869798505, c_lines = {u128 = 5514281696343066435261632100963648521, u16 = {1033, 755, 1225, 628, 409, 651, 788, 1062}}} Here is an explanation for the numbers reproted in the panic message (sorted from earliest to latest): 190543869603412 - 'now' as seen in sleepq_timeout(), returned by sbinuptime(); 190543869738008 - td_sleeptimo, also c_time in the callout; 190543869798505 - 'now' as captured when the LAPIC timer fired, seen in the stack trace and also recorded as c_exec_time in the callout. Just in case, here is the code that printed the panic message: panic("too early sleepq timeout, %jd > %jd (%jd)", (intmax_t)td->td_sleeptimo, (intmax_t)sbinuptime(), (intmax_t)td->td_slpcallout.c_exec_time); As can be seen, sbinuptime() in sleepq_timeout() returned a value smaller than what sbinuptime() returned in timercb(). It seems that the code in callout_process() was right to fire the callout because now > td_sleeptimo there. But sleepq_timeout() thought that it was premature as sbinuptime() < td_sleeptimo there. In the above case the eventtimer is obviously LAPIC, the timecounter is HPET. But we also saw the same issue when we changed the timecounter to ACPI-fast. I think that we haven't tried any other choices because: kern.timecounter.tc.ACPI-fast.quality: 900 kern.timecounter.tc.i8254.quality: 0 kern.timecounter.tc.HPET.quality: 950 kern.timecounter.tc.TSC-low.quality: -100 I see three possibilities of why sbinuptime() could go backwards: - a bug in FreeBSD time keeping code that gets triggered only under very specific conditions that don't normally happen - a bug in VMWare, e.g. in HPET and ACPI timer emulation - something exotic, like correct C code miscompiled to incorrect machine code or weird memory model violation or etc... I'd appreciate any suggestions on additional diagnostics to narrow down / rule out the possibilities. P.S. As a workaround I could modify sleepq_timeout() to get "current time" from c_exec_time (added by us) instead of sbinuptime(). c_exec_time is the value of 'now' in callout_process() when it decides that the callout should fire. But I'd like to get to the bottom of the issue. -- Andriy Gapon