Date: Wed, 11 Dec 2019 13:05:21 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Andriy Gapon
Cc: FreeBSD Current
Subject: Re: any scheduler/ipi/wakeup bug fixed in the last year?
Message-ID: <20191211110521.GT2744@kib.kiev.ua>
In-Reply-To: <7196727c-77bd-8c96-5194-3dcc7277d4bd@FreeBSD.org>

On Wed, Dec 11, 2019 at 12:48:36PM +0200, Andriy Gapon wrote:
>
> I am investigating a problem that originally looked like a ZFS I/O hang.
> But it quickly became obvious that the GEOM "up" queue was not being processed.
> (kgdb) p g_bio_run_up
> $54 = {bio_queue = {tqh_first = 0xfffff801d8627178, tqh_last =
> 0xfffff80134751658}, bio_queue_lock = {lock_object = {lo_name =
> 0xffffffff80ad11ab "bio queue", lo_flags = 16973824, lo_data = 0, lo_witness =
> 0x0}, mtx_lock = 0}, bio_queue_length = 19}
>
> The queue is unlocked and there are 19 bio-s on it.
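
For context, here is a minimal sketch of the hand-off being described, assuming the usual enqueue-then-wakeup versus check-then-msleep structure of g_io_deliver() and g_io_schedule_up(); the struct and function names (bioq_model, deliver_one, schedule_up) are invented for illustration and this is not the actual geom_io.c code. The point of the sketch is that, with this structure, an unlocked non-empty queue together with a sleeping g_up thread cannot be explained by a missed lock; the wakeup itself has to have gone missing.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>

/* Stand-in for the real g_bioq; the fields match the kgdb dump above. */
struct bioq_model {
	TAILQ_HEAD(, bio)	bio_queue;
	struct mtx		bio_queue_lock;
	int			bio_queue_length;
};

/* Producer side: roughly what g_io_deliver() does at the end. */
static void
deliver_one(struct bioq_model *q, struct bio *bp)
{
	mtx_lock(&q->bio_queue_lock);
	TAILQ_INSERT_TAIL(&q->bio_queue, bp, bio_queue);
	q->bio_queue_length++;
	mtx_unlock(&q->bio_queue_lock);
	wakeup(q);		/* must end up making the g_up thread run */
}

/* Consumer side: the g_up loop, i.e. the role of g_io_schedule_up(). */
static void
schedule_up(struct bioq_model *q)
{
	struct bio *bp;

	for (;;) {
		mtx_lock(&q->bio_queue_lock);
		bp = TAILQ_FIRST(&q->bio_queue);
		if (bp != NULL) {
			TAILQ_REMOVE(&q->bio_queue, bp, bio_queue);
			q->bio_queue_length--;
			mtx_unlock(&q->bio_queue_lock);
			biodone(bp);	/* complete the bio up to its consumer */
			continue;
		}
		/* Queue empty: drop the lock and go to sleep atomically. */
		msleep(q, &q->bio_queue_lock, PRIBIO | PDROP, "-", 0);
	}
}

In the state captured above the producer side has clearly done its part (19 bios queued, mutex released), so the open question is why the wakeup never got the sleeping thread onto a CPU; the rest of the quoted message digs into exactly that.
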
> At the same time:
> (kgdb) tid 100125
> (kgdb) bt
> #0 sched_switch (td=0xfffff80111b23000, newtd=0xfffff801119d2000,
> flags=) at /usr/src/sys/kern/sched_ule.c:1997
> #1 0xffffffff80705405 in mi_switch (flags=, newtd=0x0) at
> /usr/src/sys/kern/kern_synch.c:436
> #2 0xffffffff8074844a in sleepq_wait (wchan=, pri=)
> at /usr/src/sys/kern/subr_sleepqueue.c:694
> #3 0xffffffff80704ed6 in _sleep (ident=0xffffffff81233d68 ,
> lock=0xffffffff810d72e0 , priority=,
> wmesg=0xffffffff80b417e4 "-", sbt=0, pr=0, flags=256) at
> /usr/src/sys/kern/kern_synch.c:216
> #4 0xffffffff8067713c in g_io_schedule_up (tp=) at
> /usr/src/sys/geom/geom_io.c:908
> #5 0xffffffff8067772d in g_up_procbody (arg=) at
> /usr/src/sys/geom/geom_kern.c:99
> #6 0xffffffff806c64c1 in fork_exit (callout=0xffffffff806776c0 ,
> arg=0x0, frame=0xfffffe014cc87ac0) at /usr/src/sys/kern/kern_fork.c:1042
>
> The "g_up" thread is sleeping as if the queue was empty.
> The code in g_io_schedule_up() and g_io_deliver() is obviously correct with
> respect to synchronizing the queue access and wait/wakeup.
> So, there must be something deeper.
>
> I examined the struct thread and the related scheduling objects:
> (kgdb) p *td
> $57 = {td_lock = 0xffffffff810f3a00 , td_proc =
> 0xfffff801119cd590, td_plist = {tqe_next = 0xfffff80111b1f5e0, tqe_prev =
> 0xfffff80111b235f0}, td_runq = {tqe_next = 0x0,
> tqe_prev = 0xffffffff810f3bd8 }, td_slpq = {tqe_next = 0x0,
> tqe_prev = 0xfffff80100050280}, td_lockq = {tqe_next = 0x0, tqe_prev =
> 0xfffffe018e443998}, td_hash = {le_next = 0x0, le_prev = 0xfffffe014bab68e8},
> td_cpuset = 0xfffff80111b3a618, td_domain = {dr_policy = 0xffffffff810d78d8
> , dr_iterator = 0}, td_sel = 0x0, td_sleepqueue =
> 0xfffff80100050280, td_turnstile = 0xfffff801a7ed8a80, td_rlqe = 0x0,
> td_umtxq = 0xfffff80111b13e80, td_tid = 100125, td_sigqueue = {sq_signals =
> {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_ptrace = {__bits
> = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0,
> tqh_last = 0xfffff80111b230d8}, sq_proc = 0xfffff801119cd590, sq_flags =
> 1}, td_lend_user_pri = 255 '\377', td_flags = 4, td_inhibitors = 0, td_pflags =
> 2097152, td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0,
> td_wmesg = 0x0, td_owepreempt = 0 '\000', td_tsqueue = 0 '\000', td_locks = 0,
> td_rw_rlocks = 0, td_sx_slocks = 0, td_lk_slocks = 0, td_stopsched = 0,
> td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0},
> td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0, td_ucred =
> 0xfffff80100082b00, td_limit = 0xfffff80100082a00, td_slptick = 0, td_blktick =
> 0, td_swvoltick = -2139537593, td_swinvoltick = -2139537706, td_cow = 0,
> td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec
> = 0}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 0,
> ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0,
> ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 14113408,
> ru_nivcsw = 240828}, td_rux = {rux_runtime = 202213463115, rux_uticks = 0,
> rux_sticks = 10554, rux_iticks = 0, rux_uu = 0, rux_su = 36818497,
> rux_tu = 36818497}, td_incruntime = 46828278, td_runtime = 202260266673,
> td_pticks = 10557, td_sticks = 3, td_iticks = 0, td_uticks = 0, td_intrval = 0,
> td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_generation = 14354236,
> td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0}, td_xsig = 0,
> td_profil_addr = 0, td_profil_ticks = 0, td_name = "g_up", '\000' times>, td_fpop = 0x0, td_dbgflags = 0, td_si = {si_signo = 0, si_errno = 0,
> si_code = 0, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x0, si_value
> = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason =
> {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0},
> _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 0,
> __spare2__ = {0, 0, 0, 0, 0, 0, 0}}}}, td_ng_outbound = 0, td_osd = {osd_nslots
> = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}},
> td_map_def_user = 0x0, td_dbg_forked = 0, td_vp_reserv = 0, td_no_sleeping =
> 0, td_su = 0x0, td_sleeptimo = 0, td_rtcgen = 0, td_sigmask = {__bits = {0, 0,
> 0, 0}}, td_rqindex = 23 '\027', td_base_pri = 92 '\\',
> td_priority = 92 '\\', td_pri_class = 3 '\003', td_user_pri = 120 'x',
> td_base_user_pri = 120 'x', td_rb_list = 0, td_rbp_list = 0, td_rb_inact = 0,
> td_sa = {code = 0, callp = 0x0, args = {0 }, narg = 0},
> td_pcb = 0xfffffe014cc87b80, td_state = TDS_RUNQ, td_uretoff = {tdu_retval =
> {0, 0}, tdu_off = 0}, td_cowgen = 0, td_slpcallout = {c_links = {le = {le_next =
> 0x0, le_prev = 0x0}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0,
> tqe_prev = 0x0}}, c_time = 0, c_precision = 0, c_arg = 0x0, c_func =
> 0x0, c_lock = 0x0, c_flags = 0, c_iflags = 16, c_cpu = 0, c_exec_time = 0,
> c_lines = {u128 = 1481, u16 = {1481, 0, 0, 0, 0, 0, 0, 0}}},
> td_frame = 0xfffffe014cc87ac0, td_kstack_obj = 0xfffff80111b31e00, td_kstack =
> 18446741880269406208, td_kstack_pages = 16, td_critnest = 1, td_md =
> {md_spinlock_count = 1, md_saved_flags = 582, md_spurflt_addr = 0,
> md_invl_gen = {gen = 0, link = {le_next = 0x0, le_prev = 0x0}}}, td_ar =
> 0x0, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace =
> 0xfffff80111aef700, td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0,
> td_intr_frame = 0x0, td_rfppwait_p = 0x0, td_ma = 0x0, td_ma_cnt = 0,
> td_emuldata = 0x0, td_lastcpu = 3, td_oncpu = -1, td_lkpi_task = 0x0}
>
> (kgdb) p td->td_oncpu
> $58 = -1
>
> (kgdb) p *td->td_sleepqueue
> $61 = {sq_blocked = {{tqh_first = 0x0, tqh_last = 0xfffff80100050280},
> {tqh_first = 0x0, tqh_last = 0xfffff80100050290}}, sq_blockedcnt = {0, 0},
> sq_hash = {le_next = 0xfffff807d069c380,
> le_prev = 0xffffffff811c7c40 }, sq_free = {lh_first =
> 0x0}, sq_wchan = 0xffffffff81233d68 , sq_type = 0}
>
> (kgdb) p *(struct td_sched *)&td[1]
> $68 = {ts_runq = 0xffffffff810f3a60 , ts_flags = 2, ts_cpu = 3,
> ts_rltick = -2139537593, ts_slice = 0, ts_slptime = 4601740, ts_runtime = 30146,
> ts_ltick = -2139537587, ts_ftick = -2139547918, ts_ticks = 20239,
> ts_name = "g_up tid 100125", '\000' }
>
> (kgdb) p tdq_cpu[3]
> $70 = {tdq_lock = {lock_object = {lo_name = 0xffffffff810f4678
> "sched lock 3", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock =
> 0}, tdq_cg = 0xffffffff811ce8b0 , tdq_load = 1,
> tdq_cpu_idle = 1, tdq_sysload = 1, tdq_transferable = 1, tdq_switchcnt = 1,
> tdq_oldswitchcnt = 26, tdq_lowpri = 92 '\\', tdq_ipipending = 0 '\000', tdq_idx
> = 9 '\t', tdq_ridx = 9 '\t', tdq_realtime = { ...
>
> What I see here:
> - td_state = TDS_RUNQ -- so the thread is "kind of" woken up
> - td_oncpu = -1 -- but not quite
> - ts_cpu = 3 -- looks like it was placed on CPU3's run queue
> - tdq_cpu_idle = 1 -- the CPU still "runs" its idle thread
>
> So, if I am not confused, it appears like possibly a notification from a waking
> CPU to the woken CPU (CPU3) was never delivered.
> Potentially, a problem with cpu_idle_wakeup() ?
>
> I wonder if there have been any bug fixes in that area over the past year or so.
> Any help and pointers are welcome.
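
The combination described above (the thread sits on CPU 3's run queue while CPU 3 keeps running its idle thread) points at the notification step of ULE's remote wakeup. The sketch below is a simplified paraphrase, from memory, of what the tdq_notify() decision looked like in sched_ule.c of roughly this vintage; it is not a verbatim copy, so check the real source, but it shows where a wakeup can evaporate:

/*
 * Paraphrase of tdq_notify() (stable/12-era sched_ule.c), simplified and
 * renamed to make clear it is not the real function.  It runs on the
 * waking CPU after the woken thread has already been put on the remote
 * CPU's run queue.
 */
static void
tdq_notify_sketch(struct tdq *tdq, struct thread *td)
{
	struct thread *ctd;
	int cpu, pri;

	if (tdq->tdq_ipipending)	/* an IPI is believed to be in flight */
		return;
	cpu = td_get_sched(td)->ts_cpu;
	pri = td->td_priority;
	ctd = pcpu_find(cpu)->pc_curthread;
	if (!sched_shouldpreempt(pri, ctd->td_priority, 1))
		return;

	/*
	 * The idle thread looks at tdq_load without the tdq lock, so the
	 * run-queue update must be globally visible before tdq_cpu_idle
	 * is read; the ordering here is one of the delicate spots.
	 */
	atomic_thread_fence_seq_cst();

	if (TD_IS_IDLETHREAD(ctd)) {
		/*
		 * If the remote CPU is still spinning in its idle loop
		 * (tdq_cpu_idle == 0) it is expected to notice tdq_load on
		 * its own, and if the MD hook cpu_idle_wakeup() reports
		 * success no IPI is sent either.
		 */
		if (!tdq->tdq_cpu_idle || cpu_idle_wakeup(cpu))
			return;
	}
	tdq->tdq_ipipending = 1;
	ipi_cpu(cpu, IPI_PREEMPT);
}

Given td_state = TDS_RUNQ, ts_cpu = 3 and tdq_load = 1 in the dump, the enqueue side clearly happened, so any of the silent exits above (a stale tdq_ipipending, an idle loop that missed the load update, or a cpu_idle_wakeup() that claims success without actually waking the halted CPU) would leave the machine in exactly this state.
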
What is the value of tdq_ipipending?  See https://reviews.freebsd.org/D22758
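
For readers following along: tdq_ipipending is the per-CPU note meaning "a preemption IPI to this CPU is already on its way", and, as in the sketch above, the notify path sends no further IPI while it is set. A stripped-down, hypothetical version of that protocol (the names are invented, this is not the kernel's code) shows why a stuck or lost IPI turns later wakeups into no-ops:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the MD pieces, for illustration only. */
static void send_preempt_ipi(int cpu) { printf("IPI -> cpu%d\n", cpu); }
static void reschedule(void)          { printf("switch to queued thread\n"); }

/* Per-CPU note, playing the role of tdq_ipipending in struct tdq. */
struct cpu_note {
	atomic_bool ipi_pending;
};

/* Waking CPU: runs after the woken thread is already on the remote run queue. */
static void
notify(struct cpu_note *c, int cpu)
{
	if (atomic_load(&c->ipi_pending))
		return;		/* rely on the IPI thought to be in flight */
	atomic_store(&c->ipi_pending, true);
	send_preempt_ipi(cpu);
}

/* Remote CPU: the preemption IPI handler clears the note and reschedules. */
static void
ipi_handler(struct cpu_note *c)
{
	atomic_store(&c->ipi_pending, false);
	reschedule();
}

If the flag is observed set while no IPI is actually pending any more (or the IPI never reaches the handler), notify() silently does nothing and a runnable thread stays stranded on the run queue of an idle CPU, which matches the dump. For what it is worth, the tdq_cpu[3] output quoted above shows tdq_ipipending = 0 '\000' at the time the dump was taken, although the value at the moment the wakeup was lost could have been different; the linked review is the place to look for the actual change under discussion.
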