From owner-freebsd-stable@FreeBSD.ORG Wed Aug 17 13:56:12 2011 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1842A1065676; Wed, 17 Aug 2011 13:56:12 +0000 (UTC) (envelope-from prvs=1210f20b9f=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 51A868FC20; Wed, 17 Aug 2011 13:56:10 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 14:55:32 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 14:55:32 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014634579.msg; Wed, 17 Aug 2011 14:55:31 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1210f20b9f=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E4380C0.7070908@FreeBSD.org> <4E43E272.1060204@FreeBSD.org> <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk> <4E440865.1040500@FreeBSD.org> <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk> <4E441314.6060606@FreeBSD.org> <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk> <4E48D967.9060804@FreeBSD.org> <9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk> <4E490DAF.1080009@FreeBSD.org> <796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk> <4E491D01.1090902@FreeBSD.org> <570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk> <4E4AD35C.7020504@FreeBSD.org> <6A7238AED44542A880B082A40304D940@multiplay.co.uk> <4E4BA21F.6010805@FreeBSD.org> <581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk> <4E4BBA7F.30907@FreeBSD.org> Date: Wed, 17 Aug 2011 14:56:10 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 13:56:12 -0000 ----- Original Message ----- From: "Andriy Gapon" To: "Steven Hartland" Cc: Sent: Wednesday, August 17, 2011 1:56 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE > on 17/08/2011 15:15 Steven Hartland said the following: >>> define allpcpu >>> set $i = 0 >>> while ($i <= mp_maxid) >>> p *cpuid_to_pcpu[$i] >>> set $i = $i + 1 >>> end >>> end >>> allpcpu >> >> Here's the output. > [snip] >> $3 = {pc_curthread = 0xffffff06b7f9c000, pc_idlethread = 0xffffff0012d85460, >> pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xffffff8d8f35ad00, >> pc_switchtime = 564139963042291, pc_switchticks = 247796550, pc_cpuid = 2, >> pc_cpumask = 4, pc_other_cpus = 16777211, pc_allcpu = {sle_next = >> 0xffffffff808af680}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1005391948, v_trap = >> 95927887, v_syscall = 2033274537, v_intr = 137253, v_soft = 151981308, >> v_vm_faults = 14199910, v_cow_faults = 1468132, v_cow_optim = 533, v_zfod = >> 11032593, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, v_swappgsout >> = 0, v_vnodein = 17238, v_vnodeout = 48, v_vnodepgsin = 17238, >> v_vnodepgsout = 378, v_intrans = 6753, v_reactivated = 0, v_pdwakeups = 0, >> v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 15435380, >> v_page_size = 0, v_page_count = 0, v_free_reserved = 0, >> v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, >> v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count = >> 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0, >> v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 24041, v_vforks = 16857, >> v_rforks = 0, v_kthreads = 0, v_forkpages = 6281292, v_vforkpages = 3606842, >> v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8629094, >> 693, 594838, 24425, 23707811}, pc_device = 0xffffff0012da2500, pc_netisr = 0x0, >> pc_rm_queue = {rmq_next = 0xffffffff808afa50, rmq_prev = 0xffffffff808afa50}, >> pc_dynamic = 18446743526093326592, >> pc_monitorbuf = '\0' , pc_prvspace = 0xffffffff808af900, >> pc_curpmap = 0xffffffff8083ea50, pc_tssp = 0xffffffff808ae7d0, pc_commontssp = >> 0xffffffff808ae7d0, pc_rsp0 = -491518579456, >> pc_scratch_rsp = 140737488347240, pc_apic_id = 2, pc_acpi_id = 2, pc_fs32p = >> 0xffffffff808ad600, pc_gs32p = 0xffffffff808ad608, pc_ldt = 0xffffffff808ad648, >> pc_tss = 0xffffffff808ad638, pc_cmci_mask = 8} > [snip] > > Thank you. > A few more questions: > 1. more kgdb info for the core: > p *(cpuid_to_pcpu[2]->pc_curthread) > p *(cpuid_to_pcpu[2]->pc_curthread->td_proc) > p *(cpuid_to_pcpu[2]->pc_curthread->td_proc->p_limit) > (kgdb) p *(cpuid_to_pcpu[2]->pc_curthread) $1 = {td_lock = 0xffffffff8084a440, td_proc = 0xffffff070b5a48c0, td_plist = {tqe_next = 0x0, tqe_prev = 0xffffff070b5a48d0}, td_runq = {tqe_next = 0x0, tqe_prev = 0xffffffff8084a688}, td_slpq = {tqe_next = 0x0, tqe_prev = 0xffffff0296460900}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xffffff8d8fb5c8b0}, td_cpuset = 0xffffff0012d65dc8, td_sel = 0xffffff0a1b76c700, td_sleepqueue = 0xffffff0296460900, td_turnstile = 0xffffff05f31d8000, td_umtxq = 0xffffff05513d9780, td_tid = 102057, td_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xffffff06b7f9c0a0}, sq_proc = 0xffffff070b5a48c0, sq_flags = 1}, td_flags = 6, td_inhibitors = 0, td_pflags = 0, td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0, td_wmesg = 0x0, td_lastcpu = 2 '\002', td_oncpu = 2 '\002', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', td_locks = 998, td_rw_rlocks = 0, td_lk_slocks = 0, td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 1, td_ucred = 0xffffff0551cf9900, td_estcpu = 0, td_slptick = 0, td_blktick = 0, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 2068, ru_ixrss = 5280, ru_idrss = 19296, ru_isrss = 6144, ru_minflt = 5015, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 241, ru_msgrcv = 2076, ru_nsignals = 1, ru_nvcsw = 2264, ru_nivcsw = 159}, td_incruntime = 4257692, td_runtime = 487523210, td_pticks = 0, td_sticks = 0, td_iticks = 0, td_uticks = 0, td_intrval = 4, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_sigmask = {__bits = {16384, 0, 0, 0}}, td_generation = 2423, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_name = "httpd", '\0' , td_fpop = 0x0, td_dbgflags = 0, td_dbgksi = { ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0}}}}, ksi_flags = 0, ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, td_rqindex = 32 ' ', td_base_pri = 128 '\200', td_priority = 128 '\200', td_pri_class = 3 '\003', td_user_pri = 128 '\200', td_base_user_pri = 128 '\200', td_pcb = 0xffffff8d8f35ad00, td_state = TDS_RUNNING, td_retval = {0, 8}, td_slpcallout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xffffff800088ce00}}, c_time = 247622368, c_arg = 0xffffff06b7f9c000, c_func = 0xffffffff803c4bd0 , c_lock = 0x0, c_flags = 16, c_cpu = 13}, td_frame = 0xffffff8d8f35ac40, td_kstack_obj = 0xffffff0a51ee5e58, td_kstack = 18446743582190956544, td_kstack_pages = 4, td_unused1 = 0x0, td_unused2 = 0, td_unused3 = 0, td_critnest = 0, td_md = {md_spinlock_count = 0, md_saved_flags = 70}, td_sched = 0xffffff06b7f9c428, td_ar = 0x0, td_syscalls = 129862, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0x0, td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_rux = {rux_runtime = 483265518, rux_uticks = 7, rux_sticks = 17, rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, td_map_def_user = 0x0} (kgdb) p *(cpuid_to_pcpu[2]->pc_curthread->td_proc) $2 = {p_list = {le_next = 0xffffff0653ff78c0, le_prev = 0xffffffff80841b48}, p_threads = {tqh_first = 0xffffff06b7f9c000, tqh_last = 0xffffff06b7f9c010}, p_slock = {lock_object = {lo_name = 0xffffffff806323c0 "process slock", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xffffff0551cf9900, p_fd = 0x0, p_fdtol = 0x0, p_stats = 0xffffff04ea565600, p_limit = 0x0, p_limco = {c_links = {sle = {sle_next = 0x0}, tqe = { tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0xffffff070b5a49b8, c_flags = 0, c_cpu = 0}, p_sigacts = 0xffffff0a663a1000, p_flag = 268443904, p_state = PRS_NORMAL, p_pid = 78097, p_hash = {le_next = 0x0, le_prev = 0xffffff800021c888}, p_pglist = {le_next = 0xffffff00285c5460, le_prev = 0xffffff0afa9b8988}, p_pptr = 0xffffff0afa9b88c0, p_sibling = {le_next = 0xffffff00285c5460, le_prev = 0xffffff0afa9b89b0}, p_children = {lh_first = 0x0}, p_mtx = {lock_object = {lo_name = 0xffffffff806323b3 "process lock", lo_flags = 21168128, lo_data = 10, lo_witness = 0x0}, mtx_lock = 18446743003054325761}, p_ksi = 0xffffff0016738bd0, p_sigqueue = {sq_signals = {__bits = {16384, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0xffffff033829d070, tqh_last = 0xffffff033829d070}, sq_proc = 0xffffff070b5a48c0, sq_flags = 1}, p_oppid = 0, p_vmspace = 0xffffffff8083e920, p_swtick = 89392056, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 0, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 0, ru_nivcsw = 0}, p_rux = {rux_runtime = 483265518, rux_uticks = 7, rux_sticks = 17, rux_iticks = 0, rux_uu = 61934, rux_su = 150412, rux_tu = 212347}, p_crux = {rux_runtime = 80058539464, rux_uticks = 2914, rux_sticks = 1778, rux_iticks = 0, rux_uu = 21847439, rux_su = 13330387, rux_tu = 35177827}, p_profthreads = 0, p_exitthreads = 0, p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0x0, p_lock = 11, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0xffffff06b7f9c000, p_boundary_count = 0, p_pendingcnt = 1, p_itimers = 0x0, p_magic = 3203398350, p_osrel = 802000, p_comm = "httpd", '\0' , p_pgrp = 0xffffff05f3928080, p_sysent = 0xffffffff807fe180, p_args = 0xffffff0a8ad5e600, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_fibnum = 0, p_xstat = 0, p_klist = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff803586e0 , kl_unlock = 0xffffffff803586b0 , kl_assert_locked = 0xffffffff80355380 , kl_assert_unlocked = 0xffffffff80355390 , kl_lockarg = 0xffffff070b5a49b8}, p_numthreads = 1, p_md = {md_ldt = 0x0, md_ldt_sd = {sd_lolimit = 0, sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx0 = 0, sd_gran = 0, sd_hibase = 0, sd_xx1 = 0, sd_mbz = 0, sd_xx2 = 0}}, p_itcallout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0x0, c_flags = 16, c_cpu = 0}, p_acflag = 1, p_peers = 0x0, p_leader = 0xffffff070b5a48c0, p_emuldata = 0x0, p_label = 0x0, p_sched = 0xffffff070b5a4d20, p_ktr = {stqh_first = 0x0, stqh_last = 0xffffff070b5a4cf0}, p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0x0, p_pwait = {cv_description = 0xffffffff80632b87 "ppwait", cv_waiters = 0}} (kgdb) p *(cpuid_to_pcpu[2]->pc_curthread->td_proc->p_limit) Cannot access memory at address 0x0 > 2. do you have any additional patches in your source tree besides those debugging > patches that I provided to you? Yes, in this build we have:- 1. tcp_reass.c-logdebug+missingsegment-20110811-lstewart.patch (fixes tcp stalling) http://people.freebsd.org/~lstewart/patches/misctcp/tcp_reass.c-logdebug%2bmissingsegment-20110811-lstewart.diff 2. libz.patch (disables assembly optimisations in libz as it causes application crashes) 3. udp6_usrreq.c.patch (fixes ipv4 on ipv6 sockets) http://svnweb.freebsd.org/base/head/sys/netinet6/udp6_usrreq.c?r1=220463&r2=220462&pathrev=220463 4. cam-timeout-fix.patch (fixes overflow in cam timeouts) http://codelabs.ru/fbsd/patches/cam/CAM-properly-convert-timeout-to-ticks.diff 5. ixgbe.c.patch & ixgbe.h.patch (fixes ipconfig disconnecting link) 6. stop_scheduler_on_panic.8.x.patch (your first patch) 7. panic-info.patch (your second patch) The only patches of these present when we initially noticed the problem where #2, #3 & #5 (but these machines are not using this driver) > 3. do you have any thirdparty/out-of-tree kernel modules? Nope, our kernel is compiled with a load of drivers disabled and then the following:- device ahci makeoptions MODULES_OVERRIDE="linux linprocfs acpi nullfs unionfs accf_http if_lagg opensolaris zfs ipmi i2c" options COMPAT_LINUX32 options DEVICE_POLLING N.B. although device polling is compiled in its not used on any of these machines. > 4. could you please send me your kernel config? See direct email, as not sure it will go to the list. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.