From owner-freebsd-current@freebsd.org Sun Aug 16 15:28:46 2020 Return-Path: Delivered-To: freebsd-current@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id F09953BF881 for ; Sun, 16 Aug 2020 15:28:46 +0000 (UTC) (envelope-from a13xlevy@gmail.com) Received: from mail-wr1-x444.google.com (mail-wr1-x444.google.com [IPv6:2a00:1450:4864:20::444]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4BV1Kn6lwSz4V4j for ; Sun, 16 Aug 2020 15:28:45 +0000 (UTC) (envelope-from a13xlevy@gmail.com) Received: by mail-wr1-x444.google.com with SMTP id c15so12457359wrs.11 for ; Sun, 16 Aug 2020 08:28:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=BdjwoCBCtTITHKlx4B7CwgvKpbtuotyZo2Xy5q8LwHs=; b=i4cOVjN2l1PIX5nKl/E++2BqnNjqrJdkx6iZrW5TSFfIDMsln01DxSS+pb2mtevwuF KRNsY/h0YsfkIlo90/Gb3qAW4q4t5VGALtKQaYIpisZKdfg6t/XpiNR65N42PYBm6voX ww1BMOc6rNzp43ryHFbP6dW+dP+0SWKnKnM/JTjIQ7xEqvcyUMRTJLDL33shHPUL6B2d j4cUVtgeQtmmFm9lXPSvj2OBq4PI5pudqQGFlr37MzcKPTM6UdPFW0VpQ0U6iKj7Zm1D 7hLU4nbaUPPP47QHup6z3W0mUk4Gof2/sf3ckal+wvm60J4oRzvU5K6g7R9UUjrPZFi0 cCEg== X-Gm-Message-State: AOAM532E1pvrCglP7JgnycL/sXzmaZS8F5pDsz0/oreKGhKtgO1kMRlA RueIOfngvWaco3zzAqtiFkQOxXblHQ9/Tri2ascxUsNKbBdsgg== X-Google-Smtp-Source: ABdhPJxh+uLKl/d+w9gTQIP7F3oHgHF4aYkslKSOboLGazIg25/8EarY/XAISIbjlhJoT4VzJ9LBDVQSso4ar8vtx8I= X-Received: by 2002:adf:f3cc:: with SMTP id g12mr11089555wrp.412.1597591724044; Sun, 16 Aug 2020 08:28:44 -0700 (PDT) MIME-Version: 1.0 References: <13793020-1bde-b13f-65e3-909e27d876ad@selasky.org> <4e9d9a89-4883-1f1c-c796-e5925fd171cc@selasky.org> <51a2fe4f-5a3e-8d24-19e2-3cdaa8378015@selasky.org> In-Reply-To: From: Alexandre Levy Date: Sun, 16 Aug 2020 16:28:33 +0100 Message-ID: Subject: Re: Kernel crash during video transcoding To: Hans Petter Selasky Cc: freebsd-current@freebsd.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1597591726; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references:dkim-signature; bh=BdjwoCBCtTITHKlx4B7CwgvKpbtuotyZo2Xy5q8LwHs=; b=C4GKVyIte0QE01UCRgdco4QKsSqrJt0Tv6Sq7BoGZRc9gENrDI2qDnmPpzdkj7Fm7rEfdj Q6VG87Yiesg4P4ymqBQAdrXwXtv5gzwEjifxhkl3HVcfcPC/knqxPF382EFwOqTEQk3jcf PGPu7r2WwoTkE3bJ2KtWV4Xmp/KWwxwtfFaro1N/Pcjg7+qbquCecxm5HhonT6WzKSK9uI BpAq7VVgDw2pT1sHFEgTAek5b16MYUm/HXXpM2hoXWyEH8Ef7hNksfzA0TFU5e2vf82Jvz euc0xuyOTy4PmBC/Tbe1pVWzRf40LweurEjsn/1PH8R997nQ8evDlNMI03m8CA== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1597591726; a=rsa-sha256; cv=none; b=wVn+IdAMrl8W2QSt7RiXlQRmPa3KWA7QTgegylNCW42DcvMUozhI3mjxgJARodR+LVkJk/ BoG4YjM6bb/3ezBlNY8OyfEX+7o0vkqR9ac6rQ3Qz88/gXUt9++s2GKigE+0HU1SRtXgDi 1iLlSfXU1FiY6CUcBgzXeRUBzAsI90gkCPlZYRuZFCEVEAgridWhx0dCO1AwpxQsuQJcue zk9/OGqPBQNUGp+0vHkc2+tlWg12SkECe9RGaTrt97oPzzxETS85dOvVcW9uGZEEhV35st RQAglMOkJT7l3m2IIqGlMaX/N+HHzITGma2rm7L4Rhfmd9jR9w+2Fwhe9RlyOA== ARC-Authentication-Results: i=1; mx1.freebsd.org; dkim=pass header.d=gmail.com header.s=20161025 header.b=T/YcAG33; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (mx1.freebsd.org: domain of a13xlevy@gmail.com designates 2a00:1450:4864:20::444 as permitted sender) smtp.mailfrom=a13xlevy@gmail.com X-Rspamd-Queue-Id: 4BV1Kn6lwSz4V4j X-Spamd-Bar: ---- X-Spamd-Result: default: False [-4.12 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.05)[-1.051]; R_DKIM_ALLOW(-0.20)[gmail.com:s=20161025]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; FREEMAIL_FROM(0.00)[gmail.com]; R_SPF_ALLOW(-0.20)[+ip6:2a00:1450:4000::/36]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-current@freebsd.org]; ARC_SIGNED(0.00)[i=1]; NEURAL_HAM_LONG(-1.02)[-1.020]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[gmail.com:+]; RCPT_COUNT_TWO(0.00)[2]; RCVD_IN_DNSWL_NONE(0.00)[2a00:1450:4864:20::444:from]; NEURAL_HAM_SHORT(-1.05)[-1.046]; DMARC_POLICY_ALLOW(-0.50)[gmail.com,none]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+,1:+,2:~]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_ALL(0.00)[]; DWL_DNSWL_NONE(0.00)[gmail.com:dkim] Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.33 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Aug 2020 15:28:47 -0000 Hi, I looked at the crash dump and the code more closely: #18 0xffffffff82be1c5f in i915_gem_fault (dummy=3D, vmf=3D) at /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm= /i915/gem/i915_gem_mman.c:367 (kgdb) p area->vm_obj->lock $43 =3D {lock_object =3D {lo_name =3D 0xffffffff8112c767 "vm object", lo_fl= ags =3D 627245056, lo_data =3D 0, lo_witness =3D 0xfffff8045f575800}, rw_lock =3D 18446741878623409920} So vm_obj is not NULL and has a rw_lock member Now at intel_freebsd.c:193 (frame #17) the driver calls vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL). 'm' is the page grabbed from vm_obj of the calling frame. The panic occurs in kern_rwlock.c:270 in frame #15 when calling rw_wowner(rwlock2rw(c)) so something goes wrong either in rw_wowner or in rwlock2rw. Looking at rwlock2rw() : /* * Return the rwlock address when the lock cookie address is provided. * This functionality assumes that struct rwlock* have a member named rw_lock. */ #define rwlock2rw(c) (__containerof(c, struct rwlock, rw_lock)) I think this one is just extracting out the rw_lock member of the passed in struct. However I don't understand what the cookie address is about due to my lack of knowledge on kernel locking concepts. So maybe something is wrong with the cookie or the rw_lock value itself. Looking at rw_wowner() : /* * Return a pointer to the owning thread if the lock is write-locked or * NULL if the lock is unlocked or read-locked. */ #define lv_rw_wowner(v) \ ((v) & RW_LOCK_READ ? NULL : \ (struct thread *)RW_OWNER((v))) #define rw_wowner(rw) lv_rw_wowner(RW_READ_VALUE(rw)) I don't think that one could cause a panic but again I'm not experienced enough to be sure, it seems this either returns the thread that owns the lock or NULL if no thread owns it. The is also the fact that the driver calls vm_page_busy_acquire with the VM_ALLOC_WAITFAIL flag which is defined in vm_page.h as : #define VM_ALLOC_WAITFAIL 0x0010 /* (acf) Sleep and return error */ Could this be the reason of the panic as in we try to lock, then cannot and eventually just return an error without retrying ? There is the flag VM_ALLOC_WAITOK that says /* (acf) Sleep and retry */. Should I try to patch intel_freebsd.c to use this flag instead ? Thanks. Le sam. 15 ao=C3=BBt 2020 =C3=A0 20:35, Alexandre Levy = a =C3=A9crit : > Hi, > > I could finally generate a crash dump even with a black screen, I had to > guess I was in the crash handler and I type "dump" and enter which worked= . > The driver logs "[drm] Cannot find any crtc or sizes" which I guess is th= e > reason why I couldn't see anything on my screen. > > Back to the initial problem, I could start a kgdb session, loaded the > i915kms.ko symbols and here are the results : > > (kgdb) bt > #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 > #1 doadump (textdump=3D0) at /usr/src/sys/kern/kern_shutdown.c:394 > #2 0xffffffff8049c26a in db_dump (dummy=3D, > dummy2=3D, dummy3=3D, dummy4=3D) a= t > /usr/src/sys/ddb/db_command.c:575 > #3 0xffffffff8049c02c in db_command (last_cmdp=3D, > cmd_table=3D, dopager=3D1) at /usr/src/sys/ddb/db_command.= c:482 > #4 0xffffffff8049bd9d in db_command_loop () at > /usr/src/sys/ddb/db_command.c:535 > #5 0xffffffff8049f048 in db_trap (type=3D, code=3D out>) at /usr/src/sys/ddb/db_main.c:270 > #6 0xffffffff80c1b374 in kdb_trap (type=3D3, code=3D0, tf=3D) at > /usr/src/sys/kern/subr_kdb.c:699 > #7 0xffffffff8100ca98 in trap (frame=3D0xfffffe00d7567300) at > /usr/src/sys/amd64/amd64/trap.c:576 > #8 > #9 kdb_enter (why=3D0xffffffff811d5de0 "panic", msg=3D) a= t > /usr/src/sys/kern/subr_kdb.c:486 > #10 0xffffffff80bd00be in vpanic (fmt=3D, ap=3D) > at /usr/src/sys/kern/kern_shutdown.c:902 > #11 0xffffffff80bcfe53 in panic (fmt=3D0xffffffff81c8c7c8 > "\b\214\031\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:83= 9 > #12 0xffffffff8100cee7 in trap_fatal (frame=3D0xfffffe00d7567600, eva=3D0= ) at > /usr/src/sys/amd64/amd64/trap.c:915 > #13 0xffffffff8100c360 in trap (frame=3D0xfffffe00d7567600) at > /usr/src/sys/amd64/amd64/trap.c:212 > #14 > #15 _rw_wowned (c=3D0x2659c92217d5aa52) at > /usr/src/sys/kern/kern_rwlock.c:270 > #16 0xffffffff80ec23ed in vm_page_busy_acquire (m=3D0xfffffe00040ff9e8, > allocflags=3D16) at /usr/src/sys/vm/vm_page.c:884 > #17 0xffffffff82b4e980 in remap_io_mapping (vma=3D0xfffff80315148300, > addr=3D, pfn=3D, size=3D, > iomap=3D) > at > /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/d= rm/i915/intel_freebsd.c:193 > #18 0xffffffff82be1c5f in i915_gem_fault (dummy=3D, > vmf=3D) > at > /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/d= rm/i915/gem/i915_gem_mman.c:367 > #19 0xffffffff82cb5ddf in linux_cdev_pager_populate > (vm_obj=3D0xfffff80368501420, pidx=3D, fault_type=3D out>, max_prot=3D, > first=3D0xfffffe00d7567868, last=3D0xfffffe00d7567888) at > /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:554 > #20 0xffffffff80ea9e8f in vm_pager_populate (object=3D0x2659c92217d5aa52, > pidx=3D18446741874754451944, fault_type=3D0, max_prot=3D0 '\000', > first=3D, last=3D) > at /usr/src/sys/vm/vm_pager.h:172 > #21 vm_fault_populate (fs=3D) at > /usr/src/sys/vm/vm_fault.c:444 > #22 vm_fault_allocate (fs=3D) at > /usr/src/sys/vm/vm_fault.c:1028 > #23 vm_fault (map=3D, vaddr=3D, > fault_type=3D, fault_flags=3D, m_hold=3D out>) at /usr/src/sys/vm/vm_fault.c:1338 > #24 0xffffffff80ea98ee in vm_fault_trap (map=3D0xfffffe00c0f539e8, > vaddr=3D, fault_type=3D, fault_flags=3D0, > signo=3D0xfffffe00d7567ac4, > ucode=3D0xfffffe00d7567ac0) at /usr/src/sys/vm/vm_fault.c:585 > #25 0xffffffff8100d0de in trap_pfault (frame=3D0xfffffe00d7567b00, > usermode=3D, signo=3D, ucode=3D0xffffffff81= d1de80 > ) > at /usr/src/sys/amd64/amd64/trap.c:817 > #26 0xffffffff8100c72c in trap (frame=3D0xfffffe00d7567b00) at > /usr/src/sys/amd64/amd64/trap.c:340 > #27 > #28 0x000000080296659a in ?? () > Backtrace stopped: Cannot access memory at address 0x7fffffffbf38 > (kgdb) list *0xffffffff82be1c5f > 0xffffffff82be1c5f is in i915_gem_fault > (/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/= drm/i915/gem/i915_gem_mman.c:367). > 362 ret =3D i915_vma_pin_fence(vma); > 363 if (ret) > 364 goto err_unpin; > 365 > 366 /* Finally, remap it using the new GTT offset */ > 367 ret =3D remap_io_mapping(area, > 368 area->vm_start + > (vma->ggtt_view.partial.offset << PAGE_SHIFT), > 369 (ggtt->gmadr.start + > vma->node.start) >> PAGE_SHIFT, > 370 min_t(u64, vma->size, area->vm_end > - area->vm_start), > 371 &ggtt->iomap); > (kgdb) list *0xffffffff82b4e980 > 0xffffffff82b4e980 is in remap_io_mapping > (/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/= drm/i915/intel_freebsd.c:193). > 188 pidx++, pa +=3D PAGE_SIZE) { > 189 retry: > 190 m =3D vm_page_grab(vm_obj, pidx, VM_ALLOC_NOCREAT= ); > 191 if (m =3D=3D NULL) { > 192 m =3D PHYS_TO_VM_PAGE(pa); > 193 if (!vm_page_busy_acquire(m, > VM_ALLOC_WAITFAIL)) > 194 goto retry; > 195 if (vm_page_insert(m, vm_obj, pidx)) { > 196 vm_page_xunbusy(m); > 197 VM_OBJECT_WUNLOCK(vm_obj); > (kgdb) list *0xffffffff80ec23ed > 0xffffffff80ec23ed is in vm_page_busy_acquire > (/usr/src/sys/vm/vm_page.c:884). > 879 if (vm_page_tryacquire(m, allocflags)) > 880 return (true); > 881 if ((allocflags & VM_ALLOC_NOWAIT) !=3D 0) > 882 return (false); > 883 if (obj !=3D NULL) > 884 locked =3D VM_OBJECT_WOWNED(obj); > 885 else > 886 locked =3D false; > 887 MPASS(locked || vm_page_wired(m)); > 888 if (_vm_page_busy_sleep(obj, m, m->pindex, > "vmpba", allocflags, > > It seems like the problem occured when calling vm_page_busy_acquire(m, > VM_ALLOC_WAITFAIL) where m might be a NULL pointer ? I am very new to > kernel debugging so not sure where to go from there. > > Thanks. > > Le lun. 10 ao=C3=BBt 2020 =C3=A0 12:04, Hans Petter Selasky a > =C3=A9crit : > >> Hi, >> >> On 2020-08-10 12:59, Alexandre Levy wrote: >> > Looking at the code, the error happens during the call to >> VM_OBJECT_WLOCK >> > (memory page locking for write ?) in the intel_freebsd.c (see [1] >> below). >> > I'm out for a few days but I'll try to dig more into it when I'm back >> next >> > weekend although I have no experience in the drm-devel-kmod codebase. = In >> > the meantime if you have any suggestions on debugging this further I'm >> > happy to follow them. >> >> The problem is likely that the vm_obj is NULL. >> >> I think I recall that this function is special and can only be called >> from a certain context, unlike in Linux. Will need the full backtrace >> with line numbers in order to debug this. >> >> --HPS >> >