From: Marko Zec
To: freebsd-virtualization@freebsd.org
Cc: Mikolaj Golub, Robert Watson, Bjoern Zeeb, Kostik Belousov
Date: Tue, 10 May 2011 00:10:36 +0200
Subject: Re: vnet: acessing module's virtualized global variables from another module
Message-Id: <201105100010.37726.zec@fer.hr>
In-Reply-To: <864o53yfc7.fsf@kopusha.home.net>
List-Id: "Discussion of various virtualization techniques FreeBSD supports."
On Monday 09 May 2011 19:05:28 Mikolaj Golub wrote:
> On Mon, 9 May 2011 16:21:15 +0200 Marko Zec wrote:
>
> MZ> On Monday 09 May 2011 14:48:25 Mikolaj Golub wrote:
> >> Hi,
> >>
> >> Trying ipfw_nat under VIMAGE kernel I got this panic on the module
> >> load:
>
> MZ> Hi,
>
> MZ> I think the problem here is that curvnet context is not set properly
> MZ> on entry to ipfw_nat_modevent(). The canonical way to initialize
> MZ> VNET-enabled subsystems is to trigger them using VNET_SYSINIT() macros
> MZ> (instead of using modevent mechanisms), which in turn ensure that:
>
> MZ> a) that the initializer function gets invoked for each existing vnet
> MZ> b) curvnet context is set properly on entry to initializer functions and
>
> hm, sorry, but I don't see how curvnet context might help here.

You're getting a panic in a function, i.e. in ipfw_nat_modevent(), which has
ipfw_nat_init() inlined into it, where you attempt to access per-vnet data
without having curvnet context set. By definition that is not supposed to
work on a VIMAGE kernel, so what you observe is not unexpected at all.

Please set the curvnet context using VNET_SYSINIT() macros, or by hand using
CURVNET_SET() / CURVNET_RESTORE(), before accessing any V_ data.

Marko

> For me this
> does not look like curvnet context problem or my understanding how it works
> completely wrong.
>
> Below is kgdb session on live VIMAGE system with ipfw.ko loaded.
>
> Let's look at some kernel virtualized variable:
>
> (kgdb) p vnet_entry_ifnet
> $1 = {tqh_first = 0x0, tqh_last = 0x0}
> (kgdb) p &vnet_entry_ifnet
> $2 = (struct ifnethead *) 0x8102d488
>
> As expected the address is in kernel 'set_vnet':
>
> kopusha:/usr/src/sys% kldstat |grep kernel
> 1 69 0x80400000 1092700 kernel
> kopusha:/usr/src/sys% nm /boot/kernel/kernel |grep __start_set_vnet
> 8102d480 A __start_set_vnet
>
> default vnet:
>
> (kgdb) p vnet0
> $3 = (struct vnet *) 0x86d9b000
>
> Calculate ifnet location on vnet0 (a la VNET_VNET(vnet0, ifnet)):
>
> (kgdb) printf "0x%x\n", vnet0->vnet_data_base + (uintptr_t) & vnet_entry_ifnet
> 0x86d9c008
>
> Access it:
>
> (kgdb) p *((struct ifnethead *)0x86d9c008)
> $4 = {tqh_first = 0x86da5c00, tqh_last = 0x89489c0c}
> (kgdb) p (*((struct ifnethead *)0x86d9c008)).tqh_first->if_dname
> $7 = 0x80e8b480 "usbus"
> (kgdb) p (*((struct ifnethead *)0x86d9c008)).tqh_first->if_vnet
> $8 = (struct vnet *) 0x86d9b000
>
> Everything looks good. Now try the same with virtualized variable
> layer3_chain from ipfw module:
>
> (kgdb) p vnet_entry_layer3_chain
> $9 = {rules = 0x0, reap = 0x0, default_rule = 0x0, n_rules = 0,
>   static_len = 0, map = 0x0, nat = {lh_first = 0x0}, tables = {0x0 },
>   rwmtx = {lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0,
>   lo_witness = 0x0}, rw_lock = 0}, uh_lock = {lock_object = {lo_name = 0x0,
>   lo_flags = 0, lo_data = 0, lo_witness = 0x0}, rw_lock = 0}, id = 0,
>   gencnt = 0}
>
> "master" variable looks good (initialized to zeros), what about its
> address?
>
> (kgdb) p &vnet_entry_layer3_chain
> $10 = (struct ip_fw_chain *) 0x894a5c00
>
> It points to 'set_vnet' of the ipfw.ko:
>
> kopusha# kldstat |grep ipfw.ko
> 13 2 0x89495000 11000 ipfw.ko
> kopusha:/usr/src/sys% nm /boot/kernel/ipfw.ko |grep __start_set_vnet
> 00010be0 A __start_set_vnet
> kopusha:/usr/src/sys% printf "0x%x\n" $((0x89495000 + 0x00010be0))
> 0x894a5be0
>
> Calculate layer3_chain location on vnet0 (a la VNET_VNET(vnet0,
> layer3_chain)):
>
> (kgdb) printf "0x%x\n", vnet0->vnet_data_base + (uintptr_t) & vnet_entry_layer3_chain
> 0x8f214780
>
> Try to read it:
>
> (kgdb) p ((struct ip_fw_chain *)0x8f214780)->rwmtx
> $13 = {lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0,
>   lo_witness = 0x0}, rw_lock = 0}
> (kgdb) p ((struct ip_fw_chain *)0x8f214780)->rules
> $14 = (struct ip_fw *) 0x6
>
> Data looks wrong. But this is the way how this variable is accessed by
> ipfw_nat. I see the same in the crash image:
>
> (kgdb) where
> ...
> #11 0xc09a4882 in _rw_wlock (rw=0xc6d5e91c,
>     file=0xca0ac2e3 "/usr/src/sys/modules/ipfw_nat/../../netinet/ipfw/ip_fw_nat.c", line=547)
>     at /usr/src/sys/kern/kern_rwlock.c:238
> #12 0xca0ab841 in ipfw_nat_modevent (mod=0xc98a48c0, type=0, unused=0x0)
>     at /usr/src/sys/modules/ipfw_nat/../../netinet/ipfw/ip_fw_nat.c:547
>
> note, rw=0xc6d5e91c (it crashed on it). And I get the same address doing
> like I did above:
>
> (kgdb) VNET_VNET vnet0 vnet_entry_layer3_chain
> at 0xc6d5e700 of type = struct ip_fw_chain
> (kgdb) p &((struct ip_fw_chain *)0xc6d5e700)->rwmtx
> $8 = (struct rwlock *) 0xc6d5e91c
>
> Thus ipfw_nat was in vnet0 context then. I saw crashes (in other modules)
> when the context was not initialised and they looked differently.
>
> Right location was 0x86d9c160 (found adding print to ipfw module, I don't
> know easier way):
>
> (kgdb) p ((struct ip_fw_chain *)0x86d9c160)->rwmtx
> $1 = {lock_object = {lo_name = 0x932ba4b3 "IPFW static rules",
>   lo_flags = 69402624, lo_data = 0, lo_witness = 0x86d6ab30}, rw_lock = 1}
> (kgdb) p ((struct ip_fw_chain *)0x86d9c160)->rules
> $2 = (struct ip_fw *) 0x8f2d1c80
>
> So I don't see a way how to reach module's virtualized variable from
> outside the module even if you are in the right vnet context. The linker,
> when loading the module and allocating the variable on vnet stacks in
> 'modspace' possesses this information and it reallocates addresses in the
> module and they are accessible from inside the module, but not from
> outside.
>
> MZ> Cheers,
>
> MZ> Marko
>
> >> Fatal trap 12: page fault while in kernel mode
> >> cpuid = 1; apic id = 01
> >> fault virtual address = 0x4
> >> fault code = supervisor read, page not present
> >> instruction pointer = 0x20:0xc09f098e
> >> stack pointer = 0x28:0xf563b944
> >> frame pointer = 0x28:0xf563b998
> >> code segment = base 0x0, limit 0xfffff, type 0x1b
> >> = DPL 0, pres 1, def32 1, gran 1
> >> processor eflags = interrupt enabled, resume, IOPL = 0
> >> current process = 4264 (kldload)
> >>
> >> witness_checkorder(c6d5e91c,9,ca0ac2e3,223,0,...) at witness_checkorder+0x6e
> >> _rw_wlock(c6d5e91c,ca0ac2e3,223,0,c0e8f795,...) at _rw_wlock+0x82
> >> ipfw_nat_modevent(c98a48c0,0,0,75,0,...) at ipfw_nat_modevent+0x41
> >> module_register_init(ca0ad508,0,c0e8d834,e6,0,...) at module_register_init+0xa7
> >> linker_load_module(0,f563bc18,c0e8d834,3fc,f563bc28,...) at linker_load_module+0xa05
> >> kern_kldload(c86835c0,c72d3400,f563bc40,0,c8d0d000,...) at kern_kldload+0x133
> >> kldload(c86835c0,f563bcec,c09e8940,c86835c0,0,...) at kldload+0x74
> >> syscallenter(c86835c0,f563bce4,c0ce05dd,c1022150,0,...)
> >> at syscallenter+0x263
> >> syscall(f563bd28) at syscall+0x34
> >> Xint0x80_syscall() at Xint0x80_syscall+0x21
> >> --- syscall (304, FreeBSD ELF32, kldload), eip = 0x280da00b, esp = 0xbfbfe79c, ebp = 0xbfbfec88 ---
> >>
> >> It crashed on accessing data from virtualized global variable
> >> V_layer3_chain in ipfw_nat_modevent(). V_layer3_chain is defined in
> >> ipfw module and it turns out that &V_layer3_chain returns wrong
> >> location from anywhere but ipfw.ko.
> >>
> >> Maybe this is a known issue, but I have not found info about this, so
> >> below are details of investigation why this happens.
> >>
> >> Virtualized global variables are defined using the VNET_DEFINE() macro,
> >> which places them in the 'set_vnet' linker set (in the base kernel or
> >> in module). This is used to
> >>
> >> 1) copy these "default" values to each virtual network stack instance
> >> when created;
> >>
> >> 2) act as unique global names by which the variable can be referred to.
> >> The location of a per-virtual instance variable is calculated at
> >> run-time like in the example below for layer3_chain variable in the
> >> default vnet (vnet0):
> >>
> >>     vnet0->vnet_data_base + (uintptr_t) & vnet_entry_layer3_chain    (1)
> >>
> >> For modules the thing is more complicated. When a module is loaded its
> >> global variables from 'set_vnet' linker set are copied to the kernel
> >> 'set_vnet', and for module to be able to access them the linker
> >> reallocates all references accordingly
> >> (kern/link_elf.c:elf_relocaddr()):
> >>
> >>     if (x >= ef->vnet_start && x < ef->vnet_stop)
> >>             return ((x - ef->vnet_start) + ef->vnet_base);
> >>
> >> So from inside the module the access to its virtualized variables
> >> works, but from the outside we get wrong location using calculation
> >> like above (1), because &vnet_entry_layer3_chain returns address of the
> >> variable in the module's 'set_vnet'.
> >>
> >> The workaround is to compile such modules into the kernel or use a hack
> >> I have done for ipfw_nat -- add the function to ipfw module which
> >> returns the location of virtualized layer3_chain variable and use this
> >> location instead of V_layer3_chain macro (see the attached patch).
> >>
> >> But I suppose the problem is not new and there might be a better
> >> approach already invented to deal with this?