Date: Wed, 2 Nov 2005 10:43:12 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: nocool <nocool@263.net>
Cc: freebsd-current <freebsd-current@freebsd.org>, freebsd-hacker <freebsd-hacker@freebsd.org>
Subject: Re: Why INVARIANTS option and sanity checking?
Message-ID: <20051102103112.J45155@fledge.watson.org>
In-Reply-To: <20051102021226.B1542E5E@smtp.263.net>
References: <20051102021226.B1542E5E@smtp.263.net>

On Wed, 2 Nov 2005, nocool wrote:

> Hi, I need some explanation about INVARIANTS compile option. This option
> has the description that enable calls of extra sanity checking. What
> does sanity mean here? Where and why we need to use this option?

There are a number of debugging options available in the kernel,
including INVARIANTS, WITNESS, SOCKBUF_DEBUG, etc. They all trade
performance for checking of the programmer's beliefs about the
invariants of the source code. The goal is to have programmers document
their beliefs about the state of the system as a series of tests, which
are then validated at run time. These tests can be cheap (extra NULL
checks), or they can be very expensive (clearing and checking memory on
free and allocate, run-time lock order verification). As such, they are
used extensively during development, but get turned off in production
for performance reasons. For example, for some workloads, we've had
reports of 70%+ loss in performance due to running with WITNESS turned
on.

Part of the goal of an invariants test is to fail-stop the system before
it goes from violating a low-level assumption to wide-spread data
corruption and hard-to-track bugs. This makes it much easier to analyze
the bug and fix it (or, for that matter, fix the assertion). As such,
most invariant violations will result in a panic and optionally a core
dump for diagnostic purposes. However, some invariants testing, such as
lock order analysis, is configurable to either generate a warning with a
debugging trace, or to panic, depending on desired usage.
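
To make the idea of documenting a belief as a run-time test concrete,
here is a minimal user-space sketch in C. SKETCH_INVARIANTS,
SKETCH_ASSERT, struct counter and counter_release() are all hypothetical
names invented for illustration; this is not the kernel's own macro or
code, only an analogue of the compile-time switch and fail-stop
behaviour described above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch standing in for the kernel's INVARIANTS option. */
#define SKETCH_INVARIANTS 1

#if SKETCH_INVARIANTS
/* Fail-stop: report the violated belief and abort before it can spread. */
#define SKETCH_ASSERT(cond, msg) do {                                   \
        if (!(cond)) {                                                  \
                fprintf(stderr, "invariant violated: %s (%s:%d)\n",     \
                    (msg), __FILE__, __LINE__);                         \
                abort();                                                \
        }                                                               \
} while (0)
#else
/* Production build: the check compiles away and costs nothing. */
#define SKETCH_ASSERT(cond, msg) do { } while (0)
#endif

struct counter {
        int refs;       /* programmer belief: never drops below zero */
};

/* Drop one reference and return the number remaining. */
static int
counter_release(struct counter *c)
{
        SKETCH_ASSERT(c != NULL, "counter_release: NULL counter");
        SKETCH_ASSERT(c->refs > 0, "counter_release: no references held");
        return (--c->refs);
}
```

With the switch on, calling counter_release() on a counter whose refs is
already 0 aborts at the point where the assumption is violated, rather
than letting a negative count propagate. With the switch set to 0, both
checks disappear from the compiled code.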

> I find some code in kern/kern_malloc.c in the 5.4 kernel:
>
>     kmemzones[indx].kz_zone = uma_zcreate(name, size,
> #ifdef INVARIANTS
>         mtrash_ctor, mtrash_dtor, mtrash_init, mtrash_fini,
> #else
>         NULL, NULL, NULL, NULL,
> #endif
>         UMA_ALIGN_PTR, UMA_ZONE_MALLOC);
>
> In the case INVARIANTS is defined, kz_zone will be set up with the
> constructor function mtrash_ctor and destructor function mtrash_dtor.
> When kz_zone frees some items, the kernel will call mtrash_dtor(), and
> every item will be filled with the value of uma_junk. When some items
> are reallocated, the kernel calls mtrash_ctor() and makes sure the
> item being constructed hasn't been overwritten since it was freed, by
> comparing every int of the item with uma_junk. Why do kmemzones need
> this check, while other zones and memory areas needn't? Where does the
> danger come from that a memory item will be overwritten after it is
> freed?

The UMA slab allocator implements an object life cycle, in which memory
moves between three states:

                 --zone_init-->               --zone_ctor-->
 [uninitialized]                [initialized]                [allocated]
                 <--zone_fini--               <--zone_dtor--

This allows the reuse of memory for the same type of object repeatedly,
allowing some state to be reused across allocations. For example,
threads are always associated with thread stacks. Rather than
reallocating the stack separately from the thread, the zone caches the
stack with the thread in its initialized state, allowing less work to
occur each time a thread is allocated and free'd. As such, there will be
data in the memory object that can't be trashed in its destructor and
tested in its constructor -- if this were done, the persistent state
would be lost. So zones are individually configured to perform memory
trashing and testing based on whether or not they take advantage of
persistent state between allocations.

With regard to why this is helpful -- since the C language is not type
safe, nothing in the language prevents touching memory after it has been
free'd.
Therefore, it is a ripe opportunity for nasty bugs -- things like the
following:

    crfree(cred);
    cred->cr_uid = 0;

These bugs are notoriously hard to catch, as in a multi-threaded,
multi-processing kernel, it's possible that the memory will be allocated
to another thread as soon as it is free'd, resulting in the above
assignment occurring on valid, allocated memory. The ctor and dtor tests
are designed to help identify when an access has happened after free,
and if so, to memory owned by what zone. MEMGUARD is a similar notion,
only it uses the VM system to detect references to free'd memory by
means of page protections. The above bug example is one of the simpler
of its class -- often the bug occurs due to a stray uncleared pointer in
another data structure, which may persist for a long time before it is
used.

Basically, it all comes down to this: invariants and sanity checking
allow programmers to test their assumptions about the source code they
(or someone else) have implemented. This helps find bugs faster, and in
a way that makes them much easier to debug.

Robert N M Watson
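
As an illustration of the trash-on-free, check-on-allocate idea
discussed above, here is a minimal user-space sketch in C. It is not the
kernel's mtrash_ctor()/mtrash_dtor() code: SKETCH_JUNK, struct
sketch_cred and the sketch_trash_*() functions are hypothetical names,
the junk value is made up, and the check returns an error where an
INVARIANTS kernel would be expected to panic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical junk pattern; the kernel's uma_junk value is not assumed. */
#define SKETCH_JUNK 0xdeadc0deU

/* "Destructor": fill a free'd item with the junk pattern, word by word. */
static void
sketch_trash_dtor(void *mem, size_t size)
{
        uint32_t *p = mem;
        size_t i, words = size / sizeof(*p);

        for (i = 0; i < words; i++)
                p[i] = SKETCH_JUNK;
}

/*
 * "Constructor": before the item is handed out again, confirm that every
 * word still holds the junk pattern. Returns 0 if the item is intact, or
 * -1 if something wrote to it while it was free.
 */
static int
sketch_trash_ctor(const void *mem, size_t size)
{
        const uint32_t *p = mem;
        size_t i, words = size / sizeof(*p);

        for (i = 0; i < words; i++)
                if (p[i] != SKETCH_JUNK)
                        return (-1);
        return (0);
}

/* Stand-in for a credential structure, sized as a whole number of words. */
struct sketch_cred {
        uint32_t cr_uid;
        uint32_t cr_gid;
};
```

To try it, allocate a struct sketch_cred with malloc(), call
sketch_trash_dtor() on it to simulate returning it to a zone's cache
(keeping the underlying memory allocated so the experiment stays well
defined), then write cred->cr_uid = 0 through the stale pointer. The
next sketch_trash_ctor() call on that item returns -1, which is the
point at which the use-after-free is caught.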