Date: Wed, 18 Mar 2015 10:24:09 -0400 From: John Baldwin <jhb@freebsd.org> To: freebsd-current@freebsd.org Cc: Ryan Stone <rysto32@gmail.com> Subject: Re: What parts of UMA are part of the stable ABI? Message-ID: <2085699.T9kJlc0rkf@ralph.baldwin.cx> In-Reply-To: <CAFMmRNypU_Y9UKDwpiRtedOCeCPQFOsVuswN0-rn3EmVykTAYw@mail.gmail.com> References: <CAFMmRNypU_Y9UKDwpiRtedOCeCPQFOsVuswN0-rn3EmVykTAYw@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Friday, March 13, 2015 07:48:38 PM Ryan Stone wrote: > In this freebsd-hackers thread[1], a user reported that 10.1-RELEASE > crashes during boot on a system with 3TB of RAM. As it turns out, when you > have that much RAM ZFS autotunes itself to allocate a 6GB hash table. This > triggers a nasty 32-bit integer truncation bug in malloc(9). malloc() > calls uma_large_malloc(), but uma_large_malloc() accepts an int instead of > a size_t and all kinds of hilarity can ensure from there. The user has > confirmed that the page in [2] fixed the kernel from instantly panicking > once zfs.ko was loaded. I'm a bit concerned about whether the patch as > written is an MFC candidate though. > > uma_large_malloc() calls page_alloc() to actuallly allocate the memory, and > page_alloc() also accepts an int size parameter. This is where things get > tricky. The signature for page_alloc() is governed by the uma_alloc() > typedef, as uma also uses it internally for allocating memory for > uma_zones. There is even a uma_zone_set_allocf() API for overriding the > default allocation function. So there's definitely an argument to be made > the the signature of page_alloc() being a part of the stable ABI. > > I have no hesitation in saying that uma_large_malloc() is not a stable API > and changing it is fair game. If uma_alloc() is a part of the stable API, > then it's simple enough to commit a 64-bit safe allocation function for > uma_large_malloc() to call and changing page_alloc() to call it instead. > That commit can be MFC'ed, and a follow-up commit could convert the UMA > APIs to use size_t everywhere. I think uma_zone_set_allocf() is most likely obscure enough to not be used outside of the places in the kernel where it is used. From that, I think you are fine to change uma_alloc()'s signature. > While I am at this, I'd like to also change the uma init/fini/ctor/dtor to > also use size_t. I'm a little torn on this because this will definitely > cause a lot of churn, both in the tree and for downstream consumers, and > there's not necessarily going to be a big benefit to it. However, I > suppose that the existence of machines where 4GB is less than 1% of system > memory may mean that allocating 4GB at a time may not that outlandish. I > can definitely be talked out of this though. I do think the normal zone callbacks passed to uma_zcreate() are too public to change. Or at least, you would need to do some crazy ABI shim where you have a uma_zcreate_new() that you map to uma_zcreate() via a #define for the API, but include a legacy uma_zcreate() symbol that older modules can call (and then somehow tag the old function pointers via an internal flag in the zone and patch UMA to cast to the old function signatures for zones with that flag). Note that if you are really paranoid you could do the same thing with uma_zone_set_allocf(). It would break the API, but would preserve the ABI (i.e. leave an existing uma_zone_set_allocf() function that accepts the old prototype but have a uma_zone_set_allocf_new() that gets mapped to uma_zone_set_allocf via a #define). -- John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2085699.T9kJlc0rkf>