Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 12 Nov 2016 17:58:30 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Sebastian Huber <sebastian.huber@embedded-brains.de>
Cc:        FreeBSD <freebsd-hackers@freebsd.org>
Subject:   Re: Should page allocator zero the pages for UMA?
Message-ID:  <20161112155830.GC54029@kib.kiev.ua>
In-Reply-To: <5822C615.3060100@embedded-brains.de>
References:  <5821DC2E.9020302@embedded-brains.de> <20161108145528.GN54029@kib.kiev.ua> <5822C615.3060100@embedded-brains.de>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Nov 09, 2016 at 07:45:41AM +0100, Sebastian Huber wrote:
> On 08/11/16 15:55, Konstantin Belousov wrote:
> > On Tue, Nov 08, 2016 at 03:07:42PM +0100, Sebastian Huber wrote:
> >> Hello,
> >>
> >> we use the FreeBSD network, USB and SD/MMC card stacks for the real-time
> >> operating system RTEMS:
> >>
> >> https://git.rtems.org/rtems-libbsd
> >>
> >> I update currently from FreeBSD 9.3 to head. We use the UMA from FreeBSD
> >> with a custom page allocator:
> >>
> >> https://git.rtems.org/rtems-libbsd/tree/rtemsbsd/rtems/rtems-kernel-page.c
> >>
> >> The FreeBSD 9.3 based port worked well with uninitialized pages, e.g.
> >> random or previous content. However, after the update to head I had to
> >> zero initialize the pages. One issue was an incomplete
> >>
> >> struct inpcb {
> >> [...]
> >>       struct    inpcbport *inp_phd;    /* (i/h) head of this list */
> >> #define inp_zero_size offsetof(struct inpcb, inp_gencnt)
> >>       inp_gen_t    inp_gencnt;    /* (c) generation count */
> >>       struct llentry    *inp_lle;    /* cached L2 information */
> >>       struct rwlock    inp_lock;
> >>       rt_gen_t    inp_rt_cookie;    /* generation for route entry */
> >>       union {                /* cached L3 information */
> >>           struct route inpu_route;
> >>           struct route_in6 inpu_route6;
> >>       } inp_rtu;
> >> #define inp_route inp_rtu.inpu_route
> >> #define inp_route6 inp_rtu.inpu_route6
> >> };
> >>
> >> initialization. The initialization consists of two parts:
> >>
> >> static int
> >> udp_inpcb_init(void *mem, int size, int flags)
> >> {
> >>       struct inpcb *inp;
> >>
> >>       inp = mem;
> >>       INP_LOCK_INIT(inp, "inp", "udpinp");
> >>       return (0);
> >> }
> >>
> >> /*
> >>    * Allocate a PCB and associate it with the socket.
> >>    * On success return with the PCB locked.
> >>    */
> >> int
> >> in_pcballoc(struct socket *so, struct inpcbinfo *pcbinfo)
> >> {
> >>       struct inpcb *inp;
> >>       int error;
> >>
> >> #ifdef INVARIANTS
> >>       if (pcbinfo == &V_tcbinfo) {
> >>           INP_INFO_RLOCK_ASSERT(pcbinfo);
> >>       } else {
> >>           INP_INFO_WLOCK_ASSERT(pcbinfo);
> >>       }
> >> #endif
> >>
> >>       error = 0;
> >>       inp = uma_zalloc(pcbinfo->ipi_zone, M_NOWAIT);
> >>       if (inp == NULL)
> >>           return (ENOBUFS);
> >>       bzero(inp, inp_zero_size);
> >>       inp->inp_pcbinfo = pcbinfo;
> >>       inp->inp_socket = so;
> >>       inp->inp_cred = crhold(so->so_cred);
> >>       inp->inp_inc.inc_fibnum = so->so_fibnum;
> >> [...]
> >>
> >> This lets at least inp_route uninitialized leading to a crash during
> >> destruction, e.g.
> >>
> >>       if (inp->inp_route.ro_rt) {
> >>           RTFREE(inp->inp_route.ro_rt);
> >>           inp->inp_route.ro_rt = (struct rtentry *)NULL;
> >>       }
> >>
> >> uses uninitialized data.
> >>
> >> Did something in the page allocator change between FreeBSD 9.3 and
> >> trunk, so that page are now zero initialized or is this a bug in
> >> udp_inpcb_init()?
> > Nothing guarantees that returned items are zeroed out automatically.
> > Pages are not returned zeroed by VM page allocator, it might indicate
> > that the allocated page happens to be zeroed by PG_ZERO flag.  Unless
> > M_ZERO is passed to uma_zalloc(), item is not zeroed.
> >
> > I am not sure, you did not demonstrated zone initialization, but it seems
> > that your zone is having zone init function.  If this is true, than I do not
> > see why do you consider it possible to have items automatically zeroed at
> > all.  Uma calls the init function at the bulk allocation time, and constructor
> > for each item returned by uma_zalloc().  No implicit zeroing occurs there.
> 
> Now, I am a bit confused. We have
> 
> /*
>   * Allocate a new slab for a keg.  This does not insert the slab onto a 
> list.
>   *
>   * Arguments:
>   *    wait  Shall we wait?
>   *
>   * Returns:
>   *    The slab that was allocated or NULL if there is no memory and the
>   *    caller specified M_NOWAIT.
>   */
> static uma_slab_t
> keg_alloc_slab(uma_keg_t keg, uma_zone_t zone, int wait)
> {
> [...]
>      /*
>       * This reproduces the old vm_zone behavior of zero filling pages the
>       * first time they are added to a zone.
>       *
>       * Malloced items are zeroed in uma_zalloc.
>       */
> 
>      if ((keg->uk_flags & UMA_ZONE_MALLOC) == 0)
>          wait |= M_ZERO;
>      else
>          wait &= ~M_ZERO;
> [...]
>      /* zone is passed for legacy reasons. */
>      mem = allocf(zone, keg->uk_ppera * PAGE_SIZE, &flags, wait);
> 
> Firstly, the name "wait" and the comment are confusing, since "wait" is 
> really "flags" which may have M_ZERO set. So, for non UMA_ZONE_MALLOC 
> zones (which should be most zones) the page is supposed to be zeroed?
> 
> We ignored the flags in our FreeBSD 9.3 based port and this worked quite 
> well. However, on FreeBSD head I changed the page allocator now to do:
> 
> #ifdef INVARIANTS
>      wait |= M_ZERO;
> #endif
> 
>      if (addr != NULL && (wait & M_ZERO) != 0) {
>          memset(addr, 0, size_in_bytes);
>      }
> 
> With this change, everything works like before.
Is this in your page allocator ?  If yes, then this is how things are
suppossed to work.  If uma requested M_ZERO page, the page must be zeroed.

This is very different from 'always zeroing', which you asked about before.

> 
> >
> > If your zone does not have init/fini functions, nor ctors/dtors, then
> > HEAD and stable/11 actually actively trash items to catch reliance on
> > the uninitialized memory being zero.  Look for trash_ctor in uma_core.c.
> 
> The zone has an init/fini function, see udp_init()
> 
>      /*
>       * For now default to 2-tuple UDP hashing - until the fragment
>       * reassembly code can also update the flowid.
>       *
>       * Once we can calculate the flowid that way and re-establish
>       * a 4-tuple, flip this to 4-tuple.
>       */
>      in_pcbinfo_init(&V_udbinfo, "udp", &V_udb, UDBHASHSIZE, UDBHASHSIZE,
>          "udp_inpcb", udp_inpcb_init, NULL, 0,
>          IPI_HASHFIELDS_2TUPLE);
> 
If you use init/ctr which do not set the allocated object to the
pre-defined values, and do not specify M_ZERO flag, the content of the
bytes which were not touched by init/ctr is not defined. It is not
guaranteed to be zeroed.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20161112155830.GC54029>