Date:      Sat, 25 Oct 2014 20:53:50 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        "Kenneth D. Merry" <ken@FreeBSD.ORG>
Cc:        Harald Schmalzbauer <h.schmalzbauer@omnilan.de>, royger@freebsd.org, FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: sa(4) 9.2->10.1, nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split request
Message-ID:  <20141025175350.GK1877@kib.kiev.ua>
In-Reply-To: <20141024230725.GA50845@mithlond.kdm.org>
References:  <54494E92.5010007@omnilan.de> <20141024230725.GA50845@mithlond.kdm.org>

On Fri, Oct 24, 2014 at 05:07:26PM -0600, Kenneth D. Merry wrote:
> On Thu, Oct 23, 2014 at 20:53:06 +0200, Harald Schmalzbauer wrote:
> >  Hello,
> > 
> > I read about the large-block splitting changes in sa(4)
> > and the transitional 'kern.cam.sa.allow_io_split' workaround.
> > 
> > I'm using bacula (7.0.5), and my previously necessary multi-blocking
> > adjustments like "Minimum block size = 2097152" obviously didn't work
> > with FreeBSD 10.1 anymore.
> > The good news is that they are not needed any more!
> > With the default of 126 blocks (64512 bytes) I get 60-140MB/s with btape(8)'s
> > speed test on my LTO4 (HH) drive, and another quick test showed that
> > using mbuffer(1) for zfs(8) 'send' isn't needed anymore (| dd
> > of=/dev/nsa0 bs=64512 seems to max out LTO4 speed). [With FreeBSD 9 the
> > transfer rates were orders of magnitude lower with these block size settings!]
> > 
> > The not-so-good news is that bacula can't read the tape's label.
> > Labeling a tape (with 'label' in bconsole(8) or btape(8)) is
> > successful, and btape(8)'s 'readlabel' displays the label correctly
> > except for its very beginning:
> > Volume Label:
> > Id : **error**VerNo
> > ...rest OK
> > 
> > While it should read:
> > Volume Label:
> > Id : Bacula 1.0 immortal
> > VerNo : 11
> > ...
> > 
> > When btape(8) starts to read the label, the _subject's error is reported_:
> > *nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split
> > request*
> 
> What blocksize are you using with btape(8)?
> 
> What kind of controller are you using?
> 
> The reason you get that error message is that the sa(4) driver goes through
> physio(9) to get buffers from userland into the kernel.  physio(9) relies
> on the vmapbuf()/vunmapbuf() routines to map buffers in and out of the
> kernel.
> 
> vmapbuf() operates with a page granularity.  The address to be mapped has
> to start on a page boundary.  It also uses kernel virtual address segments
> that are MAXPHYS in size.  On x86 boxes at least, MAXPHYS is 128KB.
> 
> So if you use a blocksize of 128KB, but pass in a pointer that doesn't
> start on a page boundary, vmapbuf() will have to map 33 pages instead of
> 32.  In your case, it will have to start at page address 0x803135000, and
> will need 33 4KB pages, which is greater than 128KB.
I want to disable unaligned physio altogether.

See https://reviews.freebsd.org/D888 for yet another case where this bites.

The obvious thing that stops us from doing this is binary compatibility.
I need some form of wide support to make this change.

> 
> This behavior obviously isn't very user-friendly.
> 
> If you want to avoid the problem, try setting your blocksize in Bacula to
> 4K less than what is reported in kern.cam.sa.0.maxio.  If it's 131072, then
> set the blocksize to 126976.
> 
> Another way to avoid the problem is to increase MAXPHYS.  Increasing it
> beyond kern.cam.sa.0.cpi_maxio won't help anything.  If you increase
> it too much, you can run into other problems.
> 
> That said, though, you can probably bump it to 512K without much worry.
> Put this in your kernel config file and recompile/reinstall your kernel:
> 
> options         MAXPHYS="(512*1024)"
> options         DFLTPHYS="(512*1024)"
> 
> The same thing applies, though -- you'll want to set your blocksize to 1
> page less than kern.cam.sa.0.maxio, since Bacula isn't using page-aligned
> buffers.
> 
> > The same error shows up if I configure bacula to use a fixed block size
> > of kern.cam.sa.0.maxio (131072).
> 
> At that (i.e. the physio(9)) level, variable vs. fixed block mode won't
> matter.
> 
> > As expected, allowing splitting (with kern.cam.sa.allow_io_split in
> > loader.conf) works around the problem.
> > But I'd like to understand why I cannot use kern.cam.sa.0.maxio as the
> > block size, and why btape(8) doesn't work 100% correctly even though
> > blocksize < sa.0.maxio.
> 
> See above.  The unfortunate thing is that with the above setup, I think
> you'll wind up with a bigger block and then a smaller block going onto the
> tape in variable block mode at least.
> 
> This is an example of why I/O splitting is bad -- you don't have good
> visibility from userland into exactly how things are getting put on tape.
> The application writes out what it wants, but it doesn't know what size
> blocks are hitting the tape.
> 
> > I don't understand the code well enough to check myself whether it's a
> > cam/sa(4) issue in FreeBSD or a problem in btape(8) (and also in bacula
> > itself, since the tool most likely shares code with bacula's storage daemon).
> > 
> > Any hints highly appreciated!
> 
> I have considered implementing a custom read/write routine in the sa(4)
> driver to get around some of these issues, but it will require more than
> just sa(4) driver modifications for everything to work optimally.
> 
> With a custom read/write routine, if we copied data into the kernel, we
> could essentially allow any I/O size that the controller and tape drive
> support without altering MAXPHYS.  And alignment issues wouldn't matter,
> either.
> 
> The drawback is that we wouldn't be able to do unmapped I/O for drivers
> that support it.  (Unless the user happened to give us a single buffer that
> we could send down as an unmapped I/O.)  The unmapped I/O code doesn't
> currently handle scatter/gather lists of unmapped buffers.
> 
> Another drawback to copying is its increased overhead versus unmapped
> I/O.  That said, on modern hardware, copying is usually more efficient than
> mapping user memory into the kernel's virtual address space, because of the
> TLB shootdowns that happen with the mapping operation.
> 
> For tape users with just one tape drive, the overhead wouldn't be a big
> deal.  If you have lots of tape drives attached to one machine, though, it
> could have a noticeable effect.
> 
> Ken
> -- 
> Kenneth Merry
> ken@FreeBSD.ORG