Date: Sat, 25 Oct 2014 20:53:50 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: "Kenneth D. Merry" <ken@FreeBSD.ORG>
Cc: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>, royger@freebsd.org, FreeBSD Stable <freebsd-stable@freebsd.org>
Subject: Re: sa(4) 9.2->10.1, nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split request
Message-ID: <20141025175350.GK1877@kib.kiev.ua>
In-Reply-To: <20141024230725.GA50845@mithlond.kdm.org>
References: <54494E92.5010007@omnilan.de> <20141024230725.GA50845@mithlond.kdm.org>
On Fri, Oct 24, 2014 at 05:07:26PM -0600, Kenneth D. Merry wrote:
> On Thu, Oct 23, 2014 at 20:53:06 +0200, Harald Schmalzbauer wrote:
> > Hello,
> >
> > I read about the changes in sa(4) regarding large-block splitting
> > and the transitional 'kern.cam.sa.allow_io_split' workaround.
> >
> > I'm using bacula (7.0.5), and my previously necessary multi-blocking
> > adjustments like "Minimum block size = 2097152" obviously didn't work
> > with FreeBSD 10.1 anymore.
> > The good news is, they are not needed any more!
> > With the default of 126 blocks (64512) I get 60-140MB/s with btape(8)'s
> > speed test on my LTO4 (HH) drive, and another quick test showed that
> > using mbuffer(1) for zfs(8) 'send' isn't needed anymore (| dd
> > of=/dev/nsa0 bs=64512 seems to max out LTO4 speed). [With FreeBSD 9 the
> > transfer rates were orders of magnitude lower with these block size settings!]
> >
> > The not-so-good news is that bacula can't read the tape's label.
> > Labeling a tape (with 'label' at bconsole(8) or btape(8)) is
> > successful, and btape(8)'s 'readlabel' partially displays the correct
> > label, but not the very beginning of the label:
> > Volume Label:
> > Id : **error**VerNo
> > ...rest OK
> >
> > While it should read:
> > Volume Label:
> > Id : Bacula 1.0 immortal
> > VerNo : 11
> > ...
> >
> > When btape(8) starts to read the label, the _subject's error is reported_:
> > *nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split
> > request*
>
> What blocksize are you using with btape(8)?
>
> What kind of controller are you using?
>
> The reason you get that error message is that the sa(4) driver goes through
> physio(9) to get buffers from userland into the kernel. physio(9) relies
> on the vmapbuf()/vunmapbuf() routines to map buffers in and out of the
> kernel.
>
> vmapbuf() operates with page granularity. The address to be mapped has
> to start on a page boundary. It also uses kernel virtual address segments
> that are MAXPHYS in size.
> On x86 boxes at least, MAXPHYS is 128KB.
>
> So if you use a blocksize of 128KB, but pass in a pointer that doesn't
> start on a page boundary, vmapbuf() will have to map 33 pages instead of
> 32. In your case, it will have to start at page address 0x803135000, and
> will need 33 4KB pages (132KB), which is greater than 128KB.

I want to disable unaligned physio altogether. See https://reviews.freebsd.org/D888
for yet another case where this bites. The obvious thing that stops us from doing
this is binary compatibility. I need some form of broad support to make this
change.

>
> This behavior obviously isn't very user friendly.
>
> If you want to avoid the problem, try setting your blocksize in Bacula to
> 4K less than what is reported in kern.cam.sa.0.maxio. If it's 131072, then
> set the blocksize to 126976.
>
> Another way to avoid the problem is to increase MAXPHYS. Increasing it
> beyond kern.cam.sa.0.cpi_maxio won't help anything. If you increase
> it too much, you can run into other problems.
>
> That said, though, you can probably bump it to 512K without much worry.
> Put this in your kernel config file and recompile/reinstall your kernel:
>
> options MAXPHYS="(512*1024)"
> options DFLTPHYS="(512*1024)"
>
> The same thing applies, though -- you'll want to set your blocksize to 1
> page less than kern.cam.sa.0.maxio, since Bacula isn't using page-aligned
> buffers.
>
> > The same error shows up if I configure bacula to use a fixed block size
> > of kern.cam.sa.0.maxio (131072).
>
> At that (i.e. the physio(9)) level, variable vs. fixed block mode won't
> matter.
>
> > As expected, allowing splitting (with kern.cam.sa.allow_io_split in
> > loader.conf) works around that problem.
> > But I'd like to understand why I cannot set kern.cam.sa.0.maxio, or rather
> > why btape(8) doesn't work 100% correctly although blocksize < sa.0.maxio.
>
> See above.
> The unfortunate thing is that with the above setup, I think
> you'll wind up with a bigger block and then a smaller block going onto the
> tape, in variable block mode at least.
>
> This is an example of why I/O splitting is bad -- you don't have good
> visibility from userland into exactly how things are getting put on tape.
> The application writes out what it wants, but it doesn't know what size
> blocks are hitting the tape.
>
> > I don't have enough understanding to check the code myself, whether it's a
> > cam/sa(4) issue in FreeBSD or a problem in btape(8) (and also bacula
> > itself; most likely the tool shares the code with bacula's storage daemon).
> >
> > Any hints highly appreciated!
>
> I have considered implementing a custom read/write routine in the sa(4)
> driver to get around some of these issues, but it will require more than
> just sa(4) driver modifications for everything to work optimally.
>
> With a custom read/write routine, if we copied data into the kernel, we
> could essentially allow any I/O size that the controller and tape drive
> support without altering MAXPHYS. And alignment issues wouldn't matter,
> either.
>
> The drawback is that we wouldn't be able to do unmapped I/O for drivers
> that support it. (Unless the user happened to give us a single buffer that
> we could send down as an unmapped I/O.) The unmapped I/O code doesn't
> currently handle scatter/gather lists of unmapped buffers.
>
> Another drawback to copying is the increased overhead of copying versus
> unmapped I/O. On modern hardware, though, copying is usually more efficient
> than mapping user memory into the kernel's virtual address space, because
> of the TLB shootdowns that happen with the mapping operation.
>
> For tape users with just one tape drive, the overhead wouldn't be a big
> deal. If you have lots of tape drives attached to one machine, though, it
> could have a noticeable effect.
> > Ken > -- > Kenneth Merry > ken@FreeBSD.ORG > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20141025175350.GK1877>