Date:      Sat, 20 Aug 2011 20:41:47 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        alc@freebsd.org
Cc:        freebsd-stable@freebsd.org, perryh@pluto.rain.com, "Alexander V. Chernikov" <melifaro@ipfw.ru>, daniel@digsys.bg
Subject:   Re: 32GB limit per swap device?
Message-ID:  <20110820174147.GW17489@deviant.kiev.zoral.com.ua>
In-Reply-To: <CAJUyCcMc7m65c_XjHNFi0A4cHHySC1brLS7HdivstxeOi6uFQw@mail.gmail.com>
References:  <4E4143A6.6030307@digsys.bg> <935F8EC2-88E0-45A3-BE8B-7210BE223BC5@mac.com> <4e42a0c0.e2t/9MF98O3HFjb1%perryh@pluto.rain.com> <4E4CCA6C.8020408@ipfw.ru> <CAJUyCcMc7m65c_XjHNFi0A4cHHySC1brLS7HdivstxeOi6uFQw@mail.gmail.com>

On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
> On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov <melifaro@ipfw.ru> wrote:
>
> > On 10.08.2011 19:16, perryh@pluto.rain.com wrote:
> >
> >> Chuck Swiger<cswiger@mac.com>  wrote:
> >>
> >>  On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
> >>>
> >>>> I am trying to set up 64GB partitions for swap for a system that
> >>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on
> >>>> 8-stable as of today I get:
> >>>>
> >>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit
> >>>>
> >>>> Is there workaround for this limitation?
> >>>>
> >>>
> > Another interesting question:
> >
> > The swap pager operates in page-sized blocks (PAGE_SIZE=4k on common arch).
> >
> > The block device size is passed to swaponsomething() as a number of _disk_
> > blocks (e.g. in DEV_BSIZE=512). After that, the maximum-objects check of the
> > kernel b-lists (on top of which the swap pager is built) is enforced.
> >
> > The (possible) problem is that the real object count we will operate on is
> > not the value passed to swaponsomething(), since the check is done in the
> > wrong units.
> >
> > We should check the b-list limit on the (X * DEV_BSIZE / PAGE_SIZE) value,
> > which is roughly (X / 8), so we should be able to address 32*8=256G.
> >
> > The code should look like this:
> >
> > Index: vm/swap_pager.c
> > ===================================================================
> > --- vm/swap_pager.c     (revision 223877)
> > +++ vm/swap_pager.c     (working copy)
> > @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
> >        u_long mblocks;
> >
> >        /*
> > +        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
> > +        * First chop nblks off to page-align it, then convert.
> > +        *
> > +        * sw->sw_nblks is in page-sized chunks now too.
> > +        */
> > +       nblks &= ~(ctodb(1) - 1);
> > +       nblks = dbtoc(nblks);
> > +
> > +       /*
> >
> >         * If we go beyond this, we get overflows in the radix
> >         * tree bitmap code.
> >         */
> > @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
> >                        mblocks);
> >                nblks = mblocks;
> >        }
> > -       /*
> > -        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
> > -        * First chop nblks off to page-align it, then convert.
> > -        *
> > -        * sw->sw_nblks is in page-sized chunks now too.
> > -        */
> > -       nblks &= ~(ctodb(1) - 1);
> > -       nblks = dbtoc(nblks);
> >
> >        sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
> >        sp->sw_vp = vp;
> >
> >
> > (move pages recalculation before b-list check)
> >
> >
> > Can someone comment on this?
> >
> >
> I believe that you are correct.  Have you tried testing this change on a
> large swap device?
I probably agree too, but I am in the process of re-reading the swap code,
and I do not quite believe in the limit.
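
For reference, the arithmetic behind the quoted proposal can be sketched
as below. This is only an illustration, not the kernel code: the SK_*
constants and both helper functions are hypothetical names, the limit
value is taken from the WARNING message quoted above, and the block and
page sizes are the 512 and 4k values mentioned in the thread.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel constants discussed above. */
#define SK_DEV_BSIZE   512ull        /* disk block size from the thread */
#define SK_PAGE_SIZE   4096ull       /* page size on common architectures */
#define SK_MAX_BLOCKS  67108864ull   /* limit printed in the WARNING */

/* Bytes addressable when the limit is applied to units of unit_size. */
static uint64_t
sk_limit_bytes(uint64_t max_units, uint64_t unit_size)
{
	return (max_units * unit_size);
}

/*
 * Same order as the proposed patch: page-align the DEV_BSIZE block
 * count, convert it to pages, and only then clamp to the limit.
 */
static uint64_t
sk_clamp_pages(uint64_t nblks)
{
	uint64_t per_page = SK_PAGE_SIZE / SK_DEV_BSIZE;  /* 8 */
	uint64_t npages;

	nblks &= ~(per_page - 1);    /* chop off the partial page */
	npages = nblks / per_page;   /* disk blocks -> pages */
	return (npages > SK_MAX_BLOCKS ? SK_MAX_BLOCKS : npages);
}
```

With these numbers, the limit applied to disk blocks gives 32G, while
the same limit applied to pages gives 256G, so a 64G device would pass
the check unclamped.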

When the initial code was committed, our daddr_t was 32 bits wide; I checked
the RELENG_4 sources. The current code uses int64_t for daddr_t. My impression
right now is that we only utilize the low 32 bits of daddr_t.

Especially interesting is the following typedef:
typedef	uint32_t	u_daddr_t;	/* unsigned disk address */
which (correctly) means that the typical mask (u_daddr_t)-1 is 0xffffffff.
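
A minimal sketch of what that typedef implies in plain C follows. The
sk_* type and function names are hypothetical stand-ins chosen to avoid
clashing with system headers; the sketch shows only the language
semantics and makes no claim about where the swap code actually narrows
a block number.

```c
#include <stdint.h>

typedef int64_t  sk_daddr_t;    /* stand-in for the 64-bit daddr_t */
typedef uint32_t sk_u_daddr_t;  /* stand-in for the 32-bit u_daddr_t */

/* The "all ones" mask for the 32-bit type. */
static sk_u_daddr_t
sk_full_mask(void)
{
	return ((sk_u_daddr_t)-1);
}

/*
 * Narrowing a 64-bit block number to the 32-bit type keeps only the
 * low 32 bits (conversion to an unsigned type is modulo 2^32).
 */
static sk_u_daddr_t
sk_narrow(sk_daddr_t blk)
{
	return ((sk_u_daddr_t)blk);
}
```

So any block number at or above 2^32 loses its high bits once it passes
through a variable of the 32-bit type.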

I wonder whether we could just use the full 64 bits and de facto remove the
limitation on the swap partition size.



