Date:      Fri, 9 Mar 2018 11:08:54 -0500
From:      Mark Johnston <markj@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r330663 - head/sys/kern
Message-ID:  <20180309160854.GC6174@raichu>
In-Reply-To: <20180309150402.X950@besplex.bde.org>
References:  <201803081704.w28H4aQx052056@repo.freebsd.org> <20180309150402.X950@besplex.bde.org>

On Fri, Mar 09, 2018 at 03:42:05PM +1100, Bruce Evans wrote:
> On Thu, 8 Mar 2018, Mark Johnston wrote:
> 
> > Log:
> >  Return E2BIG if we run out of space writing a compressed kernel dump.
> 
> E2BIG is a very wrong errno.  It means "Argument list too long".  It is broken
> as designed, with "too" encrypted as "2" and no indication of what is too
> big.  EFBIG is not so wrong.  It means "File too large".

There is explicit handling for E2BIG in most of the (mini)dumpsys()
implementations, which is why I chose it. In particular, amd64's
minidumpsys() retries the dump upon receiving ENOSPC from the MI code,
but E2BIG simply causes the dump to fail:

        else if (error == E2BIG)
                printf("Dump failed. Partition too small.\n");
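
To make the distinction concrete, here is a minimal userland sketch of the
dispatch described above: ENOSPC triggers another pass, while E2BIG (or any
other error) ends the dump. This is not the amd64 minidumpsys() code. The
names dump_with_retry(), dump_pass_fn, struct fake_dump and fake_pass() are
hypothetical, and the retry limit is a parameter rather than whatever the
kernel actually uses.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical callback standing in for one pass of the MI dump code. */
typedef int (*dump_pass_fn)(void *arg);

/*
 * Run a dump pass, retrying only while it reports ENOSPC and the retry
 * budget is not exhausted.  E2BIG and all other errors are final.
 * Returns the error from the last pass (0 on success).
 */
static int
dump_with_retry(dump_pass_fn pass, void *arg, int max_retries)
{
	int error, retries;

	retries = 0;
	for (;;) {
		error = pass(arg);
		if (error != ENOSPC || retries >= max_retries)
			break;
		retries++;
		printf("Retrying dump (attempt %d)...\n", retries + 1);
	}

	if (error == E2BIG)
		printf("Dump failed. Partition too small.\n");
	else if (error != 0)
		printf("Dump failed. Error %d\n", error);
	return (error);
}

/* Hypothetical scripted dump pass, used only to exercise the loop. */
struct fake_dump {
	int calls;		/* passes attempted so far */
	int enospc_passes;	/* leading passes that return ENOSPC */
	int final_error;	/* result of every later pass */
};

static int
fake_pass(void *arg)
{
	struct fake_dump *fd = arg;

	fd->calls++;
	if (fd->calls <= fd->enospc_passes)
		return (ENOSPC);
	return (fd->final_error);
}
```

With this shape, a compressed dump that genuinely runs out of space can
report E2BIG and fail after a single pass instead of burning through the
retry budget.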

> >  ENOSPC causes the MD kernel dump code to retry the dump, but this is
> >  undesirable in the case where we legitimately ran out of space.
> 
> ENOSPC is the correct errno.  It means "[really] No space left on device".
> The bug was either retrying or possibly abusing ENOSPC instead of EAGAIN
> to mean "transiently out of space for something".

When writing an uncompressed dump, the starting offset is chosen so
that the end of the dump lines up with the end of the dump device. If we
attempt to write past the end of the dump, the presumption is that
something caused pages to be added to the dump map during the dump, and
we should retry with a different starting offset. EAGAIN seems like a
reasonable error number for this case, but it's somewhat unsatisfying
since these checks were originally meant to stop programming errors from
scribbling over a filesystem.
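
As a rough illustration of the offset arithmetic, the sketch below picks a
starting offset so that the dump ends exactly at the end of the device, and
checks each write against the device bounds. It is a simplified userland
model, not the kernel's code: struct dump_dev, dump_start_offset() and
dump_check_bounds() are hypothetical names here, the real code also has to
account for headers and other metadata, and the errno choices just mirror
the ones discussed in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical dump device: the byte range [start, start + size). */
struct dump_dev {
	uint64_t start;
	uint64_t size;
};

/*
 * Choose the starting offset so that a dump of dump_len bytes ends at
 * the end of the device.  Fails up front if the dump cannot fit at all.
 */
static int
dump_start_offset(const struct dump_dev *dev, uint64_t dump_len,
    uint64_t *offp)
{
	if (dump_len > dev->size)
		return (E2BIG);
	*offp = dev->start + dev->size - dump_len;
	return (0);
}

/*
 * Reject any write that would fall outside the device.  If the dump grew
 * after the starting offset was chosen, a later write trips this check.
 * The comparison is arranged to avoid unsigned overflow.
 */
static int
dump_check_bounds(const struct dump_dev *dev, uint64_t off, uint64_t len)
{
	if (off < dev->start || len > dev->size ||
	    off - dev->start > dev->size - len)
		return (ENOSPC);
	return (0);
}
```

For example, with a device starting at byte 1000 and 500 bytes long, a
200-byte dump starts at offset 1300. If the dump grows by even one byte
after that offset is chosen, the final write fails the bounds check, which
is the case the retry is meant to handle.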

I wonder if the retry logic in amd64's minidumpsys() is really useful at
all. It was added in r215133 and copied to arm64, but isn't present in
any other MD dump code. I've never seen a kernel dump succeed after a
retry; for instance, the problem addressed in r329521 simply caused us
to retry a number of times before giving up. The description for
Differential D8254 also suggests that the retry logic is of questionable
value.


