Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 19 Aug 2022 10:59:08 +0200
From:      Philipp <satanist+freebsd@bureaucracy.de>
To:        questions@freebsd.org
Subject:   Re: zfs and git upload-pack
Message-ID:  <97376e6a2eefd7e62e2118d505cd7584.philipp@bureaucracy.de>
In-Reply-To: <34cea530-841a-527c-b080-ab3e1379d86f@holgerdanske.com>
References:  <20220807102839.7c69f387@bureaucracy.de> <a94ec43d-8ae8-fd20-a5a9-0e1e67750107@holgerdanske.com> <20220807195750.0233e2f3@bureaucracy.de> <348470bf-0f11-f7b3-e782-881c3f864ffb@holgerdanske.com> <20220807211348.401ee1c3@bureaucracy.de> <62ced745-9db4-021f-ae0a-fdb4aba03a13@holgerdanske.com> <92ca8cb253ec6c84e44c76d82e98c1e1.philipp@bureaucracy.de> <34cea530-841a-527c-b080-ab3e1379d86f@holgerdanske.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[2022-08-09 17:40] David Christensen <dpchrist@holgerdanske.com>
> On 8/9/22 06:16, Philipp wrote:
> > [2022-08-07 20:52] David Christensen
> >> On 8/7/22 12:13, Philipp Takacs wrote:
> >>> On Sun, 7 Aug 2022 11:12:20 -0700 David Christensen
> >>>> On 8/7/22 10:57, Philipp Takacs wrote:
> >>>>> On Sun, 7 Aug 2022 09:54:41 -0700 David Christensen
> >>>>>> On 8/7/22 01:28, Philipp wrote:
> >>>>>>> Hi all
> >>>>>>>
> >>>>>>> I host a quite uncommon git repository mostly out of binary
> >>>>>>> files. I have the problem every time this repo is cloned the host
> >>>>>>> allocate memory and going to swap. This leads to the host being
> >>>>>>> unusable and need to force rebooted.
> >>>>>>>
> >>>>>>> The repo is stored on a zfs and nullmounted in a jail to run the
> >>>>>>> git service over ssh. The host is a FreeBSD 13.1 with 4GB RAM and
> >>>>>>> 4GB swap.
> >>>>>>>
> >>>>>>> What I have noticed is that the biggest memory consumtion is from
> >>>>>>> mmap() a pack file. For the given repo this has the size of 6,7G.
> >>>>>>> I suspect this file is mapped in memory but not correctly
> >>>>>>> handled/unmaped (by the kernel) when not enough memory is
> >>>>>>> available.
> >>>>>>>
> >>>>>>> I have tested some options to solve/workaround this issue:
> >>>>>>>
> >>>>>>> * limit the zfs ARC size in loader.conf
> >>>>>>> * zfs set primarycache none for the dataset
> >>>>>>> * limit datasize, memoryuse and vmemoryuse via login.conf
> >>>>>>> * limit git packedGitLimit
>
> > I have [restored the options to previous values]. Now the behavior has =
changed. Now one clone was
> > succsessfull and at later clones stop with an error (Cannot allocate
> > memory). This is better but still not good.
>
>
> If you reboot the server, does the clone work the first them and then =

> always fail?

I have currently not that mutch time, I might test this next week.

> >> What happens if the clone is attempted by a different user on the same
> >> workstation?
> >>
> >>
> >> What happens if the clone is attempted from another workstation?
> > =

> > The same as described.
>
>
> That supports a hypothesis that the problem is the server.
>
>
> >> Please post client console sessions that demonstrate correct operation
> >> and failed operation.
>
> > successfull:
> > =

> > satanist@hell tmp$ git clone -v ssh://bigrepo@git.bureaucracy.de:2222/b=
igrepo
> > Cloning into 'bigrepo'...
> > remote: Objekte aufz=C3=A4hlen: 9661, fertig.
> > remote: Gesamt 9661 (Delta 0), Wiederverwendet 0 (Delta 0), Pack wieder=
verwendet 9661
> > Receiving objects: 100% (9661/9661), 6.73 GiB | 5.96 MiB/s, done.
> > Resolving deltas: 100% (3/3), done.
> > Updating files: 100% (6591/6591), done.
>
>
> Are those status messages in German?  I do not know German...

A sorry havent paied attention. I can translate:

aufz=C3=A4hlen =3D> counting
Gesamt =3D> total
Wiederverwendet =3D> reused

> > unsuccessfull:
> > =

> > satanist@hell tmp$ git clone -v ssh://bigrepo@git.bureaucracy.de:2222/b=
igrepo
> > Cloning into 'bigrepo'...
> > remote: Enumerating objects: 9661, done.                               =
                       Rerror: git upload-pack: git-pack-ob
> jects died with error.iB/s
> > fatal: git upload-pack: aborting due to possible repository corruption =
on the remote side.
> > remote: fatal: packfile ./objects/pack/pack-6fee671a31a59454b539c88d674=
373d88ad67780.pack cannot be mapped: Cannot allocate memory
> > remote: aborting due to possible repository corruption on the remote si=
de.
> > fatal: early EOF
> > fatal: index-pack failed
> > =

> > As mentioned earlier the "Cannot allocate memory" is new. The old
> > behavior was that the server was unusable till I restarted the server.
> > I currently don't know how this exactly looks on the client, but there
> > is not mutch info in the output.
>
>
> Now the status messages are in English.  Do you know why the language =

> changed?

Because I lied a bit. The first clone was directly from the host. The
other one was over the jail responsible for git. There are a bit
diffrent lang settings and the first one don't use fugit. Also the repo
is read from a nullfs which points to the same data. But the "Cannot
allocate memory" happends indipendet of this.

> Have you run 'git fsck'?
>
> https://git-scm.com/docs/git-fsck

Not before, but this currently behaves like the git clone before
restoring the defaults. So free memory goes down to 85M and server
start swapping. I have killed it now, before the server was unusable.

I have tested this on a local copy created with scp. No error.

> Does the server have ECC memory?

No

> >>> This is a server, a client connect with a
> >>> git client over ssh and use git-upload-pack
> >>
> >>
> >> https://git-scm.com/docs/git-upload-pack
> > =

> > Yes this programm, but I post hear because I susspect this is an freebs=
d
> > issue not an issue with git. This programm basicly mmap() some files,
> > parse them and write parts (based on stdin) of the content to stdout.
>
>
> I desire stable OS's, so I typically use an older production release. =

> My servers have:
>
> 2022-08-09 16:17:40 dpchrist@f3 ~
> $ freebsd-version ; uname -a
> 12.3-RELEASE-p5
> FreeBSD f3.tracy.holgerdanske.com 12.3-RELEASE-p5 FreeBSD =

> 12.3-RELEASE-p5 GENERIC  amd64
>
>
> Perhaps you should grab a blank SSD, do an install of =

> FreeBSD-12.3-RELEASE, install and set up the software you need without =

> using jails, restore the Git repository onto it, and test if that works.

I have not the time to reinstall the complete server (ldap, mailserver,
http, shell, ...). Also I don't have the money to buy extra hardware.

> >>> to receive the content of
> >>> the repo. The communication of the git client and git-upload-pack wor=
ks
> >>> with stdin/stdout. I can give the logs of my git authorization handle=
r
> >>> (inside jail):
> >>
> >> <snip>
> >>
> >> What file?
> > =

> > This file is called fugit.log and is created by a authorization handler
> > for git over ssh called fugit. I use fugit to manage authorization for
> > multible git repositories. See https://github.com/cbdevnet/fugit/blob/m=
aster/fugit
>
>
> Is fugit is validated and supported on your OS?

Not exactly I have tested it and patched the issues I have found.

> Is there a support community?

Yes, but it's small. I would guess 3 people including me.

> Has anyone else seen this problem?

No, but the problem also ocourse if I clone the repo without fugit.

> Has anyone else seen this problem on Git?

I haven't found any repots.

> > So it looks like the git-upload-pack and the corresponding
> > IO causes a lot of cache allocation. This leads to no free memory left
> > for the rest of the operation of the server.
>
>
> As the clone works the first time, and then fails every time thereafter =

> with memory-related errors, perhaps there is a memory leak?

This is also my guess, the question is where and how to fix it. I
suspect some fs cache. Because as I mentioned in my first mail, after
a clone free memory stay low till I unmount the zfs dataset.


> > I don't know where to increase the verbosity. ssh just starts a session
> > and goes on. The logs of fugit are already posted completly. git-upload=
-pack
> > does not log.
>
>
> Perhaps we can explore that later.
>
>
> >> # zpool iostat -v 60
> > =

> > This looks quite normal here parts of the output:
> > =

> > Normal operation without a git clone running:
> > =

> > pool        alloc   free   read  write   read  write
> > ----------  -----  -----  -----  -----  -----  -----
> > zroot       51.8G   408G      3     14  25.4K   150K
> >    mirror-0  51.8G   408G      3     14  25.4K   150K
> >      ada1p3      -      -      1      7  12.2K  74.8K
> >      ada0p3      -      -      1      7  13.2K  74.8K
> > ----------  -----  -----  -----  -----  -----  -----
> > =

> > During a clone:
> > =

> >                capacity     operations     bandwidth
> > pool        alloc   free   read  write   read  write
> > ----------  -----  -----  -----  -----  -----  -----
> > zroot       51.8G   408G     22     24  2.87M   226K
> >    mirror-0  51.8G   408G     22     24  2.87M   226K
> >      ada1p3      -      -     11     12  1.47M   113K
> >      ada0p3      -      -     11     12  1.40M   113K
> > ----------  -----  -----  -----  -----  -----  -----
> > =

> > As expected the read goes up during the the clone. But not
> > to a level I have conserne about the load.
>
>
> I assume ada0 and ada1 are HDD's (?).  (If so, an SSD cache device =

> really helps; but that should be unrelated to this issue.)

This are SSD's.


> Does the "During a clone" output correspond to the first clone after a =

> reboot or to a second or subsequent clone?  You want to look at the =

> former; the latter may not show much because of caching.

I belive it was during a failing one.

I can check for the output of the first one after boot next week.

> >> Start the following command in another terminal on the server to monit=
or
> >> CPU and/or IO activity (press 'm' to switch between the two) (press 'q=
'
> >> to exit):
> >>
> >> # top -S -s 60
> > =

> > This also looks as expected. The git process (chiled of git-upload-pack=
)
> > uses cpu and memory also creates IO. I have some output some secounds
> > befor the git was killed (sorted by RES):
> > =

> > Mem: 598M Active, 426M Inact, 166M Laundry, 1223M Wired, 1020M Free
> > ARC: 543M Total, 185M MFU, 82M MRU, 16K Anon, 8618K Header, 267M Other
> >       98M Compressed, 240M Uncompressed, 2,45:1 Ratio
> > Swap: 4096M Total, 17M Used, 4079M Free
> >   Displaying CPU statistics.
> >    PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU=
 COMMAND
> > 45719 satanist      1  30    0  1289M   986M pipdwt   2   0:26  17,77% =
git
> > 53388  10001       44  52    0  2755M   237M uwait    2   2:23   0,15% =
java
> > =

> > After the java process there are only processes with less then 100MB
> > reserved.
> > =

> > I don't know excactly, but it looks like RES and SIZE adds memory
> > allocation and memory mapped files. In this case I would argue there is
> > sufficient memory availible to drop, because it can be read from disk.
>
>
> Check 'man top' for SIZE and RES field definitions just to be sure.  My =

> top(1) looks somewhat different.

top(1):
> SIZE is the total size of the process (text, data, and stack),
> RES is the current amount of resident memory

So I would interprete this as the mmap is included.

> The difference between SIZE and RES is 303M for git and 2518M for java; =

> 2821M total.  This total is larger than any Mem, ARC (ZFS), or Swap =

> statistic.  My guess is that the overall statistics do not include =

> memory-mapped files (?).

No this only means the memory was allocated but not in physical
memory. For the java process this is propably some libs which are =

currently not used. For git RES goes constanly up till RES and SIZE is
about the same and then it tries to mmap the next junk of the file.


> See if your top(1) can display the "RSfd" field -- Resident File-Backed =

> Memory Size, which should include file mappings.

I haven't found an option to do so.

> "1020M Free" is shown.  Was the server hung when you ran top(1)?

No. Currently no hung during clone.


> Does the server have a console?  Do you have access?  If you log in to =

> the console before testing and renice that process to a negative =

> priority, the console might keep working when everything else hangs.  If =

> you try the same trick over SSH, it could interact with SSH and Git.

The server has a console over IPMI. I can request a shell for IPMI
access from my provider. I can try to monitor it over this console
next week.

Philipp



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?97376e6a2eefd7e62e2118d505cd7584.philipp>