Date:      Thu, 30 Jun 2005 23:00:56 +0200
From:      Eirik Øverby <eirik@unicore.no>
To:        Brian Fundakowski Feldman <green@freebsd.org>
Cc:        stable@freebsd.org
Subject:   Re: Jails that won't die...
Message-ID:  <67DA5F6F-62D2-4371-8707-CFB06B16E269@unicore.no>
In-Reply-To: <20050630205629.GG1074@green.homeunix.org>
References:  <92135CB3-5540-4D06-A991-708C8AAD6AC7@unicore.no> <20050628145859.GC1074@green.homeunix.org> <CA38D1F9-3976-4DE9-BED1-DB8935EDD1D4@unicore.no> <20050629185803.GE1074@green.homeunix.org> <23ED6035-A1AE-4F38-853F-D0D42D42E934@unicore.no> <20050630205629.GG1074@green.homeunix.org>


On 30. jun. 2005, at 22.56, Brian Fundakowski Feldman wrote:

> On Thu, Jun 30, 2005 at 03:53:56PM +0200, Eirik Øverby wrote:
>
>>
>> On 29. jun. 2005, at 20.58, Brian Fundakowski Feldman wrote:
>>
>>
>>> On Wed, Jun 29, 2005 at 03:28:09PM +0200, Eirik Øverby wrote:
>>>
>>>
>>>>
>>>> On 28. jun. 2005, at 16.58, Brian Fundakowski Feldman wrote:
>>>>
>>>>
>>>>
>>>>> On Tue, Jun 28, 2005 at 10:37:29AM +0200, Eirik Øverby wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have, since upgrading to 5.x and updating my management tools,
>>>>>> seen a number of problems relating to stopping jails.
>>>>>>
>>>>>> I'm maintaining several hosts with a number of full-featured jails
>>>>>> (i.e. full virtual FreeBSD installations in each jail), and in
>>>>>> general this works fine. However, whenever I stop a jail using
>>>>>> 'jexec <id> kill -SIGNAL -1' or 'jexec <id> /bin/sh
>>>>>> /etc/rc.shutdown' (in various combinations), jails have a tendency
>>>>>> to stick around for minutes or hours - according to 'jls'. Often I
>>>>>> see an entry in 'netstat -a' indicating that there are one or more
>>>>>> sockets in FIN_WAIT state, preventing the jail from coming down.
>>>>>> Taking the virtual network interface (alias) down does not help.
>>>>>> All I can do at this point is wait.
>>>>>>
>>>>>> I normally use 'jls' to determine whether or not a jail can be
>>>>>> restarted (i.e. it's not running), but this is pretty useless in
>>>>>> such cases. And right now I have a case where 'netstat -a' shows me
>>>>>> nothing pertaining to the jail, though it has no processes running.
>>>>>> I have therefore force-started the jail again, which seems to work
>>>>>> nicely, but now 'jls' gives me two entries for this jail, with
>>>>>> different JIDs.
>>>>>>
>>>>>> What am I doing wrong here?
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> You could just use ps to check for jailed processes and check their
>>>>> respective jails using the procfs status entry (at least according
>>>>> to the ps manpage...)
>>>>>
>>>>>
>>>>
>>>> My jailctl script can do both - list by jls and list by processes in
>>>> the jail. There are NO processes running in the jail.
>>>>
>>>>
>>>
>>> So it's obviously not running, and you can mark its state as such.
>>>
>>
>> ...which is what I do on FreeBSD 4.x, but on 5.x the 'jls' command
>> still claims the jail is running. I think this is unbelievably
>> dirty. Also, using /proc to determine if a jail is still running is a
>> bad idea, as mounting /proc is deprecated.
>>
>
> The deprecation is due to security concerns, not bit-rot.  You can
> just mount it with root-readable-only permissions.  The jls for
> current isn't incorrect, you're just expecting a different criterion to
> mean "alive" than it is using.  It would take increased kernel
> complexity to do what you want if you're not going to do it in
> userland.

I am aware of that. However, I have seen instabilities with /proc as
well, but that's another story.

> Anyway, why aren't you just using a /var/run file in the "real" system
> to tell whether the jail is running or not?  It's the corollary to
> pid files versus doing "killall"...  Just seems like something really
> trivial to implement as you like it in the userland.

Sure, this is what I fall back on when running my jailctl script
(/usr/ports/sysutils/jailctl) on 4.x. However, I NEED 'jls' to be
correct, because I use it to inject other processes (like executing
shutdown scripts inside the jails when taking them down, etc.). I
suppose I could sort the output of jls on jail id and always use
whichever instance of a jail has the highest ID, but I don't know how
these IDs work - if they are recycled, if they "wrap around" at some
point, etc.
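
To make the "highest ID wins" idea concrete, here is a rough sketch in
Python. It is only an illustration: it assumes jls prints a header row
followed by whitespace-separated columns in the order JID, IP address,
hostname, path, and it assumes a higher JID always means a newer
instance, which is exactly the part I am unsure about. The function
name newest_jails is made up for this example.

```python
def newest_jails(jls_output):
    """Map each jail hostname to the highest JID seen in jls-style text.

    Assumed column order (not verified here): JID, IP address, hostname, path.
    Assumed semantics: a higher JID belongs to a more recently started jail.
    """
    newest = {}
    for line in jls_output.splitlines():
        fields = line.split()
        # Skip the header row and any line that does not start with a numeric JID.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        jid, hostname = int(fields[0]), fields[2]
        # Keep only the largest JID recorded for this hostname.
        if jid > newest.get(hostname, -1):
            newest[hostname] = jid
    return newest
```

The result could then be fed to 'jexec <id> ...' for the hostname in
question. If JIDs are ever recycled or wrap around, this selection
would pick the wrong instance, so it is a workaround at best.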

In any case it would be nice to know exactly which criteria jls uses,
and perhaps to have a way to clear whichever criterion keeps it
thinking the jail is still running.

Thing is - sometimes jails stop just fine. Other times they don't. It
all depends. Perhaps I should get lsof or something, see if there are
any open files (though I think I tried once without finding any)...

/Eirik

>
> -- 
> Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
>   <> green@FreeBSD.org                               \  The Power to Serve! \
>  Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\
>



