Date: Sat, 12 May 2001 16:58:14 -0700 (PDT) From: Matt Dillon <dillon@earth.backplane.com> To: Rik van Riel <riel@conectiva.com.br> Cc: arch@freebsd.org, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping Message-ID: <200105122358.f4CNwEr20137@earth.backplane.com> References: <Pine.LNX.4.21.0105121109210.5468-100000@imladris.rielhome.conectiva>
next in thread | previous in thread | raw e-mail | index | archive | help
: :But if the larger processes never get a chance to make decent :progress without thrashing, won't your system be slowed down :forever by these (thrashing) large processes? : :It's nice to protect your small processes from the large ones, :but if the large processes don't get to run to completion the :system will never get out of thrashing... Consider the case where you have one large process and many small processes. If you were to skew things to allow the large process to run at the cost of all the small processes, you have just inconvenienced 98% of your users so one ozob can run a big job. Not only that, but there is no guarentee that the 'big job' will ever finish (a topic of many a paper on scheduling, BTW)... what if it's been running for hours and still has hours to go? Do we blow away the rest of the system to let it run? What if there are several big jobs? If you skew things in favor of one the others could take 60 seconds *just* to recover their RSS when they are finally allowed to run. So much for timesharing... you would have to run each job exclusively for 5-10 minutes at a time to get any sort of effiency, which is not practical in a timeshare system. So there is really very little that you can do. :Indeed, the speed limiting of the pageout scanning takes care of :this. But still, having the swapout threshold defined as being :short of inactive pages while the swapin threshold uses the number :of free+cache pages as an indication could lead to the situation :where you suspend and wake up processes while it isn't needed. : :Or worse, suspending one process which easily fit in memory and :then waking up another process, which cannot be swapped in because :the first process' memory is still sitting in RAM and cannot be :removed yet due to the pageout scan speed limiting (and also cannot :be used, because we suspended the process). We don't suspend running processes, but I do believe FreeBSD is still vulnerable to this issue. Suspending the marked process when it hits the vm_fault code is a good idea and would solve the problem. If the process never takes an allocation fault, it probably doesn't have to be swapped out. The normal pageout would suffice for that process. :> The pagein and pageout rates have nothing to do with thrashing, per say, :> and should never be arbitrarily limited. : :But they are, with the pageout daemon going to sleep for half a :second if it doesn't succeed in freeing enough memory at once. :It even does this if a large part of the memory on the active :list belongs to a process which has just been suspended because :of thrashing... No. I did say the code was complex. A process which has been suspended for thrashing gets all of its pages depressed in priority. The page daemon would have no problem recovering the pages. See line 1458 of vm_pageout.c. This code also enforces the 'memoryuse' resource limit (which is perhaps even more important). It is not necessary to try to launder the pages immediately. Simply depressing their priority is sufficient and it allows for quicker recovery when the thrashing goes away. It also allows us to implement the vm.swap_idle_{threshold1,threshold2,enabled} sysctls trivially, which results in proactive swapping that is extremely useful in certain situations (like shell machines with lots of idle users). The pagedaemon gets behind when there are too many active pages in the system and the pagedaemon is unable to move them to the inactive queue due to the pages still being very active... that is, when the active resident set for all processes in the system exceeds available memory. This is what triggers thrashing. Swapping has the side effect of reducing the total active resident set for the system as a whole, fixing the thrashing problem. -Matt :> I don't think it's possible to write a nice neat thrash-handling :> algorithm. It's a bunch of algorithms all working together, all :> closely tied to the VM page cache. Each taken alone is fairly easy :> to describe and understand. All of them together result in complex :> interactions that are very easy to break if you make a mistake. : :Heheh, certainly true ;) : :cheers, : :Rik To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200105122358.f4CNwEr20137>