Date:      Sat, 10 Sep 2011 01:10:10 +0300
From:      Andrey Kosachenko <andrey.kosachenko@gmail.com>
To:        freebsd-x11@freebsd.org
Subject:   Re: xorg-dev + intel driver + KMS
Message-ID:  <4E6A8EC2.8050305@gmail.com>
In-Reply-To: <20110908233844.GT17489@deviant.kiev.zoral.com.ua>
References:  <4E543828.2040703@gmail.com> <20110824081303.GG17489@deviant.kiev.zoral.com.ua> <4E61DF8F.1090206@gmail.com> <20110903082645.GR17489@deviant.kiev.zoral.com.ua> <20110903104701.GY17489@deviant.kiev.zoral.com.ua> <4E63983C.6000702@gmail.com> <20110904154615.GH17489@deviant.kiev.zoral.com.ua> <4E63B680.2060504@gmail.com> <20110904174333.GL17489@deviant.kiev.zoral.com.ua> <4E690971.8080208@gmail.com> <20110908233844.GT17489@deviant.kiev.zoral.com.ua>

Hi, Konstantin,

On 09.09.2011 02:38, Kostik Belousov wrote:
> If you are not interested in the story, just try the 9.1 patch.
>
> If you are, please stay with me. Apparently, your pagedaemon is sleeping
> in the 915unm state, which worried me very much. I did not understand
> how this could happen, because I thought that it was caused by the
> pagedaemon dropping the last reference on the gem object device pager.
> And the pagedaemon must not see pages belonging to device pagers; such
> pages must not appear on any queue.
>
> I added assertions to make sure to get a panic if a fictitious page
> is found on the queues, but they did not fire. However, I was able to
> reproduce the pagedaemon hang by running gem_stress and performing
> active swapping in parallel. I had forgotten that I finally implemented
> the low memory handler for gem, which is called from the pagedaemon and
> which also purges the gem buffers.
>
> After that, it was relatively easy to track down the issue. See the
> comment at the beginning of i915_gem_pager_fault() about the interaction
> with i915_gem_release_mmap(), which describes the cause of the hang:
>
> 	/*
> 	 * Remove the placeholder page inserted by vm_fault() from the
> 	 * object before dropping the object lock. If
> 	 * i915_gem_release_mmap() is active in parallel on this gem
> 	 * object, then it owns the drm device sx and might find the
> 	 * placeholder already. Then, since the page is busy,
> 	 * i915_gem_release_mmap() sleeps waiting for the busy state
> 	 * of the page cleared. We will be not able to acquire drm
> 	 * device lock until i915_gem_release_mmap() is able to make a
> 	 * progress.
> 	 */
>
> For me, the patched driver survived running 'sort /dev/zero' and
> gem_stress in parallel.
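
For illustration, the ordering described in that comment can be modeled
in user space. This is only a rough sketch with POSIX threads, not the
driver code: dev_lock, page_busy, release_mmap(), pager_fault() and
run_model() are hypothetical stand-ins for the drm device sx, the busy
placeholder page and the two kernel paths. It shows why dropping the
placeholder before taking the device lock lets both sides make progress.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the real driver objects:
 * dev_lock  models the drm device sx lock,
 * page_busy models the busy placeholder page inserted by vm_fault(). */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t page_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  page_cv  = PTHREAD_COND_INITIALIZER;
static bool page_busy = true;

/* Models the release path: it owns the device lock and sleeps until
 * the page is no longer busy. */
static void *release_mmap(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&dev_lock);
    pthread_mutex_lock(&page_mtx);
    while (page_busy)
        pthread_cond_wait(&page_cv, &page_mtx);
    pthread_mutex_unlock(&page_mtx);
    pthread_mutex_unlock(&dev_lock);
    return NULL;
}

/* Models the patched fault path: the placeholder is dropped (busy
 * cleared, waiters woken) BEFORE the device lock is requested. If the
 * device lock were requested first while the page stayed busy, the two
 * threads could wait on each other forever. */
static void *pager_fault(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&page_mtx);
    page_busy = false;
    pthread_cond_broadcast(&page_cv);
    pthread_mutex_unlock(&page_mtx);

    pthread_mutex_lock(&dev_lock);
    /* ... the real fault handling would happen here ... */
    pthread_mutex_unlock(&dev_lock);
    return NULL;
}

/* Runs both paths concurrently; returns 0 once both have finished. */
int run_model(void)
{
    pthread_t releaser, faulter;

    page_busy = true;
    pthread_create(&releaser, NULL, release_mmap, NULL);
    pthread_create(&faulter, NULL, pager_fault, NULL);
    pthread_join(releaser, NULL);
    pthread_join(faulter, NULL);
    assert(!page_busy);
    return 0;
}
```

Calling run_model() from a small main() and building with -pthread
should return promptly whichever thread gets scheduled first, because
neither path holds one resource while waiting for the other.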

Great!
I confirm that with all.9.1.patch the system remains stable even under
high memory pressure.

I tried your test (thanks, it is exactly what I had been looking for
for quite a long time, i.e. exact steps to reproduce the issue). Running
"gem_stress" and "sort /dev/zero" in parallel made my system unusable
in less than 10 seconds. I repeated the test 3 times in a row. The
outcome was the same in all cases: the X server hung (a reset was the
only way to get the machine operational again).

After applying all.9.1.patch I ran the same test again, and the system
remained stable and even pretty responsive. Both (gem_stress and sort
/dev/zero) ran for a while, and after a couple of minutes the sort
process was killed by the system (with an "out of swap space" error).

I will keep an eye on it and let you know if I notice any more issues.
Thanks! I really appreciate it!

--
WBR,
Andrey Kosachenko


