Unix Technical Forum

Xwindow hang on osr507

02-15-2008, 11:51 AM
Roger Cornelius
 
Re: Xwindow hang on osr507

Bela Lubkin <belal@sco.com> wrote in message news:<20031008083936.GJ714@sco.com>...
> Roger Cornelius wrote:
>
> > > > | I have two dissimilar 5.0.7 systems which exhibit the same problem.
> > > > | When exiting from a console X session, X hangs approximately 75% of the
> > > > | time. It appears to be exiting, but I end up with a blank root window
> > > > | with the crosshatch pattern and an "x" as the mouse pointer. I can move
> > > > | the pointer but nothing else. Alt-Fkey or ctrl-prtscreen will switch
> > > > | away, but I just get a blank screen. Attempting to switch to another
> > > > | tty again results in a beep.
> > > > |
> > > > | The systems:
> > > > | IBM x345
> > > > | SCO odt window manager
> > > > | On board video identified by mkdev graphics as:
> > > > | ATI RAGE PRO/LT-PRO/XL/Mobility (P/M/M1)
> > > > | Also tried an ATI Xpert@Play card with same results.
> > > > |
> > > > | Dell Precision 330
> > > > | fvwm2 window manager
> > > > | Matrox Millennium G200 (configured for Matrox G100/G200/G400 series
> > > > | adapters)
> > > > |
> > > > | Both systems have osr507mp and osr507up installed.
> > > > |
> > > > | I've tried various resolution configurations in mkdev graphics but no
> > > > | change in the problem.
> > > > |
> > > > | After the hang and from another login, I can kill the X process which
> > > > | results in a black or sometimes garbled screen. I can log in again,
> > > > | though I can't see what's happening on the screen. On the Dell box, I
> > > > | can then log out and the screen returns to normal. On the IBM box,
> > > > | logging out just gives me another blank screen.

>
> I asked you to try editing each entry in the active grafinfo file to
> add:
>
> > > MEMORY(VID, 0x000A0000,0x0020000); /* Standard VGA video memory window */

>
> after the existing "MEMORY" line(s) in each mode. You say:
>
> > This changed the behaviour on the IBM system and possibly fixed it on
> > the Dell. For the latter, the couple of opportunities I've had to exit
> > X worked correctly.

>
> Perhaps you could cycle it a few more times for confidence? If it's as
> random as it seemed, just running the X server and exiting as quickly as
> possible ought to be a decent "smoke test".
>
> > For the former, I exited X three times today. The
> > first time, I was returned to the shell prompt as should be normal. The
> > second time, I got a blank, black screen, like JPR described, which I
> > used to log in blind, then ran clean_screen which got the video back.
> > The third time, I got a kernel panic and reboot.

>
> So previously the X server was hanging on exit (not affecting the whole
> machine) about 75% of the time. I assume that 75% is a very rough
> estimate. Now, out of 3 samples, one exited cleanly and two more went
> wrong (in different ways). So without further examination of the
> failure modes, I would tend to conclude that whatever was causing the
> problem is still happening. Only the failure modes have changed. That
> is, if you were to run 100 cycles under the new setup, you would see
> about 25 successful exits, about 75 failures -- same as before.
>
> Since the new failure modes include worse options (panic vs. a mere
> unusable screen), you should probably undo the patch on the IBM.
>
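
To put rough numbers on the reasoning above, here is a minimal Python sketch. It assumes each exit from X is an independent trial and that the "75%" figure is the true failure rate; both are simplifying assumptions, not measurements. The names `binomial_pmf` and `lucky_streak` are made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

P_CLEAN = 0.25  # assumed chance of a clean exit if nothing has changed

# Chance of seeing exactly 1 clean exit in 3 tries with an unchanged failure rate.
one_of_three = binomial_pmf(1, 3, P_CLEAN)

def lucky_streak(n, p_clean=P_CLEAN):
    """Chance that n clean exits in a row happen by luck alone (problem not fixed)."""
    return p_clean ** n

print(f"1 clean exit out of 3: {one_of_three:.3f}")
for n in (2, 3, 5):
    print(f"{n} clean exits in a row by luck: {lucky_streak(n):.4f}")
```

Under these assumptions, one clean exit in three tries is an ordinary outcome for an unfixed system, while a run of five or so clean exits in a row would be unlikely unless something really changed. That is why a couple of good exits on the Dell is not yet much evidence.
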
> Repeating part of the original message:
>
> > > > | After the hang and from another login, I can kill the X process which
> > > > | results in a black or sometimes garbled screen. I can log in again,
> > > > | though I can't see what's happening on the screen. On the Dell box, I
> > > > | can then log out and the screen returns to normal. On the IBM box,
> > > > | logging out just gives me another blank screen.

>
> Let's go back to the original grafinfo file. After a "bad" exit, you
> seem to be saying the X server is still running. You can see this from
> a network login, so the rest of the system is fine.
>
> I don't quite understand from this description what happens on the IBM
> when you run a new X server. Are you saying that it too is blank, or
> that it displays normally? In other words, has the console become
> totally unusable at this point, or are you able to return to a usable X
> server as often as you want, but not to text mode?
>
> Anyway, next time the exit hang happens, examine that X server's process
> tree. In particular, does it have a subprocess called `vbiosd`? What
> happens if you kill _that_ rather than the X server -- does X then
> finish exiting in a more normal manner?
>
> I'm thinking that you may end up with a still blank or trashed screen,
> but at least your ability to flip multiscreens should return. It might
> be that you can flip, but still can't see what you're doing. But you
> should be able to distinguish between e.g. a multiscreen that was
> sitting at a shell prompt; `echo '\07'` will beep -- vs. one that was
> sitting at a login prompt.
>
> Once the X server has exited relatively gracefully, try to get to a
> shell prompt and run /etc/clean_screen. If you can't get to a shell
> prompt on the console, run it from the network login as `clean_screen
> < /dev/tty02` (substituting the name of the tty on which X was running
> -- or, if you've flipped multiscreens, the one you think is currently
> "displayed").
>
> I'm trying both to develop a viable workaround for temporary use; and to
> better understand the problem so that we can solve it permanently
> without a clumsy workaround. So please describe the results very
> carefully.
>
> Now, back to the panic:
>
> > Here are [what I think
> > are] the important parts of the output of crash's panic command:
> >
> > Unexpected trap in kernel mode:
> > cr0 0x8001003B cr2 0x0011001C cr3 0x00002000 tlb 0x00000000
> > ss 0x00000001 uesp 0x0080A2CC efl 0x00010286 ipl 0x00000000
> > cs 0x00000158 eip 0xF005919A err 0x00000002 trap 0x0000000E
> > eax 0x00002000 ecx 0x00000001 edx 0x00000014 ebx 0xE0000E1C
> > esp 0xE0000DE0 ebp 0xE0000E0C esi 0x00000001 edi 0x00000000
> > ds 0x00000160 es 0x00000160 fs 0x00000000 gs 0x00000000
> > cpu 0x00000001

>
> ...
>
> > Kernel Stack before Trap:
> > STKADDR FRAMEPTR FUNCTION POSSIBLE ARGUMENTS
> > e0000de0 e0000e0c v86vint (u+0xe1c,0)

>
> Hmmm. Well, it panic'd while running code under an interrupt that was
> being serviced in virtual 8086 mode. Presumably that would be an
> interrupt that was provoked by something the adapter's BIOS did while
> coming down from graphics mode; and should have been handled by code
> within the BIOS. The panic was a trap E (an illegal memory reference);
> the bad reference address was 0x11001C (CR2). That address isn't a
> sensible address for BIOS code to be accessing. We have no basis to
> determine whether this is a BIOS bug or a bug in the simulated 8086
> environment under which the Unix kernel is running the BIOS.
>
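
The register dump above can be decoded mechanically. This is a small Python sketch using the standard x86 page-fault error-code layout (bit 0 = page present, bit 1 = write access, bit 2 = user mode). The function and constant names are illustrative only; the values are copied from the panic output quoted above.

```python
def decode_page_fault_error(err):
    """Split an x86 page-fault (trap 0x0E) error code into its low three flags."""
    return {
        "page_present": bool(err & 0x1),  # False: the page was not present
        "write_access": bool(err & 0x2),  # True: the faulting access was a write
        "user_mode": bool(err & 0x4),     # False: the access came from supervisor mode
    }

TRAP = 0x0E       # "trap" field from the panic output
ERR = 0x00000002  # "err" field
CR2 = 0x0011001C  # faulting linear address

# Highest linear address reachable with 8086 segment:offset arithmetic (FFFF:FFFF).
MAX_REAL_MODE_ADDR = (0xFFFF << 4) + 0xFFFF

flags = decode_page_fault_error(ERR)
print(f"trap {TRAP:#04x}, flags: {flags}")
print(f"CR2 {CR2:#010x} beyond 8086 reach ({MAX_REAL_MODE_ADDR:#x}): "
      f"{CR2 > MAX_REAL_MODE_ADDR}")
```

Read this way, err 0x2 is a write to a not-present page made from supervisor mode, and the CR2 address lies just past the highest address that 8086-style segment:offset arithmetic can form. That fits the observation that it is not a sensible address for BIOS code, but it does not by itself say whether the BIOS or the kernel's virtual 8086 support is at fault.
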
> This does remind me of another thing that you should try, though. In
> fact something that all three of the original posters should try. Many
> modern systems have a BIOS setup item that boils down to "Should an
> interrupt vector be assigned to the video board?". In most cases this
> should be set to "no" for Unix. To be precise, I do not know of any
> case where it needs to be "yes", but I could easily believe that some
> video BIOSes might require it and I simply haven't run into one. This
> is another one of those things that you'll learn about right away: if
> you turn it off and the board/BIOS really need it, getting _into_ X will
> fail and you'll back out the change.
>
> Yet a third thing that you could try is to disable the high-precision
> timer interrupts that were first introduced in OSR506. To do this, boot
> with "defbootstr clock.disable_short_timers=1". The BIOS code may be
> getting an unexpectedly high speed stream of timer interrupts, which
> could get it in trouble.
>
> > I'll post again as I have more details, but I won't have console access
> > to the IBM again until Thursday.

>
> I've given you several conflicting ideas to try. When you have access,
> you'll have to decide what to fiddle with. I don't think it would be
> wise to try more than one of these ideas at the same time, because you
> wouldn't be able to tell which behavior changes were caused by what.
>
> I think my order of attack would be:
>
> 1. Revert to the original grafinfo -- the change didn't help in this
> case, and made the failure mode worse at times
>
> 2. Disable VGA IRQ in BIOS setup; test
>
> 3. Unless that made X unusable, leave it off even if it didn't help,
> because it leaves more IRQs free for other devices
>
> 4. Try "defbootstr clock.disable_short_timers=1"; test
>
> 5. If that doesn't fix the problem, reboot without it and forget about
> that setting
>
> 6. If neither of those fix the problem, work towards a workaround
> based on killing `vbiosd` and running `clean_screen`
>
> 7. Comment on all the steps you took so we learn what was really
> relevant...
>
> >Bela<


Apologies for taking so long to reply. I did some testing last
Thursday and then, Thursday night, read all the posts identifying this
as a Mozilla problem, so I wanted to test some more, which I did over
the weekend. I found that the problems on my two systems are
different. Joe Chasan first suggested Mozilla as the culprit, but this
didn't seem likely since I rarely use it. On the Dell system, I
typically start X and don't exit except when I need to reboot the
system, which could be weeks or months later. It's true I rarely use
Mozilla, but it apparently only takes one run during an X session to
cause the hang on exit. I confirmed that Mozilla is the cause of the
problem on the Dell. As was suggested in a later post, I compared
processes before and after executing Mozilla, and found that
/opt/mozilla/lib/run-mozilla.sh was left as a defunct process on one
execution, but not the next.

The IBM's problem is not related to Mozilla, though Mozilla does cause
X to hang on exit just like on the Dell. When I exit X on the IBM
without having previously run Mozilla, I get a black screen; I can
then run clean_screen blind to get the video back. The suggestion to
"Disable VGA IRQ in BIOS setup" didn't work: I can change the IRQ
used but not disable it. The "defbootstr
clock.disable_short_timers=1" suggestion had no effect.

So the result of this thread is that I understand the problems better
and have workarounds for them, but not really a solution.