This is a discussion on Xwindow hang on osr507 within the Sco Unix forums, part of the Unix Operating Systems category.
Bela Lubkin <belal@sco.com> wrote in message news:<20031008083936.GJ714@sco.com>...
> Roger Cornelius wrote:
>
> > > > | I have two dissimilar 5.0.7 systems which exhibit the same problem.
> > > > | When exiting from a console X session, X hangs approximately 75% of the
> > > > | time. It appears to be exiting, but I end up with a blank root window
> > > > | with the crosshatch pattern and an "x" as the mouse pointer. I can move
> > > > | the pointer but nothing else. Alt-Fkey or ctrl-prtscreen will switch
> > > > | away, but I just get a blank screen. Attempting to switch to another
> > > > | tty again results in a beep.
> > > > |
> > > > | The systems:
> > > > | IBM x345
> > > > | SCO odt window manager
> > > > | On board video identified by mkdev graphics as:
> > > > | ATI RAGE PRO/LT-PRO/XL/Mobility (P/M/M1)
> > > > | Also tried an ATI Xpert@Play card with same results.
> > > > |
> > > > | Dell Precision 330
> > > > | fvwm2 window manager
> > > > | Matrox Millenium G200 (configured for Matrox G100/G200/G400 series
> > > > | adapters)
> > > > |
> > > > | Both systems have osr507mp and osr507up installed.
> > > > |
> > > > | I've tried various resolution configurations in mkdev graphics but no
> > > > | change in the problem.
> > > > |
> > > > | After the hang and from another login, I can kill the X process which
> > > > | results in a black or sometimes garbled screen. I can log in again,
> > > > | though I can't see what's happening on the screen. On the Dell box, I
> > > > | can then log out and the screen returns to normal. On the IBM box,
> > > > | logging out just gives me another blank screen.
>
> I asked you to try editing each entry in the active grafinfo file to
> add:
>
> > > MEMORY(VID, 0x000A0000,0x0020000); /* Standard VGA video memory window */
>
> after the existing "MEMORY" line(s) in each mode. You say:
>
> > This changed the behaviour on the IBM system and possibly fixed it on
> > the Dell.
> > For the latter, the couple of opportunities I've had to exit
> > X worked correctly.
>
> Perhaps you could cycle it a few more times for confidence? If it's as
> random as it seemed, just running the X server and exiting as quickly as
> possible ought to be a decent "smoke test".
>
> > For the former, I exited X three times today. The
> > first time, I was returned to the shell prompt as should be normal. The
> > second time, I got a blank, black screen, like JPR described, which I
> > used to log in blind, then ran clean_screen which got the video back.
> > The third time, I got a kernel panic and reboot.
>
> So previously the X server was hanging on exit (not affecting the whole
> machine) about 75% of the time. I assume that 75% is a very rough
> estimate. Now, out of 3 samples, one exited cleanly and two more went
> wrong (in different ways). So without further examination of the
> failure modes, I would tend to conclude that whatever was causing the
> problem is still happening. Only the failure modes have changed. That
> is, if you were to run 100 cycles under the new setup, you would see
> about 25 successful exits, about 75 failures -- same as before.
>
> Since the new failure modes include worse options (panic vs. a mere
> unusable screen), you should probably undo the patch on the IBM.
>
> Repeating part of the original message:
>
> > > > | After the hang and from another login, I can kill the X process which
> > > > | results in a black or sometimes garbled screen. I can log in again,
> > > > | though I can't see what's happening on the screen. On the Dell box, I
> > > > | can then log out and the screen returns to normal. On the IBM box,
> > > > | logging out just gives me another blank screen.
>
> Let's go back to the original grafinfo file. After a "bad" exit, you
> seem to be saying the X server is still running. You can see this from
> a network login, so the rest of the system is fine.
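As a rough check on the three-sample reasoning quoted above: if each X exit is treated as an independent trial with the earlier (very rough) 25% chance of exiting cleanly, the probability of seeing exactly one clean exit in three tries can be worked out directly. This is only a sketch. The function name `binom_prob` is made up for illustration, independence between exits is an assumption, and it needs an awk that supports `-v` and the `^` operator.

```shell
#!/bin/sh
# binom_prob N K P
# Print the probability of exactly K successes in N independent trials,
# each succeeding with probability P (binomial distribution).
# Hypothetical helper for illustration; not part of any SCO tool.
binom_prob() {
    awk -v n="$1" -v k="$2" -v p="$3" 'BEGIN {
        # Binomial coefficient C(n, k), built up multiplicatively
        c = 1
        for (i = 1; i <= k; i++)
            c = c * (n - k + i) / i
        printf "%.4f\n", c * p^k * (1 - p)^(n - k)
    }'
}
```

Running `binom_prob 3 1 0.25` prints 0.4219, so one clean exit out of three is quite consistent with an unchanged 25% success rate. Three samples are too few to show that the grafinfo change helped.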
>
> I don't quite understand from this description what happens on the IBM
> when you run a new X server. Are you saying that it too is blank, or
> that it displays normally? In other words, has the console become
> totally unusable at this point, or are you able to return to a usable X
> server as often as you want, but not to text mode?
>
> Anyway, next time the exit hang happens, examine that X server's process
> tree. In particular, does it have a subprocess called `vbiosd`? What
> happens if you kill _that_ rather than the X server -- does X then
> finish exiting in a more normal manner?
>
> I'm thinking that you may end up with a still blank or trashed screen,
> but at least your ability to flip multiscreens should return. It might
> be that you can flip, but still can't see what you're doing. But you
> should be able to distinguish between e.g. a multiscreen that was
> sitting at a shell prompt; `echo '\07'` will beep -- vs. one that was
> sitting at a login prompt.
>
> Once the X server has exited relatively gracefully, try to get to a
> shell prompt and run /etc/clean_screen. If you can't get to a shell
> prompt on the console, run it from the network login as
> `clean_screen < /dev/tty02` (substituting the name of the tty on which
> X was running -- or, if you've flipped multiscreens, the one you think
> is currently "displayed").
>
> I'm trying both to develop a viable workaround for temporary use; and to
> better understand the problem so that we can solve it permanently
> without a clumsy workaround. So please describe the results very
> carefully.
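The steps described above (look for a `vbiosd` child of the hung X server, kill that instead of X, then run `clean_screen` against the tty) could be scripted from a network login roughly as follows. This is a minimal sketch, not a tested procedure for OSR 5.0.7. The function names `find_vbiosd_child` and `recover_console` are hypothetical, and it assumes that `ps -ef` prints the PID in column 2 and the parent PID in column 3, and that the system awk accepts `-v`.

```shell
#!/bin/sh
# find_vbiosd_child XPID
# Read `ps -ef`-style text on stdin and print the PID of every process
# whose parent is XPID and whose command is vbiosd.
# Assumption: column 2 is the PID and column 3 is the parent PID.
find_vbiosd_child() {
    awk -v xpid="$1" '
        $3 == xpid && $0 ~ "(^|[ /])vbiosd( |$)" { print $2 }
    '
}

# recover_console XPID TTY
# Kill any vbiosd child of the X server XPID (rather than X itself),
# then reset the display on TTY, e.g. /dev/tty02.
recover_console() {
    xpid=$1
    tty=$2
    for pid in $(ps -ef | find_vbiosd_child "$xpid"); do
        kill "$pid"
    done
    /etc/clean_screen < "$tty"
}
```

For example, `recover_console 812 /dev/tty02`, where 812 stands in for the PID of the hung X server and /dev/tty02 for the tty it was started on. Running the steps by hand the first few times is probably wiser, since the point is to observe what each one does.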
>
> Now, back to the panic:
>
> > Here are [what I think
> > are] the important parts of the output of crash's panic command:
> >
> > Unexpected trap in kernel mode:
> > cr0 0x8001003B cr2 0x0011001C cr3 0x00002000 tlb 0x00000000
> > ss 0x00000001 uesp 0x0080A2CC efl 0x00010286 ipl 0x00000000
> > cs 0x00000158 eip 0xF005919A err 0x00000002 trap 0x0000000E
> > eax 0x00002000 ecx 0x00000001 edx 0x00000014 ebx 0xE0000E1C
> > esp 0xE0000DE0 ebp 0xE0000E0C esi 0x00000001 edi 0x00000000
> > ds 0x00000160 es 0x00000160 fs 0x00000000 gs 0x00000000
> > cpu 0x00000001
> > ...
>
> > Kernel Stack before Trap:
> > STKADDR   FRAMEPTR  FUNCTION  POSSIBLE ARGUMENTS
> > e0000de0  e0000e0c  v86vint   (u+0xe1c,0)
>
> Hmmm. Well, it panic'd while running code under an interrupt that was
> being serviced in virtual 8086 mode. Presumably that would be an
> interrupt that was provoked by something the adapter's BIOS did while
> coming down from graphics mode; and should have been handled by code
> within the BIOS. The panic was a trap E (an illegal memory reference);
> the bad reference address was 0x11001C (CR2). That address isn't a
> sensible address for BIOS code to be accessing. We have no basis to
> determine whether this is a BIOS bug or a bug in the simulated 8086
> environment under which the Unix kernel is running the BIOS.
>
> This does remind me of another thing that you should try, though. In
> fact something that all three of the original posters should try. Many
> modern systems have a BIOS setup item that boils down to "Should an
> interrupt vector be assigned to the video board?". In most cases this
> should be set to "no" for Unix. To be precise, I do not know of any
> case where it needs to be "yes", but I could easily believe that some
> video BIOSes might require it and I simply haven't run into one.
> This is another one of those things that you'll learn about right
> away: if you turn it off and the board/BIOS really need it, getting
> _into_ X will fail and you'll back out the change.
>
> Yet a third thing that you could try is to disable the high-precision
> timer interrupts that were first introduced in OSR506. To do this, boot
> with "defbootstr clock.disable_short_timers=1". The BIOS code may be
> getting an unexpectedly high speed stream of timer interrupts, which
> could get it in trouble.
>
> > I'll post again as I have more details, but I won't have console access
> > to the IBM again until Thursday.
>
> I've given you several conflicting ideas to try. When you have access,
> you'll have to decide what to fiddle with. I don't think it would be
> wise to try more than one of these ideas at the same time, because you
> wouldn't be able to tell which behavior changes were caused by what.
>
> I think my order of attack would be:
>
> 1. Revert to the original grafinfo -- the change didn't help in this
>    case, and made the failure mode worse at times
>
> 2. Disable VGA IRQ in BIOS setup; test
>
> 3. Unless that made X unusable, leave it off even if it didn't help,
>    because it leaves more IRQs free for other devices
>
> 4. Try "defbootstr clock.disable_short_timers=1"; test
>
> 5. If that doesn't fix the problem, reboot without it and forget about
>    that setting
>
> 6. If neither of those fix the problem, work towards a workaround
>    based on killing `vbiosd` and running `clean_screen`
>
> 7. Comment on all the steps you took so we learn what was really
>    relevant...
>
> >Bela<

Apologies for taking so long to reply. I did some testing last Thursday
and then, Thursday night, read all the posts identifying this as a
Mozilla problem, so I wanted to test some more, which I did over the
weekend.

I found the problems on the two systems I have to be different.

Joe Chasan first suggested Mozilla as the culprit but this didn't seem
likely since I rarely use it.
On the Dell system, I typically start X and don't exit except when I
need to reboot the system, which could be weeks or months later. It's
true I rarely use Mozilla, but it apparently only takes one use during
an X session to cause the hang on exit. I confirmed that Mozilla is the
cause of the problem on the Dell. As was suggested in a later post, I
compared processes before and after executing Mozilla, and found that
/opt/mozilla/lib/run-mozilla.sh was left as a defunct process on one
execution, but not the next.

The IBM's problem is not related to Mozilla, though Mozilla does cause X
to hang on exit just like on the Dell. When I exit X on the IBM, without
having previously run Mozilla, I get a black screen, from which I can
run clean_screen blind to get the video back.

The suggestion to "Disable VGA IRQ in BIOS setup" didn't work. I can
change the IRQ used but not disable it. The "defbootstr
clock.disable_short_timers=1" suggestion had no effect.

So the result of this thread is that I understand the problems better
and have workarounds for them, but not really a solution.
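For anyone wanting to repeat the before/after process comparison mentioned above, something along these lines should do. It is a sketch under assumptions, not the exact commands used in this thread. `snapshot` and `changed_entries` are made-up names, `ps -e` is assumed to print PID, TTY, TIME and CMD columns in that order, zombies are assumed to show up as `<defunct>` in the CMD column, and the system grep is assumed to accept `-F -x -v -f`.

```shell
#!/bin/sh
# snapshot
# Print one "PID CMD" line per process. The TTY and TIME columns are
# dropped so that accumulated CPU time does not show up as a difference.
# Assumption: `ps -e` columns are PID TTY TIME CMD, with one header line.
snapshot() {
    ps -e | awk 'NR > 1 {
        pid = $1
        $1 = $2 = $3 = ""      # blank out PID, TTY, TIME
        sub(/^ +/, "")         # strip the separators left behind
        print pid, $0
    }'
}

# changed_entries BEFORE AFTER
# Print the lines of file AFTER that do not appear verbatim in file BEFORE.
changed_entries() {
    grep -F -x -v -f "$1" "$2"
}
```

Usage would be `snapshot > /tmp/before`, then start and exit Mozilla, then `snapshot > /tmp/after` and `changed_entries /tmp/before /tmp/after`. Any PID whose command changed to `<defunct>` can be looked up in /tmp/before to see what it was. Unrelated processes started in between will also be listed, so the output needs reading by eye.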