Saturday, November 19, 2011

NVidia Linux drivers...

Well, after several hard lockups during past several months and few other bugs  in even longer time, I'll be certain next time I buy computer to try very hard to buy one with ATI video card. I can not describe how much I hate binary driver from NVidia, and by extension, NVidia! Here is why I hate it:
  1. I have to manually install it. This means that every time either kernel or X driver is updated, I have to go to a single user mode or runlevel 3 to compile NVidia driver. Yeah, I know, there are rpm packages in rpmforge or similar repository, but for some reason I wasn't satisfied and I don't use it for a long time now. Nevertheless, even if I were to use it, it want help for the next two problems!
  2. It lockups, and the lockups are hard, i.e. nothing but power button helps. This happens regularly without any signs before, suddenly system is freezed and nothing works! Nor it is possible to login via network to restart the computer!
  3. Some programs, most frequently LibreOffice, have problems with redrawing the screen. At first, I thought that it is a bug in those programs, but now I'm convinced that the problem is in the video driver.
And not to forget, when I had ATI card (on Lenovo W500) dual monitors and rotations worked fantastically! And all could be controlled from a small applet in the tray. With NVidia, nothing worked so flawlessly.

I tried to download newer driver from NVidia ftp site. That was 290.06 at the time I was looking. But it locked up machine even more frequently. Now, it is marked as beta and it is, in some way, expected. So I went on NVidia site to see which version is considered stable, and that was 285.05.09, the one I had problems with in the start and the one I tried to replace!

The reason I went with NVidia binary driver was gnome-shell. Namely, when Fedora switched to gnome-shell it required 3D support and nouveau didn't support 3D capabilities of my graphic card (Quadro FX 880M on Lenovo W510). That meant using fall-back GUI that wasn't usable for me, and besides, I wanted to try gnome-shell.

So, after all this, I decided to try again nouveau driver. And for that reason I had to disable NVidia driver. At first I thought that it will be simple, but it turned out not to be!

Disabling NVidia propriatery driver

First I switched to runlevel 3 in order to turn off graphics subsystem (i.e. X11):
init 3
Ok, the first thing I did is that I blacklisted nvidia driver and removed nouveau from blacklist. This is done in /etc/modprobe.d/blacklist.conf file. In there, you'll find the following line:
blacklist nouveau
Comment out (or remove) that line, and add the folowing one:
blacklist nvidia
Since I didn't want to reboot machine, I also manually removed nvidia driver module from the kernel using the following command:
rmmod nvidia
To be sure that nvidia will not be loaded during boot, I also recreated initramfs image using:
dracut --force
Finally, I changed /etc/X11/xorg.conf. There, you'll find the following line:
Driver "nvidia"
That one I changed into
Driver "nouveau"
Ok, now I tried to switch to runlevel 5, i.e. to turn on X11, but it didn't work. The problem was that NVidia messed up Mesa libraries. So, it was clear it isn't possible to have both, NVidia propriatery driver and nouveau, in the same time. At least not so easy. So, I decided to remove NVidia propriatery driver. First, I switched again to runlevel 3, and then I run the following command:
nvidia-uninstall
It started, complained about changed setup, but eventually did what it is supposed to do. I also reinstalled mesa libraries:
yum reinstall mesa-libGL\*
And, I again tried to switch to runlevel 5. This tome I was greated with GDM login, but the resolution was probably the lowest possible!!! I thought that I can login and start display configuration in order to change resolution. But that didn't work! After some twidling and googling, first I tried to add the following line to xorg.conf:
Modes "1600x900"
Note that 1600x900 is the maximum resolution supported on my laptop. I placed that line in Screen section, Display subsection. After trying again, it even didn't work! Then, I remembered that new X server is autoconfigured so I removed xorg.conf all together, and the resolution was correct this time!
Ok, I tried to login, but now something happened to gnome-shell. Nautils worked (because I saw desktop), and Getting Things Gnome also started, but in the background I was greeted with gnome-shell error message that something went wrong and all I could do is to click OK (luckily moving windows worked or otherwise because of Getting Things Gnome overlaping with OK button I wouldn't be able to reach that button). When I clicked it I was back to login screen. So, what's now wrong!?

I connected from another machine to monitor /var/log/message file and during login procedure I noticed the following error message:
gnome-session[25479]: WARNING: App 'gnome-shell.desktop' respawning too quickly
Ok, the problem is gnome-shell, but why!? I decided to enter again graphicsless mode (init 3) and to start X from the command line, using startx command. Now, I spotted the following error:
X Error of failed request: GLXUnsupportedPrivateRequest
Actually, that is only the part of the error (as I didn't saved it), but it gave me a hint. Library was issuing some X server command that X server didn't understand. And, that is some private command, probably specific to each driver. So, NVidia didn't properly deinstall something. But, what? I again went to check if all packages are correct. mesa-libGL and mesa-libGLU were good (as reported by rpm -q --verify command). Then I checked all X libraries and programs:
for i in `rpm -qa xorg\*`; do echo; echo; echo $i; rpm -q --verify $i; done
Only xorg-x11-server-Xorg-1.10.4-1.fc15.x86_64 had some inconsistency, so I reinstalled it and tried again (go to graphics mode, try to login in, go back in text mode). It didn't work.

So, I didn't know what to look more, and decided to reinstall again mesa libraries. Then, I checked again with 'rpm -q --verify' command and now, I noticed something strange, the links were not right (signified by letter L in the output of the rpm command). So I went to /usr/lib64 directory, listed libraries that start with libGL and immediately spotted the problem. Namely, libGL.so and libGL.so.1 were symbolic links that were pointing to NVidia libraries, not the ones that were installed with Mesa. It turned out that the installation process didn's recreate soft links!!! Furthermore, NVidia left at least five versions of its own libraries (that many times I installed new NVidia binary driver). So, I removed NVidia libraries, recreated links to point to right libraries, and tried to login again. This time, everything worked as expected!

So, we'll see now if this is going to be more stable or not.

Edit 1
Ok, at least one of those lockups wasn't related to NVidia. It happened that I, in the same time, upgraded kernel and NVidia driver. Starting from that moment I saw new kernel oops, and since I already had problems with NVidia, I wrongly concluded that it was NVidia's fault. But now, I suspect on wireless network driver. So, I downgraded kernel version to see what will happen.

Edit 2
More than a day working without a problem. So, it seems that it was the combination of newer kernel (wireless) and NVidia binary driver!

Edit 3 - 20111123
Several days now everything is working. Once I had a problem while suspending laptop that made me reset machine. Oh, and detection and control of second monitor works much better than with NVidia proprietary (i.e. binary) driver.

No comments:

About Me

scientist, consultant, security specialist, networking guy, system administrator, philosopher ;)

Blog Archive