Sunday, December 6, 2020

Fixing a 3+ year old bug in NVIDIA GeForce Experience

Background: Joystick Preventing Display Sleep


A few months ago, I thought I ought to try Microsoft Flight Simulator 2020. The facsimile of our planet that Asobo had created with photogrammetry and machine learning seemed like a good place to relax, "in these trying times." I plugged in my trusty Logitech Freedom 2.4 wireless joystick and took to the skies.

After spending a few hours flying around my alma mater and my childhood home, it was time to call it a day. I have my machine configured to turn off the displays after a few minutes of inactivity, and I quickly realized it wasn't doing that any longer.

This isn't something I'm altogether unfamiliar with. Sometimes it is a browser tab with a sneaky video trying to play in the background, sometimes it is an issue with a web application, and sometimes it is a system task deciding it is too important to let the machine get anywhere near Standby/Sleep until it completes. On Windows, applications can ask the operating system's Kernel-Mode Power Manager to keep the machine (or just the display) awake. This is useful: you wouldn't want your machine going to sleep or turning off the display while you are watching a movie, playing a game, or copying a file. You can see such requests by opening an elevated command prompt and running powercfg /requests.

Nothing.

I had an idea, though. I had previously contributed to procrastitracker, a fantastic little time-tracking application for Windows, and the feature I implemented was XInput activity detection. I used the application to track how much time I spent playing games, and it thought my machine was idle whenever I was using my Xbox controller, so my change simply had it treat XInput activity as user activity. Along the way I had noticed that Windows does use controller inputs, not just mouse and keyboard inputs, to decide that the machine is in use and to keep the display from sleeping. I had also noticed that one of my controllers' analog sticks drifted around quite a bit, so my naïve implementation without deadzones was not going to work: procrastitracker would think I was using my controller while I slept. Deadzones solved the issue for procrastitracker, and the controller never kept Windows awake, so that was good.

Was my old Logitech joystick to blame? It is easily 15 years old by now. I unplugged the receiver and within minutes my display went to sleep. Mystery solved!
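The deadzone idea from the procrastitracker story can be sketched in a few lines. This is a minimal Python illustration, not procrastitracker's actual code: the function names are hypothetical, and it assumes the raw thumbstick values (XInput's signed 16-bit sThumbLX/sThumbLY style fields) have already been read.

```python
import math

# Default left-stick deadzone from the XInput headers (XINPUT_GAMEPAD_LEFT_THUMB_DEADZONE).
LEFT_THUMB_DEADZONE = 7849


def stick_is_active(x: int, y: int, deadzone: int = LEFT_THUMB_DEADZONE) -> bool:
    """Return True only when the stick is pushed outside a circular deadzone.

    x and y are raw thumbstick values in the signed 16-bit range
    (-32768..32767), so small drift around the centre is ignored.
    """
    return math.hypot(x, y) > deadzone


def counts_as_activity(sticks: list[tuple[int, int]]) -> bool:
    """Hypothetical helper: any stick outside its deadzone means the user is active."""
    return any(stick_is_active(x, y) for x, y in sticks)


if __name__ == "__main__":
    # A drifting stick resting slightly off-centre should not register as activity.
    print(counts_as_activity([(1200, -900)]))  # False
    print(counts_as_activity([(20000, 0)]))    # True
```

A circular deadzone is used here because drift tends to wander in any direction around the centre; a per-axis threshold would also work but lets diagonal drift slip through more easily.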

Not so fast.

Enter NVIDIA

I'd made a few upgrades to my computer in the months since, including a better joystick, a throttle, and some flight rudder pedals. Windows still wasn't letting the display sleep with them plugged in, so I decided to look into the issue more. Looking at the USB Game Controllers control panel with the new joystick plugged in, I saw absolutely no drift. No analog instability. No spurious inputs. It certainly wasn't the device's activity keeping the machine awake; I was sure of that now.

This is a screenshot, but it doesn't look different in motion. Nothing moves.

I decided to do what's worked so well in the past - I asked Google. 


And Google knew. People had tracked the issue down to NVIDIA's GeForce Experience overlay, sometimes called ShadowPlay (or NVIDIA Share). It's a piece of software that lets you use the NVENC encoder on NVIDIA graphics cards to capture compressed video in real time. People use it to share videos of moments in video games, and it's very handy because NVENC is good. Compressing high-resolution, high-framerate video in real time on the CPU while maintaining quality would be quite a difficult task, especially for a machine already busy running a video game; NVENC produces quality output without much additional load on the machine by leveraging fixed-function encoding hardware in the GPU. It's cool stuff, so I didn't want to just get rid of it.

So the issue is such: If you have a joystick plugged in, and the GeForce Experience overlay enabled, your display will not sleep. If you unplug the joystick, the display sleeps. If you disable the overlay, the display sleeps. You can have one or the other - but not both. 

People hadn't just tracked the issue down - people tracked it down 3 years ago! 

https://i.imgur.com/YWr9UF4.png 

I couldn't really believe the issue had gone unresolved for so long. So I reported a bug. 

https://i.imgur.com/GIdwGyG.png 

I'm sure they'll figure it out, but I wanted to have a crack at it. When you enable the overlay, many processes start up - all of the NVIDIA ones at the top.

Each one loads many modules:

My initial theory was that the overlay was perhaps probing the controllers for input and translating those events into Windows messages. If it was injecting messages into one of its processes, maybe as keyboard events, perhaps some default event handling routine was resetting the system idle state. I had no evidence, but I knew it wasn't Windows acting alone. The NVIDIA software was causing the issue, but the overlay doesn't react to joystick input anyway, so I somehow doubted that it was an accidental side effect of intentional joystick-handling code. Abusing Win32 isn't unusual among GPU makers, so I expected it would be something weird. I should note at this point that I am not an expert in Win32. On the other hand, I do own Raymond Chen's book, and I've read it, too. The Old New Thing is a great blog. I'm still a bit lost here, though.

I digress. First I needed a way to identify when the issue was happening without waiting for the display to sleep, so I quickly wrote a simple application that dumps the output of GetLastInputInfo. I didn't expect this function to be authoritative on the system idle state - you want to get SYSTEM_POWER_INFORMATION from CallNtPowerInformation for that - but it proved to be effective.
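The idle monitor itself isn't reproduced here, but a minimal sketch of the same idea in Python with ctypes looks like this. It assumes only the documented GetLastInputInfo and GetTickCount functions; the helper names are illustrative.

```python
import ctypes
import sys
import time


def idle_ms(now_tick: int, last_input_tick: int) -> int:
    """Milliseconds since the last input event.

    Both values are 32-bit tick counts (DWORD), which wrap roughly every
    49.7 days, so the subtraction is done modulo 2**32.
    """
    return (now_tick - last_input_tick) & 0xFFFFFFFF


class LASTINPUTINFO(ctypes.Structure):
    # Mirrors the Win32 LASTINPUTINFO struct.
    _fields_ = [("cbSize", ctypes.c_uint32), ("dwTime", ctypes.c_uint32)]


def current_idle_ms() -> int:
    """Query GetLastInputInfo and GetTickCount (Windows only)."""
    if sys.platform != "win32":
        raise OSError("GetLastInputInfo is a Win32 API")
    info = LASTINPUTINFO(cbSize=ctypes.sizeof(LASTINPUTINFO))
    if not ctypes.windll.user32.GetLastInputInfo(ctypes.byref(info)):
        raise ctypes.WinError()
    # GetTickCount returns a DWORD; ctypes treats the result as a signed int by
    # default, which the & 0xFFFFFFFF in idle_ms folds back into range.
    now = ctypes.windll.kernel32.GetTickCount()
    return idle_ms(now, info.dwTime)


if __name__ == "__main__" and sys.platform == "win32":
    # Print the idle time once a second; it should climb steadily while
    # nobody touches the machine and drop back towards 0 on any input.
    while True:
        print(current_idle_ms())
        time.sleep(1)
```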

I attached to NVIDIA Share.exe in x64dbg and started looking for things related to input. 

I knew XInput wasn't causing the issue: that's only for Xbox controllers and controllers that emulate them, and procrastitracker was polling XInput in the background all the time anyway without causing this. What I did notice was that even with the process suspended in the debugger (and my little application monitoring idle state), the idle state was still getting reset. It spawns many processes, I thought, so I went through and suspended them one by one (killing them one by one doesn't work - they restart immediately). The idle state kept resetting. This was a huge clue: it meant the application wasn't running any code to do this. It wasn't injecting messages or anything of the sort. It had to be something it did on initialization.
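The check made while suspending processes one at a time boils down to watching for the idle time going backwards. A small, hypothetical helper for that, assuming idle times are sampled at a fixed interval (for instance from GetLastInputInfo):

```python
def find_resets(samples: list[int], tolerance_ms: int = 0) -> list[int]:
    """Return the indices where the idle time dropped compared to the previous sample.

    samples are idle times in milliseconds, polled at a fixed interval. While
    the machine is truly idle they only grow; any drop means something
    reset the system's last-input time between two polls.
    """
    return [
        i
        for i in range(1, len(samples))
        if samples[i] + tolerance_ms < samples[i - 1]
    ]


if __name__ == "__main__":
    # Idle time climbs, then something resets it between the 3rd and 4th poll.
    print(find_resets([0, 1000, 2000, 15, 1015]))  # [3]
```

If resets keep showing up while every suspect process is suspended, the cause can't be code those processes are actively running, which is exactly the inference drawn above.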

I wanted to watch how NVIDIA Share initializes in the debugger, but it's complicated. You can't start it directly; it needs to be started by nvcontainer.exe, which starts three copies of it, each with different parameters. They probably communicate with each other as well, so their environment would have to be carefully managed to bring them up manually. Not insurmountable by any means, but there were other things to try. I thought it would be neat if I could attach and break in x64dbg as soon as the process starts, and some tips pointed me to the gflags.exe utility that ships with WinDbg.


Theoretically you can use it to throw a key in the registry that tells Windows to execute a particular 'image' (executable) with a debugger when it's encountered. I wasn't able to get this to work - maybe because the process is spawned by nvcontainer, or maybe I just hadn't done it correctly.
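As I understand it, the gflags "Debugger" option works by writing a Debugger string value under the Image File Execution Options registry key for that executable name. A minimal Python sketch of doing the same directly with the standard winreg module follows; the helper names are hypothetical, and any debugger path you pass is a placeholder for wherever x64dbg lives on your machine.

```python
import sys

# Registry location Windows consults when it launches an executable.
IFEO_ROOT = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options"


def ifeo_subkey(image_name: str) -> str:
    """IFEO subkeys are named after the bare executable file name, not a full path."""
    if "\\" in image_name or "/" in image_name:
        raise ValueError("use the executable's file name only, e.g. 'NVIDIA Share.exe'")
    return IFEO_ROOT + "\\" + image_name


def set_debugger(image_name: str, debugger_cmd: str) -> None:
    """Write the 'Debugger' value so Windows launches image_name under debugger_cmd.

    Windows only; needs an elevated process because the key lives under HKLM.
    Delete the value afterwards, or the program will always start under the debugger.
    """
    if sys.platform != "win32":
        raise OSError("Image File Execution Options only exist on Windows")
    import winreg

    with winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE, ifeo_subkey(image_name), 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, "Debugger", 0, winreg.REG_SZ, debugger_cmd)
```

Whether this takes effect for a process launched by nvcontainer.exe is exactly what was in doubt above, so treat it as a starting point rather than a known-good recipe.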

Luckily, we have Ghidra. I did the same silly thing that I did in the debugger, I loaded up the most obvious executable (NVIDIA Share.exe) and asked the most obvious question. 

"Y'all got any input stuff 'round here?"

 

This was immediately promising! But first, I had to do some reading. Raw input isn't something that I'm familiar with. Back in the good old days, there was DirectInput. DirectInput let you do force feedback, DirectInput let you have tons of buttons and axes, and at least on Windows it made using game controllers a generally smoother experience than it had been in the past, when games needed to support your particular controller (or your controller's drivers needed to emulate another, more popular controller). After DirectInput came XInput, and XInput is very much built around the Xbox controller. You don't get any more buttons or axes than an Xbox controller can have. You can't connect more controllers than an Xbox would be able to connect. It "just works", but it's not the kind of API that supports setups like this:

A home flight sim cockpit, from the YouTube video "EPIC Home Flight Sim Cockpit | HONEYCOMB | RealSimGear G1000 | SLAVX | X-Plane 11". The photo is not mine.

The API for that kind of heavyweight setup is raw input. Anything that conforms to the HID standard will have its events passed through, and your role as an application developer is to support the HID usage pages that you deem appropriate. I especially like that in the middle of the Simulation Controls page (0x02) sits the usage ID for the Magic Carpet Simulation Device (0x0B). Standards committees think of everything.
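Raw input identifies device classes by (usage page, usage ID) pairs. Here is a small lookup table with a handful of entries from the HID Usage Tables, including the ones that matter in this story; it is nowhere near complete, and the helper name is illustrative.

```python
# A few (usage page, usage ID) pairs from the USB HID Usage Tables.
HID_USAGES = {
    (0x01, 0x02): "Mouse",
    (0x01, 0x04): "Joystick",
    (0x01, 0x05): "Game Pad",
    (0x01, 0x06): "Keyboard",
    (0x01, 0x08): "Multi-axis Controller",
    (0x02, 0x01): "Flight Simulation Device",
    (0x02, 0x0B): "Magic Carpet Simulation Device",
}


def describe_usage(page: int, usage: int) -> str:
    """Name a HID top-level collection, or fall back to its hex numbers."""
    return HID_USAGES.get((page, usage), f"unknown usage {page:#04x}/{usage:#04x}")


if __name__ == "__main__":
    print(describe_usage(0x01, 0x04))  # Joystick
    print(describe_usage(0x02, 0x0B))  # Magic Carpet Simulation Device
    print(describe_usage(0x0C, 0x01))  # unknown usage 0x0c/0x01
```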

 

So, what is NVIDIA Share doing with raw input? Calling RegisterRawInputDevices.

Don't worry, I can clean it up a bit:

 

It's registering its window handle to receive raw events from the keyboard at all times, regardless of which window is in the foreground. Keyboard, bummer. Not joystick. But it gave me an idea. What if I expanded my little application to request raw input as well? What about DirectInput? Could I replicate the issue without NVIDIA's software? I spent one night and one day implementing various input methods, relearning Win32, and learning DirectInput ... and COM ... again.
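The decompiled call boils down to filling in one RAWINPUTDEVICE. A minimal ctypes sketch of an equivalent registration follows; it is an illustration, not NVIDIA's code, and it assumes you already have a window handle from your own message loop (window creation is omitted). Swapping the usage for 0x04 (Joystick) gives the joystick registration used to replicate the issue.

```python
import ctypes
import sys

RIDEV_INPUTSINK = 0x00000100  # receive input even when not in the foreground

HID_USAGE_PAGE_GENERIC = 0x01
HID_USAGE_GENERIC_KEYBOARD = 0x06


class RAWINPUTDEVICE(ctypes.Structure):
    # Mirrors the Win32 RAWINPUTDEVICE struct (winuser.h).
    _fields_ = [
        ("usUsagePage", ctypes.c_ushort),
        ("usUsage", ctypes.c_ushort),
        ("dwFlags", ctypes.c_uint32),
        ("hwndTarget", ctypes.c_void_p),
    ]


def keyboard_sink(hwnd: int) -> RAWINPUTDEVICE:
    """Keyboard raw input delivered to hwnd at all times (RIDEV_INPUTSINK needs a window)."""
    if not hwnd:
        raise ValueError("RIDEV_INPUTSINK requires a target window handle")
    return RAWINPUTDEVICE(
        HID_USAGE_PAGE_GENERIC, HID_USAGE_GENERIC_KEYBOARD, RIDEV_INPUTSINK, hwnd
    )


def register_raw_input(devices: list[RAWINPUTDEVICE]) -> None:
    """Call RegisterRawInputDevices (Windows only)."""
    if sys.platform != "win32":
        raise OSError("RegisterRawInputDevices is a Win32 API")
    array = (RAWINPUTDEVICE * len(devices))(*devices)
    ok = ctypes.windll.user32.RegisterRawInputDevices(
        array, len(devices), ctypes.sizeof(RAWINPUTDEVICE)
    )
    if not ok:
        raise ctypes.WinError()
```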

 https://i.imgur.com/32EjAFA.png

I was able to replicate the issue. 

Enabling raw input for joysticks causes those devices to prevent the system from becoming idle.

My suggestions to Microsoft:

  • make this clearer in the documentation
  • an application requesting raw input should show up in powercfg /requests and in WPA.

The application I wrote to demonstrate the issue is available in this repo: https://github.com/nuzayets/rawinput-debug/

But NVIDIA Share wasn't asking for raw input from the joystick. 

Not directly, anyhow. NVIDIA Share is partially built upon CEF, Chromium Embedded Framework. Why be happy with only wrapping your head around esoteric desktop development when you can throw frustrating web development into the mix? The more the merrier, I say. We didn't need that RAM anyway.

NVIDIA Share loads the Chromium Embedded Framework as an >100MB module called libcef.dll. This took Ghidra a bit of time to analyze, but I found the interesting bit.

 

They request raw input as part of their gamepad driver, which makes sense. In all cases they call FUN_1842af9b4 to set up the parameters. Here is that function:

 

If you don't speak decompilation-ese, here's a rough translation:

https://i.imgur.com/dKZ6CRn.png
Typo: the pRawInputDevices[0] in the loop should be pRawInputDevices[i]. Here is the actual source in Chromium.
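For readers who can't see the translation image, here is a rough Python re-sketch of that setup loop with the indexing fixed. The exact usage list is an assumption (Joystick, Game Pad, Multi-axis Controller); what the post establishes is that the 0x04 stored in libcef.dll's .rdata is the Joystick entry. The function name is hypothetical.

```python
import ctypes

RIDEV_REMOVE = 0x00000001
RIDEV_INPUTSINK = 0x00000100
HID_USAGE_PAGE_GENERIC = 0x01

# Assumed usage list; the first entry (0x04, Joystick) is the byte patched later.
GAMEPAD_USAGES = (0x04, 0x05, 0x08)


class RAWINPUTDEVICE(ctypes.Structure):
    # Mirrors the Win32 RAWINPUTDEVICE struct (winuser.h).
    _fields_ = [
        ("usUsagePage", ctypes.c_ushort),
        ("usUsage", ctypes.c_ushort),
        ("dwFlags", ctypes.c_uint32),
        ("hwndTarget", ctypes.c_void_p),
    ]


def build_gamepad_devices(flags: int, hwnd: int):
    """Fill one RAWINPUTDEVICE per usage, ready for RegisterRawInputDevices."""
    devices = (RAWINPUTDEVICE * len(GAMEPAD_USAGES))()
    for i, usage in enumerate(GAMEPAD_USAGES):
        # Index with i; writing devices[0] every time is the typo in the image.
        devices[i].usUsagePage = HID_USAGE_PAGE_GENERIC
        devices[i].usUsage = usage
        devices[i].dwFlags = flags
        # RIDEV_REMOVE must be paired with a NULL target window.
        devices[i].hwndTarget = None if flags == RIDEV_REMOVE else hwnd
    return devices
```

Registering this array with RIDEV_INPUTSINK is the joystick raw input request that, per the experiment above, keeps the system from going idle.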

Luckily there wasn't any code patching to do. The values for the usage IDs live in the .rdata section of the executable (that DAT_1861e16e8 in Ghidra's decompilation).

The file is C:\Program Files\NVIDIA Corporation\NVIDIA GeForce Experience\libcef.dll, and with my version of GeForce Experience (3.20.5.70), the offending byte was at offset 0x61e0ae8. Changing that 0x04 (Joystick) to 0x06 (Keyboard) means that instead of trying to get raw input from joysticks, it asks for raw input from the keyboard. I'm still not sure why the NVIDIA overlay was asking Chromium for raw joystick input.
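For anyone who prefers to see the patch as code rather than hex-editor steps, here is a minimal Python sketch (not the PowerShell script from this post) that flips the single byte and refuses to touch a file whose byte at that offset isn't 0x04. The offset is only known to be right for 3.20.5.70; as readers found in the comments, other versions differ. The same precautions apply: disable the overlay first and run elevated.

```python
import shutil
from pathlib import Path

# Offset observed in libcef.dll from GeForce Experience 3.20.5.70 only;
# other versions put the byte elsewhere.
OFFSET_3_20_5_70 = 0x61E0AE8
HID_JOYSTICK = 0x04
HID_KEYBOARD = 0x06


def patch_usage_byte(data: bytearray, offset: int,
                     expected: int = HID_JOYSTICK, new: int = HID_KEYBOARD) -> bool:
    """Swap one usage byte in place.

    Returns False if it is already patched; raises instead of guessing
    when the byte isn't what we expect (i.e. a different build).
    """
    current = data[offset]
    if current == new:
        return False
    if current != expected:
        raise ValueError(f"byte at {offset:#x} is {current:#04x}, expected {expected:#04x}")
    data[offset] = new
    return True


def patch_file(path: Path, offset: int = OFFSET_3_20_5_70) -> bool:
    """Patch a copy in memory, keep a .bak of the original, then write it back."""
    data = bytearray(path.read_bytes())
    if offset >= len(data):
        raise ValueError(f"{path} is only {len(data)} bytes long")
    changed = patch_usage_byte(data, offset)
    if changed:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_bytes(bytes(data))
    return changed
```

Checking the expected byte before writing is the important design choice: a blind write at a fixed offset into a different libcef.dll build would corrupt it.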

 https://i.imgur.com/wu6lJ5D.png

I spent two days on this, and it ended up being one byte in the end. At least now my computer can sleep.

How to fix it on your machine

If you don't want to use a hex editor, this PowerShell script will do it for you.

Please disable the overlay first, and make sure PowerShell is run as Administrator so that it can write to the directory.

 

12 comments:

  1. Fantastic article! Small typo (I believe), in the picture following the "if you don't speak decompliation-ese, here's a rough translation:" sentence, pRawInputDevices[0] should probably be pRawInputDevices[i] in a few lines.

  2. So... let me understand, with this script, my pc will make screensavers again without unplugging my devices?

    And without turning off nvidia share?

    Or turning it off?

    (cause i have tried to turn off nvidia share... (or geforcexperienceblablabla)

    But my screensaver will not turn on...

    Anyway, great work. (Didn't understand a bit, but the hard work it's evident) :P

  3. Thank you so much for finding this! I was searching for months for the culprit and finally found out that it was my joysticks.

    I've got the same version of Geforce Experience but it is saying:

    Your libcef.dll doesn't seem to match the version this file was tested against.
    We have have two options: use heuristics to patch the libcef you've got: https://github.com/nuzayets/rawinput-debug/blob/master/universal_heuristic_patch_libcef.ps1
    Or download a version of a reasonable vintage and patch that: https://github.com/nuzayets/rawinput-debug/blob/master/universal_patch_libcef.ps1

    Sooo, ran the heuristics, but that returned a "no bueno" :P

    \GitHub\rawinput-debug\universal_heuristic_patch_libcef.ps1:100 char:5
    + throw "No bueno."
    + ~~~~~~~~~~~~~~~~~
    + CategoryInfo : OperationStopped: (No bueno.:String) [], RuntimeException
    + FullyQualifiedErrorId : No bueno.

    In the mean time, I will just disable the bloody overlay for now!

    1. Someone did tell me that their version of libcef.dll was very old and none of my patches worked - it seems that NVIDIA installs the lib once and never updates it. I would try the second option, 'download a version of a reasonable vintage'. Alternatively completely uninstall NVIDIA Geforce Experience and reinstall it, then patch using the heuristics. It's tough to try to keep on top of everything that NVIDIA does, I was hoping that they would actually push a fix.

  4. Yep, I have encountered this with gamepad Genius MaxFire Blaze 5. I installed an old driver for the device instead of the default one (MS driver for Xbox 360 controller from 2019 I think). Well, at least I tried to. I had to resort to updating it by "have disk" and manually pushing the old drivers to the system after I found them installed in the system. And it WAS a driver issue after all. The old gamepad driver was uninstalled after machine restart (because I installed the new one instead) and it broke the monitor sleep again. So I installed it back and hopefully I will never have this issue again. The old driver is from 2009, the new one is from 2019. "Good job", Microsoft!

  5. Hello,
    found this via Google because my Win 10 PC (2H20, full patched) did not go into Energy Saving since I have connected Thrustmaster TCA Joystick and Quadrant for Microsoft Flight Simulator 2020. Patching the byte in the dll was the solution. Still cannot believe this. Thank you very much for your research and publishing the resolution. Much appreciated!

  6. Thank you! Installing the old Xbox Controler 2009 driver is what fixed it for me too. I've never installed Geforce Experience so it was very frustrating only seeing google results as that being the problem.

    Also note in order to force my system to use the old driver I had to go to:

    Right click -> Properties -> Update Driver -> Browse Computer for drivers -> Let Me Pick from a list of available drivers on my computer -> Xbox 360 Controller for Windows Version 2.1.0.1249 [8/13/2009]

  7. Unrelated issue it seems like, but thanks for pointing it out for people who find this page!

  8. Nice job man, thank you so much.

  9. Just attempted to manually edit the "libcef.dll" with my hex editor and "0x61e0ae8" is no longer one of the lines. I have GeForce Experience version 3.24.0.126.

    Anyone have a quick answer to what the new line is? I will try to reverse engineer it if I don't see a reply. Thank you to the author for doing all this work. It identifies where the behavior is coming from. Now to fix it.
