Friday, December 25, 2020

Custom Windows application compatibility shim development kit

As might be guessed from my recent posts, I've been poking at the Windows application compatibility infrastructure for a while. I am now pleased to present a custom shim development kit that can be used to create real 32-bit shim modules that provide real shims applied with the real shim engine by the same mechanism as for the standard shims you see in the Compatibility Administrator. Custom shims are of course not supported by Microsoft, but my approach appears to work on Windows 10 version 2004. I am aware that Detours exists; I did this just for fun, though the shim engine's support for COM hooks turned out to be more convenient than Detours' in my opinion. WinRT hooks are apparently also a thing the shim engine can do, but I don't know how they work and don't currently facilitate them.

You can read about how to build custom shims in the shim module's README and how to use/apply/debug them in the overview README, so I'll just use this post for extra details and asides. 

But first, a demonstration. The example shim module includes a shim called FakeSchTask that hooks (i.e. intercepts) the Task Scheduler 2 COM API to make the application see a scheduled task in the root folder that isn't actually there. Like the other example shim, this is pretty silly and is implemented in a way targeted specifically at a Sysinternals utility I had lying around (Autoruns in this case), but it's a fine example of how to write a shim. The fake scheduled task's name defaults to "Fake Task" but can be specified as the shim command line. Here is the result of placing the compiled shim module as AcRes.dll in SysWOW64, compiling the XML in the project's README using ShimDBC, installing the SDB, and running Autoruns:

Notice the top Task Scheduler entry in Autoruns and its absence from the real list

The first step in making this happen, once it's established that the process being started needs to be shimmed, is that the shim engine in apphelp.dll loads each needed shim by calling the GetHookAPIs function exported from the shim's DLL. It's supposed to return a pointer to an array of HOOKAPI structures specifying imported functions to redirect through the shim. At some point this involves comparing the requested shim name to the name of each shim implemented in the DLL and getting the right one's hook array. Microsoft's shim modules have, per shim, a namespace containing one major function that handles various callbacks including installation and sometimes a couple other functions, apparently implemented with a library called ShimLib that probably provides some macros in addition to the functions common to the several shim modules. I went for a more object-oriented approach, implementing each shim as a subclass of the abstract Shim class. My GetHookAPIs scans the list of shim instances for the one with the requested name, invokes the command line processing then hook-list-populating functions of that shim object, and adds it to the list of active shims for later notification.
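To make the lookup concrete, here is a minimal portable sketch of that flow. It is not the kit's actual code: the `HookApi` record, its field names, the narrow-string shim name, and the `getHookApis` helper are all assumptions made for illustration, and the real export deals in Windows types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the engine's HOOKAPI record; field names and layout are assumptions.
struct HookApi {
    const char* libraryName;
    const char* functionName;
    void* replacement;
    void* original;  // would be filled in by the engine after installation
};

// Abstract base: each concrete shim names itself and lists its hooks.
class Shim {
public:
    explicit Shim(std::string name) : name_(std::move(name)) {}
    virtual ~Shim() = default;
    const std::string& name() const { return name_; }
    virtual void parseCommandLine(const char* commandLine) { (void)commandLine; }
    virtual void registerHooks() = 0;
    std::vector<HookApi>& hooks() { return hooks_; }
protected:
    void addHook(const char* lib, const char* fn, void* replacement) {
        hooks_.push_back(HookApi{lib, fn, replacement, nullptr});
    }
private:
    std::string name_;
    std::vector<HookApi> hooks_;
};

// Hypothetical replacement function and shim, purely for the demo.
static int fakeBeep() { return 1; }

class DemoShim : public Shim {
public:
    DemoShim() : Shim("DemoShim") {}
    void registerHooks() override {
        addHook("KERNEL32.DLL", "Beep", reinterpret_cast<void*>(&fakeBeep));
    }
};

// Scan the known shims for the requested name, let it process its command
// line and populate its hooks, remember it for later notifications, and
// hand back the hook array. Returns nullptr if the name is unknown.
HookApi* getHookApis(const std::vector<Shim*>& known, std::vector<Shim*>& active,
                     const char* commandLine, const std::string& shimName,
                     std::size_t* count) {
    assert(count != nullptr);
    *count = 0;
    for (Shim* shim : known) {
        if (shim->name() != shimName) continue;
        shim->parseCommandLine(commandLine);
        shim->registerHooks();
        active.push_back(shim);
        *count = shim->hooks().size();
        return shim->hooks().data();
    }
    return nullptr;
}
```

Returning a pointer into the shim's own vector only works because the vector is not touched again after registration; anything that reallocates it would leave the engine with a dangling array.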

Throughout the process's life, the other shim module exported function NotifyShims is called many times with different reason codes. After shims are installed with GetHookAPIs, the shim engine notifies shim modules about every DLL the executable is linked with (and presumably their dependencies), providing a pointer to the loader data table entry. (I wouldn't have been able to identify that structure myself. Thanks to the ReactOS project for having some of this reverse-engineered.) After all those, it notifies shim modules that everything is initialized. This appears to be a good time to install COM hooks. As DLLs are delay-loaded or unloaded, there may be more DLL-related notifications.
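The notification sequence can be pictured with a small dispatcher. This is only a sketch: the `NotifyReason` names, the `ModuleInfo` struct standing in for the loader data table entry, and the `ActiveShim` class are hypothetical, and the engine's real reason codes are not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical reason codes; the engine's real names and values differ.
enum class NotifyReason { DllLoaded, DllUnloaded, ProcessInitialized };

// Stand-in for the loader data table entry passed with DLL events.
struct ModuleInfo { std::string baseName; };

class ActiveShim {
public:
    void notify(NotifyReason reason, const ModuleInfo* module) {
        switch (reason) {
        case NotifyReason::DllLoaded:
            assert(module != nullptr);
            log_.push_back("load " + module->baseName);
            break;
        case NotifyReason::DllUnloaded:
            assert(module != nullptr);
            log_.push_back("unload " + module->baseName);
            break;
        case NotifyReason::ProcessInitialized:
            // The post reports this as a good moment to install COM hooks.
            comHooksInstalled_ = true;
            log_.push_back("initialized");
            break;
        }
    }
    bool comHooksInstalled() const { return comHooksInstalled_; }
    const std::vector<std::string>& log() const { return log_; }
private:
    bool comHooksInstalled_ = false;
    std::vector<std::string> log_;
};

// Fan one notification out to every shim that was activated earlier.
void notifyShims(std::vector<ActiveShim*>& active, NotifyReason reason,
                 const ModuleInfo* module) {
    for (ActiveShim* shim : active) shim->notify(reason, module);
}
```

Deferring COM hook registration to the "initialized" case keeps GetHookAPIs itself as small as possible, which matters given how little of the process exists at that point.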

Shims are loaded very early in the process's life, so you have to be very careful about what features are available when. Loading the CLR in an attempt to write shims in a managed language would not go well. Using C++'s new operator in GetHookAPIs originally crashed with an access violation, probably because something important hadn't been loaded yet, so I replaced the standard memory management operators with implementations that wrap LocalAlloc and LocalFree, which are available. Even so, declaring static/global variables with constructors that allocate (e.g. std::vector) crashed the process with exit code 0xC0000142. This is why my knownShims and activeShims variables are pointers that are initialized inside functions called by GetHookAPIs at the earliest. They are never destructed, but there's only one leak of each type per process, so it's not really a leak.
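A portable sketch of those two workarounds follows. To keep it compilable anywhere, `shimAlloc` and `shimFree` wrap `malloc`/`free` as stand-ins for LocalAlloc and LocalFree, and the list holds plain strings rather than shim objects; those names and the `registerShimName` helper are assumptions, not the kit's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Zero-initialized statically, so it is valid before any constructor runs.
static std::size_t g_allocCalls = 0;

// Stand-ins for LocalAlloc/LocalFree so the sketch builds off Windows.
static void* shimAlloc(std::size_t bytes) {
    ++g_allocCalls;
    return std::malloc(bytes ? bytes : 1);
}
static void shimFree(void* p) { std::free(p); }

// Replace the global allocation operators so every new/delete in the module
// goes through the allocator known to be available this early.
void* operator new(std::size_t bytes) {
    if (void* p = shimAlloc(bytes)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { shimFree(p); }
void operator delete(void* p, std::size_t) noexcept { shimFree(p); }

// A plain pointer has no constructor, so nothing allocates at load time.
static std::vector<std::string>* g_knownShims = nullptr;

// Create the list on first use; it is intentionally never deleted.
// Not thread-safe; assumed to be first called from a single thread.
std::vector<std::string>& knownShims() {
    if (!g_knownShims) g_knownShims = new std::vector<std::string>();
    return *g_knownShims;
}

void registerShimName(const std::string& name) {
    assert(!name.empty());
    knownShims().push_back(name);
}
```

The important property is that nothing above runs code during static initialization: the counter and the pointer are both constant-initialized, and the first allocation happens only when `knownShims()` is called.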

While writing up a Stack Overflow answer that used my kit, I tried to use SprintDLL to quickly check that a Win32 function was being hooked. By default .NET applications' PInvoke calls won't be hooked because clr.dll is excluded by the standard "inex policy", which I learned by reading the shim engine logs after seeing SprintDLL fail to reflect the hook. If your shim is getting installed but isn't working, reading the ApphelpDebug log is a good way to see what's going on. 

Unfortunately I also noticed that SprintDLL sometimes failed to start when being shimmed, hanging for multiple seconds then exiting with a breakpoint-related NTSTATUS. The shim engine log said it "failed to lock engine", which is an error-level log event, so it brought the process down when no debugger was around to handle the DebugBreak. Running under WinDbg, a thread did indeed fail to RtlEnterCriticalSection on the engine lock, while the thread holding that lock was in the ASL-log-writing code waiting to acquire some other lock: classic deadlock. (ASL is the shims logging system, as opposed to the shim engine log. Maybe it stands for AppCompat shim log or layer.) Disabling the shims log fixed the problem. Along the way I noticed that shim engine logs also get sent to debug output and that a bunch of shim-related stuff happens before the debugger breaks into a new process, which I'm sure would make more complex issues interesting to debug.

Speaking of ways to crash the process, it is critical to get the calling convention right for hook functions and pointers to original functions. If a convention is not specified, it defaults to cdecl, which is not right for any Win32 function I know of, leading to stack imbalance then an access violation or stack cookie fast fail. Similarly, getting the parameter sizes wrong, either directly in the signature or by associating a hook function with a real function you didn't mean, will ruin the stack.
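One way to catch such mismatches at compile time is to spell the convention out and compare function pointer types. The sketch below uses a hypothetical `SHIM_CALL` macro (expanding to `__stdcall` only on 32-bit MSVC, where the convention is part of the function type) and a made-up `realBeep` standing in for a Win32 API; in a real shim you would use WINAPI and the actual prototype.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macro: stdcall where it matters, nothing elsewhere so the
// sketch still compiles on other toolchains.
#if defined(_MSC_VER) && defined(_M_IX86)
#define SHIM_CALL __stdcall
#else
#define SHIM_CALL
#endif

// Pointer type for the original function, convention included.
using BeepFn = int (SHIM_CALL*)(unsigned long frequency, unsigned long duration);

static unsigned long g_lastFrequency = 0;

// Stand-in for the real API the engine would hand back.
int SHIM_CALL realBeep(unsigned long frequency, unsigned long duration) {
    (void)duration;
    g_lastFrequency = frequency;
    return 1;
}

// In a real shim the engine supplies this; here it is wired up directly.
static BeepFn g_nextBeep = &realBeep;

// The hook repeats the exact signature and convention of the original.
int SHIM_CALL hookBeep(unsigned long frequency, unsigned long duration) {
    assert(g_nextBeep != nullptr);
    return g_nextBeep(frequency / 2, duration);  // lower the pitch by an octave
}

// Fails to compile if the hook's parameters, return type, or (on 32-bit
// MSVC) calling convention drift away from the original's.
static_assert(std::is_same<decltype(&hookBeep), BeepFn>::value,
              "hook signature or calling convention differs from the original");
```

The check only protects you if `BeepFn` itself is right, so it is worth deriving that type from the SDK declaration with `decltype` where one is available rather than retyping it.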

To reduce accidental mismatches as much as possible, I did some macro and template tricks. The shim engine provides the original/next function in a field of the HOOKAPI structure that was given to it by GetHookAPIs, so calling it from inside a hook function involves indexing that array, which requires keeping track of the order hooks were declared and using the right index in each hook function - easy to get wrong if done manually. Instead of having you build the array yourself, I provided an ADD_HOOK macro for use in RegisterHooks overrides that appends the hook information to a vector maintained by the Shim superclass and records its index in a struct templated by the hook function, effectively stapling the index to the function. The index is looked up in that struct by the DEFINE_NEXT macro (for use inside hook functions) and passed to a superclass function that extracts the next function pointer from the hooks array. By using decltype in a cast it spares you from repeating, and possibly mismatching, the function signature. Hook functions have to be static; the macro gets access to the shim instance through a static variable set in the constructor defined by the SHIM_INSTANCE macro. I would have liked to parameterize by hook function name to remove the need to pass even the current hook function to DEFINE_NEXT, but the Microsoft C++ compiler doesn't seem to fully support string literal template parameters, and I didn't want to use a map because of the runtime cost.
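The "stapling" idea can be sketched in portable C++ as follows. The macro names come from the kit, but these bodies are my own guesses for illustration, as are `HookApi`, `HookIndex`, `g_instance`, and the `DoublingShim` example; the real macros surely differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the engine's hook record; field names are assumptions.
struct HookApi { const char* lib; const char* fn; void* replacement; void* original; };

// One static slot per distinct hook function: the index is "stapled" to
// the function by making the function itself a template argument.
template <typename Fn, Fn Hook>
struct HookIndex { static std::size_t value; };
template <typename Fn, Fn Hook>
std::size_t HookIndex<Fn, Hook>::value = static_cast<std::size_t>(-1);

class Shim {
public:
    std::size_t addHook(const char* lib, const char* fn, void* replacement) {
        hooks_.push_back(HookApi{lib, fn, replacement, nullptr});
        return hooks_.size() - 1;
    }
    void* nextFunction(std::size_t index) const {
        assert(index < hooks_.size());
        return hooks_[index].original;
    }
    std::vector<HookApi>& hooks() { return hooks_; }
private:
    std::vector<HookApi> hooks_;
};

static Shim* g_instance = nullptr;  // set by the shim's constructor

// Append the hook and remember its position in the per-function slot.
#define ADD_HOOK(lib, fn, hook) \
    HookIndex<decltype(&hook), &hook>::value = \
        addHook(lib, fn, reinterpret_cast<void*>(&hook))

// Declare 'next' with the hook's own type, so the signature is never retyped.
#define DEFINE_NEXT(hook) \
    auto next = reinterpret_cast<decltype(&hook)>( \
        g_instance->nextFunction(HookIndex<decltype(&hook), &hook>::value))

static int realAdd(int a, int b) { return a + b; }  // plays the original API

class DoublingShim : public Shim {
public:
    DoublingShim() { g_instance = this; }
    static int hookAdd(int a, int b) {
        DEFINE_NEXT(hookAdd);
        return 2 * next(a, b);
    }
    void registerHooks() { ADD_HOOK("DEMO.DLL", "Add", hookAdd); }
};

// Simulates the engine filling in the original pointer, then calls the hook.
int demo() {
    DoublingShim shim;
    shim.registerHooks();
    shim.hooks()[0].original = reinterpret_cast<void*>(&realAdd);
    return DoublingShim::hookAdd(2, 3);
}
```

Because the index lives in a static member of a template specialization, looking it up costs a single load, with no map and no string comparison at call time.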

COM original function lookup is simpler because the SE_COM_Lookup function provided by the shim engine already takes a "this" pointer and hook function, both of which are easily accessible and not prone to accidental mismatches. It is pretty easy, though, to mismatch the vtable index during member function hook registration. Unfortunately I can't do anything about this unless CINTERFACE is defined, which would disable the nice C++ interfaces, so you'll just have to very carefully count the indexes of fields in the headers' C Vtbl structs yourself.
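For reference, the counting amounts to the arithmetic below. The `IDemoFolderVtbl` struct is a hypothetical stand-in written for this sketch, not copied from taskschd.h; it only mirrors the usual shape of a dual interface, where three IUnknown slots and four IDispatch slots precede the interface's own methods.

```cpp
#include <cstddef>

struct IDemoFolder;        // hypothetical interface, for illustration only
using Slot = void (*)();   // placeholder type for slots we don't care about

// C-style vtable layout in declaration order, as the C bindings spell it.
struct IDemoFolderVtbl {
    // IUnknown
    Slot QueryInterface;
    Slot AddRef;
    Slot Release;
    // IDispatch
    Slot GetTypeInfoCount;
    Slot GetTypeInfo;
    Slot GetIDsOfNames;
    Slot Invoke;
    // The interface's own methods
    long (*get_Name)(IDemoFolder* self, const char** name);
    long (*GetTasks)(IDemoFolder* self, long flags, void** tasks);
};

// A member's vtable index is its byte offset divided by the slot size.
#define VTABLE_INDEX(VtblType, member) (offsetof(VtblType, member) / sizeof(void*))

static_assert(sizeof(Slot) == sizeof(void*), "sketch assumes pointer-sized slots");
static_assert(VTABLE_INDEX(IDemoFolderVtbl, get_Name) == 7,
              "first own method follows 3 IUnknown + 4 IDispatch slots");
```

When the C structs are visible, a macro like this lets the compiler do the counting; when they are not, the same arithmetic is what you are doing by hand.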

Hooking member functions of registered COM classes is pretty straightforward: you tell SE_COM_AddHook the class ID, interface ID, function index, and your replacement function and it intercepts the COM instantiation machinery (used by e.g. CoCreateInstance) to hook the function when an object of that type is created. The shim engine cannot, however, detect or interfere with COM objects allocated directly, like those created and returned by existing objects. You can still register hooks by IID and vtable index, but such hooks can't be applied automatically. Instead, once you get ahold of an object implementing that interface (in a hook function that was applied automatically), you can call SE_COM_HookObject to apply the hooks to it. Despite the function name, that generally applies the hook to the entire class since all instances of a class usually point to the same vtable. The shim engine can't copy the vtable because it doesn't know the size.

If you only want your hooks to interfere with some instances of the internally instantiated class, you will need some way for them to distinguish interesting instances. My FakeSchTask shim's hooks of ITaskFolder use get_Name to check if the folder is the root folder. Before I noticed that helpful property, I noted the pointer to the root folder in the ITaskService::GetFolder hook and checked the "this" pointer against it in the ITaskFolder hooks, but this was a mess because objects can be freed, allowing the address to be reused. The shim engine doesn't seem to support multiple hooks of the same COM function, and since it already hooked all the IUnknown functions, I couldn't instrument Release.

Instead of juggling pointers, for IRegisteredTaskCollection objects produced by ITaskFolder::GetTasks, I wrapped the real objects of interest in an instance of my own FakeTaskCollection class before returning from my GetTasks hook. That way I didn't have to hook any IRegisteredTaskCollection functions. To save you some boilerplate I put the base interface IUnknown implementation and IDispatch stub, which could be useful for other wrapper classes, in a separate file. Implementing IDispatch in plain C++ is not my idea of a good time, so my objects don't actually support scripting - which is conveniently not necessary for shimming Autoruns - but I hear that generating type libraries with MIDL can make this easier.
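The wrapping approach looks roughly like this in portable form. Everything here is a simplified stand-in: `ITaskCollection` is a hypothetical two-method interface rather than IRegisteredTaskCollection, reference counting is not thread-safe, and `hookedGetTasks` only imitates what a GetTasks hook would do after calling the original.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, much-simplified stand-in for a COM collection interface.
struct ITaskCollection {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual std::size_t count() const = 0;
    virtual std::string nameAt(std::size_t index) const = 0;
protected:
    virtual ~ITaskCollection() = default;
};

// Shared reference-counting boilerplate (real COM code would use interlocked ops).
class RefCountedCollection : public ITaskCollection {
public:
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        const unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }
private:
    unsigned long refs_ = 1;
};

// Plays the object the real API would return.
class RealCollection final : public RefCountedCollection {
public:
    explicit RealCollection(std::vector<std::string> names) : names_(std::move(names)) {}
    std::size_t count() const override { return names_.size(); }
    std::string nameAt(std::size_t index) const override { return names_.at(index); }
private:
    std::vector<std::string> names_;
};

// Wrapper that reports one extra task in front of the real ones and
// forwards everything else to the wrapped object.
class FakeTaskCollection final : public RefCountedCollection {
public:
    FakeTaskCollection(ITaskCollection* inner, std::string fakeName)
        : inner_(inner), fakeName_(std::move(fakeName)) {
        assert(inner_ != nullptr);
        inner_->AddRef();
    }
    ~FakeTaskCollection() override { inner_->Release(); }
    std::size_t count() const override { return inner_->count() + 1; }
    std::string nameAt(std::size_t index) const override {
        return index == 0 ? fakeName_ : inner_->nameAt(index - 1);
    }
private:
    ITaskCollection* inner_;
    std::string fakeName_;
};

// What a GetTasks hook would do with the original's result before returning.
ITaskCollection* hookedGetTasks(ITaskCollection* realResult, const std::string& fakeName) {
    ITaskCollection* wrapper = new FakeTaskCollection(realResult, fakeName);
    realResult->Release();  // the wrapper now holds the only reference
    return wrapper;
}
```

Since the caller only ever sees the wrapper, none of the collection's own methods need to be hooked, and the wrapper's lifetime naturally follows the caller's reference count.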

Interestingly, the Task Scheduler 2 header taskschd.h doesn't define the CLSID_TaskScheduler GUID the way most COM component headers do. It just declares the variable extern, requiring linking with taskschd.lib, which was not in the default linker input for me. If you get "unresolved external symbol" errors about class or interface IDs and have already included initguid.h, you probably need to reference another LIB.

Well! I think that's all the adventure notes I have. Please don't rely on any part of this for anything too important, but if you have any questions or suggestions, I'd be happy to take a look via the GitHub issue tracker.
