By the end of this post you will be able to figure out where IL Code is in memory (memory address and hex opcodes), have a basic understanding of WinDBG, and understand some mechanics of the CLR, JIT, and App Domains.
My goal is to write a dynamic code injector for .NET. To that end I decided to start by learning WinDBG and using it to explore the inner machinery of an app domain. I’ve completed my first exercise: finding a method using the debugger and printing out its IL code (and thus discovering all of the addresses and metadata along the way). This article is a guide through that experience so that you can have it for yourself.
If you have ever stumbled into WinDBG before, you know it looks rather ugly and cryptic, like something someone from the days of Windows NT or 95 might have used. However, even after my brief flirtation with the software I have already come to appreciate its power. I was surprised to learn in my research that many people at Microsoft actually favor WinDBG over Visual Studio for much of their debugging. I believe this comes from the power the debugger has.
As off-putting as it might be at first, there is also something soothing about using a console to debug and explore. WinDBG was the easiest path I found for exploring things at a level beyond Visual Studio’s default debugger, and most of the JIT/.NET guides seemed to use it.
1 - Getting WinDBG and a simple Sample Application
Getting WinDBG is pretty much just a Google search away. I used the Windows SDK download.
I started with a simple console application. Out of the box, of course, I had a Main method. I added two more methods, Foo() and Bar(), with a call only to Foo. Foo prints a “hello world”-like message. My goal is eventually to use injection to dynamically swap the JIT calls to Foo with calls to Bar; for this exercise, however, I just wanted to explore. Below is the simple application I was operating on with WinDBG:
static void Main(string[] args)
    var program = new Program();
public int Foo()
    int y = 15;
public int Bar()
    int x = 27;
My goal was to be able to find Foo() using WinDBG.
2 - SOS and WinDBG
WinDBG has an extension called SOS. By itself, WinDBG is a great debugger for unmanaged code, and you can do all of the usual debugging tasks. It gets a lot more challenging when you want to debug managed code. The Son of Strike (SOS) extension fills that gap, providing managed-code debugging help that lets you view IL code, see whether methods have been JITed, set managed breakpoints, and more.
As an interesting aside on the naming of SOS (and of COR, which pops up in the names of many DLLs such as mscorwks.dll), Chris Schmich on Stack Overflow provided some insight:
Jason Zander’s blog post explains it perfectly:
The original name of the CLR team (chosen by team founder and former Microsoft Distinguished Engineer Mike Toutonghi) was “Lightning”. Larry Sullivan’s dev team created an ntsd extension dll to help facilitate the bootstrapping of v1.0. We called it strike.dll (get it? “Lightning Strike”? yeah, I know, ba’dump bum). PSS really needed this in order to give us information back to the team when it was time to debug nasty stress failures, which are almost always done with the Windows debugger stack. But we didn’t want to hand out our full strike.dll, because it contained some “dangerous” commands that if you really didn’t have our source code could cause you confusion and pain (even to other Microsoft teams). So I pushed the team to create “Son of Strike” (Simon from our dev takes credit/blame for this), and we shipped it with the product starting with Everett (aka V1.1).
Also, I had heard of the CLR being referred to as “COM+ 2.0” before, but apparently it’s had a few names in its time (from here):
The CLR runtime lives in a DLL called MSCOREE.DLL, which stands for Microsoft Common Object Runtime Execution Engine. “Common Object Runtime,” or COR, is one of the many names this technology has had during its lifetime. Others include Next Generation Windows Services (NGWS), the Universal Runtime (URT), Lightning, COM+, and COM+ 2.0
Loading SOS is done by a command:
.loadby sos clr
This actually failed for me the first time I tried it. After trying a bunch of things I just about gave up, assuming WinDBG was just a pain in the ass to work with; then as a last resort I decided to verify that the process was x64. I quickly discovered the problem: I was running the x64 WinDBG against a 32-bit (x86) application. I had assumed my application was running as x64 since I have an x64 laptop; however, my console application was built as 32-bit, which Visual Studio’s default project settings can produce even on a 64-bit machine. After switching to the 32-bit WinDBG, the SOS extension loaded without issue.
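One way to avoid this guessing game is to have the application report its own bitness before you pick a debugger. The following is a minimal sketch, not part of the original FooBar program; the class name BitnessCheck is made up for illustration, and Environment.Is64BitProcess assumes .NET 4 or later.

```csharp
using System;

static class BitnessCheck
{
    // Returns "x64" or "x86" depending on how the current process is running.
    public static string Describe()
    {
        return Environment.Is64BitProcess ? "x64" : "x86";
    }

    static void Main()
    {
        // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit one.
        Console.WriteLine("Process: {0} (pointer size {1} bytes)", Describe(), IntPtr.Size);
        Console.WriteLine("OS is 64-bit: {0}", Environment.Is64BitOperatingSystem);
    }
}
```

If this reports x86, attach with the 32-bit WinDBG; if it reports x64, use the 64-bit one.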
Attaching, Setting Symbol Paths, and Source File Paths
So after I had this all set up, I realized I didn’t have any symbols loaded. Symbols are found in program database (.pdb) files. These files can contain different levels of information that allow more granular debugging. The PDB file for my program, FooBar.pdb, was generated alongside FooBar.exe in the same output directory. In order for symbols to be found for my program and for all of the Microsoft modules too, I had to set up the paths.
First I went to the File menu in WinDBG and set the Source File Path to the bin directory of my created application. I’ll need to investigate later exactly what this setting is for. As far as I can tell, it tells the debugger where to look for source files, while the Symbol Path below is what actually finds the PDB file for my application.
Second and probably more importantly I set my Symbol Path to the following:
Some people recommended defining a general symbol path to store all symbols locally. I found with the above setting the Microsoft Symbols for the .NET Framework DLLs I was using were getting loaded and cached into the bin directory of my application. Notice the Microsoft symbol server line at the end will ensure we can pull any available symbols we don’t have from the Microsoft symbol servers.
At this point I am basically ready to attach. I go to the File menu and click “Attach to Process”, find FooBar.exe in the list, and attach. The debugger attaches itself and gives me a console.
I type the SOS load command:
.loadby sos clr
and get back a blank line indicating it probably succeeded. The .chain call allows me to see loaded extensions to WinDBG. I typed it out:
And get back:
Notice that I typed the command wrong the first time and got back an error. Then I typed it correctly (.loadby instead of .load) and it loaded. The .chain command now shows SOS in the extensions list. We can now use SOS.
Finally setup is complete and we can start looking at some cool stuff.
“Module” is basically the generic term for a loaded library or executable, whether unmanaged or managed. The first thing to do when looking at a program in WinDBG is to view the modules. For this we use the list modules command:
Notice that the symbols are only loaded for a couple of these. Being the eager bastard I am, I’d like to see symbols loaded for all of them, so I am going to force the symbols to be loaded by using the reload command:
Now we check the loaded modules using lm again:
Notice that now deferred has been replaced with pdb symbols, and that even the FooBar.exe has symbols!
Starting Down the Rabbit Hole
Let’s start at the top and look at the domains. When the CLR starts up, three domains are actually initialized. Two of these are created automatically and are not visible even to the host: the system domain and the shared domain. The other is the default domain, which your application actually runs in. The first two are created as part of bootstrapping by mscoree.dll and mscorwks.dll (or, if you have multiple processors, the latter may be mscorsvr.dll).
The system domain sets up the shared domain and the default application domain, and loads mscorlib.dll into the shared domain. Remember from the aside earlier that “cor” in these library names refers to the Common Object Runtime, an earlier name for the Common Language Runtime (CLR). The system domain also handles string interning and the setup and teardown of app domains.
The shared domain is where common code is loaded. User applications can be loaded into this domain if they are loaded as domain-neutral. ASP.NET is apparently supposed to do this by default, but I have yet to see it in practice. There are some nuances around “binding closures”, though, that may be preventing the applications I’ve worked on from taking advantage of it.
The default domain is where my application will run!
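The default domain can also be inspected from managed code, which gives a handy cross-check against what the debugger reports. This is a minimal sketch; the class name DomainPeek and the helper LoadedAssemblyNames are hypothetical names for illustration. Note that the system and shared domains are not reachable this way, so only the domain the code runs in is listed.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

static class DomainPeek
{
    // Collects the simple names of the assemblies loaded into the current domain.
    public static List<string> LoadedAssemblyNames()
    {
        var names = new List<string>();
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            names.Add(assembly.GetName().Name);
        }
        return names;
    }

    static void Main()
    {
        AppDomain domain = AppDomain.CurrentDomain;
        Console.WriteLine("Domain: {0} (default: {1})",
            domain.FriendlyName, domain.IsDefaultAppDomain());

        foreach (string name in LoadedAssemblyNames())
        {
            Console.WriteLine("  {0}", name);
        }
    }
}
```

The assembly names printed here should line up with the assemblies SOS shows under the application's domain.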
Extensions to WinDBG follow the convention of using the exclamation mark to denote an extension command. The first one we will use is:
Notice that we see the three domains we were just talking about, and we even have some sexy-looking memory addresses. The domain I am interested in here is “Domain 1”, since it holds my application. I can use the memory address next to my “FooBar.exe” module to get some more information about it using the !dumpmodule command (-mt will provide some more information on the types):
!dumpmodule -mt 00ed2ed4
So now I know where the method table for FooBar.Program is! Let’s take a look at it using !dumpmt -md (the -md option adds the list of method descriptors):
Notice we have some information here about if the method was precompiled (PreJIT), already has been compiled as part of Just In time compilation (JIT), or has yet to be compiled and still contains a JIT stub for compilation (NONE). This is pretty cool!
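The same addresses can be approximated from inside the process through reflection handles. This is a sketch under assumptions: HandleDemo and its Foo are hypothetical stand-ins for the FooBar program, and the claim that these handle values match the method table and method descriptor addresses shown by SOS is my understanding of the Microsoft CLR rather than a documented guarantee.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

class HandleDemo
{
    // Hypothetical stand-in for the post's Foo().
    public int Foo()
    {
        int y = 15;
        Console.WriteLine("Foo Called!");
        return y;
    }

    static void Main()
    {
        MethodInfo foo = typeof(HandleDemo).GetMethod("Foo");

        // Assumption: on the Microsoft CLR these handle values correspond to the
        // method table and method descriptor addresses that SOS prints.
        Console.WriteLine("MethodTable: {0:x8}", typeof(HandleDemo).TypeHandle.Value.ToInt64());
        Console.WriteLine("MethodDesc:  {0:x8}", foo.MethodHandle.Value.ToInt64());

        // Ask the runtime to JIT-compile Foo without calling it.
        RuntimeHelpers.PrepareMethod(foo.MethodHandle);

        // The entry point may be a small stub rather than the final native code.
        Console.WriteLine("Entry point: {0:x8}", foo.MethodHandle.GetFunctionPointer().ToInt64());
    }
}
```

To compare the values against the debugger, add a Console.ReadLine() at the end of Main so the process stays alive long enough to attach.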
The second column is the address of the Method Descriptor. We can now find the IL for the method using the !dumpil command and passing the Method Descriptor address: !dumpil 00ad379c
That looks like the IL code for the Main function! So this is pretty neat, but it would be even cooler if we could see the code for a function directly in memory. Let’s grab Foo’s IL instead, since it has that integer in there, a string, and a call.
As an aside, the nop instructions are often there to provide an address for breakpoints set on braces in the source code.
Notice the ilAddr line. Let’s open a Memory window (View Menu > Memory) and then in the address field let’s type that iladdr number:
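Now if you look at the IL code and then reference it against the opcodes for the CIL instructions, you can start to see where the IL is in memory. Below, small colored markers indicate some of the first mappings. I stopped at the string because I was tired and unsure how to interpret it. I was guessing that the string “Foo Called!” is stored at a memory location and that what follows the ldstr opcode is a memory address; as far as I can tell from the CIL specification, the four bytes after ldstr are actually a little-endian metadata token that identifies the string in the assembly’s user string heap, rather than a raw address.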
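The bytes in the Memory window can be cross-checked without a debugger by asking reflection for the method's IL. This is a minimal sketch; IlDump, its Foo, and the helper HexOf are hypothetical names, and the exact bytes depend on the compiler and on whether the build is Debug or Release, so no expected output is shown.

```csharp
using System;
using System.Reflection;

class IlDump
{
    // Hypothetical stand-in for the post's Foo().
    public int Foo()
    {
        int y = 15;
        Console.WriteLine("Foo Called!");
        return y;
    }

    // Returns the method's IL as space-separated hex bytes.
    public static string HexOf(MethodInfo method)
    {
        byte[] il = method.GetMethodBody().GetILAsByteArray();
        return BitConverter.ToString(il).Replace("-", " ");
    }

    static void Main()
    {
        Console.WriteLine(HexOf(typeof(IlDump).GetMethod("Foo")));
    }
}
```

The hex string should match, byte for byte, what the Memory window shows at the ilAddr reported by !dumpil for the same method in the same build.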
So with that we have successfully found the IL code in memory. Hopefully next time we will start to be able to tweak and manipulate some of these values.
Using Wikipedia’s CIL page against the above opcodes:
- 00 - nop - no operation, often used as a placeholder for breakpoints on braces in debug builds.
- 1F - ldc.i4.s <int8 (num)> - pushes num onto the stack as an int32 (short form)
- 0F - in this context, just the operand: the number 15
- 0A - stloc.0 - pops a value from the stack into local variable 0
- 72 - ldstr - loads a string
- The next four bytes are probably a reference to the string (a metadata token rather than a raw address)
- … and so on
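The mapping in the list above can be turned into a tiny decoder. This is a sketch that handles only the four opcodes discussed here and stops at anything else; TinyIlDecoder is a hypothetical name, and the sample bytes are made up for illustration (in particular the token 0x70000001 is not taken from the real FooBar.exe).

```csharp
using System;
using System.Collections.Generic;

static class TinyIlDecoder
{
    // Decodes nop, ldc.i4.s, stloc.0 and ldstr; stops at the first other opcode.
    public static List<string> Decode(byte[] il)
    {
        var lines = new List<string>();
        int i = 0;
        while (i < il.Length)
        {
            byte op = il[i];
            if (op == 0x00) { lines.Add("nop"); i += 1; }
            else if (op == 0x1F && i + 1 < il.Length)
            {
                // One signed byte operand follows the opcode.
                lines.Add("ldc.i4.s " + (sbyte)il[i + 1]);
                i += 2;
            }
            else if (op == 0x0A) { lines.Add("stloc.0"); i += 1; }
            else if (op == 0x72 && i + 4 < il.Length)
            {
                // Four operand bytes, little-endian, form a metadata token.
                uint token = (uint)(il[i + 1] | il[i + 2] << 8 | il[i + 3] << 16 | il[i + 4] << 24);
                lines.Add("ldstr token 0x" + token.ToString("X8"));
                i += 5;
            }
            else
            {
                lines.Add("(stopped at 0x" + op.ToString("X2") + ")");
                break;
            }
        }
        return lines;
    }

    static void Main()
    {
        // Hypothetical bytes: the token value is invented for this example.
        byte[] sample = { 0x00, 0x1F, 0x0F, 0x0A, 0x72, 0x01, 0x00, 0x00, 0x70 };
        foreach (string line in Decode(sample))
        {
            Console.WriteLine(line);
        }
    }
}
```

For a real method, a token decoded this way can be passed to Module.ResolveString on the method's module to get the string text back.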
 - http://msdn.microsoft.com/en-us/magazine/cc163791.aspx