Thursday, October 23, 2008

Understanding the Garbage Collection

One of the most important parts of the .Net framework is the Garbage Collector. Instead of letting you to take care of all the memory you have used, it automatically collects all unnecessary "garbage" when it's time. But how? This is the question I am going to discuss in this article.

The .NET CLR (Common Language Runtime) requires that all resources be allocated from the managed heap. The developer never need to free objects from the managed heap -they are automatically freed when they are no longer needed by the application. When the garbage collector runs, it checks for objects in the managed heap that are no longer needed by the application and performs the necessary operations to reclaim their memory. Because each type of the .Net is described by its Metadata, the garbage collector always knows, how to free up the unnecessary memory.
So, the garbage collector starts its job by locating the roots of the application.

The roots are:
  • all the global and static pointers
  • local variable pointers (which are on a thread's stack)
  • registers, which contain pointers to objects in the managed heap
  • pointers to the objects from the Freachable queue

The list of active roots is being maintained by the JIT compiler and CLR, and is made acceptable to the GC algorithm.
The GC works in two phases. Let's review them in detail.

The first phase (Marking)

(when the GC starts, it makes an assumption, that all the objects in the heap are garbage)
  1. identification of roots
  2. building the live object graph (GC runs through the roots and identifies live objects using the metadata of the objects)


To avoid the cycling process, if the GC tryes to add an object to the graph, which is already there, then the path, which by the object was found, is being ignored, and no more down-level searches are being made.
When the check for all the roots is done, all the alive objects are being added into the graph, so any other objects, surely, is garbage.

And now comes the second phase (Compacting).

  1. GC walks through the heap linearly looking for garbage blocks of memory.
  2. GC shifts non-garbage objects down in memory (of course, by updating all the references to the moved objects), making them easy reachable by removing all the gaps in the heap

After this phase the pointer is set to the end of the last object in the heap, referencing the place, where new object can be allocated on heap.

Finalization


Whenever a new object, having a Finalize method, is allocated on the heap a pointer to the object is placed in an internal data structure called Finalization queue. When an object is not reachable, the garbage collector considers the object garbage. The garbage collector scans the finalization queue looking for pointers to these objects. When a pointer is found, the pointer is removed from the finalization queue and appended to another internal data structure called Freachable queue, making the object no longer a part of the garbage. At this point, the garbage collector has finished identifying garbage. The garbage collector compacts the reclaimable memory and the special runtime thread empties the freachable queue, executing each object's Finalize method.
The next time the garbage collector is invoked, it sees that the finalized objects are truly garbage and the memory for those objects is then, simply freed.
It is recommended to avoid using Finalize method, unless required. Finalize methods increase memory pressure by not letting the memory and the resources used by that object to be released, until two garbage collections. Since you do not have control on the order in which the finalize methods are executed, it may lead to unpredictable results.
In my next article I will try to explain, how to optimize the GC.

No comments: