Friday, October 24, 2008

Optimizing Garbage Collection

Weak References

Weak references, exposed in .NET through the WeakReference class, give you a way to influence what the garbage collector (GC) is allowed to reclaim.
When an object points to another one, this is called a strong reference, or just a reference, as we usually say, and as long as an object is reachable through such a reference the GC will not collect it as "garbage". A weak reference is a kind of reference whose target can still be collected. If the target is accessed through the WeakReference after it has been collected, the access fails: the Target property returns null.
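Here is a minimal sketch of that access pattern. The WeakReference class and its Target property are part of .NET; the GetOrRebuild helper, the byte[] payload and the buffer size are my own illustrative assumptions. GC.Collect() is called only to make the effect visible, and whether the array is actually reclaimed at that point depends on the build and the runtime.

```csharp
using System;

static class WeakReferenceDemo
{
    // Hypothetical helper: returns the cached buffer if the GC has not
    // reclaimed it yet; otherwise rebuilds it and re-points the weak
    // reference at the new copy.
    public static byte[] GetOrRebuild(WeakReference weak, int size)
    {
        // Copy Target into a strong reference first, then test for null.
        // Checking IsAlive and reading Target afterwards would race with the GC.
        byte[] buffer = weak.Target as byte[];
        if (buffer == null)
        {
            buffer = new byte[size];
            weak.Target = buffer;
        }
        return buffer;
    }

    static void Main()
    {
        byte[] data = new byte[1024];                   // strong reference
        WeakReference weak = new WeakReference(data);   // weak reference to the same array

        // The strong reference is still in use below, so the target is alive.
        Console.WriteLine(ReferenceEquals(GetOrRebuild(weak, 1024), data));
        GC.KeepAlive(data);

        data = null;      // drop the only strong reference
        GC.Collect();     // forced here only for demonstration

        // The array may or may not have been reclaimed by now,
        // so the helper handles both cases.
        byte[] again = GetOrRebuild(weak, 1024);
        Console.WriteLine(again.Length);
    }
}
```

The important part is the order inside GetOrRebuild: take a strong reference first and only then check it for null.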
The managed heap contains two internal data structures whose sole purpose is to manage weak references: short and long weak reference tables.
The two kinds differ in how they interact with finalization. A short weak reference does not track resurrection: as soon as the garbage collector determines that the object is unreachable, the reference is cleared, even if the object has a Finalize method that has not run yet. A long weak reference tracks resurrection: the garbage collector clears the entry in the long weak reference table only after determining that the object's storage is reclaimable. If the object has a Finalize method, this means the Finalize method has been called and the object was not resurrected.
These two tables simply contain pointers to objects allocated within the managed heap. Initially, both tables are empty. Creating a WeakReference does not allocate a second copy of the target on the managed heap. Instead, an empty slot in one of the weak reference tables is located and set to the address of the target; short weak references use the short weak reference table and long weak references use the long weak reference table.
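The difference between the two kinds can be sketched with the two-argument WeakReference constructor, whose second parameter is trackResurrection. The Resource class and the Create helper below are hypothetical names used only for this illustration. On an optimized build I would expect the output to be False, True, False, but this depends on GC timing and is not guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;

class Resource
{
    // A finalizer makes the difference between the two kinds observable.
    ~Resource() { }
}

static class ShortLongDemo
{
    // Creates the object in a separate, non-inlined method so that no
    // local variable in Main keeps it alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Create(out WeakReference shortRef, out WeakReference longRef)
    {
        Resource r = new Resource();
        shortRef = new WeakReference(r, false); // short: trackResurrection = false
        longRef = new WeakReference(r, true);   // long:  trackResurrection = true
    }

    static void Main()
    {
        WeakReference shortRef, longRef;
        Create(out shortRef, out longRef);

        GC.Collect();  // object found unreachable and queued for finalization
        Console.WriteLine("short alive: " + shortRef.IsAlive);
        Console.WriteLine("long alive:  " + longRef.IsAlive);

        GC.WaitForPendingFinalizers(); // let the finalizer run
        GC.Collect();                  // now the storage can be reclaimed
        Console.WriteLine("long alive after finalization: " + longRef.IsAlive);
    }
}
```

The TrackResurrection property tells you afterwards which kind of weak reference you are holding.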


Since garbage collection cannot complete without stopping the entire program, it can cause pauses at arbitrary times during the execution of the program. Those pauses can also prevent programs from responding quickly enough to satisfy the requirements of real-time systems.
One of the improvements to the GC is called generations. A generational garbage collector takes into account two facts:
  • Newly created objects tend to have short lives.
  • The older an object is, the longer it will survive.

Such collectors group objects by “age” and collect younger objects more often than older ones. All new objects are added to the heap in generation “0”. When generation “0” fills up, a garbage collection is invoked. As most objects are short-lived, only a small percentage of “young” objects are likely to survive their first collection. An object that survives its first garbage collection is promoted to generation “1”, and objects created after that collection again start in generation “0”. The next collection is invoked only when the sub-heap of generation “0” fills up again. When a collection also covers generation “1”, its survivors are compacted and promoted to generation “2”, and the survivors of generation “0” are compacted and promoted to generation “1”. Generation “0” then contains no objects, and, as already mentioned, all objects created after the collection go into generation “0”.
Generation “2” is the maximum generation supported by the runtime's garbage collector. When future collections occur, any surviving objects currently in generation “2” simply stay in generation “2”.
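Promotion can be observed with GC.GetGeneration and GC.MaxGeneration, both part of the .NET GC class. The TrackGeneration helper is a hypothetical name of my own, and the collections are forced only for demonstration. With the usual three generations I would expect the object to move 0 -> 1 -> 2 -> 2, but the exact sequence is up to the runtime.

```csharp
using System;

static class GenerationDemo
{
    // Forces `collections` garbage collections and records the generation
    // of `obj` before the first one and after each one.
    public static int[] TrackGeneration(object obj, int collections)
    {
        int[] seen = new int[collections + 1];
        seen[0] = GC.GetGeneration(obj);
        for (int i = 1; i <= collections; i++)
        {
            GC.Collect();                    // obj survives: it is still referenced
            seen[i] = GC.GetGeneration(obj);
        }
        GC.KeepAlive(obj);
        return seen;
    }

    static void Main()
    {
        Console.WriteLine("Max generation: " + GC.MaxGeneration);

        object o = new object();
        int[] gens = TrackGeneration(o, 3);
        Console.WriteLine(string.Join(" -> ", Array.ConvertAll(gens, g => g.ToString())));
    }
}
```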
Thus, dividing the heap into generations and collecting and compacting mostly the younger generations improves the efficiency of the underlying garbage collection algorithm: a collection of the younger generations reclaims a significant amount of space from the heap and is faster than it would be if the collector had to examine the objects in all generations.
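To see that younger generations can be collected on their own, here is a small sketch using GC.Collect(int) and GC.CollectionCount(int), which are real members of the GC class. The ChurnAndCollectYoung helper, the loop count and the allocation size are arbitrary assumptions of mine. Normally the generation 0 counter grows while the generation 2 counter stays put, although the runtime is free to decide otherwise.

```csharp
using System;

static class CollectionCountDemo
{
    // Allocates many short-lived objects, then collects generation 0 only.
    public static void ChurnAndCollectYoung()
    {
        for (int i = 0; i < 100000; i++)
        {
            byte[] temp = new byte[128];   // becomes garbage right away
            temp[0] = 1;
        }
        GC.Collect(0);   // examine generation 0 only, not the older generations
    }

    static void Main()
    {
        int gen0Before = GC.CollectionCount(0);
        int gen2Before = GC.CollectionCount(2);

        ChurnAndCollectYoung();

        Console.WriteLine("gen 0 collections: " + (GC.CollectionCount(0) - gen0Before));
        Console.WriteLine("gen 2 collections: " + (GC.CollectionCount(2) - gen2Before));
    }
}
```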
That covers the basics of the garbage collector which, I think, every .NET developer must know.
