|By Jeff Nelson||
|September 1, 1997 12:00 AM EDT||
As any ex-C++ software developer will attest, the Java garbage collector greatly simplifies the task of cleaning up after your objects. With distributed software applications, the garbage collector faces many new challenges since objects may be used by applications running across the Internet. This article looks at some common solutions to garbage collection in CORBA, RMI and DCOM. Finally, the distributed garbage collector in RMI is implemented on top of CORBA.
Back in the old days of software development, programmers had to carefully keep track of all the memory used in a program and clean up each of the unused bits. Failure to properly care for memory could lead to memory actually getting lost somewhere in the ether. In the case of some of the older operating systems, lost memory could only be recovered by rebooting the computer in some circumstances. As a result, software developers had to be intimately familiar with the size of every piece of data used in their applications. Hours were spent tracing through code to determine when data was no longer required and even more hours were spent writing procedures to properly remove the data. A whole niche in the software industry was built on the marketing of development tools to detect lost memory.
Garbage-collected languages such as Java have improved this situation dramatically. No longer do we have to worry about where our memory goes when we are done with it. The garbage collector will find it and clean it up for us. And the Java garbage collector is pretty good at its job. It runs as a low priority thread so you probably will never notice it cleaning up your mess.
However, distributed software has a whole new set of unique challenges. Objects which previously were used only within one program on a single computer can now be used by many different programs running on many different computers. Now that Netscape has built-in CORBA support, it won't be long before you might want to access your objects across the Web.
The garbage collector's job of finding out when an object is still in use just became a whole lot more difficult. The garbage collector used to easily determine which objects were still in use by literally looking at each object in a program, marking those which were still in use and removing the leftovers. But with distributed objects, the whole Internet could be using your objects if you let them. The garbage collector can't very well look at every object on the entire Web to determine which are still in use.
Now add to the equation the realities of the modern Internet. Network links fail all the time. Corrupt packet routing tables bring down whole branches of the Internet temporarily. Machines occasionally crash, both clients and servers. Finally, how many of us have suffered through the occasional 50-bit-per-second connection to read our e-mail?
When a link fails or computer crashes, the distributed garbage collector must be smart enough to do the right thing, whatever that thing may be. Consider if a distributed object is running on a Web server hosted by ACME Web Service, Inc. and the distributed object is currently in use by your Web browser. First, let's say your computer suddenly crashes. In this case, you might want the distributed object to be cleaned up immediately. After all, you probably won't be able to log back into the Internet and just start over where you left off (unless the software was written by a particularly talented developer). But now, let's say that your Internet connection temporarily drops off. This happens often for periods of just a few seconds and you don't even notice it. In this situation, you don't want the garbage collector to go after your object; you'll be back in just a few moments. The distributed garbage collector has to walk a fine line to satisfy everyone.
The CORBA Approach
CORBA uses a combination of reference counting and Internet connection management in order to perform distributed garbage collection. Once a server object has been instantiated, the reference count to it is implicitly incremented whenever a new reference to it is created and implicitly decremented whenever a reference is destroyed. When the reference count reaches zero, the instance of the server object is cleaned up. This is enough for most situations and provides an effective means of distributed garbage collection even in languages such as C++ which don't normally have a garbage collector.
In addition to the implicit rules for reference counting, explicit operations are provided for adding and removing references called duplicate and release. These operations are most useful when manipulating object references through pointers. When a pointer to a reference is copied, the duplicate procedure should be called in order to indicate that a new reference has been created. When a pointer to a reference is destroyed, the release procedure should be called for the opposite reason.
CORBA also carefully manages Internet network connections. When a client is disconnected, either due to a client machine crash or due to a complete network failure, any references held by the client machine are immediately released. This mechanism of detecting a client failure behaves correctly even when a temporary network slowdown causes the server to lose touch with the client. As long as the network connection remains active, the references will not be released and the object will not be garbage-collected.
For the truly adventurous, additional mechanisms are provided to sever communication with an object and immediately cause garbage collection. The deactivate_obj call is an example of such a mechanism.
The RMI Approach
RMI uses a fairly straightforward mechanism for garbage collection. Any program which has a reference to an object must obtain a "lease" for the object. The lease, which is literally represented by a Lease object, entitles the program to use the object for a certain period of time, basically the same idea as leasing office equipment.
If the object continues to be used for an extended period of time, the lease must be renewed before it expires. The renewed lease again entitles the holder to use the object for a certain period of time. If the object is no longer in use, the lease is simply allowed to expire or can be explicitly terminated by the holder at any time. When all the leases have expired or terminated, the object can be garbage-collected.
This design easily solves most of the problems faced by a distributed garbage collector. When an object is no longer in use anywhere on the Internet, no leases are renewed so the object will eventually be garbage-collected. If the computer with an RMI program running on it suddenly crashes, the leases for any distributed objects will simply expire over time and can be garbage-collected.
The Lease object is obtained and renewed using the DGC interface (DGC presumably stands for Distributed Garbage Collector) which is provided by RMI. The main operations on the DGC interface, shown in Listing 1, are dirty and clean. Dirty is for obtaining a Lease object and clean is for terminating Lease objects. However, don't worry too much about learning the details. The software developer should never need to use the DGC interface since it is all taken care of by RMI itself.
The dirty method on the normal DGC interface accepts an array of ObjIDs, a sequence number and a Lease. The array of ObjIDs are object identification numbers for those objects whose lease requires renewal. The sequenceNum is used for nothing more than to guarantee proper network packet ordering since RMI makes use of the unreliable protocol UDP to transmit garbage collection requests. The Lease is just used as a data container to hold a unique identification number for the client making the request and the desired length of the lease. Keep in mind that since the DGC interface is hidden under the covers, RMI itself is choosing the "desired" length of the lease. The software developer has no part in this decision.
One thing you should keep in mind: some network overhead is incurred every time an object renews its lease. A remote request must be sent across the network to the object's host. Thus, if you would like to deploy a system with several hundred clients or several thousand distributed objects, this overhead might become quite considerable. Currently, no means are provided for configuring the leasing period for objects and thus the time between requests to renew a lease, so keep this limitation in mind when architecting your system.
In addition, any distributed architecture which relies on mechanisms like leases is subject to problems when network failures or even slowdowns occur. For example, if you are in the unfortunate situation of having your Internet connection hang just long enough to cause your leases to expire, all of your distributed objects will suddenly be garbage collected even though you are still using them.
The DCOM Approach
DCOM, which stands for Distributed COM, is Microsoft's foray into the world of distributed computing. Since DCOM is supported by the world's second largest software vendor, it deserves at least a brief mention here even though it's unclear how well it will be supported for use with Java. DCOM is totally unlike any other distributed object technology when it comes to garbage collection. First, DCOM differentiates between interfaces and objects. Each has its own type of garbage collection support.
Garbage collection of interfaces is handled through a manual reference counting mechanism. RemAddRef and RemRelease respectively add and release references to remote objects. Both of these calls are sent across the network to the remote system, incurring some network overhead whenever additional references are made. Under the covers, DCOM tries to reduce this overhead by "multiplexing references". This means that a single reference can actually stand for many references within a single program. In addition, programs may optionally request "private references", which are references associated with a particular client identification. Normally, DCOM allows one client to issue more releases than the number of references it currently owns. Private references are a way of preventing this from occurring.
An entirely different mechanism is used for objects. So called keepalive messages are sent periodically to objects as a way of pinging the objects to let them know they are still needed. These keepalive messages are similar in some ways to RMI Leases and have the same weaknesses. A temporary network failure may result in the garbage collection of objects which are still in use simply because keepalive messages were not received in time.
To add a few more variables to the equation, COM implementations may defer the release of references to an interface for an indefinite period of time. The DCOM specification recommends that the remote release of all interfaces be deferred until all local references to all interfaces on an object are released. It's not clear what sort of logic is required by the user, if any, to match up respective interfaces with their objects in order for garbage collection to work as advertised. To top it off, garbage collection of the interfaces is actually left as optional; some COM implementation may never perform it. Confused? Maybe that's what Microsoft intended.
Merging the Approaches
The RMI distributed garbage collector is rather simple and easy to implement using CORBA. First, the DGC interface is hidden from the developer. No means of directly invoking the DGC interface is available. Thus, I feel justified in redesigning the interface slightly in order to simplify the task of implementing it.
In the dirty method, I would like to just pass an object identification number for the object which is to be leased and the identification number for the client requesting the lease. I'll simply return a number indicating the length of time for which the lease was granted rather than a whole object. This method may not be as type safe as returning an object; however, the interface will only be used by our own stub code. That means type safety is not as important as efficiency. I won't return or send a Lease object or allow the desired length of the lease to be configured since this interface is not exposed for the software developer anyway. The sequenceNum which was added just to guarantee a certain amount of packet ordering due to RMI's use of the unreliable UDP network protocol can simply be deleted since CORBA would itself guarantee reliable delivery.
The sequenceNum on the clean method can be deleted for the same reasons. I also broke up the arrays of object identification numbers into a single identification number per method invocation. Although renewing leases in groups may prove useful later, manipulating the lease of one object at a time seems like the most natural way of handling a lease. The modified DGC interface is shown in Listing 2.
To implement the DGC interface, I added a class called DGCImpl whose main responsibilities are to keep track of the leases and periodically clean up those objects which no longer have any outstanding leases. This was accomplished by making the DGCImpl implement runnable so that it would have its own thread to periodically check its leases. When an object no longer has any outstanding leases, the CORBA deactivate_obj is called to immediately remove the object and allow it to be garbage-collected. The full implementation of this is too long to reproduce here due to space considerations but is available for download at my Web site, mentioned at the end of this article.
Two numbers are passed into the DGC interface, the object identification and client identification number. In RMI, these identification numbers are generated by the ObjID and VMID classes respectively. For implementing the DGC on CORBA, I continue this tradition but simplify it slightly by extracting the integer contained in both objects.
The usage of this DGC interface with CORBA is identical to its usage with RMI. When a new reference is created, a lease for the object should be obtained in order to prevent the object from being garbage collected. By performing this action in the client stubs, this can be entirely hidden from the software developer so they don't need to worry about it.
Garbage collection has greatly improved the way in which we write software, but garbage collection in distributed applications has many difficult problems to solve. We've looked briefly at how the three main distributed object systems tackle this problem and demonstrated that two of them aren't quite as different as you might expect at first glance.
Where To Go From Here
RMI and Java can be found at http://www.javasoft.com
CORBA standards can be found at http://www.omg.org
Visigenic, the makers of VisiBroker for Java, can be found at http://www.visigenic.com
More information on distributed GC may be found at http://www-sor.inria.fr