Serialization
Javed Mulla
(Immediate Joiner, Valid Work Visa for UK)Senior Software Engineer at Consors Finanz BNP Paribas
Why is serialization required? Imagine you want to save the state of one or more objects. If Java didn't have serialization (as the earliest version did not), you'd have to use one of the I/O classes to write out the state of the instance variables of all the objects you want to save. The worst part would be trying to reconstruct new objects that were virtually identical to the objects you were trying to save. You'd need your own protocol for the way in which you wrote and restored the state of each object, or you could end up setting variables with the wrong values. For example, imagine you stored an object that has instance variables for height and weight. At the time you save the state of the object, you could write out the height and weight as two ints in a file, but the order in which you write them is crucial. It would be all too easy to re-create the object but mix up the height and weight values, using the saved height as the value for the new object's weight and vice versa. Your own protocol would also have to decide whose instance values are saved first, for example those of the superclass (base class) or those of the subclass.
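To make the height/weight problem concrete, here is a minimal sketch of such a hand-rolled protocol. The class names (ManualProtocolDemo, Person) and the values are made up for illustration; only DataOutputStream and DataInputStream are real JDK classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ManualProtocolDemo {

    /** Hypothetical class, used only for this illustration. */
    static class Person {
        final int height;
        final int weight;

        Person(int height, int weight) {
            this.height = height;
            this.weight = weight;
        }
    }

    /** Hand-rolled "protocol": height is written first, then weight. */
    static byte[] save(Person p) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(p.height);
            out.writeInt(p.weight);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Correct reader: reads the two ints in the same order save() wrote them. */
    static Person restore(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int height = in.readInt();
            int weight = in.readInt();
            return new Person(height, weight);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Buggy reader: wrong order, yet it compiles and runs without any complaint. */
    static Person restoreSwapped(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int weight = in.readInt();
            int height = in.readInt();
            return new Person(height, weight);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = save(new Person(180, 75));
        Person ok = restore(data);
        Person wrong = restoreSwapped(data);
        System.out.println("correct: height=" + ok.height + ", weight=" + ok.weight);
        System.out.println("swapped: height=" + wrong.height + ", weight=" + wrong.weight);
    }
}
```

Nothing in the file format stops the swapped reader from working, which is exactly the kind of mistake the built-in serialization mechanism takes off your hands.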
If you do not want an instance variable to take part in the serialization process, mark that instance variable explicitly as transient.
(Definition) Serializable is a marker interface that tells the JVM it may write out the state of the object to a binary stream. Serialization is the process of converting an object's state (including its references) to a sequence of bytes, as well as the process of rebuilding those bytes into a live object at some future time. Simply put: converting an object to bytes and bytes back to an object. So when is serialization used? Serialization is used when you want to persist an object. Persisted objects are useful in the following situations:
Here are some uses of serialization
- To persist data for future use.
- To send data to a remote computer using such client/server Java technologies as RMI or socket programming.
- To "flatten" an object into array of bytes in memory.
- To exchange data between applets and servlets.
- To store user session in Web applications.
- To activate/passivate enterprise java beans.
- To send objects between the servers in a cluster.
In general, serialization is used when we want the object to exist beyond the lifetime of the JVM.
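As a small illustration of the "flatten an object into an array of bytes in memory" use from the list above, here is a sketch of two helper methods. The class and method names (InMemoryFlatten, toBytes, fromBytes) are my own and not part of the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

public class InMemoryFlatten {

    /** Flattens any Serializable object into a byte array held in memory. */
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The object stream is closed (and therefore flushed) at this point
        return buffer.toByteArray();
    }

    /** Rebuilds a live object from bytes produced by toBytes(). */
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("EUR", "USD"));
        byte[] flat = toBytes(original);
        Object copy = fromBytes(flat);
        System.out.println(flat.length + " bytes, equal copy: " + original.equals(copy));
    }
}
```

The same byte array could just as well be written to a file, a socket or a session store; the flattening step does not care where the bytes end up.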
Let's see a couple of different scenarios (examples) where we use serialization.
- Banking example: When the account holder tries to withdraw money through an ATM, the account holder information along with the withdrawal details is serialized (marshalled/flattened to bytes) and sent to the server, where the details are deserialized (unmarshalled/rebuilt from the bytes) and used to perform the operation. This reduces the number of network calls, because the whole object is serialized and sent at once and the server does not need to ask the client for further information.
- Stock example: Let's say a user wants stock updates immediately when he requests them. To achieve this, every time we have an update, we can serialize it and save it in a file. When the user requests the information, we deserialize it from the file and provide it. This way the user does not have to wait until we hit the database, perform the computations and get the result.
So far we have seen what serialization is and when it is used.
How is serialization performed in Java?
1> Give the object the capability to be flattened; for that purpose we use the java.io.Serializable interface.
Java provides the Serialization API, a standard mechanism to handle object serialization. To persist an object in Java, the first step is to flatten the object. For that, the respective class should implement the "java.io.Serializable" interface. That's it. We don't need to implement any methods, as this interface does not have any. It is a marker interface/tag interface. If a class implements the Serializable interface, it means its objects can be written out in binary form.
public class AccountInfo implements Serializable {}
2> Create the persistent object.
For persistence, you must write the data to a file. To persist an object you need to use a node stream to write to the file system, or to transfer a flattened object across a network wire and have it rebuilt on the other side. You can use the java.io.ObjectOutputStream class, a filter stream which is a wrapper around a lower-level byte stream.
So to write an object you use the "writeObject(<<instance>>)" method of the "java.io.ObjectOutputStream" class, and to read an object you use the "readObject()" method of the "java.io.ObjectInputStream" class. "readObject()" can only read serialized objects, which means that if the class does not implement the "java.io.Serializable" interface, its objects cannot be written in the first place and so cannot be read back.
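//Class to persist the time in a flat file time.ser
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Calendar;
import java.util.Date;

public class PersistSerialClass implements Serializable {
    // The moment this object was created; this is the state that gets persisted
    private final Date time = Calendar.getInstance().getTime();

    public Date getTime() {
        return time;
    }

    public static void main(String[] args) {
        String filename = "time.ser";
        if (args.length > 0) {
            filename = args[0];
        }
        PersistSerialClass time = new PersistSerialClass();
        // try-with-resources closes both streams even if writeObject fails
        try (FileOutputStream fos = new FileOutputStream(filename);
             ObjectOutputStream out = new ObjectOutputStream(fos)) {
            out.writeObject(time);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}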
//Class to read the time from a flat file time.ser
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.Calendar;

public class ReadSerialClass {
    public static void main(String[] args) {
        String filename = "time.ser";
        if (args.length > 0) {
            filename = args[0];
        }
        PersistSerialClass time = null;
        // try-with-resources closes both streams even if readObject fails
        try (FileInputStream fis = new FileInputStream(filename);
             ObjectInputStream in = new ObjectInputStream(fis)) {
            time = (PersistSerialClass) in.readObject();
        } catch (IOException ex) {
            ex.printStackTrace();
        } catch (ClassNotFoundException cnfe) {
            cnfe.printStackTrace();
        }
        if (time == null) {
            // Reading failed; the stack trace above explains why
            return;
        }
        // print out restored time
        System.out.println("Restored time: " + time.getTime());
        // print out the current time
        System.out.println("Current time: "
                + Calendar.getInstance().getTime());
    }
}
When you serialize an object, only the object's state will be saved, not the object's class file or methods.
When you serialize a small example class and look at the resulting file, the result is surprising. When you serialize an object holding only 2 bytes of data, you see a 51-byte serialized file. How did it grow to 51 bytes? To understand this,
let's see step by step how an object is serialized and deserialized.
So when an object is serialized:
- First it writes out the serialization stream magic data. What is the serialization stream magic data? It is nothing but the STREAM_MAGIC and STREAM_VERSION data (static data), so that the JVM can recognize the stream when it has to deserialize it. STREAM_MAGIC is "AC ED" and STREAM_VERSION is the version of the serialization stream protocol (not of the JVM), written as "00 05".
- Then it writes out the metadata (description) of the class associated with the instance. So in an example where a class "SerialClass" is serialized, after writing out the magic data it writes out the description of the "SerialClass" class. What does this description include? It includes the length of the class name, the name of the class, the serialVersionUID (or serial version), various flags and the number of fields in this class.
- Then it recursively writes out the metadata of the superclasses until it reaches java.lang.Object. If "SerialClass" extends "SerialParent", which in turn extends "SerialParentParent", it writes out the descriptions of the "SerialParent" and "SerialParentParent" classes.
- Once it finishes writing the metadata, it starts with the actual data associated with the instance. But this time it starts from the topmost superclass. So it starts with the data of "SerialParentParent", then writes out that of "SerialParent", and then that of "SerialClass".
- Finally it writes the data of the objects associated with the instance, again starting with the metadata and then the actual content. So if the instance holds an object of a class "Contain", it writes the metadata for the class Contain and then its contents, recursively as usual.
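You can check the first step yourself. The sketch below (StreamHeaderDemo and Point are hypothetical names) serializes a tiny object into memory and prints the first four bytes as hex. The values of STREAM_MAGIC and STREAM_VERSION come from java.io.ObjectStreamConstants.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class StreamHeaderDemo {

    /** Hypothetical two-field class, used only to have something to serialize. */
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x = 1;
        int y = 2;
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Formats the first 'count' bytes as lowercase hex pairs separated by spaces. */
    static String hex(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count && i < data.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Point());
        // STREAM_MAGIC (0xACED) followed by STREAM_VERSION (5)
        System.out.println("header: " + hex(bytes, 4)); // prints "header: ac ed 00 05"
        System.out.println("total size: " + bytes.length + " bytes");
    }
}
```

The total size is much larger than the eight bytes of the two ints, because the class description travels with the data.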
How to customize the default protocol?
Now it's getting more interesting. Let's say you need to perform some specific operations in the constructor when you instantiate the class, but you can't perform those operations when you deserialize the object, because the constructor of a serializable class is not called when an object is deserialized. Here we are restoring the state of an object, not constructing a new one. Then how will you perform those operations when you deserialize the object? Well, there is a way, and it's simple too. You can enhance the normal process by providing two methods inside your serializable class. Those methods are:
private void writeObject(ObjectOutputStream out) throws IOException;
private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException;
Notice that both methods are declared private, proving that neither method is inherited and overridden. The trick here is that the virtual machine will automatically check to see if either method is declared during the corresponding method call. The virtual machine can call private methods of your class whenever it wants but no other objects can. Thus, the integrity of the class is maintained and the serialization protocol can continue to work as normal.
Java serialization has a special mechanism just for this: a set of private methods you can implement in your class that, if present, will be invoked automatically during serialization and deserialization. It's almost as if the methods were defined in the Serializable interface, except they aren't. They are part of a special callback contract the serialization system offers you that basically says, "If you (the programmer) have a pair of methods matching this exact signature, these methods will be called during the serialization/deserialization process."
Using the methods below, the programmer manually writes and restores part of the object's state.
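private void writeObject(ObjectOutputStream os) throws IOException {
    os.defaultWriteObject();                 // write the normal (non-transient) fields first
    os.writeInt(theCollar.getCollarSize());  // then add the extra state by hand
}
private void readObject(ObjectInputStream is) throws IOException, ClassNotFoundException {
    is.defaultReadObject();                  // restore the normal fields
    theCollar = new Collar(is.readInt());    // rebuild the extra state, in the same order
}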
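Here is how those two methods might sit inside a complete class. This is only a sketch: the Dog and Collar classes, their fields and the roundTrip helper are my assumptions built around the theCollar field used above, with Collar deliberately not Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class CustomSerializationDemo {

    /** Hypothetical class that is NOT Serializable. */
    static class Collar {
        private final int collarSize;

        Collar(int collarSize) {
            this.collarSize = collarSize;
        }

        int getCollarSize() {
            return collarSize;
        }
    }

    /** Hypothetical Serializable class that saves its Collar by hand. */
    static class Dog implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;
        private transient Collar theCollar; // skipped by default serialization

        Dog(String name, Collar collar) {
            this.name = name;
            this.theCollar = collar;
        }

        String getName() {
            return name;
        }

        Collar getCollar() {
            return theCollar;
        }

        private void writeObject(ObjectOutputStream os) throws IOException {
            os.defaultWriteObject();                // writes 'name'
            os.writeInt(theCollar.getCollarSize()); // extra state, written by hand
        }

        private void readObject(ObjectInputStream is) throws IOException, ClassNotFoundException {
            is.defaultReadObject();                 // restores 'name'
            theCollar = new Collar(is.readInt());   // rebuilds the collar from the int
        }
    }

    /** Serializes the dog to memory and reads it straight back. */
    static Dog roundTrip(Dog dog) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(dog);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Dog) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Dog copy = roundTrip(new Dog("Rex", new Collar(3)));
        System.out.println(copy.getName() + " wears collar size " + copy.getCollar().getCollarSize());
    }
}
```

Without the two private methods, the transient theCollar field would simply come back as null after deserialization.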
Oops. I mentioned earlier that for a class to be serializable, either the class itself or one of its superclasses should implement the "Serializable" interface. Now what if you don't want one of the subclasses of a serializable class to be serializable? There is a way here too. To stop the automatic serialization, you can once again use the private methods and simply throw NotSerializableException from them.
private void writeObject(ObjectOutputStream out) throws IOException{
throw new NotSerializableException("Dont Serialize");
}
private void readObject(ObjectInputStream in) throws IOException{
throw new NotSerializableException("Dont Serialize");
}
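A quick way to convince yourself that this works is the sketch below. The class names (Account, SecretAccount) and the canSerialize helper are invented for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class BlockSerializationDemo {

    /** Hypothetical serializable base class. */
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        int balance = 100;
    }

    /** Inherits Serializable from Account but refuses to be serialized. */
    static class SecretAccount extends Account {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream out) throws IOException {
            throw new NotSerializableException("Dont Serialize");
        }

        private void readObject(ObjectInputStream in) throws IOException {
            throw new NotSerializableException("Dont Serialize");
        }
    }

    /** Returns true if the object can be written to an object stream. */
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Account:       " + canSerialize(new Account()));
        System.out.println("SecretAccount: " + canSerialize(new SecretAccount()));
    }
}
```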
Well... One more way to serialize the object - the Externalizable Interface
Again there is one more way to serialize the object - create your own protocol with the Externalizable interface. Instead of implementing the Serializable interface, you can implement Externalizable, which contains two methods:
public void writeExternal(ObjectOutput out) throws IOException;
public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException;
Externalization is discussed as a separate topic later in this article.
How not to serialize some fields in a serializable object?
Sometimes you don't want to serialize/store all the fields in the object. Say there are some fields you want to hide to preserve privacy, or some fields you want to read only from master data; then you don't serialize them. To do this, you just need to declare the field as transient.
transient private int checkPoint;
Static fields are not serialized either. There is no point in serializing static fields, as they do not represent object state but class state, and they can be modified by any other object. Let's assume that a static field and its value were serialized and that, before deserialization of the object, the static field's value is changed by some other object. Now the static field value that was serialized/stored is no longer valid. Hence it makes no sense to serialize static fields.
Apart from declaring a field as transient, there is another, trickier way of controlling which fields are serialized and which are not. This is done by providing your own writeObject() method in the class; inside this method you are responsible for writing out the appropriate fields. When you do this, you will have to provide a matching readObject() method as well. This sounds similar to using Externalizable, where you write the writeExternal() and readExternal() methods, but let's not take this route here as it is not a neat one.
Note that serialization does not care about access modifiers. It serializes all private, public and protected fields.
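The behaviour of transient and static fields is easy to verify with a small round trip. In the sketch below, Session, its fields and the helper methods are hypothetical names chosen for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class TransientStaticDemo {

    /** Hypothetical class with one normal, one transient and one static field. */
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        static int counter = 0;            // class state: never written to the stream
        private final String user;         // private, but still serialized
        private transient int checkPoint;  // skipped by serialization

        Session(String user, int checkPoint) {
            this.user = user;
            this.checkPoint = checkPoint;
        }

        String getUser() {
            return user;
        }

        int getCheckPoint() {
            return checkPoint;
        }
    }

    static byte[] toBytes(Session session) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(session);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Session fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Session) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session.counter = 5;
        byte[] bytes = toBytes(new Session("alice", 42));
        Session.counter = 9; // changed after serialization
        Session copy = fromBytes(bytes);
        System.out.println("user       = " + copy.getUser());       // restored from the stream
        System.out.println("checkPoint = " + copy.getCheckPoint()); // default value 0
        System.out.println("counter    = " + Session.counter);      // still 9, the stream never touched it
    }
}
```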
Nonserializable objects
Earlier we discussed not serializing certain fields in a serializable object and why that may sometimes be needed. Now let's see why certain objects should not be serialized at all. As you know, the Object class does not implement the Serializable interface, and hence no object is serializable by default. To make an object serializable, the respective class should explicitly implement the Serializable interface. However, certain system classes defined by Java, like "Thread", "OutputStream" and "Socket", are not serializable. Why so? Let's take a step back: what would be the use of serializing a Thread running in the System1 JVM using System1 memory, then deserializing it in System2 and trying to run it in the System2 JVM? It makes no sense, right? Hence these classes are not serializable.
OK. So far so good. Now what if you want to serialize an object containing an instance of Thread? Simple. Declare the Thread instance as transient.
public class SerialClass implements Serializable, Runnable {
transient private Thread myThread;
public SerialClass(int animationSpeed) {
...
...
}
}
Versioning issues
One very important item to look at is the versioning issue. Sometimes you will get "java.io.InvalidClassException" but when you check the class (it will be Serializable class), you will find nothing wrong with it. Then what is causing this exception to be thrown? Ok. Here it is. You create a Serializable class, instantiate it, and write it out to an object stream. That flattened object sits in the file system for some time. Meanwhile, you update the class file by adding a new field. Then try to read the flattened object. InvalidClassException is thrown because all persistent-capable classes are automatically given a unique identifier. If the identifier of the class does not equal the identifier of the flattened object, the exception will be thrown and when you update the class with a new field, a new identifier will be generated.
To fix this issue, manually add the identifier to the class. The identifier that is part of all classes is maintained in a field called serialVersionUID. If you wish to control versioning, you simply have to provide the serialVersionUID field manually and ensure it is always the same, no matter what changes you make to the classfile. More about it is discussed in separate topic.
Performance Issues/Improvement with Serialization
The default way of implementing serialization (by implementing the Serializable interface) has performance glitches. Say you have to write an object 10000 times to a flat file through serialization; this will take much more time (almost double) than it takes to write the same object 10000 times to the console. Where this matters, it can be better to write your own custom protocol instead of going for the default option.
Also always note to close the streams (object output and input streams). The object references are cached in the output stream and if the stream is not closed, the system may not garbage collect the objects written to a stream and this will result in poor performance.
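One way to make sure the streams are always closed is try-with-resources, which closes them even when an exception is thrown. A minimal sketch follows; the class name and the temporary file are my own choices for the example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class CloseStreamsDemo {

    /** Writes the value to a temporary file and reads it back, closing every stream. */
    static Object saveAndLoad(Serializable value) {
        try {
            File file = File.createTempFile("serial-demo", ".ser");
            file.deleteOnExit();

            // Both streams are closed automatically at the end of the block
            try (FileOutputStream fos = new FileOutputStream(file);
                 ObjectOutputStream out = new ObjectOutputStream(fos)) {
                out.writeObject(value);
            }

            try (FileInputStream fis = new FileInputStream(file);
                 ObjectInputStream in = new ObjectInputStream(fis)) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("restored: " + saveAndLoad("stock update #1"));
    }
}
```

If one long-lived ObjectOutputStream has to stay open, its reset() method discards the cached references so that the written objects can be garbage collected.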
Does using serialization always cause performance issues? Nope. Let me give you a situation where it is used to improve performance. Let's assume you have an application that displays a map, and pointing to different areas of the map should highlight those areas in a different color. Since all of these are images, loading an image each time you point to a location will slow the application down heavily. To resolve this issue, serialization can be used. Since the images won't change, you can serialize the image object and the respective points on the map (x and y coordinates) and deserialize them as and when necessary. This improves the performance greatly.
What happens to inner classes? We forgot all about it.
Yes, you can serialize inner classes by implementing the Serializable interface, but it has some problems. Inner classes (declared in a non-static context) always contain implicit references to their enclosing instances, and these references are always non-transient. So, during serialization of an inner class object, the enclosing object will be serialized as well. The problem is that the synthetic fields generated by Java compilers to implement inner classes are implementation dependent, and hence we may face compatibility issues when deserializing on a different platform whose .class file was generated by a different Java compiler. The default serialVersionUID may also be different in such cases. Not only this, the names assigned to local and anonymous inner classes are also implementation dependent. Thus, serialization of inner classes may pose some unavoidable compatibility issues, and hence it is strongly discouraged.
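The implicit reference to the enclosing object is easy to observe when the enclosing class is not itself Serializable. In this sketch (all names are hypothetical) the inner class fails to serialize, while an otherwise identical static nested class works fine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** The enclosing class is deliberately NOT Serializable. */
public class InnerClassDemo {

    private final String label = "outer";

    /** Non-static inner class: carries a hidden reference to InnerClassDemo.this. */
    class Inner implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 7;

        String describe() {
            return label + ":" + value; // uses the enclosing instance
        }
    }

    /** Static nested class: no reference to any enclosing instance. */
    static class Nested implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 7;
    }

    /** Returns true if the object can be written to an object stream. */
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false; // here: the enclosing InnerClassDemo instance is not serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InnerClassDemo outer = new InnerClassDemo();
        System.out.println("inner:  " + canSerialize(outer.new Inner()));
        System.out.println("nested: " + canSerialize(new Nested()));
    }
}
```

Making the class a static nested class (or a top-level class) avoids the hidden reference altogether.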
serialVersionUID?
When you serialize an object using the Serialization mechanism (by implementing the Serializable interface), there is a possibility that you will face versioning issues, and because of these versioning issues you will not be able to deserialize the object. That's not a good thing. But first, what is this versioning issue that is troubling your serialization process?
Well, let's say you created a class, instantiated it, and wrote it out to an object stream. That flattened object sits in the file system for some time. Meanwhile, you update the class file, perhaps adding a new field. Now try to read the flattened object. A "java.io.InvalidClassException" will be thrown. You don't understand where it went wrong, because the changes in the class seem perfectly fine to you.
What is serialVersionUID?
Before we discuss the solution to this problem, let's first see what is actually causing it. Why should a change in a serialized class throw InvalidClassException? During object serialization, the default Java serialization mechanism writes metadata about the object, which includes the class name, field names and types, and superclass. All this information is stored as part of the serialized object. When you deserialize the object, this information is read to reconstitute the object. But to perform the deserialization, the version of the class needs to be identified first, and this is done by the serialVersionUID. If the class does not declare a serialVersionUID itself, the Java serialization mechanism (in ObjectStreamClass) computes one automatically by passing the class name, sorted member names, modifiers, and interfaces to the secure hash algorithm (SHA), which returns a hash value, the serialVersionUID.
Now when the serialized object is retrieved, the JVM first compares the serialVersionUID stored in the stream with the serialVersionUID of the local class. If the values match, the object is said to be compatible with the class and hence it is deserialized. If not, an InvalidClassException is thrown.
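If you are curious which value a class ends up with, you can ask the runtime through ObjectStreamClass.lookup(...).getSerialVersionUID(). The sketch below uses two hypothetical classes, one with an explicit ID and one relying on the computed default.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SuidLookupDemo {

    /** Hypothetical class with an explicitly declared ID. */
    static class WithExplicitId implements Serializable {
        private static final long serialVersionUID = 42L;
        int someId;
    }

    /** Hypothetical class that relies on the computed default ID. */
    static class WithDefaultId implements Serializable {
        int someId;
    }

    /** Returns the serialVersionUID the serialization runtime uses for the class. */
    static long suidOf(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            // lookup() returns null for classes that are not serializable
            throw new IllegalArgumentException(type.getName() + " is not serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + suidOf(WithExplicitId.class));
        // The default value depends on the class details, so it is only printed here
        System.out.println("default:  " + suidOf(WithDefaultId.class));
    }
}
```

Add a field or a method to WithDefaultId, recompile, and the second number will most likely change, while the explicit one stays at 42.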
The above issue occurs not only when the object is flattened and saved, but also when the object is flattened and sent to another JVM, for example when you use RMI. Let's assume you have a client/server environment where the client is using Sun's JVM on Windows while the server is using JRockit on Linux. The client sends an object of a serializable class with a default generated serialVersionUID (e.g. 123L) to the server over a socket, while the server may compute a different serialVersionUID (e.g. 124L) during deserialization and raise an unexpected InvalidClassException. The default serialVersionUID computation is highly sensitive to class details that may vary between compiler and JVM implementations, which is why the exception shows up here.
What's the solution for this versioning issue?
The solution is very simple. Instead of relying on the JVM to generate the serialVersionUID, you explicitly declare the serialVersionUID in your class. The syntax is:
private static final long serialVersionUID = <long value>;
Yes, it's a static, final, private variable in the class. Once you define the serialVersionUID in your class explicitly, you don't need to update it unless you make incompatible changes. Look at the example below, which explains the issue and the importance of maintaining the serialVersionUID.
import java.io.*;

class TestSUID implements Serializable {
private static final long serialVersionUID = 1L;
private int someId;
public TestSUID (int someId) {
this.someId = someId;
}
public int getSomeId() {
return someId;
}
}
public class SUIDTester {
public static void main(String args[]) throws Exception {
File file = new File("temp.ser");
FileOutputStream fos = new FileOutputStream(file);
ObjectOutputStream oos = new ObjectOutputStream(fos);
TestSUID writeSUID = new TestSUID(1);
oos.writeObject(writeSUID);
oos.close();
FileInputStream fis = new FileInputStream(file);
ObjectInputStream ois = new ObjectInputStream(fis);
TestSUID readSUID = (TestSUID) ois.readObject();
System.out.println("someId : " + readSUID.getSomeId());
ois.close();
}
}
In this example, we have created a Serializable class with serialVersionUID = 1L and saved the "some id" value in the "temp.ser" file. Now change the serialVersionUID value of "TestSUID" class to 2L and try to just read the "temp.ser" file. It will throw "InvalidClassException". The reason is the version change and exactly this is the reason for maintaining the version.
Exception in thread "main" java.io.InvalidClassException:
TestSUID; local class incompatible: stream classdesc
serialVersionUID = 1, local class serialVersionUID = 2
When should you update serialVersionUID?
Adding the serialVersionUID manually to the class does not mean that it never needs to be updated. There is no need to update the serialVersionUID if the change in the class is compatible, but it should be updated if the change is incompatible. What are compatible and incompatible changes? A compatible change is a change that does not affect the contract between the class and its callers.
The compatible changes to a class are handled as follows:
- Adding fields - When the class being reconstituted has a field that does not occur in the stream, that field in the object will be initialized to the default value for its type. If class-specific initialization is needed, the class may provide a readObject method that can initialize the field to nondefault values.
- Adding classes - The stream will contain the type hierarchy of each object in the stream. Comparing this hierarchy in the stream with the current class can detect additional classes. Since there is no information in the stream from which to initialize the object, the class’s fields will be initialized to the default values.
- Removing classes - Comparing the class hierarchy in the stream with that of the current class can detect that a class has been deleted. In this case, the fields and objects corresponding to that class are read from the stream. Primitive fields are discarded, but the objects referenced by the deleted class are created, since they may be referred to later in the stream. They will be garbage-collected when the stream is garbage-collected or reset.
- Adding writeObject/readObject methods - If the version reading the stream has these methods then readObject is expected, as usual, to read the required data written to the stream by the default serialization. It should call defaultReadObject first before reading any optional data. The writeObject method is expected as usual to call defaultWriteObject to write the required data and then may write optional data.
- Removing writeObject/readObject methods - If the class reading the stream does not have these methods, the required data will be read by default serialization, and the optional data will be discarded.
- Adding java.io.Serializable - This is equivalent to adding types. There will be no values in the stream for this class so its fields will be initialized to default values. The support for subclassing nonserializable classes requires that the class’s supertype have a no-arg constructor and the class itself will be initialized to default values. If the no-arg constructor is not available, the InvalidClassException is thrown.
- Changing the access to a field - The access modifiers public, package, protected, and private have no effect on the ability of serialization to assign values to the fields.
- Changing a field from static to nonstatic or transient to nontransient - When relying on default serialization to compute the serializable fields, this change is equivalent to adding a field to the class. The new field will be written to the stream but earlier classes will ignore the value since serialization will not assign values to static or transient fields.
Incompatible changes to classes are those changes for which the guarantee of interoperability cannot be maintained. The incompatible changes that may occur while evolving a class are:
- Deleting fields - If a field is deleted in a class, the stream written will not contain its value. When the stream is read by an earlier class, the value of the field will be set to the default value because no value is available in the stream. However, this default value may adversely impair the ability of the earlier version to fulfill its contract.
- Moving classes up or down the hierarchy - This cannot be allowed since the data in the stream appears in the wrong sequence.
- Changing a nonstatic field to static or a nontransient field to transient - When relying on default serialization, this change is equivalent to deleting a field from the class. This version of the class will not write that data to the stream, so it will not be available to be read by earlier versions of the class. As when deleting a field, the field of the earlier version will be initialized to the default value, which can cause the class to fail in unexpected ways.
- Changing the declared type of a primitive field - Each version of the class writes the data with its declared type. Earlier versions of the class attempting to read the field will fail because the type of the data in the stream does not match the type of the field.
- Changing the writeObject or readObject method so that it no longer writes or reads the default field data or changing it so that it attempts to write it or read it when the previous version did not. The default field data must consistently either appear or not appear in the stream.
- Changing a class from Serializable to Externalizable or vice versa is an incompatible change since the stream will contain data that is incompatible with the implementation in the available class.
- Removing either Serializable or Externalizable is an incompatible change since when written it will no longer supply the fields needed by older versions of the class.
- Adding the writeReplace or readResolve method to a class is incompatible if the behavior would produce an object that is incompatible with any older version of the class.
How to generate a serialVersionUID?
There are several ways to generate the serialVersionUID.
- Go to the command line and type "serialver <class name>". The serialVersionUID will be generated. Copy and paste it into your class.
- On Windows, generate the serialVersionUID using the JDK's graphical tool like so:
  - use Control Panel | System | Environment to set the classpath to the correct directory
  - run serialver -show from the command line
  - point the tool to the class file including the package, for example, finance.stock.Account - without the .class
- Another way is through the Eclipse IDE. After you implement the Serializable interface and save the class, Eclipse will show a warning asking you to add the serialVersionUID, and it gives you the option to generate it or to use the default one. Click on the link to generate the serialVersionUID and Eclipse will generate it and add it to the class.
Finally few guidelines for serialVersionUID :
- always include it as a field, for example: "private static final long serialVersionUID = 7526472295622776147L; " include this field even in the first version of the class, as a reminder of its importance
- do not change the value of this field in future versions, unless you are knowingly making changes to the class which will render it incompatible with old serialized objects
- new versions of Serializable classes may or may not be able to read old serialized objects; it depends upon the nature of the change; provide a pointer to Sun's guidelines for what constitutes a compatible change, as a convenience to future maintainers
Externalization in Java
Before going into what externalization is, you need some knowledge of serialization, because externalization is simply an alternative form of serialization, and the Externalizable interface extends the Serializable interface. See the serialization discussion above for details. Just as an overview, serialization is the process of converting an object's state (including its references) to a sequence of bytes, as well as the process of rebuilding those bytes into a live object at some future time. An object can be made serializable by implementing either the Serializable interface or the Externalizable interface.
Well, when serialization by implementing Serializable interface is serving your purpose, why should you go for externalization?
Good question! Serializing by implementing the Serializable interface has some issues. Let's see one by one what they are.
- Serialization is a recursive algorithm. Starting from a single object, all the objects that can be reached from it by following instance variables are serialized as well, not just the fields you actually need. This includes the state of the object's superclasses up to the "Object" class and, in the same way, the superclasses of each instance variable's class. Basically, every object it can reach. This leads to a lot of overhead. Say, for example, you need only the car type and licence number; with serialization you cannot stop there. All the information, including the description of the car, its parts and so on, will be serialized. Obviously this slows things down.
- Both serializing and deserializing require the serialization mechanism to discover information about the instance it is serializing. The default serialization mechanism uses reflection to discover all the field values. The class description is also added to the stream, which includes the description of all the serializable superclasses, the description of the class and the instance data associated with the specific instance of the class. Lots of data and metadata, and again a performance issue.
- You know that serialization needs a serialVersionUID, an ID that identifies the version of the class whose data is persisted. If you don't explicitly set a serialVersionUID, serialization will compute it by going through all the fields and methods. So, depending on the size of the class, the serialization mechanism takes a corresponding amount of time to calculate the value. A third performance issue.
- The three points above show that serialization has performance issues. Apart from performance, when an object that implements the Serializable interface is deserialized, no constructor of that class is called, and hence any initialization done in the constructor does not happen. There is the alternative of writing all the initialization logic in a separate method and calling it from both the constructor and the readObject method, so that initialization happens whether the object is created or deserialized, but it is definitely a messy approach.
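The last point, that the constructor is skipped during deserialization, can be seen with a small experiment. Cache, its entries field and the call counter below are hypothetical names used only for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ConstructorSkippedDemo {

    /** Counts how often the Cache constructor has run. */
    static int constructorCalls = 0;

    /** Hypothetical class whose constructor initializes a transient field. */
    static class Cache implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient List<String> entries;

        Cache() {
            constructorCalls++;
            entries = new ArrayList<>(); // initialization that deserialization will skip
        }

        boolean isInitialized() {
            return entries != null;
        }
    }

    /** Serializes the cache to memory and reads it straight back. */
    static Cache roundTrip(Cache cache) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(cache);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Cache) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Cache original = new Cache();
        Cache copy = roundTrip(original);
        System.out.println("constructor calls:    " + constructorCalls);
        System.out.println("original initialized: " + original.isInitialized());
        System.out.println("copy initialized:     " + copy.isInitialized());
    }
}
```

The copy exists, but its transient list was never created, because no Cache constructor ran for it.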
The solution for all the above issues is Externalization. Cool. Here enters the actual topic.
So what is externalization?
Externalization is nothing but serialization carried out by implementing the Externalizable interface to persist and restore the object. To externalize your object, you implement the Externalizable interface, which extends the Serializable interface. Here only the identity of the class is written to the serialization stream, and it is the responsibility of the class to save and restore the contents of its instances, which means you have complete control over what to serialize and what not to serialize. With default serialization, by contrast, the identity of all the classes, their superclasses and instance variables, and then the contents of these items, are written to the serialization stream. Note that to externalize an object, the class needs a public no-arg constructor.
Unlike the Serializable interface, the Externalizable interface is not a marker interface: it declares two methods, writeExternal and readExternal. These methods are implemented by the class to give it complete control over the format and contents of the stream for an object and its supertypes. These methods must explicitly coordinate with the supertype to save its state. They supersede customized implementations of the writeObject and readObject methods.
How does serialization happen? The JVM first checks for the Externalizable interface, and if the object supports Externalizable, it serializes the object using the writeExternal method. If the object does not support Externalizable but implements Serializable, the object is saved using the default mechanism of ObjectOutputStream (or the class's own writeObject method, if it defines one). When an Externalizable object is reconstructed, an instance is created first using the public no-arg constructor, and then the readExternal method is called. If the object does not support Externalizable, Serializable objects are restored by reading their fields from the ObjectInputStream.
Let's see a simple example.
import java.io.*;

public class Car implements Externalizable {
    String name;
    int year;

    /*
     * Mandatory public no-arg constructor.
     */
    public Car() { super(); }

    Car(String n, int y) {
        name = n;
        year = y;
    }

    /**
     * Mandatory writeExternal method.
     */
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeObject(name);
        out.writeInt(year);
    }

    /**
     * Mandatory readExternal method. Fields must be read in the
     * same order in which writeExternal wrote them.
     */
    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        name = (String) in.readObject();
        year = in.readInt();
    }

    /**
     * Prints out the fields. Used for testing!
     */
    @Override
    public String toString() {
        return "Name: " + name + "\n" + "Year: " + year;
    }
}
import java.io.*;

public class ExternExample {
    public static void main(String[] args) {
        // create a Car object
        Car car = new Car("Mitsubishi", 2009);
        Car newCar = null;

        // serialize the car; try-with-resources closes the streams
        try (FileOutputStream fo = new FileOutputStream("tmp");
             ObjectOutputStream so = new ObjectOutputStream(fo)) {
            so.writeObject(car);
            so.flush();
        } catch (Exception e) {
            System.out.println(e);
            System.exit(1);
        }

        // de-serialize the Car
        try (FileInputStream fi = new FileInputStream("tmp");
             ObjectInputStream si = new ObjectInputStream(fi)) {
            newCar = (Car) si.readObject();
        } catch (Exception e) {
            System.out.println(e);
            System.exit(1);
        }

        /*
         * Print out the original and new car information
         */
        System.out.println("The original car is ");
        System.out.println(car);
        System.out.println("The new car is ");
        System.out.println(newCar);
    }
}
In this example, the class Car implements the Externalizable interface, which means that a Car object is ready for serialization. The class has two public methods, "writeExternal" and "readExternal". Unlike the Serializable interface, which serializes all the (non-transient) variables of the object just by being implemented, here you have to explicitly state which fields or variables you want to serialize, and that is done in the "writeExternal" and "readExternal" methods. So in the "ExternExample" class, when you write the "Car" object to the ObjectOutputStream, the "writeExternal" method is called and the data is persisted. The same applies to the "readExternal" method of the Car object, i.e. when you read the "Car" object from the ObjectInputStream, the "readExternal" method is called.
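The order of the calls described above can be observed with a small sketch like the following. The names ExternalizableOrderDemo, Point and the events list are assumptions made for this illustration, and the object is written to an in-memory byte array instead of a file.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

class ExternalizableOrderDemo {
    // Records which methods the serialization mechanism calls, in order.
    static final List<String> events = new ArrayList<>();

    // Hypothetical class; it needs a public no-arg constructor.
    public static class Point implements Externalizable {
        int x, y;

        public Point() { events.add("no-arg constructor"); }

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            events.add("writeExternal");
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            events.add("readExternal");
            x = in.readInt();
            y = in.readInt();
        }
    }

    static Point roundTrip(Point p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Point) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(events);
        System.out.println(copy.x + ", " + copy.y);
    }
}
```

The original Point is built with the two-argument constructor, so the only "no-arg constructor" entry in the list comes from deserialization.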
What will happen when an externalizable class extends a non externalizable super class?
Then in this case, you need to persist the super class fields also in the sub class that implements Externalizable interface. Look at this example.
import java.io.*;

/**
 * The superclass does not implement Externalizable
 */
class Automobile {
    /*
     * Instead of making these members private and adding setter
     * and getter methods, I am just giving default access specifier.
     * You can make them private members and add setters and getters.
     */
    String regNo;
    String mileage;

    /*
     * A public no-arg constructor
     */
    public Automobile() {}

    Automobile(String rn, String m) {
        regNo = rn;
        mileage = m;
    }
}

public class Car extends Automobile implements Externalizable {
    String name;
    int year;

    /*
     * Mandatory public no-arg constructor
     */
    public Car() { super(); }

    Car(String n, int y) {
        name = n;
        year = y;
    }

    /**
     * Mandatory writeExternal method.
     */
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        /*
         * Since the superclass does not implement the Serializable interface
         * we explicitly do the saving.
         */
        out.writeObject(regNo);
        out.writeObject(mileage);
        // Now the subclass fields
        out.writeObject(name);
        out.writeInt(year);
    }

    /**
     * Mandatory readExternal method.
     */
    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        /*
         * Since the superclass does not implement the Serializable interface
         * we explicitly do the restoring
         */
        regNo = (String) in.readObject();
        mileage = (String) in.readObject();
        // Now the subclass fields
        name = (String) in.readObject();
        year = in.readInt();
    }

    /**
     * Prints out the fields. Used for testing!
     */
    @Override
    public String toString() {
        return "Reg No: " + regNo + "\n" + "Mileage: " + mileage + "\n" +
               "Name: " + name + "\n" + "Year: " + year;
    }
}
Here the Automobile class does not implement Externalizable interface. So to persist the fields in the automobile class the writeExternal and readExternal methods of Car class are modified to save/restore the super class fields first and then the sub class fields.
Sounds good! What if the super class implements the Externalizable interface?
Well, in this case the super class will also have the readExternal and writeExternal methods as in Car class and will persist the respective fields in these methods.
import java.io.*;

/**
 * The superclass implements Externalizable
 */
class Automobile implements Externalizable {
    /*
     * Instead of making these members private and adding setter
     * and getter methods, I am just giving default access specifier.
     * You can make them private members and add setters and getters.
     */
    String regNo;
    String mileage;

    /*
     * A public no-arg constructor
     */
    public Automobile() {}

    Automobile(String rn, String m) {
        regNo = rn;
        mileage = m;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeObject(regNo);
        out.writeObject(mileage);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        regNo = (String) in.readObject();
        mileage = (String) in.readObject();
    }
}

public class Car extends Automobile implements Externalizable {
    String name;
    int year;

    /*
     * Mandatory public no-arg constructor
     */
    public Car() { super(); }

    Car(String n, int y) {
        name = n;
        year = y;
    }

    /**
     * Mandatory writeExternal method.
     */
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // first we call the writeExternal of the superclass so as to write
        // all the superclass data fields
        super.writeExternal(out);
        // Now the subclass fields
        out.writeObject(name);
        out.writeInt(year);
    }

    /**
     * Mandatory readExternal method.
     */
    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        // first call the superclass readExternal method
        super.readExternal(in);
        // Now the subclass fields
        name = (String) in.readObject();
        year = in.readInt();
    }

    /**
     * Prints out the fields. Used for testing!
     */
    @Override
    public String toString() {
        return "Reg No: " + regNo + "\n" + "Mileage: " + mileage + "\n" +
               "Name: " + name + "\n" + "Year: " + year;
    }
}
In this example, since the Automobile class stores and restores its fields in its own writeExternal and readExternal methods, you don't need to save/restore the superclass fields in the subclass. But if you look closely at the writeExternal and readExternal methods of the Car class, you will find that you still need to call the super.xxxx() methods first, which confirms the earlier statement that an externalizable object must coordinate with its supertype to save and restore its state.
Let's see the difference in sizes when you serialize using the Serializable interface and when you serialize using the Externalizable interface.
Let's take a simple case, an object of type SimpleClass with just a few fields - firstName, lastName, weight and location, containing data {"Brad", "Pitt", 180.5, {49.345, 67.567}}. When you serialize this object, which holds about 24 bytes of data, by implementing the Serializable interface, it turns into approximately 220 bytes. As it turns out, the basic serialization mechanism stores all kinds of information in the file so that it can deserialize without any other assistance. Look at the format below when the object is serialized and you will understand why it grows to 220 bytes.
Length: 220
Now if you serialize the same object by implementing the Externalizable interface, the size is reduced drastically and the information saved in the persistent store is also reduced a lot. Here is the result of serializing the same class, modified to be externalizable. Notice that the actual data is not parseable externally any more: only your class knows the meaning of the data!
Length: 54
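If you want to measure this yourself, a rough sketch follows. It uses a simplified version of the class without the location field, the names SizeComparison, PersonS and PersonE are assumptions made for this illustration, and the byte counts it prints will differ from the figures quoted above because they depend on class and field names.

```java
import java.io.*;

class SizeComparison {
    // Default serialization: field names and types are described in the stream.
    static class PersonS implements Serializable {
        private static final long serialVersionUID = 1L;
        String firstName = "Brad";
        String lastName = "Pitt";
        double weight = 180.5;
    }

    // Externalization: only the class identity plus whatever writeExternal writes.
    public static class PersonE implements Externalizable {
        private static final long serialVersionUID = 1L;
        String firstName = "Brad";
        String lastName = "Pitt";
        double weight = 180.5;

        public PersonE() {}

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(firstName);
            out.writeUTF(lastName);
            out.writeDouble(weight);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            firstName = in.readUTF();
            lastName = in.readUTF();
            weight = in.readDouble();
        }
    }

    // Serializes the object into memory and returns the number of bytes produced.
    static int serializedSize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Serializable:   " + serializedSize(new PersonS()) + " bytes");
        System.out.println("Externalizable: " + serializedSize(new PersonE()) + " bytes");
    }
}
```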
Well, externalization has its own limitations
Externalization efficiency comes at a price. The default serialization mechanism adapts to application changes because metadata is automatically extracted from the class definitions (observe the format above and you will see that when the object is serialized by implementing the Serializable interface, the field metadata of the class is written to the persistent store, while when you serialize by implementing the Externalizable interface, it is not). Externalization, on the other hand, isn't very flexible and requires you to rewrite your marshalling and demarshalling code whenever you change your class definitions.
As you know, the public no-arg constructor is called when deserializing objects that implement the Externalizable interface. Hence, the Externalizable interface can't usefully be implemented by (non-static) inner classes in Java: every constructor of an inner class implicitly accepts the instance of the enclosing class as a prepended parameter, so an inner class can't have a true no-arg constructor. Inner classes can achieve object serialization only by implementing the Serializable interface.
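A small sketch of this limitation follows. The class names (InnerClassDemo, Inner, Nested) are hypothetical. On the JDKs I am aware of, writing the inner-class object succeeds and the failure shows up when reading it back, as an InvalidClassException; treat the exact failure point as an observation rather than a guarantee.

```java
import java.io.*;

class InnerClassDemo {
    // Non-static inner class: the compiler adds the enclosing instance as a
    // hidden constructor parameter, so there is no real no-arg constructor.
    public class Inner implements Externalizable {
        public Inner() {}
        @Override public void writeExternal(ObjectOutput out) {}
        @Override public void readExternal(ObjectInput in) {}
    }

    // Static nested class: has a genuine public no-arg constructor.
    public static class Nested implements Externalizable {
        public Nested() {}
        @Override public void writeExternal(ObjectOutput out) {}
        @Override public void readExternal(ObjectInput in) {}
    }

    // Returns "ok" if the object survives a round trip, else the exception name.
    static String tryRoundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return "ok";
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("inner:  " + tryRoundTrip(new InnerClassDemo().new Inner()));
        System.out.println("nested: " + tryRoundTrip(new Nested()));
    }
}
```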
If you are subclassing your externalizable class, you have to invoke your superclass's implementation. This is extra work every time you subclass your externalizable class. Observe the examples above, where the superclass writeExternal method is explicitly called in the subclass writeExternal method. Also, the methods of the Externalizable interface are public, so any code, including a malicious program, can invoke readExternal on an existing object and overwrite its state, losing what the object held before.
Once your class is tagged with either Serializable or Externalizable, you can't change any evolved version of your class to the other format. You alone are responsible for maintaining compatibility across versions. That means that if you want the flexibility to add fields in the future, you'd better have your own mechanism so that you can skip over additional information possibly added by those future versions.
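One possible mechanism, sketched under assumptions: write a format version number as the first item in writeExternal and branch on it in readExternal. The names here (VersionedCarDemo, formatToWrite, the color field) are hypothetical; formatToWrite is an instance field only so that the sketch can simulate data written by an older version of the class. This covers a newer class reading older data; letting an old class skip data written by a newer one would additionally need something like a length prefix.

```java
import java.io.*;

class VersionedCarDemo {
    public static class Car implements Externalizable {
        private static final long serialVersionUID = 1L;
        static final int CURRENT_FORMAT = 2;

        // Normally always CURRENT_FORMAT; settable here only to simulate an old writer.
        int formatToWrite = CURRENT_FORMAT;

        String name;
        int year;
        String color = "unknown";   // field added in format 2

        public Car() {}

        Car(String name, int year, String color) {
            this.name = name;
            this.year = year;
            this.color = color;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(formatToWrite);          // version marker comes first
            out.writeUTF(name);
            out.writeInt(year);
            if (formatToWrite >= 2) {
                out.writeUTF(color);
            }
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int format = in.readInt();
            name = in.readUTF();
            year = in.readInt();
            // Old streams have no color; fall back to a default.
            color = (format >= 2) ? in.readUTF() : "unknown";
        }
    }

    static Car roundTrip(Car car) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(car);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Car) in.readObject();
        }
    }
}
```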
So much of it. Here are some final tips for serialization.
You can decide whether to implement Externalizable or Serializable on a class-by-class basis. Within the same application, some of your classes can be Serializable, and some can be Externalizable. This makes it easy to evolve your application in response to actual performance data and shifting requirements. You can do the following thing:
* Make all your classes implement Serializable.
* Then make some of them, the ones you send often and for which serialization is inefficient, implement Externalizable instead.
To reduce the size of the serialized data:
* Write primitives or Strings directly. For example, instead of writing out a contained object, Point (in SimpleClass, we have a field of type Point), write out each of its integer coordinates separately. When you read them in, create a new Point from the two integers. This can be very significant in terms of size: an array of three Points takes 117 bytes; an array of 6 ints takes 51 bytes.
* Strings are special-cased and don't carry much of the object overhead; you will normally use them as is. However, the serialized representation of a String is (modified) UTF-8, which works great for ASCII characters, is neutral for most European characters, but causes a 50% increase in size (three bytes per character instead of two) for Japanese and other scripts. If you have significant amounts of Asian text, you had better serialize a char array instead.
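The string-size point can be checked with a short sketch. It uses DataOutputStream, whose writeUTF uses the same modified UTF-8 encoding as ObjectOutput.writeUTF, so the stream headers of ObjectOutputStream don't blur the numbers. The class and method names are assumptions made for this illustration.

```java
import java.io.*;

class StringSizeDemo {
    // Bytes used by writeUTF: a 2-byte length prefix plus modified UTF-8 data.
    static int utfSize(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.size();
    }

    // Bytes used by writeChars: 2 bytes per char, with no length prefix.
    static int charsSize(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeChars(s);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        String ascii = "hello";
        String japanese = "\u65e5\u672c\u8a9e";   // three Japanese characters
        System.out.println("ASCII    UTF: " + utfSize(ascii) + ", chars: " + charsSize(ascii));
        System.out.println("Japanese UTF: " + utfSize(japanese) + ", chars: " + charsSize(japanese));
    }
}
```

Because writeChars writes no length, a real readExternal would also need the character count, for example written with writeInt beforehand.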
Why must classes be marked serializable in order to be written to an ObjectOutputStream?
The decision to require that classes implement the java.io.Serializable interface was not made lightly. The design called for a balance between the needs of developers and the needs of the system to be able to provide a predictable and safe mechanism. The most difficult design constraint to satisfy was the safety and security of Java classes.
If classes were to be marked as being serializable the design team worried that a developer, either out of forgetfulness, laziness, or ignorance might not declare a class as being Serializable and then make that class useless for RMI or for purposes of persistence. We worried that the requirement would place on a developer the burden of knowing how a class was to be used by others in the future, an essentially unknowable condition. Indeed, our preliminary design, as reflected in the alpha API, concluded that the default case for a class ought to be that the objects in the class be serializable. We changed our design only after considerations of security and correctness convinced us that the default had to be that an object not be serialized.
Security restrictions
The first consideration that caused us to change the default behavior of objects had to do with security, and in particular in the privacy of fields declared to be private, package protected, or protected. The Java runtime restricts access to such fields for either read or write to a subset of the objects within the runtime.
No such restriction can be made on an object once it has been serialized; the stream of bytes that are the result of object serialization can be read and altered by any object that has access to that stream. This allows any object access to the state of a serialized object, which can violate the privacy guarantees users of the language expect. Further, the bytes in the stream can be altered in arbitrary ways, allowing the reconstruction of an object that was never created within the protections of a Java environment. There are cases in which the recreation of such an object could compromise not only the privacy guarantees expected by users of the Java environment, but the integrity of the environment itself.
These violations cannot be guarded against, since the whole idea of serialization is to allow an object to be converted into a form that can be moved outside of the Java environment (and therefore outside of the privacy and integrity guarantees of that environment) and then be brought back into the environment. Requiring objects to be declared serializable does mean that the class designer must make an active decision to allow the possibility of such a breach in privacy or integrity. A developer who does not know about serialization should not be open to compromise because of this lack of knowledge. In addition, we would hope that the developer who declares a class to be Serializable does so after some thought about the possible consequences of that declaration.
Note that this sort of security problem is not one that can be dealt with by the mechanism of a security manager. Since serialization is intended to allow the transport of an object from one virtual machine to some other (either over space, as it is used in RMI, or over time, as when the stream is saved to a file), the mechanisms used for security need to be independent of the runtime environment of any particular virtual machine. We wanted to avoid as much as possible the problem of being able to serialize an object in one virtual machine and not being able to deserialize that object in some other virtual machine. Since the security manager is part of the runtime environment, using the security manager for serialization would have violated this requirement.
Forcing a conscious decision
While security concerns were the first reason for considering the design change, a reason that we feel is at least as convincing is that serialization should only be added to a class after some design consideration. It is far too easy to design a class that falls apart under serialization and reconstruction. By requiring a class designer to declare support for the Serializable interface, we hoped that the designer would also give some thought to the process of serializing that class.
Examples are easy to cite. Many classes deal with information that only makes sense in the context of the runtime in which the particular object exists; examples of such information include file handles, open socket connections, security information, etc. Such data can be dealt with easily by simply declaring the fields as transient, but such a declaration is only necessary if the object is going to be serialized. A novice (or forgetful, or hurried) programmer might neglect to mark fields as transient in much the same way he or she might neglect to mark the class as implementing the Serializable interface. Such a case should not lead to incorrect behavior; the way to avoid this is to not serialize objects not marked as implementing Serializable.
Another example of this sort is the "simple" object that is the root of a graph that spans a large number of objects. Serializing such an object could result in serializing lots of others, since serialization works over an entire graph. Doing something like this should be a conscious decision, not one that happens by default.
The need for this sort of thought was brought home to us in the group when we were going through the base Java class libraries marking the system classes as Serializable (where appropriate). We had originally thought that this would be a fairly simple process, and that most of the system classes could just be marked as implementing Serializable and then use the default implementation with no other changes. What we found was that this was far less often the case than we had suspected. In a large number of the classes, careful thought had to be given to whether or not a field should be marked as transient or whether it made sense to serialize the class at all.
Of course, there is no way to guarantee that a programmer or class designer is actually going to think about these issues when marking a class as Serializable. However, by requiring the class to declare itself as implementing the Serializable interface we do require that some thought be given by the programmer. Having serialization be the default state of an object would mean that lack of thought could cause bad effects in a program, something that the overall design of Java has attempted to avoid.
A Serializable object is written with writeObject, modified and written a second time, but the modification is missing when deserializing the stream.
The ObjectOutputStream class keeps track of each object it serializes and sends only the handle if the object is written into the stream a subsequent time. This is the way it deals with graphs of objects. The corresponding ObjectInputStream keeps track of all of the objects it has created and their handles so when the handle is seen again it can return the same object. Both output and input streams keep this state until they are freed.
Alternatively, the ObjectOutputStream class implements a reset method that discards the memory of having sent an object, so sending an object again will make a copy.
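The behaviour described above can be reproduced with a short sketch. The names ResetDemo, Counter and writeTwice are hypothetical; the stream is kept in memory for simplicity.

```java
import java.io.*;

class ResetDemo {
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Writes the same object twice, changing it in between, and returns the
    // two values seen when reading the stream back.
    static int[] writeTwice(boolean resetBetweenWrites)
            throws IOException, ClassNotFoundException {
        Counter c = new Counter();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            c.value = 1;
            out.writeObject(c);
            c.value = 2;
            if (resetBetweenWrites) {
                out.reset();          // forget that 'c' was already written
            }
            out.writeObject(c);       // without reset: only a handle is written
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Counter first = (Counter) in.readObject();
            Counter second = (Counter) in.readObject();
            return new int[] { first.value, second.value };
        }
    }

    public static void main(String[] args) throws Exception {
        int[] stale = writeTwice(false);
        int[] fresh = writeTwice(true);
        System.out.println("without reset: " + stale[0] + ", " + stale[1]);
        System.out.println("with reset:    " + fresh[0] + ", " + fresh[1]);
    }
}
```

Without reset, the second read returns the very same object as the first, so the modification never reaches the reader.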
OutOfMemoryError thrown after writing a large number of objects into an ObjectOutputStream
The ObjectOutputStream maintains a table mapping objects written into the stream to a handle. The first time an object is written to a stream, its contents are written into the stream; subsequent writes of the object result in a handle to the object being written into the stream. This table maintains references to objects that might otherwise be unreachable by an application, thus resulting in an unexpected situation of running out of memory. A call to the ObjectOutputStream.reset() method resets the object/handle table to its initial state, allowing all previously written objects to be eligible for garbage collection.
Does object serialization support encryption?
Object Serialization does not contain any encryption/decryption in itself. It writes to and reads from Java streams, so it can be coupled with any available encryption technology. Object serialization can be used in many different ways, from simple persistence (writing to and reading from files) to RMI communication across hosts.
RMI's use of serialization leaves encryption and decryption to the lower network transport. We expect that when a secure channel is needed the network connections will be made using SSL or the like.
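As a sketch of "coupling with an encryption technology", the object streams can be wrapped around the standard javax.crypto cipher streams. The class name, the choice of AES in CBC mode and the helper methods are assumptions made for this illustration, not a recommendation of a particular scheme; key and IV management are left out entirely.

```java
import java.io.*;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

class EncryptedSerializationDemo {
    static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    static byte[] newIv() {
        byte[] iv = new byte[16];            // AES block size
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    // Serializes the object through a cipher stream; closing the object
    // stream also closes the cipher stream, which writes the final block.
    static byte[] encrypt(Serializable obj, SecretKey key, byte[] iv)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new CipherOutputStream(bytes, cipher))) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object decrypt(byte[] data, SecretKey key, byte[] iv)
            throws IOException, GeneralSecurityException, ClassNotFoundException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        try (ObjectInputStream in = new ObjectInputStream(
                 new CipherInputStream(new ByteArrayInputStream(data), cipher))) {
            return in.readObject();
        }
    }
}
```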
The object serialization classes are stream oriented. How do I write objects to a random access file?
Currently there is no direct way to write objects to a random access file.
You can use the ByteArray I/O streams as an intermediate place to write and read bytes to/from the random access file and create Object I/O streams from the byte streams to write/read the objects. You just have to make sure that you have the entire object in the byte stream or reading/writing the object will fail.
For example, java.io.ByteArrayOutputStream can be used to receive the bytes of an ObjectOutputStream. From it you can get a byte[] of the result. That in turn can be used with a ByteArrayInputStream as input to an ObjectInputStream.
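A minimal sketch of this approach is shown below. The record layout (a 4-byte length followed by the serialized bytes) and the names RandomAccessObjects, append and readAt are assumptions made for this illustration; the length prefix is one way to make sure the entire object is available before reading it.

```java
import java.io.*;

class RandomAccessObjects {
    // Appends one object as a length-prefixed record and returns its offset.
    static long append(RandomAccessFile file, Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        long offset = file.length();
        file.seek(offset);
        file.writeInt(bytes.size());       // record length
        file.write(bytes.toByteArray());   // complete serialized object
        return offset;
    }

    // Reads back the object stored at the given record offset.
    static Object readAt(RandomAccessFile file, long offset)
            throws IOException, ClassNotFoundException {
        file.seek(offset);
        byte[] record = new byte[file.readInt()];
        file.readFully(record);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(record))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("objects", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
            long first = append(file, "first record");
            long second = append(file, Integer.valueOf(42));
            // Records can be read in any order.
            System.out.println(readAt(file, second));
            System.out.println(readAt(file, first));
        }
    }
}
```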
How can I create an ObjectInputStream from an ObjectOutputStream without a file in between?
ObjectOutputStream and ObjectInputStream work to/from any stream object. You could use a ByteArrayOutputStream and then get the array and insert it into a ByteArrayInputStream. You could also use the piped stream classes as well. Any java.io class that extends the OutputStream and InputStream classes can be used.
Can I compute diff(serial(x),serial(y))?
The diff will produce the same stream each time the same object is serialized. You will need to create a new ObjectOutputStream to serialize each object.
Can I compress the serial representation of my objects using my own zip/unzip methods?
ObjectOutputStream produces an OutputStream. If your zip object extends the OutputStream class, there is no problem compressing it.
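With the JDK's own java.util.zip classes standing in for "your zip object", a sketch could look like this. The class and method names are assumptions made for this illustration.

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressedSerializationDemo {
    // Serializes the object through a GZIP stream into a byte array.
    static byte[] compress(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(obj);
        }   // closing the object stream finishes the GZIP stream as well
        return bytes.toByteArray();
    }

    // Reverses compress(): unzip first, then deserialize.
    static Object decompress(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] zipped = compress("some serializable value");
        System.out.println(decompress(zipped));
    }
}
```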
Can I execute methods on compressed versions of my objects, for example isempty(zip(serial(x)))?
This is not really viable for arbitrary objects because of the encoding of objects. For a particular object (such as String) you can compare the resulting bit streams. The encoding is stable, in that every time the same object is encoded it is encoded to the same set of bits.
How do I serialize a tree of objects?
Here's a brief example that shows how to serialize a tree of objects.
import java.io.*;

class tree implements java.io.Serializable {
    public tree left;
    public tree right;
    public int id;
    public int level;
    private static int count = 0;

    public tree(int l) {
        id = count++;
        level = l;
        if (l > 0) {
            left = new tree(l-1);
            right = new tree(l-1);
        }
    }

    public void print(int levels) {
        for (int i = 0; i < level; i++)
            System.out.print(" ");
        System.out.println("node " + id);
        if (level <= levels && left != null)
            left.print(levels);
        if (level <= levels && right != null)
            right.print(levels);
    }

    public static void main(String[] argv) {
        try {
            /* Create a file to write the serialized tree to,
               and the object output stream on top of it. */
            try (FileOutputStream ostream = new FileOutputStream("tree.tmp");
                 ObjectOutputStream p = new ObjectOutputStream(ostream)) {
                /* Create a tree with three levels. */
                tree base = new tree(3);
                p.writeObject(base); // Write the tree to the stream.
                p.flush();
            } // both streams are closed here

            /* Open the file and set to read objects from it. */
            try (FileInputStream istream = new FileInputStream("tree.tmp");
                 ObjectInputStream q = new ObjectInputStream(istream)) {
                /* Read a tree object, and all the subtrees */
                tree new_tree = (tree) q.readObject();
                new_tree.print(3); // Print out the top 3 levels of the tree
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
If class A does not implement Serializable but a subclass B implements Serializable, will the fields of class A be serialized when B is serialized?
Only the fields of Serializable objects are written out and restored. The object may be restored only if the non-serializable supertype (class A) has an accessible no-arg constructor that will initialize its fields. If the subclass has access to the state of the superclass, it can implement writeObject and readObject to save and restore that state.
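Both cases can be seen in a short sketch. The class names follow the question (A and B), plus a hypothetical variant C that saves the superclass state itself; the field names and values are made up for this illustration.

```java
import java.io.*;

class SuperclassStateDemo {
    // Not Serializable: its no-arg constructor runs during deserialization.
    static class A {
        String regNo = "unset";
        A() {}
    }

    // Serializable subclass that relies on the default mechanism only.
    static class B extends A implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        B(String regNo, String name) { this.regNo = regNo; this.name = name; }
    }

    // Serializable subclass that saves and restores A's state explicitly.
    static class C extends A implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        C(String regNo, String name) { this.regNo = regNo; this.name = name; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();   // writes 'name'
            out.writeObject(regNo);     // superclass field, saved by hand
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            regNo = (String) in.readObject();
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        B b = roundTrip(new B("KA-01", "Civic"));
        C c = roundTrip(new C("KA-01", "Civic"));
        System.out.println("B: " + b.regNo + ", " + b.name);   // regNo is lost
        System.out.println("C: " + c.regNo + ", " + c.name);   // regNo is kept
    }
}
```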
When a local object is serialized and passed as a parameter in an RMI call, are the byte codes for the local object's methods also passed? What about object coherency, if the remote VM application "keeps" the object handle?
The bytecodes for a local object's methods are not passed directly in the ObjectOutputStream, but the object's class may need to be loaded by the receiver if the class is not already available locally. (The class files themselves are not serialized, just the names of the classes.) All classes must be able to be loaded during deserialization using the normal class loading mechanisms. For applets this means they are loaded by the AppletClassLoader.
There are no coherency guarantees for local objects passed to a remote VM since such objects are passed by copying their contents (a true pass-by-value).
Which JDK 1.1 system classes will be marked serializable?
Here's an initial list of the classes that are marked Serializable. Note that classes that extend these classes are also serializable:
java.lang.Character
java.lang.Boolean
java.lang.String
java.lang.StringBuffer
java.lang.Throwable - Including all subtypes of Exception
java.lang.Number - including Integer, Long, etc.
java.util.Hashtable
java.util.Random
java.util.Vector - includes Stack
java.util.Date
java.util.BitSet
java.io.File
java.net.InetAddress
java.rmi.server.RemoteObject
The AWT classes
Arrays of primitives
Arrays of objects are Serializable though the objects may not be.
There are many classes for which Serialization makes no sense, such as those representing the state of something in the current VM (e.g. java.io.FileInputStream) or are exceedingly hard to do correctly (e.g. java.lang.Thread).
I am having problems deserializing AWT components. How can I make this work?
AWT has not yet been modified to work well with Serialization. When you serialize AWT widgets, also serialized are the Peer objects that map the AWT functions to the local window system. When you deserialize (reconstitute) the AWT widgets, the old Peers are recreated, but they are out of date. Peers are native to the local window system and contain pointers to data structures in the local address space, and therefore cannot be moved.
As a work around you should first remove the top level widget from its container (so the widgets are no longer live). The peers are discarded at this point and you will save only the AWT widget state. When you later deserialize and read the widgets back in, add the top level widget to the frame to make the AWT widgets appear. You may need to add a show call.
For JDK 1.1 AWT widgets will be serializable. However, they will not be interoperable with JDK 1.0.2 widgets.
Are there any plans to support the serialization of thread objects?
In JDK1.1 Threads will NOT be serializable. In the present implementation, if you attempt to serialize and then deserialize a thread, there is NO explicit allocation of a new native thread or stack; all that happens is that the Java object is allocated with none of the native implementation. In short, it just won't work and will fail in unpredictable ways.
The difficulty with threads is that they have so much state which is intricately tied into the virtual machine that it is difficult or impossible to re-establish the context somewhere else. For example, saving the Java call stack is insufficient because if there were native methods that had called C procedures that in turn called Java, there would be an incredible mix of Java constructs and C pointers to deal with. Also, Serializing the stack would imply serializing any object reachable from any stack variable.
If a thread were resumed in the same VM, it would be sharing a lot of state with the original thread, and would therefore fail in unpredictable ways if both threads were running at once, just like two C threads trying to share a stack. When deserialized in a separate VM, it's hard to tell what might happen.
If I try to serialize a font or image object and reconstitute it in a different VM, my application dies. Why?
AWT does not yet work well with serialization and you will therefore have trouble trying to pass fonts and images. This is because each contains memory pointers that are valid only in the originating VM, which will cause a segmentation violation when passed to a new VM.
These problems should be corrected by the time JDK 1.1 releases. As a work around for fonts, you will need to pass the information necessary to recreate a new font object that duplicates the characteristics of the font object in the originating VM. There is no current work around to allow images to be passed correctly.
Difference between Serializable and Externalizable Interfaces
So let's discuss what the differences between the two interfaces are and how to decide which one should be used:
- The Serializable interface is based on a recursive algorithm, i.e. besides the fields of the object itself, serialization also writes out all the objects that can be reached through its instance variables (provided that all those classes implement the Serializable interface). This includes the state inherited from the serializable superclasses of the object and, in the same way, of the objects held in its instance variables. Basically, all the objects that it can reach. This leads to a lot of overhead when we want to save only a few variables or a small amount of data compared to the size of the class.
- For example, if you have a class named Mercedes and you just want to store the car series and its car identification number, you cannot stop there: you have to store all the fields of that class and also of its superclass (if one exists and implements the Serializable interface), and a lot more.
- Serializable is a marker interface, so there is no method to override, and whenever there is a change in the entity or bean classes you just need to recompile your program. With the Externalizable interface, on the other hand, you have to implement the writeExternal() and readExternal() methods, which contain the logic to store and retrieve the data, and when the class changes you may need to change that logic too.
- Serializable gives you both options, i.e. you can handle the process on your own or you can leave it to be done in the default way, whereas with Externalizable you have to provide the logic yourself and have full control over the serialize and deserialize process.
- Serializable uses the reflection mechanism to recover the object. It also adds metadata, i.e. class descriptions, variable information etc. of all the serializable classes, to the stream, which consumes bandwidth and causes a performance issue.
- A public no-arg constructor is needed when using the Externalizable interface. With Serializable, no such constructor is required: the mechanism creates the object without calling the constructors of its serializable classes and uses reflection to fill in the fields from the ObjectInputStream.
- You should define a serialVersionUID in the case of Serializable. If it is not explicitly defined, it is generated automatically from the fields, methods etc. of the class, and it changes when you make such changes to the class. If the ID of the current class does not match the ID stored in the stream, you will not be able to recover the previously stored data. Computing the ID also takes some time. Note that Externalizable classes are also subject to the serialVersionUID check, so declaring it explicitly is a good idea there as well.
- The Externalizable interface is typically faster and produces a smaller stream than default serialization.
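To see the serialVersionUID that the runtime associates with a class, you can ask java.io.ObjectStreamClass. The demo class names below are hypothetical; the value computed for the class without an explicit ID depends on the class details and the compiler, so the sketch prints it rather than assuming any particular number.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionDemo {
    // No explicit ID: the runtime computes one from the class details.
    static class Implicit implements Serializable {
        int a;
    }

    // Explicit ID: stays the same no matter how the class evolves.
    static class Explicit implements Serializable {
        private static final long serialVersionUID = 42L;
        int a;
    }

    // Looks up the ID that serialization would use for the given class.
    static long uidOf(Class<?> c) {
        return ObjectStreamClass.lookup(c).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Implicit: " + uidOf(Implicit.class));
        System.out.println("Explicit: " + uidOf(Explicit.class));
    }
}
```

Adding a field to Implicit and recompiling will generally change the first number, which is exactly why previously stored data can no longer be read back.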