Improving Object Storage and Access Efficiency with Adaptive Data Storage Structures

Storing string representations of primitive data type values uses more memory than storing them as primitive data type values. Adaptive data storage structures minimize storage by determining which primitive data type is represented by a string, converting it into a value of that type, and storing it in the most appropriate object wrapper. Identifying the type of each value at storage time enables fast and efficient identification of objects of different classes when performing tasks such as searching, comparisons, and calculations. Adaptive data storage structures provide information about the objects they stores, such as the most frequently occurring class.

1. Introduction

When the data type of a set of values to be stored is unknown at compilation time, or may be of different types, a generic data structure is required. In Java, generic data structure objects such as those that implement the List, Set, and Map interfaces, can store objects of any class derived from the class Object (Horstmann & Cornell 1999). This feature makes them very flexible and enables a variety of object types to be stored in the same data structure.

In some cases, the values to be stored are string representations of primitive data types, such as integral and floating point values. Values marked up in XML, for example, are returned as strings from an XML parser. A simple solution is to store the values as String objects in a generic data structure such as a list.

Storing string representations of values with a primitive data type is inefficient because, in general, the string representation of a primitive data type value stored in a String object requires more bytes than the corresponding primitive type object wrapper such as Integer or Double. There is also a significant overhead in storing a single character in a String object; the implementation of the String class uses three int variables for internal housekeeping tasks.

2. Storage Requirements Analysis

The following example highlights why string representations use more memory than the equivalent primitive data type value. Assume that one character is stored in one byte and one integer is stored in four bytes. The string representation of “83742” stored in a String object takes 5 bytes for the characters 8, 3, 7, 4, and 2, and 12 bytes for the three housekeeping int variables, making a total of 17 bytes. The int representation of 83742 takes only four bytes.

Tests were conducted on 1000, 10000, and 500000 objects to determine the difference between the amount of memory required to store values of the primitive data types char, int, and double, in the corresponding object wrappers Character, Integer, and Double, and their equivalent string representations stored in String objects. The following points summarize the findings:

  • 120% more memory is required to store a single character as a string in a String object rather than as a single character in a Character object.
  • 200% more memory is required to store the maximum int value, 2147483647, as a string in a String object than as an int in an Integer object, and 160% more to store 123 in a String than an Integer.
  • 308% more memory is required to store the maximum double value, 1.7976931348623157E308, in a String than in a Double, and 197% more to store 123.456 in a String than a Double.
  • For the three primitive data types char, int, and double, the memory saved by storing them as primitive data type values in wrapper objects rather than as strings ranges from 120% to 308%.

The following tables summarize the average difference in storage required.

Strings

The following table shows the number of megabytes required to store characters in String objects rather than as the primitive char data type wrapped in a Character object.

Number of Objects Ave. % Difference
1000 10000 500000
String "c" 44072 440040 21988840
Character 'c' 20072 200040 9988840
% Difference 119.5695 119.976 120.1341 119.8932

Integers

The following tables show the number of megabytes required to store two integers in String objects rather than as the primitive int data type wrapped in an Integer object.

Number of Objects Ave. % Difference
1000 10000 500000
String "2147483645" 60072 600040 29989080
Integer 2147483645 20072 200040 9988840
% Difference 199.2826 199.96 200.2259 199.8228
Number of Objects Ave. % Difference
1000 10000 500000
String "123" 52072 520120 25988840
Integer 123 20072 200040 9988840
% Difference 159.4261 160.008 160.1788 159.8709

Floating Point Numbers

The following tables show the number of megabytes required to store two floating point numbers in String objects rather than as the primitive double data type wrapped in a Double object.

Number of Objects Ave. % Difference
1000 10000 500000
String "1.7976E308" 87240 843952 41993336
Double 1.7976E308 22672 202640 9991440
% Difference 284.7918 316.4785 320.2931 307.1878
Number of Objects Ave. % Difference
1000 10000 500000
String "123.456" 61016 601304 29992984
Double 123.456 20952 200920 9989720
% Difference 191.218 199.2753 200.2385 196.9106

3. Adaptive Storage Data Structures

A more efficient solution is to determine the primitive data type of the value to be stored, and store it as an object wrapper for that primitive type. For example, the string value “83742” represents an integral value that would be stored more efficiently as an Integer object that wraps the int primitive data type.

All generic data structures such as lists, sets, and maps can be extended to become adaptive data storage structures by adding methods for storing string representations of primitive data types in the most appropriate object wrapper for that type.

Apart from lower storage requirements, storing numeric values in the most appropriate representation has three other advantages: numeric values can be identified efficiently, different views of the data can be provided, and information about the type of objects can be provided. Adaptive data storage structures enable numeric values to be identified efficiently. String values do not need to be parsed into numeric values when they are needed for calculations or comparisons because the numeric values are already stored as numeric objects. The parsing overhead is incurred when the data is stored rather than when it is used which can improve the responsiveness of interactive applications, especially when large amounts of data are involved.

Applications often need to provide different views of data when, for example, displaying the values for users to search and browse. Literal string matching can be performed on all of the elements because every object has a toString() method that returns a string representation of the object. Numeric comparisons can only be made on numeric elements which can be selected efficiently with a selection iterator.

When the type of the data to be stored is unknown or may be of different types, clients may need to know the data type characteristics of the elements, such as the primary data type of the elements in a list. Adaptive data storage structures provide clients with information about the objects they contain, such as the most frequently occurring class. For example, this information can be used to signal to the user that a set of input values is invalid: a list with a high proportion of integral values may be acceptable, where non-integral values can be ignored; a low proportion of integral values might signal that an invalid set of data has been supplied.

The next section describes an example adaptive data storage structure.

4. An Example Adaptive Data Storage Structure

The following code shows a generic add element method for an adaptive data storage structure. The AdaptiveArrayList class extends the standard Java ArrayList class to override the add(Object o) method.

class AdaptiveArrayList extends java.util.ArrayList {
    /**
     * The object "classes" maps a Class object onto an Integer object
     * which records the number of occurrences of that class.
     */
    private Map classes = new HashMap();

    public boolean add(Object o) {
        o = getStorageObject(o);
        return super.add(o);
    }

    private Object getStorageObject(Object o) {
        // test for a String object
        if (o instanceof String) {
            String s = (String)o;
            try {
                // try to parse a floating point value
                double d = Double.parseDouble(s);

                /**
                 * get the integral version of the floating
                 * point value.
                 */
                int i = (int)d;

                /**
                 * test if there is a fractional part to the
                 * floating point value
                 */
                if (Math.abs(d - i) > 0) {
                    o = new Double(d);
                }
                else {
                    o = new Integer(i);
                }
            }
            catch (NumberFormatException e) {
                // test if the string represents a single character
                if (s.length() == 1) {
                    o = new Character(s.charAt(0));
                }
            }
        }
        updateClassCount(o.getClass());
        return o;
    }

    /**
     * Increment the count of the number of times an object of class c
     * is stored.
     */
    private void updateClassCount(Class c) {
        int count = 0;
        if (classes.containsKey(c)) {
            count = ((Integer)classes.get(c)).intValue();
        }
        count++;
        classes.put(c, new Integer(count));
    }

    // Return the map of Class objects to Integer objects.
    public Map getClassCount() {
        return Collections.unmodifiableMap(classes);
    }

    // Return the most frequently occurring Class of objects stored.
    public Class getPrimaryClass() {
        Set keySet = classes.keySet();
        List classCounts = new ArrayList(keySet.size());
        for (Iterator i = keySet.iterator(); i.hasNext();) {
            Class c = (Class)i.next();
            Integer count = (Integer)classes.get(c);
            classCounts.add(new ClassCount(c, count));
        }
        Collections.sort(classCounts);
        ClassCount head =
            (ClassCount)classCounts.get(classCounts.size() - 1);
        return head._getClass();
    }

    private class ClassCount implements Comparable {
        private Class _class;
        private Integer count;

        public ClassCount(Class c, Integer count) {
            _class = c;
            this.count = count;
        }

        public Class _getClass() {
            return _class;
        }

        public Integer getCount() {
            return count;
        }

        public int compareTo(Object o) {
            ClassCount classCount = (ClassCount)o;
            return count.compareTo(classCount.getCount());
        }
    }
}

The new add(Object o) method calls getStorageObject(Object o) to get the most efficient object for storing the data value. If the object to be stored is not a String, the object is returned. If it is a String, an attempt is made to parse it to create a floating point value stored as a double. If parsing is successful, the floating point value is tested to determine if it has a fractional part. If so, the floating point value is stored in a Double wrapper object, otherwise it is stored as an integral value in an Integer wrapper. If parsing was unsuccessful, the length of the string is tested to determine if it contains a single character and, if so, it is stored in a Character wrapper object.

The add(Object o) method of AdaptiveArrayList finishes by calling the add method of its super-class, ArrayList, to add the object to the list. Note that this overridden add method only changes the object to be added, it does not change the way the object is stored; in fact, it is completely unaware and unconcerned with the implementation details of how elements are stored. This enables the implementation of this method to be used for the add methods of other generic data structure objects such as those that implement the Map and Set interfaces.

To provide information about the classes of the objects it stores, AdaptiveArrayList objects maintain a count of the number of each class of object it contains. A Map object is maintained that maps a Class object onto an Integer object which stores the number of occurrences of that class, and can be obtained with a call to Map getClassCount(). Clients can determine the most frequently occurring class of object in the list with a call to Class getPrimaryClass().

References

  • Horstmann, C. S. and G. Cornell, Core Java 2: Volume 1—Fundamentals, Prentice Hall, 1999.