Improving Object Storage and Access Efficiency with Adaptive Data Storage Structures
Storing string representations of primitive data type values uses more memory than storing them as primitive data type values. Adaptive data storage structures minimize storage by determining which primitive data type is represented by a string, converting it into a value of that type, and storing it in the most appropriate object wrapper. Identifying the type of each value at storage time enables fast and efficient identification of objects of different classes when performing tasks such as searching, comparisons, and calculations. Adaptive data storage structures provide information about the objects they stores, such as the most frequently occurring class.
1. Introduction
When the data type of a set of values to be stored is unknown at compilation time, or may be of different types, a generic data structure is required. In Java, generic data structure objects such as those that implement the List
, Set
, and Map
interfaces, can store objects of any class derived from the class Object
(Horstmann & Cornell 1999). This feature makes them very flexible and enables a variety of object types to be stored in the same data structure.
In some cases, the values to be stored are string representations of primitive data types, such as integral and floating point values. Values marked up in XML, for example, are returned as strings from an XML parser. A simple solution is to store the values as String
objects in a generic data structure such as a list.
Storing string representations of values with a primitive data type is inefficient because, in general, the string representation of a primitive data type value stored in a String
object requires more bytes than the corresponding primitive type object wrapper such as Integer
or Double
. There is also a significant overhead in storing a single character in a String
object; the implementation of the String
class uses three int
variables for internal housekeeping tasks.
2. Storage Requirements Analysis
The following example highlights why string representations use more memory than the equivalent primitive data type value. Assume that one character is stored in one byte and one integer is stored in four bytes. The string representation of “83742” stored in a String
object takes 5 bytes for the characters 8, 3, 7, 4, and 2, and 12 bytes for the three housekeeping int
variables, making a total of 17 bytes. The int
representation of 83742 takes only four bytes.
Tests were conducted on 1000, 10000, and 500000 objects to determine the difference between the amount of memory required to store values of the primitive data types char
, int
, and double
, in the corresponding object wrappers Character
, Integer
, and Double
, and their equivalent string representations stored in String
objects. The following points summarize the findings:
- 120% more memory is required to store a single character as a string in a
String
object rather than as a single character in aCharacter
object. - 200% more memory is required to store the maximum
int
value, 2147483647, as a string in aString
object than as anint
in anInteger
object, and 160% more to store 123 in aString
than anInteger
. - 308% more memory is required to store the maximum
double
value, 1.7976931348623157E308, in aString
than in aDouble
, and 197% more to store 123.456 in aString
than aDouble
. - For the three primitive data types
char
,int
, anddouble
, the memory saved by storing them as primitive data type values in wrapper objects rather than as strings ranges from 120% to 308%.
The following tables summarize the average difference in storage required.
Strings
The following table shows the number of megabytes required to store characters in String
objects rather than as the primitive char
data type wrapped in a Character
object.
Number of Objects | Ave. % Difference | |||
---|---|---|---|---|
1000 | 10000 | 500000 | ||
String "c" | 44072 | 440040 | 21988840 | |
Character 'c' | 20072 | 200040 | 9988840 | |
% Difference | 119.5695 | 119.976 | 120.1341 | 119.8932 |
Integers
The following tables show the number of megabytes required to store two integers in String
objects rather than as the primitive int
data type wrapped in an Integer
object.
Number of Objects | Ave. % Difference | |||
---|---|---|---|---|
1000 | 10000 | 500000 | ||
String "2147483645" | 60072 | 600040 | 29989080 | |
Integer 2147483645 | 20072 | 200040 | 9988840 | |
% Difference | 199.2826 | 199.96 | 200.2259 | 199.8228 |
Number of Objects | Ave. % Difference | |||
---|---|---|---|---|
1000 | 10000 | 500000 | ||
String "123" | 52072 | 520120 | 25988840 | |
Integer 123 | 20072 | 200040 | 9988840 | |
% Difference | 159.4261 | 160.008 | 160.1788 | 159.8709 |
Floating Point Numbers
The following tables show the number of megabytes required to store two floating point numbers in String
objects rather than as the primitive double
data type wrapped in a Double
object.
Number of Objects | Ave. % Difference | |||
---|---|---|---|---|
1000 | 10000 | 500000 | ||
String "1.7976E308" | 87240 | 843952 | 41993336 | |
Double 1.7976E308 | 22672 | 202640 | 9991440 | |
% Difference | 284.7918 | 316.4785 | 320.2931 | 307.1878 |
Number of Objects | Ave. % Difference | |||
---|---|---|---|---|
1000 | 10000 | 500000 | ||
String "123.456" | 61016 | 601304 | 29992984 | |
Double 123.456 | 20952 | 200920 | 9989720 | |
% Difference | 191.218 | 199.2753 | 200.2385 | 196.9106 |
3. Adaptive Storage Data Structures
A more efficient solution is to determine the primitive data type of the value to be stored, and store it as an object wrapper for that primitive type. For example, the string value “83742” represents an integral value that would be stored more efficiently as an Integer
object that wraps the int
primitive data type.
All generic data structures such as lists, sets, and maps can be extended to become adaptive data storage structures by adding methods for storing string representations of primitive data types in the most appropriate object wrapper for that type.
Apart from lower storage requirements, storing numeric values in the most appropriate representation has three other advantages: numeric values can be identified efficiently, different views of the data can be provided, and information about the type of objects can be provided. Adaptive data storage structures enable numeric values to be identified efficiently. String values do not need to be parsed into numeric values when they are needed for calculations or comparisons because the numeric values are already stored as numeric objects. The parsing overhead is incurred when the data is stored rather than when it is used which can improve the responsiveness of interactive applications, especially when large amounts of data are involved.
Applications often need to provide different views of data when, for example, displaying the values for users to search and browse. Literal string matching can be performed on all of the elements because every object has a toString()
method that returns a string representation of the object. Numeric comparisons can only be made on numeric elements which can be selected efficiently with a selection iterator.
When the type of the data to be stored is unknown or may be of different types, clients may need to know the data type characteristics of the elements, such as the primary data type of the elements in a list. Adaptive data storage structures provide clients with information about the objects they contain, such as the most frequently occurring class. For example, this information can be used to signal to the user that a set of input values is invalid: a list with a high proportion of integral values may be acceptable, where non-integral values can be ignored; a low proportion of integral values might signal that an invalid set of data has been supplied.
The next section describes an example adaptive data storage structure.
4. An Example Adaptive Data Storage Structure
The following code shows a generic add element method for an adaptive data storage structure. The AdaptiveArrayList
class extends the standard Java ArrayList
class to override the add(Object o)
method.
The new add(Object o)
method calls getStorageObject(Object o)
to get the most efficient object for storing the data value. If the object to be stored is not a String
, the object is returned. If it is a String
, an attempt is made to parse it to create a floating point value stored as a double
. If parsing is successful, the floating point value is tested to determine if it has a fractional part. If so, the floating point value is stored in a Double
wrapper object, otherwise it is stored as an integral value in an Integer
wrapper. If parsing was unsuccessful, the length of the string is tested to determine if it contains a single character and, if so, it is stored in a Character
wrapper object.
The add(Object o)
method of AdaptiveArrayList
finishes by calling the add
method of its super-class, ArrayList
, to add the object to the list. Note that this overridden add
method only changes the object to be added, it does not change the way the object is stored; in fact, it is completely unaware and unconcerned with the implementation details of how elements are stored. This enables the implementation of this method to be used for the add
methods of other generic data structure objects such as those that implement the Map
and Set
interfaces.
To provide information about the classes of the objects it stores, AdaptiveArrayList
objects maintain a count of the number of each class of object it contains. A Map
object is maintained that maps a Class
object onto an Integer
object which stores the number of occurrences of that class, and can be obtained with a call to Map getClassCount()
. Clients can determine the most frequently occurring class of object in the list with a call to Class getPrimaryClass()
.
References
- Horstmann, C. S. and G. Cornell, Core Java 2: Volume 1—Fundamentals, Prentice Hall, 1999.