This article reviews the problems with current visualization software and develops four groups of requirements for visualization software: hardware and software requirements, user requirements, visualization researcher requirements, and visualization developer requirements. I believe that meeting these requirements will produce visualization software that will be effective, flexible, highly usable, and fast and easy to implement.
Hardware and software requirements deal with the range of software that contains visualizations and the variety of hardware platforms on which it runs. User requirements concern the needs of the users of visualization software. Visualization researchers have requirements for designing, prototyping, and evaluating new visualization software. Finally, visualization developer requirements deal with the needs of programmers who implement visualization software. I believe that meeting these requirements will produce effective, flexible, and highly usable visualization software that is fast and easy to implement.
Section 2 introduces visualization and the people involved with it. Sections 3, 4, 5 and 6 describe the hardware and software, user, researcher, and developer requirements, respectively. Section 7 summarizes the four groups of requirements.
The amount of information produced is growing exponentially. Recent estimates indicate that approximately one million terabytes of data is generated annually worldwide, of which 99.9% is available only in digital form (Keim 2001). Repositories are swelling with financial, legal, bibliographical, geographical, political, and environmental information, to name a few.
Information is analyzed by commercial, governmental, and academic research institutions to shape policies and plan for growth and development. Information collected by businesses provides a valuable strategic asset. In particular, e-commerce businesses are able to collect an unprecedented amount of information about their customers’ on-line activities. Each page visited and re-visited, mouse click, and on-line purchase is recorded for analysis.
The need to process, understand, and make decisions using information has never been greater and this need will increase in the future. Visualization techniques provide powerful tools for displaying, analyzing, and exploring information.
Visualization uses a variety of techniques to present large amounts of information in a pictorial form so that it can be understood more easily. Comprehending a large amount of data can be an enormous cognitive task, but may be eased through visualization which exploits the highly developed human visual perceptual system.
The simplest forms of visualization use techniques from mathematics. Simple line graphs and histograms show numeric data such as patient temperatures and stock prices that change over time. Two- and three-dimensional scatter plots show clusters, trends, and outliers. More dimensions can be added by varying the size, shape, colour, and shading of each data point.
Data visualization is primarily concerned with understanding large amounts of numeric information, often physical phenomena with inherently spatial characteristics. Data visualization has many applications in science, engineering, and medicine; disciplines that commonly use 3D graphics and animation to visualize data. Magnetic resonance imaging (MRI), for example, produces 3D models of organs, bones, and teeth that medical professionals can rotate and slice. Data that change over time such as weather patterns and the turbulence over an aeroplane wing can be understood more easily using animation to show the changes as they happen.
Information visualization is concerned with understanding the structure of non-numeric information and its interrelationships. Examples of information visualization include network and hierarchical information structures, the spatial data used in geographical information systems, and the analysis of business transactions.
The power of a visualization can be increased by providing tightly coupled tools for exploring and analyzing the data. Visual exploration has enormous potential for revealing interesting patterns and relationships such as clusters, correlations, trends, dependencies, and exceptions. Sophisticated exploratory data mining and analysis systems can be created by enabling the user’s task knowledge and sophisticated decision making abilities to drive highly interactive visualization software.
So far, I have given a traditional definition of visualization. I extend this definition to include all visual displays of information such as interactive maps, product advertisements, and organization charts. Broadening the definition to accommodate a much wider range of visual information display problems provides a significant opportunity for supplying visualization solutions.
I am concerned with improving visualization software for three groups of people: users, researchers, and developers. Visualization users are the people that use visualization software to display, explore, and analyze their data. Visualization researchers design, prototype, and evaluate new visualization software. Visualization developers implement complete, robust, and polished visualization software, often using the results of visualization researchers. These three groups are not mutually exclusive. For example, a visualization developer might research and then implement new visualization software. A visualization user might be able to design a new visualization style that meets his or her needs better, but require a developer to implement it.
The next section begins the discussion of visualization software requirements with the requirements for hardware and software.
Visualization software needs to be incorporated into a variety of software applications and needs to be run on a range of hardware with vastly different resources.
Visualization software was previously available only on high end workstations. As processing power has increased and memory prices have fallen, PCs and laptops have also become able to run sophisticated visualization software. Extending the definition of visualization to all visual information displays widens the range of hardware on which visualizations need to run to include less powerful PCs and resource limited mobile devices such as personal digital assistants (PDAs) and mobile phones.
Each new generation of mobile technology is becoming more powerful and is able to run a wider range of applications. For example, devices that provide in-car navigation using Global Positioning System (GPS) data are already available. GPS navigation devices use visualization to display the user’s current position overlaid on a map of the region.
Small devices such as PDAs require different modes of interaction than the hardware traditionally used with visualization applications. Virtual on-screen keyboards and styli replace hardware keyboards and mice. Such devices rely more on visual interaction which provides further opportunities for visualization applications.
A vast number of users already have mobile devices such as mobile phones. As mobile devices become more sophisticated, the opportunities for supplying visualization applications to such a vast user base are enormous. To capture the largest possible user base, visualization software needs to run on a range of hardware which, from most powerful to least powerful, comprises workstations, PCs/laptops, PDAs, and mobile phones. A different version of a visualization application will be needed for each of these platforms.
Visualization is used in a wide variety of applications such as dedicated visualization software, networked software and web pages, and ordinary non-visualization software applications. Dedicated visualization applications are powerful workstation-based software packages that focus on tasks such as visualizing geographical data collected from satellites, complex mathematical simulations of nuclear reactions, and the results of MRI scans. Visualization software needs to support traditional workstation-based visualization applications.
The growth of the Internet and the development of the web and browser technology require software to be networked. The uptake of ISDN and now broadband means that fast network connections are available to small business and domestic users as well as large companies and research establishments. Visualization software must be able to run over networks and to visualize networked information. As network bandwidth increases, visualizations of data that change in real time will need to be updated in real time.
Visualizations are also becoming more common in ordinary non-visualization software. The term non-visualization refers to software that is not primarily designed to visualize information, such as word processors, spreadsheets, and file management programs, but that nevertheless uses data visualization. Examples include the disk utilization pie chart in the Windows Explorer application and the Disk Defragmentation application, both part of the Microsoft Windows operating system, currently the most widely used PC software. The Calendar application in the Pocket PC operating system for PDAs uses visualization with levels of detail to display large amounts of information on a limited screen. There is a significant opportunity for providing visualization software components that enable developers to add visualization to their software.
Visualization software needs to be able to be integrated into third party software and to expose APIs to enable developers to programmatically control all aspects of a visualization and to capture low level user interaction events such as mouse movements. Visualization software must also capture and generate high level visualization events such as data selection to fully support the development of visualization applications.
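The two kinds of event described above can be illustrated with a minimal sketch. The class and event names (`Visualization`, `mouse_move`, `data_selected`) are hypothetical and are not taken from any existing toolkit; the sketch only shows how a host application might subscribe to both low level and high level events from an embedded component.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Visualization:
    """Hypothetical embeddable component; not the API of any real toolkit."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.selected: List[int] = []

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        """Register a host-application callback for a named event."""
        self._handlers[event].append(handler)

    def _emit(self, event: str, payload: dict) -> None:
        for handler in self._handlers[event]:
            handler(payload)

    def mouse_move(self, x: int, y: int) -> None:
        """Low level event: the pointer position is forwarded unchanged."""
        self._emit("mouse_move", {"x": x, "y": y})

    def select(self, item_ids: List[int]) -> None:
        """High level event: reports which data items are now selected."""
        self.selected = list(item_ids)
        self._emit("data_selected", {"items": list(self.selected)})


# Usage: the host application records both kinds of event.
log = []
vis = Visualization()
vis.on("mouse_move", lambda e: log.append(("move", e["x"], e["y"])))
vis.on("data_selected", lambda e: log.append(("selected", tuple(e["items"]))))
vis.mouse_move(10, 20)
vis.select([3, 7])
```

The point of the design is that the host never needs to translate pointer co-ordinates into data items itself; the component reports the selection in terms of the data.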
Visualization software needs to be able to scale up to large complex 3D scientific visualizations and to scale down to simple 2D information displays on resource limited mobile devices. Visualizations need to be scalable so that the most appropriate version of a visualization can be supplied for the platform on which it is to run. For example, a complex 3D visualization of a large amount of data is ideal for a workstation but would not be able to run on a PDA. Resource limited devices have far less memory and processing power than workstations and PCs so visualization scaling is required. Visualization scaling reduces the size and complexity of visualizations to enable them to be displayed by less powerful hardware platforms. For example, a complex 3D animated visualization of 10,000 data points for a workstation could be simplified to a 2D animated visualization of 2000 data points for a PC/Laptop, and further simplified to a static visualization with summaries of the data for a PDA. Another example of the need for visualization scaling is to run PC route finding software on PDAs and mobile phones. PDAs have smaller screens and less memory and processing power than PCs, and current mobile phones have smaller screens and less memory and processing power than PDAs. Visualization scaling would produce a simpler map of the region with fewer place names and landmarks for a PDA than for a PC, and an even simpler map for a mobile phone than for a PDA.
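The scaling example above can be sketched as follows. The platform profiles are assumptions that simply restate the figures in the text (10,000 points in 3D for a workstation, 2000 points in 2D for a PC, summaries only for a PDA), and `scale_for_platform` is a hypothetical helper rather than part of any existing package. Thinning by taking every n-th point is one simple choice; a real system would probably use a sampling or aggregation method that preserves clusters and outliers.

```python
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

# Hypothetical platform profiles restating the example in the text.
# max_points == 0 means "show summaries only".
PROFILES = {
    "workstation": {"max_points": 10_000, "dims": 3, "animated": True},
    "pc": {"max_points": 2_000, "dims": 2, "animated": True},
    "pda": {"max_points": 0, "dims": 2, "animated": False},
}


def scale_for_platform(points: List[Point], platform: str) -> Dict[str, object]:
    """Reduce a 3D data set to what the named platform can display.

    Assumes `points` is non-empty.
    """
    profile = PROFILES[platform]
    result: Dict[str, object] = {
        "dims": profile["dims"],
        "animated": profile["animated"],
    }
    if profile["max_points"] == 0:
        # Replace the individual points with a static summary.
        xs, ys, zs = zip(*points)
        result["summary"] = {
            "count": len(points),
            "mean": (mean(xs), mean(ys), mean(zs)),
        }
        return result
    # Keep every n-th point so that at most max_points remain.
    step = max(1, -(-len(points) // profile["max_points"]))  # ceiling division
    # Drop the trailing co-ordinates the platform cannot display.
    result["points"] = [p[: profile["dims"]] for p in points[::step]]
    return result
```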
Users have a variety of requirements for visualization software. This section describes the need for different visualization styles, generic visualization and data analysis tools, and for the ability to tailor generic visualization software to meet the needs of specific visualization applications. This section concludes with the requirements for an integrated visualization software package that incorporates these requirements.
Users need to visualize many different types of information; seven of the most common are one-, two- and three-dimensional; temporal; multi-dimensional; tree; and network data. One-dimensional linear data types include textual documents, program source code, and alphabetical lists of names that are all organized sequentially. Two-dimensional planar data represent objects that have area, and examples include geographical maps, floor plans, and newspaper layouts. Three-dimensional data represent real world objects that have volume or 3D co-ordinates, such as molecules, the human body, and buildings.
Temporal data record events that happen over time such as medical record timelines, project management schedules, historical records of events, and video editing sequences. Temporal data are treated separately from one-dimensional data because temporal items have a start and finish time.
Multi-dimensional data are often stored in relational and statistical databases. Data with n attributes are often represented as points in an n-dimensional space. Techniques such as multi-dimensional scaling can scale an n-dimensional space onto two- or three-dimensional spaces to make use of the visualization techniques that are available for these lower dimensional spaces.
Tree data represent hierarchies: collections of items in which each item is connected to its parent. Networks represent data that cannot be represented by trees because they need arbitrarily complex interconnections between items.
There are also many variations of these seven basic data types, such as multi-trees where each item can be the root of another tree, and four-dimensional data that is represented by adding colour or different plotting symbols as an extra dimension to three-dimensional co-ordinates.
The amount and homogeneity of the data displayed by a visualization varies widely. Information displays such as an interactive product advertisement will present a relatively small amount of heterogeneous data such as price, dimensions, available colours, and the current stock level. In contrast, database visualizations will display a large number of homogeneously structured database records.
Visualization software needs to be able to represent all of the commonly displayed types of information, as well as their combinations and variants. Visualization software must also be able to represent large and small amounts of homogeneous and heterogeneous data.
Visualization research shows that no visualization style is best for all data sets or tasks and several different styles can often be used to visualize the same data; for example, pie charts and histograms can summarize the same set of numeric data.
The selection of an appropriate style depends on several criteria such as the amount of data and its characteristics, the stage the user is at in a data exploration, and the experience and abilities of the user. The choice of visualization style is often constrained by the amount of information to be visualized. Some visualizations present all the data at once, others present an overview and require users to explore it to discover more details. Some styles are not suitable for large amounts of data.
The characteristics of the information can also influence the style of visualization. For example, scientific data visualization tends to model physical phenomena that are inherently three dimensional. Models of organs, bones, and teeth built from the results of MRI scans, and geographical models built from satellite data are naturally presented in 3D. Information visualization, on the other hand, models data that is more abstract and that does not naturally map onto a particular visualization style.
The stage the user is at in a data exploration can suggest which visualization style is best. Shneiderman’s (1996) information seeking mantra—overview, zoom and filter, then details on demand—supports this view. Visualizations such as maps and network graphs provide useful overviews of large data sets without cluttering up the display with the details of each data object. More detail can be presented when users zoom into interesting parts of the data using levels of detail: as the user gets closer to the information of interest, more detail is added; when the user zooms out, the details are hidden.
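The idea of levels of detail can be made concrete with a small sketch. It assumes, purely for illustration, that each item carries a `min_zoom` threshold below which it is hidden; the `Item` class and the threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    label: str
    min_zoom: float  # hypothetical: smallest zoom factor at which the item is drawn


def visible_items(items: List[Item], zoom: float) -> List[str]:
    """Return the labels drawn at the given zoom factor.

    Zooming in (a larger factor) adds detail; zooming out hides it again.
    """
    return [item.label for item in items if zoom >= item.min_zoom]


region_map = [
    Item("coastline", 1.0),
    Item("major roads", 2.0),
    Item("street names", 4.0),
]
```

With this data, `visible_items(region_map, 1.0)` yields only the coastline, while a zoom factor of 4.0 or more yields all three layers.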
Other styles of visualization do not enable this progressive disclosure of detail; visualizations that are suitable for overviews may not be suitable for displaying more details when the user zooms in. For example, pie charts summarize numerical information but do not naturally fit into the progressive disclosure with zooming model. In such cases, different visualization styles need to be used to provide more detail. Several co-ordinated styles may also be used to provide overview and detail simultaneously; for instance, a network graph might provide a global overview of the departments in an organization that marks the user’s current position, and a tree might show a detailed view of the hierarchical organization of the current department.
The style of visualization can also depend on the experience and abilities of the user. Users new to a subject may prefer a general overview with gradual and progressive disclosure of the details; experts in the field may want to navigate as quickly as possible to an area of interest and then interrogate it. The needs of disabled users can also restrict the visualization styles that can be used. For example, partially sighted users may not be able to use visualizations with a large number of densely clustered data points; they may prefer summaries and different exploration tools to discover the details.
Visualization software needs to be able to represent data in as many different styles as possible and to enable users to control which styles are used during a data exploration session. Visualization software should enable different visualization styles to be combined to produce multiple co-ordinated views: a change in one view causes a corresponding change in another view. Users should be able to select data from a visualization and generate a new visualization of the selected data in the same style or a different style.
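One common way to obtain multiple co-ordinated views is to let every view observe a shared selection model. The sketch below assumes this observer arrangement; the class names `SelectionModel` and `View` are hypothetical, and real views would redraw themselves rather than just store the highlighted set.

```python
from typing import List, Set


class SelectionModel:
    """Shared selection state that every co-ordinated view observes."""

    def __init__(self) -> None:
        self._views: List["View"] = []
        self.selected: Set[str] = set()

    def attach(self, view: "View") -> None:
        self._views.append(view)

    def select(self, ids: Set[str]) -> None:
        self.selected = set(ids)
        # Notify every attached view, including the one that made the change.
        for view in self._views:
            view.refresh(self.selected)


class View:
    """Hypothetical view; a real one would redraw in refresh()."""

    def __init__(self, name: str, model: SelectionModel) -> None:
        self.name = name
        self.model = model
        self.highlighted: Set[str] = set()
        model.attach(self)

    def user_selects(self, ids: Set[str]) -> None:
        """Called when the user selects items in this view."""
        self.model.select(ids)

    def refresh(self, selected: Set[str]) -> None:
        self.highlighted = set(selected)


# Usage: a selection made in the network view also appears in the tree view.
model = SelectionModel()
network_view = View("network", model)
tree_view = View("tree", model)
network_view.user_selects({"sales", "research"})
```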
Many generic operations are commonly applied to visualizations. Shneiderman’s information seeking mantra suggests four basic operations: requesting an overview, filtering out uninteresting items, zooming into items of interest, and requesting more detailed information. This initial list can be extended to include a wide range of generic tools and operations that are common to all visualizations: navigating around an information display, searching for specific values, browsing for serendipitous discovery, partitioning the data into user-defined groups or groups suggested by the data, viewing the relationships between items of data, and automatic data analysis, such as cluster and statistical analysis.
Maintaining a history of reversible actions is a central feature of graphical user interfaces and is also important for visualization software. A history of reversible actions encourages experimentation because undesirable outcomes can be undone. Preserving sequences of explorations and analyses enables them to be replayed to produce guided tours through a data set. Users should also be able to extract subsets of the data in a visualization. Subsets may be selected by the user or may be the results of an exploration or search. It should be possible to save and print subsets and to import them into other software.
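A history that supports both undo and replay can be sketched in a few lines. The sketch assumes that the state of an exploration can be captured as a dictionary of view settings (zoom level, active filter, and so on) and that the exploration starts from an empty state; the `History` class is hypothetical.

```python
from typing import Any, Dict, List, Tuple


class History:
    """Records reversible changes to a dictionary of view settings."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        # Each entry: (key, previous value, new value, key existed before?)
        self._done: List[Tuple[str, Any, Any, bool]] = []

    def set(self, key: str, value: Any) -> None:
        """Apply a change and remember how to reverse it."""
        existed = key in self.state
        self._done.append((key, self.state.get(key), value, existed))
        self.state[key] = value

    def undo(self) -> bool:
        """Reverse the most recent change; return False if there is none."""
        if not self._done:
            return False
        key, previous, _new, existed = self._done.pop()
        if existed:
            self.state[key] = previous
        else:
            del self.state[key]
        return True

    def replay(self) -> List[Dict[str, Any]]:
        """Return the state after each recorded step, as a guided tour."""
        tour: List[Dict[str, Any]] = []
        state: Dict[str, Any] = {}
        for key, _previous, new, _existed in self._done:
            state[key] = new
            tour.append(dict(state))
        return tour
```

Because undone steps are removed from the record, a replay shows only the path that the user finally kept.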
These generic visualization tools can be used with most visualization styles. Some tools will be better than others for interacting with certain types of visualization and for completing certain types of tasks, but users should be able to choose which tool to use.
Visualization software should provide these generic tools to produce a consistent and familiar set of operations that will, once they have been learned, create a powerful visualization environment. All of the generic visualization and data analysis tools need to be tightly integrated with the visualizations to produce an effective exploratory data analysis environment.
Some visualization applications need specific data interrogation and analysis facilities that are not provided by the generic tools listed above. Users should be able to use a single integrated package to perform specialist visualization tasks as well as a variety of generic tasks. Visualization software should capitalize on the power of, and user familiarity with, the generic tools and be flexible enough to incorporate additional application specific tools. Using task specific visualization styles, generic visualization software can then be tailored to meet the needs of specialist visualization tasks. Visualization software should provide an environment in which new tools are automatically integrated with existing tools and are tightly integrated with the environment.
Users need an integrated visualization application that meets their requirements for visualizing different types and amounts of data in different visualization styles, that provides a generic set of visualization and data analysis tools, and that can be tailored to specific visualization tasks.
An integrated visualization application that provides a variety of styles and tools has the important advantage of a single interface for the user to learn. A single interface will reduce the time and effort required to learn how to use the software.
The goal of information visualization research is to develop rich visual interfaces to help users understand and navigate through complex information spaces that are often abstract, non-spatial, and highly dimensional with no natural physical mapping onto 2D or 3D spaces. Visualization researchers need to develop new visualization styles for presenting information and new tools for manipulating and exploring them.
Effective visualization software combines imaging, graphics, visualization, and human computer interaction. Visualization software is highly interactive and must undergo extensive usability evaluation to ensure that users are able to focus on their tasks rather than on the software. Rapid prototyping, an iterative cycle of prototyping and user evaluation, is an essential part of developing any usable and effective interactive system. Visualization software should support rapid prototyping to encourage experimentation with new visual interaction styles and exploration tools that will produce more usable and effective visualization software. Making rapid prototyping easier will broaden the range of users and developers able to implement visualizations.
Implementing visualization designs requires specific skills; visualization designers may not have the necessary skills to be able to prototype their designs. Visualization software should reduce the complexity of prototyping to broaden the base of researchers who are able to prototype their designs. Even for skilled developers, lengthy development times and complex implementations make rapid prototyping prohibitively expensive. Reducing the development time and complexity will make extensive rapid prototyping easier.
Academic research tends to produce highly creative experimental prototypes. The prototype is the focus of the research so complete and robust software is rarely produced. Visualization software that enables researchers to develop complete and robust software from their prototypes will increase the potential for marketing the results of their research. Software for enabling new visualization styles to be produced quickly and easily from prototypes will also provide a competitive advantage for companies that develop visualizations.
Two of the main approaches to implementing visualizations are programming with graphics APIs and visualization toolkits and describing visualizations with textual graphics description languages. Each of these approaches has its advantages but they also have significant disadvantages. Visualization software that addresses the disadvantages will be a significant improvement over the visualization software that is currently available.
Visualization software can be implemented with the graphics APIs of programming languages such as Java, C++, C#, and Visual Basic. The advantage of developing a visualization from scratch is that developers have complete control over the implementation. Visualization applications can be built exactly to the requirements and application specific optimizations can be implemented. The main drawback is that implementing visualizations with graphics APIs is complex and time consuming, which limits development to highly skilled developers. The complexity of the code and the time required to implement it increase further when 3D, interactivity, real-time behaviors, and tightly integrated data exploration and analysis tools are required.
Two- and three-dimensional scenegraph toolkits have been developed to ease the development of graphic scenes. Scenegraph toolkits describe graphic scenes as a hierarchy of objects which simplifies the code and provides rendering optimizations. Scenegraph toolkits provide a higher level of abstraction than graphics APIs and are particularly useful for implementing zooming, animation, and levels of detail. Examples of scenegraph toolkits are Jazz/Piccolo (2D) and OpenGL and Java3D (3D).
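The hierarchy that a scenegraph describes can be illustrated with a minimal sketch. It is not modelled on the API of any of the toolkits named above; the `Node` class and `render` function are hypothetical and handle only 2D translation, whereas real scenegraphs also carry rotation, scaling, and appearance.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A scene object positioned relative to its parent."""

    name: str
    dx: float = 0.0
    dy: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def render(node: Node, ox: float = 0.0, oy: float = 0.0) -> List[Tuple[str, float, float]]:
    """Walk the hierarchy depth first, accumulating parent offsets."""
    x, y = ox + node.dx, oy + node.dy
    drawn = [(node.name, x, y)]
    for child in node.children:
        drawn.extend(render(child, x, y))
    return drawn


# Usage: a bar placed inside a chart inherits the chart's position.
scene = Node("scene")
chart = scene.add(Node("chart", 100.0, 50.0))
chart.add(Node("bar", 10.0, 0.0))
```

Moving the chart node moves everything beneath it, which is why the hierarchy simplifies code for zooming and animation: one change to a parent is enough.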
Visualization toolkits have also been developed to help reduce the complexity and development time by providing commonly used visualization components. These include 2D and 3D information displays such as scatter plots, histograms, pie charts, and line graphs as well as sliders, radio buttons, and other interaction controls. These off the shelf components are highly parameterized and can be easily integrated into bespoke software. Although visualization toolkits make implementing existing visualization styles easier, they do not help to develop new styles. New styles must be developed with graphics APIs and scenegraph toolkits.
The complexity of implementing visualizations can limit designs to those which the designer has the ability to implement. Developers need experience with graphics APIs and scenegraph and visualization toolkits to be able to use them effectively. Projects that require visualizations often do not have developers with the necessary skills or experience to implement them. Reducing the complexity of producing visualizations will broaden the base of developers that will be able to implement powerful visualization software.
Visualizations are often implemented by describing them with textual graphics description languages such as the Virtual Reality Modelling Language (VRML), the 3D Modelling Language (3DML), and Scalable Vector Graphics (SVG), an XML application for describing graphic scenes. Description languages have a number of advantages over graphics APIs and toolkits for implementing visualizations: rendering software is provided, they use higher level concepts, generating visualizations is simple, and they have high compression ratios. The most significant advantage is that software to render scenes is already provided. Developers do not need to write rendering code so development time is reduced considerably. Rendering software, often called a viewer, is usually distributed as a web browser plug-in or as a stand alone application.
Description languages often provide higher level concepts than graphics APIs that enable developers to focus on the visualization rather than the implementation details of the graphics. VRML, for example, enables complex colouring and lighting of 3D objects to be described without requiring the mathematical knowledge to be able to implement such effects.
Visualizations implemented with description languages can be easily generated by creating a text file with a text editor or as the output of a program. The level of programming ability required to implement a visualization with a description language is significantly less than that required to implement a visualization with a graphics API or toolkit—compare the code required to output a text file with the code required for a 3D interactive visualization. People with limited programming skills are able to implement their own visualizations with description languages.
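As an illustration of how little code is needed, the following sketch generates a bar chart as SVG text. The function name `bar_chart_svg` and its layout parameters are hypothetical, and the sketch assumes a non-empty list of positive values; the output uses only the standard SVG `svg` and `rect` elements.

```python
from typing import List


def bar_chart_svg(values: List[float], bar_width: int = 20,
                  gap: int = 5, height: int = 100) -> str:
    """Describe a simple bar chart as SVG text.

    Assumes `values` is non-empty and that all values are positive.
    """
    top = max(values)
    width = len(values) * (bar_width + gap) - gap
    lines = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for i, value in enumerate(values):
        bar_height = round(height * value / top)
        x = i * (bar_width + gap)
        # SVG's y axis points down, so a bar starts at height - bar_height.
        y = height - bar_height
        lines.append(
            f'  <rect x="{x}" y="{y}" width="{bar_width}" height="{bar_height}"/>'
        )
    lines.append("</svg>")
    return "\n".join(lines)


svg_text = bar_chart_svg([1, 2, 4])
```

Writing `svg_text` to a file with an `.svg` extension is enough for a browser or other SVG viewer to display it; no rendering code is written.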
Description languages have high compression ratios that enable large visualizations to be archived, shared among users, and sent over networks such as the Internet.
Description languages also have several significant drawbacks for implementing visualizations: a lack of visualization concepts, no preservation of data, a lack of flexibility, limited language extensions, and no support environment. Although graphics description languages often provide higher level concepts than graphics APIs, they are not designed for implementing visualizations. They are only able to express a limited number of the concepts that are present in visualizations. High level visualization concepts include layout algorithms, links between objects rather than co-ordinates, rule based behaviors, and real-time updates of visual presentations based on live data.
Visualizations created with description languages do not preserve the data that is visualized. The data is embodied in the visual presentation but the data itself is not contained in the textual description. This has two important implications for users. First, new visualizations cannot be generated from an existing visualization. For example, users may want to select interesting data points in a visualization and summarize them by producing a new visualization of the selections. Second, the data is not available for further analysis. Users are unable to verify that the visualization accurately describes the data or whether it hides aspects of the data which can occur with animated guided tours through data.
Visualization development with description languages is less flexible than programming with graphics APIs and scenegraph toolkits. Developers are constrained by what the language is able to describe, and by what APIs the rendering software exposes to enable it to be integrated into other software.
Textual graphics description languages are often adopted as standards and tightly controlled by the standards body so they cannot be extended. This eliminates incompatibilities caused by non-standard features not being available in all viewers. Prohibiting extensions ensures that visualizations written in a description language will be rendered correctly by any viewer that renders the language standard. Language extensions are often desirable and there is a dichotomy between inflexible standardized languages and flexible languages with non-standard features that inhibit compatibility. A middle ground is needed where description languages can be extended with a well specified extensions mechanism. Such a mechanism must be able to describe the extensions to the language, the parser, and to the rendering software. A graceful degradation system is also required to enable viewing software without an extension to be able to render a visualization.
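Graceful degradation could work along the following lines. This is only a sketch under stated assumptions: each element of a description optionally names the extension it needs and a standard fallback, and a viewer knows which extensions it supports. The dictionary keys (`shape`, `extension`, `fallback`) and the extension names are hypothetical and do not belong to any existing description language.

```python
from typing import Dict, List, Set


def render(elements: List[Dict[str, str]], supported: Set[str]) -> List[str]:
    """Return the shapes a viewer draws, given the extensions it supports."""
    drawn: List[str] = []
    for element in elements:
        extension = element.get("extension")
        if extension is None or extension in supported:
            # Standard element, or an extension this viewer understands.
            drawn.append(element["shape"])
        elif "fallback" in element:
            # Unknown extension: draw the standard substitute instead.
            drawn.append(element["fallback"])
        # Otherwise the element is skipped; the rest of the scene still renders.
    return drawn


description = [
    {"shape": "rect"},
    {"shape": "force_layout_graph", "extension": "layout",
     "fallback": "static_graph"},
    {"shape": "live_ticker", "extension": "live-data"},
]
```

A viewer that supports the `layout` extension draws the full graph, while a viewer with no extensions still draws the rectangle and the static substitute.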
The final disadvantage of description languages is that they enable visualizations to be described, but they do not support visualization. A supportive visualization environment should provide tightly integrated tools to navigate, explore, and analyze the information provided by a visualization. Although description language viewers such as the VRML viewer can render 3D scenes with complex lighting and shading, they are simply presentation tools. A limited set of navigation tools is usually provided but there are no tools for searching, exploring, or analyzing the information because description language viewers are not designed for this purpose.
Viewers can be extended to provide extra facilities to support visualization but they must be explicitly programmed. Description language viewers for VRML, for example, expose APIs to enable them to be integrated into a visualization environment, but integration is cumbersome because the APIs were an afterthought: VRML was not originally designed to be programmatically controlled so the concepts in the language do not facilitate easy integration with other software. Implementing extra facilities requires the skills and experience required to implement visualizations with graphics APIs and toolkits.
The following points summarize the requirements for visualization software discussed in this article.