Kernel Density Estimation: an image based technique


Data Exploration with data density maps

In this chapter I will detail research mainly performed with FromDaDy regarding data exploration with density map computation. This work tried to address the exploration of dense datasets. Since data density hinders data exploration with numerous overlapping visual marks, I focused part of my research on this topic.
The popular scatterplot method (Cleveland, 1993) is not sufficient to display all the information because a lot of overlapping occurs. When transforming data to graphical marks, a regular visualization system draws each graphical mark independently from the others: if a mark to be drawn happens to be at the same position as previously drawn marks, the system replaces (or merges using color blending) the pixels in the resulting image. The standard visualization of this pixel accumulation process is not sufficient to accurately assess the density of the marks. For instance, Figure 12 left shows one day of recorded aircraft trajectories over France with the standard color blending method. Figure 12 right shows the same dataset with a 3D shaded density map: one can easily perceive that the data density is drastically higher over the Paris area, which is not that obvious in the standard view.
I investigated this density computation algorithm with a hardware-accelerated extension of FromDaDy (Hurter et al., 2009b) to support the exploration of aircraft trajectories (Hurter et al., 2010b) with the Kernel Density Estimation (Silverman, 1986).

Kernel Density Estimation: an image based technique

Kernel Density Estimation (KDE) (Silverman, 1986) is a mathematical method that computes density by convolving a kernel K (Figure 13: Kernel profiles) with the data points. This method produces a smooth data aggregation, which also reduces data sampling artefacts and is suitable for showing an overview of large amounts of data.
The density 𝜌 can be visualized as a 2D height field by a straightforward color map, contour plot, or terrain map (Figure 12). Landscape visualizations with hills and valleys have been shown to be easy to interpret (Wise et al., 1995). For quantitative analysis, a contour plot is preferred over a color map, since estimating values from colors is perceptually difficult. Since contour plots only use isolines, color can be used for other purposes. In 2D, the density plot becomes visually more detailed with shading and can be enriched into a contour map (van Wijk and Telea, 2001). KDE maps can be interactively explored and modified (Van Liere and Leeuw, 2003). The KDE algorithm has also been used to investigate object movements (Willems et al., 2009; Scheepens et al., 2011).
The initial version of FromDaDy computed the density map and stored it in an off-screen buffer with additive blending enabled. To achieve this, FromDaDy used a 32-bit floating-point color texture. More recent versions use OpenCL, with full support of the convolution process on tabular data. To ensure data integrity during multi-threaded data processing, FromDaDy uses atomic functions. This GPU implementation allows interactive manipulation and visualization of the density map. The computation time varies with the number of points in the dataset and the kernel size. As an example, the frame rate is around 10 frames per second with 400 000 points, a kernel point size of 20 pixels, and an Nvidia GTX 275 graphics card.
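The accumulation principle can be illustrated with a minimal CPU sketch in plain Python. It is not FromDaDy's GPU code: the function names, the grid size and the sample points are hypothetical, and the triangular profile is only one possible kernel. Each data point adds ("splats") its kernel footprint into a floating-point grid, which plays the role of the off-screen buffer with additive blending.

```python
import math

def triangular_kernel(distance, radius):
    """Triangular profile: 1 at the centre, falling linearly to 0 at the radius."""
    return max(0.0, 1.0 - distance / radius)

def density_map(points, width, height, radius):
    """Accumulate one kernel footprint per data point into a float grid."""
    grid = [[0.0] * width for _ in range(height)]
    for px, py in points:
        # Only visit the cells covered by the kernel footprint.
        x_min = max(0, math.floor(px - radius))
        x_max = min(width - 1, math.ceil(px + radius))
        y_min = max(0, math.floor(py - radius))
        y_max = min(height - 1, math.ceil(py + radius))
        for y in range(y_min, y_max + 1):
            for x in range(x_min, x_max + 1):
                d = math.hypot(x - px, y - py)
                grid[y][x] += triangular_kernel(d, radius)  # additive accumulation
    return grid

# Hypothetical data: three points close together, one isolated point.
points = [(4, 4), (4, 4), (5, 4), (12, 9)]
grid = density_map(points, width=16, height=12, radius=3.0)
```

In this sketch the cell under the three neighbouring points accumulates a much higher value than the cell under the isolated point, which is the property the density map exposes. On a GPU, the inner accumulation is what requires additive blending or atomic operations, since several points may write to the same cell concurrently.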

Density map visualization and interaction techniques

In the following, I describe a set of interaction and visualization techniques with density maps to perform interactive data exploration. Thanks to the GPU implementation, users can interact with the density map in real time. The process is divided into three steps:
 Users can choose which data dimensions to accumulate, and can adjust the kernel size,
 Users can “brush, pick and drop” data to remove them from, or add them to the density map,
 Users can explicitly choose to use the computed density values with one of the available design customizations (color, size or position).

Brushing Technique with density maps

Originally, FromDaDy supported the brushing of trajectories with their spreading across views. This interaction helps to select an entire trajectory by brushing only a few of its points, but in certain cases the data exploration requires only parts of trajectories. We therefore added the brushing of points, which allows individual points to be selected and manipulated. The user can brush in the standard view or in the accumulator view, using a round brush of configurable size to select trajectories or points (Figure 15).
Thanks to the brushing technique, the user can select and highlight parts of the displayed data. By pressing the space bar, the user can extract previously selected data and attach them to the mouse cursor. By default, the selected data are picked: they are removed from the view, and appear in a “fly-over” view. When the user presses the space bar for the second time, a drop occurs in another view under the cursor. Although it resembles a regular drag’n’drop operation, we prefer to use the term “pick’n’drop” (Rekimoto, 1997) in the sense that data is removed from the previous view and is attached to the mouse even if the space bar is released.
We also added the pick and drop paradigm to the accumulator map. Figure 16 shows the difference between the point and the trajectory mode. With the point mode only the brushed points are selected and isolated. With the trajectory mode, brushing points also selects their entire trajectory.
Figure 16: Pick and drop of points or trajectories on a density map.
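The difference between the point mode and the trajectory mode can be sketched as follows. This is an illustrative sketch only: the record layout (a `trajectory` identifier with `x` and `y` screen coordinates), the function names and the sample data are assumptions, not FromDaDy's data model.

```python
import math

def brush(points, center, radius, mode="point"):
    """Return the points selected by a round brush.

    mode="point": only the points under the brush are selected.
    mode="trajectory": every point of any trajectory touched by the brush.
    """
    cx, cy = center
    hit = [p for p in points if math.hypot(p["x"] - cx, p["y"] - cy) <= radius]
    if mode == "point":
        return hit
    touched = {p["trajectory"] for p in hit}
    return [p for p in points if p["trajectory"] in touched]

def pick(points, selected):
    """Remove the selected points from a view and return (remaining, picked)."""
    picked_ids = {id(p) for p in selected}
    remaining = [p for p in points if id(p) not in picked_ids]
    return remaining, list(selected)

# Hypothetical view content: three short trajectories.
view = [
    {"trajectory": "A", "x": 0, "y": 0},
    {"trajectory": "A", "x": 5, "y": 0},
    {"trajectory": "A", "x": 10, "y": 0},
    {"trajectory": "B", "x": 5, "y": 1},
    {"trajectory": "B", "x": 5, "y": 8},
    {"trajectory": "C", "x": 20, "y": 20},
]

point_selection = brush(view, center=(5, 0), radius=2, mode="point")
trajectory_selection = brush(view, center=(5, 0), radius=2, mode="trajectory")
remaining, picked = pick(view, trajectory_selection)
```

With the same brush position, the point mode selects only the two points under the brush, whereas the trajectory mode extends the selection to all points of trajectories A and B. After the pick, only trajectory C remains in the source view.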
The pick and drop of brushed data, from the accumulation map to another view, is useful during the exploration process for three reasons:
 It helps to isolate data to perform separate analyses,
 In the trajectory mode, the brush selects entire trajectories. When picking these trajectories, a new accumulation map is computed and unveils new accumulation initially hidden by the picked trajectories (image d in Figure 16),
 With the point and the trajectory mode, FromDaDy uses the full gradient scale in such a way that the minimum accumulation value has the first gradient color and the maximum accumulation value has the last gradient color. When brushing/picking and dropping points with minimum or maximum accumulation value, FromDaDy computes a new accumulation map that unveils a new maximum value with the maximum gradient color and then unveils new patterns (comparison of Figure 15 and Figure 16).
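The effect of the full gradient scale described in the last item can be sketched with a simple min-max normalization. The function name, the number of gradient colors and the accumulation values are hypothetical; in the real tool, removing data triggers a recomputation of the whole accumulation map, which is simplified here to removing one value from a list.

```python
def to_gradient_index(values, n_colors):
    """Map values to gradient indices so that min -> 0 and max -> n_colors - 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0 for _ in values]
    return [round((v - lo) / span * (n_colors - 1)) for v in values]

# Hypothetical accumulation values: one cell dominates all the others.
accumulation = [1, 2, 3, 4, 100]
before = to_gradient_index(accumulation, n_colors=5)

# Picking the data that caused the maximum rescales the gradient.
after = to_gradient_index(accumulation[:-1], n_colors=5)
```

Before the pick, the dominant value compresses all other cells onto the first gradient color (`[0, 0, 0, 0, 4]`). Once it is removed, the remaining cells spread over the whole gradient (`[0, 1, 3, 4]`), which is how new patterns become visible.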

Interactive lighting direction

In order to compute the shaded density map, one can consider it as a height map and use the standard Phong light computation (Phong, 1975). Since this technique needs a normal vector, a normal map can be generated by computing the gradient of the density map. The user can choose to display the accumulator map with, or without, this shading and interactively set the lighting direction with the mouse pointer. High accumulation values are considered as mountains that create shade, and low accumulation values are considered as valleys. By pointing with the mouse pointer to a specific area, the lighting direction can be interactively manipulated. This manipulation allows furrows or ridges to be emphasized. When the light comes from the left or from the right, vertical furrows are accentuated, whereas a lighting direction from the top or bottom emphasizes horizontal furrows (Figure 17).
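A minimal sketch of this shading, assuming central differences for the gradient and keeping only the diffuse term of the Phong model (no ambient or specular component). The function name, the `z_scale` parameter and the small test height map are hypothetical.

```python
import math

def shade(height_map, light, z_scale=1.0):
    """Diffuse intensity per cell, with normals derived from the height gradient.

    `light` is a vector pointing from the surface towards the light source.
    """
    rows, cols = len(height_map), len(height_map[0])
    length = math.sqrt(sum(c * c for c in light))
    lx, ly, lz = (c / length for c in light)
    shaded = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Central differences, clamped at the borders of the map.
            x0, x1 = max(x - 1, 0), min(x + 1, cols - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, rows - 1)
            dx = (height_map[y][x1] - height_map[y][x0]) / (x1 - x0) if x1 > x0 else 0.0
            dy = (height_map[y1][x] - height_map[y0][x]) / (y1 - y0) if y1 > y0 else 0.0
            # The normal of the surface z = h(x, y) is (-dh/dx, -dh/dy, 1).
            nx, ny, nz = -dx * z_scale, -dy * z_scale, 1.0
            n_len = math.sqrt(nx * nx + ny * ny + nz * nz)
            shaded[y][x] = max(0.0, (nx * lx + ny * ly + nz * lz) / n_len)
    return shaded

# Hypothetical height map: a vertical ridge in the middle column.
ridge = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(3)]
lit_from_left = shade(ridge, light=(-1.0, 0.0, 1.0))
lit_from_top = shade(ridge, light=(0.0, -1.0, 1.0))
```

In this sketch, lighting from the left makes the left flank of the ridge brighter than its right flank, so the vertical structure stands out. Lighting from the top gives both flanks the same intensity, so the same ridge is far less visible, which is consistent with the behaviour described above.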

Density maps as data sources

The color blending process computes an implicit density map since it combines pixels with the following blending formula:
OutputPixel = SourcePixel × SourceBlendFactor + DestinationPixel × DestinationBlendFactor, where each term is a 4D color vector (r, g, b, a) and the × symbol denotes component-wise multiplication.
Even if this formula can be customized, the pixels produced do not always provide an efficient quantitative comparison of the accumulation value. In Figure 18, the visualized database is a one-day record of aircraft flight plans (the routes that aircraft are supposed to follow). The view shows a matrix of points with the aircraft departure airport on the X axis and the aircraft type (Boeing 747, Airbus A380…) on the Y axis. Since many aircraft share the same pair of departure airport and aircraft type, many points in the matrix have the same location on the screen. In the standard blended view, the brightest points show the most frequent pairs in one day's traffic (Figure 18 left).
Figure 18: Matrix view with standard color blending (left) and customized visual mapping with the size (right).
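The limitation of blended pixels as a quantitative measure can be illustrated with a small sketch of the blending formula. The clamping to [0, 1] is an assumption that models a normalized color buffer (for instance 8 bits per channel); the mark color and the blend factors are hypothetical.

```python
def blend(source, destination, source_factor, destination_factor):
    """Component-wise blending of two (r, g, b, a) vectors, clamped to [0, 1]."""
    return tuple(
        min(1.0, s * sf + d * df)
        for s, d, sf, df in zip(source, destination, source_factor, destination_factor)
    )

ONE = (1.0, 1.0, 1.0, 1.0)        # additive blending: both factors set to one
mark = (0.25, 0.25, 0.25, 1.0)    # hypothetical color of one graphical mark
pixel = (0.0, 0.0, 0.0, 0.0)      # initially empty pixel

red_after_each_mark = []
for _ in range(6):
    pixel = blend(mark, pixel, ONE, ONE)
    red_after_each_mark.append(pixel[0])
```

Under these assumptions, the pixel saturates after four overlapping marks: four, five and six marks produce exactly the same color, so the displayed brightness no longer reflects the accumulation value. Storing the accumulation in an unclamped floating-point buffer avoids this loss.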
FromDaDy offers another solution with a specific visual mapping. First, the user defines the data fields he or she wants to investigate (departure, aircraft type). Secondly, the system computes the corresponding density map. Finally, the user defines the visual mapping of the output view. In Figure 18, the density is mapped to the size and the color. Figure 19 summarizes this configuration. This process operates as if a new field had been added to the dataset: the computed density map acts as a new data source.
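The idea of density as a new data field can be sketched with a simple counting scheme. This is an illustration under stated assumptions: the field names, the flight records, the function names and the size range are all made up, and the density is reduced here to an exact count per pair rather than a kernel-based estimate.

```python
from collections import Counter

def add_density_field(records, keys, field="density"):
    """Return copies of the records with an extra field counting identical key tuples."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [{**r, field: counts[tuple(r[k] for k in keys)]} for r in records]

def size_from_density(density, max_density, min_size=2.0, max_size=20.0):
    """Linear visual mapping from a density value to a mark size in pixels."""
    return min_size + (max_size - min_size) * density / max_density

# Made-up flight plan records, for illustration only.
flights = [
    {"departure": "LFPG", "aircraft_type": "A320"},
    {"departure": "LFPG", "aircraft_type": "A320"},
    {"departure": "LFPG", "aircraft_type": "A320"},
    {"departure": "LFPO", "aircraft_type": "B738"},
]

enriched = add_density_field(flights, keys=("departure", "aircraft_type"))
max_density = max(r["density"] for r in enriched)
sizes = [size_from_density(r["density"], max_density) for r in enriched]
```

Once the `density` field exists, it can be connected to any visual variable (size, color or position) exactly like a field that was present in the original dataset.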

Application domains

In this section I will give some examples of usage in specific application domains. Additional examples are provided in a survey dedicated to air traffic control (Hurter et al., 2014a).

Exploration of aircraft proximity

The main activity of air traffic controllers consists of maintaining safe distances between aircraft by giving clearances to pilots (heading, speed, or altitude orders). However, when aircraft fly below the safety distance, an alarm is triggered. These alarms are common since Air Traffic Controllers supervise aircraft in dense areas. Nevertheless they are all monitored to avoid aircraft collision. In this example, the dataset contains only safety distance alarms with the recorded aircraft positions. The user connects the X and Y position of each aircraft to the X and Y density entries. The computed accumulation is visualized with a blue (low accumulation values) to red (high accumulation values) color scale.
Figure 20. Density map of the safety-distance alarms triggered over France over a one year period. Red colored areas correspond to dense alarm areas where aircraft triggered proximity alerts
Paris is of course the main dense area with the largest proportion of alarms. However, when visualizing all these alarms over a one year period, users discovered that some unexpected dense areas emerge (Figure 20): for example Montpellier, which is a far smaller airport, shows a lot of alarms. The user can use the selective brushing to retrieve the exact number of alarms.

Patterns detection in dense datasets

This first example shows how density maps and their height map visualization can be used to isolate relevant aircraft trajectories. With the recording of one day’s flights over France, each datum represents the position of an aircraft at a certain time. The corresponding density map (Figure 21) is the result of the accumulation of plots with a triangular kernel. Hence, the X and Y position of each plot is mapped on the X and Y dimensions of the density map and on the X and Y dimensions of the resulting image. The image produced shows very dense areas over the main airports in France (Roissy, Orly, Lyon…), which was expected.
Figure 21: Design configuration and accumulation maps without shading.
When visualizing the density map with illumination, circular shapes emerge (Figure 21, right) that were not initially noticeable (Figure 21, left). Thanks to the shading process, density gradients are emphasized, which is why these circular hills stand out. The user can then brush these shapes to extract the aircraft that cause this accumulation of recorded data. Thus, the user brushes the hills and drops these data onto a second view. The user discovers that the picked trajectories correspond to stationary radar test plots recorded throughout the whole day. These radar test plots are mandatory to assess the correctness of the whole radar data processing (merging of multiple radar sources) (Renso et al., 2013).

Density flaw detection in a dense dataset

In this example, we use the data density as a tool to highlight flaws in the dataset. The dataset is a one-day record of aircraft positions. Radars send data over networks with a constant stream rate (in our dataset, one radar position of each aircraft every 4 to 8 minutes). Figure 22 shows the content of our dataset. The X screen axis shows the time of each radar plot and the Y screen axis shows the aircraft’s identifier. Since the identifier of each aircraft is a number incremented over the day, the resulting shape shows a remarkable continuous pattern in which each horizontal line represents the lifetime of one flight (each flight has a unique identifier). The longest lines at the bottom of the visualization are the stationary Radar test points recorded all day long. The width of this shape gives the average flight duration in the dataset: it is about 2 and a half hours which represents the average time to cross France by airplane (most aircraft cross France at a high altitude).

Table of contents

1 Introduction
1.1 Image-based assets
1.2 Image-based algorithm opportunities
1.3 Structure of the presented document
1.4 Timeline of projects and student advisory
2 From visualization characterization to data exploration
2.1 Evaluation of visualizations
2.2 Application domain
2.2.1 Instance of design evaluation: the radar comet
2.3 The Card and Mackinlay model improvements
2.4 Characterization or data exploration tool
2.5 FromDaDy: from data to Display
2.6 Conclusion
3 Data Exploration with data density maps
3.1 Kernel Density Estimation: an image based technique
3.2 Density map visualization and interaction techniques
3.2.1 Brushing Technique with density maps
3.2.2 Interactive lighting direction
3.2.3 Density maps as data sources
3.3 Application domains
3.3.1 Exploration of aircraft proximity
3.3.2 Patterns detection in dense datasets
3.3.3 Density flaw detection in a dense dataset
3.3.4 Exploration of gaze recording
3.4 Conclusion
4 Edge bundling
4.1 MoleView
4.2 SBEB: Skeleton-based edge bundling
4.3 KDEEB: Kernel Density Edge Bundling
4.4 Dynamic KDEEB
4.5 3D KDEEB
4.6 Directional KDEEB
4.7 Conclusions
5 Animation as an efficient data exploration tool
5.1 From the Mole View to Color Tunneling: the animation as a data exploration tool
5.2 GPGPU usages to address scalability issues
5.2.1 GP/GPU technique and history
5.2.2 Instances of GPU usages
5.3 Color Tunneling: a scalable solution to large dataset manipulation with image based interaction
5.4 Conclusions
6 Strip’TIC: Striping for tangible interface for controllers
6.1 Strip’TIC and image based techniques
6.2 Conclusion
7 Research program
7.1 Computed graphic and raster data
7.2 Raster data inaccuracy
7.3 Technical challenges
7.4 Personal image based road map
7.4.1 Tasks
7.4.2 Dynamic graphs
7.4.3 Algorithm setting
7.4.4 Bundling faithfulness and accuracy
7.5 Proposal to improve bundling techniques
7.5.1 Particle system
7.6 Image based algorithm in application domains
7.6.1 Cognitive maps
7.6.2 Eye tracking
7.6.3 Image processing: skin cancer investigation
7.6.4 Point cloud display
7.6.5 Movement data analysis
7.7 Conclusion
8 Bibliography
9 Selected research papers