Visualizing graphs and trees is an important part of a data analyst’s toolkit, whether the goal is to inspect individual data points or to explore the overall structure of a large dataset. There are dozens, if not hundreds, of visualization options available, and they vary widely in capability. Among those that I use regularly are network displays and mappings, which show how the individual pieces of a dataset are connected to one another.
Often the first step in visualizing a large dataset is to map it out as a tree. This helps users understand the structure of the dataset and can be useful for differentiating between groups.
However, this method can be cumbersome for very large datasets. Displaying all the elements requires a lot of scrolling, and it can be difficult to interpret depth or to encode additional attributes.
Tree simplification algorithms
There are several tree simplification algorithms that can be applied to a dataset to make the visualization easier to read. Some are best applied as a last step, while others can be used at any stage. The order of application is critical: it can determine whether the simplifications improve the visualization or make it worse.
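To make the point about ordering concrete, here is a minimal sketch of two simplifications applied in both orders. The two steps (merging single-child chains and collapsing everything below a depth limit) are my own illustrative choices, not specific published algorithms, and the names Node, merge_chains and collapse_below_depth are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node with a label and an ordered list of children (illustrative type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def count_nodes(node: Node) -> int:
    """Count a node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children)


def merge_chains(node: Node) -> Node:
    """Return a copy in which every run of single-child nodes is merged into one node."""
    label = node.label
    current = node
    while len(current.children) == 1:
        current = current.children[0]
        label = f"{label}/{current.label}"
    return Node(label, [merge_chains(child) for child in current.children])


def collapse_below_depth(node: Node, max_depth: int, depth: int = 0) -> Node:
    """Return a copy in which descendants deeper than max_depth are hidden.

    A node at max_depth that still has children keeps its label, with the
    number of hidden descendants appended.
    """
    if depth == max_depth and node.children:
        hidden = count_nodes(node) - 1
        return Node(f"{node.label} (+{hidden} hidden)")
    return Node(
        node.label,
        [collapse_below_depth(child, max_depth, depth + 1) for child in node.children],
    )


# Example tree: root -> a -> b -> (c, d)
tree = Node("root", [Node("a", [Node("b", [Node("c"), Node("d")])])])

# Order 1: merge chains first, then apply the depth limit.
merged_first = collapse_below_depth(merge_chains(tree), max_depth=1)

# Order 2: apply the depth limit first, then merge chains.
collapsed_first = merge_chains(collapse_below_depth(tree, max_depth=1))
```

With the same depth limit, merging first keeps both leaves visible under a single merged root, while collapsing first hides everything below the first level. Which result is better depends on the dataset, but the two orders clearly do not produce the same picture.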
When compared to other approaches, the GFF approach is particularly appealing because it allows the user to focus on relationships between features without being distracted by the labels that usually represent them. This allows the user to investigate relationships that are weak or distant from the target label.
Unlike Attribute-RadViz, which emphasizes the relationship between an attribute and a label (often the most relevant for a feature), the GFF approach focuses on the relationships between the features themselves. This is done by drawing weighted edges between the attributes in a graph.
To achieve this, a similarity graph is constructed with all the attributes in the dataset as vertices. The weight of each edge represents the degree to which the two attributes it connects are similar. The resulting graph, which can be visualized in various ways, has the property that the most similar items are placed in neighboring branches and the least similar ones in the middle.
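The construction described above can be sketched in a few lines. The text does not say which similarity measure is used, so this example assumes the absolute Pearson correlation between two attribute columns; the function names pearson and similarity_graph and the sample columns are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def similarity_graph(columns: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Build a complete weighted graph with one vertex per attribute.

    Each edge is keyed by a sorted pair of attribute names, and its weight is
    the assumed similarity measure: the absolute Pearson correlation.
    """
    edges: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(columns), 2):
        edges[(a, b)] = abs(pearson(columns[a], columns[b]))
    return edges


# Made-up sample data: three attributes measured on four records.
columns = {
    "height": [1.0, 2.0, 3.0, 4.0],
    "weight": [2.0, 4.0, 6.0, 8.0],
    "noise": [1.0, -1.0, -1.0, 1.0],
}
graph = similarity_graph(columns)
```

In this sample, height and weight are perfectly correlated and so receive the heaviest edge, while noise is uncorrelated with both. Any layout that pulls heavily weighted vertices together would then place height and weight next to each other.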
It is important to note that the weight of a given edge may change over time. This can be caused by a change in the values of one of the attributes it connects or by the introduction of new attributes.
In addition, it is important to avoid overusing color in tree diagrams, as this can cause confusion. In particular, it is best to avoid using both red and green in the same treemap: users with red-green color vision deficiency may be unable to tell apart groups of data points whose colors look similar to them.
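One way to follow this advice is to draw group colors from a small palette designed for color vision deficiency and to cap the number of distinct colors. The sketch below assumes the Okabe-Ito palette and a neutral grey for any groups beyond the cap; the function name assign_group_colors and the default cap of five are my own choices, not a standard.

```python
from typing import Dict, Iterable, List

# Okabe-Ito palette (black omitted), designed to stay distinguishable
# under common forms of color vision deficiency.
OKABE_ITO: List[str] = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
]
OTHER_COLOR = "#999999"  # neutral grey shared by all groups beyond the cap


def assign_group_colors(groups: Iterable[str], max_colors: int = 5) -> Dict[str, str]:
    """Map each distinct group to a color, in the order the groups are given.

    Only the first max_colors groups get their own color; the rest share grey,
    which keeps the number of colors on screen small.
    """
    limit = min(max_colors, len(OKABE_ITO))
    colors: Dict[str, str] = {}
    for group in groups:
        if group in colors:
            continue  # a repeated group keeps its first color
        colors[group] = OKABE_ITO[len(colors)] if len(colors) < limit else OTHER_COLOR
    return colors


# Hypothetical groups, listed from most to least important.
palette = assign_group_colors(["sales", "support", "research"], max_colors=2)
```

Passing the groups in order of importance means the most relevant ones receive the distinct colors and the long tail fades into grey.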
The visualization of a graph is an extremely powerful tool for helping analysts understand the relationships and implications of their data. This is especially true when the graph has a complex structure and multiple dimensions.