Estimating Correlation from TDA Data Structure

“ReNom TDA” is a module for mapping high-dimensional data into a topological space (a space that carries a topology but no notion of distance) and for visualizing and analyzing the result. By revealing the structure of the data and making the relationships between variables intuitive, ReNom TDA helps data analysis engineers with modelling. Beyond preprocessing data and exploring its structure, ReNom TDA can also serve as an advanced profiling tool. For example, by visualizing the connections within complex data, users can carry out customer data analysis, machine data analysis, financial analysis, unauthorized access analysis, and cyber security analysis.

Visualizing Data Using Topological Geometry

Topological geometry is the field of mathematics that focuses on the properties of a shape that are preserved even when it is deformed continuously, and on the connections it has in a topological space, where distance has no importance. Unlike traditional geometry, length and angle are not treated as features, so the way the data is represented can differ from traditional methods. By projecting data into a topological space, features that are lost during dimensional reduction can be retained.

Estimating Correlation by Comparing Objective Variable and Explanatory Variable


Resolving Problems that Occur in Traditional Dimensional Reduction

Dimensional reduction methods such as PCA reduce the dimensions of data by projecting it onto the axes that preserve as much of the data's variance as possible. Therefore, when the dimensions of data with many variables are reduced, the information in variables that do not follow those axes is lost.
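
The information loss can be made concrete with a small sketch. It assumes NumPy and a synthetic three-variable data set; the function name `pca_project` and the data are illustrative and are not part of ReNom TDA.

```python
import numpy as np

def pca_project(data, n_components):
    """Project rows of `data` onto the top principal axes.

    Returns the projected points and the fraction of the total
    variance explained by each kept axis.
    """
    centered = data - data.mean(axis=0)
    # eigh returns eigenvalues in ascending order, so sort descending.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projected = centered @ eigvecs[:, :n_components]
    explained = eigvals[:n_components] / eigvals.sum()
    return projected, explained

# Synthetic data (an assumption for illustration): two high-variance
# variables and one low-variance variable.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(scale=5.0, size=n)
y = rng.normal(scale=3.0, size=n)
z = rng.normal(scale=0.5, size=n)  # mostly discarded by a 2-axis projection
data = np.column_stack([x, y, z])

projected, explained = pca_project(data, n_components=2)
lost = 1.0 - explained.sum()
print("variance kept per axis:", explained)
print("fraction of variance lost:", lost)
```

Here the third variable contributes little variance, so a two-axis projection discards almost all of what it carries, even if that variable matters for the analysis.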

What Does TDA Do?

“ReNom TDA” can reduce dimensions while maintaining the connections that the data has in high dimensions, so data characteristics that were hard to visualize with traditional methods become visible.
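
One common way to keep high-dimensional connections while reducing dimensions is a Mapper-style construction: cover the range of a low-dimensional "lens" with overlapping intervals, cluster the original points inside each interval, and connect clusters that share points. The following is a minimal sketch using only the Python standard library. Whether ReNom TDA follows exactly these steps is an assumption, and the names `mapper_graph`, `resolution`, `overlap`, and `eps` are illustrative.

```python
import math
from itertools import combinations

def single_linkage(points, indices, eps):
    """Group `indices` into clusters; points within `eps` are linked."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a, b in combinations(indices, 2):
        if math.dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)

    clusters = {}
    for i in indices:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())

def mapper_graph(points, lens, resolution, overlap, eps):
    """Build a Mapper-style graph.

    lens: one filter value per point.
    resolution: number of intervals covering the lens range.
    overlap: fraction of the interval width added on each side.
    """
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / resolution
    nodes = []
    for k in range(resolution):
        start = lo + k * width - overlap * width
        end = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(lens) if start <= v <= end]
        # Clustering uses the original coordinates, not the lens values.
        nodes.extend(single_linkage(points, members, eps))
    # Two nodes are connected when they share at least one data point.
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges

# Points on a circle; the lens is the x coordinate.
n = 60
points = [(math.cos(2 * math.pi * t / n), math.sin(2 * math.pi * t / n))
          for t in range(n)]
lens = [p[0] for p in points]
nodes, edges = mapper_graph(points, lens, resolution=5, overlap=0.3, eps=0.3)
print(len(nodes), "nodes,", len(edges), "edges")
```

Projecting the circle onto its x coordinate alone would collapse the upper and lower arcs onto each other. Because clustering is done in the original space, the two arcs stay separate nodes and the resulting graph forms a loop.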

Comparison of Various Algorithms with ReNom TDA GUI

With the GUI-based web application, users can compare results based on different algorithms using ReNom TDA. For example, users can compare a TDA result obtained with one dimensional reduction method against results from other dimensional reduction methods. Moreover, parameters in the GUI can be selected to run unsupervised clustering such as K-means, or supervised classification such as K-Nearest Neighbor or Random Forest.

The top row of the left figure shows the “iris classification data” after dimension reduction through PCA, TSNE, and AutoEncoder. Each point corresponds to a label. The two middle rows show the results of two unsupervised clustering methods, K-means and DBSCAN, on the same data. Each color corresponds to a class label assigned by the clustering criteria. The results show that neither K-means nor DBSCAN worked when combined with PCA. K-means on the TSNE result gave a roughly correct classification, while AutoEncoder + DBSCAN was not able to classify properly. In the TDA result, on the other hand, the points are already grouped by class, with yellow appearing between the red and green sections near the border of the clusters, which indicates the presence of data in which multiple classes are mixed. Instead of clarifying the classification boundary, ReNom TDA gives users the opportunity to think about the meaning and classification of such data.

ReNom TDA GUI

With the web application, users can visualize the graph structure of the TDA result by loading a CSV file and setting the parameters, so data can be visualized and analyzed without programming.

Python API

By using the Python module, users can make the same comparisons as in the web application and configure them in more detail. In addition, a point cloud can be created by combining multiple dimension reduction techniques.
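
As a rough illustration of combining reduction techniques into one point cloud, the sketch below takes one coordinate from each of two lenses and stacks them. It assumes NumPy; the helper names and the choice of lenses (first principal component, distance from the centroid) are hypothetical and do not reflect the ReNom TDA Python API.

```python
import numpy as np

def first_principal_component(data):
    """Coordinate of each row along the direction of largest variance."""
    centered = data - data.mean(axis=0)
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def distance_from_centroid(data):
    """Euclidean distance of each row from the data centroid."""
    return np.linalg.norm(data - data.mean(axis=0), axis=1)

def min_max_scale(values):
    """Rescale values to [0, 1] so the lenses are comparable."""
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

def combine_lenses(data, lenses):
    """Stack one scaled coordinate per lens into a point cloud."""
    return np.column_stack([min_max_scale(lens(data)) for lens in lenses])

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))  # synthetic 4-variable data (assumption)
cloud = combine_lenses(data, [first_principal_component,
                              distance_from_centroid])
print(cloud.shape)  # (100, 2)
```

Each column of `cloud` comes from a different technique, so the resulting two-dimensional point cloud reflects both views of the data.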