Topological data analysis and UNICEF poverty surveys
Recent years have seen the emergence of new techniques designed to handle the growing volume and complexity of data. One of these is topological data analysis (TDA), in which data are viewed as a point cloud whose points are thought of as samples drawn from a topological space. The geometry comes from a choice of metric, that is, a notion of distance between data points. The metric can be defined in various ways, but once a useful one is selected, the data cloud inherits the structure of a topological space and can be studied with the techniques of algebraic topology. In this way one can obtain information about the “shape” of the data cloud, and this information can in turn often be used to infer new correlations or patterns in the data.
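To make the idea of “shape” a little more concrete, here is a minimal sketch that looks only at the simplest topological feature, connected components (0-dimensional persistent homology), as the distance scale grows. It assumes the Euclidean metric by default (any other metric can be passed in), and the function name `h0_persistence` and the toy point cloud are illustrative choices, not part of the project described here.

```python
import math
from itertools import combinations


def h0_persistence(points, metric=math.dist):
    """Barcode of 0-dimensional persistent homology (connected components)
    for the Vietoris-Rips filtration of a finite point cloud.

    Every point is born at scale 0. As the scale r grows, two points are
    joined once their distance is at most r; whenever such an edge merges
    two separate components, one of them dies at that scale. Processing
    edges from shortest to longest is Kruskal's algorithm, so the finite
    death times are the edge lengths of a minimum spanning tree.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances under the chosen metric, shortest first.
    edges = sorted(
        (metric(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    bars = []
    for r, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:        # this edge merges two components
            parent[root_i] = root_j
            bars.append((0.0, r))   # one of them dies at scale r
    if n:
        bars.append((0.0, math.inf))  # one component survives forever
    return bars


# Two tight pairs of points, 10 units apart.
cloud = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(h0_persistence(cloud))
# [(0.0, 1.0), (0.0, 1.0), (0.0, 10.0), (0.0, inf)]
```

The two short bars record the small gaps inside each pair, while the long bar ending at 10.0 records the gap between the pairs: the cloud “looks like” two clusters over a wide range of scales. Higher-dimensional features such as loops require building the full Rips complex, which dedicated TDA libraries handle.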
Wellesley College student Jun Ru Anderson is applying TDA to data from MICS (Multiple Indicator Cluster Surveys), household surveys funded and conducted around the world by UNICEF. Part of what makes MICS data important is that it is compiled into a wealth index, a composite measure of the socioeconomic status of the surveyed households. The wealth index is one of the most important economic markers and often plays a key role in shaping policy. MICS studies are carried out in more than a hundred countries and have significant political and social ramifications, so any new insight into the data could be transformative.
Anderson is looking for new features of the data using persistent homology and the Kepler Mapper graph. For example, she has discovered relationships between child immunization and the wealth index that standard statistical analysis had not revealed, and she is investigating how shortest paths in the data cloud might be interpreted as a measure of economic mobility.
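The Mapper construction behind a Kepler Mapper graph, and the notion of a shortest path through it, can be sketched in a few lines. This is a simplified, hypothetical stand-in written from scratch, not the KeplerMapper library's API: it assumes a one-dimensional lens (filter) function, an evenly spaced cover by overlapping intervals, and single-linkage clustering at a fixed cutoff `eps`. The names `mapper_graph`, `single_linkage`, and `shortest_path`, the parameter defaults, and measuring path length in hops between Mapper nodes are all illustrative assumptions rather than the method used in the project.

```python
import math
from collections import deque
from itertools import combinations


def single_linkage(indices, points, eps):
    """Cluster the given points: connected components of the graph that
    links two points whenever their distance is below `eps`."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) < eps}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=1.5):
    """Simplified Mapper: cover the range of the lens with overlapping
    intervals, cluster the points falling in each interval, and connect
    two clusters (nodes) whenever they share at least one data point."""
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals or 1.0  # avoid zero width for a constant lens
    pad = overlap * width
    nodes = []
    for k in range(n_intervals):
        a, b = lo + k * width - pad, lo + (k + 1) * width + pad
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))
    edges = {u: set() for u in range(len(nodes))}
    for u, v in combinations(range(len(nodes)), 2):
        if nodes[u] & nodes[v]:
            edges[u].add(v)
            edges[v].add(u)
    return nodes, edges


def shortest_path(edges, source, target):
    """Fewest-hop path between two Mapper nodes via breadth-first search;
    returns None if the nodes lie in different connected pieces."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = previous[u]
            return path[::-1]
        for v in sorted(edges[u]):
            if v not in previous:
                previous[v] = u
                queue.append(v)
    return None


# Ten points on a line, using the x-coordinate as the lens.
points = [(x, 0) for x in range(10)]
lens = [p[0] for p in points]
nodes, edges = mapper_graph(points, lens)
start = next(u for u, c in enumerate(nodes) if 0 in c)
end = next(u for u, c in enumerate(nodes) if 9 in c)
print(shortest_path(edges, start, end))  # [0, 1, 2, 3]
```

With real survey data, `points` would be encoded survey responses and `lens` could, for instance, be the wealth index; both choices are hypothetical here. A path from a low-lens node to a high-lens node then traces a chain of overlapping groups of similar respondents. The KeplerMapper library itself supports multi-dimensional covers and arbitrary scikit-learn clusterers, which this sketch omits.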