Capture and reconstruction of the topology of undirected graphs from partial coordinates: a matrix completion based approach
Ramasamy, Sridhar, author
Jayasumana, Anura, advisor
Paffenroth, Randy, committee member
Ray, Indrajit, committee member
Pasricha, Sudeep, committee member
With the advancement in science and technology, new types of complex networks have become common place across varied domains such as computer networks, Internet, bio-technological studies, sociology, and condensed matter physics. The surge of interest in research towards graphs and topology can be attributed to important applications such as graph representation of words in computational linguistics, identification of terrorists for national security, studying complicated atomic structures, and modeling connectivity in condensed matter physics. Well-known social networks, Facebook, and twitter, have millions of users, while the science citation index is a repository of millions of records and citations. These examples indicate the importance of efficient techniques for measuring, characterizing and mining large and complex networks. Often analysis of graph attributes to understand the graph topology and embedded properties on these complex graphs becomes difficult due to causes such need to process huge data volumes, lack of compressed representation forms and lack of complete information. Due to improper or inadequate acquiring processes, inaccessibility, etc., often we end up with partial graph representational data. Thus there is immense significance in being able to extract this missing information from the available data. Therefore obtaining the topology of a graph, such as a communication network or a social network from incomplete information is our research focus. Specifically, this research addresses the problem of capturing and reconstructing the topology of a network from a small set of path length measurements. An accurate solution for this problem also provides means of describing graphs with a compressed representation. A technique to obtain the topology from only a partial set of information about network paths is presented. Specifically, we demonstrate the capture of the network topology from a small set of measurements corresponding to a) shortest hop distances of nodes with respect to small set of nodes called as anchors, or b) a set of pairwise hop distances between random node pairs. These two measurement sets can be related to the Distance matrix D, a common representation of the topology, where an entry contains the shortest hop distance between two nodes. In an anchor based method, the shortest hop distances of nodes to a set of M anchors constitute what is known as a Virtual Coordinate (VC) matrix. This is a submatrix of columns of D corresponding to the anchor nodes. Random pairwise measurements correspond to a random subset of elements of D. The proposed technique depends on a low rank matrix completion method based on extended Robust Principal Component Analysis to extract the unknown elements. The application of the principles of matrix completion relies on the conjecture that many natural data sets are inherently low dimensional and thus corresponding matrix is relatively low ranked. We demonstrate that this is applicable to D of many large-scale networks as well. Thus we are able to use results from the theory of matrix completion for capturing the topology. Two important types of graphs have been used for evaluation of the proposed technique, namely, Wireless Sensor Network (WSN) graphs and social network graphs. For WSN examples, we use the Topology Preserving Map (TPM), which is a homeomorphic representation of the original layout, to evaluate the effectiveness of the technique from partial sets of entries of VC matrix. A double centering based approach is used to evaluate the TPMs from VCs, in comparison with the existing non-centered approach. Results are presented for both random anchors and nodes that are farthest apart on the boundaries. The idea of obtaining topology is extended towards social network link prediction. The significance of this result lies in the fact that with increasing privacy concerns, obtaining the data in the form of VC matrix or as hop distance matrix becomes difficult. This approach of predicting the unknown entries of a matrix provides a novel approach for social network link predictions, and is supported by the fact that the distance matrices of most real world networks are naturally low ranked. The accuracy of the proposed techniques is evaluated using 4 different WSN and 3 different social networks. Two 2D and two 3D networks have been used for WSNs with the number of nodes ranging from 500 to 1600. We are able to obtain accurate TPMs for both random anchors and extreme anchors with only 20% to 40% of VC matrix entries. The mean error quantifies the error introduced in TPMs due to unknown entries. The results indicate that even with 80% of entries missing, the mean error is around 35% to 45%. The Facebook, Collaboration and Enron Email sub networks, with 744, 4158, 3892 nodes respectively, have been used for social network capture. The results obtained are very promising. With 80% of information missing in the hop-distance matrix, a maximum error of only around 6% is incurred. The error in prediction of hop distance is less than 0.5 hops. This has also opened up the idea of compressed representation of networks by its VC matrix.
Includes bibliographical references.
Includes bibliographical references.
wireless sensor networks