Extracting knowledge from networks

Murata laboratory specializes in research on artificial intelligence and Web mining. There are several kinds of data that can be represented as networks or graphs, and we believe that mining networks is important for extracting knowledge from the networks. Our recent activities include: (1) detecting dense sub-networks from given networks (community detection), (2) finding important nodes in given networks (ranking), and (3) predicting future growth of given networks (link prediction).

Link Mining

Predicting relations between words in Semantic Web

Predicting relations between words is important for building large-scale Semantic Web such as Knowledge Graph. We have developed Deep Neural Network (called RDFDNN) for the purpose of accurate prediction of relations between given subject and object in Resource Description Framework(RDF). RDFDNN predicts relations more accurately than previous methods such as TransE and TransR.

Yohei Onuki, Tsuyoshi Murata, Shun Nukui, Seiya Inagi, Xule Qiu, Masao Watanabe and Hiroshi Okamoto, "Predicting relations between RDF entities by Deep Neural Networks" (in Japanese), 41st meeting of Special Interest Group on Semantic Web and Ontology (SWO), SIG-SWO-041-02, 8 pages, 2017. (JSAI Incentive Award 2016)

Tsuyoshi Murata, Yohei Onuki, Shun Nukui, Seiya Inagi, Xule Qiu, Masao Watanabe, and Hiroshi Okamoto, "Predicting relations between RDF entities by Deep Neural Network", Workshop on Semantic Deep Learning (SemDeep), 12 pages, collocated with ESWC 2017, 2017.(SemDeep Best Paper Nomination)

Yohei Onuki, Tsuyoshi Murata, Shun Nukui, Seiya Inagi, Xule Qiu, Masao Watanabe, and Hiroshi Okamoto, "Predicting relations of embedded RDF entities by Deep Neural Network", The 16th International Semantic Web Conference, poster (4 pages), 2017.

Codes are available in GitHub.

Transductive classification on heterogeneous networks

We propose a method for transductive classification on heterogeneous networks composed of multiple types of vertices (such as papers, authors and conferences). When a network and the labels of some vertices are given, transductive classification is used to classify the labels of the remaining vertices. Based on novel definition of edge betweenness for heterogeneous networks, our method gained around 5% increase in accuracy from the state-of-the-art methods including GNetMine.

Phiradet Bangcharoensap, Tsuyoshi Murata, Hayato Kobayashi, Nobuyuki Shimizu, "Transductive Classification on Heterogeneous Information Networks with Edge Betweenness-based Normalization",Proceedings of the 9th ACM International Conference on Web Search and Data Mining (WSDM2016), pp.437-446, 2016.

Speed-up of constrained community detection

Many automatic methods for community detection have been already proposed. If we can utilize background knowledge of node similarity or dissimilarity in a network, more convincing communities will be detected. Eaton et al. proposed a method for constrained community detection which optimizes constrained Hamiltonian. But their method is slow for large-scale networks because they employ simulated annealing. Our method uses Louvain method for optimizing constrained Hamiltonian and it accelerates constrained community detection greatly.

Keisuke Nakata and Tsuyoshi Murata, "Fast Optimization of Hamiltonian for Constrained Community Detection", Proceedings of the 6th Workshop on Complex Networks (CompleNet 2015), Studies in Computational Intelligence, Vol. 597, pp.79-89, Springer, 2015.

Keisuke Nakata and Tsuyoshi Murata, "Speed-up of Constrained Community Detection based on Hamiltonian", Transactions of JSAI, Vol.30, No.1, pp.96-101, 2015.(in Japanese)

Koji Eguchi and Tsuyoshi Murata, "Constrained Community Detection in Multislice Networks", Transactions of JSAI, Vol.32, No.1, WII-C, pp.1-9, 2017.(in Japanese)

Koji Eguchi and Tsuyoshi Murata, "Constrained Community Detection in Multiplex Networks", Proceedings of the 9th International Conference on Social Informatics (SocInfo 2017), pp.75-87, 2017.

Influence maximization in dynamic networks

Influence maximization involves a problem of finding a set of nodes that will propagate information or disease most in given social networks. Many previous methods search for nodes of high centrality, which are not always good for this problem. Finding strict solution for the problem is computationally intractable even for static networks. This research employs greedy algorithm and heuristics for influence maximization in dynamic social networks. Our method is much faster than previous methods, and the accuracies of its solutions are not inferior to strict solutions.

Shogo Osawa and Tsuyoshi Murata, "Selecting Seed Nodes for Influence Maximization in Dynamic Networks", Proceedings of the 6th Workshop on Complex Networks (CompleNet 2015), Studies in Computational Intelligence, Vol. 597, pp.91-98, Springer, 2015.

Shogo Osawa and Tsuyoshi Murata, "Influence Maximization in Dynamic Social Networks", Transactions of JSAI, Vol.30, No.6, pp.693-702, 2015 (in Japanese).

Detecting Communities from Tripartite Members

YouTube can be regarded as a network of videos, users and tags. Many social media can be represented as heterogeneous networks that contain relations among many types of nodes and edges. We invented a method for detecting communities of tripartite networks that contain three types of nodes.

Python code is available from here.

Kyohei Ikematsu and Tsuyoshi Murata, "A Fast Method for Detecting Communities from Tripartite Networks", Proceedings of the 5th International Conference on Social Informatics (SocInfo 2013), pp.192-205, 2013.

Kyohei Ikematsu, Tsuyoshi Murata, "Improvement of a Tripartite Modularity and Its Optimization Method", Transactions of JSAI, Vol.29, No.2, pp.245-258, 2014. (in Japanese)

Detecting Communities from Signed Networks

A signed network contains two types of edges: friendship and hostility. We extend modularity (a criteria for evaluating goodness of network partitions) for signed networks so that we can detect nested factions, which often appears in real social networks.

Demonstration is here.

T. Sugihara, X. Liu, T. Murata, "Detecting Communities from Signed Networks", Transactions of JSAI, Vol.28, No.1, pp.67-76, 2013. (in Japanese)

Detecting Communities of Distant Members

People nearby tend to be close friends. We propose Dist-Modularity, an extension of Girvan-Newman modularity, in order to exclude such tendency. Dist-Modularity are useful for detecting communities whose members are far apart, and it has abilities of detecting communities of different granularity.

Xin Liu, Tsuyoshi Murata, Ken Wakita, "Detecting network communities beyond assortativity-related attributes", Physical Review E 90, 012806, 2014..

Dist-Modularity is used for detecting terrorist networks by the researchers of U.S. Military Academy.

Paulo Shakarian, Patrick Roos, Devon Callahan, Cory Kirk, "Mining for Geographically Disperse Communities in Social Networks by Leveraging Distance Modularity", KDD2013.

Detecting Communities from Scale-free Networks

Many real-world networks exhibit scale-free property. We propose a method for detection communities from such scale-free networks.

Sorn Jarukasemratana, Tsuyoshi Murata, Xin Liu, "Community detection algorithm based on centrality and node distance in scale-free networks", Proceedings of the 24th ACM Conference on Hypertext and Social Media (Hypertext 2013), pp.258-262, 2013.

Sorn Jarukasemratana, Tsuyoshi Murata, Xin Liu, "Community Detection Algorithm based on Centrality and Node Closeness in Scale-Free Networks", Transactions of JSAI, Vol.29, No.2, pp.234-244, 2014.

Survey of Large-scale Network Visualization Tools The following survey paper explains many tools for visualizing large-scale networks, such as igraph, Gephi, Cytoscape, Tulip, WiGis, CGV, VisANT, Pajek, In Situ Framework, Honeycomb, JavaScript InfoVis Toolkit, and GraphGL.

Sorn Jarukasemratana, Tsuyoshi Murata,"Recent Large Graph Visualization Tools : A Review"Computer Software, Vol. 30, No. 2 pp.159-175, 2013.

Visualizing the Structure of Great Speeches

Great speeches are semantically well-structured, and they are often studied as a good example for writing compositions. As the first step for evaluating the quality of a given speech, we propose a method for visualizing the speech structure using WordNet (a dictionary of semantic relations of English words) and semantic orientation (positiveness or negativeness of words).

Demonstration is here.

Yu Sakamoto and Tsuyoshi Murata, "Visualizing the Structure of Great Speeches", Poster Proceedings of IEEE Pacific Visualization Symposium , pp.21-22, 2014.

Detecting and Visualizing Communities from Bipartite Networks

The relation of online bulletin boards and their users can be represented as a bipartite network composed of two types of vertices. Detecting communities from such bipartite network will enable us to find boards of related contents and users of similar interests. Experiments have been made for detecting both board communities and user communities and for visualizing them. In addition to that, a new criterion for the degree of correspondence the communities of different vertex types was measured.

Tsuyoshi Murata and Tomoyuki Ikeya, "Analysis of Online Question-Answering Forums as Heterogeneous Networks", Proceedings of the Second International Conference on Weblogs and Social Media (ICWSM-08), pp.210-211, 2008.

Link Prediction for Online QA Sites

Relations among people can be represented as a social network if you regard a person as a vertex and his/her friendship as an edge. Predicting future structure of a social network based on its present structure is important for estimating the evolution of its organization and detecting key persons. Our new link prediction method is based on improved distance functions, and its prediction accuracy is better than previous ones.

Tsuyoshi Murata and Sakiko Moriyasu, Link Prediction based on Structural Properties of Online Social Networks, New Generation Computing, Vol.26, No.3, pp.245-257, 2008.

Fast Community Detection from Large-scale Bipartite Networks
Finding communities from large-scale network is not an easy task because the search for the best network division is NP-hard in general. There are several methods for detecting communities. Our new method combines a fast algorithm (LP) and an accurate one (BRIM) in order to detect suitable communities from large-scale bipartite networks (about one million vertices).

Xin Liu and Tsuyoshi Murata, "Community Detection in Large-scale Bipartite Networks", Proceedings of the 2009 IEE/WIC/ACM International Conference on Web Intelligence (WI-09), pp.50-57, 2009.

New Criterion for Evaluating Communities in Heterogenous Networks

As the criterion for evaluating the goodness of communities in a network, Newman-Girvan modularity is widely used. However, it is not suitable for heterogeneous networks composed of more than one type of vertices. We have extended the definition of Newman-Girvan modularity so that one-to-many correspondence the communities of different vertex types will be allowed.

Tsuyoshi Murata and Tomoyuki Ikeya, "A New Modularity for Detecting One-to-Many Correspondence of Communities in Bipartite Networks", Advances in Complex Systems, World Scientific, Vol. 13, No. 1, pp.19-31, 2010.

Title: Community Detection in K-Partite K-Uniform (Hyper)Networks

Introduction: Recently we propose a framework to solve the generalized problem of community detection in k-partite k-uniform (hyper)networks. The new method overcomes limitations of previous approaches and have several desired properties such as comprehensive, adaptive, parater-free, accurate, and scalable.

poster is available here

Web Usage Mining

When a user accesses to the Web, several usage data are generated such as access logs, bookmarks, and searched keywords. We are doing research on detecting the user's interests base on the usage data.
Extracting Users' Interests from Web Usage Data

We have proposed a new method for extracting a user's interests based on the network of the sites he/she visited and the keywords he/she entered, which we call a site-keyword graph. In our experiments, ranking and community detection are performed to the site-keyword graphs of 8,000 users in order to detect their interests.

Tsuyoshi Murata and Kota Saito, "Extracting Users' Interests of Web-watching Behaviors Based on Site-Keyword Graph", in A. Namatame, S. Kurihara, H. Nakashima, (Eds.), Emergent Intelligence of Networked Agents, Studies in Computational Intelligence, Vol.56, pp.139-146, Springer, 2007.