Large-Scale Semi-supervised Learning via Graph Structure Learning over High-dense Points
We focus on developing a novel scalable graph-based semi-supervised learning (SSL) method for settings with a small number of labeled data and a large amount of unlabeled data. In such settings, existing SSL methods usually suffer from either suboptimal performance caused by an improper graph or the high computational complexity of the large-scale optimization problem. In this paper, we propose to address both problems by constructing a proper graph for graph-based SSL methods. Unlike existing approaches, we simultaneously learn a small set of vertices that characterize the high-density regions of the input data and a graph that depicts the relationships among these vertices. A novel approach is then proposed to construct the graph of the input data, with some desirable properties, from the learned graph over this small number of vertices. Without explicitly computing the constructed graph of the inputs, we present two transductive graph-based SSL approaches whose computational complexity is linear in the number of input data. Extensive experiments on synthetic data and real datasets of varied sizes demonstrate that the proposed method is not only scalable to large-scale data but also achieves good classification performance, especially when the number of labels is extremely small.
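The general idea of propagating labels through a small set of representative points, without ever forming the full graph over the inputs, can be illustrated with a minimal sketch. This is not the method of the talk: it assumes a simpler anchor-graph style regularization in which the representative points come from plain k-means (rather than being learned jointly with a graph over them), and the function name `anchor_label_propagation` and all of its parameters are hypothetical.

```python
import numpy as np

def anchor_label_propagation(X, y, n_anchors=10, n_neighbors=3, reg=0.01, n_iter=20):
    """X: (n, d) inputs. y: (n,) integer labels in 0..C-1, with -1 for unlabeled points."""
    # Step 1 (assumption: k-means stands in for learning vertices in dense regions).
    # Farthest-point initialization, then a few Lloyd iterations.
    anchors = [X[0]]
    d2min = ((X - X[0]) ** 2).sum(1)
    for _ in range(n_anchors - 1):
        nxt = X[d2min.argmax()]
        anchors.append(nxt)
        d2min = np.minimum(d2min, ((X - nxt) ** 2).sum(1))
    anchors = np.array(anchors, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        for k in range(n_anchors):
            if np.any(assign == k):
                anchors[k] = X[assign == k].mean(0)

    # Step 2: soft-assign every input to its nearest anchors; each row of Z sums to 1.
    # (Z is stored densely here for brevity; it has only n_neighbors nonzeros per row.)
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.arange(len(X))[:, None]
    dn = d2[rows, idx]
    sigma2 = dn.mean() + 1e-12
    Z = np.zeros_like(d2)
    Z[rows, idx] = np.exp(-(dn - dn[:, :1]) / sigma2)  # shift by row minimum for stability
    Z /= Z.sum(1, keepdims=True)

    # Step 3: the implied input graph W = Z diag(1/lam) Z^T is never formed.
    # Its Laplacian enters only through the small (m x m) reduced matrix below.
    lam = np.maximum(Z.sum(0), 1e-12)
    ZtZ = Z.T @ Z
    L_reduced = ZtZ - ZtZ @ (ZtZ / lam[:, None])

    # Step 4: regularized least squares for the anchor label scores, then read off
    # the input labels. The solve is m x m, so the cost grows linearly with n.
    labeled = np.flatnonzero(y >= 0)
    Y_l = np.eye(int(y.max()) + 1)[y[labeled]]
    Z_l = Z[labeled]
    lhs = Z_l.T @ Z_l + reg * L_reduced + 1e-8 * np.eye(n_anchors)
    A = np.linalg.solve(lhs, Z_l.T @ Y_l)
    return (Z @ A).argmax(1)
```

Calling `anchor_label_propagation(X, y)` with `y` set to -1 everywhere except a handful of labeled points returns a predicted class for every row of `X`. The design point the sketch is meant to show is that only matrices of size n x m and m x m appear, with m the number of representative points.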
Dr. Li Wang is currently an assistant professor with the Department of Mathematics and the Department of Computer Science and Engineering, University of Texas at Arlington, Texas, USA. She worked as a research assistant professor with the Department of Mathematics, Statistics, and Computer Science at the University of Illinois at Chicago, Chicago, USA, from 2015 to 2017. She worked as a postdoctoral fellow at the University of Victoria, BC, Canada, in 2015 and at Brown University, USA, in 2014. She received her Ph.D. degree from the Department of Mathematics at the University of California, San Diego, USA, in 2014. Her research interests include data science, large-scale optimization, and machine learning.