Aiming to learn better visual representations/embeddings for diverse downstream tasks, we introduced a new attention mechanism (B-Attention) for more robust graph structure learning via multiple statistical tests (Wang et al., 2022). The effectiveness of multiple tests for graph structure learning was verified both theoretically and empirically on multiple clustering and ReID benchmark datasets.
Aiming to learn better visual representations/embeddings from less noisy graphs using Graph Convolutional Networks (GCNs), we proposed Ada-NETS, a method that adaptively removes noisy edges when constructing graphs (Wang et al., 2022). Experiments on multiple public clustering datasets show that Ada-NETS significantly outperforms current state-of-the-art methods, demonstrating its superiority and generalization.
To better understand the data imbalance problem in training GCNs, we conducted an empirical study on GCN-based clustering with imbalanced datasets, yielding valuable insights into the issue (Yang et al., 2021).
To tackle the problems of tail shadow and background distortion in learning-based video compression, we introduced a new method for more accurate motion prediction, which surpassed state-of-the-art rate-distortion performance on most video compression datasets (Li et al., 2022).
To enable variable-rate control without sacrificing performance, we proposed an efficient Interpolation Variable-Rate (IVR) network, introducing a handy Interpolation Channel Attention (InterpCA) module into the compression network (Sun et al., 2021). The method achieves a fine PSNR interval of 0.001 dB and a fine rate interval of 0.0001 Bits-Per-Pixel (BPP), covering 9000 rates in the IVR network. Experimental results demonstrate that IVR is the first variable-rate learned method that outperforms VTM 9.0 (intra) in both PSNR and Multiscale Structural Similarity (MS-SSIM).
References
2022
Robust Graph Structure Learning via Multiple Statistical Tests
Yaohua Wang, Fangyi Zhang, Ming Lin, Senzhang Wang, Xiuyu Sun, and 1 more author
In Advances in Neural Information Processing Systems (NeurIPS), Dec 2022
Graph structure learning aims to learn connectivity in a graph from data. It is particularly important for many computer vision related tasks since no explicit graph structure is available for images in most cases. A natural way to construct a graph among images is to treat each image as a node and assign pairwise image similarities as weights to the corresponding edges. It is well known that pairwise similarities between images are sensitive to noise in feature representations, leading to unreliable graph structures. We address this problem from the viewpoint of statistical tests. By viewing the feature vector of each node as an independent sample, the decision of whether to create an edge between two nodes based on their similarity in feature representation can be thought of as a single statistical test. To improve the robustness of this decision, multiple samples are drawn and integrated by multiple statistical tests to generate a more reliable similarity measure, and consequently a more reliable graph structure. The corresponding elegant matrix form, named B-Attention, is designed for efficiency. The effectiveness of multiple tests for graph structure learning is verified both theoretically and empirically on multiple clustering and ReID benchmark datasets.
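The multiple-tests idea above can be illustrated with a minimal sketch: split each feature vector into several sub-vectors, treat each sub-vector pair as one similarity "test", and average the tests into a more robust edge weight. This is a simplified illustration only, not the paper's B-Attention matrix form; the function name and the choice of cosine similarity are assumptions for demonstration.

```python
import numpy as np

def multi_test_similarity(x, y, num_tests=4):
    """Aggregate similarity over multiple feature sub-vectors.

    Each feature vector is split into `num_tests` chunks; each chunk
    pair acts as one similarity test, and the tests are averaged into
    a more robust similarity estimate than a single full-vector test.
    """
    xs = np.array_split(np.asarray(x, dtype=float), num_tests)
    ys = np.array_split(np.asarray(y, dtype=float), num_tests)
    sims = []
    for xc, yc in zip(xs, ys):
        denom = np.linalg.norm(xc) * np.linalg.norm(yc)
        sims.append(float(xc @ yc) / denom if denom > 0 else 0.0)
    return float(np.mean(sims))
```

A single noisy chunk then shifts the averaged score by only 1/`num_tests` of its deviation, which is the robustness intuition behind aggregating multiple tests.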
Ada-NETS: Face Clustering via Adaptive Neighbour Discovery in the Structure Space
Yaohua Wang, Yaobin Zhang, Fangyi Zhang, Senzhang Wang, Ming Lin, and 2 more authors
In International Conference on Learning Representations (ICLR), Apr 2022
Face clustering has attracted rising research interest recently to take advantage of massive amounts of face images on the web. State-of-the-art performance has been achieved by Graph Convolutional Networks (GCN) due to their powerful representation capacity. However, existing GCN-based methods build face graphs mainly according to kNN relations in the feature space, which may lead to a lot of noise edges connecting two faces of different classes. The face features will be polluted when messages pass along these noise edges, thus degrading the performance of GCNs. In this paper, a novel algorithm named Ada-NETS is proposed to cluster faces by constructing clean graphs for GCNs. In Ada-NETS, each face is transformed to a new structure space, obtaining robust features by considering face features of the neighbour images. Then, an adaptive neighbour discovery strategy is proposed to determine a proper number of edges connecting to each face image. It significantly reduces the noise edges while maintaining the good ones to build a graph with clean yet rich edges for GCNs to cluster faces. Experiments on multiple public clustering datasets show that Ada-NETS significantly outperforms current state-of-the-art methods, proving its superiority and generalization.
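The adaptive neighbour discovery described above can be sketched in a toy form: instead of a fixed k for every node, cut the ranked similarity list at its largest drop, so each node keeps only its confidently similar neighbours. This gap heuristic is a hypothetical stand-in for Ada-NETS' learned candidate-quality criterion, used here only to make the "per-node neighbour count" idea concrete.

```python
import numpy as np

def adaptive_neighbours(similarities, max_k=10):
    """Pick a per-node neighbour count by cutting the ranked
    similarities at the largest gap among the top-`max_k` candidates.

    Returns the indices of the neighbours kept for this node.
    """
    similarities = np.asarray(similarities, dtype=float)
    order = np.argsort(similarities)[::-1][:max_k]  # best first
    sims = similarities[order]
    if sims.size < 2:
        return order
    gaps = sims[:-1] - sims[1:]          # drop between consecutive ranks
    cut = int(np.argmax(gaps)) + 1       # keep neighbours before the largest drop
    return order[:cut]
```

For similarities `[0.9, 0.85, 0.2, 0.1]` the largest drop sits after the second candidate, so only the two close neighbours are kept and the likely noise edges to the others are never created.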
JMPNet: Joint Motion Prediction for Learning-Based Video Compression
Dongyang Li, Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, and 2 more authors
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022
In recent years, learning-based approaches have attracted increasing attention in the field of video compression. Recent methods of this kind normally consist of three major components: an intra-frame network, a motion prediction network, and a residual network, among which the motion prediction part is particularly critical for video compression. Benefiting from optical flow, which enables dense motion prediction, recent methods have shown competitive performance compared with traditional codecs. However, problems such as tail shadow and background distortion in the predicted frame remain unsolved. To tackle these problems, JMPNet is introduced in this paper to provide more accurate motion information by using both optical flow and a dynamic local filter, together with an attention map that fuses this motion information in a smarter way. Experimental results show that the proposed method surpasses state-of-the-art (SOTA) rate-distortion (RD) performance on most datasets.
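The attention-based fusion of the two motion predictions can be sketched as a per-pixel blend: where the attention map is high, the flow-warped prediction dominates; elsewhere, the dynamic-filter prediction fills in. This is a minimal sketch of the fusion step only (shapes and the function name are assumptions), not JMPNet's full architecture.

```python
import numpy as np

def fuse_predictions(flow_pred, filter_pred, attention):
    """Blend two motion-compensated predictions with a per-pixel
    attention map in [0, 1] (hypothetical H x W [x C] arrays)."""
    attention = np.clip(np.asarray(attention, dtype=float), 0.0, 1.0)
    return attention * flow_pred + (1.0 - attention) * filter_pred
```

In a learned codec the attention map itself would be predicted by a small network, letting the model pick whichever motion model is more reliable at each location.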
2021
GCN-Based Linkage Prediction for Face Clustering on Imbalanced Datasets: An Empirical Study
Huafeng Yang, Xingjian Chen, Fangyi Zhang, Guangyue Hei, Yunjie Wang, and 1 more author
In Workshops of the International Joint Conference on Artificial Intelligence (IJCAI), Aug 2021
In recent years, benefiting from the expressive power of Graph Convolutional Networks (GCNs), significant breakthroughs have been made in face clustering. However, rare attention has been paid to GCN-based clustering on imbalanced data. Although the imbalance problem has been extensively studied, the impact of imbalanced data on the GCN-based linkage prediction task is quite different, causing problems in two aspects: imbalanced linkage labels and biased graph representations. The problem of imbalanced linkage labels is similar to that in the image classification task, while biased graph representations are a problem particular to GCN-based clustering via linkage prediction. Significantly biased graph representations in training can cause catastrophic overfitting of a GCN model. To tackle these problems, we evaluate the feasibility of existing methods for the imbalanced image classification problem on graphs with extensive experiments, and present a new method to alleviate the imbalanced labels and also augment graph representations using a Reverse-Imbalance Weighted Sampling (RIWS) strategy, followed by insightful analyses and discussions. A series of imbalanced benchmark datasets synthesized from MS-Celeb-1M and DeepFashion will be openly available.
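The core of reverse-imbalance weighted sampling can be sketched as weighting each sample inversely to its class frequency, so rare classes are drawn more often during training. This is a simplified reading of the RIWS idea for illustration; the function name is an assumption, and the paper's strategy additionally augments graph representations.

```python
import numpy as np

def riws_weights(labels):
    """Per-sample sampling weights inversely proportional to class
    frequency: every class receives the same total probability mass,
    so minority-class samples are drawn more often."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    freq = dict(zip(classes.tolist(), counts.tolist()))
    w = np.array([1.0 / freq[l] for l in labels.tolist()], dtype=float)
    return w / w.sum()  # normalized sampling distribution
```

With labels `[0, 0, 0, 1]`, the lone class-1 sample gets probability 0.5 while each class-0 sample gets 1/6, balancing the expected class mix per draw.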
Interpolation Variable Rate Image Compression
Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Yichen Qian, and 2 more authors
In ACM International Conference on Multimedia (ACM MM), Virtual Event, China, Oct 2021
Compression standards have been used to reduce the cost of image storage and transmission for decades. In recent years, learned image compression methods have been proposed and achieve compelling performance compared to the traditional standards. However, in these methods, a set of different networks is used for various compression rates, resulting in a high cost in model storage and training. Although some variable-rate approaches have been proposed to reduce this cost by using a single network, most of them bring performance degradation when applying fine rate control. To enable variable-rate control without sacrificing performance, we propose an efficient Interpolation Variable-Rate (IVR) network, by introducing a handy Interpolation Channel Attention (InterpCA) module in the compression network. With the use of two hyperparameters for rate control and linear interpolation, InterpCA achieves a fine PSNR interval of 0.001 dB and a fine rate interval of 0.0001 Bits-Per-Pixel (BPP), covering 9000 rates in the IVR network. Experimental results demonstrate that the IVR network is the first variable-rate learned method that outperforms VTM 9.0 (intra) in PSNR and Multiscale Structural Similarity (MS-SSIM).
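The interpolation mechanism behind fine rate control can be sketched as a linear blend between two channel-attention vectors, one associated with the lowest rate and one with the highest; a continuous parameter then selects any rate in between. This is a minimal sketch of the interpolation idea only (names and shapes are assumptions), not the trained InterpCA module.

```python
import numpy as np

def interp_channel_attention(att_low, att_high, t):
    """Linearly interpolate between channel-attention vectors for the
    lowest and highest rates; t in [0, 1] selects the target rate."""
    t = float(np.clip(t, 0.0, 1.0))
    return (1.0 - t) * np.asarray(att_low) + t * np.asarray(att_high)
```

Because the blend is continuous in `t`, sweeping it in tiny steps yields the very fine rate granularity reported above, from a single set of network weights.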