Deep Learning and Its Utility for Data Mining and Computer Vision

November 11, 2021

This page summarizes the work I carried out, co-led, or supervised at the Alibaba DAMO Academy.

Research Highlights

  • To learn better visual representations/embeddings for diverse downstream tasks, we introduced a new attention mechanism (B-Attention) that makes graph structure learning more robust by using multiple statistical tests (Wang et al., 2022). The effectiveness of multiple tests for graph structure learning is verified both theoretically and empirically on multiple clustering and ReID benchmark datasets.
  • To learn better visual representations/embeddings from less noisy graphs with Graph Convolutional Networks (GCNs), we proposed a method (Ada-NETS) that adaptively removes noisy nodes from graphs (Wang et al., 2022). Experiments on multiple public clustering datasets show that Ada-NETS significantly outperforms the state-of-the-art methods it was compared against, demonstrating its effectiveness and generalization.
  • To better understand the data imbalance problem in training GCNs, we conducted an empirical study on GCN-based clustering with imbalanced datasets, yielding valuable insights into the issue (Yang et al., 2021).
  • To tackle the problems of tail shadow and background distortion in learning-based video compression, we introduced a new method for more accurate motion prediction, which surpassed the state-of-the-art rate-distortion performance on most video compression datasets (Li et al., 2022).
  • To enable variable-rate control without sacrificing performance, we proposed an efficient Interpolation Variable-Rate (IVR) network by introducing a handy Interpolation Channel Attention (InterpCA) module into the compression network (Sun et al., 2021). With 9000 rates in the IVR network, the method achieves a fine PSNR interval of 0.001 dB and a fine rate interval of 0.0001 bits per pixel (BPP). Experimental results demonstrate that the IVR network is the first variable-rate learned method to outperform VTM 9.0 (intra) in both PSNR and Multiscale Structural Similarity (MS-SSIM).

References

2022

  1. Robust Graph Structure Learning via Multiple Statistical Tests
    Yaohua Wang, Fangyi Zhang, Ming Lin, Senzhang Wang, Xiuyu Sun, and 1 more author
    In Advances in Neural Information Processing Systems (NeurIPS), Dec 2022
  2. Ada-NETS: Face Clustering via Adaptive Neighbour Discovery in the Structure Space
    Yaohua Wang, Yaobin Zhang, Fangyi Zhang, Senzhang Wang, Ming Lin, and 2 more authors
    In International Conference on Learning Representations (ICLR), Dec 2022
  3. Jmpnet: Joint Motion Prediction for Learning-Based Video Compression
    Dongyang Li, Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, and 2 more authors
    In IEEE International Conference on Acoustics, Speech and Signal Processing, Dec 2022

2021

  1. GCN-Based Linkage Prediction for Face Clustering on Imbalanced Datasets: An Empirical Study
    Huafeng Yang, Xingjian Chen, Fangyi Zhang, Guangyue Hei, Yunjie Wang, and 1 more author
    In Workshops of the International Joint Conference on Artificial Intelligence, Dec 2021
  2. Interpolation Variable Rate Image Compression
    Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Yichen Qian, and 2 more authors
    In ACM International Conference on Multimedia, Virtual Event, China, Dec 2021