Tutorials | ADC 2023

Towards Data-centric Graph Machine Learning

Wed 1 Nov 2023 10:30 AEDT (UTC+11)

▶ Abstract

Graph-structured data, constituted by discrete nodes connected by independent edges within a non-Euclidean space, serves as the foundational data type for depicting and capturing complex interdependencies among massive diverse entities in the real world. In the context of Data-centric AI, this tutorial will provide an introduction to the recent advances in Data-centric Graph Machine Learning (DC-GML). Concretely, this tutorial will cover the systematic framework of DC-GML that encompasses all stages of the graph data lifecycle, including graph data collection, exploration, improvement, exploitation, and maintenance. Three critical graph-centric questions will be answered: (1) how to enhance graph data availability and quality; (2) how to learn from graph data with limited availability and low quality; (3) how to build graph MLOps systems from the graph data-centric view. Lastly, this tutorial will offer a forward-looking outlook to navigate future advancements and applications of the DC-GML domain.

▶ Speakers

Shirui Pan, Griffith University

Xin Zheng, Monash University

Shirui Pan is a Professor and an ARC Future Fellow with the School of Information and Communication Technology, Griffith University, Australia. He received his Ph.D. degree in computer science from the University of Technology Sydney (UTS), Australia. He is a Senior Member of IEEE and ACM, and a Fellow of Queensland Academy of Arts and Sciences (FQA). Shirui’s research centers on artificial intelligence, with an emphasis on graph machine learning. His research has been published in top conferences and journals including NeurIPS, ICML, KDD, TPAMI, TNNLS, and TKDE. His research received the 2024 IEEE CIS TNNLS Outstanding Paper Award and the IEEE ICDM Best Student Paper Award.

Xin Zheng is a final-year Ph.D. student at Monash University, Australia. She received her B.S. degree (2017) and Master's degree (2020), both from Dalian University of Technology, China. Her research interests focus mainly on automated graph machine learning operations (MLOps) workflows, specifically automated GNN design and graph data-centric learning. She has published papers in top-tier journals and conferences, such as IJCV, PR, ICDM, WWW, and MM.

TUTORIAL

Detect Label Errors in Datasets

Wed 1 Nov 2023 13:00 AEDT (UTC+11)

▶ Abstract

With the rise of large AI models, data assets have gained increasing importance. Understanding how to identify and correct label errors in our datasets is crucial. This is primarily because label errors are pervasive in the era of big data, and rectifying them can significantly enhance our knowledge. Moreover, large AI models are susceptible to overfitting to label errors, which hinders their ability to generalize effectively unless label noise is adequately addressed. In this tutorial, we will present typical approaches to handling label noise, such as extracting confident/non-confident examples (indicating likely correct/incorrect labels) using deep network properties and intuitions. Additionally, we will explore methods that directly model the label noise and provide theoretical guarantees. By illustrating the intuitions behind state-of-the-art techniques, this tutorial aims to equip researchers and practitioners with valuable insights into effectively managing label noise in datasets.

▶ Speaker

Tongliang Liu, University of Sydney

Tongliang Liu is the Director of Sydney AI Centre at the University of Sydney. He is broadly interested in the fields of trustworthy machine learning and its interdisciplinary applications, with a particular emphasis on learning with noisy labels, adversarial learning, transfer learning, unsupervised learning, and statistical deep learning theory. He has authored and co-authored more than 200 research articles in venues including ICML, NeurIPS, ICLR, CVPR, AAAI, IJCAI, JMLR, and TPAMI. His monograph on machine learning with noisy labels will be published by MIT Press. He is/was a (senior) meta-reviewer for many conferences, such as ICML, NeurIPS, ICLR, UAI, AAAI, IJCAI, and KDD, and was a notable AC for ICLR. He is an Associate Editor of TMLR and is on the Editorial Boards of JMLR and MLJ. He is a recipient of the AI’s 10 to Watch Award from IEEE in 2023, the Future Fellowship Award from the Australian Research Council (ARC) in 2022, and the Discovery Early Career Researcher Award (DECRA) from ARC in 2018.

TUTORIAL

Data-centric Computer Vision: Problems, Good Practices and Preliminary Solutions

Wed 1 Nov 2023 15:30 AEDT (UTC+11)

▶ Abstract

As the demand for data-driven decision-making and artificial intelligence applications continues to rise, the importance of data cannot be overstated. This tutorial will provide a comprehensive overview of the key principles, good practices, and challenges associated with data-centric computer vision problems. On the one hand, this tutorial gives a few examples of data properties, such as image-text alignment strength, test data difficulty and training data quality. On the other hand, we will discuss collecting, cleaning, organizing, and validating data to improve its reliability and relevance for specific applications. Through two representative cases, one in domain generalization and one in medical imaging data, this tutorial will demonstrate how to curate high-quality and useful datasets for future research.

▶ Speakers

Xin Yu, University of Queensland

Liang Zheng, Australian National University

Zijian Wang, University of Queensland

Dr Xin Yu is a Senior Lecturer at the University of Queensland (UQ) and is an ARC DECRA fellow. Previously, he was a research fellow at the Australian National University (ANU). He received PhD degrees from both Tsinghua University and the Australian National University. His research interests cover a wide range of topics in Computer Vision and Machine Learning. He has published more than 70 papers in top-tier conferences and journals, such as CVPR, ECCV, NeurIPS, ICLR, TPAMI, and IJCV. He also received a Best Paper Honorable Mention Award at WACV 2020, and his paper was nominated for the Best Paper Award at CVPR 2020. He is a recipient of a Google Research Scholar Award in 2021. He has also won several challenge championships in the workshops of CVPR, ACCV, etc.

Dr Liang Zheng is a Senior Lecturer and was an ARC DECRA Fellow at the Australian National University. He is best known for his contributions to object re-identification, where he and his collaborators designed widely used datasets and algorithms such as Market-1501 (ICCV 2015), part-based convolutional baseline (ECCV 2018), random erasing (AAAI 2020) and joint detection and embedding (ECCV 2020). His recent research interest is data-centric computer vision, aiming at analysing and improving data rather than models themselves. He is a leading organizer of the Vision Datasets Understanding workshop series and the DataCV challenge at CVPR and serves as an Area Chair for leading conferences such as CVPR, ICCV, ECCV and NeurIPS. He received the Outstanding Young Author (Paper) Award from IEEE Transactions on Circuits and Systems for Video Technology and was named one of AI’s 10 to Watch by IEEE Intelligent Systems and Australia’s Early Achievers by The Australian. He received his B.S. degree (2010) and Ph.D. degree (2015), both from Tsinghua University, China.

Zijian Wang is a Postdoctoral Research Fellow at the University of Queensland (UQ). His PhD thesis is mainly on domain adaptation and generalization in computer vision. He has published papers in top-tier conferences and journals, such as ICCV, ICML, ICLR, MM, and TPAMI. Zijian has also been widely engaged in a number of cross-disciplinary research projects, spanning civil engineering and chemical engineering.

TUTORIAL

Towards Trustworthy Data Markets: Recent Advances and Open Problems

Thu 2 Nov 2023 10:30 AEDT (UTC+11)

▶ Abstract

Data is the new oil. The value of data is rapidly increasing, with companies like Google and Facebook surpassing traditional oil companies in market capitalization and ranking on the Fortune 500 list. As the data science community explores ways to determine, transfer, and allocate the value of data, new technical challenges arise when considering the economic constraints in the data science pipeline, including data collection, cleaning, sharing, and analysis. One of the biggest hurdles in data markets is exchanging sensitive data related to individuals, such as social networks, spatiotemporal trajectories, and healthcare information. In this tutorial, I will discuss recent studies on creating a trustworthy data market that enables private, secure, and fair data trading. We will also examine areas for further research and opportunities to improve the current state of the data market.

▶ Speaker

Yang Cao, Hokkaido University

Yang Cao is an Associate Professor at Hokkaido University. He earned his Ph.D. in Informatics from Kyoto University in 2017. His research areas include security and privacy, data markets, data management, and trustworthy machine learning. His work has been published in esteemed conferences and journals such as VLDB, SIGMOD, ICDE, KDD, AAAI, and USENIX Security. Three of his papers were finalists for best papers in ICDE 2017, ICME 2020, and BigData 2022. He has received several awards, including the IEEE Computer Society Japan Chapter Young Author Award in 2019 and the Database Society of Japan Kambayashi Young Researcher Award in 2021. His research projects have been supported by various organizations, including JSPS, JST, MSRA, KDDI, LINE, and WeBank.

TUTORIAL

Privacy Challenges in Graph Neural Networks in MLaaS

Thu 2 Nov 2023 13:00 AEDT (UTC+11)

▶ Abstract

Graph Neural Networks (GNNs) have established themselves as influential graph learning tools with applications spanning from common utilities such as recommendation systems to advanced domains like drug discovery. As the adoption of GNNs in data-sensitive areas increases, their privacy considerations have garnered more focus. Recent research indicates that GNN models might be susceptible to privacy risks, emphasising the need to ensure the privacy of sensitive data, including model parameters and graph information. Two primary challenges in GNN privacy are: (1) the protection of diverse objectives like nodes, edges, graphs, and models, each with its unique requirements; and (2) the delicate balance between privacy and model utility. In this tutorial, we aim to offer a comprehensive overview of existing GNN privacy methodologies and to shed light on unresolved challenges and emerging trends.

▶ Speakers

Bang Wu, CSIRO - Data61

He Zhang, Monash University

Bang Wu currently holds a CSIRO Early Research Career (CERC) Postdoctoral Fellowship at CSIRO’s Data61, Australia. His PhD thesis is mainly on securing graph neural networks in machine learning as a service. He has authored research papers featured in top-tier conferences and journals spanning multiple domains, including ICDM, ICML, AsiaCCS, TIFS, and TDSC. His research interests include trustworthy graph-based machine learning, trustworthy machine learning on multimodal systems, and various facets of security and privacy in machine learning across different domains.

He Zhang is a final-year Ph.D. candidate at the Faculty of Information Technology, Monash University, Australia. He has a profound interest in GNNs and the development of trustworthy AI systems. His research in trustworthy GNNs has led to several academic publications in top conferences like ICML and CIKM, as well as top journals like IEEE TKDE. Currently, He Zhang is exploring how to navigate multiple objectives (e.g., privacy, fairness, and utility) with the aim of comprehensively building trustworthy GNNs.