
Lei Xu is Emeritus Professor of Computer Science and Engineering at the Chinese University of Hong Kong (CUHK); Zhiyuan Chair Professor in the Department of Computer Science and Engineering, Chief Scientist of the AI Research Institute, Chief Scientist of the SJTU-SenseTime Research Institute, and Chief Scientist of the Brain Science & Technology Research Centre at Shanghai Jiao Tong University (SJTU); and Director of the Neural Computation Research Centre in the Brain and Intelligence Science-Technology Institute, Zhang Jiang National Lab.
He was elected Fellow of the IEEE in 2001, Fellow of the International Association for Pattern Recognition in 2002, and Fellow of the European Academy of Sciences (EURASC) in 2003. He has received several national and international academic awards, including the 1993 National Natural Science Award, the 1995 Leadership Award from the International Neural Networks Society (INNS), and the 2006 APNNA Outstanding Achievement Award. He has conducted research in several areas of Artificial Intelligence for over 40 years and has published about 400 papers, including 140+ journal papers and 4 NIPS papers during 1992-95; the 1992 paper with Peking University marked the first time a Chinese academic institution appeared at this topmost AI conference. His influential contributions on the Randomized Hough Transform (RHT), RPCL learning, LMSER learning, classifier combination and mixture of experts, and BYY harmony learning are well known and widely followed. He has served as Editor-in-Chief and associate editor of several academic journals.
There is a belief in recent AI studies that enabling models to learn causality will be a miracle cure for the bottlenecks of LLMs, yet the limitations and difficulties are often overlooked. Here we rethink causality as abstracted dynamics, driven by causal potential and matched conductivity, and argue that data from observational sensors, as mostly used in current AI, are not enough for tackling these challenges, especially for identifying causal structures; as a result, causalities cannot be correctly learned and used for inference. In fact, assumptions about causal structures are pre-specified or implied in the literature on causal analysis, with confusion and disputes incurred by inappropriate assumptions. Human intelligence finds causality not merely from observations but mainly from designing experiments plus logical thinking. Here, three remedies are suggested for improving LLMs. First, training data containing human-obtained causal knowledge are given much heavier weighting. Second, critical elements within the LLM are externally intervened upon during training. Third, ways are found to transfer known causal knowledge into data for training.

Professor Haizhou LI is a Fellow of the Singapore Academy of Engineering, a Fellow of the IEEE, and a Fellow of International Speech Communication Association. He is currently the Dean of the School of Artificial Intelligence and Presidential Chair Professor at The Chinese University of Hong Kong, Shenzhen. He also serves as Adjunct Professor at the National University of Singapore and U Bremen Excellence Chair Professor at the University of Bremen, Germany.
Professor LI has made outstanding contributions to speech recognition and natural language processing. He has led the development of multiple major technology deployments, including the voiceprint recognition engine for Lenovo’s A586 smartphone in 2012, and the music search engine for Baidu Music in 2013. He was the Editor-in-Chief of IEEE/ACM Transactions on Audio, Speech and Language Processing, the President of International Speech Communication Association, and a Vice President of IEEE Signal Processing Society.
Humans have a remarkable ability to direct their auditory attention to a single sound source of interest in a multi-talker environment, or cocktail party, an ability we call selective auditory attention. As discovered in neuroscience and psychoacoustics, auditory attention is achieved through the modulation of top-down and bottom-up attention. However, speech separation and/or speaker extraction from multi-talker speech remains a signal processing challenge for machines. In this talk, we study deep learning solutions to monaural speech separation and speaker extraction that enable selective auditory attention. We review findings from human audio-visual speech perception to motivate the design of speech perception algorithms. We will also discuss computational auditory models, technical challenges, and recent advances in the field.

Massimo Tornatore (Fellow, IEEE) is currently a Full Professor in the Department of Electronics, Information, and Bioengineering at Politecnico di Milano. He has also held appointments as Adjunct Professor at the University of California, Davis, USA, and as Visiting Professor at the University of Waterloo, Canada.
His research interests include performance evaluation, optimization, and design of communication networks (with an emphasis on the application of optical networking technologies), network virtualization, network reliability, and machine learning applications for network management. In these areas, he has co-authored more than 500 peer-reviewed conference and journal papers (with 23 best paper awards), 3 books, and 4 patents.
He is a member of the editorial boards of, among others, IEEE Communications Surveys and Tutorials, IEEE Transactions on Networking, and IEEE Transactions on Network and Service Management.
As AI models continue to grow in size and complexity, training is increasingly constrained by the limits of single-datacenter infrastructures. Emerging approaches are exploring geo-distributed training across multiple datacenters to improve scalability, resource utilization, and sustainability. This shift places the network at the center of AI system performance, requiring new solutions that jointly address latency, bandwidth, and coordination at scale.
In this keynote, I will discuss the networking challenges introduced by geo-distributed AI training, focusing on the role of collective communications and inter-datacenter connectivity. I will highlight ongoing research on optimizing distributed training traffic and outline how advances in optical and programmable networks can help enable efficient large-scale AI training across geographically distributed sites. The talk will conclude with key open questions and research opportunities at the intersection of machine learning systems and next-generation networking.

Bin Hu is a Full Professor and the Dean of the School of Medical Technology at Beijing Institute of Technology, China. He is a National Distinguished Expert, a Chief Scientist of the National 973 Program, and was named a National Advanced Worker in 2020. He is a Fellow of IEEE, IET, and AAIA, and serves as an IET Fellow Assessor and Fellowship Advisor. He is the Editor-in-Chief of the IEEE Transactions on Computational Social Systems and an Associate Editor of the IEEE Transactions on Affective Computing. He is a Clarivate Highly Cited Researcher, among the World's Top 2% Scientists, and a top 0.05% Highly Ranked Scholar according to ScholarGPS.
In recent years, mental health issues have become increasingly prominent all over the world. According to a report from the World Health Organization, approximately 970 million people suffer from mental disorders, accounting for 13% of the global population. Currently, the diagnosis of mental illnesses relies primarily on physician interviews and the Brief Psychiatric Rating Scale (BPRS), lacking objective and quantifiable diagnostic indicators. Moreover, the most common treatment for mental disorders is pharmacotherapy, which is often associated with significant side effects. The rapid advancement of cutting-edge artificial intelligence and big data technologies offers new opportunities for the diagnosis and treatment of mental disorders. These technologies are shifting the field toward data-driven screening and treatment, offering more precise, personalized, and effective solutions. This talk will introduce the opportunities and challenges in medical electronics and computational methodologies for the diagnosis and treatment of mental disorders.

Li Jingjing is a Professor at the University of Electronic Science and Technology of China. His research focuses on multimodal learning and transfer learning. He has published over 80 papers in TPAMI and other CCF A-level venues, with more than 10,000 citations. He has won multiple national awards, including the Wu Wenjun AI Outstanding Youth Award and the ACM SIGAI China Rising Star Award.
Vision-language pretraining has enabled powerful vision-language models (VLMs) with strong zero-shot capabilities. Yet, their performance drops in domain-specific tasks, motivating research on transferring and generalizing VLM knowledge to downstream applications. This talk briefly reviews generalization settings, methodologies, and benchmarks, categorizing approaches into prompt-based, parameter-based, and feature-based methods. We also discuss our recent research on generalizing VLMs to novel domains.