Workshops

Selected Workshops

We thank the IAPR community for all submitted workshop proposals. The list of selected events is given below. The associated webpages with submission instructions and more information will be completed by 16 February 2026. For general queries on the workshops, please contact workshop_chairs@icpr2026.org. For queries on a specific workshop, please contact the respective organisers listed on their webpage.

#	Acronym	Title
WS1	BRAIN	Behavioral Robotics and AI for iNdustrial and social systems
WS2	PRRS	Workshop on Pattern Recognition in Remote Sensing
WS3	EMM	Efficient Methods for Multimodal Models
WS4	AI4MT	The First International Workshop on Artificial Intelligence for Multimodal Transportation
WS5	MV2	Machine Vision for Industrial Inspection (MVI2)
WS6	AIHA	Fourth International Workshop on Artificial Intelligence for Healthcare Applications
WS7	PRESTIGE	Pattern Recognition and Computer Vision for e-Heritage and Digital Humanities Workshop
WS8	PRHA	The Fourth International Workshop on Pattern Recognition in Healthcare Analytics and Bioinformatics
WS9	CompressLLMs	The Art of Compressing LLMs: Pruning, Distillation, and Quantization Demystified
WS10	V3SC 2026-3	The third International Workshop on Video Surveillance Systems in Smart Cities: Aerial Monitoring and Synthetic Data
WS11	PIDM4HCPS	Perception, Interaction and Decision-Making for Human Cyber-Physical Systems
WS12	BIOMAP	BIO-inspired Methods for Pattern Recognition
WS13	HVG	Human-Centric Video Generation
WS14	FMVA	Foundation Models for Vision Applications
WS15	PAVER	PAVER: Workshop on Physics-Aware Video gEneration and Restoration
WS16	ETTAC	Second International Workshop on Eye Tracking Techniques, Applications and Challenges
WS17	MCMI	Second International Workshop on Multi- and Cross-Modal Information for Enhanced Pattern Recognition
WS18	AIXAIE@ICPR2026	Fourth workshop on Explainable and Ethical
WS19	TrustDoc	Trustworthy Document Understanding: Privacy, Unlearning, Robustness, and Explainability
WS20	CVBMC	Computer Vision for Biodiversity Monitoring and Conservation
WS21	MPRSS 2026	Multimodal pattern recognition for social signal processing in human computer interaction
WS22	AUSTech 2.0	Advances in Underwater Surveillance: Technologies, Challenges, and Future Directions
WS23	TIPS	Workshop on Textual Information Processing & Synthesis in the Wild
WS24	HGVA	Workshop on Human Gaze and Visual Attention Modeling
WS25	RRPR	Reproducible Research in Pattern Recognition
WS26	MANPU	International Workshop on coMics ANalysis, Processing and Understanding
WS27	GREEN-PR	Sustainable PR & PR for Environment
WS28	IMTAX-2026	Image Mining. Theory and Applications
WS29	PRCID	Pattern Recognition Challenges in Infectious Disease
WS30	GenAAI 2026	Workshop on Generative and Agentic AI for Real-World Video Understanding

Behavioral Robotics and AI for iNdustrial and social systems (BRAIN) WS1

Overview

The convergence of Artificial Intelligence and Robotics is driving a new era of intelligent systems capable of interacting with the physical world in increasingly sophisticated ways. At the heart of this transformation lies Physical AI, namely the application of AI to systems that directly engage with their environment through sensors and actuators. Physical AI enables robots and devices to perceive, understand, and act upon complex, dynamic, and often uncertain environments, making it a cornerstone of modern robotics. This workshop, BRAIN: 'Behavioral Robotics and AI for iNdustrial and social systems', seeks to explore the role of Physical AI in shaping robotic behaviors across different critical domains, including industrial automation and social interaction. In industrial settings, Physical AI drives precision, adaptability, and efficiency in tasks such as manufacturing, logistics, and inspection. In social contexts, it empowers robots to interpret human cues, respond empathetically, and engage in meaningful interactions. By focusing on behavioral robotics, the workshop emphasizes the integration of perception, decision-making, and physical embodiment, which are key elements for creating intelligent systems that are not only reactive but also proactive and context-aware. AI plays a vital role in this process, enabling robots to interpret sensory data, learn from experience, and adapt their behavior in real time.

Workshop Website: tba

Workshop on Pattern Recognition in Remote Sensing (PRRS) WS2

Overview

As climate change intensifies, Earth observation (EO) and remote sensing have become indispensable for monitoring and understanding its impact. Half of all EO satellites have been launched in the past five years, producing an unprecedented volume of data with diverse spatial, spectral, and temporal characteristics. While these rich data sources open opportunities for various applications, they also create significant challenges in representation and interpretation. Pattern recognition and, more generally, artificial intelligence have therefore become essential for extracting key insights from these data. In this context, this workshop aims to explore the following topics:

Deep learning for Earth observation
Earth observation foundation models
Usability of foundation model embeddings for Earth observation tasks
Vision and language models for Earth observation
Hybrid models, combining physics and machine learning
Dynamic Earth observation, including multi-temporal and change detection analysis
Active, interactive and transfer learning
Explainable and interpretable machine learning
Novel pattern recognition tasks in remote sensing applications
Benchmark models and datasets

Workshop Website: available here

Efficient Methods for Multimodal Models (EMM) WS3

Overview

With extensive applications in image and video understanding, as well as in image, video, and text generation, multimodal models have become increasingly prominent and transformative in the fields of pattern recognition and computer vision.

As the scale of these models grows exponentially, there is an urgent need to explore efficient learning and deployment strategies to address the associated computational and resource challenges. This workshop aims to provide a dedicated platform for researchers and practitioners to exchange ideas and develop innovative solutions, thereby advancing the efficiency and practical deployment of multimodal models.

The topics will cover efficient inference methods and model architecture designs for multimodal models, e.g., compression, quantization, distillation, and lightweight architectures, as well as efficient learning strategies, e.g., training and fine-tuning techniques for multimodal tasks. In addition, the workshop will address related applications, including efficient multimodal generation and editing models, practical multimodal systems, and the deployment of multimodal models on resource-constrained or low-power devices.

Relevant topics include:

Compression, quantization, conditional compute, pruning, and distillation of multimodal models;
Efficient training/finetuning of multimodal models, e.g., low-rank adaptation;
Efficient sampling of multimodal diffusion models, e.g., step distillation and consistency models;
Efficient LLM/LVLM/MLLM in multimodal tasks, e.g., token pruning and merging;
Efficient multi-/cross-modal learning;
Efficient multimodal generative and editing models and sensors, e.g., for vision, language, audio and 3D objects;
Efficient image, video and audio synthesis from multimodal data;
Efficient multimodal applications (e.g., drone vision and autonomous driving);
Efficient self-/un-/weakly-supervised learning for multimodal data;
Deploying multimodal models on low-power devices, e.g., smartphones.

Workshop Website: tba

The First International Workshop on Artificial Intelligence for Multimodal Transportation (AI4MT) WS4

Overview

This workshop addresses the transformative role of artificial intelligence in multimodal transportation systems, with particular emphasis on pattern recognition techniques that enable intelligent decision-making across interconnected transport networks.

The workshop covers pattern recognition and AI applications across multiple transportation domains:

Computer Vision for Transportation: Real-time object detection and tracking for autonomous vehicles, traffic monitoring, pedestrian behavior analysis, and infrastructure inspection using deep learning architectures
Spatiotemporal Pattern Mining: Identification of mobility patterns, traffic flow prediction, demand forecasting, and anomaly detection in multimodal networks using recurrent and transformer-based models
Sensor Fusion and Multimodal Data Integration: Integration of heterogeneous data sources (camera, LiDAR, radar, GPS, IoT sensors) for comprehensive transportation state estimation
Reinforcement Learning for Operations Management: Dynamic routing, scheduling optimization, airspace management, and adaptive traffic control systems
Graph Neural Networks for Network Analysis: Modeling transportation networks as graphs for congestion prediction, route optimization, and network resilience assessment
Trajectory Prediction and Planning: Human mobility modeling, vehicle path planning, and conflict resolution in shared spaces

Workshop Website: tba

Machine Vision for Industrial Inspection (MVI2) (MV2) WS5

Overview

Machine vision for industrial inspection represents a critical real-world application domain that drives fundamental advances in pattern recognition research while addressing pressing industrial needs. The field presents unique challenges that push the boundaries of classical pattern recognition methodologies: the scarcity of defect samples motivates innovation in few-shot learning and anomaly detection; high-consequence decision-making demands advances in uncertainty quantification and explainable AI; naturally multi-modal sensing environments (visual, thermal, 3D, ultrasonic, X-ray) serve as ideal testbeds for sensor fusion algorithms; and real-time processing requirements drive research in efficient architectures and edge computing. Manufacturing variations and evolving production processes create natural domain shift scenarios that advance transfer learning and continual learning, with results applicable across pattern recognition applications. This workshop serves as a crucial bridge between academic research and industrial deployment, addressing evaluation beyond standard metrics, scalability from laboratory prototypes to production systems, and standardization of benchmark datasets specific to inspection tasks. Proposed by the Chair of IAPR Technical Committee 8, this workshop directly supports TC8's mission while creating a focused forum for presenting state-of-the-art methodologies, fostering academic-industry collaboration, and identifying future research directions. The broader impact spans global manufacturing competitiveness, product safety, infrastructure integrity, and sustainability through zero-defect manufacturing, automated quality assurance for critical systems, enhanced worker safety, and optimized production efficiency. This makes the workshop valuable both to pattern recognition researchers seeking impactful applications and to industry practitioners requiring cutting-edge solutions.

The topics will include:

Deep Learning Architectures for Industrial Vision
Few-Shot and Zero-Shot Learning for Defect Recognition
Multi-Modal and Multi-Scale Pattern Analysis
Texture and Surface Pattern Analysis
3D Vision and Geometric Pattern Recognition
Explainable and Interpretable Pattern Recognition
Transfer Learning and Domain Adaptation
Pattern Recognition for Non-Destructive Testing
Real-Time Vision Systems and Optimization
Novel Pattern Recognition Paradigms
Benchmark Datasets and Evaluation Metrics
Emerging Applications and Case Studies

Workshop Website: tba

Fourth International Workshop on Artificial Intelligence for Healthcare Applications (AIHA) WS6

Overview

Most of the medical data collected from healthcare systems is recorded in digital format. The increased availability of these data has enabled numerous artificial intelligence applications. Specifically, machine learning can generate insights to improve the discovery of new therapeutic tools, support diagnostic decisions, and help in the rehabilitation process, to name a few. Researchers, along with expert clinicians, can play an important role in turning complex medical data (e.g., genomic data, online acquisitions of physicians, medical imagery, etc.) into actionable knowledge that ultimately improves patient care. In recent years, these topics have drawn the attention of clinical and machine learning researchers, resulting in practical and successful applications in healthcare. These techniques have been deployed across various healthcare system levels with applications ranging from diagnosis to therapeutics.

AIHA also hosts a special session focusing on the latest advancements in AI-based solutions to enhance assessment and facilitate recovery in rehabilitation. The goal of this session is to highlight feasible approaches driven by real-world data for the development of practical clinical solutions.

The purpose of this workshop is to present recent advances in artificial intelligence techniques for healthcare applications. To bring together the advances in this wide and multidisciplinary subject, we propose a workshop that covers (but is not limited to) the following topics:

Biomedical image analysis;
Data analytics for healthcare;
Automatic disease prediction;
Automatic diagnosis support systems;
Genomic and proteomic data analysis;
Artificial intelligence for personalized medicine;
Machine learning as a tool to support medical diagnoses and decisions;
Machine learning for diagnosis and rehabilitation;
Generative AI for healthcare;
Multimodal analysis of health data;
Explainability to support diagnosis;
Neural signal analysis for diagnosis assistance;
Physiological signals processing;
Gait analysis;
Therapy selection;
Brain-computer interface for healthcare;
Biomechatronics for medicine;
Neuromotor rehabilitation;
Machine learning approaches in rehabilitation;
Motion analysis for healthcare.

Workshop Website: tba

Pattern Recognition and Computer Vision for e-Heritage and Digital Humanities Workshop (PRESTIGE) WS7

Overview

The role of digital technologies in the preservation, analysis, and dissemination of cultural and historical heritage has become increasingly central. Pattern recognition, machine learning, and computer vision now play a key role in this transformation, providing innovative methodologies to digitize, restore, interpret, and enhance artworks, documents, and cultural sites. The digital age has not only broadened access to heritage collections but has also enabled new forms of analysis, revealing patterns and relationships that were previously inaccessible. At the same time, the convergence of technology and the humanities introduces new challenges and research questions. These range from the reliable digital reconstruction of artifacts and the interpretation of historical data using AI-driven methods, to issues of authorship, authenticity, and ethical considerations surrounding generative AI and cultural production. This workshop aims to address these emerging challenges by providing a dedicated forum for researchers, professionals, and practitioners to present advances, share experiences, and discuss future directions at the intersection of pattern recognition and cultural heritage applications. Topics of interest include, but are not limited to:

Computer vision and generative AI for art and cultural heritage
Automated analysis and transcription of historical manuscripts and documents
Digital acquisition, representation, and manipulation of cultural artifacts
Augmented and virtual reality for cultural heritage communication and education
Image processing, classification, retrieval, and similarity search in the art domain
Point cloud analysis and segmentation for heritage sites and objects
3D reconstruction of historical artifacts and architectural environments
Serious games, edutainment, and interactive storytelling for cultural heritage
Knowledge representation, ontology learning, and semantic modeling in cultural heritage
Robotic and autonomous systems for inspection, conservation, and preservation
Projects, prototypes, and digital tools for restoration, conservation, and heritage outreach

Workshop Website: available here

The Fourth International Workshop on Pattern Recognition in Healthcare Analytics and Bioinformatics (PRHA) WS8

Overview

The fourth PRHA workshop aims to continue showcasing the latest developments in pattern recognition for healthcare analytics. The scope of the workshop entails, but is not limited to, bioinformatics, phenotyping and subtyping, patient monitoring and machine learning in pervasive healthcare, temporal modeling for disease progression, interpretable models for clinical decision support, privacy-preserving techniques for distributed and sensitive patient data, medical image analysis, and disease progression modeling.

Biomedical informatics and bioinformatics are interdisciplinary fields leveraging machine learning, deep learning, and natural language processing techniques. Tasks and datasets in these fields pose various challenges due to the complex and multimodal nature of the data. Furthermore, interpretability and explainability have become essential requirements when designing models for bioinformatics and biomedical informatics tasks. With the increasing importance and widespread use of pattern recognition, applications such as vision, biometrics, and natural language processing have migrated to their respective communities. However, biomedical informatics and bioinformatics studies within the machine learning and pattern recognition communities have been proliferating. While the specific challenges posed by digital patient data, as well as gene and protein data, encourage novel ideas in the field of pattern recognition, the achievements of pattern recognition techniques open new doors to a better understanding of the complex nature of data in healthcare and bioinformatics. The fourth PRHA will again provide a platform at ICPR for exchange and collaboration between the pattern recognition and medical communities.

Workshop Website: tba

The Art of Compressing LLMs: Pruning, Distillation, and Quantization Demystified (CompressLLMs) WS9

Overview

This workshop explores the art and engineering of compressing large language models to make them cheaper, faster, and easier to deploy while preserving practical capability. Designed for a broad audience that spans beginners to advanced practitioners, the workshop will introduce foundational concepts for newcomers, share implementation patterns and pitfalls for experienced engineers, and highlight cutting-edge research directions for specialists.

Workshop Website: tba

The third International Workshop on Video Surveillance Systems in Smart Cities: Aerial Monitoring and Synthetic Data (V3SC 2026-3) WS10

Overview

We invite original research papers, case studies, and technical reports on (but not limited to) the following topics:

Integration of traditional CCTV systems with aerial images
Innovations in urban monitoring using aerial and satellite images
AI and machine learning applications in smart city surveillance
Data fusion and analytics for enhanced urban security
Privacy and ethical implications of widespread surveillance
Case studies of real-world implementations of urban surveillance
Real-time monitoring: applications in traffic management and anomaly detection (e.g., fires, critical events)
Use of synthetic images for training and validation of surveillance algorithms
Synthetic-to-Real domain adaptation and generalization for surveillance tasks
Foundation models and their applicability in video surveillance and analysis
Reducing bias and improving AI model accuracy through synthetic data strategies
Privacy preserving model training and deployment
Action recognition in surveillance systems
Enabling technologies for aerial monitoring

Workshop Website: available here

Perception, Interaction and Decision-Making for Human Cyber-Physical Systems (PIDM4HCPS) WS11

Overview

The workshop focuses on the emerging pipeline that connects synthetic data generation (digital twins, simulation-to-real) with downstream perception and decision-making tasks in robotics, remote sensing, and extended reality environments. Core topics include, but are not limited to:

Synthetic data generation and domain adaptation (sim-to-real, Digital Twin).
Multimodal data fusion and remote sensing.
6D pose estimation, semantic segmentation, and scene understanding.
Active perception (next-best-view, information-theoretic planning).
SLAM and long-term mapping in dynamic environments.
Embodied navigation and decision-making.
Real-time perception and interaction in the Metaverse and eXtended Reality.
Augmented perception and vision-based mixed reality.

Workshop Website: tba

BIO-inspired Methods for Pattern Recognition (BIOMAP) WS12

Overview

Biological systems exhibit remarkable pattern recognition capabilities, achieved through evolutionary adaptation, self-organisation, and efficient information processing. These mechanisms inspire a growing class of computational approaches (drawing from evolutionary computation, swarm intelligence, artificial life, and neuromorphic computing) that offer new solutions beyond conventional optimisation or fixed-model learning. Recent advances demonstrate this potential: evolutionary algorithms automatically discover novel neural architectures, swarm-based methods enable distributed visual processing, and self-organising systems adapt to dynamic pattern sets in real time. This workshop aims to present recent advances in bio-inspired techniques for pattern recognition applications and to bring together researchers and practitioners interested in exploiting the potential of such approaches. It will explore how principles from evolution, collective behaviour, and self-organisation can support robust, adaptive, and resource-efficient pattern recognition. Key questions include: How can evolutionary processes support the automated design of pattern recognition models? What insights does swarm intelligence offer for distributed perception? How do self-organising mechanisms maintain performance under shifting data conditions?

Topics:

Evolutionary neural architecture search;
Swarm intelligence for distributed pattern recognition;
Particle swarm optimisation for feature selection and extraction;
Bio-inspired algorithms for image processing and analysis;
Artificial life and emergent pattern recognition;
Evolutionary multi-objective optimisation in computer vision;
Evolutionary few-shot and zero-shot learning;
Bio-inspired algorithms for automatic data augmentation;
Evolutionary meta-learning;
Hybrid evolutionary-gradient methods;
Neuromorphic and brain-inspired evolutionary systems;
Federated evolutionary learning;
Dynamic and online evolutionary adaptation;
Evolutionary robotics and embodied vision;
Quantum-inspired evolutionary pattern recognition;
Evolutionary explainability and interpretability;
Bio-inspired hardware-software co-evolution;
DNA computing for pattern matching.

Workshop Website: tba

Human-Centric Video Generation (HVG) WS13

Overview

The Workshop on Human-Centric Video Generation (HVG) will focus on advancing methodologies for synthesizing realistic 2D human videos guided by multimodal control signals (text, audio, pose), addressing critical challenges in temporal consistency, anatomical fidelity, and environmental interaction. Aligned with recent advancements in diffusion models, auto-regressive models and generative AI, this workshop emphasizes three core technical pillars: (1) conditional motion synthesis (text-to-action alignment, audio-driven co-speech gestures, pose-guided motion transfer), (2) quality assurance (occlusion handling, deformation mitigation, appearance consistency), and (3) evaluation frameworks (metrics for temporal coherence, action accuracy, and perceptual realism).

The workshop aims to address, among others, the following topics:

Text-Driven Synthesis: Generating human videos from textual descriptions with accurate action-semantic alignment.
Audio-Gesture Synchronization: Modeling co-speech gesture dynamics from audio-visual correlations.
Pose-Guided Motion Transfer: Transferring source motion to target subjects while preserving identity and context.
Multi-Conditional Animation: Integrating heterogeneous control signals (e.g., text + pose + audio) for compositional generation.
Long-Form Synthesis: Maintaining consistency and diversity in extended video sequences.
Interactive Generation: Enabling real-time user control via sketches, prompts, or physical simulations.
Evaluation Frameworks: Developing metrics for temporal stability, biomechanical plausibility, and environmental interaction.
Dataset Curation: Constructing cross-modal datasets with annotated human motions and multi-view sequences.
3D-Consistent Avatars: Bridging 2D generation with 3D-aware representations for viewpoint-invariant synthesis.

Workshop Website: tba

Foundation Models for Vision Applications (FMVA) WS14

Overview

In recent years, large-scale visual-language models have seen an explosive rise in capability. As general models, they offer exceptional performance on a range of downstream tasks they were not explicitly trained for. By necessity, the full capability of an architecture is often only expressed in terms of zero-shot performance on benchmark dataset tasks, such as ImageNet classification or ADE20K segmentation. The performance, generality, comprehension and prompt-sensitivity of these architectures in specific vision fields, such as biometrics or medical imaging, remain open to exploration.

Visual Language Models have the potential to revolutionise these areas through direct application, novel systems design and explainability, leading to insights on future model development, and a more comprehensive understanding of the tasks they are applied to. This workshop aims to provide a platform for the effective utilisation of such architectures in all fields of computer vision with their varying requirements.

The workshop topics include (but are not limited to): Dataset development and curation; Metrics and benchmarking methodologies (performance, robustness, fairness, etc.); Innovative applications and methodological advances for exploiting visual-language models; Multimodal learning and representation (for example, synergy between vision and language); Biometric analysis and human-computer interaction; Biomedical imaging and bioinformatics; Vehicular traffic perception and analysis; Image, speech, and video processing; Explainable and privacy-preserving artificial intelligence.

Workshop Website: tba

PAVER: Workshop on Physics-Aware Video gEneration and Restoration (PAVER) WS15

Overview

This workshop aims to bring together researchers and practitioners from computer vision, machine learning, and physics-based modeling to discuss the latest advancements in video generation and restoration. The workshop will emphasize the importance of integrating physical constraints and real-world priors into generative video models to ensure realism, consistency, and applicability in diverse domains.

Physics-aware video generation and restoration are fundamental for applications where realistic motion, temporal consistency, and adherence to physical laws are critical, including:

Autonomous driving: Accurate motion forecasting and scene reconstruction.
Medical imaging: High-fidelity generation/restoration for better diagnostics.
Scientific simulations: Data-driven video generation for physics-based modeling.
Streaming, AR/VR, and gaming: Real-time video enhancement for immersive experiences.
Surveillance and forensics: Reconstruction of occluded or degraded video.
Infant/toddler monitoring: Detecting and restoring subtle movements in low-quality recordings.
Elderly care and assistive technology: Enhancing visibility and understanding of movement patterns.

As Generative AI (GenAI) advances, new challenges emerge, such as maintaining physical realism, enforcing temporal consistency, and ensuring generalization across domains. This workshop will explore cutting-edge research at the intersection of physics-aware modeling, deep learning, and generative AI to push the boundaries of video generation and restoration.

Workshop Website: tba

Second International Workshop on Eye Tracking Techniques, Applications and Challenges (ETTAC) WS16

Overview

Eye tracking technology is becoming increasingly widespread, thanks to the recent availability of cheap commercial devices, both remote and wearable. At the same time, novel techniques are continually pursued to enhance the precision of gaze detection, and new methods are continuously explored to fully leverage the potential of eye data. Regardless of the considered context (human-computer interaction, user behavior understanding, biometrics, or others), pattern recognition often plays a significant role. The purpose of the ETTAC2026 workshop is to present recent eye tracking research that directly or indirectly exploits any form of pattern recognition.

The event’s scope includes (but is not limited to) the following topics:

Gaze detection techniques
Gaze-based human-computer interaction (e.g., assistive technologies and hybrid interfaces)
Eye tracking and AI integration (e.g., combination of gaze and LLMs)
Eye tracking for usability inspection
Eye tracking in VR/AR applications
User behavior understanding from eye data
Eye movement analysis for biometrics, security, and privacy
Eye tracking in medicine and health care
Machine/deep learning for gaze analysis

Workshop Website: tba

Second International Workshop on Multi- and Cross-Modal Information for Enhanced Pattern Recognition (MCMI) WS17

Overview

Modern AI systems combine information from multiple sources (such as images, audio, and text) to enable better pattern recognition and understanding. MCMI brings together researchers from various fields, including audio processing, computer vision, and natural language processing, to share their work and ideas.

Topics:

Technological Advancements in Multimodal Systems:

Multimodal Fusion Strategies
Deep Learning Architectures for Multimodal Fusion
Cross-Modal Feature Extraction and Learning
Audio-Visual Speech Recognition
Scene and Object Recognition from Audio-Visual Cues
Temporal Dynamics in Multimodal Data Processing
Robustness and Generalization across Diverse Modalities

Applications and Interaction in Multimodal Systems:

Emotion Recognition in Multimodal Data
Natural Language Processing for Multimedia Content
Multimodal Datasets and Benchmarks
Novel Interaction Paradigms for Multimodal Data Acquisition

Security, Ethics, and Transparency in Multimodal Systems:

Ethical Considerations in Multimodal Data Processing
Privacy-Preserving Multimodal Learning
Explainable AI (XAI) in Multimodal Systems
Security and Robustness in Multimodal Systems
Adversarial Attacks and Defense in Multimodal Systems

Workshop Website: available here

Fourth workshop on Explainable and Ethical (AIXAIE@ICPR2026) WS18

Overview

The topics covered by the workshop are naturally explainable AI methods, post-hoc explanation methods for deep neural networks, including transformers and generative artificial intelligence, and any ethical considerations arising when using pattern recognition models.

Technical issues in explainability relate to the creation of explanations, their representation, and the quantification of their confidence. Those in AI ethics include automated audits, detection of bias, and the ability to control AI systems to prevent harm. The workshop also covers other methods to improve AI explainability in general, including algorithms and evaluation methods, user interfaces and visualization for achieving more explainable and ethical AI, and real-world applications and case studies.

Workshop Website: available here

Trustworthy Document Understanding: Privacy, Unlearning, Robustness, and Explainability (TrustDoc) WS19

Overview

The workshop welcomes contributions on topics including (but not limited to):

Machine unlearning in document AI

unlearning for document image classification
unlearning for handwritten text recognition
unlearning for document visual question answering
unlearning for multimodal document understanding

Robustness in document image recognition systems

adversarial attacks & defenses
robustness to data noise, layout variation, and handwriting styles

Privacy in document image understanding

differential privacy for document image datasets
membership inference and training data leakage prevention
privacy-preserving OCR and document representation learning

Explainability and interpretability

explaining document model decisions at pixel, region, and semantic levels
visualization tools for understanding features learned from document images

Evaluation, benchmarks, and best practices

metrics and datasets for assessing unlearning, privacy, robustness, and explainability
standardized evaluation pipelines for trustworthy document intelligence

Applications and case studies

legal, financial, medical, and government documents
lifelong learning and compliance-driven document AI

Workshop Website: available here

Computer Vision for Biodiversity Monitoring and Conservation (CVBMC) WS20

Overview

The Workshop on Computer Vision for Biodiversity Monitoring and Conservation (CVBMC) aims to bridge the gap between advanced pattern recognition and ecological preservation. While computer vision has reached maturity in industrial and urban domains, its application to the natural world presents unique, high-dimensional challenges that require specialized technical approaches. The workshop will explore the deployment of state-of-the-art deep learning and AI methodologies to automate the extraction of biological information from unstructured visual datasets.

The subject is highly relevant to Pattern Recognition because biodiversity monitoring requires solving complex problems such as fine-grained visual categorization for species identification, often under conditions of high occlusion, varying illumination, and camouflage. Furthermore, several tasks in this domain (e.g., long-term population monitoring, study of behavioral changes) require handling of temporal data, thus increasing the dimensionality of the problem and calling for temporal-aware visual modeling.

Topics to be addressed include:

animal and plant species identification
organism tracking and movement analysis
land-cover mapping, deforestation, and habitat monitoring
classification of different organisms (e.g., by subspecies)
assessment of organism behavior or behavior changes
computer vision tools for ecological assessment
counting and biodiversity monitoring
analysis of terrestrial and underwater wildlife
ecosystems and conservation case studies

Workshop Website: available here

Multimodal pattern recognition for social signal processing in human computer interaction (MPRSS 2026) WS21

Overview

The workshop will provide participants with foundational and applied skills for developing advanced AI agents, including preparing diverse data types to be neural-network ready, understanding model fusion techniques and the distinctions between early, late, and intermediate fusion, and performing PDF data extraction using OCR. Participants will also learn to differentiate between modality orchestration and agent orchestration and gain hands-on experience customizing NVIDIA AI Blueprints—particularly the Video Search and Summarization (VSS) blueprint—to design and deploy powerful multimodal AI agents.

Workshop Website: available here

Advances in Underwater Surveillance: Technologies, Challenges, and Future Directions (AUSTech 2.0) WS22

Overview

The workshop focuses on recent developments in underwater surveillance, including underwater imaging, multimodal sensor fusion, acoustic–optical perception, generative data enhancement, graph-based reasoning, and autonomous underwater vehicles.

Topics include:

Underwater image enhancement, dehazing, and restoration
Object detection, segmentation, and tracking in underwater scenes
Sensor fusion using optical, sonar, and LiDAR data
AI-driven underwater communication and networking
Learning-based AUV navigation and marine robotics
Applications in environmental monitoring, Blue Economy, maritime security, and industrial inspection

Workshop Website: available here

Workshop on Textual Information Processing & Synthesis in the Wild (TIPS) WS23

Overview

In text analysis and synthesis, recent research topics such as text editing in images and videos, vision-language models, and text style transfer are receiving special attention. This workshop addresses the processing and synthesis of textual information, whether it appears as plain text or within images and videos.

Topics of interest include, but are not limited to:

Text editing in images and videos
Text style transfer
Vision-language models
Sentiment analysis from text
Personality traits detection
Natural Language Processing
Multimodal document understanding
Stylistic text recognition
Font analysis and synthesis
Detection of synthetic manipulation in documents, and document forensics
Analysis and interpretation of graphical documents
Recognition and analysis of low-resource languages
Complex handwriting recognition
Historical document analysis
Text summarization and translation
Language models for document information extraction
Scene and video text detection and recognition
NLP+Vision multimodal approaches

Workshop Website: tba

Workshop on Human Gaze and Visual Attention Modeling (HGVA) WS24

Overview

Human attention modeling, human gaze modeling, and eye tracking have been key areas of focus in computer vision and pattern recognition research over the past decade. These dynamic fields have evolved from various perspectives, shaped by the diverse expertise and objectives of researchers. Despite recent advancements in technologies such as large vision and language models, these innovations have yet to be fully integrated into current research. Additionally, while the potential applications of gaze estimation, attention prediction, and eye tracking across images, videos, and audio are vast, there remains a lack of groundbreaking approaches that push the boundaries of the field.

The Workshop on Human Gaze and Visual Attention Modeling aims to address this gap by providing a platform for introducing novel ideas and methodologies in human attention modeling, gaze estimation, and eye tracking. The workshop will also explore the intersection of these fields with emerging domains, such as human-computer interaction (HCI), robotics, autonomous systems, and medical image analysis. By highlighting the broader, yet still underexplored, opportunities for gaze, human attention, and eye tracking research in real-world applications, we aim to spark new interdisciplinary collaborations.

Through this workshop, we aim to encourage forward-looking discussions that move beyond conventional computer vision applications and address meaningful real-world challenges. Our goal is to inspire the next wave of research and broaden the impact of human visual attention, eye-tracking, and gaze-based methods across diverse industries and domains.

Main topics that will be covered in the workshop:

Computational Modeling of Human Visual Attention & Deep Learning for Visual Saliency
Advances in Eye-Tracking Technologies and Applications
Scanpath Prediction and Temporal Dynamics of Gaze in Video Understanding
Active Vision and Real-World Applications
Human Visual Perception for Computer Vision & Cognitive Modeling
Benchmarking, Evaluation, and Privacy-Preserving Methods for Gaze Analysis
Applications of Human Visual Attention in Medical Imaging & Vision-Based Interaction
Robotics and HCI: Eye-Gaze Interaction and Gaze-Based Interfaces

Workshop Website: available here

Reproducible Research in Pattern Recognition (RRPR) WS25

Overview

The topic is Reproducible Research (RR) in direct relation to the Pattern Recognition domains, corresponding to most of the ICPR topics where research results rely on algorithms and code implementation. This workshop is an important activity of the newly created TC-22, specifically on computational reproducibility. As in all previous editions, RRPR 2026 is intended as both a short participative course on computational reproducibility, leading to open discussions with the participants, and also as a practical workshop on how to actually practice RR. In addition, another key goal for gathering the research community is to further advance the scientific aspects of reproducibility specifically in pattern recognition research. The call for papers includes three main tracks: (1) RR frameworks (including experiences, frameworks, or complete platforms), (2) RR results focusing on the quality of the reproducible research results, and (3) short papers. RRPR short papers are tailored to allow authors the extra space needed to document the steps they have taken to make their regular ICPR submission more reproducible. Sometimes this valuable information is not made available in conference papers, but doing so certainly grows awareness and increases visibility for reproducibility as a core aspect of pattern recognition research. RRPR is also an excellent forum for ICPR authors to share and discuss best practices in reproducibility, thereby advancing the field.

Reproducibility is an important topic in general, and it is particularly important for PhD students and young researchers to learn best practices in research, covering not only the scientific paper but also the source code and the data. The special track is also related to the Deep Learning and Geometry fields, where reproducibility and reliability are key points. The RRPR workshop has special relevance in ICPR 2026 since this time the conference will grant a Reproducibility Badge to authors, to recognize and celebrate their effort towards trustworthy reproducible research, transparency, and reliable science. A selection of papers will be presented in RRPR, and all ICPR attendees are welcome to participate in this workshop, for which we expect substantial discussions around computational reproducibility.

Workshop Website: available here

International Workshop on coMics ANalysis, Processing and Understanding (MANPU) WS26

Overview

This workshop deals with research activities about comics. The MANPU workshop targets researchers in image processing, image analysis, pattern recognition, and even knowledge representation. Indeed, the large variability of comic books and the complexity of their contents make comics analysis an advanced problem in document image understanding or, more generally, in knowledge-based scene recognition.

Topics of interest include, among others:

Comics Image Processing
Comics Analysis and Understanding
Comics Recognition
Comics Retrieval and Spotting
Comics Enrichment
Born-digital comics
Reading Behavior Analysis
Comics Generation
Copy Protection – Fraud Detection
Physical/Digital Comics Interfaces

Workshop Website: tba

Sustainable PR & PR for Environment (GREEN-PR) WS27

Overview

GREEN-PR aims to bring together researchers working on more sustainable approaches to pattern recognition and/or on recent pattern recognition contributions to environmental applications.

Topics (non-exhaustive list):

Low-complexity and more energy-efficient PR approaches: lightweight models, sparse and modular representations, edge and federated learning, etc.
Development of pattern recognition for environmental applications: remote sensing and earth observation, climate and environmental monitoring, smart cities, wildlife and ecosystem models, agriculture, etc.
Efficient data-driven learning approaches: transfer learning, distillation, active learning, etc.
Lifecycle and sustainable AI frameworks, economic assessment, carbon footprint measurement, and best practices for PR and AI developments.

Workshop Website: available here

Image Mining. Theory and Applications (IMTAX-2026) WS28

Overview

In the broadest sense, the main subject of the workshop is the assessment and discussion of the current state of the mathematical theory of image analysis and the prospects for its development, as well as its application to particularly important, difficult, and socially significant applied problems.

The main purposes of the IMTA workshops are:

1) to survey and discuss the state of the art in the mathematical foundations of image mining;
2) to bring together modern fundamental mathematical approaches and techniques for image analysis and pattern recognition with the needs of applications.

This workshop is intended to cover, but is not limited to, the following topics:

Methodological and mathematical advances in image analysis and pattern recognition with a special focus on:
New Mathematical Techniques in Image-Mining
Image Models, Representations and Features
Automation of Image- and Data-Mining
Artificial Intelligence Techniques in Image-Mining
Applied Problems of Image-Mining

Workshop Website: available here

Pattern Recognition Challenges in Infectious Disease (PRCID) WS29

Overview

Bacterial infections are the second-leading cause of death globally. Antimicrobial resistance threatens the effectiveness of modern healthcare and is recognized by the WHO as one of the top global public health threats. Infections also lead to significant economic costs, further motivating the urgent need to develop novel techniques for monitoring spread, to develop accessible and accurate diagnostics, and to track information spread and societal impact. These urgent challenges all require different forms of pattern recognition. We envision a workshop covering three broad topics: (i) diagnostics based on biomedical image and video analysis, (ii) bioinformatics as a tool for surveillance and prediction of spread, and (iii) artificial intelligence for analysis of communication strategies and societal response.

Each topic has its own central technical pattern recognition challenges. For (i), examples include: DL for highly skewed datasets, rare case and anomaly detection, explainable AI, low computational demand networks for point-of-care diagnostics, real-time video analysis of pathogen growth, and domain-adapted evaluation approaches. For (ii), examples include: genomic sequence pattern analysis for resistance and emerging pathogen detection, GNNs for modeling transmission networks and co-occurrence networks for tracing resistance genes across populations, time series analysis of genomic data for modeling spread and mutations, and forecasting outbreaks. For (iii), examples include: NLP for detection of misinformation patterns, information diffusion modeling, behavioural response analysis, and multi-modal AI for understanding how communication strategies affect the societal response in emergency situations.

Workshop Website: tba

Workshop on Generative and Agentic AI for Real-World Video Understanding (GenAAI 2026) WS30

Overview

Key Workshop Highlights – GenAAI-2026

1. Generative Video Models as Autonomous Reasoning Engines: This theme examines how diffusion-based video models, 3D-aware world models, and large multimodal transformers can serve as cognitive backbones for agentic systems. Participants will explore how generative priors improve temporal coherence, fill in missing frames, simulate potential outcomes, and support high-level reasoning over complex dynamic scenes. By integrating planning modules and feedback loops, generative agents can autonomously interpret human activities, detect anomalies, and propose future scene evolutions.

2. Agentic AI for Real-World Environments: Agentic AI introduces capabilities such as autonomous task decomposition, context-aware decision-making, and self-optimization. This segment focuses on how these properties enhance video understanding in challenging real-world conditions—crowded surveillance scenes, variable lighting, occlusions, abrupt motion patterns, and multi-agent interactions. Discussions will center on vision-language-action models, tool-use agents, embodied decision systems, and reinforcement learning frameworks designed to operate in unconstrained environments.

3. Spatiotemporal Representation Learning & Predictive Modeling: High-quality video understanding depends on structured spatiotemporal representations. This section explores world models, graph neural video architectures, motion-aware transformers, and predictive generative models that anticipate future states of complex scenes. Use cases include trajectory prediction, pedestrian intent estimation, environmental hazard forecasting, and robotics navigation. The session will also investigate multimodal fusion across video, LiDAR, IMU, and audio, enabling agents to form a holistic perception of real-world environments.

We invite submissions and participation under the theme “GenAAI 2026”, including but not limited to:

Agentic AI for Video Reasoning and Decision-Making
Video Models and Spatiotemporal Generative Representations
Vision-Language-Action Agents
Generative Trajectory Prediction and Scene Forecasting
Real-World Video Surveillance, Crowd Analytics, and Behavior Understanding
Video Anomaly Detection Using Generative Priors
Self-Supervised and Foundation Models for Long Video Sequences
Generative Augmentation, Reconstruction, and Inpainting in Videos
Few-shot and Zero-shot Video Understanding
Edge and Real-Time Deployment of Generative Agents

Workshop Website: tba