Agentic Multimodal Framework for Adaptive Sign Language Translation

Authors

  • Mohsin Sami, Department of Computer Science, University of Central Punjab, Lahore, Pakistan.
  • Saira Andleeb Gillani, Department of Computer Science, University of Central Punjab, Lahore, Pakistan.
  • Kashif Nasr, Department of Computer Science, University of Central Punjab, Lahore, Pakistan.
  • Rabia Tehseen, Department of Computer Science, University of Central Punjab, Lahore, Pakistan.

Keywords

Agentic AI, Multimodal Learning, Sign Language Translation, Adaptive Systems, Feedback Loops, Uncertainty-Aware Routing, Human-Centered Artificial Intelligence

Abstract

Sign Language Translation (SLT) is challenging because human communication is multimodal and context-dependent. Fixed-pipeline approaches to SLT perform poorly because they do not account for differences among signers, varying lighting conditions, and linguistic variation. This paper presents the Agentic Multimodal Framework for Adaptive Sign Language Translation (AMF-ASLT), a self-adjusting architecture that incorporates agentic principles into multimodal translation. The framework consists of a Perception Layer for feature extraction from RGB, depth, pose, and facial modalities; an Agentic Reasoning Layer with Gestural, Facial, and Linguistic Agents that cooperate to maintain a shared Belief State; and a Translation Fusion Layer that combines modalities dynamically through adaptive weighted averaging and uncertainty-driven routing. A Meta-Controller manages continuous feedback loops, allowing the system to improve autonomously and to adapt through intrinsic signals and extrinsic feedback from the user. Experiments on the RWTH-PHOENIX-Weather 2014T, How2Sign, and WLASL datasets demonstrate signer adaptability, with improvements over the previous best of 4.4 BLEU points and a 12% reduction in WER. These gains reflect the combination of signer adaptability, agentic self-evaluation, and feedback-driven refinement, which together enhance translation robustness and contextual understanding. AMF-ASLT thus establishes a scalable foundation for human-centered, continuously learning sign language translation systems.
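
The adaptive weighted averaging and uncertainty-driven routing described in the abstract can be made concrete with a small sketch. The Python fragment below is illustrative only, assuming each agent exposes a fixed-size embedding and a normalized uncertainty score; the function name fuse_agent_beliefs, the route_threshold parameter, and the inverse-uncertainty weighting are assumptions for exposition, not the authors' implementation.

    import numpy as np

    def fuse_agent_beliefs(embeddings, uncertainties, route_threshold=0.8):
        """Fuse per-agent embeddings by inverse-uncertainty weighting.

        embeddings:    list of (d,) arrays from the Gestural, Facial,
                       and Linguistic Agents
        uncertainties: per-agent uncertainty scores in [0, 1], e.g.
                       normalized predictive entropy
        Returns the fused (d,) embedding, or None when every agent is
        too uncertain and the segment should be routed for feedback.
        """
        u = np.asarray(uncertainties, dtype=float)

        # Uncertainty-driven routing: if no agent is confident, defer to
        # the Meta-Controller / user instead of emitting a weak guess.
        if np.all(u > route_threshold):
            return None

        # Adaptive weighted averaging: weights fall as uncertainty rises.
        weights = (1.0 - u) / np.sum(1.0 - u)
        return weights @ np.stack(embeddings)  # (n,) @ (n, d) -> (d,)

    # Example: the facial agent is unsure (0.9), so the gestural and
    # linguistic agents dominate the fused belief.
    rng = np.random.default_rng(0)
    gest, face, ling = (rng.standard_normal(256) for _ in range(3))
    fused = fuse_agent_beliefs([gest, face, ling], [0.2, 0.9, 0.3])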

Published

2025-09-01

How to Cite

Mohsin Sami, Saira Andleeb Gillani, Kashif Nasr, & Rabia Tehseen. (2025). Agentic Multimodal Framework for Adaptive Sign Language Translation. Journal of Computing & Biomedical Informatics, 9(02). Retrieved from https://jcbi.org/index.php/Main/article/view/1117