WMSM: An Efficient Real-Time Framework for Hebrew Sign Language Recognition and Sentence-Level Translation

Eyal Pasha; Dr. Galit Haim

WMSM: Hebrew Sign Language recognition research with IEEE publication and product follow-through.

A closer look at the paper, the core results, and the path from research into Handibur.

By Eyal Pasha (אייל פשה) and Dr. Galit Haim · Published November 26, 2025 · Last updated April 9, 2026

Publication

IEEE FLLM 2025 | November 26, 2025

Lead-author publication in IEEE FLLM 2025 on an efficient framework for real-time Hebrew Sign Language recognition and sentence-level translation.

WMSM replaces heavy sequence-to-sequence translation with an efficient word-level model and a sentence mechanism designed for real-time Hebrew Sign Language recognition. The architecture shows that x-y hand landmarks can be enough for accurate gesture recognition, while a custom loss function and temporal sentence assembly process keep predictions stable in live video.

Reached 97.29% word-level accuracy, 99.72% Top-3 accuracy, and 21.15 BLEU-4.
Reduced training to 24 GPU-hours, a 99.93% drop in computational cost compared with existing benchmarks.
Expanded 2,342 videos into 252,712 training samples through a three-stage augmentation pipeline.
Presented internationally in Vienna and Israel, then integrated into the Handibur iOS beta.

Context

Why the work stands out beyond the paper itself.

The interesting part is the mix of accuracy, efficiency, and real product follow-through.

Existing datasets for Israeli Sign Language (ISL), referred to elsewhere on this page as Hebrew Sign Language, were extremely limited: only 2,342 source videos covered the target vocabulary. That scarcity made standard supervised training impractical without significant augmentation.

The three-stage augmentation pipeline addressed this by applying spatial transformations, temporal perturbations, and landmark noise injection, expanding the training corpus from 2,342 videos to 252,712 training samples (roughly 108 times the original count). Each stage was designed to preserve the biomechanical plausibility of hand movements while introducing enough variation to prevent overfitting.
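As a rough illustration of how three such stages could compose, the sketch below applies a spatial transform, a temporal perturbation, and landmark noise to one clip of x-y landmarks. The function names, the parameter ranges, and the specific choices (rotation and scaling about the image center, nearest-frame resampling, Gaussian jitter) are assumptions for illustration only; the transformations and settings actually used in WMSM are not described on this page.

```python
import math
import random

# A clip is a list of frames; each frame is a list of (x, y) landmark pairs
# in normalized image coordinates (0.0 to 1.0).

def spatial_transform(clip, rng, max_rot_deg=10.0, max_scale=0.1):
    """Rotate and scale the whole clip about the image center (assumed ranges)."""
    angle = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for frame in clip:
        new_frame = []
        for x, y in frame:
            dx, dy = x - 0.5, y - 0.5
            new_frame.append((0.5 + scale * (dx * cos_a - dy * sin_a),
                              0.5 + scale * (dx * sin_a + dy * cos_a)))
        out.append(new_frame)
    return out

def temporal_perturb(clip, rng, max_speed=0.2):
    """Resample the clip at a random speed using nearest-frame selection."""
    if len(clip) < 2:
        return list(clip)
    speed = 1.0 + rng.uniform(-max_speed, max_speed)
    new_len = max(2, round(len(clip) * speed))
    last = len(clip) - 1
    return [clip[round(i * last / (new_len - 1))] for i in range(new_len)]

def add_landmark_noise(clip, rng, sigma=0.005):
    """Add small Gaussian jitter to every coordinate."""
    return [[(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
             for x, y in frame] for frame in clip]

def augment(clip, seed):
    """Apply the three stages in sequence; the seed makes each variant reproducible."""
    rng = random.Random(seed)
    clip = spatial_transform(clip, rng)
    clip = temporal_perturb(clip, rng)
    return add_landmark_noise(clip, rng)

# Example: ten variants of one synthetic 10-frame clip with 21 landmarks per frame.
source_clip = [[(0.5, 0.5)] * 21 for _ in range(10)]
variants = [augment(source_clip, seed) for seed in range(10)]
```

Keeping the rotation, scale, and noise ranges small is one way to stay within plausible hand poses, which is the constraint the paragraph above describes.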

The key architectural insight is that x-y hand landmarks extracted via MediaPipe Hands carry enough discriminative information for accurate gesture recognition, so full video frame processing is unnecessary. This sharply reduces computational requirements: the entire model trains in 24 GPU-hours, a 99.93% reduction compared with Seq2Seq baselines. A custom loss function, applied during training, and the temporal sentence assembly process then keep output coherent during live video inference, avoiding the fragmentation that standard classification losses introduce when continuous signing is translated into discrete words and sentences.
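To put the 99.93% figure in perspective, the short calculation below works backward from the two reported numbers. It assumes the reduction is measured in GPU-hours against a single baseline, which this page does not state explicitly; `implied_baseline_hours` is an illustrative helper name.

```python
def implied_baseline_hours(reduced_hours, reduction_fraction):
    """Baseline cost implied by a reduced cost and a fractional reduction."""
    return reduced_hours / (1.0 - reduction_fraction)

# 24 GPU-hours is what remains after a 99.93% reduction, i.e. 0.07% of the baseline.
baseline = implied_baseline_hours(24, 0.9993)
print(f"Implied baseline: about {baseline:,.0f} GPU-hours")
```

Under that assumption the comparison baseline would be on the order of 34,000 GPU-hours.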

The research was published at IEEE FLLM 2025 in Vienna on November 26, 2025, and was subsequently presented at academic conferences in Israel. Following publication, the WMSM model was integrated into Handibur, an iOS video chat application delivering real-time Hebrew Sign Language-to-text translation at 40 FPS. Handibur also secured 2nd place in a collegiate Shark Tank competition.

What makes it different

Word-level recognition + sentence mechanism replaces heavy Seq2Seq translation.
Landmark-only input (no full video frames) enables mobile-grade inference.
24 GPU-hours total training — 99.93% cheaper than existing approaches.
Live product deployment in Handibur iOS app at 40 FPS.
2nd place in collegiate Shark Tank competition with Handibur.
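One way to picture how per-frame word predictions might be turned into a stable sentence is a simple debounce: a word is emitted only after it has been the confident top prediction for several consecutive frames. This is a hypothetical illustration, not WMSM's sentence mechanism; `assemble_sentence`, `min_frames`, and `min_conf` are illustrative names and thresholds, and the sample words are placeholders.

```python
def assemble_sentence(frame_preds, min_frames=5, min_conf=0.8):
    """Turn per-frame (word, confidence) predictions into a list of words.

    A word is emitted once it has been the top prediction, at or above
    min_conf, for min_frames consecutive frames. It is emitted only once
    per uninterrupted run.
    """
    sentence = []
    current, run, emitted = None, 0, False
    for word, conf in frame_preds:
        if conf < min_conf:
            # Low-confidence frame: break the current run.
            current, run, emitted = None, 0, False
            continue
        if word == current:
            run += 1
        else:
            current, run, emitted = word, 1, False
        if run >= min_frames and not emitted:
            sentence.append(word)
            emitted = True
    return sentence

# Example stream: brief flickers and low-confidence frames are ignored.
stream = (
    [("I", 0.9)] * 6
    + [("want", 0.5)] * 2      # too uncertain, resets the run
    + [("want", 0.95)] * 5
    + [("I", 0.9)] * 2         # too short to count
    + [("water", 0.9)] * 7
)
print(assemble_sentence(stream))  # ['I', 'want', 'water']
```

Raising `min_frames` trades latency for stability: at 40 FPS, five frames correspond to 125 ms of consistent prediction before a word appears.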

Academic context

Lead author: Eyal Pasha (אייל פשה). Co-author: Dr. Galit Haim.
Research conducted at The College of Management Academic Studies (COLMAN), Rishon LeZion, Israel.
Published at IEEE FLLM 2025, Vienna, Austria (November 26, 2025).
Available on IEEE Xplore, Google Scholar, ResearchGate, and ORCID.

Questions

A few things people usually ask about WMSM.

The short version of what the paper does, what it achieved, and how it carried into product work.

What is WMSM in plain language?

WMSM is a lightweight framework for Hebrew Sign Language recognition and sentence-level translation that was designed to stay accurate while remaining practical enough for real-time deployment.

What are the most important results from the research?

The work reached 97.29% word-level accuracy, 99.72% Top-3 accuracy, and 21.15 BLEU-4 while reducing training costs by 99.93% compared with heavier benchmark approaches.
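For readers unfamiliar with the metric, Top-3 accuracy counts a prediction as correct when the true word is among the model's three highest-scoring candidates. A minimal sketch with made-up scores and labels (not data from the paper):

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true class index is among the k highest scores.

    scores: list of per-class score lists, one per sample.
    labels: list of true class indices, one per sample (must be non-empty).
    """
    hits = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)

scores = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1: correct at top-1
    [0.40, 0.30, 0.20, 0.10],  # true class 2: missed at top-1, found in top-3
    [0.50, 0.30, 0.15, 0.05],  # true class 3: missed even in top-3
    [0.05, 0.05, 0.10, 0.80],  # true class 3: correct at top-1
]
labels = [1, 2, 3, 3]

print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=3))  # 0.75
```

The gap between the two numbers shows how often the right word is a close runner-up, which matters when a later sentence-level step can choose among candidates.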

Why does WMSM use hand landmarks instead of full video frames?

WMSM extracts hand landmarks using MediaPipe Hands rather than processing full video frames. This design choice reduces the input dimensionality by orders of magnitude, which is what makes real-time inference at 40 FPS feasible on mobile hardware. The paper shows that x-y landmark features alone are sufficient for 97.29% word-level accuracy, so full-frame video processing is unnecessary for gesture recognition when hand position data is precise enough. A custom loss function and the temporal sentence assembly process then preserve sentence coherence during live video streaming, avoiding the fragmentation that standard classification losses introduce when continuous sign language is translated into discrete words and sentences.
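To make the dimensionality claim concrete, the sketch below compares a per-frame landmark vector with a raw RGB frame. MediaPipe Hands reports 21 landmarks per detected hand; tracking two hands, zero-padding a missing hand, and the 224x224 comparison resolution are assumptions for illustration, not details taken from the paper.

```python
LANDMARKS_PER_HAND = 21   # MediaPipe Hands reports 21 landmarks per hand
COORDS_PER_LANDMARK = 2   # x and y only
MAX_HANDS = 2             # assumption: both hands are tracked

FEATURES_PER_FRAME = LANDMARKS_PER_HAND * COORDS_PER_LANDMARK * MAX_HANDS  # 84

# Assumed comparison point: one 224x224 RGB frame.
FRAME_VALUES = 224 * 224 * 3  # 150,528

def flatten_hands(hands):
    """Flatten up to MAX_HANDS hands of (x, y) landmarks into one fixed-length
    vector, zero-padding any hand that was not detected."""
    vec = []
    for hand in hands[:MAX_HANDS]:
        if len(hand) != LANDMARKS_PER_HAND:
            raise ValueError("expected 21 landmarks per hand")
        for x, y in hand:
            vec.extend((x, y))
    vec.extend([0.0] * (FEATURES_PER_FRAME - len(vec)))
    return vec

# One detected hand, second hand missing.
one_hand = [[(0.5, 0.5)] * LANDMARKS_PER_HAND]
features = flatten_hands(one_hand)

print(len(features))                       # 84
print(FRAME_VALUES // FEATURES_PER_FRAME)  # 1792
```

Under these assumptions each frame shrinks from 150,528 values to 84, a reduction of about three orders of magnitude.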