Can automatic speech recognition be satisficing for audio/video search? Keyword-focused analysis of Hebrew automatic and manual transcription

Vered Silber-Varod, Nitza Geri

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

With massive amounts of academic audio and video content available on the web, it is important to assess how well state-of-the-art automatic speech recognition (ASR) systems support audio/video navigation through search queries. This paper suggests a novel perspective on the challenges of ASR: instead of minimizing word error rates (WER), focus on keyword recognition. Focusing on keywords may be worthwhile for under-resourced languages, such as Hebrew, whose ASR systems have not yet reached a satisfactory level of transcription accuracy. We provide an initial proof of concept by demonstrating that ASR can feasibly deliver affordable mass transcription that enables satisficing keyword recognition of a video or audio lecture via a search engine. A forty-minute recording set, which includes audio books and academic lectures, is used to measure the performance of two Hebrew ASR systems and to compare them with stenographers' records of the video lectures, while focusing on keyword recognition. Keyness tests show an advantage of keyword recognition over key-phrase results, and the stenographers' records outperform both engines. Yet keyword recognition of up to 78% was achieved, which suggests that ASR has reached a satisficing accuracy level that enables its use for searching audio/video content on the web.
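The contrast between WER and keyword recognition drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation procedure (the paper relies on keyness tests); the function names, the whitespace tokenization, the exact-match keyword criterion, and the English sample sentences are all assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def keyword_recall(keywords: list[str], hypothesis: str) -> float:
    """Fraction of the given keywords that appear verbatim in the hypothesis."""
    if not keywords:
        raise ValueError("keywords must not be empty")
    hyp_words = set(hypothesis.split())
    found = sum(1 for k in keywords if k in hyp_words)
    return found / len(keywords)


# Hypothetical example: errors fall on function words and inflections,
# while the words a user would search for are transcribed correctly.
reference = "the lecture covers speech recognition for hebrew search"
hypothesis = "a lecture cover speech recognition in hebrew search"
keywords = ["speech", "recognition", "hebrew", "search"]

wer = word_error_rate(reference, hypothesis)
recall = keyword_recall(keywords, hypothesis)
```

In this made-up case three of the eight reference words are substituted, yet every keyword survives, so a transcript with a noticeable WER can still be fully searchable by keyword.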
Original language: American English
Pages (from-to): 104-121
Number of pages: 18
Journal: Online Journal of Applied Knowledge Management
Volume: 2
Issue number: 1
Publication status: Published - 2014
