Can automatic speech recognition be satisficing for audio/video search? Keyword-focused analysis of Hebrew automatic and manual transcription

Vered Silber-Varod, Nitza Geri

Research output: Contribution to journal › Article › peer-review

Abstract

With massive amounts of academic audio and video content on the web, it is important to assess the performance of state-of-the-art automatic speech recognition (ASR) systems for audio/video navigation through search queries. This paper suggests a novel perspective on the challenges of ASR: instead of minimizing word error rate (WER), focus on keyword recognition. Focusing on keywords may be worthwhile for under-resourced languages, such as Hebrew, whose ASR systems have not yet reached a satisfactory level of transcription accuracy. We provide an initial proof of concept by demonstrating the feasible use of ASR for achieving affordable mass transcription that enables satisficing keyword recognition of a video or an audio lecture via a search engine. A forty-minute recording set, which includes audio books and academic lectures, is used for measuring the performance of two Hebrew ASR systems and comparing them to stenographers' records of the video lectures, while focusing on keyword recognition. Keyness tests show an advantage of keyword recognition over key-phrase results, and stenographers' records exceed both engines. Yet keyword recognition of up to 78% was achieved, which suggests that ASR has reached a satisficing accuracy level that enables its use for searching audio/video content on the web.
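The contrast the abstract draws between word error rate and keyword recognition can be illustrated with a minimal sketch. This is not the authors' procedure (the paper reports keyness tests): the function names `word_error_rate` and `keyword_recall`, the whitespace tokenization, and the English sample sentences are all assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def keyword_recall(keywords: list[str], hypothesis: str) -> float:
    """Fraction of the given keywords that appear in the ASR output."""
    if not keywords:
        raise ValueError("keyword list must not be empty")
    hyp_words = set(hypothesis.split())
    found = sum(1 for keyword in keywords if keyword in hyp_words)
    return found / len(keywords)


# Hypothetical sample data: a manual transcript, an ASR output, and keywords.
reference = "the speech recognition system indexes the lecture"
hypothesis = "a speech recognition system index a lecture"
keywords = ["speech", "recognition", "lecture"]

print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
print(f"Keyword recall: {keyword_recall(keywords, hypothesis):.2f}")
```

In this toy case three of the seven reference words are misrecognized, yet every keyword is still found, which is the kind of gap between transcription accuracy and search usefulness that the abstract describes.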
Original language: American English
Pages (from-to): 104-121
Number of pages: 18
Journal: Online Journal of Applied Knowledge Management
Volume: 2
Issue number: 1
Publication status: Published - 2014
