Enriching audio databases with temporal information hidden in the acoustic signal

Nehory Carmi, Anat Lerner, Vered Silber-Varod

Research output: Contribution to conference › Abstract › peer-review

Abstract

Audio information retrieval is an emerging field, as users' needs for accurate search results are growing (Pradhan & Sharma, 2018) and as Media Assets Management (MAM) systems contend with massive amounts of audio and video excerpts that need to be indexed. Previous research investigated the use of automatic speech recognition engines for search engine optimization (Silber-Varod, Winer, & Geri, 2016). However, conventional search engines cannot harness the potentially varied temporal information in videos, such as that found in documentaries. We believe existing techniques can be used to automatically tag massive amounts of cultural heritage videos that have no (accurate) metadata concerning their temporal information. In this feasibility study, we investigate the possibility of "carbon-dating" a video (i.e., accurately estimating its period) based on its acoustic signal, and thereby offering an automatic enrichment of video databases beyond the traditional manual metadata. For this purpose, we use a collection of web media that combines present-day broadcasting with archive excerpts. For the current proof-of-concept study, we chose broadcasts from 2010 that combine speech from 2010 with archive sections (with varied sounds, not necessarily speech) ranging from 1967 to 1973. The procedure included extraction of acoustic features and clustering by k-means methods, which are commonly used in speaker recognition technology (Giannakopoulos and Pikrakis, 2014). We then compared the automatic results to a manual segmentation of the video into "new" or "old" sections. We succeeded in obtaining a classification for 96% of the video sections, using an automatic technique of parsing the videos according to the two periods.
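As an illustration only, the general shape of such a procedure (frame-level acoustic features, two-cluster k-means, comparison with a manual "new"/"old" segmentation) can be sketched as follows. The abstract does not state which features or which k-means implementation were used, so everything here is an assumption: the two features (zero-crossing rate and log energy), the hand-written `kmeans2` helper, the sample rate and frame length, and the synthetic noise that stands in for real broadcast audio. The agreement score it prints says nothing about the 96% figure reported above.

```python
import math
import random

SAMPLE_RATE = 8000   # Hz (illustrative assumption)
FRAME_LEN = 400      # samples per frame, i.e. 50 ms (illustrative assumption)

def synth_section(kind, n, rng):
    """Synthetic stand-in for audio: 'new' = wideband noise, 'old' = band-limited noise."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n + 7)]
    if kind == "new":
        return noise[:n]
    # An 8-tap moving average acts as a crude low-pass filter.
    return [sum(noise[i:i + 8]) / 8.0 for i in range(n)]

def frame_features(frame):
    """Return (zero-crossing rate, log energy) for one frame."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(frame) - 1)
    energy = sum(x * x for x in frame) / len(frame)
    return (zcr, math.log(energy + 1e-12))

def kmeans2(points, iterations=20):
    """Plain two-cluster k-means with a deterministic farthest-point start."""
    first = points[0]
    second = max(points, key=lambda p: math.dist(p, first))
    centroids = [first, second]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each frame joins its nearest centroid.
        labels = [min((0, 1), key=lambda k: math.dist(p, centroids[k])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def agreement(pred, truth):
    """Fraction of matching frames, allowing for the arbitrary cluster numbering."""
    same = sum(1 for p, t in zip(pred, truth) if p == t) / len(truth)
    return max(same, 1.0 - same)

rng = random.Random(0)
layout = ["new", "old", "new", "old"]   # hypothetical manual segmentation
signal, manual = [], []
for kind in layout:
    signal += synth_section(kind, SAMPLE_RATE, rng)
    manual += [0 if kind == "new" else 1] * (SAMPLE_RATE // FRAME_LEN)

frames = [signal[i:i + FRAME_LEN] for i in range(0, len(signal), FRAME_LEN)]
features = [frame_features(f) for f in frames]
clusters = kmeans2(features)
print(f"agreement with manual labels: {agreement(clusters, manual):.2%}")
```

With real recordings, the synthetic sections would be replaced by decoded audio and the two toy features by a richer feature set; the clustering and the comparison against manual labels would keep the same structure.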
Original language: American English
Pages: 29
Number of pages: 1
State: Published - 2019
Event: Knowledge Management Conference 2019 - Faculty of Economic Sciences, Warsaw University of Life Sciences (SGGW), Warsaw, Poland
Duration: 26 Jun 2019 to 29 Jun 2019

Conference

Conference: Knowledge Management Conference 2019
Abbreviated title: KM Conference 2019
Country/Territory: Poland
City: Warsaw
Period: 26/06/19 to 29/06/19
