Task Grouping for Multilingual Text Recognition

Jing Huang, Kevin J. Liang, Rama Kovvuri, Tal Hassner

Research output: Chapter in book / report / conference proceeding; Conference contribution; Peer-reviewed

Abstract

Most existing OCR methods focus on alphanumeric characters due to the popularity of English and numbers, as well as their corresponding datasets. On extending the characters to more languages, recent methods have shown that training different scripts with different recognition heads can greatly improve the end-to-end recognition accuracy compared to combining characters from all languages in the same recognition head. However, we postulate that similarities between some languages could allow sharing of model parameters and benefit from joint training. Determining language groupings, however, is not immediately obvious. To this end, we propose an automatic method for multilingual text recognition with a task grouping and assignment module using Gumbel-Softmax, introducing a task grouping loss and weighted recognition loss to allow for simultaneous training of the models and grouping modules. Experiments on MLT19 lend evidence to our hypothesis that there is a middle ground between combining every task together and separating every task that achieves a better configuration of task grouping/separation.
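
The abstract names Gumbel-Softmax as the mechanism behind the task grouping and assignment module but gives no details. The following is only a minimal sketch of Gumbel-Softmax sampling applied to assigning languages to recognition heads, not the authors' implementation. The function names (`gumbel_softmax`, `assign_tasks`), the example logits, the number of heads, and the temperature `tau` are all illustrative assumptions. The task grouping loss and weighted recognition loss mentioned in the abstract are not reproduced here.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw a relaxed one-hot sample from the categorical distribution given by logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1); clamp U away from 0.
        u = max(rng.random(), 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def assign_tasks(task_logits, tau=1.0, seed=0):
    """Sample a soft head assignment per task and take its argmax as the hard assignment."""
    rng = random.Random(seed)
    soft = {task: gumbel_softmax(logits, tau, rng) for task, logits in task_logits.items()}
    hard = {task: max(range(len(p)), key=p.__getitem__) for task, p in soft.items()}
    return soft, hard


# Hypothetical assignment logits: three scripts (tasks) and two recognition heads.
task_logits = {"Latin": [2.0, -1.0], "Arabic": [-1.0, 2.0], "Hindi": [0.0, 0.0]}
soft, hard = assign_tasks(task_logits, tau=0.5)
for task in task_logits:
    print(task, [round(p, 3) for p in soft[task]], "-> head", hard[task])
```

In a trainable setting the logits would be learnable parameters in an automatic-differentiation framework, so that gradients flow through the relaxed sample; the plain-Python version above only illustrates the sampling step. Lower values of `tau` push each sample closer to a one-hot assignment.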

Original language: English
Title of host publication: Computer Vision – ECCV 2022 Workshops, Proceedings
Editors: Leonid Karlinsky, Tomer Michaeli, Ko Nishino
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 297-313
Number of pages: 17
ISBN (Print): 9783031250682
Digital object identifiers (DOIs):
Publication status: Published - 2023
Event: 17th European Conference on Computer Vision, ECCV 2022 - Tel Aviv, Israel
Duration: 23 Oct 2022 to 27 Oct 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13804 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th European Conference on Computer Vision, ECCV 2022
Country/Territory: Israel
City: Tel Aviv
Period: 23/10/22 to 27/10/22

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
