TY - JOUR
T1 - Generalized substring compression
AU - Keller, Orgad
AU - Kopelowitz, Tsvi
AU - Landau Feibish, Shir
AU - Lewenstein, Moshe
N1 - Copyright 2014 Elsevier B.V., All rights reserved.
PY - 2014/3/13
Y1 - 2014/3/13
AB - In substring compression one is given a text to preprocess so that, upon request, a compressed substring is returned. Generalized substring compression is the same problem with the following twist: the queries contain an additional context substring (or a collection of context substrings), and the answer is the substring in compressed form, where the context substring is used to make the compression more efficient. We focus our attention on generalized substring compression and present the first non-trivial correct algorithm for this problem. Inherent to our algorithm is a new method for finding the bounded longest common prefix of substrings, which may be of independent interest. In addition, we propose an efficient algorithm for substring compression which makes use of range successor queries. We present several tradeoffs for both problems. For compressing the substring S[i..j] (possibly with the substring S[α..β] as a context), the best query times we achieve are O(C) and O(C log((j-i)/C)) for substring compression queries and generalized substring compression queries, respectively, where C is the number of phrases encoded. A preliminary version of this paper has been presented in [21].
KW - Data compression
KW - Lempel-Ziv compression
KW - Range searching
KW - Suffix tree
UR - http://www.scopus.com/inward/record.url?scp=84895918948&partnerID=8YFLogxK
U2 - 10.1016/j.tcs.2013.10.010
DO - 10.1016/j.tcs.2013.10.010
M3 - Article
AN - SCOPUS:84895918948
SN - 0304-3975
VL - 525
SP - 42
EP - 54
JO - Theoretical Computer Science
JF - Theoretical Computer Science
ER -