I have a set of about 100-200 elements. Let a sample element be `X`.
Each element is itself a set of strings (between 1 and 4 strings per set), for example `X = {s1, s2, s3}`.
For a given input string `P` (about 100 characters), I want to test whether any `X` is present in the string.
`X` is present in `P` iff for every `s` in `X`, `s` is a substring of `P`.
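To make the definition concrete, here is a minimal sketch of the straightforward check it describes. The class and method names (`PresenceCheck`, `isPresent`, `anyPresent`) are hypothetical and chosen only for illustration; this is the baseline I would like to beat, not a tuned solution.

```java
import java.util.List;
import java.util.Set;

class PresenceCheck {
    // True iff every string s in x is a substring of p.
    static boolean isPresent(Set<String> x, String p) {
        for (String s : x) {
            if (!p.contains(s)) {
                return false;
            }
        }
        return true;
    }

    // True iff at least one element X of the collection is present in p.
    static boolean anyPresent(List<Set<String>> elements, String p) {
        for (Set<String> x : elements) {
            if (isPresent(x, p)) {
                return true;
            }
        }
        return false;
    }
}
```

With 100-200 elements of at most 4 strings each, this performs at most a few hundred `String.contains` calls per input string.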
The set of elements is available for pre-processing.
I want this to be as fast as possible within Java. Possible approaches which do not fit my requirements:
- Checking whether every string `s` is a substring of `P` seems like a costly operation.
- Because `s` can be any substring of `P` (not necessarily a word), I cannot use a hash of words.
- I cannot directly use a regex, as `s1`, `s2`, `s3` can be present in any order and all of the strings need to be present as substrings.
Right now my approach is to construct one huge regex for each `X`, containing every possible permutation of the order of its strings. Because the number of strings in `X` is <= 4 (at most 24 orderings), this is still feasible. It would be great if somebody could point me to a better (faster or more elegant) approach.
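For reference, the permutation regex can be sketched roughly as follows. This is an illustrative reconstruction under assumptions, not my exact code: the class name `PermutationRegex` and its methods are hypothetical, each string is quoted with `Pattern.quote` so it is matched literally, and the orderings are joined with `.*`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PermutationRegex {
    // Collects every ordering of the strings as "s_a.*s_b.*s_c", each quoted literally.
    static void permute(List<String> remaining, List<String> chosen, List<String> out) {
        if (remaining.isEmpty()) {
            out.add(String.join(".*", chosen));
            return;
        }
        for (int i = 0; i < remaining.size(); i++) {
            List<String> rest = new ArrayList<>(remaining);
            String s = rest.remove(i);
            chosen.add(Pattern.quote(s));
            permute(rest, chosen, out);
            chosen.remove(chosen.size() - 1);
        }
    }

    // Builds one alternation covering all orderings of the strings in x.
    static Pattern build(List<String> x) {
        List<String> alternatives = new ArrayList<>();
        permute(x, new ArrayList<>(), alternatives);
        // DOTALL so that ".*" also spans line breaks in the input.
        return Pattern.compile(String.join("|", alternatives), Pattern.DOTALL);
    }

    // Note: each ordering requires the occurrences not to overlap, so for
    // x = {"ab", "bc"} and p = "abc" this reports false even though both
    // strings are substrings of p.
    static boolean isPresent(Pattern pattern, String p) {
        return pattern.matcher(p).find();
    }
}
```

The patterns can be compiled once during pre-processing, so only `isPresent` runs per input string. One caveat of this encoding is the overlap case noted in the comment, where it disagrees with the substring definition given earlier.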
Please note that the set of elements is available for pre-processing and that I want the solution in Java.