まあ、これはちょっと醜いですが、うまくいきます。これは汎用 SQL であり、あらゆる環境で機能します。読み取り中のフィールドの最大長を超える部分文字列の選択を生成するだけです。関数内の数値 50 をフィールド長を超える数値に変更します。非常に長いクエリが返される場合がありますが、前述のとおり、問題なく動作します。Python での例を次に示します。
import sqlite3
c = sqlite3.connect('test.db')
c.execute('create table myTable (id integer, content varchar[50])')
for id, content in ((1,'apple'),(2,'pineapple'),(3,'application'),(4,'nation')):
c.execute('insert into myTable values (?,?)', [id,content])
c.commit();
def GenerateSQL(substrSize):
subqueries = ["select substr(content,%i,%i) AS substr, count(*) AS myCount from myTable where length(substr(content,%i,%i))=%i group by substr(content,%i,%i) " % (i,substrSize,i,substrSize,substrSize,i,substrSize) for i in range(50)]
sql = 'select substr FROM \n\t(' + '\n\tunion all '.join(subqueries) + ') \nGROUP BY substr HAVING sum(myCount) > 1'
return sql
print GenerateSQL(3)
print c.execute(GenerateSQL(3)).fetchall()
生成されたクエリは次のようになります。
select substr FROM
(select substr(content,0,3) AS substr, count(*) AS myCount from myTable where length(substr(content,0,3))=3 group by substr(content,0,3)
union all select substr(content,1,3) AS substr, count(*) AS myCount from myTable where length(substr(content,1,3))=3 group by substr(content,1,3)
union all select substr(content,2,3) AS substr, count(*) AS myCount from myTable where length(substr(content,2,3))=3 group by substr(content,2,3)
union all select substr(content,3,3) AS substr, count(*) AS myCount from myTable where length(substr(content,3,3))=3 group by substr(content,3,3)
union all select substr(content,4,3) AS substr, count(*) AS myCount from myTable where length(substr(content,4,3))=3 group by substr(content,4,3)
... )
GROUP BY substr HAVING sum(myCount) > 1
そして、それが生み出す結果は次のとおりです。
[(u'app',), (u'ati',), (u'ion',), (u'nat',), (u'pin',), (u'ple',), (u'ppl',), (u'tio',)]