因為後綴數組可以由後綴樹進行遍歷轉換而來,而構造後綴樹僅需花費線性的時間,所以構造後綴數組的時間可為線性\ord N,但是後綴數組本身就是為了減少構造後綴樹的空間與代碼量而被發明出來的,直接由後綴樹轉換是沒意義的
但是仍然有其他線性構造後綴數組的方法,像是DC3、SA-IS等會在下一篇介紹,這次要講的是比較簡單常用的方法-----倍增法
關於後綴數組的使用說明可以參考《后缀数组——处理字符串的有力工具》
關於倍增法的說明可以參考 演算法筆記-SuffixArray 的部分
這邊提供\ord{N*logN*logN}及\ord{NlogN}的模板
\ord{N*logN*logN}:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#include<algorithm> | |
struct CMP{ | |
int len,k,*rank,a,b; | |
inline bool operator()(int i,int j){ | |
if(rank[i]!=rank[j])return rank[i]<rank[j]; | |
a=(i+=k)<len?rank[i]:-1; | |
b=(j+=k)<len?rank[j]:-1; | |
return a<b; | |
} | |
}; | |
inline void suffix_array(const char *s,int len,int *sa,int *rank,int *tmp){ | |
for(int i=0;i<len;++i){ | |
sa[i]=i; | |
rank[i]=s[i]; | |
} | |
CMP cmp={len,1}; | |
for(;;cmp.k<<=1){ | |
cmp.rank=rank; | |
std::sort(sa,sa+len,cmp); | |
tmp[sa[0]]=0; | |
for(int i=1;i<len;++i)tmp[sa[i]]=tmp[sa[i-1]]+cmp(sa[i-1],sa[i]); | |
if(tmp[sa[len-1]]==len-1)break; | |
std::swap(rank,tmp); | |
} | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#include<algorithm> | |
#define radix_sort(x,y){\ | |
for(i=0;i<A;++i)c[i]=0;\ | |
for(i=0;i<len;++i)c[x[y[i]]]++;\ | |
for(i=1;i<A;++i)c[i]+=c[i-1];\ | |
for(i=len-1;i>=0;--i)sa[--c[x[y[i]]]]=y[i];\ | |
} | |
#define sac(r,a,b) r[a]!=r[b]||a+k>=len||r[a+k]!=r[b+k] | |
inline void suffix_array(const char *s,int len,int *sa,int *rank,int *tmp,int *c){ | |
int A='z'+1,i,k,id=0; | |
for(i=0;i<len;++i)rank[tmp[i]=i]=s[i]; | |
radix_sort(rank,tmp); | |
for(k=1;id<len-1;k<<=1){ | |
for(id=0,i=len-k;i<len;++i)tmp[id++]=i; | |
for(i=0;i<len;++i)if(sa[i]>=k)tmp[id++]=sa[i]-k; | |
radix_sort(rank,tmp); | |
std::swap(rank,tmp); | |
for(rank[sa[0]]=id=0,i=1;i<len;++i) | |
rank[sa[i]]=id+=sac(tmp,sa[i-1],sa[i]); | |
A=id+1; | |
} | |
} | |
#undef radix_sort | |
#undef sac |
所需的陣列長度只需要與字串陣列相同即可
當然N*logN*logN的做法會比較值觀,NlogN的方法則是利用radix_sort進行的,radix_sort本來在倍增的時候要先排序第一關鍵字跟第二關鍵字,但是第二關鍵字排序的結果可以用已經求好的SA直接求出來
對radix_sort還不了解的人請看這個網頁:https://www.cs.usfca.edu/~galles/visualization/RadixSort.html
如果想對Suffix Array算法進行測試請使用這個Online Judge:
http://www.spoj.com/problems/SARRAY/
沒有留言:
張貼留言