levenshtein
(PHP 3 >= 3.17, PHP 4)
levenshtein -- Calculate Levenshtein distance between two strings
Description
int levenshtein (string str1, string str2)
int levenshtein (string str1, string str2, int cost_ins, int cost_rep, int cost_del)
int levenshtein (string str1, string str2, function cost)
This function returns the Levenshtein distance between the two argument strings, or -1 if one of the argument strings is longer than the limit of 255 characters (255 should be more than enough for name or dictionary comparison, and nobody serious would be doing genetic analysis with PHP).
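As an illustration of the dictionary comparison mentioned above, the following sketch picks the word closest to a possibly misspelled input. The helper name closest_word() and the word list are made up for this example; the check for a negative result reflects the -1 return value documented here for over-long arguments.

```php
<?php
// Hypothetical helper: return the entry of $words with the smallest
// Levenshtein distance to $input, or null if no entry can be compared.
function closest_word($input, array $words)
{
    $best = null;
    $bestDistance = -1;

    foreach ($words as $word) {
        $distance = levenshtein($input, $word);

        // -1 is documented above for arguments over the length limit.
        if ($distance < 0) {
            continue;
        }

        // Keep the first word seen with the smallest distance so far.
        if ($bestDistance < 0 || $distance < $bestDistance) {
            $best = $word;
            $bestDistance = $distance;
        }
    }

    return $best;
}

echo closest_word('carrrot', ['apple', 'carrot', 'radish']), "\n"; // carrot
```

Because every candidate costs one levenshtein() call, this approach is only reasonable for short word lists.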
The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2. The complexity of the algorithm is O(m*n), where n and m are the lengths of str1 and str2 (rather good when compared to similar_text(), which is O(max(n,m)**3), but still expensive).
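The O(m*n) cost comes from filling a table with one cell per pair of string positions. The following is a minimal userland sketch of that dynamic-programming scheme, not the built-in implementation; the name levenshtein_sketch() is invented for this example, and it compares strings byte by byte.

```php
<?php
// Illustrative re-implementation with unit costs for every operation.
function levenshtein_sketch($str1, $str2)
{
    $m = strlen($str1);
    $n = strlen($str2);

    // $prev[$j]: distance between the first $i - 1 bytes of $str1
    // and the first $j bytes of $str2 (row 0: $j insertions).
    $prev = range(0, $n);

    for ($i = 1; $i <= $m; $i++) {
        // Reaching an empty prefix of $str2 takes $i deletions.
        $curr = [$i];

        for ($j = 1; $j <= $n; $j++) {
            $same = $str1[$i - 1] === $str2[$j - 1];

            $replace = $prev[$j - 1] + ($same ? 0 : 1);
            $delete  = $prev[$j] + 1;
            $insert  = $curr[$j - 1] + 1;

            $curr[$j] = min($replace, $delete, $insert);
        }

        $prev = $curr;
    }

    return $prev[$n];
}

echo levenshtein_sketch('kitten', 'sitting'), "\n"; // 3
```

The two nested loops visit each of the m*n cells once, and only two rows are kept in memory at a time.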
In its simplest form the function will take only the two strings as parameters and will calculate just the number of insert, replace and delete operations needed to transform str1 into str2.
A second variant will take three additional parameters that define the cost of insert, replace and delete operations. This is more general and adaptive than variant one, but not as efficient.
The third variant (which is not implemented yet) will be the most general and adaptive, but also the slowest alternative. It will call a user-supplied function that will determine the cost for every possible operation.
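A short sketch of the first two variants follows. The strings and cost values are arbitrary; the cost arguments are passed in the order given in the signature above (cost_ins, cost_rep, cost_del).

```php
<?php
// Variant one: every operation costs 1.
$plain = levenshtein('cat', 'cut');                 // one replacement

// Variant two: an expensive replacement (7) makes
// "delete a, insert u" (1 + 1) the cheaper route.
$noReplace = levenshtein('cat', 'cut', 1, 7, 1);

// Growing the string needs at least one insertion (cost 5 here).
$insertOnly = levenshtein('abc', 'abcd', 5, 1, 1);

// Shrinking the string needs at least one deletion (cost 9 here).
$deleteOnly = levenshtein('abcd', 'abc', 5, 1, 9);

echo "$plain $noReplace $insertOnly $deleteOnly\n"; // 1 2 5 9
```

With unequal insert and delete costs the result is no longer symmetric: swapping str1 and str2 can change the value.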
The user-supplied function will be called with the following arguments:
operation to apply: 'I', 'R' or 'D'
actual character in string 1
actual character in string 2
position in string 1
position in string 2
remaining characters in string 1
remaining characters in string 2
The user-supplied function has to return a positive integer describing the cost for this particular operation, but it may decide to use only some of the supplied arguments.
The user-supplied function approach offers the possibility to take into account the relevance of and/or difference between certain symbols (characters) or even the context those symbols appear in to determine the cost of insert, replace and delete operations, but at the cost of losing all optimizations done regarding CPU register utilization and cache misses that have been worked into the other two variants.
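Since the third variant is not implemented, the following is only a userland sketch of how such a callback could be wired into the distance calculation. The function name levenshtein_callback_sketch(), the zero-based positions, the empty string passed when one side has no current character, and the rule that identical characters cost nothing are all assumptions made for this example, not documented behaviour.

```php
<?php
// Hypothetical stand-in for the unimplemented callback variant.
// $cost receives: operation, char in string 1, char in string 2,
// position in string 1, position in string 2, remaining characters
// in string 1, remaining characters in string 2.
function levenshtein_callback_sketch($str1, $str2, callable $cost)
{
    $m = strlen($str1);
    $n = strlen($str2);

    // $d[$i][$j]: cheapest way to turn the first $i bytes of $str1
    // into the first $j bytes of $str2.
    $d = [[0]];
    for ($i = 1; $i <= $m; $i++) {
        $d[$i][0] = $d[$i - 1][0]
            + $cost('D', $str1[$i - 1], '', $i - 1, 0, $m - $i, $n);
    }
    for ($j = 1; $j <= $n; $j++) {
        $d[0][$j] = $d[0][$j - 1]
            + $cost('I', '', $str2[$j - 1], 0, $j - 1, $m, $n - $j);
    }

    for ($i = 1; $i <= $m; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            $c1 = $str1[$i - 1];
            $c2 = $str2[$j - 1];
            $args = [$c1, $c2, $i - 1, $j - 1, $m - $i, $n - $j];

            // Assumption: matching characters are free.
            $rep = $d[$i - 1][$j - 1]
                + ($c1 === $c2 ? 0 : $cost('R', ...$args));
            $del = $d[$i - 1][$j] + $cost('D', ...$args);
            $ins = $d[$i][$j - 1] + $cost('I', ...$args);

            $d[$i][$j] = min($rep, $del, $ins);
        }
    }

    return $d[$m][$n];
}

// Example cost function: swapping one vowel for another is cheap (1),
// every other operation costs 2. It ignores the positional arguments.
$vowelAware = function ($op, $c1, $c2, $pos1, $pos2, $rest1, $rest2) {
    $bothVowels = $c1 !== '' && $c2 !== ''
        && strpos('aeiou', $c1) !== false
        && strpos('aeiou', $c2) !== false;

    return ($op === 'R' && $bothVowels) ? 1 : 2;
};

echo levenshtein_callback_sketch('bat', 'bet', $vowelAware), "\n"; // 1
echo levenshtein_callback_sketch('bat', 'cat', $vowelAware), "\n"; // 2
```

Calling a PHP function for every cell of the table is exactly the overhead the text above warns about, so a sketch like this suits short strings only.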
See also soundex(), similar_text(), and metaphone().