1

Say I have a reference string

"abcdabcd"

and a target string

"abcdabEd"

Is there a simple way in JavaScript and Python to get the string sequence similarity ratio?

Example:

"abcdabcd" differs from "abcdabEd" only by the character "E", so the similarity ratio is high but less than 1.0.

"bcdabcda" differs greatly from "abcdabEd" because the character at every string index is different, so the similarity ratio is 0.0.

Note that the similarity ratio is not how many characters the two strings have in common, but how similar the sequences are to each other.

Therefore, code like

# Python - incorrect for this problem
import difflib
difflib.SequenceMatcher(None, "bcdabcda", "abcdabEd").ratio()

would be wrong.
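To make the difference concrete, here is a minimal sketch comparing `difflib` with a plain position-by-position count on the shifted string above. The helper name `positional_ratio` is made up for illustration, and it assumes both strings are non-empty and have the same length (Python 3 division).

```python
import difflib

def positional_ratio(a, b):
    # Hypothetical helper: fraction of indexes at which both strings
    # hold the same character. Assumes len(a) == len(b) and a is non-empty.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

shifted, target = "bcdabcda", "abcdabEd"

# SequenceMatcher aligns matching blocks wherever they occur, so the shared
# run "bcdab" counts even though it sits at different indexes.
block_ratio = difflib.SequenceMatcher(None, shifted, target).ratio()

# The positional count only rewards equal characters at equal indexes.
index_ratio = positional_ratio(shifted, target)

print(block_ratio)  # well above 0
print(index_ratio)  # 0.0
```

The second number is the kind of ratio I am after.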

4

2 Answers

2

You can use this general formula. It works with strings or arrays of objects, whether their lengths are the same or different:

similarity = #common / sqrt(nx * ny)

where #common is the number of common occurrences (in this case, the number of positions at which both strings have the same character);
nx is the length of the array of objects x (or the string x);
ny is the length of the array of objects y (or the string y).

If both strings have the same length, the formula reduces to the simple case:

similarity = #common / n
where n = nx = ny.
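As a quick check of the formula, here is the arithmetic for the two strings in the question and for a pair of unequal lengths. The pair "abcd" / "abcdab" is an illustrative value I picked, not something from the question.

```python
from math import sqrt

# Equal lengths: "abcdabcd" vs "abcdabEd" match at 7 of 8 positions.
common, nx, ny = 7, 8, 8
equal_length = common / sqrt(nx * ny)    # 7 / 8 = 0.875

# Unequal lengths (illustrative): "abcd" vs "abcdab" match at 4 positions.
common, nx, ny = 4, 4, 6
unequal_length = common / sqrt(nx * ny)  # 4 / sqrt(24), roughly 0.816

print(equal_length, unequal_length)
```

So a shorter string that is a perfect prefix of a longer one still scores below 1.0, because the extra characters enlarge the denominator.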

In Python, this formula for string similarity (taking the order of characters into account, as you want) can be written as:

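from math import sqrt

def similarity(x, y):
    # Count the positions where both strings hold the same character;
    # zip stops at the end of the shorter string.
    common = sum(1 for a, b in zip(x, y) if a == b)
    # Normalize by the geometric mean of the two lengths
    # (raises ZeroDivisionError if either string is empty).
    return common / sqrt(len(x) * len(y))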

The JavaScript version is analogous.

answered 2013-02-25T22:10:08.943
1

How about:

float(sum(a == b for a, b in zip(my_string1, my_string2))) / len(my_string1)

>>> s1, s2 = "abcdabcd", "abcdabEd"
>>> print(float(sum(a == b for a, b in zip(s1, s2))) / len(s1))
0.875
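One caveat with this one-liner: `zip` stops at the shorter string and the division uses only the first string's length, so strings of different lengths can give surprising results. Here is a minimal sketch of one way to handle that, assuming you want unmatched trailing characters to count as mismatches. The name `positional_similarity` and the choice to return 1.0 for two empty strings are my own assumptions (Python 3 division).

```python
def positional_similarity(s1, s2):
    # Hypothetical variant of the one-liner: divide by the longer length
    # so that extra trailing characters lower the score.
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    # True counts as 1 when summed, so this counts matching positions.
    matches = sum(a == b for a, b in zip(s1, s2))
    return matches / longest

print(positional_similarity("abcdabcd", "abcdabEd"))  # 0.875
print(positional_similarity("abc", "abcdef"))         # 0.5
```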
answered 2013-02-25T21:38:15.107