Jaro similarity
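The Jaro similarity $sim_j$ of two given strings $s_1$ and $s_2$ is

$$sim_j = \begin{cases} 0 & \text{if } m = 0 \\ \dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m-t}{m}\right) & \text{otherwise} \end{cases}$$

where:
- $|s_i|$ is the length of the string $s_i$;
- $m$ is the number of matching characters (see below);
- $t$ is the number of transpositions (see below).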
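The Jaro similarity score is 0 if the strings do not match at all, and 1 if they are an exact match. In the first step, each character of the first string $s_1$ is compared with the candidate matching characters in the second string $s_2$. Two characters from $s_1$ and $s_2$, respectively, are considered matching only if they are the same and not farther than $\left\lfloor \frac{\max(|s_1|, |s_2|)}{2} \right\rfloor - 1$ characters apart. For example, the two nine-character strings FAREMVIEL and FARMVILLE have 8 matching characters. 'F', 'A' and 'R' are in the same position in both strings. 'M', 'V', 'I', 'E' and 'L' are each within three (the result of $\left\lfloor \frac{9}{2} \right\rfloor - 1$) characters of their counterpart.[3] The first 'E' of FAREMVIEL has no 'E' within three positions in FARMVILLE, and only one of the two 'L's in FARMVILLE is claimed by a match, which is why the count is 8 rather than 9. If no matching characters are found, the strings are not similar and the algorithm terminates by returning a Jaro similarity score of 0.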
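The matching step described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `match_window` and `find_matches` are hypothetical, the greedy left-to-right pairing is one common convention, and clamping the window at 0 for very short strings is an assumption rather than something the text specifies.

```python
def match_window(s1: str, s2: str) -> int:
    """Maximum allowed index distance between two matching characters."""
    # Assumption: clamp at 0 so that one-character strings can still match.
    return max(max(len(s1), len(s2)) // 2 - 1, 0)


def find_matches(s1: str, s2: str) -> tuple:
    """Flag which characters of s1 and s2 are matched under the window rule."""
    window = match_window(s1, s2)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            # Each character of s2 may be claimed by at most one character of s1.
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                break
    return matched1, matched2


m1, m2 = find_matches("FAREMVIEL", "FARMVILLE")
print(match_window("FAREMVIEL", "FARMVILLE"), sum(m1))  # 3 8
```

The two boolean lists record which positions took part in a match, which is the information the transposition count needs next.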
If at least one matching character is found, the next step is to find the number of transpositions. The number of transpositions $t$ is the number of matching characters that are not in the right order, divided by two. In the above example of FAREMVIEL and FARMVILLE, 'E' and 'L' are the two matching characters that are not in the right order: the matched characters read F, A, R, M, V, I, E, L in the first string but F, A, R, M, V, I, L, E in the second. So the number of transpositions is $2/2 = 1$.
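Finally, plugging the number of matching characters $m = 8$ and the number of transpositions $t = 1$ into the formula, with both strings of length 9, the Jaro similarity of FAREMVIEL and FARMVILLE is

$$\frac{1}{3}\left(\frac{8}{9} + \frac{8}{9} + \frac{8-1}{8}\right) = \frac{191}{216} \approx 0.88$$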
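Putting the three steps together, the whole computation can be sketched as a single Python function. This is an illustrative sketch under stated assumptions: the name `jaro_similarity` is chosen here, the handling of empty strings is an assumption (the text does not cover it), and $t$ is computed with true division as the text describes, whereas some implementations round it down.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity of s1 and s2, following the steps described above."""
    if not s1 or not s2:
        # Assumption: two empty strings count as an exact match.
        return 1.0 if s1 == s2 else 0.0

    # Step 1: find matching characters within the allowed window.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)
    matches1 = []  # matched characters in s1 order
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matches1.append(ch)
                break

    m = len(matches1)
    if m == 0:
        return 0.0

    # Step 2: transpositions are half the number of out-of-order matches.
    matches2 = [c for c, u in zip(s2, used) if u]  # matched characters in s2 order
    t = sum(a != b for a, b in zip(matches1, matches2)) / 2

    # Step 3: plug m and t into the formula.
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


print(round(jaro_similarity("FAREMVIEL", "FARMVILLE"), 4))  # 0.8843
```

For the worked example this gives $191/216$, in agreement with the hand calculation.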