<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Coordable (Articles sur data-quality)</title><link>https://coordable.co/</link><description></description><atom:link href="https://coordable.co/fr/categories/cat_data-quality.xml" rel="self" type="application/rss+xml"></atom:link><language>fr</language><copyright>Contents © 2026 &lt;a href="mailto:contact@coordable.co"&gt;Nikola Tesla&lt;/a&gt; </copyright><lastBuildDate>Tue, 28 Apr 2026 16:26:40 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>String Distance Metrics for Address Comparison: Levenshtein, Damerau, Jaro, and Jaro-Winkler</title><link>https://coordable.co/fr/blog/string-distance-metrics-address-comparison/</link><dc:creator>François Andrieux</dc:creator><description>&lt;h3 id="comparing-addresses-is-not-a-simple-equality-check"&gt;Comparing addresses is not a simple equality check&lt;/h3&gt;
&lt;p&gt;When you compare two address strings to decide if they refer to the same location, an exact match is rarely enough. Real-world addresses come with typos, abbreviations, inconsistent casing, and missing components. "15 Baker St" and "15 Baker Street" are the same address, but &lt;code&gt;"15 Baker St" == "15 Baker Street"&lt;/code&gt; evaluates to &lt;code&gt;False&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;String distance metrics address this problem. Each one maps a pair of strings to a number that quantifies how different (or similar) they are. That number lets you set a threshold: "if the distance is less than 2, or the similarity is above 0.9, treat them as the same address."&lt;/p&gt;
&lt;p&gt;In this guide, we cover the four most widely used metrics for address comparison:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Levenshtein distance&lt;/strong&gt; - the classic edit distance&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Damerau-Levenshtein distance&lt;/strong&gt; - edit distance with transpositions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Jaro similarity&lt;/strong&gt; - character matching with position awareness&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Jaro-Winkler similarity&lt;/strong&gt; - Jaro with a bonus for matching prefixes&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For each one, we explain how it works, walk through a concrete address example, and discuss when it is (and is not) a good fit. The guide ends with a summary table and practical recommendations.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Table of contents:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="toc"&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#comparing-addresses-is-not-a-simple-equality-check"&gt;Comparing addresses is not a simple equality check&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#a-quick-note-before-diving-in"&gt;A quick note before diving in&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#1-levenshtein-distance"&gt;1. Levenshtein distance&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#how-it-works"&gt;How it works&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#address-example"&gt;Address example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#in-python"&gt;In Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#pros-and-cons"&gt;Pros and cons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#when-to-use-it-for-addresses"&gt;When to use it for addresses&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#2-damerau-levenshtein-distance"&gt;2. Damerau-Levenshtein distance&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#how-it-works_1"&gt;How it works&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#address-example_1"&gt;Address example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#in-python_1"&gt;In Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#pros-and-cons_1"&gt;Pros and cons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#when-to-use-it-for-addresses_1"&gt;When to use it for addresses&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#3-jaro-similarity"&gt;3. Jaro similarity&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#how-it-works_2"&gt;How it works&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#address-example_2"&gt;Address example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#in-python_2"&gt;In Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#pros-and-cons_2"&gt;Pros and cons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#when-to-use-it-for-addresses_2"&gt;When to use it for addresses&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#4-jaro-winkler-similarity"&gt;4. Jaro-Winkler similarity&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#how-it-works_3"&gt;How it works&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#address-example_3"&gt;Address example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#in-python_3"&gt;In Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#pros-and-cons_3"&gt;Pros and cons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#when-to-use-it-for-addresses_3"&gt;When to use it for addresses&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#summary-table"&gt;Summary table&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#normalization-the-step-that-matters-more-than-metric-choice"&gt;Normalization: the step that matters more than metric choice&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#putting-it-all-together-a-complete-address-comparison"&gt;Putting it all together: a complete address comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://coordable.co/fr/blog/string-distance-metrics-address-comparison/#where-to-go-from-here"&gt;Where to go from here&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;hr&gt;
&lt;h3 id="a-quick-note-before-diving-in"&gt;A quick note before diving in&lt;/h3&gt;
&lt;p&gt;There are two flavors of metric:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Distance metrics&lt;/strong&gt; (Levenshtein, Damerau-Levenshtein): return 0 for identical strings, and a higher integer for more different strings. Think of them as counting "how many operations to fix this?"&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Similarity metrics&lt;/strong&gt; (Jaro, Jaro-Winkler): return 1.0 for identical strings, and a lower value (down to 0.0) for more different strings. Think of them as "what fraction of characters match?"&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither type is universally better. Which one to use depends on your data and your use case.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="1-levenshtein-distance"&gt;1. Levenshtein distance&lt;/h3&gt;
&lt;h4 id="how-it-works"&gt;How it works&lt;/h4&gt;
&lt;p&gt;The Levenshtein distance between two strings is the &lt;strong&gt;minimum number of single-character edits&lt;/strong&gt; needed to transform one into the other. The allowed operations are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Insertion&lt;/strong&gt;: add a character (e.g., "Stret" plus one inserted 'e' gives "Street")&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deletion&lt;/strong&gt;: remove a character (e.g., "Streeet" minus one 'e' gives "Street")&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Substitution&lt;/strong&gt;: replace one character with another (e.g., "Straet" with 'a' replaced by 'e' gives "Street")&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A distance of 0 means the strings are identical. A distance of 1 means a single operation is enough.&lt;/p&gt;
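&lt;p&gt;The definition above can be sketched as a small dynamic-programming routine. This is an illustrative pure-Python version, not the implementation any particular library uses; the function name &lt;code&gt;levenshtein&lt;/code&gt; is our own choice for the example.&lt;/p&gt;

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # and the first j characters of b. Initially that prefix is empty.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("stret", "street"))    # → 1 (one insertion)
print(levenshtein("streeet", "street"))  # → 1 (one deletion)
print(levenshtein("straet", "street"))   # → 1 (one substitution)
```

&lt;p&gt;The sketch keeps only the previous row of the table, so memory grows with the length of &lt;code&gt;b&lt;/code&gt; rather than with the product of both lengths.&lt;/p&gt;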
&lt;h4 id="address-example"&gt;Address example&lt;/h4&gt;
&lt;p&gt;Compare &lt;strong&gt;"15 baker srteet"&lt;/strong&gt; (a common keyboard transposition typo) to &lt;strong&gt;"15 baker street"&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="mf"&gt;15&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;baker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;srteet&lt;/span&gt;
&lt;span class="mf"&gt;15&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;baker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;street&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To fix "srteet" into "street", Levenshtein needs &lt;strong&gt;2 substitutions&lt;/strong&gt;: replace 'r' with 't' at index 10, and 't' with 'r' at index 11 (counting from 0).&lt;/p&gt;
&lt;p&gt;Distance = &lt;strong&gt;2&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This is technically correct, but it feels like a lot for what is clearly a single typing mistake. We will fix this in the next section.&lt;/p&gt;
&lt;h4 id="in-python"&gt;In Python&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;jellyfish&lt;/span&gt;

&lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;levenshtein_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"15 baker srteet"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"15 baker street"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → 2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Install with &lt;code&gt;pip install jellyfish&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id="pros-and-cons"&gt;Pros and cons&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple and well understood&lt;/td&gt;
&lt;td&gt;Counts a transposition (swap of two adjacent chars) as 2 operations instead of 1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Handles insertions and deletions well (great for abbreviations like "St" vs "Street")&lt;/td&gt;
&lt;td&gt;Raw distance is hard to compare across string pairs of different lengths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Available in virtually every language and library&lt;/td&gt;
&lt;td&gt;Does not give extra weight to matching prefixes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast on short strings&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4 id="when-to-use-it-for-addresses"&gt;When to use it for addresses&lt;/h4&gt;
&lt;p&gt;Levenshtein is a solid default, especially when the main source of variation is abbreviations or missing characters. Set a relative threshold to make comparisons fair across different string lengths:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;normalized_levenshtein&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;levenshtein_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Treat as same address if &amp;lt; 15% different&lt;/span&gt;
&lt;span class="n"&gt;normalized_levenshtein&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"15 baker street"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"15 baker st"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# → 0.267&lt;/span&gt;
&lt;span class="n"&gt;normalized_levenshtein&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"15 baker street"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"15 baker sreet"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# → 0.067&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h3 id="2-damerau-levenshtein-distance"&gt;2. Damerau-Levenshtein distance&lt;/h3&gt;
&lt;h4 id="how-it-works_1"&gt;How it works&lt;/h4&gt;
&lt;p&gt;Damerau-Levenshtein extends Levenshtein by adding one more operation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Transposition&lt;/strong&gt;: swap two adjacent characters (e.g., "srteet" to "street" - swap 'r' and 't')&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This single addition makes a big practical difference. Studies of typing errors show that transpositions (adjacent character swaps) are one of the most frequent mistakes people make when typing by hand. Recognizing them as a single error rather than two substitutions gives a much more realistic picture of "how many mistakes did the user make?"&lt;/p&gt;
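&lt;p&gt;To make the extra operation concrete, here is a minimal sketch of the restricted variant of this distance, often called optimal string alignment. It is an assumption for illustration: the name &lt;code&gt;osa_distance&lt;/code&gt; is ours, and library implementations may use the unrestricted algorithm instead, which can return smaller values on some inputs.&lt;/p&gt;

```python
def osa_distance(a, b):
    """Edit distance in which swapping two adjacent characters costs 1."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] is the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution, or match
            )
            # i and j start at 1, so "!= 1" means both prefixes hold two or more characters.
            two_chars_each = i != 1 and j != 1
            if two_chars_each and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                # The last two characters are swapped: count one transposition.
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("srteet", "street"))  # → 1
print(osa_distance("ca", "abc"))         # → 3
```

&lt;p&gt;The second call shows the restriction: this variant never edits a transposed pair again, so it returns 3 for "ca" vs "abc", where the unrestricted Damerau-Levenshtein distance is 2. For single typing slips such as "srteet" the two variants agree.&lt;/p&gt;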
&lt;h4 id="address-example_1"&gt;Address example&lt;/h4&gt;
&lt;p&gt;Same pair as before: &lt;strong&gt;"15 baker srteet"&lt;/strong&gt; vs &lt;strong&gt;"15 baker street"&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="mf"&gt;15&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;baker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;srteet&lt;/span&gt;
&lt;span class="mf"&gt;15&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;baker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;street&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="o"&gt;^^&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;←&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;rt&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;transposed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kr"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;tr&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Damerau-Levenshtein distance = &lt;strong&gt;1&lt;/strong&gt; (one transposition).&lt;/p&gt;
&lt;p&gt;Compare this to the Levenshtein distance of 2 for the same pair. One error, one operation - much more intuitive.&lt;/p&gt;
&lt;p&gt;Another example: &lt;strong&gt;"rue de al paix"&lt;/strong&gt; vs &lt;strong&gt;"rue de la paix"&lt;/strong&gt; (the letters 'l' and 'a' of the word "la" are swapped, a typical fast-typing slip):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;rue de al paix
rue de la paix
       ^^       ← 'a' and 'l' transposed
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Levenshtein = 2, Damerau-Levenshtein = 1.&lt;/p&gt;
&lt;h4 id="in-python_1"&gt;In Python&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;jellyfish&lt;/span&gt;

&lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;damerau_levenshtein_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"15 baker srteet"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"15 baker street"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → 1&lt;/span&gt;

&lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;damerau_levenshtein_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"rue de al paix"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"rue de la paix"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → 1&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id="pros-and-cons_1"&gt;Pros and cons&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;More realistic for human typing errors&lt;/td&gt;
&lt;td&gt;Slightly more complex to implement from scratch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transpositions (very frequent mistakes) cost 1, not 2&lt;/td&gt;
&lt;td&gt;Same issue with raw distance across different string lengths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supports every Levenshtein operation plus transposition, so the result is never larger than the Levenshtein distance&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4 id="when-to-use-it-for-addresses_1"&gt;When to use it for addresses&lt;/h4&gt;
&lt;p&gt;Prefer Damerau-Levenshtein over plain Levenshtein whenever your input comes from users typing addresses manually (web forms, search boxes, mobile apps). It gives a better answer to the question "how careless was this input?" and lets you set tighter thresholds while still tolerating common slip-of-the-finger errors.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="3-jaro-similarity"&gt;3. Jaro similarity&lt;/h3&gt;
&lt;h4 id="how-it-works_2"&gt;How it works&lt;/h4&gt;
&lt;p&gt;Jaro takes a completely different approach. Instead of counting edit operations, it measures the &lt;strong&gt;proportion of characters that match&lt;/strong&gt; between two strings, with a penalty for characters that are far apart or out of order.&lt;/p&gt;
&lt;p&gt;Two characters are considered matching if:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;They are the same character, and&lt;/li&gt;
&lt;li&gt;Their positions are not too far apart. The maximum allowed distance between two matching characters is: &lt;code&gt;floor(max(len(s1), len(s2)) / 2) - 1&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Once you have the number of matching characters &lt;code&gt;m&lt;/code&gt; and the number of transpositions &lt;code&gt;t&lt;/code&gt; (half the number of matching characters that appear in a different order in the two strings), the Jaro similarity is:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Jaro(s1, s2) = (1/3) * (m / |s1| + m / |s2| + (m - t) / m)
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The result is between 0.0 (nothing matches) and 1.0 (identical). When no characters match (m = 0), the similarity is defined as 0.0.&lt;/p&gt;
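&lt;p&gt;The matching rule and the formula can be sketched in a few lines of pure Python. This is an illustrative version under stated assumptions, not a library's code: the name &lt;code&gt;jaro&lt;/code&gt; is ours, each character is matched greedily to the first free candidate inside the window, and an odd count of out-of-order characters is rounded down when halved.&lt;/p&gt;

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, from 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Maximum distance between the positions of two matching characters.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    m = 0
    for i, char in enumerate(s1):
        start = max(0, i - window)
        stop = min(len(s2), i + window + 1)
        for j in range(start, stop):
            if not matched2[j] and s2[j] == char:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Read the matched characters of each string in order and count disagreements.
    seq1 = [c for c, hit in zip(s1, matched1) if hit]
    seq2 = [c for c, hit in zip(s2, matched2) if hit]
    out_of_order = sum(1 for c1, c2 in zip(seq1, seq2) if c1 != c2)
    t = out_of_order // 2  # two out-of-order characters make one transposition
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


print(round(jaro("church rd", "church road"), 3))  # → 0.939
print(round(jaro("martha", "marhta"), 3))          # → 0.944
```

&lt;p&gt;In the second call all six characters match, but 't' and 'h' appear in swapped order, which gives t = 1 and a score of (1 + 1 + 5/6) / 3.&lt;/p&gt;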
&lt;h4 id="address-example_2"&gt;Address example&lt;/h4&gt;
&lt;p&gt;Compare &lt;strong&gt;"church rd"&lt;/strong&gt; (abbreviated) to &lt;strong&gt;"church road"&lt;/strong&gt; (full form):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;s1 = "church rd" (9 characters), s2 = "church road" (11 characters)&lt;/li&gt;
&lt;li&gt;Matching window = floor(11 / 2) - 1 = 4&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Every character in "church rd" can be matched to a character in "church road" within the window: c, h, u, r, c, h, (space), r, d all appear in the same relative order. So m = 9, t = 0.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Jaro = (1/3) * (9/9 + 9/11 + 9/9)
     = (1/3) * (1.000 + 0.818 + 1.000)
     ≈ 0.939
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A similarity of 0.939 for what is clearly the same street, just abbreviated. That is the right answer.&lt;/p&gt;
&lt;h4 id="in-python_2"&gt;In Python&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;jellyfish&lt;/span&gt;

&lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jaro_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"church rd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"church road"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → ~0.939&lt;/span&gt;

&lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jaro_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"main street"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"mane street"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# misspelling: main→mane&lt;/span&gt;
&lt;span class="c1"&gt;# → ~0.939&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id="pros-and-cons_2"&gt;Pros and cons&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Normalized score between 0 and 1 - easy to threshold&lt;/td&gt;
&lt;td&gt;Formula is less intuitive than edit distance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Handles abbreviations and length differences naturally&lt;/td&gt;
&lt;td&gt;Can give surprisingly high scores to long strings with many common characters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Well suited for comparing individual address components&lt;/td&gt;
&lt;td&gt;Does not give extra weight to matching prefixes - we fix this with Jaro-Winkler&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4 id="when-to-use-it-for-addresses_2"&gt;When to use it for addresses&lt;/h4&gt;
&lt;p&gt;Jaro is a good fit for comparing &lt;strong&gt;individual address fields&lt;/strong&gt; (just the street name, just the city) rather than full concatenated address strings. It is also useful for detecting near-duplicate records in a database.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="4-jaro-winkler-similarity"&gt;4. Jaro-Winkler similarity&lt;/h3&gt;
&lt;h4 id="how-it-works_3"&gt;How it works&lt;/h4&gt;
&lt;p&gt;Jaro-Winkler builds on Jaro by adding a &lt;strong&gt;prefix bonus&lt;/strong&gt;: if two strings share the same first characters, their similarity is boosted. The motivation is that strings that start the same way are more likely to refer to the same thing than strings that diverge immediately.&lt;/p&gt;
&lt;p&gt;The formula is:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;JaroWinkler(s1, s2) = Jaro(s1, s2) + l * p * (1 - Jaro(s1, s2))
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;l&lt;/code&gt; = length of the common prefix, up to a maximum of 4 characters&lt;/li&gt;
&lt;li&gt;&lt;code&gt;p&lt;/code&gt; = scaling factor, conventionally 0.1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The effect is a modest upward adjustment when the prefix matches. With &lt;code&gt;p = 0.1&lt;/code&gt; and &lt;code&gt;l = 4&lt;/code&gt;, the maximum bonus is &lt;code&gt;+4 * 0.1 * (1 - Jaro)&lt;/code&gt;, which boosts a Jaro of 0.90 by at most 0.04. Small but meaningful for ranking candidates.&lt;/p&gt;
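&lt;p&gt;The bonus itself is a one-line adjustment once a Jaro score is available. The sketch below assumes the score has already been computed elsewhere and only applies the prefix step; &lt;code&gt;winkler_bonus&lt;/code&gt; and its parameter names are hypothetical, chosen for this example.&lt;/p&gt;

```python
def winkler_bonus(jaro_score, s1, s2, p=0.1, max_prefix=4):
    """Apply the Winkler prefix bonus to an already computed Jaro score."""
    prefix_len = 0  # the "l" of the formula
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix_len += 1
    return jaro_score + prefix_len * p * (1 - jaro_score)


# Shared prefix "chur": the full bonus applies.
print(round(winkler_bonus(0.939, "church rd", "church road"), 3))  # → 0.963
# First characters differ: the score is returned unchanged.
print(winkler_bonus(0.98, "12 rue de la paix", "21 rue de la paix"))  # → 0.98
```

&lt;p&gt;Capping the prefix at 4 characters and keeping &lt;code&gt;p&lt;/code&gt; at 0.1 guarantees that the result never exceeds 1.0, since the bonus is at most 40% of the remaining gap.&lt;/p&gt;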
&lt;h4 id="address-example_3"&gt;Address example&lt;/h4&gt;
&lt;p&gt;Take &lt;strong&gt;"church rd"&lt;/strong&gt; vs &lt;strong&gt;"church road"&lt;/strong&gt; again (Jaro = 0.939, common prefix "chur", l = 4):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;JaroWinkler = 0.939 + 4 * 0.1 * (1 - 0.939)
            = 0.939 + 0.4 * 0.061
            ≈ 0.963
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now consider two streets that differ at the very beginning:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;"avenue du general leclerc"&lt;/strong&gt; vs &lt;strong&gt;"boulevard du general leclerc"&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These share no prefix at all (l = 0), so Jaro-Winkler equals Jaro exactly. No bonus is applied, and the score is lower than it would be for two strings with the same beginning. This is exactly the intended behavior: a difference at the start is a stronger signal of a mismatch than a difference in the middle.&lt;/p&gt;
&lt;p&gt;Now consider a house number mismatch:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;"12 rue de la paix"&lt;/strong&gt; vs &lt;strong&gt;"21 rue de la paix"&lt;/strong&gt;: the first characters are "1" and "2", so l = 0 and no bonus is applied. That is what we want, since these are two different addresses. The underlying Jaro score still stays close to 1 because every character has a match, so the missing bonus is only a mild signal.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="in-python_3"&gt;In Python&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;jellyfish&lt;/span&gt;

&lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jaro_winkler_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"church rd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"church road"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → ~0.963&lt;/span&gt;

&lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jaro_winkler_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"12 rue de la paix"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"21 rue de la paix"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → ~0.980  (lower score due to no prefix match)&lt;/span&gt;

&lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jaro_winkler_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"12 rue de la paix"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"12 rue de la pai"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → ~0.988  (strong prefix, only one char missing at end)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id="pros-and-cons_3"&gt;Pros and cons&lt;/h4&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prefix bonus aligns with how we read addresses (number first, then street name)&lt;/td&gt;
&lt;td&gt;Prefix bonus requires consistent formatting - mixed case will kill the bonus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Often more accurate than plain Jaro for addresses and proper names&lt;/td&gt;
&lt;td&gt;&lt;code&gt;p = 0.1&lt;/code&gt; is conventional; tuning it for your data takes experimentation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Normalized 0-1 score, easy to threshold&lt;/td&gt;
&lt;td&gt;Like Jaro, harder to explain to non-technical stakeholders than edit distance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h4 id="when-to-use-it-for-addresses_3"&gt;When to use it for addresses&lt;/h4&gt;
&lt;p&gt;Jaro-Winkler is the best general-purpose choice for full address comparison when strings are properly normalized (lowercase, consistent formatting). The prefix bonus is particularly useful when the house number is at the start of the string: strings whose leading characters differ receive no bonus, and a different house number usually means a different physical location regardless of the rest.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="summary-table"&gt;Summary table&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Score range&lt;/th&gt;
&lt;th&gt;Transpositions&lt;/th&gt;
&lt;th&gt;Prefix bonus&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Levenshtein&lt;/td&gt;
&lt;td&gt;Distance&lt;/td&gt;
&lt;td&gt;0 to max(len(s1), len(s2))&lt;/td&gt;
&lt;td&gt;Costs 2&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Abbreviations, insertions, deletions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Damerau-Levenshtein&lt;/td&gt;
&lt;td&gt;Distance&lt;/td&gt;
&lt;td&gt;0 to max(len(s1), len(s2))&lt;/td&gt;
&lt;td&gt;Costs 1&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;User-typed input, keyboard typos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jaro&lt;/td&gt;
&lt;td&gt;Similarity&lt;/td&gt;
&lt;td&gt;0 to 1&lt;/td&gt;
&lt;td&gt;Handled&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Individual address fields&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jaro-Winkler&lt;/td&gt;
&lt;td&gt;Similarity&lt;/td&gt;
&lt;td&gt;0 to 1&lt;/td&gt;
&lt;td&gt;Handled&lt;/td&gt;
&lt;td&gt;Yes (up to 4 chars)&lt;/td&gt;
&lt;td&gt;Full normalized address strings&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Quick decision guide:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;User types addresses into a form&lt;/strong&gt; - use Damerau-Levenshtein. Transpositions are very common in manual input.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You are deduplicating a database of street names&lt;/strong&gt; - use Jaro or Jaro-Winkler on the street name component alone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;You want a single score for a full address string&lt;/strong&gt; - use Jaro-Winkler on normalized strings.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Your main problem is abbreviations&lt;/strong&gt; ("Blvd" vs "Boulevard") - normalize abbreviations first (see below), then any metric will do.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3 id="normalization-the-step-that-matters-more-than-metric-choice"&gt;Normalization: the step that matters more than metric choice&lt;/h3&gt;
&lt;p&gt;No string metric can compensate for missing normalization. Before comparing two addresses, apply at least these basics:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Lowercase everything&lt;/strong&gt;: "Baker Street" and "baker street" should score 1.0, not 0.9.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Expand abbreviations consistently&lt;/strong&gt;: "St" -&amp;gt; "street", "Blvd" -&amp;gt; "boulevard", "Ave" -&amp;gt; "avenue", "Rd" -&amp;gt; "road". Or shorten everything to a consistent short form.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Remove punctuation&lt;/strong&gt;: "St. James's" and "St Jamess" will score very differently if you keep periods and apostrophes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Normalize whitespace&lt;/strong&gt;: collapse multiple spaces, strip leading/trailing whitespace.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With these steps, "15 Baker St." and "15 Baker Street" both become "15 baker street", so every metric reports an exact match.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;re&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;normalize_address&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"[.,'\-]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"\bst\b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"street"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"\bblvd\b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"boulevard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"\bave\b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"avenue"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"\brd\b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"road"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"\s+"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;

&lt;span class="n"&gt;normalize_address&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"15 Baker St."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# "15 baker street"&lt;/span&gt;
&lt;span class="n"&gt;normalize_address&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"15 Baker Street"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# "15 baker street"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After normalization, both strings are identical, so the edit distances return 0 and the similarity metrics return 1.0.&lt;/p&gt;
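&lt;p&gt;As the abbreviation list grows, a lookup table can be easier to maintain than one regular expression per abbreviation, and it also covers the alternative mentioned above of shortening everything to a consistent short form. The following is only a sketch: the &lt;code&gt;ABBREVIATIONS&lt;/code&gt; table, the &lt;code&gt;normalize_tokens&lt;/code&gt; name, and the &lt;code&gt;expand&lt;/code&gt; flag are hypothetical and should be adapted to your own data.&lt;/p&gt;

```python
import re

# Hypothetical abbreviation table (short form to long form); extend as needed.
ABBREVIATIONS = {
    "st": "street",
    "blvd": "boulevard",
    "ave": "avenue",
    "rd": "road",
}

def normalize_tokens(addr, expand=True, table=ABBREVIATIONS):
    """Lowercase, strip punctuation, then map tokens through the table.

    expand=True  maps short forms to long forms ("st" becomes "street").
    expand=False maps long forms to short forms ("street" becomes "st").
    """
    if expand:
        mapping = table
    else:
        mapping = {full: short for short, full in table.items()}
    addr = re.sub(r"[.,'\-]", "", addr.lower())
    # str.split() with no argument also collapses repeated whitespace
    # and drops leading and trailing whitespace.
    tokens = [mapping.get(token, token) for token in addr.split()]
    return " ".join(tokens)

print(normalize_tokens("15 Baker St."))                   # 15 baker street
print(normalize_tokens("15 Baker Street", expand=False))  # 15 baker st
```

&lt;p&gt;Note that a plain lookup cannot tell "St" for street from "St" for saint; resolving that needs context this sketch does not use.&lt;/p&gt;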
&lt;hr&gt;
&lt;h3 id="putting-it-all-together-a-complete-address-comparison"&gt;Putting it all together: a complete address comparison&lt;/h3&gt;
&lt;p&gt;Here is a short example that combines normalization with all four metrics:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;jellyfish&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;re&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;normalize_address&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"[.,'\-]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"\bst\b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"street"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"\bblvd\b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"boulevard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"\bave\b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"avenue"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"\brd\b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"road"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s2"&gt;"\s+"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;

&lt;span class="n"&gt;pairs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"15 Baker St."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"15 Baker Street"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"15 Baker Srteet"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"15 Baker Street"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"12 Rue de la Paix"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"21 Rue de la Paix"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Church Rd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Church Road"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;na&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;normalize_address&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;normalize_address&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;lev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;levenshtein_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;na&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;dlev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;damerau_levenshtein_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;na&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;jaro&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jaro_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;na&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;jw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jellyfish&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jaro_winkler_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;na&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s2"&gt; vs &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"  Levenshtein: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lev&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, Damerau-Levenshtein: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dlev&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;"  Jaro: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;jaro&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;.3f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, Jaro-Winkler: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;jw&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;.3f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Expected output (the &lt;code&gt;#&lt;/code&gt; comments are annotations, not printed by the script):&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;'15 Baker St.' vs '15 Baker Street'
  Levenshtein: 0, Damerau-Levenshtein: 0  # normalization made them identical
  Jaro: 1.000, Jaro-Winkler: 1.000

'15 Baker Srteet' vs '15 Baker Street'
  Levenshtein: 2, Damerau-Levenshtein: 1  # one transposition, not two substitutions
  Jaro: 0.978, Jaro-Winkler: 0.987

'12 Rue de la Paix' vs '21 Rue de la Paix'
  Levenshtein: 2, Damerau-Levenshtein: 1  # different house number
  Jaro: 0.980, Jaro-Winkler: 0.980        # no prefix bonus (1 != 2)

'Church Rd' vs 'Church Road'
  Levenshtein: 0, Damerau-Levenshtein: 0  # normalization expanded Rd -&gt; Road
  Jaro: 1.000, Jaro-Winkler: 1.000
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Note how the normalization step handles "St" vs "Street" and "Rd" vs "Road" completely, leaving the metrics to handle what normalization cannot catch (typos, transpositions, genuine differences).&lt;/p&gt;
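&lt;p&gt;The similarity scores in the expected output can be checked by hand. The sketch below is a plain-Python rendering of the usual Jaro and Jaro-Winkler definitions, assuming a prefix weight of 0.1 and a prefix capped at 4 characters. The function names are ours, and library implementations such as jellyfish may differ in details, so treat it as a way to follow the arithmetic rather than as a replacement for the library.&lt;/p&gt;

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1] (illustrative sketch)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        start = max(0, i - window)
        stop = min(len2, i + window + 1)
        for j in range(start, stop):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    # Matched characters that appear in a different order, counted in pairs.
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s1, s2, prefix_weight=0.1, max_prefix=4):
    """Jaro similarity plus a bonus for a shared prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)

a, b = "15 baker srteet", "15 baker street"
print(round(jaro(a, b), 3), round(jaro_winkler(a, b), 3))  # 0.978 0.987
```

&lt;p&gt;For the "Srteet" pair, all 15 characters match with one transposition, so Jaro is (1 + 1 + 14/15) / 3, roughly 0.978, and the shared prefix "15 b" lifts Jaro-Winkler to about 0.987. For "12" vs "21" the first characters already differ, so there is no prefix bonus and both scores stay at 0.980.&lt;/p&gt;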
&lt;hr&gt;
&lt;h3 id="where-to-go-from-here"&gt;Where to go from here&lt;/h3&gt;
&lt;p&gt;String similarity metrics are one tool in the address quality toolbox. In geocoding workflows, they are commonly used to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Verify that a geocoded result actually matches the input address (compare the returned address label to the input).&lt;/li&gt;
&lt;li&gt;Deduplicate address lists before batching geocoding requests.&lt;/li&gt;
&lt;li&gt;Build fuzzy search over a local address database before calling a paid API.&lt;/li&gt;
&lt;/ul&gt;
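&lt;p&gt;The first use case, checking a geocoder's returned label against the input, can be sketched as follows. Because a swapped house number barely moves the similarity score (as with "12" vs "21" above), this sketch compares the house number exactly and applies the threshold only to the rest of the string. The helper names, the 0.9 threshold, and the use of &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; as a standard-library stand-in for the similarity function are all assumptions for illustration; in practice you would likely pass &lt;code&gt;jellyfish.jaro_winkler_similarity&lt;/code&gt; instead.&lt;/p&gt;

```python
import re
from difflib import SequenceMatcher

def split_house_number(addr):
    """Return (house_number, rest) for an address that starts with digits."""
    match = re.match(r"(\d+)\s+(.*)", addr)
    if match is None:
        return None, addr  # no leading number found
    return match.group(1), match.group(2)

def stdlib_similarity(a, b):
    """Stand-in similarity in [0, 1]; not one of the four metrics above."""
    return SequenceMatcher(None, a, b).ratio()

def label_matches_input(label, query, similarity=stdlib_similarity,
                        threshold=0.9):
    """Compare two address strings that have already been normalized."""
    num_label, rest_label = split_house_number(label)
    num_query, rest_query = split_house_number(query)
    if num_label != num_query:
        # Treat different house numbers as different addresses.
        return False
    return similarity(rest_label, rest_query) >= threshold

print(label_matches_input("21 rue de la paix", "12 rue de la paix"))  # False
print(label_matches_input("15 baker street", "15 baker street"))      # True
```

&lt;p&gt;The house number pattern is deliberately simple and will not recognize forms such as "15b" or "15-17"; a real pipeline would need a more careful parser for that component.&lt;/p&gt;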
&lt;p&gt;If you are working on address data quality as part of a geocoding pipeline, these resources may also be useful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learn how to geocode large batches cost-effectively: &lt;a href="https://coordable.co/blog/how-to-reduce-geocoding-costs-by-67/"&gt;How to reduce geocoding costs by 67%&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;See what geocoding providers are available and how their prices compare: &lt;a href="https://coordable.co/blog/geocoding-prices-2026/"&gt;Geocoding prices in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Find the best geocoding provider for your country: &lt;a href="https://coordable.co/country-analysis/best-geocoding-providers-france/"&gt;Best geocoding providers for France&lt;/a&gt;, &lt;a href="https://coordable.co/country-analysis/best-geocoding-providers-united-kingdom/"&gt;UK&lt;/a&gt;, &lt;a href="https://coordable.co/country-analysis/best-geocoding-providers-germany/"&gt;Germany&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://coordable.co" class="learn-more-btn"&gt;Try address geocoding with Coordable&lt;/a&gt;&lt;/p&gt;</description><guid>https://coordable.co/fr/blog/string-distance-metrics-address-comparison/</guid><pubDate>Wed, 01 Apr 2026 10:00:00 GMT</pubDate></item></channel></rss>