| 1 | +--- |
| 2 | +layout: default |
| 3 | +title: Phone number |
| 4 | +parent: Analyzers |
| 5 | +nav_order: 140 |
| 6 | +--- |
| 7 | + |
| 8 | +# Phone number analyzers |
| 9 | + |
| 10 | +The `analysis-phonenumber` plugin provides analyzers and tokenizers for parsing phone numbers. |
| 11 | +A dedicated analyzer is required because parsing phone numbers is harder than it first appears. For common pitfalls in parsing phone numbers, see [Falsehoods programmers believe about phone numbers](https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md). |
| 12 | + |
| 13 | + |
| 14 | +OpenSearch supports the following phone number analyzers: |
| 15 | + |
| 16 | +* `phone`: An [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to use at indexing time. |
| 17 | +* `phone-search`: A [search analyzer]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/) to use at search time. |
| 18 | + |
| 19 | +Internally, the plugin uses the [`libphonenumber`](https://github.com/google/libphonenumber) library and follows its parsing rules. |
| 20 | + |
| 21 | +The phone number analyzers are not meant to find phone numbers in larger texts. Instead, use them on fields that contain only phone numbers. |
| 22 | +{: .note} |
| 23 | + |
| 24 | +## Installing the plugin |
| 25 | + |
| 26 | +Before you can use phone number analyzers, you must install the `analysis-phonenumber` plugin by running the following command: |
| 27 | + |
| 28 | +```sh |
| 29 | +./bin/opensearch-plugin install analysis-phonenumber |
| 30 | +``` |
| 31 | + |
| 32 | +## Specifying a default region |
| 33 | + |
| 34 | +You can optionally specify a default region for parsing phone numbers by providing the `phone-region` parameter within the analyzer. Valid phone regions are ISO 3166 country codes. For more information, see [List of ISO 3166 country codes](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes). |
| 35 | + |
| 36 | +When tokenizing phone numbers containing the international calling prefix `+`, the default region is irrelevant. However, for phone numbers that use a national prefix for international numbers (for example, `001` instead of `+1` to dial North America from most European countries), you must provide the region. Specifying the region also lets you correctly index local phone numbers that have no international prefix. |
| 37 | + |
| 38 | +## Example |
| 39 | + |
| 40 | +The following request creates an index containing one field, which ingests phone numbers for Switzerland (region code `CH`): |
| 41 | + |
| 42 | +```json |
| 43 | +PUT /example-phone |
| 44 | +{ |
| 45 | + "settings": { |
| 46 | + "analysis": { |
| 47 | + "analyzer": { |
| 48 | + "phone-ch": { |
| 49 | + "type": "phone", |
| 50 | + "phone-region": "CH" |
| 51 | + }, |
| 52 | + "phone-search-ch": { |
| 53 | + "type": "phone-search", |
| 54 | + "phone-region": "CH" |
| 55 | + } |
| 56 | + } |
| 57 | + } |
| 58 | + }, |
| 59 | + "mappings": { |
| 60 | + "properties": { |
| 61 | + "phoneNumber": { |
| 62 | + "type": "text", |
| 63 | + "analyzer": "phone-ch", |
| 64 | + "search_analyzer": "phone-search-ch" |
| 65 | + } |
| 66 | + } |
| 67 | + } |
| 68 | +} |
| 69 | +``` |
| 70 | +{% include copy-curl.html %} |
| 71 | + |
| 72 | +Analyzing a (fictional) Swiss phone number that has an international calling prefix works the same with or without the Swiss-specific phone region: |
| 73 | +```json |
| 74 | +GET /example-phone/_analyze |
| 75 | +{ |
| 76 | + "analyzer" : "phone-ch", |
| 77 | + "text" : "+41 60 555 12 34" |
| 78 | +} |
| 79 | +``` |
| 80 | +{% include copy-curl.html %} |
| 81 | + |
| 82 | +and |
| 83 | + |
| 84 | +```json |
| 85 | +GET /example-phone/_analyze |
| 86 | +{ |
| 87 | + "analyzer" : "phone", |
| 88 | + "text" : "+41 60 555 12 34" |
| 89 | +} |
| 90 | +``` |
| 91 | +{% include copy-curl.html %} |
| 92 | + |
| 93 | +Both requests produce the same result: |
| 94 | +```json |
| 95 | +["+41 60 555 12 34", "6055512", "41605551", "416055512", "6055", "41605551234", ...] |
| 96 | +``` |
| 97 | + |
| 98 | +If, however, the phone number does not include the international calling prefix `+` (because it uses `0041` or omits |
| 99 | +the international calling prefix altogether), only the analyzer configured with the correct phone region can parse it: |
| 100 | +```json |
| 101 | +GET /example-phone/_analyze |
| 102 | +{ |
| 103 | + "analyzer" : "phone-ch", |
| 104 | + "text" : "060 555 12 34" |
| 105 | +} |
| 106 | +``` |
| 107 | +{% include copy-curl.html %} |
| 108 | + |
| 109 | +In contrast, the `phone-search` analyzer does not create n-grams and emits only a few basic tokens: |
| 110 | +```json |
| 111 | +GET /example-phone/_analyze |
| 112 | +{ |
| 113 | + "analyzer" : "phone-search", |
| 114 | + "text" : "+41 60 555 12 34" |
| 115 | +} |
| 116 | +``` |
| 117 | +{% include copy-curl.html %} |
| 118 | + |
| 119 | +```json |
| 120 | +["+41 60 555 12 34", "41 60 555 12 34", "41605551234", "605551234", "41"] |
| 121 | +``` |
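The `example-phone` index definition above hard-codes the `CH` region. If you need the same pair of analyzers for several regions, the request body can be generated instead of written by hand. The following Python sketch is one possible way to do that. The helper name `build_phone_index_body` and the default field name are hypothetical, and the two-letter check is only an assumption about the region codes you intend to use; it does not verify that a code is actually assigned in ISO 3166.

```python
import json


def build_phone_index_body(region: str, field: str = "phoneNumber") -> dict:
    """Build index settings and mappings for one phone region.

    Hypothetical helper: it only mirrors the structure of the
    `example-phone` request shown in the example.
    """
    region = region.strip().upper()
    # Assumption: regions are given as two-letter codes such as "CH".
    # This is a shape check only, not a lookup against ISO 3166.
    if len(region) != 2 or not region.isalpha() or not region.isascii():
        raise ValueError(f"expected a two-letter region code, got {region!r}")

    suffix = region.lower()
    index_analyzer = f"phone-{suffix}"
    search_analyzer = f"phone-search-{suffix}"
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Index-time analyzer for the given region.
                    index_analyzer: {"type": "phone", "phone-region": region},
                    # Search-time analyzer for the given region.
                    search_analyzer: {"type": "phone-search", "phone-region": region},
                }
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": index_analyzer,
                    "search_analyzer": search_analyzer,
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the body for Switzerland; send it with `PUT /example-phone`.
    print(json.dumps(build_phone_index_body("CH"), indent=2))
```

Calling `build_phone_index_body("CH")` returns a dictionary with the same structure as the request body in the example, so it can be serialized with `json.dumps` and sent with any HTTP client.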