Commit d81da26

document the new analysis-phonenumber plugin
this is part of opensearch-project/OpenSearch#11326. the actual implementation was done in opensearch-project/OpenSearch#15915. see the commit message on the PR for further details.

resolves #8389

Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Ralph Ursprung <Ralph.Ursprung@avaloq.com>
1 parent cd31d82 commit d81da26

3 files changed

Lines changed: 155 additions & 24 deletions

_analyzers/supported-analyzers/index.md

Lines changed: 10 additions & 1 deletion
@@ -29,4 +29,13 @@ Analyzer | Analysis performed | Analyzer output
2929

3030
## Language analyzers
3131

32-
OpenSearch supports analyzers for various languages. For more information, see [Language analyzers]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/).
32+
OpenSearch supports analyzers for various languages. For more information, see [Language analyzers]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/).
33+
34+
## Additional analyzers
35+
36+
The following table lists the additional analyzers that OpenSearch supports.
37+
38+
| Analyzer | Analysis performed |
39+
|:---------------|:---------------------------------------------------------------------------------------------------------|
40+
| `phone` | An [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) for parsing phone numbers. |
41+
| `phone-search` | A [search analyzer]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/) for parsing phone numbers. |
Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
1+
---
2+
layout: default
3+
title: Phone number
4+
parent: Analyzers
5+
nav_order: 140
6+
---
7+
8+
# Phone number analyzers
9+
10+
The `analysis-phonenumber` plugin provides analyzers and tokenizers for parsing phone numbers.
11+
A dedicated analyzer is required because parsing phone numbers is a non-trivial task (even though it might seem trivial at first glance). For common pitfalls in parsing phone numbers, see [Falsehoods programmers believe about phone numbers](https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md).
12+
13+
14+
OpenSearch supports the following phone number analyzers:
15+
16+
* `phone`: An [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to use at indexing time.
17+
* `phone-search`: A [search analyzer]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/) to use at search time.
18+
19+
Internally, the plugin uses the [`libphonenumber`](https://github.com/google/libphonenumber) library and follows its parsing rules.
20+
21+
The phone number analyzers are not meant to find phone numbers in larger texts. Instead, you should use them on fields that contain only phone numbers.
22+
{: .note}
23+
24+
## Installing the plugin
25+
26+
Before you can use phone number analyzers, you must install the `analysis-phonenumber` plugin by running the following command:
27+
28+
```sh
29+
./bin/opensearch-plugin install analysis-phonenumber
30+
```
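
As a minimal check, assuming the cluster is running and the node was restarted after the plugin was installed, the CAT plugins API lists the plugins loaded on each node. The `analysis-phonenumber` entry should appear in the response if the installation succeeded:

```json
GET /_cat/plugins?v
```
{% include copy-curl.html %}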
31+
32+
## Specifying a default region
33+
34+
You can optionally specify a default region for parsing phone numbers by providing the `phone-region` parameter within the analyzer. Valid phone regions are ISO 3166 country codes. For more information, see [List of ISO 3166 country codes](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes).
35+
36+
When tokenizing phone numbers containing the international calling prefix `+`, the default region is irrelevant. However, for phone numbers that use a national prefix for international calls (for example, `001` instead of `+1` to dial North America from most European countries), the region must be provided. Specifying the region also lets you properly index local phone numbers that have no international prefix.
37+
38+
## Example
39+
40+
The following request creates an index containing one field, which ingests phone numbers for Switzerland (region code `CH`):
41+
42+
```json
43+
PUT /example-phone
44+
{
45+
  "settings": {
46+
    "analysis": {
47+
      "analyzer": {
48+
        "phone-ch": {
49+
          "type": "phone",
50+
          "phone-region": "CH"
51+
        },
52+
        "phone-search-ch": {
53+
          "type": "phone-search",
54+
          "phone-region": "CH"
55+
        }
56+
      }
57+
    }
58+
  },
59+
  "mappings": {
60+
    "properties": {
61+
      "phoneNumber": {
62+
        "type": "text",
63+
        "analyzer": "phone-ch",
64+
        "search_analyzer": "phone-search-ch"
65+
      }
66+
    }
67+
  }
68+
}
69+
```
70+
{% include copy-curl.html %}
71+
72+
Analyzing a (fictional) Swiss phone number that includes the international calling prefix works the same with or without the Swiss-specific phone region. The following request uses the region-specific `phone-ch` analyzer:
73+
```json
74+
GET /example-phone/_analyze
75+
{
76+
  "analyzer" : "phone-ch",
77+
  "text" : "+41 60 555 12 34"
78+
}
79+
```
80+
{% include copy-curl.html %}
81+
82+
The following request uses the `phone` analyzer, which has no region configured:
83+
84+
```json
85+
GET /example-phone/_analyze
86+
{
87+
  "analyzer" : "phone",
88+
  "text" : "+41 60 555 12 34"
89+
}
90+
```
91+
{% include copy-curl.html %}
92+
93+
Both requests produce the same result:
94+
```json
95+
["+41 60 555 12 34", "6055512", "41605551", "416055512", "6055", "41605551234", ...]
96+
```
97+
98+
If, however, the phone number is given without the international calling prefix `+` (either by using `0041` or by omitting
99+
the international calling prefix altogether), only the analyzer with the correct phone region can parse it:
100+
```json
101+
GET /example-phone/_analyze
102+
{
103+
  "analyzer" : "phone-ch",
104+
  "text" : "060 555 12 34"
105+
}
106+
```
107+
{% include copy-curl.html %}
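
For comparison, the same national-format number can be sent to the plain `phone` analyzer, which has no region configured. Based on the description above, this analyzer is not expected to parse the input as a Swiss number. The exact tokens it returns are not listed here, so treat this as a sketch to run and inspect:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone",
  "text" : "060 555 12 34"
}
```
{% include copy-curl.html %}

Comparing the two responses shows which tokens depend on the `phone-region` setting.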
108+
109+
Unlike the `phone` analyzer, the `phone-search` analyzer does not create n-grams and emits only a few basic tokens:
110+
```json
111+
GET /example-phone/_analyze
112+
{
113+
  "analyzer" : "phone-search",
113+
  "text" : "+41 60 555 12 34"
115+
}
116+
```
117+
{% include copy-curl.html %}
118+
119+
```json
120+
["+41 60 555 12 34", "41 60 555 12 34", "41605551234", "605551234", "41"]
121+
```
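
To see the index and search analyzers work together, one option is to index a document into the `example-phone` index created earlier and then query it with a differently formatted number. The document ID `1` and the sample values are illustrative assumptions. The first request stores the number in international format:

```json
PUT /example-phone/_doc/1
{
  "phoneNumber": "+41 60 555 12 34"
}
```
{% include copy-curl.html %}

The second request searches by using the national format. Because the `phoneNumber` field uses `phone-search-ch` as its search analyzer, the query text is analyzed with the `CH` region. If the analyzers behave as described above, the query should match the indexed document, although the result is not shown here:

```json
GET /example-phone/_search
{
  "query": {
    "match": {
      "phoneNumber": "060 555 12 34"
    }
  }
}
```
{% include copy-curl.html %}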

_install-and-configure/additional-plugins/index.md

Lines changed: 24 additions & 23 deletions
@@ -9,29 +9,30 @@ nav_order: 10
99

1010
There are many more plugins available in addition to those provided by the standard distribution of OpenSearch. These additional plugins have been built by OpenSearch developers or members of the OpenSearch community. While it isn't possible to provide an exhaustive list (because many plugins are not maintained in an OpenSearch GitHub repository), the following plugins, available in the [OpenSearch/plugins](https://github.com/opensearch-project/OpenSearch/tree/main/plugins) directory on GitHub, are some of the plugins that can be installed using one of the installation options, for example, using the command `bin/opensearch-plugin install <plugin-name>`.
1111

12-
| Plugin name | Earliest available version |
13-
| :--- | :--- |
14-
| analysis-icu | 1.0.0 |
15-
| analysis-kuromoji | 1.0.0 |
16-
| analysis-nori | 1.0.0 |
17-
| analysis-phonetic | 1.0.0 |
18-
| analysis-smartcn | 1.0.0 |
19-
| analysis-stempel | 1.0.0 |
20-
| analysis-ukrainian | 1.0.0 |
21-
| discovery-azure-classic | 1.0.0 |
22-
| discovery-ec2 | 1.0.0 |
23-
| discovery-gce | 1.0.0 |
24-
| [`ingest-attachment`]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/ingest-attachment-plugin/) | 1.0.0 |
25-
| mapper-annotated-text | 1.0.0 |
26-
| mapper-murmur3 | 1.0.0 |
27-
| [`mapper-size`]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/mapper-size-plugin/) | 1.0.0 |
28-
| query-insights | 2.12.0 |
29-
| repository-azure | 1.0.0 |
30-
| repository-gcs | 1.0.0 |
31-
| repository-hdfs | 1.0.0 |
32-
| repository-s3 | 1.0.0 |
33-
| store-smb | 1.0.0 |
34-
| transport-nio | 1.0.0 |
12+
| Plugin name | Earliest available version |
13+
|:-----------------------------------------------------------------------------------------------------------------------|:---------------------------|
14+
| analysis-icu | 1.0.0 |
15+
| analysis-kuromoji | 1.0.0 |
16+
| analysis-nori | 1.0.0 |
17+
| [`analysis-phonenumber`]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/phone-analyzers/) | 2.18.0 |
18+
| analysis-phonetic | 1.0.0 |
19+
| analysis-smartcn | 1.0.0 |
20+
| analysis-stempel | 1.0.0 |
21+
| analysis-ukrainian | 1.0.0 |
22+
| discovery-azure-classic | 1.0.0 |
23+
| discovery-ec2 | 1.0.0 |
24+
| discovery-gce | 1.0.0 |
25+
| [`ingest-attachment`]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/ingest-attachment-plugin/) | 1.0.0 |
26+
| mapper-annotated-text | 1.0.0 |
27+
| mapper-murmur3 | 1.0.0 |
28+
| [`mapper-size`]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/mapper-size-plugin/) | 1.0.0 |
29+
| query-insights | 2.12.0 |
30+
| repository-azure | 1.0.0 |
31+
| repository-gcs | 1.0.0 |
32+
| repository-hdfs | 1.0.0 |
33+
| repository-s3 | 1.0.0 |
34+
| store-smb | 1.0.0 |
35+
| transport-nio | 1.0.0 |
3536

3637
## Related articles
3738
