Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@
- name: agent.id
type: keyword
dimension: true
- name: agent.ip
type: ip
dimensiont: true
- name: agent.call_count
type: long
metric_type: counter
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,11 @@ streams:
type: text
title: Period
default: 10s

elasticsearch:
index_template:
settings:
# This should produce an error because this field is not a keyword.
index.routing_path: "example.agent.ip"
data_stream:
index_mode: "time_series"
3 changes: 3 additions & 0 deletions test/packages/time_series/data_stream/example/manifest.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,6 @@ elasticsearch:
settings:
# Defaults to 16
index.mapping.dimension_fields.limit: 32
index.routing_path: "example.agent.id"
data_stream:
index_mode: "time_series"
Comment on lines +19 to +20
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about adding this data_stream.index_mode setting to the spec. This setting needs that the data stream has dimension fields, and that index.routing_path is configured to a dimension field of type keyword. If we add this to the spec we should also check that the other requirements are satisfied in order to provide good feedback to package developers. We should also ensure that Fleet reads this setting and installs it in the index template.

Another option could be to make Fleet to automatically enable this mode if the index meets the requirements, that is if it has dimension fields, and it has an index.routing_path.

@mtojek @joshdover opinions?

Copy link
Copy Markdown
Member Author

@jsoriano jsoriano Mar 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to install a data stream with these settings, and it fails with:

Error: can't install the package: can't install the package: could not install package; API status code = 500; response body = {"statusCode":500,"error":"Internal Server Error","message":"illegal_argument_exception: [illegal_argument_exception] Reason: [index.routing_path] requires [index.mode=time_series]"}

So it seems that current fleet would use index_template.settings but not index_template.data_stream.

We may need to check the required version to enable these settings.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about adding this data_stream.index_mode setting to the spec. This setting needs that the data stream has dimension fields, and that index.routing_path is configured to a dimension field of type keyword. If we add this to the spec we should also check that the other requirements are satisfied in order to provide good feedback to package developers. We should also ensure that Fleet reads this setting and installs it in the index template.

I can comment on this one. I'm pretty sure we have a lot of gaps in the validation, but at some point, it helps developers discover what are available properties (index_mode with value time _series), even though it doesn't validate all dependent places.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments/questions:

  • Why add a data_stream.index_mode field rather than using the existing index_template.settings.index.mode?
  • What is the dev experience we're aiming for with index settings? Do we want to move towards a more high-level, easy configuration experience or do we prefer making it explicit?
    • If we prefer the latter (which would be my preference), I think we should require the user to set the index.mode explicitly and validate this in the spec rather than having Kibana automatically apply the index mode when the setting is present. IMO, the less magic Kibana does to install packages the better.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Why add a data_stream.index_mode field rather than using the existing index_template.settings.index.mode?

It seems that after elastic/elasticsearch#82621, it is recommended to use data_stream.index_mode. The setting under the index template settings seems to also require to manually set the start and end time of each index.

@martijnvg can probably confirm what setting we should use.

What is the dev experience we're aiming for with index settings? Do we want to move towards a more high-level, easy configuration experience or do we prefer making it explicit?

  • If we prefer the latter (which would be my preference), I think we should require the user to set the index.mode explicitly and validate this in the spec rather than having Kibana automatically apply the index mode when the setting is present. IMO, the less magic Kibana does to install packages the better.

Not sure what would be the best experience. What concerns me here is that there are three settings that are strongly dependant, a developer needs to use dimensions, set one of these dimensions as routing path and enable time series mode to actually enable time series. I can imagine that a developer can forget of any of these things.

I think that dimensions validation belongs to the spec, there we can statically check if the dimensions and the routing path are valid. But at this point the index mode would be redundant, the developer most likely wants to use it (actually it doesn't seem possible to use a routing path without the time series mode, see #297 (comment)).
I guess that adding another static check for this in the spec is preferred to adding more magic to Kibana.
One problem of this approach is that packages enabling the time series mode will only work with versions of the stack supporting these settings. We make harder for developers to support a broad range of stack versions.

Another, more explicit option, could be to add to the data stream manifest something like this:

time_series:
  enabled: true
  routing_path: elastic.agent.id

This would need to be explicitly interpreted by Kibana and applied to the index template.

There are two big advantages of this:

  • It is more explicit for developers.
  • It is more backwards compatible, older versions of the stack could continue using these packages, but without time series features.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to also require to manually set the start and end time of each index

This is not required, and when setting index_mode to time_series inside data stream snippet of a template then start and end time index settings are automatically set.

Only the the routing_path index setting needs to be additionally configured as index setting inside the template.

time_series:
enabled: true
routing_path: elastic.agent.id

I like this idea. Assuming that time_series is a field inside data_stream object of a template?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to also require to manually set the start and end time of each index

This is not required, and when setting index_mode to time_series inside data stream snippet of a template then start and end time index settings are automatically set.

Oh ok, I understood from elastic/elasticsearch#82621 that this is required, and when defining it in the index settings then it to be set manually. In any case I think that for out use case we want this to be done automatically, and then it would be in the data stream object of the template, right?

time_series:
enabled: true
routing_path: elastic.agent.id

I like this idea. Assuming that time_series is a field inside data_stream object of a template?

Here I am talking about the data_stream manifest in a package (like this one for apache status). This would need to be translated by Fleet to the proper template settings when installing a package.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case I think that for out use case we want this to be done automatically, and then it would be in the data stream object of the template, right?

Yes, defining index_mode in data stream object would automatically setup the start and end time index settings for the indices that are backing a data stream.

Here I am talking about the data_stream manifest in a package (like this one for apache status). This would need to be translated by Fleet to the proper template settings when installing a package.

I see. I'm currently discussing with others to change how tsdb data streams are configured in templates. In the way that you describe. (bringing routing_path index setting into the data stream object of a template)

10 changes: 10 additions & 0 deletions versions/1/data_stream/manifest.spec.yml
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,16 @@ spec:
type: string
required:
- name
data_stream:
description: Data stream settings
type: object
additionalProperties: false
properties:
index_mode:
description: Index mode
type: string
enum:
- time_series
privileges:
description: Elasticsearch privilege requirements
type: object
Expand Down