[RFC] Core Peer Forwarding #700
Description
Background
The background for this change is explained in #699.
Proposal
Data Prepper will include peer forwarding as a core feature which any plugin can use. The aggregate plugin defined in #699 will use this new feature.
Design
The proposed design is to create a more general Peer Forwarder as part of Data Prepper Core. In this design, any plugin can request peer forwarding of events between Data Prepper nodes. Peer Forwarder takes Events, groups these by the plugin-defined correlation values, and then sends them to the correct Data Prepper node. It continues to use the existing hash ring approach for determining the destination.
The following diagram shows the flow of Events with the proposed Peer Forwarder.
Peer Forwarder Configuration
The user will configure Peer Forwarder in the existing data-prepper-config.yaml file. Below is a snippet depicting how a user can configure peer-forwarding and what options are available. For brevity, the example does not show all the existing configurations related to peer discovery.
```yaml
peer_forwarder:
  max_batch_event_count: 48
  port: 4910
  time_out: 300
  discovery_mode: "dns"
  domain_name: "data-prepper-cluster.my-domain.net"
```
This design allows for one peer-forwarder in Data Prepper. See the Alternatives and Questions below for a discussion on supporting multiple peer-forwarders.
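The options shown in the snippet could map onto a simple configuration class. The following is a hypothetical sketch (field and class names are assumptions mirroring the YAML keys; the actual class proposed in the tasks below may differ):

```java
// Hypothetical sketch of a class backing the peer_forwarder YAML block.
// Field names mirror the snippet above; the real class may differ.
public class PeerForwarderConfiguration {
    private final int maxBatchEventCount;
    private final int port;
    private final int timeOut;
    private final String discoveryMode;
    private final String domainName;

    public PeerForwarderConfiguration(final int maxBatchEventCount, final int port,
                                      final int timeOut, final String discoveryMode,
                                      final String domainName) {
        this.maxBatchEventCount = maxBatchEventCount;
        this.port = port;
        this.timeOut = timeOut;
        this.discoveryMode = discoveryMode;
        this.domainName = domainName;
    }

    public int getMaxBatchEventCount() { return maxBatchEventCount; }
    public int getPort() { return port; }
    public int getTimeOut() { return timeOut; }
    public String getDiscoveryMode() { return discoveryMode; }
    public String getDomainName() { return domainName; }
}
```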
Service Discovery Configuration
The core Peer Forwarder will use the existing service discovery options. Presently, peers can be discovered via:
- Pre-configured static IP list
- DNS entry
- AWS CloudMap
Security Configuration
The peer-forwarder will support authentication and TLS. For TLS encryption, peer-forwarder can utilize the work which is planned for unifying certificate loading #364.
For authentication, peer-forwarder can use the same mechanism for securing its endpoint as was provided in #464. Additionally, it will need a new concept for authenticating requests when it is the client. This could be based on the authentication configuration so that the username and password need not be repeated.
Here is a possible secured configuration.
```yaml
peer_forwarder:
  max_batch_event_count: 48
  port: 4910
  time_out: 300
  ssl: true
  certificate:
    file:
      certificate_path: /usr/share/my/path/public.cert
      private_key_path: /usr/share/my/path/private.key
  authentication:
    http_basic:
      username: admin
      password: admin
  discovery_mode: "dns"
  domain_name: "data-prepper-cluster.my-domain.net"
```
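On the client side, HTTP Basic credentials from this configuration would be sent as an Authorization header. The following is only an illustration using the JDK HTTP client (the class name, endpoint path, and payload are assumptions, not the actual Data Prepper implementation):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative sketch: how a forwarding client could attach HTTP Basic
// credentials when it sends a batch to a peer. Names are hypothetical.
public class PeerAuthExample {
    // Builds the standard HTTP Basic Authorization header value.
    static String basicAuthHeader(final String username, final String password) {
        final String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The endpoint path "/forward" is an assumption for illustration.
        final HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://peer.example.com:4910/forward"))
                .header("Authorization", basicAuthHeader("admin", "admin"))
                .POST(HttpRequest.BodyPublishers.ofString("{\"events\":[]}"))
                .build();
        System.out.println(request.headers().firstValue("Authorization").orElse(""));
    }
}
```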
Peer Forwarder Communication
Peer Forwarder will send batches of Event objects. It will send them over HTTP/2 to a user-configurable port.
The model for communication is loosely defined as:
```java
public class ForwardedEvent {
    private String event;
    private String destinationPlugin;
}

public class ForwardedEvents {
    private List<ForwardedEvent> events;
}
```
Each event is a string. It is the serialized JSON for that event.
The Peer Forwarder also specifies the destination plugin. It must do this so that multiple aggregate plugins can use one shared peer-forwarder.
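For illustration, the payload for a single forwarded event might take the following shape. This is a sketch only; the method name and exact JSON layout are assumptions, and real code would use a JSON library rather than string concatenation:

```java
// Sketch of the payload shape described above: each forwarded event carries
// its serialized event JSON plus the destination plugin name.
public class ForwardedEventSketch {
    // String concatenation is for illustration only; a JSON library
    // would be used in practice.
    static String toWireJson(final String eventJson, final String destinationPlugin) {
        return "{\"event\":" + eventJson
                + ",\"destinationPlugin\":\"" + destinationPlugin + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(toWireJson("{\"traceId\":\"abc\"}", "aggregate"));
    }
}
```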
Peer Forwarder Implementation
The peer forwarder will continue to use consistent hashing and a hash ring to determine the destination node. One significant implementation change is that it will now support multiple keys for determining the hash. Peer Forwarder will perform this by appending the values together into a single string or byte array value.
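A minimal sketch of this key-combination step follows. All names are illustrative, and the digest-based position function is only one common way to place a key on a ring; the actual Data Prepper hash ring implementation differs:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Illustrative sketch: combine multiple correlation values into one key,
// then map that key onto a position on a fixed-size hash ring.
public class CorrelationKeySketch {
    // Append the plugin-defined correlation values into a single string,
    // as the design above describes.
    static String combine(final List<String> correlationValues) {
        return String.join("|", correlationValues);
    }

    // Derive a deterministic ring position from the combined key using MD5,
    // as many consistent-hashing implementations do.
    static int ringPosition(final String combinedKey, final int ringSize) {
        try {
            final byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(combinedKey.getBytes(StandardCharsets.UTF_8));
            final int hash = ((digest[0] & 0xFF) << 24) | ((digest[1] & 0xFF) << 16)
                    | ((digest[2] & 0xFF) << 8) | (digest[3] & 0xFF);
            return Math.floorMod(hash, ringSize);
        } catch (final NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```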
Peer Forwarder Plugins
Plugins requiring peer-forwarding must implement the following interface. Data Prepper will detect plugins which implement this interface and configure the peer-forwarder for that plugin.
```java
/**
 * Add this interface to a Processor which must have peer forwarding
 * prior to processing events.
 */
interface RequiresPeerForwarding {
    /**
     * Gets the correlation keys which Peer Forwarder uses to allocate
     * Events to specific Data Prepper nodes.
     *
     * @return A set of keys
     */
    Set<String> getCorrelationKeys();
}
```
Data Prepper will wrap the plugin with a peer-forwarder. With this, plugins will not need to write code to route events to, or receive events from, the peer-forwarder. The Data Prepper pipeline will resolve the peer-forwarding.
The plugin only needs to implement the getCorrelationKeys() method. The plugin will return a set of key names which the peer-forwarder will use to determine the destination node. For example, in Trace Analytics, this could be implemented as follows.
```java
@Override
public Set<String> getCorrelationKeys() {
    return Collections.singleton("traceId");
}
```
Alternatives and Questions
How will the Peer Forwarder Migrate?
This proposal effectively refactors the functionality of the current peer-forwarder plugin into generic core peer forwarding. Until the next major release (2.0), the existing peer-forwarder must remain available as a plugin, and it should be left unchanged.
What Plugin Types can use Peer Forwarding?
The initial implementation will allow peer-forwarding only on Processor plugins. If you need a Source or Sink to peer-forward, please create a new GitHub issue to expand the functionality.
Multiple Peer Forwarders
Data Prepper could support multiple peer forwarders. Users would assign names so that different aggregate plugins could specify which to use. Below is a small example.
```yaml
peer_forwarder:
  - name: default
    max_batch_event_count: 48
    port: 4910
    time_out: 300
    discovery_mode: "dns"
    domain_name: "data-prepper-cluster.my-domain.net"
  - name: other_forwarder
    max_batch_event_count: 48
    port: 4912
    time_out: 300
    discovery_mode: "dns"
    domain_name: "data-prepper-cluster.my-domain.net"
```
This could be confusing for users and there may not be a need for it. If you know of a specific use-case that would require this, please comment and explain in the issue.
Distinct Plugins
This RFC proposes core support for peer-forwarding and is based on #699. One alternative I considered is keeping peer-forwarder as a distinct plugin which must run prior to the aggregate plugin.
Here is a notional pipeline definition (the details are left out for brevity).
```yaml
aggregate-pipeline:
  source:
    http:
  processor:
    - grok:
    - peer-forwarder:
    - aggregate:
  sink:
    - opensearch:
```
Pros to proposed solution:
- Pipeline authors need not add boilerplate peer-forwarder plugins before the aggregate plugin. It will be easier for plugin authors to create correct pipelines.
- Other plugins could use peer-forwarding.
Pros to alternative solution:
- It would match the existing design of a peer-forwarder plugin and service-map-stateful plugin.
- The peer-forwarder configuration is closer to where it is needed by being in the pipeline configuration rather than a different configuration file.
- Single node clusters don’t need peer-forwarding and it would be easy to leave it out in such cases.
Peer Forwarder as Processor and Source
Another solution would be to create a Peer Forwarder Source and a Peer Forwarder Processor. In this approach, a pipeline author must configure the pipeline to have both the source and processor.
Here is a notional pipeline definition (the details are left out for brevity).
```yaml
pre-forwarding-pipeline:
  source:
    http:
  processor:
    - grok:
    - peer-forwarder:
  sink:
    - pipeline:
        name: post-forwarded-pipeline

post-forwarded-pipeline:
  source:
    - peer-forwarder:
    - pipeline:
        name: pre-forwarding-pipeline
  processor:
    - grok:
  sink:
    - opensearch:
```
Pros to the proposed solution:
- Authors don’t have to think about which plugins need peer-forwarding.
- Authors don’t have to split their pipelines in order to get input from other nodes into the desired plugin.
Pros to the alternative solution:
- This fits the current model better because processors are not currently able to add to the buffer.
- There would be no need for additional support within Data Prepper core.
Peer Forwarding gRPC
The Peer Forwarder could use gRPC for communication instead of raw HTTP. This may not be necessary since Peer Forwarder can use HTTP/2 and binary messages. Whichever protocol is chosen, it must not change within a major version, since a change would make two Data Prepper nodes of the same major version incompatible with each other.
Tasks
- Add PeerForwardingProcessingDecorator skeleton #1589
- Add RequiresPeerForwarding interface #1590
- Add PeerForwarderConfiguration class #1591
- Identify processors which implement RequiredPeerForwarding #1597
- Embed PeerForwarderConfiguration into DataPrepperConfiguration #1602
- Create a thread safe buffer for PeerForwarder server #1603
- Add PeerForwarderServer similar to HTTP source #1605
- Add PeerForwarderClient #1606
- PeerForwarderProcessorDecorator will take client, server and buffer as input #1607
- Inject PeerForwarderServer and start in execute #1608
- Add metrics for monitoring PeerForwarder #1609
- Make Peer Forwarding secure by default #1699
- Write events received by server to buffer #1746
- Support mTLS authentication between peers for core peer-forwarding #1758
- Create a guide to using Peer Forwarding (documentation task) in Data Prepper repo #1772
- Document Peer Forwarding in OpenSearch.org documentation #1773
- Validate server certificates on the client #1775