Adding support for aws-efa receiver #412

Open
mitali-salvi wants to merge 7 commits into feature-otlp-ingestion from efa

mitali-salvi commented Feb 13, 2026

Description

Adds a new `awsefareceiver`, a standalone OpenTelemetry receiver that collects Amazon Elastic Fabric Adapter (EFA) metrics by reading hardware counters from `/sys/class/infiniband/*/ports/*/hw_counters/`.

Emits 11 cumulative monotonic sum metrics per device/port pair (RDMA bytes, rx/tx bytes, retransmissions, connection health events), with `device` and `port` as resource attributes. Supports containerized deployments via a configurable `host_path`.
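
As an illustration of how `host_path` and the two resource attributes might fit together, the sketch below prefixes the sysfs path with a host mount point and recovers `device` and `port` from a matched directory. The function names and the `/hostfs` mount point are assumptions made for this example, not names taken from the PR.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// sysfsInfiniband is the sysfs subtree from the description, relative to the host root.
const sysfsInfiniband = "sys/class/infiniband"

// counterGlob (hypothetical) builds the glob for hw_counters directories
// under hostPath, e.g. "/" on a bare host or a mount point inside a container.
func counterGlob(hostPath string) string {
	return filepath.Join(hostPath, sysfsInfiniband, "*", "ports", "*", "hw_counters")
}

// deviceAndPort (hypothetical) extracts the device and port names from a path
// shaped like .../infiniband/<device>/ports/<port>/hw_counters.
func deviceAndPort(dir string) (device, port string, ok bool) {
	parts := strings.Split(filepath.ToSlash(filepath.Clean(dir)), "/")
	n := len(parts)
	if n < 4 || parts[n-1] != "hw_counters" || parts[n-3] != "ports" {
		return "", "", false
	}
	return parts[n-4], parts[n-2], true
}

func main() {
	// "/hostfs" is an assumed mount point for the host root filesystem.
	fmt.Println(counterGlob("/hostfs"))

	device, port, ok := deviceAndPort("/hostfs/sys/class/infiniband/efa_0/ports/1/hw_counters")
	fmt.Println(device, port, ok)
}
```

Deriving the attributes from the path keeps them consistent with whatever the glob matched, so no separate device enumeration is needed.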

Testing

  1. Unit tests
  2. Deployed the agent with the EFA receiver configured and added a debug exporter for demonstration. Formatted output of the emitted data points:

# ResourceMetrics #0

## Resource
| Attribute | Value |
|-----------|-------|
| `device` | `efa_0` |
| `port` | `1` |

## ScopeMetrics #0
**InstrumentationScope:** `github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsefareceiver` (Unknown)

---

### Metric #0 — `efa_impaired_remote_conn_events`
- **Description:** The number of times EFA SRD connections entered an impaired state resulting in a reduced throughput rate limit
- **Unit:** `1`
- **DataType:** Sum (Monotonic, Cumulative)
- **Start:** `2026-02-23 21:50:28.319634708 UTC`
- **Timestamp:** `2026-02-23 21:50:32.647593691 UTC`
- **Value:** `0`

### Metric #1 — `efa_rdma_read_bytes`
- **Description:** The number of bytes received using RDMA read operations
- **Unit:** `By`
- **DataType:** Sum (Monotonic, Cumulative)
- **Start:** `2026-02-23 21:50:28.319634708 UTC`
- **Timestamp:** `2026-02-23 21:50:32.647593691 UTC`
- **Value:** `0`

### Metric #2 — `efa_rdma_write_bytes`
- **Description:** The number of bytes written by other instances using RDMA write operations
- **Unit:** `By`
- **DataType:** Sum (Monotonic, Cumulative)
- **Start:** `2026-02-23 21:50:28.319634708 UTC`
- **Timestamp:** `2026-02-23 21:50:32.647593691 UTC`
- **Value:** `0`

### Metric #3 — `efa_rdma_write_recv_bytes`
- **Description:** The number of bytes received by RDMA write operations
- **Unit:** `By`
- **DataType:** Sum (Monotonic, Cumulative)
- **Start:** `2026-02-23 21:50:28.319634708 UTC`
- **Timestamp:** `2026-02-23 21:50:32.647593691 UTC`
- **Value:** `0`

### Metric #4 — `efa_retrans_bytes`
- **Description:** The number of EFA SRD bytes retransmitted
- **Unit:** `By`
- **DataType:** Sum (Monotonic, Cumulative)
- **Start:** `2026-02-23 21:50:28.319634708 UTC`
- **Timestamp:** `2026-02-23 21:50:32.647593691 UTC`
- **Value:** `0`

### Metric #5 — `efa_retrans_pkts`
- **Description:** The number of EFA SRD packets retransmitted
- **Unit:** `1`
- **DataType:** Sum (Monotonic, Cumulative)
- **Start:** `2026-02-23 21:50:28.319634708 UTC`
- **Timestamp:** `2026-02-23 21:50:32.647593691 UTC`
- **Value:** `0`

### Metric #6 — `efa_retrans_timeout_events`
- **Description:** The number of times EFA SRD traffic timed out and resulted in a network path change
- **Unit:** `1`
- **DataType:** Sum (Monotonic, Cumulative)
- **Start:** `2026-02-23 21:50:28.319634708 UTC`
- **Timestamp:** `2026-02-23 21:50:32.647593691 UTC`
- **Value:** `0`

### Metric #7 — `efa_rx_bytes`
- **Description:** The number of bytes received
- **Unit:** `By`
- **DataType:** Sum (Monotonic, Cumulative)
- **Start:** `2026-02-23 21:50:28.319634708 UTC`
- **Timestamp:** `2026-02-23 21:50:32.647593691 UTC`
- **Value:** `0`

### Metric #8 — `efa_rx_dropped`
- **Description:** The number of packets that were received and then dropped
- **Unit:** `1`
- **DataType:** Sum (Monotonic, Cumulative)
- **Start:** `2026-02-23 21:50:28.319634708 UTC`
- **Timestamp:** `2026-02-23 21:50:32.647593691 UTC`
- **Value:** `0`

### Metric #9 — `efa_tx_bytes`
- **Description:** The number of bytes transmitted
- **Unit:** `By`
- **DataType:** Sum (Monotonic, Cumulative)
- **Start:** `2026-02-23 21:50:28.319634708 UTC`
- **Timestamp:** `2026-02-23 21:50:32.647593691 UTC`
- **Value:** `0`

### Metric #10 — `efa_unresponsive_remote_events`
- **Description:** The number of times an EFA SRD remote connection was unresponsive
- **Unit:** `1`
- **DataType:** Sum (Monotonic, Cumulative)
- **Start:** `2026-02-23 21:50:28.319634708 UTC`
- **Timestamp:** `2026-02-23 21:50:32.647593691 UTC`
- **Value:** `0`

---

Documentation

Adds a README with a configuration example, a metrics table, and the resource attributes, plus a `metadata.yaml` defining all 11 metrics.
