Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ int FIFOEndpointStore::deleteEndpoint(const std::string &peer_nic_path) {
// in case it is setting up connection or submitting slice
if (iter != endpoint_map_.end()) {
waiting_list_.insert(iter->second);
iter->second->set_active(false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The pull request aims to fix a QP leak by setting the endpoint status to inactive when it is deleted. This ensures that the endpoint can be reclaimed once all outstanding transfers are complete. However, the fix is only applied to the deleteEndpoint method and is missing from the FIFOEndpointStore::evictEndpoint method. When an endpoint is evicted from the FIFO store because it has reached its maximum size, it is moved to the waiting_list_ but its active_ status remains true. As a result, RdmaEndPoint::hasOutstandingSlice() will always return true, and FIFOEndpointStore::reclaimEndpoint() will never remove the endpoint from the waiting_list_. This leads to a leak of RDMA Queue Pairs (QPs) and other resources, which can eventually lead to a Denial of Service (DoS) when the system runs out of available QPs.

Note that SIEVEEndpointStore::evictEndpoint correctly calls set_active(false) on line 212, so FIFOEndpointStore should be updated to match this behavior.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. But what do authors think about this issue?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behaviors of SIEVEEndpointStore and FIFOEndpointStore do indeed need to match, but the purpose of this PR is to fix the qp leakage issue introduced in PR#384. PR#384 did not modify the evictEndpoint function, and I cannot predict what impact it would have.

endpoint_map_.erase(iter);
auto fifo_iter = fifo_map_[peer_nic_path];
fifo_list_.erase(fifo_iter);
Expand Down Expand Up @@ -174,6 +175,7 @@ int SIEVEEndpointStore::deleteEndpoint(const std::string &peer_nic_path) {
if (iter != endpoint_map_.end()) {
waiting_list_len_++;
waiting_list_.insert(iter->second.first);
iter->second.first->set_active(false);
endpoint_map_.erase(iter);
auto fifo_iter = fifo_map_[peer_nic_path];
if (hand_.has_value() && hand_.value() == fifo_iter) {
Expand Down
Loading