change file save buffer size and use external buffer to avoid neighbo… #944
base: main
```diff
@@ -100,6 +100,23 @@ pub(crate) trait GetAdjacencyList {
     /// Retrieve the data stored at index `i`.
     fn get_adjacency_list(&self, i: usize) -> ANNResult<Self::Item<'_>>;
 
+    /// Retrieve the data stored at index `i` into a pre-allocated buffer.
+    ///
+    /// This method allows callers to reuse a buffer across multiple calls,
+    /// avoiding per-call memory allocation overhead. The buffer is cleared
+    /// before being populated with the adjacency list data.
+    ///
+    /// Default implementation falls back to `get_adjacency_list` and copies.
+    fn get_adjacency_list_into(&self, i: usize, buffer: &mut Vec<Self::Element>) -> ANNResult<()>
```
Contributor

You can make this even more efficient by not adding:

```rust
struct Guarded<'a, T> {
    guard: RwLockReadGuard<'a, ()>,
    slice: &'a [T],
}

impl<T> Deref for Guarded<'_, T> {
    type Target = [T];
    fn deref(&self) -> &[T] {
        self.slice
    }
}
```

This bypasses the copy into the temporary buffer and lets you copy directly from the underlying provider into the destination. This doesn't work for the
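A fleshed-out sketch of the suggestion above, under the assumption that the lock protects a unit value while the neighbor data lives beside it (the `Node` struct, its fields, and the `adjacency` method are hypothetical names, not from the PR):

```rust
use std::ops::Deref;
use std::sync::{RwLock, RwLockReadGuard};

// Guard that keeps a read lock alive for as long as the borrowed slice
// is in use; Deref lets callers treat it as a plain &[T].
struct Guarded<'a, T> {
    _guard: RwLockReadGuard<'a, ()>,
    slice: &'a [T],
}

impl<T> Deref for Guarded<'_, T> {
    type Target = [T];
    fn deref(&self) -> &[T] {
        self.slice
    }
}

// Hypothetical node layout: the lock guards a unit value, the data sits
// next to it, so the slice can be borrowed for the guard's lifetime.
struct Node {
    lock: RwLock<()>,
    neighbors: Vec<u32>,
}

impl Node {
    fn adjacency(&self) -> Guarded<'_, u32> {
        Guarded {
            _guard: self.lock.read().unwrap(),
            slice: &self.neighbors,
        }
    }
}

fn main() {
    let node = Node {
        lock: RwLock::new(()),
        neighbors: vec![7, 8, 9],
    };
    let view = node.adjacency();
    // The read lock is held until `view` is dropped.
    assert_eq!(view.to_vec(), vec![7, 8, 9]);
    println!("ok");
}
```

The key property is that no copy is made: the caller reads straight from the provider's storage, at the cost of holding the read lock for as long as the view is alive.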
Author

I don't quite catch your point here. The optimization is to avoid a per-node memory allocation when flattening each node's neighbor list into the index file; with a global fixed-size buffer we save that allocation time. If we don't want a new interface for passing a reference to the global buffer, we could allocate it beforehand and make it referenceable from `get_adjacency_list`.
Contributor

Right. With your API you are first copying the data into the externally allocated buffer, then copying it out of that external buffer into the final stream. With the change I showed, using
Author

I think I see your meaning now: you want to expose the vertex's neighbor array to the final `BufWriter` so it can copy directly. That saves one memcpy, but then the copy into the `BufWriter` happens under the read lock. The `BufWriter` is a black box; we don't know when a flush will happen (part of the copy could fill its internal buffer and trigger one), and writing to disk is a blocking operation. The read lock could therefore be held for a long time compared with a single memcpy, blocking other operations such as updates to the node. A memcpy into a small external buffer is very cheap, so I generally prefer the copy, which ensures the read lock is held only briefly.
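The trade-off described here can be sketched with std types only (`write_vertex`, the `scratch` buffer, and the `RwLock<Vec<u32>>` layout are hypothetical simplifications, not the PR's actual types): copy under the read lock, then do all I/O after the lock is dropped.

```rust
use std::io::Write;
use std::sync::RwLock;

// Copy the neighbor list into a small reusable buffer while holding the
// read lock, then perform the (potentially blocking) buffered write
// *after* the lock has been released.
fn write_vertex<W: Write>(
    neighbors: &RwLock<Vec<u32>>,
    scratch: &mut Vec<u32>,
    out: &mut W,
) -> std::io::Result<()> {
    {
        // Critical section: just one cheap memcpy.
        let guard = neighbors.read().unwrap();
        scratch.clear();
        scratch.extend_from_slice(&guard);
    } // read lock released here, before any I/O happens
    // A BufWriter may flush to disk at any point during these writes;
    // doing them outside the lock keeps concurrent writers from stalling
    // on disk I/O.
    out.write_all(&(scratch.len() as u32).to_le_bytes())?;
    for &n in scratch.iter() {
        out.write_all(&n.to_le_bytes())?;
    }
    Ok(())
}

fn main() {
    let neighbors = RwLock::new(vec![1u32, 2]);
    let mut scratch = Vec::new();
    let mut out: Vec<u8> = Vec::new();
    write_vertex(&neighbors, &mut scratch, &mut out).unwrap();
    // length prefix (2) followed by the two neighbors, little-endian
    assert_eq!(out, vec![2, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0]);
    println!("ok");
}
```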
Contributor

I sympathize with the desire to minimize the duration for which a lock is held, but this leads me to a bigger concern: why do we need to minimize lock duration in this API at all? The only reason I can think of is that doing so would avoid blocking other concurrent writes to the graph. But if writes are happening concurrently with a save using this API, isn't the save going to be incorrect, i.e. not a faithful snapshot? And if writes are not happening concurrently, then it doesn't matter if we hold onto the lock?
Author

Yes, we need to consider that write traffic may try to update a vertex's neighbors and be blocked by the stream write, which holds the read lock.
```diff
+    where
+        Self::Element: Clone,
+    {
+        buffer.clear();
+        let list = self.get_adjacency_list(i)?;
+        buffer.extend_from_slice(&list);
+        Ok(())
+    }
 
     /// Return the total number of elements contained in `self`.
     fn total(&self) -> usize;
```
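A minimal, self-contained sketch of the pattern this hunk introduces: a borrowing getter plus a default `_into` method that reuses a caller-supplied buffer. `ANNResult` and the `Item` GAT are replaced with plain std types here, and `Graph` is a hypothetical implementor:

```rust
// Simplified version of the trait: the default `_into` method gives
// every implementor buffer reuse for free via clear + extend.
trait GetAdjacencyList {
    fn get_adjacency_list(&self, i: usize) -> Result<&[u32], String>;

    // Clears the buffer, then copies the adjacency list into it, so the
    // caller pays for at most one allocation (growing the buffer) across
    // the whole save loop.
    fn get_adjacency_list_into(&self, i: usize, buffer: &mut Vec<u32>) -> Result<(), String> {
        buffer.clear();
        let list = self.get_adjacency_list(i)?;
        buffer.extend_from_slice(list);
        Ok(())
    }
}

// Hypothetical in-memory implementor, for illustration only.
struct Graph {
    lists: Vec<Vec<u32>>,
}

impl GetAdjacencyList for Graph {
    fn get_adjacency_list(&self, i: usize) -> Result<&[u32], String> {
        self.lists
            .get(i)
            .map(|v| v.as_slice())
            .ok_or_else(|| "index out of range".to_string())
    }
}

fn main() {
    let g = Graph {
        lists: vec![vec![1, 2, 3], vec![4]],
    };
    // One buffer reused across every vertex: no per-call allocation once
    // capacity is large enough.
    let mut buf = Vec::with_capacity(8);
    g.get_adjacency_list_into(0, &mut buf).unwrap();
    assert_eq!(buf, vec![1, 2, 3]);
    g.get_adjacency_list_into(1, &mut buf).unwrap();
    assert_eq!(buf, vec![4]);
    println!("ok");
}
```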
```diff
@@ -344,31 +361,31 @@ where
     let mut observed_max_degree: u32 = 0;
 
     out.write_all(&index_size.to_le_bytes())?;
-    out.write_all(&observed_max_degree.to_le_bytes())?; // Will be updated later with correct max_degree
+    out.write_all(&observed_max_degree.to_le_bytes())?;
     out.write_all(&start_point.to_le_bytes())?;
 
     out.write_all(&graph.additional_points().to_le_bytes())?;
 
     let total = graph.total();
 
+    // Pre-allocate a reusable buffer for adjacency lists
+    let initial_capacity = graph.max_degree().map(|d| d as usize).unwrap_or(128);
+    let mut neighbor_buffer: Vec<u32> = Vec::with_capacity(initial_capacity);
+
     for i in 0..total {
-        let binding = graph.get_adjacency_list(i)?;
-        let neighbors: &[u32] = &binding;
-        let num_neighbors: u32 = neighbors.len() as u32;
+        // Reuse buffer to avoid per-vertex allocation overhead
+        graph.get_adjacency_list_into(i, &mut neighbor_buffer)?;
+        let num_neighbors: u32 = neighbor_buffer.len() as u32;
 
         // Write the number of neighbors as a `u32`.
         out.write_all(&num_neighbors.to_le_bytes())?;
 
-        // Write all the neighbors, applying transformation if provided.
-        neighbors
-            .iter()
-            .copied()
-            .try_for_each(|n| out.write_all(&n.to_le_bytes()))?;
+        // Bulk write using bytemuck for zero-copy conversion
+        let neighbor_bytes: &[u8] = bytemuck::must_cast_slice(&neighbor_buffer);
+        out.write_all(neighbor_bytes)?;
 
         observed_max_degree = observed_max_degree.max(num_neighbors);
-        index_size += (std::mem::size_of::<u32>() * (1 + neighbors.len())) as u64;
+        index_size += (std::mem::size_of::<u32>() * (1 + neighbor_buffer.len())) as u64;
     }
 
     // Use configured max degree if provided, otherwise use observed
     let max_degree = graph.max_degree().unwrap_or(observed_max_degree);
 
     // Finish up by writing the observed index size and max degree.
```
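One caveat worth noting about the bulk write: `bytemuck::must_cast_slice` reinterprets the `u32` buffer in native byte order, so it matches the old per-element `to_le_bytes` loop only on little-endian targets. A std-only sketch of the little-endian serialization that the original loop performed (`serialize_neighbors` is a hypothetical helper, not part of the PR):

```rust
// Produce the same byte stream as the per-element loop: each u32 written
// in little-endian order, with no intermediate per-element write_all calls.
fn serialize_neighbors(neighbors: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(neighbors.len() * std::mem::size_of::<u32>());
    for &n in neighbors {
        out.extend_from_slice(&n.to_le_bytes());
    }
    out
}

fn main() {
    let neighbors = [1u32, 2, 0xDEAD_BEEF];
    let bytes = serialize_neighbors(&neighbors);
    // 3 neighbors * 4 bytes each
    assert_eq!(bytes.len(), 12);
    // First neighbor, little-endian
    assert_eq!(&bytes[..4], &[1, 0, 0, 0]);
    println!("ok");
}
```

On a little-endian machine a native-order cast of the same buffer yields identical bytes, which is why the bulk write is a pure win there; on a big-endian target the two approaches would diverge.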