[model-gateway] add image processor and transformer structure #14344
Conversation
Summary of Changes: Hello @slin1237, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request establishes the foundational structure for image processing and transformation within the model gateway.
Code Review
This pull request lays a solid foundation for vision processing in Rust by introducing a well-structured module for image transformations, configuration parsing, and a processor registry. The code is clean and includes tests, which is great. My review focuses on improving the robustness of the processor registry, making configuration handling more explicit to avoid surprising defaults, and suggesting performance optimizations for tensor conversions.
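For readers unfamiliar with the registry pattern mentioned above, here is a minimal sketch of the general idea; the trait shape, type names, and method signatures below are illustrative assumptions, not the PR's actual API.

// Hypothetical sketch of a processor registry keyed by model type.
// Names and signatures are assumptions for illustration only.
use std::collections::HashMap;

pub trait ImageProcessor: Send + Sync {
    fn preprocess(&self, image: &image::DynamicImage) -> ndarray::Array3<f32>;
}

type ProcessorFactory = fn() -> Box<dyn ImageProcessor>;

#[derive(Default)]
pub struct ProcessorRegistry {
    factories: HashMap<String, ProcessorFactory>,
}

impl ProcessorRegistry {
    // Associate a model type string with a constructor for its processor.
    pub fn register(&mut self, model_type: &str, factory: ProcessorFactory) {
        self.factories.insert(model_type.to_string(), factory);
    }

    // Look up a model type and build a fresh processor for it, if registered.
    pub fn get(&self, model_type: &str) -> Option<Box<dyn ImageProcessor>> {
        self.factories.get(model_type).map(|f| f())
    }
}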
pub fn get_target_size(&self) -> Option<(u32, u32)> {
    self.size.as_ref().map(|s| {
        // Try explicit height/width first
        let h = s
            .get("height")
            .or_else(|| s.get("shortest_edge"))
            .copied()
            .unwrap_or(224);
        let w = s
            .get("width")
            .or_else(|| s.get("shortest_edge"))
            .copied()
            .unwrap_or(224);
        (h, w)
    })
}
The use of unwrap_or(224) can lead to surprising behavior. For example, if the config contains "size": {}, this function will return Some((224, 224)). It would be more robust to return None if the necessary keys (height/width or shortest_edge) are not present in the size map. This makes the function's behavior more explicit and less magical.
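For illustration, with an empty size map the two behaviors diverge as follows. This is a hypothetical test sketch: the config type name and its Default impl are assumed, not taken from the PR.

#[test]
fn empty_size_map_behavior() {
    use std::collections::HashMap;

    // Hypothetical config type name; the real struct in this PR may differ.
    let cfg = ImagePreprocessorConfig {
        size: Some(HashMap::new()),
        ..Default::default()
    };

    // Current implementation: silently falls back to the magic 224x224 default.
    assert_eq!(cfg.get_target_size(), Some((224, 224)));

    // With the stricter version suggested below, this would instead be:
    // assert_eq!(cfg.get_target_size(), None);
}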
pub fn get_target_size(&self) -> Option<(u32, u32)> {
    self.size.as_ref().and_then(|s| {
        if let (Some(&h), Some(&w)) = (s.get("height"), s.get("width")) {
            Some((h, w))
        } else {
            s.get("shortest_edge").map(|&edge| (edge, edge))
        }
    })
}

pub fn get_crop_size(&self) -> Option<(u32, u32)> {
    self.crop_size.as_ref().map(|s| {
        let h = s.get("height").copied().unwrap_or(224);
        let w = s.get("width").copied().unwrap_or(224);
        (h, w)
    })
}
Similar to get_target_size, get_crop_size uses unwrap_or(224), which can cause unexpected behavior if crop_size is an empty map. It's better to return None if height and width are not specified, making the contract of the function clearer.
pub fn get_crop_size(&self) -> Option<(u32, u32)> {
    self.crop_size.as_ref().and_then(|s| {
        if let (Some(&h), Some(&w)) = (s.get("height"), s.get("width")) {
            Some((h, w))
        } else {
            None
        }
    })
}

pub fn to_tensor_no_norm(image: &DynamicImage) -> Array3<f32> {
    let rgb = image.to_rgb8();
    let (w, h) = (rgb.width() as usize, rgb.height() as usize);
    let mut arr = Array3::<f32>::zeros((3, h, w));

    for (x, y, pixel) in rgb.enumerate_pixels() {
        let (x, y) = (x as usize, y as usize);
        arr[[0, y, x]] = pixel[0] as f32;
        arr[[1, y, x]] = pixel[1] as f32;
        arr[[2, y, x]] = pixel[2] as f32;
    }
    arr
}
Similar to to_tensor, this function can be optimized by avoiding the pixel-by-pixel loop and using ndarray's vectorized operations on a view of the raw image data. This will improve performance, especially for larger images.
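Before swapping in the vectorized version suggested below, a quick equivalence check along these lines can confirm the two paths produce identical tensors. This is a hypothetical test: it assumes the original loop is temporarily kept around under the name to_tensor_no_norm_loop for comparison.

#[test]
fn vectorized_conversion_matches_loop() {
    use image::{DynamicImage, Rgb, RgbImage};

    // Build a small non-uniform test image so channel/axis mix-ups would be caught.
    let mut img = RgbImage::new(4, 3);
    for (x, y, p) in img.enumerate_pixels_mut() {
        *p = Rgb([x as u8, y as u8, (x + y) as u8]);
    }
    let img = DynamicImage::ImageRgb8(img);

    // `to_tensor_no_norm_loop` is the original pixel-by-pixel implementation,
    // hypothetically retained under a different name for this comparison.
    let looped = to_tensor_no_norm_loop(&img);
    let vectorized = to_tensor_no_norm(&img);

    assert_eq!(looped.shape(), &[3, 3, 4]); // [C, H, W]
    assert_eq!(looped, vectorized);
}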
pub fn to_tensor_no_norm(image: &DynamicImage) -> Array3<f32> {
    let rgb = image.to_rgb8();
    let (h, w) = (rgb.height() as usize, rgb.width() as usize);
    let raw_data = rgb.into_raw();
    // Create a view of shape [h, w, 3], permute to [3, h, w], then map to f32.
    // Note: `mapv` passes elements by value, so the closure takes `x: u8` directly.
    ndarray::ArrayView::from_shape((h, w, 3), &raw_data)
        .expect("Image buffer should match dimensions")
        .permuted_axes([2, 0, 1])
        .mapv(|x| x as f32)
}

return Err(TransformError::InvalidShape {
    expected: format!("[{}, {}, {}]", c, h, w),
    actual: tensor.shape().to_vec(),
});
When checking for inconsistent tensor shapes, the function returns a TransformError::InvalidShape. However, there is a more specific and descriptive error variant, TransformError::InconsistentShapes, available in the same enum. Using the more specific variant would improve error handling clarity and make debugging easier.
return Err(TransformError::InconsistentShapes);
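For context, the relevant portion of the error enum presumably looks something like the sketch below. The field types are inferred from the call sites shown above; the actual definition in the PR likely has more variants and error-derive attributes.

// Hypothetical sketch of the two variants discussed here, inferred from usage.
#[derive(Debug)]
pub enum TransformError {
    /// A single tensor does not have the expected [C, H, W] shape.
    InvalidShape { expected: String, actual: Vec<usize> },
    /// Tensors in a batch disagree with one another on shape.
    InconsistentShapes,
}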