Summary
Replace the current manual gradient-based edge detection implementation with the well-tested imageproc crate's Canny edge detector. This will improve edge detection quality, reduce code maintenance burden, and provide better noise handling for document/table processing.
Current Implementation
The current edge detection is implemented manually in two files:
src/table_transformer.rs (lines 409-424)
    fn detect_edges(&self, img: &ImageBuffer<Luma<u8>, Vec<u8>>) -> ImageBuffer<Luma<u8>, Vec<u8>> {
        let (width, height) = img.dimensions();
        let mut edges = ImageBuffer::new(width, height);
        for y in 1..height.saturating_sub(1) {
            for x in 1..width.saturating_sub(1) {
                // Naive gradient computation
                let gx = (img.get_pixel(x + 1, y)[0] as i16) - (img.get_pixel(x - 1, y)[0] as i16);
                let gy = (img.get_pixel(x, y + 1)[0] as i16) - (img.get_pixel(x, y - 1)[0] as i16);
                let magnitude = ((gx * gx + gy * gy) as f32).sqrt();
                let val = if magnitude > 50.0 { 255 } else { 0 }; // Hard-coded threshold
                edges.put_pixel(x, y, Luma([val]));
            }
        }
        edges
    }
src/table_structure.rs (lines 55-70)
Nearly identical implementation with the same naive gradient magnitude approach.
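One detail in the detect_edges excerpt from src/table_transformer.rs may be worth checking before it is replaced: gx and gy are i16 values in the range -255..=255, so gx * gx can reach 65025, which exceeds i16::MAX (32767). Rust panics on that overflow in debug builds and wraps by default in release builds. The sketch below shows the same magnitude computed after widening to i32; both helper names are hypothetical and not part of the repository.

```rust
/// Gradient magnitude as in the excerpt, but widened to i32 before squaring
/// so that large differences cannot overflow. (Hypothetical helper.)
fn gradient_magnitude(gx: i16, gy: i16) -> f32 {
    let (gx, gy) = (i32::from(gx), i32::from(gy));
    ((gx * gx + gy * gy) as f32).sqrt()
}

/// Reports whether squaring `g` in i16 arithmetic would overflow.
/// (Hypothetical helper, only used to demonstrate the limit.)
fn square_overflows_i16(g: i16) -> bool {
    g.checked_mul(g).is_none()
}
```

Any single difference with an absolute value of 182 or more already overflows when squared in i16, so high-contrast table borders are the inputs most likely to hit this.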
Issues with Current Implementation
- Hard-coded threshold (50.0) - not adaptive to document quality
- No hysteresis thresholding - weaker edges are not properly suppressed
- No non-maximum suppression - produces thick edges
- Sensitive to noise - no Gaussian smoothing or noise filtering
- No morphological cleanup - gaps in table lines remain undetected
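To illustrate what the missing hysteresis step would add, here is a minimal, std-only sketch of double-threshold hysteresis over a grid of gradient magnitudes. It is a simplified illustration and not imageproc's implementation; the function name and the Vec<Vec<f32>> layout are assumptions made for the example.

```rust
/// Double-threshold hysteresis: pixels at or above `high` are edges, and
/// pixels at or above `low` are kept only if they are 8-connected to one.
/// (Illustrative sketch; `mag` is assumed to be a rectangular grid.)
fn hysteresis(mag: &[Vec<f32>], low: f32, high: f32) -> Vec<Vec<bool>> {
    let h = mag.len();
    let w = mag.first().map_or(0, |row| row.len());
    let mut edges = vec![vec![false; w]; h];
    let mut stack: Vec<(usize, usize)> = Vec::new();

    // Seed the search with every strong pixel.
    for y in 0..h {
        for x in 0..w {
            if mag[y][x] >= high {
                edges[y][x] = true;
                stack.push((y, x));
            }
        }
    }

    // Grow outwards into weak pixels that touch an accepted edge pixel.
    while let Some((y, x)) = stack.pop() {
        for dy in -1i32..=1 {
            for dx in -1i32..=1 {
                let (ny, nx) = (y as i32 + dy, x as i32 + dx);
                if ny < 0 || nx < 0 || ny >= h as i32 || nx >= w as i32 {
                    continue;
                }
                let (ny, nx) = (ny as usize, nx as usize);
                if !edges[ny][nx] && mag[ny][nx] >= low {
                    edges[ny][nx] = true;
                    stack.push((ny, nx));
                }
            }
        }
    }
    edges
}
```

With low = 50 and high = 100, a row of magnitudes [120, 70, 60, 10, 70] keeps the first three pixels and drops the last one: it clears the low threshold but is not connected to a strong pixel. A single cutoff at 50, as in the current code, would have kept it.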
Proposed Solution
Option A: Minimal Change (Recommended for Initial Implementation)
Replace the manual gradient with imageproc::edges::canny:
    use image::{ImageBuffer, Luma};
    use imageproc::edges::canny;

    fn detect_edges(&self, img: &ImageBuffer<Luma<u8>, Vec<u8>>) -> ImageBuffer<Luma<u8>, Vec<u8>> {
        // Canny with fixed low/high hysteresis thresholds (these are not adaptive)
        // Max edge strength: sqrt(5) * 2 * 255 ≈ 1140.39
        canny(img, 50.0, 100.0)
    }
Pros:
- Single function call
- Better edge quality with hysteresis thresholding
- Non-maximum suppression for thin edges
- Still uses existing image crate types
Cons:
- ~2-3x slower than naive implementation (but more accurate)
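The thresholds passed to canny are absolute gradient magnitudes, so it can help to see them relative to the maximum edge strength quoted in the comment above (sqrt(5) * 2 * 255). The sketch below is one possible way to express thresholds as fractions of that maximum; canny_thresholds is a hypothetical helper, not an imageproc function.

```rust
/// Maximum edge strength quoted for the Canny detector: sqrt(5) * 2 * 255.
fn max_edge_strength() -> f32 {
    5.0_f32.sqrt() * 2.0 * 255.0
}

/// Hypothetical helper: converts fractions in [0, 1] of the maximum edge
/// strength into absolute (low, high) thresholds.
fn canny_thresholds(low_frac: f32, high_frac: f32) -> (f32, f32) {
    let max = max_edge_strength();
    (
        low_frac.clamp(0.0, 1.0) * max,
        high_frac.clamp(0.0, 1.0) * max,
    )
}
```

On this scale, the proposed values of 50.0 and 100.0 sit at roughly 4.4% and 8.8% of the maximum, which is why they should be treated as starting points to tune against real scans.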
Option B: Full Pipeline with Morphological Cleanup
For production-grade table detection:
    use image::GrayImage;
    use imageproc::distance_transform::Norm;
    use imageproc::edges::canny;
    use imageproc::morphology::{close, open};

    fn detect_table_edges(&self, img: &GrayImage) -> GrayImage {
        // 1. Canny edge detection (low/high hysteresis thresholds)
        let edges = canny(img, 50.0, 100.0);
        // 2. Connect broken table lines (bridge gaps of a few pixels)
        let closed = close(&edges, Norm::LInf, 2);
        // 3. Remove small noise artifacts.
        //    Caution: Canny edges are about one pixel wide, so opening may also
        //    erase thin table lines; check this step on real documents.
        open(&closed, Norm::L1, 1)
    }
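To make the effect of the two morphology steps concrete, here is a small std-only sketch of binary closing and opening on a boolean grid. It does not call imageproc: Shape::Square stands in for Norm::LInf, Shape::Diamond for Norm::L1, and out-of-bounds pixels are treated as background, all of which are assumptions made for the illustration.

```rust
/// Binary image as rows of booleans (true = edge pixel).
type Grid = Vec<Vec<bool>>;

/// Structuring-element shape: Square ~ LInf ball, Diamond ~ L1 ball.
#[derive(Clone, Copy)]
enum Shape {
    Square,
    Diamond,
}

fn in_shape(shape: Shape, dy: i32, dx: i32, k: i32) -> bool {
    match shape {
        Shape::Square => dy.abs().max(dx.abs()) <= k,
        Shape::Diamond => dy.abs() + dx.abs() <= k,
    }
}

/// Dilation keeps a pixel if ANY neighbour in the shape is set;
/// erosion keeps it only if ALL neighbours in the shape are set.
fn morph(img: &Grid, shape: Shape, k: i32, dilate: bool) -> Grid {
    let h = img.len() as i32;
    let w = img.first().map_or(0, |row| row.len()) as i32;
    let mut out = vec![vec![false; w as usize]; h as usize];
    for y in 0..h {
        for x in 0..w {
            let (mut any, mut all) = (false, true);
            for dy in -k..=k {
                for dx in -k..=k {
                    if !in_shape(shape, dy, dx, k) {
                        continue;
                    }
                    let (ny, nx) = (y + dy, x + dx);
                    // Out-of-bounds neighbours count as background.
                    let on = ny >= 0
                        && ny < h
                        && nx >= 0
                        && nx < w
                        && img[ny as usize][nx as usize];
                    any |= on;
                    all &= on;
                }
            }
            out[y as usize][x as usize] = if dilate { any } else { all };
        }
    }
    out
}

/// Closing = dilation followed by erosion (bridges small gaps).
fn close_binary(img: &Grid, shape: Shape, k: i32) -> Grid {
    morph(&morph(img, shape, k, true), shape, k, false)
}

/// Opening = erosion followed by dilation (removes small specks).
fn open_binary(img: &Grid, shape: Shape, k: i32) -> Grid {
    morph(&morph(img, shape, k, false), shape, k, true)
}

/// Number of set pixels in the grid.
fn count_on(img: &Grid) -> usize {
    img.iter().flatten().filter(|&&p| p).count()
}
```

On a one-pixel-wide horizontal line with a two-pixel gap, closing with Shape::Square and radius 2 bridges the gap. Opening the result with Shape::Diamond and radius 1 then removes the whole line, because no pixel on a one-pixel-wide line has set neighbours both above and below. If imageproc behaves the same way on Canny output, step 3 would need a thicker input or a different cleanup.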
Dependencies
imageproc is already a dependency in Cargo.toml. Enable the rayon feature for parallel processing:

    [dependencies]
    imageproc = { version = "0.25", features = ["rayon"] }
Performance Considerations
| Algorithm | Speed | Noise Handling | Edge Accuracy |
|---|---|---|---|
| Current (naive gradient) | Fast | Poor | Low |
| Canny (imageproc) | Moderate | Excellent | High |
| Canny + Morphology | Slower | Best | Highest |
- Canny detects 53-79% of edges accurately in noisy images (vs. ~30-40% for naive gradient)
- Trade-off: 2-3x slower but much more robust for real-world documents
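The slowdown figure is easiest to trust once it has been measured on the project's own images. Below is a minimal, std-only timing sketch that could wrap both the existing detect_edges and the Canny-based version; mean_runtime is a hypothetical helper, and a dedicated benchmarking harness would give more reliable numbers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
/// `black_box` discourages the optimizer from discarding the result.
fn mean_runtime<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iters > 0, "iters must be at least 1");
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}
```

Calling it once with a closure around each implementation, on the same input image and in a release build, gives two durations whose ratio can be compared against the 2-3x estimate.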
Implementation Steps
- Update Cargo.toml to enable the rayon feature for imageproc
- Replace detect_edges in src/table_transformer.rs
- Replace detect_edges in src/table_structure.rs
Additional Resources
Related Code
- detect_lines() function (lines 426-503 in table_transformer.rs) - uses output from detect_edges
- extract_cells_from_lines() - processes detected lines for cell extraction
- MIN_LINE_LENGTH_RATIO constant (0.3) - threshold for line detection
Acceptance Criteria
- Edge detection uses imageproc::edges::canny instead of the manual gradient