Refactor edge detection to use imageproc crate instead of manual implementation #204

@lfgranja

Summary

Replace the current manual gradient-based edge detection implementation with the well-tested imageproc crate's Canny edge detector. This will improve edge detection quality, reduce code maintenance burden, and provide better noise handling for document/table processing.

Current Implementation

The current edge detection is implemented manually in two files:

src/table_transformer.rs (lines 409-424)

fn detect_edges(&self, img: &ImageBuffer<Luma<u8>, Vec<u8>>) -> ImageBuffer<Luma<u8>, Vec<u8>> {
    let (width, height) = img.dimensions();
    let mut edges = ImageBuffer::new(width, height);

    for y in 1..height.saturating_sub(1) {
        for x in 1..width.saturating_sub(1) {
            // Naive gradient computation
            let gx = (img.get_pixel(x + 1, y)[0] as i16) - (img.get_pixel(x - 1, y)[0] as i16);
            let gy = (img.get_pixel(x, y + 1)[0] as i16) - (img.get_pixel(x, y - 1)[0] as i16);
            let magnitude = ((gx * gx + gy * gy) as f32).sqrt();
            let val = if magnitude > 50.0 { 255 } else { 0 };  // Hard-coded threshold
            edges.put_pixel(x, y, Luma([val]));
        }
    }
    edges
}

src/table_structure.rs (lines 55-70)

Nearly identical implementation with the same naive gradient magnitude approach.
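One detail of the snippet above is worth flagging: gx and gy are i16, so gx * gx can reach 255 * 255 = 65,025, which exceeds i16::MAX (32,767). That multiplication panics in debug builds and wraps in release builds. A minimal sketch of an overflow-safe per-pixel computation, assuming the four neighbours are passed in directly (the function names are hypothetical, not from the repository):

```rust
/// Central-difference gradient magnitude for one pixel.
/// Differences are widened to i32 so the squared terms cannot overflow
/// (255 * 255 = 65_025 does not fit in an i16).
fn gradient_magnitude(left: u8, right: u8, up: u8, down: u8) -> f32 {
    let gx = i32::from(right) - i32::from(left);
    let gy = i32::from(down) - i32::from(up);
    ((gx * gx + gy * gy) as f32).sqrt()
}

/// Same single-cutoff rule as the current code: 255 for an edge, 0 otherwise.
fn classify(magnitude: f32, threshold: f32) -> u8 {
    if magnitude > threshold {
        255
    } else {
        0
    }
}
```

Switching to Canny removes this code path entirely, so the sketch only matters if the manual gradient is kept anywhere as a fallback.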

Issues with Current Implementation

  1. Hard-coded threshold (50.0) - not adaptive to document quality
  2. No hysteresis thresholding - a single cutoff either drops weak segments of a real edge or keeps weak noise
  3. No non-maximum suppression - produces thick edges
  4. Sensitive to noise - no Gaussian smoothing or noise filtering
  5. No morphological cleanup - gaps in table lines are never bridged
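To make item 2 concrete, here is a minimal, std-only sketch of hysteresis thresholding on a row-major grid of gradient magnitudes. It illustrates the idea only and is not imageproc's implementation; the function name and the choice of 8-connectivity are assumptions for this example.

```rust
/// Hysteresis thresholding over a row-major grid of gradient magnitudes.
/// Pixels at or above `high` are always kept; pixels at or above `low` are
/// kept only if they are 8-connected to an already kept pixel.
fn hysteresis(mag: &[f32], width: usize, low: f32, high: f32) -> Vec<bool> {
    assert!(width > 0 && mag.len() % width == 0, "grid must be rectangular");
    let height = mag.len() / width;
    let mut keep = vec![false; mag.len()];

    // Seed the search with every strong pixel.
    let mut stack: Vec<usize> = Vec::new();
    for (i, &m) in mag.iter().enumerate() {
        if m >= high {
            keep[i] = true;
            stack.push(i);
        }
    }

    // Grow outwards from the seeds through weak-but-connected pixels.
    while let Some(i) = stack.pop() {
        let (x, y) = (i % width, i / width);
        for ny in y.saturating_sub(1)..=(y + 1).min(height - 1) {
            for nx in x.saturating_sub(1)..=(x + 1).min(width - 1) {
                let j = ny * width + nx;
                if !keep[j] && mag[j] >= low {
                    keep[j] = true;
                    stack.push(j);
                }
            }
        }
    }
    keep
}
```

With low = 50 and high = 100, a weak pixel of strength 60 survives when it continues a strong edge but is dropped when it stands alone. The current single cutoff at 50.0 would keep both.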

Proposed Solution

Option A: Minimal Change (Recommended for Initial Implementation)

Replace the manual gradient with imageproc::edges::canny:

use imageproc::edges::canny;

fn detect_edges(&self, img: &ImageBuffer<Luma<u8>, Vec<u8>>) -> ImageBuffer<Luma<u8>, Vec<u8>> {
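    // Canny with fixed low/high hysteresis thresholds (these are not adaptive).
    // Thresholds are on the gradient-magnitude scale; the maximum possible
    // edge strength is sqrt(5) * 2 * 255 ≈ 1140.39.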
    canny(img, 50.0, 100.0)
}

Pros:

  • Single function call
  • Better edge quality with hysteresis thresholding
  • Non-maximum suppression for thin edges
  • Still uses existing image crate types

Cons:

  • Estimated ~2-3x slower than the naive implementation (not yet benchmarked), in exchange for better accuracy
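The thresholds 50.0 and 100.0 above are fixed values. One possible starting point for the optional adaptive-threshold step is the widely used median-based heuristic sketched here. This is an assumption, not something the issue prescribes: the function name and the sigma parameter are hypothetical, and the resulting values sit on the 0-255 intensity scale, so they may need rescaling or tuning for imageproc's gradient-magnitude scale and for mostly white document pages.

```rust
/// Derive (low, high) thresholds from the median pixel intensity.
/// `sigma` sets the width of the band around the median.
/// Returns None for an empty image.
fn median_based_thresholds(pixels: &[u8], sigma: f32) -> Option<(f32, f32)> {
    if pixels.is_empty() {
        return None;
    }

    // Histogram-based median: one pass, no sorted copy of the image.
    let mut hist = [0usize; 256];
    for &p in pixels {
        hist[p as usize] += 1;
    }
    let half = (pixels.len() + 1) / 2;
    let mut seen = 0usize;
    let mut median = 0u8;
    for (value, &count) in hist.iter().enumerate() {
        seen += count;
        if seen >= half {
            median = value as u8;
            break;
        }
    }

    let m = f32::from(median);
    let low = ((1.0 - sigma) * m).max(0.0);
    let high = ((1.0 + sigma) * m).min(255.0);
    Some((low, high))
}
```

With a GrayImage, the pixel slice would come from img.as_raw(), and the pair would be passed as canny(img, low, high).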

Option B: Full Pipeline with Morphological Cleanup

For production-grade table detection:

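use image::GrayImage;
use imageproc::distance_transform::Norm;
use imageproc::edges::canny;
use imageproc::morphology::{close, open};

fn detect_table_edges(&self, img: &GrayImage) -> GrayImage {
    // 1. Canny edge detection (fixed low/high thresholds)
    let edges = canny(img, 50.0, 100.0);

    // 2. Connect broken table lines: closing (dilate, then erode) bridges small gaps
    let closed = close(&edges, Norm::LInf, 2);

    // 3. Remove small noise artifacts: opening (erode, then dilate).
    //    Caution: opening also erases any structure too thin to contain the
    //    structuring element, which can include one-pixel-wide Canny edges
    //    that closing did not thicken. Verify on real documents.
    open(&closed, Norm::L1, 1)
}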
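To show what closing and opening do to a table line, here is a std-only sketch on a single row of binary pixels. It is a 1-D illustration under simplified border handling (out-of-range neighbours are ignored), not imageproc's implementation, and all four function names are hypothetical.

```rust
/// Binary dilation of a 1-D row: a pixel is set if any pixel within `k` is set.
fn dilate_1d(row: &[bool], k: usize) -> Vec<bool> {
    (0..row.len())
        .map(|i| {
            let lo = i.saturating_sub(k);
            let hi = (i + k).min(row.len() - 1);
            row[lo..=hi].iter().any(|&p| p)
        })
        .collect()
}

/// Binary erosion: a pixel stays set only if every in-range pixel within `k` is set.
fn erode_1d(row: &[bool], k: usize) -> Vec<bool> {
    (0..row.len())
        .map(|i| {
            let lo = i.saturating_sub(k);
            let hi = (i + k).min(row.len() - 1);
            row[lo..=hi].iter().all(|&p| p)
        })
        .collect()
}

/// Closing = dilate, then erode. Bridges gaps of up to 2 * k pixels in this sketch.
fn close_1d(row: &[bool], k: usize) -> Vec<bool> {
    erode_1d(&dilate_1d(row, k), k)
}

/// Opening = erode, then dilate. Removes runs shorter than 2 * k + 1 pixels.
fn open_1d(row: &[bool], k: usize) -> Vec<bool> {
    dilate_1d(&erode_1d(row, k), k)
}
```

With k = 1, closing fills a two-pixel gap in a line but leaves a three-pixel gap alone, while opening deletes an isolated pixel and keeps a run of three. The same reasoning in 2-D is why the opening step needs care when the edges are only one pixel wide.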

Dependencies

imageproc is already a dependency in Cargo.toml. Enable the rayon feature for parallel processing:

[dependencies]
imageproc = { version = "0.25", features = ["rayon"] }

Performance Considerations

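| Algorithm                | Speed    | Noise Handling | Edge Accuracy |
|--------------------------|----------|----------------|---------------|
| Current (naive gradient) | Fast     | Poor           | Low           |
| Canny (imageproc)        | Moderate | Excellent      | High          |
| Canny + Morphology       | Slower   | Best           | Highest       |
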
  • Canny detects 53-79% of edges accurately in noisy images (vs. ~30-40% for naive gradient)
  • Trade-off: 2-3x slower but much more robust for real-world documents

Implementation Steps

  • Update Cargo.toml to enable the rayon feature for imageproc
  • Replace detect_edges in src/table_transformer.rs
  • Replace detect_edges in src/table_structure.rs
  • Add adaptive threshold computation (optional enhancement)
  • Add morphological cleanup for table-specific optimization
  • Update tests to verify edge detection quality
  • Benchmark performance impact

Additional Resources

Related Code

  • detect_lines() function (lines 426-503 in table_transformer.rs) - uses output from detect_edges
  • extract_cells_from_lines() - processes detected lines for cell extraction
  • MIN_LINE_LENGTH_RATIO constant (0.3) - threshold for line detection
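Since detect_lines() consumes the edge map, thinner Canny edges may change how many rows or columns pass the length threshold. The body of detect_lines() is not shown in this issue, so the following is only a hypothetical sketch of how a ratio like MIN_LINE_LENGTH_RATIO could be applied to a row-major edge map. The run-length rule and both function names are assumptions.

```rust
/// Ratio named in the issue; the value 0.3 is taken from the text above.
const MIN_LINE_LENGTH_RATIO: f32 = 0.3;

/// Length of the longest run of consecutive edge pixels (value 255) in one row.
fn longest_run(row: &[u8]) -> usize {
    let mut best = 0usize;
    let mut current = 0usize;
    for &p in row {
        if p == 255 {
            current += 1;
            best = best.max(current);
        } else {
            current = 0;
        }
    }
    best
}

/// Indices of rows whose longest edge run spans at least
/// MIN_LINE_LENGTH_RATIO of the image width.
fn horizontal_line_rows(edges: &[u8], width: usize) -> Vec<usize> {
    if width == 0 {
        return Vec::new();
    }
    let min_len = (width as f32 * MIN_LINE_LENGTH_RATIO).ceil() as usize;
    edges
        .chunks_exact(width)
        .enumerate()
        .filter(|(_, row)| longest_run(row) >= min_len)
        .map(|(y, _)| y)
        .collect()
}
```

A check of this kind is sensitive to small gaps in a line, which is one reason the closing step in Option B could matter for downstream line detection.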

Acceptance Criteria

  • Edge detection uses imageproc::edges::canny instead of manual gradient
  • All existing tests pass
  • Performance is acceptable (within 3x of current implementation)
  • Edge quality improves for noisy/low-quality document images
  • Code is simpler and more maintainable
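The "within 3x" criterion can be checked with a small timing helper. The sketch below uses only the standard library. The helper names are hypothetical, and a dedicated benchmarking harness would give more reliable numbers than wall-clock medians.

```rust
use std::time::{Duration, Instant};

/// Median wall-clock time of `runs` calls to `f`.
fn median_time<F: FnMut()>(mut f: F, runs: usize) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}

/// True if `candidate` is at most `factor` times slower than `baseline`.
fn within_budget(baseline: Duration, candidate: Duration, factor: f64) -> bool {
    candidate.as_secs_f64() <= baseline.as_secs_f64() * factor
}
```

In practice the two closures would wrap the old detect_edges and the Canny-based version on the same input image, with results passed through std::hint::black_box so the compiler cannot optimize the work away. The comparison would then be within_budget(old, new, 3.0).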
