Extracting text is not working correctly in v5.3.0-2 in Alpine #293

@dongzhou-coder

Summary

Hey, we are using the gosseract library to extract text from a PNG file, but the output is not what we expect.
We expected it to extract "Jan3Fri2014", but the output is "Jan3Fr12014" (the "i" in "Fri" is recognized as "1").

Reproducibility

Reproducibility Frequency

  • 100%

Test logic

package main

import (
	"strings"
	"testing"

	"github.com/otiai10/gosseract/v2"
)

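func TestGenerateImage(t *testing.T) {
	const text = "Jan 3 Fri 2014"

	client := gosseract.NewClient()
	// Release the underlying Tesseract handle even if the test fails early.
	defer client.Close()

	if err := client.SetImage("./out.png"); err != nil {
		t.Fatal(err)
	}
	out, err := client.Text()
	if err != nil {
		t.Fatal(err)
	}
	// Compare with spaces stripped so only the recognized characters matter.
	if removeSpace(out) != removeSpace(text) {
		t.Errorf("expect: %s, got %s", removeSpace(text), removeSpace(out))
	}
}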

func removeSpace(s string) string {
	s = strings.TrimSpace(s)
	return strings.Join(strings.Split(s, " "), "")
}
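
As a side note on the comparison itself: removeSpace above strips only ASCII spaces, so a newline or tab inside the OCR output would also make the test fail. The following is a minimal sketch, using only the standard library, of a stricter normalization plus a helper that reports where the two strings first differ. Both stripWhitespace and firstMismatch are hypothetical helper names, not part of gosseract, and this does not address the misrecognition itself.

```go
package main

import (
	"fmt"
	"strings"
)

// stripWhitespace removes every run of whitespace (spaces, tabs, newlines),
// not only single spaces. Hypothetical helper, not part of gosseract.
func stripWhitespace(s string) string {
	return strings.Join(strings.Fields(s), "")
}

// firstMismatch returns the rune index of the first difference between
// want and got, or -1 when the two strings are identical.
// Hypothetical helper, not part of gosseract.
func firstMismatch(want, got string) int {
	w, g := []rune(want), []rune(got)
	n := len(w)
	if len(g) < n {
		n = len(g)
	}
	for i := 0; i < n; i++ {
		if w[i] != g[i] {
			return i
		}
	}
	if len(w) != len(g) {
		// One string is a strict prefix of the other.
		return n
	}
	return -1
}

func main() {
	want := stripWhitespace("Jan 3 Fri 2014")
	got := "Jan3Fr12014" // the output reported in this issue, spaces already removed
	i := firstMismatch(want, got)
	fmt.Printf("want %q, got %q, first mismatch at index %d\n", want, got, i)
}
```

With the strings from this report, the first mismatch falls on the "i" of "Fri", which is read as "1".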

Reproducible Dockerfile

FROM alpine:latest

RUN apk update
RUN apk add \
    g++ \
    git \
    musl-dev \
    go \
    tesseract-ocr-dev
RUN apk add tesseract-ocr-data-eng

ENV GOPATH=/root/go

ADD . ${GOPATH}/src/github.com/otiai10/gosseract
WORKDIR ${GOPATH}/src/github.com/otiai10/gosseract

ENV GOSSERACT_CPPSTDERR_NOT_CAPTURED=1
CMD ["go", "test", "-v", "./..."]

The PNG file has been attached.

[Attached image: out]

Thanks!
