Align text images with OpenCV using Python

Most optical character recognition(OCR) software first aligns the image properly before detecting the text in it. Here I’ll show you how to do that with OpenCV:

Initial image on the left, aligned image on the right

First we need to import opencv:

import cv2

Let’s read an image in. You probably will have a color image, so first we need to convert it to gray scale (one channel only).

image = cv2.imread("text_rotated.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Initial image, converted to grayscale(one channel)

We’re interested in the text itself and nothing else, so we’re going to create an image that represents this. Non-zero pixels represent what we want, and zero pixels represent background. To do that, we will threshold the grayscale image, so that only the text has non-zero values. The explanation of the inputs is simple: gray is the grayscale image used as input, the next argument is the threshold value, set to 0, although it is ignored since it will be calculated by the Otsu algorithm(it’s 170 in this case) because we’re using THRESH_OTSU flag. The next argument is 255 which is the value to set the pixels that pass the threshold, and finally the other flag indicates that we want to use a binary output (all or nothing), and that we want the output reversed (since the text is black in the original). The returned values are the actual value of the threshold to be used as calculated by the Otsu algorithm(otsu_thresh), and the thresholded image itself (thresh).

otsu_thresh, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

By thresholding the initial image, we now have an image that has non-zero values for the pixels that represent text, although it still has some noise

First we need to remove those extra white pixels outside of the area of the text. We can do that with the morphological operator Open, which basically erodes the image (makes the white areas smaller), and then dilates it back (make the white areas larger again). By doing that any small dots, or noise in the image, will be removed. The larger the kernel size, the more noise will be removed. You can do this either by hand, or with an iterative method that checks, for example, the ratio of non-zero to zero pixels for each value, and select the one that maximizes that metric.

kernel_size = 4
ksize=(kernel_size, kernel_size)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, ksize)
thresh_filtered = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)

We now have an image that only represents text, without any external noise

Now that we have an image that represents the text, we’re interested in knowing the angle in which it is rotated. There are a few different ways of doing this. Since OpenCV has a convenient function that gives you the minimum rectangle that contains all non-zero values, we could use that. Let’s see how it works:

nonZeroCoordinates = cv2.findNonZero(thresh_filtered)
imageCopy = image.copy()
for pt in nonZeroCoordinates:
    imageCopy = cv2.circle(imageCopy, (pt[0][0], pt[0][1]), 1, (255, 0, 0))

In blue are all the pixels selected as text in the original image. We don’t need to be exact here and select every text pixel, but we don’t want any non text pixel to be blue.

We can now use those coordinates and ask OpenCV to give us the minimum rectangle that contains them:

box = cv2.minAreaRect(nonZeroCoordinates)
boxPts = cv2.boxPoints(box)
for i in range(4):
    pt1 = (boxPts[i][0], boxPts[i][1])
    pt2 = (boxPts[(i+1)%4][0], boxPts[(i+1)%4][1])
    cv2.line(imageCopy, pt1, pt2, (0,255,0), 2, cv2.LINE_AA);

In green is the minimum rectangle that contains all the blue points, which represents text in this case

The estimated angle can then be simply retrieved from the returned rectangle. Note: Remember to double check the returned angle, as it might be different to what you’re expecting. The function minAreaRect always returns angles between 0 and -90 degrees.

angle = box[2]
if(angle < -45):
    angle = 90 + angle

Once you have the estimated angle, it’s time to rotate the image back. First we calculate a rotation matrix based on the angle, and then we apply it. The rotation angle is expressed in degrees, and positive values mean a counter-clockwise rotation.

h, w, c = image.shape
scale = 1.
center = (w/2., h/2.)
M = cv2.getRotationMatrix2D(center, angle, scale)
rotated = image.copy()
cv2.warpAffine(image, M, (w, h), rotated, cv2.INTER_CUBIC, cv2.BORDER_REPLICATE )