Digital (image) transformation

Digital transformation can mean many things to many people. In this post, I avoid the bigger question and demonstrate how we can transform an image to provide more of a bird's eye view. All we need is Python, OpenCV and a little trial and error. Sometimes it's good to see things from a different perspective.

This article was originally posted on my blog site (below). Please feel free to ignore the grey code below - you'll hopefully get the idea without it.

The challenge

My aim is to provide a way to create orthogonal images or, put simply, to 'square up' images that have been taken from a different perspective. I took the first image (above) of a dummy credit card as a test case. I wanted a simple image to start with that has clearly defined boundaries. The smaller text on the card provides a way to assess how well the overall transformation works.

The use case is largely for whiteboard images so a 2D image should not constrain us too much. There are commercial applications that will do this for you - but where's the fun in that! This prototype uses OpenCV for processing images.

OpenCV is an open source computer vision and machine learning software library. It is freely downloadable and has many features that will help with this challenge. The final image (pictured 6 above) is a result only of applying the Python code - with no further post-processing.

The entire application is a little less than 80 lines of code - with many of these lines added purely to show what's happening. Without the narrative, it should be possible to reduce this considerably. As an initial attempt, the outcome is reasonably accurate, with the smaller text at least as readable in the final image as in the original.

The processing workflow

Figure 1 shows the basic steps to transform the image of a credit card without technical details. The full code is available on GitHub and suggestions are welcome as always.

Series of images showing how to transform an initial picture to a different perspective.

Step 1: Loading your image of choice

The load_image function allows us to read an image into memory and resize it as necessary.

import cv2


def load_image(image_path, width, height):
    """
    Loads an image from the specified file and changes its size.
    :param image_path: The relative path & name of the image file.
    :param width: The required width of the image.
    :param height: The required height of the image.
    :return: A re-sized object representing the image.
    """
    img = cv2.imread(image_path)
    if img is None:
        # cv2.imread returns None rather than raising when the file cannot be read.
        raise FileNotFoundError(f"Could not read image: {image_path}")
    return cv2.resize(img, (width, height))

# Load an image and resize it.
test_image = load_image('card3.jpg', 800, 600)

Step 2: Creating an image mask

The OpenCV threshold function allows us to create a mask for our image. The COLOR_BGR2GRAY constant is one of many colour converters that OpenCV supplies.

def create_binary_image(image):
    """
    Creates a binary threshold of the image by
    classifying each pixel as either black or white.
    :param image: The image to be processed.
    :return: A classified image.
    """
    new_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, new_image = cv2.threshold(new_image, 225, 255, cv2.THRESH_BINARY_INV)
    return new_image


# Create the mask from the image loaded in step 1.
threshold_image = create_binary_image(test_image)
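
To make the thresholding rule concrete, here is a minimal pure-Python sketch of what THRESH_BINARY_INV does to each grey-scale pixel, using the same values as above (225 and 255). The function name binary_inv_threshold and the sample pixel values are my own inventions for illustration; they are not part of OpenCV, and in practice cv2.threshold does the same job far faster.

```python
def binary_inv_threshold(gray_rows, thresh=225, max_value=255):
    """Illustrative stand-in for cv2.threshold(..., cv2.THRESH_BINARY_INV).

    Pixels brighter than `thresh` become 0 (black); all others become
    `max_value` (white). `gray_rows` is a list of rows of 0-255 integers.
    """
    return [[0 if pixel > thresh else max_value for pixel in row]
            for row in gray_rows]


# A tiny 2x3 "image": darker card pixels on a near-white background.
sample = [[250, 120, 240],
          [226, 225, 30]]
mask = binary_inv_threshold(sample)
print(mask)  # [[0, 255, 0], [0, 255, 255]]
```

Note that a pixel exactly equal to the threshold (225) counts as "not brighter", so it ends up white in the mask. This is also why a flat, bright background helps: anything darker than 225 is treated as part of the object.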

Step 3: Adding contours

A contour is essentially a curve joining all the continuous points along a boundary that have the same colour or intensity. Contouring provides a useful tool for shape analysis and object detection. We are interested only in the object's boundary, so the binary image created in step 2 should work well enough.

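# Get an array of contours from the binary image we previously created.
# Note: OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns three values.
contours, _ = cv2.findContours(threshold_image, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Keep only the largest contour by area, which should be the outline of the card.
contours_restricted = sorted(contours, key=cv2.contourArea, reverse=True)[0]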

The code above finds the contours in our binary image and keeps the one with the largest area. The cv2.CHAIN_APPROX_SIMPLE parameter removes redundant points along straight edges, which saves memory.
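
If you are curious how "largest by area" works, the sketch below does the same selection in plain Python. It uses the shoelace formula, which for a simple polygon should give the same result as the area OpenCV reports. The function names and both sample contours are made up for illustration and are not taken from my application.

```python
def polygon_area(points):
    """Area of a simple polygon given as [(x, y), ...] (shoelace formula)."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        # Pair each vertex with the next one, wrapping round to the first.
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def largest_contour(contours):
    """Return the contour enclosing the greatest area."""
    return max(contours, key=polygon_area)


# Hypothetical contours: a small speck of noise and a card-sized quadrilateral.
speck = [(5, 5), (7, 5), (7, 7), (5, 7)]
card = [(100, 80), (700, 120), (680, 500), (90, 450)]
print(polygon_area(speck))                      # 4.0
print(largest_contour([speck, card]) is card)   # True
```

Using max here is equivalent to sorting in descending order and taking the first element; sorting is only worth it if you also want the runners-up.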

Step 4: Finding the corners

This step uses something called the Ramer–Douglas–Peucker algorithm to find the corners of the object in question. It approximates a contour shape with another shape that has fewer vertices, depending upon the precision (epsilon) we specify. If we choose a good value for epsilon, a roughly rectangular contour should collapse to just its four corners.

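import numpy as np


def get_start_image_corners(source_image, image_contours):
    """
    Returns the start coordinates for the corners of the image.
    :param source_image: The image being processed (not used in this excerpt).
    :param image_contours: The contour to approximate, e.g. the largest one from step 3.
    :return: A list of [x, y] corner coordinates, sorted by x and then y.
    """
    # Allow the approximation to deviate by up to 5% of the contour perimeter.
    epsilon = 0.05 * cv2.arcLength(image_contours, True)
    approx_corners = cv2.approxPolyDP(image_contours, epsilon, True)
    # Flatten the (N, 1, 2) array into a plain list of [x, y] pairs.
    approx_corners = sorted(np.concatenate(approx_corners).tolist())
    return approx_corners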

There is no need to fully understand this algorithm. A value of epsilon equal to 5% of the object's perimeter works well enough.
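
For the curious, here is a minimal sketch of the Ramer–Douglas–Peucker idea in plain Python. It handles an open path only, whereas cv2.approxPolyDP also deals with closed contours, so treat it as an illustration of the principle rather than a replacement. The function names and the sample path are hypothetical.

```python
import math


def point_line_distance(point, start, end):
    """Perpendicular distance from `point` to the line through start and end."""
    (px, py), (sx, sy), (ex, ey) = point, start, end
    length = math.hypot(ex - sx, ey - sy)
    if length == 0:
        return math.hypot(px - sx, py - sy)
    return abs((ex - sx) * (sy - py) - (sx - px) * (ey - sy)) / length


def rdp(points, epsilon):
    """Simplify an open polyline with the Ramer-Douglas-Peucker algorithm."""
    if len(points) < 3:
        return list(points)
    # Find the point furthest from the chord joining the two end points.
    distances = [point_line_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    max_distance = max(distances)
    index = distances.index(max_distance) + 1
    if max_distance <= epsilon:
        # Everything in between is close enough to the chord: drop it.
        return [points[0], points[-1]]
    # Otherwise keep that point and simplify each half recursively.
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right


# A slightly wobbly L-shaped path: two edges meeting at the corner (10, 0).
path = [(0, 0), (5, 0.2), (10, 0), (10.1, 5), (10, 10)]
print(rdp(path, epsilon=1.0))  # [(0, 0), (10, 0), (10, 10)]
```

With a generous epsilon the small wobbles disappear and only the true corner survives. With a very small epsilon every point is kept, which is why the choice of epsilon matters.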

Step 5: Transforming the image

In this step, the corner coordinates for the transformed image are calculated before passing them to OpenCV to do all the heavy lifting. The code uses Pythagoras' theorem to calculate the width and height of the transformed image. It then finds the coordinates of each corner by using (0, 0) as the top left-hand corner and offsetting the other corners appropriately. In reality, I added an arbitrary offset to move the final image away from the edge of its canvas.

def transform_image(source_image, start_pos, end_pos):
    """
    This is where the magic happens. OpenCV provides the 2 functions required to
    transform the image once the start and end corner coordinates are provided.
    :param source_image:
    :param start_pos:
    :param end_pos:
    :return: The transformed image.
    """
    h, w = source_image.shape[:2]
    homography, _ = cv2.findHomography(start_pos, end_pos, 
      method=cv2.RANSAC, ransacReprojThreshold=3.0)
    transform = cv2.warpPerspective(source_image, homography, 
      (w, h), flags=cv2.INTER_LINEAR)

    return transform
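
It may help to see what a homography actually does to a single point. The sketch below applies a 3x3 matrix to an (x, y) coordinate by hand. The matrix values are invented purely for illustration and are not the ones computed for the card; apply_homography is a hypothetical helper, not an OpenCV function.

```python
def apply_homography(matrix, point):
    """Map an (x, y) point through a 3x3 homography given as nested lists."""
    x, y = point
    # Multiply the matrix by the homogeneous vector (x, y, 1).
    u = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    v = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    # Dividing by w is what produces the perspective effect.
    return (u / w, v / w)


# A made-up matrix: a shift of (10, 20) plus a perspective term in y.
example = [[1.0, 0.0, 10.0],
           [0.0, 1.0, 20.0],
           [0.0, 0.5, 1.0]]
print(apply_homography(example, (0, 0)))  # (10.0, 20.0)
print(apply_homography(example, (4, 2)))  # (7.0, 11.0)
```

The bottom row of the matrix is what separates a homography from a simple shift, scale or rotation: it makes the divisor depend on where the point is, so parts of the image that were further from the camera get stretched more than nearer ones.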

There are some simple utility functions that help to calculate coordinates and dimensions.
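
As a rough idea of what such helpers can look like, here is a minimal sketch. It assumes the four start corners are already in the order top-left, top-right, bottom-left, bottom-right; the function names, the corner ordering and the sample coordinates are my assumptions for this example and may differ from the code on GitHub.

```python
import math


def distance(p, q):
    """Straight-line distance between two points (Pythagoras' theorem)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def get_destination_corners(corners, offset=0):
    """Build an upright rectangle for four corners given in the order
    top-left, top-right, bottom-left, bottom-right.
    """
    top_left, top_right, bottom_left, bottom_right = corners
    # Use the longer of each pair of opposite edges so nothing is squashed.
    width = max(distance(top_left, top_right),
                distance(bottom_left, bottom_right))
    height = max(distance(top_left, bottom_left),
                 distance(top_right, bottom_right))
    width, height = round(width), round(height)
    # The offset moves the result away from the edge of the canvas.
    return [(offset, offset),
            (offset + width, offset),
            (offset, offset + height),
            (offset + width, offset + height)]


# Hypothetical corners of a skewed card, not values from the real photo.
skewed = [(0, 0), (300, 400), (-80, 60), (220, 460)]
print(get_destination_corners(skewed, offset=10))
# [(10, 10), (510, 10), (10, 110), (510, 110)]
```

The destination corners must be listed in the same order as the start corners, otherwise the homography will flip or twist the image. Both lists would also need converting to NumPy float arrays before being passed to cv2.findHomography.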

A couple of other tests

I snapped another couple of images from my office to see how well the code handled them, comparing the before and after versions. First was the picture on my wall.

A much harder test was my whiteboard, which is really more of a textured rock board, as you can see below. It is in fact two boards side by side, which leads to more of a skew in the original picture. The lighting and reflections also played their part.

I was still reasonably pleased with the outcome given the constraints and the limited time spent with the code.

Closing thoughts

This post will hopefully point you in the right direction if you are interested in similar challenges. It's not perfect, by any means, and there are many changes you can make to improve the way it works. I am thinking of extending my Sudoku solution to read images from a puzzle book and the code here will certainly help. It won't handle the OCR to convert parts of the image to digital numbers but that's a challenge for another day.

I'm sure it would not be too hard to fool the code presented here. Still, this is meant as a guide to what is possible rather than a production-strength solution. Also, the code performs much better with a flat background and I admit to blanking out the space behind my whiteboard before letting the algorithm loose on it. Pre-processing was not a goal of this exercise so I'm comfortable with a little preparation work.

If you are looking at a larger digital transformation within your business then I'd be delighted to help. Please feel free to get in touch. My company, Objectivity, has been helping our clients for almost 30 years to derive business value from technology.

Paweł Wichary

Software Architect and Developer

4 years ago

A great book for any beginner is "Introduction to Computer Graphics" by James Foley and others. Don't be scared of the first chapter, where the author describes an old-fashioned printer (from the 90s, I guess); the knowledge in the following chapters is still relevant and everything is nicely explained. https://www.amazon.com/Introduction-Computer-Graphics-James-Foley/dp/0201609215

William Bradley

IoT Product Specialist at AMADA UK

4 years ago

Good read there Matt, nice one!
