Image Journey from Sensor to Host Processor

Have you ever wondered how an image sensor works? How are the image colors generated? What image processing happens behind the scenes?

The image goes through multiple stages from the moment it is captured by the image sensor to the time it is displayed on a screen. The integration of cameras (image sensors) into many electronic devices, such as smartphones, tablets, industrial systems, and even automotive systems, has made it look trivial and natural. When we want to take a picture, we just use our smartphone and can instantly see the captured image or video on the screen. But what happens behind the scenes is quite impressive in terms of technology.

In this article we will explain, at a high level, the image journey from the image sensor to the host for processing. Each application has its own specific requirements, but the principles are the same.

What is an image sensor?

An image sensor is a device that converts the light associated with an image into a digital format. The photons that hit the image sensor, which is composed of elements called pixels, create an electrical signal. The number of pixels arranged in rows and columns defines the resolution of the image sensor. In general, a higher number of pixels provides a higher resolution for the captured image.

Someone may ask: how does the image sensor recognize the different colors that make up the image? The image sensor works very similarly to the human eye. If we look at the human eye we can identify several components: lens, pupil, iris, and retina, to mention a few. The lens is held by ligaments inside the eye connected to muscles that can apply force to change its shape and hence focus the light on the retina.

Similarly, when you use a camera, you may manipulate the lens to get a focused image (in most cases this is an automatic process). The iris, on the other hand, controls the pupil, which gets wider or narrower to regulate the amount of light that reaches the retina. Likewise, a digital camera system has auto exposure (AE), which manages the amount of time the image sensor (its pixels) is exposed to the incoming light. In low light conditions the exposure time is longer, to allow more light to be captured by each pixel. Finally, the retina captures the light and communicates it to the brain through the optic nerve.

Human eye

The retina is composed of cone and rod cells: cones are sensitive to color and rods are sensitive to brightness. We can categorize the cones into three types based on the color wavelengths they respond to: roughly 50% of the cones are sensitive to green, 25% to red, and the remaining 25% to blue. A couple of interesting outcomes follow from this:

  • First, the eye works based on what are called the additive primary colors of light: red, green, and blue (RGB). By mixing these primary colors in the right proportions, we can create any other color.

RGB Color Space

In the digital world we can represent each color's intensity with 8 bits. This means that the intensity of light can have up to 256 levels for each primary color (0 to 255). Pure red is represented by 255 on the red component and 0 on the green and blue components: RGB (255, 0, 0). Black means the absence of light, hence a (0, 0, 0) representation. If we count the number of possible intensity combinations (256 levels per primary color), we get over 16 million different colors. In fact, the color codes that you see in most documentation are based on this principle, as shown in the table below:

RGB color Hex code

Each pair of hexadecimal digits represents the intensity of one of the RGB components. Some image sensors use 10 bits per color, or even up to 24 bits per color; you can do the math to determine the number of colors that could be represented.
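As a quick illustration, here is a minimal Python sketch of that math (the helper names are our own, not taken from any library):

```python
def rgb_to_hex(r, g, b):
    """Format an 8-bit RGB triplet as the familiar #RRGGBB hex code."""
    return "#{:02X}{:02X}{:02X}".format(r, g, b)

def color_count(bits_per_channel):
    """Number of distinct colors available at a given bit depth per channel."""
    levels = 2 ** bits_per_channel      # e.g. 256 levels for 8 bits
    return levels ** 3                  # three channels: R, G, B

print(rgb_to_hex(255, 0, 0))   # '#FF0000' -> pure red
print(color_count(8))          # 16,777,216 colors with 8 bits per channel
print(color_count(10))         # ~1.07 billion colors with 10 bits per channel
```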

  • Second, the human eye is more sensitive to green than to the other two primary color components (red and blue). This is a very important characteristic and was the basis of Bryce Bayer's idea, which is used in most of today's image sensors. The essence of this idea is that, in order to have a faithful representation of the image, you do not need to capture the light associated with every primary color at every pixel. You can mimic the eye's behavior by having 50% of the pixels capture the green component, 25% the blue, and the remaining 25% the red. This allows a more optimized silicon area for the image sensor compared to designs where each color is captured separately for every pixel, such as three-chip CCD (Charge-Coupled Device) cameras.

Bayer filter (by Bryce Edward Bayer, 1974)

Using a debayer (demosaicing) filter we can compute an interpolated RGB value for each pixel by analyzing the neighbouring pixels. If we take the example of the BGGR Bayer filter pattern, we have a red filter on the middle pixel surrounded by green and blue filters on the neighbouring pixels, as shown in the example below. The pattern repeats for every group of 2x2 pixels. By analyzing the neighbouring pixels, we can interpolate the RGB values for each pixel. In the example below we have the RGGB debayer calculation:

Debayer filter RGB interpolation
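To make the interpolation concrete, here is a minimal Python sketch of bilinear debayering (one simple strategy among many; it assumes an RGGB pattern, averages each color over a plain 3x3 neighbourhood, and skips the border handling and sharper algorithms a real ISP would use):

```python
import numpy as np

def demosaic_bilinear(raw):
    """Minimal bilinear demosaic of an RGGB Bayer RAW image (interior pixels only).

    raw: 2-D float array with the RGGB pattern:
         rows 0,2,4,... : R G R G ...
         rows 1,3,5,... : G B G B ...
    Returns an H x W x 3 RGB image; the 1-pixel border is left at zero
    to keep the example short.
    """
    h, w = raw.shape
    y, x = np.mgrid[0:h, 0:w]
    r_mask = (y % 2 == 0) & (x % 2 == 0)
    b_mask = (y % 2 == 1) & (x % 2 == 1)
    g_mask = ~(r_mask | b_mask)
    masks = [r_mask, g_mask, b_mask]

    rgb = np.zeros((h, w, 3))
    for yy in range(1, h - 1):
        for xx in range(1, w - 1):
            win = raw[yy - 1:yy + 2, xx - 1:xx + 2]          # 3x3 neighbourhood
            for c, mask in enumerate(masks):
                mwin = mask[yy - 1:yy + 2, xx - 1:xx + 2]
                # Average every sample of this color found in the neighbourhood
                # (including the centre pixel itself when it carries that color).
                rgb[yy, xx, c] = win[mwin].mean()
    return rgb

# Tiny usage example: a 4x4 synthetic RAW frame.
raw = np.arange(16, dtype=float).reshape(4, 4)
print(demosaic_bilinear(raw)[1:3, 1:3])
```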

Image journey from the sensor to the Host processor

The image, and the processing associated with it, depends on the end application. The processing involved when you capture an image and display it on a screen is not the same as in an automotive image sensor that captures an image to be used as an input to an autonomous driving system. Multiple ISP (Image Signal Processing) pipelines are available for each use case; let's first explore the image journey from its creation until it becomes usable.

Image sensor operation

To help understand how the image sensor works, we can refer to the block diagram below, which highlights the main components involved in the image sensor and how the image is constructed.

When the sensor is exposed to light, each pixel captures the light intensity and converts it to an electrical signal. An adjustable gain amplifier boosts that electrical signal as needed based on the light intensity (e.g. in low light conditions we may need to increase the gain factor). The following stage is the ADC (Analog to Digital Converter), which allows us to represent the captured analog voltage in the digital domain. The number of bits used for this conversion defines the resolution of the RAW image. That's why we hear terms like RAW8 or RAW10: they mean that the ADC output is 8 bits or 10 bits.
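As a rough mental model (not any vendor's actual implementation), the analog chain can be thought of as a gain stage followed by quantization:

```python
def quantize_pixel(voltage, gain=1.0, v_ref=1.0, bits=10):
    """Toy model of the analog front-end: gain stage followed by an ADC.

    voltage : pixel output voltage (proportional to the captured light)
    gain    : programmable analog gain (raised in low-light conditions)
    v_ref   : ADC full-scale reference voltage
    bits    : ADC resolution (8 -> RAW8, 10 -> RAW10, ...)
    """
    max_code = (1 << bits) - 1                     # 255 for RAW8, 1023 for RAW10
    amplified = voltage * gain
    code = round(amplified / v_ref * max_code)
    return max(0, min(max_code, code))             # clip to the ADC range

print(quantize_pixel(0.30, gain=1.0, bits=8))      # 76   (mid-tone pixel, RAW8)
print(quantize_pixel(0.30, gain=4.0, bits=10))     # 1023 (gain pushes the pixel into saturation)
```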

The full image is constructed pixel by pixel, organised in rows from left to right. A frame represents one captured image (e.g. a 1920x1080 image has 1080 lines, each with 1920 pixels). The resolution of the image determines how many lines and columns it has. Control signals are used to manage the arrangement of pixels into lines and frames: a line valid (LV) signal indicates when the pixels of a line are being transmitted, and a frame valid (FV) signal does the same for a complete frame. The term frames per second is used for video transmission, where several frames are transmitted per second using the same principle described here (e.g. 30 fps represents 30 images, or frames, per second).

Obviously, the reality is a little more complex, as there are blanking periods and some sensors may do basic image processing on chip, but that is beyond the scope of this article.
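Putting the line/frame structure into a toy model, the sketch below streams a frame as (data, line valid, frame valid) tuples; it is purely conceptual, and the real blanking timings and synchronization codes are not modelled:

```python
def stream_frame(frame, line_blank=2):
    """Yield (data, line_valid, frame_valid) for one frame, with toy line blanking.

    frame: list of lines, each a list of pixel values
           (e.g. 1080 lines of 1920 pixels for a 1920x1080 image).
    line_valid  is high only while the active pixels of a line are sent.
    frame_valid stays high for the whole active frame.
    """
    for line in frame:
        for pixel in line:
            yield pixel, True, True          # active pixel
        for _ in range(line_blank):
            yield 0, False, True             # line blanking: LV low, FV still high
    yield 0, False, False                    # frame blanking: FV drops between frames

# A 30 fps video stream simply repeats this 30 times per second.
tiny_frame = [[10, 20, 30], [40, 50, 60]]    # 2 lines of 3 pixels each
for data, lv, fv in stream_frame(tiny_frame):
    print(data, lv, fv)
```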

Image journey from image sensor to host processor

Sensor Interface protocols

Once the image is built, it needs to be transferred to the next stage. Multiple interface protocols can be used to perform this task. The most basic interface is a parallel bus carrying the pixel data plus control signals (line valid and frame valid).

With increased bandwidth, more complex protocols are used to minimize the number of interface lanes, improve throughput, and guarantee data integrity. The most popular interface is CSI-2, developed by the MIPI Alliance (an open membership organization that develops interface specifications for mobile and mobile-influenced industries); it is what the majority of smartphone cameras use today. Other interfaces can also be used, to name a few: SLVS, SLVS-EC, SubLVDS, etc. Each of these has a different way of modeling the image sensor interface logic.

Image sensor interface parallel vs MIPI CSI-2

The processor, on the other hand, may have an interface that does not match the sensor's. That is where a bridging function may be needed to adapt the image sensor interface to the processor interface. Another very common use case is when the processor has only a few MIPI interfaces and needs to be connected to a large number of cameras. In this case the bridging device can also perform an aggregation function, for example interfacing 8 different image sensors to one MIPI CSI-2 input on the processor. One of the nice features of the MIPI CSI-2 protocol is its virtual channel concept, which allows you to map multiple asynchronous image sensors onto one aggregated CSI-2 interface.

8 MIPI CSI-2 Image sensor aggregation
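Conceptually, the aggregator tags each sensor's packets with a virtual channel identifier so the host can demultiplex the combined stream again; the sketch below uses hypothetical data structures (not the actual MIPI CSI-2 packet format) to illustrate the idea:

```python
from dataclasses import dataclass
from itertools import zip_longest

@dataclass
class Packet:
    virtual_channel: int    # virtual channel ID assigned to this sensor's stream
    payload: bytes          # pixel data for one line (simplified)

def aggregate(sensor_streams):
    """Interleave packets from several sensors onto one CSI-2 style link.

    sensor_streams: list of iterables, one per image sensor, each yielding
    line payloads. Each sensor gets its own virtual channel so the host
    processor can separate the streams after aggregation.
    """
    for lines in zip_longest(*sensor_streams):
        for vc, payload in enumerate(lines):
            if payload is not None:
                yield Packet(virtual_channel=vc, payload=payload)

# Example: 8 sensors, each sending two (very short) lines.
streams = [[bytes([i, i]), bytes([i, i, i])] for i in range(8)]
for pkt in aggregate(streams):
    print(pkt.virtual_channel, pkt.payload.hex())
```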

From RAW to processed image

The final stage of the image journey is the image processing. A RAW image represents the image captured by the sensor before any processing (or sometimes with minimal processing). As seen in the previous section, the RAW image from the sensor is a mosaic and not usable as is, even though all the true color information of the captured scene is contained in it. You can make an analogy with old film cameras, where you used a chemical film to take pictures and then had it developed into a color print. In the digital world, the RAW image must go through ISP (Image Signal Processing) algorithms to become useful for the target application.

Multiple image processing algorithms can be used depending on the target application. Below are some of the commonly used ISP algorithms (a simplified sketch of a few of them follows the list):

  • Auto White Balance (AWB): Adjusts the white point of the image. Sometimes when you take a picture in low light conditions, you may notice that it looks yellowish: the white color is not truly represented. AWB enhances the image's white balance to make it look more natural.
  • Auto Exposure (AE): The objective of this function is to manage the exposure time of the sensor to the incoming light so that the right amount of light is captured by the image sensor. In very bright conditions, the exposure time should be minimal to avoid saturating the pixels, which could lead to a washed-out bright image that does not show details.
  • Color Correction Matrix (CCM): Sensors go through a calibration process using a known color pattern, which allows the generated colors to be corrected.
  • Color Space Conversion (CSC): Multiple color space representations are used in the industry. The conversion between them is possible using a mathematical transformation (e.g. RGB to YCbCr).
  • High Dynamic Range (HDR): There are situations where the captured scene contains a very bright spot. This can force the sensor exposure time to a low value, which makes the other details in the picture less visible. The solution is to capture multiple images with different exposures and combine them; HDR processing performs this function.
  • Edge Detection: In some applications we may want to detect objects' edges (for example, an automotive camera that captures and analyzes road signs). Edge detection algorithms detect brightness discontinuities and are widely used in computer vision and machine vision.
  • 2D Scaler: Image scaling is another function that can be used to change the size of the image as needed (upscale or downscale to fit a target resolution).
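As referenced above, here is a simplified Python sketch of a few of these stages chained together. Gray-world AWB is just one possible white balance strategy, the CCM defaults to an identity placeholder (a real sensor would use its calibrated matrix), and the CSC uses the standard BT.601 full-range RGB-to-YCbCr equations:

```python
import numpy as np

# Gray-world auto white balance: assume the scene averages to neutral gray
# and scale R and B so their means match the green mean.
def gray_world_awb(rgb):
    means = rgb.reshape(-1, 3).mean(axis=0)
    gains = means[1] / means                 # [G/R, 1, G/B]
    return np.clip(rgb * gains, 0, 255)

# Color correction matrix: a per-sensor 3x3 matrix obtained from calibration
# against a known color chart (identity used here as a placeholder).
def apply_ccm(rgb, ccm=np.eye(3)):
    return np.clip(rgb @ ccm.T, 0, 255)

# Color space conversion: RGB -> YCbCr (BT.601 full-range equations).
def rgb_to_ycbcr(rgb):
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return np.stack([y, cb, cr], axis=-1)

# Chain the stages on a demosaiced image (random data stands in for a frame).
frame = np.random.rand(4, 4, 3) * 255
out = rgb_to_ycbcr(apply_ccm(gray_world_awb(frame)))
print(out.shape)
```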

Conclusion

A lot more could be said on this subject, and it can get very technical. This article provides a high-level overview and a good starting point for diving into the topic. FPGAs are very well suited to implement bridging and aggregation solutions as well as full image processing pipelines. If you want to go deeper into these topics, you can refer to the Lattice Semiconductor website or Lattice Insights, the official training platform, where a large library of IPs and reference designs is available. In addition, Lattice offers solution stacks such as "sensAI" and "mVision" that provide a complete solution setup (hardware, software, firmware, IPs) that can be used as a starting point for your target application. The "mVision" solution stack includes everything embedded vision system designers need to evaluate, develop, and deploy FPGA-based embedded vision applications, such as machine vision, robotics, ADAS, video surveillance, and drones. The "sensAI" stack includes everything you need to evaluate, develop, and deploy FPGA-based Machine Learning / Artificial Intelligence solutions.

Enjoy the learning journey!

MZ

