Image Journey from Sensor to Host Processor
Have you ever wondered how an image sensor works? How are the colors of an image generated? What image processing happens behind the scenes?
The image goes through multiple stages from the moment it is captured by the image sensor to the time it is displayed on a screen. The integration of cameras (image sensors) into so many electronic devices, such as smartphones, tablets, industrial systems, and even automotive platforms, has made the process look trivial and natural. When we want to take a picture, we just use our smartphone and instantly see the captured image or video on the screen. But what happens behind the scenes is quite impressive in terms of technology.
In this article we will explain, at a high level, the image journey from the image sensor to the host for processing. Each application has its own specifics, but the principles are the same.
What is an image sensor?
An image sensor is a device that converts the light associated with an image into a digital format. The photons that hit the image sensor are captured by its light-sensitive elements, called pixels, and each pixel converts the light it receives into an electrical signal. These pixels are arranged in rows and columns, and their number defines the resolution of the image sensor. In general, a higher number of pixels provides a higher resolution for the captured image.
Someone may ask: how does the image sensor recognize the different colors that make up the image? The image sensor works very similarly to the human eye. If we look at the human eye we can identify several components: the lens, pupil, iris, and retina, to mention a few. The lens is held in place by ligaments connected to muscles that can apply force to change its shape and hence focus the light on the retina.
Similarly, when you use a camera, you may adjust the lens to get a focused image (in most cases this is an automatic process). The iris, on the other hand, controls the pupil, which gets wider or narrower to regulate the amount of light reaching the retina. Likewise, a digital camera system has auto exposure (AE), which manages the amount of time the image sensor (its pixels) is exposed to the incoming light. In low-light conditions the exposure time is longer, to allow more light to be captured by each pixel. And finally, the retina captures the light and communicates it to the brain through the optic nerve.
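To illustrate the auto exposure idea, here is a toy feedback step in Python; the function name, target value, and exposure limits are my own illustrative assumptions, not taken from any specific camera, and real AE algorithms use histograms, metering zones, and sensor-specific constraints:

```python
# Toy auto-exposure (AE) step: scale the exposure time until the average
# brightness of the captured frame reaches a target level.

def next_exposure(current_exposure_us: float, mean_brightness: float,
                  target_brightness: float = 0.5,
                  min_us: float = 20.0, max_us: float = 33000.0) -> float:
    """Adjust exposure time proportionally to how far we are from the target."""
    if mean_brightness <= 0.0:
        return max_us                       # completely dark: use the longest exposure
    scaled = current_exposure_us * (target_brightness / mean_brightness)
    return min(max(scaled, min_us), max_us) # clip to the sensor's exposure range

# A dark scene (mean brightness 0.1 on a 0..1 scale) pushes the exposure time up.
print(next_exposure(1000.0, mean_brightness=0.1))   # -> 5000.0 microseconds
```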
The retina is composed of cone and rod cells: cones are sensitive to color, while rods are sensitive to brightness. We can categorize the cones into three groups based on the color wavelengths they are sensitive to: roughly 50% of the cones are sensitive to green, 25% to red, and the remaining 25% to blue. An interesting outcome of this is that the eye is most sensitive to green, which is exactly what the Bayer color filter patterns described later exploit by dedicating two of every four pixels to green.
In the digital world we can represent each color intensity on 8 bits. This means that the intensity of light can have up to 256 levels for each primary color (0 to 255). A pure red is represented by 255 on the red component and 0 on the green and blue components, i.e. RGB (255, 0, 0). Black means the absence of light, hence a (0, 0, 0) representation. If we count the number of possible combinations (256 intensity levels per primary color), we can generate over 16 million different colors. In fact, the color codes that you see in most documentation are based on this principle (for example, pure red is written as FF0000).
Each pair of hexadecimal digits represents the intensity of one of the RGB components. Some image sensors use 10, 12, or even more bits per color; you can do the math to determine the number of colors that can be represented.
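To make that arithmetic concrete, here is a small Python sketch (the function names are purely illustrative, not from any library) that counts how many colors a given per-channel bit depth can represent and encodes an 8-bit RGB triplet as a hex color code:

```python
def color_count(bits_per_channel: int) -> int:
    """Number of distinct colors with the given bits per primary color."""
    levels = 2 ** bits_per_channel           # e.g. 256 levels for 8 bits
    return levels ** 3                        # three primaries: R, G, B

def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Encode an 8-bit RGB triplet as a hex color code."""
    return f"#{r:02X}{g:02X}{b:02X}"

print(color_count(8))           # 16,777,216 colors for 8 bits per channel
print(color_count(10))          # over 1 billion colors for 10 bits per channel
print(rgb_to_hex(255, 0, 0))    # '#FF0000' -> pure red
print(rgb_to_hex(0, 0, 0))      # '#000000' -> black (absence of light)
```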
In a color image sensor, each pixel sits under a color filter, so it only measures one of the three primary colors; the filters are arranged in a Bayer pattern. Using a debayer (demosaicing) algorithm, we can then compute a full RGB value for each pixel by analyzing its neighbouring pixels. If we take the example of the BGGR Bayer pattern, each 2x2 group of pixels carries one blue filter, two green filters, and one red filter, and this pattern repeats across the whole array. For a pixel under a red filter, for instance, its green and blue components are interpolated from the green and blue pixels that surround it.
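As an illustration, here is a deliberately naive bilinear-style demosaic in Python, assuming a BGGR layout; real ISPs use far more sophisticated, edge-aware algorithms, so treat this only as a sketch of the interpolation idea:

```python
import numpy as np

def demosaic_bggr(raw: np.ndarray) -> np.ndarray:
    """Naive demosaic of a BGGR Bayer mosaic.

    raw: 2-D array of sensor values (even rows start B, G, ...; odd rows G, R, ...).
    Returns an (H, W, 3) RGB image. Teaching sketch only.
    """
    h, w = raw.shape
    rgb = np.zeros((h, w, 3), dtype=np.float64)

    # Which color filter sits on each pixel in a BGGR layout.
    r_mask = np.zeros((h, w), dtype=bool); r_mask[1::2, 1::2] = True
    b_mask = np.zeros((h, w), dtype=bool); b_mask[0::2, 0::2] = True
    g_mask = ~(r_mask | b_mask)

    for ch, mask in enumerate((r_mask, g_mask, b_mask)):   # R, G, B planes
        plane = np.where(mask, raw.astype(np.float64), 0.0)
        count = mask.astype(np.float64)
        acc_p = np.zeros((h, w)); acc_c = np.zeros((h, w))
        padded_p = np.pad(plane, 1); padded_c = np.pad(count, 1)
        # Sum each pixel's 3x3 neighbourhood to average the available samples.
        for dy in range(3):
            for dx in range(3):
                acc_p += padded_p[dy:dy + h, dx:dx + w]
                acc_c += padded_c[dy:dy + h, dx:dx + w]
        estimate = acc_p / np.maximum(acc_c, 1.0)
        # Keep the measured value where the pixel actually has that filter.
        rgb[..., ch] = np.where(mask, raw, estimate)
    return rgb

# 4x4 toy mosaic; a real sensor would be e.g. 1920x1080 RAW10 values.
mosaic = np.arange(16, dtype=np.float64).reshape(4, 4)
print(demosaic_bggr(mosaic).shape)   # (4, 4, 3)
```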
Image journey from the sensor to the Host processor
The image, and the processing associated with it, depends on the end application. Capturing an image to display on a screen does not involve the same processing as an automotive image sensor whose output feeds an autonomous driving system. Multiple ISPs (Image Signal Processing pipelines) are available for each use case; let's first explore the image journey from its creation until it becomes usable.
Image sensor operation
To help understand how the image sensor works, we can refer to the block diagram below, which highlights the main components involved in the image sensor and how the image is constructed.
When the sensor is exposed to light, each pixel captures the light intensity and converts it into an electrical signal. An adjustable gain amplifier boosts that electrical signal as needed based on the light intensity (e.g. in low-light conditions we may increase the gain factor). The following stage is the ADC (Analog to Digital Converter), which allows us to represent the captured analog voltage in the digital domain. The number of bits used for this conversion defines the resolution of the RAW image. That's why we hear terms like RAW8 or RAW10: they mean the ADC output is 8 or 10 bits wide.
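A minimal sketch of that analog front end is shown below; the function name, voltage range, and gain values are illustrative assumptions rather than figures from any specific sensor datasheet:

```python
# Pixel charge -> adjustable gain -> ADC quantization, as described above.

def adc_sample(pixel_voltage: float, gain: float, adc_bits: int,
               full_scale: float = 1.0) -> int:
    """Amplify a pixel voltage and quantize it to an adc_bits-wide RAW code."""
    amplified = pixel_voltage * gain                  # analog gain stage
    amplified = min(max(amplified, 0.0), full_scale)  # clip to the ADC input range
    levels = 2 ** adc_bits - 1                        # 255 for RAW8, 1023 for RAW10
    return round(amplified / full_scale * levels)

# Low-light example: a weak signal with a higher gain setting.
print(adc_sample(0.02, gain=8.0, adc_bits=10))   # -> 164 (10-bit RAW code)
print(adc_sample(0.02, gain=8.0, adc_bits=8))    # -> 41  (8-bit RAW code)
```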
The full image is constructed pixel by pixel, organised in rows from left to right. A frame represents one captured image (e.g. a 1920x1080 image has 1080 lines, each with 1920 pixels). The resolution of the image determines how many lines and columns it has. Control signals are used to organize the pixels into lines and frames: a line valid (LV) signal delimits each line, and a frame valid (FV) signal delimits each frame. The term frames per second is used for video transmission, where several frames are transmitted per second using the same principle (e.g. 30 fps represents 30 images, or frames, per second).
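Here is a toy model of that raster scan order in Python; it ignores blanking periods and real signal timing, and the event names are my own, chosen only to mirror the LV/FV idea described above:

```python
from typing import Iterator, Tuple

def stream_frame(frame, width: int, height: int) -> Iterator[Tuple[str, object]]:
    """Yield ('FV', ...), ('LV', ...), and ('PIX', value) events in scan order."""
    yield ("FV", "start")                      # frame valid asserted
    for row in range(height):
        yield ("LV", "start")                  # line valid asserted
        for col in range(width):
            yield ("PIX", frame[row][col])     # one pixel value, left to right
        yield ("LV", "end")                    # line valid de-asserted
    yield ("FV", "end")                        # frame valid de-asserted

# 2x3 toy frame; a 1920x1080 sensor would emit 1080 lines of 1920 pixels each.
tiny = [[10, 20, 30], [40, 50, 60]]
for event in stream_frame(tiny, width=3, height=2):
    print(event)
```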
Obviously, the reality is a little more complex, as there are blanking periods and some sensors may perform basic image processing, but that is outside the scope of this article.
Sensor Interface protocols
Once the image is built, it needs to be transferred to the next stage. Multiple interface protocols can be used to perform this task. The most basic interface is a parallel bus carrying the pixel data together with control signals (line valid and frame valid).
As bandwidth requirements increase, more complex protocols are used to reduce the number of interface lanes, improve bandwidth, and guarantee data integrity. The most popular interface is CSI-2, developed by the MIPI Alliance (an open membership organization that develops interface specifications for mobile and mobile-influenced industries); it is what the majority of smartphone cameras use today. Other interfaces can also be used, to name a few: SLVS, SLVS-EC, subLVDS, etc. Each of these has a different way of modeling the image sensor interface logic.
The processor, on the other hand, may have an interface that does not match the sensor. That is where a bridging function may be needed to adapt the image sensor interface to the processor interface. Another very common use case is when the processor has only a few MIPI interfaces and needs to be connected to a larger number of cameras; in this case the bridging device can also perform an aggregation function, for example interfacing 8 different image sensors to one MIPI CSI-2 input on the processor. One of the nice features of the MIPI CSI-2 protocol is its virtual channel concept, which allows you to map multiple asynchronous image sensors onto one aggregated CSI-2 interface.
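The sketch below illustrates the aggregation idea: packets from several sensors are tagged with a CSI-2 style virtual channel ID and merged onto one link. The packet fields and class names are simplified assumptions, not the actual MIPI CSI-2 packet format:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

@dataclass
class Packet:
    virtual_channel: int   # identifies which sensor this payload belongs to
    payload: bytes         # e.g. one line of image data

def aggregate(streams: List[Iterable[bytes]]) -> Iterator[Packet]:
    """Interleave packets from multiple sensors, tagging each with its VC ID."""
    iterators = [iter(s) for s in streams]
    active = set(range(len(iterators)))
    while active:
        for vc in list(active):
            try:
                yield Packet(virtual_channel=vc, payload=next(iterators[vc]))
            except StopIteration:
                active.discard(vc)        # this sensor's stream has ended

# The receiver can then demultiplex the single link back per virtual channel.
sensor_a = [b"a-line0", b"a-line1"]
sensor_b = [b"b-line0"]
for pkt in aggregate([sensor_a, sensor_b]):
    print(pkt.virtual_channel, pkt.payload)
```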
From RAW to processed image
The final stage of the image journey is the image processing. A RAW image represents the image captured by the sensor before any processing (or sometimes with minimal processing). As seen in the previous section, the RAW image from the sensor is a mosaic and not usable as is, even though all the true color information of the captured scene is contained in it. You can make an analogy with old film cameras, where a chemical film is exposed and then developed into a color picture. In the digital world, the RAW image must go through ISP (Image Signal Processing) algorithms to be useful for the target application.
Multiple image processing algorithms can be used depending on the target application. Common ISP stages include demosaicing (debayering), automatic white balance, color correction, gamma correction, noise reduction, and sharpening.
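As a small illustration of how such stages chain together, here is a minimal pipeline applying white balance and gamma correction to a demosaiced RGB image; the gain values and gamma constant are illustrative assumptions, whereas a real ISP derives them from image statistics and calibration data:

```python
import numpy as np

def white_balance(rgb: np.ndarray, gains=(1.6, 1.0, 1.4)) -> np.ndarray:
    """Scale each channel so neutral surfaces come out grey."""
    return np.clip(rgb * np.array(gains), 0.0, 1.0)

def gamma_correct(rgb: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Compress linear sensor values into a display-friendly tone curve."""
    return np.power(rgb, 1.0 / gamma)

def simple_isp(linear_rgb: np.ndarray) -> np.ndarray:
    """Chain the stages; real pipelines add denoise, color correction, etc."""
    return gamma_correct(white_balance(linear_rgb))

# Example on a single pixel normalized to [0, 1].
pixel = np.array([[[0.20, 0.35, 0.18]]])
print(simple_isp(pixel))
```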
Conclusion
A lot more could be said on this subject, and it can get very technical. This article provides a high-level overview and a good starting point for diving into the topic. FPGAs are very well suited to implementing bridging and aggregation solutions as well as full image processing pipelines. If you want to go deeper on these topics, you can refer to the Lattice Semiconductor website or Lattice Insights, the official training platform, where a large library of IPs and reference designs is available. In addition, Lattice offers solution stacks such as "sensAI" and "mVision" that provide a complete setup (hardware, software, firmware, IPs) to use as a starting point for your target application. The "mVision" solution stack includes everything embedded vision system designers need to evaluate, develop, and deploy FPGA-based embedded vision applications, such as machine vision, robotics, ADAS, video surveillance, and drones. The "sensAI" stack includes everything you need to evaluate, develop, and deploy FPGA-based Machine Learning / Artificial Intelligence solutions.
Enjoy the learning journey!
MZ