This thesis addresses issues in multi-sensor imaging systems and their surveillance applications. Advanced surveillance systems incorporate multiple imaging modalities to achieve better and more reliable performance under varying conditions. Image fusion plays an important role in processing such multi-modal images and has been used in a wide range of applications. The fusion operation integrates features from multiple input images into a single fused result.
The image fusion process consists of four basic steps: preprocessing, registration, fusion, and post-processing or evaluation. This thesis focuses on the last three. The first topic is image registration, or alignment, which associates corresponding pixels in multiple images with the same physical point in the scene. This study investigates the registration of infrared and electro-optic video sequences. The initial registration parameters are derived by matching head-top points across consecutive video frames, and are then refined with a maximum mutual information approach. Rather than performing foreground detection, the method computes the frame difference, from which the head-top point is detected, using an image structural similarity measure.
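The maximum mutual information refinement step can be illustrated with a minimal sketch. It assumes a translation-only model, two equally sized 8-bit grayscale frames stored as nested Python lists, and an exhaustive integer search around an initial shift (in the thesis, that estimate would come from the head-top point matching). The function names `mutual_information`, `overlap_pixels`, and `refine_translation` and the 16-bin histogram are illustrative choices, not the thesis's implementation; a full registration would typically use a richer transform and sub-pixel optimization.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=16, max_val=256):
    """Histogram estimate of mutual information (in nats) between two
    equal-length sequences of integer intensities in [0, max_val)."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    # Quantize intensities into `bins` histogram bins.
    qa = [v * bins // max_val for v in a]
    qb = [v * bins // max_val for v in b]
    n = len(qa)
    count_a, count_b = Counter(qa), Counter(qb)
    joint = Counter(zip(qa, qb))
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log(p(i,j) / (p(i) p(j))), with the 1/n factors folded in.
        mi += (c / n) * math.log(c * n / (count_a[i] * count_b[j]))
    return mi


def overlap_pixels(fixed, moving, dy, dx):
    """Collect intensity pairs (fixed[y][x], moving[y+dy][x+dx]) over the
    region where both images are defined (images assumed equally sized)."""
    h, w = len(fixed), len(fixed[0])
    fa, mb = [], []
    for y in range(h):
        yy = y + dy
        if not 0 <= yy < h:
            continue
        for x in range(w):
            xx = x + dx
            if 0 <= xx < w:
                fa.append(fixed[y][x])
                mb.append(moving[yy][xx])
    return fa, mb


def refine_translation(fixed, moving, init=(0, 0), radius=2, bins=16):
    """Search integer shifts within `radius` of the initial estimate and
    keep the one that maximizes mutual information."""
    best, best_mi = init, -math.inf
    for dy in range(init[0] - radius, init[0] + radius + 1):
        for dx in range(init[1] - radius, init[1] + radius + 1):
            fa, mb = overlap_pixels(fixed, moving, dy, dx)
            if len(fa) < 2:
                continue  # too little overlap to estimate a histogram
            mi = mutual_information(fa, mb, bins)
            if mi > best_mi:
                best, best_mi = (dy, dx), mi
    return best, best_mi
```

Mutual information suits this step because infrared and electro-optic intensities are not linearly related; the score rewards any consistent statistical dependence between the two images. Histogram estimates are biased upward on small overlaps, so the search radius should stay small relative to the frame size.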
The second topic is the implementation of pixel-level fusion. This study proposes a modified fusion algorithm that achieves context enhancement by fusing infrared and visual images or video sequences. Currently available solutions include adaptive enhancement and direct pixel-level fusion. However, the adaptive enhancement algorithm must be tuned manually to the specific images, and its performance may not always satisfy the application. Direct fusion of infrared and visual images does combine features that appear in different ranges of the electromagnetic spectrum, but such features are not optimal for human perception. Motivated by adaptive enhancement, the proposed scheme first enhances the visual image with the corresponding infrared image, then fuses the enhanced image with the original visual image again to highlight background features. The result is a context enhancement well suited to human perception.
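The two-step scheme can be sketched as follows. The abstract does not specify the enhancement or fusion rules, so both steps use hypothetical stand-ins: step 1 adds the infrared image's deviation from its mean (scaled by an assumed `gain`) to the normalized visual image, and step 2 takes a weighted average (assumed `weight`) of the enhanced and original visual images. Images are assumed to be registered, equally sized 2-D lists, and all function names are illustrative.

```python
def normalize(img):
    """Linearly rescale a 2-D list of intensities to [0, 1]."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on flat images
    return [[(v - lo) / span for v in row] for row in img]


def enhance_with_ir(visual, infrared, gain=0.5):
    """Step 1 (hypothetical rule): add the infrared image's deviation from its
    mean to the visual image, so warm targets brighten and cool areas darken."""
    v, ir = normalize(visual), normalize(infrared)
    mean_ir = sum(map(sum, ir)) / (len(ir) * len(ir[0]))
    return [[min(1.0, max(0.0, p_v + gain * (p_ir - mean_ir)))
             for p_v, p_ir in zip(row_v, row_ir)]
            for row_v, row_ir in zip(v, ir)]


def fuse_with_visual(enhanced, visual, weight=0.5):
    """Step 2 (hypothetical rule): weighted average of the enhanced image and
    the original visual image, restoring background detail."""
    v = normalize(visual)
    return [[weight * p_e + (1 - weight) * p_v for p_e, p_v in zip(row_e, row_v)]
            for row_e, row_v in zip(enhanced, v)]


def context_enhance(visual, infrared, gain=0.5, weight=0.5):
    """Enhance the visual image with the infrared image, then fuse again."""
    return fuse_with_visual(enhance_with_ir(visual, infrared, gain), visual, weight)
```

Setting `gain=0` returns the normalized visual image unchanged, while increasing `weight` shifts the balance from background detail toward the infrared-driven highlights.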
As far as the application of multi-sensor concealed weapon detection (CWD) is concerned, this thesis clarifies the requirements and concepts of CWD. It identifies how CWD can benefit from multi-sensor fusion and proposes a framework for multi-sensor CWD. A method for synthesizing a composite image from infrared and visual images is presented together with experimental results. On the one hand, the synthesized image provides both the information needed for personal identification and the suspicious regions of concealed weapons; on the other hand, it protects privacy, which is an important aspect of the CWD process.
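A minimal sketch of the composite synthesis idea, under clearly stated assumptions: the infrared and visual images are registered, equally sized 8-bit grayscale 2-D lists, and suspicious pixels are flagged by a hypothetical rule (deviation from the infrared image's median beyond an assumed `threshold`). The composite keeps the visual image everywhere, which preserves the face for identification, and paints only the flagged regions red, so no other part of the infrared image is shown. Neither the detector nor the composition rule is taken from the thesis.

```python
import statistics


def suspicious_mask(infrared, threshold):
    """Hypothetical detector: flag pixels whose infrared intensity deviates
    from the image median by more than `threshold`."""
    median = statistics.median([v for row in infrared for v in row])
    return [[abs(v - median) > threshold for v in row] for row in infrared]


def synthesize_composite(visual, infrared, threshold=40):
    """Return an RGB composite: the grayscale visual image everywhere
    (identity information), with flagged infrared regions painted red."""
    mask = suspicious_mask(infrared, threshold)
    return [[(255, 0, 0) if flagged else (v, v, v)
             for v, flagged in zip(row_v, row_m)]
            for row_v, row_m in zip(visual, mask)]
```

In practice the suspicious-region detector would be the hard part; the sketch only shows how a composite can carry identity and threat information while withholding the rest of the infrared view.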
The third topic is fusion performance assessment. A number of fusion algorithms have been, and continue to be, proposed. However, no solution yet exists for objectively assessing these algorithms based on how features are fused together. This study develops evaluation metrics for reference-based assessment and for blind assessment, respectively. The assessment employs phase congruency, an absolute measure of image features.
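To make the phase congruency measure concrete, here is a minimal 1-D sketch of its original Fourier-series (local energy) form, PC(x) = |E(x)| / (sum_k A_k + eps), where E(x) is the sum of the complex frequency components at position x and A_k their amplitudes. Practical image implementations, such as Kovesi's, instead use banks of log-Gabor filters at several scales and orientations with noise compensation, and the thesis's PC-based fusion metrics are not reproduced here. The function name and `eps` value are illustrative.

```python
import cmath
import math


def phase_congruency_1d(signal, eps=1e-9):
    """Phase congruency of a 1-D signal, PC(x) = |E(x)| / (sum_k A_k + eps),
    using positive-frequency components k = 1 .. ceil(n/2) - 1
    (DC and Nyquist terms excluded)."""
    n = len(signal)
    ks = range(1, (n + 1) // 2)
    # DFT coefficient F_k; |F_k| is proportional to the component amplitude A_k.
    coeffs = [(k, sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n))) for k in ks]
    total_amplitude = sum(abs(c) for _, c in coeffs)
    pc = []
    for x in range(n):
        # Local energy: magnitude of the sum of the complex components at x.
        energy = abs(sum(c * cmath.exp(2j * math.pi * k * x / n)
                         for k, c in coeffs))
        pc.append(energy / (total_amplitude + eps))
    return pc
```

PC reaches 1 only where all components are in phase (for an impulse, at the impulse itself), and scaling the signal by a constant leaves it unchanged. This contrast invariance is what makes it an absolute, dimensionless feature measure for comparing fused images with their inputs or a reference.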
This thesis addresses only a limited number of closely related issues concerning multi-sensor imaging systems. These topics merit further investigation, as discussed in the conclusion of this thesis. In addition, future work should include studies of the reliability and optimization of multiple image sensors from application-oriented and human-perception perspectives. This thesis could contribute to such research.