• Context-Aware Mixed Reality: A Learning-based Framework for Semantic-level Interaction

      Chen, Long; Tang, Wen; Zhang, Jian Jun; John, Nigel W.; Bournemouth University; University of Chester; University of Bradford (Wiley Online Library, 2019-11-14)
      Mixed Reality (MR) is a powerful interactive technology that enables new types of user experience. We present a semantic-based interactive MR framework that goes beyond current geometry-based approaches, offering a step change in generating high-level context-aware interactions. Our key insight is that by building semantic understanding into MR, we can develop a system that not only greatly enhances the user experience through object-specific behaviors, but also paves the way for solving complex interaction design challenges. In this paper, the proposed framework generates semantic properties of the real-world environment through a dense scene reconstruction and deep image understanding scheme. We demonstrate our approach with a material-aware prototype system for context-aware physical interactions between real and virtual objects. Quantitative and qualitative evaluation results show that the framework delivers accurate and consistent semantic information in an interactive MR environment, providing effective real-time semantic-level interactions.
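
      The material-aware prototype described above assigns object-specific behaviours to recognized real-world surfaces. As a minimal sketch of that idea (not the authors' implementation), the following Python snippet maps a predicted semantic label to hypothetical physical material parameters that a physics engine could consume; the label names and parameter values are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's code): translate per-surface semantic
# labels into physics parameters so a virtual object behaves differently when
# it interacts with wood, metal or fabric. Values are placeholder assumptions.
from dataclasses import dataclass

@dataclass
class Material:
    friction: float      # Coulomb friction coefficient
    restitution: float   # bounciness in [0, 1]

# Hypothetical lookup table from semantic class to physics material.
MATERIALS = {
    "wood":   Material(friction=0.5, restitution=0.35),
    "metal":  Material(friction=0.3, restitution=0.60),
    "fabric": Material(friction=0.8, restitution=0.05),
}
DEFAULT = Material(friction=0.5, restitution=0.30)

def material_for_surface(semantic_label: str) -> Material:
    """Return physics parameters for a surface given its predicted label."""
    return MATERIALS.get(semantic_label, DEFAULT)

# A virtual ball dropped onto a surface labelled "metal" bounces more than
# one dropped onto "fabric".
print(material_for_surface("metal"))
```
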
    • De-smokeGCN: Generative Cooperative Networks for Joint Surgical Smoke Detection and Removal

      Chen, Long; Tang, Wen; John, Nigel W.; Wan, Tao Ruan; Zhang, Jian Jun; Bournemouth University; University of Chester; University of Bradford (IEEE Xplore, 2019-11-15)
      Surgical smoke removal algorithms can improve the quality of intra-operative imaging and reduce hazards in image-guided surgery, a highly desirable post-processing step for many clinical applications. These algorithms also enable effective computer vision tasks for future robotic surgery. In this paper, we present a new unsupervised learning framework for high-quality pixel-wise smoke detection and removal. One of the well-recognized grand challenges in using convolutional neural networks (CNNs) for medical image processing is obtaining intra-operative medical imaging datasets for network training and validation, as the availability and quality of such datasets are limited. Our novel training framework does not require ground-truth image pairs; instead, it learns purely from computer-generated simulation images. This approach opens up new avenues and bridges a substantial gap between conventional non-learning-based methods and approaches that require prior knowledge gained from extensive training datasets. Inspired by the Generative Adversarial Network (GAN), we have developed a novel generative-collaborative learning scheme that decomposes the de-smoking process into two separate tasks: smoke detection and smoke removal. The detection network acts as prior knowledge and as a loss function to maximize its support for training the smoke removal network. Quantitative and qualitative studies show that the proposed training framework outperforms state-of-the-art de-smoking approaches, including the latest GAN frameworks (such as PIX2PIX). Although the network is trained only on synthetic images, experimental results demonstrate its effectiveness in detecting and removing surgical smoke on both simulated and real-world laparoscopic images.
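
      The abstract describes using the detection network's output both as prior knowledge and as a loss term that supports training of the removal network. Below is a hedged PyTorch-style sketch of one way such a mask-weighted reconstruction loss could look; the tensor shapes and weighting scheme are assumptions, not the paper's exact formulation.

```python
# Illustrative PyTorch sketch (an assumption, not the paper's exact loss):
# a per-pixel smoke mask from the detection network re-weights the L1
# reconstruction loss of the removal network, so smoky regions dominate
# training of the de-smoking branch.
import torch

def cooperative_desmoke_loss(pred_clean: torch.Tensor,
                             target_clean: torch.Tensor,
                             smoke_mask: torch.Tensor,
                             base_weight: float = 1.0) -> torch.Tensor:
    """L1 reconstruction loss re-weighted by detected smoke density.

    pred_clean, target_clean: (B, 3, H, W) de-smoked and reference images.
    smoke_mask: (B, 1, H, W) detection output in [0, 1], 1 = dense smoke.
    """
    per_pixel = torch.abs(pred_clean - target_clean)   # (B, 3, H, W)
    weights = base_weight + smoke_mask                  # emphasise smoky pixels
    return (weights * per_pixel).mean()

# Usage with random tensors standing in for network outputs.
pred = torch.rand(2, 3, 128, 128)
target = torch.rand(2, 3, 128, 128)
mask = torch.rand(2, 1, 128, 128)
print(cooperative_desmoke_loss(pred, target, mask).item())
```
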
    • Real-time Geometry-Aware Augmented Reality in Minimally Invasive Surgery

      Chen, Long; Tang, Wen; John, Nigel W.; Bournemouth University; University of Chester (IET, 2017-10-27)
      The potential of Augmented Reality (AR) technology to assist minimally invasive surgery (MIS) lies in its computational performance and accuracy in dealing with challenging MIS scenes. Even with the latest hardware and software technologies, achieving both real-time and accurate augmented information overlay in MIS remains a formidable task. In this paper, we present a novel real-time AR framework for MIS that achieves interactive geometry-aware augmented reality in endoscopic surgery with stereo views. Our framework tracks the movement of the endoscopic camera and simultaneously reconstructs a dense geometric mesh of the MIS scene. The camera movement is predicted by minimising the re-projection error to achieve fast tracking performance, while the 3D mesh is incrementally built by a dense zero-mean normalised cross-correlation (ZNCC) stereo matching method to improve the accuracy of the surface reconstruction. The proposed system does not require any prior template or pre-operative scan and can infer the geometric information intra-operatively in real time. With this geometric information available, the proposed AR framework can interactively add annotations, localise tumours and vessels, and label measurements with greater precision and accuracy than state-of-the-art approaches.
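
      Camera tracking in the framework above is driven by minimising the re-projection error of tracked points. The following numpy sketch shows how such an error can be evaluated for a candidate camera pose under a simple pinhole model; the intrinsics and correspondences are placeholders, not values from the paper.

```python
# Illustrative numpy sketch of the re-projection error that camera tracking
# minimises: project tracked 3D points with a candidate pose and compare the
# result with the observed 2D features. Intrinsics and data are placeholders.
import numpy as np

def reprojection_error(points_3d, points_2d, R, t, K):
    """Mean re-projection error (pixels) of a candidate pose (R, t).

    points_3d: (N, 3) world points, points_2d: (N, 2) observed pixels,
    R: (3, 3) rotation, t: (3,) translation, K: (3, 3) camera intrinsics.
    """
    cam = (R @ points_3d.T).T + t             # world -> camera coordinates
    proj = (K @ cam.T).T                      # camera -> image (homogeneous)
    proj = proj[:, :2] / proj[:, 2:3]         # perspective divide
    return np.mean(np.linalg.norm(proj - points_2d, axis=1))

# Toy example: identity pose, points directly in front of the camera.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
pts3d = np.array([[0.0, 0.0, 2.0], [0.1, -0.05, 2.5]])
pts2d = (K @ pts3d.T).T
pts2d = pts2d[:, :2] / pts2d[:, 2:3]
print(reprojection_error(pts3d, pts2d, np.eye(3), np.zeros(3), K))  # ~0.0
```
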
    • Recent Developments and Future Challenges in Medical Mixed Reality

      Chen, Long; Day, Thomas W.; Tang, Wen; John, Nigel W.; Bournemouth University; University of Chester (2017-11-23)
      Mixed Reality (MR) is of increasing interest within technology-driven modern medicine but is not yet used in everyday practice. This situation is changing rapidly, however, and this paper explores the emergence of MR technology and the importance of its utility within medical applications. A classification of medical MR has been obtained by applying an unbiased text mining method to a database of 1,403 relevant research papers published over the last two decades. The classification results reveal a taxonomy for the development of medical MR research during this period and suggest future trends. We then use the classification to analyse the technology and applications developed in the last five years. Our objective is to help researchers focus on the areas where technological advancement in medical MR is most needed, as well as to provide medical practitioners with a useful source of reference.
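
      The survey above derives its taxonomy by text mining a corpus of paper abstracts. The paper does not spell out the exact pipeline, so the snippet below is only a generic illustration of how publications can be clustered into topics with TF-IDF features and k-means; it should not be read as the authors' method.

```python
# Generic illustration only (not the paper's specified method): cluster paper
# abstracts into topics using TF-IDF features and k-means, one common way to
# bootstrap a taxonomy from a corpus of publications.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

abstracts = [
    "augmented reality guidance for laparoscopic surgery",
    "virtual reality rehabilitation after stroke",
    "mixed reality training simulator for needle insertion",
    "depth estimation for endoscopic navigation",
]
features = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)  # cluster index assigned to each abstract
```
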
    • Self-supervised monocular image depth learning and confidence estimation

      Chen, Long; Tang, Wen; Wan, Tao Ruan; John, Nigel W.; Bournemouth University; University of Bradford; University of Chester
      We present a novel self-supervised framework for monocular image depth learning and confidence estimation. Our framework reduces the amount of ground-truth annotation data required for training Convolutional Neural Networks (CNNs), which is often a challenge for the fast deployment of CNNs in many computer vision tasks. Our DepthNet adopts a novel, fully differentiable patch-based cost function based on Zero-Mean Normalized Cross-Correlation (ZNCC) that uses multi-scale patches as its matching and learning strategy. This approach greatly increases the accuracy and robustness of depth learning. Because ZNCC is a normalized measure of similarity, the proposed patch-based cost function naturally provides a 0-to-1 score that can be interpreted as the confidence of the depth estimate; this score is then used to self-supervise the training of a parallel network for confidence map learning and estimation. The confidence map is therefore learned and estimated in a self-supervised manner by a network operating in parallel with the DepthNet. Evaluations on the KITTI depth prediction dataset and the Make3D dataset show that our method outperforms state-of-the-art results.
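
      The cost function above is built on ZNCC, whose normalized score doubles as a confidence signal. The following numpy sketch shows ZNCC between two patches and a simple rescaling from [-1, 1] to a [0, 1] confidence; it is a single-patch illustration under assumed shapes, not the paper's multi-scale formulation.

```python
# Illustrative numpy sketch of the ZNCC similarity underpinning the patch-based
# cost, and its rescaling from [-1, 1] to a [0, 1] confidence value.
import numpy as np

def zncc(patch_a: np.ndarray, patch_b: np.ndarray, eps: float = 1e-6) -> float:
    """Zero-Mean Normalized Cross-Correlation of two equal-size patches."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    return float((a * b).sum() / (np.sqrt((a ** 2).sum() * (b ** 2).sum()) + eps))

def zncc_confidence(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    """Map ZNCC in [-1, 1] to a confidence value in [0, 1]."""
    return 0.5 * (zncc(patch_a, patch_b) + 1.0)

# A patch compared with a brightness-shifted copy of itself scores ~1.0,
# since ZNCC is invariant to additive intensity changes.
p = np.random.rand(7, 7)
print(zncc_confidence(p, p + 0.2))
```
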
    • SLAM-based dense surface reconstruction in monocular Minimally Invasive Surgery and its application to Augmented Reality

      Chen, Long; Tang, Wen; John, Nigel W.; Wan, Tao R.; Zhang, Jian Jun; Bournemouth University; University of Chester; University of Bradford (Elsevier, 2018-02-08)
      Background and Objective: While Minimally Invasive Surgery (MIS) offers considerable benefits to patients, it also imposes significant challenges on a surgeon's performance due to well-known issues and restrictions associated with the field of view (FOV), hand-eye misalignment and disorientation, and the lack of stereoscopic depth perception in monocular endoscopy. Augmented Reality (AR) technology can help to overcome these limitations by augmenting the real scene with annotations, labels, tumour measurements or even a 3D reconstruction of anatomical structures at the target surgical locations. However, previous attempts to apply AR technology to monocular MIS scenes have mainly focused on information overlay without addressing correct spatial calibration, which can lead to incorrect localization of annotations and labels, and inaccurate depth cues and tumour measurements. In this paper, we present a novel intra-operative dense surface reconstruction framework that can provide geometry information from monocular MIS videos alone for geometry-aware AR applications such as site measurements and depth cues. We address a number of compelling issues in augmenting a scene for a monocular MIS environment, such as drift and inaccurate planar mapping.
      Methods: A state-of-the-art Simultaneous Localization And Mapping (SLAM) algorithm from robotics has been extended to deal with monocular MIS surgical scenes for reliable endoscopic camera tracking and salient point mapping. A robust global 3D surface reconstruction framework has been developed to build a dense surface using only the unorganized sparse point clouds extracted from the SLAM. The surface reconstruction framework employs the Moving Least Squares (MLS) smoothing algorithm and Poisson surface reconstruction for real-time processing of the point cloud data. Finally, the 3D geometric information of the surgical scene enables better understanding and accurate placement of AR augmentations based on a robust 3D calibration.
      Results: We demonstrate the clinical relevance of the proposed system through two examples: (a) surface measurement; (b) depth cues in monocular endoscopy. The performance and accuracy evaluation of the proposed framework consists of two steps. First, we created a computer-generated endoscopy simulation video to quantify the accuracy of the camera tracking by comparing the tracking results with the recorded ground-truth camera trajectories. The accuracy of the surface reconstruction is assessed by evaluating the Root Mean Square Distance (RMSD) between the surface vertices of the reconstructed mesh and those of the ground-truth 3D models. An error of 1.24 mm for the camera trajectories was obtained, and the RMSD for surface reconstruction is 2.54 mm, which compares favourably with previous approaches. Second, in vivo laparoscopic videos are used to examine the quality of the AR-based annotation and measurement, and the creation of depth cues. These results show the promise of our geometry-aware AR technology for use in MIS surgical scenes.
      Conclusions: The results show that the new framework is robust and accurate in dealing with challenging situations such as rapid endoscopic camera movements in monocular MIS scenes. Both camera tracking and surface reconstruction based on a sparse point cloud are effective and operate in real time. This demonstrates the potential of our algorithm for accurate AR localization and depth augmentation with geometric cues and correct surface measurements in MIS with monocular endoscopes.
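
      Reconstruction accuracy above is reported as the RMSD between reconstructed and ground-truth surface vertices. The numpy/scipy sketch below illustrates that metric under a simplifying assumption of nearest-neighbour vertex matching; the matching protocol is an illustration, not necessarily the paper's exact evaluation procedure.

```python
# Illustrative sketch of the Root Mean Square Distance (RMSD) used to score a
# reconstructed surface: each reconstructed vertex is compared with its nearest
# ground-truth vertex (nearest-neighbour matching is an assumption here).
import numpy as np
from scipy.spatial import cKDTree

def rmsd_to_ground_truth(recon_vertices: np.ndarray,
                         gt_vertices: np.ndarray) -> float:
    """RMSD (in the vertices' units) from reconstruction to ground truth."""
    distances, _ = cKDTree(gt_vertices).query(recon_vertices)
    return float(np.sqrt(np.mean(distances ** 2)))

# Toy example: a slightly perturbed copy of a ground-truth point set.
rng = np.random.default_rng(0)
gt = rng.uniform(size=(500, 3))
recon = gt + rng.normal(scale=0.002, size=gt.shape)
print(f"RMSD: {rmsd_to_ground_truth(recon, gt):.4f}")
```
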