This work was partly supported by the National Science Council, Taiwan, under Grant NSC 101-2221-E-011-077-MY3.
A healthcare system with a user-friendly interface was developed using artificial intelligence (AI), machine learning (ML), and computer vision (CV) technologies. The system predicts vital signs such as breathing rate, heart rate, and blood pressure directly from an ordinary camera or webcam, replacing traditional medical measurement devices and providing a more convenient and accessible way to monitor patients’ health status.
This work was supported by the Taipei Medical University under grant TMU107-AE1-B18.
Mask R-CNN and FCN networks were used to train the segmentation models on 5000 labeled images.
This work was supported in part by the “Center for Cyber-Physical System Innovation” from the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education in Taiwan.
This study proposes a novel approach that replaces the fixed projection planes used in previous research with an adaptive one, reducing motion artifacts when monitoring heart rate in real time from the human face with an ordinary webcam. The adaptive projection plane changes with the light intensity to eliminate the color distortion induced by motion. The state-of-the-art semantic segmentation network DeepLabv3+ is implemented to segment the skin pixels from the facial region, which is detected by several trackers (Boosting, MIL, TLD, Median Flow, MOSSE, CSRT) to reduce computation time compared with conventional face detection based on Haar-like features. Image and digital signal processing techniques are also applied to remove possible noise and obtain a clean pulse signal. The proposed approach is compared with existing approaches (green, PCA, CHROM, and POS) under multiple challenges. In the experiments conducted, DeepLabv3+ outperforms conventional K-Means across different kinds of skin segmentation. Moreover, the proposed approach is robust and stable in stationary cases (with an accuracy of 96%), in dim-lighting environments, and at distances of up to 4 meters without camera zoom. Multiple head-movement simulations and fitness motions are also handled by the adaptive projection plane (APP) approach, as shown in the experiments. Thus, the proposed approach is applicable to surveillance and healthcare applications.
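As a rough illustration of the signal path described above (skin-pixel averaging, band-pass filtering, and DFT peak picking), the following minimal Python sketch estimates heart rate from a frame sequence. `frames` and per-frame `skin_masks` (e.g., produced by DeepLabv3+) are assumed inputs, and the filter band is a common rPPG choice rather than the exact values used in this work.

```python
# Minimal rPPG sketch: mean green-channel signal over skin pixels,
# band-pass filtering, and FFT-based heart-rate estimation.
import numpy as np
from scipy.signal import butter, filtfilt

FPS = 30  # webcam frame rate

def pulse_signal(frames, skin_masks):
    """Average the green channel over segmented skin pixels, per frame."""
    return np.array([f[..., 1][m > 0].mean() for f, m in zip(frames, skin_masks)])

def heart_rate_bpm(signal, fps=FPS):
    """Band-pass 0.7-4 Hz (42-240 bpm), then take the dominant DFT peak."""
    b, a = butter(3, [0.7 / (fps / 2), 4.0 / (fps / 2)], btype="band")
    filtered = filtfilt(b, a, signal - signal.mean())
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]
```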
This work was partially supported by the Swissray Digital X-ray Center, Taiwan and by the Ministry of Science and Technology, Taiwan.
This study aims to develop an affordable and straightforward system that detects human breathing in real time from images. The system estimates the peak of the inspiratory phase of a breath to define the proper trigger timing for X-ray exposure. Detecting tiny breathing motions in images is challenging; to address this, well-known techniques are employed to extract useful features from the chest area for the Lucas-Kanade algorithm, and several levels of the pyramidal Lucas-Kanade method are adapted to track the small motions of those features. The proposed approach successfully detects inspiratory-expiratory motions, and the peak time of the inspiratory phase can be predicted within an acceptable error interval. In the experiments conducted, the breathing motion was successfully observed under two environmental conditions (dim and normal lighting), and the tracked features remained robust and stable, without loss, over long testing periods. Thus, the proposed approach can effectively define the proper trigger timing for X-ray exposure. Moreover, breath detection succeeded even with the target 6 meters away, so the approach can also be used in surveillance or healthcare environments.
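A minimal sketch of the feature-tracking step using OpenCV's pyramidal Lucas-Kanade implementation; an opened `cv2.VideoCapture` named `cap` and a chest region `roi = (x, y, w, h)` are assumed, and the parameter values are illustrative rather than the ones tuned in this study.

```python
# Track chest features with pyramidal Lucas-Kanade and accumulate the mean
# vertical displacement per frame as a breathing signal.
import cv2
import numpy as np

lk_params = dict(winSize=(21, 21), maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
x, y, w, h = roi
mask = np.zeros_like(prev_gray)
mask[y:y + h, x:x + w] = 255  # restrict features to the chest area
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.01,
                              minDistance=7, mask=mask)

breath_signal = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None, **lk_params)
    good = status.ravel() == 1
    # Mean vertical displacement of tracked features ~ breathing motion.
    breath_signal.append(float((new_pts[good, 0, 1] - pts[good, 0, 1]).mean()))
    prev_gray, pts = gray, new_pts[good].reshape(-1, 1, 2)
```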
This work was supported in part by the National Science Council, Taiwan, under Grant NSC 101-2221-E-011-077-MY3.
Noninvasive blood pressure estimation (NBPE) is attracting considerable attention for surveillance and healthcare applications. Widely available NBPE would make it possible to check blood pressure at home with an ordinary camera. However, a comprehensive evaluation of such measurements against the two medical standards, those of the British Hypertension Society (BHS) and the Association for the Advancement of Medical Instrumentation (AAMI), has not been previously reported. This study aims to build a novel image-based approach that automatically estimates blood pressure in real time from the subtle color changes of the face and palm regions recorded by an ordinary webcam at 30 fps. Deep learning object detection models localize the initial bounding boxes for the MOSSE tracking algorithm, which speeds up processing. Image and digital signal processing techniques then select only the best pairs of filtered bio-signals on the green channels of the face and palm, namely those yielding the same heart-rate estimate based on the Discrete Fourier Transform (DFT). A deep neural network (DNN) is trained on data consisting of a pair of bio-signals and nine additional participant variables (age, gender, weight, height, waistline, handedness, measured hand, valence level, and arousal level) and is used to estimate the systolic blood pressure (SBP) and diastolic blood pressure (DBP) values. A total of 164 recorded videos of 82 participants aged 18 to 57, with ground-truth blood pressure measured by a FORA P30 device, are used to build the estimation model. In the experiments conducted, the proposed DNN satisfies both the BHS and AAMI standards for estimating SBP and DBP. Moreover, it achieves its best performance with a 450-sample sliding window, with root mean square errors of 7.942/7.912 mmHg and mean absolute errors of 6.556/6.372 mmHg for SBP/DBP, respectively. The proposed approach outperforms existing image-based approaches that use high-speed cameras, estimating blood pressure reliably in a non-contact, continuous manner. Thus, our approach can be applied to healthcare applications.
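For illustration, a minimal Keras sketch of an SBP/DBP regressor of the kind described; the input layout (two 450-sample filtered signals plus the nine participant variables) and the layer sizes are assumptions, not the exact network of this work.

```python
# Minimal DNN sketch for SBP/DBP regression from a pair of filtered
# bio-signals (face + palm) concatenated with nine participant variables.
import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(450 + 450 + 9,))  # illustrative feature layout
x = layers.Dense(256, activation="relu")(inputs)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(2)(x)  # [SBP, DBP] in mmHg

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(X_train, y_train, validation_split=0.1, epochs=200, batch_size=32)
```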
This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant MOST 107-2634-F-038-001.
Coronary artery disease is caused primarily by vessel narrowing, and extraction of the coronary artery area from images is the preferred procedure for diagnosing coronary diseases. In this study, a U-Net-based network architecture, 3D Dense-U-Net, was adopted to perform fully automatic segmentation of the coronary artery. The network was applied to 480 coronary computed tomography (CT) angiography scans performed at Wanfang Hospital, Taiwan, of which 10% were used for testing. The CT scans were divided into patches of 16 original high-resolution slices, with slices overlapping between patches to take advantage of surrounding imaging information. However, the imbalance between foreground and background presents a challenge when segmenting small objects such as the coronary arteries; the network was therefore optimized with the focal loss and achieved a promising result. To evaluate the accuracy of the automatic segmentation, the Dice similarity coefficient (DSC) was calculated, and the results were compared with an existing clinical tool using the subjective ratings of three experienced radiologists. The proposed approach achieves a DSC of 0.9691, compared with 0.8060 for existing deep learning approaches. In the main trunk, the automatic segmentation agrees with the clinical tool, and it is significantly better in some small branches. This study demonstrates that automatic segmentation can match a high-performing clinical tool while being more convenient.
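The two quantities named above can be sketched directly; this is a generic binary formulation in TensorFlow (voxel-wise focal loss and the Dice similarity coefficient), not the exact implementation used in this study.

```python
# Voxel-wise binary focal loss (counters foreground/background imbalance)
# and the Dice similarity coefficient used for evaluation.
import tensorflow as tf

def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    pt = tf.where(tf.equal(y_true, 1.0), y_pred, 1.0 - y_pred)
    w = tf.where(tf.equal(y_true, 1.0), alpha, 1.0 - alpha)
    # Down-weights easy (high-pt) voxels, focusing training on hard ones.
    return -tf.reduce_mean(w * tf.pow(1.0 - pt, gamma) * tf.math.log(pt))

def dice_coefficient(y_true, y_pred, eps=1e-7):
    intersection = tf.reduce_sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + eps)
```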
This work was supported in part by the Ministry of Science and Technology (MOST) under grants MOST 108-2823-8-038-002 and MOST 108-2410-H-038-010-SSS, by the Ministry of Education (MOE) under grant MOE 108-6604-001-400, and by Taipei Medical University under grant TMU107-AE1-B18.
The automatic segmentation of skin lesions has been reported using dermoscopic images; however, that method does not apply to real-time detection on a smartphone. This study examines deep learning models for detecting and localizing moles in captured images so that cropped mole images can be extracted without any other objects. Data were collected through public health events in Taiwan between December 2017 and February 2019. Participants who were concerned about the risk of their moles were asked to take mole images, which were then measured and their risks determined by three dermatologists. Mole positions were labeled with bounding boxes using the ‘LabelImg’ tool. Two architectures, SSD and Faster-RCNN, were used to build eight different mole-detection models, and the confidence score, intersection over union (IoU), and mean average precision (mAP) with the COCO metrics were used to measure their accuracy. A total of 2790 mole images were used for the development and validation of the models. The Faster-RCNN Inception ResNet model had the highest overall mAP of 0.245, followed by 0.234 for Faster-RCNN ResNet 101 and 0.227 for Faster-RCNN ResNet 50; the SSD MobileNet v1 model had the lowest mAP of 0.142. The Faster-RCNN Inception ResNet model had a dominant AP of 0.377, 0.236, and 0.129 for large, medium, and small moles, respectively, and showed the best performance with high confidence scores (over 97%) for all kinds of moles. Detection models based on SSD and Faster-RCNN were thus successfully developed; these models could help accurately localize moles and assess their risks in a feasible smartphone detection app.
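For reference, the intersection over union underlying these metrics is straightforward to compute; the sketch below assumes boxes in (x1, y1, x2, y2) pixel coordinates.

```python
# Intersection over union for two axis-aligned boxes, the overlap measure
# behind the mAP figures reported above.
def iou(box_a, box_b):
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # 0.1429...
```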
This work was supported by the Swissray Digital X-ray Center, Taiwan.
Chest X-ray (CXR) imaging is an important technique for supporting the clinical diagnosis and treatment of thoracic diseases such as pneumonia, effusion, and fibrosis. Owing to this importance and to the development of deep learning, CXR analysis is attracting considerable interest, and various deep-learning-based approaches have achieved notable successes. However, it remains hard to build a computer-aided detection and diagnosis (CAD) system that works across all CXR data in real-world medical sites. This work aims to build machine learning models for lung disease classification on two different published datasets, diagnosing each chest X-ray image as it is captured by the X-ray machine.
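As an illustration only (not the exact models of this work), a common transfer-learning baseline for CXR classification can be sketched in Keras: an ImageNet-pretrained DenseNet121 backbone with a multi-label sigmoid head. The 14-class head is an assumption matching the ChestX-ray14 label set.

```python
# Generic CXR transfer-learning baseline: pretrained DenseNet121 backbone
# with a multi-label sigmoid classification head.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 14  # illustrative, e.g. the ChestX-ray14 label set
base = tf.keras.applications.DenseNet121(include_top=False, weights="imagenet",
                                         input_shape=(224, 224, 3), pooling="avg")
outputs = layers.Dense(NUM_CLASSES, activation="sigmoid")(base.output)
model = tf.keras.Model(base.input, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=[tf.keras.metrics.AUC()])
```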
This work was supported by the Solomon Technology Corporation and Clearmind, Taiwan.
This work was supported by the Taipei Medical University, Taiwan.
This work was supported by VinBrain, Vietnam.
Conventional rigid medical registration approaches attempt to align the moving image to the fixed image using overall image information. However, a single alignment output is not enough, because organs in the human body undergo uneven changes over time due to respiration. This study addresses body-part-based registration from reconstructed point-cloud features of two different computed tomography modalities for fast and accurate performance. Fast point feature histograms (FPFH) and fast global registration are used to quickly initialize a good pose for local refinement based on the point-to-plane iterative closest point (ICP) algorithm. In experiments on bone and liver parts, the proposed approach achieves the closest match to the average of manual registrations by three independent operators. It even outperforms existing and manual methods when evaluated on liver masks predicted by our reliably trained 3D U-Net (Dice score 96%). The proposed approach proved the most accurate and fastest, with a normalized Dice score of up to 96.550 ± 2.028% for liver registration.
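The named building blocks map directly onto Open3D's registration pipeline; the sketch below assumes `source` and `target` are `open3d.geometry.PointCloud` objects reconstructed from the two CT modalities, with illustrative voxel and radius scales.

```python
# Coarse-to-fine registration sketch: FPFH features, fast global
# registration, then point-to-plane ICP refinement.
import open3d as o3d

voxel, radius = 2.0, 5.0  # illustrative scales (mm) for CT point clouds
src = source.voxel_down_sample(voxel)
tgt = target.voxel_down_sample(voxel)
for pcd in (src, tgt):  # point-to-plane ICP needs normals
    pcd.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=radius, max_nn=30))

fpfh = o3d.pipelines.registration.compute_fpfh_feature
src_f = fpfh(src, o3d.geometry.KDTreeSearchParamHybrid(radius=radius * 2, max_nn=100))
tgt_f = fpfh(tgt, o3d.geometry.KDTreeSearchParamHybrid(radius=radius * 2, max_nn=100))

coarse = o3d.pipelines.registration.registration_fgr_based_on_feature_matching(
    src, tgt, src_f, tgt_f,
    o3d.pipelines.registration.FastGlobalRegistrationOption(
        maximum_correspondence_distance=voxel * 1.5))

refined = o3d.pipelines.registration.registration_icp(
    src, tgt, voxel * 1.5, coarse.transformation,
    o3d.pipelines.registration.TransformationEstimationPointToPlane())
print(refined.transformation)  # final 4x4 rigid transform
```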
This work was supported by VinBrain, Vietnam.
Recent deep-learning-based linear registration methods have shown promising results and runtime advantages. However, most existing approaches learn general affine parameters to initialize the transformation for deformable registration algorithms, which prevents learning the geometric parameters of specific linear components, including translation, rotation, shearing, and scaling. Moreover, these networks handle only the affine case among the various types of linear registration. To tackle this issue, we propose a novel, flexible bi-directional architecture that adds constraints on the symmetry and order of spatial transformations, so that one particular combination of linear components is solved instead of the many possible solutions admitted by general affine deep-learning networks. More importantly, the designed framework can flexibly perform multiple linear registration tasks on the same network architecture, including rigid, similarity, and affine transformations. In experiments on two generated benchmark liver datasets, the proposed network outperforms existing approaches on rigid and similarity registration tasks and achieves comparable performance on affine transformation.
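The constrained-composition idea can be illustrated in two dimensions for brevity: predicting translation, rotation, shear, and scale separately and composing them in a fixed order leaves only one decomposition that explains the transform. This is an assumption-level illustration, not the paper's network.

```python
# Compose a 2D homogeneous transform from constrained linear components in
# a fixed order, so only one decomposition can explain the result.
import numpy as np

def compose_affine_2d(tx, ty, theta, shear, sx, sy):
    """3x3 homogeneous matrix T @ R @ H @ S (fixed composition order)."""
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], float)   # translation
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], float)    # rotation
    H = np.array([[1, shear, 0], [0, 1, 0], [0, 0, 1]], float) # shear
    S = np.diag([sx, sy, 1.0])                                 # scale
    return T @ R @ H @ S

# Rigid registration fixes shear=0 and sx=sy=1; similarity fixes shear=0, sx=sy.
rigid = compose_affine_2d(5.0, -3.0, np.pi / 12, 0.0, 1.0, 1.0)
```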
This work was supported by the Solomon Technology Corporation, Taiwan.
A novel vision-based virtual reality pen (SolPen) is proposed to generate six-degree-of-freedom (6-DoF) pose paths for vision-guided robotics applications such as welding, cutting, painting, and polishing. The system achieves an accuracy of up to 0.48 mm at a 40 cm working distance. The proposed system is simple, requiring only a 2D camera and printed ArUco markers hand-glued onto the 31 surfaces of the designed 3D-printed SolPen. Image processing techniques are implemented to remove noise, sharpen the edges of the ArUco images, and enhance the contrast of the ArUco edge intensity generated by pyramid reconstruction. In addition, a least-squares technique is implemented to optimize the parameters for the center pose of the truncated icosahedron and the vector of the SolPen tip. In the experiments conducted, the proposed approach is robust within the 1-meter working space of a Universal Robots UR5, achieving pen-tip position and angle errors of up to (±0.48 mm, ±0.079°) at a 40 cm working distance and (±1.09 mm, ±0.11°) at 100 cm.
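A minimal sketch of the marker-pose step using OpenCV's ArUco module (4.6-era `opencv-contrib-python` API, which has since changed); `frame`, `camera_matrix`, and `dist_coeffs` are assumed to come from the video stream and a prior camera calibration, and the marker side length is illustrative.

```python
# Detect ArUco markers and recover each marker's 6-DoF pose relative to
# the camera, then draw the pose axes for visual verification.
import cv2

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_100)
params = cv2.aruco.DetectorParameters_create()

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary, parameters=params)
if ids is not None:
    # 0.02 m marker side length is illustrative, not the SolPen's spec.
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, 0.02, camera_matrix, dist_coeffs)
    for rvec, tvec in zip(rvecs, tvecs):
        cv2.drawFrameAxes(frame, camera_matrix, dist_coeffs, rvec, tvec, 0.01)
```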
This work was supported by the Solomon Technology Corporation, Taiwan.
An end-to-end grasp evaluation model is proposed to address the challenging problem of localizing robot grasp configurations directly from the point cloud.
The model is lightweight and directly processes the 3D points located within the gripper for grasp evaluation. Taking the raw point cloud as input, the proposed grasp evaluation network can capture the complex geometric structure of the contact area between the gripper and the object even when the point cloud is very sparse.
To further improve the proposed model, we collected a custom dataset together with a larger-scale grasp dataset of 350k real point clouds and grasps on the YCB Objects Dataset for training.
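A PointNet-style evaluator of the general kind described can be sketched in PyTorch (this is not the paper's exact network): a shared per-point MLP, order-invariant max pooling, and a binary grasp-quality head.

```python
# Minimal point-cloud grasp evaluator: per-point features, global max
# pooling, and a sigmoid grasp-quality score.
import torch
import torch.nn as nn

class GraspEvaluator(nn.Module):
    def __init__(self):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, points):                        # (B, N, 3), gripper frame
        feats = self.point_mlp(points.transpose(1, 2))  # (B, 256, N)
        global_feat = feats.max(dim=2).values           # order-invariant pooling
        return self.head(global_feat)                   # (B, 1) grasp quality

score = GraspEvaluator()(torch.rand(4, 512, 3))  # 4 candidates, 512 points each
```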
Research project - Disaster monitoring and rescue using autonomous aerial mobile robot.
A mobile robot (P3DX) and drone system was developed and implemented for target localization and capture. The drone captures images with its bottom camera while hovering at a known altitude, and image processing techniques, including edge and corner detection, precisely locate the target, the mobile robot, and obstacles. A path-planning algorithm determines the optimal route from the mobile robot to the target. Additionally, two fuzzy functions, based on Microsoft Kinect or sonar range-finder sensors, regulate wheel speed and orientation to avoid obstacles. Object detection algorithms (such as SSD and YOLO) identify and pinpoint the target’s position as the mobile robot approaches. Finally, a hand-eye camera attached to the manipulator grasps the target securely, and the drone autonomously seeks out a designated parking area marked by a special symbol for landing.
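As a toy illustration of the fuzzy wheel-speed idea (with invented membership breakpoints and speeds, not the project's tuning), triangular memberships over the sonar distance can blend a turn-in-place rule with a go-straight rule:

```python
# Toy fuzzy controller: "near"/"far" memberships over sonar distance blend a
# turning command with a go-straight command into left/right wheel speeds.
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def wheel_speeds(sonar_dist_m):
    near = tri(sonar_dist_m, 0.0, 0.3, 1.0)
    far = tri(sonar_dist_m, 0.3, 1.5, 10.0)
    total = (near + far) or 1.0
    # Rule 1 (near): turn in place. Rule 2 (far): drive straight.
    left = (near * 0.2 + far * 0.5) / total
    right = (near * -0.2 + far * 0.5) / total
    return left, right

print(wheel_speeds(0.4))  # mostly turning, slight forward component
```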
Research project - Disaster monitoring and rescue using an autonomous aerial mobile robot.
An autonomous vision-based landing system for drones was developed as part of this research project on disaster monitoring and rescue with an autonomous aerial mobile robot. The drone’s bottom camera captures images and identifies a designated symbol (“P”) marking the parking area. Controlled by fuzzy functions, the drone keeps the parking area centered on the screen and gradually reduces altitude during descent.
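The centering-and-descent loop can be sketched with a simple proportional controller standing in for the fuzzy functions; the gains and thresholds below are illustrative assumptions.

```python
# Steer the detected "P" marker toward the image center, then descend once
# it is centered (proportional stand-in for the fuzzy controller).
def landing_command(marker_cx, marker_cy, img_w, img_h, k=0.002, center_tol=20):
    err_x = marker_cx - img_w / 2      # +: marker right of center
    err_y = marker_cy - img_h / 2      # +: marker below center
    vx, vy = -k * err_y, -k * err_x    # lateral velocity corrections
    vz = -0.2 if abs(err_x) < center_tol and abs(err_y) < center_tol else 0.0
    return vx, vy, vz                  # vz < 0 descends
```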
This work was financially supported by the Center for Cyber-Physical System Innovation from The Featured Area Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) of Taiwan. The Ministry of Science and Technology (MOST) of Taiwan partially funded the present work (grant number: MOST 107-2221-E-011-139).
In this study, a novel approach for real-time chatter detection in the milling process is presented, based on the scalogram of the continuous wavelet transform (CWT) and a deep convolutional neural network (CNN). Cutting force signals measured under stable and unstable cutting conditions were converted into two-dimensional images using the CWT. When chatter occurs, the energy at the tooth-passing frequency and its harmonics shifts toward the chatter frequency; hence, the scalogram images can serve as input to the CNN to identify stable, transitive, and unstable cutting states. The proposed method requires no subjective feature-generation and feature-selection procedures, and its classification accuracy of 99.67% is higher than that of the conventional machine learning techniques described in the existing literature. The results demonstrate that the proposed method can effectively detect the occurrence of chatter.
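The CWT-to-scalogram step is easy to sketch with PyWavelets using the Morlet wavelet family; the sampling rate and the synthetic force signal below are illustrative stand-ins for the measured cutting forces.

```python
# Convert a force-signal window into a 2D time-frequency scalogram image
# suitable as CNN input.
import numpy as np
import pywt

fs = 10000                         # Hz, illustrative sampling rate
t = np.arange(0, 0.1, 1 / fs)
force = np.sin(2 * np.pi * 500 * t) + 0.3 * np.random.randn(t.size)  # stand-in

scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(force, scales, "morl", sampling_period=1 / fs)
scalogram = np.abs(coeffs)         # (n_scales, n_samples) image for the CNN
```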
This work was financially supported by the Center for Cyber-Physical System Innovation from The Featured Area Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) of Taiwan. The Ministry of Science and Technology (MOST) of Taiwan partially funded the present work (grant number: MOST 107-2221-E-011-139).
Induction motors are essential equipment in industry. However, fatigue failure after long operating times can cause a catastrophic breakdown, so monitoring and diagnosing induction motors plays a significant role in preventing sudden shutdowns caused by premature failures. This study develops a new approach based on a convolutional neural network (CNN) to classify faults using time-frequency vibration signals from an accelerometer. Vibration signals for different faults are collected and labeled in five categories: normal, inner-ring fault, outer-ring fault, misalignment, and broken rotor bar. The continuous wavelet transform (CWT) with the Morlet function converts the vibratory time-series signals into scalogram feature images; the measured vibration signals are sampled and converted into the frequency domain to form these time-frequency feature images, which are then resized and fed into the CNN to identify induction motor failures. The experimental results indicate that the proposed approach achieves excellent performance, with a classification accuracy of 99.46%, a significant improvement over traditional learning methods.
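A small Keras CNN of the general kind described can classify the resized scalogram images into the five fault categories; the layer sizes and the 64×64 input are assumptions, not the paper's exact architecture.

```python
# Small CNN for classifying scalogram images into five fault categories.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),          # resized scalogram image
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(5, activation="softmax"),    # normal + four fault classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```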
[Video demo: source footage (960 Hz) and motion-amplified (×20) result]
Detecting road damage quickly and accurately facilitates the ability of road-maintenance agencies to make timely repairs to road surfaces, maintain optimal road conditions, optimize transportation safety, and minimize transportation costs. An extensive evaluation of eight deep-learning-based road-damage detection models was conducted in this study. Each model was trained on 9493 images sourced from multiple databases. The 16165 instances of road damage in these images were categorized into five types of damage, including longitudinal crack, horizontal crack, alligator damage, pothole-related crack, and line blurring. Two experiments were conducted that identified two models, single shot multi-box detector (SSD) Inception V2 and faster region-based convolutional neural networks (R-CNN) Inception V2, as providing the best balance of road-damage-detection accuracy and image processing time. These experiments demonstrated that increasing the diversity of image sources improved road-damage-detection model performance. In addition to combining data images from different sources with consistently relabeled damage instances, this study released road-damage image data from the road maintenance agency in Zhubei, Hsinchu County, Taiwan for research and other uses, increasing the limited amount of published image data sources and positively impacting future scholarly research into road damage detection.
This work proposes a novel approach to improve mask prediction by effectively combining instance-level information with lower-level, fine-grained semantic information. The main contribution is a blender module that draws inspiration from both top-down and bottom-up instance segmentation approaches. The proposed approach can predict dense, per-pixel, position-sensitive instance features with very few channels and learn an attention map for each instance with merely one convolution layer, making inference fast. It can be easily incorporated into state-of-the-art one-stage detection frameworks (YOLOv7) and outperforms Mask R-CNN under the same training schedule while being 20% faster.
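The blending step can be illustrated at the tensor level (shapes and channel count are illustrative): position-sensitive bases are weighted by a per-instance attention map and summed into a single mask logit map.

```python
# Toy blending sketch: few-channel bases weighted by a per-instance
# attention map, summed into one mask logit map per instance.
import torch

K, H, W = 4, 56, 56                   # few basis channels, RoI resolution
bases = torch.rand(K, H, W)           # dense position-sensitive bases (one RoI)
attention = torch.softmax(torch.rand(K, H, W), dim=0)  # per-instance attention

mask_logits = (attention * bases).sum(dim=0)           # (H, W) instance mask
```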
Step Count
Blood pressure
Heart rate
Sentiment classification
Auto-correction
Neural Machine Translation
Fine-tuning Llama2 7B
Building System with OpenAI API
This project introduces an end-to-end solution for translating multilingual PDF documents into English. The methodology encompasses several key stages. Initially, layout elements such as headers, text blocks, paragraphs, tables, and figures are extracted from the input PDF, preserving the original layout and structure. An optical character recognition (OCR) system then converts image-based text into a machine-readable format. Because OCR-extracted text may contain inaccuracies, a large language model (LLM) corrects the text based on the semantic sentence context, improving the input quality for the translation model. The corrected text is then processed by a deep translation model, which translates it into English while handling complex linguistic structures and idioms. After translation, text processing techniques align the translated text with the original layout: the translated text is reintegrated into the original bounding boxes, and the fully translated document, maintaining the original layout and design, is exported as a PDF file. This solution provides an efficient process for translating PDF documents with high translation accuracy and layout preservation.
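A high-level sketch of such a pipeline under stated assumptions: PyMuPDF for layout blocks, pytesseract for OCR, and a Hugging Face translation pipeline as the translation model. `llm_correct` is a hypothetical hook for the LLM-based OCR correction step, and the reinsertion logic is simplified.

```python
# Pipeline sketch: extract layout blocks, OCR each region, correct, translate,
# and reinsert English text into the original bounding boxes.
import fitz  # PyMuPDF
import pytesseract
from PIL import Image
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-mul-en")

doc = fitz.open("input.pdf")
for page in doc:
    pix = page.get_pixmap(dpi=300)
    img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    for block in page.get_text("blocks"):        # (x0, y0, x1, y1, text, ...)
        x0, y0, x1, y1 = (int(v * 300 / 72) for v in block[:4])  # points -> px
        raw = pytesseract.image_to_string(img.crop((x0, y0, x1, y1)))
        corrected = llm_correct(raw)             # hypothetical LLM correction
        english = translator(corrected)[0]["translation_text"]
        # Clear the original text, then place the translation in its box.
        page.add_redact_annot(fitz.Rect(block[:4]), fill=(1, 1, 1))
        page.apply_redactions()
        page.insert_textbox(fitz.Rect(block[:4]), english, fontsize=9)
doc.save("translated.pdf")
```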