
Image-based analysis, fused with multimodal data, can provide automatic referral, support for definitive diagnosis, and preliminary imaging reports. This is the vision offered by full-disease medical image readers.

Among the best-known public datasets in medical imaging, BRATS focuses on detecting and segmenting brain tumors in brain scans, LUNA16 targets the detection of pulmonary nodules in chest CT, and several datasets, typified by the 2015 Kaggle Diabetic Retinopathy challenge, focus on classifying diabetic retinopathy. On these single-disease tasks, researchers have achieved good results. But how far is a good classification model from clinical practice?

“Most current medical image recognition technology targets individual serious diseases. But take chest CT analysis, which we know best: do patients and doctors really want a system that ‘focuses on lung cancer lesions and is blind to every other lung disease’?” The answer to this question, posed by Ding Xiaowei, founder of VoxelCloud, is obvious.

Guided by a patient’s symptoms, risk factors, or other test results, a doctor orders an imaging examination hoping to find out “why am I coughing?” rather than “do I have lung cancer?”. Even when screening for a targeted disease, the doctor reading the images is obliged to report every visible abnormality. A single-disease recognition model can only form and test a hypothesis about one disease, and so can answer only the second question; faced with the first, it is powerless. An ideal system must be able to say “you don’t have lung cancer” and also tell the patient “but you do have pneumonia.” “Finding all visible abnormalities” is the basic logic of diagnostic imaging, which makes the ability to identify every disease visible under a specific imaging protocol the baseline requirement for AI imaging technology.

Yet “full-disease medical image reading” is not an easily attainable goal even for human doctors. Ding Xiaowei said that about half of early-stage major illnesses in China do not receive the examinations they should. Grass-roots hospitals actually have plenty of imaging equipment, but it is almost impossible to staff that equipment with an equally plentiful supply of doctors.


This is what sets VoxelCloud apart: it has positioned the “full-disease medical image reader under a given imaging protocol” as its product line. “For an AI image analysis product to keep diagnostic quality under control, it needs a broad usage scenario with clear boundaries, so as to avoid the influence of human factors. Good examples: a visible-abnormality detection product applicable to all non-contrast chest CT, or one applicable to all fundus images. Counterexamples: a product for all pulmonary nodule patients, or a product for all diabetic retinopathy cases.”

In short, the product needs to go from the image to something close to a doctor’s natural-language report, so as to minimize the doctor’s workload. To deliver this seemingly simple step, the underlying machine learning models need to localize and navigate among the body’s organs and tissues, detect and identify every visible type of pathology at different locations, and characterize and quantify lesions.

“In machine learning terms, our complete solution for VoxelCloud Retina is a multi-task model that classifies and quantifies 10 types of lesions and also classifies 8 visible diseases,” said Joseph, head of VoxelCloud’s ophthalmology product line.

Lesion types and disease types covered by VoxelCloud Retina


The head of another product line, the VoxelCloud Thorax chest CT solution, simply showed us a huge list.

Pulmonary nodule feature list


“If you want to generate a report on CT nodules, detecting and localizing pulmonary nodules is in some sense just preprocessing. Once that is done, the more critical tasks are segmentation and characterization. This list describes nodule characteristics: nine categories and more than 30 feature descriptions in all. Following the advice of imaging experts, and leaving out some redundant items, the VoxelCloud system covers every meaningful feature description. Beyond that, only with accurate lung segmentation, lobe segmentation, and recognition of blood vessels and airways can the system understand how a lesion relates to the surrounding tissue. Lesions must also be matched across scans and compared quantitatively before a complete natural-language description of the nodule can be output. The same is true of report generation for every type of lesion.”


Founded in Los Angeles and Shanghai, this artificial intelligence medical imaging company has signalled its ambitions for the field from the start, in its very name.

Voxel is short for volumetric pixel: the three-dimensional counterpart of the pixel, representing the smallest unit of data in three-dimensional space. Once you know the definition, it is easy to see why “pixel” is an everyday word and “voxel” is not: even with high-end cameras, natural images are two-dimensional, and medical imaging is almost the only natively 3D setting. There seem to be only two ways to apply the 2D models common in computer vision to medical images: upgrade the model to three dimensions, or downgrade the image to two.

VoxelCloud declares its choice in its name. “Since medical images are inherently 3D, we choose to solve 3D problems with 3D models,” explains Ding Xiaowei.

Pericardial fat volume 3D quantification

Processing the whole volume with a 3D model, rather than processing image slices with common 2D computer vision models, is one of the hardest nuts to crack in computer vision. Because the size of an image grows exponentially with the number of dimensions, the approach runs straight into limited memory, the Achilles’ heel of current deep learning models. At the same time, the share of the image occupied by the object to be recognized shrinks exponentially, so the convolutional neural network’s prized ability to extract features level by level has little to work with when information about the subject is so scarce.
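A rough back-of-the-envelope calculation makes both effects concrete. The sizes below (a 512 x 512 slice, a 300-slice scan, a nodule about 10 voxels across, 32-bit floats) are illustrative assumptions, not figures from VoxelCloud, and the count covers only the input array; intermediate feature maps in a real network multiply the memory cost further.

```python
import math


def tensor_bytes(shape, bytes_per_value=4):
    """Memory needed for one dense float32 array of the given shape."""
    return math.prod(shape) * bytes_per_value


slice_shape = (512, 512)        # one hypothetical CT slice
volume_shape = (300, 512, 512)  # a hypothetical 300-slice scan

slice_bytes = tensor_bytes(slice_shape)
volume_bytes = tensor_bytes(volume_shape)

# A nodule about 10 voxels across covers 10**2 pixels of a slice
# but only 10**3 voxels of the whole volume.
fraction_2d = 10**2 / math.prod(slice_shape)
fraction_3d = 10**3 / math.prod(volume_shape)

print(f"slice:  {slice_bytes / 2**20:.0f} MiB, nodule covers {fraction_2d:.2e} of it")
print(f"volume: {volume_bytes / 2**20:.0f} MiB, nodule covers {fraction_3d:.2e} of it")
```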

“To let the computer process a whole image, uncompressed and uncropped, without running out of memory, we put a lot of special design into the model structure.” Ding Xiaowei grew animated when he described this framework. “We use spatially separable convolutions and depthwise separable convolutions in place of the original convolutional layers.”

A spatially separable convolution decomposes, for example, a 7 x 7 convolution kernel into the product of a 7 x 1 vector and a 1 x 7 vector, so that a kernel that would have required 49 parameters needs only 14. A depthwise separable convolution applies a two-dimensional convolution to each slice of the 3D image separately, then convolves across all the resulting 2D feature maps to form a 3D feature map. “A three-dimensional convolutional model is very hard to train, but at inference time it is twenty times faster than processing the volume slice by slice in 2D, and it is more accurate too. A cross-section of a blood vessel and a nodule can be hard to tell apart in two dimensions, but in three dimensions the difference is clear.”
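The 7 x 7 factorization can be checked numerically. The NumPy sketch below is an illustration, not VoxelCloud’s code: the helper `correlate2d_valid` and the random test data are assumptions made for the example. It builds a 7 x 7 kernel as the outer product of a 7 x 1 and a 1 x 7 vector and confirms that one pass with the full kernel gives the same result as two passes with the factors.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation (what deep learning calls convolution)."""
    windows = sliding_window_view(image, kernel.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))

# A separable 7 x 7 kernel is the outer product of a 7 x 1 and a 1 x 7 vector.
col = rng.standard_normal((7, 1))
row = rng.standard_normal((1, 7))
full_kernel = col @ row  # 49 parameters

# One pass with the full kernel ...
out_full = correlate2d_valid(image, full_kernel)
# ... matches two cheaper passes with the factors (7 + 7 = 14 parameters).
out_separable = correlate2d_valid(correlate2d_valid(image, col), row)

params_full = full_kernel.size
params_separable = col.size + row.size
max_abs_diff = float(np.max(np.abs(out_full - out_separable)))
print(params_full, params_separable, max_abs_diff)
```

The equality is exact only for kernels that really are an outer product. A network built from separable layers learns the two factors directly, trading some expressiveness for fewer parameters and less computation.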

Another difficulty in medical image processing is class imbalance. Set aside the quality of the imaging equipment, the skill of the operators, and the quality of the labels: even if all of these were perfect, the database would still contain a large number of images without disease (negatives) and only a small number with disease (positives). A flood of negative, easily classified samples keeps the model from concentrating on learning from its mistakes. “To solve this problem, we use Focal Loss as the loss function,” explains Joseph.

The Focal Loss paper won Best Student Paper at ICCV 2017 in October last year; its authors, among them Ross Girshick and Kaiming He, are from Facebook AI Research (FAIR). The loss function starts from cross-entropy and reduces the weight of easily classified samples, so that the model focuses more on the hard-to-classify ones.
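For a binary problem the focal loss can be written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigns to the true class. The NumPy sketch below is a minimal illustration of that formula; the function name `binary_focal_loss` is made up for this example, and gamma = 2 and alpha = 0.25 are the defaults reported in the paper, not necessarily the values VoxelCloud uses.

```python
import numpy as np


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Per-sample focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability of the positive class; y: 0/1 labels.
    With gamma=0 and alpha=None this reduces to ordinary cross-entropy.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    p_t = np.where(y == 1, p, 1.0 - p)  # probability given to the true class
    if alpha is None:
        alpha_t = 1.0
    else:
        alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)


# An easy negative (predicted 0.05, label 0) and a hard positive (predicted 0.1, label 1).
p = np.array([0.05, 0.1])
y = np.array([0, 1])

cross_entropy = binary_focal_loss(p, y, gamma=0.0, alpha=None)
focal = binary_focal_loss(p, y, gamma=2.0, alpha=None)
# Share of each sample's cross-entropy that survives the focusing term.
kept = focal / cross_entropy
print(kept)
```

The focusing term (1 - p_t)^gamma keeps most of the loss for the hard positive and almost none for the easy negative, which is the re-weighting the paragraph above describes.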

The paper was first released only in August last year, and it went from fresh off the press to engineering trials in industry in less than a year. For a medical industry that is cautious almost to the point of sluggishness about adopting new things, and in which engineering often trails research by five years, this is anything but standing still.

“Actually, the medical imaging field embraced deep learning very quickly, because it really does solve problems that could not be solved before the deep learning era.” Ding Xiaowei again takes lung nodules as an example. “In the 1980s, many medical imaging giants, Siemens among them, organized large teams and spent ten years trying to detect pulmonary nodules with hand-designed features. Yet even so huge a project could go no further than detecting nodules; it could not make the finer judgment of whether a nodule is likely to be benign or malignant. That distinction rests on very subtle feature differences that are hard to describe in words or as hand-crafted features. The pattern is clear, but it is hard to summarize by forward reasoning. Deep learning brings a data-driven alternative. Once a model has abstracted, from countless real lung nodule samples, features that cannot be described in a low-dimensional space, judging benign versus malignant, and describing many more characteristics, becomes possible.”

If solving medical imaging problems with 3D models and a brand-new loss function is a methodological choice, and the goal of “full-disease medical image reading” is a product design choice made from the perspective of real needs, then multi-task learning is where the two meet.

Multi-task learning is a machine learning method based on shared representations, in which several related tasks are learned together. “Multi-task learning is both a methodological way to improve model performance and a practical route to full-disease coverage. Deployment at the client is limited by cost and hardware, so we need as few models as possible to carry out inference tasks that are as complex as possible,” said Ding Xiaowei.
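The idea of a shared representation can be sketched in a few lines of NumPy. Everything here is a toy assumption for illustration (random untrained weights, a 64-dimensional input, a 16-dimensional shared layer); only the head sizes, 10 lesion types and 8 diseases, echo the counts quoted for the retina product. The point is that one encoder pass serves every task head, so a deployment carries one encoder instead of one per task.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 input features, a 16-dimensional shared representation,
# 10 lesion classes and 8 disease classes.
W_enc = rng.standard_normal((64, 16)) * 0.1
W_lesion = rng.standard_normal((16, 10)) * 0.1
W_disease = rng.standard_normal((16, 8)) * 0.1


def forward(x):
    """One encoder pass feeds every task head."""
    shared = np.maximum(x @ W_enc, 0.0)  # shared representation (ReLU)
    return shared @ W_lesion, shared @ W_disease


x = rng.standard_normal((4, 64))  # a batch of 4 feature vectors
lesion_logits, disease_logits = forward(x)

# Parameters with a shared encoder versus two fully separate models.
shared_params = W_enc.size + W_lesion.size + W_disease.size
separate_params = 2 * W_enc.size + W_lesion.size + W_disease.size
print(lesion_logits.shape, disease_logits.shape, shared_params, separate_params)
```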

In fundus photography, there are nearly 20 interrelated classification tasks. Once multi-task learning is chosen, which task is trained first has a great influence on training time and model performance. “Experiments show that learning first the task whose uncertainty falls fastest yields the best results,” explains Joseph.


In chest imaging, VoxelCloud is making some bolder attempts.

A multi-task model for all lung-related imaging tasks that can be trained end to end

“We are trying to use one model for all lung-related work.” Ding Xiaowei drew a block diagram: a single model first uses a shared encoder to compress the three-dimensional medical image into feature vectors, then splits into two branches, lobe segmentation with a decoder and nodule detection with Mask R-CNN. Once nodule detection is complete, the detected sphere center and radius are used to extract a 3D nodule ROI (region of interest), on which nodule segmentation and nodule characterization are then performed. Finally, the outputs of all branches are aggregated and a natural-language report is generated for each detected nodule: “A round, mixed solid and ground-glass nodule (2.5 mm x 4.4 mm) was found in the left lower lobe, with smooth margins. Blood vessels pass through it and it contains adipose tissue; the risk of malignancy is 15%.”
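The step from a detected sphere to a 3D ROI is simple enough to sketch. The function below is a hypothetical illustration in NumPy, not VoxelCloud’s implementation: the name `crop_nodule_roi`, the `(z, y, x)` axis order, and the two-voxel margin are all assumptions. It cuts a cube around the sphere and clips it at the volume boundary.

```python
import numpy as np


def crop_nodule_roi(volume, center, radius, margin=2):
    """Cut a cubic ROI around a detected sphere, clipped to the volume bounds.

    volume: 3D array indexed (z, y, x); center: (z, y, x) in voxels;
    radius: sphere radius in voxels; margin: extra context on every side.
    """
    half = int(np.ceil(radius)) + margin
    lo = [max(int(round(c)) - half, 0) for c in center]
    hi = [min(int(round(c)) + half + 1, s) for c, s in zip(center, volume.shape)]
    return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]


ct = np.zeros((64, 128, 128), dtype=np.float32)  # stand-in for a CT volume

roi = crop_nodule_roi(ct, center=(30, 60, 60), radius=5)
edge_roi = crop_nodule_roi(ct, center=(1, 60, 126), radius=5)
print(roi.shape, edge_roi.shape)  # (15, 15, 15) (9, 15, 9)
```

Because the slicing returns a view, downstream segmentation and characterization branches can work on the small cube without copying the full scan.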


Lung segmentation and pulmonary nodule segmentation

With the computational problem solved through special algorithm design, one problem remains: data. Training such a large-scale model requires complete annotations for every image.

“We will finish soon,” Ding Xiaowei said. Finishing here means completing a finely annotated dataset of more than 80,000 chest CT scans.

Beyond lung cancer-related data, VoxelCloud has also accumulated about 3,000 cases for each of the other types of lung disease, and has collected more than 150,000 chest CT scans in total. It also holds more than 50,000 coronary CT angiography scans and more than 4.2 million fundus images with 5 years of follow-up records.

In an interview a year ago, Ding Xiaowei said that “VoxelCloud wants to build the ImageNet of the medical imaging industry: put all kinds of human anatomy and the pathologies of common diseases into one unified model, so that it first has a concept of ‘what is what’, and then develop models for specific applications.” Now this Tower of Babel of a project is taking shape. The advent of ImageNet let sophisticated neural network models train on datasets whose complexity matched their own, ultimately bringing computer vision into the deep learning era. In medical imaging, is the moment when quantitative growth in datasets turns into qualitative change also coming soon?

What can a computer vision system with full-disease reading ability do? Beyond the technical language and the numbers lies an appealing vision.

Today, more than 50% of people with diabetes are not screened in time for eye complications, because endocrinologists can hardly be expected to read fundus photographs. A full-disease reader for retinal images makes follow-up for the chronic-disease management of diabetes a task that primary hospitals can handle.


Furthermore, when full-disease image readers move from serving businesses (to B) to serving consumers (to C), the population they can cover becomes even larger.

Today, China has 60 million children under the age of 3, who are at a critical stage of visual development. About 4% of them have some kind of vision problem. Children under 3 do not yet have the cognitive ability to cooperate with a general ophthalmologist in a routine vision examination, and only a few thousand doctors nationwide can perform professional pediatric eye examinations. A system that can analyze videos taken by parents for abnormalities in eye and body movements could find children with vision problems and refer them for intervention in time, making early treatment possible.


