Detection of t(9;22) Chromosome Translocation Using Deep Residual Neural Network
1. Introduction
A normal human cell has 46 chromosomes: 22 pairs of autosomes, numbered 1 to 22, and two sex chromosomes, in the form of XX or XY [1]. Chromosomes at metaphase are visible under a light microscope after Giemsa staining (Figure 1). Some diseases are associated with specific chromosome abnormalities, either numerical or structural, such as translocations and deletions. Detecting these abnormalities is therefore important for the diagnosis and therapy of such diseases. Among them, the translocation t(9;22), which is associated with chronic myeloid leukemia (CML), is the best known.
The chromosome translocation t(9;22) is the best-known acquired chromosomal change in human neoplasms. It forms the BCR-ABL fusion gene on the 22q derivative chromosome, known as the Philadelphia (Ph) chromosome [2] (Figure 2). Discovered in 1960, it was the first chromosome abnormality found in leukemia; it is now known to be present in 95% of CML cases and is regarded as a specific genetic marker of CML [3]. Over the following decades, it was shown that the BCR-ABL fusion gene encodes a protein with tyrosine kinase activity that initiates and maintains the disease [3]. Treatment with tyrosine kinase inhibitors directed against BCR-ABL, such as imatinib mesylate, has had a revolutionary effect. Survival of up to 77.1% of patients on imatinib mesylate treatment was confirmed by the International Randomized Study [4].
Figure 1. A sample of chromosomes within a human cell (left) and the accompanying karyotype image sorted based on Denver groups (right).
Therefore, it is crucial to identify the t(9;22) chromosome translocation reliably and to begin treatment in time. Current chromosome analysis relies on automated karyotyping systems (AKS), which provide an interactive graphical environment [5]. However, these systems still require manual chromosome classification, which is time-consuming and demands professional knowledge. A technician may need years of experience before performing karyotype analysis effectively and independently. These constraints make karyotype analysis difficult to carry out, especially in underdeveloped areas that lack trained professionals.
2. Related Work
Karyotype analysis is fundamentally an image analysis problem. In recent years, many computer-based automated classification methods have been designed to reduce the burden of karyotype analysis, and several artificial neural networks have shown impressive performance in this area. The Convolutional Neural Network (CNN) is a network designed specifically to process images, and many works have demonstrated its strong performance in medical image classification. Gulshan et al. showed the ability of a CNN to detect diabetic retinopathy [7]. Esteva et al. showed the ability of a CNN to detect skin cancer [8]. Ehteshami Bejnordi et al. showed the excellent performance of an improved CNN in lymph node metastasis detection [9]. Monika [10] used a method combining crowdsourcing, preprocessing, and deep learning to segment and classify chromosomes, particularly overlapping ones, with a classification accuracy of 86.7%. Joshi [11] proposed incremental learning for chromosome classification in automated karyotyping of metaphase chromosomes and achieved an accuracy of 97%. In general, these methods first preprocess each chromosome image with skeletonization algorithms, then extract features along each computed axis, and finally build classifiers on the extracted features to estimate each chromosome's type.
Figure 2. A karyotype image of regular chromosomes (left), a karyotype image of a patient with t(9;22) chromosome translocation (middle), and a model image of t(9;22) chromosomal translocation forming BCR/ABL fusion gene (right) [6].
However, all of the studies above focused on normal chromosomes. None of them used artificial intelligence specifically to identify chromosomal abnormalities from the visual appearance of chromosomes. In this work, I present an approach to identify one particular chromosome abnormality, t(9;22), from images containing chromosomes 9 and 22. The model is a 50-layer ResNet built with the TensorFlow framework. I designed the system to extract shape features from chromosome images using a Residual Network, which allows the depth of the network to increase while reducing the effect of the vanishing gradient problem, and can therefore produce better accuracy in the image classification task addressed here. Through image preprocessing, deep learning, and feature extraction, I improved performance on t(9;22) chromosome translocation detection.
3. Data
The raw chromosome images were collected from 200 different individual samples, provided by Dr. Liu. Each image was karyotyped and assigned the correct labels. All images were received in a de-identified format to protect the identity of patients. Figure 3 shows two of the samples. I extracted the images of chromosomes 9 and 22 from each raw image and created image sets that contain only these two chromosomes. The chromosomes in each image have no background interference and have clear bands, and each image is labeled according to whether the corresponding patient has the t(9;22) chromosome translocation.
Figure 3. The image of regular chromosomes 9 and chromosomes 22 (left) and the image of chromosomes 9 and chromosomes 22 after t(9;22) translocation (right).
A CNN performs better and converges more quickly on images that are appropriately preprocessed. The images I created from chromosomes 9 and 22 of each sample vary in height and width because chromosome sizes vary. To feed the data into the CNN for training, I modified each image to obtain a uniform image size of 90 × 90 with a pixel value of 72. Figure 4 shows the image sets after preprocessing.
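The paper does not say how the images were brought to a uniform size. The following is a minimal sketch of one possible approach, assuming each crop is a grayscale image stored as a list of rows, padded to a square with a white background and then resampled by nearest-neighbour sampling. The function names and the white fill value are illustrative assumptions, not the author's pipeline.

```python
def pad_to_square(image, fill=255):
    """Center a grayscale image (list of rows) on a square canvas of `fill` pixels."""
    height, width = len(image), len(image[0])
    side = max(height, width)
    top = (side - height) // 2
    left = (side - width) // 2
    canvas = [[fill] * side for _ in range(side)]
    for r, row in enumerate(image):
        canvas[top + r][left:left + width] = row
    return canvas


def resize_nearest(image, size=90):
    """Resize a square grayscale image to size x size by nearest-neighbour sampling."""
    side = len(image)
    return [
        [image[r * side // size][c * side // size] for c in range(size)]
        for r in range(size)
    ]


def preprocess(image, size=90, fill=255):
    """Pad to a square (assumed white background), then resize to size x size."""
    return resize_nearest(pad_to_square(image, fill), size)


# A hypothetical 120 x 40 all-black crop becomes a 90 x 90 image.
crop = [[0] * 40 for _ in range(120)]
uniform = preprocess(crop)
print(len(uniform), len(uniform[0]))
```

Padding before resizing keeps the aspect ratio of the chromosome, which matters if shape features are what the network is expected to learn.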
4. System Design
The structure of the proposed ResNet is illustrated in Figure 5. The model consists of five stages. The first stage consists of one convolutional layer with a ReLU activation function, a Batch Normalization layer, and an Average Pooling layer; each of the remaining four stages consists of one convolutional block followed by a varying number of identity blocks. At the end there is another Average Pooling layer, a flatten layer, and a softmax activation function. Compared with the traditional ResNet, I replaced the Max Pooling layers with Average Pooling layers, which gave better performance. A possible explanation is that, to detect whether a chromosome carries the t(9;22) translocation, the model needs to examine the pixel values across an entire region of the sample rather than only the maximum pixel value in that region. Average pooling reflects whether all the pixels in a region that should show chromosome material actually have the corresponding values, whereas max pooling gives the same response when only a few pixels in the region have such values. I also added a classifier layer to the traditional ResNet and used transfer learning to train the model.
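The paper does not list how many identity blocks each stage contains. As a rough check of where the figure of 50 layers comes from, the sketch below assumes the block counts of the standard ResNet-50 (one convolutional block plus 2, 3, 5, and 2 identity blocks in stages 2 to 5, each block holding three convolutions). These counts and the names used are assumptions and may not match the author's exact configuration.

```python
# Assumed identity-block counts per stage, as in the standard ResNet-50.
IDENTITY_BLOCKS = {"stage2": 2, "stage3": 3, "stage4": 5, "stage5": 2}
CONVS_PER_BLOCK = 3  # bottleneck block: 1x1, 3x3, 1x1 convolutions


def count_weighted_layers(identity_blocks, convs_per_block=CONVS_PER_BLOCK):
    """Count layers by the usual convention (shortcut projections are not counted)."""
    stem = 1        # the single convolution of stage 1
    classifier = 1  # the final fully connected layer
    # Each stage has one convolutional block plus its identity blocks.
    blocks = sum(1 + n for n in identity_blocks.values())
    return stem + blocks * convs_per_block + classifier


total_layers = count_weighted_layers(IDENTITY_BLOCKS)
print(total_layers)  # 1 + 16 * 3 + 1 = 50
```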
ResNet is a type of Convolutional Neural Network (CNN). The CNN was originally proposed by LeCun in 1989. Classic CNN architectures include AlexNet, LeNet, and VGGNet [12] [13] [14]. The basic unit of a CNN is the neuron, which receives information from and passes information to other neurons. Each neuron applies a linear function followed by an activation function to the information it receives from its input nodes. Learning is driven by the back-propagation procedure [15], and a loss function measures the difference between the expected values and the actual outputs. Equations (1), (2), and (3) give the linear function, the activation function, and the loss function of the CNN, respectively:
Figure 4. The image of regular chromosomes 9 and 22 after preprocessing (left) and that of translocated chromosomes 9 and 22 (right).
(1)
(2)
(3)
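The three functions of a neuron described above can be illustrated with a small numeric sketch. It assumes a ReLU activation and a softmax cross-entropy loss, which are common choices for classification; the paper's own loss function is not shown here and may differ. All names and values are illustrative.

```python
import math


def linear(weights, inputs, bias):
    """Linear function: weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def relu(z):
    """Activation function: ReLU passes positive values and zeroes the rest."""
    return max(0.0, z)


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_class):
    """Loss function (assumed): negative log-probability of the correct class."""
    return -math.log(probabilities[true_class])


z = linear([0.5, -1.0], [2.0, 1.0], 0.25)  # 1.0 - 1.0 + 0.25
a = relu(z)
probs = softmax([a, 0.0])                  # two classes: normal vs. t(9;22)
loss = cross_entropy(probs, true_class=0)
print(z, a, probs, loss)
```

During back-propagation, the gradient of this loss with respect to each weight is what drives the weight updates.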
Unlike other traditional neural networks, the CNN also has a convolutional layer and a pooling layer for feature extraction. The details for these two layers are described below:
1) Convolutional layer: Convolutional layers perform a convolution operation to extract features from images. Convolution is a mathematical operation on two real-valued functions and is a key component of the CNN. The initial convolutional layers of a CNN extract low-level features, which in my experiment may be the edges or shapes of chromosomes. Later convolutional layers extract more complex features from each image. A sophisticated feature map is obtained by stacking multiple layers of convolution operations. The mathematical representation of a convolutional layer is as follows:
(4)
2) Pooling layer: The pooling layer follows the convolutional layer and compresses its input so that only the main features are kept, which makes the feature map smaller and reduces the computational cost. Common pooling layers include the Max Pooling layer and the Average Pooling layer. The mathematical model of a pooling layer is given in Equation (5):
(5)
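The two layers described above can be sketched in a few lines of plain Python. This is an illustrative implementation on small hand-made arrays, not the TensorFlow layers used in the model; the function names are hypothetical, and the convolution is written without kernel flipping, as deep learning frameworks usually implement it.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def pool2d(feature_map, size, reducer):
    """Reduce each non-overlapping size x size window to one value with `reducer`."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    pooled = []
    for r in range(rows):
        pooled_row = []
        for c in range(cols):
            window = [
                feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size)
            ]
            pooled_row.append(reducer(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# A dark-to-bright vertical edge produces a response only at the edge column.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edge_kernel = [[-1, 1]]
response = conv2d_valid(image, edge_kernel)

# The top-right window holds a single bright pixel; the top-left is fully bright.
feature_map = [
    [8, 8, 0, 0],
    [8, 8, 0, 8],
    [0, 0, 8, 8],
    [0, 0, 8, 8],
]
max_pooled = pool2d(feature_map, 2, max)
avg_pooled = pool2d(feature_map, 2, mean)
print(response, max_pooled, avg_pooled)
```

In this toy feature map, max pooling returns 8 for both top windows, while average pooling returns 8.0 for the fully bright window and 2.0 for the window with one bright pixel, so only average pooling separates the two cases.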
A CNN acquires the ability to detect the t(9;22) chromosome translocation by training on existing images of the abnormality. Current CNNs usually use the ReLU activation function to improve the generalization ability of the model, a practice that has been shown to perform well [16]. I also used ReLU in my model.
Kaiming He et al. first proposed the Residual Network (ResNet) [17]. The 152-layer ResNet model they trained achieved a top-5 accuracy of 96.43% at ILSVRC 2015 (ImageNet Classification) [18]. The main idea of ResNet is similar to that of the Highway Network [19]. Unlike a traditional network structure, which only applies non-linear transformations, the Highway Network lets the model carry forward a portion of the previously computed output. ResNet has a similar mechanism: it includes a directly connected channel so that the input can be passed directly to later layers.
If a layer receives an input x, the feature it is expected to learn is denoted H(x). The residual that I want the layer to learn is then F(x) = H(x) − x, so the original feature becomes F(x) + x. In theory, if the residual approaches zero, the layer simply passes its input on to the next layer as an identity mapping, without any loss of accuracy. In practice the residual is not zero, and the nonzero residual lets the layer learn additional features on top of its input, which improves the performance of the model. Figure 6 shows the general composition of ResNet.
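The relation H(x) = F(x) + x can be checked with a small sketch. The two residual functions below are hypothetical stand-ins for what a layer might learn; they are chosen only to show the identity case and a small correction.

```python
def residual_block(x, residual_fn):
    """Return H(x) = F(x) + x for a vector x, where residual_fn computes F."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    """F(x) = 0: the block reduces to an identity mapping."""
    return [0.0 for _ in x]


def small_residual(x):
    """F(x) = 0.1 * x: the block adds a small correction to its input."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)
adjusted_out = residual_block(x, small_residual)
print(identity_out, adjusted_out)
```

With a zero residual the output equals the input, which is why adding such a block cannot, in principle, make a deeper network worse than its shallower counterpart.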
ResNet uses residual learning to reduce the effect of vanishing gradients as more layers are added to the model. With residual learning, the network becomes more sensitive to small changes. The residual function usually has small responses and is paired with a shortcut connection. As Figure 6 shows, the shortcut connection spans two layers, which are described by the following formulas, where σ represents a nonlinear function.
1)
(6)
The result is then added to the shortcut and passed through a second ReLU function to obtain the output y, as described below:
2)
(7)
When the dimensions of the input and output differ, for example when the number of channels changes, a linear projection can be applied to x on the shortcut, as described below.
(8)
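Equations (6) to (8) describe the building block of the original ResNet paper [17]. In that paper's notation, which is assumed here and may differ from the symbols intended by the author, W_1 and W_2 are the weights of the two stacked layers, W_s is the projection applied on the shortcut, σ is the nonlinearity, and biases are omitted:

```latex
\begin{align}
F &= W_2\,\sigma(W_1 x) \tag{6} \\
y &= F(x, \{W_i\}) + x \tag{7} \\
y &= F(x, \{W_i\}) + W_s x \tag{8}
\end{align}
```

In this form the second nonlinearity is applied after the addition, so the block output is σ(y).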
5. Experimental Result
Training was carried out on a computer running Ubuntu 16.04 with 32 GB of memory and an Nvidia GTX 1080 Ti GPU. After tuning the parameters, I trained the network for twenty epochs. The results of the four trials I conducted are presented in Table 1. The model achieved an average accuracy of 97.5% on the validation set.
Figure 7 shows how accuracy and loss change on the training and validation sets. The horizontal axis represents the number of epochs, from zero to twenty, and the vertical axis represents accuracy from 0% to 100%. The green and black curves in Figure 7 represent the loss; they approach zero, which indicates a reliable model. The blue and red curves represent accuracy; they level off near 1, which reflects the accuracy of my model.
Figure 7. Loss graphs. (a) The result for trial 1; (b) The result for trial 2; (c) The result for trial 3; (d) The result for trial 4.
Figure 7 shows that after 20 epochs the model reaches an accuracy near 100% and a loss near 0, except in trial 2, where the training loss curve fails to approach 0. Based on the trend of all four curves, the model shows no obvious over-fitting and is effective in detecting the t(9;22) chromosomal translocation.
6. Conclusions and Future Work
6.1. Conclusion
Detecting chromosomal abnormalities effectively has great clinical significance. It is crucial to detect genetic abnormalities in patients at the earliest stage so that effective treatment can begin in time. Currently, most detection tasks must be performed manually by professionals and are time-consuming. Although some chromosome classification programs exist, such attempts are limited and have not focused on detecting specific chromosomal abnormalities. In this paper I trained a Convolutional Neural Network, a type of model that has shown strong performance in image recognition and classification tasks in many fields, to detect a specific chromosomal abnormality, the t(9;22) chromosome translocation. My work shows the great potential of artificial neural networks in clinical diagnosis, especially in detecting chromosomal abnormalities.
6.2. Future Work
In recent years, deep neural networks have gained increasing attention. My work contributes to research on deep learning for chromosome abnormality diagnosis and provides a new approach to this problem. However, there remain areas for further development. I think the following can contribute to future studies:
1) Increasing the number of samples to further increase this model’s ability in detecting abnormalities.
2) Improving image enhancement, filtering, and chromosome segmentation to provide more distinct samples for training artificial neural networks.
3) Examining the performance of ResNet in detecting other types of chromosomal abnormalities.
4) Examining the performance of other artificial neural networks on chromosome abnormality diagnosis.
I hope that, as I analyze and address these and other remaining areas and apply these technologies to chromosome abnormality diagnosis, the efficiency of detecting such abnormalities will continue to improve.
Acknowledgements
I thank Ms. Emily Tucci and Mr. Jaffe, my teachers in the school science research program, for their constant encouragement and support. I also thank Dr. Liu Ping for teaching me the basics of cytogenetics and for supplying the raw chromosome images.