Deep Learning for Osteoporosis Classification using Convolutional Neural Network on Knee Radiograph Dataset to Compare Classification Accuracy between RGB Images and Grayscale Images

Journal of Research in Medical and Dental Science
eISSN No. 2347-2367 pISSN No. 2347-2545


Research - (2023) Volume 11, Issue 8


Usman Bello Abubakar1, Moussa Mahamat Boukar2, Steve Adeshina3 and Senol Dane4*

*Correspondence: Senol Dane, Department of Physiology, Nile University of Nigeria, Nigeria, Email:



Abstract

Osteoporosis is a silent disease of the elderly that causes bone fragility and fractures, and an early, precise diagnosis can save a patient's life. Osteoporosis, together with its clinical manifestation, bone fracture, is a complex illness that has attracted a great deal of research. Advances in Machine Learning (ML) and Deep Learning (DL) have allowed Artificial Intelligence (AI) to make significant progress in complicated data environments where the human capacity to find high-dimensional relationships is limited. This work determines the more suitable image format for a deep learning model that predicts osteoporosis from knee radiographs, comparing RGB images against grayscale images. Two Convolutional Neural Networks (convnets) of the same structure were trained on the same set of images, one in RGB and one in grayscale. The results showed that the convnet trained on grayscale images achieved a higher accuracy than the convnet trained on RGB images.


Keywords

Osteoporosis, Fracture risk assessment tool, Dual-energy X-ray absorptiometry, Bone mineral density


Introduction

Osteoporosis is a complex disease in which bone quantity and quality decline, increasing bone fragility. Its clinical result, fracture, affects 8.9 million people globally every year, posing a significant health, societal, and economic burden [1]. The bone, which gives shape to the entire body structure, is made up of tissues composed of organic and inorganic elements. Bone remodeling is the process of maintaining a healthy skeleton by replacing old bone with new bone. Bone loss occurs when a smaller volume of bone is replaced than the volume removed. This loss of bone tissue causes the skeletal architecture to become disorganized, making the bone fragile and increasing the risk of fracture. The imbalance that causes such fragility is most common in people above the age of 80 and in menopausal women [2-4].

The measurement of Bone Mineral Density (BMD) by Dual-energy X-ray Absorptiometry (DXA) is used to detect osteoporosis. BMD testing is the most dependable method of screening for osteoporosis. BMD is the quantity of bone mass per unit area (or volume): it is calculated by dividing the Bone Mineral Content (BMC) by the bone area, as shown in the equation below [5].

BMD (g/cm²) = BMC (g) / bone area (cm²)


The examinations are generally performed on skeletal locations, such as the lumbar spine and the femoral neck [6]. The most common procedure for confirming an osteoporosis diagnosis is DXA. The World Health Organization (WHO) recommends that BMD measures be taken primarily at the hip and spine. Based on BMD measurements obtained by DXA, T-scores and Z-scores as reported by BMD test results have been defined by WHO and National Osteoporosis Foundation (NOF) as osteoporosis criteria [7].

However, this technique does not take into account clinical risk factors or other bone metrics such as trabecular bone score, geometry, and so on. The main clinical risk factors for osteoporotic fractures and altered bone metabolism are old age, female gender, ethnicity, inheritance, previous fracture, malnutrition, alcohol intake, current smoking, vitamin D deficiency, physical inactivity, various medicines, and medical disorders. The Fracture Risk Assessment Tool (FRAX) is the most widely used and well-established clinical approach for fracture prediction. It is founded on a number of previously established clinical risk factors, as well as BMD [8]. As indicated in Figure 1, osteoporosis is a complicated disorder that can be linked to a number of factors, including demographic information, patient clinical records pertaining to disease diagnoses and treatments, family history, diet, and lifestyle [9].


Figure 1. Osteoporosis Risk Factors.

Artificial Intelligence (AI) has been employed as a screening tool and as an adjunct technology for image classification [10]. According to a review study published in 2019, recent advances in AI have resulted in practical applications that aid in the diagnosis of osteoporosis [11]. Osteoporosis has been classified with AI using dental radiographs [12,13], spine radiographs [14], and hand and wrist radiographs [15,16].

Literature Review

Wrapper-based feature selection was used to compare classification algorithms for osteoporosis prediction. Multilayer Feed-forward Neural Networks (MFNN) [17], Naïve Bayes, and logistic regression were employed as classification techniques. To find a subset of important SNPs, a wrapper-based feature selection method was applied. The MFNN model with the wrapper-based method was found to be the best predictive model for inferring disease risk in Taiwanese women based on the complex relationship between osteoporosis and SNPs. According to the findings, patients and doctors can use the proposed technology to improve decision-making based on clinical factors such as SNP genotyping data.

The study proposed by [18] investigates whether using clinical data in conjunction with deep learning to diagnose osteoporosis from hip radiographs improves diagnostic performance over using the images alone. Between 2014 and 2019, the authors obtained 1131 images for objective labeling from individuals who had skeletal bone mineral density testing as well as hip radiography at the same medical hospital. Five Convolutional Neural Network (CNN) models were used to assess osteoporosis from hip radiographs. They also investigated ensemble models that incorporated clinical characteristics into each CNN. The following evaluation metrics were computed for each network:

Accuracy,

Precision,

Recall,

Specificity,

Negative Predictive Value (NPV),

F1 score, and

Area Under the Curve (AUC).

The five CNN models were first tested using only hip radiographs, and GoogLeNet and EfficientNet-b3 had the highest accuracy, precision, and specificity. When patient variables were incorporated, EfficientNet-b3 had the highest accuracy, recall, NPV, F1 score, and AUC among the five ensemble models. Thus, adding clinical factors from patient records further improved the CNN models' ability to detect osteoporosis from hip radiographs.

Deep learning techniques for screening osteoporosis with Dental Panoramic Radiograph (DPR) images were created and assessed by Lee KS, et al. [19]. This study tested various CNN models for osteoporosis discrimination accuracy using panoramic radiograph images classified by BMD value (T-score). The impact of transfer learning and fine-tuning a deep CNN model on classification performance was also investigated. Deep CNNs have proven to be effective image classification tools, but they require a huge quantity of training data, which may make them challenging to apply to medical radiographic image data. Transfer learning is a powerful approach for training deep CNNs without overfitting when the target dataset is significantly smaller than the base dataset. The most common method for transfer learning is to use pretrained models in a two-step process: the first n layers of a base network pretrained on a large general dataset are copied to the first n layers of a target network, and then the remaining layers of the target network are randomly initialized and trained toward the target task on a small local dataset.
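The two-step scheme can be sketched in a framework-agnostic way: treat a network as a list of weight arrays, copy the first n from the pretrained model, and randomly initialize the rest. The function name and shapes below are illustrative, not taken from the study:

```python
import numpy as np

def transfer_init(pretrained, target_shapes, n_copied, seed=0):
    """Copy the first n_copied weight arrays from a pretrained model;
    randomly initialize the remaining layers of the target network."""
    rng = np.random.default_rng(seed)
    layers = []
    for i, shape in enumerate(target_shapes):
        if i < n_copied and pretrained[i].shape == shape:
            layers.append(pretrained[i].copy())          # transferred weights
        else:
            layers.append(rng.normal(0.0, 0.01, shape))  # fresh random weights
    return layers

# Toy "pretrained" network with three layers:
base = [np.ones((4, 4)), np.ones((4, 2)), np.ones((2, 1))]
target = transfer_init(base, [(4, 4), (4, 2), (2, 1)], n_copied=2)
```

In practice the copied layers are often frozen during the first epochs of fine-tuning and later unfrozen at a lower learning rate.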

Materials and Methods

Data Acquisition

The dataset comprises 57 normal knee radiograph images and 323 osteoporotic knee radiograph images acquired from Kaggle, an online deep learning data repository. Table 1 below shows the splitting of the image data into train, validation, and test folders. This project structure is recommended in order to properly train the deep learning model.

Category          Total (T)   Training (0.8×0.8×T)   Validation (0.2×0.8×T)   Testing (0.2×T)
0 (Normal)        57          36                     9                        11
1 (Osteoporosis)  323         207                    52                       65

Table 1: Image Distribution.
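The 80/20 split applied twice (first carving out the test set, then the validation set from the remainder) can be sketched as follows. The seed is illustrative, and the per-class counts in Table 1 reflect the authors' own rounding, which can differ by one from any particular implementation:

```python
import random

def split_indices(n, test_frac=0.2, val_frac=0.2, seed=42):
    """Shuffle n indices, hold out test_frac for testing,
    then val_frac of the remainder for validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(test_frac * n)
    test, rest = idx[:n_test], idx[n_test:]
    n_val = round(val_frac * len(rest))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

train, val, test = split_indices(323)   # osteoporotic class
```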

Data Augmentation

Overfitting can be reduced by using a technique called data augmentation. When a machine learning model or neural network learns a function with a very large variance in order to perfectly model the training data, this is known as overfitting.

Image data augmentation is a technique for artificially increasing the size of a training dataset by making modified versions of the images in the dataset. Deep learning neural network models that have been trained on more data are more accurate, and augmentation approaches can provide image variations that boost the models' ability to generalize what they've learned to new images [22].

For this study, the Keras ImageDataGenerator Python class was used to perform data augmentation. It allows images to be quickly and easily augmented using a variety of techniques, itemized below:

Rotations,

Horizontal and vertical shifts,

Horizontal and vertical flips,

Zooms, and

Brightness changes, among others.

The key advantage of utilizing the Keras ImageDataGenerator class, however, is that it is intended to perform real-time data augmentation: at each epoch, the class supplies the model with new variations of the images. The ImageDataGenerator configuration used for our data augmentation is shown in Figure 2.


Figure 2. Data Augmentation.
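Conceptually, each augmentation the generator applies is a simple array operation on the image tensor. A minimal NumPy illustration of two such operations, flip and brightness jitter, is given below; the probability and scale range are illustrative, not the study's settings:

```python
import numpy as np

def random_flip(img, rng):
    """Flip the image left-right with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_brightness(img, rng, low=0.8, high=1.2):
    """Scale pixel intensities by a random factor, clipped to [0, 255]."""
    factor = rng.uniform(low, high)
    return np.clip(img * factor, 0, 255)

rng = np.random.default_rng(7)
img = np.full((4, 4), 100.0)          # toy 4x4 grayscale image
aug = random_brightness(random_flip(img, rng), rng)
```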

Image Scaling

The pixel values in most image data are integers ranging from 0 to 255. Inputs with large integer values can disrupt or slow down the learning process in neural networks, which work best with small input and weight values. As a result, image normalization is a recommended practice: scale the pixel values so that each lies between 0 and 1.

Image pixel values are normalized by dividing all pixel values by the largest possible pixel value, which is 255. This is done across all channels, regardless of the image's actual range of pixel values. The images were normalized for numerical stability: normalization makes convergence of the neural network more likely and the gradient descent algorithm far more stable.

The images in the dataset were normalized (rescaled) using the ImageDataGenerator class by passing rescale=1./255 as its argument.

Image Resizing

In computer vision, resizing images is an important preprocessing step. Machine learning algorithms, in general, train quicker on smaller images. When an input image is twice as large, a network must learn from four times as many pixels, which takes time. Furthermore, the designs of deep learning models demand that our images be the same size, and most raw collected images vary in size.

The images in our dataset are relatively large and vary in size. Such inconsistency would reduce the accuracy of our network, so the images all need to share the same height and width. A practical way to deal with differently sized images is to downscale them to the dimensions of the smallest image available.
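Downscaling can be illustrated with a nearest-neighbour index map. In practice a library resizer such as Pillow's Image.resize or OpenCV's cv2.resize would be used; this sketch only shows the idea:

```python
import numpy as np

def nearest_resize(img, out_h, out_w):
    """Downscale a 2-D image array by sampling the nearest source pixel."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]

img = np.arange(16).reshape(4, 4)
small = nearest_resize(img, 2, 2)
```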

Image Data Format

The two image formats considered in this study are RGB and grayscale. The dataset consists of images in RGB format. An RGB (red, green, blue) image is a three-dimensional byte array in which each pixel's color value is explicitly stored. RGB image arrays are made up of three color channels in addition to width and height [23]. As demonstrated in Figure 3, an RGB image can be regarded logically as three separate images (a red-scale image, a green-scale image, and a blue-scale image) stacked on top of each other.


Figure 3. RGB Image Representation.

An image in RGB format increases the complexity of the model. This is why grayscale images are often preferred over colored ones: it is mathematically simpler to work with a single intensity channel (shades of black and white) than with three color channels (shades of red, green, and blue).

A grayscale image is a monochrome, single-channel image. As illustrated in Figure 4, it provides only brightness information and no color information. The values of the grayscale data matrix indicate intensities. A common image has 8 bits per pixel, allowing it to represent 256 distinct brightness (gray) levels, 0-255 [24]. Figure 5 shows the code that was used to iteratively convert the images to grayscale.


Figure 4. Grayscale Image Representation.


Figure 5. RGB to Grayscale Python Code.

In Figure 5, the variables source path and destination path serve as the path to the RGB images and the path at which to store the new grayscale images, respectively. The code in the figure converts the images in the train folder for osteoporosis cases; the same code was executed for the images in the test and val folders as appropriate. Figure 6 shows an RGB image on the left and its corresponding grayscale equivalent on the right.


Figure 6. RGB and Grayscale Equivalent.
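Since Figure 5 is reproduced as an image, the conversion itself can be sketched numerically: a luminosity-weighted sum over the three channels. The ITU-R BT.601 weights below are the usual convention (e.g. in Pillow's convert('L')); the study's exact call is not shown here:

```python
import numpy as np

def rgb_to_gray(img):
    """Convert an H x W x 3 RGB array to H x W grayscale
    using the ITU-R BT.601 luminosity weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return img @ weights   # weighted sum over the channel axis

rgb = np.zeros((2, 2, 3))
rgb[..., 0] = 255          # toy pure-red image
gray = rgb_to_gray(rgb)
```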

CNN Model Architecture

The most widely used deep learning architecture is the Convolutional Neural Network (CNN). CNNs are mostly utilized in image-related tasks; their popularity has grown as a result of their capacity to extract essential patterns from images without human involvement. For example, given a dataset of dog and cat images, a CNN trained on enough examples can automatically determine whether a given image shows a dog or a cat. This is possible because the network extracts important features, layer by layer, from the pixel information of the image. A CNN is efficient at image classification and is composed of convolution, pooling, and fully connected layers.

Two CNNs of the same proposed architecture were implemented: one for RGB images and another for grayscale images. The architectures differed only in the input layer, due to the difference in the number of channels between RGB and grayscale. Images of input shape 1000 x 1000 were fed into the network, and its output was the conditional probability distribution p(Y|X) over the two categories, osteoporosis (1) and normal (0). The network is made up of three convolutional layers, each followed by a max pooling layer and a ReLU activation. In the grayscale network the convolution kernels were 5 x 5, and in the RGB network they were 5 x 5 x 3. Two fully connected layers, ending in a sigmoid function, followed the final convolutional layer.
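The paper does not state the stride or padding, so purely as an illustration, assuming 'valid' convolutions with stride 1 and 2 x 2 non-overlapping max pooling, the spatial size of a 1000 x 1000 input shrinks through the three conv + pool stages as follows:

```python
def conv_out(n, k):
    """Output size of a 'valid' convolution with kernel k, stride 1."""
    return n - k + 1

def pool_out(n, p=2):
    """Output size of non-overlapping p x p max pooling."""
    return n // p

size, trace = 1000, []
for _ in range(3):                 # three conv + pool stages, 5x5 kernels
    size = conv_out(size, 5)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)
```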

K-Fold Cross Validation

Cross-validation is a technique used to prevent overfitting in a prediction model, and it is especially useful when the amount of available data is limited. In cross-validation, a value for K is chosen and the dataset is divided into K equal-sized partitions. After splitting, K-1 partitions are used for training and one for testing. This procedure is repeated K times, each time with a distinct partition used for testing. This study uses 5-fold cross-validation.
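The partitioning scheme can be sketched in plain Python (scikit-learn's KFold provides the same behaviour; when K does not divide the dataset evenly, fold sizes differ by at most one):

```python
def k_fold_splits(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(idx[start:start + s])
        start += s
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_splits(380, k=5))   # 380 = total images in the dataset
```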


Results

Model Evaluation on RGB Images

The CNN was tested on unseen data using the Keras evaluate function, which was passed a single argument: the path to the folder containing the test images. The function returned a test loss of 52% and a test accuracy of 83%, as seen in Figure 7. During model training on the RGB images, a plot of accuracy on the training and validation datasets over the training epochs was produced, as shown in Figure 8.


Figure 7. Model test accuracy and test loss on RGB Images.


Figure 8. Graph of model accuracy on RGB Images.

Model Evaluation on Grayscale Images

The CNN was tested on unseen data using the Keras evaluate function, which was passed a single argument: the path to the folder containing the test images. The function returned a test loss of 60% and a test accuracy of 86%, as seen in Figure 9. During model training on the grayscale images, a plot of accuracy on the training and validation datasets over the training epochs was produced, as shown in Figure 10.


Figure 9. Model test accuracy and test loss on Grayscale Images.


Figure 10. Model accuracy on Grayscale Images.

Confusion Matrix

A confusion matrix is a table that summarizes the predictions of a classification model. Count values break down the numbers of correct and incorrect predictions by class, showing the different ways in which the network model gets confused while making predictions. The confusion matrices for the convnets, generated with the Python scikit-learn package as in the snippet in Figure 11, are presented in Figures 12 and 13.


Figure 11. Confusion Matrix code.


Figure 12. Confusion Matrix for RGB Convent.


Figure 13. Confusion Matrix for Grayscale Convent.
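The scikit-learn call behind such a matrix follows the pattern below; the label vectors are toy values, not the study's predictions:

```python
from sklearn.metrics import confusion_matrix

# Toy ground-truth and predicted labels: 0 = normal, 1 = osteoporosis
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0, 1]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
```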

Scikit-learn Classification Report on the Grayscale Convnet


Precision is the ratio between the true positives and all predicted positives, Precision = TP / (TP + FP). In Figure 14, it measures how many of the patients the model identifies as having osteoporosis actually have the disease.


Figure 14. Classification Report Table.


Recall is the measure of the model correctly identifying true positives, Recall = TP / (TP + FN). Thus, of all the patients who actually have osteoporosis, recall tells us how many the model correctly identified as having the disease.


The F1 score combines precision and recall: it is their harmonic mean, F1 = 2 x (Precision x Recall) / (Precision + Recall).
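These three definitions reduce to a few lines of arithmetic on the confusion-matrix counts; the counts below are toy values:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy counts: 3 true positives, 1 false positive, 1 false negative
p, r, f1 = precision_recall_f1(tp=3, fp=1, fn=1)
```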


Conclusion

This study showed that CNNs can detect osteoporosis from knee radiographs with a reasonable degree of accuracy. The convnet applied to grayscale images performed slightly better than the convnet applied to RGB images: 86% accuracy on grayscale versus 83% on RGB.

The slight increase is due to the fact that a grayscale image has fewer channels than RGB, so RGB increases the complexity of the network model. In addition, although RGB images are composed of three channels, knee radiographs are essentially monochrome, so the extra color channels carry little additional information. The difference in accuracy is small, but in deep learning for medical diagnosis any improvement in model accuracy is significant. The convnets were trained using the GPU on Google Colab.


References

1. Johnell O, Kanis JA. An estimate of the worldwide prevalence and disability associated with osteoporotic fractures. Osteoporos Int 2006; 17:1726-33.
2. Kanis JA. Assessment of fracture risk and its application to screening for postmenopausal osteoporosis: Synopsis of a WHO report. Osteoporos Int 1994; 4:368-81.
3. Looker AC, Orwoll ES, Johnston Jr CC, et al. Prevalence of low femoral bone density in older US adults from NHANES III. J Bone Miner Res 1997; 12:1761-8.
4. Wahner HW, Looker A, Dunn WL, et al. Quality control of bone densitometry in a national health survey (NHANES III) using three mobile examination centers. J Bone Miner Res 1994; 9:951-60.
5. Devikanniga D. Diagnosis of osteoporosis using intelligence of optimized extreme learning machine with improved artificial algae algorithm. Int J Intell Syst 2020; 1:43-51.
6. Klibanski A, Adams-Campbell L, Bassford T, et al. Osteoporosis prevention, diagnosis, and therapy. J Am Med Assoc 2001; 285:785-95.
7. Devikanniga D. Diagnosis of osteoporosis using intelligence of optimized extreme learning machine with improved artificial algae algorithm. Int J Intell Netw 2020; 1:43-51.
8. Kanis JA, Johansson H, Oden A, et al. Assessment of fracture risk. Eur J Radiol 2009; 71:392-7.
9. Abubakar UB, Boukar MM, Adeshina S. Evaluation of parameter fine-tuning with transfer learning for osteoporosis classification in knee radiograph. Int J Adv Comput Sci Appl 2022; 13.
10. Lee S, Choe EK, Kang HY, et al. The exploration of feature extraction and machine learning for predicting bone density from simple spine X-ray images in a Korean population. Skelet Radiol 2020; 49:613-8.
11. Ferizi U, Honig S, Chang G. Artificial intelligence, osteoporosis and fragility fractures. Curr Opin Rheumatol 2019; 31:368.
12. Hwang JJ, Lee JH, Han SS, et al. Strut analysis for osteoporosis detection model using dental panoramic radiography. Dentomaxillofac Radiol 2017; 46:20170006.
13. Lee KS, Jung SK, Ryu JJ, et al. Evaluation of transfer learning with deep convolutional neural networks for screening osteoporosis in dental panoramic radiographs. J Clin Med 2020; 9:392.
14. Dimai HP, Ljuhar R, Ljuhar D, et al. Assessing the effects of long-term osteoporosis treatment by using conventional spine radiographs: Results from a pilot study in a sub-cohort of a large randomized controlled trial. Skelet Radiol 2019; 48:1023-32.
15. Areeckal AS, Jayasheelan N, Kamath J, et al. Early diagnosis of osteoporosis using radiogrammetry and texture analysis from hand and wrist radiographs in Indian population. Osteoporos Int 2018; 29:665-73.
16. Tecle N, Teitel J, Morris MR, et al. Convolutional neural network for second metacarpal radiographic osteoporosis screening. J Hand Surg 2020; 45:175-81.
17. Chang HW, Chiu YH, Kao HY, et al. Comparison of classification algorithms with wrapper-based feature selection for predicting osteoporosis outcome based on genetic factors in a Taiwanese women population. Int J Endocrinol 2013.
18. Yamamoto N, Sukegawa S, Kitamura A, et al. Deep learning for osteoporosis classification using hip radiographs and patient clinical covariates. Biomolecules 2020; 10:1534.
19. Lee KS, Jung SK, Ryu JJ, et al. Evaluation of transfer learning with deep convolutional neural networks for screening osteoporosis in dental panoramic radiographs. J Clin Med 2020; 9:392.
20. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. 2014.
21. Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data 2019; 6:1-48.
23. Padmavathi K, Thangadurai K. Implementation of RGB and grayscale images in plant leaves disease detection - comparative study. Indian J Sci Technol 2016; 9:1-6.
24. Cushing S. My revision notes Edexcel GCSE computer science. Hachette UK; 2015.

Author Info

Usman Bello Abubakar1, Moussa Mahamat Boukar2, Steve Adeshina3 and Senol Dane4*

1Department of Computer Science, Baze University, Abuja, Nigeria
2Department of Computer Science, Nile University of Nigeria, Abuja, Nigeria
3Department of Computer Engineering, Nile University of Nigeria, Abuja, Nigeria
4Department of Physiology, Nile University of Nigeria, Abuja, Nigeria

Citation: Usman Bello Abubakar, Moussa Mahamat Boukar, Steve Adeshina, Senol Dane, Deep Learning for Osteoporosis Classification Using Convolutional Neural Network on Knee Radiograph Dataset to Compare Classification Accuracy between RGB Images and Grayscale Images, J Res Med Dent Sci, 2023, 11(8):28-34.

Received: 26-Jun-2023, Manuscript No. jrmds-22-58059; Accepted: 29-Jun-2023, Pre QC No. jrmds-22-58059; Editor assigned: 29-Jun-2023, Pre QC No. jrmds-22-58059; Reviewed: 13-Jul-2023, QC No. jrmds-22-58059; Revised: 18-Jul-2023, Manuscript No. jrmds-22-58059; Published: 25-Aug-2023