Personal Assistant Development by CED (Canine Eye-disease Detection)

ABSTRACT


INTRODUCTION
Artificial Intelligence (AI) has a well-documented history of discriminating diseases in the human medical domain [1][2][3][4][5][6][7][8]. However, the progress of AI in the veterinary domain has been impeded by limitations in veterinary image datasets [9]. Obtaining consent from animal patients poses challenges, and veterinarians often refrain from capturing images during visits due to the hectic hospital environment. Despite the scarcity of accessible data, there exists substantial demand for deep-learning (DL) applications among both veterinarians and pet owners, offering potential benefits across various aspects [10].
In light of these challenges, DL applications have the potential to enhance the accuracy and efficiency of diagnosis, ultimately leading to timely interventions and improved outcomes for animal patients. This study specifically focuses on achieving application in daily life, aiming to accurately diagnose diseases using casual images obtained by pet owners, despite the majority of training images being captured in hospitals.
There has been a growing focus on the exploration of DL algorithms applied to the classification of eye-related diseases through image analysis. For instance, Junayed et al. introduced CataractNet, a deep neural network designed for the automatic detection of cataracts in fundus images [11]. Their proposed network demonstrates superior performance compared to existing methodologies, achieving an average accuracy of 99.13% while simultaneously reducing computational cost and runtime. Li et al. presented a DL system aimed at classifying keratitis, various corneal abnormalities, and normal corneas based on slit-lamp images [12]. Christopher identified glaucomatous damage in optic nerve head (ONH) fundus images [13]. Aranha's study encompassed cataracts, diabetic retinopathy, excavation, and blood vessels; however, individual binary classification networks were utilized to differentiate between normal and abnormal images [14]. The majority of research in the field of ophthalmological image analysis has been confined to a select subset of diseases such as cataracts, corneal diseases, and glaucoma [11]. In contrast, our approach utilizes a unified model capable of identifying multiple diseases manifesting in distinct anatomical locations, and applies it on mobile devices to serve as a personal assistant.
ISSN: 2089-3272 IJEEI, Vol. 11, No. 4, December 2023: 1129-1142
In deep learning, both MobileNet and SqueezeNet have been specifically devised to address the challenge of deploying deep learning models on mobile and embedded devices characterized by restricted computational resources. MobileNet achieves this by implementing depth-wise separable convolutions, thereby reducing both the model's size and computational demands. Conversely, SqueezeNet attains a compact model size through a judicious combination of 1x1 and 3x3 convolutions. While MobileNet generally exhibits superior accuracy compared to SqueezeNet, the latter effectively strikes a balance between model size and accuracy. In scenarios where the priority lies in minimizing model size and computational resources, SqueezeNet emerges as the apt choice, particularly for applications marked by constrained memory and processing capacities.
In this paper, we apply a deep learning-based electronic dog eye disease diagnostic technology and develop a pet health management system using it. Recently, deep learning has been applied in various fields, and health diagnosis is one of the major application areas. Especially with the rapid increase in the number of pet dogs, managing their health has become crucial. However, it is challenging for non-experts to do so. While visiting an animal hospital is an option, it may be difficult to determine when an illness has occurred, and if there is no hospital nearby, there may be no basis for judgment. By utilizing lightweight deep learning methods such as MobileNet or SqueezeNet, we make it possible to apply the technology to mobile devices. This allows individuals to regularly check their pet's eye health, and if a problem is detected, they can be promptly guided to a nearby hospital for necessary measures, enabling them to maintain their pet's health. The validity of the developed method was demonstrated through experiments on five eye diseases. The results confirmed how sample size and the choice of deep learning method affect recognizability, based on the recognition success rate, and established an appropriate benchmark model.

DEEP LEARNING FOR MOBILE DEVICES AND PERSONAL ASSISTANT

2.1. Deep Learning for Mobile Devices: CNN
The network architecture design refers to the fundamental framework of the network, which delineates the quantity of units and the interconnections among distinct groups of units, typically referred to as layers.
The architecture of a Convolutional Neural Network (CNN) encompasses the arrangement of convolutional, activation, pooling, and additional layers within a unified structure. It prescribes the overarching arrangement of the complete CNN, addressing inquiries pertaining to the total number of layers, the specific unit quantities within each layer, and the manner in which these units are interconnected.
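The layer arrangement above determines the spatial dimensions that flow through the network. As a minimal illustration (using the standard output-size formula, not the paper's specific model), this can be sketched in Python:

```python
def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (in_size - kernel + 2 * padding) // stride + 1

# Example: a 28x28 input through a 3x3 convolution (stride 1, no padding),
# followed by a 2x2 max pooling (stride 2).
after_conv = conv_output_size(28, kernel=3)                     # 26
after_pool = conv_output_size(after_conv, kernel=2, stride=2)   # 13
```

Repeating convolution and pooling in this way progressively shrinks the spatial size while the channel count typically grows.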

MobileNet and SqueezeNet
MobileNet and SqueezeNet are two prominent convolutional neural network (CNN) architectures designed specifically for efficient and lightweight deep learning on devices with limited computational resources. Despite sharing the common objective of minimizing computational and memory demands compared to conventional CNNs, they employ distinct methodologies to achieve this aim. This comparative analysis delves into the principal characteristics, architectural disparities, performance metrics, and practical applications of MobileNet and SqueezeNet.
Introduced by Howard et al. in 2017 [15], MobileNet places a strong emphasis on model efficiency while maintaining accuracy. It accomplishes this feat by employing depth-wise separable convolutions, which represent a factorized variant of conventional convolutions. Traditional convolutions entail the application of a singular filter across the entire input volume, resulting in a substantial computational overhead. In contrast, depth-wise separable convolutions partition the convolution process into two discrete operations: depth-wise convolutions and point-wise convolutions. Depth-wise convolutions individually apply a single filter to each input channel, effectively diminishing the computational complexity. Subsequently, point-wise convolutions amalgamate the outputs of depth-wise convolutions through 1x1 convolutions, facilitating the capture of cross-channel interactions. By disentangling spatial and channel-wise filtering, MobileNet significantly curtails the number of parameters and computations, rendering it particularly suitable for deployment on mobile and embedded devices.
SqueezeNet, introduced by Iandola et al. in 2016 [16], addresses the imperative of model efficiency through the incorporation of fire modules and the application of aggressive compression methodologies. Fire modules are composed of a squeeze layer employing 1x1 convolutions, serving to diminish the quantity of input channels. This is succeeded by expand layers, which employ a combination of 1x1 and 3x3 convolutions to encapsulate spatial information. The squeeze layer functions as a bottleneck layer, effectively reducing computational complexity, while the expand layers facilitate the restoration of information that might have been lost. Additionally, SqueezeNet pioneers the "squeeze-excitation" paradigm to model interdependencies among channels. This approach leverages a concise set of global statistics to dynamically scale the output of each channel, thereby augmenting the representational capacity of the network. In architectural terms,
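The parameter savings of depth-wise separable convolutions can be shown with a short back-of-the-envelope calculation. The channel sizes below are illustrative assumptions, not MobileNet's actual configuration; biases are ignored:

```python
def standard_conv_params(k, c_in, c_out):
    # k x k filters spanning all input channels, one filter set per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depth-wise: one k x k filter per input channel
    # point-wise: 1x1 convolution that mixes channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 128, 256)        # 294912
sep = depthwise_separable_params(3, 128, 256)  # 33920, roughly 8.7x fewer
```

The reduction factor is approximately 1/c_out + 1/k², so for 3x3 kernels the separable form needs close to nine times fewer parameters.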
MobileNet and SqueezeNet exhibit discernible distinctions. MobileNet encompasses multiple layers, including depth-wise separable convolutions, succeeded by global average pooling and a fully connected layer for classification. The architecture is amenable to customization through the adjustment of hyperparameters like the depth multiplier and input resolution, allowing for a judicious balance between accuracy and efficiency. In contrast, SqueezeNet is characterized by fire modules interspersed with pooling layers and 1x1 convolutions, adhering to the conventional CNN structure with convolutional layers, activation functions, and fully connected layers for classification.
The comparative evaluation of MobileNet and SqueezeNet encompasses various facets. MobileNet achieves commendable accuracy in image classification tasks, concurrently demonstrating substantial reductions in model size and computational complexity vis-à-vis conventional CNNs. This equilibrium between efficiency and accuracy renders it suitable for real-time applications on mobile and embedded platforms. Conversely, SqueezeNet attains accuracy levels akin to larger models, distinguished primarily by its significantly smaller footprint. It achieves notable compression ratios by leveraging aggressive compression techniques, including 1x1 convolutions and parameter sharing. Consequently, SqueezeNet excels in contexts where model size assumes paramount importance, particularly in deployment scenarios involving edge devices characterized by constrained memory and processing capabilities. In Figure 4, results show the inference time of the two methods [17]; SqueezeNet responds quickly under constrained conditions (especially on older devices). The selection between MobileNet and SqueezeNet hinges on the precise exigencies of the application at hand. MobileNet, possessing adaptability and versatility, is an apt choice for a spectrum of resource constraints while upholding a commendable level of accuracy. It finds relevance in tasks spanning object detection, semantic segmentation, and image classification on mobile platforms. Conversely, SqueezeNet, distinguished by its highly compressed model size, stands out in scenarios where storage and computational resources are severely constrained. It finds utility in low-power devices, applications within the Internet of Things (IoT) domain, and contexts where network bandwidth constitutes a limiting factor.
In summation, both MobileNet and SqueezeNet present proficient solutions for deep learning on resource-limited devices. MobileNet achieves efficiency through the employment of depth-wise separable convolutions, thereby diminishing both parameter count and computational load. In contrast, SqueezeNet capitalizes on fire modules and assertive compression techniques to substantially diminish model size, all the while upholding competitive levels of accuracy. The preference between the two architectures is contingent upon the specific requisites of the application, with MobileNet embodying versatility and adaptability, and SqueezeNet offering maximal compression for exceedingly constrained resources.

CED (Canine Eye-disease Detection)
The models of MobileNet and SqueezeNet are as follows. First, the layers used are Convolution, Flatten, and two Fully Connected layers. The first layer describes a specific convolutional layer in the MobileNet architecture. It applies a 5x5 convolution to a single-channel input (such as a grayscale image), resulting in a 7x7 output with 256 channels. This output is then further processed by a 3x3 convolution with 5 channels. These operations are fundamental to how MobileNet processes information in its deep learning architecture. The "Flatten" layer is a common layer in many neural network architectures, including CNNs. Its purpose is to convert the multidimensional output of the previous layer into a one-dimensional array. Also known as a dense layer, the "Fully Connected" layer is a fundamental component in neural networks. In this layer, every neuron is connected to every neuron in the preceding and succeeding layers.
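Since the exact layer dimensions are not fully specified above, the following sketch only illustrates how Flatten and Fully Connected layer sizes are computed, using assumed shapes: a 7x7 feature map with 256 channels, and a six-way output (five diseases plus healthy):

```python
def flatten_size(h, w, c):
    # Flatten converts an h x w x c feature map into a 1-D vector
    return h * w * c

def dense_params(n_in, n_out):
    # fully connected: every input connects to every output, plus one bias per output
    return n_in * n_out + n_out

vec = flatten_size(7, 7, 256)   # 12544-element vector after Flatten
params = dense_params(vec, 6)   # 75270 weights and biases in the classifier head
```

This shows why the fully connected head often dominates the parameter count of small CNNs, motivating MobileNet's use of global average pooling before classification.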
And the hyperparameters we used are specific settings that are crucial for training a machine learning model in MobileNet.

Personal Assistant
The development of an app utilizing AI to recognize and examine a dog's eyes represents an effort to leverage technology for canine health improvement.While this app may not be an exceedingly specialized project, it serves the purpose of exploring the potential of AI in the field of dog eye recognition.
The research results demonstrate a reasonable level of accuracy in recognizing and analyzing dog eyes for potential ailments. However, it is important to acknowledge the limitations of this app, which is not highly specialized and may therefore exhibit expectedly lower accuracy and restricted functionality. As a result, the primary objective of this app is to aid in the early assessment of a dog's health rather than providing definitive diagnoses. Therefore, for precise diagnosis and treatment, it is advised to visit a nearby veterinary clinic if a disease is suspected.
User feedback and performance evaluations of the app play a crucial role in future enhancements.To improve the app's accuracy, additional refinement of the AI algorithms and the acquisition of more canine eye data will be necessary.
Overall, this research demonstrates the potential of artificial intelligence in the field of dog eye recognition.With further development, improvement, and integration driven by user feedback, this app has the potential to positively impact canine well-being, providing valuable support for both dog owners and veterinarians in monitoring and enhancing eye health.
The program is structured with three main screens.The initial screen serves as a cover page with menu selection options.Users can perform a preliminary diagnosis of their dog's condition through the "Eye health test" menu and easily proceed with the steps for hospital visits through the "Find nearby hospitals" menu.
The cover page is organized as follows: the "Eye health test" menu is located on the left, while the "Find nearby hospitals" menu is positioned on the right in Figure 5. The "Eye health test" menu allows users to begin by touching the magnifying glass icon to capture a photo. Additionally, users have the option to choose between the front and rear cameras if needed. Once the photo is taken, the algorithm utilizes provided experimental data along with existing reference data to conduct a preliminary diagnosis of potential ailments. It then displays the three most likely scenarios, starting from the left, as seen in Figure 6. Particularly for the most probable scenario, an explanation is provided below for easy comprehension of the potential ailment. Users are recommended to verify this information and, in cases of high probability, to visit a nearby animal hospital for further diagnosis and treatment.

Figure 6. Camera diagnosis by CED
The "Find nearby hospitals" menu offers additional functionalities and is primarily designed for services within Korea. As a result, it is structured around a map search displayed in Korean. (Future improvements will aim to enable English and multilingual services.) Users can search for animal hospitals using place names, and the results are presented based on proximity using GPS. Results can also be displayed based on keyword-related accuracy. The menu provides information such as the hospital's address, phone number, directions, and navigation. Additionally, it offers a "Like" count, allowing users to consider it as a criterion when choosing which animal hospital to visit. This criterion is determined based on user feedback and proves to be a helpful aid in searching for animal hospitals.

TEST AND VALIDATION
To conduct the dog eye health examination, we first gathered a diverse set of data. In order to ensure the reliability of the information, we primarily utilized data from animal hospitals. However, since high-quality and consistent data were not always available, we also acquired data using smartphone cameras. This data source represents information attainable in real-life scenarios and aligns with the objective of this paper: to provide a guide for pet owners in real-life scenarios regarding the diagnosis of their pets' diseases.
Initially, we collected data on 15 different types of eye diseases, as well as data on the eyes of healthy dogs, but the number of samples was not sufficient for all diseases. For the experiment, it is essential to have an equal number of samples for each disease. Therefore, five diseases with an equal number of samples were selected from Table 3: Cataracts, Cherry eye, Dry eye, Glaucoma, and Entropion. The samples used for the experiment, as illustrated in Figure 8, totaled 60. Of these, 40 were disease samples, with each of the five diseases represented by 8 unique instances, without duplication. The remaining 20 were healthy samples, constituting half of the eye-disease sample set. To conduct an eye health examination using deep learning, it is first necessary to create a model, since the performance of deep learning is influenced by the amount of benchmark data. In this paper, experiments were conducted in two groups based on the number of samples: a group with a small number of samples and another group with twice as many. First, in the small-sample group, tests were performed by altering the quantity of utilized samples; an equal number of samples was applied for both diseased and healthy cases while varying the total sample count. Subsequently, in the double-sample case, the quantity of healthy samples remained constant during the experiments, which were then compared with those from the smaller-sample group.
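The balanced sampling described above can be sketched as follows. This is a hypothetical helper; the data here are synthetic placeholders, not the study's actual samples:

```python
import random

def balanced_subset(samples_by_class, n_per_class, seed=0):
    """Draw an equal number of samples from each class, without duplication."""
    rng = random.Random(seed)
    return {label: rng.sample(samples, n_per_class)
            for label, samples in samples_by_class.items()}

# Illustrative: 8 samples for each of the five diseases, as in Dmodel40.
diseases = ["cataract", "cherry_eye", "dry_eye", "glaucoma", "entropion"]
data = {d: [f"{d}_{i}" for i in range(12)] for d in diseases}
subset = balanced_subset(data, n_per_class=8)
total = sum(len(v) for v in subset.values())  # 40 disease samples in total
```

Because `random.sample` draws without replacement, no instance appears twice within a class, mirroring the "without duplication" condition above.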
In scenarios with a smaller sample count, as depicted in Table 4, 2, 3, and 4 samples were utilized for each disease, and an equivalent number of healthy samples was employed to differentiate between healthy and diseased cases. Consequently, for 10 disease samples, 10 healthy samples were utilized, and likewise for 15 and 20. Using this, we constructed three benchmark models based on the number of samples and conducted tests for nine different cases. The experimental results are shown in Figure 9. We considered results with an accuracy of 70% or higher as successful and highlighted them in yellow. In the tests, we displayed the top three cases for each cell with the highest recognition rates, in order. While it is common to consider only the outcome with the highest probability, we display up to third place with the intention of providing guidance on other possible diseases. First, in the case of 10 samples (Smodel10), meaning each disease used in the model has 2 samples, the relatively higher number of healthy cases (10 samples) shows positive recognition results, as seen in the figure. Conversely, the recognition results for the other diseases are relatively low. In the case of 15 samples (3 samples for each disease, Smodel15), similar results can be observed. However, unlike the 10-sample case where only healthy cases were recognized, here one healthy and one diseased case are each recognized, though the recognition success rate remains at 22% across the test cases. This is still a challenging figure for practical application.
In the case of 20 samples (Smodel20), where 4 samples were used for each disease, the recognition success rate starts to show meaningful results, reaching around 78%. This indicates that using four or more samples enables meaningful recognition, and the actual recognition potential can go up to 99%. However, in this scenario, a significantly larger number of healthy samples, five times more, must be used compared to the number of samples representing each diseased case.
The confusion matrix of Smodel20 is shown in Table 5, and the accuracy is as follows: Accuracy = (70+72+71+70+72+91)/596 = 0.748. Looking at the recognition success rates, SqueezeNet shows a significantly lower rate of 11% (recognizing 1 of 9 tests) compared to MobileNet's 78% (recognizing 7 of 9 tests). As explained in the previous section, this can be attributed to the performance degradation characteristic of an approach that emphasizes lightweight deep learning in constrained scenarios.
In the case of the larger sample size, as presented in Table 6, twice the number of samples compared to the smaller-sample group was used for each disease, specifically 4, 6, and 8 samples. However, the number of healthy samples remained fixed at 12, chosen so that it stayed relatively higher than the per-disease sample count, mirroring the ratio in the smaller-sample test. Therefore, for 20, 30, and 40 disease samples, 12 healthy samples were consistently utilized during model training. Using this, we again constructed three benchmark models based on the number of samples and conducted tests for eight different cases. The experimental results are shown in Figure 11. As before, we considered results with an accuracy of 70% or higher as successful and highlighted them in yellow. In the tests, we displayed the top three cases for each cell with the highest recognition rates, in order. In the case of 20 samples (Dmodel20), despite using an identical set of 20 disease samples, only two instances of success are demonstrated. This stems from the reduction of healthy samples from the previous 20 to 12, leading to disease samples being mistaken for healthy eyes in two out of three attempts, as depicted in Figure 9. Similarly, in the scenario of 30 samples (Dmodel30), increasing the disease sample count also results in only two instances of successful recognition, mirroring the outcomes of Figure 9. These outcomes pose considerable challenges for practical implementation.
However, with 40 samples (Dmodel40), employing eight samples for each disease, the recognition success rate notably improves, reaching approximately 75%. This signifies that when the number of disease samples and healthy samples reaches an optimal level (around 10 each), a model using an equivalent number of samples can sufficiently provide guidelines for pet owners in everyday life for dog eye disease assessment.
Even in scenarios where the disease sample count is low but there is a surplus of healthy samples (Smodel20), recognition becomes possible. However, when the number of disease and healthy samples is similar, Dmodel40 emerges as a practical model, demonstrating the potential for actual recognition capabilities of up to 91%.
The confusion matrix of Dmodel40 is shown in Table 7, and the accuracy is as follows: Accuracy = (75+76+75+75+74+80)/584 = 0.779. The accuracy, as computed from the confusion matrices, is thus 74.8% for Smodel20 and 77.9% for Dmodel40. Comparatively, this is about 8% lower in relative terms than another study (84.7% [18]), which was chosen for comparison from the perspective of developing a practical deep learning framework for classifying ocular surface disease images in companion animals. However, this 8% discrepancy could be attributed to that study's [18] use of data acquired with specialized equipment in veterinary clinics, whereas the present study is based on casual images.
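The accuracy figures can be reproduced from the reported confusion-matrix diagonals and sample totals:

```python
def accuracy_from_confusion(diagonal, total):
    # accuracy = correctly classified samples / all samples
    return sum(diagonal) / total

# Diagonal entries and totals as reported for Smodel20 (Table 5)
# and Dmodel40 (Table 7).
smodel20 = accuracy_from_confusion([70, 72, 71, 70, 72, 91], 596)
dmodel40 = accuracy_from_confusion([75, 76, 75, 75, 74, 80], 584)
print(round(smodel20, 3), round(dmodel40, 3))  # 0.748 0.779
```

The six diagonal entries correspond to the six classes (five diseases plus healthy), so the sum of the diagonal is the count of correctly classified test samples.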
Finally, we compared the application results of MobileNet and SqueezeNet for the double-sample case (Dmodel40). The sample size was fixed at 40, and for each case we constructed a baseline model. The experimental results for recognition according to each model are shown in Figure 12. In terms of recognition success rates, SqueezeNet demonstrates a significant inability to recognize, as opposed to MobileNet's 75% (identifying 6 out of 8 cases). As observed with Smodel and confirmed with Dmodel, MobileNet appears to be a viable deep learning method in the current experimental scenario. The experiments conducted using Smodel and Dmodel highlighted the importance of appropriately balancing the number of disease samples and healthy samples for effective training. Particularly noteworthy is the influence of the quantity of healthy-eye samples on the assessment of eye diseases. This is attributed to the absence of specific patterns in healthy eyes, making their visual characteristics challenging to distinguish. Furthermore, given the diverse breeds of pets and variable lighting conditions in real-life settings, this study underscores the possibility of implementing disease recognition based on data acquired from real-life scenarios, utilizing DL methods on mobile devices with an appropriate number of samples, as opposed to traditional computer vision methods conducted in laboratory settings.

Remark:
The mobile device used in the experiment is the Galaxy Z Flip3 model, running the Android 13 operating system. It is equipped with the Qualcomm Snapdragon 888 SM8350 platform and 8 GB of LPDDR5 SDRAM. While response times were not explicitly measured during the actual tests, the recognition results appeared promptly, indicating no issues for practical use. Therefore, we did not find it necessary to switch from MobileNet to SqueezeNet.

CONCLUSION
In this paper, we have developed a pre-screening system for assessing a dog's eye health using lightweight deep learning techniques, allowing for preliminary examinations before visiting an animal hospital. This system enables anyone, regardless of expertise, to perform diagnoses. We compared the application of two lightweight deep learning methods, MobileNet and SqueezeNet. The examination method developed using these lightweight techniques demonstrated performance levels suitable for practical use on mobile devices, particularly in terms of processing speed and constrained environments. Furthermore, we observed variations in performance depending on the quantity of data used for the reference model, highlighting that even with lightweight methods, outcomes can vary. We anticipate that employing a larger set of reference data will enhance recognition capabilities, leading to improved recognition rates in the future. This research also originated from collaborative research and development with the agricultural ICT company SAMS, and it is being pursued as a new startup venture. Thus, the method used for CED can be expanded to farm animals such as cattle and pigs, providing additional services for livestock management.

Figure 1. CNN basic structure

In a CNN, the model is structured through the repetition of convolution layers and pooling layers. The iterative methodology can vary depending on the design; for instance, multiple convolution layers may be arranged, followed by a single pooling layer at the end. Additionally, the inclusion or exclusion of the Local Contrast Normalization (LCN) layer can be determined based on necessity. The final classification results are computed using the softmax function. The softmax function is primarily utilized for categorizing more than two categories, employing probability values calculated for each category to perform classification.
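A minimal softmax sketch, as used for the final multi-category classification (the scores below are illustrative, not model outputs):

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Six-way output: five diseases plus healthy (illustrative scores).
probs = softmax([2.0, 1.0, 0.5, 0.2, 0.1, 3.0])
```

The class with the largest probability is taken as the top prediction, and sorting the probabilities yields the top-three list displayed by the app.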

Figure 2. MobileNet

Figure 3. SqueezeNet

Figure 4. Total inference time: (a) MobileNet, (b) SqueezeNet

Personal Assistant Development by CED (Canine Eye-disease Detection) (K. Chun)

A larger learning rate might speed up training but could overshoot the optimal values. An epoch is one complete pass through the entire training dataset. Training a model involves multiple epochs. During each epoch, the model's parameters are updated based on the loss calculated on the training data. More epochs generally allow the model to learn more, but too many can lead to overfitting. The training data fraction indicates what fraction of the entire dataset is used for training. In this case, 40% of the available data is used for training, which means the rest (60%) is likely reserved for validation and testing. Adam (short for Adaptive Moment Estimation) is an optimization algorithm used for training deep learning models. It adapts the learning rates of individual parameters based on their past gradients. It is a popular choice because it generally works well for a wide range of problems.
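A minimal single-parameter sketch of one Adam update, using the standard default hyperparameters (this is the textbook algorithm, not the paper's exact training code):

```python
import math

def adam_step(w, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad         # first-moment estimate
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad  # second-moment estimate
    m_hat = state["m"] / (1 - b1 ** state["t"])            # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

state = {"t": 0, "m": 0.0, "v": 0.0}
w = adam_step(1.0, grad=0.5, state=state)  # first step moves w by about lr
```

Because the bias-corrected moments normalize the step, the very first update has magnitude close to the learning rate regardless of the gradient's scale.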

Figure 9. Results according to number of samples (small samples)

We also compared the application results of MobileNet and SqueezeNet. The sample size was fixed at 20, and for each case we constructed a baseline model. The experimental results for recognition according to each model are shown in Figure 10.

Figure 11. Results according to number of samples (double samples)

Table 2. Hyperparameters

The learning rate determines the step size at which the model's parameters (weights) are updated during training. A smaller learning rate means smaller steps and slower convergence, but it can lead to more accurate results.

Table 3. Dog eye disease

Table 6. Number of samples (double samples)