Bengali Word Detection from Lip Movements Using Mask RCNN and Generalized Linear Model

Abul Bashar Bhuiyan, Jia Uddin

Abstract


Speech processing based on lip detection and lip reading is an advancing field. It requires algorithms and techniques that detect the lips and their movements accurately. Lip detection and configuration are the most important parts of speech recognition from lip movements. In this paper, we focus on detecting the lip segment properly. Mask R-CNN (Region-based Convolutional Neural Network) performs object detection and instance segmentation on each video frame to detect the lip segment. Mask R-CNN adds only a small overhead to Faster R-CNN and is quite simple to train, running at 5 frames per second. Mask R-CNN also supports keypoint detection, which helps to extract the location of the lip landmarks pixel by pixel. Once the lip region is extracted and the landmarks are highlighted, we observe how the lip landmarks change over time as the speaker's lips move for each Bengali word. The keypoint changes observed at each millisecond are then used as features to train the GLM (Generalized Linear Model). In addition, we compare the performance of the GLM with Naive Bayes, Logistic Regression, and Decision Tree. The GLM achieved the highest accuracy, 91.8%, whereas Naive Bayes, Logistic Regression, and Decision Tree achieved accuracies of 87.1%, 38.3%, and 82.2%, respectively.
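As an illustration of how landmark changes over time could be turned into classifier input, the following minimal sketch flattens frame-to-frame keypoint displacements into a single feature vector. The abstract does not specify the feature encoding, so the function name `landmark_displacements`, the (dx, dy) encoding, and the toy clip are assumptions made for illustration only, not the authors' method.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Frame = List[Point]  # one (x, y) pair per lip landmark


def landmark_displacements(frames: List[Frame]) -> List[float]:
    """Flatten frame-to-frame landmark movement into one feature vector.

    For every consecutive pair of frames and every landmark, the vector
    receives (dx, dy). This encoding is an assumption, not taken from the paper.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to measure movement")
    n_points = len(frames[0])
    features: List[float] = []
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != n_points or len(curr) != n_points:
            raise ValueError("every frame must have the same number of landmarks")
        for (x0, y0), (x1, y1) in zip(prev, curr):
            features.extend([x1 - x0, y1 - y0])
    return features


# Hypothetical clip: 3 frames, 2 landmarks (for example, the two mouth corners).
clip = [
    [(10.0, 20.0), (30.0, 20.0)],
    [(9.0, 21.0), (31.0, 21.0)],
    [(8.0, 23.0), (32.0, 23.0)],
]
features = landmark_displacements(clip)
print(features)
```

A vector of this kind, one per spoken word, could then be passed to any of the classifiers compared in the abstract. In practice, clips of different lengths would need to be resampled or padded to a fixed number of frames first.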

Keywords


Word Detection; Lip Movements; Machine Learning; Image Segmentation; Accuracy






 

Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
ISSN 2089-3272


This work is licensed under a Creative Commons Attribution 4.0 International License.
