Chemical Compounds Recommendation Engine Based on Skin Types Using Retrieval-Augmented Generation and OCR

Authors: Suryaprasad L, Vasudev Joshi, Eshwari, Haneen A, Kendagannaswamy M S

DOI Link: https://doi.org/10.22214/ijraset.2026.79381

Abstract

The text describes a secure blockchain-based electronic voting system integrated with biometric authentication to improve the transparency, security, and reliability of elections. Traditional voting systems and existing e-voting platforms face major issues such as centralized vulnerabilities, vote tampering, weak authentication, duplicate voting, slow counting, and lack of transparency, which reduce trust in electoral processes. These problems highlight the need for a more secure and decentralized solution.

To address these challenges, the proposed system combines blockchain technology with biometric verification (fingerprint and facial recognition). Each voter is authenticated before voting, and every vote is encrypted and stored as an immutable blockchain transaction, ensuring it cannot be altered or deleted. Smart contracts enforce election rules such as one voter, one vote, while also enabling real-time vote tracking and automatic counting. The system also includes an admin module for managing voters, candidates, elections, and results, along with real-time monitoring and reporting features.

The methodology involves these key steps:

Voter registration with biometric data
Biometric and login authentication
Secure vote casting and encryption
Blockchain recording and validation using smart contracts
Automatic vote counting and result display

Testing results show that the system significantly improves authentication accuracy, security, and transparency, while eliminating duplicate voting and reducing result processing time. The decentralized blockchain structure ensures tamper-proof records and auditability, and biometric authentication prevents impersonation.

Introduction

This paper presents a privacy-preserving, AI-powered skincare analysis system that uses Retrieval-Augmented Generation (RAG), OCR, and semantic search to translate complex cosmetic ingredient lists into safe, explainable, skin-type-specific recommendations.

The problem arises from the cosmetics industry’s complexity and the lack of transparency in INCI ingredient labeling, which most users and even clinicians struggle to interpret. Many ingredients can cause cumulative skin damage, but existing tools fail to provide personalized, evidence-based safety analysis. Additionally, large language models (LLMs) risk generating hallucinated or unsafe advice when used without grounding.

To solve this, the proposed system combines:

OCR (Tesseract) to extract ingredients from product images
SBERT-based embeddings + ChromaDB for semantic retrieval
A two-tier hierarchical RAG pipeline to retrieve both summaries and detailed dermatological transcripts
A grounding guard to ensure every recommendation is evidence-based and suppress hallucinations
Role-Based Access Control (RBAC) to differentiate consumer vs dermatologist outputs
A microservices architecture using React, Spring Boot, Flask, MongoDB, and ChromaDB
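
The two-tier retrieval idea in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for SBERT embeddings, the in-memory dictionaries stand in for ChromaDB collections, and the knowledge-base entries and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a Sentence-BERT embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical knowledge base. Tier 1 holds one short summary per topic;
# tier 2 holds longer transcript passages keyed by the same topic id.
SUMMARIES = {
    "niacinamide": "niacinamide supports the skin barrier and suits oily skin",
    "fragrance": "fragrance is a common irritant for sensitive skin",
}
TRANSCRIPTS = {
    "niacinamide": [
        "niacinamide at low concentration reduces sebum on oily skin",
        "niacinamide is generally well tolerated by dry skin",
    ],
    "fragrance": [
        "fragrance mixes can trigger contact dermatitis on sensitive skin",
        "fragrance free products are preferred for sensitive skin",
    ],
}


def top_k(query_vec, items, k):
    """Return the k (id, text) pairs whose text is most similar to the query."""
    ranked = sorted(items, key=lambda kv: cosine(query_vec, embed(kv[1])), reverse=True)
    return ranked[:k]


def two_tier_retrieve(query, k_summaries=1, k_passages=1):
    """Tier 1 picks the best summaries; tier 2 searches only their transcripts."""
    q = embed(query)
    results = []
    for topic_id, summary in top_k(q, SUMMARIES.items(), k_summaries):
        candidates = [(topic_id, p) for p in TRANSCRIPTS.get(topic_id, [])]
        passages = [text for _, text in top_k(q, candidates, k_passages)]
        results.append({"id": topic_id, "summary": summary, "passages": passages})
    return results


if __name__ == "__main__":
    for hit in two_tier_retrieve("is fragrance safe for sensitive skin"):
        print(hit["id"], "->", hit["passages"])
```

Restricting the second search to transcripts linked to the selected summaries keeps the detailed lookup small, which is one plausible reason to prefer a hierarchical layout over a single flat index.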

The system processes a user query or product image, extracts ingredients, matches them to a dermatological knowledge base, retrieves relevant scientific transcripts, and generates traceable, explainable safety recommendations with high grounding accuracy (97.3%) and low hallucination rates (2.7%).
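
The flow described above, from extracted text to traceable recommendations, can be sketched in a few lines. This is a hedged illustration under stated assumptions: the OCR step is skipped and its output is passed in as a string, the `EVIDENCE` table and its source ids are invented placeholders for the dermatological knowledge base, and the guard shown here is a simple lookup rather than the paper's grounding mechanism.

```python
import re

# Hypothetical evidence store: ingredient name -> (claim, source id).
# Entries and ids are placeholders, not data from the paper.
EVIDENCE = {
    "niacinamide": ("generally well tolerated by oily skin", "transcript-012"),
    "parfum": ("may irritate sensitive skin", "transcript-044"),
}


def parse_ingredients(ocr_text):
    """Split raw OCR output into normalised, lower-case ingredient names."""
    # Drop a leading "Ingredients:" label if the label was captured by OCR.
    body = re.sub(r"^\s*ingredients\s*[:\-]\s*", "", ocr_text, flags=re.IGNORECASE)
    parts = re.split(r"[,;\n]+", body)
    names = []
    for part in parts:
        name = re.sub(r"\s+", " ", part).strip(" .").lower()
        if name:
            names.append(name)
    return names


def grounded_report(ingredients):
    """Attach a claim only when evidence exists; otherwise state that none was found."""
    report = []
    for name in ingredients:
        hit = EVIDENCE.get(name)
        if hit is None:
            # Guard: emit no claim at all instead of letting a model guess.
            report.append({"ingredient": name, "status": "no evidence",
                           "claim": None, "source": None})
        else:
            claim, source = hit
            report.append({"ingredient": name, "status": "grounded",
                           "claim": claim, "source": source})
    return report


if __name__ == "__main__":
    text = "Ingredients: Aqua, Niacinamide,\nParfum."
    for row in grounded_report(parse_ingredients(text)):
        print(row)
```

Every emitted claim carries a source id, so a reader can trace it back to the passage it came from; ingredients without evidence are reported as such instead of being described.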

Key contributions include:

End-to-end pipeline from image → OCR → RAG-based chemical analysis
Two-level semantic retrieval for improved accuracy
Role-based personalized outputs
Scalable microservices architecture for real-time deployment
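
Role-based output can be illustrated with a small field filter. The sketch below assumes two roles, consumer and dermatologist, as named in the text; the `Finding` fields, the role-to-field mapping, and the example values are hypothetical and only show the general RBAC pattern of deciding visibility from the caller's role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    ingredient: str
    verdict: str   # plain-language verdict shown to everyone
    detail: str    # clinical detail, assumed to be dermatologist-only
    source: str    # evidence reference, assumed to be dermatologist-only


# Hypothetical mapping from role to the fields that role may see.
ROLE_FIELDS = {
    "consumer": ("ingredient", "verdict"),
    "dermatologist": ("ingredient", "verdict", "detail", "source"),
}


def render(finding, role):
    """Return only the fields the given role is allowed to see."""
    try:
        fields = ROLE_FIELDS[role]
    except KeyError:
        # Unknown roles are rejected instead of falling back to a default view.
        raise PermissionError(f"unknown role: {role!r}") from None
    return {name: getattr(finding, name) for name in fields}


if __name__ == "__main__":
    finding = Finding(
        ingredient="parfum",
        verdict="may irritate sensitive skin",
        detail="placeholder clinical note",
        source="transcript-044",
    )
    print(render(finding, "consumer"))
    print(render(finding, "dermatologist"))
```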

Experimental results show strong performance:

High retrieval F1 score (0.889)
High OCR accuracy (up to 96.8%)
Fast response time (~3.2 seconds)
Strong hallucination control compared to baseline LLM systems
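
For readers unfamiliar with the retrieval F1 metric, the sketch below shows the standard per-query computation from sets of retrieved and relevant document ids. The document ids are made up, and how the paper aggregates scores across its queries is not stated here, so this should be read as the textbook definition only.

```python
def precision_recall_f1(retrieved, relevant):
    """Standard set-based precision, recall, and F1 for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical ids: 3 of the 4 retrieved documents are among the 5 relevant ones.
    p, r, f1 = precision_recall_f1(
        retrieved=["d1", "d2", "d3", "d4"],
        relevant=["d1", "d2", "d3", "d5", "d6"],
    )
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```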

The system outperforms traditional keyword-based and LLM-only approaches by focusing on ingredient-level analysis, evidence-backed reasoning, and strict grounding in dermatological data.

Limitations include incomplete knowledge coverage, OCR sensitivity to poor images, language constraints, and lack of clinical trials. Ethically, the system is designed as a decision-support tool, not a replacement for dermatologists, with safeguards to avoid unsafe recommendations.

Conclusion

This paper presented a Chemical Compounds Recommendation Engine that integrates OCR-based ingredient extraction, hierarchical Retrieval-Augmented Generation, Sentence-BERT semantic embeddings, and Role-Based Access Control within a scalable microservices architecture. Evaluation on 850 queries across four skin types demonstrates a semantic retrieval F1 of 0.889, a grounding accuracy of 97.3%, a hallucination rate of 2.7%, and a mean latency of 3.2 s, consistently outperforming all baselines. Directions for future work include: (1) regulatory database integration with the EU Cosmetics Regulation and FDA VCRP databases; (2) mobile OCR with on-device, edge-deployed scanning on smartphones; (3) multilingual expansion to non-English dermatological literature; (4) clinical validation in partnership with dermatology departments; (5) graph-neural-network interaction modelling for synergistic and antagonistic compound pairs; and (6) federated learning [19] for cross-institutional model improvement without centralising sensitive user data.

References

[1] A. Panico et al., "Skin safety and health prevention: An overview of chemicals in cosmetic products," J. Preventive Medicine and Hygiene, vol. 60, no. 1, pp. E50–E57, Mar. 2019. DOI: 10.15167/2421-4248/jpmh2019.60.1.1080. [Online]. Available: https://pmc.ncbi.nlm.nih.gov/articles/PMC6477564/
[2] A. Esteva et al., "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115–118, Feb. 2017. DOI: 10.1038/nature21056. [Online]. Available: https://www.nature.com/articles/nature21056
[3] A. Vaswani et al., "Attention is all you need," in Proc. Advances in Neural Information Processing Systems (NeurIPS), vol. 30, pp. 5998–6008, 2017. [Online]. Available: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[4] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. NAACL-HLT, Minneapolis, MN, USA, pp. 4171–4186, Jun. 2019. [Online]. Available: https://aclanthology.org/N19-1423/
[5] P. Lewis et al., "Retrieval-augmented generation for knowledge-intensive NLP tasks," in Proc. NeurIPS, vol. 33, pp. 9459–9474, 2020. [Online]. Available: https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html
[6] K. C. M. de Alwis and T. G. I. Udayangi, "OCR-based cosmetic product safety analysis system," in Proc. 22nd Int. Conf. on Advances in ICT for Emerging Regions (ICTer), IEEE, pp. 1–6, 2022. [Online]. Available: https://ieeexplore.ieee.org/xpl/conhome/1803739/all-proceedings
[7] R. Smith, "An overview of the Tesseract OCR engine," in Proc. 9th Int. Conf. on Document Analysis and Recognition (ICDAR), IEEE, vol. 2, pp. 629–633, 2007. DOI: 10.1109/ICDAR.2007.4376991. [Online]. Available: https://ieeexplore.ieee.org/document/4376991/
[8] N. Reimers and I. Gurevych, "Sentence-BERT: Sentence embeddings using Siamese BERT-networks," in Proc. EMNLP-IJCNLP, Hong Kong, China, pp. 3982–3992, Nov. 2019. [Online]. Available: https://aclanthology.org/D19-1410/
[9] R. S. Sandhu, E. J. Coyne, H. L. Feinstein, and C. E. Youman, "Role-based access control models," IEEE Computer, vol. 29, no. 2, pp. 38–47, Feb. 1996. DOI: 10.1109/2.485845. [Online]. Available: https://ieeexplore.ieee.org/document/485845/
[10] N. Dragoni et al., "Microservices: Yesterday, today, and tomorrow," in Present and Ulterior Software Engineering, M. Mazzara and B. Meyer, Eds. Cham: Springer, pp. 195–216, 2017. DOI: 10.1007/978-3-319-67425-4_12. [Online]. Available: https://doi.org/10.1007/978-3-319-67425-4_12
[11] C. L. Burnett et al., "Final report of the safety assessment of Kojic acid as used in cosmetics," Int. J. of Toxicology, vol. 29, no. 6, pp. 244S–273S, 2010. DOI: 10.1177/1091581810385956. [Online]. Available: https://doi.org/10.1177/1091581810385956
[12] V. Karpukhin et al., "Dense passage retrieval for open-domain question answering," in Proc. EMNLP, pp. 6769–6781, Nov. 2020. [Online]. Available: https://aclanthology.org/2020.emnlp-main.550/
[13] J. Johnson, M. Douze, and H. Jégou, "Billion-scale similarity search with GPUs," IEEE Trans. Big Data, vol. 7, no. 3, pp. 535–547, Jul. 2021. DOI: 10.1109/TBDATA.2019.2921572. [Online]. Available: https://ieeexplore.ieee.org/document/8733051/
[14] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, May 2015. DOI: 10.1038/nature14539. [Online]. Available: https://doi.org/10.1038/nature14539
[15] J. Lee, S. Kim, and H. Park, "Deep learning-based skin care product recommendation system," IEEE Access, vol. 12, pp. 34521–34533, 2024. [Online]. Available: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
[16] M. S. Hossain, M. A. Rahman, and S. Islam, "A content-based cosmetic product recommendation engine using deep feature extraction," in Proc. IEEE Region 10 Conf. (TENCON), IEEE, pp. 1–6, 2023. [Online]. Available: https://ieeexplore.ieee.org/xpl/conhome/1000667/all-proceedings
[17] H. Chase, "LangChain: Building applications with LLMs through composability," GitHub repository, 2022. [Online]. Available: https://github.com/langchain-ai/langchain
[18] T. Wolf et al., "Transformers: State-of-the-art natural language processing," in Proc. EMNLP System Demonstrations, pp. 38–45, Nov. 2020. [Online]. Available: https://aclanthology.org/2020.emnlp-demos.6/
[19] B. McMahan et al., "Communication-efficient learning of deep networks from decentralized data," in Proc. 20th Int. Conf. on Artificial Intelligence and Statistics (AISTATS), vol. 54, pp. 1273–1282, Apr. 2017. [Online]. Available: https://proceedings.mlr.press/v54/mcmahan17a.html

Copyright

Copyright © 2026 Suryaprasad L, Vasudev Joshi, Eshwari, Haneen A, Kendagannaswamy M S. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET79381

Publish Date : 2026-04-03

ISSN : 2321-9653

Publisher Name : IJRASET
