Original Article
AI-Enabled Image Description: Bridging the Gap for the Visually Impaired
Vighnesh Pujala¹
Dr. G. Ravi²
¹ Computer Science & Engineering, Malla Reddy College of Engineering and Technology (MRCET), Hyderabad, Telangana, India.
² Professor, Computer Science & Engineering, Malla Reddy College of Engineering and Technology (MRCET), Hyderabad, Telangana, India.
Published Online: January-February 2026
Pages: 22-28
https://www.doi.org/10.59256/ijrtmr.20260601003
References
1. X. Ye et al., “A joint-training two-stage method for remote sensing image captioning,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–16, 2022, Art. no. 4709616, doi: 10.1109/TGRS.2022.3224244.
2. W. S. Mayzura, R. Sarno, N. S. Suroto, M. I. A. Supriyanto, and G. Sihaj, “Automatic interpretation of brain medical images using hierarchical classification and image captioning model,” IEEE Access, 2025, doi: 10.1109/ACCESS.2025.3560701.
3. T. Wei, W. Yuan, J. Luo, W. Zhang, and L. Lu, “VLCA: Vision-language aligning model with cross-modal attention for bilingual remote sensing image captioning,” Journal of Systems Engineering and Electronics, vol. 34, no. 1, pp. 9–18, Feb. 2023, doi: 10.23919/JSEE.2023.000035.
4. Z. Ren et al., “HATNet: Hierarchical attention transformer with RS-CLIP patch tokens for remote sensing image captioning,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 27208–27223, 2025, doi: 10.1109/JSTARS.2025.3624411.
5. A. Ueda, W. Yang, and K. Sugiura, “Switching text-based image encoders for captioning images with text,” IEEE Access, vol. 11, pp. 55706–55715, 2023, doi: 10.1109/ACCESS.2023.3282444.
6. J. Lin, S. Wang, X. Ye, R. Wang, R. Yang, and L. Jiao, “CLIP-based grid features and masking for remote sensing image captioning,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 2631–2642, 2025, doi: 10.1109/JSTARS.2024.3510414.
7. D.-J. Kim, T.-H. Oh, J. Choi, and I. S. Kweon, “Semi-supervised image captioning by adversarially propagating labeled data,” IEEE Access, vol. 12, pp. 93580–93592, 2024, doi: 10.1109/ACCESS.2024.3423790.
8. Y. Wang, W. Zhang, Z. Zhang, X. Gao, and X. Sun, “Multiscale multiinteraction network for remote sensing image captioning,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, pp. 2154–2165, 2022, doi: 10.1109/JSTARS.2022.3153636.
9. R. Ramos and B. Martins, “Using neural encoder–decoder models with continuous outputs for remote sensing image captioning,” IEEE Access, vol. 10, pp. 24852–24863, 2022, doi: 10.1109/ACCESS.2022.3151874.
10. K. Komurcu and L. Petkevicius, “Multispectral image caption unification using diffusion and CycleGAN models,” IEEE Access, vol. 13, pp. 193708–193718, 2025, doi: 10.1109/ACCESS.2025.3632152.