Knowledge-base of Semantic Relationships between Images based on FrameNet from Computational Semiotic Perspective

Document Type : Research Paper

Author

Faculty of Linguistics, Institute for Humanities and Cultural Studies

10.30473/il.2026.75736.1714

Abstract

In semiotics, sign systems underlie human social and cultural activity and serve to represent the knowledge carried by signs. Signs may be realized in language or in images. Extracting abstract concepts from these two systems can help classify signs and represent them in a knowledge base.
The present study works within the framework of computational semiotics. It classifies images using the semantic relations between image signs, where these relations are derived through frame semantics from the linguistic signs in image captions. The classification is then used to develop a knowledge base containing the semantic relations between images. The results indicate that extracting abstract concepts from frame-based semantic representations of image captions helps both to classify images semantically and to determine the semantic relations between them. The classification and relations of images can be expressed as triples, each consisting of a relation type and two image elements. This representation can be used to build a knowledge base and to support conceptual image search. The study uses the Flickr30k corpus.
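The triple representation described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes that the frames evoked by each caption are already available, that two images are related when their captions share a frame, and that the frame name serves as the relation type. The names `ImageRelation`, `build_triples`, and `search`, the image identifiers, and the frame assignments are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class ImageRelation(NamedTuple):
    """One knowledge-base entry: a relation type and two image elements."""
    relation: str  # assumption: the shared frame name acts as the relation type
    image_a: str
    image_b: str


def build_triples(frames_by_image):
    """Link every pair of images whose captions evoke the same frame."""
    images_by_frame = defaultdict(set)
    for image_id, frames in frames_by_image.items():
        for frame in frames:
            images_by_frame[frame].add(image_id)

    triples = []
    for frame in sorted(images_by_frame):
        # Sorting keeps the output order deterministic.
        for a, b in combinations(sorted(images_by_frame[frame]), 2):
            triples.append(ImageRelation(frame, a, b))
    return triples


def search(triples, relation):
    """Conceptual search: return the images that take part in a relation."""
    found = set()
    for triple in triples:
        if triple.relation == relation:
            found.update((triple.image_a, triple.image_b))
    return sorted(found)


# Invented frame annotations for three image captions (illustration only).
frames_by_image = {
    "img_001.jpg": {"Self_motion", "Clothing"},
    "img_002.jpg": {"Self_motion"},
    "img_003.jpg": {"Clothing", "Ingestion"},
}

triples = build_triples(frames_by_image)
for triple in triples:
    print(triple)
print(search(triples, "Clothing"))
```

In this sketch a frame evoked by only one caption produces no triple, since a relation needs two image elements. A fuller knowledge base would likely also draw on frame-to-frame relations rather than exact frame matches alone.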
