AUTHOR=Wang Shuang, Liu Jingyu, Lan Xuedan, Hu Qihang, Jiang Jian, Zhang Jingjing
TITLE=Cross-modal association analysis and matching model construction of perceptual attributes of multiple colors and combined tones
JOURNAL=Frontiers in Psychology
VOLUME=Volume 13 - 2022
YEAR=2022
URL=https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2022.970219
DOI=10.3389/fpsyg.2022.970219
ISSN=1664-1078
ABSTRACT=Audio-visual correlation is a common phenomenon in everyday life. Focusing on the correlation between multiple colors and combined tones, this article combines experimental psychology methods, audio-visual information processing technology, and machine learning algorithms to study the correlation mechanism between multi-color perceptual attributes and the interval consonance attributes of musical sounds, and to construct an audio-visual cross-modal matching model. First, a multi-color perceptual attribute dataset was built through a subjective evaluation experiment covering the attributes "cold/warm", "soft/hard", "transparent/turbid", "far/near", and "weak/strong", together with pleasure, arousal, and dominance; an interval consonance attribute dataset was built by computing objective audio parameters. Second, a subjective cross-modal matching experiment on audio-visual correlation was designed and carried out, yielding cross-modal matched and mismatched data between multi-color perceptual attributes and interval consonance attributes. On this basis, visualization and correlation analysis of the matched and mismatched data show that there is a measurable correlation between multiple colors and combined tones at the level of perceptual attributes. Finally, linear and nonlinear machine learning algorithms were used to construct the audio-visual cross-modal matching model, enabling mutual prediction between multi-color perceptual attributes and interval consonance attributes, with a prediction accuracy of up to 91.7%. The contributions of this research are as follows: (1) the cross-modal matched and mismatched dataset provides basic data support for audio-visual cross-modal research; (2) the constructed audio-visual cross-modal matching model provides a theoretical basis for audio-visual interaction technology; (3) the proposed research method for audio-visual cross-modal matching offers new ideas for related research.
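To make the final modeling step concrete, below is a minimal Python sketch of cross-modal match prediction from perceptual attributes using one linear and one nonlinear classifier, as the abstract describes in general terms. It assumes scikit-learn and synthetic data with an 8-attribute color representation plus a single consonance value; the feature layout, labels, and model choices are illustrative assumptions, not the authors' actual implementation or dataset.

    # Hypothetical sketch of cross-modal match prediction with linear and
    # nonlinear models. Feature names, labels, and data are assumptions,
    # not the authors' pipeline.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)

    # Assumed layout: 8 color perceptual attributes per sample
    # (cold/warm, soft/hard, transparent/turbid, far/near, weak/strong,
    #  pleasure, arousal, dominance) plus one interval consonance value.
    n_samples = 400
    color_attrs = rng.uniform(-1.0, 1.0, size=(n_samples, 8))
    consonance = rng.uniform(0.0, 1.0, size=(n_samples, 1))

    # Binary label: 1 = judged a cross-modal match, 0 = mismatch.
    # Synthetic here; in the paper it comes from the matching experiment.
    labels = (color_attrs[:, 5] * consonance[:, 0]
              + 0.1 * rng.normal(size=n_samples) > 0.2).astype(int)

    X = np.hstack([color_attrs, consonance])
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.25, random_state=0)

    for name, model in [
        ("linear (logistic regression)", LogisticRegression(max_iter=1000)),
        ("nonlinear (random forest)", RandomForestClassifier(n_estimators=200, random_state=0)),
    ]:
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f"{name}: accuracy = {acc:.3f}")

Replacing the synthetic arrays with the matched/mismatched data collected in the paper's experiments would follow the intended workflow; the specific algorithms and the reported accuracy of up to 91.7% depend on the authors' own models and data.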