<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurosci.</journal-id>
<journal-title>Frontiers in Neuroscience</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurosci.</abbrev-journal-title>
<issn pub-type="epub">1662-453X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnins.2022.1022041</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Dynamically attentive viewport sequence for no-reference quality assessment of omnidirectional images</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Yuhong</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1964665/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Li</surname> <given-names>Hong</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1964662/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Jiang</surname> <given-names>Qiuping</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1462198/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Information Science and Engineering, Ningbo University</institution>, <addr-line>Ningbo</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>College of Science and Technology, Ningbo University</institution>, <addr-line>Ningbo</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Qingbo Wu, University of Electronic Science and Technology of China, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Leida Li, Xidian University, China; Guanghui Yue, Shenzhen University, China</p></fn>
<corresp id="c001">&#x002A;Correspondence: Hong Li, <email>ky_lihong@nbu.edu.cn</email></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Visual Neuroscience, a section of the journal Frontiers in Neuroscience</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>11</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>16</volume>
<elocation-id>1022041</elocation-id>
<history>
<date date-type="received">
<day>18</day>
<month>08</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>10</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2022 Wang, Li and Jiang.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Wang, Li and Jiang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Omnidirectional images (ODIs) have drawn great attention in virtual reality (VR) due to their capability of providing an immersive experience to users. However, ODIs are usually subject to various quality degradations during different processing stages, so the quality assessment of ODIs is of critical importance to the VR community. The quality assessment of ODIs is quite different from that of traditional 2D images. Existing IQA methods focus on extracting features from spherical scenes while ignoring the actual viewing behavior of humans who continuously browse an ODI through a head-mounted display (HMD), and they fail to characterize the temporal dynamics of the browsing process in terms of the temporal order of viewports. In this article, we resort to the law of gravity to detect the dynamically attentive regions of humans when viewing ODIs, and propose a novel no-reference (NR) ODI quality evaluation method built on two components: the construction of a Dynamically Attentive Viewport Sequence (DAVS) from ODIs and the extraction of Quality-Aware Features (QAFs) from the DAVS. The construction of the DAVS builds a sequence of viewports that are likely to be explored by viewers, based on the prediction of the visual scanpath traced while viewers freely explore the ODI <italic>via</italic> HMD within the exploration time. A DAVS that contains only global motion can then be obtained by sampling a series of viewports from the ODI along the predicted visual scanpath, and the subsequent quality evaluation of ODIs is performed merely on the DAVS. The extraction of QAFs yields feature representations that are highly discriminative in terms of perceived distortion and visual quality. Finally, a regression model maps the extracted QAFs to a single predicted quality score. Experimental results on two datasets demonstrate that the proposed method delivers state-of-the-art performance.</p>
</abstract>
<kwd-group>
<kwd>omnidirectional images</kwd>
<kwd>image quality assessment</kwd>
<kwd>no-reference</kwd>
<kwd>spatiotemporal scene statistics</kwd>
<kwd>virtual reality</kwd>
</kwd-group>
<contract-sponsor id="cn001">Natural Science Foundation of Zhejiang Province<named-content content-type="fundref-id">10.13039/501100004731</named-content></contract-sponsor>
<contract-sponsor id="cn002">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content></contract-sponsor>
<counts>
<fig-count count="5"/>
<table-count count="5"/>
<equation-count count="18"/>
<ref-count count="61"/>
<page-count count="14"/>
<word-count count="9901"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>Introduction</title>
<p>The omnidirectional image (ODI), which records and delivers 360-degree surround information, plays an important role in virtual reality (VR) photography. The brand-new viewing experience enabled by ODIs is substantially different from that of traditional 2D plane images, as humans are allowed to freely change their viewport to explore the immersive virtual environment through a head-mounted display (HMD) (<xref ref-type="bibr" rid="B16">Li et al., 2018</xref>; <xref ref-type="bibr" rid="B53">Zhang et al., 2018</xref>; <xref ref-type="bibr" rid="B37">Tran et al., 2019</xref>; <xref ref-type="bibr" rid="B6">Deng et al., 2021</xref>). Due to their capability of providing natural immersion in real-world scenarios, ODIs have attracted considerable attention from both academia and industry. Meanwhile, ODIs have found widespread use in many practical VR applications (<xref ref-type="bibr" rid="B3">Chen et al., 2019</xref>; <xref ref-type="bibr" rid="B12">ISO, 2019</xref>; <xref ref-type="bibr" rid="B1">Alain et al., 2020</xref>).</p>
<p>The visual quality of ODIs is a topic worthy of research, as ODI content with poor visual quality may cause both physical and mental discomfort. Compared with traditional 2D plane images, the visual quality issues of ODIs are much more acute and challenging. On the one hand, the currently mainstream approach to acquiring a typical ODI is to stitch multiple images captured by a wide-angle camera array with partially overlapped fields of view. The images from multiple cameras are stitched to produce an omnidirectional panorama in the spherical format. However, the stitching process inevitably introduces stitching artifacts at the stitching boundaries, which rarely occur in traditional 2D plane images (<xref ref-type="bibr" rid="B10">Ho and Budagavi, 2017</xref>). On the other hand, compared with traditional 2D plane images, ODIs have much higher spatial resolutions, e.g., 4K, 8K, or beyond, and correspondingly larger storage requirements. Therefore, ODIs are often heavily compressed to facilitate transmission and storage (<xref ref-type="bibr" rid="B14">Kim et al., 2020</xref>), and ODIs with serious compression artifacts inevitably lead to an even worse quality-of-experience. In addition, ODIs are usually stored in the spherical format for display on an HMD, so the human viewing behavior when freely exploring ODIs with an HMD is dramatically different from that for 2D plane images, which in turn affects human quality perception of ODIs. Considering the above quality issues, it is necessary to develop effective ODI quality metrics by jointly considering the effects of different distortions and the human viewing behavior when exploring ODIs with an HMD.</p>
<p>Despite its high importance, the problem of ODI quality evaluation has not been well addressed so far. One of the most important challenges is that human viewing behavior when browsing ODIs through an HMD is dramatically different from that when viewing 2D plane images directly. Typically, when viewers browse a spherical ODI through an HMD, they obtain immersive and interactive viewing experiences by freely changing the viewpoint, and only the visual content within the current viewport is visible at any given time (<xref ref-type="bibr" rid="B4">Chen et al., 2020</xref>; <xref ref-type="bibr" rid="B44">Xu M. et al., 2020</xref>). Besides, viewers usually tend to focus on the regions near the equator while exploring ODIs with an HMD. Therefore, viewport prediction plays a critical role in designing accurate ODI quality evaluation metrics, as it can extract the most important visual content from ODIs to facilitate the quality evaluation process. Accordingly, some previous efforts have been made to extract viewports from ODIs for ODI quality evaluation. For example, the VGCN-based ODI quality evaluation method proposed in <xref ref-type="bibr" rid="B43">Xu J. et al. (2020)</xref> extracted viewports with higher probabilities of being explored by viewers, according to the human visual sensitivity to structural information, and adopted a graph convolution network (GCN) to predict the ODI quality score by implicitly modeling the interactions among different viewports. In <xref ref-type="bibr" rid="B35">Sun et al. (2019)</xref>, the authors proposed to project the equirectangular image into six equally sized viewport images and then generate a corresponding channel for further study. These works have achieved promising performance and demonstrated the validity and importance of viewport generation for creating reliable ODI quality evaluation metrics. Despite their effectiveness, however, the viewport generation strategies of these methods share a common limitation: they ignore the actual viewing behavior of humans in continuously browsing an ODI through an HMD and fail to characterize the temporal dynamics of the browsing process in terms of the temporal order of viewports. Therefore, research efforts dedicated to more accurate viewport sequence generation that accounts for the temporal order of viewports will undoubtedly facilitate the quality evaluation of ODIs.</p>
<p>In this article, we propose a novel no-reference (NR) ODI quality evaluation method built on two components: the construction of a Dynamically Attentive Viewport Sequence (DAVS) from ODIs and the extraction of Quality-Aware Features (QAFs) from the DAVS. The proposed method is named Spatiotemporal Scene Statistics of Dynamically Attentive Viewport Sequence (S<sup>3</sup>DAVS) for short. The construction of the DAVS builds a sequence of viewports that are likely to be explored by viewers, based on the prediction of the visual scanpath traced while viewers freely explore the ODI <italic>via</italic> HMD within the exploration time. A DAVS that contains only global motion can then be obtained by sampling a series of viewports from the ODI along the predicted visual scanpath. As a result, the obtained DAVS can be considered a human viewing behavior-characterized compact representation of the whole ODI, and the subsequent quality evaluation of ODIs is performed merely on the DAVS. The extraction of QAFs yields effective feature representations that are highly discriminative in terms of perceived distortion and visual quality. Finally, a regression model maps the extracted QAFs to a single predicted quality score.</p>
<p>The contributions of this work are twofold. First, we make the first attempt to predict the visual scanpath, from which a DAVS is obtained as a human viewing behavior-characterized compact representation of the whole ODI. Second, we model the spatiotemporal scene statistics by analyzing the 3D-MSCN coefficients and spatiotemporal Gabor response maps of the DAVS, which serve as the QAFs for the quality evaluation of ODIs. In addition, we conduct extensive experiments on two benchmark databases to validate the effectiveness of our proposed S<sup>3</sup>DAVS method.</p>
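<p>To make the 3D-MSCN statistics mentioned above concrete, the following sketch extends the standard mean-subtracted contrast-normalized (MSCN) construction (local mean subtraction followed by divisive normalization) to a spatiotemporal viewport volume. The Gaussian window width and the stabilizing constant <italic>C</italic> are illustrative assumptions, not the exact parameters of our implementation.</p>

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_3d(volume, sigma=1.16, C=1.0):
    """Compute 3D MSCN coefficients for a viewport sequence stacked as a
    (T, H, W) volume: subtract the local spatiotemporal mean and divide by
    the local standard deviation (plus a stabilizing constant C)."""
    volume = volume.astype(np.float64)
    mu = gaussian_filter(volume, sigma)               # local spatiotemporal mean
    var = gaussian_filter(volume ** 2, sigma) - mu ** 2
    sigma_map = np.sqrt(np.clip(var, 0.0, None))      # local standard deviation
    return (volume - mu) / (sigma_map + C)            # divisive normalization
```

<p>Quality-aware features can then be derived, for example, by fitting a generalized Gaussian distribution to the histogram of the resulting coefficients.</p>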
<p>The rest of this article is organized as follows. Section &#x201C;Related works&#x201D; provides a brief review of some representative ODI quality metrics. Section &#x201C;Proposed spatiotemporal scene statistics of dynamically attentive viewport sequence approach&#x201D; details our proposed ODI quality evaluation metric. Section &#x201C;Experimental results&#x201D; presents the experiments and performance comparisons. Section &#x201C;Conclusion&#x201D; concludes the article and discusses future work.</p>
</sec>
<sec id="S2">
<title>Related works</title>
<p>Compared with traditional 2D plane images, ODIs have extremely high resolutions and large data volumes, which directly increase the burden of transmission and storage. Therefore, the visual quality issues of ODIs are much more acute and challenging, calling for dedicated ODI quality evaluation metrics. Over the past decades, image quality evaluation has been widely investigated; methods can be roughly divided into three categories, full-reference (FR), reduced-reference (RR), and no-reference (NR), according to the amount of information they require from the reference (i.e., distortion-free) image (<xref ref-type="bibr" rid="B26">Mittal et al., 2013</xref>; <xref ref-type="bibr" rid="B59">Zhu et al., 2019</xref>; <xref ref-type="bibr" rid="B22">Liu et al., 2021</xref>). Compared with FR/RR methods, which require full/partial reference information, NR methods do not require any information from the reference image and therefore have much wider applicability in practical systems. In this section, we provide a brief review of some representative FR and NR ODI quality metrics in the literature.</p>
<sec id="S2.SS1">
<title>FR-OIQA</title>
<p>There have been many FR image quality assessment (IQA) metrics designed for 2D plane images (<xref ref-type="bibr" rid="B40">Wang et al., 2003</xref>; <xref ref-type="bibr" rid="B55">Zhou et al., 2004</xref>; <xref ref-type="bibr" rid="B30">Sheikh et al., 2006</xref>; <xref ref-type="bibr" rid="B52">Zhang et al., 2011</xref>; <xref ref-type="bibr" rid="B45">Xue et al., 2014</xref>). However, existing 2D FR-IQA metrics cannot be directly applied to ODIs because the geometric distortions induced by projection would be wrongly treated and evaluated. Therefore, previous research efforts mostly concentrate on adapting existing 2D FR-IQA metrics by removing the influence of the projection-related geometric distortions. <xref ref-type="bibr" rid="B48">Yu et al. (2015)</xref> proposed a spherical PSNR (S-PSNR) metric that improves traditional PSNR by uniformly sampling pixel locations on the sphere, retrieving the corresponding pixels from the reference and distorted images through the mapping between spherical and 2D coordinates, and then computing the error between the two sets of pixels. <xref ref-type="bibr" rid="B49">Zakharchenko et al. (2017)</xref> introduced a Craster parabolic projection PSNR (CPP-PSNR) method that projects the reference and distorted images into a shared CPP-format domain before calculating the PSNR; its drawback is that the interpolation involved may introduce errors and decrease accuracy. <xref ref-type="bibr" rid="B36">Sun et al. (2017)</xref> advocated calculating the weighted-to-spherically uniform PSNR (WS-PSNR) directly on the 2D format of the omnidirectional image, which avoids the negative effect of interpolation in CPP-PSNR. However, PSNR is not highly correlated with human perception (<xref ref-type="bibr" rid="B11">Hor&#x00E9; and Ziou, 2010</xref>). Inspired by the good consistency of the structural similarity (SSIM) metric with subjective perception in 2D-IQA, <xref ref-type="bibr" rid="B58">Zhou et al. (2018)</xref> introduced a weighted-to-spherically uniform SSIM metric for FR-OIQA. Besides, some studies exploit information from other dimensions to improve performance; for example, the phase consistency-guided method (<xref ref-type="bibr" rid="B41">Xia et al., 2021</xref>) utilizes the abundant structure and texture features carried by the high-order phase information of ODIs. Recently, with the development of deep learning, <xref ref-type="bibr" rid="B14">Kim et al. (2020)</xref> constructed a novel FR-OIQA model that learns visual and positional features under the guidance of human perception through adversarial learning.</p>
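<p>To make the weighted-to-spherically uniform idea concrete, the sketch below computes a WS-PSNR-style score on ERP images by weighting each row with the cosine of its latitude, which compensates for the oversampling of polar regions in the equirectangular format. It is a minimal illustration of the principle, assuming 8-bit images, rather than the reference implementation of Sun et al. (2017).</p>

```python
import numpy as np

def ws_psnr(ref, dist, max_val=255.0):
    """Weighted-to-spherically uniform PSNR for equirectangular (ERP) images.
    Each pixel row is weighted by cos(latitude) so that the squared error is
    accumulated approximately uniformly over the sphere."""
    ref = ref.astype(np.float64)
    dist = dist.astype(np.float64)
    h, w = ref.shape[:2]
    rows = np.arange(h)
    weights = np.cos((rows + 0.5 - h / 2.0) * np.pi / h)  # cos(latitude) per row
    wmap = np.broadcast_to(weights[:, None], (h, w))
    if ref.ndim == 3:                                      # spread over channels
        wmap = wmap[..., None]
    wmse = np.sum(wmap * (ref - dist) ** 2) / np.sum(wmap * np.ones_like(ref))
    return 10.0 * np.log10(max_val ** 2 / wmse)
```

<p>For a spatially uniform error, the weighting cancels and WS-PSNR coincides with ordinary PSNR; the two diverge when distortion is concentrated near the poles.</p>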
</sec>
<sec id="S2.SS2">
<title>NR-OIQA</title>
<p>The limitation of FR-OIQA metrics is their dependence on reference images, which may be unavailable in practical applications. Therefore, NR-IQA metrics that do not require any reference information are desired (<xref ref-type="bibr" rid="B9">Gu et al., 2015</xref>). In the literature, several NR-IQA metrics for 2D images have been proposed (<xref ref-type="bibr" rid="B27">Moorthy and Bovik, 2011</xref>; <xref ref-type="bibr" rid="B23">Min et al., 2018</xref>; <xref ref-type="bibr" rid="B46">Yan et al., 2019</xref>). However, as demonstrated in <xref ref-type="bibr" rid="B43">Xu J. et al. (2020)</xref>, NR-OIQA involves new challenges, such as stitching artifacts, sphere representation, and a wide field of view (FoV), so a reliable no-reference quality metric for ODIs is greatly needed. Several algorithms have been proposed for the NR-IQA of ODIs. <xref ref-type="bibr" rid="B56">Zhou W. et al. (2022)</xref> captured multi-frequency information by decomposing the projected ERP maps into multiple sub-band images. However, processing the entire ODI is not necessary, because salient regions in ODIs are mainly located near the equator (<xref ref-type="bibr" rid="B32">Sitzmann et al., 2018</xref>). Therefore, the idea of extracting information from viewport-based images was proposed and has gained broad acceptance. <xref ref-type="bibr" rid="B35">Sun et al. (2019)</xref> projected each ODI into six viewport images and then proposed a multi-channel CNN framework consisting of six parallel ResNet34 branches. However, the way viewport-based images are obtained should be guided by the behavior of humans when they view omnidirectional images through an HMD (<xref ref-type="bibr" rid="B17">Li et al., 2019</xref>; <xref ref-type="bibr" rid="B42">Xu et al., 2019</xref>; <xref ref-type="bibr" rid="B13">Jiang et al., 2021</xref>; <xref ref-type="bibr" rid="B61">Zou et al., 2021</xref>). <xref ref-type="bibr" rid="B43">Xu J. et al. (2020)</xref> noticed this problem and constructed a viewport-oriented graph convolutional network (VGCN) to address the perceptual quality assessment of ODIs.</p>
</sec>
</sec>
<sec id="S3">
<title>Proposed spatiotemporal scene statistics of dynamically attentive viewport sequence approach</title>
<p>The framework of our proposed method is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. It involves three modules: the DAVS generator, the QAF extractor, and the image quality regressor. The DAVS, composed of a series of viewports sampled from the ODI along the scanpath, can be considered a human viewing behavior-characterized compact representation of the whole ODI. Specifically, part (a) is an unprocessed ERP image from the OIQA database, and part (b) shows the spherical-format ODI converted from it. The predicted scanpath is shown in part (c). We obtain the center location of each viewport on the spherical ODI by coordinate mapping from the precomputed locations in the ERP-format ODI provided in part (c), specify the FoV value to set the viewport size, and finally crop the complete viewport content from part (b); all viewport contents are shown in part (f). In the QAF extractor, the image sequence is generated by arranging the viewport images in their scanpath order, and QAF extraction is performed on the DAVS to obtain feature representations that are highly discriminative in terms of visual quality. Finally, the extracted QAFs are used to predict the quality score <italic>via</italic> a learned regression model.</p>
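<p>The coordinate mapping used to crop a viewport from the ERP image can be sketched as a gnomonic (rectilinear) projection: each viewport pixel is cast as a ray, rotated toward the viewport center, and converted back to ERP coordinates for sampling. The axis and sign conventions, the nearest-neighbor lookup, and the default FoV below are simplifying assumptions for illustration.</p>

```python
import numpy as np

def extract_viewport(erp, lon_c, lat_c, fov=np.pi / 2, size=128):
    """Sample a square viewport of the given FoV from an ERP image, centered
    at (lon_c, lat_c) in radians, via gnomonic projection."""
    h, w = erp.shape[:2]
    f = (size / 2.0) / np.tan(fov / 2.0)             # focal length in pixels
    u, v = np.meshgrid(np.arange(size) - size / 2.0 + 0.5,
                       np.arange(size) - size / 2.0 + 0.5)
    # Direction of each viewport pixel in camera coordinates (z forward).
    d = np.stack([u, -v, np.full_like(u, f)], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    # Rotate the camera so that +z points at the viewport center.
    sl, cl = np.sin(lat_c), np.cos(lat_c)
    so, co = np.sin(lon_c), np.cos(lon_c)
    rot_lat = np.array([[1, 0, 0], [0, cl, -sl], [0, sl, cl]])
    rot_lon = np.array([[co, 0, so], [0, 1, 0], [-so, 0, co]])
    d = d @ (rot_lon @ rot_lat).T
    lon = np.arctan2(d[..., 0], d[..., 2])           # longitude of each ray
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))   # latitude of each ray
    # Map (lon, lat) back to ERP pixel coordinates and sample.
    x = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    y = ((0.5 - lat / np.pi) * h).astype(int).clip(0, h - 1)
    return erp[y, x]
```

<p>Repeating this sampling at each fixation along the predicted scanpath yields the viewport sequence that forms the DAVS.</p>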
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>The overall framework of our proposed S<sup>3</sup>DAVS method. It contains the DAVS generator, QAF extractor, and image quality regressor. The viewport sequence produced by the DAVS generator is fed into the QAF extractor, and the image quality regressor then predicts the final quality score from the features produced by the QAF extractor. Source for the photos in this figure: the open-source OIQA database.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-1022041-g001.tif"/>
</fig>
<sec id="S3.SS1">
<title>Dynamically attentive viewport sequence construction</title>
<p><italic>1) Scanpath prediction:</italic> <xref ref-type="bibr" rid="B32">Sitzmann et al. (2018)</xref> showed that humans mainly attend to salient regions located near the equator of ODIs. Accordingly, the original intention of our omnidirectional image quality assessment algorithm is to simulate human behavior when observing an ODI through an HMD (<xref ref-type="bibr" rid="B60">Zhu et al., 2020</xref>). Moreover, different visual scanpath trajectories produce different DAVS contents and thus affect the perceptual quality assessment. Based on these two considerations, the key to DAVS construction is accurately predicting the scanpath of viewers as they explore the virtual scene <italic>via</italic> HMD. Fixations are determined by visual attention shift, a phenomenon reflecting the temporal dynamics of human visual attention (<xref ref-type="bibr" rid="B47">Yang et al., 2021</xref>). Current approaches to scanpath prediction are based on a saliency map that encodes the probability of each location attracting human visual attention (<xref ref-type="bibr" rid="B19">Ling et al., 2018</xref>). However, these approaches fail to characterize the temporal dynamics of visual attention. Recently, it has been revealed that the law of gravitation can well explain the mechanism behind the dynamic process of attention shift when exploring a visual scene. Inspired by this, we propose a gravitational model-based visual scanpath prediction approach for ODIs. Given an ODI as input, the model yields a continuous function describing the trajectory of fixations over the exploration time.</p>
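<p>The dynamic just described can be illustrated with a toy simulation: gradient magnitude plays the role of the virtual mass, the attention point follows the gravitational field generated by all masses, and an inhibition-of-return map suppresses already-visited regions so that exploration continues. The step size, inhibition radius, and equatorial starting point below are illustrative assumptions; the full model also includes the optical-flow mass term.</p>

```python
import numpy as np

def gravitational_scanpath(gray, n_steps=8, step=4.0, inhib_radius=10.0):
    """Toy scanpath predictor: pixels carry 'mass' proportional to gradient
    magnitude; the attention point moves along the resulting gravitational
    field, and visited regions are inhibited to force exploration."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mass = np.hypot(gx, gy)                      # virtual mass from details
    h, w = mass.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    inhib = np.ones_like(mass)                   # 1.0 = not yet explored
    a = np.array([h / 2.0, w / 2.0])             # start at the equator center
    path = [tuple(a)]
    for _ in range(n_steps):
        dy, dx = ys - a[0], xs - a[1]
        dist2 = dy ** 2 + dx ** 2 + 1e-6
        m = mass * inhib
        # Field ~ sum of mass * (x - a) / ||x - a||^2  (cf. Eq. 3).
        fy, fx = np.sum(m * dy / dist2), np.sum(m * dx / dist2)
        norm = np.hypot(fy, fx) + 1e-12
        a = a + step * np.array([fy, fx]) / norm  # move along the field
        a[0] = np.clip(a[0], 0, h - 1)
        a[1] = np.clip(a[1], 0, w - 1)
        # Inhibition of return: damp the mass near the new fixation.
        inhib *= 1.0 - np.exp(-((ys - a[0]) ** 2 + (xs - a[1]) ** 2)
                              / (2.0 * inhib_radius ** 2))
        path.append(tuple(a))
    return np.array(path)
```

<p>On an image whose detail is concentrated in one region, the simulated attention point is first pulled toward that region and then, as inhibition accumulates, drifts on to unexplored areas.</p>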
<p>We regard the process by which humans attend to interesting areas as a field effect within the HMD. It is therefore necessary to define a virtual mass associated with the ODI content and a virtual field over the ODI, so as to relate the content to fixations. Here, we employ the gravitational model proposed by <xref ref-type="bibr" rid="B50">Zanca and Gori (2017)</xref> and <xref ref-type="bibr" rid="B51">Zanca et al. (2020)</xref>. The ODI content noticed by humans should contain abundant details and motion information: details contribute mass proportional to the gradient magnitude, while motion contributes mass proportional to the optical flow, which mainly arises when humans shift to the next fixation. If the virtual mass &#x03BC;, which follows a certain distribution, aggregates all fixations during the exploration and degenerates to a single point mass concentrated at <italic>x</italic>, it can be expressed as &#x03BC;(<italic>y</italic>,<italic>t</italic>) = &#x03B4;(<italic>y</italic>&#x2212;<italic>x</italic>), <italic>y</italic> = <italic>a</italic>(<italic>t</italic>), where <italic>a</italic>(<italic>t</italic>) represents the focus of attention at time <italic>t</italic>. The mass &#x03BC;(<italic>y</italic>,<italic>t</italic>) consists of two components: the brightness gradient &#x03BC;<sub><italic>1</italic></sub> and the optical flow &#x03BC;<sub><italic>2</italic></sub>. The field acts on this virtual mass, and the Green function (<italic>G</italic>) relates the field to its corresponding mass. The scanpath can thus be associated with the attention <italic>a</italic>(<italic>t</italic>) through the potential,</p>
<disp-formula id="S3.E1">
<label>(1)</label>
<mml:math id="M1">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi mathvariant="normal">&#x03C0;</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mi>log</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo fence="true">||</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo fence="true">||</mml:mo>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>Then the gravitational field <italic>e</italic> at time <italic>t</italic> can be written as:</p>
<disp-formula id="S3.E2">
<label>(2)</label>
<mml:math id="M2">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi mathvariant="normal">&#x03C0;</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>&#x2062;</mml:mo>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo fence="true" maxsize="142%" minsize="142%">||</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo fence="true" maxsize="142%" minsize="142%">||</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mfrac>
</mml:mstyle>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>where Equation (2) shows that the strength of the field is inversely proportional to the distance between <italic>a</italic>(<italic>t</italic>) and <italic>x</italic>. The overall field is then given by,</p>
<disp-formula id="S3.E3">
<label>(3)</label>
<mml:math id="M3">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi mathvariant="normal">&#x03C0;</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="false">
<mml:msub>
<mml:mo mathsize="90%" stretchy="false">&#x222B;</mml:mo>
<mml:mi>R</mml:mi>
</mml:msub>
</mml:mstyle>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>-</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo fence="true" maxsize="142%" minsize="142%">||</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>-</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo fence="true" maxsize="142%" minsize="142%">||</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mfrac>
</mml:mstyle>
<mml:mo>&#x2062;</mml:mo>
<mml:mi mathvariant="normal">&#x03BC;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<disp-formula id="S3.E4">
<label>(4)</label>
<mml:math id="M4">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mi>E</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mpadded width="+1.3pt">
<mml:mi>e</mml:mi>
</mml:mpadded>
<mml:mo rspace="3.8pt">&#x002A;</mml:mo>
<mml:mi mathvariant="normal">&#x03BC;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>where Equation (3) can be rewritten as Equation (4) using a convolution operation. It depicts the link between the virtual mass in the field and the attention of viewers. The subscript <italic>R</italic> represents the overall field. However, because the method first explores the most attentive regions, it may be hindered from searching unexplored locations and completing a full exploration of the scene. Therefore, the shift of attention needs to be triggered by an additional mechanism. Zanca proposed the inhibitory function <italic>I</italic>(<italic>t</italic>), which returns 0 for pixels that have not yet been explored and 1 for pixels that have already been explored. Then the focus of attention <italic>a</italic>(<italic>t</italic>) can finally be represented as follows:</p>
<disp-formula id="S3.E5">
<label>(5)</label>
<mml:math id="M5">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>I</mml:mi>
<mml:mo>.</mml:mo>
</mml:mover>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mi>I</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<disp-formula id="S3.E6">
<label>(6)</label>
<mml:math id="M6">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>a</mml:mi>
<mml:mo>.</mml:mo>
</mml:mover>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mi>z</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<disp-formula id="S3.E7">
<label>(7)</label>
<mml:math id="M7">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>z</mml:mi>
<mml:mo>.</mml:mo>
</mml:mover>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x03BB;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>z</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mpadded width="+1.3pt">
<mml:mi>e</mml:mi>
</mml:mpadded>
<mml:mo rspace="1.8pt">&#x002A;</mml:mo>
<mml:mi mathvariant="normal">&#x03BC;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>Here, <inline-formula><mml:math id="INEQ11"><mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>u</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:msup><mml:mi>u</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x2062;</mml:mo><mml:msup><mml:mi mathvariant="normal">&#x03B4;</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and 0 &#x003C; &#x03B2; &#x003C; 1. The damping term <inline-formula><mml:math id="INEQ12"><mml:mrow><mml:mover accent="true"><mml:mi>a</mml:mi><mml:mo>.</mml:mo></mml:mover><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> prevents strong oscillations and makes the overall dynamics closer to a human scanpath. <xref ref-type="fig" rid="F2">Figure 2</xref> shows the scanpath predictions for four images in the OIQA database. From these predictions, we obtain the specific coordinates of the viewport centers, which are utilized in the following part.</p>
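The gravitational attention dynamics of Equations (5)&#x2013;(7) can be sketched as a simple Euler integration. The following is a minimal, illustrative 1-D simulation, not the authors' implementation: the convolution force term (<italic>e</italic> &#x002A; &#x03BC;) is approximated by an instantaneous gravitational pull from the residual mass &#x03BC; &#x2212; <italic>I</italic>, and all function and parameter names (simulate_scanpath, sigma_g, the step size dt) are our own.

```python
import numpy as np

def simulate_scanpath(mu, beta=0.1, lam=0.5, sigma_g=5.0, dt=0.1, steps=200):
    """Euler integration of the gravitational attention dynamics (Eqs. 5-7).

    mu: 1-D array, the virtual-mass field over pixel positions.
    Returns the sequence of fixation positions a(t).
    """
    n = len(mu)
    xs = np.arange(n, dtype=float)
    I = np.zeros(n)           # inhibition-of-return map I(t)
    a = float(np.argmax(mu))  # start at the most massive location
    z = 0.0                   # velocity z(t); Eq. (6): a'(t) = z(t)
    path = [a]
    for _ in range(steps):
        # Gravitational pull toward the residual (not-yet-inhibited) mass,
        # a crude stand-in for the -(e * mu) convolution term of Eq. (7)
        d = xs - a
        dist2 = d * d + 1e-6
        force = np.sum(d / dist2 * np.clip(mu - I, 0.0, None))
        # Eq. (7): z'(t) = -lambda * z(t) + attraction
        z += dt * (-lam * z + force)
        # Eq. (6): move the focus of attention, clipped to the image
        a = float(np.clip(a + dt * z, 0, n - 1))
        # Eq. (5): I'(t) = beta * (g(x - a(t)) - I(t))
        g = np.exp(-(xs - a) ** 2 / (2 * sigma_g ** 2))
        I += dt * beta * (g - I)
        path.append(a)
    return np.asarray(path)
```

The inhibition map grows around visited locations, so the simulated focus eventually drifts away from already-explored regions, mimicking the exploration behavior described above.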
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Examples of scanpath prediction results. The red squares in each image denote fixation locations based on which the viewport images are sampled. Source for the photos in this figure: open source OIQA Database.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-1022041-g002.tif"/>
</fig>
<p><italic>2) Scanpath-based Viewport Sampling:</italic> Based on the predicted scanpath data, we can obtain a series of viewport images from the ODI by sampling along the trajectory of the scanpath. The precomputed locations, denoted by {<italic>s</italic><sub>1</sub>,<italic>s</italic><sub>2</sub>,&#x2026;,<italic>s</italic><sub><italic>t</italic></sub>}, are human fixations listed along the timeline, but they only represent the centers of attentive regions on the ERP ODIs. Each <italic>s</italic><sub><italic>t</italic></sub> includes <italic>s</italic><sub><italic>t,x</italic></sub> and <italic>s</italic><sub><italic>t,y</italic></sub>, where (<italic>s</italic><sub><italic>t,x</italic></sub>, <italic>s</italic><sub><italic>t,y</italic></sub>) correspond to the width and height coordinates on the ERP image. To obtain the realistic viewport content from the spherical scene, we first map those locations onto a unit sphere and define the resulting coordinates as the predicted scanpath, written as {<italic>p</italic><sub>1</sub>,<italic>p</italic><sub>2</sub>,&#x2026;, <italic>p</italic><sub><italic>t</italic></sub>}, where the shared subscript of <italic>s</italic><sub><italic>t</italic></sub> and <italic>p</italic><sub><italic>t</italic></sub> records the exploration time and keeps the correspondence between locations on the ERP image and locations on the spherical image. Specifically, given that the current attentive location in the ERP image is <italic>s</italic><sub><italic>t</italic></sub>, we calculate its spherical location <italic>p</italic><sub><italic>t</italic></sub> and consider it to be the actual fixation. Then, based on the theory of near peripheral vision (<xref ref-type="bibr" rid="B2">Besharse and Bok, 2011</xref>), we set the field of view (FoV) to [-&#x03C0;/6, &#x03C0;/6], after which the scale of the viewport content can be determined (<xref ref-type="bibr" rid="B33">Sui et al., 2022</xref>). 
Finally, bicubic interpolation is used to sample each viewport image, and the sequence is constructed in temporal order. <xref ref-type="fig" rid="F3">Figure 3</xref> shows how a viewer perceives spherical content with an HMD at time <italic>t</italic>. In spherical coordinates, the viewing direction is mainly determined by a rotation matrix (<xref ref-type="bibr" rid="B29">Rai et al., 2017</xref>), and the location transformation is calculated in Cartesian coordinates. <xref ref-type="fig" rid="F4">Figure 4</xref> shows partial viewport content from the DAVS: (A) is an original distorted ERP image in the database, and (B) displays examples of different viewports in the DAVS. The overlapping viewport contents illustrate the dynamically attentive process by which humans view the ODI from one attentive point to the next; they finally compose a sequence that captures the dynamic spatiotemporal information, which facilitates spatiotemporal feature extraction from the DAVS. Comparing the content of these viewports, some are rich in luminance information while others are not. This phenomenon reflects the fact that the gravitational model predicts the scanpath directly rather than depending on a saliency map, so the predicted scanpath is more accurate than that of common saliency-based methods.</p>
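The scanpath-based viewport sampling step can be sketched as follows. This is an illustrative implementation under our own assumptions: the ERP-to-sphere mapping uses standard equirectangular conventions, the viewport is extracted by an inverse gnomonic projection with the total FoV of &#x03C0;/3 stated above, and a nearest-neighbour lookup stands in for the bicubic interpolation used in the paper. Names such as erp_to_sphere and sample_viewport are hypothetical.

```python
import numpy as np

def erp_to_sphere(sx, sy, W, H):
    """Map an ERP pixel fixation (sx, sy) to spherical (longitude, latitude)."""
    lon = (sx / W - 0.5) * 2 * np.pi   # longitude in [-pi, pi]
    lat = (0.5 - sy / H) * np.pi       # latitude in [-pi/2, pi/2]
    return lon, lat

def sample_viewport(erp, lon0, lat0, fov=np.pi / 3, size=224):
    """Sample a viewport centred at (lon0, lat0) via inverse gnomonic projection.

    Nearest-neighbour lookup is used here for brevity; the paper
    uses bicubic interpolation.
    """
    H, W = erp.shape[:2]
    half = np.tan(fov / 2)
    u = np.linspace(-half, half, size)
    xg, yg = np.meshgrid(u, -u)            # tangent-plane coordinates
    rho = np.sqrt(xg ** 2 + yg ** 2)
    c = np.arctan(rho)                     # angular distance from the centre
    rho = np.where(rho == 0, 1e-12, rho)   # guard the division below
    lat = np.arcsin(np.cos(c) * np.sin(lat0)
                    + yg * np.sin(c) * np.cos(lat0) / rho)
    lon = lon0 + np.arctan2(
        xg * np.sin(c),
        rho * np.cos(lat0) * np.cos(c) - yg * np.sin(lat0) * np.sin(c))
    # Back to ERP pixel indices (wrap longitude, clip latitude)
    sx = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    sy = np.clip(((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
    return erp[sy, sx]
```

Applying sample_viewport at each predicted fixation, in temporal order, yields the viewport sequence that constitutes the DAVS.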
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>The illustration of the spherical scene. The viewer with HMD stands at the center of the sphere. The location of attention can be described with both longitude and latitude.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-1022041-g003.tif"/>
</fig>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Visualization of viewports in DAVS obtained by the scanpath-based viewport sampling. <bold>(A)</bold> ODI with the ERP format and <bold>(B)</bold> examples of different viewports in DAVS. Source for the photos in this figure: open source OIQA Database.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-1022041-g004.tif"/>
</fig>
</sec>
<sec id="S3.SS2">
<title>Quality-aware feature extraction and quality prediction</title>
<p>As stated, the built DAVS can be considered a compact representation of the whole ODI characterized by human viewing behavior. The remaining issue is how to evaluate the quality of an ODI based on the DAVS. The most important step of quality evaluation is to extract effective QAFs that are highly descriptive of the distortion level (i.e., the visual quality level). Since the DAVS is a sequence of viewports sampled from the ODI along the scanpath, we can naturally treat it as a pseudo video with only global camera motion, in which each viewport corresponds to a specific frame. Just like a natural video, the DAVS contains information regarding the spatial variations in pixels along with the dynamic motion in successive frames (<xref ref-type="bibr" rid="B25">Mittal et al., 2016</xref>). Thus, QAF extraction from the DAVS should account for the characteristics of the spatiotemporal domain. In this work, we propose to model the spatiotemporal scene statistics of the DAVS and use them as the QAFs of the corresponding ODI, based on which quality prediction is performed.</p>
<p><italic>1) Spatiotemporal MSCN Sequence:</italic> It has been widely demonstrated that the local mean subtracted contrast normalized (MSCN) coefficients of a pristine natural image can be well modeled by a Gaussian distribution (<xref ref-type="bibr" rid="B24">Mittal et al., 2012</xref>). However, when a natural image suffers distortions, the distribution of its MSCN coefficients deviates from the original distribution (<xref ref-type="bibr" rid="B24">Mittal et al., 2012</xref>). These distributions can be modeled by either a generalized Gaussian distribution (GGD) or an asymmetric GGD (AGGD). The parameters of the GGD and AGGD can be used as the QAFs of natural images and have achieved great success in the design of NR-IQA metrics over the past decade. The main reason for working on MSCN images is that they decorrelate local pixel dependencies. Similarly, the DAVS, which records the dynamic visual content seen by viewers along the scanpath, has a high correlation among neighboring pixels in both the spatial and temporal domains. To decorrelate such local dependency in the DAVS, we propose to use the spatiotemporal MSCN. Specifically, let us denote the DAVS as <italic>V</italic>(<italic>x</italic>,<italic>y</italic>,<italic>t</italic>); the spatiotemporal MSCN coefficients of <italic>V</italic>(<italic>x</italic>,<italic>y</italic>,<italic>t</italic>) are calculated as follows:</p>
<disp-formula id="S3.E8">
<label>(8)</label>
<mml:math id="M8">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">^</mml:mo>
</mml:mover>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x03BC;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">&#x03C3;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>where <italic>x</italic> and <italic>y</italic> are the spatial indices and <italic>t</italic> is the temporal index. A small constant <italic>c</italic> is imposed on the denominator to avoid instability when &#x03C3;(<italic>x</italic>,<italic>y</italic>,<italic>t</italic>) approaches zero; we empirically set <italic>c</italic> = 1 in this work. &#x03BC;(<italic>x</italic>,<italic>y</italic>,<italic>t</italic>) and &#x03C3;(<italic>x</italic>,<italic>y</italic>,<italic>t</italic>) are the mean and standard deviation, respectively, which are defined as follows:</p>
<disp-formula id="S3.E9">
<label>(9)</label>
<mml:math id="M9">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">&#x03BC;</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="false">
<mml:munderover>
<mml:mo movablelimits="false">&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mi>J</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mi>J</mml:mi>
</mml:munderover>
</mml:mstyle>
<mml:mrow>
<mml:mstyle displaystyle="false">
<mml:munderover>
<mml:mo movablelimits="false">&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mi>K</mml:mi>
</mml:munderover>
</mml:mstyle>
<mml:mrow>
<mml:mstyle displaystyle="false">
<mml:munderover>
<mml:mo movablelimits="false">&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mi>L</mml:mi>
</mml:munderover>
</mml:mstyle>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x03C9;</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<disp-formula id="S3.E10">
<label>(10)</label>
<mml:math id="M10">
<mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">&#x03C3;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi/>
</mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:munderover>
<mml:mo movablelimits="false">&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mi>J</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mi>J</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:munderover>
<mml:mo movablelimits="false">&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mi>K</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:munderover>
<mml:mo movablelimits="false">&#x2211;</mml:mo>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mi>L</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x03C9;</mml:mi>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mi>V</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">&#x03BC;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">]</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:msqrt>
</mml:math>
</disp-formula>
<p>where &#x03C9; is a symmetric normalized 3D Gaussian weighting function with zero mean and standard deviation of 1.166. According to <xref ref-type="bibr" rid="B5">Dendi and Channappayya (2020)</xref>, we set <italic>J</italic> = <italic>K</italic> = <italic>L</italic> = 2.</p>
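Equations (8)&#x2013;(10) can be sketched directly in NumPy. This is a minimal illustration, not the authors' code: the viewport sequence is stored as V[t, y, x], the window is the 5 &#x00D7; 5 &#x00D7; 5 Gaussian with standard deviation 1.166 described above, and a circular (np.roll) boundary is assumed for brevity; since the window is symmetric, the sign convention of the shifts does not affect the result.

```python
import numpy as np

def spatiotemporal_mscn(V, J=2, K=2, L=2, sigma_w=1.166, c=1.0):
    """Spatiotemporal MSCN coefficients of a viewport sequence V[t, y, x].

    Implements Eqs. (8)-(10) with a symmetric, normalised 3-D Gaussian
    window and a circular boundary.
    """
    # Symmetric, normalised 3-D Gaussian weights w_{j,k,l}
    j = np.arange(-J, J + 1)
    k = np.arange(-K, K + 1)
    l = np.arange(-L, L + 1)
    jj, kk, ll = np.meshgrid(j, k, l, indexing="ij")
    w = np.exp(-(jj ** 2 + kk ** 2 + ll ** 2) / (2 * sigma_w ** 2))
    w /= w.sum()
    # Eq. (9): local weighted mean mu(x, y, t)
    mu = np.zeros_like(V, dtype=float)
    for dj, dk, dl, wv in zip(jj.ravel(), kk.ravel(), ll.ravel(), w.ravel()):
        mu += wv * np.roll(V, shift=(dl, dk, dj), axis=(0, 1, 2))
    # Eq. (10): local weighted standard deviation sigma(x, y, t)
    var = np.zeros_like(V, dtype=float)
    for dj, dk, dl, wv in zip(jj.ravel(), kk.ravel(), ll.ravel(), w.ravel()):
        var += wv * (np.roll(V, shift=(dl, dk, dj), axis=(0, 1, 2)) - mu) ** 2
    sigma = np.sqrt(var)
    # Eq. (8): divisive normalisation with stabilising constant c = 1
    return (V - mu) / (sigma + c)
```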
<p><italic>2) Spatiotemporal Gabor Filter Response Sequence:</italic> It has been hypothesized that the HVS employs spatiotemporal bandpass filters to analyze and process dynamic visual signals (<xref ref-type="bibr" rid="B31">Simoncelli and Olshausen, 2001</xref>). Our approach is motivated by the hypothesis that spatiotemporal Gabor filters are a good approximation of the bandpass behavior of the HVS (<xref ref-type="bibr" rid="B28">Petkov and Subramanian, 2007</xref>; <xref ref-type="bibr" rid="B38">Tu et al., 2020</xref>; <xref ref-type="bibr" rid="B8">Gotz-Hahn et al., 2021</xref>). Spatiotemporal Gabor filters combine information over space and time, which makes them a suitable model for feature analysis of the DAVS. Mathematically, the spatiotemporal Gabor filter is defined as the product of three factors: a Gaussian envelope that limits the spatial extent, a cosine wave moving with phase speed <italic>v</italic> in the &#x03B8; direction, and a Gaussian function that determines the decay along time:</p>
<disp-formula id="S3.Ex2">
<label>(11)</label>
<mml:math id="M11">
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x03D5;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi/>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mi mathvariant="normal">&#x03B3;</mml:mi>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>&#x03C0;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x03C3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">&#x00AF;</mml:mo>
</mml:mover>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="normal">&#x03B3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>&#x2062;</mml:mo>
<mml:msup>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo stretchy="false">&#x00AF;</mml:mo>
</mml:mover>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x2062;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x03C3;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
<mml:mo rspace="5.3pt">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="S3.Ex3">
<mml:math id="M12">
<mml:mrow>
<mml:mi/>
<mml:mo>&#x22C5;</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi>cos</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi mathvariant="normal">&#x03C0;</mml:mi>
</mml:mrow>
<mml:mi mathvariant="normal">&#x03BB;</mml:mi>
</mml:mfrac>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">&#x00AF;</mml:mo>
</mml:mover>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="italic">vt</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>&#x03C6;</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x22C5;</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi mathvariant="normal">&#x03C0;</mml:mi>
</mml:mrow>
</mml:msqrt>
<mml:mi mathvariant="normal">&#x03C4;</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mfrac>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03BC;</mml:mi>
<mml:mi>t</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x2062;</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">&#x03C4;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where <inline-formula><mml:math id="INEQ22"><mml:mrow><mml:mrow><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi mathvariant="normal">&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi mathvariant="normal">&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi mathvariant="normal">&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>o</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>s</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi mathvariant="normal">&#x03B8;</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>. 
&#x03B3; is the ratio that specifies the ellipticity of the Gaussian envelope in the spatial domain and is set to &#x03B3; = 0.5 to match the receptive field elongated along the <inline-formula><mml:math id="INEQ25"><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover></mml:math></inline-formula> axis. &#x03C3; is the standard deviation of the Gaussian envelope and determines the size of the receptive field. <italic>v</italic> is the phase speed of the cosine factor, which determines the speed of motion. In addition, the speed at which the center of the spatial Gaussian moves along the <inline-formula><mml:math id="INEQ26"><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover></mml:math></inline-formula> axis is specified by the parameter <italic>v</italic><sub><italic>c</italic></sub>; in our implementation, we simply set <italic>v</italic><sub><italic>c</italic></sub> = <italic>v</italic>. &#x03BB; is the wavelength of the cosine wave, obtained through the relation <inline-formula><mml:math id="INEQ28"><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x2062;</mml:mo><mml:msqrt><mml:mrow><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:msup><mml:mi>v</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mrow></mml:mrow></mml:math></inline-formula>. &#x03B8; &#x2208; [0,2&#x03C0;] determines the motion direction and the spatial orientation of the filter. The phase offset &#x03C6; &#x2208; [&#x2212;&#x03C0;,&#x03C0;] determines the symmetry in the spatial domain. A Gaussian distribution with mean &#x03BC;<sub><italic>t</italic></sub> = 1.75 and standard deviation &#x03C4; = 2.75 is used to model the decay in intensity along time.</p>
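The spatiotemporal Gabor filter of Equation (11) can be sketched as follows. The kernel grid size (9 &#x00D7; 9 spatially, 8 frames) and &#x03C3; = 2.0 are illustrative choices of ours, while &#x03B3;, &#x03BC;<italic><sub>t</sub></italic>, &#x03C4;, &#x03BB;, and <italic>v</italic><sub><italic>c</italic></sub> = <italic>v</italic> follow the values stated above; the function name st_gabor is hypothetical.

```python
import numpy as np

def st_gabor(v, theta, phi, sigma=2.0, gamma=0.5, mu_t=1.75, tau=2.75,
             half=4, T=8):
    """Spatiotemporal Gabor kernel of Eq. (11) on a (2*half+1)^2 x T grid."""
    lam = 2 * np.sqrt(1 + v ** 2)   # wavelength: lambda = 2 * sqrt(1 + v^2)
    vc = v                          # paper sets v_c = v
    x = np.arange(-half, half + 1)
    y = np.arange(-half, half + 1)
    t = np.arange(T)
    X, Y, Tm = np.meshgrid(x, y, t, indexing="ij")
    # Rotated spatial coordinates x_bar, y_bar
    xb = X * np.cos(theta) + Y * np.sin(theta)
    yb = -X * np.sin(theta) + Y * np.cos(theta)
    # Factor 1: moving Gaussian envelope with ellipticity gamma
    env = (gamma / (2 * np.pi * sigma ** 2)) * np.exp(
        -((xb + vc * Tm) ** 2 + gamma ** 2 * yb ** 2) / (2 * sigma ** 2))
    # Factor 2: cosine wave moving with phase speed v in the theta direction
    carrier = np.cos(2 * np.pi / lam * (xb + v * Tm) + phi)
    # Factor 3: Gaussian decay in time with mean mu_t and std tau
    decay = np.exp(-(Tm - mu_t) ** 2 / (2 * tau ** 2)) / (np.sqrt(2 * np.pi) * tau)
    return env * carrier * decay
```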
<p>With the above-defined spatiotemporal Gabor filter, the bandpass response of a specific spatiotemporal MSCN sequence can be obtained by convolving it with a bank of spatiotemporal Gabor filters:</p>
<disp-formula id="S3.E12">
<label>(12)</label>
<mml:math id="M13">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x03D5;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">^</mml:mo>
</mml:mover>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x002A;</mml:mo>
</mml:msup>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>&#x03C6;</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>where <inline-formula><mml:math id="INEQ33"><mml:mmultiscripts><mml:mrow/><mml:mprescripts/><mml:none/><mml:mo>&#x002A;</mml:mo></mml:mmultiscripts></mml:math></inline-formula> denotes the convolution operation. In this work, we generate the bandpass spatiotemporal Gabor filter bank by varying the values of <italic>v</italic>, &#x03B8;, and &#x03C6;. Specifically, we select three speeds {<italic>v</italic> = 0,1,2}, four orientations {&#x03B8; = 0,&#x03C0;/3,2&#x03C0;/3,&#x03C0;}, and two phase offsets, i.e., &#x03C6; = 0 for the symmetric case and &#x03C6; = &#x03C0;/2 for the anti-symmetric case. As a result, the bank contains 3 &#x00D7; 4 &#x00D7; 2 = 24 spatiotemporal Gabor filters, so we obtain 24 bandpass response sequences for an input DAVS.</p>
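The 24-filter bank and the convolution of Equation (12) can be sketched as follows. The enumeration follows the (<italic>v</italic>, &#x03B8;, &#x03C6;) grid stated above; the FFT-based circular convolution is our own stand-in for direct spatiotemporal filtering, and fft_convolve3d is a hypothetical helper that would be applied to each Eq. (11) kernel and the MSCN sequence.

```python
import numpy as np
from itertools import product

# Parameter grid of the bank: 3 speeds x 4 orientations x 2 phase offsets
speeds = [0, 1, 2]
thetas = [0, np.pi / 3, 2 * np.pi / 3, np.pi]
phases = [0, np.pi / 2]
bank = list(product(speeds, thetas, phases))   # 24 (v, theta, phi) triples

def fft_convolve3d(seq, kernel):
    """Circular 3-D convolution via FFT, a cheap stand-in for Eq. (12).

    seq: spatiotemporal MSCN sequence; kernel: one Gabor kernel of Eq. (11).
    The kernel is zero-padded to the sequence shape before transforming.
    """
    K = np.zeros(seq.shape)
    ks = kernel.shape
    K[:ks[0], :ks[1], :ks[2]] = kernel
    return np.real(np.fft.ifftn(np.fft.fftn(seq) * np.fft.fftn(K)))
```

Iterating fft_convolve3d over all 24 parameter triples of `bank` would yield the 24 bandpass response sequences described above.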
<p><italic>3) AGGD Modeling and Parameter Estimation:</italic> In legacy NR-IQA works, it has been widely demonstrated that the distribution of the coefficients in a 2D MSCN map can be well modeled by a GGD or AGGD. Inspired by this, in this work we also employ the AGGD to model the distributions of the coefficients in the spatiotemporal MSCN sequence and in the bandpass spatiotemporal Gabor filter response sequences. The AGGD is a flexible model that can effectively characterize a large variety of unimodal data with only three parameters. Mathematically, the AGGD model is described as follows:</p>
<disp-formula id="S3.E13">
<label>(13)</label>
<mml:math id="M14">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>;</mml:mo>
<mml:mi mathvariant="normal">&#x03B3;</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mtable displaystyle="true" rowspacing="0pt">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mi mathvariant="normal">&#x03B3;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2062;</mml:mo>
<mml:mi mathvariant="normal">&#x0393;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi mathvariant="normal">&#x03B3;</mml:mi>
</mml:mfrac>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mfrac>
</mml:mstyle>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi mathvariant="normal">&#x03B3;</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="10.8pt">;</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo>&#x2264;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mi mathvariant="normal">&#x03B3;</mml:mi>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>&#x2062;</mml:mo>
<mml:mi mathvariant="normal">&#x0393;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mi mathvariant="normal">&#x03B3;</mml:mi>
</mml:mfrac>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mi>x</mml:mi>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B2;</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
</mml:mfrac>
</mml:mstyle>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mi mathvariant="normal">&#x03B3;</mml:mi>
</mml:msup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="10.8pt">;</mml:mo>
<mml:mrow>
<mml:mo>&#x2200;</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo>&gt;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
<mml:mi/>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>where &#x03B3;,&#x03B2;<sub><italic>l</italic></sub>,&#x03B2;<sub><italic>r</italic></sub> are three parameters controlling the shape of the distribution, and &#x0393;(&#x22C5;) is defined as follows:</p>
<disp-formula id="S3.E14">
<label>(14)</label>
<mml:math id="M15">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">&#x0393;</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="false">
<mml:munderover>
<mml:mo mathsize="90%" movablelimits="false" stretchy="false">&#x222B;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mi mathvariant="normal">&#x221E;</mml:mi>
</mml:munderover>
</mml:mstyle>
<mml:mrow>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2062;</mml:mo>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>&#x2062;</mml:mo>
<mml:mi mathvariant="italic">dt</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo rspace="27.5pt">;</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>&gt;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>We adopt the moment estimation method suggested in <xref ref-type="bibr" rid="B15">Lasmar et al. (2009)</xref> to estimate the three parameters &#x03B3;,&#x03B2;<sub><italic>l</italic></sub>,&#x03B2;<sub><italic>r</italic></sub>. In addition, we compute another parameter &#x03B7; = &#x03B3;/(&#x03B2;<sub><italic>l</italic></sub> + &#x03B2;<sub><italic>r</italic></sub>). Finally, we use the four parameters [&#x03B3;,&#x03B2;<sub><italic>l</italic></sub>,&#x03B2;<sub><italic>r</italic></sub>,&#x03B7;] to represent an AGGD model.</p>
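<p>As an illustration of this estimation step, the following is a minimal sketch in the style of the standard moment-matching AGGD estimator widely used in NR-IQA work; the function and variable names are our own, and the extra parameter &#x03B7; follows the definition given above:</p>

```python
import numpy as np
from scipy.special import gamma as gamma_fn


def estimate_aggd(x):
    """Moment-matching AGGD estimator returning (gamma, beta_l, beta_r, eta)."""
    shapes = np.arange(0.2, 10.0, 0.001)
    # Theoretical moment ratio for each candidate shape parameter.
    r_gam = gamma_fn(2.0 / shapes) ** 2 / (
        gamma_fn(1.0 / shapes) * gamma_fn(3.0 / shapes))

    # Empirical left/right standard deviations of the coefficients.
    left_std = np.sqrt(np.mean(x[x < 0] ** 2))
    right_std = np.sqrt(np.mean(x[x > 0] ** 2))
    gamma_hat = left_std / right_std
    r_hat = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
    r_hat_norm = (r_hat * (gamma_hat ** 3 + 1) * (gamma_hat + 1)
                  / (gamma_hat ** 2 + 1) ** 2)

    # Pick the candidate shape whose theoretical ratio best matches the data.
    shape = shapes[np.argmin((r_gam - r_hat_norm) ** 2)]
    scale = np.sqrt(gamma_fn(1.0 / shape) / gamma_fn(3.0 / shape))
    beta_l = left_std * scale
    beta_r = right_std * scale
    eta = shape / (beta_l + beta_r)  # extra parameter as defined in the text
    return shape, beta_l, beta_r, eta
```

<p>For roughly Gaussian data the estimated shape parameter is close to 2 and &#x03B2;<sub><italic>l</italic></sub> &#x2248; &#x03B2;<sub><italic>r</italic></sub>, which is a quick sanity check on the estimator.</p>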
<p><italic>4) Final Feature Representation:</italic> By applying the AGGD to model the distributions of the coefficients in the spatiotemporal MSCN and bandpass spatiotemporal Gabor filter response sequences, we can obtain a hybrid parameter set to serve as the QAF representation of the DAVS:</p>
<disp-formula id="S3.E15">
<label>(15)</label>
<mml:math id="M16">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">^</mml:mo>
</mml:mover>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>A</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>V</mml:mi>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mover accent="true">
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">^</mml:mo>
</mml:mover>
<mml:mover accent="true">
<mml:mi>V</mml:mi>
<mml:mo stretchy="false">^</mml:mo>
</mml:mover>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mover accent="true">
<mml:mi>f</mml:mi>
<mml:mo stretchy="false">^</mml:mo>
</mml:mover>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x03B8;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">&#x03C6;</mml:mi>
</mml:mrow>
</mml:msub>
</mml:msub>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>where <inline-formula><mml:math id="INEQ43"><mml:msub><mml:mover accent="true"><mml:mi>f</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:mover accent="true"><mml:mi>V</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover></mml:msub></mml:math></inline-formula> is a 12-dimensional feature vector containing the AGGD parameters obtained by applying the AGGD model to the spatiotemporal MSCN sequence at three coarse-to-fine scales (each coarser scale is first processed by a low-pass filter and then down-sampled by a factor of 2). <inline-formula><mml:math id="INEQ44"><mml:msub><mml:mover accent="true"><mml:mi>f</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>v</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">&#x03C6;</mml:mi></mml:mrow></mml:msub></mml:msub></mml:math></inline-formula> is a 288-dimensional feature vector that also contains the AGGD parameters at three coarse-to-fine scales; these are obtained by applying the AGGD model to the bandpass spatiotemporal Gabor filter response sequences over all combinations of <italic>v</italic>, &#x03B8;, and &#x03C6;.</p>
<p><italic>5) Quality Score Regression</italic>: After obtaining the overall feature representation of the DAVS (and thus of the ODI), the remaining issue is how to predict the quality score of an input ODI from the extracted features. Since the subjective quality is provided as a scalar, quality prediction is a typical regression problem from the perspective of machine learning. Therefore, we learn a regression model <italic>via</italic> support vector regression (SVR) with a radial basis function kernel to map the 300-dimensional QAF vector into a single quality score. Once the SVR model has been trained, it can predict the quality of an input ODI in the test stage, taking the corresponding 300-dimensional QAF vector as input.</p>
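<p>A minimal sketch of this training and prediction stage using scikit-learn follows; the arrays are random stand-ins for the 300-dimensional QAF vectors and MOS labels, and the hyperparameter values are illustrative rather than the ones used in the paper:</p>

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Illustrative stand-ins for the 300-dimensional QAF vectors and MOS labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(320, 300))    # one QAF vector per training ODI
y_train = rng.uniform(1, 10, size=320)   # subjective quality scores

# RBF-kernel SVR as described; feature scaling is a common companion step.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, gamma="scale"))
model.fit(X_train, y_train)

X_test = rng.normal(size=(5, 300))
predicted_scores = model.predict(X_test)  # one quality score per test ODI
```

<p>In practice the kernel width and regularization constant would be selected on the training folds rather than fixed as above.</p>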
</sec>
</sec>
<sec id="S4" sec-type="results">
<title>Experimental results</title>
<sec id="S4.SS1">
<title>Experimental protocols</title>
<p>We utilize two publicly available omnidirectional image quality databases in the experiments: the OIQA database (<xref ref-type="bibr" rid="B7">Duan et al., 2018</xref>) and the CVIQD database (<xref ref-type="bibr" rid="B34">Sun et al., 2018</xref>). The OIQA database contains 16 image scenes with four distortion types: JPEG compression (JPEG), JPEG2000 compression (JP2K), Gaussian blur (BLUR), and Gaussian white noise (WN). Each distortion type involves five distortion levels, so the OIQA database includes 16 pristine ODIs and 320 distorted ODIs in total. Subjective rating scores in the form of mean opinion score (MOS) are provided, where a higher score means better visual quality. The CVIQD database consists of 528 ODIs, including 16 pristine images and 512 compressed ODIs. Three popular coding techniques, i.e., JPEG, H.264/AVC, and H.265/HEVC, are applied to simulate compression artifacts. Subjective rating scores in the form of MOS are given in the range of [0, 100], where a higher score means better visual quality.</p>
<p>Our proposed method was run on a computer with a 3.60 GHz Intel Core i7 processor, 64 GB main memory, and an Nvidia GeForce RTX 3090 graphics card. We utilize three criteria to validate the performance of IQA models: Pearson&#x2019;s linear correlation coefficient (PLCC), Spearman&#x2019;s rank-order correlation coefficient (SRCC), and root mean squared error (RMSE). For PLCC and SRCC, higher is better, while for RMSE, lower is better. For a perfect match between predicted scores and ground-truth subjective scores, we would have PLCC = SRCC = 1 and RMSE = 0. Before computing PLCC and RMSE, following the recommendation of the Video Quality Experts Group (VQEG) (<xref ref-type="bibr" rid="B39">Video Quality Experts Group, 2003</xref>), we apply a standard five-parameter logistic function to map the predicted scores, compensating for the non-linearity of the subjective rating scores:</p>
<disp-formula id="S4.E16">
<label>(16)</label>
<mml:math id="M17">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B1;</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
</mml:mstyle>
<mml:mo>-</mml:mo>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mi>exp</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B1;</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B1;</mml:mi>
<mml:mn>3</mml:mn>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B1;</mml:mi>
<mml:mn>4</mml:mn>
</mml:msub>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi mathvariant="normal">&#x03B1;</mml:mi>
<mml:mn>5</mml:mn>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>where <italic>p</italic> represents the predicted score and <italic>s</italic> denotes the mapped score. &#x03B1;<sub><italic>1</italic></sub> to &#x03B1;<sub><italic>5</italic></sub> are the fitting parameters of this function.</p>
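<p>A minimal sketch of fitting this five-parameter logistic mapping with SciPy follows; the predicted scores and MOS values are synthetic stand-ins, and the initial guess is a common heuristic rather than a prescribed choice:</p>

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic5(p, a1, a2, a3, a4, a5):
    """Five-parameter logistic mapping from predicted score p to mapped score s."""
    return a1 * (0.5 - 1.0 / (1.0 + np.exp(a2 * (p - a3)))) + a4 * p + a5


# Synthetic predicted scores and noisy subjective MOS values.
pred = np.linspace(0.0, 1.0, 50)
mos = 80.0 * pred + 10.0 + np.random.default_rng(1).normal(0.0, 1.0, 50)

# Heuristic initial guess: amplitude, slope, midpoint, linear term, offset.
p0 = [np.max(mos), 1.0, float(np.mean(pred)), 1.0, 0.0]
params, _ = curve_fit(logistic5, pred, mos, p0=p0, maxfev=10000)
mapped = logistic5(pred, *params)  # use these mapped scores for PLCC / RMSE
```

<p>PLCC and RMSE are then computed between <code>mapped</code> and the ground-truth MOS, while SRCC is rank-based and unaffected by the monotonic mapping.</p>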
<p>Since our proposed method requires a training process to build the quality prediction model, we describe the training details here. Specifically, we equally divide the whole dataset into five non-overlapping subsets, each containing 20% of the samples. We then apply a five-fold cross-validation strategy to test the model performance: training and testing are repeated five times, each time with four subsets as the training data and the remaining subset as the testing data. After the five rounds, we obtain predicted scores for all samples in the dataset. Finally, the PLCC, SRCC, and RMSE values are calculated between all the predicted scores and the ground-truth subjective scores provided in each dataset.</p>
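<p>The five-fold protocol above, with predictions pooled across folds before computing the three criteria, can be sketched as follows (synthetic data; variable names are illustrative):</p>

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.model_selection import KFold
from sklearn.svm import SVR

# Synthetic stand-ins: QAF vectors and MOS values for the whole database.
rng = np.random.default_rng(0)
X = rng.normal(size=(336, 300))
y = rng.uniform(1, 10, size=336)

# Five non-overlapping folds; each sample is tested exactly once.
pred = np.empty_like(y)
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=0).split(X):
    model = SVR(kernel="rbf").fit(X[train_idx], y[train_idx])
    pred[test_idx] = model.predict(X[test_idx])

# Criteria computed once over the pooled predictions.
plcc = pearsonr(pred, y)[0]
srcc = spearmanr(pred, y)[0]
rmse = np.sqrt(np.mean((pred - y) ** 2))
```

<p>In the paper the non-linear mapping of Eq. (16) would be applied to the pooled predictions before PLCC and RMSE are computed.</p>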
</sec>
<sec id="S4.SS2">
<title>Single-dataset performance comparison</title>
<p>We compare the performance of our proposed S<sup>3</sup>DAVS model with eight representative NR-IQA models, including BRISQUE (<xref ref-type="bibr" rid="B24">Mittal et al., 2012</xref>), DIIVINE (<xref ref-type="bibr" rid="B27">Moorthy and Bovik, 2011</xref>), SSEQ (<xref ref-type="bibr" rid="B20">Liu et al., 2014</xref>), OG-IQA (<xref ref-type="bibr" rid="B21">Liu et al., 2016</xref>), NRSL (<xref ref-type="bibr" rid="B18">Li et al., 2016</xref>), SSP-BOIQA (<xref ref-type="bibr" rid="B54">Zheng et al., 2020</xref>), MC360IQA (<xref ref-type="bibr" rid="B35">Sun et al., 2019</xref>), and the method of <xref ref-type="bibr" rid="B57">Zhou Y. et al. (2022)</xref>. The former five are traditional NR-IQA models developed for 2D natural images, while the latter three are specifically designed for ODIs. Among these three blind omnidirectional image quality assessment (BOIQA) models, MC360IQA (<xref ref-type="bibr" rid="B35">Sun et al., 2019</xref>) and the method of <xref ref-type="bibr" rid="B57">Zhou Y. et al. (2022)</xref> employ deep learning, while SSP-BOIQA (<xref ref-type="bibr" rid="B54">Zheng et al., 2020</xref>) segments the ODI into three regions and extracts weighted features from them. For the training-based models, we re-trained the quality prediction models on each dataset with the same five-fold cross-validation strategy. Specifically, before testing, each dataset is used to generate three sub-datasets at three different scales, with a reduction factor of 1/2 between successive scales, and the features from these sub-datasets are fused into the final feature vectors. Then, in the test session, five-fold cross-validation is used to verify the performance of the model. The performance results in terms of PLCC, SRCC, and RMSE on the OIQA and CVIQD databases are shown in <xref ref-type="table" rid="T1">Tables 1</xref>, <xref ref-type="table" rid="T2">2</xref>, respectively. We highlight the best-performing model in each column. 
From these two tables, we make several observations. First, for both the OIQA and CVIQD databases, our proposed S<sup>3</sup>DAVS model, although inferior to some compared methods on some individual distortion types, achieves the highest PLCC and SRCC values and the lowest RMSE when the overall database is considered. This indicates that our method is a good candidate for OIQA when the distortion type is unknown. Second, since the OIQA database contains more distortion types than the CVIQD database, the overall results on OIQA are generally lower than those on CVIQD. From these results, we conclude that the S<sup>3</sup>DAVS model achieves the best overall performance among the compared models. Moreover, we observe that the results of SSP-BOIQA (<xref ref-type="bibr" rid="B54">Zheng et al., 2020</xref>), although it is designed specifically for ODIs, are even worse than those of traditional 2D models. This is mainly because SSP-BOIQA focuses only on partitioning the ERP image into bipolar and equatorial regions, without extracting features from the specific contents that are highly consistent with human perception.</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Performance comparison on the OIQA database.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Models</td>
<td valign="top" align="center" colspan="3">JPEG<hr/></td>
<td valign="top" align="center" colspan="3">JP2K<hr/></td>
<td valign="top" align="center" colspan="3">WN<hr/></td>
<td valign="top" align="center" colspan="3">BLUR<hr/></td>
<td valign="top" align="center" colspan="3">ALL<hr/></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">BRISQUE (<xref ref-type="bibr" rid="B24">Mittal et al., 2012</xref>)</td>
<td valign="top" align="center">0.9401</td>
<td valign="top" align="center">0.8691</td>
<td valign="top" align="center">0.7605</td>
<td valign="top" align="center">0.8986</td>
<td valign="top" align="center">0.8447</td>
<td valign="top" align="center">0.9009</td>
<td valign="top" align="center">0.9663</td>
<td valign="top" align="center">0.9162</td>
<td valign="top" align="center">0.4664</td>
<td valign="top" align="center">0.9163</td>
<td valign="top" align="center">0.8529</td>
<td valign="top" align="center">0.7295</td>
<td valign="top" align="center">0.8972</td>
<td valign="top" align="center">0.8694</td>
<td valign="top" align="center">0.9358</td>
</tr>
<tr>
<td valign="top" align="left">DIIVINE (<xref ref-type="bibr" rid="B27">Moorthy and Bovik, 2011</xref>)</td>
<td valign="top" align="center">0.8963</td>
<td valign="top" align="center">0.8132</td>
<td valign="top" align="center">0.9307</td>
<td valign="top" align="center">0.9337</td>
<td valign="top" align="center">0.8970</td>
<td valign="top" align="center">0.7829</td>
<td valign="top" align="center">0.9649</td>
<td valign="top" align="center">0.9011</td>
<td valign="top" align="center">0.5029</td>
<td valign="top" align="center">0.9205</td>
<td valign="top" align="center">0.8392</td>
<td valign="top" align="center">0.7413</td>
<td valign="top" align="center">0.8793</td>
<td valign="top" align="center">0.8458</td>
<td valign="top" align="center">1.0127</td>
</tr>
<tr>
<td valign="top" align="left">SSEQ (<xref ref-type="bibr" rid="B20">Liu et al., 2014</xref>)</td>
<td valign="top" align="center">0.8905</td>
<td valign="top" align="center">0.8194</td>
<td valign="top" align="center">0.9945</td>
<td valign="top" align="center">0.8900</td>
<td valign="top" align="center">0.8500</td>
<td valign="top" align="center">0.9656</td>
<td valign="top" align="center">0.9589</td>
<td valign="top" align="center">0.9103</td>
<td valign="top" align="center">0.5200</td>
<td valign="top" align="center">0.9508</td>
<td valign="top" align="center">0.8989</td>
<td valign="top" align="center">0.6030</td>
<td valign="top" align="center">0.8970</td>
<td valign="top" align="center">0.8750</td>
<td valign="top" align="center">0.9240</td>
</tr>
<tr>
<td valign="top" align="left">OG-IQA (<xref ref-type="bibr" rid="B21">Liu et al., 2016</xref>)</td>
<td valign="top" align="center"><bold>0.9552</bold></td>
<td valign="top" align="center">0.8912</td>
<td valign="top" align="center"><bold>0.6845</bold></td>
<td valign="top" align="center">0.8759</td>
<td valign="top" align="center">0.8253</td>
<td valign="top" align="center">1.0224</td>
<td valign="top" align="center">0.9717</td>
<td valign="top" align="center">0.9206</td>
<td valign="top" align="center"><bold>0.4262</bold></td>
<td valign="top" align="center">0.9473</td>
<td valign="top" align="center">0.9025</td>
<td valign="top" align="center">0.6005</td>
<td valign="top" align="center">0.9076</td>
<td valign="top" align="center">0.8954</td>
<td valign="top" align="center">0.8684</td>
</tr>
<tr>
<td valign="top" align="left">NRSL (<xref ref-type="bibr" rid="B18">Li et al., 2016</xref>)</td>
<td valign="top" align="center">0.9490</td>
<td valign="top" align="center">0.8834</td>
<td valign="top" align="center">0.7260</td>
<td valign="top" align="center">0.9538</td>
<td valign="top" align="center">0.8941</td>
<td valign="top" align="center"><bold>0.5507</bold></td>
<td valign="top" align="center">0.9176</td>
<td valign="top" align="center">0.8691</td>
<td valign="top" align="center">0.8370</td>
<td valign="top" align="center">0.9258</td>
<td valign="top" align="center">0.8618</td>
<td valign="top" align="center">0.7074</td>
<td valign="top" align="center">0.8852</td>
<td valign="top" align="center">0.8537</td>
<td valign="top" align="center">0.9749</td>
</tr>
<tr>
<td valign="top" align="left">SSP-BOIQA (<xref ref-type="bibr" rid="B54">Zheng et al., 2020</xref>)</td>
<td valign="top" align="center">0.877</td>
<td valign="top" align="center">0.834</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.853</td>
<td valign="top" align="center">0.852</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.905</td>
<td valign="top" align="center">0.843</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.854</td>
<td valign="top" align="center">0.862</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.860</td>
<td valign="top" align="center">0.865</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">MC360IQA (<xref ref-type="bibr" rid="B35">Sun et al., 2019</xref>)</td>
<td valign="top" align="center">0.9015</td>
<td valign="top" align="center">0.8995</td>
<td valign="top" align="center">0.8234</td>
<td valign="top" align="center">0.8861</td>
<td valign="top" align="center">0.8779</td>
<td valign="top" align="center">1.3687</td>
<td valign="top" align="center">0.9195</td>
<td valign="top" align="center">0.9124</td>
<td valign="top" align="center">0.8234</td>
<td valign="top" align="center">0.8938</td>
<td valign="top" align="center">0.8892</td>
<td valign="top" align="center">1.3838</td>
<td valign="top" align="center">0.8953</td>
<td valign="top" align="center">0.8928</td>
<td valign="top" align="center">1.5052</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B57">Zhou Y. et al. (2022)</xref></td>
<td valign="top" align="center">0.936</td>
<td valign="top" align="center"><bold>0.940</bold></td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.920</td>
<td valign="top" align="center"><bold>0.934</bold></td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.968</td>
<td valign="top" align="center">0.957</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.925</td>
<td valign="top" align="center">0.920</td>
<td valign="top" align="center">&#x2013;</td>
<td valign="top" align="center">0.899</td>
<td valign="top" align="center">0.923</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">S<sup>3</sup>DAVS</td>
<td valign="top" align="center">0.9267</td>
<td valign="top" align="center">0.8872</td>
<td valign="top" align="center">0.8634</td>
<td valign="top" align="center"><bold>0.9370</bold></td>
<td valign="top" align="center">0.9306</td>
<td valign="top" align="center">0.7717</td>
<td valign="top" align="center"><bold>0.9725</bold></td>
<td valign="top" align="center"><bold>0.9623</bold></td>
<td valign="top" align="center">0.4384</td>
<td valign="top" align="center"><bold>0.9692</bold></td>
<td valign="top" align="center"><bold>0.9662</bold></td>
<td valign="top" align="center"><bold>0.4805</bold></td>
<td valign="top" align="center"><bold>0.9405</bold></td>
<td valign="top" align="center"><bold>0.9348</bold></td>
<td valign="top" align="center"><bold>0.7183</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>The best-performing NR metrics are highlighted in bold.</p></fn>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>Performance comparison on the CVIQD database.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Models</td>
<td valign="top" align="center" colspan="3">JPEG<hr/></td>
<td valign="top" align="center" colspan="3">AVC<hr/></td>
<td valign="top" align="center" colspan="3">HEVC<hr/></td>
<td valign="top" align="center" colspan="3">ALL<hr/></td>
</tr>
<tr>
<td valign="top" align="left"/><td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">BRISQUE (<xref ref-type="bibr" rid="B24">Mittal et al., 2012</xref>)</td>
<td valign="top" align="center">0.9519</td>
<td valign="top" align="center">0.9308</td>
<td valign="top" align="center">4.9825</td>
<td valign="top" align="center">0.8913</td>
<td valign="top" align="center">0.8559</td>
<td valign="top" align="center">5.6647</td>
<td valign="top" align="center">0.8979</td>
<td valign="top" align="center">0.8980</td>
<td valign="top" align="center">5.3367</td>
<td valign="top" align="center">0.9001</td>
<td valign="top" align="center">0.8814</td>
<td valign="top" align="center">6.2327</td>
</tr>
<tr>
<td valign="top" align="left">DIIVINE (<xref ref-type="bibr" rid="B27">Moorthy and Bovik, 2011</xref>)</td>
<td valign="top" align="center">0.9331</td>
<td valign="top" align="center">0.8710</td>
<td valign="top" align="center">6.0475</td>
<td valign="top" align="center">0.9024</td>
<td valign="top" align="center">0.8927</td>
<td valign="top" align="center">5.0618</td>
<td valign="top" align="center">0.9031</td>
<td valign="top" align="center">0.8530</td>
<td valign="top" align="center">5.6397</td>
<td valign="top" align="center">0.8988</td>
<td valign="top" align="center">0.9080</td>
<td valign="top" align="center">6.0260</td>
</tr>
<tr>
<td valign="top" align="left">SSEQ (<xref ref-type="bibr" rid="B20">Liu et al., 2014</xref>)</td>
<td valign="top" align="center"><bold>0.9745</bold></td>
<td valign="top" align="center"><bold>0.9527</bold></td>
<td valign="top" align="center">4.0731</td>
<td valign="top" align="center">0.9381</td>
<td valign="top" align="center">0.9180</td>
<td valign="top" align="center">4.2228</td>
<td valign="top" align="center">0.9115</td>
<td valign="top" align="center">0.9059</td>
<td valign="top" align="center">5.0884</td>
<td valign="top" align="center">0.9263</td>
<td valign="top" align="center">0.9134</td>
<td valign="top" align="center">5.2609</td>
</tr>
<tr>
<td valign="top" align="left">OG-IQA (<xref ref-type="bibr" rid="B21">Liu et al., 2016</xref>)</td>
<td valign="top" align="center"><bold>0.9745</bold></td>
<td valign="top" align="center">0.9261</td>
<td valign="top" align="center"><bold>3.6130</bold></td>
<td valign="top" align="center">0.8871</td>
<td valign="top" align="center">0.8852</td>
<td valign="top" align="center">5.7588</td>
<td valign="top" align="center">0.9030</td>
<td valign="top" align="center">0.9055</td>
<td valign="top" align="center">4.6374</td>
<td valign="top" align="center">0.9197</td>
<td valign="top" align="center">0.8969</td>
<td valign="top" align="center">5.3562</td>
</tr>
<tr>
<td valign="top" align="left">NRSL (<xref ref-type="bibr" rid="B18">Li et al., 2016</xref>)</td>
<td valign="top" align="center">0.9570</td>
<td valign="top" align="center">0.9056</td>
<td valign="top" align="center">5.1460</td>
<td valign="top" align="center">0.9145</td>
<td valign="top" align="center">0.8823</td>
<td valign="top" align="center">4.9565</td>
<td valign="top" align="center">0.9000</td>
<td valign="top" align="center">0.8981</td>
<td valign="top" align="center">4.8063</td>
<td valign="top" align="center">0.8850</td>
<td valign="top" align="center">0.8944</td>
<td valign="top" align="center">6.8612</td>
</tr>
<tr>
<td valign="top" align="left">SSP-BOIQA (<xref ref-type="bibr" rid="B54">Zheng et al., 2020</xref>)</td>
<td valign="top" align="center">0.915</td>
<td valign="top" align="center">0.853</td>
<td valign="top" align="center">6.847</td>
<td valign="top" align="center">0.885</td>
<td valign="top" align="center">0.861</td>
<td valign="top" align="center">7.042</td>
<td valign="top" align="center">0.854</td>
<td valign="top" align="center">0.841</td>
<td valign="top" align="center">6.302</td>
<td valign="top" align="center">0.890</td>
<td valign="top" align="center">0.856</td>
<td valign="top" align="center">6.941</td>
</tr>
<tr>
<td valign="top" align="left">MC360IQA (<xref ref-type="bibr" rid="B35">Sun et al., 2019</xref>)</td>
<td valign="top" align="center">0.9410</td>
<td valign="top" align="center">0.9230</td>
<td valign="top" align="center">5.8040</td>
<td valign="top" align="center">0.9320</td>
<td valign="top" align="center">0.9410</td>
<td valign="top" align="center">5.3570</td>
<td valign="top" align="center">0.9140</td>
<td valign="top" align="center">0.8990</td>
<td valign="top" align="center">4.8010</td>
<td valign="top" align="center">0.9390</td>
<td valign="top" align="center">0.9040</td>
<td valign="top" align="center">4.6060</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B57">Zhou Y. et al. (2022)</xref></td>
<td valign="top" align="center">0.957</td>
<td valign="top" align="center">0.923</td>
<td valign="top" align="center">5.601</td>
<td valign="top" align="center">0.953</td>
<td valign="top" align="center"><bold>0.949</bold></td>
<td valign="top" align="center">3.873</td>
<td valign="top" align="center">0.929</td>
<td valign="top" align="center"><bold>0.914</bold></td>
<td valign="top" align="center"><bold>4.525</bold></td>
<td valign="top" align="center">0.902</td>
<td valign="top" align="center">0.911</td>
<td valign="top" align="center">6.117</td>
</tr>
<tr>
<td valign="top" align="left">S<sup>3</sup>DAVS</td>
<td valign="top" align="center">0.9707</td>
<td valign="top" align="center">0.9302</td>
<td valign="top" align="center">3.8675</td>
<td valign="top" align="center"><bold>0.9586</bold></td>
<td valign="top" align="center">0.9447</td>
<td valign="top" align="center"><bold>3.3925</bold></td>
<td valign="top" align="center"><bold>0.9367</bold></td>
<td valign="top" align="center">0.8802</td>
<td valign="top" align="center">4.5675</td>
<td valign="top" align="center"><bold>0.9533</bold></td>
<td valign="top" align="center"><bold>0.9426</bold></td>
<td valign="top" align="center"><bold>4.1022</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>The best-performing NR metrics are highlighted in bold.</p></fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="S4.SS3">
<title>Statistical significance test</title>
<p>Besides the comparisons in terms of PLCC, SRCC, and RMSE, we employ the <italic>t</italic>-test to verify that the superiority of our proposed S<sup>3</sup>DAVS model over the compared methods is statistically significant. The <italic>t</italic>-test, which judges whether the performance difference between two models is significant, can be divided into three types: the one-sample <italic>t</italic>-test, the independent-samples <italic>t</italic>-test, and the paired <italic>t</italic>-test. In this section, we first obtain 100 results for each indicator, select the PLCC values as the sample data, and then apply the independent-samples <italic>t</italic>-test to compute the final significance criterion. The formula can be written as:</p>
<disp-formula id="S4.E17">
<label>(17)</label>
<mml:math id="M18">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="center">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="false">
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mi>A</mml:mi>
</mml:msub>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
</mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mfrac>
<mml:msubsup>
<mml:mi>S</mml:mi>
<mml:mi>A</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mi>A</mml:mi>
</mml:msub>
</mml:mfrac>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:msubsup>
<mml:mi>S</mml:mi>
<mml:mi>B</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mi>B</mml:mi>
</mml:msub>
</mml:mfrac>
</mml:mrow>
</mml:msqrt>
</mml:mfrac>
</mml:mstyle>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
<p>For each model, <italic>m</italic> denotes the sample mean, <italic>S</italic><sup>2</sup> the sample variance, and <italic>n</italic> the number of samples. The statistic <italic>t</italic> indicates whether the difference between the two compared models is significant. In <xref ref-type="fig" rid="F5">Figure 5</xref>, a blue block indicates that the model in the row is worse than the model in the column and is labeled &#x201C;-1,&#x201D; a green block indicates that there is no significant difference between the models in the row and column and is labeled &#x201C;0,&#x201D; and an orange block indicates that the model in the row is better than the model in the column and is labeled &#x201C;1.&#x201D; According to the distribution of each row in the statistical significance figure, the more &#x201C;1&#x201D; entries a method obtains, the better the algorithm. It can be observed that our proposed model is significantly better than all the others.</p>
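<p>As an illustration only (not the authors' released code), Equation (17) is the independent-samples (Welch) <italic>t</italic>-statistic, and the row/column labels of Figure 5 follow from a two-sided significance decision. The PLCC samples below are synthetic placeholders, and the 0.05 significance level is an assumption.</p>

```python
import numpy as np
from scipy import stats

def welch_t(plcc_a, plcc_b):
    """Eq. (17): t = (m_A - m_B) / sqrt(S_A^2/n_A + S_B^2/n_B),
    with sample variances (ddof=1)."""
    a, b = np.asarray(plcc_a, float), np.asarray(plcc_b, float)
    return (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / a.size + b.var(ddof=1) / b.size
    )

rng = np.random.default_rng(0)
# Hypothetical PLCC values from 100 random train/test splits per model.
model_a = rng.normal(0.95, 0.01, 100)  # stand-in for S3DAVS
model_b = rng.normal(0.92, 0.02, 100)  # stand-in for a compared method

t = welch_t(model_a, model_b)
# scipy's Welch test (equal_var=False) computes the same statistic.
t_ref, p = stats.ttest_ind(model_a, model_b, equal_var=False)

# Block label as in Figure 5: 1 = row model better, -1 = worse, 0 = no
# significant difference (two-sided test at an assumed alpha of 0.05).
label = 0 if p >= 0.05 else (1 if t > 0 else -1)
```

With these synthetic samples the statistic is large and positive, so model A would receive a &#x201C;1&#x201D; against model B.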
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Statistical significance comparison by the <italic>t</italic>-test between our proposed method and other methods. <bold>(A,B)</bold> Depict the results on the OIQA and CVIQD databases, respectively.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fnins-16-1022041-g005.tif"/>
</fig>
</sec>
<sec id="S4.SS4">
<title>Ablation study</title>
<p>The proposed method mainly focuses on extracting statistical features in the spatiotemporal domain and analyzing the viewport content in a way that is consistent with human viewing behavior. As described for the feature extractor above, our method relies on the spatiotemporal MSCN (ST-MSCN) and the spatiotemporal Gabor filter (ST-Gabor) to obtain feature vectors. To assess the contribution of each module, we evaluate the performance when only a single feature set is used, i.e., only the features from the ST-MSCN or only those from the ST-Gabor filter. The results are shown in <xref ref-type="table" rid="T3">Tables 3</xref>, <xref ref-type="table" rid="T4">4</xref>, respectively. Comparing these two tables, we can see that the ST-Gabor features play the dominant role, while the ST-MSCN features play an auxiliary role in our S<sup>3</sup>DAVS. Overall, the performance is further improved by jointly considering the spatiotemporal statistical features of the ST-Gabor and the ST-MSCN.</p>
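<p>The ablation protocol above can be sketched as fitting the same regressor on each feature set alone and on their concatenation. This is a minimal illustration: the regressor choice (SVR), the feature dimensions, and the random data are all assumptions, not the paper's actual extractor outputs.</p>

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
n = 64
# Placeholder feature matrices; real ST-MSCN / ST-Gabor dimensions differ.
f_mscn = rng.normal(size=(n, 36))    # ST-MSCN statistical features
f_gabor = rng.normal(size=(n, 80))   # ST-Gabor statistical features
mos = rng.uniform(1, 10, size=n)     # subjective quality scores (synthetic)

def fit_and_predict(features, labels):
    """Fit one quality regressor and return its predictions."""
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    model.fit(features, labels)
    return model.predict(features)

# Ablation: each feature set alone vs. the joint (concatenated) set.
pred_mscn = fit_and_predict(f_mscn, mos)
pred_gabor = fit_and_predict(f_gabor, mos)
pred_joint = fit_and_predict(np.concatenate([f_mscn, f_gabor], axis=1), mos)
```

In practice each variant would be scored with PLCC/SRCC/RMSE against the MOS on a held-out split, as in Tables 3, 4.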
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>Performance comparison on the OIQA database with each single feature set.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Ablation models</td>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">ST-MSCN</td>
<td valign="top" align="center">0.6303</td>
<td valign="top" align="center">0.5901</td>
<td valign="top" align="center">1.6346</td>
</tr>
<tr>
<td valign="top" align="left">ST-Gabor</td>
<td valign="top" align="center">0.9259</td>
<td valign="top" align="center">0.9213</td>
<td valign="top" align="center">0.8130</td>
</tr>
<tr>
<td valign="top" align="left">ST-MSCN + ST-Gabor</td>
<td valign="top" align="center"><bold>0.9405</bold></td>
<td valign="top" align="center"><bold>0.9348</bold></td>
<td valign="top" align="center"><bold>0.7183</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>The best-performing results are highlighted in bold.</p></fn>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T4">
<label>TABLE 4</label>
<caption><p>Performance comparison on the CVIQD database with each single feature set.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Ablation models</td>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">ST-MSCN</td>
<td valign="top" align="center">0.6529</td>
<td valign="top" align="center">0.6091</td>
<td valign="top" align="center">10.6939</td>
</tr>
<tr>
<td valign="top" align="left">ST-Gabor</td>
<td valign="top" align="center">0.9223</td>
<td valign="top" align="center">0.9041</td>
<td valign="top" align="center">5.5126</td>
</tr>
<tr>
<td valign="top" align="left">ST-MSCN + ST-Gabor</td>
<td valign="top" align="center"><bold>0.9533</bold></td>
<td valign="top" align="center"><bold>0.9426</bold></td>
<td valign="top" align="center"><bold>4.1022</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>The best-performing results are highlighted in bold.</p></fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="S4.SS5">
<title>Cross-dataset performance comparison</title>
<p>To test the generalization ability of an image quality metric, cross-database validation is necessary: given two databases, we use one for training and the other for testing. Because the OIQA database contains more distortion types, such as Gaussian noise and Gaussian blur, whereas the CVIQD database contains only compression distortion, we test only the JPEG and JP2K subsets of the OIQA database; the results are shown in <xref ref-type="table" rid="T5">Table 5</xref>. As shown in the table, the results when testing on the CVIQD database are generally higher than those when testing on the OIQA database. The reason is that the model trained on the OIQA database learns more feature types and can therefore discriminate the distortion types in the CVIQD database more accurately. Overall, compared with existing methods, our proposed model has better generalization capability.</p>
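<p>The three criteria reported throughout Table 5 can be computed as below. This is a generic sketch of the evaluation step with made-up scores; note that in many IQA protocols a nonlinear logistic mapping is fitted to the predictions before computing PLCC and RMSE, which is omitted here for brevity.</p>

```python
import numpy as np
from scipy import stats

def evaluate(pred, mos):
    """Return (PLCC, SRCC, RMSE) between predicted and subjective scores."""
    pred, mos = np.asarray(pred, float), np.asarray(mos, float)
    plcc = stats.pearsonr(pred, mos)[0]    # linear correlation
    srcc = stats.spearmanr(pred, mos)[0]   # rank-order correlation
    rmse = float(np.sqrt(np.mean((pred - mos) ** 2)))
    return plcc, srcc, rmse

# Hypothetical cross-database run: model trained on one database,
# predictions compared against the other database's MOS.
mos = np.array([7.8, 3.2, 5.5, 9.1, 4.4, 6.7])
pred = np.array([7.1, 3.9, 5.0, 8.8, 4.9, 6.2])
plcc, srcc, rmse = evaluate(pred, mos)
```

Here the predictions preserve the ranking of the MOS exactly, so SRCC is 1.0 while PLCC and RMSE also reflect the small absolute errors.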
<table-wrap position="float" id="T5">
<label>TABLE 5</label>
<caption><p>Performance results of cross-database validation.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Models</td>
<td valign="top" align="center" colspan="3">Train OIQA/Test CVIQD<hr/></td>
<td valign="top" align="center" colspan="3">Train CVIQD/Test OIQA<hr/></td>
</tr>
<tr>
<td/>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
<td valign="top" align="center">PLCC</td>
<td valign="top" align="center">SRCC</td>
<td valign="top" align="center">RMSE</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">BMPRI (<xref ref-type="bibr" rid="B23">Min et al., 2018</xref>)</td>
<td valign="top" align="center">0.4904</td>
<td valign="top" align="center">0.2417</td>
<td valign="top" align="center">12.1862</td>
<td valign="top" align="center">0.7595</td>
<td valign="top" align="center">0.7205</td>
<td valign="top" align="center">1.3249</td>
</tr>
<tr>
<td valign="top" align="left">CEIQ (<xref ref-type="bibr" rid="B46">Yan et al., 2019</xref>)</td>
<td valign="top" align="center">0.6953</td>
<td valign="top" align="center">0.5470</td>
<td valign="top" align="center">9.9767</td>
<td valign="top" align="center">0.5012</td>
<td valign="top" align="center">0.4860</td>
<td valign="top" align="center">1.7856</td>
</tr>
<tr>
<td valign="top" align="left">BRISQUE (<xref ref-type="bibr" rid="B24">Mittal et al., 2012</xref>)</td>
<td valign="top" align="center">0.6166</td>
<td valign="top" align="center">0.5503</td>
<td valign="top" align="center">11.1772</td>
<td valign="top" align="center">0.4950</td>
<td valign="top" align="center">0.4054</td>
<td valign="top" align="center">1.8217</td>
</tr>
<tr>
<td valign="top" align="left">DIIVINE (<xref ref-type="bibr" rid="B27">Moorthy and Bovik, 2011</xref>)</td>
<td valign="top" align="center">0.5658</td>
<td valign="top" align="center">0.4114</td>
<td valign="top" align="center">11.4963</td>
<td valign="top" align="center">0.4454</td>
<td valign="top" align="center">0.3575</td>
<td valign="top" align="center">1.8904</td>
</tr>
<tr>
<td valign="top" align="left">SSEQ (<xref ref-type="bibr" rid="B20">Liu et al., 2014</xref>)</td>
<td valign="top" align="center">0.6175</td>
<td valign="top" align="center">0.6113</td>
<td valign="top" align="center">10.8955</td>
<td valign="top" align="center">0.4927</td>
<td valign="top" align="center">0.4568</td>
<td valign="top" align="center">1.7922</td>
</tr>
<tr>
<td valign="top" align="left">NRSL (<xref ref-type="bibr" rid="B18">Li et al., 2016</xref>)</td>
<td valign="top" align="center">0.6884</td>
<td valign="top" align="center">0.6199</td>
<td valign="top" align="center">10.4646</td>
<td valign="top" align="center">0.3651</td>
<td valign="top" align="center">0.2648</td>
<td valign="top" align="center">1.9431</td>
</tr>
<tr>
<td valign="top" align="left">OG-IQA (<xref ref-type="bibr" rid="B21">Liu et al., 2016</xref>)</td>
<td valign="top" align="center">0.6963</td>
<td valign="top" align="center">0.6392</td>
<td valign="top" align="center">10.1059</td>
<td valign="top" align="center">0.5154</td>
<td valign="top" align="center">0.5299</td>
<td valign="top" align="center">1.8076</td>
</tr>
<tr>
<td valign="top" align="left">SSP-BOIQA (<xref ref-type="bibr" rid="B54">Zheng et al., 2020</xref>)</td>
<td valign="top" align="center">0.726</td>
<td valign="top" align="center">0.705</td>
<td valign="top" align="center">9.588</td>
<td valign="top" align="center">0.627</td>
<td valign="top" align="center">0.601</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">MC360IQA (<xref ref-type="bibr" rid="B35">Sun et al., 2019</xref>)</td>
<td valign="top" align="center">0.8230</td>
<td valign="top" align="center">0.8140</td>
<td valign="top" align="center">7.8110</td>
<td valign="top" align="center">0.6816</td>
<td valign="top" align="center">0.5238</td>
<td valign="top" align="center">1.5471</td>
</tr>
<tr>
<td valign="top" align="left"><xref ref-type="bibr" rid="B57">Zhou Y. et al. (2022)</xref></td>
<td valign="top" align="center"><bold>0.847</bold></td>
<td valign="top" align="center"><bold>0.825</bold></td>
<td valign="top" align="center"><bold>7.721</bold></td>
<td valign="top" align="center">0.735</td>
<td valign="top" align="center">0.741</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">S<sup>3</sup>DAVS</td>
<td valign="top" align="center">0.8358</td>
<td valign="top" align="center">0.8125</td>
<td valign="top" align="center">7.9331</td>
<td valign="top" align="center"><bold>0.7817</bold></td>
<td valign="top" align="center"><bold>0.6859</bold></td>
<td valign="top" align="center"><bold>1.3938</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>The best-performing results are highlighted in bold.</p></fn>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
<sec id="S5" sec-type="conclusion">
<title>Conclusion</title>
<p>This article has presented a novel no-reference (NR) ODI quality evaluation method based on the construction of a Dynamically Attentive Viewport Sequence (DAVS) from an ODI and the extraction of Quality-Aware Features (QAFs) from the DAVS. The construction of the DAVS aims to build a sequence of viewports that viewers are likely to explore, based on the prediction of the visual scanpath that viewers follow when freely exploring the ODI within the exploration time <italic>via</italic> an HMD. A DAVS that contains only global motion can then be obtained by sampling a series of viewports from the ODI along the predicted visual scanpath. The subsequent quality evaluation of the ODI is performed merely on the DAVS. The extraction of QAFs aims to obtain effective feature representations that are highly discriminative with respect to perceived distortion and visual quality. Finally, a regression model is built to map the extracted QAFs to a single predicted quality score. Experimental results on two datasets demonstrate that the proposed method delivers state-of-the-art performance. However, its shortcomings are also clear. Although the method captures effective information for analyzing the features of an ODI and predicting its quality, it is time-consuming and computationally complex. Taking the OIQA database as an example, it contains 320 ODIs with a minimum resolution of 11,332 &#x00D7; 5,666, and our tests show that processing one image through all steps takes about 1.983 min. Moreover, the predicted scanpath represents only eye movements, while head movements are not modeled. We will explore these directions in our future work.</p>
</sec>
<sec id="S6" sec-type="data-availability">
<title>Data availability statement</title>
<p>The original contributions presented in this study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.</p>
</sec>
<sec id="S7">
<title>Author contributions</title>
<p>YW and HL contributed to conception and design of the study and wrote the first draft of the manuscript. YW wrote the software codes of the algorithm and conducted experiments. HL and QJ performed the statistical analysis. All authors contributed to manuscript revision, read, and approved the submitted version.</p>
</sec>
</body>
<back>
<sec id="S8" sec-type="funding-information">
<title>Funding</title>
<p>This work was supported by the Natural Science Foundation of China under Grant No. 62271277 and the Zhejiang Natural Science Foundation of China under Grant No. LR22F020002.</p>
</sec>
<sec id="S9" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="S10" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alain</surname> <given-names>M.</given-names></name> <name><surname>Zerman</surname> <given-names>E.</given-names></name> <name><surname>Ozcinar</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). &#x201C;<article-title>Immersive imaging technologies: From capture to display</article-title>,&#x201D; in <source><italic>Proceedings of the 28th ACM International Conference on Multimedia (MM&#x2019;20)</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>4787</fpage>&#x2013;<lpage>4788</lpage>. <pub-id pub-id-type="doi">10.1145/3394171.3418550</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Besharse</surname> <given-names>J.</given-names></name> <name><surname>Bok</surname> <given-names>D.</given-names></name></person-group> (<year>2011</year>). <source><italic>The Retina and its Disorders.</italic></source> <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Academic Press</publisher-name>.</citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>M.</given-names></name> <name><surname>Jin</surname> <given-names>Y.</given-names></name> <name><surname>Goodall</surname> <given-names>T.</given-names></name> <name><surname>Yu</surname> <given-names>X.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name></person-group> (<year>2019</year>). <article-title>Study of 3D virtual reality picture quality.</article-title> <source><italic>IEEE J. Sel. Top. Signal Process.</italic></source> <volume>14</volume> <fpage>89</fpage>&#x2013;<lpage>102</lpage>. <pub-id pub-id-type="doi">10.1109/JSTSP.2019.2956408</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>Z.</given-names></name> <name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Lin</surname> <given-names>C.</given-names></name> <name><surname>Zhou</surname> <given-names>W.</given-names></name></person-group> (<year>2020</year>). <article-title>Stereoscopic omnidirectional image quality assessment based on predictive coding theory.</article-title> <source><italic>IEEE J. Sel. Top. Signal Process.</italic></source> <volume>14</volume> <fpage>103</fpage>&#x2013;<lpage>117</lpage>. <pub-id pub-id-type="doi">10.1109/JSTSP.2020.2968182</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dendi</surname> <given-names>S.</given-names></name> <name><surname>Channappayya</surname> <given-names>S. S.</given-names></name></person-group> (<year>2020</year>). <article-title>No-reference video quality assessment using natural spatiotemporal scene statistics.</article-title> <source><italic>IEEE Trans. Image Process.</italic></source> <volume>29</volume> <fpage>5612</fpage>&#x2013;<lpage>5624</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2020.2984879</pub-id> <pub-id pub-id-type="pmid">32275592</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deng</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>H.</given-names></name> <name><surname>Xu</surname> <given-names>M.</given-names></name> <name><surname>Guo</surname> <given-names>Y.</given-names></name> <name><surname>Song</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>L.</given-names></name></person-group> (<year>2021</year>). &#x201C;<article-title>LAU-Net: Latitude adaptive upscaling network for omnidirectional image super-resolution</article-title>,&#x201D; in <source><italic>2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>9185</fpage>&#x2013;<lpage>9194</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR46437.2021.00907</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Duan</surname> <given-names>H.</given-names></name> <name><surname>Zhai</surname> <given-names>G.</given-names></name> <name><surname>Min</surname> <given-names>X.</given-names></name> <name><surname>Zhu</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>X.</given-names></name></person-group> (<year>2018</year>). &#x201C;<article-title>Perceptual quality assessment of omnidirectional images</article-title>,&#x201D; in <source><italic>2018 IEEE International Symposium on Circuits and Systems (ISCAS)</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1109/ISCAS.2018.8351786</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gotz-Hahn</surname> <given-names>F.</given-names></name> <name><surname>Hosu</surname> <given-names>V.</given-names></name> <name><surname>Lin</surname> <given-names>H.</given-names></name> <name><surname>Saupe</surname> <given-names>D.</given-names></name></person-group> (<year>2021</year>). <article-title>KonVid-150k: A dataset for no-reference video quality assessment of videos in-the-wild.</article-title> <source><italic>IEEE Access</italic></source> <volume>9</volume> <fpage>72139</fpage>&#x2013;<lpage>72160</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2021.3077642</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gu</surname> <given-names>K.</given-names></name> <name><surname>Zhai</surname> <given-names>G.</given-names></name> <name><surname>Yang</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>W.</given-names></name></person-group> (<year>2015</year>). <article-title>Using free energy principle for blind image quality assessment.</article-title> <source><italic>IEEE Trans. Multimed.</italic></source> <volume>17</volume> <fpage>50</fpage>&#x2013;<lpage>63</lpage>. <pub-id pub-id-type="doi">10.1109/TMM.2014.2373812</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ho</surname> <given-names>T.</given-names></name> <name><surname>Budagavi</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). &#x201C;<article-title>Dual-fisheye lens stitching for 360-degree imaging</article-title>,&#x201D; in <source><italic>2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>2172</fpage>&#x2013;<lpage>2176</lpage>. <pub-id pub-id-type="doi">10.1109/ICASSP.2017.7952541</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hor&#x00E9;</surname> <given-names>A.</given-names></name> <name><surname>Ziou</surname> <given-names>D.</given-names></name></person-group> (<year>2010</year>). &#x201C;<article-title>Image quality metrics: PSNR vs. SSIM</article-title>,&#x201D; in <source><italic>2010 20th International Conference on Pattern Recognition</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>2366</fpage>&#x2013;<lpage>2369</lpage>. <pub-id pub-id-type="doi">10.1109/ICPR.2010.579</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><collab>ISO</collab> (<year>2019</year>). <source><italic>Information Technology Coded Representation of Immersive Media&#x2014;Part 2: Omnidirectional Media Format.</italic></source> <publisher-loc>Geneva</publisher-loc>: <publisher-name>ISO</publisher-name>.</citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jiang</surname> <given-names>H.</given-names></name> <name><surname>Jiang</surname> <given-names>G.</given-names></name> <name><surname>Yu</surname> <given-names>M.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Peng</surname> <given-names>Z.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Cubemap-based perception-driven blind quality assessment for 360-degree images.</article-title> <source><italic>IEEE Trans. Image Process.</italic></source> <volume>30</volume> <fpage>2364</fpage>&#x2013;<lpage>2377</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2021.3052073</pub-id> <pub-id pub-id-type="pmid">33481711</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>H. G.</given-names></name> <name><surname>Lim</surname> <given-names>H. T.</given-names></name> <name><surname>Yong</surname> <given-names>M. R.</given-names></name></person-group> (<year>2020</year>). <article-title>Deep virtual reality image quality assessment with human perception guider for omnidirectional image.</article-title> <source><italic>IEEE Trans. Circuits Syst. Video Technol.</italic></source> <volume>30</volume> <fpage>917</fpage>&#x2013;<lpage>928</lpage>. <pub-id pub-id-type="doi">10.1109/TCSVT.2019.2898732</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lasmar</surname> <given-names>N. E.</given-names></name> <name><surname>Stitou</surname> <given-names>Y.</given-names></name> <name><surname>Berthoumieu</surname> <given-names>Y.</given-names></name></person-group> (<year>2009</year>). &#x201C;<article-title>Multiscale skewed heavy tailed model for texture analysis</article-title>,&#x201D; in <source><italic>2009 16th IEEE International Conference on Image Processing (ICIP)</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>2281</fpage>&#x2013;<lpage>2284</lpage>. <pub-id pub-id-type="doi">10.1109/ICIP.2009.5414404</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Xu</surname> <given-names>M.</given-names></name> <name><surname>Du</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>Z.</given-names></name></person-group> (<year>2018</year>). &#x201C;<article-title>Bridge the gap between VQA and human behavior on omnidirectional video</article-title>,&#x201D; in <source><italic>ACM Multimedia Conference Proceedings-American Computer Association</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>), <fpage>932</fpage>&#x2013;<lpage>940</lpage>. <pub-id pub-id-type="doi">10.1145/3240508.3240581</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Xu</surname> <given-names>M.</given-names></name> <name><surname>Jiang</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Tao</surname> <given-names>X.</given-names></name></person-group> (<year>2019</year>). <article-title>Viewport proposal CNN for 360<sup>&#x00B0;</sup> video quality assessment.</article-title> <source><italic>2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</italic></source>, <publisher-loc>New York</publisher-loc>: <publisher-name>IEEE</publisher-name> <fpage>10169</fpage>&#x2013;<lpage>10178</lpage></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Q.</given-names></name> <name><surname>Lin</surname> <given-names>W.</given-names></name> <name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Fang</surname> <given-names>Y.</given-names></name></person-group> (<year>2016</year>). <article-title>Blind image quality assessment using statistical structural and luminance features.</article-title> <source><italic>IEEE Trans. Multimed.</italic></source> <volume>18</volume> <fpage>2457</fpage>&#x2013;<lpage>2469</lpage>. <pub-id pub-id-type="doi">10.1109/TMM.2016.2601028</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ling</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>K.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>D.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name></person-group> (<year>2018</year>). <article-title>A saliency prediction model on 360-degree images using color dictionary based sparse representation.</article-title> <source><italic>Signal Process. Image Commun.</italic></source> <volume>69</volume> <fpage>60</fpage>&#x2013;<lpage>68</lpage>. <pub-id pub-id-type="doi">10.1016/j.image.2018.03.007</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>Liu</surname> <given-names>B.</given-names></name> <name><surname>Huang</surname> <given-names>H.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name></person-group> (<year>2014</year>). <article-title>No-reference image quality assessment based on spatial and spectral entropies.</article-title> <source><italic>Signal Process. Image Commun.</italic></source> <volume>29</volume> <fpage>856</fpage>&#x2013;<lpage>863</lpage>. <pub-id pub-id-type="doi">10.1016/j.image.2014.06.006</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>Yi</surname> <given-names>H.</given-names></name> <name><surname>Zhao</surname> <given-names>Q.</given-names></name> <name><surname>Hua</surname> <given-names>H.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name></person-group> (<year>2016</year>). <article-title>Blind image quality assessment by relative gradient statistics and adaboosting neural network.</article-title> <source><italic>Signal Process. Image Commun.</italic></source> <volume>40</volume> <fpage>1</fpage>&#x2013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1016/j.image.2015.10.005</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Yu</surname> <given-names>H.</given-names></name> <name><surname>Huang</surname> <given-names>B.</given-names></name> <name><surname>Yue</surname> <given-names>G.</given-names></name> <name><surname>Song</surname> <given-names>B.</given-names></name></person-group> (<year>2021</year>). <article-title>Blind omnidirectional image quality assessment based on structure and natural features.</article-title> <source><italic>IEEE Trans. Instrument. Meas.</italic></source> <volume>70</volume> <fpage>1</fpage>&#x2013;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.1109/TIM.2021.3102691</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Min</surname> <given-names>X.</given-names></name> <name><surname>Zhai</surname> <given-names>G.</given-names></name> <name><surname>Gu</surname> <given-names>K.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>X.</given-names></name></person-group> (<year>2018</year>). <article-title>Blind image quality estimation via distortion aggravation.</article-title> <source><italic>IEEE Trans. Broadcast.</italic></source> <volume>64</volume> <fpage>508</fpage>&#x2013;<lpage>517</lpage>. <pub-id pub-id-type="doi">10.1109/TBC.2018.2816783</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mittal</surname> <given-names>A.</given-names></name> <name><surname>Moorthy</surname> <given-names>A. K.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name></person-group> (<year>2012</year>). <article-title>No-reference image quality assessment in the spatial domain.</article-title> <source><italic>IEEE Trans. Image Process.</italic></source> <volume>21</volume> <fpage>4695</fpage>&#x2013;<lpage>4708</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2012.2214050</pub-id> <pub-id pub-id-type="pmid">22910118</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mittal</surname> <given-names>A.</given-names></name> <name><surname>Saad</surname> <given-names>M. A.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name></person-group> (<year>2016</year>). <article-title>A completely blind video integrity oracle.</article-title> <source><italic>IEEE Trans. Image Process</italic>.</source> <volume>25</volume> <fpage>289</fpage>&#x2013;<lpage>300</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2015.2502725</pub-id> <pub-id pub-id-type="pmid">26599970</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mittal</surname> <given-names>A.</given-names></name> <name><surname>Soundararajan</surname> <given-names>R.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name></person-group> (<year>2013</year>). <article-title>Making a &#x201C;completely blind&#x201D; image quality analyzer.</article-title> <source><italic>IEEE Signal Process. Lett.</italic></source> <volume>20</volume> <fpage>209</fpage>&#x2013;<lpage>212</lpage>. <pub-id pub-id-type="doi">10.1109/LSP.2012.2227726</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moorthy</surname> <given-names>K.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name></person-group> (<year>2011</year>). <article-title>Blind image quality assessment: From natural scene statistics to perceptual quality.</article-title> <source><italic>IEEE Trans. Image Process.</italic></source> <volume>20</volume> <fpage>3350</fpage>&#x2013;<lpage>3364</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2011.2147325</pub-id> <pub-id pub-id-type="pmid">21521667</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Petkov</surname> <given-names>N.</given-names></name> <name><surname>Subramanian</surname> <given-names>E.</given-names></name></person-group> (<year>2007</year>). <article-title>Motion detection, noise reduction, texture suppression, and contour enhancement by spatiotemporal Gabor filters with surround inhibition.</article-title> <source><italic>Biol. Cybern.</italic></source> <volume>97</volume> <fpage>423</fpage>&#x2013;<lpage>439</lpage>. <pub-id pub-id-type="doi">10.1007/s00422-007-0182-0</pub-id> <pub-id pub-id-type="pmid">17960417</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rai</surname> <given-names>Y.</given-names></name> <name><surname>Callet</surname> <given-names>P. L.</given-names></name> <name><surname>Guillotel</surname> <given-names>P.</given-names></name></person-group> (<year>2017</year>). &#x201C;<article-title>Which saliency weighting for omnidirectional image quality assessment</article-title>,&#x201D; in <source><italic>2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX)</italic></source>, (<publisher-loc>Erfurt</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/QoMEX.2017.7965659</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sheikh</surname> <given-names>H. R.</given-names></name> <name><surname>Sabir</surname> <given-names>M. F.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name></person-group> (<year>2006</year>). <article-title>A statistical evaluation of recent full reference image quality assessment algorithms.</article-title> <source><italic>IEEE Trans. Image Process.</italic></source> <volume>15</volume> <fpage>3440</fpage>&#x2013;<lpage>3451</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2006.881959</pub-id> <pub-id pub-id-type="pmid">17076403</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simoncelli</surname> <given-names>E. P.</given-names></name> <name><surname>Olshausen</surname> <given-names>B. A.</given-names></name></person-group> (<year>2001</year>). <article-title>Natural image statistics and neural representation.</article-title> <source><italic>Annu. Rev. Neurosci.</italic></source> <volume>24</volume> <fpage>1193</fpage>&#x2013;<lpage>1216</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.neuro.24.1.1193</pub-id> <pub-id pub-id-type="pmid">11520932</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sitzmann</surname> <given-names>V.</given-names></name> <name><surname>Serrano</surname> <given-names>A.</given-names></name> <name><surname>Pavel</surname> <given-names>A.</given-names></name> <name><surname>Agrawala</surname> <given-names>M.</given-names></name> <name><surname>Wetzstein</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <article-title>Saliency in VR: how do people explore virtual environments.</article-title> <source><italic>IEEE Trans. Vis. Comput. Graph.</italic></source> <volume>24</volume> <fpage>1633</fpage>&#x2013;<lpage>1642</lpage>. <pub-id pub-id-type="doi">10.1109/TVCG.2018.2793599</pub-id> <pub-id pub-id-type="pmid">29553930</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sui</surname> <given-names>X.</given-names></name> <name><surname>Ma</surname> <given-names>K.</given-names></name> <name><surname>Yao</surname> <given-names>Y.</given-names></name> <name><surname>Fang</surname> <given-names>Y.</given-names></name></person-group> (<year>2022</year>). <article-title>Perceptual quality assessment of omnidirectional images as moving camera videos.</article-title> <source><italic>IEEE Trans. Vis. Comput. Graph.</italic></source> <volume>28</volume> <fpage>3022</fpage>&#x2013;<lpage>3034</lpage>. <pub-id pub-id-type="doi">10.1109/TVCG.2021.3050888</pub-id> <pub-id pub-id-type="pmid">33434131</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>W.</given-names></name> <name><surname>Gu</surname> <given-names>K.</given-names></name> <name><surname>Ma</surname> <given-names>S.</given-names></name> <name><surname>Zhu</surname> <given-names>W.</given-names></name> <name><surname>Liu</surname> <given-names>N.</given-names></name> <name><surname>Zhai</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). &#x201C;<article-title>A large-scale compressed 360-degree spherical image database: From subjective quality evaluation to objective model comparison</article-title>,&#x201D; in <source><italic>2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP)</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/MMSP.2018.8547102</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>W.</given-names></name> <name><surname>Min</surname> <given-names>X.</given-names></name> <name><surname>Zhai</surname> <given-names>G.</given-names></name> <name><surname>Gu</surname> <given-names>K.</given-names></name> <name><surname>Ma</surname> <given-names>S.</given-names></name></person-group> (<year>2019</year>). <article-title>MC360IQA: a multi-channel cnn for blind 360-degree image quality assessment.</article-title> <source><italic>IEEE J. Sel. Top. Signal Process.</italic></source> <volume>14</volume> <fpage>64</fpage>&#x2013;<lpage>77</lpage>. <pub-id pub-id-type="doi">10.1109/JSTSP.2019.2955024</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>Y.</given-names></name> <name><surname>Lu</surname> <given-names>A.</given-names></name> <name><surname>Yu</surname> <given-names>L.</given-names></name></person-group> (<year>2017</year>). <article-title>Weighted-to-spherically-uniform quality evaluation for omnidirectional video.</article-title> <source><italic>IEEE Signal Process. Lett.</italic></source> <volume>24</volume> <fpage>1408</fpage>&#x2013;<lpage>1412</lpage>. <pub-id pub-id-type="doi">10.1109/LSP.2017.2720693</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tran</surname> <given-names>H. T. T.</given-names></name> <name><surname>Nguyen</surname> <given-names>D. V.</given-names></name> <name><surname>Ngoc</surname> <given-names>N. P.</given-names></name> <name><surname>Hoang</surname> <given-names>T. H.</given-names></name> <name><surname>Huong</surname> <given-names>T. T.</given-names></name> <name><surname>Thang</surname> <given-names>T. C.</given-names></name></person-group> (<year>2019</year>). <article-title>Impacts of retina-related zones on quality perception of omnidirectional image.</article-title> <source><italic>IEEE Access</italic></source> <volume>7</volume> <fpage>166997</fpage>&#x2013;<lpage>167009</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2019.2953983</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tu</surname> <given-names>Z.</given-names></name> <name><surname>Chen</surname> <given-names>C. J.</given-names></name> <name><surname>Chen</surname> <given-names>L. H.</given-names></name> <name><surname>Birkbeck</surname> <given-names>N.</given-names></name> <name><surname>Adsumilli</surname> <given-names>B.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name></person-group> (<year>2020</year>). &#x201C;<article-title>A comparative evaluation of temporal pooling methods for blind video quality assessment</article-title>,&#x201D; in <source><italic>2020 IEEE International Conference on Image Processing (ICIP)</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>141</fpage>&#x2013;<lpage>145</lpage>. <pub-id pub-id-type="doi">10.1109/ICIP40778.2020.9191169</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><collab>Video Quality Experts Group</collab> (<year>2003</year>). <source><italic>Final Report from the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment, Phase II.</italic></source> <publisher-loc>San Jose, CA</publisher-loc>: <publisher-name>Video Quality Experts Group</publisher-name>.</citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Simoncelli</surname> <given-names>E. P.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name></person-group> (<year>2003</year>). <article-title>Multiscale structural similarity for image quality assessment.</article-title> <source><italic>The Thrity-Seventh Asilomar Conference on Signals, Systems Computers</italic></source>, <volume>2</volume> <fpage>1398</fpage>&#x2013;<lpage>1402</lpage>. <pub-id pub-id-type="doi">10.1109/ACSSC.2003.1292216</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xia</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>C.</given-names></name></person-group> (<year>2021</year>). <article-title>Phase consistency guided full-reference panoramic image quality assessment algorithm.</article-title> <source><italic>J. Image Graph.</italic></source> <volume>26</volume> <fpage>1625</fpage>&#x2013;<lpage>1636</lpage>. <pub-id pub-id-type="doi">10.11834/JIG.200546</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Luo</surname> <given-names>Z.</given-names></name> <name><surname>Zhou</surname> <given-names>W.</given-names></name> <name><surname>Zhang</surname> <given-names>W.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name></person-group> (<year>2019</year>). &#x201C;<article-title>Quality assessment of stereoscopic 360-degree images from multi-viewports</article-title>,&#x201D; in <source><italic>2019 Picture Coding Symposium (PCS)</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>5</lpage>. <pub-id pub-id-type="doi">10.1109/PCS48520.2019.8954555</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Zhou</surname> <given-names>W.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name></person-group> (<year>2020</year>). <article-title>Blind omnidirectional image quality assessment with viewport oriented graph convolutional networks.</article-title> <source><italic>IEEE Trans. Circuits Syst. Video Technol.</italic></source> <volume>31</volume> <fpage>1724</fpage>&#x2013;<lpage>1737</lpage>. <pub-id pub-id-type="doi">10.1109/TCSVT.2020.3015186</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Callet</surname> <given-names>P. L.</given-names></name></person-group> (<year>2020</year>). <article-title>State-of-the-art in 360<sup>&#x00B0;</sup> video/image processing: perception, assessment and compression.</article-title> <source><italic>IEEE J. Sel. Top. Signal Process.</italic></source> <volume>14</volume> <fpage>5</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1109/JSTSP.2020.2966864</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xue</surname> <given-names>W.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Mou</surname> <given-names>X.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name></person-group> (<year>2014</year>). <article-title>Gradient magnitude similarity deviation: a highly efficient perceptual image quality index.</article-title> <source><italic>IEEE Trans. Image Process.</italic></source> <volume>23</volume> <fpage>668</fpage>&#x2013;<lpage>695</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2013.2293423</pub-id> <pub-id pub-id-type="pmid">26270911</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yan</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Fu</surname> <given-names>X.</given-names></name></person-group> (<year>2019</year>). <article-title>No-reference quality assessment of contrast-distorted images using contrast enhancement.</article-title> <source><italic>arXiv</italic></source> [<comment>Preprint</comment>]. <pub-id pub-id-type="doi">10.48550/arXiv.1904.08879</pub-id> <pub-id pub-id-type="pmid">35895330</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>L.</given-names></name> <name><surname>Xu</surname> <given-names>M.</given-names></name> <name><surname>Xin</surname> <given-names>D.</given-names></name> <name><surname>Feng</surname> <given-names>B.</given-names></name></person-group> (<year>2021</year>). &#x201C;<article-title>Spatial attention-based non-reference perceptual quality prediction network for omnidirectional images</article-title>,&#x201D; in <source><italic>2021 IEEE International Conference on Multimedia and Expo (ICME)</italic></source>, (<publisher-loc>Shenzhen</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/ICME51207.2021.9428390</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>M.</given-names></name> <name><surname>Lakshman</surname> <given-names>H.</given-names></name> <name><surname>Girod</surname> <given-names>B.</given-names></name></person-group> (<year>2015</year>). &#x201C;<article-title>A Framework to Evaluate Omnidirectional Video Coding Schemes</article-title>,&#x201D; in <source><italic>2015 IEEE International Symposium on Mixed and Augmented Reality</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>31</fpage>&#x2013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1109/ISMAR.2015.12</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zakharchenko</surname> <given-names>V.</given-names></name> <name><surname>Choi</surname> <given-names>K. P.</given-names></name> <name><surname>Alshina</surname> <given-names>E.</given-names></name> <name><surname>Park</surname> <given-names>J. H.</given-names></name></person-group> (<year>2017</year>). &#x201C;<article-title>Omnidirectional Video Quality Metrics and Evaluation Process</article-title>,&#x201D; in <source><italic>2017 Data Compression Conference (DCC)</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>472</fpage>&#x2013;<lpage>472</lpage>. <pub-id pub-id-type="doi">10.1109/DCC.2017.90</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zanca</surname> <given-names>D.</given-names></name> <name><surname>Gori</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). &#x201C;<article-title>Variational laws of visual attention for dynamic scenes,</article-title>&#x201D; in <source><italic>Advances in Neural Information Processing Systems 30.</italic></source> (<role>eds</role>), <person-group person-group-type="editor"><name><surname>Guyon</surname> <given-names>I.</given-names></name> <name><surname>Luxburg</surname> <given-names>U. V.</given-names></name> <name><surname>Bengio</surname> <given-names>S.</given-names></name> <name><surname>Wallach</surname> <given-names>H.</given-names></name> <name><surname>Fergus</surname> <given-names>R.</given-names></name> <name><surname>Vishwanathan</surname> <given-names>S.</given-names></name> <name><surname>Garnett</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Acm Digital Library</publisher-name>), <fpage>3826</fpage>&#x2013;<lpage>3835</lpage>. <pub-id pub-id-type="doi">10.5555/3294996.3295139</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zanca</surname> <given-names>D.</given-names></name> <name><surname>Melacci</surname> <given-names>S.</given-names></name> <name><surname>Gori</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>Gravitational laws of focus of attention.</article-title> <source><italic>IEEE Trans. Pattern Anal. Mach. Intel.</italic></source> <volume>42</volume> <fpage>2983</fpage>&#x2013;<lpage>2995</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2019.2920636</pub-id> <pub-id pub-id-type="pmid">31180885</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name> <name><surname>Mou</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>D.</given-names></name></person-group> (<year>2011</year>). <article-title>FSIM: a feature similarity index for image quality assessment.</article-title> <source><italic>IEEE Trans. Image Process.</italic></source> <volume>20</volume> <fpage>2378</fpage>&#x2013;<lpage>2386</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2011.2109730</pub-id> <pub-id pub-id-type="pmid">21292594</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>F.</given-names></name> <name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Yang</surname> <given-names>D.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Subjective panoramic video quality assessment database for coding applications.</article-title> <source><italic>IEEE Trans. Broadcast.</italic></source> <volume>64</volume> <fpage>461</fpage>&#x2013;<lpage>473</lpage>. <pub-id pub-id-type="doi">10.1109/TBC.2018.2811627</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zheng</surname> <given-names>X.</given-names></name> <name><surname>Jiang</surname> <given-names>G.</given-names></name> <name><surname>Yu</surname> <given-names>M.</given-names></name> <name><surname>Jiang</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>Segmented spherical projection-based blind omnidirectional image quality assessment.</article-title> <source><italic>IEEE Access</italic></source> <volume>8</volume> <fpage>31647</fpage>&#x2013;<lpage>31659</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.2972158</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>W.</given-names></name> <name><surname>Bovik</surname> <given-names>A. C.</given-names></name> <name><surname>Sheikh</surname> <given-names>H. R.</given-names></name> <name><surname>Simoncelli</surname> <given-names>E. P.</given-names></name></person-group> (<year>2004</year>). <article-title>Image quality assessment: from error visibility to structural similarity.</article-title> <source><italic>IEEE Trans. Image Process.</italic></source> <volume>13</volume> <fpage>600</fpage>&#x2013;<lpage>612</lpage>. <pub-id pub-id-type="doi">10.1109/TIP.2003.819861</pub-id> <pub-id pub-id-type="pmid">15376593</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>W.</given-names></name> <name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Jiang</surname> <given-names>Q.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name></person-group> (<year>2022</year>). <article-title>No-reference quality assessment for 360-degree images by analysis of multifrequency information and local-global naturalness.</article-title> <source><italic>IEEE Trans. Circuits Syst. Video Technol.</italic></source> <volume>32</volume> <fpage>1778</fpage>&#x2013;<lpage>1791</lpage>. <pub-id pub-id-type="doi">10.1109/TCSVT.2021.3081182</pub-id></citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>Y.</given-names></name> <name><surname>Sun</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>L.</given-names></name> <name><surname>Gu</surname> <given-names>K.</given-names></name> <name><surname>Fang</surname> <given-names>Y.</given-names></name></person-group> (<year>2022</year>). <article-title>Omnidirectional image quality assessment by distortion discrimination assisted multi-stream network.</article-title> <source><italic>IEEE Trans. Circuits Syst. Video Technol.</italic></source> <volume>32</volume> <fpage>1767</fpage>&#x2013;<lpage>1777</lpage>.</citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>Y.</given-names></name> <name><surname>Yu</surname> <given-names>M.</given-names></name> <name><surname>Ma</surname> <given-names>H.</given-names></name> <name><surname>Shao</surname> <given-names>H.</given-names></name> <name><surname>Jiang</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). &#x201C;<article-title>Weighted-to-Spherically-Uniform SSIM Objective Quality Evaluation for Panoramic Video</article-title>,&#x201D; in <source><italic>2018 14th IEEE International Conference on Signal Processing (ICSP)</italic></source>, (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>54</fpage>&#x2013;<lpage>57</lpage>. <pub-id pub-id-type="doi">10.1109/ICSP.2018.8652269</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>W.</given-names></name> <name><surname>Zhai</surname> <given-names>G.</given-names></name> <name><surname>Min</surname> <given-names>X.</given-names></name> <name><surname>Hu</surname> <given-names>M.</given-names></name> <name><surname>Liu</surname> <given-names>J.</given-names></name> <name><surname>Guo</surname> <given-names>G.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Multi-channel decomposition in tandem with free-energy principle for reduced-reference image quality assessment.</article-title> <source><italic>IEEE Trans. Multimedia.</italic></source> <volume>21</volume> <fpage>2334</fpage>&#x2013;<lpage>2346</lpage>. <pub-id pub-id-type="doi">10.1109/TMM.2019.2902484</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>Y.</given-names></name> <name><surname>Zhai</surname> <given-names>G.</given-names></name> <name><surname>Min</surname> <given-names>X.</given-names></name> <name><surname>Zhou</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>The prediction of saliency map for head and eye movements in 360 degree images.</article-title> <source><italic>IEEE Trans Multimedia.</italic></source> <volume>22</volume> <fpage>2331</fpage>&#x2013;<lpage>2344</lpage>. <pub-id pub-id-type="doi">10.1109/TMM.2019.2957986</pub-id></citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zou</surname> <given-names>W.</given-names></name> <name><surname>Zhang</surname> <given-names>W.</given-names></name> <name><surname>Yang</surname> <given-names>F.</given-names></name></person-group> (<year>2021</year>). <article-title>Modeling the perceptual quality for viewport-adaptive omnidirectional video streaming considering dynamic quality boundary artifact.</article-title> <source><italic>IEEE Trans. Circuits Syst. Video Technol.</italic></source> <volume>31</volume> <fpage>4241</fpage>&#x2013;<lpage>4254</lpage>. <pub-id pub-id-type="doi">10.1109/TCSVT.2021.3050157</pub-id></citation></ref>
</ref-list>
</back>
</article>