Lagrangian Motion Magnification with Double Sparse Optical Flow Decomposition

Microexpressions are fast and spatially small facial expressions that are difficult to detect. Therefore, motion magnification techniques, which aim at amplifying and hence revealing subtle motion in videos, appear useful for handling such expressions. There are two main approaches, namely Eulerian and Lagrangian techniques. While the former magnifies motion implicitly by operating directly on image pixels, the Lagrangian approach uses optical flow (OF) techniques to extract and magnify pixel trajectories. In this paper, we propose a novel approach for local Lagrangian motion magnification of facial micro-motions. Our contribution is three-fold: first, we fine tune the recurrent all-pairs field transforms (RAFT) deep learning approach for OF on faces by adding ground truth obtained by applying the variational dense inverse search (DIS) OF algorithm to the CASME II video set of facial microexpressions. This enables us to compute OFs of facial videos in an efficient and sufficiently accurate way. Second, since facial micro-motions are local in both space and time, we propose to approximate the OF field by components that are sparse in both space and time, leading to a double sparse decomposition. Third, we use this decomposition to magnify micro-motions in specific areas of the face, where we introduce a new forward warping strategy using a triangular splitting of the image grid and barycentric interpolation of the RGB vectors at the corners of the transformed triangles. We demonstrate the feasibility of our approach by various examples.


Introduction
Motion magnification describes a wide variety of algorithms to magnify and therefore visualize subtle, imperceptibly small motions in videos. In analogy to fluid dynamics, motion magnification techniques can be grouped into Eulerian and Lagrangian approaches.
Eulerian methods were first introduced by Wu et al. [49]. These methods amplify motion or color changes by modifying the time-dependent color variation in the video using temporal or spatial filtering, e.g., by dealing with image pyramids of different resolutions [20,47,49,54]. The term Eulerian emphasizes that information is processed on the fixed grid of pixels, while Lagrangian methods manipulate point trajectories. Eulerian methods have also found applications in quantitative motion assessment. For example, Sarrafi et al. [41] and Eitner et al. [19] used point tracking on motion-magnified recordings for the detection and modal parameter estimation of vibrations in different mechanical components. Fei et al. [21] applied motion magnification to the detection of AI-generated videos and Alinovi et al. [1] to respiratory rate monitoring. While Eulerian methods are often easy and fast to evaluate and can therefore be readily employed in real-time applications, the type and degree of motions that can be magnified are limited. In particular, phase-based methods depend on local spatial frequencies, which can lead to blurring around sharp image edges. Intensity-based Eulerian methods magnify motion by linearization, which is known to be limited to small movements.
Lagrangian approaches rely on the explicitly computed motion field of objects within the video, which is in general obtained by means of OF estimation [4,34,23]. The OF fields make it possible not only to selectively and adaptively magnify types of motion that vary spatially, as, e.g., in [23,34], but also to attenuate them separately or to remove all movement completely, see [4,25]. A further advantage of Lagrangian methods is that the estimated motion information can be used not only for the motion magnification task but also for subsequent quantitative assessment [26,43].
The advancements in OF estimation in the past decades have resulted in high-accuracy methods with different model invariants as well as robustness and high computing speed. Variational methods minimize functionals with explicitly modeled data terms and regularizers. They still achieve state-of-the-art accuracy in many applications, especially in combination with advanced smoothness terms such as [26] and methods for the initialization of very large displacements: Chen et al. [16] lead (as of March 2022) the Middlebury OF benchmark [5] with respect to average endpoint error and average angular error. They use similarity transformations for a segmented flow field as initialization of large motion. In a second step, the variational method of Sun et al. [45] is used for subpixel refinement. The widely available dense inverse search (DIS) OF method [32] initializes the OF field with patch correspondences and uses a variational method similar to those of Sun et al. and Brox et al. [11] for high-resolution refinement. The availability of large-scale synthetic datasets such as the MPI Sintel dataset [36] paved the way for deep learning-based methods such as RAFT [46], which computes a low-resolution flow field from correlation volumes that is subsequently upsampled via learned upsampling convolutions. Variants of the method are still among the high-performing methods in the competitive Sintel benchmark [13]. Its fast convergence [44] makes it particularly interesting for refinement tasks.
Non-contact monitoring of the affective and neurocognitive state is an emerging topic with many new applications in different areas of life sciences and engineering. In particular, estimating the psycho-physiological state can find applications in healthcare settings and even in neuro-ergonomics for human-machine interaction. In that context, facial microexpressions promise to allow an assessment of emotions from faces. In contrast to normal facial expressions, microexpressions can occur even when the person tries to conceal the emotion [52].
However, microexpressions are temporally and spatially very small movements and therefore difficult to observe. Recent work shows that small movements lasting less than 500 ms can be considered microexpressions [52]. Besides appearance-based features, OF field-based features have long been used for the detection and analysis of microexpressions [29,50,39,53]. Apart from the small, microexpression-induced movements, appearance-based methods use features that describe the skin texture of faces to capture changes in shading and texture for microexpression detection, see [29].
On a more general scope, micromovements of the face and head can contain other important physiological and psychophysiological information: the movement of the head can contain cues on heart rate [42], movement of the mouth can be used for audiovisual speech recognition in noisy environments [38], and micromovements of the ear can contain cues on audio stimuli [43]. While many of these analysis methods fall back on OF methods, there are not many annotated datasets for specific, local movements. The selective detection and magnification of facial micromovements is a relatively untapped area of research with the potential to guide the annotation of such datasets and to visualize isolated movements that can be exploited by computer vision algorithms.
However, such underlying emotional implications will not be the focus of this paper. Instead, we concentrate on the relatively untapped area of detecting and selectively magnifying micromovements. This goes beyond the field of microexpressions and more in the direction of realistic magnifications in computer graphics.
In this paper, we rely on Lagrangian motion magnification, where our approach is based on three pillars. First, we apply variational approaches [11,32] to a facial dataset to create new ground truth data, which we use to fine tune the RAFT deep learning method [46] on recordings of faces with small displacements. Next, we employ ideas of sparse matrix factorization from [35] to decompose the OF field into meaningful sparse components. Variational approaches with sparsity-promoting regularization terms are very common in inverse problems in imaging; see also [8,14,15,27] for sparsity-based approaches other than sparse matrix factorization. Here we address sparsity both in space and time, leading to a double sparse representation. Although there is a broad literature on sparse decomposition, also in connection with principal component analysis, see, e.g., [8] and the references therein, to the best of our knowledge such a decomposition has not been applied to motion magnification so far. With this decomposition at hand, we are finally able to magnify the motion in the respective areas in an unsupervised way. For this, we use a sophisticated forward warping method. Image warping methods, usually the simpler backward warping approaches, are used in the context of OF estimation, compression or image registration [11,48]. We explain forward and backward warping to illustrate why the former is better suited for our purposes.
The outline of this paper is as follows: in Section 2, we recall the RAFT approach for computing OF fields by deep learning and describe our fine tuning based on a facial dataset and a variationally inspired approach. Then, in Section 3, we show how OF fields of micro-motions can be decomposed sparsely in space and time. Based on such a decomposition, we describe in Section 4 how to magnify the motion in the respective areas. In particular, we have to use a specially designed method for forward warping. Experiments with our method are given in Section 5. Conclusions and directions of further research are addressed in Section 6.

Optical Flow Detection in Facial Micro-Motions
In this section, we describe our approach to OF detection. In the following sections, the motion field will be decomposed and partially magnified.
OF computation by a variational model. Our Lagrangian motion magnification approach relies on a rather accurate variational detection of the OF between image frames in a video. This is highly nontrivial, since facial micro-motions are in general highly local both in space and time and are therefore often imperceptible to a human observer. OF models for videos are usually based on a gray-value constancy assumption, meaning that in a continuous model each pixel keeps its value when moving over time, i.e., $f(t, x_1(t), x_2(t)) = \text{const}$. Differentiating with respect to $t$ and setting $v_i := x_i'(t)$, $i = 1, 2$, leads at time $t_0$ to the following underdetermined system for computing the OF vectors $(v_1, v_2)$:
$$\partial_t f + \partial_{x_1} f \, v_1 + \partial_{x_2} f \, v_2 = 0. \tag{1}$$
Clearly, digital video sequences are discrete both in space and time. Given frames $f(t, \cdot)$ on an $n_1 \times n_2$ pixel grid, we are interested in the OF fields between consecutive video frames,
$$v(t, x_1, x_2) = \big(v_1(t, x_1, x_2), v_2(t, x_1, x_2)\big), \qquad t = 1, \dots, T, \tag{2}$$
where $v(t, \cdot)$ describes the motion from $f(t, \cdot)$ to $f(t+1, \cdot)$. Then the derivatives in (1) are replaced by finite differences, and the approximation (1) is used to form the data term $D(f, v)$ within a variational model
$$\min_{v} \; D(f, v) + R(v).$$
Here $R(v)$ is a regularizing term or prior of the OF field, which is necessary to make the underdetermined problem well-posed. Different data and regularization terms have been used in the literature. Starting with the Horn-Schunck model [28], there now exist many sophisticated variational OF models for specific purposes. For an overview, we refer to [7].
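To make the variational model concrete, the following minimal NumPy sketch implements the classical Horn-Schunck instance [28]: a quadratic data term built from the linearized constancy equation (1) with finite differences, plus a quadratic smoothness prior R(v). It is an illustration only and not the DISO method used in this paper; the function name, the axis convention (v1 along columns, v2 along rows), the weight alpha and the fixed iteration count are our assumptions.

```python
import numpy as np

def horn_schunck(f1, f2, alpha=1.0, n_iter=200):
    """Horn-Schunck flow between two gray-value frames (illustrative sketch).

    Approximately minimizes sum (f_x1 v1 + f_x2 v2 + f_t)^2
    + alpha^2 (|grad v1|^2 + |grad v2|^2), i.e. the data term from (1) plus a
    quadratic regularizer R(v), with the classical Jacobi-type iteration.
    v1 is the displacement along columns (axis 1), v2 along rows (axis 0)."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    # finite-difference approximations of the derivatives in (1)
    fx1 = 0.5 * (np.gradient(f1, axis=1) + np.gradient(f2, axis=1))
    fx2 = 0.5 * (np.gradient(f1, axis=0) + np.gradient(f2, axis=0))
    ft = f2 - f1
    denom = alpha**2 + fx1**2 + fx2**2

    def local_mean(u):
        # average of the 4 neighbours, borders replicated
        p = np.pad(u, 1, mode="edge")
        return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

    v1 = np.zeros_like(f1)
    v2 = np.zeros_like(f1)
    for _ in range(n_iter):
        m1, m2 = local_mean(v1), local_mean(v2)
        r = (fx1 * m1 + fx2 * m2 + ft) / denom  # scaled residual of (1)
        v1 = m1 - fx1 * r
        v2 = m2 - fx2 * r
    return v1, v2
```

On a horizontal intensity ramp shifted by 0.3 pixels, the iteration converges to the constant flow (0.3, 0), the unique solution of (1) in that case; on real faces, the smoothness weight trades off fidelity against spatial coherence.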
OF detection by RAFT. While recently introduced deep learning-based OF techniques can outperform variational OF methods in terms of accuracy on many modern benchmarks, their performance depends not only on architectural innovations but also on the quality of the training data and the training strategies. In this paper, we rely on the RAFT approach from [46], which demonstrated state-of-the-art performance. Training and inference of the original RAFT are briefly described in the following remark.
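Remark 1. RAFT extracts features from both frames, builds a 4D correlation volume over all pairs of pixels, and iteratively refines a flow field $\tilde v^r$ at 1/8 of the input resolution with a recurrent update operator. For inference, the authors use the output flow field after a fixed number $R$ of such iterations. For the $r$-th iterate, learned upsampling convolutions are applied to bring the computed flow back to the original image resolution, i.e., $\tilde v^r \to v^r$. For training this model, the authors use the loss function
$$\mathcal{L} = \sum_{r=1}^{R} \alpha^{R-r} \, \big\| v^{(gt)} - v^{r} \big\|_1,$$
where $v^{(gt)}$ denotes the ground truth and $\alpha = 0.8$ is chosen.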
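The training loss of Remark 1 can be sketched in a few lines of NumPy. We assume that the R upsampled iterates are given as a list of arrays with the same shape as the ground truth, and we average the absolute differences over pixels and flow components instead of summing them, which only rescales the loss; the function name is hypothetical.

```python
import numpy as np

def sequence_loss(flow_preds, flow_gt, alpha=0.8):
    """Exponentially weighted L1 loss over the upsampled iterates v^1, ..., v^R.

    flow_preds: list of R arrays (e.g. of shape (2, H, W)),
    flow_gt: ground truth v^(gt) of the same shape.
    Iterate r receives the weight alpha**(R - r), so later iterates count more."""
    R = len(flow_preds)
    loss = 0.0
    for r, pred in enumerate(flow_preds, start=1):
        # mean absolute deviation over all pixels and both flow components
        loss += alpha ** (R - r) * np.mean(np.abs(flow_gt - pred))
    return loss
```

With alpha = 0.8, an error in the first of two iterates costs 0.8 times as much as the same error in the final iterate.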
RAFT has been improved by optimized training strategies, datasets and augmentations in various works; for a comparison of different methods, see [44]. More precisely, the RAFT training schedule now combines different datasets in multiple training stages for pre-training and refinement as follows [46]: RAFT is pre-trained on FlyingChairs [17] for 100k iterations, followed by 100k iterations on FlyingThings3D [5]. Then RAFT is fine tuned for 100k iterations on Sintel [36] (RAFT-Sintel), KITTI-2015 [37] and HD1k [31], which is finally followed by 50k iterations of fine tuning on KITTI-2015 (RAFT-KITTI).
Fine tuning of RAFT with data produced by variational OF. We fine tune RAFT with additional training data obtained via variational OF methods. To this end, we replace the fine tuning on KITTI-2015 with a fine tuning on facial microexpression videos derived from the CASME II [51] and SMIC [33] datasets. First, we construct a microexpression dataset (ME) by annotating SMIC and CASME II with the dense inverse search optimization (DISO) approach from [32] as implemented in the OpenCV library [10]. Note that DISO uses a variational OF technique close to [11,45]. Second, we enrich the ME dataset as follows: we use the already estimated OF $v(t, \cdot)$ and apply it to the first frame $f(1, \cdot)$ of each sequence in SMIC and CASME II to obtain new sequences $\tilde f$. The reason for including both sequences $\tilde f$ and $f$ in ME is that i) for $\tilde f$, an accurate ground truth displacement field exists, but $\tilde f$ does not contain the changes in face appearance from the original video, because the color information is just propagated from the first frame of each sequence; ii) on the other hand, $f$ contains the changes in face appearance from the original video, but only a DISO estimate of the OF ground truth is available.
In summary, with this strategy we ensure that the dataset contains accurate image and flow pairs in $\tilde f$ as well as the original image sequences with dynamic face appearance information in $f$, for which we have OF estimates. Including the dynamic face appearance information is particularly important in the context of microexpressions [29].

Motion Decomposition
Once the OF field is known, we want to decompose it into sparse components in space and time to detect the local facial regions of interest. Sparsity-driven decomposition methods have received a lot of interest in recent years, and there is an overwhelming amount of literature on the topic, see [35] and the references therein. Our approach is based on an appropriate application and modification of a method in [35] to our setting. Facial micro-motions are sparse with respect to space and time. Therefore, with $n := n_1 n_2$ denoting the number of image grid points, we aim to decompose the flow field (2) into $K \ll \min\{T, n\}$ components $G_k$ and $d_k$, so that
$$v(t, x_1, x_2) \approx \sum_{k=1}^{K} d_k(t) \, G_k(x_1, x_2), \qquad d_k(t) \in \mathbb{R}^2, \quad G_k \in \mathbb{R}^{n_1, n_2}. \tag{3}$$
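The bookkeeping behind (3) and the matrix V used below is easy to get wrong, so here is a small NumPy sketch. It assumes the flow is stored as an array of shape (T, n1, n2, 2), orders the rows of V by time and then by flow component, and reshapes space columnwise (x1 fastest); these conventions and the function names are our choices, not prescribed by the text.

```python
import numpy as np

def flow_to_matrix(v):
    """Reshape a flow field of shape (T, n1, n2, 2) into V of shape (2T, n).

    Row 2t + i holds the i-th flow component at time t (0-based); column
    x2 * n1 + x1 holds the sample vector (v(1, x)^T, ..., v(T, x)^T)^T of the
    pixel x = (x1, x2), i.e. space is reshaped columnwise."""
    T, n1, n2, _ = v.shape
    return np.transpose(v, (0, 3, 2, 1)).reshape(2 * T, n2 * n1)

def matrix_to_flow(V, T, n1, n2):
    """Inverse of flow_to_matrix: back to shape (T, n1, n2, 2)."""
    return V.reshape(T, 2, n2, n1).transpose(0, 3, 2, 1)

def reconstruct(D, G, T, n1, n2):
    """Evaluate the right-hand side of (3), sum_k d_k(t) G_k(x), as a flow field.

    D has shape (2T, K) with columns d_k = (d_k(1), ..., d_k(T));
    G has shape (n, K) with the columnwise reshaped G_k as columns."""
    return matrix_to_flow(D @ G.T, T, n1, n2)
```

With these conventions the approximation (3) reads V ≈ D G^T, which is the compact form used in the optimization problem.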
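The right-hand side separates space and time variables. On the one hand, this decomposition can be seen for each fixed spatial point $(x_1, x_2)$ evolving over time $t = 1, \dots, T$ as
$$\begin{pmatrix} v(1, x_1, x_2) \\ \vdots \\ v(T, x_1, x_2) \end{pmatrix} \approx D \begin{pmatrix} G_1(x_1, x_2) \\ \vdots \\ G_K(x_1, x_2) \end{pmatrix}, \qquad D := (d_1 | \dots | d_K) \in \mathbb{R}^{2T, K}, \quad d_k := \begin{pmatrix} d_k(1) \\ \vdots \\ d_k(T) \end{pmatrix},$$
and we search for a dictionary or (non-orthogonal) principal components consisting of the columns of $D$ such that the sample vectors $(v(1, x_1, x_2)^T, \dots, v(T, x_1, x_2)^T)^T \in \mathbb{R}^{2T}$ at each spatial position can be sparsely represented with respect to this basis. On the other hand, we can consider at each fixed time all spatial points and the corresponding columnwise reshaped vectors in $\mathbb{R}^n$, i.e.,
$$v_i(t, \cdot) \approx \sum_{k=1}^{K} d_{k,i}(t) \, G_k, \quad i = 1, 2, \qquad G := (G_1 | \dots | G_K) \in \mathbb{R}^{n, K},$$
and find a dictionary or (non-orthogonal) principal components consisting of the columns of $G$ such that the samples at each time can be sparsely represented with respect to this basis. In summary, we are looking for a doubly sparse model with respect to space and time. The sparse spatial components $G_k$, $k = 1, \dots, K$, will later be used to magnify the motion in specific regions, while the sparse temporal components $d_k$ indicate the time span during which the magnification should appear. Using the notation in (3), we propose to find such a double sparse decomposition by solving, for appropriately chosen $\alpha, \beta > 0$, the minimization problem
$$\min_{D, G} \sum_{(x_1, x_2)} \frac{1}{2} \Big\| \begin{pmatrix} v(1, x_1, x_2) \\ \vdots \\ v(T, x_1, x_2) \end{pmatrix} - \sum_{k=1}^{K} d_k \, G_k(x_1, x_2) \Big\|_2^2 + \alpha \sum_{k=1}^{K} |G_k(x_1, x_2)| \tag{4}$$
with the sparsity constraints $\|d_k\|_{2,1} \le \beta$, $k = 1, \dots, K$, where
$$\|d_k\|_{2,1} := \sum_{t=1}^{T} \|d_k(t)\|_2.$$
By reshaping the whole velocity field $(v(t, x_1, x_2))_{t, x_1, x_2}$ into a matrix $V \in \mathbb{R}^{2T, n}$, this problem can be written in the compact form
$$\min_{D, G} \frac{1}{2} \|V - D G^T\|_F^2 + \alpha \|G\|_1 \quad \text{subject to} \quad \|d_k\|_{2,1} \le \beta, \; k = 1, \dots, K, \tag{5}$$
where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix and $\|G\|_1 := \sum_{j,k} |G_{j,k}|$.
Remark 2. The original sparse dictionary decomposition model in [35] considers, in our notation, the minimization problem
$$\min_{D, G} \frac{1}{2} \|V - D G^T\|_F^2 + \alpha \|G\|_1 \quad \text{subject to} \quad \|d_k\|_2 \le 1, \; k = 1, \dots, K, \tag{6}$$
which enforces only the sparsity of the spatial decomposition in $G$. In contrast, we are also interested in the (grouped) sparsity of the $d_k$ in time. We will compare this model with (4) in our numerical examples.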
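For checking an implementation, it helps to evaluate the objective and the two constraint sets directly. The sketch below assumes the ordering d_k = (d_k(1), ..., d_k(T)) with d_k(t) in R^2, matrices V, D, G as in (5), and that (6) differs from (5) only in the constraint ||d_k||_2 <= 1, as in the dictionary learning model of [35]; all function names are hypothetical.

```python
import numpy as np

def norm21(d, T):
    """Group norm ||d||_{2,1} = sum_t ||d(t)||_2 of d = (d(1), ..., d(T)), d(t) in R^2."""
    return np.linalg.norm(np.reshape(d, (T, 2)), axis=1).sum()

def objective(V, D, G, alpha):
    """Common objective of (5) and (6): 0.5 * ||V - D G^T||_F^2 + alpha * ||G||_1."""
    return 0.5 * np.linalg.norm(V - D @ G.T, "fro") ** 2 + alpha * np.abs(G).sum()

def feasible_21(D, beta, T, tol=1e-9):
    """Constraints of (5): ||d_k||_{2,1} <= beta for every column d_k of D."""
    return all(norm21(D[:, k], T) <= beta + tol for k in range(D.shape[1]))

def feasible_2(D, tol=1e-9):
    """Constraints of (6): ||d_k||_2 <= 1 for every column d_k of D."""
    return bool(np.all(np.linalg.norm(D, axis=0) <= 1 + tol))
```

The group norm sums per-time Euclidean lengths, so it drives whole time steps d_k(t) to zero, which is what produces the temporally localized components.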
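Problem (5) is convex in each of the components $D$ and $G$ separately, and the minimization can be done by alternating between these components.
1. For fixed $D$, the minimization problem can be solved separately for each $(x_1, x_2)$,
$$\operatorname*{arg\,min}_{g \in \mathbb{R}^K} \frac{1}{2} \|v_{(x_1, x_2)} - D g\|_2^2 + \alpha \|g\|_1,$$
where $v_{(x_1, x_2)} \in \mathbb{R}^{2T}$ is the column of $V$ belonging to $(x_1, x_2)$ and $g = (G_1(x_1, x_2), \dots, G_K(x_1, x_2))^T$, by several approaches such as operator splitting methods using soft thresholding, see [6,12], or the LARS method [40,18]. In this paper, we apply the latter.
2. For fixed $G$, we first note that for the trace $\operatorname{tr}$ of a matrix and $A, B \in \mathbb{R}^{M,N}$ it holds $\operatorname{tr}(A^T B) = \operatorname{tr}(B A^T) = \langle A, B \rangle$, where $\langle A, B \rangle = \sum_{i,j} a_{i,j} b_{i,j}$, so that we can rewrite the data term in (5) as
$$\frac{1}{2} \|V - D G^T\|_F^2 = \frac{1}{2} \langle D, D A \rangle - \langle D, B \rangle + \frac{1}{2} \|V\|_F^2, \qquad A := G^T G \in \mathbb{R}^{K, K}, \quad B := V G \in \mathbb{R}^{2T, K}.$$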
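Then we can solve the equivalent problem
$$\min_{D} F(D) := \frac{1}{2} \operatorname{tr}(D^T D A) - \operatorname{tr}(D^T B) \quad \text{subject to} \quad \|d_k\|_{2,1} \le \beta, \; k = 1, \dots, K. \tag{7}$$
This is done iteratively for each column $d_k$, $k = 1, \dots, K$, of $D$ while keeping the other columns fixed. Taking the symmetry of $A = (a_1 | \dots | a_K) = (a(j,k))_{j,k=1}^{K}$ into account, we get for the minimizer of $F$ with respect to the $k$-th column of $D$ that
$$D a_k = b_k \quad \Longleftrightarrow \quad d_k = \frac{1}{a(k,k)} \big(b_k - D a_k\big) + d_k,$$
where $b_k$ denotes the $k$-th column of $B$. The algorithm performs one step of the above fixed point iteration to get a new column vector $u$ and projects this vector onto the $(2,1)$-ball given by the $k$-th constraint in (7). This projection $\Pi$ can be done by the so-called grouped shrinkage, which reads for $u = (u(1), \dots, u(T))$, $u(t) \in \mathbb{R}^2$,
$$\Pi(u)(t) = \max\Big(0, 1 - \frac{\tau}{\|u(t)\|_2}\Big) \, u(t), \qquad t = 1, \dots, T,$$
with $\Pi(u)(t) = 0$ if $u(t) = 0$, where $\tau = 0$ if $\|u\|_{2,1} \le \beta$, and otherwise $\tau > 0$ is chosen such that $\sum_{t=1}^{T} \max(0, \|u(t)\|_2 - \tau) = \beta$. Finally, the above steps are not performed for all spatial points at the same time; instead, the sum in (4) is updated point by point, and the dictionary $D$ from the previous step is used as a "warm start" in the next step. This kind of algorithm is known as block-coordinate descent with warm restarts, see [9]. Convergence of the algorithm was shown under certain assumptions in [35], and these results can also be applied to our modified setting. All steps are summarized in Algorithm 3.1.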
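The grouped shrinkage and the column update can be sketched in NumPy as follows. The threshold tau is found by projecting the vector of group norms onto the l1-ball of radius beta via sorting, a standard technique; the function names and the choice to leave a column unchanged when a(k,k) = 0 are our assumptions.

```python
import numpy as np

def project_l21_ball(u, beta, T):
    """Euclidean projection of u in R^{2T} onto {d : sum_t ||d(t)||_2 <= beta}.

    Grouped shrinkage: each 2-vector u(t) is shrunk towards zero by the same
    threshold tau >= 0, chosen such that the shrunk group norms sum to beta."""
    U = np.reshape(u, (T, 2)).astype(float)
    s = np.linalg.norm(U, axis=1)              # group norms ||u(t)||_2
    if s.sum() <= beta:
        return U.reshape(-1)                   # already feasible: tau = 0
    # tau from the sorted group norms (projection of s onto the l1-ball)
    s_sorted = np.sort(s)[::-1]
    cums = np.cumsum(s_sorted)
    idx = np.arange(1, T + 1)
    rho = idx[s_sorted - (cums - beta) / idx > 0][-1]
    tau = (cums[rho - 1] - beta) / rho
    scale = np.zeros(T)
    nz = s > 0
    scale[nz] = np.maximum(0.0, 1.0 - tau / s[nz])
    return (U * scale[:, None]).reshape(-1)

def update_dictionary(D, A, B, beta, T):
    """One sweep over the columns of D for problem (7): the fixed-point step
    d_k + (b_k - D a_k) / a(k, k), followed by the projection onto the ball."""
    D = np.array(D, dtype=float)               # work on a copy
    for k in range(D.shape[1]):
        if A[k, k] <= 0:                       # unused component: keep d_k
            continue
        u = D[:, k] + (B[:, k] - D @ A[:, k]) / A[k, k]
        D[:, k] = project_l21_ball(u, beta, T)
    return D
```

For example, projecting u = (3, 4, 0, 0, 1, 0) with T = 3 onto the ball of radius 3 gives tau = 2: the first group is scaled from length 5 to 3 and the third group, of length 1 < tau, is set to zero.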

Algorithm 3.1 Double Sparse OF Decomposition
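Input: $V \in \mathbb{R}^{2T, n}$, number of components $K$, parameters $\alpha, \beta > 0$, initial dictionary $D$ with $\|d_k\|_{2,1} \le \beta$, $k = 1, \dots, K$.
for each pass over the data do
  for each spatial point $(x_1, x_2)$ do
    Compute $G(x_1, x_2)$ using LARS (step 1);
    Add the contribution of $(x_1, x_2)$ to $A = G^T G$ and $B = V G$;
    Update $d_k \leftarrow \Pi\big(\frac{1}{a(k,k)}(b_k - D a_k) + d_k\big)$ for $k = 1, \dots, K$, starting from the current $D$ (step 2, warm start);
  end for
end for
Output: $D$, $G$.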

Motion Magnification
The detection of microexpressions and facial micro-movements plays an increasingly important role in various computer vision applications. Microexpressions and micro-movements are often unobservable to the untrained, naked eye. Detecting them, a difficult task, can be necessary when annotating video datasets (e.g., compare [51]). Visualizing microexpressions and other facial micro-movements via motion magnification can potentially mitigate the difficulty of manual microexpression detection and enable manual annotation and microexpression analysis without professional or specific training. Furthermore, unsupervised selection and magnification of different motion components can help with the exploratory investigation of facial micro-movements for the development of new methods for remote assessment.
Having decomposed the OF field into meaningful spatial and temporal components, we show in this section how these components can be used to enhance the OF in the regions of interest and how this can be visualized in a new video sequence using a sophisticated warping method. Image warping is used in the context of OF estimation, compression or image registration (usually backward warping) as well as in graphics applications such as texture mapping or novel view synthesis; among the rich literature, see, e.g., [11,48].

Forward Warping
For a given displacement $v = (v_1, v_2)$ between two consecutive image frames $f_1$ and $f_2$ (so that we can omit the time variable), there are two prevalent warping methods, forward warping and backward warping:
i) Backward warping $f_1$ to yield $f_{bw}$:
$$f_{bw}(x) := f_1\big(x - v(x)\big),$$
where $f_1$ is interpolated at the generally non-integer positions $x - v(x)$. Usually, backward warping is preferred, as it directly computes the warped frame on the pixel grid. However, we will see that for warping facial motions, backward warping is far inferior to forward warping.
ii) Forward warping $f_1$ to yield $f_{fw}$:
$$f_{fw}\big(x + v(x)\big) := f_1(x).$$
The difference between both warping methods is illustrated in Figure 1. This figure shows the results of forward versus backward warping with a displacement field that corresponds to a magnified blinking motion as well as a magnified raising of an eyebrow. In this case, the motion field is localized on the eyebrow and the eyelid. Therefore, with backward warping, the colors stay exactly the same wherever the displacement field is zero. In this illustration, this yields a contraction of the eyebrow and of the eye itself. This problem does not occur with forward warping. However, forward warping is significantly harder to implement, because for the warped frame we only have color information at the points $x + v(x) = (x_1 + v_1(x), x_2 + v_2(x))$, which in general do not coincide with grid coordinates. To overcome this problem, we use the following scheme: first, we consider a triangulation of the frame $f_1$ that we aim to warp. The structure of this triangulation is shown in Figure 2, left. Each triangle connects three adjacent pixels, and its vertices carry the color values of the corresponding pixels. Then the triangles are displaced according to the vector field $(v_1, v_2)$, yielding the displaced triangulation depicted in Figure 2, right. Finally, each displaced triangle is rasterized. More precisely, we take all pixels whose midpoints are contained in the displaced triangle and assign them a value obtained by interpolating the color values at the triangle vertices using barycentric coordinates.
Finally, multiple displaced triangles may overlap. To resolve this ambiguity, the proposed algorithm accepts a depth map as input, which indicates which triangles should be drawn above others. While an elaborate depth map based on facial features could be devised for this application, we opted to set the depth map such that stronger motions always overlap weaker motions.
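Both warping variants can be sketched on the CPU with NumPy; the paper's own implementation is an OpenGL shader pipeline, which this is not. We assume images of shape (H, W) or (H, W, C), flows of shape (H, W, 2) with v[..., 0] along columns and v[..., 1] along rows, pixel midpoints at integer coordinates, bilinear interpolation with border clamping for backward warping, two triangles per grid square for forward warping, and the interpolated motion magnitude as depth so that stronger motions are drawn on top. Pixels not covered by any displaced triangle are left at zero. All names are hypothetical.

```python
import numpy as np

def backward_warp(f1, v):
    """Backward warping f_bw(x) = f1(x - v(x)) with bilinear interpolation;
    sample positions outside the image are clamped to the border."""
    f1 = np.asarray(f1, dtype=float)
    H, W = f1.shape[:2]
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    c = np.clip(cols - v[..., 0], 0, W - 1)
    r = np.clip(rows - v[..., 1], 0, H - 1)
    c0, r0 = np.floor(c).astype(int), np.floor(r).astype(int)
    c1, r1 = np.minimum(c0 + 1, W - 1), np.minimum(r0 + 1, H - 1)
    wc, wr = c - c0, r - r0
    if f1.ndim == 3:  # broadcast the weights over color channels
        wc, wr = wc[..., None], wr[..., None]
    top = (1 - wc) * f1[r0, c0] + wc * f1[r0, c1]
    bottom = (1 - wc) * f1[r1, c0] + wc * f1[r1, c1]
    return (1 - wr) * top + wr * bottom

def forward_warp(f1, v):
    """Forward warping f_fw(x + v(x)) = f1(x) by triangle splitting and
    barycentric interpolation; overlaps resolved by a motion-magnitude depth buffer."""
    f1 = np.asarray(f1, dtype=float)
    H, W = f1.shape[:2]
    out = np.zeros_like(f1)
    depth = np.full((H, W), -np.inf)
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    px, py = cols + v[..., 0], rows + v[..., 1]   # displaced vertex positions
    mag = np.hypot(v[..., 0], v[..., 1])          # depth: motion magnitude
    for r in range(H - 1):
        for c in range(W - 1):
            for tri in (((r, c), (r, c + 1), (r + 1, c)),
                        ((r + 1, c + 1), (r + 1, c), (r, c + 1))):
                xs = np.array([px[p] for p in tri])
                ys = np.array([py[p] for p in tri])
                den = (ys[1] - ys[2]) * (xs[0] - xs[2]) + (xs[2] - xs[1]) * (ys[0] - ys[2])
                if abs(den) < 1e-12:
                    continue  # degenerate (collapsed) triangle
                y_lo, y_hi = max(int(np.ceil(ys.min())), 0), min(int(np.floor(ys.max())), H - 1)
                x_lo, x_hi = max(int(np.ceil(xs.min())), 0), min(int(np.floor(xs.max())), W - 1)
                for y in range(y_lo, y_hi + 1):
                    for x in range(x_lo, x_hi + 1):
                        l0 = ((ys[1] - ys[2]) * (x - xs[2]) + (xs[2] - xs[1]) * (y - ys[2])) / den
                        l1 = ((ys[2] - ys[0]) * (x - xs[2]) + (xs[0] - xs[2]) * (y - ys[2])) / den
                        l2 = 1.0 - l0 - l1
                        if min(l0, l1, l2) < -1e-9:
                            continue  # pixel midpoint outside the triangle
                        d = l0 * mag[tri[0]] + l1 * mag[tri[1]] + l2 * mag[tri[2]]
                        if d >= depth[y, x]:
                            depth[y, x] = d
                            out[y, x] = l0 * f1[tri[0]] + l1 * f1[tri[1]] + l2 * f1[tri[2]]
    return out
```

For a uniform shift both functions reproduce the shifted image, but for a localized field they differ: backward warping keeps f_1(x) wherever v(x) = 0, while forward warping transports the colors of the moving region to x + v(x).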
To efficiently compute the triangulation, displacement and subsequent rasterization of the input image, we employ a shader pipeline written with OpenGL. This means that we make use of GPU-native routines, which are extremely efficient at such tasks and even enable us to employ this forward warping technique in real-time applications.

Motion Magnification in Selected Regions
With all this at hand, we are now able to magnify selected movements in the input video. To this end, we still need to decide which components from the decomposition in Section 3 can be considered micro-movements. In our case, this is done by simply evaluating for each $k \in \{1, \dots, K\}$ whether the values of $d_k$ lie in a specified range. However, as the $d_k$ are all normalized by our optimization procedure, we need to rescale them to better resemble the velocity magnitudes of the OF. In short, we perform the following selection:
1. Compute the maximum of each spatial component, $m_k := \max_{(x_1, x_2)} |G_k(x_1, x_2)|$, for all $k \in \{1, \dots, K\}$.
2. Compute the maximal normalized motion magnitude of each component, $c_k := m_k \max_{t} \|d_k(t)\|_2$.
3. For two predefined thresholds $0 < \lambda_1 < \lambda_2$, consider the $k$-th component, $k \in \{1, \dots, K\}$, to belong to a micro-movement if $\lambda_1 \le c_k \le \lambda_2$.
Let $I \subseteq \{1, \dots, K\}$ be the set of components deemed to correspond to micro-movements by the above selection. Then we magnify their motion as follows. For each $t \in \{1, \dots, T-1\}$, we assume that $v(t, \cdot)$ warps the frame $f_1$ into $f_2$. Therefore, to magnify the micro-movements by the magnification factor $\mu > 0$, we warp $f_1$ by the flow field
$$v(t, \cdot) + (\mu - 1) \sum_{k \in I} d_k(t) \, G_k.$$
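The selection steps and the magnified field can be sketched as follows. This assumes our reading of the selection score, c_k = max_x |G_k(x)| * max_t ||d_k(t)||_2, and of the magnified field v(t, .) + (mu - 1) * sum over k in I of d_k(t) G_k, with D of shape (2T, K) (rows ordered by time, then flow component) and G of shape (n, K) (space reshaped columnwise, x1 fastest); time indices are 0-based and the function names are hypothetical.

```python
import numpy as np

def select_components(D, G, lam1, lam2):
    """Indices k with lam1 <= c_k <= lam2, where c_k = m_k * max_t ||d_k(t)||_2."""
    T = D.shape[0] // 2
    m = np.abs(G).max(axis=0)                                      # step 1: m_k
    dmax = np.linalg.norm(D.reshape(T, 2, -1), axis=1).max(axis=0)  # max_t ||d_k(t)||_2
    c = m * dmax                                                   # step 2: c_k
    return [k for k in range(D.shape[1]) if lam1 <= c[k] <= lam2]  # step 3

def magnified_flow(v_t, D, G, I, t, mu, n1, n2):
    """v(t, .) + (mu - 1) * sum_{k in I} d_k(t) G_k as an array of shape (n1, n2, 2)."""
    extra = np.zeros((n1, n2, 2))
    for k in I:
        Gk = G[:, k].reshape(n1, n2, order="F")  # undo the columnwise reshaping
        extra += Gk[..., None] * D[2 * t: 2 * t + 2, k]
    return v_t + (mu - 1.0) * extra
```

With the settings reported in the experiments, a call would look like `I = select_components(D, G, 0.1, 0.3)` followed by `magnified_flow(v_t, D, G, I, t, 4.0, n1, n2)` for each time step; passing `np.inf` as upper threshold selects only the strongest motions.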

Experiments
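We illustrate our proposed method on an example video from CASME II [51] and a video from a medical dataset for the remote detection of respiratory infections (see Figure 9). All our experiments were performed on consumer-grade computers; RAFT training was performed on a multi-GPU machine with Intel Xeon Gold processors and two Nvidia Tesla V100 GPUs. … or expected microexpressions, the latter based on self-reports of the participants. Additionally, we use a video from a dataset for the remote assessment of respiratory diseases (VI-SCREEN). For this dataset, participants were recruited at multiple recording sites, including the Saarland University Clinic (UKS) Children's Hospital and the UKS central emergency admission, Germany. Recruitment started in February 2022 and is ongoing (July 2023). Up until April 2023, this study included 106 participants (median age 26 years, interquartile range (IQR) 24 years). Informed consent was obtained, and the study was approved by the Ethics committee of the Ärztekammer des Saarlandes (02/22). Viral and bacterial respiratory pathogens were tested using multiplex polymerase chain reaction (multiplex PCR) assays. The camera array consisted of stereo RGB, near-infrared (NIR) and thermal cameras as well as microphones (compare [22]). Between 5 and 10 minutes were recorded, during which participants answered questions in front of the camera. Core temperature was recorded with infrared ear thermometers. The videos in this dataset can contain microexpressions as well as other types of micro-movements that are potentially related to symptoms of respiratory diseases. As benchmark sequence, we selected a video that contains a large movement (eye blinking) as well as a small movement around the mouth.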

Motion estimation
We compare our method against RAFT-Sintel and RAFT-KITTI as well as against FlowFormer [30] after the same training schedules. RAFT-Sintel performed better than RAFT-KITTI, which is why we chose RAFT-Sintel as the baseline model for refinement. Due to the lack of publicly available datasets of facial microexpressions with ground truth OF annotation, we compared the methods by estimating a motion compensation error on a CASME II sequence of participant number 14. We used the mean squared error (MSE) of the gray values as well as of their Laplacians, see Figure 3. While DISO performed better on the test sequence than all other methods, our RAFT refinement with the ME dataset improved the performance over RAFT-Sintel, RAFT-KITTI and FlowFormer on facial microexpression videos in our examples.
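The evaluation criterion can be sketched as follows: given a reference frame and a second frame that has already been warped back onto it with an estimated flow (the warping itself is left open here), we compute the MSE of gray values and of a 5-point finite-difference Laplacian and normalize both by the corresponding MSE of the uncompensated pair, as in Figure 3. The Laplacian discretization, the border handling and the names are our assumptions.

```python
import numpy as np

def laplacian(f):
    """5-point finite-difference Laplacian with replicated borders."""
    p = np.pad(f, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * f

def mse(a, b):
    """Mean squared error between two arrays of equal shape."""
    return float(np.mean((a - b) ** 2))

def normalized_compensation_errors(f_ref, f_next, f_next_compensated):
    """Return (gray, lap): MSE after motion compensation divided by the MSE of
    the raw frame pair, on gray values and on their Laplacians. Values below 1
    mean that the estimated flow explains part of the frame difference."""
    f_ref, f_next, f_comp = (np.asarray(a, dtype=float)
                             for a in (f_ref, f_next, f_next_compensated))
    gray = mse(f_ref, f_comp) / mse(f_ref, f_next)
    lap = mse(laplacian(f_ref), laplacian(f_comp)) / mse(laplacian(f_ref), laplacian(f_next))
    return gray, lap
```

A perfect compensation yields (0, 0), while leaving the second frame unwarped yields (1, 1); the Laplacian variant emphasizes misaligned edges and texture rather than global brightness.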

Motion magnification
For the motion magnification experiments, we use a video of participant number 14 from CASME II as well as a recording from the VI-SCREEN dataset. Participant number 14 was excluded from the OF training dataset. The videos contain microexpressions and micro-movements. We perform the analysis outlined in Section 3 using the sparsity parameters $\alpha = 0.1$, $\beta = 4$ and $K = 9$ components. A motion component $k \in \{1, \dots, 9\}$ is selected as a microexpression if $\lambda_1 = 0.1 \le c_k \le \lambda_2 = 0.3$ holds. We magnify these selected microexpressions using the magnification factor $\mu = 4$.
Figure 4 shows an example application of our method with the sparse decomposition model (5) with the $\|\cdot\|_{2,1}$-constraint. Our experiments show that our method is able to decompose the OF field into reasonable components. Using this decomposition, we are able to perform facial motion magnification selectively in an unsupervised manner. For the cases shown in Figures 4 and 7, the thresholds select only the components connected to the micro-movements around the mouth, while the blinking motion remains unchanged. Changing the thresholds also lets us select different motions. In Figure 6, we illustrate the results of our algorithm when choosing instead the thresholds $\lambda_1 = 0.3$ and $\lambda_2 = \infty$. In this case, only the blinking motion is magnified.
For comparison, Figure 7 depicts the outcome of a similar experiment using only the $\|\cdot\|_2$-constraint, see Remark 2. For this comparison, we use the same parameters $\alpha = 0.1$, $K = 9$ and $\mu = 4$. Introducing the sparsity-promoting constraint leads to a clearer separation of motion components in the image as well as between temporally disconnected motions. We can observe this in the visualization of the components in Figure 5, where the $\|\cdot\|_{2,1}$-constraint produces smaller components with clearer edges. Figure 8 shows that each component obtained with the $\|\cdot\|_{2,1}$-constraint has only a limited temporal extent, while with the $\|\cdot\|_2$-constraint a few components capture a wide range of motion events over the complete video.

Conclusions
We provided an unsupervised method for magnifying micro-motions in facial videos based on the Lagrangian approach with OF forward warping magnification. The regions for OF magnification were found by minimizing a double sparse decomposition model of the OF and applying a thresholding procedure to detect the relevant regions. Our thresholding method for differentiating microexpressions from other movements is open to further improvement; for example, the temporal extent of each motion could be taken into consideration. However, in the case of the CASME II dataset, where eye movements are the only notable facial motions besides microexpressions, a simple thresholding was already sufficient. Furthermore, significant head movements might disrupt the presented decomposition into time-independent regions of motion in the face. In these cases, a facial alignment pre-processing step might be necessary. This could be realized similarly to related work [24,23,20] or by attenuating large and global motion with our method, which might become more important for out-of-lab data with difficult motion. Also, our method has multiple parameters that need to be chosen manually. As a next step, statistics from real-world microexpression datasets could be included to learn suitable parameters for different motion types. The presented methods have the potential to facilitate the annotation of datasets of such face recordings for microexpression detection. Subtle expression changes, whose detection would otherwise require expert training, could be magnified for untrained individuals, who could then annotate such data. Our preliminary results with a real-world medical recording demonstrated how different parameters allow for the visualization of different motion components in videos.
To the best of our knowledge, there exist no facial microexpression or micro-movement datasets with ground truth OF. OF methods have been analysed with respect to their predictive power for microexpressions [3], which does not allow conclusions on the overall quality of the OF predictions. Regarding the refinement of RAFT, future evaluations have to show the performance gain compared to OF methods tailored to faces with different training or transfer learning strategies, such as [2], for which the models are currently not publicly available. Given the small movements in the SMIC and CASME II recordings as well as the constant backgrounds, further data augmentations that introduce large displacements and different background textures could further improve the method.


Figure 1: An illustration showing the results of forward versus backward warping with an example displacement field.

Figure 2: Illustration of the forward warping process. The image is first triangulated as shown on the left. The triangles are displaced, and the warped image is then rasterized from these distorted triangles, see next figure.

Figure 4: Illustration of our motion magnification procedure with the $\|\cdot\|_{2,1}$-constraint. The algorithm starts with a sequence of 80 video frames of size 640×480, illustrated in the top left. The OF field computed for each time step is illustrated in the lower left. Then we use our method to decompose this OF field into $G_1, \dots, G_9$, shown in the bottom middle, as well as $(d_{1,1}, d_{1,2}), \dots, (d_{9,1}, d_{9,2})$, shown in the bottom right. In this example, two components, shown by thicker lines in the plot, were selected by thresholding to be micro-movements. The frames are warped accordingly to yield the video sequence in the top right. The pixels from the red and blue lines are plotted over time in the corresponding red and blue boxes. This shows that the motion around the mouth is magnified while the blinking motion is left unchanged.

Figure 6: Illustration of the same video without (left) and with (right) motion magnification. The chosen thresholds ($\lambda_1 = 0.3$, $\lambda_2 = \infty$) only magnify the largest motions. This results in magnification of the blinking motion while leaving other facial movements unchanged.

Figure 8: On the left-hand side, $\|d_k(t)\|_2$ is shown over time for all nine components, where we optimized problem (6). On the right-hand side, we see the curves of $\|d_k(t)\|_{2,1}$ after optimizing problem (5). It can be observed that the latter promotes the separation of the flow field into temporally distinct motions, while with the former a few components capture a wide range of motion events.
Figure 3: MSE on a microexpression sequence after motion compensation. The values are normalized with respect to the MSE of the raw recording. While DISO performs better than the refined RAFT, our RAFT refinement on the ME dataset improved the performance over other state-of-the-art models trained on Sintel and KITTI-2015 in our examples.