AlphaTracker: a multi-animal tracking and behavioral analysis tool

Computer vision has emerged as a powerful tool to elevate behavioral research. This protocol describes a computer vision machine learning pipeline called AlphaTracker, which has minimal hardware requirements and produces reliable tracking of multiple unmarked animals, as well as behavioral clustering. AlphaTracker pairs top-down pose-estimation software with unsupervised clustering to facilitate behavioral motif discovery that will accelerate behavioral research. All steps of the protocol are provided as open-source software with graphical user interfaces or are implementable with command-line prompts. Users with a graphics processing unit (GPU) can model and analyze animal behaviors of interest in less than a day. AlphaTracker greatly facilitates the analysis of the mechanisms of individual/social behavior and group dynamics.


. Introduction
. . Development of the protocol
The study of animal behavior can be dated back to the nineteenth century, when most researchers focused on observing natural behaviors (Darwin, 1872; Tinbergen, 1963). While reductionist behavioral paradigms are still widely used to study specific aspects of behavior in a controlled manner, allowing animals to freely explore spaces and to exhibit complex behaviors greatly expands our understanding of systems neuroscience (Kabra et al., 2013; Wiltschko et al., 2015; Mathis et al., 2018; Pereira et al., 2020a; Padilla-Coreano et al., 2022). Ethological behavioral research challenges our ability to quantify behavior and draw statistically meaningful conclusions with traditional tracking methods and manual annotations (Berman, 2018). Social behavior is even more challenging because it is difficult for a human to observe multiple animals simultaneously. Traditional animal tracking software suffers from noisy prediction of animal poses and confusion between multiple, seemingly identical animals. In addition, there remains a large gap between tracking animal keypoints and the quantification and understanding of observed behaviors.
. Materials and methods
. . Materials
. . . Software
. . . Hardware
• Computer: Both Windows and Linux can be used for labeling data, performing behavioral clustering, and evaluating trained tracking models. For training the tracking model, desktops/cloud servers with GPU access are required. We recommend ≥32 GB of RAM on the system for CPU analysis.
• GPU: A GPU is required for training the tracking model. We recommend a GPU with ≥8 GB of memory, such as the NVIDIA GeForce 1080 or 2080. Alternatively, our toolbox can also be used on cloud computing services with GPU support (e.g., Google Cloud/Amazon Web Services).
• Camera: Our toolbox supports both color and grayscale videos, including videos recorded under infrared light. Although we demonstrate decent performance on low-resolution images, we recommend cameras with a resolution of ≥1080p for the best performance. We used Logitech C930e cameras for data acquisition in this paper.

. . . Equipment setup
Users can install our toolbox either locally or on Google Colab. We recommend that users with GPU access run AlphaTracker locally (local desktop or lab server) to avoid reinstalling dependencies each time Colab restarts.
For GPU support, NVIDIA drivers should be installed (see the previous subsection for details). Windows users double-click the downloaded .exe file to install it. Linux users first navigate to the folder containing the downloaded ".run" file, open the terminal, and type the following commands:
    cd path/of/driver
    chmod +x ./Nvidia-driver-name.run
    sudo ./Nvidia-driver-name.run
We recommend that both Linux and Windows users install Anaconda for managing packages and environments because Anaconda supports multiple environments with different versions of Python and supporting libraries. This avoids version mismatches with existing packages and libraries of the operating system. Follow the instructions at https://docs.anaconda.com/anaconda/install/ to install Anaconda.
Windows users should install Git Bash by downloading it from https://git-scm.com/downloads. Open the ".exe" file named "Git Bash" and run all commands within Git Bash.
Frontiers in Behavioral Neuroscience frontiersin.org

. . Algorithm
The tracking component of the pipeline (AlphaTracker) is adapted from AlphaPose (Fang et al., 2017, 2022), a human pose estimation and tracking library that provides superior performance in both accuracy and efficiency. The algorithm consists of three steps: animal detection, keypoint estimation, and identity (ID) tracking across frames (Figure 1).
First, the algorithm detects the positions of animals in each frame with YOLOv3 (Redmon and Farhadi, 2018), a state-of-the-art convolutional neural network designed to detect objects at a high inference speed.
Next, individual animals are cropped out using the bounding box output from YOLOv3. The cropped individual images are fed into Squeeze-and-Excitation Networks (SENet) (Hu et al., 2017), which estimate keypoint positions. For our mouse dataset, we chose the snout, tail base, and two ears as our four keypoints. The outputs from SENet include x and y coordinates as well as a confidence score that indicates the reliability of each identified keypoint.
Finally, the algorithm tracks each animal across frames. This presents a significant challenge for many platforms, as animals of the same genetic lines often look alike. Traditional Re-ID methods previously implemented (Chen et al., 2018; Ristani and Tomasi, 2018; Feng et al., 2019) tend to fail since such methods typically rely on differences in the appearance of tracked animals. In our pipeline, we propose a novel target association method that captures hierarchical visual information to keep track of the identities of nearly identical animals across frames. We define a descriptor for the position and orientation of the animal from the set of bounding boxes around the entire animal and individual body parts. The similarity score of pairs of descriptors in adjacent frames is calculated according to formula 1.
$$\mathrm{Sim}(D_i^t, D_j^{t+1}) = \mathrm{IOU}(box_i^t, box_j^{t+1}) + \sum_{k} \mathrm{IOU}(P_{ik}^t, P_{jk}^{t+1}) \qquad (1)$$
In formula 1, $D_i^t$ is the descriptor of animal i at frame t, and $box_i^t$ is the bounding box of animal i at frame t predicted by the convolutional neural network. $P_{ik}^t$ is the box that wraps the k-th body point of animal i at frame t. Intersection over Union (IOU) is defined by formula 2.
$$\mathrm{IOU}(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (2)$$
After sorting the descriptor similarities in descending order, the descriptors between two adjacent frames with the highest similarity are matched and assigned the same tracking ID. The remaining descriptors are then matched using the next-highest similarity score. This procedure is repeated until no animals are left unmatched.
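The matching procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than AlphaTracker's own tracker code: the descriptor layout (a dict with a "box" and a list of "parts"), the helper `keypoint_box`, and its default half-size are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keypoint_box(x, y, half_size=10.0):
    """Hypothetical helper: a square box wrapping one body point."""
    return (x - half_size, y - half_size, x + half_size, y + half_size)


def similarity(d_prev, d_next):
    """Whole-animal box IOU plus the sum of per-body-point box IOUs."""
    score = iou(d_prev["box"], d_next["box"])
    for p_prev, p_next in zip(d_prev["parts"], d_next["parts"]):
        score += iou(p_prev, p_next)
    return score


def greedy_match(prev, nxt):
    """Greedily match descriptors of adjacent frames by descending similarity.

    prev: dict mapping tracking ID -> descriptor at frame t.
    nxt:  list of descriptors at frame t+1.
    Returns a dict mapping index in nxt -> tracking ID.
    """
    pairs = sorted(
        ((similarity(d, nxt[j]), tid, j)
         for tid, d in prev.items() for j in range(len(nxt))),
        key=lambda p: p[0],
        reverse=True,
    )
    assigned, used_ids = {}, set()
    for _, tid, j in pairs:
        if tid in used_ids or j in assigned:
            continue  # this animal or this detection is already matched
        assigned[j] = tid
        used_ids.add(tid)
    return assigned


# Two animals; detections in the next frame arrive in swapped order.
prev = {
    1: {"box": (0, 0, 100, 50),
        "parts": [keypoint_box(90, 25), keypoint_box(10, 25)]},
    2: {"box": (200, 200, 300, 250),
        "parts": [keypoint_box(290, 225), keypoint_box(210, 225)]},
}
nxt = [
    {"box": (205, 200, 305, 250),
     "parts": [keypoint_box(295, 225), keypoint_box(215, 225)]},
    {"box": (5, 0, 105, 50),
     "parts": [keypoint_box(95, 25), keypoint_box(15, 25)]},
]
print(greedy_match(prev, nxt))  # detection index -> tracking ID
```

Because the score relies only on box overlap between adjacent frames, it does not need the animals to look different from one another, which is the property the text emphasizes.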
In some cases, the predictions of bounding boxes and body points may not be accurate due to either tracking errors or occlusion. When users correct the position of keypoints in one frame, we apply Kalman filtering (Kalman, 1960) to model the motion characterized by velocity and acceleration. We then modify the keypoint position predictions in consecutive frames to ensure consistency across time.
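To make the role of velocity and acceleration concrete, the following is a generic constant-acceleration Kalman filter applied to a single keypoint coordinate. It is a sketch under stated assumptions, not the curation code used by AlphaTracker: the function name, the noise parameters `q` and `r`, and the choice to filter each coordinate independently are all illustrative.

```python
import numpy as np


def kalman_filter_1d(z, dt=1.0, q=1e-2, r=1.0):
    """Filter one coordinate of a keypoint track.

    The state is [position, velocity, acceleration]; only position is observed.
    q and r are assumed process and measurement noise levels.
    """
    z = np.asarray(z, dtype=float)
    F = np.array([[1.0, dt, 0.5 * dt ** 2],   # constant-acceleration motion model
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]])
    H = np.array([[1.0, 0.0, 0.0]])           # we measure position only
    Q = q * np.eye(3)
    R = np.array([[r]])
    x = np.array([z[0], 0.0, 0.0])            # start at the first measurement
    P = np.eye(3)
    filtered = []
    for zk in z:
        # Predict the next state from the motion model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update the prediction with the measured position.
        innovation = zk - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(3) - K @ H) @ P
        filtered.append(x[0])
    return np.array(filtered)


# A noisy x-coordinate track of one keypoint (synthetic data).
rng = np.random.default_rng(0)
track = np.linspace(0.0, 30.0, 31) + rng.normal(0.0, 1.0, 31)
print(kalman_filter_1d(track)[:5])
```

In practice the x and y coordinates of each keypoint would be filtered together or separately with the same model; larger `r` trusts the motion model more, and larger `q` trusts the measurements more.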
Our behavioral clustering classifies mouse behavior with an unsupervised hierarchical clustering algorithm (Wiltschko et al., 2015; Nilsson et al., 2020) for the following reasons: (1) Animal behavioral taxonomy is intrinsically hierarchical in structure. (2) It allows intuitive re-organization of results once the linkage matrix is computed. In our method, we first extract the features of animal behaviors based on the temporal dynamics of poses captured within a 15-frame time window. The 15-frame time window is chosen here since sub-second actions of animals have mean duration ± s.d. = 425 ± 726 ms (Wiltschko et al., 2020). Such features include biologically distinct features such as body length and displacement. When analyzing social behavior, we set one mouse as the reference and calculate the relative motion of the non-reference mouse.
Here, users can assign different weights to each feature to reflect feature importance in behavioral clustering. We next apply an agglomerative hierarchical clustering algorithm (Ward, 1963) to cluster clips based on the similarity between their features. Finally, a customized web-based UI allows easy inspection and modification of clustering results.

. . Methods
1. Download the AlphaTracker repository by running the following in the terminal:
    git clone https://github.com/MVIG-SJTU/AlphaTracker.git
    cd AlphaTracker
2. Users can install our toolbox either via the command line or via a coding-free GUI (Figure 2). Users can run the following commands to install our toolbox via the command line. Note that Windows users should first check out the "Windows" branch before the actual installation. Our toolbox creates an Anaconda virtual environment to manage Python dependencies.
    git checkout windows # Windows users only
    bash scripts/install.sh
To use our GUI for installation, users need to download a GUI named "main_ui" from https://github.com/MVIG-SJTU/AlphaTracker/releases and save it inside the AlphaTracker folder. Users should: (1) Right-click the GUI app and choose "Properties," (2) Check the "allow execution" or "allow run as a program" options under the "Permission" tab, (3) Open the main GUI by double-clicking the icon, then (4) Click the "Install" button to run the installation automatically. A video tutorial is available at: https://youtu.be/fQ1bSoAkV5o.

. . . Training dataset preparation
For users hoping to train the model using their own parameters (e.g., animal species, lighting condition), we include an image annotation toolkit to allow customized annotation of training datasets. This toolkit was adapted from the open-source tool Sloth and can be found under the directory ./Tracking/TheAnnotationTool/. This tool has only been tested under Windows. We have also provided a demo training dataset of 600 annotated frames of two unmarked mice interacting in a home cage. Users can download this folder from https://drive.google.com/file/d/1TYIXYYIkDDQQ6KRPqforrup_rtS0YetR/view?usp=sharing and proceed to the next section for training.
1. Pick representative frames from input videos (≥200 frames are recommended) and save these frames in a folder called "im." Place the folder under the folder "json." These frames should be as distinct from each other as possible to cover the posture space. Models that learn from the entire space generalize better during the actual implementation.
First, choose the "Face" option to generate a red bounding box around the animal of interest on the image. Your definition of a bounding box should be consistent (e.g., if you include the tail in the bounding box, always do so). We recommend only including the tail base for mice because tails are highly flexible and extend over a large area.
Next, choose the "point" option to label keypoints for that animal. If you have multiple keypoints, it is critical to follow the same annotation order for all the animals (e.g., snout → left ear → right ear → tail base). If you have multiple animals, repeat the process for another animal only after you are completely done with the current animal (i.e., bounding box → all the keypoints) because the order matters.
5. If there is a mistake, you should first select the image on the bottom left of the UI, click the wrongly labeled coordinates, and press "delete." Make sure to delete all the subsequent coordinates for this frame as well and redo the annotation because the order of annotation is important for our algorithm.
6. Once you are done, press the "save" button to save the JSON file before exiting the program. Rename and move the entire "im/" folder (images and the JSON file) to a safer storage location for later use. As a double-check, the generated JSON file should have the same structure as in Figure 3.

• image_root_list. List of paths to the directories of annotated frames.
• json_file_list. List of paths to the corresponding annotation JSON files.
• num_mouse. A list of the maximum number of animals that may appear in each of the corresponding image folders.
• exp_name. Name for the current project.
• num_pose. The number of keypoints for each animal. This must be consistent within the project. If users have videos with different numbers of keypoints per animal, they can set up individual projects to keep the keypoint number consistent within each project.
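As an illustration of how the parameters listed above relate to one another, the sketch below writes them as plain Python variables and checks that the per-folder lists line up. The paths, the project name, and the `check_settings` helper are hypothetical and are not part of the AlphaTracker code base.

```python
# Hypothetical example values; replace with your own paths and counts.
image_root_list = ["/data/cage1/im", "/data/cage2/im"]
json_file_list = ["/data/cage1/im/labels.json", "/data/cage2/im/labels.json"]
num_mouse = [2, 4]          # maximum number of animals in each image folder
exp_name = "demo_project"   # name for the current project
num_pose = 4                # keypoints per animal, fixed within a project


def check_settings(image_root_list, json_file_list, num_mouse, num_pose):
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    if not (len(image_root_list) == len(json_file_list) == len(num_mouse)):
        problems.append(
            "image_root_list, json_file_list and num_mouse must have the same length"
        )
    if any(not isinstance(n, int) or n < 1 for n in num_mouse):
        problems.append("every entry of num_mouse must be a positive integer")
    if not isinstance(num_pose, int) or num_pose < 1:
        problems.append("num_pose must be a positive integer")
    return problems


print(check_settings(image_root_list, json_file_list, num_mouse, num_pose))
```

Running a check like this before training catches the most common mismatch, namely an image folder without a corresponding JSON file or animal count.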
Depending on the training results, users may need to modify hyperparameters related to training. For example, users can lower the learning rate and increase the number of epochs. However, over-reducing the learning rate may deteriorate tracking quality. Some hyperparameters are explained as follows.
• sppe_lr. Learning rate for the pose estimation module. Default: 1e-4.

FIGURE
User interface for training the tracking model. The user interface for training the tracking model requires several inputs from the user. The inputs include the path to the labeled images ("ImageDir"), a JSON label file ("Label"), the number of mice in the images ("num_mouse"), the experiment name ("exp_name"), and pose pairs for defining the connections between keypoints to represent the skeleton ("pose_pairs"). Other adjustable parameters such as the learning rate ("sppe_lr," "yolo_lr") and batch size ("sppe_batchSize," "yolo_batchSize") can be modified as needed.

FIGURE
User interface for running the tracking model. The UI for running the tracking model requires an input video ("Video") and the name of the trained model ("exp_name_track"). Users can specify the frame interval to be tracked by setting the start and end frames ("start_frame," "end_frame"), and indicate the maximum number of mice expected in the frames ("max_pid_id_setting"). The results will be saved in the specified result folder ("result_folder").

Users can run tracking via the GUI (Figure 5) or from the command line.

. . . Tracking result inspection
Users can inspect and modify the tracking results with a browser-based UI (Figure 6, Supplementary Video 6). We recommend Google Chrome as the default browser for using the UI. A pre-installed Python 3 is required because Python scripts are called by the backend of the UI.
1. Users can start the UI by clicking the "Results" button on the main GUI and clicking the "Start" button on the next page. A video tutorial is available at: https://youtu.be/9Ksb04s8mm4. Alternatively, users can run the following in the command line.

FIGURE
The UI for AlphaTracker allows for inspection and correction of tracking errors. The UI consists of four areas. Area I is a navigation bar with icons for navigating between different interfaces, the help function, undoing/redoing actions, exporting results, and starting new sessions. Area II is the operation area, which allows the user to view and edit the overlaid skeletons on the video using the cursor. It also provides options to navigate between frames, initiate curation and identity reassignment, take notes, and toggle the timeline's time indicator. Area III is a playback control panel with options for playing/pausing videos, displaying time and frame information, controlling playback speed, and specifying the interval for curation. Area IV is a timeline displaying the progress of the video and keypoint locations over time. Detailed instructions for using the UI can be found on GitHub.

5. After finishing modifying the tracking results, users can export the current clip information as a local JSON file by clicking the "export" icon.

. . . Behavioral clustering
AlphaTracker allows the analysis of both individual and social behavior. Here, using videos of two interacting mice, we demonstrate the ability of AlphaTracker to analyze behavior in both scenarios. We consider clips of 15 frames (500 ms) as the unit for mouse behavior because previous research has shown that fast mouse pose dynamics can be grouped into meaningful blocks lasting 200-900 ms, a sub-second timescale (Wiltschko et al., 2015). For computing social features, we first rotate and move the poses such that the body of the reference mouse at the middle frame of the clip lies on the positive x axis. Figure 7 illustrates the definition of several features.
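The alignment step described above (rotating and moving the poses so that the reference body lies on the positive x axis) can be sketched with NumPy as follows. The function name, the keypoint indices (snout = 0, tail base = 3), and the choice to place the tail base at the origin are assumptions for illustration and may differ from the toolbox's implementation.

```python
import numpy as np


def align_clip(clip, snout=0, tail=3):
    """Align one clip to the reference body at its middle frame.

    clip: array of shape (n_frames, n_keypoints, 2) holding x, y coordinates.
    Returns a copy that is translated so the middle-frame tail base sits at the
    origin and rotated so the middle-frame snout lies on the positive x axis.
    """
    clip = np.asarray(clip, dtype=float)
    mid = clip[clip.shape[0] // 2]
    origin = mid[tail]
    body = mid[snout] - origin                # body vector, tail base -> snout
    theta = np.arctan2(body[1], body[0])      # current body direction
    c, s = np.cos(-theta), np.sin(-theta)     # rotate by -theta to undo it
    rotation = np.array([[c, -s], [s, c]])
    return (clip - origin) @ rotation.T


# One 15-frame clip with four keypoints; the mouse faces "up" in the image.
clip = np.zeros((15, 4, 2))
clip[:, 0] = [3.0, 7.0]   # snout
clip[:, 3] = [3.0, 2.0]   # tail base
aligned = align_clip(clip)
print(aligned[7, 0], aligned[7, 3])
```

The same translation and rotation would be applied to the non-reference mouse, so that its features are expressed relative to the reference animal.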
1. The success of behavioral clustering depends on the weights assigned to each feature. Users can assign higher weights to features of interest. Users can either use the GUI as described in the "tracking section" or set the parameters in ./BehavioralClustering/setting.py. Definitions of the features are listed below.
• body_length, body length of the reference mouse.
• body_change_sin, change in body direction of the reference mouse between adjacent frames.
• left_ear, distance between the snout and the left ear keypoints of the reference mouse.
• left_ear_cos, angle between the snout-left ear vector and the body vector (cos) of the reference mouse.
• left_ear_sin, angle between the snout-left ear vector and the body vector (sin) of the reference mouse.
• right_ear, distance between the snout and the right ear keypoints of the reference mouse.
• right_ear_cos, angle between the snout-right ear vector and the body vector (cos) of the reference mouse.
• right_ear_sin, angle between the snout-right ear vector and the body vector (sin) of the reference mouse.
• displace_rho, displacement between adjacent frames of the reference mouse.
• displace_sin, direction of displacement between adjacent frames (sin) of the reference mouse.
• displace_cos, direction of displacement between adjacent frames (cos) of the reference mouse.
• body_length_TO, body length of the non-reference mouse.
• body_change_sin_TO, change in body direction between adjacent frames of the non-reference mouse.
• left_ear_TO, distance between the snout and the left ear keypoints of the non-reference mouse.
• left_ear_cos_TO, angle between the snout-left ear vector and the body vector (cos) of the non-reference mouse.
• left_ear_sin_TO, angle between the snout-left ear vector and the body vector (sin) of the non-reference mouse.
• right_ear_TO, distance between the snout and the right ear keypoints of the non-reference mouse.
• right_ear_cos_TO, angle between the snout-right ear vector and the body vector (cos) of the non-reference mouse.
• right_ear_sin_TO, angle between the snout-right ear vector and the body vector (sin) of the non-reference mouse.
• displace_rho_TO, displacement between adjacent frames of the non-reference mouse.
• displace_sin_TO, direction of displacement between adjacent frames (sin) of the non-reference mouse.
• displace_cos_TO, direction of displacement between adjacent frames (cos) of the non-reference mouse.
• two_body_sin, angle between the two body vectors (sin).
• TM_nose_RM_tail_rho, distance between the tail base of the reference mouse and the snout of the non-reference mouse.
• TM_nose_RM_tail_sin, direction (sin) of the vector between the tail base of the reference mouse and the snout of the non-reference mouse.
• TM_nose_RM_tail_cos, direction (cos) of the vector between the tail base of the reference mouse and the snout of the non-reference mouse.
• RM_nose_TM_tail_rho, distance between the snout of the reference mouse and the tail base of the non-reference mouse.
• RM_nose_TM_tail_sin, direction (sin) of the vector between the snout of the reference mouse and the tail base of the non-reference mouse.
• RM_nose_TM_tail_cos, direction (cos) of the vector between the snout of the reference mouse and the tail base of the non-reference mouse.
• nose_nose_rho, distance between the two snouts.
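The effect of feature weights on the clustering can be sketched with SciPy's agglomerative clustering using Ward linkage. This is an illustrative example on synthetic data, not the code in ./BehavioralClustering: the function `cluster_clips`, the z-scoring step, and the use of a fixed number of clusters are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_clips(features, weights, n_clusters):
    """Weighted Ward clustering of clips.

    features: array (n_clips, n_features), one row of feature values per clip.
    weights:  array (n_features,), larger values emphasize a feature.
    Returns the linkage matrix and one cluster label per clip.
    """
    X = np.asarray(features, dtype=float)
    # Assumption: z-score each feature so weights act on comparable scales.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = (X - X.mean(axis=0)) / std
    X = X * np.asarray(weights, dtype=float)
    Z = linkage(X, method="ward")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return Z, labels


# Synthetic example: 20 clips, two features (e.g., body_length, displace_rho).
rng = np.random.default_rng(0)
slow = rng.normal([10.0, 0.5], 0.1, size=(10, 2))
fast = rng.normal([10.0, 5.0], 0.1, size=(10, 2))
Z, labels = cluster_clips(np.vstack([slow, fast]), weights=[1.0, 2.0], n_clusters=2)
print(labels)
```

Cutting the tree with `criterion="distance"` instead of `"maxclust"` corresponds more closely to choosing a threshold on the dendrogram; once the linkage matrix `Z` is computed, different cuts can be explored without recomputing it.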

(Optional) Users can define new features for clustering. We provide five intermediate variables to facilitate the computation of new features. Each variable is a NumPy array with the shape of (number_of_clip, number_of_frames_in_one_clip, number_of_key_point, 3):
• pose_clips, keypoints of the reference mouse.
• pose_clips_align, keypoints of the target mouse aligned to its middle frame.
• poseTheOther_clips, keypoints of the non-reference mouse.
• poseTheOther_clips_alignSelf, keypoints of the non-reference mouse aligned to itself in the middle frame.
• poseTheOther_clips_alignToOther, keypoints of the non-reference mouse aligned to the reference mouse in the middle frame.
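As a hedged illustration of how a new feature might be computed from these arrays, the sketch below derives two per-frame quantities from arrays of the stated shape. The function names, the keypoint indices (snout = 0, tail base = 3), and the interpretation of the last axis as (x, y, confidence) are assumptions; the exact interface expected by the toolbox is not reproduced here.

```python
import numpy as np


def body_length(pose_clips, snout=0, tail=3):
    """Snout-to-tail-base distance per frame.

    pose_clips: array (n_clips, n_frames, n_keypoints, 3); the last axis is
    assumed to hold (x, y, confidence). Returns an array (n_clips, n_frames).
    """
    pose_clips = np.asarray(pose_clips, dtype=float)
    delta = pose_clips[:, :, snout, :2] - pose_clips[:, :, tail, :2]
    return np.linalg.norm(delta, axis=-1)


def snout_to_snout(pose_clips, poseTheOther_clips, snout=0):
    """Distance between the two animals' snouts per frame, shape (n_clips, n_frames)."""
    a = np.asarray(pose_clips, dtype=float)[:, :, snout, :2]
    b = np.asarray(poseTheOther_clips, dtype=float)[:, :, snout, :2]
    return np.linalg.norm(a - b, axis=-1)


# Two synthetic clips of 15 frames with four keypoints each.
ref = np.zeros((2, 15, 4, 3))
ref[:, :, 0, :2] = [3.0, 4.0]        # reference snout; tail base stays at (0, 0)
other = np.zeros((2, 15, 4, 3))
other[:, :, 0, :2] = [3.0, 10.0]     # non-reference snout
print(body_length(ref).shape, snout_to_snout(ref, other)[0, 0])
```

A per-frame array like this can then be summarized over the clip (for example, by its mean) before being used as a clustering feature.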
Each new feature should be defined in ./fft_utils.py.

• Dendrogram. The branches of the dendrogram below the user-defined threshold are colored to indicate their cluster assignment.
• Timeline. The timeline plot displays the cluster assignments for each clip, with their color matching the cluster assignment as in the dendrogram.
• Feature heatmap. The feature heatmap visualizes the relative strength of each feature in the cluster.
• UMAP. The UMAP shows the topological structure of all the clips in the feature space. Each dot represents one clip, colored by its cluster assignment.
• Mutual information plots. The mutual information plot quantifies the mutual information between each feature and the cluster assignment. Note: Larger mutual information suggests the feature is a strong marker of the cluster.
• Similarity matrix between clusters. Note: Clusters with a high similarity score are hard to differentiate.
• Representative skeleton for each cluster. Cluster skeletons visualize the representative pose and its temporal evolution for each cluster.
6. Besides these analysis plots, users can also inspect generated videos, annotated with feature and cluster assignment, saved at self.gen_video_folder as specified in setting.py.
7. Once the optimal threshold is identified, users should set the threshold in setting.py and rerun the algorithm. This will save the correct cluster information for the clustering UI.

. . . Clustering UI
We provide a Clustering UI for inspecting the clustering results (Figure 8).
1. Upload the JSON file that contains the clip information (clips_info.json) generated in the behavioral clustering step. Import the original video and specify the frame rate. Upload the JSON file that contains the cluster structure (e.g., Z_all_twoMice.json).
2. Play the video and inspect the cluster assignment for each clip.
Users can examine the dendrogram by expanding and collapsing the tree structure. The branches of the dendrogram can be merged and moved to modify the cluster assignment. A detailed explanation is provided in our GitHub manual.

. . . Tracking with Google Colab
In addition to the desktop version, we also provide a Colab notebook for training and tracking. Users looking for a quick test of AlphaTracker can open this notebook https://colab.research.google.com/drive/1wYBAj3kjLMe6uir3TJVfWRAJNHtjCaPG and simply run through all blocks following the instructions. If users would like to train their own model, we provide another notebook https://colab.research.google.com/drive/1bGUo3eMWIfzXiFWCvNrNiTOzhSsTsVHV.
1. Open the Colab notebook and save a copy to your personal Google Drive.
2. Click Runtime and then change the runtime type to "GPU." Run the "Install" section to connect to your Google Drive. Your Google Drive will now be mounted at /content/drive/MyDrive. Run the next block to download AlphaTracker and finish installation.
3. Upload your annotated training datasets to Google Drive and set variables such as "image_root_list," "json_file_path," "number_of_animals," "number_of_poses," and "video_path" in the "Setting" section.
4. Run the training code block if you would like to train the model with your own datasets.
5. Run the tracking code block to perform tracking on the videos you listed in setting.py or the default demo video. Once this step is complete, you can inspect the tracking results in the result folder specified in setting.py.

. . Anticipated tracking results
TABLE
AlphaTracker, DeepLabCut, and SLEAP were evaluated on two datasets: two mice in a home cage and four mice in an operant chamber. Each model was trained on 600 annotated frames and evaluated on 200 frames with human-labeled ground truth. The evaluation results show that AlphaTracker outperforms DeepLabCut and SLEAP in both datasets, achieving higher keypoint detection accuracy (mAP) and tracking consistency (MOTA and MOTP).

FIGURE
Different colors represent different keypoints, including snouts (blue), ears (yellow), and tail bases (green), and the total metrics (red). (D-F) show the evaluation results for AlphaTracker (red), SLEAP (green), and DeepLabCut (blue), with connected dots representing the total metric and unconnected dots representing the metric for individual body parts.

To quantify AlphaTracker's performance and compare it to SLEAP and DeepLabCut, we conducted experiments using a mouse dataset in which trained human annotators labeled the bounding box and four keypoints (snout, left ear, right ear, and tail base) for each mouse in each frame. Our customization of the DeepLabCut default model includes the following modifications: enabling automatic computation of the PAF graph, utilization of the box tracker, and setting the maximum number of iterations to 100,000. Our customization of the SLEAP default model includes several modifications to improve its tracking performance. Firstly, we used the bottom-up model and set the tracker mode to "flow". Secondly, we implemented culling with an IoU threshold of 0.8. Thirdly, we utilized the instance similarity method and the greedy matching method. Fourthly, we set the elapsed window to 5 and used a robust quantile of similarity scores of 0.95. Fifthly, we applied post-tracking break connection to improve tracking continuity. Finally, we adjusted the minimum and maximum rotation angles to -180 and 180 degrees, respectively. We used the standard CLEAR MOT metrics [Average Precision (AP), Multiple Object Tracking Accuracy (MOTA), and Multiple Object Tracking Precision (MOTP)] (Bernardin and Stiefelhagen, 2008), and evaluated the performance using the open-source Poseval tool (Pishchulin et al., 2015). AP assesses the accuracy of object detection by computing precision and recall values. MOTA evaluates three types of errors: missed objects in a sequence, false positives, and mismatches. MOTP calculates the average total position error for matched object-hypothesis pairs across all frames. To adapt the MOT metrics for mouse tracking, we modified the threshold for distinguishing matched keypoints from mismatched keypoints to be 5% of the bounding box's diagonal.
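The 5% matching threshold mentioned above amounts to a simple distance test. The sketch below is an assumed reading of that rule, not code from the Poseval tool: the function name and the use of "less than or equal to" at the boundary are illustrative choices.

```python
import math


def keypoint_matches(pred, truth, box, fraction=0.05):
    """Decide whether a predicted keypoint counts as matched.

    pred, truth: (x, y) positions of the predicted and annotated keypoint.
    box: (x1, y1, x2, y2) bounding box of the animal.
    A prediction matches when its distance to the annotation is at most
    `fraction` of the bounding box diagonal.
    """
    diagonal = math.hypot(box[2] - box[0], box[3] - box[1])
    distance = math.hypot(pred[0] - truth[0], pred[1] - truth[1])
    return distance <= fraction * diagonal


# A 300 x 400 pixel box has a 500 pixel diagonal, so the threshold is 25 pixels.
box = (0, 0, 300, 400)
print(keypoint_matches((10, 10), (10, 30), box))
print(keypoint_matches((10, 10), (10, 40), box))
```

Scaling the threshold with the bounding box keeps the criterion comparable across videos recorded at different resolutions or zoom levels.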
We evaluated AlphaTracker's performance on a dataset with two mice interacting in a home cage, recorded at a resolution of 1,920 × 1,080. Results shown in Table 1 indicate that AlphaTracker outperformed SLEAP and DeepLabCut in terms of mAP, MOTA, and MOTP when trained with 600 frames and tested on 200 held-out frames (Supplementary Video 1). Furthermore, AlphaTracker demonstrated consistent performance across all four keypoints, while SLEAP and DeepLabCut showed significant variance, as shown in Figure 9. AlphaTracker also showed high performance with only 50 frames of training data, achieving an mAP higher than 0.7 (Figure 9). Moreover, we evaluated AlphaTracker's performance in tracking multiple identical-looking animals using four C57BL/6 mice in a metal operant chamber. Our evaluation showed that AlphaTracker outperformed SLEAP and DeepLabCut in all metrics (Table 1, Supplementary Video 2). We also tested AlphaTracker on mice with head implants, a common scenario in neuroscience research, and demonstrated its robust performance (Supplementary Video 3). This highlights AlphaTracker's potential for studying naturalistic social group dynamics in common neuroscience settings.
It is worth mentioning that the four-mouse operant chamber dataset was collected with low-quality webcams at a resolution of 540p. AlphaTracker demonstrated robust performance in tracking animals in these videos (Table 1), making it an attractive solution for large-scale animal behavior studies, as it enables the monitoring of multiple cages using low-cost webcams and thereby greatly reduces the overall cost.

. . Anticipated behavioral clustering results
The behavioral clustering component of AlphaTracker enables the clustering of both individual behavior and social interaction in an unsupervised manner. Here, we analyze a total of 4,661 clips for individual behavioral clustering and 2,356 clips for social behavioral clustering collected from four videos (Figure 10, Supplementary Videos 4, 5). Our algorithm can capture the following individual behaviors: walking, digging, sniffing, rearing, turning, face grooming, and body grooming, and social behaviors: following, chasing, anogenital sniffing, face sniffing, and social rearing.
To evaluate the importance of each feature in clustering, we calculate the mutual information between features and cluster assignment (Figure 11), with the expectation that higher mutual information indicates that the feature may represent a unique characterization of a given cluster. For example, distances between two mice are a strong indicator of social clusters, while features related to the head, such as head length and nose-left ear distance, stand out among other individual features, indicating the salience of the head in many behaviors like rearing, digging, and turning (Figure 11). The identified behavioral clusters allow users to visualize the temporal dynamics of animal behavior. This opens up the opportunity for associative analysis between changes in behavior motifs and experimental factors like optogenetic stimulus, drug administration, environmental changes, and manipulation of a social hierarchy.
To validate the performance of our behavioral clustering algorithm, we compared its output to the ground truth of human annotation. A human scorer was trained to categorically annotate behaviors. We used the Adjusted Rand Index (ARI) to measure the similarity of the class assignment between the algorithm and the human scorer. ARI scores range from -1 to 1, with 1 indicating identical assignments and values near 0 indicating chance-level agreement. We evaluated the algorithm's performance on datasets of different sizes. The small dataset consisted of five videos, including two videos with human-annotated ground truth, with a total of 1,345 clips. The large dataset included two additional videos, with a total of 3,034 clips. The results presented in Table 2 suggest that clustering on both datasets agrees with human annotation significantly better than randomly assigning clips to each cluster. Moreover, the performance of the model further improved when given a larger clustering dataset, likely due to better coverage of the continuous input space.
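For readers unfamiliar with the ARI, the following is a small, self-contained implementation of the standard pair-counting formula using only the Python standard library. It is provided as an illustration of the metric and is not the evaluation script used for Table 2; the example labels are made up.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two label assignments of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both assignments must label the same number of items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the assignments trivially agree
    # Pairs placed together in both assignments, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs   # agreement expected by chance
    maximum = 0.5 * (together_a + together_b)
    if maximum == expected:
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Cluster IDs need not use the same names as the human labels.
human = ["walk", "walk", "dig", "dig", "rear", "rear"]
algorithm = [2, 2, 0, 0, 1, 1]
print(adjusted_rand_index(human, algorithm))
```

Because the index compares how pairs of clips are grouped rather than the label names themselves, cluster numbers from the algorithm can be compared directly with behavior names from the human scorer.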

. . Timing
Installation time for AlphaTracker is highly dependent on the installation method selected and users' Internet speed. We estimate that it will take a user between 10 and 30 minutes to download and install the package, pre-trained model, demo data, and all dependencies on Linux. On Windows, installation may take about 2 minutes less since Windows does not support YOLO training when using the C-based darknet toolbox.
The training time for AlphaTracker (including YOLO and pose estimation) is highly dependent on hardware performance, dataset, and hyper-parameter settings.Using our default settings and example data (about 6000 images), it takes approximately 2 hours to train the YOLO detector and the pose estimation model.
The tracking time for AlphaTracker (including detection, pose estimation, and tracking) is also highly dependent on hardware performance, dataset, and hyperparameter settings.Using our default settings and demo video (about 7 minutes), tracking takes approximately 2 hours.
The time required for behavior clustering varies according to the features selected.Using keypoint-based features takes approximately 10 minutes.When using the UI to inspect the results, the main time cost is spent on loading data, which typically takes about 1-2 minutes.
These time estimates are for a server with 72 Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz, and 393 GB of RAM, running an Ubuntu 18.04.5 LTS system, using a 2080-Ti GPU.CPU times or Windows times are also noted where appropriate.

. Discussion
In this paper, we introduce AlphaTracker, a robust machine-learning pipeline that accurately tracks and estimates the poses of multiple unmarked animals. AlphaTracker also includes a feature for discovering behavioral patterns through unsupervised clustering and a user-friendly interface for visualizing and proofreading results. Our pipeline is available on GitHub for educational use and is accessible to non-programmers. Users can model and analyze animal behaviors in a matter of hours with a GPU. Our aim is to provide the research community with a powerful tool for high-throughput behavioral analysis.
The accuracy of AlphaTracker's clustering algorithm was evaluated by comparing its output with human annotations for individual clips (500 ms). The annotations included individual behaviors (walking, digging, sniffing, rearing, turning, face grooming, and body grooming) and five social behaviors (following, chasing, anogenital sniffing, face sniffing, and social rearing). The similarity of the class assignments was measured using the Adjusted Rand Index (ARI). AlphaTracker showed a significantly higher ARI than random assignment, demonstrating its consistency with human judgement. Additionally, using a relatively small dataset of 1,345 clips, AlphaTracker was able to accurately capture most of the cluster assignments, further highlighting its efficiency and effectiveness.
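The ARI comparison described in this section can be illustrated with a minimal sketch. It is not the evaluation code used for the paper: the function names, the toy labels, and the shuffle-based random baseline are assumptions made for illustration, and only the ARI formula itself is standard.

```python
from collections import Counter
from math import comb
import random


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same clips."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pair counts from the contingency table and from its row/column sums.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def random_baseline_ari(labels_a, labels_b, n_shuffles=200, seed=0):
    """Mean ARI after shuffling labels_b (a hypothetical random-assignment baseline)."""
    rng = random.Random(seed)
    shuffled = list(labels_b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        scores.append(adjusted_rand_index(labels_a, shuffled))
    return sum(scores) / len(scores)


# Toy example: human annotations vs. cluster assignments for six clips.
human = ["walking", "walking", "digging", "digging", "sniffing", "sniffing"]
clusters = [0, 0, 1, 1, 2, 2]
observed = adjusted_rand_index(human, clusters)
baseline = random_baseline_ari(human, clusters)
```

An ARI near 1 indicates that the clustering groups clips the same way the annotators did, regardless of how the clusters are numbered, while values near 0 are what chance agreement would produce.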
Traditional multi-animal tracking approaches require heuristics to resolve animal identities, such as artificial colored markers (EthoVision, Noldus) and bleach-marking with fur patterns (Ohayon et al., 2013). These methods require performing procedures on animals that could affect their natural behavior. Animal tracking has benefited greatly from advances in pose estimation, such as DeepLabCut, a software package that can reliably track human-defined unique keypoints (Mathis and Mathis, 2020). Another algorithm, MoSeq, has made progress on automated behavioral identification by using a depth camera and unsupervised learning (Wiltschko et al., 2015). SimBA provides an open-source package with a graphical interface and workflow that uses pose estimation to build supervised machine learning classifiers (Nilsson et al., 2020). However, these tools have not been effective in tracking multiple identical animals. In recent years, other tools for multiple animal tracking have emerged. As an example, SLEAP is a full-featured general-purpose multi-animal pose tracking framework tested on a diverse array of datasets representative of common social behavioral monitoring setups and designed for flexibility (Pereira et al., 2020b). Our model outperforms these tools in keypoint detection accuracy and multi-animal identification consistency, both of which are critical for studying social behavior.
Our AlphaTracker model has two main limitations. Firstly, it was designed for tracking mice from a top view, and its adaptation to other animals and environments requires expert tuning. To make this process easier, we have created a tutorial on annotating new data and model training. Typically, 200 annotated frames yield satisfactory performance in new settings. The second limitation is the hardware requirement for a GPU for model training. To overcome this, we offer a Google Colab version of AlphaTracker. However, the free Colab version may time out during long training sessions, and requires packages to be reinstalled and connection to Google Drive for file storage each time it is used.
Users may also encounter challenges that are common to all models of this kind. Firstly, keypoint detection accuracy may be affected by occlusions or animals temporarily leaving the frame. To address this, we have provided a curation UI for users to correct misidentification and mislabeling. Secondly, the clustering algorithm does not work well for heterogeneous videos, such as those with different imaging angles, animal sizes and scaling factors. In these cases, the algorithm will produce clusters specific to each condition, rather than uniform behavior patterns.
We envision AlphaTracker greatly facilitating systems neuroscience research, as it did when it premiered in Padilla-Coreano et al. (2022). In that paper, AlphaTracker played a key role in studying how the medial prefrontal cortex regulates social hierarchy. Beyond that study, there has been a recent increase in the study of the neural mechanisms behind behaviors such as social dominance, mating behavior, and maternal behavior. To fully understand these behaviors, it is important to have reliable and efficient methods of tracking social interactions and quantifying behavioral patterns with minimal bias. Human annotation performed by multiple researchers is subjective and labor-intensive. AlphaTracker, which is designed to reduce bias and improve efficiency, holds great potential for accelerating this field.

FIGURE AlphaTracker architecture and pipeline. The AlphaTracker architecture consists of three main components: object detection using YOLO, pose estimation using Single Animal Pose Estimation (SAPE), and identity tracking based on intersection over union (IOU) with error correction via Kalman filter. It outputs bounding boxes and user-defined keypoints for each detected animal, along with confidence scores for these predictions.
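To make the IOU-based identity tracking step concrete, the following is a minimal sketch of how bounding boxes in consecutive frames could be matched by overlap. It is an illustration under stated assumptions, not AlphaTracker's implementation: the greedy matching strategy, the `min_iou` threshold, and the box format `(x_min, y_min, x_max, y_max)` are hypothetical, and the Kalman-filter error correction is omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_identities(prev_boxes, new_boxes, min_iou=0.3):
    """Greedily assign each previous identity to the new box it overlaps most.

    prev_boxes: dict mapping identity -> box from the previous frame.
    new_boxes: list of boxes detected in the current frame.
    Returns a dict mapping identity -> index into new_boxes.
    """
    candidates = [
        (iou(prev_box, new_box), identity, index)
        for identity, prev_box in prev_boxes.items()
        for index, new_box in enumerate(new_boxes)
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)
    assigned, used = {}, set()
    for score, identity, index in candidates:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same animal
        if identity in assigned or index in used:
            continue
        assigned[identity] = index
        used.add(index)
    return assigned


# Two mice whose detections arrive in a different order in the next frame.
previous = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
current = [(21, 21, 31, 31), (1, 1, 11, 11)]
matches = match_identities(previous, current)
```

Identities left unmatched by this step (for example, after an occlusion) are the cases where a motion model such as a Kalman filter, or manual curation, would be needed.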
2. Click json/clickme.bat to create a new JSON file named multi_person.json under the folder json/. Move the newly generated JSON file into the directory json/im/.
3. Go back to the directory tool and click tool/start.bat. Select the multi_person.json file and click "Open" to load all the images.
4. To meet the input specifications of AlphaTracker, strictly follow the following instructions for image annotation:

FIGURE AlphaTracker UI. (Left) The AlphaTracker user interface provides six functionalities, including installing/uninstalling, training the tracking model, running the tracking and clustering models, and examining and correcting tracking and clustering results. (Right) The installation involves verifying the presence of Anaconda, installing required packages, setting up YOLO, and downloading the model weights and demo data.

FIGURE Example annotation JSON file generated by Sloth. The example shows annotations for two frames, represented by blue boxes. Each frame depicts two mice, one of which is highlighted in a yellow box. The yellow-boxed mouse is annotated with a bounding box for the body and four keypoints, which correspond to the head, left ear, right ear, and tail base respectively.

FIGURE
the original video (e.g., demo.mp4). Specify the frame rate of the imported video. The default frame rate is 25.0 fps. The video player allows users to browse the videos with tracked keypoints overlaid and identities indicated by colors. Users can jump to frames of interest or scan through videos frame by frame. We provide speed control to allow flexible navigation within each video. The timeline visualizes the position of different keypoints. An abrupt change in keypoint position often suggests an error in tracking.
3. If the detected keypoints show large jitters, this indicates that the SPPE (single-person pose estimator) model may not be properly trained. Users can return to the training stage, modify the learning rate and the number of training epochs, or provide more training data, and retrain the network.
4. Users can correct small errors such as mislabeled identities and mislabeled keypoints. To correct mislabeled keypoints, users can pause at the relevant frame(s) and drag each mislabeled keypoint to the correct position. To correct mislabeled identities, users can exchange the identities between two mice. Since errors are likely to persist after the newly corrected frame, users can select a time interval by clicking "IN" at the start of the time interval (typically, this is the frame that was just modified) and "OUT" at the end of the interval. Click "curate" to update the prediction for all the frames in the interval.
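The two curation ideas in these steps, spotting abrupt keypoint jumps and exchanging identities over an interval, can be sketched in a few lines. This is an illustrative sketch only: the per-frame dictionary layout, the identity names, the pixel threshold, and the inclusive treatment of the IN/OUT interval are all assumptions, not the data format or behavior of the AlphaTracker UI.

```python
from math import hypot


def find_abrupt_jumps(positions, max_step):
    """Return frame indices where a keypoint moved more than max_step pixels
    relative to the previous frame (a possible tracking error)."""
    return [
        i
        for i in range(1, len(positions))
        if hypot(positions[i][0] - positions[i - 1][0],
                 positions[i][1] - positions[i - 1][1]) > max_step
    ]


def swap_identities(frames, id_a, id_b, frame_in, frame_out):
    """Return a copy of frames with id_a and id_b exchanged for every frame
    from frame_in to frame_out (inclusive; an assumption about the interval)."""
    curated = [dict(frame) for frame in frames]
    for i in range(frame_in, frame_out + 1):
        curated[i][id_a], curated[i][id_b] = frames[i][id_b], frames[i][id_a]
    return curated


# Hypothetical head positions for two mice; identities flip at frame 2.
frames = [
    {"mouse_1": (0, 0), "mouse_2": (10, 10)},
    {"mouse_1": (1, 0), "mouse_2": (10, 11)},
    {"mouse_1": (10, 12), "mouse_2": (2, 0)},
    {"mouse_1": (10, 13), "mouse_2": (3, 0)},
]
jumps = find_abrupt_jumps([f["mouse_1"] for f in frames], max_step=5)
curated = swap_identities(frames, "mouse_1", "mouse_2", jumps[0], len(frames) - 1)
```

After the swap, the trajectory of each identity is continuous again, which is the same effect as selecting the interval in the UI and updating all frames in it at once.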

FIGURE

FIGURE Features for clustering individual and social behavior. Individual behavioral clustering depends on both static features in individual frames and dynamic features across frames. Social behavioral clustering also depends on additional social features. The definition of example features is depicted in the diagram. Common features include distances between keypoints and the angle between two vectors. (Blue, green: skeleton of mouse poses. Red: distances between two points or angles between two vectors. Gray: reference coordinates).
1. Open the clustering UI following the same instructions as for the tracking UI. Choose cluster.html. Upload the JSON file that

FIGURE The UI for AlphaTracker provides the ability to inspect and correct clustering errors. Area I is a navigation bar with options for navigation, help, undoing/redoing actions, result export, and starting new sessions. Area II displays the video and the dendrogram of clustering results, along with frame navigation. Double-clicking a node in the dendrogram highlights all the frames belonging to that cluster in the timeline in cyan. Right-clicking allows for cluster and clip manipulation (move, rename, delete). Clusters can be merged in a dedicated region, with the ability to record the rationale in a separate region. Area III contains a playback control panel with options to play/pause video, display time, frame, and clip information, control playback speed, choose video, and set the level to expand/collapse in the dendrogram. Area IV displays the progress bar and cluster assignments for each clip encoded by color. Detailed instructions for using the UI can be found on GitHub.

FIGURE AlphaTracker surpasses the performance of both DeepLabCut and SLEAP in tracking two mice in a home cage. (A-C) Show the Average Precision (AP), Multi-Object Tracking Precision (MOTP), and Multi-Object Tracking Accuracy (MOTA) metrics for different keypoints of AlphaTracker results. The metrics were evaluated for different numbers of training frames using a fixed evaluation dataset. Different colors represent different keypoints, including snouts (blue), ears (yellow), and tailbases (green), and the total metrics (red). (D-F) Show the evaluation results for AlphaTracker (red), SLEAP (green), and DeepLabCut (blue), with connected dots representing the total metric and the unconnected dots representing the metric for individual body parts.
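For readers unfamiliar with MOTA, the sketch below computes it from per-frame error counts using the standard CLEAR MOT definition (one minus the ratio of misses, false positives, and identity switches to the number of ground-truth objects). How these counts were obtained for keypoints in this evaluation is not specified in the caption, so the inputs and the function name here are purely illustrative.

```python
def mota(false_negatives, false_positives, id_switches, ground_truth_counts):
    """Multi-Object Tracking Accuracy from per-frame error counts.

    Each argument is a sequence with one entry per frame. MOTA can be negative
    when the tracker makes more errors than there are ground-truth objects.
    """
    total_ground_truth = sum(ground_truth_counts)
    if total_ground_truth == 0:
        raise ValueError("at least one ground-truth object is required")
    total_errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - total_errors / total_ground_truth


# Three frames with two mice each: one miss, one false positive, one identity switch.
score = mota(
    false_negatives=[0, 1, 0],
    false_positives=[0, 0, 1],
    id_switches=[0, 1, 0],
    ground_truth_counts=[2, 2, 2],
)
```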

FIGURE AlphaTracker identifies clusters for both individual and social behavior. Hierarchical clustering was performed on fixed-length clips generated from videos of interacting dyads. The dendrogram of the clustering results is shown in (A) for individual behavior and (C) for social behavior. Each leaf on the dendrogram represents a single clip, and their relative distance reflects their similarity in the feature space. The different colors and numbers indicate the assigned cluster for each individual clip. Example skeletons in (B, D) provide a visual representation of the typical movement in each cluster. The pose of the reference mouse is displayed in red, while the non-reference mouse's pose is displayed in green. The movement within a clip is illustrated by plotting a skeleton representation, with colors ranging from dark to light to denote each individual frame. The skeletons have been rotated to align the pose of the reference mouse in the central frame with the negative x-axis direction.
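The alignment described at the end of this caption can be sketched as a rigid transform. The sketch assumes that the pose direction is the vector from tail base to head and that the tail base is moved to the origin; both choices, along with the function name, are assumptions for illustration rather than the procedure used to produce the figure.

```python
from math import atan2, cos, pi, sin


def align_to_negative_x(points, head, tail_base):
    """Translate and rotate (x, y) points so that the tail base sits at the
    origin and the tail-base-to-head vector points along the negative x-axis."""
    heading = atan2(head[1] - tail_base[1], head[0] - tail_base[0])
    rotation = pi - heading  # angle that maps the heading onto the -x direction
    c, s = cos(rotation), sin(rotation)
    aligned = []
    for x, y in points:
        dx, dy = x - tail_base[0], y - tail_base[1]
        aligned.append((dx * c - dy * s, dx * s + dy * c))
    return aligned


# Reference mouse facing "up" in image coordinates, plus one partner keypoint.
head, tail_base = (0.0, 2.0), (0.0, 0.0)
aligned = align_to_negative_x([head, tail_base, (1.0, 1.0)], head, tail_base)
```

Applying the same transform to every frame of a clip, using the reference mouse's central-frame pose, removes differences in position and orientation so that clips can be compared by movement alone.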

FIGURE Behavioral clusters can be differentiated by unique combinations of features. (A, C) Illustrate the heatmaps of averaged feature values utilized for individual and social behavioral clusters, respectively. The mouse labels in the heatmaps distinguish the reference mouse from the non-reference mouse in the dyad. (B, D) Show the mutual information between cluster assignment and each feature for individual and social behavioral clusters, respectively. A higher mutual information score indicates that the feature plays a crucial role in distinguishing specific clusters.
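The mutual information score mentioned in this caption can be illustrated with a simple histogram-based estimate. The estimator actually used for the figure is not stated here, so the equal-width binning, the number of bins, and the function names below are assumptions chosen to keep the example small.

```python
from collections import Counter
from math import log2


def discretize(values, n_bins):
    """Map continuous feature values to equal-width bin indices."""
    low, high = min(values), max(values)
    if high == low:
        return [0] * len(values)
    width = (high - low) / n_bins
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x, count_y = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


cluster_ids = [0, 0, 1, 1]
informative_feature = [0.1, 0.2, 0.9, 1.0]    # separates the two clusters
uninformative_feature = [0.1, 1.0, 0.1, 1.0]  # unrelated to the clusters
mi_high = mutual_information(cluster_ids, discretize(informative_feature, 2))
mi_low = mutual_information(cluster_ids, discretize(uninformative_feature, 2))
```

A feature whose binned values predict the cluster label receives a high score, while a feature distributed identically across clusters receives a score near zero.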
• Anaconda: a free and open-source distribution of the Python programming language. AlphaTracker is written in Python 3.8 and is not compatible with Python 2.
• AlphaTracker: an actively maintained toolbox freely available at: https://github.com/MVIG-SJTU/AlphaTracker. Instructions in this paper are based on this version. Recently, we provided a sister version of our package on GitHub (https://github.com/Tyelab/AlphaTracker2), which is friendlier to Windows users because it provides a Python wrapper for the DarkNet in the YOLO package, the original version of which is a C-based toolbox that must be compiled on Linux systems. With the goal of offering real-time tracking, this version also adds flexibility in processing speed by offering lighter-weight networks such as MobileNet in place of the ResNet backbone.
• PyTorch: an open-source software library for Deep Learning. Our toolbox has been tested on PyTorch 1.8.0.
• Nvidia Driver: driver software with a version higher than 450 is required to run our model on a computer with an Nvidia GPU card, available at: https://www.nvidia.com/download/index.aspx.
• Jupyter Notebook: a web-based interactive computing platform available at: https://jupyter.org/install.
• Data annotation toolbox Sloth: an open-source software for labeling keypoints and identities of objects, provided as part of our toolbox.
• Operating system: Linux (Ubuntu 16.04 LTS, 18.04 LTS), or Windows (10) (Windows only supports applying the tracking model, but not training the neural network-based model).
• sppe_epoch. Training epochs for the pose estimation module. You might need to set a large number when training from scratch.
• sppe_batchSize. Batch size for pose estimation. If you encounter an out-of-memory (OOM) error, you may need to reduce the batch size.
If users have not trained their own models, they can use our pretrained model by setting exp_name=demo in the configuration file. This implicitly calls the pretrained model. In case users do not have a video ready to use, we also provide a test video at: ./Tracking/AlphaTracker/data/demo.mp4.
1. Users can set the parameters for tracking by either specifying the parameters in ./Tracking/AlphaTracker/setting.py or using the code-free UI. If using the GUI, click the "Tracking" button on the main GUI and select a video file. Modify the parameters on the "tracking" page. Users can hover the mouse cursor over each parameter for detailed explanations. A video tutorial is available at: https://youtu.be/t2skgohliAc. Important parameters are as follows:
• video_full_path. Path to the video.
• start_frame. Index of the start frame of the video.
• end_frame. Index of the last frame of the video.
• max_pid_id_setting. Number of mice in the video.
• result_folder. Path to the folder for saving the results.
• vis_track_result. Whether to visualize the tracking results by overlaying the predicted keypoints on the video.
• exp_name. Project name. If users want to use our pretrained model, set this parameter to demo.
2. Users can start the tracking process by either clicking the "Start" button on the tracking page of the
Frontiers in Behavioral Neuroscience frontiersin.org
in this Jupyter Notebook. This Jupyter Notebook will generate the following plots to help determine the clustering quality with the chosen features and feature weights. Users should try different clustering thresholds and check the dendrogram and feature heatmaps in order to identify the optimal threshold to use.
• Dendrogram. The dendrogram is based on a linkage matrix calculated by the clustering algorithm.
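To show why the clustering threshold matters, the sketch below runs a small agglomerative clustering and stops merging once the closest pair of clusters is farther apart than the threshold, which corresponds to cutting a dendrogram at that height. It is a toy illustration: average linkage, Euclidean distance, and the function name are assumptions, and the notebook's own linkage computation is not reproduced here.

```python
from math import dist


def average_linkage_clusters(points, threshold):
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two closest clusters until the smallest
    inter-cluster distance exceeds threshold. Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between members of two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        distance, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if distance > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical two-dimensional feature vectors for four clips.
clip_features = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
counts = {
    threshold: len(average_linkage_clusters(clip_features, threshold))
    for threshold in (0.5, 2.0, 100.0)
}
```

A low threshold leaves every clip in its own cluster and a high one merges everything, so the useful range lies in between, which is why inspecting the dendrogram and feature heatmaps at several thresholds is recommended.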
TABLE AlphaTracker demonstrated superior performance in tracking both two mice in a home cage and four mice in an operant chamber.