
Equine Kinematic Gait Analysis Using Stereo Videography and Deep Learning: Stride Length and Stance Duration Estimation

Nariman Niknejad1, Jessica L. Caro2, Rafael Bidese-Puhl1, Yin Bao1,*, Elizabeth A. Staiger3


Published in Journal of the ASABE 66(4): 865-877 (doi: 10.13031/ja.15386). Copyright 2023 American Society of Agricultural and Biological Engineers.


1Department of Biosystems Engineering, Auburn University, Auburn, Alabama, USA.

2Department of Animal Sciences, Auburn University, Auburn, Alabama, USA.

3Department of Animal Science and Veterinary Technology, Texas A&M University, Kingsville, Texas, USA.

*Correspondence: yzb0016@auburn.edu

The authors have paid for open access for this article. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License: https://creativecommons.org/licenses/by-nc-nd/4.0/

Submitted for review on 30 September 2022 as manuscript number ITSC 15386; approved for publication as a Research Article by Associate Editor Dr. Joshua Peschel and Community Editor Dr. Seung-Chul Yoon of the Information Technology, Sensors, & Control Systems Community of ASABE on 10 April 2023.


Abstract. Equine kinematic gait analysis (EKGA) currently requires a complicated, expensive, and labor-intensive procedure for equine locomotion research. An automated stereo video processing pipeline was developed and evaluated for measuring equine biomechanical parameters. Using stereo videos of 40 different walking horses, a DeepLabCut (DLC) model was trained to detect body landmarks in individual frames. With an autoregressive integrated moving average filter, the landmark detection had a root mean square error of 5.14 pixels and a mean absolute error of 4.87 pixels. As a case study, methods were developed to extract stride length (SL) and stance duration (SD). Individual hoof gait phase detection was achieved using a fine-tuned Faster R-CNN model and a mode filter, yielding precision and recall values of 0.83 and 0.95, respectively. The semi-global block matching (SGBM) algorithm was used to estimate depth maps, and the accuracy was assessed by comparing head length estimation with infield measurements. A Bland-Altman analysis for DLC-detected head length in combination with SGBM-based 3D reconstruction yielded a bias of -0.014 m with upper and lower limits of agreement (LoAs) of 0.03 m and -0.061 m, respectively. Furthermore, Bland-Altman analyses on SD and SL when compared to image-level manual measurements showed biases of -0.025 sec and -0.042 m, respectively. The corresponding LoAs were (0.191 sec, -0.241 sec) for SD and (0.040 m, -0.124 m) for SL. The proposed method showed promising potential in performing EKGA in an automated, cost-effective, and rapid manner under field conditions.

Keywords. 3D Reconstruction, Animal Pose Estimation, Deep Learning, Equine Kinematic Gait Analysis, Stereo Matching.

The total annual contribution of the horse industry to the U.S. economy is estimated at approximately 122 billion US dollars (Grice, 2018). Moreover, horses' relevance is not specific to the U.S.: they contribute to people's livelihoods, particularly in developing countries, in a wide range of sectors, including but not limited to agriculture, construction, tourism, mining, and public transportation (Behnke and Nakirya, 2012). Because of these economic and cultural impacts, studying equine biomechanical parameters is of paramount importance; it supports performance evaluation and helps determine the health of a horse's musculoskeletal system. It has been demonstrated that gait and behavioral analysis can potentially reduce training costs (Barrey et al., 1995; Rose et al., 2009). Horse training effectiveness and fitness evaluations can be performed via qualitative or quantitative analysis of the biomechanical properties of horses. A variety of methods have been proposed to analyze the motion of horses and measure temporal stride parameters. Typically, veterinarians evaluate a horse's musculoskeletal system through a clinical examination in which an expert observes the horse walking and trotting "in hand" (led by a person) to find any visible asymmetries in its movements and to localize lameness (Pfau et al., 2016). However, this approach is subjective and depends on the examiner's technical knowledge (Keegan et al., 2000), which can lead to different diagnoses from different examiners. Hence, equine kinematic gait analysis (EKGA) technologies that are quantitative, objective, and automated are of great value to a wide range of applications.

EKGA examines the change in position of body segments on a given plane over a determined period. Quantitatively, motions are described by linear and angular variables relating to time, displacement, velocity, and acceleration (Barrey, 1999). One method to objectively evaluate locomotion is to generate gait parameters based on extracting the precise positions of body landmarks. Marker-based optical motion capture (OMC) systems are by far the most developed motion-capture methods in EKGA. In commercial marker-based products such as QHorse (Qualisys AB, Sweden), reflective markers on the upper body of a horse are tracked optically via image processing techniques. The angle and displacement of a joint can also be measured using strain gauges (Serra Bragança et al., 2018). Another approach is to use inertial measurement units (IMUs) (Serra Bragança et al., 2020), such as Equinosis (Equinosis LLC, USA), as a technique of gait evaluation (Bosch et al., 2018). A few reports have examined the comparability of optoelectronic and IMU-based motion systems for evaluating equine hoof motion. When analyzing fast motions like landing duration and breakover duration, the two systems show less agreement, while the IMU sensors seem to perform better (Hagen et al., 2021). However, sensor positions on a horse were found to affect the kinematic data derived from EKGA systems equipped with inertial sensors (Moorman et al., 2012). Another limitation of sensor-based methods is that the number and placement of markers must be determined beforehand and thus cannot be altered after data collection.

Figure 1. (a) Stereo video acquisition system and (b) sample images from the stereo video dataset that show different horse breeds under various lighting conditions.

In comparison, markerless motion capture technology has been significantly improved due to the rapid advancements in computer vision and deep learning (DL) in recent years. Various DL-based video analytics tools have been developed for markerless animal pose estimation. For instance, DeepLabCut (DLC) and LEAP were the first methods to exploit deep convolutional neural networks (CNNs) for animal pose estimation by performing frame-based body landmark detection from a single camera (Mathis et al., 2018; Pereira et al., 2019). It was demonstrated that DLC could achieve near-human-level labeling accuracy on a small dataset thanks to transfer-learning and data augmentation techniques (Mathis et al., 2018; Vonstad et al., 2020). For markerless three-dimensional (3D) pose estimation, a multi-view imaging system is needed to capture a scene from many cameras at different viewing points and triangulate the 3D coordinates of the detected body landmarks. Various DL-based software tools have been developed, including the 3D version of DLC, AniPose (Karashchuk et al., 2021a), DANNCE (Dunn et al., 2021; Karashchuk et al., 2021b), OpenMonkeyStudio (Bala et al., 2020), FreiPose (Zimmermann et al., 2020), and DeepFly3D (Günel et al., 2019). While highly accurate, multi-view imaging systems are costly, difficult to manage, and require accurate camera calibration during setup for new scenes. A single binocular stereo camera that provides red-green-blue (RGB) and depth images may offer a cost-effective and easy-to-use alternative for 3D animal motion capture in conjunction with the pose estimation software tools. A single stereo camera can be easily relocated without calibration to accommodate constraints under field conditions.

This study investigated the feasibility of combining stereo 3D machine vision and deep CNNs to automate EKGA under field conditions. The specific research objectives were to develop and evaluate the performance of a data processing pipeline that can track body landmarks in 3D space and measure stride length (SL) and stance duration (SD) from side-viewing stereo videos of horses in locomotion. As a case study, SL and SD were selected to be the outputs of this pipeline because of their paramount importance in verifying various hypotheses in EKGA and lameness detection studies (Arkell et al., 2006; Keegan, 2007). Our developed pipeline can then be applied to equine genetic studies focusing on locomotion, such as gait and sports performance, which require precise phenotyping of biomechanical parameters.

Materials and Methods

Stereo Video Data Collection

The developed stereo video acquisition system consisted of an RGB stereo camera (ZED2, STEREOLABS, France), an embedded computer (Jetson Xavier NX, NVIDIA, USA), and a Wi-Fi router (TP-Link, China), as shown in figure 1a. A web-based user interface was developed using the Robot Operating System (DiLuoffo et al., 2018) for camera control and monitoring on a mobile device (i.e., a laptop). The imaging system was placed on a tripod at a height of 1.5 m above the ground. The stereo videos were saved in the SVO format (a proprietary file format created by STEREOLABS) at 15 frames per second (FPS) and at a resolution of 2208 × 1242 pixels for both the left and right cameras of the ZED2. The angular field of view of the stereo camera was 110° (horizontal) × 70° (vertical). Table 1 lists the intrinsic parameters of the stereo camera. Forty trotting or walking horses were videotaped using the imaging system at the Auburn University Equestrian Center (32.5856° N, 85.5089° W) in July of 2021. Each horse was led to move from the left side to the right side of the stereo video frame. The distance between each horse in motion and the camera was maintained at approximately 3 m. The videos varied in length from 65 to 85 frames. The horses included different breeds (e.g., Warmbloods, Morgans, Quarter Horses, and Thoroughbreds) and different coat colors. Prior to video recording, a single individual collected body measurements of each horse as outlined in Brooks et al. (2010). Additionally, a variety of outdoor lighting conditions (e.g., sunny, cloudy, overcast, and backlit) were present in the dataset. A subset of the collected left-view images of the stereo videos in the dataset is shown in figure 1b. Nineteen of the forty horses were diagnosed by a veterinarian to have some level of difficulty walking during the data collection. Those horses were diagnosed with either injury or inherent physical problems such as suspensory and soft tissue damage, sciatic issues, muscular lameness, and navicular syndrome.
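
For reference, the following is a minimal sketch of how such an SVO recording can be scripted with the ZED Python SDK (pyzed). The output file name and frame count are placeholders, and the study's actual ROS-based acquisition interface is not shown.

```python
# Minimal SVO recording sketch with the ZED Python SDK (pyzed).
# HD2K corresponds to 2208 x 1242 pixels per lens; 15 FPS as in this study.
import pyzed.sl as sl

cam = sl.Camera()
init_params = sl.InitParameters(camera_resolution=sl.RESOLUTION.HD2K,
                                camera_fps=15)
if cam.open(init_params) != sl.ERROR_CODE.SUCCESS:
    raise RuntimeError("Failed to open ZED2 camera")

# Record rectified stereo frames into the proprietary SVO container.
rec_params = sl.RecordingParameters("horse_trial.svo",  # placeholder path
                                    sl.SVO_COMPRESSION_MODE.H264)
cam.enable_recording(rec_params)

runtime = sl.RuntimeParameters()
for _ in range(85):  # roughly one pass of a walking horse at 15 FPS
    # Each successfully grabbed frame is appended to the SVO file.
    cam.grab(runtime)

cam.disable_recording()
cam.close()
```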

Data Processing Pipeline

The stereo video processing pipeline consisted of three main modules: body landmark detection, hoof gait phase detection, and stereo 3D reconstruction. As depicted in figure 2, the first step was to detect the body landmarks from the left 2D video frames using DLC. Next, each hoof and its gait phase (i.e., stance or swing) were detected using a Faster R-CNN (Ren et al., 2017) object detector. Lastly, the detected body landmarks were projected back to 3D space using a semi-global block matching algorithm (SGBM) (Hirschmuller, 2008). Two stride parameters, SL and SD, were used as case studies to evaluate the system. The implementation details are described in the following sections.

Table 1. Intrinsic parameters of the ZED2 stereo camera. Cx and Cy are the coordinates of the principal point of the left camera of the ZED2 in the image coordinate system. f is the focal length, and Tx is the baseline between the two cameras of the ZED2.

Stereo Camera Parameter    Value (Unit)
f                          1058.75 pixels
Cx                         1133.64 pixels
Cy                         659.75 pixels
Tx                         0.12 m

Body Landmark Detection and Filtering

A DLC keypoint detection model was fine-tuned to detect equine body landmarks in individual video frames from the left lens of the stereo camera. Twenty frames were extracted from every video for annotation. DLC offers three methods to sample frames: k-means clustering-based sampling, random sampling, and manual sampling. The k-means clustering method performs a k-means clustering of the pixel values in the video frames, and a pre-defined number of frames is then sampled from the pool of grouped images. This sampling method aims to include different visual variations from all the available scenes. The k-means clustering method was used to account for the different horse coat colors, backgrounds, and lighting conditions in the dataset. The number of clusters was set to 10, and two frames were sampled from each cluster. The sixteen annotated body landmarks consisted of the hooves, fetlocks, knees, hocks, nostril, poll, wither, and hip, as shown in table 2. For the DLC model, 1,200 frames were annotated, with 800 for model training and 400 for testing.

In many cases, landmarks such as the wither, nostril, and hip were obvious and relatively easy to annotate. However, landmarks such as the hoof, fetlock, hock, and knee of the left limbs could be occluded by the right limbs due to the side viewing angle. Occluded landmarks were annotated based on the horse pose where they were most likely to be found. The keypoint detection performance was evaluated on seven landmarks (i.e., four hooves, nostril, poll, and wither) due to their paramount importance in equine gait analysis studies. ResNet101 was selected as the backbone for the DLC model. To train the model, the number of iterations was set to the default value of 530,000, and a step-based learning rate decay method was used to set the learning rate to 0.01, 0.005, and 0.002 for each one-third of the iterations. To add more variation to the training set, data augmentation was employed: a series of image transformations, including Gaussian noise, elastic transformation, random rotation, and motion blur, was applied to the annotated frames. After initial model training, the performance of the trained body landmark detector was evaluated using the test dataset. The detected landmarks were assessed both visually and based on the DLC-derived confidence level. If a detection was not associated with the correct body part or had a confidence level below 45%, the frame was selected for re-annotation by a user to correct the body landmark. The newly annotated frames were then added to the previously developed training set, and the pre-trained model was retrained with the new training dataset. This evaluation procedure was repeated three times.
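
The workflow above maps onto DeepLabCut's standard project API. The following is a minimal sketch under assumed names (project "EquineEKGA", a placeholder video list); it is not the authors' exact script.

```python
# Sketch of the DLC workflow described above: k-means frame extraction,
# ResNet-101 backbone, training, and evaluation for iterative refinement.
import deeplabcut

config = deeplabcut.create_new_project(
    "EquineEKGA", "lab",                     # assumed project/experimenter names
    ["videos/horse01_left.mp4"],             # placeholder left-view video list
    copy_videos=False)

# Sample frames for annotation via k-means clustering of pixel values.
deeplabcut.extract_frames(config, mode="automatic", algo="kmeans",
                          userfeedback=False)

deeplabcut.label_frames(config)              # GUI-based landmark annotation
deeplabcut.create_training_dataset(config, net_type="resnet_101")
deeplabcut.train_network(config, maxiters=530000)
deeplabcut.evaluate_network(config)          # basis for the re-annotation loop
```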

As the DLC-predicted body landmark locations contain some errors and occasional outliers, an autoregressive integrated moving average (ARIMA) filter from the post-processing tools of DLC was used to smooth the hoof trajectories, as outlined and incorporated by Mathis et al. (2018) in the DLC framework. A nonseasonal ARIMA model is specified as ARIMA(p, d, q), where p, d, and q refer to the number of autoregressive terms, the degree of differencing, and the number of lagged forecast errors, respectively (Kotu and Deshpande, 2019). An ARIMA(3, 0, 1) model was fitted to the landmark detection coordinates. Because the horses' motion was almost perpendicular to the camera's optical axis, constant averaging with respect to one axis is appropriate. Plots of the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) of the hoof trajectory were used to identify the model orders: the PACF of the trajectory cut off after the third lag, which determined the autoregressive order (p = 3), while the moving average (MA) order was determined from the ACF plot. The filter was applied to generate a one-step-ahead forecast, and the variance of the filtered trajectory was then calculated. For every landmark in 2D image space, detections falling outside the computed variance were removed and replaced with the predicted coordinates.
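
The same filtering idea can be reproduced outside DLC with statsmodels. Below is a minimal sketch, assuming a synthetic stand-in for one hoof's per-frame horizontal coordinate and a 1.96-sigma outlier band; the exact thresholding inside DLC's post-processing tools may differ.

```python
# ARIMA(3, 0, 1) trajectory smoothing sketch (standalone, not DLC's own code).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Stand-in for the DLC-predicted horizontal (u) coordinate of one hoof per frame.
u_coords = np.cumsum(rng.normal(5.0, 1.0, 80)) + rng.normal(0.0, 3.0, 80)

fit = ARIMA(u_coords, order=(3, 0, 1)).fit()
forecast = fit.predict()                 # one-step-ahead in-sample predictions

# Replace detections outside the variance band with the predicted coordinates.
resid = u_coords - forecast
sigma = resid.std()
filtered = np.where(np.abs(resid) > 1.96 * sigma, forecast, u_coords)
```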

Hoof and Gait Phase Detection and Filtering

To determine the gait phases of individual hooves, a pretrained Faster R-CNN model (Ren et al., 2017) with a ResNet-101 DC5 3X backbone in Detectron2 (Wu et al., 2019) was fine-tuned to detect the bounding box of each hoof and the associated gait phase. The number of epochs was set to 500, and the base learning rate was set to 0.001. The learning rate was decreased during training using a step-decay method with a minimum value of 0.0005. The batch size and the number of regions of interest per image were set to 2 and 128, respectively. COCO Annotator V0.11.1 (Brooks, 2019) was used to draw a bounding box around each hoof and assign a gait phase. If the sole of a hoof was visually determined to be more than five pixels away from the ground or to form an angle of five degrees or more with the ground surface, that hoof was labeled "Swing." Hooves that were in contact with the ground were labeled "Stance," as outlined by Clayton (2004). Hooves obstructed by another hoof were labeled "Occluded." A series of data augmentation techniques (i.e., resizing, changing brightness, altering contrast and saturation, and flipping) was randomly applied to the frames to increase the generalization capacity of the trained model. As shown in table 2, the numbers of annotated instances for "Swing," "Stance," and "Occluded" were 1066, 2133, and 24, respectively. For training the model, 80% of the total 810 frames were used, and the remaining 20% were used to evaluate performance.
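
A configuration along these lines can be expressed with Detectron2's model zoo and DefaultTrainer. The sketch below uses the model zoo's "COCO-Detection/faster_rcnn_R_101_DC5_3x.yaml" config; the dataset name "hoof_train" and the iteration count are assumptions (Detectron2 schedules by iterations, so the paper's 500 epochs is converted here under the stated batch size).

```python
# Fine-tuning sketch for the hoof/gait-phase detector in Detectron2.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_101_DC5_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_101_DC5_3x.yaml")   # COCO-pretrained weights

cfg.DATASETS.TRAIN = ("hoof_train",)   # assumed registered COCO-format dataset
cfg.DATASETS.TEST = ()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3    # Swing, Stance, Occluded
cfg.SOLVER.IMS_PER_BATCH = 2           # batch size reported above
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  # regions of interest per image
cfg.SOLVER.BASE_LR = 0.001             # step-decayed toward 0.0005
cfg.SOLVER.MAX_ITER = 162000           # assumed: ~500 epochs x 648 images / 2

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```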

Gait Phase Assignment to Hoof Landmarks

The Faster R-CNN-based gait phase detection results were assigned to the corresponding DLC-based hoof landmarks in each frame by checking whether the 2D coordinates of each hoof landmark were within any Faster R-CNN-detected bounding box. If a hoof landmark fell inside multiple bounding boxes (e.g., overlapping bounding boxes of the two front hooves), the bounding box whose center had the minimum Euclidean distance to the landmark was chosen. This procedure was applied to each hoof throughout the video. Using these two models together enabled the determination of the 2D coordinates of the hooves and their gait phases. This combined algorithm was evaluated by manually annotating the gait phases of each hoof in each frame of the 40-video dataset. The process resulted in a comprehensive dataset comprising "Stance" with 6872 instances, "Swing" with 4582 instances, and "Occluded" with 178 instances.
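
As an illustration, the containment-plus-nearest-center rule can be written in a few lines of Python. The (x1, y1, x2, y2, phase) tuple layout is an assumed representation of the Faster R-CNN output for one frame.

```python
# Assign a detected gait phase to a DLC hoof landmark for one frame.
import numpy as np

def assign_phase(landmark, boxes):
    """Return the phase of the box containing the landmark; if several boxes
    contain it, pick the one whose center is nearest the landmark."""
    u, v = landmark
    candidates = []
    for (x1, y1, x2, y2, phase) in boxes:
        if x1 <= u <= x2 and y1 <= v <= y2:
            center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
            dist = float(np.linalg.norm(np.array([u, v]) - center))
            candidates.append((dist, phase))
    if not candidates:
        return None          # landmark outside all detections in this frame
    return min(candidates, key=lambda c: c[0])[1]

# e.g., a hoof landmark inside a single "Stance" box:
print(assign_phase((812.4, 1003.7), [(790, 960, 850, 1020, "Stance")]))
# -> Stance
```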

Hoof and Gait Phase Detection Filtering

The gait phase for a hoof is expected to alternate across the frames in a video. However, the Faster R-CNN-based hoof and gait phase detection results occasionally contained sporadic changes in the gait phase that did not follow a periodically alternating pattern. This happened mostly when a hoof was occluded by another one (e.g., front right hoof occluding front left hoof). A mode filter was applied to each gait phase detection in each frame. Specifically, the gait phase of each hoof was determined as the majority gait phase in four adjacent frames (i.e., ±2 frames).
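
A sliding-window majority vote of this kind can be sketched as follows; the handling of the window at sequence boundaries is an assumption, since the paper does not specify it.

```python
# Mode filter over adjacent frames (+/-2 frames) for one hoof's phase sequence.
from statistics import mode

def mode_filter(phases, half_window=2):
    """Replace each frame's phase with the majority phase in its window.
    On ties, statistics.mode (Python 3.8+) returns the first value seen."""
    smoothed = []
    for i in range(len(phases)):
        lo = max(0, i - half_window)
        hi = min(len(phases), i + half_window + 1)
        smoothed.append(mode(phases[lo:hi]))
    return smoothed

# e.g., an isolated misdetection during an occlusion is corrected:
print(mode_filter(["Stance", "Stance", "Swing", "Stance", "Stance"]))
# -> ['Stance', 'Stance', 'Stance', 'Stance', 'Stance']
```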

Stereo 3D Reconstruction of Body Landmarks

To measure distances in a metric unit (e.g., meters), a stereo 3D reconstruction of the 2D image landmark coordinates was needed. For each rectified stereo image pair, a disparity map was generated using the semi-global block matching (SGBM) algorithm in the OpenCV library (Bradski, 2000). Note that stereo image rectification was automatically performed by the ZED software development kit (SDK) during image acquisition. SGBM has shown the ability to handle untextured areas with a smoothness term in its optimization objective function (Hirschmuller, 2008), which is suitable for horse bodies that have regions of homogeneous coat color. The matching window size was set to 5 pixels, and the disparity range was set to 100 pixels based on the stereo camera baseline and its minimum imaging distance. Using the disparity map and the stereo camera parameters, the 2D landmark coordinates were projected back into 3D space (eq. 1):

$$ W \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & 1/T_x & 0 \end{bmatrix} \begin{bmatrix} u \\ v \\ d \\ 1 \end{bmatrix} \qquad (1) $$

where

x, y, and z = coordinates of the landmark in 3D space

u and v = coordinates in 2D image space of a landmark in the left image

W = dummy variable

d = calculated disparity for each pixel location.

Optical center coordinates for the left lens (cx = 1133.64 pixels and cy = 659.827 pixels) were extracted from the ZED2 camera calibration file, as well as its focal length (f = 1058.75 pixels) and baseline (Tx = 0.1200 m).
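
In OpenCV terms, a minimal sketch of this module might look as follows. OpenCV requires numDisparities to be a multiple of 16, so the roughly 100-pixel range is rounded down to 96 here, and the image file names are placeholders.

```python
# SGBM disparity estimation and back-projection of a 2D landmark into 3D.
import cv2
import numpy as np

f, cx, cy, Tx = 1058.75, 1133.64, 659.827, 0.12   # ZED2 calibration values

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # rectified pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=5)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # fixed-point

# Reprojection matrix following equation 1 (positive disparity convention).
Q = np.array([[1, 0, 0, -cx],
              [0, 1, 0, -cy],
              [0, 0, 0,   f],
              [0, 0, 1 / Tx, 0]], dtype=np.float64)

def landmark_to_3d(u, v):
    """Project a 2D landmark (u, v) into 3D camera coordinates in meters."""
    X, Y, Z, W = Q @ np.array([u, v, disparity[int(v), int(u)], 1.0])
    return X / W, Y / W, Z / W   # z = f * Tx / d
```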

Stance Duration and Stride Length Estimation

As a case study for the proposed pipeline, SD and SL were selected to be the output biomechanical parameters. SD and SL hold valuable information on the biomechanical soundness of horses, as discussed previously. For instance, limb lameness has been shown to cause shorter stance duration and shorter stride length in horses (Barrey, 1999; Moorman et al., 2012; Nakamura et al., 2015; Peham et al., 2001; Serra Bragança et al., 2018). The literature has also shown that SL correlates with horse size (Hole et al., 2002; Rooney et al., 1991), so it serves as a reliable indirect check of the pipeline's performance. Thus, regression and pair-wise correlation analyses were conducted to determine whether an association exists between system-derived SD, system-derived SL, and body measurements. SD and SL were also examined for outliers in the horses diagnosed with lameness.

To determine the duration of a stance, if the same hoof phase was detected in more than two consecutive frames, those frames were counted as valid for that phase. In this way, the number of frames in which each hoof was in the stance or swing phase was determined. SD was calculated as the frame count for a stance phase divided by the frame rate (i.e., 15 FPS). The average stance location was calculated in 3D space, and the Euclidean distance between two consecutive stance locations was measured. Using this method, the SL of every detected hoof was calculated. An image-level evaluation was conducted by manually annotating all the landmarks of interest, such as the hooves, nostrils, and poll, in all the consecutive frames in the 40-video dataset. The analysis yielded the following dataset: "Right Front Hoof" (2915 instances), "Left Front Hoof" (2907 instances), "Right Rear Hoof" (2901 instances), and "Left Rear Hoof" (2908 instances).
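
The two computations can be sketched directly from this description. The run-length threshold (more than two consecutive frames) and the 15 FPS frame rate come from the text, while the input layout (a per-frame phase list and an (N, 3) array of one hoof's 3D positions) is an assumed representation.

```python
# SD and SL from one hoof's per-frame gait phases and 3D positions.
import numpy as np

FPS = 15

def stance_runs(phases, min_len=3):
    """Return (start, end) frame-index pairs of stance phases held for more
    than two consecutive frames (the validity rule above)."""
    runs, start = [], None
    for i, p in enumerate(phases + ["Swing"]):   # sentinel closes a final run
        if p == "Stance" and start is None:
            start = i
        elif p != "Stance" and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs

def sd_and_sl(phases, xyz):
    """phases: list of 'Stance'/'Swing' per frame; xyz: (N, 3) array in meters."""
    runs = stance_runs(phases)
    durations = [(e - s) / FPS for s, e in runs]            # SD in seconds
    centers = [xyz[s:e].mean(axis=0) for s, e in runs]      # mean stance location
    lengths = [float(np.linalg.norm(b - a))                 # SL in meters
               for a, b in zip(centers, centers[1:])]
    return durations, lengths
```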

Evaluation Methods

Individual modules in the proposed data processing pipeline were assessed using the metrics discussed below. Furthermore, ground truth values of SD and SL were obtained using manual annotations of landmarks and gait phases.

Landmark Detection Assessment

To assess the accuracy of the 2D keypoint detection algorithm (DLC) in estimating the coordinates of the body landmarks, root mean square error (RMSE) and mean absolute error (MAE) were used. RMSE is calculated using equation 2:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{Y}_i - Y_i \right)^2} \qquad (2) $$

where

Ŷ = (û, v̂) = predicted 2D coordinates of the landmark in pixels

Y = (u, v) = manually labeled coordinates of the same landmark

n = total number of body landmark detections

i = index of the present frame in the sequence of frames.

The RMSE was computed separately for the u and v coordinates and then uniformly averaged to obtain a single RMSE. Similarly, MAE was calculated using equation 3:

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{Y}_i - Y_i \right| \qquad (3) $$

Hoof and Gait Phase Detection Evaluation

Average precision (AP) (Zhu, 2004) was used to evaluate the accuracy of the Faster R-CNN-based hoof and gait phase detection model. In the field of object detection, AP is a popular metric for evaluating object detectors by calculating the AP value over a range of recall values between 0 and 1. Here, AP is calculated as the area under the precision-recall curve, generated by Detectron2. The performance of Faster R-CNN in the hoof gait phase detection was assessed both in the training and test datasets using the mean AP value for the "Swing" and "Stance" phases.

Evaluation of Improved Hoof Gait Phase Detection

For each set of hoof coordinates that were identified by DLC, accuracy, precision, and recall of the associated Faster R-CNN-based hoof gait phase were computed. To assess the efficacy of the post-processing procedures, these values were computed before and after the mode and ARIMA filters were applied. Equation 4 was used to calculate accuracy.

$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4) $$

where

TP = true positives

TN = true negatives

FP = false positives

FN = false negatives.

These variables were based upon manual labeling of the gait phases of hooves and comparing them to the outputs of the pipeline. Precision, recall, and F1 were calculated using equations 5, 6, and 7, respectively. Those metrics were used to assess the gait phase detection algorithm both before and after the post-processing techniques.

$$ \mathrm{Precision} = \frac{TP}{TP + FP} \qquad (5) $$

$$ \mathrm{Recall} = \frac{TP}{TP + FN} \qquad (6) $$

$$ F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (7) $$

Stereo 3D Reconstruction Evaluation

Head length was used to evaluate the stereo 3D reconstruction accuracy, as it should not change throughout a video. Infield head length was measured with a tape measure from the two top corners of the nostrils straight to the front of the poll, following the protocol referenced in the data collection section. The accuracy of the 3D reconstruction module was assessed using a semi-automated method based on annotations of all the nostril and poll landmarks for all frames in the 40 videos collected for the dataset. A dataset of 2904 nostril instances and 2912 poll instances was created from the annotated images. In the semi-automated method, head length was computed as the Euclidean distance between the nostril and poll in 3D space for each frame using the manual image annotations and SGBM. Those measurements were then averaged over all frames in a video for evaluation. The semi-automated head length estimation aimed to assess accuracy without the error introduced by DLC-based landmark detection. Similarly, fully automated head length estimation was obtained using the DLC results and SGBM. The two sets of computed values were then compared to the infield head length measurements. RMSE and MAE were used to assess the accuracy of the pipeline measurements against the infield manual measurements.

Additionally, Bland-Altman (B-A) analysis (Bland and Altman, 1999) was employed to evaluate bias and limits of agreement (LoA) between infield manual measurements and semi- or fully automated head length estimations. In B-A analysis, the differences between the measurements of two systems (eq. 8) are plotted as a function of the average of those measurements. Any systematic difference between the two measurement systems (i.e., bias) is quantified by the mean difference (eq. 9). The upper and lower LoAs lie at 1.96 standard deviations of the differences above and below the bias, corresponding to 95% confidence intervals (eqs. 10, 11, and 12). The closer the bias is to zero, the more accurate the proposed method is compared to the reference method, and LoAs close to the bias line indicate that the spread of the differences is narrow. As part of our B-A plots, fitted regression lines indicate whether the bias was constant over the measurement range in our dataset (i.e., whether homoscedasticity was present). B-A analysis has been used in similar studies (Bosch et al., 2018; Hatrisse et al., 2022).

$$ d_i = m_{1,i} - m_{2,i} \qquad (8) $$

$$ \mathrm{Bias} = \bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i \qquad (9) $$

$$ s_d = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2} \qquad (10) $$

$$ \mathrm{LoA}_{\mathrm{upper}} = \bar{d} + 1.96\,s_d \qquad (11) $$

$$ \mathrm{LoA}_{\mathrm{lower}} = \bar{d} - 1.96\,s_d \qquad (12) $$

where m1,i and m2,i are the i-th paired measurements from the two systems and sd is the standard deviation of the differences.
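
As a compact illustration of equations 8 through 12, the statistics can be computed in a few lines; the sample values in the usage line are placeholders, not data from this study.

```python
# Bland-Altman bias and limits of agreement (eqs. 8-12).
import numpy as np

def bland_altman(m_sys, m_ref):
    d = np.asarray(m_sys) - np.asarray(m_ref)   # differences (eq. 8)
    bias = d.mean()                             # systematic bias (eq. 9)
    s_d = d.std(ddof=1)                         # SD of differences (eq. 10)
    upper = bias + 1.96 * s_d                   # upper LoA (eq. 11)
    lower = bias - 1.96 * s_d                   # lower LoA (eq. 12)
    return bias, (lower, upper)

# e.g., placeholder head length estimates (m) vs. tape measurements (m):
bias, loa = bland_altman([0.51, 0.55, 0.60], [0.52, 0.56, 0.63])
```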

Intra-class correlation (ICC) was also calculated to quantify the consistency between infield and system-derived measurements (Barnhart et al., 2016). This step was done using the R package psychometric (version 2.2). The ICC yields values between zero and one. If ICC is zero, there is poor consistency between the two measurement systems, and if it is one, there is complete consistency. More specifically, ICC is calculated using equation 13.

$$ \mathrm{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2 / n_j} \qquad (13) $$

where

τ00 = variance of the intercept of the model

σ² = residual variance of the model

nj = size of the population.

Stride Length and Stance Duration Evaluation

SL and SD estimations were evaluated by annotating the 2D coordinates of the hooves in all frames of the videos and finding their 3D coordinates via SGBM. A manually annotated dataset of hoof gait phase was used, as described in the "Gait phase assignment to hoof landmarks" section, to calculate the SD and subsequently the SL. A comparison was then conducted between the ground truth measurements and the output of the pipeline through B-A analysis. An ICC analysis was also conducted on the measured values for SL and SD. RMSE and MAE were calculated in 3D space.

Experiment Environment

Model training and data analyses were performed on a workstation equipped with an AMD Ryzen Threadripper 2970WX 4.2 GHz 24-Core Processor, 64 GB RAM, and two NVIDIA Titan RTX GPUs with 48 GB of VRAM, running a Linux Ubuntu 20.04.4 LTS operating system.

Results

Landmark Detection

The training loss of the DLC landmark detection model was 0.002 pixels. For the test dataset, the RMSE and MAE for landmark detection without an ARIMA filter were 8.24 pixels and 6.52 pixels, respectively. With the ARIMA filter, the RMSE and MAE were reduced to 5.14 pixels and 4.87 pixels, respectively. The detection errors of DLC equipped with an ARIMA filter for the seven individual landmarks in both the training and test datasets are shown in figure 3. Higher mean detection errors can be seen for body landmarks that were periodically occluded (i.e., the left hooves). For instance, the left rear hoof had a mean error of 16 pixels, as compared to 10 pixels for the right rear hoof, in the test dataset. Nostril detection had the lowest mean error: 3.00 pixels in the training dataset and 5.00 pixels in the test set. Qualitatively, figure 4 depicts examples of the DLC-detected landmarks in the test dataset, with the 2D coordinates of the landmarks marked in different colors.

Figure 3. DLC detection errors for the (a) training dataset and (b) test dataset for seven horse body landmarks. The cross mark in each boxplot stands for the mean error, the horizontal center line for the median error, and the dots outside the box for outliers. Each boxplot's top and bottom boundaries correspond to the 75th and 25th percentiles, respectively.

Hoof and Gait Phase Detection

The Faster R-CNN model for hoof and gait phase detection was trained on the dataset summarized in table 2. Figure 5 depicts the classification, bounding box regression, and total losses during the training session. It also shows that the classification accuracy increased during training, starting from approximately 0.85 and approaching 1.0. On the test dataset, the Faster R-CNN model yielded an AP of 33.196 for the swing phase and 39.714 for the stance phase. A visual representation of the hoof and gait phase detection algorithm applied to an image from the test dataset, together with the corresponding bounding boxes, is shown in figure 6.

Improved Gait Phase Classification

Gait phase classification was evaluated using different combinations of the proposed modules described in the previous sections. As shown in table 3, precision increased from 0.56 using the DLC and Faster R-CNN models alone to 0.83 after applying the additional ARIMA and mode filters. Recall and F1 scores showed similar improvements, increasing from 0.58 and 0.55 to 0.95 and 0.78, respectively.

The hoof and gait phase detection algorithm had a significant improvement when the filtering algorithms were applied. Figure 7 illustrates the confusion matrices for gait phase detection using different combinations of trajectory generation algorithms and filters. True positive rates increased from 56.22% to 61.20% for "Stance" cases and from 18.92% to 30.95% for "Swing" cases.

Figure 4. A visual evaluation of the performance of DLC in detecting body landmarks. (a) shows DLC detecting all landmarks on the horse's body. The hoof and fetlock on the rear left limb in (b) are occluded and therefore not detected by DLC. Likewise, the knee and hock of the left limb in (c) are not detected.
Figure 5. Bounding box regression loss, classification loss, total loss, and classification accuracy of Faster R-CNN model for hoof and gait phase detection during model training.

3D Reconstruction and Measurements of Head Length

The RMSE and MAE for head length estimation using manually annotated nostril and poll landmarks along with the SGBM algorithm were 0.019 m and 0.017 m, respectively. When the manual annotations were changed to the DLC predictions with ARIMA filtering, the system performance decreased to an RMSE and an MAE of 0.028 m and 0.022 m, respectively. B-A plots for semi- and fully automated head length estimations against the infield manual measurements are shown in figure 8. Table 4 summarizes the biases, LoAs, and ICCs for the two evaluated methods. The fully automated method resulted in a larger standard deviation than the semi-automated method. On the other hand, the biases of both methods were similar. As for ICC, the semi-automated method was excellently consistent with the infield manual method (ICC = 0.95), whereas the fully automated method was moderately consistent (ICC = 0.72) according to the guidelines by Koo and Li (2016). The p-values of the slope of the fitted regression lines in figure 8 for semi-automated and fully automated methods were 0.939 and 0.059, respectively. Because both p-values were greater than 0.05, proportional biases were not statistically significant in the two B-A plots, and thus the regression lines were not shown.

Figure 6. Bounding boxes around individual hooves and gait phase classification generated by the Faster R-CNN model for an image in the test dataset. The percentages show the probabilities of the detections.
Table 3. Precision, recall, and F1 score for different gait phase classification methods.

Gait Phase Classification Method                       Precision   Recall   F1 Score
DLC + Faster R-CNN                                     0.56        0.58     0.56
DLC w/ ARIMA Filter + Faster R-CNN                     0.61        0.74     0.61
DLC w/ ARIMA Filter + Faster R-CNN w/ Mode Filter      0.83        0.95     0.78

Stance Duration and Stride Length Evaluation

The system-derived SD resulted in an RMSE of 0.094 s in comparison to the manual measurements from the videos. In 2D image space, the RMSE between the system-derived SL and the ground truth was 5.73 pixels. The B-A analysis shows a bias of -0.025 s for SD (fig. 9a). The SL B-A plot (fig. 9b) shows a bias and a standard deviation of -0.042 m and 0.041 m, respectively. For SD, the upper and lower LoAs were 0.191 s and -0.241 s, respectively, and for SL, they were 0.040 m and -0.124 m, respectively. In figure 9a, the slope of the fitted regression line for SD had a p-value of 3.853 × 10^-6, which indicates a statistically significant trend in which the system-derived SD went from underestimation to overestimation as the SD increased. On the other hand, in figure 9b for SL, the p-value of the slope of the fitted regression line was 0.071, and thus the regression line is not shown in the figure. Table 5 summarizes the results of the B-A analyses and ICCs for the SD and SL estimations. The ICC for SD of all limbs combined was 0.79, while for SL it was 0.98, showing moderate consistency in estimating SD and excellent consistency in estimating SL. At the individual limb level, higher ICCs were found for the right limbs than the left limbs for the system-derived SD, whereas no similar discrepancy was found for the system-derived SL. This was likely due to the periodic occlusion of the left limbs by the right limbs under the right-side imaging angle. In addition, SD requires precise identification of the video frame at which the gait phase of a hoof changes, whereas SL was computed from the average location of a hoof over several consecutive frames while the hoof was in the stance phase.

Figure 8. Bland-Altman analyses on the image-derived head length measurements against infield manual measurements. (a) manual annotation with SGBM-based 3D reconstruction, and (b) detected landmarks using DLC with SGBM-based 3D reconstruction.
Table 4. Summary of head length measurement agreement statistics (Bland-Altman and ICC) when compared to infield manual measurements. ICC: intra-class correlation; LoA: limit of agreement. The semi-automated method uses manual annotations in conjunction with SGBM, while the fully automated method uses SGBM in conjunction with DLC's output.

                  Semi-Automated   Fully Automated
Upper LoA (m)     0.0020           0.0345
Lower LoA (m)     -0.0360          -0.0641
Bias (m)          -0.0169          -0.0147
ICC               0.95             0.72

Among the 19 horses diagnosed as lame, seven had SDs that differed significantly between limbs. As an example, three horses affected by injuries to the suspensory ligaments, deep digital flexor tendons, and tendon sheaths showed significantly shorter system-derived SDs for one of the four limbs (fig. 10). Among the other 12 horses with some degree of difficulty walking, there were no noticeable differences in SD.

The system-derived SL had a strong positive correlation with the infield manually measured pastern length (R² = 0.53), indicating that horses with longer pasterns tended to have greater stride lengths (fig. 11). This result is consistent with the available studies in the literature (Baban et al., 2009; Heglund and Taylor, 1988; Heglund et al., 1974; Sánchez et al., 2013).

Discussion

Performance of the Proposed Pipeline

This study demonstrated that accurate gait classification can be achieved with a high-throughput, cost-effective setup and pipeline that combines Faster R-CNN, markerless body landmark detection, and stereo videography to determine different biomechanical parameters of horses. Traditionally, human visual assessment has served as a common method for assessing equine gait. However, subjective human assessment has proved suboptimal because it can be affected by the temporal limitations of the human eye (e.g., a limited frequency response). The proposed system could enable researchers to acquire quantitative results on biomechanical parameters such as SL and SD under field conditions.

DLC sometimes failed to detect hooves in a single frame of an equine locomotion video due to occlusion from the side-viewing angle. For example, a limb, an object on the track, or dirt kicked up by a hoof impacting the ground could obscure a hoof of interest. However, such failures typically lasted for a relatively small number of frames compared to the total number of frames during a stride. In this study, the ARIMA filter was employed to substitute missing or significantly erroneous detections with more plausible values. Notably, the detection accuracy of landmarks situated on the right limbs surpassed that of the left limbs. This discrepancy is attributable to periodic occlusions of the left limbs as a consequence of the right-side imaging angle. Nevertheless, this limitation could be addressed by concurrently capturing and processing a second video from the left side of the horse. Of the seven landmarks evaluated, the poll had the second-largest detection error range, owing to high variability in its appearance caused by head movement across the video frames. Poll detection may be improved by adding more poll annotations to the training dataset.

Figure 9. Bland-Altman analyses for (a) stance duration (SD) and (b) stride length (SL) estimations.
Table 5. Summary of agreement statistics between system-derived stance duration and stride length measurements and image annotation-based measurements. LoA: limit of agreement; ICC: intra-class correlation; RF: right front limb; LF: left front limb; RR: right rear limb; LR: left rear limb.

                 Stance Duration   Stride Length
Upper LoA        0.191 s           0.040 m
Lower LoA        -0.241 s          -0.124 m
Bias             -0.025 s          -0.042 m
ICC (RF)         0.86              0.99
ICC (LF)         0.73              0.98
ICC (RR)         0.86              0.98
ICC (LR)         0.72              0.98
Combined ICC     0.79              0.98
Figure 10. The system-derived stance durations for three horses with lameness in one of the legs based on a veterinarian’s diagnosis. (a) injury to the deep digital flexor tendon in a right hind limb, (b) suspensory injury to the right hind limb, and (c) issues with a right front tendon sheath. The abnormally low stance duration is highlighted by a red marker in each sub-figure.

Employing a fine-tuned Faster R-CNN model to perform classification of gait phase might be beneficial in two ways. This method allows observing gait phase directly rather than relying on indirect measurements like the trajectory of a hoof to estimate gait phase (Clayton, 2004). It is also important to note that extracting gait phase only by assessing IMU or other wearable sensors may be extremely dependent on the location of the sensors on the horses, which may differ from study to study (Moorman et al., 2012).

Figure 11. Average system-derived stride length (m) vs. infield manually measured pastern length (m) of the forty horses in the dataset with a fitted linear regression model to the data points.

Contact sensor-based systems, such as inertial sensor-based systems or the GAITRite mat with embedded pressure sensors, can achieve sub-millimeter accuracy relative to a marker-based OMC system (Bosch et al., 2018; Cutlip et al., 2000). However, neither contact sensor-based systems nor marker-based OMC systems allow modification of the number of landmarks after data collection, and neither is easy to use in challenging environments (e.g., rough terrain on a farm). Furthermore, the measurements can be influenced by the sensor/marker placement on the body (Moorman et al., 2012). In contrast, a DL-based markerless OMC system allows the analysis of an arbitrary number of landmarks, although the landmark tracking accuracy may not be comparable to that of marker-based OMC systems. These advantages were explained in detail by Mathis et al. (2018), Nakamura et al. (2016), and Nath et al. (2019).

2D videography has limitations when it comes to measuring kinematic gait parameters. To obtain metric measurements, the observed gait parameters must be normalized by a known length in the video, and the horse must move parallel to the image plane, which can be error-prone (Gupta, 2021). A stereo 3D camera does not need to be positioned perpendicular to the horse's direction of movement, giving end users more flexibility. Our proposed system outputs landmark trajectories in 3D space and biomechanical parameters in metric units. Furthermore, because the commercial stereo camera (ZED2) provides stereo image pairs that are already rectified and calibrated, the developed method does not require time-consuming camera calibration for a new scene, as opposed to the multi-view imaging systems used in markerless 3D pose estimation methods such as AniPose and FreiPose (Karashchuk et al., 2021a; Zimmermann et al., 2020).

One of the limitations of this study was the logistical inability to use a marker-based OMC system to assess the proposed system for landmark tracking. A marker-based OMC system was not used because the retroreflective markers would conceal the body landmarks and would also risk disclosing the actual landmark locations to DLC; as a result, DLC might learn the positions of the markers rather than the landmarks. If the two OMC systems were employed sequentially, placing and removing the markers might change the horse's behavior, and consequently, the comparison between the systems might not be valid. Given this constraint, head length was measured in this study to indirectly assess the ability of the system in 3D position tracking and subsequent stride length estimation. This evaluation showed somewhat lower accuracy when DLC's landmarks were used in the fully automated approach than in the semi-automated method. One probable reason is DLC's lower accuracy compared to manual annotation in landmark detection, as well as occasional misdetections of the landmarks. As seen in figure 8, head length estimation using DLC combined with SGBM had a bias of -0.014 m and a standard deviation of 0.025 m, which can still be considered accurate, as the head lengths ranged from 0.45 m to 0.63 m. This evaluation illustrates the pipeline's potential performance when further 3D kinematic parameter measurements are made.

The proposed system can be further developed to detect musculoskeletal diseases (e.g., lameness). It has been demonstrated that model-based behavioral analysis of horse locomotion can detect lameness in equestrian animals (Li et al., 2021). Additionally, the system might assist breeding programs by providing quantitative data on biomechanical traits to select progeny of interest. Our evaluation of the pipeline's outputs can inform future research into whether different SDs can be detected in horses with lameness. Among the 19 lame horses sampled, seven had different stance durations between the hooves, indicating that this system has potential for use in lameness detection. In some instances in our dataset, the kind of lameness that a horse exhibited was not reflected in the system-derived SD; extracting other biomechanical parameters sensitive to those kinds of lameness is worth exploring in future work.

The stereo imaging pipeline can be extended to multiple wirelessly synchronized stereo cameras, enabling tracking and analysis of landmarks all around a horse. A more precise stereo matching algorithm could also be used to improve the accuracy of the disparity maps. SGBM was preferred here due to its high efficiency on the computational system that was used; however, state-of-the-art CNNs (Chen and Jung, 2018) could further improve the similarity measure between stereo image patches. Furthermore, since only the landmarks' 3D coordinates are needed, direct stereo matching of image patches centered on the landmarks, instead of dense stereo matching of the entire image, could improve runtime efficiency. Lastly, the camera-to-horse distance determines the spatial resolution (i.e., level of detail) and influences the accuracy of stereo matching-based depth estimation; therefore, another future research direction is to quantify the effect of camera-to-horse distance on the performance of the proposed system.

Conclusions

An automated EKGA method was developed that consisted of a 3D stereo videography system in conjunction with a DL-based video processing pipeline. This EKGA system combined several processing modules to detect equine body landmarks, detect hooves and their gait phases, and compute the 3D coordinates of the detected landmarks. In addition, as a case study, two important biomechanical parameters, SD and SL, were accurately extracted using robust filtering and data fusion techniques. The developed system has the potential to become a low-cost, practical, and rapid analytical tool for animal scientists performing EKGA under field conditions. It can allow equine locomotion researchers to rapidly phenotype a large population to identify the genetic control of traits such as gait or sports performance. In addition, it can assist veterinarians in the clinical diagnosis of musculoskeletal disorders.

Acknowledgments

This project was funded by the Auburn University Intramural Grants Program, the Auburn University College of Agriculture, and the Alabama Agricultural Experiment Station. The authors would like to thank the Auburn University Equestrian Center for their assistance with data collection.

References

Arkell, M., Archer, R. M., Guitian, F. J., & May, S. A. (2006). Evidence of bias affecting the interpretation of the results of local anaesthetic nerve blocks when assessing lameness in horses. Vet. Rec., 159(11), 346-348. https://doi.org/10.1136/vr.159.11.346

Baban, M., Curik, I., Antunovic, B., Cacic, M., Korabi, N., & Mijic, P. (2009). Phenotypic correlations of stride traits and body measurements in Lipizzaner stallions and mares. J. Equine Vet. Sci., 29(6), 513-518. https://doi.org/10.1016/j.jevs.2009.04.193

Bala, P. C., Eisenreich, B. R., Yoo, S. B., Hayden, B. Y., Park, H. S., & Zimmermann, J. (2020). Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat. Commun., 11(1), 4560. https://doi.org/10.1038/s41467-020-18441-5

Barnhart, H. X., Yow, E., Crowley, A. L., Daubert, M. A., Rabineau, D., Bigelow, R.,... Douglas, P. S. (2016). Choice of agreement indices for assessing and improving measurement reproducibility in a core laboratory setting. Stat. Methods Med. Res., 25(6), 2939-2958. https://doi.org/10.1177/0962280214534651

Barrey, E. (1999). Methods, Applications and Limitations of Gait Analysis in Horses. Vet. J., 157(1), 7-22. https://doi.org/10.1053/tvjl.1998.0297

Barrey, E., Auvinet, B., & Couroucé, A. (1995). Gait evaluation of race trotters using an accelerometric device. Equine Vet. J., 27(S18), 156-160. https://doi.org/10.1111/j.2042-3306.1995.tb04910.x

Behnke, R. H., & Nakirya, M. (2012). The contribution of livestock to the Ugandan economy - IGAD LPI Working Paper No. 02-12. Retrieved from https://cgspace.cgiar.org/bitstream/handle/10568/24970/IGAD_LPI_WP_02-12.pdf

Bland, J. M., & Altman, D. G. (1999). Measuring agreement in method comparison studies. Stat. Methods Med. Res., 8(2), 135-160. https://doi.org/10.1177/096228029900800204

Bosch, S., Serra Bragança, F., Marin-Perianu, M., Marin-Perianu, R., Van der Zwaag, B. J., Voskamp, J.,... Havinga, P. (2018). EquiMoves: A wireless networked inertial measurement system for objective examination of horse gait. Sensors, 18(3), 850. https://doi.org/10.3390/s18030850

Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Journal of Software Tools.

Brooks, J. (2019). COCO Annotator, Version 0.11.1. https://github.com/jsbroks/coco-annotator

Brooks, S. A., Makvandi-Nejad, S., Chu, E., Allen, J. J., Streeter, C., Gu, E.,... Sutter, N. B. (2010). Morphological variation in the horse: Defining complex traits of body size and shape. Anim. Genet., 41(s2), 159-165. https://doi.org/10.1111/j.1365-2052.2010.02127.x

Chen, B., & Jung, C. (2018). Patch-based stereo matching using 3D convolutional neural networks. Proc. 2018 25th IEEE Int. Conf. on Image Processing (ICIP) (pp. 3633-3637). IEEE. https://doi.org/10.1109/ICIP.2018.8451527

Clayton, H. M. (2004). The dynamic horse: A biomechanical guide to equine movement and performance. Sport Horse Publ.

Cutlip, R. G., Mancinelli, C., Huber, F., & DiPasquale, J. (2000). Evaluation of an instrumented walkway for measurement of the kinematic parameters of gait. Gait Posture, 12(2), 134-138. https://doi.org/10.1016/S0966-6362(00)00062-X

DiLuoffo, V., Michalson, W. R., & Sunar, B. (2018). Robot Operating System 2: The need for a holistic security approach to robotic architectures. Int. J. Adv. Robot. Syst., 15(3). https://doi.org/10.1177/1729881418770011

Dunn, T. W., Marshall, J. D., Severson, K. S., Aldarondo, D. E., Hildebrand, D. G., Chettih, S. N.,... Ölveczky, B. P. (2021). Geometric deep learning enables 3D kinematic profiling across species and environments. Nat. Methods, 18(5), 564-573. https://doi.org/10.1038/s41592-021-01106-6

Grice, A. L. (2018). 2017 American Horse Council economic impact study. Proc. 64th Annu. Convention of the American Association of Equine Practitioners (pp. 502-504). Lexington, KY: American Association of Equine Practitioners (AAEP).

Günel, S., Rhodin, H., Morales, D., Campagnolo, J., Ramdya, P., & Fua, P. (2019). DeepFly3D, a deep learning-based approach for 3D limb and appendage tracking in tethered, adult Drosophila. eLife, 8, e48571. https://doi.org/10.7554/eLife.48571

Gupta, V. (2021). Equine gait analysis, body part tracking using deeplabcut and mask R-CNN and biomechanical parameter extraction. MS Thesis. Auburn, AL: Auburn University, Department of Computer Science and Software Engineering. Retrieved from https://etd.auburn.edu//handle/10415/7894

Hagen, J., Jung, F. T., Brouwer, J., & Bos, R. (2021). Detection of equine hoof motion by using a hoof-mounted inertial measurement unit sensor in comparison to examinations with an optoelectronic technique - A pilot study. J. Equine Vet. Sci., 101, 103454. https://doi.org/10.1016/j.jevs.2021.103454

Hatrisse, C., Macaire, C., Sapone, M., Hebert, C., Hanne-Poujade, S., De Azevedo, E.,... Chateau, H. (2022). Stance phase detection by inertial measurement unit placed on the metacarpus of horses trotting on hard and soft straight lines and circles. Sensors, 22(3), 703. https://doi.org/10.3390/s22030703

Heglund, N. C., & Taylor, C. R. (1988). Speed, stride frequency and energy cost per stride: How do they change with body size and gait? J. Exp. Biol., 138(1), 301-318. https://doi.org/10.1242/jeb.138.1.301

Heglund, N. C., Taylor, C. R., & McMahon, T. A. (1974). Scaling stride frequency and gait to animal size: Mice to horses. Science, 186(4169), 1112-1113. https://doi.org/10.1126/science.186.4169.1112

Hirschmuller, H. (2008). Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell., 30(2), 328-341. https://doi.org/10.1109/TPAMI.2007.1166

Hole, S. L., Clayton, H. M., & Lanovaz, J. L. (2002). A note on the linear and temporal stride kinematics of Olympic show jumping horses between two fences. Appl. Anim. Behav. Sci., 75(4), 317-323. https://doi.org/10.1016/S0168-1591(01)00194-0

Sánchez, M. J., Gómez, M. D., Peña, F., García Monterde, J., Morales, J. L., Molina, A., & Valera, M. (2013). Relationship between conformation traits and gait characteristics in Pura Raza Español horses. Arch. Anim. Breed., 56(1), 137-148. https://doi.org/10.7482/0003-9438-56-013

Karashchuk, P., Rupp, K. L., Dickinson, E. S., Walling-Bell, S., Sanders, E., Azim, E.,... Tuthill, J. C. (2021a). Anipose: A toolkit for robust markerless 3D pose estimation. Cell Rep., 36(13). https://doi.org/10.1016/j.celrep.2021.109730

Karashchuk, P., Tuthill, J. C., & Brunton, B. W. (2021b). The DANNCE of the rats: A new toolkit for 3D tracking of animal behavior. Nat. Methods, 18(5), 460-462. https://doi.org/10.1038/s41592-021-01110-w

Keegan, K. G. (2007). Evidence-based lameness detection and quantification. Vet. Clin. North Am.: Equine Pract., 23(2), 403-423. https://doi.org/10.1016/j.cveq.2007.04.008

Keegan, K. G., Wilson, D. A., Smith, B. K., & Wilson, D. J. (2000). Changes in kinematic variables observed during pressure-induced forelimb lameness in adult horses trotting on a treadmill. Am. J. Vet. Res., 61(6), 612-619. https://doi.org/10.2460/ajvr.2000.61.612

Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropr. Med., 15(2), 155-163. https://doi.org/10.1016/j.jcm.2016.02.012

Kotu, V., & Deshpande, B. (2019). Chapter 12 - Time series forecasting. In V. Kotu, & B. Deshpande (Eds.), Data Science (2nd ed., pp. 395-445). Morgan Kaufmann. https://doi.org/10.1016/B978-0-12-814761-0.00012-5

Li, C., Ghorbani, N., Broomé, S., Rashid, M., Black, M. J., Hernlund, E.,... Zuffi, S. (2021). hSMAL: Detailed horse shape and pose reconstruction for motion pattern recognition. arXiv preprint arXiv:2106.10102. https://doi.org/10.48550/arXiv.2106.10102

Mathis, A., Mamidanna, P., Cury, K. M., Abe, T., Murthy, V. N., Mathis, M. W., & Bethge, M. (2018). DeepLabCut: Markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci., 21(9), 1281-1289. https://doi.org/10.1038/s41593-018-0209-y

Moorman, V. J., Reiser, R. F., McIlwraith, C. W., & Kawcak, C. E. (2012). Validation of an equine inertial measurement unit system in clinically normal horses during walking and trotting. Am. J. Vet. Res., 73(8), 1160-1170. https://doi.org/10.2460/ajvr.73.8.1160

Nakamura, A., Funaya, H., Uezono, N., Nakashima, K., Ishida, Y., Suzuki, T.,... Shibata, T. (2015). Low-cost three-dimensional gait analysis system for mice with an infrared depth sensor. Neurosci. Res., 100, 55-62. https://doi.org/10.1016/j.neures.2015.06.006

Nakamura, T., Matsumoto, J., Nishimaru, H., Bretas, R. V., Takamura, Y., Hori, E.,... Nishijo, H. (2016). A markerless 3D computerized motion capture system incorporating a skeleton model for monkeys. PLoS One, 11(11), e0166154. https://doi.org/10.1371/journal.pone.0166154

Nath, T., Mathis, A., Chen, A. C., Patel, A., Bethge, M., & Mathis, M. W. (2019). Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nat. Protoc., 14(7), 2152-2176. https://doi.org/10.1038/s41596-019-0176-0

Peham, C., Licka, T., Girtler, D., & Scheidl, M. (2001). The influence of lameness on equine stride length consistency. Vet. J., 162(2), 153-157. https://doi.org/10.1053/tvjl.2001.0593

Pereira, T. D., Aldarondo, D. E., Willmore, L., Kislin, M., Wang, S. S., Murthy, M., & Shaevitz, J. W. (2019). Fast animal pose estimation using deep neural networks. Nat. Methods, 16(1), 117-125. https://doi.org/10.1038/s41592-018-0234-5

Pfau, T., Fiske-Jackson, A., & Rhodin, M. (2016). Quantitative assessment of gait parameters in horses: Useful for aiding clinical decision making? Equine Vet. Educ., 28(4), 209-215. https://doi.org/10.1111/eve.12372

Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell., 39(6), 1137-1149. https://doi.org/10.1109/TPAMI.2016.2577031

Rooney, J. R., Thompson, K. N., & Shapiro, R. (1991). A contribution to the study of velocity, stride length, and frequency in the horse. J. Equine Vet. Sci., 11(4), 208-209. https://doi.org/10.1016/S0737-0806(06)80978-0

Rose, N. S., Northrop, A. J., Brigden, C. V., & Martin, J. H. (2009). Effects of a stretching regime on stride length and range of motion in equine trot. Vet. J., 181(1), 53-55. https://doi.org/10.1016/j.tvjl.2009.03.010

Serra Bragança, F. M., Broomé, S., Rhodin, M., Björnsdóttir, S., Gunnarsson, V., Voskamp, J. P.,... Hernlund, E. (2020). Improving gait classification in horses by using inertial measurement unit (IMU) generated data and machine learning. Sci. Rep., 10(1), 17785. https://doi.org/10.1038/s41598-020-73215-9

Serra Bragança, F. M., Rhodin, M., & van Weeren, P. R. (2018). On the brink of daily clinical application of objective gait analysis: What evidence do we have so far from studies using an induced lameness model? Vet. J., 234, 11-23. https://doi.org/10.1016/j.tvjl.2018.01.006

Vonstad, E. K., Su, X., Vereijken, B., Bach, K., & Nilsen, J. H. (2020). Comparison of a deep learning-based pose estimation system to marker-based and Kinect systems in exergaming for balance training. Sensors, 20(23), 6940. https://doi.org/10.3390/s20236940

Wu, Y., Kirillov, A., Massa, F., Lo, W., & Girshick, R. (2019). Detectron2. Retrieved from https://github.com/facebookresearch/detectron2

Zhu, M. (2004). Recall, precision and average precision. Tech. Rep. Waterloo, ON: University of Waterloo, Department of Statistics and Actuarial Science.

Zimmermann, C., Schneider, A., Alyahyay, M., Brox, T., & Diester, I. (2020). FreiPose: A deep learning framework for precise animal motion capture in 3D spaces. bioRxiv. https://doi.org/10.1101/2020.02.27.967620