Machine Learning in Biomechanics – Challenges and Opportunities

Jonas Ebbecke

“Over the next decade, machine learning is likely to greatly enhance the objectivity of decision making in sports science” – C. Richter, M. O’Reilly & E. Delahunt (2021)

  1. What is Machine Learning?

  2. What is a classic ML process?

  3. What are the applications in biomechanics?

  4. What are the challenges to be overcome?

  5. Summary

1. What is Machine Learning?

Anyone working in the field of biomechanics knows that data analysis cannot be done without calculations. When post-processing biomechanical measurement data, comparable quantities must be determined from raw data. For example, ground reaction forces are determined from changes in electrical charge within a force plate, trajectories of body parts are calculated from 3D coordinates, muscle activity is derived from voltages, and so on. For fairly simple computational tasks, such as linear regression, you can tell a program step by step what to calculate, and it will execute the corresponding commands. In doing so, the analyst must know, and enter, each individual calculation step so that the program produces the desired results. The program itself does not learn anything but merely executes. The more complex a calculation task becomes, e.g. due to a high number of influencing variables or non-linear relationships, the more difficult it becomes to manually tell the program what to calculate. Here it would be more efficient if the algorithm itself were able to decide what should be calculated and how.
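
To make the "step by step" idea concrete, here is a minimal sketch of a linear regression in which every calculation is spelled out by the analyst. The function name `fit_line` and the calibration values are made up for illustration; nothing is learned here, the program only executes the formulas it is given.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept, written out step by step."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical calibration data: sensor voltage (V) against applied force (N)
voltage = [0.0, 1.0, 2.0, 3.0, 4.0]
force = [1.0, 51.0, 101.0, 151.0, 201.0]

slope, intercept = fit_line(voltage, force)
print(f"force = {slope:.1f} * voltage + {intercept:.1f}")
```

This works well as long as you know the formulas. With many interacting variables or non-linear relationships, writing them down by hand quickly becomes impractical.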

Such algorithms exist and are called machine learning (ML). They are a part of artificial intelligence (AI): a computer program learns on its own from sample data (also called “training data”) and adapts a model that can then make predictions or decisions. The key point here is that a computer learns to perform tasks without being explicitly programmed to do so.

Maybe you did not know it yet, but all of us come across ML algorithms in our everyday lives all the time. One of the most popular examples is the spam filter in your email program. It is constantly being trained to distinguish spam from relevant mails. Or what about the advertising presented to you by big tech companies like Google, Facebook, and co? Here, a model of you is trained which then assigns you the perfect advertisement. Or the voice recognition of your smartphone. It would hardly be possible to enter manually which changes in the frequency spectrum recorded by the microphone could mean which words. And not just for you alone, but for billions of users with incredibly high variability in their voices.

Generally, the applications for ML are very broad. They are used, for example, in prediction, image processing, classification, pattern recognition, or as a tool for multivariate data analysis.

ML algorithms owe their growing popularity in mathematical modeling to their…

  1. … non-linearity, which allows for a better fit to real-world data;
  2. … robustness to noise, which provides accurate results even with uncertain data;
  3. … fast computational speed;
  4. … ability to modify their internal structures when external influences change;
  5. … high generalisability, which also allows modeling of unseen data (Basheer and Hajmeer, 2000).

Machine learning is thus a marvel of programming art. It is versatile and can solve problems that would be too complicated for a human to handle manually.

2. What is a classic ML process?

Here is a rough overview of the usual machine learning workflow:

3. What are the applications in biomechanics?

ML is as versatile in biomechanics as it is in other disciplines, and it has already been used widely:

  1. Optimisation of inertial sensor readings (Jamil et al., 2020).
  2. Optimisation of data acquisition from force plates (Hsieh et al., 2011).
  3. I myself have used one type of ML for the calibration of a kinetic bicycle pedal. Read more about it here.
  4. 3D kinematics and vertical ground reaction forces could soon be predicted from 2D videos (Morris et al., 2020; Weir et al., 2019).
  5. Barbell motion trajectories and mass were used to predict joint moments (Kipp et al., 2018).
  6. Inertial sensors were used to predict ground reaction forces (Mundt et al., 2019).

Classification models are able to break down data into meaningful packages, a task that would previously have taken sports scientists a large amount of time (Ahmadi et al., 2014).

Classification models can facilitate objective decision-making regarding rehabilitation practices and injury prevention (Richter et al., 2019; Rommers et al., 2020).

4. What are the challenges to be overcome?

Even though ML promises relatively simple solutions to complex problems in theory (provided you have the mathematical understanding and appropriate programming skills), there are still some challenges in practice. These must be mastered by each user.

As written above, an ML algorithm needs training data from which it can learn and adapt its model. The rule of thumb is that about 60% of the work in dealing with ML is spent on pure training data acquisition. The quality of the training data is ultimately one of the decisive points as to whether the algorithm will do a good job or not.

In general, ML can be divided into two different learning strategies, which have an impact on the requirements for the training data: supervised learning and unsupervised learning. Supervised learning is used to determine functions, such as in classification and regression problems. Here it is important that both the predictors (i.e. the data we hand over to the algorithm to receive a result) and the target values (i.e. the data the algorithm should return) are recorded. In the training process, the ML algorithm is thus able to learn which results it should put out for which input values and can later apply this "knowledge" to unseen data. For example, if one wants to predict ground reaction forces using MoCap data, as Mundt et al. (2019) did, one needs to record both 3D kinematics (predictors) and ground reaction forces (target values) for the training data. After the training process, the algorithm should then be able to estimate the ground reaction forces from the kinematics data on its own.
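
As a rough illustration of supervised learning, the following sketch lets a very small model learn the relationship between one predictor and one target from example pairs alone. It is not the method of Mundt et al. (2019): the data are synthetic, and the function name `train_linear_model`, the learning rate and the number of epochs are assumptions chosen for this toy case.

```python
import random


def train_linear_model(predictors, targets, learning_rate=0.1, epochs=2000):
    """Learn weight and bias from example pairs by gradient descent on the mean squared error."""
    weight, bias = 0.0, 0.0
    n = len(predictors)
    for _ in range(epochs):
        # Difference between current model output and recorded target values
        errors = [weight * x + bias - t for x, t in zip(predictors, targets)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, predictors)) / n
        grad_b = 2 * sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


rng = random.Random(0)
# Synthetic stand-in data (an assumption): target = 2.0 * predictor + 0.5 plus small noise
predictors = [i / 49 for i in range(50)]
targets = [2.0 * x + 0.5 + rng.gauss(0, 0.02) for x in predictors]

weight, bias = train_linear_model(predictors, targets)

# Apply the learned "knowledge" to an input that was not part of the training data
prediction = weight * 0.37 + bias
print(f"learned weight={weight:.2f}, bias={bias:.2f}, prediction={prediction:.2f}")
```

Note that the rule "multiply by two and add 0.5" is never written into the training function. The model recovers it approximately from the recorded predictor and target pairs.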

However, if the ML algorithm is to be used to uncover unknown structures, as in clustering or dimensionality reduction, so-called unsupervised learning is applied. For this purpose, no target values need to be recorded.
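
Clustering is probably the easiest unsupervised case to sketch. The example below uses a plain k-means on a single variable; the speed values and the initial cluster centers are hypothetical, and k-means is only one of many possible clustering methods.

```python
def k_means_1d(values, centers, iterations=20):
    """Plain k-means on scalar values; `centers` holds the initial guesses."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its members; keep it if the cluster is empty
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    labels = [min(range(len(centers)), key=lambda i: abs(v - centers[i])) for v in values]
    return centers, labels


# Hypothetical locomotion speeds in m/s, recorded without any labels
speeds = [1.2, 1.4, 1.3, 1.5, 3.4, 3.6, 3.1, 3.8]
centers, labels = k_means_1d(speeds, centers=[1.0, 4.0])
print(centers, labels)
```

No target values are passed in. The algorithm only groups similar samples, and it is up to you to interpret the groups afterwards, for example as walking and running.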

One of the decisive factors for the success of the training process is the pure amount of training data. The more complicated the relationships to be modelled are and the more precise the desired results need to be, the more training data is needed. And this is exactly where one of the great challenges in biomechanics lies. If you have ever conducted a study with volunteers, you know how difficult it is to recruit enough participants. Most studies are conducted with 10-20 subjects, which is not enough for ML applications in most cases. Biomechanical studies training an AI have an average of around 40 subjects, but even this does not always seem to be sufficient (Halilaj et al., 2018). It is difficult, if not impossible, to define a number that is needed as the minimum training data for an ML algorithm. But one can say unequivocally: you can never collect too much high-quality data! The more you have, the more accurate your model becomes. Of course, there is a kind of saturation somewhere, where the improvements will be marginal, yet there is no general maximum value for this either.

In addition to a more accurate model, a large training data set also ensures high generalisability and prevents so-called overfitting, which we will discuss later.

In addition to the sheer amount of training data, the variability in the targeted range of values is of crucial importance. The training data collected should be representative of your field of application in order to ensure a high degree of generalisability. In other words, the data you use for training should cover all cases that have occurred and will occur. An example might shed some light on this: Let's say you have programmed an ML algorithm that reads in images and distinguishes between dogs and cats. You train the algorithm with three images per animal: a dog with white fur, one with brown fur and one with black fur; a cat with white fur, one with brown fur and one with mackerel fur. Then you test the trained model with a picture of a cat that has black fur. The probability is now relatively high that the ML algorithm incorrectly identifies a dog in this image, because there was no black cat in the training data set, only a black dog. If you add an image of a black cat to the training data, the algorithm is probably able to distinguish a black dog from a black cat, because it then has to determine the differences between the two animals on the basis of other features, such as stature. High variability in the training data thus also goes hand in hand with a large amount of data. A large amount of data, on the other hand, does not necessarily mean high variability, and mistaking one for the other would be fatal. Ten pictures of brown cats do not help to identify a black cat. Here we would speak of a sampling bias.
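
The dog and cat example can be reproduced in a deliberately simplified form. In this sketch each image is reduced to two made-up numbers (fur darkness and relative stature), and a 1-nearest-neighbour rule stands in for the ML algorithm. Both the feature values and the choice of classifier are assumptions for illustration only.

```python
import math


def nearest_label(sample, training_set):
    """1-nearest-neighbour: return the label of the closest training example."""
    closest = min(training_set, key=lambda item: math.dist(sample, item[0]))
    return closest[1]


# Hypothetical features: (fur darkness from 0 = white to 1 = black, relative stature)
training_set = [
    ((0.0, 0.60), "dog"),  # white dog
    ((0.5, 0.60), "dog"),  # brown dog
    ((1.0, 0.60), "dog"),  # black dog
    ((0.0, 0.30), "cat"),  # white cat
    ((0.5, 0.30), "cat"),  # brown cat
    ((0.6, 0.30), "cat"),  # mackerel cat
]
black_cat = (1.0, 0.35)

# Without a black cat in the training data, the black dog is the closest example
biased_prediction = nearest_label(black_cat, training_set)

# Adding a single black cat closes the gap in the training data
extended_set = training_set + [((1.0, 0.30), "cat")]
improved_prediction = nearest_label(black_cat, extended_set)

print(biased_prediction, improved_prediction)
```

The model itself is unchanged between the two predictions. Only the coverage of the training data differs.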

It is equally important that the training data set does not contain cases that cannot occur at all. Say, for example, a mistake was made in the naming of the image files and a picture of a cat is thereby assigned to the category "dog". The picture of the cat would obviously not be representative of a dog. If you use a non-representative training data set, the trained model will also not be able to make accurate predictions.

When training the model, care should be taken to use only features as predictors that have actual utility. If the training data are characterised by a high number of irrelevant features, your model will produce significantly worse results. The motto is: if you put rubbish in, rubbish will come out. To stay with our dog-cat example, you could, for example, look at the metadata of the image files. Here, among other things, the exact time and date when the picture was taken is saved for each picture. This information is clearly of no use in distinguishing between dogs and cats and could even lead to a reduction in the performance of the algorithm. It is therefore important that you identify all relevant features before the training process and include these, then eliminate all irrelevant features in the next step.
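
One simple way to screen features is to check how strongly each one is related to the target. The sketch below uses the absolute Pearson correlation with an arbitrary threshold of 0.3. The feature names, values and the threshold are hypothetical, and this kind of filter only detects linear relationships, so treat it as a first check and not as a complete feature selection method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: label 0 = cat, 1 = dog
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "body_mass_kg": [4.0, 5.0, 3.5, 4.5, 20.0, 25.0, 18.0, 30.0],
    "hour_taken": [9, 15, 12, 18, 12, 9, 18, 15],  # image metadata
}

threshold = 0.3  # arbitrary cut-off chosen for this sketch
scores = {name: abs(pearson(values, labels)) for name, values in features.items()}
selected = [name for name, score in scores.items() if score >= threshold]
print(scores, selected)
```

In this toy data set the time of day carries no information about the animal, so only the body mass survives the screening.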

After you have recorded enough training data with a high variability, and then separated relevant features from irrelevant features, the quality of the data should be checked, as with any data analysis in biomechanics. Are there any outliers that should be filtered out? Are there parts of the data missing, where gaps can (and must) be filled?
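
Both checks can be sketched in a few lines. The example below fills gaps by linear interpolation and flags outliers with a modified z-score based on the median absolute deviation, using the commonly quoted constants 0.6745 and 3.5. The signal values are invented, and whether interpolation and this particular outlier rule are appropriate depends on your data.

```python
import statistics


def fill_gaps(signal):
    """Linearly interpolate interior runs of None between valid samples."""
    filled = list(signal)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def flag_outliers(signal, threshold=3.5):
    """Flag samples whose modified z-score (median absolute deviation) exceeds the threshold."""
    median = statistics.median(signal)
    mad = statistics.median([abs(v - median) for v in signal])
    if mad == 0:
        return [False] * len(signal)
    return [0.6745 * abs(v - median) / mad > threshold for v in signal]


# Hypothetical marker height in mm with two missing frames and one spike
raw = [10.0, 12.0, None, None, 18.0, 20.0, 95.0, 24.0]

filled = fill_gaps(raw)
flags = flag_outliers(filled)
print(filled)
print(flags)
```

Gaps are filled first here because the outlier check cannot handle missing values. What to do with a flagged sample (remove, replace or keep it) remains a decision for the analyst.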

The model should produce the most accurate results possible, and therefore the quality of the data is of high importance.

Not all machine learning is the same. There are many different methods that are used for different applications. It is important to select the most suitable one and then optimise it. Each individual model in turn has some hyperparameters that must be configured optimally. There is no universal method for this, and a more detailed description would go beyond the scope of this article. But don't worry, we are working on a specific article!

There are also a few points to consider in the training process to get optimal results. For a successful training process, the aim is not to push the error between ML output and target values on the training data as low as possible, but to find the level of fit that generalises best. It is important to find a good tradeoff between underfitting and overfitting the model. Underfitting occurs when the underlying model is not able to represent the complexity of the relationships. If this is the case, an ML algorithm with a higher complexity must be chosen. Overfitting is the exact opposite of this and occurs much more frequently in practice. Here, the learning algorithm fits the model outputs so closely to the target values that any noise within the training data set is also stored and mapped (Jabbar & Khan, 2014).
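
The two extremes can be made visible with a k-nearest-neighbour regression on synthetic data (a noisy sine wave, an assumption for this sketch). With k = 1 the model memorises every training point including its noise; with k equal to the number of training points it always predicts the overall mean and cannot follow the curve at all.

```python
import math
import random


def knn_predict(x, train_x, train_y, k):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return sum(train_y[i] for i in order[:k]) / k


def mean_squared_error(xs, ys, train_x, train_y, k):
    """Mean squared difference between k-NN predictions and the given targets."""
    return sum((knn_predict(x, train_x, train_y, k) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(1)
# Synthetic data (an assumption): one sine period plus Gaussian noise
train_x = [i / 40 for i in range(40)]
train_y = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in train_x]
val_x = [(i + 0.5) / 40 for i in range(40)]
val_y = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in val_x]

for k in (1, 5, 40):
    train_error = mean_squared_error(train_x, train_y, train_x, train_y, k)
    val_error = mean_squared_error(val_x, val_y, train_x, train_y, k)
    print(f"k={k:2d}  training error={train_error:.3f}  validation error={val_error:.3f}")
```

A training error of exactly zero for k = 1 is not a sign of a good model. It only shows that the noise has been stored as well. The validation error on data the model has not seen is the more honest measure, and it is typically lowest somewhere between the two extremes.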

The main task in training new ML algorithms can be summarised as (i) creating a model that corresponds in complexity to the real problem and (ii) training this model to optimise its internal structures.

For (i), the principle of 'Occam's Razor' can be applied. This principle states that simple solutions to a problem should be preferred to more complex ones (Sober, 2015). For ML algorithms, this principle suggests that the lowest model complexity still capable of performing the task leads to high generalisation performance without overfitting. Furthermore, a reduction in model complexity leads to a reduction in computation time (Schmidhuber, 2015).

Point (ii) imposes requirements on the training data to be used and the training process itself. For the training data, the sheer amount of data points and their variability are of fundamental importance. Smaller datasets are more prone to overfitting as they may not capture the full complexity of the real-world problem and the model can easily adapt to their noise and outliers (Jabbar & Khan, 2014). However, even with a large dataset, care must be taken to avoid overfitting by stopping the training process at the right point.

For this, one can divide the available data into two or three sub-datasets:

  1. Training set: this is the largest part and is used to reduce the error between outputs and targets.
  2. Validation set: used for cross-validation of the model during the training process. This is separate from the training set but is used within the training process.
  3. Test set: this part of the data is not used for the training itself but is only used to evaluate the performance of the model.
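
A split into these three parts could be sketched as follows. The 70/15/15 ratio, the function name `split_dataset` and the fixed random seed are assumptions; other ratios are just as common.

```python
import random


def split_dataset(samples, train_fraction=0.7, validation_fraction=0.15, seed=0):
    """Shuffle a copy of the samples and cut it into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    n_val = round(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


# Hypothetical data set of 100 recorded trials, identified by their index
trials = list(range(100))
train_set, validation_set, test_set = split_dataset(trials)
print(len(train_set), len(validation_set), len(test_set))
```

If you record several trials per participant, it is usually safer to split by participant and not by trial. Otherwise data from the same person can end up in both the training and the test set, which makes the model look better than it is.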

Using the three subsets of data, the early stopping regularisation technique is now applied. Usually, both the validation error and the training error will decrease in the early stages of training. The beginning of overfitting can be detected when the validation error starts to increase while the training error continues to decrease. Training is stopped as soon as a significant drop in generalisation performance is detected over a predefined number of epochs. The training algorithm then returns the model parameters with the best performance up to that point (Jabbar & Khan, 2014).
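
The stopping rule itself is short enough to sketch. The function below only looks at a list of validation errors, one per epoch; in a real training loop you would compute each value after the corresponding epoch and store the model parameters whenever a new best value appears. The name `early_stopping`, the `patience` parameter and the error values are assumptions for illustration.

```python
def early_stopping(validation_errors, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation errors."""
    best_epoch, best_error = 0, float("inf")
    epochs_without_improvement = 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            # New best model: remember the epoch and reset the counter
            best_epoch, best_error = epoch, error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # The list ended before the patience was used up
    return best_epoch, len(validation_errors) - 1


# Hypothetical validation errors per epoch
validation_errors = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7, 0.4]
best_epoch, stop_epoch = early_stopping(validation_errors, patience=3)
print(best_epoch, stop_epoch)
```

With a patience of three epochs, training stops at epoch 6 and the parameters from epoch 3 are returned. The lower error at epoch 7 is never reached, which shows the price of a small patience value: it is a setting you have to tune like any other.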


5. Summary

ML algorithms have a wide range of applications and score points due to their fast computation times, their non-linearity as well as their adaptability to real and complex problems. And it is precisely for these reasons that they have already found some applications in biomechanics. But of course, these advantages do not come without challenges and drawbacks: You need a large amount of training data with a high variability that describes the real problem as well as possible. You also need to choose the right model to train, configure the parameters optimally and design the training process in such a way that there is no under- or overfitting. Often there are no exact guidelines that you can follow, so you have to use the trial-and-error method until you develop a feeling for the best settings for your project.

But all of this can be mastered, and then machine learning definitely has the potential to significantly increase the objectivity of decision-making in biomechanics and solve complex problems quickly in the future.


Ahmadi, A., Mitchell, E., Destelle, F., Gowing, M., O’Connor, N.E., Richter, C., Moran, K., 2014. Automatic Activity Classification and Movement Assessment During a Sports Training Session Using Wearable Inertial Sensors. In 2014 11th International Conference on Wearable and Implantable Body Sensor Networks. IEEE, pp. 98–103. DOI: 10.1109/BSN.2014.29

Basheer, I.A., Hajmeer, M., 2000. Artificial neural networks: fundamentals, computing, design, and application. Journal of Microbiological Methods 43, 3–31.

Halilaj, E., Rajagopal, A., Fiterau, M., Hicks, J.L., Hastie, T.J., Delp, S.L., 2018. Machine learning in human movement biomechanics: Best practices, common pitfalls, and new opportunities. Journal of biomechanics 81, 1–11.

Hsieh, H.-J., Lu, T.-W., Chen, S.-C., Chang, C.-M., Hung, C., 2011. A new device for in situ static and dynamic calibration of force platforms. Gait & posture 33, 701–705.

Jabbar, H.K., Khan, R.Z., 2014. Methods to Avoid Over-Fitting and Under-Fitting in Supervised Machine Learning (Comparative Study). In Computer Science, Communication and Instrumentation Devices. Research Publishing Services, Singapore, pp. 163–172. DOI: 10.3850/978-981-09-5247-1_017

Jamil, F., Iqbal, N., Ahmad, S., Kim, D.-H., 2020. Toward Accurate Position Estimation Using Learning to Prediction Algorithm in Indoor Navigation. Sensors (Basel, Switzerland) 20.

Kipp, K., Giordanelli, M., Geiser, C., 2018. Predicting net joint moments during a weightlifting exercise with a neural network model. Journal of biomechanics 74, 225–229. DOI: 10.1016/j.jbiomech.2018.04.021

Mundt, M., David, S., Koeppe, A., Bamer, F., Markert, B., Potthast, W., 2019. Intelligent prediction of kinetic parameters during cutting manoeuvres. Medical & biological engineering & computing 57, 1833–1841. DOI: 10.1007/s11517-019-02000-2

Richter, C., King, E., Strike, S., Franklyn-Miller, A., 2019. Objective classification and scoring of movement deficiencies in patients with anterior cruciate ligament reconstruction. PloS one 14.

Richter, C., O’Reilly, M., Delahunt, E., 2021. Machine learning in sports science: challenges and opportunities. Sports biomechanics, 1–7.

Ritheesh Baradwaj Yellenki, 2020. Top 8 Challenges for Machine Learning Practitioners. The major challenges one needs to overcome while developing a machine learning application.

Rommers, N., Rössler, R., Verhagen, E., Vandecasteele, F., Verstockt, S., Vaeyens, R., Lenoir, M., D’Hondt, E., Witvrouw, E., 2020. A Machine Learning Approach to Assess Injury Risk in Elite Youth Football Players. Medicine and science in sports and exercise 52, 1745–1751. DOI: 10.1249/MSS.0000000000002305

Stephen, J., Rohil, H., Vasavi, S. (Eds.), 2014. Computer Science, Communication and Instrumentation Devices. Research Publishing Services, Singapore.

Weir, G., Alderson, J., Smailes, N., Elliott, B., Donnelly, C., 2019. A Reliable Video-based ACL Injury Screening Tool for Female Team Sport Athletes. International journal of sports medicine 40, 191–199. DOI: 10.1055/a-0756-9659
