Neural networks and skill acquisition

During my original chemistry degree, I encountered neural networks through a specialism in computational chemistry. This was way back in 1991.

The field of artificial intelligence has evolved quite a lot since then, but the underlying concept of an algorithm that becomes increasingly accurate in making decisions and predictions over time is still valid. And in recent times I’ve been thinking about how similar this model is (or isn’t) to how humans develop, learn and acquire skills.

Essentially, in really simple AI terms, you try to identify a combination of inputs that relates to a defined output. In this way you create a model that approximates your chosen real-world scenario and can predict what output you would get from a given set of inputs.

In my case we were using attributes of various molecules (and parts of molecules) to evaluate how good a compound might be as a prospective anti-cancer drug. The idea was that we could “screen” lots of possible molecules, and model their probable efficacy without the lengthy and expensive process of creating and testing those drugs for real.

One of the big challenges when doing this work is trying to identify what inputs you should use. The science can help identify what attributes might be applicable. Back then, we just used inputs that we thought should have the most impact. Modern AI techniques have started to allow the algorithm to pick its own inputs. But the truth is that you still constrain the system by what observational systems you give it.

And so on to my thoughts about how you apply this stuff to sporting development. Surely, I found myself thinking, players develop talent in a sport by being exposed to a range of inputs (technical, tactical, mental, physical) and then creating a model in themselves that suggests the right thing to do? Perception-action coupling is the phrase used within coaching.

So, imagine I wanted to create the perfect hockey player. Or at least the perfect model for developing the best hockey player from any given individual.

What combination of attributes might be desirable? Spend more time on technical skills? Spatial awareness? Strength and conditioning? Mental resilience and skills? What even are the right “inputs” for a hockey player?

Their genetics are going to provide some limitations (and additional options?) within any model. If this was a neural network model, you’d fix these inputs (or at least constrain them to minimum and maximum values).

So we immediately start to encounter the challenge of translating this analogy of artificial intelligence to “wet systems” skill acquisition.

The neural network model is quite simple. Ideally you give it a load of training data (known inputs that give known outputs) and let it create an equation that describes the relationship you’re interested in. This looks something like a topographical map, with hills and valleys. The peaks describe the best outcomes, and the valleys the worst.

The algorithm learns the relationship by working out if a slight variation in the inputs would give a better output or a worse one. The best analogy for this is a little explorer on your topographical map. If they take a step in any direction, does it lead them uphill or downhill? If downhill, you’re going the wrong way. If uphill then keep going.

When your explorer finds that every step they take is downhill, they’ve effectively reached the top. They have identified the best combination of inputs for that output. Except they might not have done.
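As a sketch, here’s that explorer in a few lines of Python. The landscape function is entirely made up for illustration, and this is simple random hill climbing rather than the gradient descent a real neural network uses, but the “take a step and check” logic is the same:

```python
import random

# A hypothetical "fitness landscape": an invented function of two inputs,
# standing in for how good an outcome is. Higher is better; the single
# peak sits at x=2, y=-1, where the height is 0.
def height(x, y):
    return -(x - 2) ** 2 - (y + 1) ** 2

def hill_climb(x, y, step=0.1, iterations=1000):
    """Try a small step in a random direction; keep it only if it leads uphill."""
    best = height(x, y)
    for _ in range(iterations):
        nx = x + random.uniform(-step, step)
        ny = y + random.uniform(-step, step)
        h = height(nx, ny)
        if h > best:  # uphill: keep going from the new position
            x, y, best = nx, ny, h
    return x, y, best

# Drop the explorer at (0, 0) and let them wander uphill.
x, y, peak = hill_climb(0.0, 0.0)
```

On this one-peak landscape the explorer reliably ends up near the summit. The next paragraphs explain why that stops being true as soon as there is more than one hill.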

Extend the analogy to something like the Lake District. When you climb a hill there, it’s easy to look around and see that, although you climbed to the top of this hill, there are taller hills you can see all around you. This is known as the “local maximum problem”.

AI developers attack this problem (or used to at least) by parachuting lots of explorers into their hills and valleys at lots of different starting points. These are called “seeds” and are usually generated randomly. Again, the idea is to give lots of opportunity to genuinely find the top of the highest peak (the best combination of inputs).

Each explorer can also have the size of their step defined. Ones with shorter steps are more sensitive to the gradient immediately around them and more accurately identify the correct direction to travel in. But they are more likely to be stuck in a local maximum.
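Sticking with the toy sketch (the two-peak landscape below is again invented for illustration), seeding looks something like this: parachute in many explorers at random starting points and keep the best peak any of them finds. The `step` parameter is each explorer’s stride from the previous paragraph.

```python
import random

# An invented landscape with two peaks: a tall one at x=10 (height 5)
# and a shorter local maximum at x=-3 (height 2).
def height(x):
    return max(5 - (x - 10) ** 2, 2 - (x + 3) ** 2)

def hill_climb(x, step, iterations=500):
    """One explorer: step left or right at random, keep only uphill moves."""
    for _ in range(iterations):
        candidate = x + random.choice([-step, step])
        if height(candidate) > height(x):
            x = candidate
    return x

# Many "seeds": random starting points scattered across the landscape.
# A lone explorer dropped near x=-3 would stop on the small hill;
# with enough seeds, someone lands in the tall peak's basin.
seeds = [random.uniform(-20, 20) for _ in range(25)]
best = max((hill_climb(s, step=0.05) for s in seeds), key=height)
```

With 25 seeds, at least one explorer almost always lands close enough to the tall peak to climb it, so `best` finishes near x=10 rather than on the smaller hill.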

People are very different to our theoretical explorers. First up, even in a relatively rules-based thing like sport, the correct inputs are hard to know. Not to mention the fact that you can’t stop other inputs from mixing with the ones you’re interested in.

You can’t stop your prospective genius-level athlete from watching a movie or meeting a new friend. Controlling inputs is really hard, but also may mean that you miss out on an ingredient that is really important without realising how much difference it makes.

So, even assuming you have a good idea of what the right inputs might be, you then have to deal with those genetic limitations we mentioned earlier. You may not even know how fixed those limitations are for a given athlete, or how well they will respond to training interventions.

And that leads us to our second problem. You only have one copy of your athlete. You can’t clone them and create lots of seeds to avoid the local maximum problem.

But what you can do is to encourage them to take big, wild steps if you really want to find the highest peak. The analogy starts to break down a bit here, because your athlete also happens to be human (presumably) and will naturally take steps of varying sizes. You can encourage them to be bold and try new things, but it may not work, and on any given day they’re likely to behave differently anyway (pesky emotional variables).

You also run two very significant risks. If your bold strategy leads to a lack of progress then you may alienate your athlete, leading to a loss of motivation and disengagement. And even if you don’t lose the person, you might actually run out of time. Biological systems (in this case the kids/athletes you’re working with) age. You can’t spend all your time trying to find the biggest mountain and then leave no time to actually climb it.

So how are modern coaching methods related to this? “Let the game be the teacher” is a nice example of where the coach isn’t necessarily trying to control all the inputs. Scenarios are created and outcomes encouraged. This is a bit like the training data for a neural network. This “ecological dynamics” approach seeks to guide the athlete in their skill acquisition, rather than dictate it.

We create environments (which still have our assumptions of what “good” looks like baked into them), and we encourage the players to work out their own solutions and identify for themselves what the “right” answer might be. This is a massive step forward from older, dictatorial, models of athlete development.

We also encourage these scenarios to have large numbers of opportunities to learn, making them more intense and enjoyable for the player. For example, in hockey, an exercise might have a smaller number of players and a smaller space to play in, encouraging lots of touches of the ball. This is a bit like giving the neural network a smaller step size (or at least a large number of possible steps). Players can try something different, see if it works and then evaluate what to do next.

Where are our new coaching methods at risk of letting us down? Well, a couple of observations about using small-sided games in particular first of all.

A game of hockey is 70 minutes long and contested by two teams of 11 players on the pitch. Let’s make some assumptions. Firstly, let’s assume that there are no substitutes, and that the 10 outfield players on each team all touch the ball an equal amount. Let’s also assume that both teams have the ball about the same amount of time in the game.

70 minutes between 20 players means an average of 3.5 minutes with the ball each. And that assumes that the ball is in play the whole time. Now let’s be really generous and say you get twice as much time on the ball as the average. 7 minutes. In a 70-minute game. That means you have to be effective without the ball for 90% of the game at the very least. Are our coaching methods (and practice design) taking that into account?
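The arithmetic above is simple enough to sanity-check in a few lines:

```python
# Ball-time arithmetic under the assumptions above: no substitutes,
# equal possession between teams, equal touches across outfield players.
match_minutes = 70
players_sharing_the_ball = 20  # 10 outfield players per team

average_ball_time = match_minutes / players_sharing_the_ball  # 3.5 minutes
generous_ball_time = 2 * average_ball_time                    # 7 minutes

# Fraction of the match spent without the ball, even for the generous case.
time_without_ball = 1 - generous_ball_time / match_minutes    # 0.9, i.e. 90%
```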

My second observation is that by using a diet of nothing but small-sided games (SSGs), we risk players being less able to operate on the larger scale and with the greater spaces involved. Passing technique over longer distances can suffer. It’s not to say that we can’t adapt to this as coaches, but we need to be aware of the issues we create by training our players in this way.

My final thought about SSGs and these techniques is less fully formed. I have a feeling that we create some problems when we break games down into these objective-based mini sessions. Assumptions about how to operate in each smaller chunk add up and potentially create gaps when you put it all back together.

What I would advocate is that coaches be aware of what SSGs do, and make sure you provide opportunities to play larger, more expansive scenarios too. And don’t neglect that 90% of time when your player won’t have the ball.

So let’s get back on track with this post. If we know players can’t exactly be taught like a neural network would be, and we see modern coaching methods trying to deliver coaching that addresses some of these issues anyway, what’s the problem? And what could I be doing differently as a coach?

Ultimately players are like neural networks (at least as an approximation). We learn by trial and error, hoping to identify the right inputs and manufacturing a decision-making engine that will yield the right outputs at any given time. Neural networks have evolved to try and address the input problem. We as coaches have started to do this too.

They use seeds to avoid getting caught in a local maximum. I’m not sure that we have worked out a good strategy for this as coaches. One technique is to try and keep ourselves open to new learning, and at least recognise that there might be other peaks around us. Not all of those peaks will be accessible to all of our athletes. I’m happy to walk up a hill in the Lake District, but I’m not scaling Mount Everest.

The local maximum might be the right solution for the player you’re working with. The art is in evaluating (and re-evaluating!) what that best outcome might be. Physical and mental skills develop in a non-linear way. Make sure you stay open to the idea that the absolute potential of your player may change over time (for both good and bad). You can only do this if you get to know your players properly.

So although I’m not advocating you experiment on your players, there’s definitely the opportunity to understand that they will learn and develop at their own rate, to their own local maximum and with their own (completely unknowable) set of inputs.

The job of the coach is perhaps to help them understand what a great outcome is for them (which hills are good for them) and help them evaluate if they’re making steps towards it. Maybe our purpose is just to help them work out if the step they just took was leading them uphill or downhill.