A Deep Dive into Analyzing Swimming Data


This past summer, my daughter joined a swim team. Her team is one of several in an area league. All summer long, developmental and competitive meets are organized in which teams compete one-on-one. All kids on a team swim in developmental meets. For competitive meets, the top 3 or 4 kids in each age group are chosen. Meets are held in community pools that are either 25 meters or 25 yards in length. The team schedules are packed with practices and events, which are intense but thoroughly enjoyable.

This is one sport where progress can be measured on an almost daily basis. While most young kids take it up as a hobby, older kids seem to be very competitive and quite mindful of their swim times. Besides her own results this year, I was curious to know how my daughter did compared to a larger cohort of kids her own age. And what could we expect if she keeps at it in the years to come?

Luckily, I found the past several years of her team’s results to analyze. Gathering this data, cleaning it, and transforming it for analysis was a huge challenge unto itself, and not something I intend to discuss in this post. But I digress.

Let’s take a look at the number of kids by age and gender in this dataset.

The pyramid plot above shows the number of competitors from ages 6 to 18 throughout the years. Seems like 6 to 8 are the most popular ages for kids to begin competitive swimming. There are some late joiners by age 10, after which we see a steady decline in the number of kids competing.

Let’s take a look at the ages when swimmers compete in each stroke.

We can observe a few things here:

  • Everyone starts with Freestyle swimming. The participation rate is close to 100% for both genders. But towards the late teens, some boys seem to give up Freestyle to specialize in other strokes.

  • Girls seem to learn Backstroke a little faster than boys, and a wide majority of girls keep swimming Backstroke until their late teens.

  • Girls catch up with learning Breaststroke much more quickly than boys, and a majority of them keep at it until their late teens, whereas some boys choose to specialize in either Backstroke or Butterfly.

  • Learning Butterfly appears to be a much bigger challenge for both genders. Relatively fewer kids compete in it early on. Boys in their early teens appear to participate at a faster clip than girls. But towards the late teens, girls appear to participate more consistently.
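To make "participation rate" concrete, here is a minimal Python sketch (not the code behind the plots). It assumes the rate is the share of unique swimmers in an age and gender group who raced a given stroke at least once. The record layout, the swimmer ids and the function name `participation_rates` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (swimmer_id, age, gender, stroke). Values are invented.
results = [
    ("s1", 7, "F", "Freestyle"),
    ("s1", 7, "F", "Butterfly"),
    ("s2", 7, "F", "Freestyle"),
    ("s3", 7, "M", "Freestyle"),
    ("s3", 7, "M", "Backstroke"),
    ("s4", 7, "M", "Freestyle"),
]

def participation_rates(rows):
    """Share of unique swimmers per (age, gender) who raced each stroke.

    Assumed definition: swimmers who raced the stroke at least once,
    divided by all unique swimmers in the same age and gender group.
    """
    everyone = defaultdict(set)   # (age, gender) -> swimmer ids
    by_stroke = defaultdict(set)  # (age, gender, stroke) -> swimmer ids
    for swimmer, age, gender, stroke in rows:
        everyone[(age, gender)].add(swimmer)
        by_stroke[(age, gender, stroke)].add(swimmer)
    return {
        key: len(swimmers) / len(everyone[key[:2]])
        for key, swimmers in by_stroke.items()
    }

rates = participation_rates(results)
print(rates[(7, "F", "Butterfly")])  # 0.5: one of the two 7-year-old girls
```

Counting unique swimmers rather than races keeps a kid who swims Freestyle at every meet from inflating the rate.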

Steep Learning Curve

There’s a steep learning curve in swimming. Each stroke has its own level of difficulty and challenges but once the basics are learned, rapid progress could be made. However, there’s a huge difference between knowing how to swim a stroke, and swimming it well. It requires both speed and precision.

Swimming also happens to be one of the most unforgiving sports. In other sports, minor violations result in fouls or penalties; disqualifications or ejections are rare and usually result from violent or reckless behavior. But in swimming, the moment a competitor violates a rule, it results in disqualification. Even in summer leagues, the swim officials are trained and expected to follow USA Swimming standards. So a 6-year-old is judged the same way as an adult at USA Swimming events.

To get an idea, here are some ways of getting disqualified:

  • Freestyle: As the name suggests, everything is legal except the following:
    • Walking on the bottom of the pool.
    • Stopping and pushing off of the bottom of the pool.
    • Pulling on a lane line for an assist.
    • Not touching the wall before turning or at finish.
  • Backstroke
    • Not swimming on back off the wall.
    • Delaying initiating a turn.
    • Pulling on a lane line for an assist.
    • Not touching the wall before turning or at finish.
  • Breaststroke
    • Pulling hands beyond hips.
    • Non-simultaneous or single hand touch at the turn or finish.
    • Doing alternating kicks (as in Freestyle) or dolphin kicks (as in Butterfly).
    • Not being on breast after leaving the wall.
  • Butterfly
    • Arms underwater during the recovery phase of stroke.
    • Non-simultaneous arm movements during strokes.
    • Non-simultaneous or single hand touch at the turn or finish.
    • Doing alternating kicks (as in Freestyle) or Breaststroke kicks.

From what I observed throughout the season, in a majority of cases, rule violations are not deliberate. They happen inadvertently.

Let’s take a look at the total number of disqualifications by age, gender and stroke in the dataset:

The absolute numbers of DQs are a rough proxy for the level of difficulty of each stroke. Most kids learn the strokes in this order, from Freestyle to Butterfly. However, as seen in the previous plot, the level of participation in Butterfly is far lower than in the other strokes. For instance, the level of participation in Freestyle and Backstroke is consistently at ~80% or above. About 80% of 7-year-olds attempt to swim Breaststroke, whereas only ~33% of 7-year-old boys and ~48% of 7-year-old girls attempt Butterfly. Hence the absolute numbers of disqualifications largely reflect the number of unique competitors at each age.

A better way to look at this is to compute the number of DQs per person, as shown below.
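A minimal Python sketch of that normalization, assuming "DQs per person" means total disqualifications divided by the number of unique swimmers in each age, gender and stroke group. The records and the name `dqs_per_person` are hypothetical, and this is not the code used for the plots.

```python
from collections import defaultdict

# Hypothetical race records: (swimmer_id, age, gender, stroke, disqualified)
races = [
    ("s1", 6, "M", "Breaststroke", True),
    ("s1", 6, "M", "Breaststroke", True),
    ("s2", 6, "M", "Breaststroke", True),
    ("s2", 6, "M", "Breaststroke", False),
    ("s3", 6, "F", "Breaststroke", True),
    ("s4", 6, "F", "Breaststroke", False),
]

def dqs_per_person(rows):
    """Total DQs divided by unique swimmers, per (age, gender, stroke)."""
    dq_count = defaultdict(int)
    swimmers = defaultdict(set)
    for swimmer, age, gender, stroke, disqualified in rows:
        key = (age, gender, stroke)
        swimmers[key].add(swimmer)  # every entrant counts in the denominator
        if disqualified:
            dq_count[key] += 1
    return {key: dq_count[key] / len(ids) for key, ids in swimmers.items()}

per_person = dqs_per_person(races)
print(per_person[(6, "M", "Breaststroke")])  # 1.5: three DQs across two boys
print(per_person[(6, "F", "Breaststroke")])  # 0.5: one DQ across two girls
```

A value above 1 means the average swimmer in that group was disqualified more than once.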

Here, we can see the number of disqualifications per person decrease as the kids get better with age. In Freestyle, Backstroke and Breaststroke races, kids up to 8 years old swim only 1 lap. In Butterfly, kids up to 10 years old swim only 1 lap. In all other age brackets, kids swim 2 laps, which means they turn after touching the opposite end of the pool. Learning to execute a legal turn without compromising pace is both an art and a science. We can see its challenges in Backstroke at age 9 and in Butterfly at age 11, when the kids start swimming 2 laps in a race. The number of disqualifications per person, which had been falling through the last 1-lap year, rises again at those ages.

Consistently swimming a legal Backstroke appears to remain a challenge until the mid-teens, which points to failures in executing legal turns. Backstroke could be thought of as swimming Freestyle on the back. Arm and leg movements have no constraints as long as the body is on the back. But the biggest challenge is that swimmers cannot see the wall in the direction they are swimming. Their eyes are always facing up. They train to spot a row of flags hung 5 meters from the wall over the pool and count the number of strokes they need to reach the wall from that point. While some inexperienced swimmers touch the wall with their hands and push off again on the back, it compromises their speed. The quickest way to turn is to flip onto the belly while approaching the wall and push off against it with the feet so that the body regains its position on the back. This needs to be done in one smooth series of motions. As soon as the body flips onto the belly, only a single or double hand pull is allowed to initiate the turn. If the swimmer takes multiple strokes, they’re disqualified. If the swimmer turns too soon, they are likely to miss the wall, and because no sculling is allowed to reach it, that again means disqualification. We can see these mistakes happen well into the teenage years.

For every boy competing in Breaststroke at age 6, there has been more than 1 disqualification, which means that, on average, a 6-year-old boy is disqualified more than once in a season. Their level of participation is at 40%. It shows how challenging it is to learn Breaststroke. Six-year-old girls appear to do a bit better. The good thing is that kids are persistent in learning Breaststroke from an early age. So not only does the level of participation increase with age, but the number of disqualifications per person also decreases dramatically. But again, at age 9 when the kids start swimming 2 laps, the rate of improvement slows down. Beyond the Breaststroke technique itself, kids have to remember to touch the wall with both hands simultaneously and then turn back onto their breast. Most kids become very good with practice, but even a slight lapse of concentration results in disqualification.

Similarly, Butterfly is a bigger challenge in the early years. With every stroke, the arms have to break the surface of the water during recovery. Many young swimmers lack the upper body strength to do it consistently. As the kids grow, their strength improves and it becomes easier. Once again, we see a surge of DQs at age 11, when the kids start swimming 2 laps. As with Breaststroke, kids have to remember to touch the wall with both hands simultaneously and then turn back without doing any flutter or alternating kicks.

Overall, girls seem to improve more consistently than boys.

Swim Times

Now, here is what I was most curious about before doing this analysis. Shown below are the distributions of swim times by age, gender and stroke. I have excluded the races in yards to focus only on the races in meters, which is the international standard.

To provide a reference point, I have added the current 50 Short Course Meters world records for each stroke and gender. As expected, the distributions of swim times are heavily skewed towards the right. In the plots, I have truncated the outliers exceeding 80 seconds.
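The two filters mentioned so far (meters only, and an 80-second cutoff for plotting) can be sketched in a few lines of Python. The record layout, the `"m"`/`"yd"` unit labels and the name `times_for_plot` are assumptions for illustration, not the code used for this post.

```python
# Hypothetical records: (time_in_seconds, course_unit), with "m" or "yd".
swims = [
    (35.2, "m"),
    (95.0, "m"),   # slower than the cutoff, left out of the plot
    (33.0, "yd"),  # yards race, excluded from the analysis
    (80.0, "m"),
]

PLOT_CUTOFF_SECONDS = 80.0  # the truncation threshold used in the plots

def times_for_plot(rows, cutoff=PLOT_CUTOFF_SECONDS):
    """Keep meter races only and drop times exceeding the plotting cutoff."""
    return [t for t, unit in rows if unit == "m" and t <= cutoff]

plot_times = times_for_plot(swims)
print(plot_times)  # [35.2, 80.0]
```

The cutoff only trims what is drawn. Whether slow times should also be left out of summary statistics is a separate decision.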

On average, we see consistent improvement with age across the board. In nearly every plot, we see multi-modal distributions with multiple peaks that separate the best swimmers from the merely good and the rest of the pack. As the kids progress to swimming 50 meters, i.e. 2 laps of a 25m pool, the distributions once again become wider, with heavier skews and fatter right tails. Both the difference in swim speed and the time taken to turn contribute to the large variance in swim times.

The variance reduces in later years and the distributions tend to become more normal. I reckon this is due to two reasons:

  1. Kids get stronger with age and better with practice.

  2. Survivorship bias comes into play. As we’ve seen in the first plot, the number of competitors decreases with age. Basically, the kids who have consistently done well are the ones who continue to swim beyond the early years. There might be some kids who swim just for fun or to hang out with their friends. But chances are, those who cannot compete effectively in any stroke will drop out.

Nevertheless, we see higher variance in strokes that are more technical, i.e. Backstroke, Breaststroke and Butterfly, as compared to Freestyle.

Mean Times

Looking at the plots of distributions, we can observe a very interesting thing as the kids grow older. Note that all the plots are drawn to the same scale. But the distributions of the boys’ swim times shift further to the left than those of the girls. It is evident that on average, boys continue to improve well into the late teens, whereas the rate of improvement of girls slows down. Why does that happen?

Let’s see the mean times to swim 50 Short Course Meters by age.
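For readers who want to reproduce this kind of summary, here is a minimal Python sketch that groups swim times by age and gender and averages them. The sample times are invented and the name `mean_times` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical 50m records: (age, gender, time_in_seconds). Values are invented.
times = [
    (12, "F", 40.0), (12, "F", 44.0),
    (12, "M", 41.0), (12, "M", 43.0),
    (15, "F", 36.0), (15, "F", 38.0),
    (15, "M", 32.0), (15, "M", 34.0),
]

def mean_times(rows):
    """Average swim time for each (age, gender) group."""
    grouped = defaultdict(list)
    for age, gender, seconds in rows:
        grouped[(age, gender)].append(seconds)
    return {key: mean(values) for key, values in grouped.items()}

averages = mean_times(times)
for (age, gender), avg in sorted(averages.items()):
    print(age, gender, avg)
```

Because the distributions are right-skewed, a median (`statistics.median`) would be a reasonable alternative if a few very slow swims pull the mean up.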

What’s really interesting is that on average, boys and girls are doing just about the same until age 12. Thereafter, boys start pulling away. Why? Do teenage boys work harder than teenage girls to improve? Hardly the case. On the contrary, we have seen evidence that girls learn and improve more consistently than boys. So what’s happening here? I am no expert in biology, but this is about the age where puberty begins. Among other things, boys tend to grow taller and more muscular than girls at this stage. No wonder they start swimming faster than girls! This might be among the few instances where the impact of puberty on both sexes can be measurably observed.

In both genders, the trajectory of improvement is non-linear with age. Kids show dramatic improvement until they reach their teens. Thereafter, the rate of improvement slows down. There’s a physical limit to how fast a human could swim, so we could expect the curves to reach their minima at some point during the 20s - 30s when the swimmers reach the pinnacle of physical strength and performance. When the swimmers are past their prime, we could expect them to start slowing down.

USA Swimming has race data for swimmers ranging from ages 5 to 50. I sampled data for 50 Short Course Meters Freestyle races to explore.

Here are the curves drawn using USA Swimming data:

The USA Swimming sample data produces a similar set of curves. The effects of puberty on average swim times are exhibited in this dataset too. Boys and girls have nearly the same average swim times until age 12, but from age 13 onwards, boys start swimming faster than girls.

Another interesting observation is that after age 18, the rate of improvement accelerates once again for both male and female swimmers. Both curves show a big shift towards the left. Yet again, survivorship bias appears to come into play. These aren’t recreational swimmers. Rather, they are kids who continue to swim on college teams and then compete in national and international events. Superior coaching and facilities at the top level are probably big factors contributing to this dramatic improvement.

Judging by the averages, male swimmers reach their prime in their early 20s and maintain their peak form until their early 30s. In comparison, female swimmers do not hit their prime until their late 20s, and have a much shorter peak span than males.

The average times start deteriorating by the mid 30s for both males and females. Unfortunately, the data is very sparse for swimmers 35 years and older. To reduce noise, I excluded the data for every age where there are fewer than 10 records. So it’s hard to see the trajectory well into the 40s.
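The sparse-age filter is simple to express in code. In this Python sketch I assume the count is taken per age (it could just as well be per age and gender); the sample data and the name `drop_sparse_ages` are hypothetical.

```python
from collections import Counter

MIN_RECORDS = 10  # ages with fewer records than this are excluded

def drop_sparse_ages(rows, min_records=MIN_RECORDS):
    """Drop every record whose age has fewer than min_records entries.

    rows is a list of (age, time_in_seconds) tuples.
    """
    counts = Counter(age for age, _ in rows)
    return [(age, t) for age, t in rows if counts[age] >= min_records]

# Invented sample: 10 records at age 30, only 3 at age 45.
sample = [(30, 25.0 + i * 0.1) for i in range(10)] + [(45, 29.0), (45, 30.5), (45, 31.0)]
kept = drop_sparse_ages(sample)
print(sorted({age for age, _ in kept}))  # [30]
```

The threshold trades coverage for stability: a lower value would show more of the 40s, at the cost of averages that swing on a handful of swims.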

Here are plots of all swim times in this dataset with best fitting curves for both genders:

Conclusion

From this study, we gain some interesting insights into the world of competitive swimming from the early years:

  • Most kids join swim teams between 6 and 8. More and more kids drop out during the teenage years.

  • After Freestyle, kids learn to swim Backstroke before they learn Breaststroke and Butterfly. Butterfly remains a challenging stroke to learn well into the teens.

  • Persistence pays off and kids who don’t give up improve by leaps and bounds from 6 to 10.

  • Boys and girls improve at the same rate until the onset of puberty. Then boys start gaining a big advantage.

  • On average, boys reach peak performance as early as their early 20s. On the flip side, they need all the hard work just to maintain it until their 30s. Girls reach peak performance only by their late 20s. On the brighter side, this is a big motivation to continue working hard well into adulthood.


The R markdown file with code for this post is available here.
