In my previous post, we saw how kids improve year by year in swimming. Certainly kids get stronger and faster as they grow. But how much impact do practice and coaching have? As I mentioned in my previous post, coaches do not keep a record of attendance at practice. There are daily hour-long practice sessions during the summer. Although it isn’t known how regularly the kids attend practice, the number of meets that a kid participates in during the season can be a good proxy.
This past summer, my daughter joined a swim team. Her team is one of several teams that make up an area league. All summer long, developmental and competitive meets are organized in which teams compete one-on-one. All kids on a team swim in developmental meets. For competitive meets, the top 3 or 4 kids in each age group are chosen. Meets are held in community pools that are either 25 meters or 25 yards long.
Recently I listened to a podcast featuring Victor Haghani of Elm Partners, who described a fascinating coin-flipping experiment. The experiment was designed to be played for 30 minutes by participants in groups of 2-15 in university classrooms or office conference rooms, without consulting each other or the internet or other resources. To conduct the experiment, a custom web app was built for placing bets on a simulated coin with a 60% chance of coming up heads.
Amazon is led by its charismatic founder and CEO, Jeff Bezos, who is widely hailed as one of the top visionaries of this era. At the end of the fiscal year in March, Amazon publishes a letter to shareholders written by Bezos, in which he summarises his thoughts and his business and management philosophy. While reading these letters, what stands out, among other things, is Bezos’ clarity of thought, expressed in a concise and easy-to-comprehend way.
Everyone using a smartphone or a mobile device has used an onscreen smart keyboard that tries to predict the next set of words that the user might want to type. Typically, up to 3 words are predicted and displayed in a row at the top of the keyboard. Given that typing on a glass pane without tactile feedback can be very frustrating at times, the smart keyboard goes a long way in alleviating that frustration.
Who’re the best batsmen in cricket today? Is there a way to define a set of objective criteria that leads to an unbiased conclusion? In my last post, I selected a list of top batsmen in the history of cricket by simply filtering for all batsmen who have had a career batting average of 50 or more in any format. Prior to that, I eliminated newcomers, specialist bowlers and unsuccessful players with another set of filters.
In a previous post, I used data from Statsguru and looked at a brief history of cricket with respect to debut years and career spans of players. In this post, I use detailed player statistics from the same dataset to select top batsmen who have played this game.
Distribution of matches played by all players
Let’s take a look at the summary statistics of the number of matches played by all players in each format of the game.
Since early childhood, I have been a big fan of cricket. Some of my earliest childhood memories are of watching the game with my loved ones in India. As far as I can recall, the first series I saw was between India and the West Indies during the early 80s, on a black and white TV set. Watching Sunil Gavaskar play the fearsome West Indian fast bowlers with aplomb made me his fan…until a kid named Sachin Tendulkar arrived in 1989.
In a previous post in this series, we did an exploratory data analysis of the Ames Housing dataset.
In this post, we will build linear and non-linear models and see how well they predict the SalePrice of properties.
Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed SalePrice will be our evaluation criterion. Taking the log means that the error depends on the ratio of predicted to observed price rather than on their difference, so errors in predicting expensive and cheap houses affect the result equally.
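To make the metric concrete, here is a minimal sketch in Python of RMSE computed on log prices. The function name `rmse_log` and the example prices are made up for illustration; they are not taken from the analysis in this post.

```python
import math


def rmse_log(predicted, observed):
    """RMSE between log(predicted) and log(observed); all prices must be positive."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared difference of the logs, one term per property
    squared_errors = [
        (math.log(p) - math.log(o)) ** 2 for p, o in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical prices: a 10% overestimate on a cheap house and on an
# expensive house produce the same error, log(1.1) in both cases.
cheap = rmse_log([110_000], [100_000])
expensive = rmse_log([1_100_000], [1_000_000])
```

Because only the ratio of predicted to observed price enters the calculation, `cheap` and `expensive` agree up to floating-point rounding.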
In this case study, we will use the Ames Housing dataset to explore regression techniques and predict the sale price of houses.
The Ames Housing dataset contains the sale prices of properties in Ames, Iowa, along with 80 other features. Each property has an Id associated with it.
Here are the dimensions of the training and testing sets respectively:
 "Dimensions of the training set"
 1460 81
 "Dimensions of the testing set"
 1459 81
Now, let’s combine training and testing into a single dataset and take a look at the count of missing values:
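As a rough sketch of this step in Python with pandas, the snippet below stacks two small stand-in data frames and counts missing values per column. The frames and their values are invented for illustration and are not the real Ames data; only the column names `Id`, `LotFrontage` and `Alley` are borrowed from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real training and testing sets (values are made up)
train = pd.DataFrame({
    "Id": [1, 2, 3],
    "LotFrontage": [65.0, np.nan, 80.0],
    "Alley": [None, "Grvl", None],
})
test = pd.DataFrame({
    "Id": [4, 5],
    "LotFrontage": [np.nan, 70.0],
    "Alley": ["Pave", None],
})

# Stack the rows of both sets into a single dataset
combined = pd.concat([train, test], ignore_index=True)

# Count missing values per column, keep only columns that have any,
# and list the worst offenders first
missing = combined.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```

Filtering out the columns with zero missing values keeps the summary short, which matters when the real dataset has around 80 features.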