Recently I uploaded the first version of “cricketstats”, a Python module that gets team and player statistics from the cricsheet.org database for data analysis. However, when I released it I said very little about why I decided to write it or what the module aims to be. Instead of packing the answers to those questions into the module’s readme file, I thought it better to write a couple of blog posts to explain them in more detail.

A few weeks before the 2021 ICC Men’s T20 World Cup, I searched for free programs or web services aimed at evaluating player and team performance that could give cricket lovers like myself a data-driven and hopefully deeper understanding of T20 cricket. I was initially encouraged to discover Stephen Rushe’s Cricsheet project, which provides freely available, structured ball-by-ball match data. At that point I thought my search was done: with Cricsheet’s data freely available, all that was needed was some program or script to run it through. However, other than services like ESPN Cricinfo’s Statsguru, and Python or Java scripts that scrape Cricsheet and online scoresheets to return individual match statistics or a player’s career statistics, I did not find what I was looking for: a freely available and easy-to-use application aimed at analytics in cricket. As a result, I decided to slowly start experimenting with (and learning) Python to write a script that parses and analyses the data in Cricsheet’s JSON files.

Towards the end of the World Cup, I came across Amol Desai’s Medium article Foundational Learnings for Cricket Data & Analytics and his appearance on an episode of Jarrod Kimber’s podcast Red Inker. In both pieces, Desai’s comments on the state of analytics in cricket rang true with what my own brief research had turned up. The field is relatively nascent compared to other sports, involves many people redoing the same technical data-gathering work before they can do any analysis, and lacks free and open access to the tools needed to analyse data, or even to the data itself, as is the case with ball-tracking data. With that in mind, I decided that something like cricketstats was worth pursuing: a free, open-source program that gathers the data needed to analyse player and team performance.

The goal of cricketstats can then be summarised in three points: 1) for analytics, 2) open-source, 3) free. Firstly, the module’s purpose is to help data analysts with analytics projects in cricket, not statisticians. The point is that the module is supposed to be a tool for people to gather data on players and teams in order to explain their past performances and predict future ones. It is not meant to be a tool for compiling a cricket statistician’s database for finding “Top 10” lists. Secondly, I believe that by being open-source the project can be improved quickly and can be transparent about how it works and what data it uses. It can also draw on contributors who are far better at programming and maintaining Python modules than I am. Lastly, being free means that anyone, from amateurs to professional analysts, can access it and pursue their analytics tasks. Whether they then decide to charge for their labour is up to them. But I want the raw data-gathering “means of production” to be free and accessible to all.

Going forward, I hope to improve the module by adding numerous other functions (e.g. reverse lookup, a wrapper, more exotic stats). The full to-do list can be found in the readme file in the GitHub repo. However, there are three things in the broader cricket analytics field that I hope will improve, and which would make the module better. The first is a more complete match database. Structured JSON files for matches before 2005 would help in analysing the larger trends in cricket, and more match data on domestic competitions around the world would help in analysing national team selection options and depth charts. The second improvement I hope for is richer ball-by-ball data (e.g. ball speed, and where the ball pitched, was hit and was fielded). This would simply improve the depth of the data gathered by a module like cricketstats. Finally, I hope someone can compile a common index of batters and bowlers by handedness and bowling type. This would help in gathering data for matchup analysis. I may attempt this last one myself at some point.
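
To give a rough idea of what I mean by a common index and how it could feed matchup analysis, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the player names are placeholders, the deliveries are made up, and the names PLAYER_INDEX, DELIVERIES and matchup_summary are hypothetical and not part of cricketstats or Cricsheet.

```python
from collections import defaultdict

# Hypothetical index: player name -> handedness / bowling type.
# All entries are made-up placeholders, not real players.
PLAYER_INDEX = {
    "Batter A": {"batting_hand": "right"},
    "Batter B": {"batting_hand": "left"},
    "Bowler X": {"bowling_arm": "right", "bowling_type": "off spin"},
    "Bowler Y": {"bowling_arm": "left", "bowling_type": "fast"},
}

# Made-up ball-by-ball records: (batter, bowler, runs off the bat).
DELIVERIES = [
    ("Batter A", "Bowler X", 1),
    ("Batter A", "Bowler Y", 4),
    ("Batter B", "Bowler X", 0),
    ("Batter B", "Bowler X", 6),
    ("Batter A", "Bowler X", 2),
]


def matchup_summary(deliveries, index):
    """Total runs and balls faced, grouped by (batting hand, bowling type)."""
    totals = defaultdict(lambda: {"runs": 0, "balls": 0})
    for batter, bowler, runs in deliveries:
        # Look both players up in the shared index to build the matchup key.
        key = (index[batter]["batting_hand"], index[bowler]["bowling_type"])
        totals[key]["runs"] += runs
        totals[key]["balls"] += 1
    return dict(totals)


summary = matchup_summary(DELIVERIES, PLAYER_INDEX)
for (hand, bowling_type), t in sorted(summary.items()):
    strike_rate = 100 * t["runs"] / t["balls"]
    print(f"{hand}-hand batters v {bowling_type}: "
          f"{t['runs']} runs off {t['balls']} balls (SR {strike_rate:.1f})")
```

The point of keeping the index separate from the ball-by-ball data is that it would only need to be compiled once and could then be shared across projects, rather than every analyst rebuilding it for themselves.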