In November and December of 2018 I participated in a Data Science for Good Kaggle competition for the Center for Policing Equity (CPE). The aim of the competition was to provide CPE with streamlined solutions for combining different data sources across different geographic units. In their words:
Our biggest challenge is automating the combination of police data, census-level data, and other socioeconomic factors. Shapefiles are unusual and messy – which makes it difficult to, for instance, generate maps of police behavior with precinct boundary layers mixed with census layers. Police incident data are also very difficult to normalize and standardize across departments since there are no federal standards for data collection.
My solution was awarded second prize. To solve the challenge, I combined recent R packages for geospatial computation with an algorithm I devised for reaggregating spatial units based on their measured overlap. I focused on writing functions that could be reused and adapted to the wide variety of data situations CPE might encounter.
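To make the idea of reaggregating by measured overlap concrete, here is a minimal sketch of simple area-weighted reallocation. It is an illustration only, not the code from the submission: it is written in Python rather than R, it uses axis-aligned rectangles in place of real shapefile polygons, and the names (`reaggregate`, `tracts`, `precincts`) and the counts are invented for the example. It also assumes that the quantity being moved is spread evenly within each source unit.

```python
from typing import Dict, Tuple

# A rectangle stands in for a polygon: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    """Area of a rectangle (0 if it is degenerate)."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def reaggregate(
    source_units: Dict[str, Rect],
    source_counts: Dict[str, float],
    target_units: Dict[str, Rect],
) -> Dict[str, float]:
    """Reallocate counts from source units to target units.

    Each source unit gives every target unit a share of its count equal to
    the fraction of the source unit's area that falls inside that target.
    """
    totals = {name: 0.0 for name in target_units}
    for source_id, source_rect in source_units.items():
        source_area = area(source_rect)
        if source_area == 0:
            continue  # nothing to distribute from an empty shape
        for target_id, target_rect in target_units.items():
            share = overlap_area(source_rect, target_rect) / source_area
            totals[target_id] += share * source_counts[source_id]
    return totals


# Hypothetical data: two census tracts and two police precincts.
tracts = {"tract_A": (0, 0, 2, 2), "tract_B": (2, 0, 4, 2)}
population = {"tract_A": 100, "tract_B": 200}
precincts = {"precinct_1": (0, 0, 3, 2), "precinct_2": (3, 0, 4, 2)}

estimates = reaggregate(tracts, population, precincts)
print(estimates)
```

In this toy case `precinct_1` contains all of `tract_A` and half of `tract_B`, so it receives 100 + 100 people, and `precinct_2` receives the remaining 100. The total population is preserved because the precincts fully cover the tracts.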
I decided to enter the Kaggle competition as an opportunity to learn more about the work done by the Center for Policing Equity, an organization that I admire for its close work with police departments across the country.