Chapter 3 Data transformation

3.1 Recategorize

The columns OFNS_DESC and PREM_TYP_DESC hold the detailed crime category and the location of occurrence, respectively. Both contain too many distinct values to be informative in our visualizations. We therefore merged the detailed categories into more general ones, which leaves fewer categories at a higher level. For example, ‘Larceny’, ‘Vehicle Stolen’ and ‘Burglary’ can all be grouped under ‘Theft’, and locations like ‘Grocery store’, ‘Clothing store’ and ‘Market’ can all be grouped under ‘Retail Store’. We did this work in Python, and the code is available here: https://github.com/oliverliuoo/nyc-crime-covid19/blob/main/python/categorize.py
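
The merging idea can be illustrated with a small lookup table. This is only a minimal sketch, not the code in categorize.py: it uses just the example categories named above, and the function names, the case-insensitive matching, and the ‘Other’ fallback are assumptions made for illustration.

```python
# Hypothetical groupings built only from the examples in the text;
# the real mapping in categorize.py may differ.
OFFENSE_GROUPS = {
    "Theft": ["Larceny", "Vehicle Stolen", "Burglary"],
}
PREMISES_GROUPS = {
    "Retail Store": ["Grocery store", "Clothing store", "Market"],
}


def build_lookup(groups):
    """Invert {general: [detailed, ...]} into {detailed_lowercase: general}."""
    return {
        detail.lower(): general
        for general, details in groups.items()
        for detail in details
    }


def recategorize(value, lookup, default="Other"):
    """Map a detailed label to its general category.

    Matching ignores case and surrounding whitespace (an assumption);
    missing or unmapped values fall back to `default`.
    """
    if value is None:
        return default
    return lookup.get(value.strip().lower(), default)


OFFENSE_LOOKUP = build_lookup(OFFENSE_GROUPS)
PREMISES_LOOKUP = build_lookup(PREMISES_GROUPS)

offense_general = recategorize("Larceny", OFFENSE_LOOKUP)
premises_general = recategorize(" grocery store ", PREMISES_LOOKUP)
```

Keeping the grouping in a dictionary keyed by the general category makes it easy to review which detailed values end up together, and unmapped values are collected under a single fallback instead of being dropped.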

3.2 Parse Date and Time Stamp

Both the crime data and the COVID-19 data have a date column. Because we want to show the trends of COVID-19 and crime over a long time period, we need to aggregate the counts of occurrences by month, which requires parsing the date column into year and month. We also parse the time stamp column to obtain an hour column. We did this transformation in R, where it is convenient.
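
The parsing step can be sketched as follows. The actual transformation was done in R; this Python version only illustrates the same idea. The function names and the input formats (MM/DD/YYYY for dates, HH:MM:SS for time stamps) are assumptions and should be adjusted to the real data.

```python
from datetime import datetime


def parse_year_month(date_text, date_format="%m/%d/%Y"):
    """Return (year, month) from a date string.

    The default format is an assumption about how dates are stored.
    """
    parsed = datetime.strptime(date_text, date_format)
    return parsed.year, parsed.month


def parse_hour(time_text, time_format="%H:%M:%S"):
    """Return the hour (0-23) from a time stamp string.

    The default format is an assumption about how times are stored.
    """
    return datetime.strptime(time_text, time_format).hour


year, month = parse_year_month("03/15/2020")
hour = parse_hour("14:30:00")
```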

3.3 Aggregation

We also did some aggregation for specific visualizations. For example, to visualize the trend of COVID-19 cases by month, we need to aggregate the case counts by month. We handled all of these aggregations in R pipelines right before the visualization.
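
A monthly aggregation of this kind can be sketched as below. The real aggregations were written as R pipelines; this Python version is only an illustration. The function name, the ISO date format, and the sample rows are made up for the example and are not taken from the data.

```python
from collections import defaultdict
from datetime import datetime


def monthly_totals(daily_rows, date_format="%Y-%m-%d"):
    """Sum daily counts into {(year, month): total}, sorted by month.

    `daily_rows` is an iterable of (date_text, count) pairs; the default
    date format is an assumption.
    """
    totals = defaultdict(int)
    for date_text, count in daily_rows:
        day = datetime.strptime(date_text, date_format)
        totals[(day.year, day.month)] += count
    return dict(sorted(totals.items()))


# Made-up rows, for illustration only.
sample_rows = [("2020-03-01", 1), ("2020-03-02", 3), ("2020-04-01", 10)]
monthly = monthly_totals(sample_rows)
```

Keying the result by (year, month) rather than by month alone keeps the same calendar month in different years apart, which matters when the trend spans more than one year.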

The code for the two parts above is at the following link: https://github.com/oliverliuoo/nyc-crime-covid19/blob/main/05-results.Rmd