Chapter 2 Data Source
2.1 DATA SOURCE DESCRIPTION
2.1.1 NYPD Complaint Data
Our main dataset, “NYPD Complaint Data”, was downloaded from NYC Open Data. The “NYPD Complaint Data” is collected by New York City agencies and other partners.
2.1.2 Covid Case by Date
The supplementary dataset, “Covid-Case-by-DATE”, was downloaded from https://github.com/nychealth/coronavirus-data/tree/master/trends#data-by-daycsv. This folder is updated daily with the daily, weekly, and monthly data published by the Health Department. The contributors to this GitHub repository were responsible for collecting the data.
2.2 DATA DESCRIPTION
The dataset “NYPD Complaint Data” includes all complaint cases reported to the NYPD in New York City, from the earliest available records up to the time of download. It contains 323,817 rows and 36 columns in total. To better serve the purpose of our project, we selected only the columns of interest, described below.
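A minimal sketch of the column selection, using only Python's standard library. It assumes the download is a CSV whose header uses the NYC Open Data field names (for example CMPLNT_FR_TM, PREM_TYP_DESC, Longitude and Lat_Lon, as spelled in the export); the KEEP list and the load_selected helper are hypothetical names for illustration, not the project's actual code.

```python
import csv
import io

# Columns of interest from 2.2.1, spelled as in the NYC Open Data export
# (an assumption: check them against the header of the downloaded file).
KEEP = [
    "CMPLNT_NUM", "ADDR_PCT_CD", "BORO_NM", "CMPLNT_FR_DT", "CMPLNT_FR_TM",
    "LAW_CAT_CD", "LOC_OF_OCCUR_DESC", "OFNS_DESC", "PREM_TYP_DESC",
    "SUSP_AGE_GROUP", "SUSP_RACE", "SUSP_SEX",
    "VIC_AGE_GROUP", "VIC_RACE", "VIC_SEX",
    "X_COORD_CD", "Y_COORD_CD", "Latitude", "Longitude", "Lat_Lon",
    "New Georeferenced Column",
]


def load_selected(fh, keep=KEEP):
    """Read a complaint CSV and keep only the columns in `keep`.

    Returns ((n_rows, n_columns_in_file), rows), where rows is a list of
    dicts restricted to the requested columns that exist in the header.
    """
    reader = csv.DictReader(fh)
    header = reader.fieldnames or []
    cols = [c for c in keep if c in header]  # skip names absent from this file
    rows = [{c: row[c] for c in cols} for row in reader]
    return (len(rows), len(header)), rows


if __name__ == "__main__":
    # Tiny inline sample standing in for the real download.
    sample = io.StringIO(
        "CMPLNT_NUM,BORO_NM,OFNS_DESC,PARKS_NM\n"
        "1,BRONX,ROBBERY,\n"
        "2,QUEENS,FELONY ASSAULT,\n"
    )
    shape, rows = load_selected(sample)
    print(shape, rows[0])
```

On the real file, opened with open(path, newline="") as the csv module recommends (the path itself is hypothetical), the reported shape should match the 323,817 rows and 36 columns noted above.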
2.2.1 Variables
CMPLNT_NUM: Randomly generated persistent ID for each complaint
ADDR_PCT_CD: The precinct in which the incident occurred
BORO_NM: The name of the borough in which the incident occurred
CMPLNT_FR_DT: Exact date of occurrence for the reported event
CMPLNT_FR_TM: Exact time of occurrence for the reported event
LAW_CAT_CD: Level of offense: felony, misdemeanor, violation
LOC_OF_OCCUR_DESC: Specific location of occurrence in or around the premises: inside, opposite of, front of, rear of
OFNS_DESC: Description of offense corresponding with key code
PREM_TYP_DESC: Specific description of premises
SUSP_AGE_GROUP, SUSP_RACE, SUSP_SEX: Age group, race, and sex of the suspect
VIC_AGE_GROUP, VIC_RACE, VIC_SEX: Age group, race, and sex of the victim
X_COORD_CD, Y_COORD_CD, Latitude, Longitude, Lat_Lon, New Georeferenced Column: Geographic location of the incident
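Because the date and the time of occurrence arrive as two separate text fields, a small helper can combine them into one timestamp for time-based plots. This sketch assumes the export names these fields CMPLNT_FR_DT and CMPLNT_FR_TM and formats them as MM/DD/YYYY and HH:MM:SS; the helper name parse_occurrence is hypothetical.

```python
from datetime import datetime


def parse_occurrence(date_str, time_str):
    """Combine CMPLNT_FR_DT and CMPLNT_FR_TM strings into one datetime.

    Assumes MM/DD/YYYY dates and HH:MM:SS times. Returns None when either
    field is empty or cannot be parsed, so such records can be counted or
    dropped explicitly instead of raising an error mid-load.
    """
    if not date_str or not time_str:
        return None
    try:
        return datetime.strptime(f"{date_str} {time_str}", "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
```

Returning None rather than raising keeps unparseable records visible, so they can be tallied together with the empty entries discussed in 2.3.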
2.3 DATA ISSUE
First, during the clean-up process we found that only limited data are available for the year 2020. When producing the graphs, we applied some data transformations intended to reduce the potential bias caused by this limited data. Beyond this, we found a significant number of unknown or empty entries in the dataset, especially in the demographic descriptions (age group, race, and sex) of both suspects and victims.
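One rough way to quantify the unknown or empty entries is to compute, for each column, the share of rows whose value is blank or an "unknown" placeholder. This is a minimal sketch: the MISSING set and the missing_share helper are assumptions for illustration, and the placeholder strings should be checked against the categories that actually appear in the data.

```python
from collections import Counter

# Placeholder values treated as "unknown or empty" (compared uppercased).
# This set is an assumption; extend it after inspecting the real categories.
MISSING = {"", "UNKNOWN", "(NULL)", "NAN"}


def missing_share(rows, columns):
    """Return {column: fraction of rows whose value is empty or unknown}."""
    total = len(rows)
    counts = Counter()
    for row in rows:
        for col in columns:
            # Absent keys and None count as empty; comparison is case-insensitive.
            value = (row.get(col) or "").strip().upper()
            if value in MISSING:
                counts[col] += 1
    return {col: (counts[col] / total if total else 0.0) for col in columns}
```

Applied to a list of row dictionaries (for instance, as read with csv.DictReader) and the six SUSP_ and VIC_ description columns, this gives a per-column measure of how sparse the suspect and victim descriptions are.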