Exploring NYC Crime Data Using EDA - Cleaning Data


Importing Data

Below, we include the code we used to import and clean the data for the “Exploring NYC Crime Data Using EDA” project:

library(RSocrata)
library(tidyverse)
library(lubridate)
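# Read every column as character (matching the types in the summary below) and
# parse the complaint dates, assuming the CSV stores them as MM/DD/YYYY
nyc_crime <- read_csv("./data/NYPD_Complaint_Data_Historic.csv",
                      col_types = cols(.default = col_character(),
                                       CMPLNT_FR_DT = col_date(format = "%m/%d/%Y"),
                                       CMPLNT_TO_DT = col_date(format = "%m/%d/%Y")))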

#Code for importing directly from web:
#nyc_crime = read.socrata("https://data.cityofnewyork.us/resource/9s4h-37hy.json", 
#                          app_token = NULL, 
#                          email = NULL, 
#                          password = NULL,
#                          stringsAsFactors = FALSE)

#saveRDS(nyc_crime, file = "./data/nyc_crime.rds")
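The commented-out lines above follow a download-once, cache-locally pattern. The data are fetched from the Socrata API, saved with saveRDS(), and read back from the .rds file on later runs. Below is a minimal sketch of that pattern as a reusable helper. cached_fetch() is a hypothetical name, and the toy fetch function stands in for read.socrata() so the example runs offline.

```r
# Hypothetical helper: return the cached object if the .rds file exists,
# otherwise call fetch_fun(), cache its result with saveRDS(), and return it.
cached_fetch <- function(path, fetch_fun) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch_fun()
  saveRDS(result, file = path)
  result
}

# Toy stand-in for read.socrata(); counts how often it is called
n_calls <- 0
fake_fetch <- function() {
  n_calls <<- n_calls + 1
  data.frame(cmplnt_num = c("101", "102"),
             law_cat_cd = c("FELONY", "VIOLATION"),
             stringsAsFactors = FALSE)
}

cache_path <- tempfile(fileext = ".rds")
first  <- cached_fetch(cache_path, fake_fetch)  # calls fake_fetch() and writes the cache
second <- cached_fetch(cache_path, fake_fetch)  # reads the cached .rds file instead
```

With the real data, the call would look something like cached_fetch("./data/nyc_crime.rds", function() read.socrata("https://data.cityofnewyork.us/resource/9s4h-37hy.json")). This downloads the roughly six million rows only once.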

The raw dataset has 6,036,805 observations and 35 variables. Broadly speaking, the variables record the exact date, time, and location of each crime, a description of the crime, demographic information on the victim and suspect, and police department information. For more information on the variables, see the dataset's documentation on NYC Open Data.


Penal Codes

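We were interested in sex-related, weapons-related, and drug-related felonies that occurred in NYC from 2014 to 2017. Each type of felony corresponds to a set of penal codes, stored in the pd_cd variable. The sex-related codes are listed below. The drug- and weapons-related codes appear in the filtering code under Subsetting Data.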


Sex-Related Felonies

178 Facilitating A Sex Offense With A Controlled Substance
694 Incest
697 Use Of A Child In Sexual Performance
176 Sex Crimes
180 Course Of Sexual Conduct Against Child
153 Rape 3
157 Rape 1
177 Sexual Abuse
168 Sodomy 1
159 Rape 1 Attempt
166 Sodomy 2
164 Sodomy 3
179 Aggravated Sexual Abuse
155 Rape 2
586 Sex Trafficking
696 Promoting Sexual Performance – Child
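The same codes are reused in both the filtering and the classification steps, so it can help to keep them in one named vector. Below is a minimal sketch built from the list above; sex_code_desc is a hypothetical name, and the pd_cd values are made up. The sketch also illustrates a detail that matters here. In the cleaned data pd_cd is stored as character (see the summary() output below), and %in% coerces numeric codes to character before matching, so character codes still match numeric code lists.

```r
# Sex-related codes from the list above, keyed by code (names are character)
sex_code_desc <- c(
  "178" = "Facilitating A Sex Offense With A Controlled Substance",
  "694" = "Incest",
  "697" = "Use Of A Child In Sexual Performance",
  "176" = "Sex Crimes",
  "180" = "Course Of Sexual Conduct Against Child",
  "153" = "Rape 3",
  "157" = "Rape 1",
  "177" = "Sexual Abuse",
  "168" = "Sodomy 1",
  "159" = "Rape 1 Attempt",
  "166" = "Sodomy 2",
  "164" = "Sodomy 3",
  "179" = "Aggravated Sexual Abuse",
  "155" = "Rape 2",
  "586" = "Sex Trafficking",
  "696" = "Promoting Sexual Performance - Child"
)

# Made-up pd_cd values, stored as character like in the cleaned data
pd_cd <- c("153", "500", "179", "781")

# %in% coerces the numeric codes to character, so "153" matches 153
is_sex <- pd_cd %in% as.numeric(names(sex_code_desc))

# Look up the descriptions of the matching codes by name
described <- unname(sex_code_desc[pd_cd[is_sex]])
```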


Subsetting Data

Using these codes, we filter the data to produce our final dataset.

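#Cleaning and Filtering Data
nyc_felonies = nyc_crime %>% 
  janitor::clean_names() %>%                  # snake_case column names
  mutate(year = year(cmplnt_fr_dt)) %>%       # year the reported event occurred (or began)
  mutate_if(is.character, tolower) %>%        # lower-case all text values
  filter(year %in% 2014:2017) %>% 
  filter(law_cat_cd == "felony") %>%          # drop misdemeanors and violations
  select(- station_name, - transit_district, - hadevelopt, - patrol_boro, - housing_psa, 
         - juris_desc)                        # drop columns not used in the analysis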

#saveRDS(nyc_felonies, file = "./data/nyc_felonies.rds")

#Selecting Crimes of Interest
sex_codes    = c(178, 694, 697, 176, 180, 153, 157, 177, 168, 159, 166, 164, 179, 155, 
                 586, 696)
drug_codes   = c(500, 501, 502, 503, 505, 507, 510, 512, 514, 515, 519, 520, 521, 523, 
                 524, 529, 530, 531, 532, 568, 570)
weapon_codes = c(781, 792, 793, 796)

sex_drug_weapons = nyc_felonies %>% 
  
  # Keep only the sex-, drug- and weapons-related codes
  filter(pd_cd %in% c(sex_codes, drug_codes, weapon_codes)) %>% 
  
  #Select Variables of Interest
  select(cmplnt_num, boro_nm, cmplnt_fr_dt, cmplnt_to_dt, cmplnt_fr_tm, ky_cd, ofns_desc, 
         pd_cd, pd_desc, vic_race, vic_sex, vic_age_group, year, prem_typ_desc) %>% 
  
  #Create Classification of Felonies
  mutate(boro_nm = if_else(boro_nm == "staten island", "staten_island", boro_nm),
         crime_group = case_when(
           pd_cd %in% sex_codes    ~ "Sex-Related",
           pd_cd %in% drug_codes   ~ "Drug-Related",
           pd_cd %in% weapon_codes ~ "Weapons-Related",
           # Unreachable after the filter above; as.character() keeps the result
           # type consistent even if pd_cd was read as a number
           TRUE ~ as.character(pd_cd)))

#saveRDS(sex_drug_weapons, file = "./data/sex_drug_weapons.rds")
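You do not need the full six-million-row file to check the cleaning and classification logic. A few made-up rows are enough. This is a base-R sketch rather than the tidyverse pipeline above. It assumes complaint dates arrive as MM/DD/YYYY strings and lower-cases only the text columns it uses. It also classifies codes with a named lookup vector (code_groups, a hypothetical name) built from shortened code lists, rather than with conditional statements.

```r
# Made-up rows with the raw (upper-case) column names used by the pipeline
complaints <- data.frame(
  CMPLNT_FR_DT = c("03/15/2013", "07/04/2015", "11/30/2016",
                   "01/02/2017", "06/18/2014", "09/09/2016"),
  LAW_CAT_CD   = c("FELONY", "FELONY", "MISDEMEANOR", "FELONY", "FELONY", "FELONY"),
  BORO_NM      = c("BRONX", "STATEN ISLAND", "QUEENS", "BROOKLYN", "MANHATTAN", "QUEENS"),
  PD_CD        = c("153", "500", "781", "792", "157", "109"),
  stringsAsFactors = FALSE
)

# Simplified stand-in for janitor::clean_names(): these names only need lower-casing
names(complaints) <- tolower(names(complaints))

# Parse the date (assumed MM/DD/YYYY), extract the year, lower-case text columns
complaints$cmplnt_fr_dt <- as.Date(complaints$cmplnt_fr_dt, format = "%m/%d/%Y")
complaints$year <- as.integer(format(complaints$cmplnt_fr_dt, "%Y"))
complaints$law_cat_cd <- tolower(complaints$law_cat_cd)
complaints$boro_nm <- tolower(complaints$boro_nm)

# Keep felonies from 2014-2017
felonies <- complaints[complaints$year %in% 2014:2017 &
                         complaints$law_cat_cd == "felony", ]

# Map each code to its group with one lookup
# (shortened code lists; the full lists are in the pipeline above)
code_groups <- c(
  setNames(rep("Sex-Related", 2), c("153", "157")),
  setNames(rep("Drug-Related", 2), c("500", "501")),
  setNames(rep("Weapons-Related", 2), c("781", "792"))
)
felonies <- felonies[felonies$pd_cd %in% names(code_groups), ]
felonies$boro_nm[felonies$boro_nm == "staten island"] <- "staten_island"
felonies$crime_group <- unname(code_groups[felonies$pd_cd])
```

Three rows survive: the 2013 complaint, the misdemeanor, and code 109 are dropped. The remaining rows are labelled Drug-Related, Weapons-Related, and Sex-Related.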


The resulting dataset has 46,692 observations and 15 variables: the 14 selected above plus crime_group.

felonies = readRDS(file = "./data/sex_drug_weapons.rds")

summary(felonies)
##   cmplnt_num          boro_nm           cmplnt_fr_dt                
##  Length:46692       Length:46692       Min.   :2014-01-01 00:00:00  
##  Class :character   Class :character   1st Qu.:2015-01-15 00:00:00  
##  Mode  :character   Mode  :character   Median :2016-01-16 00:00:00  
##                                        Mean   :2016-01-03 21:27:52  
##                                        3rd Qu.:2016-12-22 00:00:00  
##                                        Max.   :2017-12-31 00:00:00  
##                                                                     
##   cmplnt_to_dt                 cmplnt_fr_tm          ky_cd          
##  Min.   :2014-01-01 00:00:00   Length:46692       Length:46692      
##  1st Qu.:2015-02-05 00:00:00   Class :character   Class :character  
##  Median :2016-02-09 00:00:00   Mode  :character   Mode  :character  
##  Mean   :2016-01-19 07:31:50                                        
##  3rd Qu.:2017-01-08 00:00:00                                        
##  Max.   :2017-12-31 00:00:00                                        
##  NA's   :9108                                                       
##   ofns_desc            pd_cd             pd_desc         
##  Length:46692       Length:46692       Length:46692      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##    vic_race           vic_sex          vic_age_group           year     
##  Length:46692       Length:46692       Length:46692       Min.   :2014  
##  Class :character   Class :character   Class :character   1st Qu.:2015  
##  Mode  :character   Mode  :character   Mode  :character   Median :2016  
##                                                           Mean   :2016  
##                                                           3rd Qu.:2016  
##                                                           Max.   :2017  
##                                                                         
##  prem_typ_desc      crime_group       
##  Length:46692       Length:46692      
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
##
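Most columns are stored as character, so summary() reports only their length and class. The sketch below shows one way to get counts per category and per-column missing values, such as the 9,108 NAs in cmplnt_to_dt. It uses a few made-up rows. Converting character columns to factors is an assumption about what is useful to inspect, not a step in the original pipeline.

```r
# A few made-up rows with the same kinds of columns as `felonies`
toy <- data.frame(
  boro_nm      = c("bronx", "brooklyn", "bronx", "queens"),
  crime_group  = c("Sex-Related", "Drug-Related", "Drug-Related", "Weapons-Related"),
  cmplnt_to_dt = as.Date(c("2015-01-02", NA, "2016-03-04", NA)),
  stringsAsFactors = FALSE
)

# Convert character columns to factors so summary() reports level counts
toy_factors <- toy
is_chr <- vapply(toy_factors, is.character, logical(1))
toy_factors[is_chr] <- lapply(toy_factors[is_chr], factor)
summary(toy_factors)

# Count felonies per group and missing values per column
group_counts <- table(toy$crime_group)
na_counts <- colSums(is.na(toy))
```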