Below, we include the code we used to import and clean the data for the “Exploring NYC Crime Data Using EDA” project:
library(RSocrata)
## Warning: package 'RSocrata' was built under R version 3.5.3
library(tidyverse)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
nyc_crime <- read_csv("./data/NYPD_Complaint_Data_Historic.csv")
#Code for importing directly from web:
#nyc_crime = read.socrata("https://data.cityofnewyork.us/resource/9s4h-37hy.json",
# app_token = NULL,
# email = NULL,
# password = NULL,
# stringsAsFactors = FALSE)
#saveRDS(nyc_crime, file = "./data/nyc_crime.rds")
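The cleaning step later calls `year(cmplnt_fr_dt)`, which needs a date-typed column. A minimal sketch of one way to handle that on the CSV route, assuming the export stores dates as MM/DD/YYYY text and uses upper-case column names; both are assumptions, and the toy values below are made up for illustration:

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the raw complaint data (hypothetical values)
raw_toy <- tibble(
  CMPLNT_NUM   = c("100000001", "100000002", "100000003"),
  CMPLNT_FR_DT = c("01/15/2014", "12/31/2017", "06/30/2010"),
  LAW_CAT_CD   = c("FELONY", "MISDEMEANOR", "FELONY")
)

parsed_toy <- raw_toy %>%
  mutate(
    CMPLNT_FR_DT = mdy(CMPLNT_FR_DT),  # character "MM/DD/YYYY" -> Date
    year         = year(CMPLNT_FR_DT)  # numeric year used for filtering
  )

parsed_toy
```

If the dates already arrive as date-times, as they appear to in the summary output further down, this conversion is unnecessary.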
The raw dataset has 6,036,805 observations and 35 variables. Broadly speaking, the variables cover the exact date, time, and location of each crime, a description of the crime, demographic information on the victim and suspect, and police department information. For more information on the variables, click here.
We were interested in sex-related, weapons-related, and drug-related felonies that occurred in NYC from 2014 to 2017. Below is a list of the penal codes associated with each type of felony.
Using that information, we can subset and filter the data to provide us with our final dataset.
#Cleaning and Filtering Data
nyc_felonies = nyc_crime %>%
  janitor::clean_names() %>%
  mutate(year = year(cmplnt_fr_dt)) %>%
  mutate_if(is.character, tolower) %>%
  filter(year %in% 2014:2017) %>%
  filter(law_cat_cd == "felony") %>%
  select(-station_name, -transit_district, -hadevelopt, -patrol_boro, -housing_psa,
         -juris_desc)
#saveRDS(nyc_felonies, file = "./data/nyc_felonies.rds")
#Selecting Crimes of Interest
sex_drug_weapons = nyc_felonies %>%
  filter(pd_cd %in% c(
    # Sex-related felonies
    178, 694, 697, 176, 180, 153, 157, 177, 168, 159, 166, 164, 179, 155,
    586, 696,
    # Drug-related felonies
    500, 501, 502, 503, 505, 507, 510, 512, 514, 515, 519, 520, 521, 523,
    524, 529, 530, 531, 532, 568, 570,
    # Weapons-related felonies
    781, 792, 793, 796)) %>%
  # Select variables of interest
  select(cmplnt_num, boro_nm, cmplnt_fr_dt, cmplnt_to_dt, cmplnt_fr_tm, ky_cd, ofns_desc,
         pd_cd, pd_desc, vic_race, vic_sex, vic_age_group, year, prem_typ_desc) %>%
  # Create classification of felonies
  mutate(boro_nm = if_else(boro_nm == "staten island", "staten_island", boro_nm),
         crime_group = case_when(
           pd_cd %in% c(178, 694, 697, 176, 180, 153, 157, 177, 168, 159, 166,
                        164, 179, 155, 586, 696) ~ "Sex-Related",
           pd_cd %in% c(500, 501, 502, 503, 505, 507, 510, 512, 514, 515, 519,
                        520, 521, 523, 524, 529, 530, 531, 532, 568, 570) ~ "Drug-Related",
           pd_cd %in% c(781, 792, 793, 796) ~ "Weapons-Related",
           # Fallback keeps the code as text; after the filter above no row should reach it
           TRUE ~ as.character(pd_cd)))
#saveRDS(sex_drug_weapons, file = "./data/sex_drug_weapons.rds")
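The penal codes appear twice in the code above, once in the filter and once in the classification. A possible alternative, sketched here on toy data, is to keep the codes in a single lookup table and join against it, so that filtering and labelling happen in one step. The names `crime_lookup`, `toy_felonies`, and `toy_classified` are hypothetical, and the code "101" is a made-up value that is not in any of the three groups:

```r
library(dplyr)

# One row per penal code; codes copied from the filter above
crime_lookup <- bind_rows(
  tibble(pd_cd = c(178, 694, 697, 176, 180, 153, 157, 177, 168, 159, 166,
                   164, 179, 155, 586, 696),
         crime_group = "Sex-Related"),
  tibble(pd_cd = c(500, 501, 502, 503, 505, 507, 510, 512, 514, 515, 519,
                   520, 521, 523, 524, 529, 530, 531, 532, 568, 570),
         crime_group = "Drug-Related"),
  tibble(pd_cd = c(781, 792, 793, 796),
         crime_group = "Weapons-Related")
) %>%
  # pd_cd is stored as character in the summary output, so match that type
  mutate(pd_cd = as.character(pd_cd))

# Toy stand-in for nyc_felonies (hypothetical rows)
toy_felonies <- tibble(
  cmplnt_num = c("a", "b", "c", "d"),
  pd_cd      = c("178", "503", "796", "101")
)

# inner_join() drops codes that are not in the lookup and attaches crime_group
toy_classified <- toy_felonies %>%
  inner_join(crime_lookup, by = "pd_cd")

toy_classified
```

With this layout, adding or removing a code means editing one table rather than two vectors that must stay in sync.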
The resulting dataset has 46,692 observations and 15 variables: the 14 selected columns plus the derived crime_group.
felonies = readRDS(file = "./data/sex_drug_weapons.rds")
summary(felonies)
## cmplnt_num boro_nm cmplnt_fr_dt
## Length:46692 Length:46692 Min. :2014-01-01 00:00:00
## Class :character Class :character 1st Qu.:2015-01-15 00:00:00
## Mode :character Mode :character Median :2016-01-16 00:00:00
## Mean :2016-01-03 21:27:52
## 3rd Qu.:2016-12-22 00:00:00
## Max. :2017-12-31 00:00:00
##
## cmplnt_to_dt cmplnt_fr_tm ky_cd
## Min. :2014-01-01 00:00:00 Length:46692 Length:46692
## 1st Qu.:2015-02-05 00:00:00 Class :character Class :character
## Median :2016-02-09 00:00:00 Mode :character Mode :character
## Mean :2016-01-19 07:31:50
## 3rd Qu.:2017-01-08 00:00:00
## Max. :2017-12-31 00:00:00
## NA's :9108
## ofns_desc pd_cd pd_desc
## Length:46692 Length:46692 Length:46692
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## vic_race vic_sex vic_age_group year
## Length:46692 Length:46692 Length:46692 Min. :2014
## Class :character Class :character Class :character 1st Qu.:2015
## Mode :character Mode :character Mode :character Median :2016
## Mean :2016
## 3rd Qu.:2016
## Max. :2017
##
## prem_typ_desc crime_group
## Length:46692 Length:46692
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
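Because most columns are stored as character, `summary()` reports only their length and class. A small sketch of two ways to get a more informative overview, shown on made-up toy rows rather than the real data: converting character columns to factors before calling `summary()`, and tabulating a column with `count()`. The object names are hypothetical.

```r
library(dplyr)

# Toy stand-in for the felonies data (hypothetical values)
toy <- tibble(
  boro_nm     = c("bronx", "queens", "bronx", "staten_island"),
  vic_sex     = c("f", "m", "f", "f"),
  crime_group = c("Sex-Related", "Drug-Related", "Sex-Related", "Weapons-Related"),
  year        = c(2014, 2015, 2015, 2017)
)

# Factors make summary() print per-level counts instead of "Class :character"
toy_factors <- toy %>%
  mutate_if(is.character, as.factor)

summary(toy_factors)

# Frequency table for a single column, largest group first
group_counts <- toy %>%
  count(crime_group, sort = TRUE)

group_counts
```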