NYC has a wealth of open and transparent data for anyone to study and analyze. Gathering such data and computing statistics on crime in NYC may provide critical insights into steps that could be taken in other high-density, highly populated areas, which can help law enforcement and government officials deter crime. For this project, we focused on examining the various factors, patterns, and variables associated with crime in New York City.
We decided to study sex-related, drug-related, and weapons-related felonies occurring from 2014 to 2017 for three reasons. First, we felt that there was significant overlap among these three types of felonies; drug-related crimes, for example, are often committed concurrently with weapons-related crimes. Second, our raw data contained over 6 million observations, and we wanted to limit our scope to something manageable. Third, felonies generally rank higher than misdemeanors and violations in terms of violence, risk, severity, and danger, so they potentially offer more important insights than the other categories. These reasons led us to explore data on the three main types of felonies (sex-related, weapons-related, and drug-related) that occurred in New York City from 2014 to 2017.
We used data collected by the New York City Police Department (NYPD); specifically, we used the NYPD Historic Complaint dataset, which provides longitudinal information on complaints filed with the NYPD, the type of crime committed by a suspect, suspect demographics, victim demographics, location of crime, date and time of crime, and other variables. The link to the raw dataset is here. The link to how we acquired and cleaned the dataset is here.
Our final dataset, sex_drug_weapons, contains 46,692 observations. The list below provides all 14 variables in the dataset with their brief descriptions:
cmplnt_num
: Randomly generated ID for each incident
boro_nm
: Borough in which the incident occurred
cmplnt_fr_dt
: Exact start date of occurrence for the reported incident
cmplnt_to_dt
: Exact end date of occurrence for the reported incident
cmplnt_fr_tm
: Exact time of occurrence for the reported incident
ky_cd
: Three-digit offense classification code
ofns_desc
: Description of offense corresponding with key code (ky_cd)
pd_cd
: Three-digit internal classification code
pd_desc
: Description of internal classification corresponding with PD code (pd_cd)
vic_race
: Victim’s race description
vic_sex
: Victim’s sex description (D=Business/Organization, E=PSNY/People of the State of New York, F=Female, M=Male)
year
: Year the incident occurred
prem_typ_desc
: Specific description of premises where incident occurred
crime_group
: Identifies whether crime was a sex-related felony, drug-related felony, or weapons-related felony
For our exploratory analysis, we examined whether the average time between when the crime started and ended differed by borough and felony type. Examining the average time between when the crime started and ended can serve as a proxy indicator of the severity of the crime. Longer times may mean the crime is more severe, harder to resolve, more violent, and may require more resources to deal with. Furthermore, differences in the length of reported crimes may have implications for law enforcement officials, policymakers, and urban residents.
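As a minimal sketch of this measure, the snippet below computes the length of an incident in days from a start date and an end date. The three incidents are hypothetical values chosen for illustration, not rows from the NYPD data, and only base R is used so that the sketch runs without additional packages.

```r
# Hypothetical start and end dates, not taken from the NYPD data
incidents <- data.frame(
  cmplnt_fr_dt = as.Date(c("2017-09-07", "2016-01-01", "2014-04-10")),
  cmplnt_to_dt = as.Date(c("2017-09-07", "2016-01-15", NA))
)

# Subtracting Date columns gives a difftime; convert it to a plain number of days
incidents$length_days <- as.numeric(
  incidents$cmplnt_to_dt - incidents$cmplnt_fr_dt,
  units = "days"
)

print(incidents)
```

An incident that starts and ends on the same day has a length of 0, and an incident with no recorded end date has a length of NA.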
First, we read in our data and create a variable calculating the length of the felony in days.
Table 1. Reading in Dataset
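library(dplyr)  # provides %>%, mutate(), if_else(), and the other verbs used below
# Read in the cleaned dataset and preview the first five columns
felonies = readRDS(file = "./data/sex_drug_weapons.rds")
knitr::kable(head(felonies[1:5]))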
cmplnt_num | boro_nm | cmplnt_fr_dt | cmplnt_to_dt | cmplnt_fr_tm |
---|---|---|---|---|
642372589 | brooklyn | 2017-09-07 | 2017-09-07 | 06:15:00 |
865947766 | queens | 2014-11-08 | 2014-11-08 | 22:50:00 |
265604404 | bronx | 2014-04-10 | NA | 19:30:00 |
663741947 | brooklyn | 2017-08-12 | 2017-08-12 | 20:00:00 |
831735305 | brooklyn | 2017-06-29 | 2017-06-29 | 10:45:00 |
617379463 | staten_island | 2016-06-17 | 2016-06-17 | 14:30:00 |
knitr::kable(head(felonies[6:10]))
ky_cd | ofns_desc | pd_cd | pd_desc | vic_race |
---|---|---|---|---|
118 | dangerous weapons | 793 | weapons possession 3 | unknown |
117 | dangerous drugs | 510 | controlled substance, intent t | unknown |
117 | dangerous drugs | 501 | controlled substance,possess. | unknown |
118 | dangerous weapons | 792 | weapons possession 1 & 2 | unknown |
117 | dangerous drugs | 503 | controlled substance,intent to | unknown |
117 | dangerous drugs | 501 | controlled substance,possess. | unknown |
knitr::kable(head(felonies[11:15]))
vic_sex | vic_age_group | year | prem_typ_desc | crime_group |
---|---|---|---|---|
e | unknown | 2017 | residence-house | Weapons-Related |
e | NA | 2014 | street | Drug-Related |
e | NA | 2014 | street | Drug-Related |
e | unknown | 2017 | residence - apt. house | Weapons-Related |
e | unknown | 2017 | residence - public housing | Drug-Related |
e | unknown | 2016 | grocery/bodega | Drug-Related |
Note that not every observation has a value for cmplnt_to_dt. This could be due to several factors: perhaps the crime was never closed (i.e., it remained an ongoing crime), or perhaps the city was not able to record the value for that variable for whatever reason. To remedy this issue, we take two approaches. In the first, we impute an end date of December 31, 2017 for any crime with a missing cmplnt_to_dt and measure the length from the start date (cmplnt_fr_dt). This makes sense if the crime remained ongoing until the end of 2017 or beyond. Our resulting dataset is time_data. In the second, we exclude any observations with a missing end date. Our resulting dataset is time_data2. We will examine whether the average length of reported felonies differs when we use these two approaches.
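To make the contrast concrete, here is a small sketch of the two ways of handling a missing end date: imputing December 31, 2017, or dropping the incident. It uses base R and three hypothetical incidents; the variable names (cutoff, length_imputed, length_complete) are ours for illustration and do not appear in the analysis code.

```r
# Hypothetical incidents; the third one has no recorded end date
start_date <- as.Date(c("2017-09-07", "2016-01-01", "2014-04-10"))
end_date   <- as.Date(c("2017-09-07", "2016-01-15", NA))
cutoff     <- as.Date("2017-12-31")  # end of the study window

# Approach 1: impute the cutoff date wherever the end date is missing
end_imputed <- end_date
end_imputed[is.na(end_imputed)] <- cutoff
length_imputed <- as.numeric(end_imputed - start_date, units = "days")

# Approach 2: drop incidents with a missing end date
keep <- !is.na(end_date)
length_complete <- as.numeric(end_date[keep] - start_date[keep], units = "days")

print(mean(length_imputed))   # average over all three incidents
print(mean(length_complete))  # average over the two complete incidents
```

Even in this toy case, a single imputed incident that started in early 2014 contributes well over a thousand days, so the imputed average can be far larger than the complete-case average.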
In the table below, we impute the end dates for any crime with an “NA” for the variable cmplnt_to_dt. The first few rows of the resulting dataset, time_data, are shown below.
Table 2. Input End Dates for Crime Occurrence
time_data = felonies %>%
  # Make "Drug-Related" and "manhattan" the reference levels
  mutate(crime_group = forcats::fct_relevel(crime_group, "Drug-Related"),
         boro_nm = forcats::fct_relevel(boro_nm, "manhattan")) %>%
  janitor::clean_names() %>%
  # Length of the incident in days (NA when the end date is missing)
  mutate(time_diff2 = as.numeric(cmplnt_to_dt - cmplnt_fr_dt, units = "days")) %>%
  # Impute 2017-12-31 as the end date wherever the length is still NA
  mutate(time_diff2 = if_else(is.na(time_diff2),
                              as.Date("2017-12-31") - as.Date(cmplnt_fr_dt),
                              time_diff2)) %>%
  select(time_diff2, boro_nm, crime_group)
knitr::kable(head(time_data))
time_diff2 | boro_nm | crime_group |
---|---|---|
0 days | brooklyn | Weapons-Related |
0 days | queens | Drug-Related |
1361 days | bronx | Drug-Related |
0 days | brooklyn | Weapons-Related |
0 days | brooklyn | Drug-Related |
0 days | staten_island | Drug-Related |
Our second approach involves excluding any observations with missing end dates for crimes. The first few rows of the resulting dataset, time_data2, are shown in the table below.
Table 3. Exclude Missing End Dates for Crime Occurrence
time_data2 = felonies %>%
  # Make "Drug-Related" and "manhattan" the reference levels
  mutate(crime_group = forcats::fct_relevel(crime_group, "Drug-Related"),
         boro_nm = forcats::fct_relevel(boro_nm, "manhattan")) %>%
  janitor::clean_names() %>%
  # Length of the incident in days (NA when the end date is missing)
  mutate(time_diff2 = as.numeric(cmplnt_to_dt - cmplnt_fr_dt, units = "days")) %>%
  select(time_diff2, boro_nm, crime_group) %>%
  # Drop incidents whose length could not be computed
  filter(!is.na(time_diff2))
knitr::kable(head(time_data2))
time_diff2 | boro_nm | crime_group |
---|---|---|
0 | brooklyn | Weapons-Related |
0 | queens | Drug-Related |
0 | brooklyn | Weapons-Related |
0 | brooklyn | Drug-Related |
0 | staten_island | Drug-Related |
0 | brooklyn | Weapons-Related |
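We then create tables showing the average length of felonies in days by borough and crime group for both approaches. In these tables, the columns labeled “With End Date” come from time_data (missing end dates imputed), while the columns labeled “w/o End Date” and “Excluding NAs” come from time_data2 (observations with missing end dates dropped). Notice the dramatic change in both the counts and average length of felonies between the approaches.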
Table 4. Average Length of Felonies by Crime Group
tidy1 = time_data %>%
  rename(`Crime Group` = crime_group) %>%
  group_by(`Crime Group`) %>%
  summarise(`Count with End Date` = n(),
            `Avg. Length With End Date` = mean(time_diff2),
            `SD With End Date` = sd(time_diff2))
tidy2 = time_data2 %>%
  rename(`Crime Group` = crime_group) %>%
  group_by(`Crime Group`) %>%
  summarise(`Count w/o End Date` = n(),
            `Avg. Length Excluding NAs` = mean(time_diff2),
            `SD Excluding NAs` = sd(time_diff2))
merged_table <- merge(tidy1, tidy2, by = c("Crime Group"))
knitr::kable(merged_table)
Crime Group | Count with End Date | Avg. Length With End Date | SD With End Date | Count w/o End Date | Avg. Length Excluding NAs | SD Excluding NAs |
---|---|---|---|---|---|---|
Drug-Related | 18293 | 156.8032 days | 363.8860 | 14591 | 0.7349879 | 13.78610 |
Sex-Related | 8646 | 157.3583 days | 350.1052 | 7128 | 23.3329008 | 91.21814 |
Weapons-Related | 19753 | 150.4999 days | 355.5124 | 15865 | 0.3511451 | 10.15509 |
Table 5. Average Length of Felonies by Borough
tidy3 = time_data %>%
  filter(!is.na(boro_nm)) %>%
  rename(Borough = boro_nm) %>%
  group_by(Borough) %>%
  summarise(`Count with End Date` = n(),
            `Avg. Length With End Date` = mean(time_diff2),
            `SD With End Dates` = sd(time_diff2))
tidy4 = time_data2 %>%
  filter(!is.na(boro_nm)) %>%
  rename(Borough = boro_nm) %>%
  group_by(Borough) %>%
  summarise(`Count w/o End Date` = n(),
            `Avg. Length Excluding NAs` = mean(time_diff2),
            `SD Excluding NAs` = sd(time_diff2))
merged_table2 <- merge(tidy3, tidy4, by = c("Borough"))
knitr::kable(merged_table2)
Borough | Count with End Date | Avg. Length With End Date | SD With End Dates | Count w/o End Date | Avg. Length Excluding NAs | SD Excluding NAs |
---|---|---|---|---|---|---|
bronx | 12848 | 216.5919 days | 410.3039 | 9261 | 4.519990 | 41.55909 |
brooklyn | 15014 | 128.3765 days | 331.8315 | 12615 | 4.258720 | 38.26172 |
manhattan | 9200 | 151.6389 days | 349.8756 | 7380 | 4.274063 | 38.52996 |
queens | 7995 | 108.5520 days | 310.8985 | 6985 | 6.540014 | 47.96291 |
staten_island | 1634 | 139.4144 days | 337.9634 | 1343 | 7.300136 | 62.23328 |
From our tables above, we notice marked differences in the average length of incidents across boroughs and felony types. Notably, sex-related felonies have the longest average incident length under both approaches, although the gap is small when we impute end dates (about 157 days, versus about 157 and 150 days for drug-related and weapons-related felonies) and large when we exclude NAs (about 23 days, versus less than 1 day for the other two groups). In terms of boroughs, the Bronx ranks highest for average length of felonies when we impute end dates (about 217 days); when we exclude NAs, Staten Island ranks slightly higher than the rest of the boroughs (about 7.3 days, versus about 6.5 days for Queens).
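One way to see why the averages change so much between the two approaches is to back out, from the Table 4 figures alone, how many incidents per crime group had their end date imputed and what their average imputed length must be. The sketch below is an illustration rather than part of the original analysis. It assumes that time_data2 is exactly the subset of time_data with a recorded end date, and the column names (n_all, mean_observed, and so on) are hypothetical.

```r
# Counts and means copied from Table 4
tab <- data.frame(
  crime_group   = c("Drug-Related", "Sex-Related", "Weapons-Related"),
  n_all         = c(18293, 8646, 19753),              # imputed dataset (time_data)
  mean_all      = c(156.8032, 157.3583, 150.4999),
  n_observed    = c(14591, 7128, 15865),              # complete cases (time_data2)
  mean_observed = c(0.7349879, 23.3329008, 0.3511451)
)

# Incidents whose end date was imputed, and their share of each group
tab$n_imputed     <- tab$n_all - tab$n_observed
tab$share_imputed <- tab$n_imputed / tab$n_all

# The overall mean is a weighted average of the observed and imputed means,
# so the imputed mean can be recovered by rearranging that identity
tab$mean_imputed <- (tab$n_all * tab$mean_all -
                     tab$n_observed * tab$mean_observed) / tab$n_imputed

print(tab[, c("crime_group", "n_imputed", "share_imputed", "mean_imputed")])
```

Under that assumption, the imputed incidents make up a minority of each group but carry lengths of several hundred days each, which is what pulls the “With End Date” averages so far above the complete-case averages.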
View the “Differences in Mean Length of NYC Felonies” project under the “Data Analysis - R” page for formal statistical tests.