The goal of this research is to analyze Chicago car accident report data in order to classify the primary cause of an accident and answer the following questions:
Q1 - What is the distribution of car accident causes?
Q2 - In which regions do the most car accidents occur?
Q3 - What effect do external factors have on the number of car crashes and of car crashes with injuries?
The effect of time of day on car accidents.
The effect of weather on car accidents.
WHAT'S THE DISTRIBUTION OF THE CAUSES OF ACCIDENTS?
The most common crash type is Rear End, accounting for 30% of car crashes, followed by Sideswipe Same Direction at 16%.
The deadliest crash types by proportion are Turning at 19% and Angle at 13%. Since they account for the most fatalities, I recommend focusing on these, for example by adding better, separate traffic signals for turning.
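As a rough illustration of how shares like the ones above can be computed, here is a minimal sketch that counts crash types and converts the counts to proportions. The function name and the sample list are made up for the example; they are not the Chicago data or the code used in this analysis.

```python
from collections import Counter


def crash_type_shares(crash_types):
    """Return {crash_type: share of all crashes}, largest share first."""
    counts = Counter(crash_types)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.most_common()}


# Made-up sample for illustration only, not the real crash reports.
sample = (
    ["TURNING"] * 4
    + ["REAR END"] * 3
    + ["SIDESWIPE SAME DIRECTION"] * 2
    + ["ANGLE"]
)
shares = crash_type_shares(sample)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

The same idea applies to fatal crashes only: filter the reports to those with fatalities first, then compute the shares.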
Random Forest, XGBoost, and LinearSVC classifiers were implemented after re-sampling with SMOTE, since the dataset was heavily imbalanced. All three gave roughly the same results, within about 5% of each other, so I opted for the XGBoost classifier, using PCA to reduce the features. The features included were:
Driver’s Action, Driver’s Vision, Roadway Surface Condition, Device Condition, First Crash Type, Posted Speed Limit, Age, Physical Condition.
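To make the re-sampling step more concrete, here is a simplified sketch of the core SMOTE idea: a synthetic minority row is created by interpolating between an existing minority row and one of its nearest minority neighbours. This is not the imbalanced-learn implementation and not the code used in this project; the function name, the value of k, and the two numeric features are assumptions for illustration. Categorical features such as Driver's Action would need to be encoded before interpolation makes sense.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority rows.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical numeric features: (posted speed limit, driver age).
minority_rows = [(30.0, 22.0), (35.0, 25.0), (30.0, 40.0), (45.0, 31.0)]
new_rows = smote_like_oversample(minority_rows, n_new=6)
for row in new_rows:
    print(row)
```

Because every synthetic row lies on a line segment between two real minority rows, the new points stay inside the region the minority class already occupies.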
The model gave a log loss of 12.5 and an accuracy of 64%. The log loss measures how heavily the model is penalized for incorrect predictions, and the accuracy means it predicted the primary cause of an accident correctly only 64% of the time.
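For reference, here is a minimal sketch of how multiclass log loss is usually defined: the mean negative log of the probability the model assigned to the true class. The function, the clipping value eps, and the three-class example are assumptions for illustration, not the code behind the figures above. Note that log loss is meant to be computed from predicted probabilities. If hard 0/1 predictions are scored instead, every wrong prediction costs about 34.5 with this eps, so roughly 36% wrong predictions would give a value near 12.4. That is one possible reading of a log loss as high as 12.5, though how the value was computed is not stated here.

```python
import math


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs[label], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(true_labels)


# Hypothetical 3-class example: each row holds one probability per class.
y_true = [0, 2, 1]
soft = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
hard = [[1, 0, 0], [0, 0, 1], [1, 0, 0]]  # the same predictions as 0/1 labels

soft_loss = log_loss(y_true, soft)
hard_loss = log_loss(y_true, hard)
print(round(soft_loss, 3))
print(round(hard_loss, 3))
```

Both inputs make the same single mistake on the third row, but the hard labels are penalized far more because the wrong answer is given with full confidence.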
Its overall recall is 64%: of all the times a category was indeed the cause of an accident, this is the share the model classified correctly.
Its overall precision is 64%: of all the predictions the model made for a category, this is the share that were correct.
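The two definitions above can be sketched per category as follows. The helper names and the toy labels are made up for illustration. The sketch also shows why overall precision, recall, and accuracy can all come out identical: when every accident gets exactly one predicted cause and the scores are micro-averaged (pooled over all categories), all three reduce to correct predictions divided by total predictions. Whether micro-averaging was used for the 64% figures is an assumption on my part.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label seen."""
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each label was predicted
    actual = Counter(y_true)     # how often each label was the real cause
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = true_positives[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / actual[label] if actual[label] else 0.0
        report[label] = (precision, recall)
    return report


def micro_average(y_true, y_pred):
    """Pooled precision = pooled recall = accuracy for single-label data."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical causes, not the real categories from the dataset.
y_true = ["speeding", "speeding", "weather", "weather", "distraction"]
y_pred = ["speeding", "weather", "weather", "weather", "speeding"]

report = per_class_precision_recall(y_true, y_pred)
micro = micro_average(y_true, y_pred)
for label, (precision, recall) in report.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
print(f"micro-averaged score: {micro:.2f}")
```

Per-category values can differ a lot from the pooled figure, so on an imbalanced dataset it is worth looking at the per-category report as well.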