Image by Luca David

ACCIDENT CLASSIFIER

A simple web app that helps identify the cause of an accident using the XGBoost machine learning algorithm.

ACCIDENT CLASSIFIER: Project

PURPOSE

The goal of this research is to analyze Chicago car accident report data in order to classify the primary cause of an accident and answer the following questions:

  • Q1 - What is the distribution of car accident causes?

  • Q2 - In which regions do the most car accidents occur?

  • Q3 - What effect do external factors have on the number of car crashes and of car crashes with injuries?

    • The effect of the time of day on car accidents.

    • The effect of the weather on car accidents.


QUESTIONS


WHAT'S THE DISTRIBUTION OF THE CAUSES OF ACCIDENTS?

Conclusion

The most common type of car accident is the Rear End accident, accounting for 30% of car crashes, followed by Sideswipe Same Direction accidents, accounting for 16% of car crashes.
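
A distribution like the one above can be computed by counting crash-type labels and dividing by the total. The snippet below is a minimal sketch of that arithmetic; the function name `share_by_category` and the toy labels are hypothetical and are not taken from the Chicago data.

```python
from collections import Counter


def share_by_category(labels):
    """Return (category, percent) pairs, most frequent category first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(cat, 100.0 * n / total) for cat, n in counts.most_common()]


# Hypothetical toy labels for illustration only, NOT the real dataset.
toy_crash_types = ["REAR END"] * 5 + ["SIDESWIPE SAME DIRECTION"] * 3 + ["TURNING"] * 2

shares = share_by_category(toy_crash_types)
for crash_type, pct in shares:
    print(f"{crash_type}: {pct:.0f}%")
```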

Recommendation

The deadliest types of crashes, by proportion, are Turning at 19% and Angle at 13%. I recommend focusing on these, for example by installing better, separate traffic signals for turning, as they account for the most fatalities.


Model

Random Forest, XGBoost, and LinearSVC classifiers were implemented after re-sampling with SMOTE, since the dataset was heavily imbalanced. They all gave roughly the same results, give or take 5%, so I opted for the XGBoost classifier, with PCA as its feature reduction step. The features included were:

Driver’s Action, Driver’s Vision, Roadway Surface Condition, Device Condition, First Crash Type, Posted Speed Limit, Age, Physical Condition. 
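
The project's own code is not shown here, so the following is only a stdlib sketch of the idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. The function `smote_sketch`, its parameters, and the toy points are hypothetical, and the sketch assumes the features have already been encoded as numbers. It is not the library implementation.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical numeric minority-class samples (two features each).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (3.0, 3.0)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points), "synthetic samples created")
```

Because every synthetic point lies between two real minority samples, the re-sampled class stays inside the region the original samples already cover.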


RESULTS

The model gave a log loss of 12.5 and an accuracy of 64%. The log loss is the penalty the model incurs for incorrect predictions (lower is better), while the accuracy means it predicted the primary cause of only 64% of accidents correctly.
Its overall recall is 64%: of all the accidents whose cause really was a given category, this is the share the model classified as that category.
Its overall precision is 64%: of all the predictions the model made for a given category, this is the share that were correct.
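
The metrics above can be reproduced by hand on a small example. The sketch below uses hypothetical toy labels and helper names; it is not the project's evaluation code. In single-label multiclass problems, micro-averaged precision and recall always equal accuracy, which would be consistent with all three figures being 64%, although the averaging used is not stated. The last lines also show that scoring hard 0/1 predictions with log loss, with probabilities clipped at 1e-15 (an assumption of this sketch), gives a value close to 12.5 at 64% accuracy. That is one possible reading of the reported figure, not a confirmed one.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for single-label predictions."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each label was predicted
    actual = Counter(y_true)     # how often each label was the true cause
    labels = sorted(set(y_true) | set(y_pred))
    return {
        c: (tp[c] / predicted[c] if predicted[c] else 0.0,
            tp[c] / actual[c] if actual[c] else 0.0)
        for c in labels
    }


def log_loss(y_true, probas, eps=1e-15):
    """Mean negative log of the probability given to the true label.
    `probas` is a list of {label: probability} dicts, one per sample."""
    total = 0.0
    for t, row in zip(y_true, probas):
        p = min(max(row.get(t, 0.0), eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical toy labels, for illustration only.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "C", "C"]
acc = accuracy(y_true, y_pred)
scores = per_class_precision_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}", scores)

# 100 hard (0/1) predictions, 64 of them correct.
hard_true = ["A"] * 100
hard_probas = [{"A": 1.0}] * 64 + [{"B": 1.0}] * 36
hard_loss = log_loss(hard_true, hard_probas)
print(f"log loss of hard predictions: {hard_loss:.2f}")
```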


Future Work

Road Type Division: Segregate the different types of streets/roads to understand the unique properties of the accidents that occur on each.

More Data: Gather more data, such as whether a driver was on the phone, exceeded the posted speed limit, or has a good amount of driving experience.

Region Division: Deeper analysis of the primary causes of accidents in the North, South, East, West, and Central regions of the city.
