College Recommendation System
Category:
Recommendation Systems, Predictive Modelling
Skills:
Random Forest, Logistic Regression, Feature Engineering, Data Encoding
Problem Context
The project aimed to build a predictive recommendation system that suggests colleges to students based on their preferences and attributes. The focus was on data scraping, preprocessing, and supervised modeling to show how raw educational data can be turned into a useful guidance tool.
Collection
I scraped college-related data from multiple websites to create the dataset.
Final dataset: ~56,000 records × 24 features (admission stats, location, fees, rankings, etc.)
Data stored in structured CSV for analysis
Preparation
The dataset was cleaned and transformed for modeling.
Handled missing values via imputation
Encoded categorical features (e.g., location, type)
Normalized numerical features for consistency
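The three preparation steps can be sketched as a single scikit-learn preprocessing object. This is a minimal illustration, not the project's actual code: the column names, the toy rows, and the choice of median/most-frequent imputation, one-hot encoding, and standard scaling are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real dataset has 24 features.
numeric_cols = ["fees", "acceptance_rate"]
categorical_cols = ["location", "type"]

# Made-up rows with gaps, for illustration only.
df = pd.DataFrame({
    "fees": [12000.0, np.nan, 30000.0, 18000.0],
    "acceptance_rate": [0.45, 0.60, np.nan, 0.30],
    "location": ["North", "South", np.nan, "North"],
    "type": ["Public", "Private", "Public", np.nan],
})

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)
```

Keeping the steps in one object means the same imputation values, categories, and scaling statistics learned from the training data can be reused on new rows via `preprocess.transform`.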
Baseline
I started with simple models to benchmark feasibility.
Logistic Regression → baseline accuracy ~80%
Helped validate feature quality
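A baseline run of this kind might look like the sketch below. It uses synthetic data from `make_classification` as a stand-in for the scraped dataset, so the printed accuracy says nothing about the figures reported here; the split ratio and `max_iter` value are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared feature matrix (24 features, as in the dataset).
X, y = make_classification(
    n_samples=1000,
    n_features=24,
    n_informative=8,
    n_clusters_per_class=1,
    random_state=0,
)

# Simple hold-out split for a quick feasibility check.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Plain logistic regression with default regularization as the baseline.
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)

baseline_accuracy = baseline.score(X_test, y_test)
print(f"Baseline accuracy: {baseline_accuracy:.2f}")
```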
Modeling
I applied ensemble methods to improve prediction accuracy.
Trained Random Forest with 60/20/20 train/validation/test split
Tuned hyperparameters (tree depth, number of estimators)
Achieved final accuracy of 92%
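The split and tuning steps above can be sketched as follows. This is an illustration under assumptions: the data is synthetic, the candidate values for tree depth and number of estimators are made up, and a plain loop over the validation set stands in for whatever search procedure was actually used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared dataset.
X, y = make_classification(
    n_samples=1000, n_features=24, n_informative=8, random_state=0
)

# 60/20/20: hold out 20% for test, then 25% of the remaining 80% for validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0
)

# Try each combination and keep the one that scores best on the validation set.
best_score, best_params = -1.0, None
for max_depth in (5, 10, None):
    for n_estimators in (50, 100):
        model = RandomForestClassifier(
            max_depth=max_depth, n_estimators=n_estimators, random_state=0
        )
        model.fit(X_train, y_train)
        score = model.score(X_val, y_val)
        if score > best_score:
            best_score = score
            best_params = {"max_depth": max_depth, "n_estimators": n_estimators}

# Refit with the chosen settings and touch the test set only once, at the end.
final_model = RandomForestClassifier(random_state=0, **best_params)
final_model.fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print(best_params, f"test accuracy: {test_accuracy:.2f}")
```

Tuning against the validation split keeps the test split untouched until the final measurement, so the reported test accuracy is not biased by the hyperparameter search.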
Evaluation
I compared models across metrics to ensure reliability.
Accuracy: 92% (Random Forest), 86% (Logistic Regression)
Confusion matrix showed balanced performance across classes
Feature importance ranked “fees,” “acceptance rate,” and “location” as top predictors
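The two diagnostics above can be reproduced with standard scikit-learn calls. The sketch below uses hand-written labels for the confusion matrix and synthetic data with neutral feature names (`f0` to `f4`) for the importance ranking, so none of the numbers reflect the real dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Hand-written labels for three hypothetical classes.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Per-class recall: correct predictions divided by the number of true samples per class.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(per_class_recall)

# Synthetic data in which only the first two columns carry signal (shuffle=False).
feature_names = ["f0", "f1", "f2", "f3", "f4"]
X, y = make_classification(
    n_samples=500,
    n_features=5,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    shuffle=False,
    random_state=0,
)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort them from most to least important.
importances = forest.feature_importances_
order = np.argsort(importances)[::-1]
ranked_names = [feature_names[i] for i in order]
print(ranked_names)
```

Similar per-class recall values indicate that no single class accounts for most of the errors, which overall accuracy alone would not show.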
Refinement
The recommendation output was structured for easy interpretation.
Designed rules to recommend top-N colleges per student profile
System could adapt to new scraped data without redesign
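One way a top-N rule could work is sketched below. The `Candidate` type, the `recommend_top_n` function, the fee-budget filter, and the scores are all hypothetical; the actual rules used in the project are not described here.

```python
import heapq
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    college: str
    score: float  # hypothetical model score for this student profile
    fees: float


def recommend_top_n(
    candidates: List[Candidate], n: int, max_fees: Optional[float] = None
) -> List[str]:
    """Return the names of the n highest-scoring colleges within the fee budget."""
    eligible = [c for c in candidates if max_fees is None or c.fees <= max_fees]
    top = heapq.nlargest(n, eligible, key=lambda c: c.score)
    return [c.college for c in top]


# Made-up scores for a single student profile.
candidates = [
    Candidate("College A", 0.91, 25000.0),
    Candidate("College B", 0.84, 15000.0),
    Candidate("College C", 0.77, 18000.0),
    Candidate("College D", 0.65, 9000.0),
]

print(recommend_top_n(candidates, n=2))
print(recommend_top_n(candidates, n=2, max_fees=20000.0))
```

Because the ranking only consumes scores and attributes, newly scraped colleges can be added to the candidate list without changing the rule itself.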
Conclusion
The final Random Forest–based system achieved 92% accuracy and successfully recommended colleges based on student profiles. This project shows how raw scraped data can be cleaned, modeled, and transformed into a practical recommendation pipeline.