Natural Language to Database Query System
Category: Natural Language Processing
Skills: SQL, NLTK
Problem Context
The goal of this project was to design a system that allows users to query a database using natural language instead of SQL. The challenge was to bridge the gap between human-friendly input and machine-executable queries, demonstrating both NLP parsing skills and database knowledge.
Collection
I created a structured database and defined realistic query tasks.
Database contained tables for students, courses, and enrollment (~50k records)
Queries included joins, aggregations, and filtering conditions
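A database of this shape can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the original schema: the write-up names only the three tables, so every column name here (apart from major, which appears in the baseline example) and the sample rows are assumptions.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical columns for the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE students (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            major TEXT
        );
        CREATE TABLE courses (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            department TEXT
        );
        CREATE TABLE enrollment (
            student_id INTEGER REFERENCES students(id),
            course_id INTEGER REFERENCES courses(id),
            grade REAL
        );
    """)
    # A handful of made-up rows standing in for the ~50k records.
    conn.executemany(
        "INSERT INTO students VALUES (?, ?, ?)",
        [(1, "Ada", "Computer Science"), (2, "Ben", "Biology")],
    )
    conn.executemany(
        "INSERT INTO courses VALUES (?, ?, ?)",
        [(10, "Databases", "Computer Science")],
    )
    conn.executemany(
        "INSERT INTO enrollment VALUES (?, ?, ?)",
        [(1, 10, 3.7), (2, 10, 3.1)],
    )
    conn.commit()
    return conn


# One query task combining a join with aggregation.
JOIN_AGG_SQL = """
    SELECT c.title, COUNT(*) AS n_students, AVG(e.grade) AS avg_grade
    FROM enrollment e
    JOIN courses c ON c.id = e.course_id
    GROUP BY c.title
"""
```

Running conn.execute(JOIN_AGG_SQL) on the connection returned by build_demo_db() gives one row per course with its enrollment count and average grade.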
Preparation
I set up the mapping between natural language and database schema.
Built a dictionary of schema terms (e.g., “class” → course)
Preprocessed input text: tokenization, lemmatization, stopword removal
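The two preparation steps can be sketched as follows. To keep the example runnable without downloading corpora, it swaps in a regex tokenizer, a tiny hand-written stopword set, and dictionary lookups in place of NLTK's tokenizer, stopword list, and lemmatizer. Every dictionary entry other than “class” → course is hypothetical.

```python
import re

# Hypothetical synonym dictionary: user vocabulary -> schema term.
# Listing plural forms lets the lookup double as a crude lemmatizer.
SCHEMA_TERMS = {
    "class": "course",
    "classes": "course",
    "courses": "course",
    "pupil": "student",
    "students": "student",
}

# Tiny stand-in stopword set; NLTK's English stopword corpus would be the fuller option.
STOPWORDS = {"a", "an", "the", "all", "of", "me", "please", "for"}


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, drop stopwords, and map tokens to schema terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    return [SCHEMA_TERMS.get(t, t) for t in kept]
```

For example, preprocess("Show me all the classes") keeps "show" and maps "classes" to the schema term "course".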
Baseline
A simple rule-based parser was built as proof of concept.
Example: “List all students in Computer Science” → SELECT * FROM students WHERE major='Computer Science'
Worked well for direct keyword mappings
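A rule-based parser of this kind can be approximated with a list of regex patterns paired with SQL templates. The sketch below is an assumption about how such rules might look, not the original implementation; only the first rule mirrors the example above, and the second is hypothetical.

```python
import re

# Each rule: (question pattern, parameterized SQL template, captured field names).
RULES = [
    (
        re.compile(r"^list all students in (?P<major>.+)$", re.IGNORECASE),
        "SELECT * FROM students WHERE major = ?",
        ("major",),
    ),
    (
        re.compile(r"^how many students are in (?P<major>.+)$", re.IGNORECASE),
        "SELECT COUNT(*) FROM students WHERE major = ?",
        ("major",),
    ),
]


def parse(question: str):
    """Return (sql, params) for the first matching rule, or None if nothing matches."""
    cleaned = question.strip().rstrip("?.!")
    for pattern, sql, fields in RULES:
        match = pattern.match(cleaned)
        if match:
            return sql, tuple(match.group(f).strip() for f in fields)
    return None
```

The templates use ? placeholders rather than pasting the captured text into the SQL string, so user input is passed to the database as a bound parameter. Questions that match no rule return None, which is the limitation that motivates a learned model.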
Modeling
I expanded the system to handle more complex queries.
Implemented a seq2seq model trained on NL–SQL pairs
Used attention mechanisms for mapping natural language to SQL structure
Supported nested queries, conditions, and ordering
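To illustrate what the attention step computes, here is a minimal pure-Python sketch of a single decoding step. The write-up does not say which attention variant was used, so plain dot-product scoring is an assumption, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step using dot-product scores."""
    # Score each encoder state (one per input token) against the decoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of encoder states.
    dim = len(decoder_state)
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context
```

The weights form a distribution over input tokens, so when the decoder emits a SQL token such as a column name, the largest weight indicates which word of the question it drew on most.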
Evaluation
I compared generated SQL with ground truth queries.
Metrics: Exact Match Accuracy ~85% on test set
Executed queries against the database to validate correctness
Handled edge cases like synonyms (“professor” vs “instructor”)
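The two checks described above, string-level exact match and execution against the database, can be sketched as follows. The normalization rules and function names are assumptions; the write-up does not specify how queries were normalized before comparison.

```python
import re
import sqlite3


def normalize_sql(sql: str) -> str:
    """Collapse whitespace and drop a trailing semicolon; case is left untouched."""
    return re.sub(r"\s+", " ", sql).strip().rstrip(";").strip()


def exact_match_accuracy(predicted, gold) -> float:
    """Fraction of predictions whose normalized text equals the ground truth."""
    pairs = list(zip(predicted, gold))
    hits = sum(normalize_sql(p) == normalize_sql(g) for p, g in pairs)
    return hits / len(pairs) if pairs else 0.0


def execution_match(conn, predicted_sql: str, gold_sql: str) -> bool:
    """True when both queries run and return the same rows, ignoring row order."""
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # Invalid generated SQL counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)
```

Exact match is strict: two queries that differ in text but return the same rows count as a miss, which is why executing both against the database is a useful second check.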
Refinement
The system was optimized for usability and transparency.
Added a feedback loop: users could edit SQL output if parsing was wrong
Provided query explanation in plain English to increase trust
Integrated with a simple web UI for demo purposes
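A plain-English explanation can be generated from a template for the simplest queries. The sketch below is one possible approach under that assumption, not the original explainer; it handles only single-table SELECT statements with an optional equality filter and declines anything more complex.

```python
import re

# Matches: SELECT <cols> FROM <table> [WHERE <col> = '<value>'] [;]
SIMPLE_SELECT = re.compile(
    r"^SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)')?\s*;?$",
    re.IGNORECASE,
)


def explain(sql: str) -> str:
    """Describe a simple SELECT in plain English, or say it is out of scope."""
    match = SIMPLE_SELECT.match(sql.strip())
    if not match:
        return "This query is too complex for the template-based explainer."
    cols = match.group("cols").strip()
    cols_text = "all columns" if cols == "*" else cols
    text = f"Show {cols_text} from the {match.group('table')} table"
    if match.group("col"):
        text += f" where {match.group('col')} is '{match.group('val')}'"
    return text + "."
```

Showing this sentence next to the generated SQL lets a non-technical user confirm the intent, while an advanced user can still edit the SQL directly.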
Conclusion
The system successfully translated natural language questions into SQL queries with ~85% exact match accuracy. It demonstrated how NLP and database knowledge can be combined to make data more accessible for non-technical users, while still allowing transparency and control for advanced users.