About
I’m a junior AI/ML engineer with a background in software development and a growing interest in building real-world AI applications. I enjoy working on projects where I can take an idea from raw data all the way to a working system that people can actually use.
Recently, I built a RAG-based hiking planner that combines vector search (ChromaDB) with a local LLM (Llama 3 via Ollama) to recommend hiking trails in the UK. I linked trail data to train station data, added filtering by distance and difficulty, and built an interactive map with Streamlit. I also completed a Crime Prediction Dashboard project using real data from Bristol, for which I developed a machine learning pipeline, trained XGBoost models, and deployed the results in an interactive dashboard.
Alongside this, I work as an AI Data Trainer, helping to improve large language models by evaluating responses, testing prompts, and identifying errors in reasoning and code generation. This role has given me a clearer understanding of how LLMs behave and how to improve output quality.
Before focusing on AI, I worked as a Junior Software Engineer and Web Developer Intern, using technologies like React, Spring Boot, and Flutter. This experience helped me understand how to build complete applications, not just models.
I enjoy solving practical problems using data and continuously learning new tools in AI. Outside of tech, I like cycling and bowling.
- Python
- R
- SQL
- Pandas
- NumPy
- Matplotlib
- Scikit-learn
- TensorFlow
- Tableau
- Streamlit
- OpenAI Gym
- Docker
- MongoDB
- Firebase
- PostgreSQL
- Git
- Google Cloud
- ChromaDB
- Llama 3
Experience
February 2024 - Present
- Supported training and fine-tuning of large language models (LLMs) through prompt engineering, response evaluation, and systematic error identification.
- Improved model performance on reasoning, code generation, and instruction-following tasks while ensuring compliance with project guidelines and data privacy standards.
- Prompt Engineering
- LLM Evaluation
- HTML
- CSS
- JavaScript
January 2023 - July 2023
- Developed WordPress websites using Divi page builder, including custom themes, layouts, plugins, and Gutenberg modules with PHP and JavaScript.
- Conducted quality assurance (QA) testing and applied basic SEO principles to ensure functionality, responsiveness, and visibility.
- Completed a full client website (50+ pages) under tight deadlines, handling custom development, integration, and project management.
- Strengthened problem-solving, attention to detail, and time management skills while managing multiple development tasks.
- PHP
- JavaScript
- WordPress
- Divi
- Custom Plugins
- Quality Assurance
- SEO
September 2021 - February 2022
- Developed front-end web applications using ReactJS and NextJS, back-end APIs with Spring Boot, and mobile applications using Flutter.
- Conducted SEO optimisation and generated reports using Java Jasper, while gaining exposure to ERP processes including sales and purchasing documents.
- Utilised internal frameworks to achieve project targets while demonstrating transparency, accountability, and a strong work ethic.
- Java
- Flutter
- Spring Boot
- ReactJS
- NextJS
- Jasper
- SEO
Projects
Built a local RAG application that recommends UK hiking trails accessible by train. Designed an end-to-end pipeline combining vector search (ChromaDB), local LLM inference (Llama 3 via Ollama), and geospatial distance calculations (Haversine formula) to link trail data with real UK train station data. Implemented semantic search with metadata filtering for difficulty and distance preferences, and built an interactive map interface using Streamlit and PyDeck.
- Python
- PyDeck
- Streamlit
- RAG
- Llama 3
- Ollama
- ChromaDB
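As an illustration of the geospatial step in the hiking planner, here is a minimal sketch of the Haversine distance and a nearest-station lookup. The function names, the dictionary layout, and the sample coordinates are hypothetical and chosen for the example; they are not taken from the project code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(trail_lat, trail_lon, stations):
    """Return the station dict closest to the trail start point."""
    return min(
        stations,
        key=lambda s: haversine_km(trail_lat, trail_lon, s["lat"], s["lon"]),
    )


# Hypothetical stations with made-up coordinates, for illustration only.
stations = [
    {"name": "Station A", "lat": 51.0, "lon": -2.0},
    {"name": "Station B", "lat": 53.0, "lon": -1.0},
]
closest = nearest_station(51.1, -2.1, stations)
```

A linear scan like this is enough for a small station list; a larger dataset might call for a spatial index instead.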
Developed a machine learning pipeline using real crime data from Bristol. Performed data cleaning, feature engineering (time, location, crime type), and aggregated data at postcode level. Trained XGBoost models to predict monthly crime counts and classify likely crime types. Evaluated the models using F1-score and MSE. Deployed an interactive dashboard via Streamlit to visualise predictions.
- Python
- Streamlit
- ML Pipelines
- Scikit-learn
- Matplotlib
- Pandas
- NumPy
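The aggregation and evaluation steps of the crime pipeline can be sketched roughly as follows. This is a standard-library illustration, assuming records are grouped by postcode and calendar month; the field names (`postcode`, `date`, `crime_type`) and the sample rows are hypothetical, and the project itself may well have done this with Pandas.

```python
from collections import Counter


def monthly_counts(records):
    """Count incidents per (postcode, 'YYYY-MM') pair."""
    counts = Counter()
    for record in records:
        month = record["date"][:7]  # ISO date 'YYYY-MM-DD' -> 'YYYY-MM'
        counts[(record["postcode"], month)] += 1
    return dict(counts)


def mean_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted counts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up sample rows, for illustration only.
sample = [
    {"postcode": "BS1", "date": "2023-01-04", "crime_type": "burglary"},
    {"postcode": "BS1", "date": "2023-01-19", "crime_type": "theft"},
    {"postcode": "BS1", "date": "2023-02-02", "crime_type": "theft"},
    {"postcode": "BS2", "date": "2023-01-11", "crime_type": "burglary"},
]
counts = monthly_counts(sample)
```

The resulting per-postcode monthly counts are the kind of target a regression model would be trained on, with MSE comparing predicted and observed counts.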
Pothole Severity Detection
This project implements a pothole severity detection system using YOLOv5 for object detection. I parsed VOC XML annotations to extract the bounding boxes around potholes, resized the images, and converted the annotations to YOLO format for training. The dataset was split into train (80%), validation (10%), and test (10%) sets. I explored data augmentation techniques such as horizontal flips and brightness adjustments with Albumentations to improve model robustness. The YOLOv5 model was trained with an image size of 416, a batch size of 16, and 50 epochs. Evaluation included confusion matrices, visualised with Seaborn and Matplotlib, to assess classification accuracy across the three severity levels (Immediate, Moderate, No Immediate Attention). This project showcases my experience in applying Python, computer vision, dataset preparation, and deep learning to real-world applications.
- Python
- PyTorch
- Scikit-learn
- Matplotlib
- Seaborn
- Jupyter Notebook
- YOLO
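The annotation conversion in the pothole project can be sketched as below: a Pascal VOC box given as pixel corners becomes a YOLO row of class id plus normalised centre, width, and height. The function name, the class-id mapping, and the sample XML are assumptions made for this example, not the project's actual code or data.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from severity label to class id (hypothetical ordering).
CLASS_IDS = {"Immediate": 0, "Moderate": 1, "No Immediate Attention": 2}


def voc_to_yolo(xml_text, class_ids):
    """Convert one VOC annotation to YOLO rows: (class, x_centre, y_centre, w, h)."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    rows = []
    for obj in root.iter("object"):
        cls = class_ids[obj.find("name").text]
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # YOLO stores the box centre and size as fractions of the image size.
        rows.append((
            cls,
            (xmin + xmax) / 2 / img_w,
            (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w,
            (ymax - ymin) / img_h,
        ))
    return rows


# Made-up annotation, for illustration only.
SAMPLE_XML = """
<annotation>
  <size><width>400</width><height>200</height></size>
  <object>
    <name>Moderate</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>
"""
rows = voc_to_yolo(SAMPLE_XML, CLASS_IDS)
```

Because the coordinates are normalised, the same rows stay valid when an image is resized without cropping or padding, for example to the 416 training size.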
Music-Recommendation-System
Developed a hybrid music recommendation system combining a knowledge-based system (KBS) with Graph Neural Networks (GNNs) to deliver accurate song recommendations. Using two datasets (song and artist terms), we preprocessed the data by handling null values, normalising numerical features such as tempo and duration, and creating a knowledge graph with NetworkX, which was converted to a DGL graph for GCN training. The GCN model, built with PyTorch, used two convolutional layers with ReLU activation and was optimised through hyperparameter tuning (learning rate: 0.01, hidden size: 16), achieving a training accuracy of 0.5022. The system recommends the top 5 songs based on song and artist features and was evaluated using intra-similarity scores (e.g., 0.9994 for the '90s' input). Ethical considerations included compliance with GDPR and copyright law, as well as data privacy and fairness. This project showcases my expertise in Python, NetworkX, PyTorch, DGL, data preprocessing, and collaborative problem-solving.
- Python
- Jupyter Notebook
- PyTorch
- Knowledge-based Systems (KBS)
- Graph Neural Networks (GNN)
Student Performance Analysis
For my university coursework, I developed a machine learning pipeline to predict student academic performance using the Higher Education Students Performance Evaluation dataset, identifying key factors influencing grades for stakeholders such as educational institutions and teachers. During data exploration, I visualised patterns and outliers. I then preprocessed the data by imputing missing values and removing irrelevant features, and addressed class imbalance using SMOTE and RandomUnderSampler. I selected impactful features via univariate selection and correlation analysis, then trained and optimised models including Logistic Regression, Random Forest (achieving 44.83% accuracy), KNN, Decision Tree, SVM, and RNN using GridSearchCV, evaluating performance with confusion matrices and ROC curves. I handled the data ethically under the dataset’s Creative Commons license, ensuring anonymity and fairness. The project strengthened my skills in Python, scikit-learn, TensorFlow, and independent problem-solving while delivering actionable insights for educational decision-making.
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- TensorFlow
- Jupyter Notebook