Data Science, Data Analytics
Flickr Photo Dataset — Exploratory Data Analysis
EDA on 64,000+ photo records across XML & JSON sources
Performed end-to-end exploratory data analysis (EDA) on a Flickr photo dataset of 64,000+ records, drawn from XML and JSON sources, to surface behavioural, geographic, and publishing-pattern insights.
Python, Pandas, EDA, XML, JSON, Data Cleaning, Data Visualisation, Statistical Analysis
Problem
A large, heterogeneous Flickr metadata corpus split across XML and JSON files needed unified parsing and exploratory analysis to surface meaningful behavioural and geographic patterns.
My role
Data Analyst — owned ingestion, parsing, cleaning, EDA, and visualisation.
Solution
Built a reproducible Pandas pipeline to parse and merge XML/JSON sources, then ran univariate, bivariate, and multivariate analyses with structured visualisations to summarise behavioural, geographic, and publishing-latency insights.
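A minimal sketch of the parse-and-merge step, assuming Flickr-API-style inputs: XML where each `<photo>` element carries its fields as attributes, and JSON wrapped in a `{"photos": {"photo": [...]}}` envelope. The sample strings, the `parse_xml`, `parse_json` and `load_all` helpers, and the `id`, `owner`, `datetaken` and `country` fields are hypothetical stand-ins, not the project's actual files or code.

```python
import json
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample inputs; the real files' layout is an assumption here.
XML_SAMPLE = """<photos>
  <photo id="1" owner="alice" datetaken="2015-03-01 10:00:00" country="Australia"/>
  <photo id="2" owner="bob" datetaken="2015-03-02 12:30:00" country="Japan"/>
</photos>"""

JSON_SAMPLE = """{"photos": {"photo": [
  {"id": "3", "owner": "carol", "datetaken": "2015-03-03 08:15:00", "country": "Australia"}
]}}"""


def parse_xml(text: str) -> pd.DataFrame:
    """Turn each <photo> element's attributes into one row."""
    root = ET.fromstring(text)
    return pd.DataFrame([photo.attrib for photo in root.iter("photo")])


def parse_json(text: str) -> pd.DataFrame:
    """Flatten the assumed {"photos": {"photo": [...]}} envelope into rows."""
    payload = json.loads(text)
    return pd.json_normalize(payload["photos"]["photo"])


def load_all(xml_text: str, json_text: str) -> pd.DataFrame:
    """Parse both sources, tag each row with its origin, and stack them."""
    xml_df = parse_xml(xml_text).assign(source="xml")
    json_df = parse_json(json_text).assign(source="json")
    # Stack both sources; a column missing from one side becomes NaN there.
    combined = pd.concat([xml_df, json_df], ignore_index=True)
    # If a photo appears in both sources, keep only its first occurrence.
    return combined.drop_duplicates(subset="id", keep="first")


photos = load_all(XML_SAMPLE, JSON_SAMPLE)
print(photos[["id", "owner", "source"]])
```

Stacking with `pd.concat` suits sources that hold different photos; if the XML and JSON instead carried complementary fields for the same photos, a join such as `pd.merge(xml_df, json_df, on="id")` would be the better fit.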
Challenges
- Reconciling XML and JSON schemas into one analytical view.
- Handling temporal, geographic, and tag dimensions consistently.
- Translating multi-dimensional patterns into clear narratives.
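One way to tackle the first two challenges is to map each source onto a single canonical column set and coerce temporal, geographic and tag fields in one place. In this sketch the field names in `COLUMN_MAP`, the `reconcile` helper, the assumption that upload times are Unix timestamps in seconds, and the space-separated tag strings are all hypothetical.

```python
import pandas as pd

# Hypothetical per-source field names mapped onto one canonical schema.
COLUMN_MAP = {
    "xml": {"datetaken": "taken_at", "dateupload": "uploaded_at", "lat": "latitude", "lon": "longitude"},
    "json": {"date_taken": "taken_at", "date_posted": "uploaded_at"},
}
CANONICAL = ["id", "taken_at", "uploaded_at", "latitude", "longitude", "tags"]


def reconcile(df: pd.DataFrame, source: str) -> pd.DataFrame:
    """Rename, align and type-coerce one source frame to the canonical schema."""
    out = df.rename(columns=COLUMN_MAP[source])
    # Columns a source lacks are added as all-NaN so every frame has the same shape.
    out = out.reindex(columns=CANONICAL)
    # Temporal: capture time as a datetime; unparseable values become NaT.
    out["taken_at"] = pd.to_datetime(out["taken_at"], errors="coerce")
    # Assumption: upload time is a Unix timestamp in seconds in both sources.
    out["uploaded_at"] = pd.to_datetime(pd.to_numeric(out["uploaded_at"], errors="coerce"), unit="s")
    # Geographic: coordinates as floats; blanks and junk become NaN.
    for col in ("latitude", "longitude"):
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Tags: split an assumed space-separated string into a list; missing tags become [].
    out["tags"] = out["tags"].apply(lambda s: s.split() if isinstance(s, str) else [])
    return out


# Hypothetical one-row samples, one per source.
xml_df = pd.DataFrame({"id": ["1"], "datetaken": ["2015-03-01 10:00:00"], "dateupload": ["1425207600"],
                       "lat": ["-33.87"], "lon": ["151.21"], "tags": ["beach sydney"]})
json_df = pd.DataFrame({"id": ["2"], "date_taken": ["2015-03-02 09:00:00"], "date_posted": [1425290400],
                        "latitude": [""], "longitude": [None], "tags": [None]})

unified = pd.concat([reconcile(xml_df, "xml"), reconcile(json_df, "json")], ignore_index=True)
print(unified)
```

Coercing with `errors="coerce"` keeps malformed values visible as `NaT`/`NaN` instead of failing the whole load, so they can be counted and reported during cleaning.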
Key features
- Parsed and merged XML and JSON datasets into a single analysis-ready structure.
- Conducted univariate, bivariate, and multivariate analysis across temporal, geographic, tag, and publishing-latency dimensions.
- Analysed photo metadata across 20+ countries and Australian states.
- Identified behavioural insights around minor-city users, tag distribution, and publication latency.
- Generated downstream ML question ideas for geo-tag quality control, recommendation, and latency prediction.
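The latency and tag analyses listed above might be sketched as follows, assuming a cleaned frame with `country`, `taken_at`, `uploaded_at` and list-valued `tags` columns (hypothetical names, with made-up sample rows). Publishing latency is taken here as upload time minus capture time in hours; that definition, and treating negative values as a data-quality flag, are assumptions rather than the project's documented choices.

```python
import pandas as pd

# Hypothetical cleaned frame; rows and column names are illustrative only.
photos = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "country": ["Australia", "Australia", "Japan", "Japan", "Japan"],
    "taken_at": pd.to_datetime(["2015-03-01 10:00", "2015-03-02 08:00", "2015-03-03 09:00",
                                "2015-03-04 12:00", "2015-03-05 12:00"]),
    "uploaded_at": pd.to_datetime(["2015-03-01 12:00", "2015-03-05 08:00", "2015-03-03 10:00",
                                   "2015-03-04 18:00", "2015-03-05 11:00"]),
    "tags": [["beach", "sydney"], ["sydney"], ["tokyo"], ["tokyo", "night"], ["tokyo"]],
})

# Publishing latency: hours between capture and upload.
photos["latency_h"] = (photos["uploaded_at"] - photos["taken_at"]).dt.total_seconds() / 3600

# Upload before capture (e.g. a misset camera clock) is flagged, not silently dropped.
photos["latency_valid"] = photos["latency_h"] >= 0

# Bivariate view: latency summary per country, using valid rows only.
latency_by_country = (
    photos[photos["latency_valid"]]
    .groupby("country")["latency_h"]
    .agg(["count", "median", "mean"])
)

# Tag distribution: one row per (photo, tag) pair, then count occurrences.
tag_counts = photos.explode("tags")["tags"].value_counts()

print(latency_by_country)
print(tag_counts.head())
```

Medians are reported alongside means because latency distributions can be right-skewed by a few very late uploads.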
Results
- A cohesive EDA report covering 64,000+ records.
- Documented behavioural insights and downstream ML question ideas.
- Reusable parsing and EDA templates.
64,000+ records analysed
20+ countries covered
Multi-dimensional EDA report