NBA teams often make multimillion-dollar decisions based on player performance, yet salary allocations don't always reflect on-court value. I wanted to explore this imbalance using real data from the 2024–25 season and identify overpaid and underpaid players with a data-driven approach.
My goal was to build a fully automated data pipeline that collects player statistics and salary data, calculates performance-based value metrics, and delivers interactive visual insights. The system needed to refresh every 4 hours, ensuring timely updates throughout the season.
I pulled player statistics with nba_api and used Python to preprocess and clean the data (including name normalization and standardization of team abbreviations) before loading it into a PostgreSQL database.
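The cleaning step could look roughly like the sketch below. It is not the project's actual code: the function names `normalize_name` and `standardize_team` and the `TEAM_ALIASES` table are hypothetical, and the alias entries are only examples of the kind of mismatch that appears when stats and salary data come from different sources.

```python
import re
import unicodedata

# Hypothetical alias table (assumption): maps variant abbreviations
# to the single canonical form used in the database.
TEAM_ALIASES = {"PHO": "PHX", "BRK": "BKN", "CHO": "CHA"}

# Generational suffixes that often appear in one source but not the other.
_SUFFIXES = re.compile(r"\b(jr|sr|ii|iii|iv)\b")


def normalize_name(name: str) -> str:
    """Return a join key for a player name: ASCII, lowercase, no suffix."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = ascii_name.lower()
    # Keep only letters, whitespace, and hyphens (drops periods and apostrophes).
    no_punct = re.sub(r"[^a-z\s-]", "", lowered)
    no_suffix = _SUFFIXES.sub("", no_punct)
    # Collapse any repeated or trailing whitespace.
    return " ".join(no_suffix.split())


def standardize_team(abbr: str) -> str:
    """Map a team abbreviation to its canonical form, leaving unknown ones as-is."""
    cleaned = abbr.strip().upper()
    return TEAM_ALIASES.get(cleaned, cleaned)
```

Normalizing both sources to the same key before the join means a name written with and without diacritics, or with and without "Jr.", resolves to a single player record.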
The pipeline now automatically updates every 4 hours, generating a reliable dataset for NBA player valuation. It provides actionable insights, such as identifying undervalued players like Naz Reid, whose contributions exceed his salary share. The interactive Superset dashboards allow filtering by team and player, enabling users to explore patterns in value efficiency across the league.
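One way to express "contributions exceed salary share" as a number is a ratio of shares. The sketch below is an assumption about how such a metric might be defined, not the formula used in the project: `PlayerRow`, `value_ratios`, and the generic `performance` score are hypothetical names. A ratio above 1 would indicate that a player supplies a larger share of the group's performance than the share of salary he receives.

```python
from dataclasses import dataclass


@dataclass
class PlayerRow:
    name: str
    performance: float  # any non-negative performance score (assumption)
    salary: float       # salary in dollars, must be positive


def value_ratios(rows: list[PlayerRow]) -> dict[str, float]:
    """Return performance share divided by salary share for each player."""
    total_perf = sum(r.performance for r in rows)
    total_salary = sum(r.salary for r in rows)
    if total_perf <= 0 or total_salary <= 0:
        raise ValueError("totals must be positive to compute shares")

    ratios = {}
    for r in rows:
        if r.salary <= 0:
            raise ValueError(f"salary must be positive for {r.name}")
        perf_share = r.performance / total_perf
        salary_share = r.salary / total_salary
        ratios[r.name] = perf_share / salary_share
    return ratios


# Illustrative placeholder rows, not real player data.
sample = [
    PlayerRow("Player A", performance=30.0, salary=10_000_000),
    PlayerRow("Player B", performance=10.0, salary=30_000_000),
]
ratios = value_ratios(sample)
```

With these placeholder rows, Player A has a ratio above 1 (undervalued under this definition) and Player B has a ratio below 1.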