
Why I Split Minutes Prediction Into Two Models Instead of One
A single regression model trained on NBA game logs predicts that Joel Embiid will play 11 minutes in a game where he's listed as OUT. The model has never seen a confident zero. Every row in the training data has some minutes played, because the standard NBA API endpoint only returns logs for games where a player was active. The model knows what 28 minutes looks like and what 34 minutes looks like. It has no idea what zero looks like.

This is the root problem behind the two-stage minutes engine in CourtVision.

The Training Data Gap

The NBA API's PlayerGameLog endpoint returns one row per game for every game a player appeared in. If Embiid sits, there's no row. If Tyrese Maxey plays 38 minutes, there's a row. The dataset is survivor-biased: it only contains games where players actually played.

Train a regressor on this dataset, feed it features for a player who's clearly going to sit, and the model interpolates. It finds the nearest neighborhood in the feature space and returns a plausib
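
To make the gap concrete, here is a minimal sketch in Python of how the missing rows could be surfaced. It is not CourtVision's code: the game IDs, the minutes, the `played` flag, and the `fill_missed_games` helper are all hypothetical, and the log is a hand-written list standing in for what a played-games-only feed would return.

```python
# Hypothetical data: game IDs and minutes are invented for illustration.
team_schedule = ["G1", "G2", "G3", "G4", "G5"]

# Rows as a played-games-only feed would return them: absences have no row.
player_log = [
    {"game_id": "G1", "minutes": 34.0},
    {"game_id": "G2", "minutes": 28.0},
    {"game_id": "G4", "minutes": 31.0},
]


def fill_missed_games(schedule, log):
    """Return one row per scheduled game, adding 0-minute rows for absences."""
    by_game = {row["game_id"]: row for row in log}
    filled = []
    for game_id in schedule:
        row = by_game.get(game_id)
        if row is None:
            # The player did not appear: make the zero explicit.
            filled.append({"game_id": game_id, "minutes": 0.0, "played": False})
        else:
            filled.append({**row, "played": True})
    return filled


full_log = fill_missed_games(team_schedule, player_log)

# Smallest target value a model would see in each version of the data.
min_seen_raw = min(row["minutes"] for row in player_log)
min_seen_full = min(row["minutes"] for row in full_log)
zero_rows = [row["game_id"] for row in full_log if not row["played"]]

print(min_seen_raw, min_seen_full, zero_rows)
```

In the raw log the smallest target is 28 minutes, so a regressor fit to it has no example of zero to learn from. After comparing against the schedule, the two absences become explicit rows, and a flag like `played` is the kind of label a separate play-or-sit model could be trained on.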
Continue reading on Dev.to




