
Encoding Human Values as Geometry: The Gravitational Objective
Part 2 of the AletheionLLM-v2 geometry series. Part 1: How to measure whether your model's uncertainty space is flat or curved.

The previous post left an open question: if the training corpus curves the epistemic manifold, what curves it toward alignment? The three branches described there (diagonal, full_mahalanobis, real_geodesic) are all about measuring the geometry that already exists. None of them ask how to modify it. That is what the fourth branch is for.

The problem with value alignment as rules

Most alignment approaches add constraints over outputs. The model generates something, a filter checks it against a list of prohibited patterns, and the output is blocked or modified. This works until it encounters something the filter has never seen.

The geometric framing suggests a different question: instead of blocking outputs after generation, what if misaligned regions of the epistemic manifold were intrinsically more costly to navigate toward? Not a fence around dangerous territo
Continue reading on Dev.to Python



