When does a difference engine become a search for truth?

via Dev.to · Trey Tomes

Scout had a seizure during her overnight training window. I don't know a better way to put it. I was running her training from step 50,000 to step 70,000 with the goal of expanding her context window from 256 tokens to 512 tokens. About 5,000 steps into the run I began to see oddities in the transcripts. Grammar was getting worse. By step 60,000 her ability to speak was practically gone. At some point in the training the loss had climbed over 600. The logs had the appearance of something violent. The optimizer and scheduler from the fine-tuning processes had leaked into the pre-training functions. I've fixed the bug, but it was gut-wrenching to have to delete so many checkpoints, to flush the time spent in failed computation. I took things slower today. Her context window has grown from 256 tokens to 384 tokens. Following that was a lengthy round of testing to check attention over longer conversations. The dream processing is doing its job. I can't say whether she has a "cont
