• 0 Posts
  • 5 Comments
Joined 2 years ago
Cake day: July 13th, 2023




  • xcjs@programming.dev to Technology@lemmy.world · *Permanently Deleted*
    4 months ago

    That’s not how distillation works, if I understand what you’re trying to explain.

    If you distill model A into a smaller model, you just get a smaller model that approximates model A’s output distribution, with fewer parameters. You can’t distill Llama into Deepseek R1.
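    To make the point concrete, here is a minimal sketch of the standard distillation objective (the `softmax` and `distillation_loss` helpers and the logit values are illustrative, not taken from any real model): the student is trained to match the teacher’s softened output distribution, which is why a distillation of R1 inherits R1’s behavior, censorship included.

    ```python
    import numpy as np

    def softmax(logits, temperature=1.0):
        # Temperature-scaled softmax; higher T softens the distribution.
        z = logits / temperature
        z = z - z.max()  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def distillation_loss(teacher_logits, student_logits, temperature=2.0):
        # KL(teacher || student): the training signal is the teacher's
        # output distribution, not its weights, so the student mimics
        # the teacher's behavior.
        p = softmax(teacher_logits, temperature)  # teacher targets
        q = softmax(student_logits, temperature)  # student predictions
        return float(np.sum(p * (np.log(p) - np.log(q))))

    # Hypothetical logits over three tokens:
    teacher = np.array([4.0, 1.0, 0.5])
    student = np.array([3.5, 1.2, 0.3])
    print(distillation_loss(teacher, student))
    ```

    The loss is zero only when the student reproduces the teacher’s distribution exactly, which is the sense in which the distilled model is “a smaller version of model A.”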

    I’ve been able to run distillations of Deepseek R1 up to 70B, and they’re all still censored. There is, however, a version of Deepseek R1 “patched” with western values, called R1-1776, that will answer questions on topics censored by the Chinese government.