Completely depends on your laptop hardware, but generally:
TabbyAPI (exllamav2/exllamav3)
ik_llama.cpp and its OpenAI-compatible server (see the client sketch after this list)
kobold.cpp (or the kobold.cpp ROCm fork, or croco.cpp, depending on your GPU)
An MLX host with one of the new distillation quantizations
Text-gen-web-ui (slow, but supports a lot of samplers and some exotic quantizations)
SGLang (extremely fast for parallel calls, if that's what you want)
Aphrodite Engine (lots of samplers, and fast, at the expense of some extra VRAM usage)
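Whichever one you land on, they all expose roughly the same OpenAI-style API, so your client code barely changes between them. Here's a minimal sketch using the openai Python package; the port, endpoint, and model name are assumptions, so swap in whatever your server actually prints at startup:

```python
# Minimal sketch of hitting a local OpenAI-compatible server.
# The base_url and model name below are assumptions -- e.g. TabbyAPI and
# text-gen-web-ui default to port 5000, kobold.cpp to 5001; check your
# server's startup log for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5000/v1",  # assumed local endpoint
    api_key="unused",                     # most local servers ignore the key
)

reply = client.chat.completions.create(
    model="local-model",  # placeholder; some servers ignore this field
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
print(reply.choices[0].message.content)
```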
I use text-gen-web-ui at the moment only because TabbyAPI is a little broken with exllamav3 (which is utterly awesome for Qwen3); otherwise I'd almost always stick to TabbyAPI.
Tell me (vaguely) what your system has, and I can be more specific.