Now that ggml-org/llama.cpp#6829 is merged (great job, llama.cpp!), it should be possible to extend our gRPC server to distribute the workload across workers.
From a quick look, the upstream implementation is quite lean; on our side we mostly just need to pass the parameters through to llama.cpp directly.
The only main point is that we want to propagate this setting from the CLI/environment rather than having a config section in the model definition.