I can believe it insofar as they might not have explicitly programmed it to do that. I’d imagine they put in something like “Make sure your output aligns with Elon Musk’s opinions.”, “Elon Musk is always objectively correct.”, etc. From there, this would be emergent, but quite predictable behavior.
If the system prompt doesn’t tell it to search for Elon’s views, why is it doing that?
My best guess is that Grok “knows” that it is “Grok 4 buit by xAI”, and it knows that Elon Musk owns xAI, so in circumstances where it’s asked for an opinion the reasoning process often decides to see what Elon thinks.
Yeah, this blogger shows a fundamental misunderstanding of how LLMs work or how system prompts work. LLM behavior is not directly controlled by the system prompt the way this person imagines. For example, censorship that is present in the training set will be “baked in” to the model and the system prompt will not affect it, no matter how the LLM is told not to be censored in that way.
My best guess is that the LLM is interfacing with a tool in order to search through tweets, and the training set that demonstrates how to use the tool contains example searches for Elon Musk’s tweets.
Is my comment wrong though? Another possibility is that Grok is given an example of searching for Elon Musk’s tweets when it is presented with the available tool calls. Just because it outputs the system prompt when asked does not mean that we are seeing the full context, or even the real system prompt.
Posting blog guides on how to code with ChatGPT is not expertise on LLMs. It’s like thinking someone is an expert mechanic because they can drive a car well.
Lmao, sure…
I can believe it insofar as they might not have explicitly programmed it to do that. I’d imagine they put in something like “Make sure your output aligns with Elon Musk’s opinions.”, “Elon Musk is always objectively correct.”, etc. From there, this would be emergent, but quite predictable behavior.
Yeah the transparency of it might be unintended.
Yeah, this blogger shows a fundamental misunderstanding of how LLMs work or how system prompts work. LLM behavior is not directly controlled by the system prompt the way this person imagines. For example, censorship that is present in the training set will be “baked in” to the model and the system prompt will not affect it, no matter how the LLM is told not to be censored in that way.
My best guess is that the LLM is interfacing with a tool in order to search through tweets, and the training set that demonstrates how to use the tool contains example searches for Elon Musk’s tweets.
“This blogger” is Simon Willison, who has been doing LLM benchmarks and other LLM-related things since before it was cool
Not a random substack grifter
Is my comment wrong though? Another possibility is that Grok is given an example of searching for Elon Musk’s tweets when it is presented with the available tool calls. Just because it outputs the system prompt when asked does not mean that we are seeing the full context, or even the real system prompt.
Posting blog guides on how to code with ChatGPT is not expertise on LLMs. It’s like thinking someone is an expert mechanic because they can drive a car well.