I definitely think that’s remarkable. But I don’t think scoring high on an external measure like a test is enough to prove the ability to reason. For reasoning, the process matters, IMO.
Reasoning models work by chain-of-thought, which has been shown to give false reassurance about the process actually behind their answers: https://arxiv.org/abs/2305.04388
Maybe passing some math test is enough evidence for you, but I think it matters what’s inside the box. For me, it has only proved that tests are a poor measure of the ability to reason.