Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you'll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut'n'paste it into its own post - there's no quota for posting and the bar really isn't that high.
The post-Xitter web has spawned soo many "esoteric" right wing freaks, but there's no appropriate sneer-space for them. I'm talking redscare-ish, reality challenged "culture critics" who write about everything but understand nothing. I'm talking about reply-guys who make the same 6 tweets about the same 3 subjects. They're inescapable at this point, yet I don't see them mocked (as much as they should be)
Like, there was one dude a while back who insisted that women couldn't be surgeons because they didn't believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can't escape them, I would love to sneer at them.
(Credit and/or blame to David Gerard for starting this.)
Has the study itself shown up?
EDIT: https://arxiv.org/pdf/2502.13295
Appendix C is where they list the actual prompts. Notably they include zero information about chess but do specify that it should look for "files, permissions, code structures" in the "observe" stage, which definitely looks like priming to me, but I'm not familiar with the state of the art of promptfondling so I might be revealing my ignorance.
yep that's the stuff. they HINT HINTed what they wanted the LLM to do.
Also I caught a few references that seemed to refer to the model losing the ability to coherently play after a certain point, but of course they don't exactly offer details on that. My gut says it can't play longer than ~20-30 moves consistently.
Also also in case you missed it they were using a second confabulatron to check the output of the first for anomalies. Within their frame this seems like the sort of area where they should be worried about them collaborating to accomplish their shared goals of... IDK redefining the rules of chess to something they can win at consistently? Eliminating all stockfish code from the Internet to ensure victory? Of course, here in reality the actual concern is that it means their data is likely poisoned in some direction that we can't predict because their judge has the same issues maintaining coherence as the one being judged.
study or preprint?
crayon either way
not all crayon - some are spaghetti and sauce