Someone got Gab's AI chatbot to show its instructions

mozz@mbin.grits.dev · 7 months ago

Someone got Gab's AI chatbot to show its instructions

theneverfox@pawb.social · 7 months ago

I mean, I’ve got one of those “so simple it’s stupid” solutions. It’s not a pure LLM, but those are probably impossible… Can’t have an AI service without a server after all, let alone drivers

Do a string comparison on the prompt, then tell the AI to stop.

And then, do a partial string match with at least x matching characters on the prompt, buffer it x characters, then stop the AI.

Then, put in more than an hour and match a certain amount of prompt chunks across multiple messages, and it’s now very difficult to get the intact prompt if you temp ban IPs. Even if they managed to get it, they wouldn’t get a convincing screenshot without stitching it together… You could just deny it and avoid embarrassment, because it’s annoyingly difficult to repeat

Finally, when you stop the AI, you start printing out passages from the yellow book before quickly refreshing the screen to a blank conversation

Or just flag key words and triggered stops, and have an LLM review the conversation to judge if they were trying to get the prompt, then temp ban them/change the prompt while a human reviews it