- cross-posted to:
- technology@lemmy.ml
- cross-posted to:
- technology@lemmy.ml
Whack yakety-yak app chaps rapped for security crack
You must log in or # to comment.
Is it possible to implement a perfect guardrail on an AI model such that it will never ever spit out a certain piece of information? I feel like these models are so complex that you can always eventually find the perfect combination of words to circumvent any attempts to prevent prompt injection.
Reminded me of this game: https://gandalf.lakera.ai/intro