I have been messing around with creating a homoglyph keyboard for Android, but I’m wondering if it’s even worthwhile. Is there any benefit to masking your messages with homoglyphs? Primarily I think it could defend against an LLM’s ability to easily scrape messages. In my experiments ChatGPT and DeepSeek both get confused by homoglyph messages unless you instruct them to determine the likely alphabet characters and numbers for each individual character.
For the uninitiated, Ꮋ0ᛖοԌⅼуᏢʜѕ áᚱе ᏟhäʀɑсᎢᎬᚱႽ thàτ Lоοᛕ ⅼіᛕË ᏞëtTêᚱᏚ
This might fool some scraping, but at the expense of making it not very legible for humans either. Also, while it might work right now, if it ever became a popular approach the AI scraping could easily adapt. I expect they already try to correct for spelling mistakes anyway.
It reminds me of leet-speak. The custom keyboard is not a bad idea though.
Maybe, but it will also definitely disadvantage screen reader users.
And anyone who has eye issues. My text size isn’t cranked up because I’m great at reading a bunch of squiggly shit, and if I saw more than one sentence of that I’d just not read it.
Yeah, I imagine it would be pretty unpleasant for anyone with dyslexia as well.
I suppose so, but I don’t see poisoning the LLM dataset in this way as a privacy thing, per se. It sounds more like performance art at best and futile pissing in the sea at worst.
That’s great! I believe it would be useful against simple word-matching filters, but in the long run LLMs will read it no problem. I would use it if you make it public.
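To make the word-matching point concrete, here is a minimal Python sketch. The `contains_keyword` helper is hypothetical and only stands in for a naive substring filter, not any real moderation or scraping tool. Swapping two Latin letters for Cyrillic look-alikes is enough to get past it:

```python
def contains_keyword(message: str, keyword: str) -> bool:
    """Naive filter: case-insensitive substring match on raw code points."""
    return keyword.lower() in message.lower()


plain = "I like homoglyphs"
# Same sentence, but both "o" letters are CYRILLIC SMALL LETTER O (U+043E).
masked = "I like h\u043em\u043eglyphs"

print(contains_keyword(plain, "homoglyph"))   # True
print(contains_keyword(masked, "homoglyph"))  # False
```

The two strings render almost identically, but the filter compares code points, so the masked version no longer matches.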
Most of the characters you have there are still regular letters, too, just with accents and fullwidth variants, so it’s pretty easy to map them back to a simpler character set like ASCII. You’d probably have to get really creative with abusing multiple writing systems like Cyrillic, katakana, and so on.
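As a rough illustration of how cheap that mapping is, here is a Python sketch using the standard `unicodedata` module. NFKD normalization plus dropping combining marks handles accents and compatibility variants such as fullwidth letters. The small `CROSS_SCRIPT` table is a hand-picked assumption for this example, not a complete confusables list, and the function names are made up:

```python
import unicodedata

# Hand-picked look-alikes from other scripts (illustrative only, far from complete).
CROSS_SCRIPT = str.maketrans({
    "\u043e": "o",  # Cyrillic small o
    "\u0435": "e",  # Cyrillic small ie
    "\u0441": "c",  # Cyrillic small es
    "\u0443": "y",  # Cyrillic small u
    "\u0455": "s",  # Cyrillic small dze
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
    "\u03bf": "o",  # Greek small omicron
    "\u03c4": "t",  # Greek small tau
})


def fold_accents(text: str) -> str:
    """Decompose with NFKD, then drop combining marks (accents, diaeresis, ...)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def deobfuscate(text: str) -> str:
    """Fold accents and compatibility forms, then map known cross-script look-alikes."""
    return fold_accents(text).translate(CROSS_SCRIPT)


print(fold_accents("th\u00e0\u03c4"))  # accent removed, Greek tau left untouched
print(deobfuscate("th\u00e0\u03c4"))   # tau mapped as well
```

Accented and fullwidth letters fall to normalization alone. Characters borrowed from other scripts survive it and need an explicit table, which is why mixing writing systems is the harder case to undo.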