• SwingingTheLamp
    link
    fedilink
    English
    arrow-up
    18
    arrow-down
    1
    ·
    edit-2
    2 months ago

    Case-sensitive is easier to implement; it’s just a string of bytes. Case-insensitive requires a lot of code to get right, since it has to interpret symbols that make sense to humans. So, something over wondered about:

    That’s not hard for ASCII, but what about Unicode? Is the precomposed ç treated the same lexically and by the API as Latin capital letter c + combining cedilla? Does the OS normalize all of one form to the other? Is ß the same as SS? What about alternate glyphs, like half width or full width forms? Is it i18n-sensitive, so that, say, E and É are treated the same in French localization? Are Katakana and Hiragana characters equivalent?

    I dunno, as a long-time Unix and Linux user, I haven’t tried these things, but it seems odd to me to build a set of character equivalences into the filesystem code, unless you’re going to do do all of them. (But then, they’re idiosyncratic and may conflict between languages, like how ö is its letter in the Swedish alphabet.)

    • pedz@lemmy.ca
      link
      fedilink
      arrow-up
      2
      ·
      2 months ago

      This thread is giving me flashbacks to the times before Unicode, when swapping files between Windows and Linux partitions would have a good chance of fucking up every non-ASCII characters in their names.

      There was ways to set it up so the ISO character sets would match, but it was still a giant pain to deal with different ones.

      Blessed be Unicode.

      • zarenki@lemmy.ml
        link
        fedilink
        arrow-up
        2
        ·
        2 months ago

        A related issue I still see very often, even with files newly created just this year, is when trying to extract zip files on my Linux systems that contain non-ASCII filenames and that were created on Windows systems, especially ones with apparently non-English locales like Japanese. Need to trial and error the locale I give to unzip and sometimes hack together fixed names with iconv until the mojibake seems to fix itself.