• Trantarius@programming.dev
    arrow-up
    33
    ·
    1 year ago

    Well, letters don’t really have a single canonical shape. There are many acceptable ways of rendering each one. While two letters might usually look the same, it is very possible that some shape could be acceptable for one but not the other. So it makes sense to distinguish between them in the binary representation. That lets the interpreting software decide whether it cares about the difference or not.

    Also, the Unicode code tables do mention which characters look (nearly) identical, so it’s definitely possible to make a program interpret something like a Greek question mark the same as a semicolon. I guess it’s just that no one has bothered, since it’s such a rare edge case.
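
    A minimal sketch of the idea, assuming Python's standard `unicodedata` module (the helper name `fold_lookalikes` is made up for illustration). The Greek question mark (U+037E) happens to be canonically equivalent to the semicolon, so ordinary normalization already folds it; the Cyrillic "а" is a separate letter and is left alone.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"  # renders like ';'
SEMICOLON = ";"                 # U+003B
CYRILLIC_A = "\u0430"           # renders like Latin 'a'

def fold_lookalikes(text: str) -> str:
    """Hypothetical helper: apply NFC so canonically equivalent
    characters collapse to a single form."""
    return unicodedata.normalize("NFC", text)

# Distinct code points in the raw string...
print(GREEK_QUESTION_MARK == SEMICOLON)

# ...but identical after normalization, because U+037E has a
# canonical decomposition to U+003B.
print(fold_lookalikes(GREEK_QUESTION_MARK) == SEMICOLON)

# Normalization does not merge letters from different scripts.
print(fold_lookalikes(CYRILLIC_A) == "a")
```

    So software that wants to treat the two as the same can opt in, while software that cares about the difference still sees distinct code points.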

    • yum13241@lemm.ee
      arrow-up
      2
      arrow-down
      7
      ·
      1 year ago

      Why are the Latin “a” and the Cyrillic “a” THE FUCKING SAME?

      • mrpants
        English
        arrow-up
        21
        ·
        1 year ago

        In cases where something looks stupid but your knowledge of it is almost zero, it’s entirely possible that it’s not.

        The people that maintain Unicode have put a lot of thought and effort into this. Might be helpful to research why rather than assuming you have a better way despite little knowledge of the subject.

        • yum13241@lemm.ee
          arrow-up
          2
          arrow-down
          15
          ·
          1 year ago

          When it’s A FUCKING SECURITY issue, I know damn well what I’m talking about.

          • mrpants
            English
            arrow-up
            8
            ·
            1 year ago

            Again, you do not, because the world consists of more than your interests and job description.

            • yum13241@lemm.ee
              arrow-up
              1
              arrow-down
              2
              ·
              1 year ago

              I know damn well what I’m talking about when someone could get scammed on “apple.com” but with a Cyrillic A.

              • mrpants
                English
                arrow-up
                1
                ·
                edit-2
                1 year ago

                You know the problem but not the set of reasonable or practical solutions.

                Anyway, I and l also look identical in many fonts. Should we make them the same letter?

                • yum13241@lemm.ee
                  arrow-up
                  1
                  arrow-down
                  1
                  ·
                  1 year ago

                  No, but that’s what Unicode does.

                  The solution is to force font creators to be fucking reasonable, just like how the Cyrillic A looks exactly like the Latin A. They are the same letter. The letters L and I are totally different (in handwriting at least).

                  They already did that for CJK. Make characters that look the same in handwriting have the same code point.

          • kattfisk@lemmy.dbzer0.com
            arrow-up
            2
            ·
            1 year ago

            I and l also look identical in many fonts. So you already have this problem in ascii. (To say nothing of all the non-printing characters!)

            If your security relies on a person being able to tell the difference between two characters controlled by an attacker, your security is bad.

            • yum13241@lemm.ee
              arrow-up
              1
              arrow-down
              1
              ·
              1 year ago

              The problem is when you can register “apple.com” with the Cyrillic A, fooling many.

              The I l issue is caused by fonts, not by ASCII.
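
              For what it's worth, software can flag the “apple.com” case without relying on anyone's eyes. A rough sketch, assuming Python's standard library only; the helpers `scripts_in` and `looks_mixed_script` are hypothetical names, and guessing the script from the first word of a character's Unicode name is a simplification, not how any registrar or browser is known to do it.

```python
import unicodedata

def scripts_in(label: str) -> set[str]:
    """Rough script guess: the first word of each letter's Unicode name,
    e.g. 'LATIN' or 'CYRILLIC'. This is an approximation."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }

def looks_mixed_script(domain: str) -> bool:
    """Flag a domain if any dot-separated label mixes scripts."""
    return any(len(scripts_in(label)) > 1 for label in domain.split("."))

spoofed = "\u0430pple.com"   # first letter is CYRILLIC SMALL LETTER A
genuine = "apple.com"

print(looks_mixed_script(spoofed))
print(looks_mixed_script(genuine))

# The encoded form that goes over the wire is visibly different too:
# the spoofed name becomes a Punycode label starting with "xn--".
print(spoofed.encode("idna"))
print(genuine.encode("idna"))
```

              A check like this only catches labels that mix scripts. A name spelled entirely in look-alike letters from one script would pass, so it is a mitigation rather than a complete fix.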