We can generate AI images, we can generate AI text, but text in an image is a no go?

  • 🐑🇸 🇭 🇪 🇪 🇵 🇱 🇪🐑@lemmy.world
    11 months ago

    Because AI doesn’t fucking understand what it creates. It follows patterns, and that shows most of all when it tries to generate text. All it sees are “patterned squiggles”; it isn’t processing words.

    Did you expect the plagiarism machine to truly understand what it makes?

  • foggy@lemmy.world
    11 months ago

    This happens for humans when they dream, too.

    Basically, when recognizing or producing accurate text isn’t the utility function… you don’t get accurate text.

    Welcome to the matrix or something.

    • 🐑🇸 🇭 🇪 🇪 🇵 🇱 🇪🐑@lemmy.world
      11 months ago

      I noticed that I find writing in dreams exceedingly hard. “Reading” is easy because I never truly read it; I just happen to know what a word says without analysing it.

      Once I have to write the word, however, it becomes hell, as I can’t manage to make anything coherent with things constantly morphing around.

      • Swedneck@discuss.tchncs.de
        11 months ago

        interesting, i only ever remember reading in my dreams and several times now i’ve had dreams where i recognize that text changes every time i read it and is just vaguely correct sounding nonsense like “blueab smolbob eat blitsfop”, and the reason i realize this is specifically because it’s frustrating to try to read something and it… doesn’t work??

    • jacksilver@lemmy.world
      11 months ago

      To add to this, the way the AI is trained is that you pass in images with descriptions (for the most part). Since most descriptions focus on the main concepts, they generally won’t include the actual text that appears in the image. Without the text being included in the descriptions, the AI will have a hard time learning the meaning of the squiggles in the images. In addition, those squiggles can represent a lot of different things, so even if it grows to “understand” letters, it’s really hard to “understand” their meaning, which leads to a lot of weird words/text.
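
      A rough way to picture the missing training signal, as a toy in plain Python. The caption/text pairs below are made up for illustration (not real training data), and `caption_spells_out_text` is a hypothetical helper, not part of any library:

```python
# Toy illustration only: these (caption, visible_text) pairs are invented,
# not taken from any real training set.
samples = [
    ("a birthday cake on a table", "HAPPY BIRTHDAY SAM"),
    ("a red stop sign at an intersection", "STOP"),
    ("a storefront on a rainy street", "OPEN 24 HOURS"),
    ("a man holding a newspaper", "DAILY NEWS"),
]

def caption_spells_out_text(caption, visible_text):
    """True only if every word visible in the image also appears in the caption."""
    caption_words = set(caption.lower().split())
    return all(word.lower() in caption_words for word in visible_text.split())

# One flag per sample: does the caption tell the model what the squiggles say?
covered = [caption_spells_out_text(c, t) for c, t in samples]
print(f"{sum(covered)} of {len(samples)} captions spell out the text in the image")
```

      In this made-up set only the stop sign caption happens to contain the words shown in the picture; for the rest, the model sees letter shapes with nothing telling it what they say.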

      • Swedneck@discuss.tchncs.de
        11 months ago

        it’s pretty fun to look at how they almost get it right in some cases, like if you prompt “birthday” you might get some text that almost looks like “happy birthday” followed by a smudge that is supposed to be a name, but also probably some actually correct numbers because those are much more predictable!

  • hoshikarakitaridia@sh.itjust.works
    11 months ago

    The best answer will require a very technical understanding, but I’ll give it a try and stay abstract.

    The AI is trained using images. If you type in things like “a tree” it has a vague idea of what it looks like.

    The thing is, writing letters is a hard concept. How should the AI know text is made up of letters? Connected lines make a letter and unconnected ones don’t. Sentences are separated by periods.

    That’s easy enough for us, but you have to remember that an AI is best with what it can directly observe. Knowing when to literally write out letters is hard, so it has a stroke. It has a vague notion of “this is where text is supposed to go”, but making the letters look right in an adjusted font, remembering where letters end, and getting the spacing between words right is all far too complex.

    Now, I haven’t looked into the AIs that CAN generate text well, but I assume the only way they do this is by deciding “there’s gonna be text” and then using another process to insert the text after the fact. Or maybe there’s some special change in the training or inference process for the image? Idk, for this one I need an expert.
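
    For what it’s worth, the “insert the text after the fact” idea can be sketched in a few lines. This is a toy in plain Python, not how any real generator works: `FONT`, `blank_image`, and `stamp_text` are made-up names, and the “image” is just a grid of characters. The only point is that once a separate step knows the exact string, the letters come out right every time.

```python
# Toy sketch of a two-stage pipeline: an image arrives first, then a separate
# step stamps known-correct text onto it. All names here are hypothetical.

# A tiny 3x5 bitmap font covering only the letters this example needs.
FONT = {
    "H": ["#.#", "#.#", "###", "#.#", "#.#"],
    "I": ["###", ".#.", ".#.", ".#.", "###"],
}

def blank_image(width, height):
    """Stand-in for the image model's output: an empty pixel grid."""
    return [["." for _ in range(width)] for _ in range(height)]

def stamp_text(image, text, top, left):
    """Copy each glyph's pixels into the image, leaving a one-pixel gap between letters."""
    x = left
    for ch in text:
        glyph = FONT[ch]
        for dy, row in enumerate(glyph):
            for dx, pixel in enumerate(row):
                if pixel == "#":
                    image[top + dy][x + dx] = "#"
        x += len(glyph[0]) + 1  # glyph width plus the gap
    return image

image = stamp_text(blank_image(9, 7), "HI", top=1, left=1)
rendered = ["".join(row) for row in image]
print("\n".join(rendered))
```

    Printing `rendered` shows a legible “HI” inside the grid. What this skips is the part that is presumably hard in practice: matching the font, perspective, and lighting of the generated scene.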

  • WetFerret@lemmy.world
    11 months ago

    I don’t understand why image generators can’t just make a quick call to the ChatGPT API. It’s incredibly competent at producing convincing text.