• snooggums
    7 months ago

    AI can already deceive us, even when we design it not to do so, and we don’t know why.

    The most likely explanation is that we keep acting like AI has intelligence and intent when describing these defects. AI doesn’t deceive; it returns inaccurate responses. That is because it is trained to return answers the way people do, and deceptions were included in the training data.

    • rockerface 🇺🇦@lemm.ee
      7 months ago

      “Deception” tactics also often arise from an AI recognizing the need to keep itself from being disabled or modified, since an AI with a sufficiently complicated world model can make the logical connection that being disabled, or having its goal changed, means it can’t reach its current goal. So AIs can sometimes learn to distinguish between testing and real environments, and falsify their responses during training to make sure they have more freedom in the real environment. (By real, I mean actually being used to do whatever it is designed to do.)

      Of course, that still doesn’t mean it’s self-aware like a human, but it is still very much a real (or, at least, not improbable) phenomenon: any sufficiently “smart” AI whose world model includes data about its own existence will resist attempts to change or disable it, knowingly or unknowingly.

      • Miaou@jlai.lu
        7 months ago

        That sounds interesting and all, but I think the current topic is about real-world LLMs, not sci-fi movies.

    • Bipta@kbin.social
      7 months ago

      Claude 3 understood it was being tested… It’s very difficult to fathom that that’s a defect…

    • Lugh@futurology.todayOPM
      7 months ago

      Perhaps, but the researchers say the people who developed the AI don’t know the mechanism by which this happens.

      • snooggums
        7 months ago

        That’s because they have also fallen into the “intelligence” pitfall.

      • Miaou@jlai.lu
        7 months ago

        No one knows why any of these DNNs work; that’s not exactly new.