• SirGolan@lemmy.sdf.org
    11 months ago

    Yeah. They buried it in there (and for some of their experiments they just said “ChatGPT”, which could mean either), but they used GPT-3.5, and oddly enough, 3.5 gets 48% on HumanEval.

    • fristislurper@feddit.nl
      11 months ago

      They “buried” it in the methodology section, where they describe how they generate prompts. That's exactly where I'd expect it to be mentioned, or am I missing something? Where else would they put it?

      • SirGolan@lemmy.sdf.org
        11 months ago

        It’s a pretty important fact, since there’s a huge difference between 3.5 and 4. Mentioning it once in one place is not great, and earlier in that same paragraph they just say ChatGPT without specifying 3.5 or 4. The problem I have is that this has led the press (and hence many other people) to think ChatGPT is terrible at coding, when in fact the GPT-4 version is actually pretty decent.