• 1 Post
  • 273 Comments
Joined 1 year ago
cake
Cake day: June 11th, 2023

help-circle













  • I generally agree with your comment, but not on this part:

    parroting the responses to questions that already existed in their input.

    They’re quite capable of following instructions over data where neither the instruction nor the data was anywhere in the training data.

    They’re completely incapable of critical thought or even basic reasoning.

    Critical thought, generally no. Basic reasoning, that they’re somewhat capable of. And chain of thought amplifies what little is there.




  • Increase context length, probably enable flash attention in ollama too. Llama3.1 support up to 128k context length, for example. That’s in tokens and a token is on average a bit under 4 letters.

    Note that higher context length requires more ram and it’s slower, so you ideally want to find a sweet spot for your use and hardware. Flash attention makes this more efficient

    Oh, and the model needs to have been trained at larger contexts, otherwise it tends to handle it poorly. So you should check what max length the model you want to use was trained to handle