• jubilationtcornpone@sh.itjust.works
    10 months ago

    Project A: Has 6 different implementations of the same complex business logic.

    Project B: Has one implementation of the complex business logic… But it’s ALL in one function with 17 arguments and 1288 lines of code.

    “The toast always lands the buttered side down.”

      • CanadaPlus@futurology.today
        10 months ago

        Actually, I bet you could implement that in fewer lines. You should be able to legibly fit several weights on one line.

        • QuazarOmega@lemy.lol
          10 months ago

          You have my interest! (Mainly because I don’t know the first thing about implementing neural networks)

          • CanadaPlus@futurology.today
            10 months ago

            At the simplest, it takes in a vector of floating-point numbers, multiplies it element-wise with each of several other vectors of the same length (the “weights”), sums each of those products, applies a RELU* to each result, and then uses those values as the input vector for another layer with its own weights (or gives them as output). The magic is in the weights.

            This operation is a simple matrix-by-vector product followed by element-wise RELU, if you know what that means.
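The same operation written as a formula (the symbols $W$, $x$, $y$ are notation chosen here, not from the comment): each row of $W$ is one of the weight vectors, so one layer computes

```latex
y = \operatorname{ReLU}(W x), \qquad
y_i = \max\Bigl(0,\ \sum_{j} W_{ij}\, x_j\Bigr)

% Toy example with made-up numbers:
W = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},\quad
x = \begin{pmatrix} 1 \\ 2 \end{pmatrix}
\;\Rightarrow\;
W x = \begin{pmatrix} 3 \\ -1 \end{pmatrix},\quad
\operatorname{ReLU}(W x) = \begin{pmatrix} 3 \\ 0 \end{pmatrix}
```

Stacking layers just means feeding $y$ in as the next layer's $x$.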

            In Haskell, something like:

            relu x = max 0 x
            layer layerInput layerWeights = map relu $ map sum $ map (zipWith (*) layerInput) layerWeights

            modelOutput = foldl layer modelInput modelWeights

            Where modelWeights is [[[Float]]], and so layer has type [Float] -> [[Float]] -> [Float].
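For anyone who finds the one-liner hard to read, here is a hedged, expanded sketch of the same idea as a complete program. `model` and `exampleWeights` are names made up for this illustration, and the weights are arbitrary toy numbers, not from any trained network.

```haskell
-- ReLU: keep positive values, clamp negatives to 0
relu :: Float -> Float
relu x = max 0 x

-- One layer: for each weight vector (one per output neuron),
-- multiply it element-wise with the input, sum, then apply ReLU.
layer :: [Float] -> [[Float]] -> [Float]
layer layerInput layerWeights =
  map relu                            -- 3. nonlinearity on each sum
    (map sum                          -- 2. add up each product vector
      (map (zipWith (*) layerInput)   -- 1. element-wise multiply per neuron
        layerWeights))

-- Whole model: the output of each layer becomes the input of the next.
model :: [[[Float]]] -> [Float] -> [Float]
model modelWeights modelInput = foldl layer modelInput modelWeights

-- Hypothetical toy weights: 2 inputs -> 2 hidden neurons -> 1 output
exampleWeights :: [[[Float]]]
exampleWeights =
  [ [[1, 1], [1, -1]]   -- layer 1: two neurons, 2 weights each
  , [[2, 3]]            -- layer 2: one neuron, 2 weights
  ]

main :: IO ()
main = do
  print (layer [1, 2] [[1, 1], [1, -1]])  -- [3.0,0.0]
  print (model exampleWeights [1, 2])     -- [6.0]
```

`foldl` is what plays the role of the "for each layer" loop: it threads the current vector through every layer's weights in order.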

            * RELU: if i>0 then i else 0. It could also be another nonlinear function, but RELU is obviously fast and works about as well as anything else. There’s interesting theoretical work on certain really weird functions, though.
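Since the nonlinearity is swappable, one way to sketch that is to pass it in as a parameter. `layerWith` and `leakyRelu` are hypothetical names; leaky ReLU and `tanh` are just two common alternatives, and the 0.01 slope is an assumed value.

```haskell
-- Same layer as before, but the nonlinearity is an argument.
-- (activation . sum . zipWith (*) layerInput) does the multiply, sum,
-- and activation for one weight vector in a single pass.
layerWith :: (Float -> Float) -> [Float] -> [[Float]] -> [Float]
layerWith activation layerInput layerWeights =
  map (activation . sum . zipWith (*) layerInput) layerWeights

relu :: Float -> Float
relu x = max 0 x

-- "Leaky" ReLU: a small slope (assumed 0.01) for negative inputs instead of 0
leakyRelu :: Float -> Float
leakyRelu x = if x > 0 then x else 0.01 * x

main :: IO ()
main = do
  let ws = [[1, 1], [1, -1]]
  print (layerWith relu [1, 2] ws)   -- [3.0,0.0]
  print (layerWith leakyRelu [1, 2] ws)
  print (layerWith tanh [1, 2] ws)
```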


            Less simple, it might have a set pattern of zero weights which can be ignored, allowing a fast implementation with a bunch of smaller vectors, or it might have pairwise multiplication steps, like in the Transformer. Aaand that’s about it; all the rest is stuff that was figured out by trial and error, like encoding, plus the math behind how to train the weights. Now you know.
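The comment doesn't say which zero patterns are meant, so as an assumption, here is one simple such pattern: a block-diagonal weight matrix. The input is cut into chunks and each chunk only meets its own small block of weights, so the zeros between blocks are never stored or multiplied. `blockLayer`, `denseFromBlocks`, and the weights are illustrative, not from any real model.

```haskell
relu :: Float -> Float
relu x = max 0 x

-- Dense layer, same as the one-liner in the comment
layer :: [Float] -> [[Float]] -> [Float]
layer layerInput layerWeights =
  map (relu . sum . zipWith (*) layerInput) layerWeights

-- Split a list into consecutive chunks of size n
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = take n xs : chunksOf n (drop n xs)

-- Block-diagonal layer: chunk k of the input only uses weight block k.
-- Each row of a block must have exactly blockSize weights.
blockLayer :: Int -> [Float] -> [[[Float]]] -> [Float]
blockLayer blockSize layerInput blocks =
  concat (zipWith layer (chunksOf blockSize layerInput) blocks)

-- Expand the blocks into the equivalent full matrix, zeros included,
-- only to check that both versions give the same answer.
denseFromBlocks :: Int -> [[[Float]]] -> [[Float]]
denseFromBlocks blockSize blocks =
  [ replicate (k * blockSize) 0 ++ row ++ replicate ((nBlocks - k - 1) * blockSize) 0
  | (k, block) <- zip [0 ..] blocks
  , row <- block ]
  where
    nBlocks = length blocks

main :: IO ()
main = do
  let blocks = [ [[1, 1], [1, -1]]     -- block for inputs 0..1
               , [[2, 0], [0.5, 3]] ]  -- block for inputs 2..3
      input  = [1, 2, 3, 4]
  print (blockLayer 2 input blocks)                -- [3.0,0.0,6.0,13.5]
  print (layer input (denseFromBlocks 2 blocks))   -- same result
```

Here the sparse version does 8 multiplications instead of 16; the saving grows with the number of blocks.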

            Assuming you use hex values for 32-bit weights, you could write a line with 4 no problem:

            wgt35 = [0x1234FCAB, 0x1234FCAB, 0x1234FCAB, 0x1234FCAB];
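To show what such a hex literal means as a weight, here is a sketch that reinterprets 32-bit patterns as IEEE 754 single-precision floats using `castWord32ToFloat` from `GHC.Float`. The four patterns are chosen for readable values and are not from any real model; `wgtBits` and `wgt` are illustrative names.

```haskell
import Data.Word (Word32)
import GHC.Float (castFloatToWord32, castWord32ToFloat)
import Numeric (showHex)

-- Four 32-bit weights written as hex bit patterns (IEEE 754 single precision)
wgtBits :: [Word32]
wgtBits = [0x3F800000, 0x40000000, 0xBF800000, 0x3F000000]

-- Reinterpret the bits as Floats (no numeric conversion, same 32 bits)
wgt :: [Float]
wgt = map castWord32ToFloat wgtBits

main :: IO ()
main = do
  print wgt                                          -- [1.0,2.0,-1.0,0.5]
  putStrLn (showHex (castFloatToWord32 0.75) "")     -- 3f400000
```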

            And, you can sometimes get away with half-precision floats.

            • QuazarOmega@lemy.lol
              10 months ago

              That’s cool, though honestly I haven’t fully understood it, but that’s probably because I don’t know Haskell; that line looked like complete gibberish to me lol. At least I think I got the gist of things at a high level. I’m always curious to understand but never dare to dive deep (holds self from making deep learning joke). Much appreciated btw!

              • CanadaPlus@futurology.today
                10 months ago

                Yeah, maybe somebody can translate for you. I considered using something else, but it was already long and I didn’t feel like writing out multiple loops.

                No worries. It’s neat how much such a comparatively simple concept can do, with enough data to work from. Circa-2010 I thought it would never work, lol.