I’m building a system where some embedded devices will send audio data to my server via HTTP, along with a small amount of numerical data. It’s important that the audio and the numerical data are bundled together before they’re written to the database. One more constraint: the client needs to authenticate in order to upload data.

The naive solution would be to send everything together as a JSON object, base64-encode the audio, and have a single backend endpoint receive both the audio and the other data. My concern is that base64 adds significant overhead (4 bytes of base64 for every 3 bytes of audio), and that matters here: it means more radio time and power usage, and with the amount of data I’m sending, a ~33% tax is non-trivial. Compression at the transport level might reduce that overhead, but I can’t be entirely sure until I test.
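For concreteness, this is roughly what I mean by the naive version. Python and the requests library stand in for the real embedded client, and the endpoint, token, and field names are all made up:

```python
import base64
import requests  # illustrative; the real client is embedded firmware

# Hypothetical endpoint and token, just for the sketch.
UPLOAD_URL = "https://api.example.com/upload"
AUTH_TOKEN = "device-token"

def upload_bundle(audio_bytes: bytes, readings: dict) -> None:
    """Send the audio and the numerical readings together as one JSON body."""
    payload = {
        "readings": readings,
        # base64 turns every 3 audio bytes into 4 bytes of text (~33% larger)
        "audio_b64": base64.b64encode(audio_bytes).decode("ascii"),
    }
    resp = requests.post(
        UPLOAD_URL,
        headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()

upload_bundle(b"\x00\x01" * 1500, {"temperature_c": 21.5, "battery_v": 3.7})
```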

The other solution would be setting up a different endpoint for audio that can receive the binary data directly, maybe as a stream. This is probably the most efficient way to go, but it introduces some complexity around how I’m supposed to match the audio up with the other data. I’m not very familiar with backend engineering, so I’m not sure if there’s a straightforward solution that is just an unknown unknown for me.
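Something like this, where the endpoint names and header are again made up and Python is just a stand-in for the device code:

```python
import requests  # illustrative; the real client is embedded firmware

# Hypothetical endpoints and token, just for the sketch.
AUDIO_URL = "https://api.example.com/upload-audio"
READINGS_URL = "https://api.example.com/upload-readings"
HEADERS = {"Authorization": "Bearer device-token"}

def upload_split(audio_bytes: bytes, readings: dict) -> None:
    """Send the audio as raw bytes and the readings as JSON, in two requests."""
    # Raw bytes go straight over the wire, so no base64 inflation.
    audio_resp = requests.post(
        AUDIO_URL,
        headers={**HEADERS, "Content-Type": "application/octet-stream"},
        data=audio_bytes,
        timeout=30,
    )
    audio_resp.raise_for_status()

    # The numerical data is tiny, so JSON is fine here.
    readings_resp = requests.post(READINGS_URL, headers=HEADERS, json=readings, timeout=30)
    readings_resp.raise_for_status()
    # Open question: how does the server know these two requests belong together?
```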

Maybe I could hash the audio on the client, include the hash as a header on both requests, and then keep a dictionary of incoming requests keyed by hash on the server (adding entries as they arrive) so the two halves can be matched up efficiently? Feels like a neat solution, but IDK, thoughts? Should I just eat the 33% overhead from the simple solution and accept that premature optimization is the root of all evil?
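To make the matching idea concrete, here’s a minimal sketch of what I’m picturing on the server side, assuming Flask purely for illustration, a made-up `X-Audio-Hash` header, and an in-memory dict (so no auth, persistence, or multi-process handling yet):

```python
import hashlib
from flask import Flask, request, jsonify

app = Flask(__name__)

# In-memory staging area keyed by the client-supplied audio hash.
# Half-bundles wait here until their counterpart arrives.
# (Won't survive a restart and only works in a single process.)
pending_audio = {}      # hash -> bytes
pending_readings = {}   # hash -> dict

def save_to_database(audio: bytes, readings: dict) -> None:
    # Placeholder for the real database write.
    print(f"bundled {len(audio)} audio bytes with readings {readings}")

def try_bundle(audio_hash: str) -> None:
    """If both halves have arrived, pop them and store them together."""
    if audio_hash in pending_audio and audio_hash in pending_readings:
        save_to_database(pending_audio.pop(audio_hash), pending_readings.pop(audio_hash))

@app.post("/upload-audio")
def upload_audio():
    audio = request.get_data()
    claimed_hash = request.headers.get("X-Audio-Hash", "")
    # Recompute the hash server-side so a corrupted upload can't pair up wrongly.
    if hashlib.sha256(audio).hexdigest() != claimed_hash:
        return jsonify(error="hash mismatch"), 400
    pending_audio[claimed_hash] = audio
    try_bundle(claimed_hash)
    return jsonify(status="ok")

@app.post("/upload-readings")
def upload_readings():
    claimed_hash = request.headers.get("X-Audio-Hash", "")
    pending_readings[claimed_hash] = request.get_json()
    try_bundle(claimed_hash)
    return jsonify(status="ok")
```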