I'm a little confused about how 24-bit audio is handled in Wiretap. According to the manual, "24-bit sample values are stored in 32-bit
integer values, least significant bit first. Samples are not packed." That sentence is very confusing to me. It seems to imply that WireTap always uses little endian for 24-bit values, and the high (and last) byte is set to 0. But somehow I doubt that's what you mean.
So let me be concrete. Suppose my 24-bit audio sample value is 0x123456. What sequence of bytes would I see in, say, big endian mode? I'm betting the answer is either 0x00 0x12 0x34 0x56 or 0x12 0x34 0x56 0x00, but I'm not sure which. What about little endian mode?
Indeed, the sentence is a bit misleading regarding the endianness. We changed it as follows in the 2008 guide:
"For audio streams that contain 24-bit samples, the samples are stored as 32-bit integers. Samples are not packed. Instead, eight filler bits are appended to the sample so each sample corresponds to a 32-bit word. They can be little-endian or big-endian."
i.e. Consider the samples as integers (little or big-endian) and simply mask out the filler bits. Does this help?
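To make that advice concrete, here is a small sketch in Python (purely illustrative, not a Wiretap API; the function name is made up). It assumes the 24 sample bits sit in the low-order bits of the 32-bit word, so decoding is: interpret the 4 bytes as an integer in the stream's byte order, then mask off the filler byte.

```python
def decode_sample(raw4, byteorder):
    """Interpret 4 bytes as a 32-bit integer, then mask off the filler bits.

    byteorder is "little" or "big", matching the stream's endianness.
    Assumes the 24-bit sample occupies the low-order bits of the word.
    """
    word = int.from_bytes(raw4, byteorder)
    return word & 0x00FFFFFF  # keep only the 24 sample bits

# Sample 0x123456, little-endian word, with an arbitrary filler byte (0xAB):
print(hex(decode_sample(bytes([0x56, 0x34, 0x12, 0xAB]), "little")))  # 0x123456
# Same sample as a big-endian word:
print(hex(decode_sample(bytes([0xAB, 0x12, 0x34, 0x56]), "big")))     # 0x123456
```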
I'm still not completely sure I get it. I think my issue with this new version is that the word "append" isn't really all that well-defined when you are talking about integers.
So is the idea that Wiretap takes a 24-bit integer and "appends" some filler bits to get a 32-bit integer, and then outputs that 32-bit integer as either big-endian or little-endian?
If that's the case, what does "append" mean? Are you "appending" in the sense of shifting the 24-bit quantity into the high 24-bits of the 32-bit word? I.e. 0x123456 -> 0x12345600? Or does it mean setting the high bits to 0 (i.e. 0x123456 -> 0x00123456)?
Or does "append" actually mean that you first output the 24-bit number in either little- or big-endian order and then output the filler byte?
So, to summarize, I can see the following 3 scenarios for how to encode 0x123456, and I'm still not completely sure which one you mean:
1. BE: 0x12 0x34 0x56 0x00, LE: 0x00 0x56 0x34 0x12
2. BE: 0x00 0x12 0x34 0x56, LE: 0x56 0x34 0x12 0x00
3. BE: 0x12 0x34 0x56 0x00, LE: 0x56 0x34 0x12 0x00
Finally, is the filler byte guaranteed to be 0 or is it undefined?
If you were to store a 32-bit integer inside a 64-bit integer, you would use the low-order bits (ignoring endianness) and "fill" the high-order ones. You would then not need to shift them ... a mask will do fine. The same is true of 24-bit values inside of 32 bits. A compiler normally does this for you when using a cast operator between native types.
As for zeroing out the filler bits, it's possible that we do it, but if I were you I would not rely on it. Video standards such as 10-bit are sometimes "loose" in this respect ... for performance purposes. I would create a 24-bit mask and apply it to the 32-bit sample value.
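A quick sketch of why the mask matters (the sample word here is made up, not real Wiretap output): if the filler byte happens to be nonzero, using the raw 32-bit word directly gives a garbage value, while masking recovers the sample.

```python
# Build a 24-bit mask rather than assuming the filler byte is zero.
MASK_24 = (1 << 24) - 1        # 0x00FFFFFF

raw_word = 0xFF123456          # hypothetical 32-bit sample word, nonzero filler
sample = raw_word & MASK_24    # masked sample: 0x123456

print(hex(raw_word))           # 0xff123456  (wrong if used directly)
print(hex(sample))             # 0x123456    (the actual 24-bit sample)
```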
I understand now. By the way, the extra byte is indeed not guaranteed to be 0. It happens to be a sign-extend of the 24-bit value (i.e. 0 for positive numbers and 0xFF for negative numbers), but of course I won't rely on that.
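Since the filler byte may be a sign extension but isn't guaranteed to be, a decoder that wants the signed sample value can mask and sign-extend itself. A minimal sketch (illustrative; the helper name is my own):

```python
def signed_24(word):
    """Mask a 32-bit sample word to 24 bits, then sign-extend.

    Works regardless of what the filler byte contains.
    """
    v = word & 0xFFFFFF
    # If bit 23 (the 24-bit sign bit) is set, the value is negative.
    return v - 0x1000000 if v & 0x800000 else v

print(signed_24(0x00123456))  # 1193046  (positive sample 0x123456)
print(signed_24(0xFFFEDCBA))  # -74566   (negative sample, filler byte 0xFF)
```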