Subject: Re: Random/OT: Low sample rate audio weirdness/mystery
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 08. Sep 2025, 00:58:18
Organization: A noiseless patient Spider
Message-ID : <109l66r$3vo35$1@dont-email.me>
References : 1 2 3 4 5
User-Agent : Mozilla Thunderbird
On 9/7/2025 4:13 PM, MitchAlsup wrote:
BGB <cr88192@gmail.com> posted:
On 9/7/2025 5:26 AM, Terje Mathisen wrote:
MitchAlsup wrote:
>
BGB <cr88192@gmail.com> posted:
>
Just randomly thinking again about some things I noticed with audio at
low sample rates.
>
For baseline, can note, basic sample rates:
44100: Standard, sounds good, but bulky
>
No it does not sound "good" on a system that accurately reproduces
22KHz; like systems with electrostatic speakers covering the high
end of the audio spectrum.
>
Might sound "good" to someone who does not know what it is supposed
to actually sound like, though.
>
My ears are not good enough to notice the difference between CD quality,
AAC/high sample rate MP3/ogg vorbis/etc, but according to my savant (?)
cousin who could listen to a 16 min piece of music once and then write
down the score for all the instruments, none of them sound like live,
but they are close enough that he can listen and internally translate to
what it would have sounded like in a concert.
>
>
To me, 44100 and 48000 sound basically the same, so not much gain in
going higher.
>
The difference between 32000 and 44100 is slight.
The difference is in the phase of the high end spectrum 15K-22K
I can notice a slight difference, but as noted, it isn't much...
Meanwhile, decided to check the delta between:
Audio downsampled from 16K to 8K via averaging pairs of samples;
Audio downsampled from 16K to 8K via spline curve fitting.
And, I had noticed there is a difference in the 8 kHz signals.
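As a rough sketch of the simpler of the two paths (pair averaging) and of how a delta signal can be formed: the function names here are hypothetical, and the curve-fitting downsampler is not shown since its details are not spelled out.

```python
def downsample_avg2(samples):
    """Halve the sample rate by averaging adjacent pairs of samples."""
    n = len(samples) // 2  # a trailing odd sample is dropped
    return [(samples[2 * i] + samples[2 * i + 1]) / 2 for i in range(n)]


def delta(a, b):
    """Sample-by-sample difference of two signals at the same rate."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return [x - y for x, y in zip(a, b)]
```

With both 8 kHz versions in hand, `delta(spline_8k, avg_8k)` would give the difference signal being described (both argument names are placeholders).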
The curve-fitting delta signal is quite strong in high frequencies (with much of the total energy in the 2 to 4 kHz range); and actually a fair bit louder than could have been expected.
The difference signal itself contains intelligible speech (and most other significant aspects of the audio), though it sits pretty much entirely in the high part of the frequency range.
Where, say, S0/S1/S2/S3 are the points for a spline, and F is the fractional position of the point between S1 and S2:
Linear:
V=(S1*(1-F))+(S2*F)
Quadratic Spline (Bezier):
P1=(S0*(1-F))+(S1*F)
P2=(S1*(1-F))+(S2*F)
V=(P1*(1-F))+(P2*F)
Cubic Spline:
P1=(S0*(1-F))+(S1*F)
P2=(S1*(1-F))+(S2*F)
P3=(S2*(1-F))+(S3*F)
Q1=(P1*(1-F))+(P2*F)
Q2=(P2*(1-F))+(P3*F)
V=(Q1*(1-F))+(Q2*F)
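The three constructions above are repeated linear interpolation (de Casteljau evaluation), and can be sketched directly as below. The function names are made up for illustration. One property worth keeping in mind: a Bezier curve only passes through its first and last control point, the inner ones just pull the curve toward them.

```python
def lerp(a, b, f):
    """Linear blend: a at f=0, b at f=1."""
    return a * (1 - f) + b * f


def bezier_linear(s1, s2, f):
    return lerp(s1, s2, f)


def bezier_quadratic(s0, s1, s2, f):
    p1 = lerp(s0, s1, f)
    p2 = lerp(s1, s2, f)
    return lerp(p1, p2, f)


def bezier_cubic(s0, s1, s2, s3, f):
    p1 = lerp(s0, s1, f)
    p2 = lerp(s1, s2, f)
    p3 = lerp(s2, s3, f)
    q1 = lerp(p1, p2, f)
    q2 = lerp(p2, p3, f)
    return lerp(q1, q2, f)
```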
This is a different spline construction than I had usually used for audio processing:
G=1-F
P1=(S1*(1+F))-(S0*F)
P2=(S2*(1+G))-(S3*G)
V=(P1*(1-F))+(P2*F)
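A minimal sketch of this second construction, with a hypothetical function name. P1 extrapolates forward from S0 and S1, P2 extrapolates backward from S3 and S2, and the two are blended; unlike the Bezier forms, the result passes exactly through S1 at F=0 and S2 at F=1.

```python
def interp_spline(s0, s1, s2, s3, f):
    """Blend of two linear extrapolations; interpolates S1 (f=0) and S2 (f=1)."""
    g = 1 - f
    p1 = s1 * (1 + f) - s0 * f  # line through S0,S1 continued past S1
    p2 = s2 * (1 + g) - s3 * g  # line through S3,S2 continued back past S2
    return p1 * (1 - f) + p2 * f
```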
But it seems the former may have more useful properties in this case (mostly in that estimating the control points for the former splines better preserves the high-frequency properties of the signal); whereas the latter is mostly only useful for interpolation tasks (such as upsampling).
Though, for 2x cases, F is only ever 0.25 or 0.75, partly simplifying the math.
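To illustrate the simplification for the cubic Bezier case: expanding the nested blends gives the usual Bernstein weights, which become fixed constants once F is pinned to 0.25 or 0.75. This is only a sketch of the arithmetic (helper name is hypothetical), using exact fractions so the constants are easy to read off.

```python
from fractions import Fraction


def cubic_bezier_weights(f):
    """Weights applied to (S0, S1, S2, S3) by a cubic Bezier at position f."""
    g = 1 - f
    return (g**3, 3 * g * g * f, 3 * g * f * f, f**3)


w25 = cubic_bezier_weights(Fraction(1, 4))  # 27/64, 27/64, 9/64, 1/64
w75 = cubic_bezier_weights(Fraction(3, 4))  # 1/64, 9/64, 27/64, 27/64
```

So at 2x, each output sample is just a fixed 4-tap weighted sum, with the two phases being mirror images of each other.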
But, for calculating the points, one doesn't actually have the preceding or following control points, so it is necessary to carry out the math for additional samples into the past and future to estimate the neighboring control points before calculating the current control point (or, it gets a bit more hairy). For the terminal points, linear extrapolation seems to work.
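For the terminal points, one way linear extrapolation could look is to pad the signal with one extra sample at each end, continuing the line through the two nearest samples. This is an assumed sketch with a hypothetical helper name; it does not cover the control-point estimation itself.

```python
def pad_linear(samples):
    """Add one linearly extrapolated sample before and after the signal."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to extrapolate")
    first = 2 * samples[0] - samples[1]   # continue the line S1 -> S0
    last = 2 * samples[-1] - samples[-2]  # continue the line S[n-2] -> S[n-1]
    return [first] + list(samples) + [last]
```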
But, yeah, the control-points style signal seems to be significantly boosted in terms of high-frequency components.
And, as audio, it seems to preserve some aspects of the 16kHz signal that are otherwise lost when downsampling to 8 kHz.
I guess I could try looking some at a reconstructed version of the 16 kHz sample and see if anything survives past the 4kHz mark.
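A check along these lines could be as simple as measuring what fraction of the (non-DC) spectral energy sits above a cutoff. The sketch below uses a naive O(N^2) DFT in plain Python, so it is only practical for short blocks; the function name and parameters are assumptions for illustration.

```python
import cmath
import math


def energy_fraction_above(samples, rate, cutoff_hz):
    """Fraction of non-DC spectral energy strictly above cutoff_hz."""
    n = len(samples)
    total = 0.0
    high = 0.0
    for k in range(1, n // 2 + 1):  # bin 0 (DC) is skipped
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        # Bins below Nyquist have a mirror image, so they count twice.
        weight = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
        e = weight * abs(acc) ** 2
        total += e
        if k * rate / n > cutoff_hz:
            high += e
    return high / total if total else 0.0
```

For example, `energy_fraction_above(block, 16000, 4000)` on a block of the reconstructed 16 kHz audio (`block` being a placeholder) gives a single number to compare against the same measurement on the original.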
Well, OK, trying to resample it up to 16kHz using the B-spline is just sort of being a bit weird. Seems almost like the math is broken somehow.
In the reconstruction attempt there are a few big notches in the spectrum; seems to be an issue with the output spline rather than the input signal.
Seems to not be an issue with my typical spline, rather something specific about my attempt at upsampling again with a cubic Bezier spline.
The upsampled reconstruction attempt sounds like dog crap; but does interestingly seem to have stuff going on past the 4kHz Nyquist cutoff (so, this still leaves the possibility that parts of the higher frequencies may be surviving the downsampling process).
Though, curiously, despite sounding like dog-crap and having big notches in the spectrum, the Bezier Spline reconstruction does have the lower RMSE value for some reason.
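For reference, RMSE in the usual sense can be sketched as below (function name is hypothetical). Since it weights every sample error equally, it tends to be dominated by wherever most of the signal energy is, which may be part of why a lower RMSE need not line up with what sounds better.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equal-length signals."""
    if len(a) != len(b) or not a:
        raise ValueError("signals must be non-empty and the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```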
Though, can note that the input audio seems fairly weak in the 4-8 kHz range, so it isn't entirely obvious what specifically is being affected in downsampling, but clearly something is.
Well, more fiddling needed, it seems, to try to figure this out...