Log-odds... What?
Everyone knows Bayes' rule, of course. The probability form especially: it is very flexible, and you can easily derive any variable knowing just three inputs.
The log-odds form feels more interesting to me, in that it allows doing the simpler arithmetic of additions and negations in one's head. Who on Earth does multiplication or division of numbers between 0 and 1 without a calculator, after all? Moreover, I think the equations for getting posteriors are more intuitive once you already know them. They are not quite as flexible, though: you can't just do 1 - x in log-odds space.
I will be using g(x) = log10(x) * 10 below to avoid repetition. As a refresher on logarithms, remember this key rule: log(a / b) = log(a) - log(b), and therefore taking the reciprocal is just a sign flip.
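To make the definition concrete, here is a minimal Python sketch of g(x) (only the name g and its formula come from the text above; the rest is illustrative) together with a quick check of the sign-flip rule:

```python
import math

def g(x: float) -> float:
    """Intensity of a probability or ratio: 10 * log10(x)."""
    return 10 * math.log10(x)

# log(a / b) = log(a) - log(b), so a reciprocal is just a sign flip:
print(g(1 / 4), -g(4))      # both are about -6.02
# ...and a ratio can be computed as a difference of two g values:
print(g(0.25 / 0.75))       # about -4.77
print(g(0.25) - g(0.75))    # same value
```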
The first thing to remember when working with the log-odds form is that we are interested in the relative probability of events. The logarithm of that relative probability is what I will call intensity.
I did not derive the equation for a|b myself; it is out there on the Internet. The equation is g(a|b / !a|b) = g(a / !a) + g(b|a / b|!a). Obviously, calculating the likelihood ratio for the evidence is another division that we want to avoid; we can do so by applying the logarithm rule, which gives this: g(a|b) - g(!a|b) = g(a) - g(!a) + g(b|a) - g(b|!a). Now it's a matter of getting a sense of how the logarithm scale maps to probabilities, but more on that later.
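As a quick sanity check, here is a small sketch (the joint distribution is made up for illustration) that derives the conditional probabilities from a joint distribution over a and b and confirms that both sides of the equation agree:

```python
import math

def g(x: float) -> float:
    return 10 * math.log10(x)

# A made-up joint distribution P(a, b); the four entries sum to 1.
p_ab, p_a_notb, p_nota_b, p_nota_notb = 0.2, 0.1, 0.3, 0.4

p_a = p_ab + p_a_notb                  # P(a)
p_b = p_ab + p_nota_b                  # P(b)
p_b_given_a = p_ab / p_a               # P(b|a)
p_b_given_nota = p_nota_b / (1 - p_a)  # P(b|!a)
p_a_given_b = p_ab / p_b               # P(a|b)

lhs = g(p_a_given_b) - g(1 - p_a_given_b)
rhs = g(p_a) - g(1 - p_a) + g(p_b_given_a) - g(p_b_given_nota)
print(lhs, rhs)  # both are about -1.76
```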
If you look at the equation long enough you might actually see that it is similar to how we naturally think about a hypothesis: "given my prior knowledge of how likely the event is, the extra evidence makes me X amount more confident of it happening under the given condition".
By symmetry, g(b|a) - g(!b|a) equals g(b) - g(!b) + g(a|b) - g(a|!b), which seems pretty straightforward.
How about g(a|!b) - g(!a|!b)? I struggled a little with that one, but there is a symmetry to it too. This one should state that event a happens when condition b is not met. Maybe we can just negate the evidence and keep the priors as they are? It turns out yes, it's that simple: g(a) - g(!a) + g(!b|a) - g(!b|!a).
By symmetry again, g(b|!a) - g(!b|!a) is just a flip of priors and evidence, which is g(b) - g(!b) + g(!a|b) - g(!a|!b).
So here we have all 4 possible equations to get to posteriors:
1. g(a|b) - g(!a|b) = g(a) - g(!a) + g(b|a) - g(b|!a)
2. g(b|a) - g(!b|a) = g(b) - g(!b) + g(a|b) - g(a|!b)
3. g(a|!b) - g(!a|!b) = g(a) - g(!a) + g(!b|a) - g(!b|!a)
4. g(b|!a) - g(!b|!a) = g(b) - g(!b) + g(!a|b) - g(!a|!b)
Aren't they elegant? The priors are consistent across equations 1 and 3, and across 2 and 4. The evidence is unique to each posterior update. The arithmetic is consistent, and they are very easy to remember due to the symmetry: just remember one equation and the others can be derived from it.
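The symmetry is also easy to see in code. Here is a sketch (again with a made-up joint distribution; none of the names below come from the original text) where a single helper computes the right-hand side and all four posteriors are just different instantiations of it, each matching the value computed directly from the joint distribution:

```python
import math

def g(x: float) -> float:
    return 10 * math.log10(x)

def update(prior: float, ev_if_true: float, ev_if_false: float) -> float:
    """The shared right-hand side: prior intensity plus evidence intensity."""
    return g(prior) - g(1 - prior) + g(ev_if_true) - g(ev_if_false)

# A made-up joint distribution P(a, b).
p_ab, p_a_notb, p_nota_b, p_nota_notb = 0.2, 0.1, 0.3, 0.4
p_a, p_b = p_ab + p_a_notb, p_ab + p_nota_b

eq1 = update(p_a, p_ab / p_a, p_nota_b / (1 - p_a))         # hypothesis a, evidence b
eq2 = update(p_b, p_ab / p_b, p_a_notb / (1 - p_b))         # hypothesis b, evidence a
eq3 = update(p_a, p_a_notb / p_a, p_nota_notb / (1 - p_a))  # hypothesis a, evidence !b
eq4 = update(p_b, p_nota_b / p_b, p_nota_notb / (1 - p_b))  # hypothesis b, evidence !a

# Compare with the left-hand sides, computed directly from the joint:
print(eq1, g(p_ab / p_b) - g(p_nota_b / p_b))                     # g(a|b) - g(!a|b)
print(eq2, g(p_ab / p_a) - g(p_a_notb / p_a))                     # g(b|a) - g(!b|a)
print(eq3, g(p_a_notb / (1 - p_b)) - g(p_nota_notb / (1 - p_b)))  # g(a|!b) - g(!a|!b)
print(eq4, g(p_nota_b / (1 - p_a)) - g(p_nota_notb / (1 - p_a)))  # g(b|!a) - g(!b|!a)
```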
Mapping probability to intensity
Probability expressed as a number from 0 to 1 has some mental model associated with it already. Intensity, however, ranges from minus infinity to plus infinity and scales non-linearly with respect to probability. To get a sense of intensity we can just build a table and try to memorize it for easier computations. I will be using g(x) here, not g(x / !x), so that we have less information to memorize, since the evidence terms cannot simply be negated.
The table roughly looks like this, with each cell listing a probability followed by its intensity. Note that the sign is always negative as we operate on numbers less than 1, so I will just omit it and convert the probabilities to percentages:
  1%  20   |  ~20%  7   |  ~60%  2.2   |  ~90%  0.5
 ~3%  15   |  ~25%  6   |  ~70%  1.5   |  ~95%  0.22
 ~5%  13   |  ~31%  5   |  ~75%  1.25  |  ~97%  0.13
 10%  10   |  ~40%  4   |  ~80%  1     |  ~99%  0.05
~16%   8   |  ~50%  3   |  ~85%  0.7   |
To get an intensity you just subtract two complements or two separate values, depending on which variable you are operating on. Zero means "indifferent", positive numbers suggest the event is more likely than the baseline, and negative numbers are for less likely events.
It looks like I will be memorizing this table for estimations in day-to-day life.
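As a hypothetical worked example (the numbers are mine, not from the text): say the prior P(a) is 10% and evidence b is four times more likely under a, with P(b|a) = 80% and P(b|!a) = 20%. From the table, the prior intensity is -10 + 0.5 = -9.5 and the evidence intensity is -1 + 7 = +6, so the posterior intensity is about -3.5; scanning the table, that is roughly -5 + 1.5, i.e. a posterior around 31%. The exact answer is 4/13, about 30.8%, so the mental estimate holds up. A quick check in code:

```python
import math

def g(x: float) -> float:
    return 10 * math.log10(x)

p_a, p_b_given_a, p_b_given_nota = 0.10, 0.80, 0.20  # made-up numbers

# Equation 1: posterior intensity of a given b.
posterior_intensity = g(p_a) - g(1 - p_a) + g(p_b_given_a) - g(p_b_given_nota)
print(posterior_intensity)   # about -3.52, close to the table-based -3.5

# Convert the intensity back to a probability: odds = 10 ** (intensity / 10).
odds = 10 ** (posterior_intensity / 10)
print(odds / (1 + odds))     # about 0.31, i.e. P(a|b) is roughly 31%
```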
Verification
The equations above were hypothesized by intuition. To verify them I wrote a little program that searches for possible solutions given the probabilities, using 3 examples. Across all 3 examples there was only one solution that reached the desired intensity number, and it was exactly the equations above.
I did not go through a formal verification using math, as for me it is easier to write some code than to write equations on paper while double-checking for mistakes. But I did try a little bit of that before it "clicked".