RevIN: What really grinds my gears
Introduction
When I was growing up I used to love Roald Dahl books and all the really cool names he invented for things, like the snozzberry or the fizzwiggler. Trouble is that after a while I got completely confused trying to remember what the hell anything was. I mention this because I see a similarly strange affliction affecting the machine learning research community. And so, ladies and gentlemen, I present to you RevIN, the latest MacGuffin in the time series forecasting world. If you take a look at the claims in the paper you’d think it can supercharge your forecasting models like a bottle of nitrous in a car on the drag strip. But what is this new and magical beast, and does it really live up to its own hype?
What is RevIN?
Well, there’s no way of putting it off any longer: RevIN stands for Reversible Instance Normalisation. One of the first things I learnt about neural networks was that they really like the data they are trained with to be in a range somewhere close to -1 to 1. One reason for this is that we normally initialise our weights in that range, so if we’re learning something that is orders of magnitude bigger than that we can spend a lot of time just trying to learn the bias. Numerically speaking, it just makes life easier and more stable. There’s really only one way to deal with it, and that is to scale the data on the way in and un-scale it on the way out. In computer vision, where our inputs represent pixel values between 0 and 255, we just divide by 255. DeepAR, as I’ve discussed before, uses a technique which scales the input of a mini-batch by the mean absolute value. But perhaps the best-known scaling technique is normalisation, or rather standardisation, where we subtract the mean and divide by the standard deviation.
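To make that concrete, here’s a toy sketch of that last approach - my own illustration in plain NumPy, nothing lifted from any paper: standardise on the way in, un-scale on the way out.

    import numpy as np

    # toy series whose values are nowhere near the [-1, 1] range
    x = np.array([950.0, 1010.0, 1020.0, 980.0, 1040.0])

    # scale on the way in: roughly zero mean, unit variance
    mean, std = x.mean(), x.std()
    x_scaled = (x - mean) / std

    # stand-in for whatever the model predicts in the scaled space
    y_scaled = x_scaled * 0.9

    # un-scale on the way out, back to the original units
    y = y_scaled * std + mean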
I first saw RevIN as a piece of code when I was looking at implementing PatchTST, and I thought it was strange because I didn’t remember seeing it mentioned in the paper. So I went back and had a look, and found it under a different name: Instance Normalization. And so I thought I’d have a look at the RevIN paper itself, published at the prestigious ICLR conference in 2022, and see what it was all about. I gave it a quick read - starting at the results section, obviously:
- makes N-BEATS better, check;
- makes Informer better (admittedly that one’s a pretty low bar), check;
- outperforms “state of the art” normalisation methods, check - I mean holy crap, I didn’t even know there were any “state of the art” normalisation methods.
This thing must be amazing. Let’s take a look at the details. I’m gonna grab a coffee, because the genius of this thing is going to melt my brain.
Right, screw the maths notation, let’s just head straight for the code. Here it is in all its glory:
import torch
import torch.nn as nn


class RevIN(nn.Module):
    def __init__(self, num_features: int, eps=1e-5, affine=True, subtract_last=False):
        """
        :param num_features: the number of features or channels
        :param eps: a value added for numerical stability
        :param affine: if True, RevIN has learnable affine parameters
        :param subtract_last: if True, subtract the last value of the window instead of the mean
        """
        super(RevIN, self).__init__()
        self.num_features = num_features
        self.eps = eps
        self.affine = affine
        self.subtract_last = subtract_last
        if self.affine:
            self._init_params()

    def forward(self, x, mode: str):
        if mode == "norm":
            self._get_statistics(x)
            x = self._normalize(x)
        elif mode == "denorm":
            x = self._denormalize(x)
        else:
            raise NotImplementedError
        return x

    def _init_params(self):
        # initialize RevIN params: (C,)
        self.affine_weight = nn.Parameter(torch.ones(self.num_features))
        self.affine_bias = nn.Parameter(torch.zeros(self.num_features))

    def _get_statistics(self, x):
        # reduce over every dimension except batch and channel, i.e. over time
        dim2reduce = tuple(range(1, x.ndim - 1))
        if self.subtract_last:
            self.last = x[:, -1, :].unsqueeze(1)
        else:
            self.mean = torch.mean(x, dim=dim2reduce, keepdim=True).detach()
        self.stdev = torch.sqrt(
            torch.var(x, dim=dim2reduce, keepdim=True, unbiased=False) + self.eps
        ).detach()

    def _normalize(self, x):
        # subtract the mean (or last value) and divide by the standard deviation
        if self.subtract_last:
            x = x - self.last
        else:
            x = x - self.mean
        x = x / self.stdev
        if self.affine:
            x = x * self.affine_weight
            x = x + self.affine_bias
        return x

    def _denormalize(self, x):
        # undo the affine transform, then put the standard deviation and mean back
        if self.affine:
            x = x - self.affine_bias
            x = x / (self.affine_weight + self.eps * self.eps)
        x = x * self.stdev
        if self.subtract_last:
            x = x + self.last
        else:
            x = x + self.mean
        return x
OK, constructor: check - apart from creating a couple of learnable parameters if we set the affine flag, we’re just setting properties. That was easy enough. Next, forward - OK, so we can call it with “norm” for “normalise”, I guess? In which case we’re gonna get some stats and then do the normalisation. So let’s see what mind-bending stats we’re gonna generate in _get_statistics. The subtract_last branch we can ignore as that’s just optional, so we calculate the mean and stdev… hmm, OK, I was expecting something a bit more exotic, but I guess that’s fine. Now what crazy thing are we gonna do in _normalize? Let’s see… oh, we’re just gonna subtract the mean, divide by the standard deviation, and then, if affine is set, multiply by the learned weight and add the bias. Are you serious… that’s it? This is what gets published at ICLR? I mean, I’m pretty sure I could have thought of this myself, and I’m a moron.
Right, now I think I know what _denormalize is gonna do. Oh yes, surprise surprise, it multiplies by the standard deviation and then adds the mean back in. I mean seriously, I must have implemented code like this a hundred times, but never in my life would I have thought “wow, this thing needs a new name and I’m gonna write a paper about it”. Oh, so that’s what the Rev in RevIN stands for - Reversible. It’s not at all like an inverse transform, then.
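And just so it’s clear how this all hangs together, here’s a minimal sketch of how the layer gets used - my own toy forecaster, not code lifted from PatchTST: “norm” on the way into the backbone, “denorm” on the way out, so the network only ever sees standardised inputs.

    import torch
    import torch.nn as nn

    class ToyForecaster(nn.Module):
        def __init__(self, num_features, lookback, horizon):
            super().__init__()
            self.revin = RevIN(num_features)
            # stand-in backbone: one linear layer mapping lookback -> horizon per channel
            self.backbone = nn.Linear(lookback, horizon)

        def forward(self, x):                        # x: (batch, lookback, channels)
            x = self.revin(x, mode="norm")           # subtract mean, divide by std (per instance)
            y = self.backbone(x.transpose(1, 2)).transpose(1, 2)
            y = self.revin(y, mode="denorm")         # put the mean and std back in
            return y                                 # y: (batch, horizon, channels)

    model = ToyForecaster(num_features=7, lookback=96, horizon=24)
    out = model(torch.randn(32, 96, 7))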
I mean guys, seriously, have you not heard of sklearn.preprocessing.StandardScaler? It does exactly the same thing and it’s been around for years. Ah yes, but this is applied locally to each instance in the batch, I hear you cry - wait, what, like DeepAR was doing five years ago?
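Don’t believe me? Here’s a quick toy comparison - again my own sketch, not anything from the paper. The only real difference is that StandardScaler learns its mean and std once from whatever you fit it on, whereas RevIN recomputes them per instance inside the batch:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # 100 samples, 3 features, values nowhere near [-1, 1]
    x = np.random.randn(100, 3) * 50 + 200

    scaler = StandardScaler()
    x_norm = scaler.fit_transform(x)               # subtract mean, divide by std
    x_back = scaler.inverse_transform(x_norm)      # ...and "reverse" it

    # the hand-rolled version, i.e. what RevIN does (minus the affine parameters)
    x_norm_manual = (x - x.mean(axis=0)) / x.std(axis=0)

    print(np.allclose(x_norm, x_norm_manual))      # True
    print(np.allclose(x_back, x))                  # True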
Conclusion
So there you have it: RevIN, a new name that nobody asked for, for something that’s been around for years. Right, I’m off to find my own MacGuffin - whizzpopper regularisation, anyone?