If I have an M x N matrix and an L1 cache of size K, what cache miss rate does an optimal matrix transpose have? Obviously I am looking for something that is a function of M and N (and possibly K, though that is maybe too complex) rather than a specific number.
I am asking because I have a lot of matrix data that has to be processed in both directions, and I would like a rule of thumb for deciding when it is worthwhile to keep both the original data and its transpose in memory.
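For a specific M, N and K, one way to get a rough number is to simulate the transpose against a toy cache model. The sketch below is only that, not a model of any real CPU. It assumes a fully associative LRU cache of K words split into lines of `line` words, row-major storage for both the source A and the destination B, and that every element read or write counts as one access. Real L1 caches are set-associative, so conflict misses (especially with power-of-two dimensions) can make real numbers worse. `LRUCache`, `transpose_miss_rate` and the `tile` parameter are hypothetical names chosen for illustration. In this model a tiled transpose reaches the compulsory floor of one miss per cache line touched, i.e. a miss rate of about 1/line independent of M and N, as long as one tile's lines fit in the cache. The plain row-by-row loop degrades once a single column of B spans more lines than the cache can hold.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache: capacity_words words in lines of line_words words."""

    def __init__(self, capacity_words: int, line_words: int) -> None:
        self.max_lines = capacity_words // line_words
        self.line_words = line_words
        self.resident = OrderedDict()  # line number -> None, ordered oldest to newest use
        self.accesses = 0
        self.misses = 0

    def access(self, addr: int) -> None:
        """Touch one word; reads and writes are treated alike (write-allocate assumed)."""
        self.accesses += 1
        line = addr // self.line_words
        if line in self.resident:
            self.resident.move_to_end(line)  # mark as most recently used
        else:
            self.misses += 1
            self.resident[line] = None
            if len(self.resident) > self.max_lines:
                self.resident.popitem(last=False)  # evict least recently used line


def transpose_miss_rate(m: int, n: int, k: int, line: int, tile: int) -> float:
    """Miss rate of B = A^T for an m x n row-major A, processed in tile x tile blocks.

    A occupies words [0, m*n); B (n x m, row-major) starts right after it.
    A tile >= max(m, n) degenerates to the plain row-by-row loop.
    """
    cache = LRUCache(k, line)
    base = m * n
    for ii in range(0, m, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, m)):
                for j in range(jj, min(jj + tile, n)):
                    cache.access(i * n + j)         # read A[i][j]
                    cache.access(base + j * m + i)  # write B[j][i]
    return cache.misses / cache.accesses


if __name__ == "__main__":
    # K and LINE are in words: with 8-byte doubles this is an 8 KB cache with 64-byte lines.
    M, N, K, LINE = 256, 256, 1024, 8
    print("naive  :", transpose_miss_rate(M, N, K, LINE, tile=max(M, N)))
    print("tiled 8:", transpose_miss_rate(M, N, K, LINE, tile=8))
    print("floor  :", 1 / LINE)  # every line of A and of B must be missed at least once
```

With these parameters a column of B covers 256 lines while the cache holds only 128, so in this model every write in the naive loop misses (about 0.56 overall), whereas 8 x 8 tiles stay at the 1/8 floor. Sweeping M, N, K and `tile` in the same way gives a model-based starting point for the "keep both copies" trade-off, to be checked against measurements on the actual hardware.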