Use get_conn_flattened for our local energy expectation kernels
Created by: PhilipVinc
Please give me feedback
Right now the local kernels use get_conn_padded,
which pads the number of connected elements per sample to the maximum number of connected elements among all the samples.
This leads to some wasted compute power.
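To make the waste concrete, here is a small sketch of the two layouts. The per-sample counts are made-up toy numbers, not taken from any real operator, and the variable names are only illustrative.

```python
import numpy as np

# Hypothetical toy data: number of non-zero connected elements per sample.
n_conn = np.array([1, 3, 2])

# Padded layout: every sample is given max(n_conn) slots,
# so the network is evaluated on n_samples * max(n_conn) configurations.
padded_slots = n_conn.size * n_conn.max()

# Flattened layout: only the actual connected elements are stored,
# back to back, with `sections` marking where each sample's block ends.
flat_slots = n_conn.sum()
sections = np.cumsum(n_conn)

# Fraction of the padded evaluations that are spent on padding entries.
wasted_fraction = 1 - flat_slots / padded_slots
print(padded_slots, flat_slots, sections, wasted_fraction)
```

When all samples have the same number of connected elements (as in Ising), the two layouts have the same size and there is nothing to gain.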
This PR improves the situation by using get_conn_flattened
to get the connected elements on discrete operators.
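As a rough idea of what the flattened kernel has to do, here is a minimal NumPy sketch, not the actual code in this PR. The function names, the toy `log_psi`, and the hand-written arrays are all assumptions made for illustration; the only thing assumed about the flattened layout is that connected configurations are stored back to back, with `sections` giving the end index of each sample's block.

```python
import numpy as np

def local_energy_padded(log_psi, x, xp_pad, mels_pad):
    # xp_pad: (n_samples, max_conn, n_sites); mels_pad: (n_samples, max_conn),
    # with zero matrix elements in the padding slots.
    ratios = np.exp(log_psi(xp_pad) - log_psi(x)[:, None])
    return np.sum(mels_pad * ratios, axis=-1)

def local_energy_flattened(log_psi, x, xp, mels, sections):
    # xp: (total_conn, n_sites); mels: (total_conn,);
    # sections[i] is the end index of sample i's block in xp / mels.
    n_conn = np.diff(sections, prepend=0)
    # Map every connected element back to the sample it belongs to.
    sample_idx = np.repeat(np.arange(x.shape[0]), n_conn)
    ratios = np.exp(log_psi(xp) - log_psi(x)[sample_idx])
    terms = mels * ratios
    # Segment sum: accumulate each term into its sample's local energy.
    e_loc = np.zeros(x.shape[0], dtype=terms.dtype)
    np.add.at(e_loc, sample_idx, terms)
    return e_loc

# Hypothetical toy wavefunction and data (2 samples, 3 sites).
log_psi = lambda s: 0.1 * s.sum(axis=-1)
x = np.array([[1.0, -1.0, 1.0], [-1.0, -1.0, 1.0]])

# Flattened: sample 0 has 1 connected element, sample 1 has 2.
xp = np.array([[1.0, -1.0, 1.0], [1.0, -1.0, 1.0], [-1.0, 1.0, 1.0]])
mels = np.array([2.0, 0.5, -1.0])
sections = np.array([1, 3])

# Same data in padded form: sample 0 gets one padding slot with mel = 0.
xp_pad = np.stack([np.stack([xp[0], x[0]]), np.stack([xp[1], xp[2]])])
mels_pad = np.array([[2.0, 0.0], [0.5, -1.0]])

e_pad = local_energy_padded(log_psi, x, xp_pad, mels_pad)
e_flat = local_energy_flattened(log_psi, x, xp, mels, sections)
```

Both functions return the same local energies, but the flattened one evaluates `log_psi` on 3 connected configurations instead of 4. The extra index bookkeeping is where the added complexity comes from.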
The downside is that the local-energy code becomes considerably more convoluted. The upside is a net speedup of roughly x%, where x is the percentage of zero (padding) matrix elements you have in your operators.
For Ising and other simple Hamiltonians this gives no bonus (in fact, it yields a small ~1-5% slowdown for Ising with 20 spins, vanishingly small with 60 spins), but it gives me a 20% speedup in some dissipative things I'm running. And I guess @jwnys might be interested too?
Still WIP, but I'd like some feedback on this
Note: this code path is only taken if chunking is disabled. I still have to update the chunked code.