Change default initializer for DenseSymm
Created by: PhilipVinc
This addresses one piece of the discussion in #863: the initializer of DenseSymm.
This changes the default initializer so that every channel has the correct default variance of 1.
The implementation is complicated because the shape of the kernel itself is not related to the size of the input, n_sites,
which is what should be used for the variance scaling.
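
To illustrate why the scaling should follow n_sites rather than the kernel shape, here is a minimal toy sketch in plain Python. It is not the code in this PR: the tiling of `n_params` free weights over `n_sites` inputs is a hypothetical stand-in for the symmetrization, all function names are made up for illustration, and it assumes that "variance of 1" means unit variance of one output for i.i.d. unit-variance inputs.

```python
import math
import random


def sample_kernel(rng, n_params, fan_in):
    """Draw n_params independent weights with standard deviation 1/sqrt(fan_in)."""
    std = 1.0 / math.sqrt(fan_in)
    return [rng.gauss(0.0, std) for _ in range(n_params)]


def preactivation_variance(rng, n_sites, n_params, fan_in, n_samples=4000):
    """Monte Carlo estimate of the variance of one output of a toy symmetric layer.

    The n_params free weights are tiled over the n_sites inputs (a hypothetical
    stand-in for the real symmetrization); inputs are i.i.d. standard normal.
    """
    total = 0.0
    for _ in range(n_samples):
        kernel = sample_kernel(rng, n_params, fan_in)
        x = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
        out = sum(kernel[i % n_params] * x[i] for i in range(n_sites))
        total += out * out  # the output has zero mean, so E[out^2] is its variance
    return total / n_samples


rng = random.Random(0)
n_sites, n_params = 16, 4

# Scaling by the number of inputs that are actually summed over.
var_from_n_sites = preactivation_variance(rng, n_sites, n_params, fan_in=n_sites)
# Scaling by the size of the (smaller) kernel, as a shape-based initializer would.
var_from_kernel_shape = preactivation_variance(rng, n_sites, n_params, fan_in=n_params)

print(var_from_n_sites, var_from_kernel_shape)
```

In this toy setting the expected output variance is `n_sites / fan_in`, so using `fan_in=n_sites` gives a value close to 1, while inferring the fan-in from the kernel size overestimates it by a factor of `n_sites / n_params`.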
(Also includes #870, so ignore the first commit)
@chrisrothUT please review and approve if you agree.