On character tables, projections, and equivariance
Created by: attila-i-szabo
This is a physics bug report of sorts, related to the equivariant network implemented in #620 and to implementing #658.
One finds in standard texts on representation theory (e.g. here) that the operator
(1) $P_\alpha = \sum_g \chi_\alpha(g)^*\,\hat g$
projects (up to the normalisation $d_\alpha/|G|$) onto the part of the Hilbert space that transforms according to an irrep $\alpha$, described by the characters $\chi_\alpha$ (the sum runs over the elements of the symmetry group).
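As a quick numerical sanity check of (1), here is a minimal NumPy sketch (not NetKet code; all names are hypothetical). It assumes a ring of $L = 4$ spin-1/2 sites with only translations as the symmetry group, the convention $\chi_q(T^n) = e^{2\pi i q n/L}$ for the momentum irreps, and it includes the normalisation $d_\alpha/|G| = 1/L$ that (1) leaves out. It builds $\hat g$ as a permutation matrix with $\hat g|\sigma\rangle = |g\sigma\rangle$ and checks that each $P_q$ is a Hermitian, idempotent projector whose range consists of translation eigenstates with eigenvalue $\chi_q(T)$.

```python
import itertools
import numpy as np

L = 4  # hypothetical ring of L spin-1/2 sites
configs = list(itertools.product([0, 1], repeat=L))
index = {c: i for i, c in enumerate(configs)}

def act(perm, sigma):
    # (g sigma)_x = sigma_{g^{-1} x}: the spin on site x moves to site perm[x]
    out = [0] * L
    for x, s in enumerate(sigma):
        out[perm[x]] = s
    return tuple(out)

def operator(perm):
    # matrix of g-hat on the 2^L-dimensional Hilbert space: g-hat|sigma> = |g sigma>
    M = np.zeros((len(configs), len(configs)))
    for c in configs:
        M[index[act(perm, c)], index[c]] = 1.0
    return M

translations = [[(x + n) % L for x in range(L)] for n in range(L)]  # T^n: x -> x + n

def chi(q, n):
    # assumed convention for the momentum irrep q: chi_q(T^n) = exp(2 pi i q n / L)
    return np.exp(2j * np.pi * q * n / L)

def projector(q):
    # Eq. (1), times the normalisation d_alpha / |G| = 1 / L
    return sum(np.conj(chi(q, n)) * operator(t) for n, t in enumerate(translations)) / L

P = [projector(q) for q in range(L)]
T1 = operator(translations[1])
for q in range(L):
    assert np.allclose(P[q] @ P[q], P[q])            # idempotent
    assert np.allclose(P[q].conj().T, P[q])          # Hermitian
    assert np.allclose(T1 @ P[q], chi(q, 1) * P[q])  # range: T-eigenstates with eigenvalue chi_q(T)
assert np.allclose(sum(P), np.eye(len(configs)))     # the P_q resolve the identity
```

Choosing $e^{-2\pi i q n/L}$ instead only relabels $q \to -q$, which is exactly the convention question raised below.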
Compared to this, @chrisrothUT's recent paper (which I assume #620 is based on) gives the following projection rule:
(2) $\psi_\alpha(\sigma) = \sum_g \chi_\alpha(g)^*\,\psi(g\sigma)$.
These look superficially similar. However, $\hat g$ in the first one acts not on the basis states but on the state vector $|\psi\rangle$. Since
(3) $\psi(g\sigma) = \langle g\sigma|\psi\rangle = \langle\sigma|\hat g^{-1}|\psi\rangle$,
the two projectors differ by using the character of $g$ vs. $g^{-1}$. These are complex conjugates of one another, so the two rules coincide whenever the characters are real (as in many space groups), but not in general.
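Eq. (3) can be checked by brute force as well. The sketch below assumes a hypothetical ring of $L = 4$ sites, builds $\hat g$ as a permutation matrix with $\hat g|\sigma\rangle = |g\sigma\rangle$, and takes a random complex $\psi$; it evaluates $\psi(g\sigma)$ for the unit translation and confirms that this equals $\langle\sigma|\hat g^{-1}|\psi\rangle$, not $\langle\sigma|\hat g|\psi\rangle$. Names are illustrative only, not NetKet API.

```python
import itertools
import numpy as np

L = 4  # hypothetical ring of L spin-1/2 sites
configs = list(itertools.product([0, 1], repeat=L))
index = {c: i for i, c in enumerate(configs)}

def act(perm, sigma):
    # (g sigma)_x = sigma_{g^{-1} x}: the spin on site x moves to site perm[x]
    out = [0] * L
    for x, s in enumerate(sigma):
        out[perm[x]] = s
    return tuple(out)

def operator(perm):
    # matrix of g-hat: g-hat|sigma> = |g sigma>
    M = np.zeros((len(configs), len(configs)))
    for c in configs:
        M[index[act(perm, c)], index[c]] = 1.0
    return M

translations = [[(x + n) % L for x in range(L)] for n in range(L)]  # T^n: x -> x + n

rng = np.random.default_rng(0)
psi = rng.normal(size=len(configs)) + 1j * rng.normal(size=len(configs))  # arbitrary state

g = translations[1]  # unit translation T
G = operator(g)

# psi(g sigma) = <g sigma|psi>, evaluated component by component
psi_at_g_sigma = np.array([psi[index[act(g, c)]] for c in configs])

assert np.allclose(psi_at_g_sigma, np.linalg.inv(G) @ psi)  # = <sigma| g^{-1} |psi>, Eq. (3)
assert not np.allclose(psi_at_g_sigma, G @ psi)             # differs from <sigma| g |psi>, as T != T^{-1}
```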
In general, we will have to be careful to document what we mean by a wave function with a given wave vector (for translations, this complex conjugation could be undone by swapping $k$ and $-k$), etc. However, we should probably stick to the proper meaning of the projection operator, (1). This changes (2) to
(2') $\psi_\alpha(\sigma) = \sum_g \chi_\alpha(g)\,\psi(g\sigma)$,
so the desirable meaning of the equivariant output becomes (cf. Eq. (5) in the paper)
(4) $\psi_g(\sigma) = \psi(g\sigma)$.
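To see concretely how (2) and (2') differ, the following minimal NumPy sketch applies both rules to a random $\psi$. It assumes a hypothetical ring of $L = 4$ spins with only translations, characters $\chi_q(T^n) = e^{2\pi i q n/L}$, and a $1/L$ normalisation added to both rules; names are illustrative. Under these assumptions, (2') reproduces $P_q|\psi\rangle$ from (1), while (2) yields the projection onto the conjugate irrep $-q$; for real characters ($q = 0$ or $q = L/2$) the two coincide.

```python
import itertools
import numpy as np

L = 4  # hypothetical ring of L spin-1/2 sites
configs = list(itertools.product([0, 1], repeat=L))
index = {c: i for i, c in enumerate(configs)}

def act(perm, sigma):
    # (g sigma)_x = sigma_{g^{-1} x}: the spin on site x moves to site perm[x]
    out = [0] * L
    for x, s in enumerate(sigma):
        out[perm[x]] = s
    return tuple(out)

def operator(perm):
    # matrix of g-hat: g-hat|sigma> = |g sigma>
    M = np.zeros((len(configs), len(configs)))
    for c in configs:
        M[index[act(perm, c)], index[c]] = 1.0
    return M

translations = [[(x + n) % L for x in range(L)] for n in range(L)]  # T^n: x -> x + n

def chi(q, n):
    # assumed convention: chi_q(T^n) = exp(2 pi i q n / L)
    return np.exp(2j * np.pi * q * n / L)

def projector(q):
    # Eq. (1) with normalisation 1/L
    return sum(np.conj(chi(q, n)) * operator(t) for n, t in enumerate(translations)) / L

rng = np.random.default_rng(0)
psi = rng.normal(size=len(configs)) + 1j * rng.normal(size=len(configs))

def rule(q, conjugate):
    # (1/L) sum_g chi(g)^(*) psi(g sigma): conjugate=True is Eq. (2), False is Eq. (2')
    out = np.zeros(len(configs), dtype=complex)
    for n, t in enumerate(translations):
        c = np.conj(chi(q, n)) if conjugate else chi(q, n)
        out += c * np.array([psi[index[act(t, s)]] for s in configs])
    return out / L

q = 1
assert np.allclose(rule(q, conjugate=False), projector(q) @ psi)     # (2') reproduces (1)
assert np.allclose(rule(q, conjugate=True), projector(-q % L) @ psi)  # (2) projects onto -q
assert not np.allclose(rule(q, conjugate=True), rule(q, conjugate=False))
# real characters (q = L/2 here): both rules agree
assert np.allclose(rule(L // 2, conjugate=True), rule(L // 2, conjugate=False))
```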
- I think the equivariant layer defined in Eqs. (10) and (13) of the paper is watertight: if its input satisfies either (4) or the version in the paper, so will its output. (Eq. (11) in the paper doesn't assume anything about the origin of the input features.)
- The input layer, defined in Eq. (12) and, I suppose, implemented in `DenseSymm`, needs changing to
  (5) $f_g = \sum_x W_{gx}\,\sigma_x$.
  This works since $\sum_x W_{gx}\,\sigma_x = \sum_x W_x\,\sigma_{g^{-1}x} = \sum_x W_x\,(g\sigma)_x = F(g\sigma)$; the second equality follows from thinking about what it means to translate (or otherwise transform) a spin configuration rather than the point where it is evaluated (a distinction similar to that in (3)).
- Likewise, the final layer proposed in Eq. (14), but never implemented beyond the trivial irrep, would turn into (2'): this only requires dropping the complex conjugation from the characters.
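The identity behind (5) is easy to check by brute force. The sketch below is plain NumPy, not the actual `DenseSymm` implementation; it assumes $N = 4$ sites, the full permutation group $S_4$ as a stand-in symmetry group, spins $\sigma_x = \pm 1$, a single random filter $W$, and the action $(g\sigma)_x = \sigma_{g^{-1}x}$. It checks that (5) gives $F(g\sigma)$, while $f_g = \sum_x W_{g^{-1}x}\,\sigma_x$ gives $F(g^{-1}\sigma)$. The function names are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 4
W = rng.normal(size=N)  # hypothetical single filter

def act(perm, sigma):
    # (g sigma)_x = sigma_{g^{-1} x}: the spin on site x is moved to site perm[x]
    out = np.empty_like(sigma)
    out[perm] = sigma
    return out

def F(sigma):
    # plain dense layer without symmetrisation
    return W @ sigma

def proposed(perm, sigma):
    # Eq. (5): f_g = sum_x W_{g x} sigma_x
    return W[perm] @ sigma

def current(perm, sigma):
    # f_g = sum_x W_{g^{-1} x} sigma_x
    return W[np.argsort(perm)] @ sigma

for p in itertools.permutations(range(N)):
    perm = np.array(p)
    inv = np.argsort(perm)  # g^{-1} as a permutation array
    for c in itertools.product([-1.0, 1.0], repeat=N):
        sigma = np.array(c)
        assert np.isclose(proposed(perm, sigma), F(act(perm, sigma)))  # = F(g sigma)
        assert np.isclose(current(perm, sigma), F(act(inv, sigma)))    # = F(g^{-1} sigma)
```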
Funnily enough, the two proposed changes cancel each other out: the assignment of $\psi_g(\sigma)$ is arbitrary, so if it is set to $\psi(g^{-1}\sigma)$ in the first layer (as is done now), that propagates through without problems and meets a projection rule that also swaps $g$ and $g^{-1}$. However, it would be good to get the semantics right.
EDIT: The covariant implementation (the one we currently have) is semantically sensible, too:
- setting $\psi_g(\sigma) = \psi(g^{-1}\sigma)$ means that $|\psi_g\rangle = \hat g|\psi\rangle$ (cf. (3));
- the final layer, as written in Eq. (14) of the paper, is directly consistent with (1).
To be honest, this might be even better; we just have to get everything straight and make sure that `DenseSymm` does what we think it does.
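Both points about the covariant convention can be checked numerically. In the minimal sketch below (hypothetical names; a ring of $L = 4$ spins, translations only, assumed characters $\chi_q(T^n) = e^{2\pi i q n/L}$, and a $1/L$ normalisation), the channels are set to $\psi_g(\sigma) = \psi(g^{-1}\sigma)$. Each channel then equals $\hat g|\psi\rangle$, and combining the channels with conjugated characters, which is how I read Eq. (14) of the paper, gives $P_q|\psi\rangle$ from (1).

```python
import itertools
import numpy as np

L = 4  # hypothetical ring of L spin-1/2 sites
configs = list(itertools.product([0, 1], repeat=L))
index = {c: i for i, c in enumerate(configs)}

def act(perm, sigma):
    # (g sigma)_x = sigma_{g^{-1} x}: the spin on site x moves to site perm[x]
    out = [0] * L
    for x, s in enumerate(sigma):
        out[perm[x]] = s
    return tuple(out)

def operator(perm):
    # matrix of g-hat: g-hat|sigma> = |g sigma>
    M = np.zeros((len(configs), len(configs)))
    for c in configs:
        M[index[act(perm, c)], index[c]] = 1.0
    return M

def inverse(perm):
    return [int(i) for i in np.argsort(perm)]

translations = [[(x + n) % L for x in range(L)] for n in range(L)]  # T^n: x -> x + n

def chi(q, n):
    # assumed convention: chi_q(T^n) = exp(2 pi i q n / L)
    return np.exp(2j * np.pi * q * n / L)

rng = np.random.default_rng(0)
psi = rng.normal(size=len(configs)) + 1j * rng.normal(size=len(configs))

# covariant channels: psi_g(sigma) = psi(g^{-1} sigma)
channels = [np.array([psi[index[act(inverse(t), s)]] for s in configs]) for t in translations]
for t, psi_g in zip(translations, channels):
    assert np.allclose(psi_g, operator(t) @ psi)  # |psi_g> = g|psi>

# final layer with conjugated characters (normalised by 1/L)
q = 1
projected = sum(np.conj(chi(q, n)) * psi_g for n, psi_g in enumerate(channels)) / L
P = sum(np.conj(chi(q, n)) * operator(t) for n, t in enumerate(translations)) / L
assert np.allclose(projected, P @ psi)  # equals P_q|psi> from (1)
```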
EDIT 2: In fact, only this implementation makes sense. If we wanted $\psi_g(\sigma) = \psi(g\sigma)$, we would need a group convolution layer that is equivariant with respect to multiplication from the right, which is not what we have. (It could be adapted, but there is just no good reason for it...) See https://github.com/netket/netket/issues/698#issuecomment-835382734 for details.
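The argument in EDIT 2 can be illustrated with a toy group convolution $\mathrm{out}_g = \sum_h K(g^{-1}h)\,f_h$, one common way to write a left-equivariant group convolution; that it matches the layer in #620 exactly is an assumption. The sketch uses the nonabelian group $S_3$ permuting 3 sites, a random kernel $K$, and a random nonlinear $F$ (all hypothetical). If the input channels satisfy $f_h(\sigma) = F(h^{-1}\sigma)$, the output satisfies $\mathrm{out}_g(\sigma) = \mathrm{out}_e(g^{-1}\sigma)$; with $f_h(\sigma) = F(h\sigma)$, the analogous relation $\mathrm{out}_g(\sigma) = \mathrm{out}_e(g\sigma)$ fails.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N = 3
group = list(itertools.permutations(range(N)))  # S3 as permutations of 3 sites (nonabelian)
identity = tuple(range(N))

def compose(g, h):
    # (g h)(x) = g(h(x))
    return tuple(g[h[x]] for x in range(N))

def inverse(g):
    return tuple(int(i) for i in np.argsort(g))

def act(g, sigma):
    # (g sigma)_{g(x)} = sigma_x, a left action on configurations
    out = np.empty_like(sigma)
    out[list(g)] = sigma
    return out

w = rng.normal(size=N)
kernel = {g: rng.normal() for g in group}  # hypothetical filter K(g)

def F(sigma):
    # arbitrary nonlinear, non-symmetric first-layer output
    return np.tanh(w @ sigma)

def group_conv(features):
    # out_g = sum_h K(g^{-1} h) f_h
    return {g: sum(kernel[compose(inverse(g), h)] * features[h] for h in group) for g in group}

configs = [np.array(c, dtype=float) for c in itertools.product([-1, 1], repeat=N)]

def respects(convention):
    # channel h evaluates F at convention(h, sigma);
    # check out_g(sigma) == out_e(convention(g, sigma)) for all g and sigma
    for sigma in configs:
        out = group_conv({h: F(convention(h, sigma)) for h in group})
        for g in group:
            moved = convention(g, sigma)
            out_moved = group_conv({h: F(convention(h, moved)) for h in group})
            if not np.isclose(out[g], out_moved[identity]):
                return False
    return True

def covariant(g, sigma):
    return act(inverse(g), sigma)  # f_h(sigma) = F(h^{-1} sigma)

def contravariant(g, sigma):
    return act(g, sigma)  # f_h(sigma) = F(h sigma)

assert respects(covariant)
assert not respects(contravariant)
```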
Action items
- Check the discussion above in case I got something wrong (there are many delicate bits...)
- Check `DenseSymm` for usage of $g$ vs. $g^{-1}$: I don't really understand the code, so it would be great if someone (@chrisrothUT?) could make sure that, as it stands, it implements $f_g = \sum_x W_{g^{-1}x}\,\sigma_x$.
- Decide which of the two possible conventions we'd like to use. (See https://github.com/netket/netket/issues/698#issuecomment-835382734: we don't have much choice.)
- `DenseSymm` should be changed to implement (5). Unfortunately, I don't really understand how the code works, so I need help with this, @chrisrothUT?
- When implementing #658, we should remember to use (2') rather than (2) in the final layer.
- Note to self: I'll think about what the best convention for the wave vector → translation irrep mapping would be in my representation theory notes, which would set the convention for the character-table implementation to be built in #658. (Done in #700)