Rayleigh-Gauss-Newton
Created by: chrisrothUT
Here's a draft PR implementing the RGN method from https://arxiv.org/pdf/2106.10558.pdf. I've only implemented the holomorphic case, and it appears to be working on real-valued neural networks. For now, RGN is a separate driver.
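For context, my reading of the paper's update rule (hedged; I may be off on conventions) is that each step solves a regularized linear system

$$(H + \epsilon^{-1} S)\,\delta\theta = -g,$$

where $S$ is the usual SR matrix (the quantum geometric tensor), $g$ the energy gradient, $H$ a Gauss-Newton approximation to the Hessian of the Rayleigh quotient, and $\epsilon$ the learning rate: small $\epsilon$ reduces to SR with step size $\epsilon$, while large $\epsilon$ approaches a full Gauss-Newton step. Here's how to use it: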
```python
import numpy as np
import netket as nk
from jax.scipy.sparse.linalg import cg  # assuming jax's CG here; any compatible solver works

# ha, sa, and ma are the Hamiltonian, sampler, and model, defined as usual
rgn = nk.optimizer.RGN(solver=cg, diag_shift=1e-3)
vstate = nk.vqs.MCState(sampler=sa, model=ma, n_samples=1000, n_discard_per_chain=9)
sched = np.full([50], 0.2)  # one learning rate per time step
gs = nk.driver.VMC_RGN(ha, sched, variational_state=vstate, preconditioner=rgn)
gs.run(n_iter=50, out='out')
```
Since the learning rate is built into the preconditioner, for now you just specify an array of learning rates, one per time step. The paper above says you can increase the learning rate exponentially as you go along; a sketch of such a schedule is below.
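For instance, a minimal sketch of an exponentially increasing schedule, continuing the example above (the base rate, growth factor, and cap are illustrative, not values from the paper):

```python
# Start at 0.2, grow by ~10% per step, and cap at 1.0 (all values illustrative).
sched = np.minimum(0.2 * 1.1 ** np.arange(50), 1.0)
```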
A few orders of business:

- RGN really shouldn't be a separate driver, but it needs access to the average energy and a learning-rate parameter, epsilon. So if we wanted to use the same driver, we'd have to amend this line in VMC (see the sketch after this list):
  `self._dp = self.preconditioner(self.state, self._loss_grad)`
- I'm not sure about the nuances for non-holomorphic, complex-valued functions. Perhaps @attila-i-szabo can help?
- The documentation is currently all wrong from copy-pasted code; I'll fix that in the coming days.
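To make that concrete, here's a hedged sketch of how the call could be generalized so one driver serves both SR and RGN. The extra keyword arguments (`step`, `mean_energy`) are hypothetical, not part of the current API; `step_count` and `_loss_stats` are the driver attributes I'd expect to thread through:

```python
# Hypothetical signature (argument names made up for illustration): forward the
# iteration count, to index the learning-rate schedule, and the sampled mean
# energy to preconditioners that need them; SR would simply ignore both.
self._dp = self.preconditioner(
    self.state,
    self._loss_grad,
    step=self.step_count,
    mean_energy=self._loss_stats.mean,
)
```

Preconditioners that don't need the extras could just accept and drop them, so existing SR code would keep working unchanged.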