The current implementation of the `Lookup` class (`Lookup/lookup.hpp`), while functional, has several limitations, and it is not exactly a beautiful piece of modern code!
I am thinking of rewriting it as:

```cpp
// Example of a Lookup containing strings, ints and vectors. This can be easily extended.
using LookupElement = std::variant<std::string, int, std::vector<double>>;
using Lookup = std::unordered_map<std::string, LookupElement>;

// Let's add a vector to the lookup
Lookup lt;
lt["myvector"] = std::vector<double>(100, 2);

// Let's retrieve a vector by reference (to modify it further)
auto& v = std::get<std::vector<double>>(lt["myvector"]);

// Let's add another type
lt["myint"] = 2;

// Let's add another vector
lt["myothervector"] = std::vector<double>(10, 1);
```
This would allow far greater flexibility than the current implementation, without paying a performance penalty (or at least a penalty that is, for all practical purposes, unmeasurable).
Since compiler support for `std::variant` is still a bit sparse, in the meantime one can use one of the non-standard implementations, for example the header-only `mapbox::util::variant`.
What do the C++ experts think about this? @femtobit @twesterhout @verticube? Any other possible solutions?
At first glance it seems that Lookup is only used by various machines, and they only cache a single vector. @gcarleo could you perhaps point me to some code locations that illustrate the need for the flexibility you're seeking?
Hi @twesterhout, good point. Yes, the currently implemented machines indeed only need a single lookup vector; that's why that interface was OK. The thing is that with more complicated machines coming in (see, for example, the PR on feedforward networks) we have to accommodate more flexible lookups as well. In the case of a feedforward network, for example, each layer can have some lookup space depending on its needs, and this is not necessarily a single vector.
Right, thanks. This brings me to my next point then: why make caching a part of the "public" interface? Caching here is, strictly speaking, an optimisation, and each machine may want to do it in its own way (i.e. simple spin RBM vs. feedforward network).
The thing is that those `Lookup` types are used to speed up sampling from the machines, using input-dependent precomputed information. The sampler changes the input and asks the machine to update the look-up tables, so that computing other quantities is faster. The sampler needs a public interface in the sense that it doesn't care about how the `Lookup` is updated; to the sampler it is just some opaque piece of memory associated with the current configuration of spins/quantum variables being sampled.
It is the equivalent of an input-dependent workspace, if you will. If the workspace has already been precomputed, you pass it to your machine to speed up the calculation of the output; otherwise you ask the machine to compute it from scratch. But the machine doesn't hold the workspace in place, since this would make things potentially bug-prone (the machine would also have to remember on what input it was last queried, and check every time whether a given input matches what it stores). In our case the input is handled at a higher level, by the sampler.
> But the machine doesn't hold the workspace in place, since this would make things potentially bug-prone (the machine should also remember on what input it has been queried the last time, and every time check that the given input is or is not what it stores).

> The sampler changes the input, and asks the machine to update the look-up tables, so that computing other quantities is faster.
If I may generalise this a bit, what you're saying is that while running Monte-Carlo sampling you'd like to keep some state on top of the RBM (please, correct me if I'm wrong).
Here is a way to accomplish what I'm describing:
The following is written for the spin RBM for brevity and is easily generalisable to other machines.
Let's say we have a machine:
```cpp
class Machine {
    using VectorType = ...;
    using MatrixType = ...;

    VectorType _a;
    VectorType _b;
    MatrixType _w;

public:
    // For brevity
    using C = std::complex<double>;

    // Some constructors...
    // Some accessor and mutator functions...

    // The interesting part
    auto Theta(VectorType const& spin, VectorType& out) const noexcept -> void
    {
        // out <- _w * spin + _b
    }

    auto LogVal(VectorType const& spin, C const sum_log_cosh_theta) const noexcept -> C
    {
        // Returns \log\psi(\mathcal{S}, \mathcal{W})
    }

    auto DerLog(VectorType const& spin, VectorType const& theta, VectorType& out) const noexcept -> void
    {
        // Computes the derivative of \log\psi(\mathcal{S}, \mathcal{W}) and stores it in out.
    }
};
```
Notice that there's no caching in the Machine class. Machine is only concerned with efficiently computing the wave function and its derivative. However, we do need caching for efficient sampling, so:
```cpp
class MonteCarloState {
    using VectorType = Machine::VectorType;
    using C = Machine::C;

    Machine const* _rbm;    // The machine we're sampling.
    VectorType _spin;       // Current spin configuration.
    VectorType _theta;      // Cached value of theta.
    C _log_wf;              // Cached value of the wave function to avoid recomputing.
    C _sum_log_cosh_theta;  // Cached value of the sum of log(cosh(theta)).

public:
    MonteCarloState(Machine const& rbm, VectorType&& spin)
        : _rbm{std::addressof(rbm)}
        , _spin{std::move(spin)}
        , _theta{/* allocate a new vector */}
    {
        _rbm->Theta(_spin, _theta);
        _sum_log_cosh_theta = sum_log_cosh(_theta);
        _log_wf = _rbm->LogVal(_spin, _sum_log_cosh_theta);
    }

    // Some more constructors...
    // Some accessor and mutator functions...

private:
    struct CacheCell {
        C value;
    };

public:
    // The interesting part
    auto LogValDiff(std::span<int const> const to_change,
                    std::span<? const> const new_values) const
        -> std::tuple<C, CacheCell>
    {
        // Computes \log(\psi' / \psi), where \psi' is obtained from \psi
        // by setting the spins at indices to_change[i] to values new_values[i].
        return std::make_tuple(
            /* computed \log(\psi' / \psi) */,
            CacheCell{/* sum(log(cosh(new_theta))), a by-product of the calculation */});
    }

    auto Update(std::span<int const> const to_change,
                std::span<? const> const new_values,
                std::optional<CacheCell> const cache) -> void
    {
        // Updates the current spin configuration and all related values.
        // If cache is not std::nullopt, the computation is a lot faster.
    }
};
```
And later on, a sampler can do something like this:
```cpp
// This is adapted from metropolis_exchange.hpp
auto Sweep() override -> void
{
    // _state is our MonteCarloState
    std::uniform_real_distribution<double> distu;
    std::uniform_int_distribution<int> distcl(0, clusters_.size() - 1);

    for (auto k = 0; k < _state.SizeVisible(); ++k) {
        auto const rcl = distcl(rgen_);
        auto const si = clusters_[rcl][0];
        auto const sj = clusters_[rcl][1];

        if (_state.Spin(si) != _state.Spin(sj)) {
            auto const to_change = clusters_[rcl];
            std::array<?, 2> const new_values = {_state.Spin(sj), _state.Spin(si)};
            // This'd look way nicer in C++17
            auto const _result = _state.LogValDiff(to_change, new_values);
            auto const log_val_diff = std::get<0>(_result);
            auto const& cache = std::get<1>(_result);
            if (std::norm(std::exp(log_val_diff)) > distu(rgen_)) {
                accept_[0] += 1;
                _state.Update(to_change, new_values, cache /* !!! */);
            }
        }
        moves_[0] += 1;
    }
}
```
Different samplers use the public interface of a Monte-Carlo state, so no problem here.
I hope you get the idea. Please feel free to ask if I haven't explained something clearly enough.
Thanks for the detailed proposal, I think I learned about 5 features of C++ I didn't know of here :) Anyway, I think I understand your proposal, but there is something missing in this design (or maybe something I don't understand):
`MonteCarloState` heavily depends on the implementation of the specific machine we are using (for example, here you store the theta object as a lookup, but other machines might need to store some other type to do their job efficiently). How would I use the `MonteCarloState` in the `Sampler` then? Since in the current implementation `Machine` is a type-erased interface, one would need to somehow type-erase `MonteCarloState` as well, further increasing the complexity of the code. I see this as a serious drawback, but maybe there are other solutions.
I do like the idea of storing the spin state (or whatever other quantum numbers we have) and the look-up together in a separate class, to guarantee they are coherently associated. Maybe we can still use this design principle in some way... let me think.
> MonteCarloState heavily depends on the implementation of the specific machine we are using (for example, here you store the theta object as a lookup, but other machines might need to store some other type to do their job efficiently).
A big yes here. This is exactly what I meant by
> Caching here is, strictly speaking, an optimisation, and each machine may want to do it in its own way
in an earlier comment.
> How would I use the MonteCarloState in the Sampler then? Since in the current implementation Machine is a type erased interface, one would need to somehow type erase also MonteCarloState, further increasing the complexity of the code. I see this as a serious drawback, but maybe there are other solutions.
How about something like this:
```cpp
class AbstractMachine {
public:
    // Old functions...

    virtual auto MkState() const -> std::unique_ptr<AbstractMonteCarloState>;
};
```
This allows the samplers to use `AbstractMonteCarloState` with no burden of keeping track of which machines correspond to which states -- they do it automatically! And the states themselves have complete knowledge of the types of their machines, which doesn't stop optimisations on the hot path.
I like the design proposed by @twesterhout, because it provides (a) a clear separation of concerns between the `Machine` classes (which are only concerned with implementing the direct computation of `LogVal` etc.) and the `MonteCarloState` classes (which deal with caching additional state information in order to speed up computations after local changes), and (b) it makes it harder to introduce bugs by accidentally passing the wrong lookup object, since the `MonteCarloState` class wraps the current configuration and the cached variables together.
Having a factory function in the Machine class that provides the correct MonteCarloState for that machine, as suggested in the last comment, is also a good solution, in my opinion.
OK, I think the last design is going in the right direction. @twesterhout, can you provide an example of how you would implement `AbstractMonteCarloState`? I still don't quite get how to handle different internal lookup types.
@gcarleo I can certainly try. Might take a few days though as I have some important deadlines coming up. In the meanwhile, perhaps having a look at the McmcBase class here might help make things more concrete (there's no type erasure though).
```cpp
class MonteCarloState {
public:
    using index_type = int;
    using size_type = index_type;
    using C = std::complex<double>;

    constexpr MonteCarloState() noexcept = default;

    // Disallow copying and moving.
    MonteCarloState(MonteCarloState const&) = delete;
    MonteCarloState(MonteCarloState&&) noexcept = delete;
    MonteCarloState& operator=(MonteCarloState const&) noexcept = delete;
    MonteCarloState& operator=(MonteCarloState&&) noexcept = delete;

    virtual ~MonteCarloState() noexcept = default;

    auto Nvisible() const noexcept -> size_type;
    auto Npar() const noexcept -> size_type;
    auto LogValDiff(std::span<index_type const> indices,
                    std::span<C const> new_spins) const -> std::tuple<C, std::any>;
    auto LogVal() const -> C;
    auto DerLog(std::span<C> out) const -> void;
    auto Update(std::span<index_type const> indices,
                std::span<C const> new_spins,
                std::any const& cache) -> void;
    auto Spin() const noexcept -> std::span<C const>;
    auto Spin(std::span<C const> spin) -> void;

private:
    virtual auto DoNvisible() const noexcept -> size_type = 0;
    virtual auto DoNpar() const noexcept -> size_type = 0;
    virtual auto DoLogValDiff(std::span<index_type const> indices,
                              std::span<C const> new_spins) const
        -> std::tuple<C, std::any> = 0;
    virtual auto DoLogVal() const noexcept -> C = 0;
    virtual auto DoDerLog(std::span<C> out) const noexcept -> void = 0;
    virtual auto DoUpdate(std::span<index_type const> indices,
                          std::span<C const> new_spins,
                          std::any const& cache) -> void = 0;
    virtual auto DoSpin(std::span<C const> spin) noexcept -> void = 0;
    virtual auto DoSpin() const noexcept -> std::span<C const> = 0;
};
```
std::any is a C++17 feature and std::span will only be available in C++20. They are used here because their semantics are well-defined and help illustrate the point. The code probably contains a bug or two...
OK, so you are basically suggesting to use `std::any` to pass the Lookup (cache) types, if I understand correctly. But I would say that the cache should then also be stored inside this `MonteCarloState` class, as a private `std::any` object.
> But I would say that the cache type should be stored inside this MonteCarloState class as well, as a private std::any object.
I'm afraid I must disagree with you here. Let's consider a concrete SpinState (the simplest) class which helps with sampling of spin RBMs. Here's a simplified version of the class I'm using in my implementation:
```cpp
struct SpinState : public MonteCarloState {
public:
    using MonteCarloState::C;
    using MonteCarloState::index_type;
    using MonteCarloState::size_type;
    using Rbm = ...;
    using buffer_type = ...;

    SpinState(Rbm const& rbm, buffer_type&& initial_spin);
    SpinState(Rbm const& rbm, gsl::span<C const> spin);
    SpinState(SpinState const&) = delete;
    SpinState(SpinState&&) = default;
    SpinState& operator=(SpinState const&) = delete;
    SpinState& operator=(SpinState&&) = default;

    auto Spin() const& noexcept -> gsl::span<C const>;
    auto Theta() const& noexcept -> gsl::span<C const>;

    virtual ~SpinState() noexcept override;

private:
    // Virtual functions from MonteCarloState
    virtual auto Do...

    constexpr auto Nvisible() const noexcept -> index_type;
    constexpr auto Nhidden() const noexcept -> index_type;
    constexpr auto Nweights() const noexcept -> index_type;
    constexpr auto Npar() const noexcept -> index_type;

    auto Spin() & noexcept -> gsl::span<C>;
    auto Theta() & noexcept -> gsl::span<C>;

    auto NewTheta(index_type i, gsl::span<index_type const> flips) const noexcept -> C;
    auto SumLogCoshNewTheta(gsl::span<index_type const> flips) const noexcept -> C;
    auto UpdateTheta(gsl::span<index_type const> flips) noexcept -> void;
    auto UpdateSpin(gsl::span<index_type const> flips) noexcept -> void;

    struct Cache {
        C sum_log_cosh;
    };

    auto _DoLogValDiff(gsl::span<index_type const> flips) const noexcept -> std::tuple<C, Cache>;
    auto _DoUpdate(gsl::span<index_type const> const flips) noexcept -> void;
    auto _DoUpdate(gsl::span<index_type const> const flips, Cache const cache) noexcept -> void;

private:
    gsl::not_null<Rbm const*> _rbm;  ///< A reference to the RBM we're sampling.
    buffer_type _spin;               ///< Current spin configuration \f$\sigma\f$.
    buffer_type _theta;              ///< Cached \f$\theta\f$ (i.e. \f$b + w \sigma\f$).
    C _sum_log_cosh_theta;           ///< Cached \f$\sum_i\log\cosh(\theta_i)\f$.
    C _log_psi;                      ///< Cached \f$\log\Psi_\mathcal{W}(\sigma)\f$.
};
```
All the Do* functions are implemented in terms of the _Do* and their job is just dispatching (e.g. on the availability of cache). And the _Do* functions perform actual work. The nice thing is that we can implement all the _Do* ones in a type-safe manner (i.e. no run-time casts). Also, they have direct access to the cache, which in this case is a set of _theta, _sum_log_cosh_theta, and _log_psi, while we're only storing _sum_log_cosh_theta in the std::any.