The current implementation of the `Lookup` class (`Lookup/lookup.hpp`), while functional, has several limitations, and it is not exactly a beautiful piece of modern code!
I am thinking of rewriting it as:

```cpp
// Example of a Lookup containing strings, ints and vectors. This can be easily extended.
using LookupElement = std::variant<std::string, int, std::vector<double>>;
using Lookup = std::unordered_map<std::string, LookupElement>;

// Let's add a vector to the lookup
Lookup lt;
lt["myvector"] = std::vector<double>(100, 2);

// Let's retrieve a vector by reference (to modify it further)
auto& v = std::get<std::vector<double>>(lt["myvector"]);

// Let's add another type
lt["myint"] = 2;

// Let's add another vector
lt["myothervector"] = std::vector<double>(10, 1);
```
This would allow far greater flexibility than the current implementation, without paying a performance penalty (or at least a penalty that is, for all practical purposes, unmeasurable).
Since compiler support for `std::variant` is still a bit sparse, in the meantime one can use one of the non-standard implementations, for example the header-only `mapbox::util::variant`.
What do the C++ experts think about this? @femtobit @twesterhout @verticube? Any other possible solutions?
At first glance it seems that Lookup is only used by various machines, and they only cache a single vector. @gcarleo could you perhaps point me to some code locations that illustrate the need for the flexibility you're seeking?
Hi @twesterhout, good point. Yes, the currently implemented machines indeed only need a single lookup vector; that's why that interface was OK. The thing is that with more complicated machines coming in (see, for example, the PR on feedforward networks) we have to accommodate more flexible lookups as well. In the case of a feedforward network, for example, each layer can have some lookup space depending on its needs, and this is not necessarily a single vector.
Right, thanks. This brings me to my next point then: why make caching a part of the "public" interface? Caching here is, strictly speaking, an optimisation, and each machine may want to do it in its own way (i.e. simple spin RBM vs. feedforward network).
The thing is that those `Lookup` types are used to speed up sampling from the machines, using input-dependent precomputed information. The sampler changes the input and asks the machine to update the look-up tables, so that computing other quantities is faster. The sampler needs a public interface in the sense that it doesn't care about how the `Lookup` is updated; to the sampler it is just some opaque piece of memory associated with the current configuration of spins/quantum variables being sampled.
It is the equivalent of an input-dependent workspace, if you will. If the workspace has already been precomputed, you pass it to your machine to speed up the calculation of the output; otherwise you ask the machine to compute it from scratch. But the machine doesn't hold the workspace in place, since this would make things potentially bug-prone (the machine would also have to remember on what input it was last queried, and check every time whether a given input matches what it stores). In our case the input is handled at a higher level, by the sampler.
> But the machine doesn't hold the workspace in place, since this would make things potentially bug-prone (the machine should also remember on what input it has been queried the last time, and every time check that the given input is or is not what it stores).

> The sampler changes the input, and asks the machine to update the look-up tables, so that computing other quantities is faster.
If I may generalise this a bit, what you're saying is that while running Monte-Carlo sampling you'd like to keep some state on top of the RBM (please, correct me if I'm wrong).
Here is a way to accomplish what I'm describing:
The following is written for the spin RBM for brevity and is easily generalisable to other machines.
Let's say we have a machine:
```cpp
class Machine {
    using VectorType = ...;
    using MatrixType = ...;

    VectorType _a;
    VectorType _b;
    MatrixType _w;

public:
    // For brevity
    using C = std::complex<double>;

    // Some constructors...
    // Some accessor and mutator functions...

    // The interesting part
    auto Theta(VectorType const& spin, VectorType& out) const noexcept -> void
    {
        // out <- _w * spin + _b
    }

    auto LogVal(VectorType const& spin, C const sum_log_cosh_theta) const noexcept -> C
    {
        // Returns \log\psi(\mathcal{S}, \mathcal{W})
    }

    auto DerLog(VectorType const& spin, VectorType const& theta, VectorType& out) const noexcept -> void
    {
        // Computes the derivative of \log\psi(\mathcal{S}, \mathcal{W}) and stores it in out.
    }
};
```
Notice that there's no caching in the Machine class. Machine is only concerned with efficiently computing the wave function and its derivative. However, we do need caching for efficient sampling, so:
```cpp
class MonteCarloState {
    using VectorType = Machine::VectorType;
    using C = Machine::C;

    Machine const* _rbm;    // The machine we're sampling.
    VectorType _spin;       // Current spin configuration.
    VectorType _theta;      // Cached value of theta.
    C _log_wf;              // Cached value of the wave function to avoid recomputing.
    C _sum_log_cosh_theta;  // Cached value of the sum of log(cosh(theta)).

public:
    MonteCarloState(Machine const& rbm, VectorType&& spin)
        : _rbm{std::addressof(rbm)}
        , _spin{std::move(spin)}
        , _theta{/* allocate a new vector */}
    {
        _rbm->Theta(_spin, _theta);
        _sum_log_cosh_theta = sum_log_cosh(_theta);
        _log_wf = _rbm->LogVal(_spin, _sum_log_cosh_theta);
    }

    // Some more constructors...
    // Some accessor and mutator functions...

private:
    struct CacheCell {
        C value;
    };

public:
    // The interesting part
    auto LogValDiff(std::span<int const> const to_change,
                    std::span<? const> const new_values) const
        -> std::tuple<C, CacheCell>
    {
        // Computes \log(\psi' / \psi), where \psi' is obtained from \psi
        // by setting the spins at indices to_change[i] to values new_values[i].
        return std::make_tuple(
            /* computed \log(\psi' / \psi) */,
            CacheCell{/* sum(log(cosh(new_theta))), a by-product of the calculation */});
    }

    auto Update(std::span<int const> const to_change,
                std::span<? const> const new_values,
                std::optional<CacheCell> const cache) -> void
    {
        // Updates the current spin configuration and all related values.
        // If cache is not std::nullopt, the computation is a lot faster.
    }
};
```
And later on, a sampler can do something like this:
```cpp
// This is adapted from metropolis_exchange.hpp
auto Sweep() override -> void
{
    // _state is our MonteCarloState
    std::uniform_real_distribution<double> distu;
    std::uniform_int_distribution<int> distcl(0, clusters_.size() - 1);

    for (auto k = 0; k < _state.SizeVisible(); ++k) {
        auto const rcl = distcl(rgen_);
        auto const si = clusters_[rcl][0];
        auto const sj = clusters_[rcl][1];

        if (_state.Spin(si) != _state.Spin(sj)) {
            auto const to_change = clusters_[rcl];
            std::array<?, 2> const new_values = {_state.Spin(sj), _state.Spin(si)};
            // This'd look way nicer in C++17
            auto const _result = _state.LogValDiff(to_change, new_values);
            auto const log_val_diff = std::get<0>(_result);
            auto const& cache = std::get<1>(_result);
            if (std::norm(std::exp(log_val_diff)) > distu(rgen_)) {
                accept_[0] += 1;
                _state.Update(to_change, new_values, cache /* !!! */);
            }
        }
        moves_[0] += 1;
    }
}
```
Different samplers use the public interface of a Monte-Carlo state, so no problem here.
I hope you get the idea. Please feel free to ask if I haven't explained something clearly enough.
Thanks for the detailed proposal, I think I learned about 5 features of C++ I didn't know of here :) Anyway, I think I understand your proposal, but there is something missing in this design (or maybe something I don't understand):
`MonteCarloState` heavily depends on the implementation of the specific machine we are using (for example, here you store the theta object as a lookup, but other machines might need to store some other type to do their job efficiently). How would I use the `MonteCarloState` in the `Sampler` then? Since in the current implementation `Machine` is a type-erased interface, one would need to somehow type-erase `MonteCarloState` as well, further increasing the complexity of the code. I see this as a serious drawback, but maybe there are other solutions.
I do like the idea of storing the spin state (or whatever other quantum numbers we have) and the look-up together in a separate class, to guarantee they are coherently associated. Maybe we can still use this design principle in some way... let me think.
> MonteCarloState heavily depends on the implementation of the specific machine we are using (for example, here you store the theta object as a lookup, but other machines might need to store some other type to do their job efficiently).
A big yes here. This is exactly what I meant by
> Caching here is, strictly speaking, an optimisation, and each machine may want to do it in its own way
in an earlier comment.
> How would I use the MonteCarloState in the Sampler then? Since in the current implementation Machine is a type erased interface, one would need to somehow type erase also MonteCarloState, further increasing the complexity of the code. I see this as a serious drawback, but maybe there are other solutions.
How about something like this:
```cpp
class AbstractMachine {
public:
    // Old functions...

    virtual auto MkState() const -> std::unique_ptr<AbstractMonteCarloState>;
};
```
This allows the samplers to use `AbstractMonteCarloState` with no burden of keeping track of which machines correspond to which states -- they do it automatically! And the states themselves have complete knowledge of the types of their machines, which doesn't stop optimisations on the hot path.
I like the design proposed by @twesterhout, because it provides (a) a clear separation of concerns between the `Machine` classes (which are only concerned with implementing the direct computation of `LogVal` etc.) and the `MonteCarloState` classes (which deal with caching additional state information in order to speed up computations after local changes), and (b) it makes it harder to introduce bugs by accidentally passing the wrong lookup object, since the `MonteCarloState` class wraps the current configuration and the cached variables together.
Having a factory function in the Machine class that provides the correct MonteCarloState for that machine, as suggested in the last comment, is also a good solution, in my opinion.
OK, I think the last design is going in the right direction. @twesterhout, can you provide an example of how you would implement `AbstractMonteCarloState`? I still don't quite get how to handle different internal lookup types.
@gcarleo I can certainly try. Might take a few days though as I have some important deadlines coming up. In the meanwhile, perhaps having a look at the McmcBase class here might help make things more concrete (there's no type erasure though).
```cpp
class MonteCarloState {
public:
    using index_type = int;
    using size_type = index_type;
    using C = std::complex<double>;

    constexpr MonteCarloState() noexcept = default;

    // Disallow copying and moving.
    MonteCarloState(MonteCarloState const&) = delete;
    MonteCarloState(MonteCarloState&&) noexcept = delete;
    MonteCarloState& operator=(MonteCarloState const&) noexcept = delete;
    MonteCarloState& operator=(MonteCarloState&&) noexcept = delete;

    virtual ~MonteCarloState() noexcept = default;

    auto Nvisible() const noexcept -> size_type;
    auto Npar() const noexcept -> size_type;
    auto LogValDiff(std::span<index_type const> indices,
                    std::span<C const> new_spins) const -> std::tuple<C, std::any>;
    auto LogVal() const -> C;
    auto DerLog(std::span<C> out) const -> void;
    auto Update(std::span<index_type const> indices,
                std::span<C const> new_spins,
                std::any const& cache) -> void;
    auto Spin() const noexcept -> std::span<C const>;
    auto Spin(std::span<C const> spin) -> void;

private:
    virtual auto DoNvisible() const noexcept -> size_type = 0;
    virtual auto DoNpar() const noexcept -> size_type = 0;
    virtual auto DoLogValDiff(std::span<index_type const> indices,
                              std::span<C const> new_spins) const
        -> std::tuple<C, std::any> = 0;
    virtual auto DoLogVal() const noexcept -> C = 0;
    virtual auto DoDerLog(std::span<C> out) const noexcept -> void = 0;
    virtual auto DoUpdate(std::span<index_type const> indices,
                          std::span<C const> new_spins,
                          std::any const& cache) -> void = 0;
    virtual auto DoSpin(std::span<C const> spin) noexcept -> void = 0;
    virtual auto DoSpin() const noexcept -> std::span<C const> = 0;
};
```
std::any is a C++17 feature and std::span will only be available in C++20. They are used here because their semantics are well-defined and help illustrate the point. The code probably contains a bug or two...
OK, so you are basically suggesting to use `std::any` to pass the Lookup (cache) types, if I understand correctly. But I would say that the cache should then also be stored inside this `MonteCarloState` class, as a private `std::any` object.
> But I would say that the cache type should be stored inside this MonteCarloState class as well, as a private std::any object.
I'm afraid I must disagree with you here. Let's consider a concrete SpinState (the simplest) class which helps with sampling of spin RBMs. Here's a simplified version of the class I'm using in my implementation:
```cpp
struct SpinState : public MonteCarloState {
public:
    using MonteCarloState::C;
    using MonteCarloState::index_type;
    using MonteCarloState::size_type;
    using Rbm = ...;
    using buffer_type = ...;

    SpinState(Rbm const& rbm, buffer_type&& initial_spin);
    SpinState(Rbm const& rbm, gsl::span<C const> spin);
    SpinState(SpinState const&) = delete;
    SpinState(SpinState&&) = default;
    SpinState& operator=(SpinState const&) = delete;
    SpinState& operator=(SpinState&&) = default;

    auto Spin() const& noexcept -> gsl::span<C const>;
    auto Theta() const& noexcept -> gsl::span<C const>;

    virtual ~SpinState() noexcept override;

private:
    // Virtual functions from MonteCarloState
    virtual auto Do...

    constexpr auto Nvisible() const noexcept -> index_type;
    constexpr auto Nhidden() const noexcept -> index_type;
    constexpr auto Nweights() const noexcept -> index_type;
    constexpr auto Npar() const noexcept -> index_type;

    auto Spin() & noexcept -> gsl::span<C>;
    auto Theta() & noexcept -> gsl::span<C>;

    auto NewTheta(index_type i, gsl::span<index_type const> flips) const noexcept -> C;
    auto SumLogCoshNewTheta(gsl::span<index_type const> flips) const noexcept -> C;
    auto UpdateTheta(gsl::span<index_type const> flips) noexcept -> void;
    auto UpdateSpin(gsl::span<index_type const> flips) noexcept -> void;

    struct Cache {
        C sum_log_cosh;
    };

    auto _DoLogValDiff(gsl::span<index_type const> flips) const noexcept -> std::tuple<C, Cache>;
    auto _DoUpdate(gsl::span<index_type const> const flips) noexcept -> void;
    auto _DoUpdate(gsl::span<index_type const> const flips, Cache const cache) noexcept -> void;

private:
    gsl::not_null<Rbm const*> _rbm;  ///< A reference to the RBM we're sampling.
    buffer_type _spin;               ///< Current spin configuration \f$\sigma\f$.
    buffer_type _theta;              ///< Cached \f$\theta\f$ (i.e. \f$b + w \sigma\f$).
    C _sum_log_cosh_theta;           ///< Cached \f$\sum_i\log\cosh(\theta_i)\f$.
    C _log_psi;                      ///< Cached \f$\log\Psi_\mathcal{W}(\sigma)\f$.
};
```
All the Do* functions are implemented in terms of the _Do* and their job is just dispatching (e.g. on the availability of cache). And the _Do* functions perform actual work. The nice thing is that we can implement all the _Do* ones in a type-safe manner (i.e. no run-time casts). Also, they have direct access to the cache, which in this case is a set of _theta, _sum_log_cosh_theta, and _log_psi, while we're only storing _sum_log_cosh_theta in the std::any.