NetKet is built using MPI primitives, and can scale up to thousands of CPU cores.
These are pretty strong claims, and I'm wondering whether anyone has run benchmarks to verify them. If so, could we perhaps make the data available? If not, I would suggest we implement some benchmarks.
Hi @twesterhout, sure, it would be valuable to have (automatic?) benchmarks in NetKet, also to make sure that new features are not slowing down existing ones.
Concerning the scaling on multiple cores, I did some benchmarks some time ago and indeed the scaling is very good when using gradient-descent optimisers. The situation is different with the 'SR' solver, which scales worse, simply because I wasn't able to perfectly parallelise some of its components.
It would be extremely valuable to see whether we can improve on this, or on other parts of the code, after serious profiling is performed.
For example, we could also see how slow/fast we are with respect to TensorFlow/Keras etc. on similar tasks (provided we manage to have a reasonable version of TensorFlow networks working with complex weights...)
Ideas on how to start doing these benchmarks more systematically?
Ideas on how to start doing these benchmarks more systematically?
Well, the "standard" way of running benchmarks systematically is Travis. The way I see it is:
Add a Bench/ directory to the root of the repo.
Add benchmark as a submodule to External/. (It's a very good library and also the one I'm most familiar with, but I'm open to learning something new if you insist on using a different one)
Come up with a few use cases and implement them.
Write some Python/Bash scripts to run the benchmarks and create plots (see the sketch after this list).
Add a line running the scripts automatically to .travis.yml if on Linux (it's simpler than OS X and there's little value in trying to make benchmarks portable unless you have a lot of free time :) ).
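To make the fourth point a bit more concrete, here is a rough sketch of such a script, assuming a hypothetical Google Benchmark binary at Bench/netket_bench and relying on Benchmark's standard --benchmark_format=json output:

```python
# Run a Google Benchmark binary and plot its results.
# Bench/netket_bench is a hypothetical path; adjust to the real build target.
import json
import subprocess

import matplotlib.pyplot as plt

BENCH_BINARY = "Bench/netket_bench"


def run_benchmarks(binary):
    """Run the benchmark binary and return its parsed JSON report."""
    out = subprocess.run(
        [binary, "--benchmark_format=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(out.stdout)


def plot_report(report, filename="benchmarks.png"):
    """Bar plot of real time per benchmark case."""
    names = [b["name"] for b in report["benchmarks"]]
    times = [b["real_time"] for b in report["benchmarks"]]
    unit = report["benchmarks"][0]["time_unit"] if report["benchmarks"] else "ns"
    plt.figure(figsize=(8, 4))
    plt.barh(names, times)
    plt.xlabel(f"real time ({unit})")
    plt.tight_layout()
    plt.savefig(filename)  # Travis would then push this to the website repo


if __name__ == "__main__":
    plot_report(run_benchmarks(BENCH_BINARY))
```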
Then comes the tricky part -- publishing. Nowadays, it's quite common to use Github pages for the docs. One then asks Travis to update the docs and push them to the gh-pages branch from which the website is generated. NetKet uses a different approach... The simplest solution would probably be to give Travis access to the website repo and push the automatically generated plots to netket-website/img.
Does this sound like a plan?
It would be extremely valuable to see whether we can improve on this, or on other parts of the code, after serious profiling is performed.
I have alternative implementations of parts of NetKet which seem to outperform the originals, so I have some ideas on how performance can be improved. But I'd like to have proof before making such claims.
For example, we could also see how slow/fast we are with respect to TensorFlow/Keras etc. on similar tasks (provided we manage to have a reasonable version of TensorFlow networks working with complex weights...)
Yes, but I think we should start with simpler examples not involving additional dependencies.
Unless there are other volunteers or objections, I can probably create a PR to track the progress.
Alright @twesterhout, that sounds like a plan! Indeed, google benchmark seems the best option out there. I still don't quite get why the scripts running the benchmarks shouldn't be portable to OS X, but anyway, let's see!
NetKet is, to all intents and purposes, already hosted on GitHub Pages. If you think it is handier to have the website as a separate branch of NetKet rather than a separate repo, we can consider making that change; I don't have strong opinions about this.
Let us know how things go; it would anyway be very nice to find ways to improve the speed of the core routines.
Even though we have moved past the C++ phase, this issue is still quite relevant.
We have some primitive benchmarks in netket/Benchmarks, but it would be nice to study what options we have to make things more systematic, especially to see whether performance deteriorates or improves across versions.
This looks like an issue that might be accessible for a first time contribution.
Hi @rbktech,
Nobody is working on this at the moment, and it would be great if you decided to tackle it!
We are in the final stages of preparing the beta of NetKet 3. I think we should be able to officially release the beta, together with builds on PyPI, this coming week (I think mostly everything is done, but @gcarleo might have noticed something I am forgetting). Benchmarks should target this version.
The upcoming version of NetKet was developed in PR #539, and you can use it by checking out the nk3 branch. As the API has changed quite a bit, it might be helpful to check the updated docs/tutorials, available here before the official release.
--
Now, if you want to work on this, I think we need to briefly discuss what we want to do: essentially, we need two kinds of benchmarks:
Small benchmarks for important functions in NetKet, such as SR solving, computing the gradient, etc., to check that future PRs don't degrade performance. This is a great use case for pytest-benchmark.
More complex benchmarks and scaling analyses, to publish on our website, to show off how cool NetKet is, maybe compare to other approaches, or benchmark different neural quantum states against each other. Ideally we should measure only the runtime of the optimisation, excluding the setup and the JAX compilation (so we cannot use the shell's built-in timer; see the sketch below), though I'm not sure that is so important, especially if we take large-ish models (@gcarleo?).
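A minimal sketch of the "exclude setup and JAX compilation" idea: run one warm-up step to trigger jit compilation, then time subsequent steps with block_until_ready() so asynchronous dispatch doesn't skew the measurement. The update function here is a stand-in for one optimisation step, not actual NetKet code.

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def step(params, x):
    # Stand-in for one optimisation step (e.g. a gradient update).
    grad = jax.grad(lambda p: jnp.sum((p * x) ** 2))(params)
    return params - 0.01 * grad


params = jnp.ones(1000)
x = jnp.linspace(0.0, 1.0, 1000)

# Warm-up call: includes tracing + compilation, excluded from the timing.
params = step(params, x)
params.block_until_ready()

n_steps = 100
t0 = time.perf_counter()
for _ in range(n_steps):
    params = step(params, x)
params.block_until_ready()
t1 = time.perf_counter()
print(f"{(t1 - t0) / n_steps * 1e3:.3f} ms per step (compilation excluded)")
```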
Hi, yes, thanks, this is important to do, especially in light of the new release. I would prioritize work on the small benchmarks for the moment. Also, I am not sure we really have many other codes to compare to for the second class of benchmarks, but we can still think of doing some general scaling analysis, especially useful to see how well we scale with MPI, etc.
Good!
I'd write them using pytest-benchmark, as this is already integrated in our testing environment.
High-level things to test that come to my mind (see the sketch after this list):
SR solving. See for example this comment. Benchmarks like this with the different settings (centered=True/False).
Computing expectation values. So MCState.expect(operator) for the same model, but different types of operators (real and complex LocalOperator, Ising, Square of operators)
Gradients of expectation values. MCState.expect_and_grad(operator) as above
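Here is a rough sketch of such a test with pytest-benchmark (a hypothetical file Benchmarks/test_expect.py); the NetKet 3 API used below is assumed from the nk3 branch and may need adjusting.

```python
import netket as nk
import pytest

L = 16


@pytest.fixture
def vstate_and_operator():
    # Build a small transverse-field Ising problem and an RBM variational state.
    hi = nk.hilbert.Spin(s=1 / 2, N=L)
    g = nk.graph.Chain(L)
    ha = nk.operator.Ising(hilbert=hi, graph=g, h=1.0)
    sa = nk.sampler.MetropolisLocal(hi)
    vs = nk.vqs.MCState(sa, nk.models.RBM(alpha=1), n_samples=1024)
    vs.expect(ha)  # warm-up, so JIT compilation is not part of the timing
    return vs, ha


def test_expect(benchmark, vstate_and_operator):
    vs, ha = vstate_and_operator

    def sample_and_expect():
        vs.sample()  # force fresh samples so each round does real work
        return vs.expect(ha)

    benchmark(sample_and_expect)
```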
@PhilipVinc there is an issue with running benchmarks using pytest-benchmark, since xdist is also being used. The two do not seem to play well together, and benchmarks are disabled by default if xdist is active. Although the package allows a "--benchmark-enable" option to override this, trying to do that just gives the following warning and does not run the benchmarks:
PytestBenchmarkWarning: Benchmarks are automatically disabled because xdist plugin is active. Benchmarks cannot be performed reliably in a parallelized environment.
Something like IPython (as in #557) might be a better option, but I need to check its reliability in a parallelised environment.
Normally, if you run pytest with -n0, xdist is completely disabled. Is this not the case?
Make sure it's among the first arguments, so something like pytest -n0 whatever.