Formalise Benchmarks #308
Comments
This issue has been marked as stale because it has been open for 60 days with no activity. If the issue is still relevant then please leave a comment, or else it will be closed in 7 days.
This issue has been closed because it has been stale for 7 days. You can re-open it if it is still relevant.
IMO this is still relevant, it should be re-opened and added to a milestone so that it is not automatically re-closed as stale.
Indeed, I like this PR, just haven't had a chance to properly review it.
I had a similar task in another project and some of the ideas converged to slightly different approaches. I will be happy to update this PR soon, probably during the next weekend.
Sounds good!
Since PythonCall is not v1 yet, we have to decide on how we want to compare the different branches under interface changes. Are we going to keep separate suites for dev and stable or not? |
Rationale
Create a formal benchmark pipeline to compare performance across package versions and related packages, using the pure Python timings as a reference.
Originally posted by @cjdoris in #300 (comment)
Requirements
Comments
Julia Side
Most benchmarking tools in Julia are built on top of BenchmarkTools.jl[^1], so using its interface to define test suites and store results is the way to go. Both PkgBenchmark.jl[^2] and AirspeedVelocity.jl[^3] provide functionality to compare multiple versions of a single package, but neither supports comparisons across multiple packages out of the box, so there will be some homework for us in building the right tools for this slightly more general setting.
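For concreteness, a minimal suite in that format could look like the sketch below; the group and benchmark names are placeholders rather than an agreed layout.

```julia
# benchmark/benchmarks.jl — minimal sketch of a BenchmarkTools suite in the form
# PkgBenchmark expects; the group and benchmark names are illustrative only.
using BenchmarkTools
using PythonCall

const SUITE = BenchmarkGroup()
SUITE["convert"] = BenchmarkGroup()

# Time converting a Python list back into a Julia vector.
SUITE["convert"]["pylist_to_vector"] =
    @benchmarkable pyconvert(Vector{Int}, x) setup = (x = pylist(1:1000))
```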
It is worth noting that PkgBenchmark.jl exposes useful methods in its public API that we could leverage to build what we need, including methods for comparing suites and for exporting the results to Markdown. AirspeedVelocity.jl, by contrast, is only available through its CLI.
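As a rough illustration, assuming the package ships a standard benchmark/benchmarks.jl defining SUITE (the branch names below are placeholders), comparing two refs and exporting a report could look like:

```julia
# Sketch of the PkgBenchmark.jl workflow we could reuse for single-package comparisons.
using PkgBenchmark

# Run the suite on a target ref and a baseline ref, then compare the two results.
judgement = judge("PythonCall", "feature-branch", "main")

# The same public API provides the Markdown export we would build reports from.
export_markdown("benchmark_report.md", judgement)
```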
Python Side
In order to enjoy the same level of detail provided by BenchmarkTools.jl, we should adopt pyperf[^4].
There are many ways to use it, but a few experiments showed that the CLI + JSON interface is probably the desired option.
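A minimal sketch of that interface, driven from Julia (the test snippet, file names, and JSON layout here are assumptions; it relies on pyperf's timeit subcommand and its -o flag for JSON output):

```julia
# Hedged sketch: benchmark one Python statement with pyperf and read back the JSON.
using JSON  # assumed dependency for parsing the output file

PY_CODE = "sum(range(1000))"      # placeholder test case
JSON_PATH = tempname() * ".json"  # temporary path for pyperf's JSON output

# `python -m pyperf timeit` benchmarks a statement; `-o` writes the results as JSON.
run(`python -m pyperf timeit -o $JSON_PATH $PY_CODE`)

raw = JSON.parsefile(JSON_PATH)
# Assumed layout of pyperf's JSON: a list of benchmarks, each holding runs whose
# "values" are timings in seconds (calibration runs may carry no "values" field).
times_s = reduce(vcat, [r["values"] for r in raw["benchmarks"][1]["runs"] if haskey(r, "values")])
```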
For each test case, stored in the `PY_CODE` variable, we would then create a temporary path `JSON_PATH` and run pyperf on it, writing the results to JSON (as in the sketch above). After that, we should be able to parse the output JSON and convert it into a `PkgBenchmark.BenchmarkResults` object; a rough sketch of this conversion is given after the task list. This makes it easier to integrate those results into the overall machinery, reducing the problem to setting the Python result as the reference value.
Tasks

- [ ] Convert the pyperf JSON output into a `PkgBenchmark.BenchmarkResults` object
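A rough sketch of that conversion step, assuming pyperf timings in seconds and the current layout of BenchmarkTools.Trial (params, times, gctimes, memory, allocs), which would need checking against the installed version:

```julia
# Hedged sketch: wrap pyperf timings in a BenchmarkTools.Trial so the Python
# reference can flow through the same comparison machinery as the Julia results.
using BenchmarkTools

function pyperf_to_trial(times_s::Vector{Float64})
    times_ns = times_s .* 1e9  # BenchmarkTools stores times in nanoseconds
    # pyperf reports no GC or allocation data, so fill those fields with zeros.
    return BenchmarkTools.Trial(BenchmarkTools.Parameters(), times_ns, zeros(length(times_ns)), 0, 0)
end
```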
Resources
References
Footnotes
[^1]: BenchmarkTools.jl
[^2]: PkgBenchmark.jl
[^3]: AirspeedVelocity.jl
[^4]: pyperf