
Add initial benchmarking infrastructure (#1232)

* Add initial benchmarking infrastructure

* Add CI file

* Try to comment on commits

* Implement file download benchmarks!

* drop commit comments (they dont work)

* Allow running local binary

* Better action

* More docs!

* Better look?

* even better look

* add pretty=all, none benchmarks
Batuhan Taskaya 2021-12-14 18:05:25 +03:00 committed by GitHub
parent e30ec6be42
commit 4f7f59b990
5 changed files with 557 additions and 1 deletion

.github/workflows/benchmark.yml (new file)

@@ -0,0 +1,52 @@
name: Benchmark
on:
pull_request:
types: [ labeled ]
permissions:
issues: write
pull-requests: write
jobs:
test:
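    # Run only when the 'benchmark' label has been added to the pull request.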
if: github.event.label.name == 'benchmark'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: "3.9"
- id: benchmarks
name: Run Benchmarks
run: |
          python -m pip install 'pyperf>=2.3.0'
python extras/profiling/run.py --fresh --complex --min-speed=6 --file output.txt
body=$(cat output.txt)
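          # Escape characters that would otherwise break the multi-line ::set-output value.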
body="${body//'%'/'%25'}"
body="${body//$'\n'/'%0A'}"
body="${body//$'\r'/'%0D'}"
echo "::set-output name=body::$body"
- name: Find Comment
uses: peter-evans/find-comment@v1
id: fc
with:
issue-number: ${{ github.event.pull_request.number }}
comment-author: 'github-actions[bot]'
body-includes: '# Benchmarks'
- name: Create or update comment
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: ${{ steps.fc.outputs.comment-id }}
issue-number: ${{ github.event.pull_request.number }}
body: |
# Benchmarks
${{ steps.benchmarks.outputs.body }}
edit-mode: replace
- uses: actions-ecosystem/action-remove-labels@v1
with:
labels: benchmark

CONTRIBUTING.md

@@ -144,6 +144,20 @@ $ python -m pytest tests/test_uploads.py::TestMultipartFormDataFileUpload::test_
See [Makefile](https://github.com/httpie/httpie/blob/master/Makefile) for additional development utilities.
#### Running benchmarks
If you are trying to work on speeding up HTTPie and want to verify your results, you
can run the benchmark suite. The suite will compare the last commit of your branch
with the master branch of your repository (or a fresh checkout of HTTPie master, via
`--fresh`) and report the results.

```bash
$ python extras/profiling/run.py
```

The benchmarks can also be run on the CI. Since this is a long process, it requires manual
approval: ping one of the maintainers to have the `benchmark` label added to your pull request.
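For reference, the CI workflow (`.github/workflows/benchmark.yml`) runs the suite against a fresh
upstream checkout with the complex environments enabled; assuming that workflow has not changed,
the equivalent local invocation is:

```bash
$ python extras/profiling/run.py --fresh --complex --min-speed=6 --file output.txt
```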
#### Windows
If you are on a Windows machine and not able to run `make`,

Makefile

@@ -130,7 +130,7 @@ pycodestyle: codestyle
codestyle:
	@echo $(H1)Running flake8$(H1END)
	@[ -f $(VENV_BIN)/flake8 ] || $(VENV_PIP) install --upgrade --editable '.[dev]'
-	$(VENV_BIN)/flake8 httpie/ tests/ docs/packaging/brew/ *.py
+	$(VENV_BIN)/flake8 httpie/ tests/ extras/profiling/ docs/packaging/brew/ *.py
	@echo

extras/profiling/benchmarks.py (new file)

@@ -0,0 +1,203 @@
"""
This file declares the benchmarks for HTTPie. It can also be
used to run them against the current environment.

Each instance of the BaseRunner class is an individual
benchmark. If this file is run without any arguments, it
executes every benchmark instance and reports the timings.

The benchmarks are run through 'pyperf', which allows us to
get very precise results. For micro-benchmarks like startup,
please run `pyperf system tune` first to get even more accurate results.
Examples:
# Run everything as usual; by default we do 3 warmup runs
# and 5 actual runs.
$ python extras/profiling/benchmarks.py
# For retrieving results faster, pass --fast
$ python extras/profiling/benchmarks.py --fast
# To verify everything works as expected, pass --debug-single-value.
# It will only run everything once, so the results are not reliable, but
# it is very useful when iterating on a benchmark.
$ python extras/profiling/benchmarks.py --debug-single-value

# If you want to run with a custom HTTPie command (for example with
# an HTTPie instance installed in another virtual environment),
# pass the HTTPIE_COMMAND variable.
$ HTTPIE_COMMAND="/my/python /my/httpie" python extras/profiling/benchmarks.py
"""
from __future__ import annotations
import os
import shlex
import subprocess
import sys
import threading
from contextlib import ExitStack, contextmanager
from dataclasses import dataclass, field
from functools import cached_property, partial
from http.server import HTTPServer, SimpleHTTPRequestHandler
from tempfile import TemporaryDirectory
from typing import ClassVar, Final, List
import pyperf
# For download benchmarks, define a set of files.
# file: (block_size, count) => total_size = block_size * count
PREDEFINED_FILES: Final = {'3G': (3 * 1024 ** 2, 1024)}
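# For example, the '3G' entry above uses a block size of 3 * 1024 ** 2 bytes (3 MiB)
# and a count of 1024, i.e. 3 * 1024 ** 3 bytes = 3 GiB in total.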
class QuietSimpleHTTPServer(SimpleHTTPRequestHandler):
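    """Request handler that suppresses the default per-request log output."""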
def log_message(self, *args, **kwargs):
pass
@contextmanager
def start_server():
"""Create a server to serve local files. It will create the
PREDEFINED_FILES through dd."""
with TemporaryDirectory() as directory:
for file_name, (block_size, count) in PREDEFINED_FILES.items():
subprocess.check_call(
[
'dd',
'if=/dev/zero',
f'of={file_name}',
f'bs={block_size}',
f'count={count}',
],
cwd=directory,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
handler = partial(QuietSimpleHTTPServer, directory=directory)
server = HTTPServer(('localhost', 0), handler)
thread = threading.Thread(target=server.serve_forever)
thread.start()
yield '{}:{}'.format(*server.socket.getsockname())
server.shutdown()
thread.join(timeout=0.5)
@dataclass
class Context:
benchmarks: ClassVar[List[BaseRunner]] = []
stack: ExitStack = field(default_factory=ExitStack)
runner: pyperf.Runner = field(default_factory=pyperf.Runner)
def run(self) -> pyperf.BenchmarkSuite:
results = [benchmark.run(self) for benchmark in self.benchmarks]
return pyperf.BenchmarkSuite(results)
@property
def cmd(self) -> List[str]:
if cmd := os.getenv('HTTPIE_COMMAND'):
return shlex.split(cmd)
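        # Otherwise, fall back to the `http` script installed next to the
        # current Python interpreter.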
http = os.path.join(os.path.dirname(sys.executable), 'http')
assert os.path.exists(http)
return [sys.executable, http]
@cached_property
def server(self) -> str:
return self.stack.enter_context(start_server())
def __enter__(self):
return self
def __exit__(self, *exc_info):
self.stack.close()
@dataclass
class BaseRunner:
"""
    An individual benchmark case. By default it has a category
    (e.g. startup or download) and a title.
"""
category: str
title: str
def __post_init__(self):
Context.benchmarks.append(self)
def run(self, context: Context) -> pyperf.Benchmark:
raise NotImplementedError
@property
def name(self) -> str:
return f'{self.title} ({self.category})'
@dataclass
class CommandRunner(BaseRunner):
"""
Run a single command, and benchmark it.
"""
args: List[str]
def run(self, context: Context) -> pyperf.Benchmark:
return context.runner.bench_command(self.name, [*context.cmd, *self.args])
@dataclass
class DownloadRunner(BaseRunner):
"""
    Benchmark downloading a single file from the
    locally started test server.
"""
file_name: str
def run(self, context: Context) -> pyperf.Benchmark:
return context.runner.bench_command(
self.name,
[
*context.cmd,
'--download',
'GET',
f'{context.server}/{self.file_name}',
],
)
CommandRunner('startup', '`http --version`', ['--version'])
CommandRunner('startup', '`http --offline pie.dev/get`', ['--offline', 'pie.dev/get'])
for pretty in ['all', 'none']:
CommandRunner(
'startup',
        f'`http --pretty={pretty} httpbin.org/stream/100`',
[
'--print=HBhb',
'--stream',
f'--pretty={pretty}',
'httpbin.org/stream/100'
]
)
DownloadRunner('download', '`http --download :/3G` (3GB)', '3G')
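
# Adding a new benchmark is simply a matter of instantiating a runner at module
# level; BaseRunner.__post_init__ registers it with Context automatically. An
# illustrative sketch (not part of the suite) could look like:
#
# CommandRunner(
#     'startup',
#     '`http --offline --form pie.dev/post`',
#     ['--offline', '--form', 'pie.dev/post', 'key=value'],
# )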
def main() -> None:
    # pyperf brings its own argument parser, so we configure the script through
    # sys.argv. The configuration below is reasonably fast yet still precise
    # enough: 3 warmup runs (especially important for the download benchmarks),
    # followed by 5 recorded runs.
sys.argv.extend(
['--worker', '--loops=1', '--warmup=3', '--values=5', '--processes=2']
)
with Context() as context:
context.run()
if __name__ == '__main__':
main()

extras/profiling/run.py (new file)

@@ -0,0 +1,287 @@
"""
Run the HTTPie benchmark suite with multiple environments.
By default, this script creates two (or more) isolated environments and
compares the *last commit* of this repository against its master branch.

> If you haven't committed your changes yet, they won't show up in the results.

You can also pass --fresh, which tests the *last commit* of this repository
against a fresh copy of HTTPie itself. That way, even if your local master
branch is not up to date, you can still compare against the upstream master.

You can also pass --complex to add 2 additional environments, which
include extra dependencies like pyOpenSSL.
Examples:
# Run everything as usual, and compare the last commit with master
$ python extras/profiling/run.py

# Include complex environments
$ python extras/profiling/run.py --complex

# Compare against a fresh copy
$ python extras/profiling/run.py --fresh

# Compare against a custom branch of a custom repo
$ python extras/profiling/run.py --target-repo my_repo --target-branch my_branch

# Debug changes made to this script (only run the benchmarks once)
$ python extras/profiling/run.py --debug
"""
import dataclasses
import shlex
import subprocess
import sys
import tempfile
import venv
from argparse import ArgumentParser, FileType
from contextlib import contextmanager
from dataclasses import dataclass
from pathlib import Path
from typing import (IO, Dict, Generator, Iterable, List, Optional,
Tuple)
BENCHMARK_SCRIPT = Path(__file__).parent / 'benchmarks.py'
CURRENT_REPO = Path(__file__).parent.parent.parent
GITHUB_URL = 'https://github.com/httpie/httpie.git'
TARGET_BRANCH = 'master'
# Additional dependencies for --complex
ADDITIONAL_DEPS = ('pyOpenSSL',)
def call(*args, **kwargs):
kwargs.setdefault('stdout', subprocess.DEVNULL)
return subprocess.check_call(*args, **kwargs)
class Environment:
"""
    Each environment defines how to create an isolated instance
    where we can install HTTPie and run the benchmarks without
    interference from any environmental factors.
"""
@contextmanager
def on_repo(self) -> Generator[Tuple[Path, Dict[str, str]], None, None]:
"""
        Yield the path to the Python interpreter and the
        environment variables (e.g. HTTPIE_COMMAND) to be
        used by the benchmarks.
"""
raise NotImplementedError
@dataclass
class HTTPieEnvironment(Environment):
repo_url: str
branch: Optional[str] = None
dependencies: Iterable[str] = ()
@contextmanager
    def on_repo(self) -> Generator[Tuple[Path, Dict[str, str]], None, None]:
with tempfile.TemporaryDirectory() as directory_path:
directory = Path(directory_path)
# Clone the repo
repo_path = directory / 'httpie'
call(
['git', 'clone', self.repo_url, repo_path],
stderr=subprocess.DEVNULL,
)
if self.branch is not None:
call(
['git', 'checkout', self.branch],
cwd=repo_path,
stderr=subprocess.DEVNULL,
)
# Prepare the environment
venv_path = directory / '.venv'
venv.create(venv_path, with_pip=True)
# Install basic dependencies
python = venv_path / 'bin' / 'python'
call(
[
python,
'-m',
'pip',
'install',
'wheel',
'pyperf==2.3.0',
*self.dependencies,
]
)
# Create a wheel distribution of HTTPie
call([python, 'setup.py', 'bdist_wheel'], cwd=repo_path)
# Install httpie
distribution_path = next((repo_path / 'dist').iterdir())
call(
[python, '-m', 'pip', 'install', distribution_path],
cwd=repo_path,
)
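            # Hand both the interpreter and the installed `http` script to the
            # benchmark runner through the HTTPIE_COMMAND environment variable
            # (consumed by Context.cmd in benchmarks.py).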
http = venv_path / 'bin' / 'http'
yield python, {'HTTPIE_COMMAND': shlex.join([str(python), str(http)])}
@dataclass
class LocalCommandEnvironment(Environment):
local_command: str
@contextmanager
    def on_repo(self) -> Generator[Tuple[Path, Dict[str, str]], None, None]:
yield sys.executable, {'HTTPIE_COMMAND': self.local_command}
def dump_results(
results: List[str],
file: IO[str],
min_speed: Optional[str] = None
) -> None:
for result in results:
lines = result.strip().splitlines()
if min_speed is not None and "hidden" in lines[-1]:
lines[-1] = (
'Some benchmarks were hidden from this list '
'because their timings did not change in a '
'significant way (change was within the error '
'margin ±{margin}%).'
).format(margin=min_speed)
result = '\n'.join(lines)
print(result, file=file)
print("\n---\n", file=file)
def compare(*args, directory: Path, min_speed: Optional[str] = None):
compare_args = ['pyperf', 'compare_to', '--table', '--table-format=md', *args]
if min_speed:
compare_args.extend(['--min-speed', min_speed])
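    # For the default configuration this runs something like (illustrative):
    #   pyperf compare_to --table --table-format=md master this_branch
    # inside the directory that holds the JSON result files written by benchmarks.py.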
return subprocess.check_output(
compare_args,
cwd=directory,
text=True,
)
def run(
configs: List[Dict[str, Environment]],
file: IO[str],
debug: bool = False,
min_speed: Optional[str] = None,
) -> None:
result_directory = Path(tempfile.mkdtemp())
results = []
current = 1
total = sum(1 for config in configs for _ in config.items())
def iterate(env_name, status):
print(
f'Iteration: {env_name} ({current}/{total}) ({status})' + ' ' * 10,
end='\r',
flush=True,
)
for config in configs:
for env_name, env in config.items():
iterate(env_name, 'setting up')
with env.on_repo() as (python, env_vars):
iterate(env_name, 'running benchmarks')
args = [python, BENCHMARK_SCRIPT, '-o', env_name]
if debug:
args.append('--debug-single-value')
call(
args,
cwd=result_directory,
env=env_vars,
)
current += 1
results.append(compare(
*config.keys(),
directory=result_directory,
min_speed=min_speed
))
dump_results(results, file=file, min_speed=min_speed)
print('Results are available at:', result_directory)
def main() -> None:
parser = ArgumentParser()
parser.add_argument('--local-repo', default=CURRENT_REPO)
parser.add_argument('--local-branch', default=None)
parser.add_argument('--target-repo', default=CURRENT_REPO)
parser.add_argument('--target-branch', default=TARGET_BRANCH)
parser.add_argument(
'--fresh',
action='store_const',
const=GITHUB_URL,
dest='target_repo',
help='Clone the target repo from upstream GitHub URL',
)
parser.add_argument(
'--complex',
action='store_true',
help='Add a second run, with a complex python environment.',
)
parser.add_argument(
'--local-bin',
        help='Run the suite with the given local binary in addition to the'
             ' existing runners (e.g. --local-bin $(command -v xh)).',
)
parser.add_argument(
'--file',
type=FileType('w'),
default=sys.stdout,
        help='File to write the results to (defaults to stdout)',
)
parser.add_argument(
'--min-speed',
        help='Minimum speed difference, in percent, for a benchmark to be'
             ' considered significant'
)
parser.add_argument(
'--debug',
action='store_true',
)
options = parser.parse_args()
configs = []
base_config = {
options.target_branch: HTTPieEnvironment(options.target_repo, options.target_branch),
'this_branch': HTTPieEnvironment(options.local_repo, options.local_branch),
}
configs.append(base_config)
if options.complex:
complex_config = {
env_name
+ '-complex': dataclasses.replace(env, dependencies=ADDITIONAL_DEPS)
for env_name, env in base_config.items()
}
configs.append(complex_config)
if options.local_bin:
base_config['binary'] = LocalCommandEnvironment(options.local_bin)
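    # With the defaults this compares 'master' against 'this_branch'. --complex
    # adds a second comparison using the '-complex' variants, and --local-bin adds
    # a 'binary' entry to the first comparison only, since it is appended after the
    # complex config has been derived.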
run(configs, file=options.file, debug=options.debug, min_speed=options.min_speed)
if __name__ == '__main__':
main()