
Add TOC to eval markdown #118


Merged: 2 commits, Oct 23, 2024

`.github/workflows/evaluate.yaml` (11 additions, 5 deletions)

```diff
@@ -170,20 +170,26 @@ jobs:
       - name: Summarize results
         if: ${{ success() }}
         run: |
-          echo "📊 Evaluation Results" >> $GITHUB_STEP_SUMMARY
-          python -m evaltools summary evals/results --output=markdown >> eval-results.md
-          cat eval-results.md >> $GITHUB_STEP_SUMMARY
+          echo "## Evaluation results" >> eval-summary.md
+          python -m evaltools summary evals/results --output=markdown >> eval-summary.md
+          echo "## Answer differences across runs" >> run-diff.md
+          python -m evaltools diff evals/results/baseline evals/results/pr${{ github.event.issue.number }} --output=markdown >> run-diff.md
+          cat eval-summary.md >> $GITHUB_STEP_SUMMARY
+          cat run-diff.md >> $GITHUB_STEP_SUMMARY
 
       - name: Comment on pull request
         uses: actions/github-script@v7
         with:
           script: |
             const fs = require('fs');
-            const summaryPath = "eval-results.md";
+            const summaryPath = "eval-summary.md";
             const summary = fs.readFileSync(summaryPath, 'utf8');
+            const runId = process.env.GITHUB_RUN_ID;
+            const repo = process.env.GITHUB_REPOSITORY;
+            const actionsUrl = `https://github.com/${repo}/actions/runs/${runId}`;
             github.rest.issues.createComment({
               issue_number: context.issue.number,
               owner: context.repo.owner,
               repo: context.repo.repo,
-              body: summary
+              body: `${summary}\n\n[Check the Actions tab for more details](${actionsUrl}).`
             })
```

`docs/evaluation.md` (9 additions, 2 deletions)

````diff
@@ -2,6 +2,13 @@
 
 Follow these steps to evaluate the quality of the answers generated by the RAG flow.
 
+* [Deploy a GPT-4 model](#deploy-a-gpt-4-model)
+* [Setup the evaluation environment](#setup-the-evaluation-environment)
+* [Generate ground truth data](#generate-ground-truth-data)
+* [Run bulk evaluation](#run-bulk-evaluation)
+* [Review the evaluation results](#review-the-evaluation-results)
+* [Run bulk evaluation on a PR](#run-bulk-evaluation-on-a-pr)
+
 ## Deploy a GPT-4 model
 
 
@@ -45,7 +52,7 @@ python evals/generate_ground_truth_data.py
 
 Review the generated data after running that script, removing any question/answer pairs that don't seem like realistic user input.
 
-## Evaluate the RAG answer quality
+## Run bulk evaluation
 
 Review the configuration in `evals/eval_config.json` to ensure that everything is correctly setup. You may want to adjust the metrics used. See [the ai-rag-chat-evaluator README](https://github.com/Azure-Samples/ai-rag-chat-evaluator) for more information on the available metrics.
 
@@ -72,6 +79,6 @@ Compare answers across runs by running the following command:
 python -m evaltools diff evals/results/baseline/
 ```
 
-## Run the evaluation on a PR
+## Run bulk evaluation on a PR
 
 To run the evaluation on the changes in a PR, you can add a `/evaluate` comment to the PR. This will trigger the evaluation workflow to run the evaluation on the PR changes and will post the results to the PR.
````
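
For the ground truth review step mentioned in the doc ("removing any question/answer pairs that don't seem like realistic user input"), here is a rough sketch of what a generated entry might look like. This is only an illustration: the file location and the `question`/`truth` field names are assumptions about what `evals/generate_ground_truth_data.py` emits, and the sample text is hypothetical.

```jsonl
{"question": "What does the Standard health plan cover?", "truth": "The Standard plan covers primary care visits, emergency services, and generic prescriptions ..."}
{"question": "asdf coverage???", "truth": "..."}
```

An entry like the second line, where no real user would phrase the question that way, is the kind of pair to delete during review.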
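The "Run bulk evaluation" section asks you to review `evals/eval_config.json` and possibly adjust the metrics. The sketch below is a hedged illustration of the general shape of such a config; the field names, metric names, and values are assumptions, and the authoritative schema is the ai-rag-chat-evaluator README linked in the doc.

```json
{
  "testdata_path": "evals/ground_truth.jsonl",
  "results_dir": "evals/results/experiment-name",
  "requested_metrics": ["gpt_groundedness", "gpt_relevance", "answer_length", "latency"],
  "target_url": "http://localhost:50505/chat",
  "target_parameters": {}
}
```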
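The doc's last section says that commenting `/evaluate` on a PR triggers the evaluation workflow. The snippet below is a minimal sketch of how such a comment-triggered gate is commonly wired in GitHub Actions; the actual trigger in `.github/workflows/evaluate.yaml` is not shown in this diff and may differ in its exact conditions.

```yaml
# Sketch only: reacting to a "/evaluate" comment on a pull request.
# The real evaluate.yaml in this repo may use different conditions.
name: evaluate

on:
  issue_comment:
    types: [created]

jobs:
  evaluate:
    # issue_comment fires for both issues and PRs; the pull_request field
    # distinguishes PR comments, and contains() gates on the command text.
    if: ${{ github.event.issue.pull_request && contains(github.event.comment.body, '/evaluate') }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... run the bulk evaluation, then the summarize/comment steps shown above ...
```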