
Commit 4d0e801

Merge pull request #118 from Azure-Samples/testeval6
Add TOC to eval markdown
2 parents: 1929ba1 + ca9a3fd

File tree

2 files changed (+20 lines, -7 lines):

- .github/workflows/evaluate.yaml
- docs/evaluation.md

.github/workflows/evaluate.yaml

Lines changed: 11 additions & 5 deletions
````diff
@@ -170,20 +170,26 @@ jobs:
       - name: Summarize results
         if: ${{ success() }}
         run: |
-          echo "📊 Evaluation Results" >> $GITHUB_STEP_SUMMARY
-          python -m evaltools summary evals/results --output=markdown >> eval-results.md
-          cat eval-results.md >> $GITHUB_STEP_SUMMARY
+          echo "## Evaluation results" >> eval-summary.md
+          python -m evaltools summary evals/results --output=markdown >> eval-summary.md
+          echo "## Answer differences across runs" >> run-diff.md
+          python -m evaltools diff evals/results/baseline evals/results/pr${{ github.event.issue.number }} --output=markdown >> run-diff.md
+          cat eval-summary.md >> $GITHUB_STEP_SUMMARY
+          cat run-diff.md >> $GITHUB_STEP_SUMMARY
 
       - name: Comment on pull request
         uses: actions/github-script@v7
         with:
           script: |
             const fs = require('fs');
-            const summaryPath = "eval-results.md";
+            const summaryPath = "eval-summary.md";
             const summary = fs.readFileSync(summaryPath, 'utf8');
+            const runId = process.env.GITHUB_RUN_ID;
+            const repo = process.env.GITHUB_REPOSITORY;
+            const actionsUrl = `https://github.com/${repo}/actions/runs/${runId}`;
             github.rest.issues.createComment({
               issue_number: context.issue.number,
               owner: context.repo.owner,
               repo: context.repo.repo,
-              body: summary
+              body: `${summary}\n\n[Check the Actions tab for more details](${actionsUrl}).`
             })
````
docs/evaluation.md

Lines changed: 9 additions & 2 deletions
````diff
@@ -2,6 +2,13 @@
 
 Follow these steps to evaluate the quality of the answers generated by the RAG flow.
 
+* [Deploy a GPT-4 model](#deploy-a-gpt-4-model)
+* [Setup the evaluation environment](#setup-the-evaluation-environment)
+* [Generate ground truth data](#generate-ground-truth-data)
+* [Run bulk evaluation](#run-bulk-evaluation)
+* [Review the evaluation results](#review-the-evaluation-results)
+* [Run bulk evaluation on a PR](#run-bulk-evaluation-on-a-pr)
+
 ## Deploy a GPT-4 model
 
 
@@ -45,7 +52,7 @@ python evals/generate_ground_truth_data.py
 
 Review the generated data after running that script, removing any question/answer pairs that don't seem like realistic user input.
 
-## Evaluate the RAG answer quality
+## Run bulk evaluation
 
 Review the configuration in `evals/eval_config.json` to ensure that everything is correctly setup. You may want to adjust the metrics used. See [the ai-rag-chat-evaluator README](https://github.com/Azure-Samples/ai-rag-chat-evaluator) for more information on the available metrics.
 
@@ -72,6 +79,6 @@ Compare answers across runs by running the following command:
 python -m evaltools diff evals/results/baseline/
 ```
 
-## Run the evaluation on a PR
+## Run bulk evaluation on a PR
 
 To run the evaluation on the changes in a PR, you can add a `/evaluate` comment to the PR. This will trigger the evaluation workflow to run the evaluation on the PR changes and will post the results to the PR.
````
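The trigger that makes a `/evaluate` comment start the workflow is not part of this diff. For orientation only, a comment-driven workflow is commonly wired up roughly like the sketch below; the workflow name, event types, `if` condition, and permissions are assumptions rather than the repository's actual `evaluate.yaml`, though they are consistent with the `github.event.issue.number` reference in the step shown earlier.

```yaml
# Hypothetical sketch of an issue_comment-triggered evaluation workflow.
# The real evaluate.yaml in this repository may differ; only the two steps
# shown in the diff above are known from this commit.
name: Evaluate on PR comment

on:
  issue_comment:
    types: [created]

jobs:
  evaluate:
    # Only react to "/evaluate" comments made on pull requests.
    if: ${{ github.event.issue.pull_request && contains(github.event.comment.body, '/evaluate') }}
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write  # needed for github.rest.issues.createComment
    steps:
      - uses: actions/checkout@v4
      # ... set up Python, run the bulk evaluation into evals/results/pr<PR number>,
      # then run the "Summarize results" and "Comment on pull request" steps above.
```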
