Review the generated data after running that script, removing any question/answer pairs that don't seem like realistic user input.
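If there are many generated pairs, a short script can make that review easier. The sketch below is only a suggestion: it assumes the generator wrote a JSONL file with `question` and `truth` fields, and the path `evals/ground_truth.jsonl` is a placeholder, so adjust both to match your setup.

```python
# Minimal sketch for manually reviewing generated Q/A pairs.
# Assumes a JSONL file with "question" and "truth" fields; the path and
# field names are placeholders -- adjust them to match your generator's output.
import json
from pathlib import Path

GROUND_TRUTH_PATH = Path("evals/ground_truth.jsonl")  # hypothetical path
kept = []

for line in GROUND_TRUTH_PATH.read_text(encoding="utf-8").splitlines():
    if not line.strip():
        continue
    pair = json.loads(line)
    print(f"\nQ: {pair['question']}\nA: {pair['truth']}")
    if input("Keep this pair? [Y/n] ").strip().lower() != "n":
        kept.append(pair)

# Write back only the reviewed pairs so the evaluation uses realistic questions.
with GROUND_TRUTH_PATH.open("w", encoding="utf-8") as f:
    for pair in kept:
        f.write(json.dumps(pair) + "\n")
```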

## Run bulk evaluation

Review the configuration in `evals/eval_config.json` to ensure that everything is correctly set up. You may want to adjust the metrics used. See [the ai-rag-chat-evaluator README](https://github.com/Azure-Samples/ai-rag-chat-evaluator) for more information on the available metrics.
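As a quick sanity check before starting a run, you can print the key settings from the config file. The field names used below (`testdata_path`, `results_dir`, `requested_metrics`) are assumptions based on the ai-rag-chat-evaluator conventions, so verify them against your actual `evals/eval_config.json`.

```python
# Sketch: print the evaluation settings to confirm the config is what you expect.
# Field names are assumptions based on the ai-rag-chat-evaluator README;
# check them against the keys actually present in your eval_config.json.
import json
from pathlib import Path

config = json.loads(Path("evals/eval_config.json").read_text(encoding="utf-8"))

print("Test data:", config.get("testdata_path"))
print("Results dir:", config.get("results_dir"))
print("Metrics:", ", ".join(config.get("requested_metrics", [])))
```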
Compare answers across runs by running the following command:

```
python -m evaltools diff evals/results/baseline/
```

## Run bulk evaluation on a PR

To run the evaluation on the changes in a PR, add a `/evaluate` comment to the PR. This triggers the evaluation workflow, which runs the evaluation against the PR changes and posts the results back to the PR.