Upgrade pgvector and improve eval workflow #93

Merged 57 commits on Oct 21, 2024
Commits
57 commits
e458605
Test evaluation
pamelafox Oct 9, 2024
e670c95
Dont run Python tests right now
pamelafox Oct 9, 2024
d95f9e4
Merge branch 'main' into test-eval
pamelafox Oct 9, 2024
c66a8fc
get eval working
pamelafox Oct 9, 2024
16dfa13
Try to get it working
pamelafox Oct 9, 2024
c7258af
Try to get it working
pamelafox Oct 9, 2024
e19ec61
Fix version output
pamelafox Oct 9, 2024
1d5cd6e
Update pgvector
pamelafox Oct 9, 2024
61b9098
Fix pgvector installation
pamelafox Oct 9, 2024
be784b5
Fix pgvector installation
pamelafox Oct 9, 2024
7bb3599
Change username
pamelafox Oct 9, 2024
a705992
Empty password
pamelafox Oct 9, 2024
30e27ef
Add a password
pamelafox Oct 9, 2024
c7907a6
Use np.array when committing
pamelafox Oct 9, 2024
a62bc2f
Move env
pamelafox Oct 9, 2024
1677f0f
Move env
pamelafox Oct 9, 2024
7874def
Add test query
pamelafox Oct 9, 2024
d7a3242
Use uv
pamelafox Oct 10, 2024
7666b7f
Install less
pamelafox Oct 10, 2024
f34e6c2
Move test query to the flow
pamelafox Oct 10, 2024
193f5f2
Default values
pamelafox Oct 10, 2024
7fd2337
Add markdown output
pamelafox Oct 18, 2024
8205a0a
Add markdown output
pamelafox Oct 18, 2024
3f4e5f8
Configure Azure Developer Pipeline
pamelafox Oct 18, 2024
8e4453d
Evaluate 10 questions
pamelafox Oct 18, 2024
5148cdf
Run on all questions
pamelafox Oct 18, 2024
af62dff
Run on all questions
pamelafox Oct 18, 2024
f1a05e9
Add args and PR #
pamelafox Oct 20, 2024
c4f7d2f
Add args and PR #
pamelafox Oct 20, 2024
80f6c52
Add args and PR #
pamelafox Oct 20, 2024
fc10322
PR #
pamelafox Oct 20, 2024
209d3de
revert unneeded changes
pamelafox Oct 20, 2024
be06cf6
Merge branch 'main' into test-eval
pamelafox Oct 20, 2024
62a23fb
Revert workflow change
pamelafox Oct 20, 2024
2c3ef8f
Fix pgvector
pamelafox Oct 20, 2024
9832f33
Get normal tests working
pamelafox Oct 21, 2024
699e651
Postgres env
pamelafox Oct 21, 2024
b1d7fd1
Always use localhost
pamelafox Oct 21, 2024
0fd1bc5
Update tests and similar route
pamelafox Oct 21, 2024
06e3bd6
Rm unneeded comment
pamelafox Oct 21, 2024
7db7fcd
Update PG tests
pamelafox Oct 21, 2024
5f09487
Bring back the matrix
pamelafox Oct 21, 2024
5fbb60b
OSes
pamelafox Oct 21, 2024
00ee507
OSes
pamelafox Oct 21, 2024
2a06e68
More OSes
pamelafox Oct 21, 2024
62ff01c
Debug macos
pamelafox Oct 21, 2024
11d042d
Mac
pamelafox Oct 21, 2024
3be51ab
Test M1
pamelafox Oct 21, 2024
b51703c
Try the same for M1
pamelafox Oct 21, 2024
2cdb54a
Test windows
pamelafox Oct 21, 2024
21dd904
Port to uv
pamelafox Oct 21, 2024
951d89b
Windows fix
pamelafox Oct 21, 2024
ac53a46
Start postgres on windows
pamelafox Oct 21, 2024
b82e803
Start postgres on windows
pamelafox Oct 21, 2024
efd6b75
Bring back everything but Windows
pamelafox Oct 21, 2024
06f25c6
Bring back everything but Windows
pamelafox Oct 21, 2024
ea60521
Dont hardcode password
pamelafox Oct 21, 2024
78 changes: 57 additions & 21 deletions .github/workflows/app-tests.yaml
@@ -27,73 +27,109 @@ jobs:
strategy:
fail-fast: false
matrix:
os: ["ubuntu-latest", "windows-latest", "macos-latest-xlarge", "macos-13"]
os: ["ubuntu-latest", "macos-latest-xlarge", "macos-13"]
python_version: ["3.10", "3.11", "3.12"]
exclude:
- os: macos-latest-xlarge
python_version: "3.10"
env:
UV_SYSTEM_PYTHON: 1
POSTGRES_HOST: localhost
POSTGRES_USERNAME: postgres
POSTGRES_PASSWORD: root
POSTGRES_DATABASE: postgres
POSTGRES_SSL: disable
steps:
- uses: actions/checkout@v4
- name: Check for MacOS Runner
if: matrix.os == 'macos-latest-xlarge'
run: brew install postgresql@14
- name: Install pgvector on Windows using install-pgvector.bat

- name: (MacOS) Install postgreSQL and pgvector using brew
if: matrix.os == 'macos-13' || matrix.os == 'macos-latest-xlarge'
run: |
brew install postgresql@14
brew link --overwrite postgresql@14
brew install pgvector
brew services start postgresql@14 && sleep 1
createuser -s ${{ env.POSTGRES_USERNAME }}
psql -d postgres -c "ALTER USER ${{ env.POSTGRES_USERNAME }} WITH PASSWORD '${{ env.POSTGRES_PASSWORD }}'"
psql -d postgres -c 'CREATE EXTENSION vector'

- name: (Windows) Install pgvector using install-pgvector.bat
if: matrix.os == 'windows-latest'
shell: cmd
run: .github\workflows\install-pgvector.bat
- name: Install PostgreSQL development libraries
if: matrix.os == 'ubuntu-latest'
run: |
sudo apt update
sudo apt install postgresql-server-dev-14
- name: Setup postgres

- name: (Windows) Start postgreSQL
if: matrix.os == 'windows-latest'
uses: ikalnytskyi/action-setup-postgres@v6
with:
username: admin
password: postgres
database: postgres
- name: Install pgvector on MacOS/Linux using install-pgvector.sh
if: matrix.os != 'windows-latest'
run: .github/workflows/install-pgvector.sh
username: ${{ env.POSTGRES_USERNAME }}
password: ${{ env.POSTGRES_PASSWORD }}
database: ${{ env.POSTGRES_DATABASE }}

- name: (Linux) Install pgvector and set password
if: matrix.os == 'ubuntu-latest'
run: |
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh -y
sudo apt-get install postgresql-14-pgvector
sudo systemctl start postgresql
sudo -u postgres psql -c "ALTER USER ${{ env.POSTGRES_USERNAME }} PASSWORD '${{ env.POSTGRES_PASSWORD }}'"
sudo -u postgres psql -c 'CREATE EXTENSION vector'

- name: Setup python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python_version }}
architecture: x64

- name: Install uv
uses: astral-sh/setup-uv@v3
with:
enable-cache: true
version: "0.4.20"
cache-dependency-glob: "requirements**.txt"

- name: Install dependencies
run: |
python -m pip install -r requirements-dev.txt
uv pip install -r requirements-dev.txt

- name: Install app as editable app
run: |
python -m pip install -e src/backend
uv pip install -e src/backend

- name: Setup local database with seed data
run: |
cp .env.sample .env
python ./src/backend/fastapi_app/setup_postgres_database.py
python ./src/backend/fastapi_app/setup_postgres_seeddata.py

- name: Setup node
uses: actions/setup-node@v4
with:
node-version: 18

- name: Build frontend
run: |
cd ./src/frontend
npm install
npm run build
- name: cache mypy

- name: Setup mypy cache
uses: actions/cache@3624ceb22c1c5a301c8db4169662070a689d9ea8 # v4.1.1
with:
path: ./.mypy_cache
key: mypy${{ matrix.os }}-${{ matrix.python_version }}-${{ hashFiles('requirements-dev.txt', 'src/backend/requirements.txt', 'src/backend/pyproject.toml') }}

- name: Run MyPy
run: python3 -m mypy .

- name: Run Pytest
run: python3 -m pytest -s -vv --cov --cov-fail-under=85

- name: Run E2E tests with Playwright
id: e2e
run: |
playwright install chromium --with-deps
python3 -m pytest tests/e2e.py --tracing=retain-on-failure

- name: Upload test artifacts
if: ${{ failure() && steps.e2e.conclusion == 'failure' }}
uses: actions/upload-artifact@v4
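
The test job now defines the `POSTGRES_*` variables once at the job level, so every step (and every script those steps run) sees the same connection settings. As a minimal sketch of how a script could consume them, assuming a libpq-style URL is what the consumer wants: `postgres_url_from_env` is a hypothetical helper written for illustration, not code from this repository, and the app's real connection logic is not shown in this diff.

```python
import os
from urllib.parse import quote


def postgres_url_from_env(env=None):
    """Build a PostgreSQL connection URL from POSTGRES_* variables.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote(env["POSTGRES_USERNAME"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env.get("POSTGRES_HOST", "localhost")
    database = env["POSTGRES_DATABASE"]
    url = f"postgresql://{user}:{password}@{host}/{database}"
    # POSTGRES_SSL maps onto libpq's sslmode parameter (e.g. "disable").
    sslmode = env.get("POSTGRES_SSL")
    if sslmode:
        url += f"?sslmode={sslmode}"
    return url


# The values below are the ones set in the workflow's job-level env block.
workflow_env = {
    "POSTGRES_HOST": "localhost",
    "POSTGRES_USERNAME": "postgres",
    "POSTGRES_PASSWORD": "root",
    "POSTGRES_DATABASE": "postgres",
    "POSTGRES_SSL": "disable",
}
print(postgres_url_from_env(workflow_env))
# postgresql://postgres:root@localhost/postgres?sslmode=disable
```

Passing the mapping explicitly keeps the helper testable without touching the real environment.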
154 changes: 71 additions & 83 deletions .github/workflows/evaluate.yaml
@@ -1,7 +1,6 @@
name: Evaluate

on:
workflow_dispatch:
issue_comment:
types: [created]
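
With `workflow_dispatch` removed, the workflow starts only on new comments, and the job-level `if` in the following hunk narrows that to pull-request comments containing `#evaluate`. The sketch below restates that condition in Python to make the behaviour easy to check. It assumes the standard `issue_comment` payload shape, in which `issue.pull_request` is present only when the comment is on a pull request. `should_run_evaluation` and the sample payloads (including the issue number 12) are hypothetical.

```python
def should_run_evaluation(event: dict) -> bool:
    """Approximate the job condition: a PR comment that contains '#evaluate'."""
    issue = event.get("issue") or {}
    comment = event.get("comment") or {}
    # Plain issues have no `pull_request` key; pull requests carry an object there.
    is_pull_request = issue.get("pull_request") is not None
    # GitHub's contains() is not case sensitive for strings, so compare in lowercase.
    body = comment.get("body") or ""
    return is_pull_request and "#evaluate" in body.lower()


pr_comment = {
    "issue": {"number": 93, "pull_request": {"merged_at": None}},
    "comment": {"body": "Looks good, #evaluate please"},
}
plain_issue_comment = {"issue": {"number": 12}, "comment": {"body": "#evaluate"}}
unrelated_pr_comment = {
    "issue": {"number": 93, "pull_request": {"merged_at": None}},
    "comment": {"body": "LGTM"},
}

print(should_run_evaluation(pr_comment))            # True
print(should_run_evaluation(plain_issue_comment))   # False
print(should_run_evaluation(unrelated_pr_comment))  # False
```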

@@ -13,77 +12,68 @@ permissions:

jobs:
evaluate:
if: github.event_name == 'workflow_dispatch' || contains(github.event.comment.body, '#evaluate')
if: ${{ github.event.issue.pull_request && contains(github.event.comment.body, '#evaluate') }}
runs-on: ubuntu-latest
env:
UV_SYSTEM_PYTHON: 1
AZURE_CLIENT_ID: ${{ vars.AZURE_CLIENT_ID }}
AZURE_TENANT_ID: ${{ vars.AZURE_TENANT_ID }}
AZURE_SUBSCRIPTION_ID: ${{ vars.AZURE_SUBSCRIPTION_ID }}
AZURE_CREDENTIALS: ${{ secrets.AZURE_CREDENTIALS }}
AZURE_RESOURCE_GROUP: ${{ vars.AZURE_RESOURCE_GROUP }}
POSTGRES_HOST: localhost
POSTGRES_USERNAME: postgres
POSTGRES_PASSWORD: root
POSTGRES_DATABASE: postgres
POSTGRES_SSL: disable
OPENAI_CHAT_HOST: ${{ vars.OPENAI_CHAT_HOST }}
OPENAI_EMBED_HOST: ${{ vars.OPENAI_EMBED_HOST }}
AZURE_OPENAI_ENDPOINT: ${{ vars.AZURE_OPENAI_ENDPOINT }}
AZURE_OPENAI_VERSION: ${{ vars.AZURE_OPENAI_VERSION }}
AZURE_OPENAI_CHAT_DEPLOYMENT: ${{ vars.AZURE_OPENAI_CHAT_DEPLOYMENT }}
AZURE_OPENAI_CHAT_MODEL: ${{ vars.AZURE_OPENAI_CHAT_MODEL }}
AZURE_OPENAI_EMBED_DEPLOYMENT: ${{ vars.AZURE_OPENAI_EMBED_DEPLOYMENT }}
AZURE_OPENAI_EMBED_MODEL: ${{ vars.AZURE_OPENAI_EMBED_MODEL }}
AZURE_OPENAI_EMBED_DIMENSIONS: ${{ vars.AZURE_OPENAI_EMBED_DIMENSIONS }}
AZURE_OPENAI_EMBEDDING_COLUMN: ${{ vars.AZURE_OPENAI_EMBEDDING_COLUMN }}
AZURE_OPENAI_EVAL_DEPLOYMENT: ${{ vars.AZURE_OPENAI_EVAL_DEPLOYMENT }}
AZURE_OPENAI_EVAL_MODEL: ${{ vars.AZURE_OPENAI_EVAL_MODEL }}
steps:
- name: Check for evaluate hash tag
if: contains(github.event.comment.body, '#evaluate')
run: |
echo "Comment contains #evaluate hashtag"

- uses: actions/checkout@v4
- name: Install PostgreSQL development libraries

- name: Install pgvector
run: |
sudo apt update
sudo apt install postgresql-server-dev-14
- name: Setup postgres
uses: ikalnytskyi/action-setup-postgres@v6
with:
username: admin
password: postgres
database: postgres
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh -y
sudo apt-get install postgresql-14-pgvector

- name: Start postgres
run: sudo systemctl start postgresql

- name: Set password for postgres user
run: sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'root'"

- name: Install pgvector on MacOS/Linux using install-pgvector.sh
run: .github/workflows/install-pgvector.sh
- name: Create vector extension
run: sudo -u postgres psql -c 'CREATE EXTENSION vector'

- name: Install python
uses: actions/setup-python@v5
with:
python-version: '3.12'

- name: Install azd
uses: Azure/[email protected]

- name: Install dependencies
run: |
python -m pip install -r requirements-dev.txt

- name: Install app as editable app
run: |
python -m pip install -e src/backend

- name: Setup local database with seed data
run: |
python ./src/backend/fastapi_app/setup_postgres_database.py
python ./src/backend/fastapi_app/setup_postgres_seeddata.py
env:
POSTGRES_HOST: localhost
POSTGRES_USERNAME: admin
POSTGRES_PASSWORD: postgres
POSTGRES_DATABASE: postgres
POSTGRES_SSL: disable

- name: Setup node
uses: actions/setup-node@v4
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
node-version: 18
enable-cache: true
version: "0.4.20"
cache-dependency-glob: "requirements**.txt"

- name: Build frontend
run: |
cd ./src/frontend
npm install
npm run build

- name: Install python packages
run: |
python -m pip install --upgrade pip
pip install -r requirements-dev.txt
- name: Install azd
uses: Azure/[email protected]

- name: Login to Azure
uses: azure/login@v2
@@ -107,41 +97,42 @@ jobs:
--tenant-id "$Env:AZURE_TENANT_ID"
shell: pwsh

- name: Provision Infrastructure
run: azd provision --no-prompt
env:
AZD_INITIAL_ENVIRONMENT_CONFIG: ${{ secrets.AZD_INITIAL_ENVIRONMENT_CONFIG }}
- name: Install dependencies
run: |
uv pip install -r requirements-dev.txt

- name: Install app as editable app
run: |
uv pip install -e src/backend

- name: Setup local database with seed data
run: |
python ./src/backend/fastapi_app/setup_postgres_database.py
python ./src/backend/fastapi_app/setup_postgres_seeddata.py

- name: Setup node
uses: actions/setup-node@v4
with:
node-version: 18

- name: Build frontend
run: |
cd ./src/frontend
npm install
npm run build

- name: Run local server in background
run: |
RUNNER_TRACKING_ID="" && (nohup python3 -m uvicorn fastapi_app:create_app --factory > serverlogs.out 2> serverlogs.err &)
env:
OPENAI_CHAT_HOST: ${{ vars.OPENAI_CHAT_HOST }}
OPENAI_EMBED_HOST: ${{ vars.OPENAI_EMBED_HOST }}
AZURE_OPENAI_ENDPOINT: ${{ vars.AZURE_OPENAI_ENDPOINT }}
AZURE_OPENAI_VERSION: ${{ vars.AZURE_OPENAI_VERSION }}
AZURE_OPENAI_CHAT_DEPLOYMENT: ${{ vars.AZURE_OPENAI_CHAT_DEPLOYMENT }}
AZURE_OPENAI_CHAT_MODEL: ${{ vars.AZURE_OPENAI_CHAT_MODEL }}
AZURE_OPENAI_EMBED_DEPLOYMENT: ${{ vars.AZURE_OPENAI_EMBED_DEPLOYMENT }}
AZURE_OPENAI_EMBED_MODEL: ${{ vars.AZURE_OPENAI_EMBED_MODEL }}
AZURE_OPENAI_EMBED_DIMENSIONS: ${{ vars.AZURE_OPENAI_EMBED_DIMENSIONS }}
AZURE_OPENAI_EMBEDDING_COLUMN: ${{ vars.AZURE_OPENAI_EMBEDDING_COLUMN }}
POSTGRES_HOST: localhost
POSTGRES_USERNAME: admin
POSTGRES_PASSWORD: postgres
POSTGRES_DATABASE: postgres
POSTGRES_SSL: disable

- name: Install evaluate dependencies
run: |
uv pip install -r evals/requirements.txt

- name: Evaluate local RAG flow
run: |
python evals/evaluate.py
env:
OPENAI_CHAT_HOST: ${{ vars.OPENAI_CHAT_HOST }}
AZURE_OPENAI_ENDPOINT: ${{ vars.AZURE_OPENAI_ENDPOINT }}
AZURE_OPENAI_VERSION: ${{ vars.AZURE_OPENAI_VERSION }}
AZURE_OPENAI_CHAT_DEPLOYMENT: ${{ vars.AZURE_OPENAI_CHAT_DEPLOYMENT }}
AZURE_OPENAI_CHAT_MODEL: ${{ vars.AZURE_OPENAI_CHAT_MODEL }}
AZURE_OPENAI_EVAL_DEPLOYMENT: ${{ vars.AZURE_OPENAI_EVAL_DEPLOYMENT }}
AZURE_OPENAI_EVAL_MODEL: ${{ vars.AZURE_OPENAI_EVAL_MODEL }}
python evals/evaluate.py --targeturl=http://127.0.0.1:8000/chat --numquestions=2 --resultsdir=results/pr${{ github.event.issue.number }}

- name: Upload server logs as build artifact
uses: actions/upload-artifact@v4
with:
@@ -158,13 +149,10 @@ jobs:
uses: actions/upload-artifact@v4
with:
name: eval_result
path: ./src/api/evaluate/eval_results.jsonl
path: ./evals/results/pr${{ github.event.issue.number }}

- name: GitHub Summary Step
if: ${{ success() }}
working-directory: ./src/api
run: |
echo "" >> $GITHUB_STEP_SUMMARY

echo "📊 Promptflow Evaluation Results" >> $GITHUB_STEP_SUMMARY
cat evaluate/eval_results.md >> $GITHUB_STEP_SUMMARY
echo "📊 Evaluation Results" >> $GITHUB_STEP_SUMMARY
python -m evaltools summary evals/results --output=markdown >> $GITHUB_STEP_SUMMARY
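
The summary step works by appending Markdown to the file whose path the runner exposes as `GITHUB_STEP_SUMMARY`. The shell redirections above do exactly that. For a script that wants to write its own summary lines, a rough Python equivalent might look like the following. `append_step_summary` is a hypothetical helper, and the demo uses a temporary file and placeholder text in place of real evaluation output.

```python
import os
import tempfile
from pathlib import Path


def append_step_summary(markdown: str) -> None:
    """Append Markdown to the file named by GITHUB_STEP_SUMMARY.

    Falls back to printing when the variable is unset, e.g. on a local run.
    """
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        print(markdown)
        return
    # Append rather than overwrite: earlier steps may already have written content.
    with open(summary_path, "a", encoding="utf-8") as summary_file:
        summary_file.write(markdown.rstrip("\n") + "\n")


# Demo: point the variable at a temporary file standing in for the runner's path.
previous_value = os.environ.get("GITHUB_STEP_SUMMARY")
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "summary.md"
    os.environ["GITHUB_STEP_SUMMARY"] = str(demo_path)
    try:
        append_step_summary("📊 Evaluation Results")
        append_step_summary("(placeholder for the Markdown table from the summary command)")
        demo_text = demo_path.read_text(encoding="utf-8")
    finally:
        # Restore the environment so the demo leaves no stale path behind.
        if previous_value is None:
            del os.environ["GITHUB_STEP_SUMMARY"]
        else:
            os.environ["GITHUB_STEP_SUMMARY"] = previous_value

print(len(demo_text.splitlines()), "lines written")
```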
2 changes: 1 addition & 1 deletion README.md
@@ -139,7 +139,7 @@ Since the local app uses OpenAI models, you should first deploy it for the optim
 ```

 3. To use OpenAI.com OpenAI, set `OPENAI_CHAT_HOST` and `OPENAI_EMBED_HOST` to "openai". Then fill in the value for `OPENAICOM_KEY`.
-4. To use Ollama, set `OPENAI_CHAT_HOST` to "ollama". Then update the values for `OLLAMA_ENDPOINT` and `OLLAMA_CHAT_MODEL` to match your local setup and model. Note that most Ollama models are not compatible with the "Advanced flow", due to the need for function calling support, so you'll need to disable that in _Developer Settings_ in the UI. In addition, the database rows are embedded using the default OpenAI embedding model, so you can't search them using an Ollama embedding model. You can either choose to set `OPENAI_EMBED_HOST` to "azure" or "openai", or turn off vector search in _Developer Settings_.
+4. To use Ollama, set `OPENAI_CHAT_HOST` to "ollama". Then update the values for `OLLAMA_ENDPOINT` and `OLLAMA_CHAT_MODEL` to match your local setup and model. We recommend using "llama3.1" for the chat model, since it has support for function calling, and "nomic-embed-text" for the embedding model, since the sample data has already been embedded with this model. If you cannot use function calling, then turn off "Advanced flow" in the Developer Settings. If you cannot use the embedding model, then turn off vector search in the Developer Settings.

 ### Running the frontend and backend

18 changes: 14 additions & 4 deletions evals/evaluate.py
@@ -1,3 +1,4 @@
+import argparse
 import logging
 import os
 from pathlib import Path
@@ -50,11 +51,20 @@ def get_openai_config() -> dict:
 )
 load_dotenv(".env", override=True)

+parser = argparse.ArgumentParser(description="Run evaluation with OpenAI configuration.")
+parser.add_argument("--targeturl", type=str, help="Specify the target URL.")
+parser.add_argument("--resultsdir", type=Path, help="Specify the results directory.")
+parser.add_argument("--numquestions", type=int, help="Specify the number of questions.")
+
+args = parser.parse_args()
+
 openai_config = get_openai_config()
-# TODO: specify the localhost URL using argument
-# TODO: specify the experiment name (based on PR number)
-# TODO: Specify the num questions using argument

 run_evaluate_from_config(
-    working_dir=Path(__file__).parent, config_path="eval_config.json", openai_config=openai_config, num_questions=2
+    working_dir=Path(__file__).parent,
+    config_path="eval_config.json",
+    num_questions=args.numquestions,
+    target_url=args.targeturl,
+    results_dir=args.resultsdir,
+    openai_config=openai_config,
 )
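
To see how the three new flags behave, the parser from this hunk can be exercised on its own. The sketch below copies the same `add_argument` calls into a hypothetical `build_parser` function and feeds it the flag values the workflow uses, with the PR number written out as `93` for illustration. It also shows that every omitted flag parses to `None`, so whatever receives these values has to tolerate `None` or supply its own defaults.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Same three flags as evals/evaluate.py, wrapped in a function for reuse."""
    parser = argparse.ArgumentParser(description="Run evaluation with OpenAI configuration.")
    parser.add_argument("--targeturl", type=str, help="Specify the target URL.")
    parser.add_argument("--resultsdir", type=Path, help="Specify the results directory.")
    parser.add_argument("--numquestions", type=int, help="Specify the number of questions.")
    return parser


# Flags as the workflow passes them, with a literal PR number substituted.
workflow_args = build_parser().parse_args(
    ["--targeturl=http://127.0.0.1:8000/chat", "--numquestions=2", "--resultsdir=results/pr93"]
)
print(workflow_args.targeturl, workflow_args.numquestions, workflow_args.resultsdir)

# With no flags at all, every attribute is None because no defaults are declared.
empty_args = build_parser().parse_args([])
print(empty_args)
```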
1 change: 1 addition & 0 deletions evals/requirements.txt
@@ -0,0 +1 @@
+git+https://github.com/Azure-Samples/ai-rag-chat-evaluator/@installable
2 changes: 1 addition & 1 deletion infra/main.bicep
@@ -449,7 +449,7 @@ output AZURE_OPENAI_RESOURCE_GROUP string = deployAzureOpenAI ? openAIResourceGr
 output AZURE_OPENAI_ENDPOINT string = !empty(azureOpenAIEndpoint)
   ? azureOpenAIEndpoint
   : (deployAzureOpenAI ? openAI.outputs.endpoint : '')
-output AZURE_OPENAI_VERSION string = openAIEmbedHost == 'chat' ? azureOpenAIAPIVersion : ''
+output AZURE_OPENAI_VERSION string = azureOpenAIAPIVersion
 output AZURE_OPENAI_CHAT_DEPLOYMENT string = deployAzureOpenAI ? chatDeploymentName : ''
 output AZURE_OPENAI_CHAT_DEPLOYMENT_VERSION string = deployAzureOpenAI ? chatDeploymentVersion : ''
 output AZURE_OPENAI_CHAT_DEPLOYMENT_CAPACITY int = deployAzureOpenAI ? chatDeploymentCapacity : 0