Commit a30de41

add WER calculation tutorial
1 parent 8f21ae7 commit a30de41

7 files changed, +92 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
 - [Tokenization, Stemming, and Lemmatization in Python](https://www.thepythoncode.com/article/tokenization-stemming-and-lemmatization-in-python). ([code](machine-learning/nlp/tokenization-stemming-lemmatization))
 - [How to Fine Tune BERT for Semantic Textual Similarity using Transformers in Python](https://www.thepythoncode.com/article/finetune-bert-for-semantic-textual-similarity-in-python). ([code](machine-learning/nlp/semantic-textual-similarity))
 - [How to Calculate the BLEU Score in Python](https://www.thepythoncode.com/article/bleu-score-in-python). ([code](machine-learning/nlp/bleu-score))
+- [Word Error Rate in Python](https://www.thepythoncode.com/article/calculate-word-error-rate-in-python). ([code](machine-learning/nlp/wer-score))
 - ### [Computer Vision](https://www.thepythoncode.com/topic/computer-vision)
 - [How to Detect Human Faces in Python using OpenCV](https://www.thepythoncode.com/article/detect-faces-opencv-python). ([code](machine-learning/face_detection))
 - [How to Make an Image Classifier in Python using TensorFlow and Keras](https://www.thepythoncode.com/article/image-classification-keras-python). ([code](machine-learning/image-classifier))
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+# [Word Error Rate in Python](https://www.thepythoncode.com/article/calculate-word-error-rate-in-python)
+- `pip install -r requirements.txt`
+- `wer_basic.py` is a basic implementation of the WER algorithm.
+- `wer_accurate.py` is an accurate implementation of the WER algorithm.
+- `wer_jiwer.py` computes WER using [jiwer](https://pypi.org/project/jiwer/).
+- `wer_evaluate.py` computes WER using [evaluate](https://pypi.org/project/evaluate/).
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+numpy
+jiwer
+evaluate
Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
import numpy as np
2+
3+
def calculate_wer(reference, hypothesis):
4+
# Split the reference and hypothesis sentences into words
5+
ref_words = reference.split()
6+
hyp_words = hypothesis.split()
7+
# Initialize a matrix with size |ref_words|+1 x |hyp_words|+1
8+
# The extra row and column are for the case when one of the strings is empty
9+
d = np.zeros((len(ref_words) + 1, len(hyp_words) + 1))
10+
# The number of operations for an empty hypothesis to become the reference
11+
# is just the number of words in the reference (i.e., deleting all words)
12+
for i in range(len(ref_words) + 1):
13+
d[i, 0] = i
14+
# The number of operations for an empty reference to become the hypothesis
15+
# is just the number of words in the hypothesis (i.e., inserting all words)
16+
for j in range(len(hyp_words) + 1):
17+
d[0, j] = j
18+
# Iterate over the words in the reference and hypothesis
19+
for i in range(1, len(ref_words) + 1):
20+
for j in range(1, len(hyp_words) + 1):
21+
# If the current words are the same, no operation is needed
22+
# So we just take the previous minimum number of operations
23+
if ref_words[i - 1] == hyp_words[j - 1]:
24+
d[i, j] = d[i - 1, j - 1]
25+
else:
26+
# If the words are different, we consider three operations:
27+
# substitution, insertion, and deletion
28+
# And we take the minimum of these three possibilities
29+
substitution = d[i - 1, j - 1] + 1
30+
insertion = d[i, j - 1] + 1
31+
deletion = d[i - 1, j] + 1
32+
d[i, j] = min(substitution, insertion, deletion)
33+
# The minimum number of operations to transform the hypothesis into the reference
34+
# is in the bottom-right cell of the matrix
35+
# We divide this by the number of words in the reference to get the WER
36+
wer = d[len(ref_words), len(hyp_words)] / len(ref_words)
37+
return wer
38+
39+
40+
41+
if __name__ == "__main__":
42+
reference = "The cat is sleeping on the mat."
43+
hypothesis = "The cat is playing on mat."
44+
print(calculate_wer(reference, hypothesis))
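
The matrix-based implementation above returns only the final rate. As a minimal sketch that is not part of this commit, the hypothetical helper `wer_breakdown` below fills the same kind of table with plain Python lists and then walks back from the bottom-right cell to count substitutions, deletions, and insertions along one optimal alignment. The function name and the tie-breaking order (substitution, then deletion, then insertion) are assumptions made for illustration.

```python
def wer_breakdown(reference, hypothesis):
    """Return (substitutions, deletions, insertions, wer) for two sentences."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i][j - 1] + 1, d[i - 1][j] + 1)
    # Walk back from the bottom-right cell along one optimal alignment.
    subs = dels = ins = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1          # match, no error
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            subs += 1                    # reference word replaced
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1                    # reference word missing from hypothesis
            i -= 1
        else:
            ins += 1                     # extra word in hypothesis
            j -= 1
    return subs, dels, ins, (subs + dels + ins) / len(ref)


subs, dels, ins, wer = wer_breakdown(
    "The cat is sleeping on the mat.", "The cat is playing on mat."
)
print(subs, dels, ins, round(wer, 4))  # 1 1 0 0.2857
```

For the example sentences used in this commit, the alignment contains one substitution ("sleeping" to "playing") and one deletion ("the"), so the rate is 2 errors over 7 reference words.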
Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
def calculate_wer(reference, hypothesis):
2+
ref_words = reference.split()
3+
hyp_words = hypothesis.split()
4+
5+
# Counting the number of substitutions, deletions, and insertions
6+
substitutions = sum(1 for ref, hyp in zip(ref_words, hyp_words) if ref != hyp)
7+
deletions = len(ref_words) - len(hyp_words)
8+
insertions = len(hyp_words) - len(ref_words)
9+
10+
# Total number of words in the reference text
11+
total_words = len(ref_words)
12+
13+
# Calculating the Word Error Rate (WER)
14+
wer = (substitutions + deletions + insertions) / total_words
15+
return wer
16+
17+
18+
if __name__ == "__main__":
19+
reference = "the cat sat on the mat"
20+
hypothesis = "the cat mat"
21+
print(calculate_wer(reference, hypothesis))
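
A position-by-position comparison like the one above only inspects word pairs that share an index, so it can disagree with an alignment-based count. The sketch below is illustrative and not part of the commit; `edit_distance` is a hypothetical helper. It contrasts the two views on the same example pair.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance using a single rolling row."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j - 1] + cost, curr[j - 1] + 1, prev[j] + 1))
        prev = curr
    return prev[-1]


ref_words = "the cat sat on the mat".split()
hyp_words = "the cat mat".split()

# zip() stops at the shorter sentence, so only three word pairs are compared.
positional_mismatches = sum(r != h for r, h in zip(ref_words, hyp_words))
# Alignment instead finds that "sat", "on" and "the" were dropped.
aligned_errors = edit_distance(ref_words, hyp_words)

print(positional_mismatches)             # 1 ("sat" vs "mat")
print(aligned_errors / len(ref_words))   # 0.5 (3 deletions / 6 reference words)
```

The positional view reports a substitution that the alignment does not contain, which is why the edit-distance formulation is generally preferred for WER.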
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
import evaluate
2+
3+
wer = evaluate.load("wer")
4+
5+
# reference = "the cat sat on the mat"
6+
# hypothesis = "the cat mat"
7+
reference = "The cat is sleeping on the mat."
8+
hypothesis = "The cat is playing on mat."
9+
print(wer.compute(references=[reference], predictions=[hypothesis]))
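
The call above passes lists, so several sentence pairs can be scored at once. As far as I understand the metric, it pools errors and reference words across all pairs rather than averaging per-sentence rates; treat that as an assumption to verify against the library documentation. The plain-Python sketch below (the names `corpus_wer` and `edit_distance` are hypothetical) shows how the two aggregation choices can differ.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance using a single rolling row."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j - 1] + cost, curr[j - 1] + 1, prev[j] + 1))
        prev = curr
    return prev[-1]


def corpus_wer(references, predictions):
    """Pooled WER: total errors divided by total reference words."""
    errors = sum(
        edit_distance(r.split(), p.split()) for r, p in zip(references, predictions)
    )
    total_words = sum(len(r.split()) for r in references)
    return errors / total_words


references = ["the cat sat on the mat", "hello world"]
predictions = ["the cat mat", "hello world"]

pooled = corpus_wer(references, predictions)
per_sentence = [corpus_wer([r], [p]) for r, p in zip(references, predictions)]
mean_of_rates = sum(per_sentence) / len(per_sentence)

print(pooled)         # 0.375 (3 errors / 8 reference words)
print(mean_of_rates)  # 0.25  (average of 0.5 and 0.0)
```

Pooling weights long sentences more heavily, while a mean of per-sentence rates treats every sentence equally.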
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
from jiwer import wer
2+
3+
if __name__ == "__main__":
4+
# reference = "the cat sat on the mat"
5+
# hypothesis = "the cat mat"
6+
reference = "The cat is sleeping on the mat."
7+
hypothesis = "The cat is playing on mat."
8+
print(wer(reference, hypothesis))
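
The example sentences keep their capitalization and trailing period, so tokens such as `mat.` and `mat` count as different words when split on whitespace. A minimal sketch of text normalization before scoring follows. The `normalize` helper is hypothetical and uses only the standard library; jiwer documents its own transformation utilities, which are not shown here.

```python
import string


def normalize(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


reference = "The cat is sleeping on the mat."
hypothesis = "the cat is sleeping on the mat"

# Without normalization, "The"/"the" and "mat."/"mat" are counted as mismatches.
raw_mismatches = sum(
    r != h for r, h in zip(reference.split(), hypothesis.split())
)
# After normalization the two sentences are word-for-word identical.
clean_mismatches = sum(
    r != h for r, h in zip(normalize(reference), normalize(hypothesis))
)

print(raw_mismatches, clean_mismatches)  # 2 0
```

Whether to normalize depends on the task: for speech recognition output, case and punctuation differences are often not meant to count as word errors.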
