
The F1 score, the harmonic mean of precision and recall, is one of the most common evaluation metrics for sequence tagging (also called sequence labeling). In practice, several variants of the F1 score are reported for these tasks:

  • Micro F1 score
  • Macro F1 score
  • Weighted F1 score

In this blog post, I will explain what each of these F1 scores is and how to calculate it.

Micro F1 score

To compute the micro F1 score, we sum the true positives (TP), false positives (FP), and false negatives (FN) over all classes and then calculate a single, global F1 score from those totals.

Suppose we have three classes A, B, and C.

Class   TP   FP   FN
A       10    0    0
B        5   10    0
C        0    0   10
Total   15   10   10

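The micro F1 score is then computed from the summed counts:

$$P_{\text{micro}} = \frac{\sum_c TP_c}{\sum_c TP_c + \sum_c FP_c} = \frac{15}{15 + 10} = 0.6$$

$$R_{\text{micro}} = \frac{\sum_c TP_c}{\sum_c TP_c + \sum_c FN_c} = \frac{15}{15 + 10} = 0.6$$

$$F1_{\text{micro}} = \frac{2 \cdot P_{\text{micro}} \cdot R_{\text{micro}}}{P_{\text{micro}} + R_{\text{micro}}} = \frac{2 \cdot 0.6 \cdot 0.6}{0.6 + 0.6} = 0.6$$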
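As a rough illustration, the micro-averaging steps can be sketched in plain Python. The `counts` dictionary and the `micro_f1` function are names made up for this example; the numbers are the TP, FP, and FN values from the table above.

```python
# Per-class counts from the table above, stored as (TP, FP, FN).
# The dictionary layout is an assumption made for this sketch.
counts = {
    "A": (10, 0, 0),
    "B": (5, 10, 0),
    "C": (0, 0, 10),
}


def micro_f1(counts):
    """Pool TP, FP, and FN over all classes, then compute one global F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())

    # Guard against empty denominators (no predictions or no gold labels).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


micro = micro_f1(counts)
print(round(micro, 3))  # 0.6
```

Because the counts are pooled before the score is computed, classes with many instances influence the micro F1 score more than rare classes do.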

Macro F1 score

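To calculate the macro F1 score, we compute the F1 score of each class separately and then take the unweighted average. Suppose we have three classes, A, B, and C, with the following per-class F1 scores (illustrative values, not derived from the counts in the micro F1 example).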

Class   Per-class F1 score
A       0.8
B       0.5
C       0.7

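So, with N classes, the macro F1 calculation is:

$$F1_{\text{macro}} = \frac{1}{N} \sum_{c} F1_c$$

$$F1_{\text{macro}} = \frac{0.8 + 0.5 + 0.7}{3} \approx 0.667$$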
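A minimal Python sketch of macro averaging, using the per-class F1 scores from the table above. The helper `f1_from_counts` is a hypothetical name added here only to show where a per-class score would come from; it is not used on the table values, which are given directly.

```python
def f1_from_counts(tp, fp, fn):
    """Per-class F1 from raw counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(per_class_f1):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1.values()) / len(per_class_f1)


# Per-class F1 scores from the table above.
per_class_f1 = {"A": 0.8, "B": 0.5, "C": 0.7}

macro = macro_f1(per_class_f1)
print(round(macro, 3))  # 0.667

# Example of the helper: a class with TP=5, FP=10, FN=0.
print(f1_from_counts(5, 10, 0))  # 0.5
```

Every class contributes equally to the average here, regardless of how often it occurs in the dataset.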

Weighted F1 score

The weighted F1 score depends on two quantities:

  • Each class F1 score
  • Support of each class

Support: the support of a class is the number of actual occurrences of that class in the dataset. For example, a support of 10 for class A means that the dataset contains exactly 10 items whose true label is A.

Class   Per-class F1 score   Support   Support proportion
A       0.8                  3         0.3 (3 / total support)
B       0.5                  1         0.1
C       0.7                  6         0.6
Total   -                    10        -

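So, writing s_c for the support of class c, the weighted F1 calculation is:

$$F1_{\text{weighted}} = \sum_{c} \frac{s_c}{\sum_{k} s_k} \cdot F1_c$$

$$F1_{\text{weighted}} = 0.3 \cdot 0.8 + 0.1 \cdot 0.5 + 0.6 \cdot 0.7 = 0.24 + 0.05 + 0.42 = 0.71$$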
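The same calculation can be sketched in Python. The function name `weighted_f1` and the two dictionaries are assumptions made for this example; the values come from the table above.

```python
def weighted_f1(per_class_f1, support):
    """Average the per-class F1 scores, weighting each by its share of support."""
    total_support = sum(support.values())
    return sum(
        per_class_f1[label] * support[label] / total_support
        for label in per_class_f1
    )


# Per-class F1 scores and supports from the table above.
per_class_f1 = {"A": 0.8, "B": 0.5, "C": 0.7}
support = {"A": 3, "B": 1, "C": 6}

weighted = weighted_f1(per_class_f1, support)
print(round(weighted, 3))  # 0.71
```

If every class has the same support, this reduces to the macro F1 score. For label-level data, scikit-learn's `f1_score` exposes the three variants through `average="micro"`, `average="macro"`, and `average="weighted"`.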
