Nonparametric comparison of ROC curves: testing equivalence and clustering

Speaker:
Luboš Prchal
Abstract:
Receiver operating characteristic (ROC) curves are a widely used tool for summarizing the overall performance of diagnostic methods and classifiers that assign individuals to one of two groups. Typically, the individuals in one group have a feature of interest and are referred to as the positives, while the others lack the feature and are referred to as the negatives. The problem of testing the equivalence of two ROC curves will be addressed and illustrated on a real data set from the field of computational linguistics. A transformation of ROC curves is suggested that motivates a test statistic defined as a distance between two empirical quantile processes. Its asymptotic distribution is obtained and a simulation scheme for critical values is proposed. The procedure is applied to several ROC curves measuring the quality of automatic collocation extraction. It will be shown that the obtained p-values can be used as a distance between the curves, which enables clustering of ROC curves. Throughout the lecture we will illustrate the potential of our approach on two-word (bigram) collocation extraction, which can be viewed as a binary classification of bigrams into one of two categories: true collocations and non-collocations. This setting implies that ROC curves can be used to measure the quality of such procedures.
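For readers unfamiliar with the objects being compared, the following is a minimal sketch of how an empirical ROC curve can be computed from classifier scores. It is not the speaker's procedure and does not implement the equivalence test or the clustering; the function names `empirical_roc` and `trapezoid_auc` and the convention that higher scores indicate positives are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def empirical_roc(pos: Sequence[float], neg: Sequence[float]) -> List[Point]:
    """Return the (FPR, TPR) points of the empirical ROC curve.

    Assumption: an item is classified as positive when its score is
    greater than or equal to the threshold.
    """
    if not pos or not neg:
        raise ValueError("both score samples must be non-empty")
    # Every observed score is a candidate threshold, from strictest to loosest.
    thresholds = sorted(set(pos) | set(neg), reverse=True)
    points: List[Point] = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos) / len(pos)  # true positive rate
        fpr = sum(s >= t for s in neg) / len(neg)  # false positive rate
        points.append((fpr, tpr))
    return points


def trapezoid_auc(points: Sequence[Point]) -> float:
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

In the collocation setting, `pos` would hold the scores an association measure assigns to true collocations and `neg` the scores it assigns to non-collocations; each association measure then yields one ROC curve.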
Length:
01:41:20
Date:
31/03/2008
