Local sparsity control for Naive Bayes with extreme misclassification costs #20

@flrngel

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.5667&rep=rep1&type=pdf

1. Introduction

  • In the text domain, the number of features is excessive
  • Traditionally, sparsity was controlled by cutting off features below a threshold
  • This paper suggests that local approaches to feature selection have potential benefits

4. Sparsity control via feature selection

  • A global sparsity cut-off (based on feature ranking) is better than a feature-count cut-off
  • The local approach is not always better than the global approach, but it seems to be better in many cases
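
To make the global/local distinction concrete, here is a minimal sketch. It assumes that "local" means keeping the k best-ranked features present in each individual document, and that "global" means keeping one fixed top-k feature set for the whole vocabulary. The function names, the ranking scores, and the sample document are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def global_select(scores: Dict[str, float], k: int) -> Set[str]:
    """Keep the k highest-scoring features over the whole vocabulary."""
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return set(ranked[:k])


def apply_global(doc: List[str], kept: Set[str]) -> List[str]:
    """Drop every token whose feature is not in the global set."""
    return [t for t in doc if t in kept]


def local_select(doc: List[str], scores: Dict[str, float], k: int) -> List[str]:
    """Keep tokens of the k highest-scoring distinct features in this document."""
    distinct = sorted(set(doc), key=lambda f: (-scores.get(f, 0.0), f))
    kept = set(distinct[:k])
    return [t for t in doc if t in kept]


# Hypothetical feature-ranking scores (higher means more informative).
scores = {
    "free": 0.9, "winner": 0.8, "meeting": 0.7,
    "report": 0.3, "lunch": 0.2, "the": 0.1,
}
doc = ["the", "report", "lunch", "the"]

kept_globally = global_select(scores, 2)
global_view = apply_global(doc, kept_globally)  # no token survives the global cut
local_view = local_select(doc, scores, 2)       # the document keeps its own best two
```

Under these assumptions, the global cut leaves this document with no features at all, while the local cut keeps the same number of distinct features for every document that has at least k of them.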

6. Datasets

6.2. Model comparison

  • NBLOC is best

7. Results

image

8. Conclusions

  • The standard Naive Bayes classifier has a propensity to make errors with high confidence
    • especially in the text domain, where overconfidence can come from the large dimensionality of the feature space
  • The paper proposes using a local, document-specific approach
  • Local feature selection may be preferable, depending on which dataset and feature ranking function are considered
  • Naive Bayes could perform better with document-specific feature selection under extreme misclassification cost settings
  • The paper shows that Naive Bayes benefits from document length normalization and TF-IDF term weighting
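
The following sketch illustrates why length normalization can temper the overconfidence noted above. It assumes idf = log(N / df) and L2 normalization, which may differ from the weighting used in the paper. The per-term log-likelihood ratios in `llr` are invented for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_table(docs: List[List[str]]) -> Dict[str, float]:
    """Inverse document frequency, assumed here to be log(N / df)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(n / c) for t, c in df.items()}


def tfidf_normalized(doc: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """TF-IDF weights scaled to unit L2 length."""
    tf = Counter(doc)
    weights = {t: c * idf.get(t, 0.0) for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in weights.values()))
    if norm == 0.0:
        return weights
    return {t: v / norm for t, v in weights.items()}


def log_odds(weights: Dict[str, float], llr: Dict[str, float]) -> float:
    """Naive Bayes style score: weighted sum of per-term log-likelihood ratios."""
    return sum(w * llr.get(t, 0.0) for t, w in weights.items())


docs = [["spam", "offer"], ["offer", "meeting"], ["meeting", "notes", "notes"]]
idf = idf_table(docs)
llr = {"meeting": -0.5, "notes": -1.0}  # hypothetical values

doc = docs[2]
raw_once = log_odds(Counter(doc), llr)
raw_twice = log_odds(Counter(doc * 2), llr)  # same text repeated
norm_once = log_odds(tfidf_normalized(doc, idf), llr)
norm_twice = log_odds(tfidf_normalized(doc * 2, idf), llr)
```

With raw counts, repeating the same text doubles the log-odds, so longer documents push the posterior toward 0 or 1. With the normalized weights, the score is unchanged by the repetition.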
