Local sparsity control for Naive Bayes with extreme misclassification costs

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.5667&rep=rep1&type=pdf

# 1. Introduction
- In the text domain, the number of features is excessive
- Traditionally, sparsity has been controlled by cutting off features at a threshold
- This paper suggests that local approaches to feature selection have potential benefits

# 4. Sparsity control via feature selection
- A global sparsity cut-off (based on feature ranking) works better than a feature count cut-off
- The local approach cannot be said to always beat the global approach, but it seems to do better in many cases
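To make the global/local distinction concrete, here is a minimal sketch. It assumes that "local" means ranking features separately for each class, which these notes do not spell out, and it uses a simple document-frequency score as a stand-in for the paper's ranking functions. The corpus and the names `global_top_k` and `local_top_k` are hypothetical.

```python
from collections import Counter

# Toy corpus of (tokens, label) pairs; purely illustrative data.
DOCS = [
    (["cheap", "pills", "buy"], "spam"),
    (["buy", "cheap", "watch"], "spam"),
    (["meeting", "agenda", "buy"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]


def doc_freq(docs):
    """Count in how many documents each term occurs."""
    df = Counter()
    for tokens, _ in docs:
        df.update(set(tokens))
    return df


def global_top_k(docs, k):
    """Global selection: one ranking over the whole corpus.

    Assumption: document frequency stands in for the ranking function.
    """
    df = doc_freq(docs)
    ranked = sorted(df, key=lambda t: (-df[t], t))  # ties broken alphabetically
    return set(ranked[:k])


def local_top_k(docs, k):
    """Local selection (assumed here to be per class).

    Each class ranks terms by in-class minus out-of-class document frequency
    and keeps its own top k.
    """
    selected = {}
    for c in {label for _, label in docs}:
        inside = doc_freq([d for d in docs if d[1] == c])
        outside = doc_freq([d for d in docs if d[1] != c])
        vocab = set(inside) | set(outside)
        score = {t: inside[t] - outside[t] for t in vocab}
        ranked = sorted(vocab, key=lambda t: (-score[t], t))
        selected[c] = set(ranked[:k])
    return selected


if __name__ == "__main__":
    print("global:", sorted(global_top_k(DOCS, 2)))
    for label, feats in sorted(local_top_k(DOCS, 2).items()):
        print(label, sorted(feats))
```

On this toy corpus the global ranking keeps only terms that are frequent overall, while the per-class ranking gives the "ham" class its own features (`agenda`, `meeting`) that the global cut-off would drop.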

# 6. Datasets
## 6.2. Model comparison
- NBLOC performs best among the compared models
![image](https://user-images.githubusercontent.com/2807595/45404972-0a5fbd00-b69c-11e8-817d-39944ea3f3a1.png)
![image](https://user-images.githubusercontent.com/2807595/45404980-13508e80-b69c-11e8-9ad9-6fcc969770d5.png)

# 7. Results
![image](https://user-images.githubusercontent.com/2807595/45404958-fe73fb00-b69b-11e8-836a-15e0392a7577.png)

# 8. Conclusions
- The standard Naive Bayes classifier has a propensity to make errors with high confidence
  - Especially in the text domain, where overconfidence can come from the high dimensionality of the feature space
- The paper advocates local and document-specific approaches to feature selection
- Local feature selection may be preferable, depending on the dataset and the feature ranking function considered
- Naive Bayes can perform better with document-specific feature selection in cost-sensitive settings
- The paper shows that Naive Bayes benefits from document length normalization and TF-IDF term weighting
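As a rough illustration of the last point, the sketch below computes TF-IDF weights and length-normalizes each document vector. The exact weighting and normalization used in the paper are not recorded in these notes, so the formula `tf * log(N / df)` and the L2 norm are assumptions, and `tfidf_normalized` is a hypothetical helper name.

```python
import math
from collections import Counter


def tfidf_normalized(docs):
    """Return one {term: weight} dict per document.

    Assumptions (not taken from the paper): weight = tf * log(N / df),
    followed by L2 normalization so that document length does not
    dominate the feature values.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency of each term

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        raw = {t: tf[t] * math.log(n / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        # Guard against an all-zero vector (every term occurs in every document).
        vectors.append({t: (w / norm if norm > 0 else 0.0) for t, w in raw.items()})
    return vectors


if __name__ == "__main__":
    for vec in tfidf_normalized([["a", "b", "b"], ["a", "c"]]):
        print(vec)
```

A term that occurs in every document gets weight zero under this formula, and every non-empty vector ends up with unit length regardless of how long the original document was.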
