TF-IDF

You can select multiple plaintext files:

Configuration

Term Frequency
$0, 1$
$f_{t,d}$
$f_{t,d} / \sum_{t' \in d} f_{t',d}$
$\log(1 + f_{t,d})$
$1 + \log f_{t,d}$
$0.5 + 0.5 \cdot \frac{f_{t,d}}{\max_{t' \in d} f_{t',d}}$
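
The term-frequency options above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual code: it assumes $f_{t,d}$ is the raw count of term $t$ in document $d$, that a document is already a list of tokens, and the function name `term_frequencies` and the scheme labels are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(tokens, scheme="raw"):
    """Return {term: weight} for one tokenized document.

    The scheme names are illustrative labels for the six options, in order.
    Terms absent from the document are simply missing from the result,
    which corresponds to a weight of 0.
    """
    counts = Counter(tokens)          # f_{t,d} for every term t in d
    if not counts:
        return {}
    total = sum(counts.values())      # sum over t' in d of f_{t',d}
    peak = max(counts.values())       # max over t' in d of f_{t',d}
    schemes = {
        "binary": lambda f: 1.0,                    # 0, 1
        "raw": lambda f: float(f),                  # f_{t,d}
        "relative": lambda f: f / total,            # f_{t,d} / sum
        "log1p": lambda f: math.log(1 + f),         # log(1 + f_{t,d})
        "log": lambda f: 1 + math.log(f),           # 1 + log f_{t,d}
        "double": lambda f: 0.5 + 0.5 * f / peak,   # double normalization
    }
    weight = schemes[scheme]
    return {term: weight(f) for term, f in counts.items()}
```

For example, `term_frequencies(["a", "a", "b"], "double")` weights each term relative to the most frequent term in the same document, so long documents do not dominate purely through their length.
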
Inverse Document Frequency
$1$
$\log \frac{N}{n_t}$
$\log\left(\frac{N}{1 + n_t}\right) + 1$
$\log\left(\frac{\max_{t' \in d} n_{t'}}{1 + n_t}\right)$
$\log \frac{N - n_t}{n_t}$
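
A comparable sketch for the inverse-document-frequency options, assuming $N$ is the number of documents and $n_t$ the number of documents that contain term $t$. The helper names `document_frequencies` and `idf` and the scheme labels are hypothetical. The fourth option depends on the document $d$ being weighted, so the sketch takes that document's terms as an extra argument.

```python
import math
from collections import Counter


def document_frequencies(documents):
    """Map each term to n_t, the number of documents that contain it."""
    return Counter(term for doc in documents for term in set(doc))


def idf(term, n, N, scheme="idf", doc_terms=()):
    """Inverse document frequency of `term` under the chosen scheme.

    n         -- mapping term -> n_t (as built by document_frequencies)
    N         -- total number of documents
    doc_terms -- terms of the current document d, used only by "max"
    """
    n_t = n[term]
    if scheme == "unary":
        return 1.0
    if scheme == "idf":
        # Undefined for n_t == 0; this sketch lets the division error surface.
        return math.log(N / n_t)
    if scheme == "smooth":
        return math.log(N / (1 + n_t)) + 1
    if scheme == "max":
        return math.log(max(n[t] for t in doc_terms) / (1 + n_t))
    if scheme == "probabilistic":
        # Undefined when every document contains the term (N == n_t).
        return math.log((N - n_t) / n_t)
    raise ValueError(f"unknown scheme: {scheme}")
```

The plain and probabilistic variants have no finite value at the edge cases noted in the comments; how to handle them (skip the term, clamp, or use the smoothed variant) is a design choice this sketch leaves open.
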
Common TF-IDF Presets
count-idf: $f_{t,d} \cdot \log \frac{N}{n_t}$
double normalization-idf: $\left(0.5 + 0.5 \cdot \frac{f_{t,d}}{\max_{t' \in d} f_{t',d}}\right) \cdot \log \frac{N}{n_t}$
log normalization-idf: $\left(1 + \log f_{t,d}\right) \cdot \log \frac{N}{n_t}$
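
The three presets could be computed over a small corpus roughly as follows. This is a sketch under assumptions: documents are pre-tokenized lists, and the function name `tf_idf` and the preset keys `"count"`, `"double"`, and `"log"` are illustrative names, not identifiers from the tool.

```python
import math
from collections import Counter


def tf_idf(documents, preset="count"):
    """Return one {term: weight} dict per document for the given preset."""
    N = len(documents)                                   # number of documents
    n = Counter(t for doc in documents for t in set(doc))  # n_t per term
    weighted = []
    for doc in documents:
        counts = Counter(doc)                            # f_{t,d}
        peak = max(counts.values(), default=0)           # max f_{t',d} in d
        row = {}
        for term, f in counts.items():
            if preset == "count":
                tf = f
            elif preset == "double":
                tf = 0.5 + 0.5 * f / peak
            elif preset == "log":
                tf = 1 + math.log(f)
            else:
                raise ValueError(f"unknown preset: {preset}")
            # Every term seen here occurs in at least one document, so n_t >= 1.
            row[term] = tf * math.log(N / n[term])
        weighted.append(row)
    return weighted


corpus = [["a", "a", "b"], ["a", "c"]]
weights = tf_idf(corpus, preset="double")
```

Because all three presets share the factor $\log \frac{N}{n_t}$, a term that appears in every document gets weight 0 regardless of how often it occurs.
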

References: