mrpawan-gupta/TextTo


Text Similarity: Cosine Similarity

The purpose of this example is to become familiar with text processing and information retrieval by:

  • Calculating term frequencies,
  • Tokenizing text into term vectors,
  • Calculating cosine similarity, and
  • Computing the vector (dot) product.
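Below is a minimal C++ sketch of the first two items: tokenizing a line and counting term frequencies. The function names `tokenize` and `termFrequency` and the tokenization rule (lowercase everything, split on any non-alphanumeric character) are assumptions for illustration, not taken from the repository's program.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical tokenizer: splits a line into lowercase alphanumeric tokens.
// Any other character (space, punctuation) ends the current token.
std::vector<std::string> tokenize(const std::string& line) {
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : line) {
        // Cast to unsigned char before calling <cctype> functions.
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);  // flush the last token
    }
    return tokens;
}

// Hypothetical helper: counts how often each token occurs (raw term frequency).
// The map is a sparse vector: absent terms implicitly have frequency 0.
std::map<std::string, int> termFrequency(const std::vector<std::string>& tokens) {
    std::map<std::string, int> tf;
    for (const auto& t : tokens) {
        ++tf[t];
    }
    return tf;
}
```

With this rule, `tokenize("The cat, the hat!")` yields `the`, `cat`, `the`, `hat`, and `termFrequency` maps `the` to 2 and the other two terms to 1.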

In this project, we'll:

  • Read two lines of text from two files, and
  • Tokenize them;
  • Read a list of stop words from another file, and
  • Filter them out;
  • Compute the cosine similarity of the two lines of text (using term frequencies), and
  • Write the result into a file.
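The reading, stop-word filtering, and writing steps can be sketched as follows. The helper names (`readFirstLine`, `loadStopWords`, `removeStopWords`, `writeResult`) are hypothetical, and the sketch assumes the stop-word file lists words separated by whitespace and that tokens have already been lowercased. The helpers take generic streams, so an `std::ifstream` or `std::ofstream` opened on the project's files, or a string stream in a quick test, can be passed in.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <ostream>
#include <set>
#include <sstream>  // string streams can stand in for files when testing
#include <string>
#include <vector>

// Reads the first line of a text source (e.g. one of the two input files).
// Returns an empty string if nothing could be read.
std::string readFirstLine(std::istream& in) {
    std::string line;
    std::getline(in, line);
    return line;
}

// Reads whitespace-separated stop words and lowercases them so they match
// lowercase tokens. The file layout is an assumption.
std::set<std::string> loadStopWords(std::istream& in) {
    std::set<std::string> stopWords;
    std::string word;
    while (in >> word) {
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        stopWords.insert(word);
    }
    return stopWords;
}

// Keeps only the tokens that are not stop words, in their original order.
std::vector<std::string> removeStopWords(const std::vector<std::string>& tokens,
                                         const std::set<std::string>& stopWords) {
    std::vector<std::string> kept;
    std::copy_if(tokens.begin(), tokens.end(), std::back_inserter(kept),
                 [&](const std::string& t) { return stopWords.count(t) == 0; });
    return kept;
}

// Writes the similarity score on its own line (e.g. to an std::ofstream).
void writeResult(std::ostream& out, double similarity) {
    out << similarity << '\n';
}
```

A `std::set` gives logarithmic lookups per token; an `std::unordered_set` would work equally well for a large stop-word list.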

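Cosine similarity measures how similar two vectors are by the angle between them. It is the dot product of the two vectors divided by the product of their magnitudes, which gives a value ranging from -1 to 1, where:

  • 1 means the vectors point in the same direction (the texts use the same terms in the same proportions),
  • -1 means they point in exactly opposite directions,
  • 0 means they are orthogonal (no similarity; for texts, no terms in common).

Because term frequencies are never negative, the similarity of two texts computed from frequencies always lies between 0 and 1.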
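Cosine similarity formula:

$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$

where $A_i$ and $B_i$ are the frequencies of the $i$-th term in the two texts.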
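The formula maps directly onto code. The sketch below assumes each text has already been turned into a sparse term-frequency map (term to count); the name `cosineSimilarity` and the choice to return 0 when either text has no terms are assumptions, since the formula is undefined for a zero vector.

```cpp
#include <cmath>
#include <map>
#include <string>

// Cosine similarity of two term-frequency vectors stored sparsely as
// term -> count maps. A term missing from a map has frequency 0, so only
// terms present in both maps contribute to the dot product.
double cosineSimilarity(const std::map<std::string, int>& a,
                        const std::map<std::string, int>& b) {
    double dot = 0.0;
    double normA = 0.0;  // sum of squared counts in a
    double normB = 0.0;  // sum of squared counts in b
    for (const auto& entry : a) {
        const double countA = entry.second;
        normA += countA * countA;
        auto it = b.find(entry.first);
        if (it != b.end()) {
            dot += countA * it->second;
        }
    }
    for (const auto& entry : b) {
        const double countB = entry.second;
        normB += countB * countB;
    }
    // Assumption: an empty text (zero vector) gets similarity 0,
    // because the formula would otherwise divide by zero.
    if (normA == 0.0 || normB == 0.0) {
        return 0.0;
    }
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}
```

For example, the maps {a: 1, b: 1} and {a: 1, c: 1} share only the term a, so the dot product is 1 and each magnitude is √2, giving 1 / (√2 · √2) = 0.5. Scaling a vector does not change the result: {a: 2, b: 2} compared with {a: 1, b: 1} gives 1.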

About

This repository contains a C++ program that calculates the cosine similarity between the texts of two documents.
