“resumai” is a natural language processing and text classification
toolkit, packaged as a Max mxj class.
For the moment, “resumai” does summarization (natural-language-based or
Bayesian), categorization, key phrase generation, part-of-speech
tagging (VERY USEFUL), anaphora resolution (i.e., matching pronouns
with the proper names they refer to), identification of place and human
names, and sentence boundary detection. It will soon do document
clustering by similarity and use vectors instead of text. It will also
soon use Jitter matrices, so that data stays inside Max rather than
going through text files.
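To give an idea of what sentence boundary detection (one of the features listed above) means in practice, here is a minimal Java sketch. It is not taken from resumai and does not use its API: the class name SentenceSplitter and the method split are hypothetical, and the technique shown is simply the standard java.text.BreakIterator from the JDK, which may differ from what resumai does internally.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: not part of resumai.
public class SentenceSplitter {

    // Splits text into sentences using the JDK's rule-based BreakIterator.
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        // Walk from one sentence boundary to the next until none are left.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "It does summarization. It will soon do clustering. Stay tuned!";
        for (String sentence : split(text)) {
            System.out.println(sentence);
        }
    }
}
```

A rule-based splitter like this is easy to run anywhere a JVM is available, but it can be fooled by abbreviations, so a dedicated NLP toolkit may well take a different approach.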
You can feed it a Max symbol or have it read directly from a possibly
large file. Supported file formats are .txt, .htm, .html, .pdf, .doc,
.abw and .ppt.
You can download it from:
ftp://ftp.forumnet.ircam.fr/pub/max/MXJ/resumai.zip
I hope to have more good news soon. Meanwhile, I’d be happy if you
helped debug it.
olivier.