Graphical techniques for text mining.
TextGraphics is a Python module for graphical methods in text mining. From a corpus of text files it can build two types of graphs: a sentence graph and a keyword graph.
Edges are determined by a chosen threshold on cosine similarity or co-occurrence count.
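The threshold rule above can be sketched as follows. This is a minimal stand-alone illustration, not the TextGraphics code: it assumes bag-of-words cosine similarity between sentences, within-sentence co-occurrence counts between words, and keeps an edge when the score reaches the threshold. The names `sentence_graph`, `keyword_graph` and `tokenize` are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase word runs
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sentence_graph(sentences, threshold=0.1):
    # Nodes are sentence indices; keep pairs whose similarity reaches the threshold
    vecs = [Counter(tokenize(s)) for s in sentences]
    edges = {}
    for i, j in combinations(range(len(sentences)), 2):
        sim = cosine(vecs[i], vecs[j])
        if sim >= threshold:
            edges[(i, j)] = sim
    return edges


def keyword_graph(sentences, min_count=1):
    # Nodes are words; count how many sentences contain each word pair
    counts = Counter()
    for s in sentences:
        words = sorted(set(tokenize(s)))
        counts.update(combinations(words, 2))
    return {pair: c for pair, c in counts.items() if c >= min_count}


docs = ["the cat sat on the mat", "the cat ate the fish", "stocks fell sharply today"]
print(sentence_graph(docs, threshold=0.3))
print(keyword_graph(docs, min_count=2))  # {('cat', 'the'): 2}
```

Raising either threshold makes the graph sparser; the right value depends on the corpus.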
The Analysis package contains code for studying basic properties of the graphs and for plotting them. It also implements the Girvan-Newman algorithm for detecting communities in a graph.
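Girvan-Newman repeatedly removes the edge with the highest betweenness (the edge lying on the most shortest paths) until the graph falls apart into communities. The pure-Python sketch below, assuming an unweighted, undirected graph without self-loops, performs one such split; it is not the TextGraphics implementation, and `girvan_newman_split` is a hypothetical name. NetworkX offers a ready-made `networkx.algorithms.community.girvan_newman` as well.

```python
from collections import deque


def edge_betweenness(adj):
    # Brandes-style shortest-path counting on an unweighted, undirected graph
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS from s, counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    return bet  # each pair is counted from both ends; the ranking is unaffected


def components(adj):
    # Connected components via BFS
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def girvan_newman_split(edges):
    # Remove highest-betweenness edges until the number of components grows
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = len(components(adj))
    while any(adj.values()):
        bet = edge_betweenness(adj)
        u, v = tuple(max(bet, key=bet.get))
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > start:
            return comps
    return components(adj)


# Two triangles joined by the bridge (3, 4)
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(girvan_newman_split(edges))  # [{1, 2, 3}, {4, 5, 6}]
```

Calling the split again on each resulting community yields the hierarchical decomposition the algorithm is known for.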
Applications include automatic summarization based on LexRank, and topic modeling.
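LexRank scores each sentence by its centrality in the sentence-similarity graph: it computes the stationary distribution of a random walk on the thresholded graph (with teleportation, as in PageRank) and picks the top-ranked sentences. The sketch below is a simplified stand-alone version, assuming plain term-frequency cosine (the original LexRank uses an idf-modified cosine); `lexrank_scores` and `lexrank_summary` are hypothetical names, not the TextGraphics `LexRank` class.

```python
import math
import re
from collections import Counter


def _cosine(a, b):
    # Term-frequency cosine (simplification of LexRank's idf-modified cosine)
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexrank_scores(sentences, threshold=0.1, damping=0.85, iters=100, tol=1e-10):
    n = len(sentences)
    if n == 0:
        return []
    vecs = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    # Row-stochastic transition matrix of the thresholded graph (self-loops included)
    matrix = []
    for i in range(n):
        row = [1.0 if _cosine(vecs[i], vecs[j]) >= threshold else 0.0 for j in range(n)]
        deg = sum(row)
        # A sentence with no tokens has no edges; let it jump uniformly
        matrix.append([x / deg for x in row] if deg else [1.0 / n] * n)
    # Power iteration for the stationary distribution
    p = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n + damping * sum(p[i] * matrix[i][j] for i in range(n))
               for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, p)) < tol
        p = new
        if converged:
            break
    return p


def lexrank_summary(sentences, k=1, **kwargs):
    scores = lexrank_scores(sentences, **kwargs)
    # Keep the k highest-scoring sentences, then restore document order
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


text = ["the cat sat on the mat", "the cat ate the fish",
        "a dog sat on a mat", "stocks fell sharply today"]
print(lexrank_summary(text, k=1))  # ['the cat sat on the mat']
```

The first sentence wins because it is the only one similar to both of the others.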
Dependencies:
Installation:
pip install textgraphics
Usage:
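from TextGraphics.src.graph import TextGraph
from TextGraphics.Applications.summary import LexRank

# corpus: the input text corpus (see testCode.py for the expected format)

# Build the sentence graph and the keyword graph
g = TextGraph(corpus)
senGraph = g.sentenceGraph()
keyGraph = g.keywordGraph()

# Summarize the corpus with LexRank
lR = LexRank(corpus)
lR.summary()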
See testCode.py for usage.