Package miopia :: Package preparator :: Module LexicalProcessor :: Class LexicalProcessor

Class LexicalProcessor

object --+
         |
        LexicalProcessor

classdocs

Instance Methods
 
__init__(self, sentence_tokenizer, tokenizer, tagger, vocabulary_set=[])
Constructor
 
get_vocabulary_set(self)
 
_build_vocabulary_set(self, path_to_file)
 
extract_sentences(self, text)
Returns: A list of strings with the sentences of the text
 
_replications(self, token)
Returns: A list with the repeated chars in the token
 
_eliminate_replications(self, token, replications)
Returns: A valid word in the vocabulary if it exists, otherwise the original word.
 
_is_upper_intesification(self, token)
Returns: True if the token is written entirely in uppercase, False otherwise
 
_is_intensifier_replication(self, replications)
Returns: True if there are three or more replicated chars, False otherwise
 
extract_tokens(self, sentences)
Returns: A list of lists of tokens and a LexicalSentimentInfo instance with the lexical sentiment information for the text.
 
extract_tags(self, tokenized_sentences)
 
create_lexical_info_XML(self, dict_of_lsi, path_dest)
Writes to path_dest an XML representation of the LexicalSentimentInfo of the file
 
read_lexical_info_XML(self, input_path)
Returns: A dictionary of LexicalSentimentInfo

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Instance Variables
  _vocabulary_set
if vocabulary_set is None:...
Properties

Inherited from object: __class__

Method Details

__init__(self, sentence_tokenizer, tokenizer, tagger, vocabulary_set=[])
(Constructor)

Constructor

Parameters:
  • sentence_tokenizer - The sentence tokenizer loaded from tokenizers/punkt/spanish.pickle in nltk_data
  • tokenizer - An instance of nltk.tokenize.punkt.PunktWordTokenizer
  • tagger - The tagger stored in spanish_brill.pickle (included in this package), after unpickling
  • vocabulary_set - A Python set with the vocabulary
Overrides: object.__init__

_build_vocabulary_set(self, path_to_file)

Parameters:
  • path_to_file - A path to the file with the vocabulary of words.

    Example of the structure of a vocabulary file: Word1 Word2 ... WordN
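
    A minimal sketch of how a file with this structure could be read into a set. It assumes the words are separated by whitespace (spaces or newlines) and that the file is UTF-8 encoded; load_vocabulary is a hypothetical helper used for illustration, not the package's _build_vocabulary_set.

```python
import io


def load_vocabulary(path_to_file, encoding="utf-8"):
    """Read a whitespace-separated word list into a set (hypothetical helper)."""
    with io.open(path_to_file, encoding=encoding) as handle:
        # str.split() with no arguments splits on any run of whitespace,
        # so one-word-per-line and space-separated layouts both work.
        return set(handle.read().split())
```

    Using a set keeps the later membership checks (is this candidate a known word?) at constant average cost.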

extract_sentences(self, text)

Parameters:
  • text - A String
Returns:
A list of strings with the sentences of the text

_replications(self, token)

Parameters:
  • token - A String
Returns:
A list with the repeated chars in the token
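
One way to detect repeated characters is to group the token into runs of identical characters. The sketch below is an illustration under assumptions, not the package's implementation: find_replications is a hypothetical name, and returning (char, run_length) pairs is an assumed representation, since the documentation only states that a list with the repeated chars is returned.

```python
from itertools import groupby


def find_replications(token):
    """Return (char, run_length) pairs for every run of two or more equal chars."""
    runs = []
    for char, group in groupby(token):
        length = len(list(group))
        if length >= 2:
            runs.append((char, length))
    return runs
```

For example, find_replications("holaaa") gives [("a", 3)], and a token without repeated characters gives an empty list.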

_eliminate_replications(self, token, replications)

Parameters:
  • token - A String
  • replications - A list with the replicated chars of a token
Returns:
A valid word in the vocabulary if it exists, otherwise the original word.
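
The idea of mapping an elongated token back to a vocabulary word can be sketched as follows. This is a hypothetical approach, not necessarily the one used by the package: collapse_replications is an illustrative name, it takes the vocabulary as an explicit argument instead of the replications list, and it assumes that every run of repeated characters should shrink to one or two characters.

```python
import re
from itertools import product


def collapse_replications(token, vocabulary):
    """Return a vocabulary word obtained by shortening repeated chars, else token."""
    # Split the token into runs of identical characters: "holaaa" -> h, o, l, aaa.
    runs = [match.group(0) for match in re.finditer(r"(.)\1*", token)]
    # A single char stays as is; a longer run may shrink to one or two chars
    # (two covers legitimate doubles such as "ll" or "rr" in Spanish).
    options = [[run] if len(run) == 1 else [run[0], run[0] * 2] for run in runs]
    for candidate in product(*options):
        word = "".join(candidate)
        if word in vocabulary:
            return word
    # No candidate is a known word: keep the token unchanged.
    return token
```

With the vocabulary {"hola", "llama"}, the token "hoooolaaa" maps to "hola" and "llamaaa" maps to "llama", while an unknown token such as "xyzzz" is returned unchanged.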

_is_upper_intesification(self, token)

Parameters:
  • token - A String
Returns:
True if the token is written entirely in uppercase, False otherwise
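
A check of this kind can be sketched with str.isupper(). The function below is a hypothetical illustration; the package's method may apply additional conditions (for example a minimum token length) that are not documented here.

```python
def is_upper_intensification(token):
    """Return True when the token has cased letters and all of them are uppercase."""
    # str.isupper() is False for strings without cased characters ("123", "!!"),
    # so bare numbers and punctuation are not treated as emphasis.
    return token.isupper()
```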

_is_intensifier_replication(self, replications)

Parameters:
  • replications - A list with the replicated chars of a token
Returns:
True if there are three or more replicated chars, False otherwise
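
The threshold test can be sketched as below. Two assumptions are made that the documentation does not confirm: replications is taken to be a list of (char, run_length) pairs, and "three or more replicated chars" is read as "some character is repeated at least three times in a row". The name is_intensifier_replication and the threshold parameter are illustrative.

```python
def is_intensifier_replication(replications, threshold=3):
    """Return True if any run in `replications` reaches `threshold` repetitions."""
    # `replications` is assumed to be a list of (char, run_length) pairs.
    return any(length >= threshold for _char, length in replications)
```

Under this reading, a double letter such as the "ll" in "llama" does not count as intensification, while the "aaa" in "holaaa" does.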

extract_tokens(self, sentences)

Returns:
A list of lists of tokens and a LexicalSentimentInfo instance with the lexical sentiment information for the text. The LexicalSentimentInfo is None if no lexical sentiment information is found.

extract_tags(self, tokenized_sentences)

Parameters:
  • tokenized_sentences - A list of lists of tokens
Returns:
A list of tagged sentences. Each tagged sentence is a list of tuples (token, INfoTag)

create_lexical_info_XML(self, dict_of_lsi, path_dest)

Writes to path_dest an XML representation of the LexicalSentimentInfo of the file

read_lexical_info_XML(self, input_path)

Parameters:
  • input_path - A path to an XML file with the lexical sentiment info needed to build a dict of LexicalSentimentInfo
Returns:
A dictionary of LexicalSentimentInfo

Instance Variable Details

_vocabulary_set


if vocabulary_set is None:
    self._vocabulary_set = self._build_vocabulary_set(ConfigurationManager().getParameter("path_vocabulary_set"))
else:
    self._vocabulary_set = vocabulary_set
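
The fallback branch above runs only when vocabulary_set is None. With the documented default of vocabulary_set=[], omitting the argument stores an empty list and does not load the vocabulary file. The sketch below isolates that initialisation logic with a None default. VocabularyHolder and its loader argument are hypothetical stand-ins; loader replaces the ConfigurationManager lookup and file read so the example runs on its own.

```python
class VocabularyHolder(object):
    """Hypothetical stand-in that mirrors only the vocabulary initialisation."""

    def __init__(self, vocabulary_set=None, loader=None):
        # `loader` is a hypothetical zero-argument callable standing in for
        # the ConfigurationManager lookup plus file read of the real class.
        if vocabulary_set is None:
            self._vocabulary_set = loader() if loader is not None else set()
        else:
            self._vocabulary_set = vocabulary_set

    def get_vocabulary_set(self):
        return self._vocabulary_set
```

Passing an explicit value, even an empty one, bypasses the loader; only None triggers it.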