Rule 1 (Dictionary table creation rule): First, we extract the body text of a PPT document and segment it with an existing morphological analyzer (Toutanova et al., 2003; Zhang et al., 2005), which performs both word segmentation and part-of-speech tagging. Next, we select all content words to obtain a crude dictionary table in which each word and its part of speech occupy a single line. Identical words with the same part of speech are merged into one line, and their number of occurrences is recorded as an extra attribute. The basic form of the dictionary table thus consists of three fields: Part-of-speech, Word, and Occurrences.
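As an illustration of Rule 1, the following minimal sketch builds such a table from text that has already been segmented and tagged. It is not the implementation used here: the function name `build_dictionary_table`, the input format (a list of `(word, tag)` pairs), and the choice of Penn Treebank-style tag prefixes to identify content words are all assumptions made for the example.

```python
from collections import Counter

# Assumption: Penn Treebank-style tags. Tags beginning with these prefixes
# (nouns, verbs, adjectives, adverbs) are treated as content words.
CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")


def build_dictionary_table(tagged_tokens):
    """Return rows of (part_of_speech, word, occurrences) for content words.

    `tagged_tokens` is a list of (word, tag) pairs, i.e. the hypothetical
    output of a morphological analyzer. Words are compared as-is; no
    lowercasing or lemmatization is applied.
    """
    # Count each (tag, word) pair so that identical words with the same
    # part of speech are merged into a single row.
    counts = Counter(
        (tag, word)
        for word, tag in tagged_tokens
        if tag.startswith(CONTENT_TAG_PREFIXES)
    )
    # Sort by part of speech, then by word, for a stable row order.
    return [(tag, word, n) for (tag, word), n in sorted(counts.items())]


# Hypothetical analyzer output for a short slide text.
tagged = [
    ("the", "DT"), ("model", "NN"), ("learns", "VBZ"), ("a", "DT"),
    ("model", "NN"), ("quickly", "RB"), ("model", "VB"),
]

# One row per line: Part-of-speech, Word, Occurrences.
for tag, word, n in build_dictionary_table(tagged):
    print(f"{tag}\t{word}\t{n}")
```

In this sketch the two occurrences of "model" tagged as a noun collapse into one row with a count of 2, while "model" tagged as a verb remains a separate row, because the merge key is the pair of word and part of speech rather than the word alone.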