particular data needs some research.
The problem with long documents is that they generally have higher word counts.
Yet we may still want to treat a long document about some topic as similar to a
short document about the same topic, even though their word counts differ
significantly.
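One common way to soften this length effect is to scale each count vector to unit length before comparing documents. The following is only a minimal sketch under that assumption: the helper names (`count_vector`, `l2_normalize`, `euclidean_distance`) and the toy documents are hypothetical and chosen for illustration, not taken from the text.

```python
import math
from collections import Counter


def count_vector(text, vocabulary):
    """Return raw word counts for `text`, ordered by `vocabulary`."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def euclidean_distance(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: the long document is the short one repeated three times,
# so both are about the same topic but differ in raw word counts.
short_doc = "imaging databases store images"
long_doc = " ".join([short_doc] * 3)

vocabulary = sorted(set(short_doc.split()))
short_vec = count_vector(short_doc, vocabulary)
long_vec = count_vector(long_doc, vocabulary)

# Raw counts make the two documents look far apart ...
raw_distance = euclidean_distance(short_vec, long_vec)

# ... while unit-length vectors only compare the proportions of the words.
normalized_distance = euclidean_distance(
    l2_normalize(short_vec), l2_normalize(long_vec)
)

print(raw_distance, normalized_distance)
```

With raw counts the two documents end up a clear distance apart, whereas after normalization the distance collapses to zero, because only the proportions of the words remain. Real documents on the same topic will rarely match this exactly, so the normalized distance will usually be small rather than zero.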
Furthermore, if we also want to reduce the weight of very common words and highlight
the rare ones, we need to record the relative importance of each word rather than its
raw count. This is known as