Trend – Visualization
The new “Corpus Engine” has been in development since January and has recorded about 302,000 articles so far. In total, 1,130,000 news headlines and summaries have been stored since Kungle.de went online.
New algorithms have now been developed to:
- Identify public opinion on political and economic topics.
- Follow the public image of brands, corporations or companies.
- Track public feelings and emotions about current events.
The Challenge
The current trend calculation, based on static dictionaries, cannot identify new events such as an ‘earthquake’ or a ‘political reform’. The “topic-tagging” is also static and limited to 9 topics: “Science, Economy, Politics, Technology, Entertainment, Sport, Boulevard, Adult and Religion”.
It would be an exhausting task to code every new subtopic or event in an FSM (Finite State Machine).
Therefore the new engine identifies topics by itself. Not only the trend but also the topic classification is calculated dynamically.
How is this done?
A simplified breakdown:
The NLP (Natural Language Processing) used here relies on two strategies for text analysis: tagging via dictionaries, and word / n-gram frequency analysis.
For example:
This is an animation of a small section of the Kungle English dictionary (about 300,000 words), covering the period since January. Each frame represents one hour, and the column height shows the daily word count. The column color changes from green to red if the word occurred in more than 10 percent of all articles. The overall word frequency decreases from left to right.
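The data behind such a column chart can be sketched in a few lines of Python. This is only a minimal illustration, not the actual Corpus Engine code: the function names `tokenize` and `column_data`, the simple regular-expression tokenizer, and the choice to measure the 10 percent threshold as the share of articles containing the word are all assumptions.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def column_data(articles, threshold=0.10):
    """Return (word, total_count, color) tuples, most frequent word first.

    total_count -> column height
    color       -> "red" if the word occurs in more than `threshold`
                   of all articles, otherwise "green"
    """
    total = Counter()     # overall occurrences of each word
    doc_freq = Counter()  # number of articles that contain each word
    for text in articles:
        tokens = tokenize(text)
        total.update(tokens)
        doc_freq.update(set(tokens))  # count each word once per article

    n = len(articles)
    columns = []
    for word, count in total.most_common():  # sorted by descending count
        share = doc_freq[word] / n if n else 0.0
        color = "red" if share > threshold else "green"
        columns.append((word, count, color))
    return columns
```

Calling `column_data` once per frame, with the articles collected up to that hour, would give one list of columns per animation frame, ordered from the most frequent word on the left to the least frequent on the right.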
Bigrams:
This is an even smaller section of the weighted bigram matrix (about 100,000 x 100,000 words) over the same timeframe. Although this animation is compressed, you can identify some horizontal and vertical lines. These lines occur when a topic is heavily discussed.
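A bigram matrix of this kind can be sketched as a sparse structure, since a dense 100,000 x 100,000 table would be mostly empty. The following is a simplified illustration under stated assumptions: the names `bigram_matrix` and `line_strength` are hypothetical, the weight of a cell is taken to be the raw bigram count (the real weighting is not described here), and the "line" measure is one plausible reading of why rows and columns light up, namely that a heavily discussed word appears next to many different words.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def bigram_matrix(articles):
    """Sparse matrix where matrix[first][second] counts the bigram 'first second'."""
    matrix = defaultdict(lambda: defaultdict(int))
    for text in articles:
        tokens = tokenize(text)
        # Pair every token with its direct successor.
        for first, second in zip(tokens, tokens[1:]):
            matrix[first][second] += 1
    return matrix


def line_strength(matrix):
    """Count the distinct partners of each word.

    rows[w] -> number of different words that follow w  (filled cells in row w)
    cols[w] -> number of different words that precede w (filled cells in column w)
    """
    rows = {word: len(partners) for word, partners in matrix.items()}
    cols = defaultdict(int)
    for partners in matrix.values():
        for second in partners:
            cols[second] += 1
    return rows, dict(cols)
```

Words with a high row or column count have many filled cells along one axis of the matrix, which is what would show up as a horizontal or vertical line in a plot.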