The Technical View
The above figure shows a schematic overview of the stream processing part of OSTMap which was implemented by using the Flink Streaming API. Flink enables us to create several indices on the tweets for fast access in parallel. It reads geo-tagged tweets from Europe and North America from the twitter stream. The first row of the above figure shows the steps done to create the time index and to store the raw data: a key for the tweet is generated and then written with the tweet itself to Apache Accumulo. The second and third row represent the steps done for the term index which also uses the generated key for cross-table look-ups. The fourth row builds the geo-temporal index which we will describe later in this article. To create the previously mentioned language frequency we use the tumbling window functionality of Apache Flink. The wokflow is shown in the sixth row.
Accumulo Table Design
As one can see each tweet is saved in the RawTweetData table with a unique key consisting of the timestamp and hash of the tweet. In addition we added a simple term index pointing to the RawTweetData table for a basic term search. The third table shown in the figure holds information about the language frequencies seen by OSTMap. Further information regarding the data model of Apache Accumulo can be read here: Apache Accumulo data model.