Abstract:
We describe a solution for real-time matching of microblogging messages that express interest in
selling or buying products. Matching consumer-to-consumer (C2C) buy and sell interests in real time is
nontrivial because of the complexity of interpreting social media messages, the volume of messages, and
the diversity of products and services. We therefore adopt a combination of techniques from natural language
processing, complex event processing, and distributed systems. First, we convert each message
into a structured semantic representation using named-entity recognition with a conditional random
field (CRF) and logistic regression. The extracted data are then matched using a complex event
processor. Moreover, NoSQL storage and in-memory computing are used to enhance scalability and performance. The proposed solution
achieves high accuracy: the classification and CRF models recorded accuracies of 98.5%
and 82.07%, respectively, when applied to a real-world dataset. Low latency was observed for information
extraction, in-memory data manipulation, and complex event processing, with latencies of
0.5 ms, 5 ms, and 3.6 ms, respectively.
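
To make the matching step concrete, the following is a minimal sketch of how extracted buy and sell interests might be paired in memory. It is an illustration under stated assumptions, not the system described above: the `Interest` record, its fields, the `InterestMatcher` class, and the rule that a seller's asking price must not exceed a buyer's budget are all hypothetical, and a plain in-memory queue stands in for the complex event processor.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Interest:
    """Hypothetical output of the information-extraction step."""
    user: str
    intent: str    # "buy" or "sell"
    product: str   # normalized product name
    price: float   # asking price (sell) or maximum budget (buy)


class InterestMatcher:
    """Pairs each incoming interest with the oldest compatible opposite one."""

    def __init__(self):
        # Pending interests, grouped by intent and then by product.
        self._pending = {"buy": defaultdict(deque), "sell": defaultdict(deque)}

    def submit(self, interest):
        if interest.intent not in self._pending:
            raise ValueError(f"unknown intent: {interest.intent!r}")
        opposite = "sell" if interest.intent == "buy" else "buy"
        queue = self._pending[opposite][interest.product]
        # Iterate over a snapshot so the queue can be modified safely.
        for candidate in list(queue):
            if interest.intent == "buy":
                buyer, seller = interest, candidate
            else:
                buyer, seller = candidate, interest
            # Assumed rule: the asking price must fit the buyer's budget.
            if seller.price <= buyer.price:
                queue.remove(candidate)
                return buyer, seller
        # No compatible counterpart yet: keep the interest for later.
        self._pending[interest.intent][interest.product].append(interest)
        return None


matcher = InterestMatcher()
matcher.submit(Interest("alice", "sell", "iphone 6", 300.0))
match = matcher.submit(Interest("carol", "buy", "iphone 6", 320.0))
```

Here `match` holds the pair (carol's buy interest, alice's sell interest), because both name the same product and the asking price fits the budget. An interest with no compatible counterpart stays queued until one arrives.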