For newcomers to Complex Event Processing (CEP) and the Twitter API, I hope this serves as a short tutorial that helps them get off the ground quickly.
Managing big data and mining useful information from it is the hottest discussion topic in technology right now. The explosive growth of semi-structured data flowing from social networks like Twitter, Facebook and LinkedIn is making technologies like Hadoop and Cassandra a part of every technology conversation. So as not to fall behind the competition, customer-centric organizations are actively creating social strategies. What can a company get out of data feeds from social networks? Think location-based services, targeted advertisements and algorithmic equity trading, for starters. IDC Insights has some informative blogs on the relationship between big data and business analytics. As Barb Darrow explains in her blog post on gigaom.com, big data in itself is meaningless unless the right analytic tools are available to sift through it.
Companies often listen in on social feeds to learn about customers' interests in, and perceptions of, their products. They are also trying to identify “influencers” (the people with the most connections in a social graph) so that they can make better offers to those individuals and get better mileage out of their marketing. Companies involved in equity trading want to know which publicly traded companies are being discussed on Twitter and what users' sentiments about them are. From big companies like IBM to smaller start-ups, everyone is racing to make the most of the opportunities in big data management and analytics. Much documentation about big data, such as IBM's ebook 'Big Data Platform', is freely available on the web. However, a lot of it covers only theory. Jouko Ahvenainen, replying to Barb Darrow's post above, makes a good point: “many people who talk about the opportunity of big data are on too general level, talk about better customer understanding, better sales, etc. In reality you must be very specific, what you utilize and how”.
It does sound reasonable, doesn't it? So I set out to investigate this a bit further by prototyping an idea, which is the best way I know. If I could do it, anybody can. The code is remarkably simple, but that's exactly the point: writing a CEP framework yourself is quite complex, but using one is not. In the same way, Twitter makes it really easy to get at the information through its REST and streaming APIs.
Big Data (image source: http://www.bigdatabytes.com/managing-big-data-starts-here/)
Complex Event Processing (CEP), which I blogged about previously, is a critical component of the big data framework. Along with CEP, Hadoop-based frameworks are used to compile, parse and make sense of the 24x7 stream of data from social networks. In this post, Twitter's streaming API and CEP are used together to capture the happiness level of Twitter users. The code I present below listens in to live tweets and generates a 'happy' event every time “lol” is found in the text of a tweet. CEP captures the happy events and raises an alert every time the count of happy events exceeds a predetermined number within a predetermined time period. Assuming that a user is happy every time he or she uses “lol” is very simplistic, but it helps get the point across. In practice, gauging users' sentiment is not that easy, because it involves natural language analysis. Consider the following example, which highlights the complexities of analyzing natural language.
iPhone has never been good.
iPhone has never been so good.
As you can see, adding just one word to the sentence completely changes its meaning. For this reason, natural language processing is considered one of the toughest problems in computer science. You can learn about natural language processing from the free online lectures offered by Stanford University; this link takes you directly to the first lecture on natural language analysis by Christopher Manning. In my opinion, however, the pervasive use of abbreviations in social media, and in modern lingo in general, is making the task a little easier, because abbreviations like “lol” and “AFAIK” convey meaning quite precisely. The use of “lol” projects “funny”, while “AFAIK” may indicate that the user is “unsure” of him- or herself.
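To make the abbreviation idea concrete, here is a minimal sketch of a lookup from abbreviations to the sentiment they suggest. The class name `AbbreviationSentiment` is hypothetical, and the table holds only the two abbreviations discussed above. Splitting the text into words, rather than searching for substrings, keeps a word like "lollipop" from being read as "lol".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table: maps social-media abbreviations to the sentiment
// they suggest. Only the two abbreviations discussed in the text are included.
class AbbreviationSentiment {
    private static final Map<String, String> HINTS = new HashMap<>();
    static {
        HINTS.put("lol", "funny");
        HINTS.put("afaik", "unsure");
    }

    // Splits the text on anything that is not a letter or digit and returns the
    // sentiment hint for every known abbreviation found, in order of appearance.
    static List<String> hints(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            String hint = HINTS.get(token);
            if (hint != null) {
                result.add(hint);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hints("AFAIK the launch is tomorrow lol")); // [unsure, funny]
        // Neither sentence contains a known abbreviation, so no hint is produced.
        System.out.println(hints("iPhone has never been good."));      // []
        System.out.println(hints("iPhone has never been so good."));   // []
    }
}
```

The two iPhone sentences produce no hints at all, which is exactly the point of the example: a lookup table sidesteps natural language analysis rather than solving it.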
The code presented below uses the Twitter4j API to listen to the live Twitter feed, and Esper CEP to listen for events and alert us when a threshold is met. You can download the Twitter4j binaries or source from http://twitter4j.org/en/index.html and Esper from http://esper.codehaus.org/. Before you execute the code, create a Twitter account if you don't have one, and read Twitter's guidelines and concepts for its streaming API. Twitter currently allows authentication with just a username and password, but this is going to be phased out in favor of OAuth in the near future. Also, pay close attention to Twitter's ‘Access and Rate Limit’ section. The code below uses the streaming API in a single thread; to avoid hitting the rate limit, do not run another streaming thread at the same time. Consistently hitting the rate limits can result in Twitter blacklisting your Twitter ID. It is also important to note that the streaming API does not send us each and every tweet: Twitter samples the data and sends us only a fraction of the tweets. This is not a problem for us, as long as we are interested in patterns in the data rather than in any specific tweet. Twitter offers a paid service for businesses that need streaming data with no rate limits. The following diagram shows the components and the flow of data.
Diagram: components and data flow (charts and DB are not yet implemented in the code).
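Because repeatedly hitting the rate limits can get a Twitter ID blacklisted, a client that loses its streaming connection should not reconnect in a tight loop. A minimal sketch of exponential backoff follows. `ReconnectBackoff` is hypothetical, and the 5-second starting delay and 320-second cap are illustrative assumptions to check against Twitter's current streaming guidelines; Twitter4j may already handle reconnects for you.

```java
// Hypothetical helper: how long to wait before the n-th reconnection attempt
// (n = 0 for the first retry). Doubling the delay avoids hammering the API.
// The starting delay and the cap are illustrative assumptions.
class ReconnectBackoff {
    static final long INITIAL_MILLIS = 5_000;
    static final long MAX_MILLIS = 320_000;

    static long delayMillis(int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        long delay = INITIAL_MILLIS;
        // Stop doubling once the cap is reached, so large attempt numbers cannot overflow.
        for (int i = 0; i < attempt && delay < MAX_MILLIS; i++) {
            delay *= 2;
        }
        return Math.min(delay, MAX_MILLIS);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + ": wait " + delayMillis(attempt) / 1000 + " s");
        }
    }
}
```

Running `main` prints waits of 5, 10, 20, 40, 80, 160, 320 and 320 seconds for attempts 0 to 7.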
Listing 1. A standard JavaBean representing a happy event.
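A minimal sketch of such a bean might look like the following. The class name `HappyEvent` and its two properties are assumptions, not the original listing. Esper can treat a plain Java object like this as an event and read its properties through the getter methods.

```java
// Hypothetical JavaBean for a "happy" event (class and property names are assumptions).
// Properties are exposed through JavaBean-style getters and setters.
class HappyEvent {
    private long timestamp; // when the tweet was received, in epoch milliseconds
    private String text;    // text of the tweet that contained "lol"

    // No-argument constructor, as the JavaBean convention expects.
    public HappyEvent() {
    }

    public HappyEvent(long timestamp, String text) {
        this.timestamp = timestamp;
        this.text = text;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }

    @Override
    public String toString() {
        return "HappyEvent[timestamp=" + timestamp + ", text=" + text + "]";
    }
}
```

In an Esper 4.x program, an instance like `new HappyEvent(System.currentTimeMillis(), status text)` would typically be handed to the engine through `EPRuntime.sendEvent(Object)`.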
Listing 2. Defining the Esper listener.
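As a hedged illustration of what this listener reacts to, the plain-Java class below mimics a 10-second time window that reports when more than 2 happy events fall inside it, the threshold used in this post. It is not Esper's API and not the original listing: `SlidingWindowCounter` is hypothetical, and the EPL in the comment is only an assumed Esper 4.x equivalent.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plain-Java stand-in for an Esper time-window statement (NOT Esper's API).
// An assumed Esper 4.x equivalent in EPL would be something like:
//   select count(*) as cnt from HappyEvent.win:time(10 sec) having count(*) > 2
class SlidingWindowCounter {
    private final long windowMillis;
    private final int threshold;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowCounter(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    // Records one event (timestamps must arrive in order) and returns the number
    // of events inside the window if it exceeds the threshold, or -1 otherwise.
    int onEvent(long timestampMillis) {
        timestamps.addLast(timestampMillis);
        // Evict events that have fallen out of the window.
        while (timestamps.peekFirst() <= timestampMillis - windowMillis) {
            timestamps.removeFirst();
        }
        int count = timestamps.size();
        return count > threshold ? count : -1;
    }

    public static void main(String[] args) {
        SlidingWindowCounter counter = new SlidingWindowCounter(10_000, 2);
        long[] happyTimes = {0, 2_000, 4_000, 15_000};
        for (long t : happyTimes) {
            int count = counter.onEvent(t);
            if (count > 0) {
                System.out.println("exceeded the count, actual " + count);
            }
        }
    }
}
```

With the timestamps in `main`, only the event at 4 seconds triggers the alert, printing `exceeded the count, actual 3`; by 15 seconds the earlier events have left the window. A deque keeps each update cheap, because expired events are only ever removed from the front.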
Listing 3.
A Twitter4j listener is created, and this listener and the CEP listener start listening. Every Twitter post is parsed for ‘lol’, and every time ‘lol’ is found, a happy event is generated. The CEP listener raises an alert every time the total count of ‘lol’ exceeds 2 in the last 10 seconds.
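The flow described above can be simulated end to end without a network connection. In the real program the tweets arrive through a Twitter4j listener and Esper does the counting; in this sketch a plain method call and a deque of timestamps stand in for both, so `HappyPipeline` and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

// Hypothetical simulation: tweet -> "lol" check -> happy event -> 10-second window -> alert.
class HappyPipeline {
    private static final long WINDOW_MILLIS = 10_000; // "in the last 10 seconds"
    private static final int THRESHOLD = 2;           // alert when the count exceeds 2

    private final Deque<Long> happyTimes = new ArrayDeque<>();
    private final List<String> alerts = new ArrayList<>();

    // Called once per incoming tweet (in time order), like a stream listener callback.
    void onTweet(long timeMillis, String text) {
        // Simple substring check, as described in the text ("lollipop" would also match).
        if (!text.toLowerCase(Locale.ROOT).contains("lol")) {
            return;
        }
        happyTimes.addLast(timeMillis); // a happy event is generated
        // Drop happy events older than the 10-second window.
        while (happyTimes.peekFirst() <= timeMillis - WINDOW_MILLIS) {
            happyTimes.removeFirst();
        }
        if (happyTimes.size() > THRESHOLD) {
            alerts.add("exceeded the count, actual " + happyTimes.size());
        }
    }

    List<String> alerts() {
        return alerts;
    }

    public static void main(String[] args) {
        HappyPipeline pipeline = new HappyPipeline();
        pipeline.onTweet(1_000, "lol that cat");
        pipeline.onTweet(3_000, "going to work");
        pipeline.onTweet(5_000, "LOL");
        pipeline.onTweet(8_000, "so tired lol"); // third happy event within 10 s
        pipeline.onTweet(20_000, "lol again");   // earlier events have expired
        pipeline.alerts().forEach(System.out::println);
    }
}
```

Running `main` prints a single alert, `exceeded the count, actual 3`, when the third happy event arrives within 10 seconds; by the last tweet the earlier happy events have left the window.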
The code establishes a long-running thread to receive the Twitter feed. You will see output on the console every time the threshold is met. Please remember to terminate the program; it doesn't terminate on its own.
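If you would rather not stop the program by hand, one option is to end the long-running work after a fixed time. In the minimal plain-Java sketch below, the worker loop stands in for the streaming thread, and clearing the `running` flag stands in for shutting down the stream; `TimedRun` is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: run a long-running worker for a fixed time, then stop it.
class TimedRun {
    // Runs the worker for roughly `millis` milliseconds and returns how many
    // iterations it completed.
    static int runFor(long millis) {
        AtomicBoolean running = new AtomicBoolean(true);
        AtomicInteger iterations = new AtomicInteger();
        CountDownLatch finished = new CountDownLatch(1);

        // Stand-in for the streaming thread: loops until told to stop.
        Thread worker = new Thread(() -> {
            try {
                while (running.get()) {
                    iterations.incrementAndGet(); // e.g. handle one incoming tweet
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                finished.countDown();
            }
        });
        worker.start();

        // After the time limit, ask the worker to stop. In the real program this is
        // where the stream and the CEP engine would be shut down.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.schedule(() -> running.set(false), millis, TimeUnit.MILLISECONDS);
        try {
            finished.await();
        } catch (InterruptedException e) {
            running.set(false);
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdown();
        }
        return iterations.get();
    }

    public static void main(String[] args) {
        int done = runFor(2_000);
        System.out.println("stopped after " + done + " iterations");
    }
}
```

A flag that the worker checks is the simplest way to stop it cooperatively; interrupting the thread is an alternative when it may be blocked waiting for data.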
Now that you have this basic functionality working, you can extend this prototype in a number of ways. You can handle additional data feeds (from sources other than Twitter) and use Esper to correlate data from the two feeds. For visually appealing output, you can feed the output to a charting library. For example, every time Esper identifies an event, the data point is used to render a point on a line graph. If you track the ‘happy event’ this way, the graph will essentially show the ever-changing level of happiness of Twitter users over time.
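For the charting idea, here is a minimal sketch of turning happy events into data points, assuming the events have already been reduced to their timestamps. `HappinessSeries` and the 10-second interval are hypothetical choices; each resulting (interval start, count) pair is one point on the line graph.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns happy-event timestamps into (interval start, count)
// points that a charting library could draw as a line graph of "happiness" over time.
class HappinessSeries {
    // Counts events per fixed-size interval; intervals with no events are omitted.
    static TreeMap<Long, Integer> bucket(long[] happyTimesMillis, long bucketMillis) {
        TreeMap<Long, Integer> series = new TreeMap<>();
        for (long t : happyTimesMillis) {
            long bucketStart = (t / bucketMillis) * bucketMillis;
            series.merge(bucketStart, 1, Integer::sum);
        }
        return series;
    }

    public static void main(String[] args) {
        long[] times = {1_000, 4_000, 12_000, 13_000, 14_000, 31_000};
        for (Map.Entry<Long, Integer> point : bucket(times, 10_000).entrySet()) {
            System.out.println(point.getKey() + "," + point.getValue());
        }
    }
}
```

For the timestamps in `main` it prints `0,2`, `10000,3` and `30000,1`. Intervals without happy events are simply omitted, so a chart that should drop to zero needs those gaps filled in.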
Please use the comment section for your feedback, +1 to share, and let me know if you would like to see more posts on this subject.
Really interesting topic. Yes, it would be good if you could write more about the subject; I'm interested in making a social listener based on Twitter, Facebook, etc., and also a social CRM.
Thanks in advance.
The provided code samples are not executing as expected. The Twitter API keeps generating a "TwitterException{exceptionCode=[ec814753-44a5356e 4eaddaa2-50017419]" even though I am using the latest version, 2.2.5. I also tried the 2.2.6-SNAPSHOT version, with the same result. It would be quite helpful if you could include a list of dependencies (JAR files and versions used) with articles involving code.
Thank you for this good article.
I have been busy with some other pressing matters for a while. I can publish the list if you still haven't figured this out. Other readers were able to run the code, so I assume you were able to try it out successfully.
Outstanding! Thanks a lot!
I have a question, if you don't mind. I got the code running, but I really can't understand what it is returning, namely ??
I should mention that I have only 5 "lol" words on my Twitter account. Also, I know the meaning of the first 3 rows below. How can I check whether it is right?
Thanks!
Nov 11, 2012 11:24:57 PM com.espertech.esper.core.service.EPServiceProviderImpl doInitialize
INFO: Initializing engine URI 'default' version 4.7.0
[Twitter Stream consumer-1[initializing]] INFO twitter4j.TwitterStreamImpl - Establishing connection.
[Twitter Stream consumer-1[Establishing connection]] INFO twitter4j.TwitterStreamImpl - Connection established.
[Twitter Stream consumer-1[Establishing connection]] INFO twitter4j.TwitterStreamImpl - Receiving status stream.
Got a status deletion notice id:232628414350254080
******* lol found *****
Got a status deletion notice id:194815941673099264
******* lol found *****
Got a status deletion notice id:192334088139583489
Got a status deletion notice id:163152528408711169
Got a status deletion notice id:168551524618874881
Got a status deletion notice id:175282791137816578
Got a status deletion notice id:157517124481462272
Got a status deletion notice id:184313328968019968
Got a status deletion notice id:190518243914555393
Got a status deletion notice id:267334707086237696
Got a status deletion notice id:164048083649445888
Got a status deletion notice id:177638354211438592
Got a status deletion notice id:250327253127409665
Got a status deletion notice id:175783616205426688
Got a status deletion notice id:207170465452654592
Got a status deletion notice id:172800786571591681
Got a status deletion notice id:187746656199000065
Got a status deletion notice id:168203644842414080
Got a status deletion notice id:207993157198155777
******* lol found *****
exceeded the count, actual 3
Got a status deletion notice id:184405561717174272
Got a status deletion notice id:163309466689863681
Got a status deletion notice id:267769098551820289
Got a status deletion notice id:237008857354891264
Got a status deletion notice id:90240781918543872
Got a status deletion notice id:161871751553363968
Got a status deletion notice id:189475208585940994
Got a status deletion notice id:192284264002355201
Got a status deletion notice id:187276285989502977
Got a status deletion notice id:171972042437042176
Got a status deletion notice id:215204256293195776
Got a status deletion notice id:250918457049239553
Got a status deletion notice id:94131103731957762
Got a status deletion notice id:267756112995049473
Got a status deletion notice id:267725779796893696
Got a status deletion notice id:242691765981831168
You may want to read the 'deleted message FAQ' on Twitter. The very first item there is the following:
What is a delete notice?
Twitter sends us a notification whenever a user deletes a Tweet. We pass these notifications on to you as part of your stream. If you are storing Tweets you must take account of these delete messages in order to comply with Twitter's Terms of Service.