Monday, February 27, 2012

Tracking user sentiments on Twitter with Twitter4j and Esper

For newcomers to Complex Event Processing (CEP) and the Twitter API, I hope this serves as a short tutorial and helps them get off the ground quickly.

Managing big data and mining useful information from it is the hottest discussion topic in technology right now. The explosive growth of semi-structured data flowing from social networks like Twitter, Facebook and LinkedIn is making technologies like Hadoop and Cassandra a part of every technology conversation. So as not to fall behind the competition, all customer-centric organizations are actively engaged in creating social strategies. What can a company get out of data feeds from social networks? Think location-based services, targeted advertisements and algorithmic equity trading, for starters. IDC Insights has some informative blogs on the relationship between big data and business analytics. Big data in itself is meaningless unless the right analytic tools are available to sift through it, explains Barb Darrow in her blog post on gigaom.com.


Companies often listen in to social feeds to learn customers’ interest in, or perception of, their products. They also try to identify “influencers” – those with the most connections in a social graph – so they can make better offers to such individuals and get better mileage out of their marketing. Companies involved in equity trading want to know which publicly traded companies are discussed on Twitter and what users' sentiments about them are. From big companies like IBM to smaller start-ups, everyone is racing to make the most of the opportunities in big data management and analytics. Much documentation about big data, like this ebook from IBM, 'Big Data Platform', is freely available on the web. However, a lot of it covers theory only. Jouko Ahvenainen, in reply to Barb Darrow’s post above, makes a good point that “many people who talk about the opportunity of big data are on too general level, talk about better customer understanding, better sales, etc. In reality you must be very specific, what you utilize and how”.

It does sound reasonable, doesn't it? So I set out to investigate this a bit further by prototyping an idea, the only good option I know. If I could do it, anybody could. The code is remarkably simple, but that's exactly the point: writing a CEP framework yourself is quite complex, but using one is not. In the same way, Twitter makes it really easy to get to the information through its REST API.



Big Data - http://www.bigdatabytes.com/managing-big-data-starts-here/

Complex Event Processing (CEP), as I blogged previously (click here to read), is a critical component of the big data framework. Along with CEP, frameworks like Hadoop are used to compile, parse and make sense of the 24x7 stream of data from the social networks. Today, Twitter's streaming API and CEP can be used together to capture the happiness levels of Twitter users. The code I present below listens in to live tweets and generates a 'happy' event every time “lol” is found in the text of a tweet. The CEP engine captures the happy events, and an alert is raised every time the count of happy events exceeds a predetermined number within a predetermined time period. The assumption that a user is happy every time he or she uses “lol” is very simplistic, but it helps get the point across. In practice, gauging users' sentiment is not that easy, because it involves natural language analysis. Consider the example below, which highlights the complexities of analyzing natural language.


iPhone has never been good.
iPhone has never been so good.

As you can see, the addition of just one word completely changed the meaning of the sentence. For this reason, natural language processing is considered one of the toughest problems in computer science. You can learn natural language processing from free online lectures offered by Stanford University; this link takes you directly to the first lecture on natural language analysis by Christopher Manning. But, in my opinion, the pervasive use of abbreviations in social media, and in modern lingo in general, is making the task a little bit easier. Abbreviations like “lol” and “AFAIK” project the meaning accurately: “lol” projects “funny”, and “AFAIK” may indicate the user is “unsure” of him or herself.

The code presented below uses the Twitter4j API to listen to the live Twitter feed and the Esper CEP engine to listen to events and alert us when a threshold is met. You can download Twitter4j binaries or source from http://twitter4j.org/en/index.html and Esper from http://esper.codehaus.org/ . Before you execute the code, make sure to create a Twitter account if you don’t have one, and also read Twitter’s guidelines and concepts for its streaming API here. Authentication with just a username & password combination is currently allowed by Twitter, but it is going to be phased out in favor of OAuth authentication in the near future. Also, pay close attention to their ‘Access and Rate Limit’ section. The code below uses the streaming API in one thread; please do not use another thread at the same time, to avoid hitting the rate limit. Hitting rate limits consistently can result in Twitter blacklisting your Twitter ID. It is also important to note that the streaming API does not send each and every tweet our way; Twitter typically samples the data, sending about 1 out of every 10 tweets. This is not a problem for us, however, as long as we are interested in patterns in the data and not in any specific tweet. Twitter offers a paid service for businesses that need streaming data with no rate limits. The following diagram shows the components and the flow of data.


Diagram. Charts & DB not yet implemented in the code


Listing 1. Standard java bean representing a happy event.
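The original listing did not survive here, so below is a minimal sketch of what such a bean could look like. The class and property names (HappyEvent, text) are my assumptions, not taken from the original code.

```java
// A plain Java bean representing one 'happy' event, i.e. a tweet
// whose text contained "lol".
public class HappyEvent {
    private String text;  // the tweet text that triggered the event

    public HappyEvent(String text) {
        this.text = text;
    }

    public String getText() {
        return text;
    }
}
```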




Listing 2. Esper listener is defined.
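The original listing is missing from this copy; the sketch below shows one way the Esper listener could be written. It assumes the EPL query in Listing 3 selects the event count under the alias `cnt`; the class name HappyEventListener is my own.

```java
import com.espertech.esper.client.EventBean;
import com.espertech.esper.client.UpdateListener;

// Called by Esper whenever the query matches, i.e. when the count of
// happy events in the sliding window crosses the threshold.
public class HappyEventListener implements UpdateListener {
    public void update(EventBean[] newEvents, EventBean[] oldEvents) {
        if (newEvents != null) {
            // "cnt" is the alias assumed in the EPL query of Listing 3
            Long count = (Long) newEvents[0].get("cnt");
            System.out.println("Alert: " + count
                    + " happy events in the last 10 seconds");
        }
    }
}
```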




Listing 3.

A Twitter4j listener is created, and this listener and the CEP listener start listening. Every Twitter post is parsed for ‘lol’, and every time ‘lol’ is found, a happy event is generated. The CEP listener raises an alert every time the total count of ‘lol’ exceeds 2 in the last 10 seconds.
The code establishes a long-running thread to get Twitter feeds. You will see output on the console every time the threshold is met. Please remember to terminate the program; it doesn't terminate on its own.
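Since the original listing was lost, here is a sketch of how that wiring could look with the Esper 4.x and Twitter4j APIs. The class names (HappyEvent, HappyEventListener, HappyTweetTracker) and the `cnt` alias are my assumptions; Twitter credentials are expected in a twitter4j.properties file on the classpath.

```java
import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;
import twitter4j.Status;
import twitter4j.StatusAdapter;
import twitter4j.TwitterStream;
import twitter4j.TwitterStreamFactory;

public class HappyTweetTracker {
    public static void main(String[] args) throws Exception {
        // Register the event type and the sliding-window query with Esper.
        Configuration cepConfig = new Configuration();
        cepConfig.addEventType("HappyEvent", HappyEvent.class.getName());
        final EPServiceProvider cep =
                EPServiceProviderManager.getDefaultProvider(cepConfig);
        EPStatement stmt = cep.getEPAdministrator().createEPL(
                "select count(*) as cnt from HappyEvent.win:time(10 sec) "
                + "having count(*) > 2");
        stmt.addListener(new HappyEventListener());

        // Listen to the sampled public stream and fire a HappyEvent
        // whenever a tweet contains "lol".
        TwitterStream stream = new TwitterStreamFactory().getInstance();
        stream.addListener(new StatusAdapter() {
            @Override
            public void onStatus(Status status) {
                if (status.getText().toLowerCase().contains("lol")) {
                    cep.getEPRuntime().sendEvent(
                            new HappyEvent(status.getText()));
                }
            }
        });
        stream.sample();  // long-running; terminate the program manually
    }
}
```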

Now that you have this basic functionality working, you can extend the prototype in a number of ways. You can handle additional data feeds (from sources other than Twitter) and use Esper to correlate data from the two feeds. For visually appealing output, you can feed the output to a charting library: for example, every time Esper identifies an event, the data point is used to render a point on a line graph. If you track the ‘happy event’ this way, the graph will essentially show the ever-changing level of happiness of Twitter users over a period of time.

Please use the comment section for your feedback, +1 to share, and let me know if you would like to see more postings on this subject.

Friday, February 24, 2012

End of ERP as we know it?

A friend of mine on Facebook drew my attention to the blog post 'End of ERP' by Tien Tzuo on Forbes.com. With the professional lives of millions tied to ERP in some way, I can imagine the buzz this post must be creating. SAP being the biggest ERP software maker in the world and the parent company of my employer, I read it with interest. So as not to be influenced by others' arguments, I haven't read any responses to the post yet.


If you haven't already, you can read the original post by Tien Tzuo here.
To get your opinion on this matter, I have created a short survey of only 5 questions that you can access by clicking here. I will publish the results of the survey soon. A link to the survey also appears at the bottom of this post for your convenience. In my opinion, this notable (reputation derived from the fact that it appeared on Forbes) post is way biased, as many posts often are. Could Tien's earlier job at SalesForce.com as a marketing officer be the reason? Predicting the end of something epic or a most trusted technology, is sure to generate a lot of buzz, which is what bloggers often set out to do. The post would have been a lot better and valuable had he compared ERP's strengths and weaknesses and explained why the weaknesses are so glaring that ERP customers would be willing to walk away from ERP, something so crucial to their existence. There is no success for a case that lacks even a semblance of honest acknowledgment of the other side of the argument.

In support of his argument, Tien mentions some key changes in consumer behavior and consumption patterns. The change in the way customers engage with a company is driving ERP to its inevitable death: this is the main theme of 'End of ERP'. Services-based consumption is rapidly increasing, but it can be applied only to so many things. By focusing on this alone, isn’t Tien forgetting the business processes around other product segments? In sectors like food, energy, health and vehicles, there are simply too many things we cannot subscribe to and consume remotely. All the standard functions of an ERP are still required there, aren't they? A customer may stop buying cars and instead rent from Zipcar, but cars will still have to be made, sold and bought. How would companies manage their businesses and have consolidated views of them without ERP?

ERP modules - Credit (http://www.abouterp.com/)

Tien also mentions companies like SalesForce.com and touts their successes as proof that companies are moving away from ERP. But SalesForce doesn’t offer anything other than CRM, does it? Does it provide the finance, HR or materials management modules of an ERP? I guess not. You can’t run a big company effectively by mishmashing different services from ten different vendors. That's why ERP exists, and why it will keep its market share in the enterprise segment. I do agree, however, that cloudification (I know, I know, it's not a word in the English dictionary) of business functions is an irreversible trend. Oracle's and SAP’s acquisitions of Taleo and SuccessFactors, respectively, are an indication of their grudging acceptance of this fact. The key to their success is not the demand for ERP in the cloud, which is ever present, but their ability to integrate the acquired companies and their products to provide the same kind of comprehensive tool set as ERP.

“End of ERP” concludes by highlighting some key business requirements that, according to Tien, are not met by ERP today. Without going into details, suffice it to say that ERP is not meant to be a silver bullet for all business problems. It does what it does, while ERP providers and their ecosystem try to find solutions to the unresolved business problems. Doesn’t business intelligence (BI) software aim to solve exactly the kind of issues he mentions? The point is, there are a number of ways to mine the information you need. The importance of BI is undeniable, and that's where vendors are investing millions. The enormous response to SAP's in-memory analytics appliance HANA is just one example of how innovative products will meet the business requirements of today. While the business problems mentioned in the post may be genuine, they simply highlight opportunities for ERP’s improvement and do not in any way spell doom for it.

Make your voice heard? Take the Survey

Saturday, February 11, 2012

Complex Event Processing - a beginner's view

Using a Complex Event Processing engine is not so complex. Well, initially at least.
  
A substantial amount of information is available on the web about CEP products and functionality. But, if you are like me, you want to test-run a product or application with little patience for reading detailed documentation. So when I was evaluating CEP as an engine for one of our future products, I decided to just try it out using a business scenario I knew from my past experience working with a financial company. For impatient developers like me, what could be better than a free and open source product? So I decided to use Esper, an open source product based on Java, and was able to write the code (merely 3 Java classes) to address the business case below.
But first a little about CEP and a shameless plug of our product. My apologies. :-)
Complex Event Processing has been gaining significant ground recently. The benefits of CEP are widely understood in some verticals, such as the financial and insurance industries, where it is actively deployed to perform various business-critical tasks. Monitoring, fraud detection and algorithmic trading are some of the critical tasks that depend on CEP to integrate multiple streams of real-time data, identify patterns and generate actionable events for an organization.
My current employer, Sybase Inc., is one of the leading suppliers of CEP. Aleri, the Sybase CEP product, is widely used in the financial services industry and is the main component of Sybase's leading solution, 'RAP - The Trading Edition'. Aleri is also sold as a separate product. Detailed information about the product is available here: http://www.sybase.com/products/financialservicessolutions/complex-event-processing.

The high level architecture of a CEP application is shown in the diagram below.
Figure 1.

Now on to the best part. The business requirement -
  
The important aspect of CEP that fascinates me is its ability to correlate events or data points from different streams, or from within the same data stream. To elaborate, take the example of a retail bank that has a fraud monitoring system in place. The system flags every cash transaction over $10,000 for a manual review; in other words, a large cash transaction (a deposit or withdrawal) in an account raises an anti-money laundering event from the monitoring system. Such traditional monitoring systems can easily be circumvented or exploited by simple tricks, such as depositing more than one check with smaller amounts. What happens if an account holder deposits 2 checks of $6,000 in a day, or 5 checks of $2,500 in a day? Nothing. The system can't catch it. CEP provides a way to define rules with a time-frame criterion. For example, you could specify a rule to raise a flag when someone deposits more than $10,000 in cash within a 12-hour window. Get it?
Follow the steps below to see how easy it is to implement CEP to meet this business requirement.
Download latest Esper version (4.5.0 at the time of this writing) from here. http://espertech.com/download/
Unzip the package in a separate folder.
Create a Java project and reference the Esper jar files from this folder.
Create a standard Java bean for an event - which here is a deposit to an account, with name and amount attributes.

Listing 1.
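The original listing is not preserved here; below is a minimal sketch of the deposit bean described above. The class and property names (Deposit, account, amount) are my assumptions.

```java
// A plain Java bean representing one deposit event: which account
// was deposited into, and for how much.
public class Deposit {
    private String account;
    private double amount;

    public Deposit(String account, double amount) {
        this.account = account;
        this.amount = amount;
    }

    public String getAccount() {
        return account;
    }

    public double getAmount() {
        return amount;
    }
}
```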

The next listing creates an event type, the SQL-like query that defines the event, and registers a listener on that query. The code generates an event any time one of the two deposit accounts, AccountA and AccountB, receives deposits of more than 100,000 in a time frame of 10 seconds (this is where you specify the time window). Because this is just a test, I have put the event generation functionality together with the other code, but in real life the deposit amounts would be fed from a deposit transaction processing system based on some messaging framework. The code is easy enough to follow. First we create the initial configuration. Then we add the type of event we want. A query with the criterion for selecting the event is created next. As you can see, the amount is summed up over a sliding window of 10 seconds, and an event is created when the total amount in that time frame for a particular account exceeds 100,000. A listener is created next and registered on the query.


 Listing 2
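The original listing was lost in this copy; the sketch below follows the steps described above using the Esper 4.x client API. The class names (FraudMonitor, Deposit, DepositListener) and the `total` alias are my assumptions.

```java
import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class FraudMonitor {
    public static void main(String[] args) {
        // Initial configuration: register the Deposit bean as an event type.
        Configuration config = new Configuration();
        config.addEventType("Deposit", Deposit.class.getName());
        EPServiceProvider epService =
                EPServiceProviderManager.getDefaultProvider(config);

        // Sum deposits per account over a 10-second sliding window and
        // fire when the total for an account exceeds 100,000.
        EPStatement stmt = epService.getEPAdministrator().createEPL(
                "select account, sum(amount) as total "
                + "from Deposit.win:time(10 sec) "
                + "group by account having sum(amount) > 100000");
        stmt.addListener(new DepositListener());

        // Simulated feed; in real life the deposits would arrive from a
        // transaction processing system over some messaging framework.
        epService.getEPRuntime().sendEvent(new Deposit("AccountA", 60000));
        epService.getEPRuntime().sendEvent(new Deposit("AccountB", 20000));
        epService.getEPRuntime().sendEvent(new Deposit("AccountA", 50000));
    }
}
```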

The next listing is the listener. Every time an event is generated in the time window specified in the query, it gets added to the newEvents collection.

 
 Listing 3
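Again, the original listing is missing, so here is one way the listener could be written. It assumes the query in Listing 2 selects `account` and `sum(amount) as total`.

```java
import com.espertech.esper.client.EventBean;
import com.espertech.esper.client.UpdateListener;

// Invoked by Esper each time the query matches; the matching rows
// arrive in the newEvents collection.
public class DepositListener implements UpdateListener {
    public void update(EventBean[] newEvents, EventBean[] oldEvents) {
        if (newEvents != null) {
            String account = (String) newEvents[0].get("account");
            Double total = (Double) newEvents[0].get("total");
            System.out.println("Flag for review: " + account
                    + " received " + total + " within the time window");
        }
    }
}
```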

Easy enough, right? The expression language itself is fairly easy to understand because of its similarities to standard SQL syntax. Although a real-life implementation could become complex depending on the type and number of feeds and events you want to monitor, the product itself is simple enough to understand. Many of the commercial CEP products offer excellent user interfaces for creating the event types, queries and reports.

Complex event processing is still a growing field, and the pace of its adoption will only increase as companies try to make sense of all the streams of data flowing in. The amount of semi-structured and other types of data (audio, video) has already surpassed the amount of traditional relational data. It's easy to gauge the impact of a good CEP application at a time when stock trading companies are already gleaning clues from tweets on Twitter.

Hope this helps the curious. Don't forget to click +1 or Like, if you like it.