In a world full of open source and open standards, what's so important about one more protocol or specification? Why should you know about it? Because technology professionals are required to know far more technologies today than in the past, thanks to the rapidly shifting software landscape. Gone are the days when you could master one technology and keep your job forever. At any moment you could be asked to look into some new technology and be expected to work on it, or to create a proof of concept to evaluate its fit for your current project. OData may be one such technology: you may not have worked with it or heard of it, but it is worth knowing about.
If you already know about the OData specification and how it works, please stop reading. This is very high-level but useful information on OData meant for newcomers.
Promoted by Microsoft (do I hear Microsoft haters leaving already?), OData is a protocol for accessing information over the web. It defines a standard way of exposing any information over the industry-standard HTTP protocol, and it follows the architecture popularly known as REST. You can get details of the protocol and much more here. If you want to expose your data to your clients using OData, you will need a much deeper understanding of the specification. However, if you just want to consume the data as a client, you could be up and running in a few minutes to a few hours, depending on your level of expertise. This post will have you create a sample and test it within minutes.
Let's take an example of a company that we all know about: Netflix. It uses an OData service to publish its movie catalog. Viewing the movie catalog or searching for a title can be done just by accessing certain URLs through your browser. You just need to know how to build further URLs based on the information you get from a URL. OData returns the response to your HTTP request in a standard format, either Atom (which is XML) or JSON. Try these URLs yourself and see how easy it is to get to the information. OData services are consumed through URLs that are intuitive and easy to understand.
http://odata.netflix.com will show what type of information you can get from this website. The URL resolves to http://odata.netflix.com/v2/Catalog/. The results show that you can get a list of Genres, Titles and Languages. Now, suppose you want to see all the titles Netflix has. Simply add 'Titles' to the previous URL: http://odata.netflix.com/v2/Catalog/Titles. This returns all the titles and associated information. If you want to know all the details (metadata) about all the entities (collections), simply type in this URL: http://odata.netflix.com/v2/Catalog/$metadata. This results in information on the collections, the entities they are made up of and the attributes of each entity therein. This just shows you how easy it is to get to this basic information. Now you are ready to dig deeper.
The data that shows up on your screen may have been stored in a table in a relational database, in an XML database or in a file in the file system; it does not matter. The information is available to you through the HTTP GET method (the browser sends a GET request to the endpoint). Irrespective of the type and location of the data storage, a specific piece of data is likely identified by a unique key. If you know the unique identifier of the entity you want to access, you can use it to get to more information. The following example shows how Titles are uniquely identified by an 'id' field. That id is the unique identifier of a title is something we learned from the metadata query we executed earlier. Spend a few minutes reviewing the results of the metadata query.
http://odata.netflix.com/v2/Catalog/Titles('13bLK').
When you don't know the unique identifier, you can always search titles by name (Name is an attribute of the Title entity, as revealed by the metadata query).
http://odata.netflix.com/v2/Catalog/Titles?$filter=(Name%20eq%20'Rod%20Stewart')
The total number of titles in the Netflix catalog is easily found with the following URL. As of 2/8/2013 the count is 160050. OData defines many more such operations and functions, for example length, replace and trim.
http://odata.netflix.com/v2/Catalog/Titles/$count
So this is what the OData specification provides. It specifies the format of the URL and the functions and operators you can use in the URL to get to the data you want. A REST-based architecture exposes all the information through HTTP methods; GET is the only one we have seen so far.
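The URL patterns used so far (entity by key, $filter and $count) can be sketched as a few small Python helpers. This is only an illustration that reproduces the URLs shown above; the function names are made up for this post, no request is sent, and the doubling of single quotes inside a string literal is the escaping rule as I understand it from the OData URL conventions.

```python
from urllib.parse import quote

# Service root taken from the URLs in this post.
SERVICE_ROOT = "http://odata.netflix.com/v2/Catalog"


def entity_url(collection, key):
    """URL of a single entity addressed by its string key."""
    return f"{SERVICE_ROOT}/{collection}('{key}')"


def filter_eq_url(collection, field, value):
    """URL that filters a collection on field == value (string values only)."""
    literal = value.replace("'", "''")  # a quote inside a literal is doubled
    expression = f"({field} eq '{literal}')"
    # Percent-encode spaces but keep quotes and parentheses readable.
    return f"{SERVICE_ROOT}/{collection}?$filter=" + quote(expression, safe="'()")


def count_url(collection):
    """URL that returns the number of entities in a collection."""
    return f"{SERVICE_ROOT}/{collection}/$count"


if __name__ == "__main__":
    print(entity_url("Titles", "13bLK"))
    print(filter_eq_url("Titles", "Name", "Rod Stewart"))
    print(count_url("Titles"))
```

Building the URL from parts like this keeps the encoding in one place, which helps once filter values start to contain spaces or quotes.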
Now a little bit on how providers like Netflix create these services. If a service provider exposes mechanisms for creating, updating and deleting data through an OData producer, and the information being exposed is in a relational database, they need to map the standard CRUD operations to HTTP methods. So create is mapped to POST, delete to DELETE, update to PUT and select to GET. The essential tasks of the producer are to parse the request, get the HTTP method, get the content, call the back end where the data is stored, perform the mapped operation and convert the retrieved data to Atom or JSON format before it is sent back to the client. Microsoft and open source projects such as odata4j offer libraries that make it easy to create your own producer. If you just want to consume OData services, there are many client libraries, such as the popular datajs.
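To make the CRUD-to-HTTP mapping concrete, here is a toy sketch in Python. It is not how any real producer library works: the class name, the status codes it returns and the in-memory dictionary standing in for the back-end database are all assumptions made for illustration.

```python
# CRUD-to-HTTP mapping as described in the paragraph above.
HTTP_TO_CRUD = {
    "POST": "create",
    "GET": "select",
    "PUT": "update",
    "DELETE": "delete",
}


class InMemoryProducer:
    """Toy stand-in for a producer; a dict plays the role of the back end."""

    def __init__(self):
        self.rows = {}

    def handle(self, method, key, body=None):
        """Dispatch one request and return (status_code, row_or_None)."""
        operation = HTTP_TO_CRUD[method.upper()]
        if operation == "create":
            self.rows[key] = dict(body)
            return 201, self.rows[key]
        if key not in self.rows:
            return 404, None
        if operation == "select":
            return 200, self.rows[key]
        if operation == "update":
            self.rows[key].update(body)
            return 200, self.rows[key]
        # Only "delete" is left at this point.
        del self.rows[key]
        return 204, None


if __name__ == "__main__":
    producer = InMemoryProducer()
    print(producer.handle("POST", "13bLK", {"Name": "Rod Stewart"}))
    print(producer.handle("GET", "13bLK"))
    print(producer.handle("DELETE", "13bLK"))
```

A real producer would also serialize the returned row to Atom or JSON; that step is left out here to keep the mapping itself visible.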
Now you are ready to create your own client.
Here is a snippet of code I got from the datajs tutorial. Save it as an HTML document and open it in any modern browser. You will see all the genres of all the content from Netflix. Before running the sample, don't forget to download the datajs-1.1.0.min.js file from here and save it to your c:\datajs directory.
I hope this post gives you enough information to try things on your own. I also hope to have another post on the same subject some time later. OData is actively used by companies such as Microsoft, SAP, Netflix, Facebook and eBay. Only time will tell whether this standard becomes a widely used technology, but it has already gathered enough of a following to deserve our attention. For more, see the Odata.org website. Good luck. Your comments are welcome. Please feel free to share if you like it.
Simple & Practical
Ideas and practical tips for software professionals.
Saturday, February 9, 2013
Tuesday, September 25, 2012
SAP Sybase ASE - An introduction for a newbie
Since the day it acquired Sybase, SAP has made great strides in porting its business suite to Sybase ASE (Adaptive Server Enterprise, the Sybase relational database). Last week's announcement from SAP is just one of the many milestones that symbolise its commitment to Sybase products and highlight the successes. ASE achieved the highest number in the SD benchmark on a 2 core HP/Linux machine. Check out the details here.
As SAP works towards its stated goal of becoming a premier database company, it is only natural that more and more SAP practitioners and enthusiasts become interested in ASE and look to use it not only for SAP applications but also for other in-house applications. For a Java programmer, familiarity with the JDBC API is enough to use a standard RDBMS. But once in a while one is required to connect directly to the underlying database to debug an issue. Unlike production systems, where you can depend on a DBA for support, in a development environment you are often left to deal with problems yourself. Quirks or peculiarities in a new relational database system can be a source of frustration. Some features that make total sense in one database may not work the same way in another. It's those little quirks that can drive one crazy when one is dealing with tight deadlines.
I have been working with SAP Sybase Adaptive Server Enterprise for more than a year now, and I feel comfortable sharing some bits and pieces I put together over that period. Much of this information is available on the web or in the product documentation, but digging through it can take time, and some of it is found in the most unlikely places. I had worked with other popular databases before ASE, and these are the features that seemed interesting, a bit different, or that I came across often. This is not a comprehensive list of ASE features, but it should save a newcomer some time in getting off the ground. So here it goes.
First, SAP allows free downloads of developer versions of most of the Sybase products. You can download the one you want from here. Consider revisiting this post before you install ASE.
1. Concept of devices and databases in ASE - They resemble the concepts of databases and schemas in Oracle.
2. Finding the version of the database you are running - Sybase products, including ASE, are evolving fast to cater to the SAP ecosystem, and major features are added fairly regularly. When something isn't working as expected, the first thing you may want to do before calling tech support is to find the version of the database you are running and refer to the product documentation here.
- dataserver -v
- your sybase home directory/ASE-15_0/install
- DUMP TRANSACTION your_database_name WITH TRUNCATE_ONLY
- DUMP TRANSACTION your_database_name WITH NO_LOG
Before you try the above commands, you may want to look at all the running connections/threads to ASE and their status by executing the sp_who stored procedure.
5. One of the most important things you need to decide while installing ASE is the page size. All data from one row of a table is colocated in one page to improve performance. Once defined, the page size cannot be changed for the installed ASE instance. The list below shows the relationship between page sizes and the corresponding maximum possible database sizes.
- 2K page size - 4 TB
- 4K page size - 8 TB
- 8K page size - 16 TB
- 16K page size - 32 TB
- select @@maxpagesize
7. Once the installation is complete you can run the Sybase Control Center application, which provides a GUI for all your ASE instances. Alternatively, you can use isql, a command line SQL interface, to interact with ASE. The super admin user name is 'sa' and the password can be left blank.
8. Use the following stored procedures to add users. Note that sp_addlogin just adds a login and not a user.
- sp_addlogin - adds only a login name
- sp_adduser - adds a new user.
- sp_role "grant", some_role, username
10. Before you try any SQL you may want to check the 'reserved words' of ASE by using the following command. I recommend going through this list if you are porting an existing application to ASE.
- select name from master..spt_values where type = "W"
11. It's a common practice to add an autoincrement column to a table and use it as a primary key. Autoincrement is achieved in ASE by declaring a column as IDENTITY. ASE does not allow more than one autoincrement column in a table. There is no 'sequence' as in Oracle. ASE allows you to insert user defined values into an autoincrement column after you run the following command.
- set identity_insert tablename on
12. If you have used MS SQL Server, you will find ASE's T-SQL very similar, as both share the same roots. T-SQL is similar to PL/SQL in Oracle.
13. ASE's native JDBC driver is called jConnect. jTDS, an open source driver, can also be used. jConnect supports changing connection properties through the connection URL. Just append 'property_name=property_value' to the connection URL. For example
- 'jdbc:sybase:tds:servername:portnumber/database?QUOTED_IDENTIFIER=ON'
- SELECT * from "dbo"."User_Table"
14. Also important to note is the difference in handling table aliases. Table aliases are allowed in 'SELECT' statements but not in 'UPDATE' or 'DELETE'. For example, the first SQL statement below is valid but the second returns an error.
- select * from tablename t1
- update tablename t1 set columnname=value where column2=value2
15. The Sybase jConnect driver depends on some key metadata information to work correctly. In the absence of this information you may receive a SQL exception with a description similar to 'missing metadata'. If this happens, it means that you missed a step during ASE installation, the step that installs the metadata information in the master database. This error can be eliminated by running a stored procedure after the installation. Refer to the documentation here.
16. 'Select for Update' was introduced in ASE 15.7 and you would expect it to work by default in a new install. Alas, no such luck. I highly recommend reading this document before you try this feature. You need the 'datarows' locking scheme on the table on which the select is performed. You can make 'datarows' the server-wide default locking scheme for newly created tables, or switch an existing table to it, using the following commands.
- sp_configure "lock scheme", 0, datarows
- alter table tablename lock datarows
17. ASE is case sensitive by default. To display the case sensitivity, sort order and character set settings, use the sp_helpsort stored procedure.
- Select * from sysobjects where type = 'U'
sysreferences, sysconstraints hold information on relationships between tables like foreign key constraints.
- sp_help tablename
- select sc.name from syscolumns sc, sysobjects so where sc.id=so.id and so.name='tablename'
20. Unlike Oracle, ASE truncates all trailing blanks while storing variable length varchar columns. So be careful if your code has comparisons on strings pulled from varchar columns.
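The effect of item 20 on application code can be sketched in plain Python, with no database involved. The function store_varchar below is a made-up stand-in that only mimics the trailing-blank truncation described above; it is not an ASE or driver API.

```python
def store_varchar(value):
    """Mimic the behaviour described in item 20: trailing blanks are dropped."""
    return value.rstrip(" ")


def same_ignoring_trailing_blanks(left, right):
    """Compare two strings the way the stored values would compare."""
    return left.rstrip(" ") == right.rstrip(" ")


if __name__ == "__main__":
    typed_by_user = "ACME  "                      # value the application holds
    read_back = store_varchar(typed_by_user)      # value after a round trip

    print(typed_by_user == read_back)             # naive comparison
    print(same_ignoring_trailing_blanks(typed_by_user, read_back))
```

A naive equality check between the value the application wrote and the value it reads back fails here, so it is safer to normalize both sides before comparing.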
I hope this list will help you install and navigate ASE in early stages. I mentioned these in particular because I had to use these commands more often than any others. Once you spend a little bit of time on ASE, you will of course come across other things that seem more useful. I will try to list those in my next post.
Sunday, April 8, 2012
Understanding Aleri - Complex Event Processing - Part II
Aleri, the complex event processing platform from Sybase, was reviewed at a high level in my last post.
This week, let's review Aleri Studio, the user interface to the Aleri platform, and the use of the pub/sub API, one of many ways to interface with the Aleri platform. The studio is an integral part of the platform and comes packaged with the free evaluation copy. If you haven't already done so, please download a copy from here. The fairly easy installation process of the Aleri product gets you up and running in a few minutes.
Aleri Studio is an authoring platform for building the model that defines interactions and sequencing between various data streams. It can also merge multiple streams to form one or more streams. With this Eclipse-based studio, you can test the models you build by feeding them test data and monitor the activity inside the streams in real time. Let's look at the various types of streams you can define in Aleri and their functionality.
Source Stream - Only this type of stream can handle incoming data. The operations that can be performed on the incoming data are insert, update, delete and upsert. Upsert, as the name suggests, updates the data if the key defining a row is already present in the stream. Otherwise, it inserts a record in the stream.
Aggregate Stream - This stream creates a summary record for each group defined by a specific attribute. This provides functionality equivalent to 'group by' in ANSI SQL.
Copy Stream - This stream is created by copying another stream but with a different retention rule.
Compute Stream - This stream allows you to apply a function to each row of data to get a new computed element for each row of the data stream.
Extend Stream - This stream is derived from another stream by adding column expressions.
Filter Stream - You can define a filter condition for this stream. Just like extend and compute streams, this stream applies filter conditions to other streams to derive a new stream.
Flex Stream - Significant flexibility in handling streaming data is achieved through custom coded methods. Only this stream allows you to write your own methods to meet special needs.
Join Stream - Creates a new stream by joining two or more streams on some condition. Both inner and outer joins can be used to join streams.
Pattern Stream - Pattern matching rules are applied with this stream.
Union Stream - As the name suggests, this joins two or more streams with the same row data structure. Unlike the join stream, this stream includes all the data from all the participating streams.
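The upsert behaviour described for the source stream can be sketched in a few lines of Python. This is a toy model, not the Aleri API: the class and method names are invented for this post, and treating an insert on an existing key or an update on a missing key as an error is my assumption.

```python
class SourceStream:
    """Toy keyed store that mimics the four operations of a source stream."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, row):
        if key in self.rows:
            raise KeyError(f"key {key!r} already present")  # assumed behaviour
        self.rows[key] = row

    def update(self, key, row):
        if key not in self.rows:
            raise KeyError(f"key {key!r} not found")  # assumed behaviour
        self.rows[key] = row

    def delete(self, key):
        del self.rows[key]

    def upsert(self, key, row):
        """Update the row when the key exists, otherwise insert it."""
        if key in self.rows:
            self.update(key, row)
        else:
            self.insert(key, row)


if __name__ == "__main__":
    tweets = SourceStream()
    tweets.upsert(1, {"text": "first version"})   # key is new: acts as insert
    tweets.upsert(1, {"text": "second version"})  # key exists: acts as update
    print(tweets.rows)
```

Upsert is convenient for feeds where the publisher cannot know whether a key has been seen before, which is typical of live data.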
By using some of these streams and the pub API of Aleri, I will demonstrate the segregation of a live Twitter feed into two different streams. The live Twitter feed is consumed by a listener from the Twitter4j library. If you just want to try the Twitter4j library first, please follow my earlier post 'Tracking user sentiments on Twitter'. The data received by the Twitter4j listener is fed to a source stream in our model by using the publication API from Aleri. In this exercise we will try to separate out tweets based on their content. Building on the example from my previous post, we will divide the incoming stream into two streams: one stream gets any tweets that contain 'lol' and the other gets tweets with a smiley face ":)" in the text. First, let's list the tasks we need to perform to make this a working example.
- Create a model with three streams
- Validate the model is error free
- Create a static data file
- Start the Aleri server and feed the static data file to the stream manually to confirm correct working of the model.
- Write java code to consume twitter feed. Use the publish API to publish the tweets to Aleri platform.
- Run the demo and see the live data as it flows through various streams.
Image 1 - Aleri Studio - the authoring view
This image is a snapshot of the Aleri Studio with the three streams - one on the left named "tweets" is a source stream and two on the right named "lolFilter" and "smileyFilter" are of the filter type. Source stream accepts incoming data while filter streams receive the filtered data. Here is how I defined the filter conditions -
like (tweets.text, '%lol%').
Here tweets is the name of the stream and text is the field in the stream we are interested in. '%lol%' means: select any tweets that have the string 'lol' in the content. Each stream has only 2 fields, id and text, which map to the id and text message sent by Twitter. Once you define the model, you can check it for any errors by clicking on the check mark in the ribbon at the top. Errors, if any, will show up in the panel at the bottom right of the image. Once your model is error free, it's time to test it.
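The two filter conditions can be imitated outside Aleri with a small Python sketch. It translates a SQL-style LIKE pattern into a regular expression and routes sample tweets to two lists named after the filter streams. The sample tweets are made up, and the assumption that the match is case sensitive is mine, not something I verified against Aleri.

```python
import re


def like(text, pattern):
    """SQL-style LIKE: '%' matches any run of characters, '_' exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


def split_tweets(tweets):
    """Route (id, text) rows to the two filter streams of the model."""
    lol_filter = [row for row in tweets if like(row[1], "%lol%")]
    smiley_filter = [row for row in tweets if like(row[1], "%:)%")]
    return lol_filter, smiley_filter


if __name__ == "__main__":
    sample = [
        (1, "lol that was funny"),
        (2, "nice day :)"),
        (3, "lol :)"),
        (4, "nothing to see here"),
    ]
    lol_rows, smiley_rows = split_tweets(sample)
    print(lol_rows)
    print(smiley_rows)
```

Note that a tweet matching both conditions lands in both lists, just as a row satisfying both filter conditions would appear in both filter streams.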
Image 2 - Aleri Studio - real time monitoring view
The image below shows the format of the data file used to test the model
The next image shows the information flow.
Image 3 - Aleri Publishing - Information flow
The source code for this exercise is at the bottom.
Remember that you need to have the twitter4j library in the build path and the Aleri server running before you run the program. Because I have not added any timer to the execution thread, the only way to stop the execution is to abort it. For brevity, I have removed all the exception handling and logging. The code uses only the publishing part of the pub/sub API of Aleri. I will demonstrate the use of the sub side of the API in my next blog post.
This blog intends to provide something simple but useful to the developer community. Feel free to leave your comment and share this article if you like it.
Friday, March 23, 2012
Aleri - Complex Event Processing - Part I
Image - Algo Trading - a CEP use case
Image - Fraud Detection - another CEP use case
In my previous blog post on Complex Event Processing, I demonstrated the use of Esper, the open source CEP software, and the Twitter4J API to handle a stream of tweets from Twitter. A CEP product is much more than handling just one stream of data though. A single stream of data can easily be handled through standard asynchronous messaging platforms and does not pose very challenging scalability or latency issues. But when it comes to consuming more than one real time stream of data and analyzing it in real time, and when correlation between the streams of data is important, nothing beats a CEP platform. The sources feeding a streaming platform can vary in speed, volume and complexity. A true enterprise class CEP should deal with real time high speed data like stock tickers and with slower but voluminous offline batch uploads with equal ease. Apart from providing standard interfaces, a CEP should also provide an easy programming language to query the streaming data and to generate continuous intelligence through such features as pattern matching and snapshot querying.
Image - Sybase Trading Platform - the RAP edition
If you are an existing user of any CEP product, I encourage you to compare Aleri with that product and share the comparison with the community or comment on this blog. By somewhat dated estimates, Tibco CEP is the biggest CEP vendor in the market. I am not sure how much market share another leading product, StreamBase, has. There is also a webinar you can listen to on Youtube.com that explains CEP benefits in general and some key features of StreamBase in particular. For newcomers, it serves as an excellent introduction to CEP and a capital markets use case.
An application on Aleri CEP is built by creating a model using the Studio (the GUI), using Splash (the language), or using the Aleri Modeling Language (ML), the final stage before it is deployed.
Following is a list of the key features of Splash.
- Data Types - Supports standard data types and XML. Also supports ‘Typedef’ for user defined data types.
- Access Control – a granular level access control enabling access to a stream or modules (containing many streams)
- SQL – another way of building a model. Building an Aleri studio model could take longer due to its visual paradigm. Someone proficient with SQL should be able to do it much faster using Aleri SQL which is very similar to regular SQL we all know.
- Joins - supported joins are Inner, Left, Right and Full
- Filter expressions - include Where, having, Group having
- ML - Aleri SQL produces a data model in the Aleri Modeling Language (ML). A proficient ML user might use only ML (in place of Aleri Studio and Aleri SQL) to build a model.
- The pattern matching language - includes constructs such as ‘within’ to indicate interval (sliding window), ‘from’ to indicate the stream of data and the interesting ‘fby’ that indicates a sequence (followed by)
- User defined functions – user defined function interface provided in the splash allows you to create functions in C++ or Java and to use them within a splash expression in the model.
Advanced pattern matching capabilities are explained through examples here. The following code segments and their explanations are taken directly from Sybase's documentation on Aleri.
The first example checks to see whether a broker sends a buy order on the same stock as one of his or her customers, then inserts a buy order for the customer, and then sells that stock. It creates a “buy ahead” event when those actions have occurred in that sequence.
within 5 minutes
from
BuyStock[Symbol=sym; Shares=n1; Broker=b; Customer=c0] as Buy1,
BuyStock[Symbol=sym; Shares=n2; Broker=b; Customer=c1] as Buy2,
SellStock[Symbol=sym; Shares=n1; Broker=b; Customer=c0] as Sell
on Buy1 fby Buy2 fby Sell
{
if ((b = c0) and (b != c1)) {
output [Symbol=sym; Shares=n1; Broker=b];
}
}
This example checks for three events, one following the other, using the fby relationship. Because the same variable sym is used in three patterns, the values in the three events must be the same. Different variables might have the same value, though (e.g., n1 and n2.) It outputs an event if the Broker and Customer from the Buy1 and Sell events are the same, and the Customer from the Buy2 event is different.
The next example shows Boolean operations on events. The rule describes a possible theft condition, when there has been a product reading on a shelf (possibly through RFID), followed by a non-occurrence of a checkout on that product, followed by a reading of the product at a scanner near the door.
within 12 hours
from
ShelfReading[TagId=tag; ProductName=pname] as onShelf,
CounterReading[TagId=tag] as checkout,
ExitReading[TagId=tag; AreaId=area] as exit
on onShelf fby not(checkout) fby exit
output [TagId=tag; ProductName=pname; AreaId=area];
The next example shows how to raise an alert if a user tries to log in to an account unsuccessfully three times within 5 minutes.
within 5 minutes
from
LoginAttempt[IpAddress=ip; Account=acct; Result=0] as login1,
LoginAttempt[IpAddress=ip; Account=acct; Result=0] as login2,
LoginAttempt[IpAddress=ip; Account=acct; Result=0] as login3,
LoginAttempt[IpAddress=ip; Account=acct; Result=1] as login4
on (login1 fby login2 fby login3) and not(login4)
output [Account=acct];
People wishing to break into computer systems often scan a number of TCP/IP ports for an open one, and attempt to exploit vulnerabilities in the programs listening on those ports. Here’s a rule that checks whether a single IP address has attempted connections on three ports, and whether those have been followed by the use of the “sendmail” program.
within 30 minutes
from
Connect[Source=ip; Port=22] as c1,
Connect[Source=ip; Port=23] as c2,
Connect[Source=ip; Port=25] as c3,
SendMail[Source=ip] as send
on (c1 and c2 and c3) fby send
output [Source=ip];
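To get a feel for what such a rule does, the failed-login example above can be approximated in plain Python. This is only a sketch of the idea (three failures for the same IP address and account inside a five minute window, with no successful login in between) and not a faithful implementation of the Aleri pattern semantics. The event layout and the function name are assumptions made for illustration.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60  # "within 5 minutes"


def failed_login_alerts(events):
    """Return accounts that saw three failed logins inside the window.

    events: iterable of (timestamp_seconds, ip, account, result) tuples,
    ordered by timestamp, where result 0 is a failure and 1 a success.
    """
    recent_failures = defaultdict(deque)  # (ip, account) -> failure times
    alerts = []
    for timestamp, ip, account, result in events:
        failures = recent_failures[(ip, account)]
        if result == 1:
            failures.clear()  # a successful login cancels the pattern
            continue
        failures.append(timestamp)
        # Drop failures that have slid out of the time window.
        while failures and timestamp - failures[0] > WINDOW_SECONDS:
            failures.popleft()
        if len(failures) >= 3:
            alerts.append(account)
            failures.clear()
    return alerts


if __name__ == "__main__":
    attempts = [
        (0, "10.0.0.1", "alice", 0),
        (60, "10.0.0.1", "alice", 0),
        (120, "10.0.0.1", "alice", 0),
    ]
    print(failed_login_alerts(attempts))
```

Writing even this one rule by hand means managing windows and state per key, which is exactly the bookkeeping a CEP pattern language takes off your hands.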
Aleri provides many interfaces out of the box for an easy integration with source and target systems. Through these interfaces/adapters the Aleri platform can communicate with standard relational databases, messaging frameworks like IBM MQ, sockets and file system files. Data in various formats like csv, FIX, Reuters market data, SOAP, http, SMTP is easily consumed by Aleri through standardized interfaces.
Following are available techniques for integrating Aleri with other systems.
- Pub/sub API, a standard pub/sub mechanism, is provided in Java, C++ and .NET
- SQL interface with SELECT, UPDATE, DELETE and INSERT statements used through ODBC and JDBC connection.
- Built in adapters for market data and FIX
Tuesday, March 13, 2012
Sybase IQ 15.4 - for big data analytics
The big data analytics platform Sybase IQ 15.4 Express Edition is now available for free. This article introduces Sybase IQ's big data features and shares some valuable resources with you. Sybase IQ, with an installed base of more than 4,500 clients all over the world, has always been a leading columnar database for mission critical analytic functions. With new features like native support for MapReduce and in-database analytics, it positions itself as a premium offering for big data analytics.
Columnar databases have been in existence for almost 2 decades now. Row level databases for OLTP transactions and Columnar databases for analytics, more or less met requirements of organizations. Extracting, transforming and loading (ETL) transactional data into analytic platform has been a big business. However, the recent focus on big data and the nature of it (semi-structured, unstructured) is changing the landscape of analytic platforms and the way ETL is used. Boundaries between OLTP and analytic platform are somewhat blurring due to advent of real-time analytics.
Due in large part to the tremendous growth of ecommerce in recent years, much of the developer talent pool was attracted to low-latency, high-throughput transactional systems. Without a doubt, the same pool is now gravitating towards the acquisition, management and analysis of semi-structured data. A host of start-ups and established technology companies are creating new tools, methods and methodologies in an effort to address the data deluge we are generating from our use of social networks. Even though there are a myriad of options available to handle big data and analytics, some clear trends and underlying technologies have come to the fore. Hadoop, the open source implementation for batch operations on big data, and in-memory analytic platforms that provide real-time business intelligence are two of the most significant trends. Established vendors like Oracle, IBM and Informatica have been adding new products or updating their existing offerings to meet the new demands in this space.
Business intelligence gathered from only one type of data, whether transactional, semi-structured from social media, or machine generated, is not good enough for today's organization. It's really important to glean information from each type of source and correlate one with the other to present comprehensive and accurate intelligence that assists critical decision support systems. So it's imperative that vendors provide a well-integrated set of tools that support structured, semi-structured and totally unstructured data like audio and video. "Polyglot Persistence", the buzzword of late made popular by Martin Fowler, addresses the need and nature of different types of data. You may have multiple systems for storage/persistence, but ultimately an enterprise needs a comprehensive view of its business through one lens. This is what Sybase IQ, the traditional columnar database product, is trying to provide by offering native support for Hadoop and MapReduce. There are other significant enhancements in the latest enterprise edition of Sybase IQ.
What makes this product attractive to the developer community is its free availability. Please note that it's not an evaluation copy, but a full function enterprise edition whose only restriction is a database size limit of 5 GB. This blog post is a quick summary of IQ's features and aims to point to important resources that will help you in trying it out. For the uninitiated, this blog entry by William McKnight provides an introduction to the concepts of columnar databases.
The following paragraph, taken directly from Sybase's web site, sums up the most important features.
Sybase® IQ 15.4 is revolutionizing “Big Data” analytics breaking down silos of data analysis and integrating it into enterprise analytic processes. Sybase IQ offers a single database platform to analyze different data – structured, semi-structured, unstructured – using different algorithms. Sybase IQ 15.4 expands these capabilities with the introduction of a native MapReduce API, advanced and flexible Hadoop integration, Predictive Model Markup Language (PMML) support, and an expanded library of statistical and data mining algorithms that leverage the power of distributed query processing across a PlexQ grid.
Some details on these features are in order.
- User Defined Functions Enabling Map Reduce - One of the best architecture practices in a three tier application architecture is to physically and logically separate business logic and data. But bringing data to the business logic layer and then moving it back to the persistence layer adds latency. Mission critical applications typically implement many strategies to reduce the latency, but the underlying theme of all those solutions is the same: keep business logic and data as close to one another as possible. Sybase IQ's native C++ API allows developers to build user defined functions that implement proprietary algorithms. This means you can build map reduce functions right inside the database, which according to Sybase can yield 10X performance improvements. This also means you can use ordinary SQL from the higher level business logic, keeping that layer simpler while taking advantage of map reduce based parallel processing for higher performance. The map reduce jobs are executed in parallel on the grid of servers, called Multiplex or PlexQ in Sybase IQ parlance.
- Hadoop Integration - We discussed the need for analyzing and correlating semi-structured data with structured data earlier. Sybase IQ provides 4 different ways in which this integration can happen. Hadoop is used to extract data points from unstructured or semi-structured data, which are then used with OLTP data for further analysis. Client-side Federation, ETL based, Query Federation and Data Federation are the ways in which Hadoop integration occurs. You can request more in-depth information here.
- PMML Support - PMML (Predictive Model Markup Language) support allows the user to create predictive models using popular tools like SAS and R. These models can then be executed in an automated fashion, further extending the already powerful analytics platform. A plug-in from Zementis provides PMML validation and transforms the models into Java UDFs.
- R Language Support - SQL is woefully inadequate when you need statistical analysis on structured data, but its simplicity and wide adoption make it an attractive query tool. The RJDBC interface in IQ allows an application to use the R programming language to perform statistical functions. R is a very popular open source programming language used in many financial applications today. Please read my blog entry 'Programming with R, it's super' for further information on R.
- In-Database Analytics - In its latest version, 15.4, Sybase IQ uses an in-database analytics library called DB Lytix from Fuzzy Logix. This analytics engine can perform advanced analytics through simple SELECT and EXECUTE statements. Sybase claims that some of the data mining algorithms among these analytical functions are able to leverage the MapReduce API. According to Fuzzy Logix's website, DB Lytix supports mathematical and statistical functions, Monte Carlo simulations (univariate and multivariate), data mining and pattern recognition, principal component analysis, linear regression, logistic regression, other supervised learning methods, and clustering.
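To make the map and reduce idea from the first bullet a little more concrete, here is a minimal sketch in plain Java. It is not Sybase IQ's C++ UDF API; the class name MapReduceSketch and the word-count task are assumptions chosen purely for illustration. The map step splits lines into words, the reduce step sums the occurrences per word, and a parallel stream stands in for the grid of servers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only: plain Java, not the Sybase IQ UDF API.
class MapReduceSketch {

    // Map: break every line into lower-case words.
    // Reduce: group identical words and count them.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.parallelStream()
                .flatMap(line -> Arrays.stream(line.toLowerCase(Locale.ROOT).split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data", "Big analytics");
        System.out.println(wordCount(lines));
    }
}
```

The calling code only sees a simple function call, which is the same separation the bullet describes: the heavy lifting runs in parallel next to the data while the caller stays simple.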
Sybase provides detailed documentation on the features of IQ on its website. If you would like to try Sybase IQ, use this direct link. Remember, this edition is not an eval copy. It is a full-featured IQ edition, limited only by a database size cap of 5GB.
As you can imagine, Sybase is not the only company offering big data solutions. This gigaom article lists a few others as well. With so many good options to choose from, users have to look at factors other than just the technology to make their selection. Some of those factors are the reputation of the brand, its standing in the market, the extent of its user base and the ecosystem around the product. In that regard, Sybase IQ scores very highly, which is the reason for its position in the leadership quadrant by Gartner. Sybase IQ has been a leading columnar database in the market since the 90s and has established a robust ecosystem around it. Sybase's other products support IQ and provide one of the most comprehensive analytics platforms in the industry: PowerDesigner, the leading data modelling workbench; SAP BusinessObjects for reporting and analytics; and Sybase Control Center for administration and monitoring of IQ.
Labels:
big data analytics,
columnar data,
columnar database,
db lytix,
hadoop,
IQ,
map reduce,
R,
Sybase,
sybase IQ
Monday, February 27, 2012
Tracking user sentiments on Twitter with Twitter4j and Esper
For newcomers to Complex Event Processing and the Twitter API, I hope this serves as a short tutorial and helps them get off the ground quickly.
Managing big data and mining useful information from it is the hottest discussion topic in technology right now. The explosive growth of semi-structured data flowing from social networks like Twitter, Facebook and LinkedIn is making technologies like Hadoop and Cassandra a part of every technology conversation. So as not to fall behind the competition, all customer-centric organizations are actively engaged in creating social strategies. What can a company get out of data feeds from social networks? Think location-based services, targeted advertisements and algorithmic equity trading for starters. IDC Insights has some informative blogs on the relationship between big data and business analytics. Big data in itself will be meaningless unless the right analytic tools are available to sift through it, explains Barb Darrow in her blog post on gigaom.com.
Companies often listen in to social feeds to learn about customers’ interest in or perception of their products. They are also trying to identify “influencers” (the people with the most connections in a social graph) so they can make better offers to such individuals and get better mileage out of their marketing. Companies involved in equity trading want to know which publicly traded companies are discussed on Twitter and what the users' sentiments about them are. From big companies like IBM to smaller start-ups, everyone is racing to make the most of the opportunities in big data management and analytics. Much documentation about big data, like this ebook from IBM, 'Big Data Platform', is freely available on the web. However, a lot of it covers theory only. Jouko Ahvenainen, in reply to Barb Darrow’s post above, makes a good point that “many people who talk about the opportunity of big data are on too general level, talk about better customer understanding, better sales, etc. In reality you must be very specific, what you utilize and how”.
It does sound reasonable, doesn't it? So I set out to investigate this a bit further by prototyping an idea, the only good option I know. If I could do it, anybody could do it. The code is remarkably simple, but that's exactly the point. Writing a CEP framework yourself is quite complex, but using one is not. In the same way, Twitter makes it really easy to get to the information through its REST API.
[Figure: Big Data. Source: http://www.bigdatabytes.com/managing-big-data-starts-here/]
Complex Event Processing (CEP), which I blogged about previously (click here to read), is a critical component of the big data framework. Along with CEP, frameworks such as Hadoop are used to compile, parse and make sense of the 24x7 stream of data from the social networks. Today, Twitter's streaming API and CEP can be used together to capture the happiness levels of Twitter users. The code I present below listens in to live tweets and generates a 'happy' event every time “lol” is found in the text of a tweet. CEP is used to capture the happy events, and an alert is raised every time the count of happy events exceeds a predetermined number within a predetermined time period. The assumption that a user is happy every time he or she uses “lol” is very simplistic, but it helps get the point across. In practice, gauging users' sentiment is not that easy because it involves natural language analysis. Consider the example below, which highlights the complexities of analyzing natural language.
Iphone has never been good.
Iphone has never been so good.
As you can see, the addition of just one word to the sentence completely changed its meaning. For this reason, natural language processing is considered one of the toughest problems in computer science. You can learn natural language processing using free online lectures offered by Stanford University. This link takes you directly to the first lecture on natural language analysis by Christopher Manning. But, in my opinion, the pervasive use of abbreviations in social media, and in modern lingo in general, is making the task a little bit easier. Abbreviations like “lol” and “AFAIK” accurately project the meaning. The use of “lol” projects “funny”, and “AFAIK” may indicate that the user is “unsure” of himself or herself.
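The abbreviation idea can be sketched as a simple lookup. The snippet below is only an illustration in plain Java: the class name AbbreviationSentiment and the two-entry table are my assumptions, built from the “lol” and “AFAIK” examples above, and a real system would need a far richer vocabulary.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps well-known abbreviations to a coarse sentiment label.
class AbbreviationSentiment {

    private static final Map<String, String> CUES = new LinkedHashMap<>();
    static {
        CUES.put("lol", "funny");     // from the example in the text
        CUES.put("afaik", "unsure");  // from the example in the text
    }

    // Returns the label of the first known abbreviation in the tweet, if any.
    static Optional<String> classify(String tweet) {
        if (tweet == null) {
            return Optional.empty();
        }
        // Lower-case the text and split on anything that is not a letter.
        for (String token : tweet.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            String label = CUES.get(token);
            if (label != null) {
                return Optional.of(label);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(classify("AFAIK the update ships today"));
        System.out.println(classify("iPhone has never been so good."));
    }
}
```

Note that the second sentence gets no label at all: the lookup sidesteps the hard natural language problem rather than solving it.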
The code presented below uses the Twitter4j API to listen to the live Twitter feed and Esper CEP to listen to events and alert us when a threshold is met. You can download Twitter4j binaries or source from http://twitter4j.org/en/index.html and Esper from http://esper.codehaus.org/ . Before you execute the code, make sure to create a Twitter account if you don’t have one, and also read Twitter’s guidelines and concepts for its streaming API here. Authentication with just a username and password combination is currently allowed by Twitter, but it is going to be phased out in favor of OAuth authentication in the near future. Also, pay close attention to their ‘Access and Rate Limit’ section. The code below uses the streaming API in one thread. To avoid hitting the rate limit, please do not use another thread at the same time. Hitting rate limits consistently can result in Twitter blacklisting your Twitter ID. It is also important to note that the streaming API does not send each and every tweet our way. Twitter typically samples the data, sending 1 out of every 10 tweets our way. This is not a problem for us, however, as long as we are interested in patterns in the data and not in any specific tweet. Twitter offers a paid service for businesses that need streaming data with no rate limits. The following diagram shows the components and the processing of data.
[Diagram: components and processing of data. Charts & DB not yet implemented in the code.]
Listing 1. Standard java bean representing a happy event.
Listing 2. Esper listener is defined.
Listing 3. Twitter4j listener is created. This listener and the CEP listener start listening. Every Twitter post is parsed for ‘lol’. Every time ‘lol’ is found, a happy event is generated. The CEP listener raises an alert every time the total count of ‘lol’ exceeds 2 in the last 10 seconds.
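The logic described in the listings can be sketched without either library. What follows is a plain Java stand-in, not the Twitter4j or Esper API: the class names HappyEvent, HappyWindow and TweetScanner, the bean's fields and the hard-coded sample tweets are all assumptions made for illustration. In Esper the same rule would be declared as a statement over a time window rather than coded by hand; here a small queue plays that role.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Pattern;

// Plain bean for a happy event; the fields are assumed, not taken from the original listing.
class HappyEvent {
    private final String user;
    private final long timestampMillis;

    HappyEvent(String user, long timestampMillis) {
        this.user = user;
        this.timestampMillis = timestampMillis;
    }

    public String getUser() { return user; }
    public long getTimestampMillis() { return timestampMillis; }
}

// Hand-rolled stand-in for the CEP rule: more than `threshold` events within `windowMillis`.
class HappyWindow {
    private final Deque<HappyEvent> window = new ArrayDeque<>();
    private final long windowMillis;
    private final int threshold;

    HappyWindow(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    // Adds the event, drops events older than the window, and reports whether to alert.
    boolean onEvent(HappyEvent event) {
        window.addLast(event);
        long cutoff = event.getTimestampMillis() - windowMillis;
        while (!window.isEmpty() && window.peekFirst().getTimestampMillis() <= cutoff) {
            window.removeFirst();
        }
        return window.size() > threshold;
    }

    int size() { return window.size(); }
}

// Decides whether a tweet contains the word "lol" (whole word, any case).
class TweetScanner {
    private static final Pattern LOL = Pattern.compile("\\blol\\b", Pattern.CASE_INSENSITIVE);

    static boolean containsLol(String text) {
        return text != null && LOL.matcher(text).find();
    }
}

class HappyEventDemo {
    public static void main(String[] args) {
        HappyWindow window = new HappyWindow(10_000, 2); // 10 seconds, alert above 2 events
        String[] tweets = {"lol that was great", "just landed", "LOL!", "lol again"};
        long now = 0;
        for (String tweet : tweets) {
            now += 1_000; // pretend one tweet arrives per second
            if (TweetScanner.containsLol(tweet) && window.onEvent(new HappyEvent("demo", now))) {
                System.out.println("Alert: more than 2 happy events in the last 10 seconds");
            }
        }
    }
}
```

In the real prototype the sample array would be replaced by the live Twitter stream and the queue by the CEP engine, which is exactly the part that is hard to write yourself but easy to use.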
The code establishes a long-running thread to get Twitter feeds. You will see output on the console every time the threshold is met. Please remember to terminate the program; it doesn't terminate on its own.
Now that you have this basic functionality working, you can extend this prototype in a number of ways. You can handle additional data feeds (from sources other than Twitter) and use Esper to correlate data from the two feeds. For visually appealing output, you can feed the results to a charting library. For example, every time Esper identifies an event, the data point can be used to render a point on a line graph. If you track the ‘happy event’ this way, the graph will essentially show the ever-changing level of happiness of Twitter users over a period of time.
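As a rough idea of what the charting extension might need, here is a small plain Java sketch that turns event timestamps into points for a line graph. The class name HappinessSeries and the bucket size are assumptions; no particular charting library is implied.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts happy events per fixed-size time bucket.
class HappinessSeries {

    // Key: start of the bucket in milliseconds. Value: number of events in that bucket.
    static Map<Long, Integer> bucketCounts(long[] eventTimesMillis, long bucketMillis) {
        if (bucketMillis <= 0) {
            throw new IllegalArgumentException("bucketMillis must be positive");
        }
        Map<Long, Integer> series = new TreeMap<>();
        for (long t : eventTimesMillis) {
            long bucketStart = Math.floorDiv(t, bucketMillis) * bucketMillis;
            series.merge(bucketStart, 1, Integer::sum);
        }
        return series;
    }

    public static void main(String[] args) {
        long[] times = {500, 9_999, 10_000, 25_000};
        // Each entry is one (x, y) point on the line graph.
        System.out.println(bucketCounts(times, 10_000));
    }
}
```

Because the result is sorted by bucket start, each entry can be handed to a chart as one point in time order.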
Please use the comment section for your feedback, +1 to share, and let me know if you would like to see more postings on this subject.
Friday, February 24, 2012
End of ERP as we know it?
A friend of mine on Facebook drew my attention to this blog post, 'End of ERP' by Tien Tzuo on Forbes.com. With the professional lives of millions tied to ERP in some way, I can imagine the buzz this post must be creating. Since SAP is the biggest ERP software maker in the world and the parent company of my employer, I read it with interest. So as not to be influenced by others' arguments, I haven't read any responses to this post yet.
If you haven't already, you can read the original post by Tien Tzuo here.
To get your opinion on this matter, I have created a short survey of only 5 questions that you can access by clicking here. I will publish the results of the survey soon. A link to the survey also appears at the bottom of this post for your convenience. In my opinion, this notable post (its reputation derives from the fact that it appeared on Forbes) is heavily biased, as many posts often are. Could Tien's earlier job as a marketing officer at SalesForce.com be the reason? Predicting the end of something epic or of a most trusted technology is sure to generate a lot of buzz, which is what bloggers often set out to do. The post would have been a lot better and more valuable had he compared ERP's strengths and weaknesses and explained why the weaknesses are so glaring that ERP customers would be willing to walk away from ERP, something so crucial to their existence. A case that lacks even a semblance of honest acknowledgment of the other side of the argument cannot succeed.
In support of his argument, Tien mentions some key changes in consumer behavior and consumption patterns. The main theme of 'End of ERP' is that the change in the ways customers engage with a company is driving ERP to its inevitable death. Services-based consumption is rapidly increasing, but it can be applied only to so many things. By focusing on this alone, isn’t Tien forgetting the business processes around other product segments? Food, energy, health, vehicles: there are simply too many things we cannot subscribe to and consume remotely. All the standard functions of an ERP are still required for those sectors, aren't they? A customer may stop buying cars and instead rent from Zipcar, but cars will still have to be made, sold and bought. How would companies manage their businesses and have consolidated views of them without ERP?
[Figure: ERP modules. Credit: http://www.abouterp.com/]
Tien also mentions companies like SalesForce.com and touts their successes as proof that companies are moving away from ERP. SalesForce doesn’t offer anything other than CRM, does it? Does it provide the finance, HR or materials management modules of ERP? I guess not. You can’t run a big company effectively by mishmashing different services from ten different vendors. That's why ERP exists and will keep its market share in the enterprise segment. I do agree, however, that cloudification (I know, I know, it's not a word in the English dictionary) of business functions is an irreversible trend. Oracle's and SAP’s acquisitions of Taleo and SuccessFactors, respectively, are an indication of their grudging acceptance of this fact. The key to their success is not the demand for ERP in the cloud, which is ever present, but their ability to integrate the acquired companies and their products to provide the same kind of comprehensive tool set as ERP.
“End of ERP” concludes by highlighting some key business requirements that, according to Tien, are not met by ERP today. Without going into details, suffice it to say that ERP is not meant to be a silver bullet for all business problems. It does what it does, while ERP providers and their ecosystem try to find solutions to the unresolved business problems. Doesn’t business intelligence (BI) software aim to solve the kind of issues he mentions? The point is that there are a number of ways to mine the information you need. The importance of BI is undeniable, and that's what vendors are investing millions in. The enormous response to SAP's in-memory analytics appliance HANA is just one example of how innovative products will meet the business requirements of today. While the business problems mentioned in the post may be genuine, they simply highlight opportunities for ERP’s improvement and do not in any way spell doom for it.
Make your voice heard! Take the Survey
Labels:
ERP,
Forbes,
HANA,
Opinion,
SalesForce,
SAP,
SuccessFactors,
Taleo