Big data analytics platform, Sybase IQ 15.4 Express Edition, is now available for free. This article introduces Sybase IQ's big data features and shares some valuable resources with you. Sybase IQ, with its installation base of more than 4,500 clients all over world has always been a leading columnar database for mission critical analytic functions. With new features like native support for Map Reduce and in-database analytics, it positions itself as a premium offering for big data analytics.
Columnar databases have been in existence for almost 2 decades now. Row level databases for OLTP transactions and Columnar databases for analytics, more or less met requirements of organizations. Extracting, transforming and loading (ETL) transactional data into analytic platform has been a big business. However, the recent focus on big data and the nature of it (semi-structured, unstructured) is changing the landscape of analytic platforms and the way ETL is used. Boundaries between OLTP and analytic platform are somewhat blurring due to advent of real-time analytics.
Due in large part to the tremedous growth of ecommerce in recent years, much of the developer talent pool was attracted to low-latency and high throuput transactional systems. Without a doubt, the same pool is gravitating towards aquisition, management and analysis of semi-structured data now. Host of start-ups and established technology companies are creating new tools, methods and methodologies in an effort to address the data deluge we are generating from our use of social networks. Even if there are myriad of options available to handle big data and analytic, some clear trends or underlying technologies have emerged to the fore. Hadoop, the open source implementation for batch operations on big data and in-memory analytic platform to provide real-time business intelligence are two of the most significant trends. Established vendors like Oracle, IBM, Informatica have been adding new products or updating their existing offerings to meet the new demands in this space.
Business intelligence gathered through only one type of data, transactional or semi-structured from social media or machine generated, is not good enough for today's organization. It's really important to glean information from each type of source and co-relate one with the other to present a comprehensive and accurate intelligence that assists critical decision support systems. So, it's imperative that vendors provide a set of tools with good enough integration that support structured, semi-structured and totally unstructured data like audio and video. "Polyglot Persistence", the buzzword of late and made popular by Martin Fowler, addresses the need and nature of different types of data. You may have multiple systems for storage/persistence but ultimately an enterprise needs a comprehensive view of its business through one lense. This is what Sybase's IQ, the traditional columnar database product, is trying to provide by offering native support for Hadoop and Mapreduce. There are other singificant enhancements to the latest enterprise edition of Sybase IQ. What makes this product attractive to the developer community is, its free availability. Please note, it's not an evaluation copy, but is a full function enterprise edition with the only restriction of databas size at 5 GB. This blog post is a quick summary of IQ's features and aims to point important resources that will help you in trying it out.
For uninitiated, this blog entry by William McKnight, provides an introduction to concepts of columnar databases.
Following paragraph directly taken from Sybase's web site, sums up the most important features.
Sybase® IQ 15.4 is revolutionizing “Big Data” analytics breaking down silos of data analysis and integrating it into enterprise analytic processes. Sybase IQ offers a single database platform to analyze different data – structured, semi-structured, unstructured – using different algorithms. Sybase IQ 15.4 expands these capabilities with the introduction of a native MapReduce API, advanced and flexible Hadoop integration, Predictive Model Markup Language (PMML) support, and an expanded library of statistical and data mining algorithms that leverage the power of distributed query processing across a PlexQ grid.
Some details on these features is in order.
- User Defined Functions Enabling Map Reduce - One of the best architecture practices in a three tier application architecture is to physically and logically separate business logic and the data. But bringing data to business logic layer and then moving it back to persistence layer adds latency. The mission critical applications typically implement many strategies to reduce the latency, but the underlying theme to all those solutions is the same. That is, to keep business logic and data as close to one another as possible. Sybase IQ's native c++ API allows developers to build user defined functions to implement proprietory algorithms. This means, you can build map reduce functions right inside the database that can yield 10X performance improvements. This also means, you can use ordinary SQL from higher level business logic keeping that layer simpler while taking advantage of map reduce based parallel processing for higher performance. The map reduce jobs are executed in parallel on the grid of servers, called Multiplex or PlexQ in Sybase IQ parlance.
- Hadoop Integration - We discussed the need for analyzing and co-relating semi-structured data with structured data earlier. Sybase IQ provides 4 different ways in which this integration can happen. Hadoop is used to extract data points from unstructured or semi-structured data and then is used with OLTP data for further analysis. Clien-side Federation, ETL based, Query Federation and Data Federation are ways in which Hadoop integration occurs. You can request more in-depth information here.
- PMML Support - PMML (Predictive Model Markup Language) support allows the user to create predictive models using popular tools like SAS and R. These models can be executed in automated fashion extending the already powerful analytics platform further. A plug-in from Zementis is used to provide the PMML validation and its transformation to Java UDFs.
- R Language Support - SQL is woefully inadequate when you need statistical analysis on structured data. But, the simplicity and wide adoption of SQL makes it an attractive query tool. The RJDBC interface in IQ allows an application to use the R programming language to perform statistical functions. R is a very popular open source programming language used in many financial applications today. Please read my blog entry 'Programming with R, it's super' for further information on R.
- In Database Analytics - Sybase IQ in its latest version15.4, uses 'in database' analytics library called DB Lytix from Fuzzy Logix. This analytics engine have the ability to perform advance analytics through simple SELECT and EXECUTE statements. Sybase claims that some of these analytical functions are able to leverage MapReduce API in some data mining algorithms. DB Lytix, according to Fuzzy Logic's website, supports Mathematical and Statistical functions, Monte Carlo Simulations- uni-variate and multi-variate, Data Mining ; Pattern Recognition , Principal Component Analysis, Linear Regression, Logistic Regression, Other supervised learning methods, and Clustering.
Sybase provides detail documentation on the features of IQ on Sybase's website. If you would like to try Sybase IQ, use this direct link. Remember, this edition is not an eval copy. It is a full featured IQ edition limited only by database size at 5GB.
As you can imagine, Sybase IQ is not the only company offering big data solutions. This gigaom article lists few others as well. With so many good options to choose from, the users have to look at factors other than just the technology to make the selection. Some of those factors are - the reputation of the brand, standing in the market, the extent of its user base and the eco-system around the product. In that regard, Sybase IQ scores very high points and is the reason for it's position in leadership quadrant by Gartner. Sybase IQ has been a leading columnar database in the market since 90's and has established a robust eco-system around it. Sybase's other products - Power Designer , the leading data modelling workbench, SAP Business Objects for reporting and analytics, and Sybase Control Center for administration and monitoring of IQ - support the IQ and provid one of the most comprehensive analytics platforms in the industry.