Tuesday, November 22, 2011

R programming - It's super!

Last week Sybase announced that its market data analytics platform, Sybase RAP - The Trading Edition, now supports R. The R programming language enables faster algorithm development and the handling of huge amounts of data, so that the resulting analysis can be delivered to traders, risk managers and quantitative analysts. For more information on the offering please click here. Sybase also sponsored a webinar on this subject with researchers from Yale University. To listen to the webcast click here.

R's official website (http://www.r-project.org/) provides much of the information you need if you want to try it, but to save you some time, I would like to summarize what I have learnt so far from my quick review of R.

Handling and analyzing huge data sets, performing statistical calculations and rendering the results in very professional charts are its strengths. Its performance absolutely blew me away. I tried some of its features on a csv file with 100,000 records and multiple fields, and the results were generated instantaneously. Remarkable! According to one of my friends, R is used heavily not only in financial companies but also in pharmaceutical and research companies where gene sequencing and analysis take place. Bank of America uses R in quantitative analysis. Having spent 10 years with the bank previously, I was surprised I didn't know about it then. The point is, you may not have heard about R because it is not a mainstream programming language, but it is widely used in some industries. Just to show how powerful this language is, these 2 lines of code are enough to read a csv file with 100,000 lines that uses "|" as the separator character and render a scatter plot of the values from 2 of its columns (Durtn & Curs).


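# Read a pipe-delimited file whose first row holds the column names.
myFile <- read.csv("Your FileName", header=TRUE, sep="|")
# Scatter plot of the Durtn column (x axis) against the Curs column (y axis).
plot(myFile$Durtn, myFile$Curs, main="Duration vs records", xlab="Duration", ylab="Records retrieved")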
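
If you don't have a suitable file at hand, here is a minimal sketch of the same two steps that you can run as is. It first writes a small pipe-delimited file with made-up values, then reads it back and plots it. The column names Durtn and Curs are the ones from my file; the random data, the temporary file paths and the choice to draw into a PDF rather than on screen are assumptions made only for this illustration.

```r
# Build a small pipe-delimited file so the example runs on its own.
# The values are random and purely illustrative.
set.seed(42)
sample_data <- data.frame(
  Durtn = round(runif(1000, min = 0, max = 60), 2),
  Curs  = sample(1:5000, 1000, replace = TRUE)
)
sample_path <- tempfile(fileext = ".csv")
write.table(sample_data, sample_path, sep = "|",
            row.names = FALSE, quote = FALSE)

# Step 1: read the file; the first row holds the column names.
my_file <- read.csv(sample_path, header = TRUE, sep = "|")

# Step 2: draw the scatter plot. pdf() sends the chart to a file,
# so this also works where no screen is available.
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
plot(my_file$Durtn, my_file$Curs,
     main = "Duration vs records",
     xlab = "Duration", ylab = "Records retrieved")
invisible(dev.off())
```

In an interactive R session you can drop the pdf() and dev.off() lines and the chart opens in a window instead.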

Google R for more information, but here is the gist.
Some of the features worth mentioning are:

  • R is a language and a development environment built for statistical computing and graphics.
  • It's an open source project.
  • It was built to address some of the shortcomings of its predecessor, called "S".
  • It has classes, objects and methods, concepts similar to those of any object-oriented language.
  • Its functionality is not as broad as that of Java or C++, but it addresses a niche, and within that niche it is way easier to use than Java or C++.
  • Lists are ordered collections of objects, with no need for the objects to be of the same type.
  • Data frames are a fantastic concept. R handles tabular data sets with many columns, each of which can have its own type, through data frames.
  • Reading a file is as easy as assigning a value to a variable.
  • Accessing a column in a file and running statistical functions on it is accomplished in just one step.
  • Charting and graphing are done in one step.
  • Multivariate analysis, a set of techniques dedicated to the analysis of data sets with more than one variable, can be done effectively in R. A use case of multivariate testing is working out which arrangement of the assets on a web page produces the most effective user behavior (one that yields multiple clicks).
  • The output can be sent to the console or to a file.
  • Packages are available to plug R into some popular IDEs.
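
To make a few of these points concrete, here is a rough sketch of lists, data frames, one-step column statistics, a simple multivariate technique and output to a file. The quotes table and its values are invented for illustration, iris is a sample data set that ships with R (it has nothing to do with my file), and principal component analysis via prcomp() is just one example of a multivariate technique, not the only option.

```r
# A list is an ordered collection whose elements can have different types.
trade <- list(symbol = "ABC", quantity = 100L, prices = c(10.5, 10.7, 10.6))

# A data frame is a table whose columns can each have their own type.
quotes <- data.frame(
  symbol = c("ABC", "ABC", "XYZ", "XYZ"),
  price  = c(10.5, 10.7, 20.0, 21.0),
  volume = c(100, 150, 200, 250),
  stringsAsFactors = FALSE
)

# Statistics on a column take one call each.
mean_price    <- mean(quotes$price)
price_summary <- summary(quotes$price)

# The same statistic per group: mean price for each symbol.
mean_by_symbol <- tapply(quotes$price, quotes$symbol, mean)

# Multivariate analysis: principal components of the four numeric
# columns of the built-in iris data set, scaled to unit variance.
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Send output to a file instead of the console, then switch back.
out_path <- tempfile(fileext = ".txt")
sink(out_path)
print(price_summary)
print(summary(pca))
sink()
```

Typing an object's name at the R prompt, for example mean_by_symbol, prints it to the console, which is the quickest way to inspect each result.
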
In case you are wondering what other software/languages are used for statistical analysis, please read this thread.
http://www.reddit.com/r/programming/comments/7fg6i/why_are_sasstata_the_default_statistical_tools/

Downloading and trying R is really easy. Just give it a try and let me know what you think.

