R is first and foremost a statistical package, in contrast to Matlab, a commercial, general-purpose analytical toolkit. R is an open source implementation of the S programming language, which is also implemented in the commercial package S-Plus. It is freely available under the GNU General Public License on multiple platforms and can be downloaded at http://www.r-project.org. R is very similar in function to S-Plus (the syntax is nearly identical), contains built-in tools for time series, regression, etc., has graphing and visualization capabilities, and can be used as a programming environment.
The R programming language is rich and well suited for data operations. The argument for it is similar to that for using a high-level language for programming, or for using complex event processing (CEP). The infrastructure work is done for you, and practitioners can focus on what they're trying to accomplish and not the mechanics of how to do it. Additionally, the ease of access and cost are very important to the investment industry in today's cost-conscious environment. Finally, R can be extended for specific purposes, and support available through the R community includes discussion forums, mailing lists, public documentation, etc.
Given the open source nature of R and the community around it, usage continues to rise rapidly. There is a very large contributor base, and new add-on packages are being created all the time. The cost makes it extremely compelling for universities to use in financial engineering courses, so it is now widely used in academia, and many graduates enter the workforce with extensive R experience. When these people join firms, whether on the sell-side or buy-side, their first instinct is to use the statistical analysis package they already know, so use of R is likely to keep expanding in the coming years.
The following simple example merges two trade time series for different symbols and produces a derived time series containing the points where the spread between the two instruments exceeds 0.5:
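library(zoo)                      # provides na.locf() and merge() for time series
xy <- na.locf(merge(x, y))        # align both series, carrying the last value forward
xy[abs(xy$x - xy$y) > .5, "y"]    # y values where the spread exceeds 0.5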
While this may not be intuitive to everyone on first reading, it is instantly intuitive to members of the R community doing time series analysis. No learning curve for proprietary languages is required!
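For readers who want to try the idiom, here is a minimal, self-contained sketch. It assumes the zoo package is installed and that x and y are zoo series; the timestamps and prices are invented purely for illustration and do not come from any real feed.

```r
# Sketch only: assumes the zoo package is installed. The timestamps and
# prices below are made up for illustration.
library(zoo)

# Trades in two symbols arrive at different times (seconds since the open).
x <- zoo(c(10.0, 10.2, 10.9), order.by = c(1, 3, 6))
y <- zoo(c(10.1, 10.1, 10.2), order.by = c(2, 4, 5))

# merge() takes the union of the timestamps and fills the gaps with NA;
# na.locf() then carries the last observed price forward in each column
# and, by default, trims leading rows that still contain an NA.
xy <- na.locf(merge(x, y))

# Keep the y prices at the times where the spread exceeds 0.5.
wide <- xy[abs(xy$x - xy$y) > 0.5, "y"]
print(wide)
```

With these made-up inputs, only the final timestamp should qualify, since x has moved to 10.9 while the last trade in y was at 10.2. An xts object would be a reasonable alternative container; zoo is used here only because na.locf() is defined there.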
R is not the only statistics package available, but it compares well with its commercial competitors, is easy to use and familiar to many practitioners, has a large and diverse supporting community and does not incur product licensing costs or require proprietary language expertise. This addresses the two-fold challenge of data volume and complexity of analysis present in today's data analysis world. The ultimate advantage is the ability to use R packages and the R language combined with a high performance data management platform, making quantitative analysis easier instead of harder.





