OpenMX, again

With help from Michael Spiegel, the lead developer of OpenMX, I was able to get it compiled and installed on my 64-bit Ubuntu system. Now it runs beautifully.
OpenSSH software for Windows 7 / Vista
AMCMC
http://www.probability.ca/amcmc/
A brief survey of R web interfaces
I’m looking at ways to provide access to R via a web application. First rule: see what’s available before you reinvent the wheel. It’s not pretty.
From the R Web Interfaces FAQ:
| Software | Brief notes |
|---|---|
| Rweb | Page last updated 1999. Of the 3 example links on the page one ran very slowly, the second not at all and the third is broken. |
| R-Online | Or rather, not online. Unless this CGI form is the same thing. I tried Example 1, it returned a server error. |
| Rcgi | Links to several CGI forms, none of which worked for me. |
| CGI-based R access | Link did not load. |
| CGIwithR | Package now maintained at Omegahat. Did not attempt installation. Last updated 2005. |
| Rpad | I could not connect to this URL. |
| RApache | The pick of the bunch. Provides server-side access to R through an Apache module. I was able to install RApache on 32-bit (but not 64-bit) Ubuntu 9.10 and get it running. Could use more documentation. |
| Rserve | Serves R via TCP/IP. Last updated 2006. |
| OpenStatServer | Broken link. No longer exists, so far as I can tell. |
| R PHP Online | Link out of date (but you can follow it to the newer page). Last updated 2003, so unlikely to be much use. |
| R-php | Last updated 2006; the example that I tried gave a server error. |
| webbioc | A Bioconductor package. Did not investigate further. |
| Rwui | An application to create R web interfaces. My browser hung at “waiting for cache”. I gave up. |
So, aside from RApache and some very old-fashioned and/or broken CGI scripts, I conclude that there is little interest in writing beautiful, modern statistical web applications (notable exception). Not so much a case of “reinventing” as “inventing”.
Posted in computing, R, research diary, statistics, web resources

Back from Tokyo
Lisa and I turned this into a brief one-week trip to Kyoto and Tokyo, and we had a truly wonderful time on what was our first visit to Japan. I should blog some more about it, but now I will give in to the jet lag and catch up on some sleep...
Design of Experiments – Optimal Designs
When designing an experiment, it is not always possible to generate a regular, balanced design such as a full or fractional factorial design plan. There are usually restrictions on the total number of experiments that can be undertaken, or constraints on the factor settings, either individually or in combination with each other.
In these scenarios, computer-generated optimal designs of a given size can be identified from a list of candidate factor combinations. The R package AlgDesign has facilities for optimal design searches based on the Federov exchange algorithm. The investigator has to select an optimality criterion (currently D, A or I), and the algorithm searches for the subset of the candidate list, of the given size, that minimises this criterion.
Given the total number of treatment runs for an experiment and a specified model, the computer algorithm chooses the optimal set of design runs from a candidate set of possible design treatment runs. This candidate set of treatment runs usually consists of all possible combinations of various factor levels that one wishes to use in the experiment.
First stage, as always, is to make the package available for use:
library(AlgDesign)
For illustration, consider a four-factor experiment in which the factors have 4, 3, 2 and 2 levels respectively. Using the expand.grid function, we can create a data frame of all possible combinations of the factor settings:
cand.list = expand.grid(Factor1 = c("A", "B", "C", "D"),
                        Factor2 = c("I", "II", "III"),
                        Factor3 = c("Low", "High"),
                        Factor4 = c("Yes", "No"))
The random number seed is set so that the results are reproducible, since the algorithm starts from randomly chosen designs:
set.seed(69)
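optFederov reports the chosen runs by their row numbers in cand.list, so it helps to know how expand.grid orders the 4 × 3 × 2 × 2 = 48 combinations: the first factor varies fastest. The helper decode_row below is hypothetical, written only to illustrate that ordering; it recovers the level index of each factor for any candidate row using integer division and remainders.

```r
# Full candidate list, as in the post: 4 * 3 * 2 * 2 = 48 combinations
cand.list <- expand.grid(Factor1 = c("A", "B", "C", "D"),
                         Factor2 = c("I", "II", "III"),
                         Factor3 = c("Low", "High"),
                         Factor4 = c("Yes", "No"))
nrow(cand.list)   # 48

# Hypothetical helper: expand.grid varies the first factor fastest, so a
# row number decodes into level indices like digits in a mixed-radix number
decode_row <- function(r, n_levels = c(4, 3, 2, 2)) {
  idx <- numeric(length(n_levels))
  r0 <- r - 1                       # work with a 0-based offset
  for (f in seq_along(n_levels)) {
    idx[f] <- r0 %% n_levels[f] + 1 # level index of factor f (1-based)
    r0 <- r0 %/% n_levels[f]        # move on to the next, slower factor
  }
  idx
}

decode_row(46)    # 2 3 2 2, i.e. levels B, III, High, No
cand.list[46, ]
```

Row 46 decodes to B, III, High, No, which matches the last run in the design that optFederov returns in the output below.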
The function optFederov calculates an exact or approximate algorithmic design for one of three criteria, using Federov’s exchange algorithm. The first argument to the function is a formula for the intended model for the data and the data argument specifies the list of candidate points:
optFederov( ~ ., data = cand.list, nTrials = 13)
In this example, all of the factors in the candidate list enter the model as main effects; because they are categorical, each factor contributes one term per level beyond the first. Interactions, or quadratic and cubic terms for numeric factors, can also be included in the formula. The argument nTrials specifies the number of design points to select from the candidate list. The output from this function is:
$D
[1] 0.226687

$A
[1] 7.022811

$Ge
[1] 0.718

$Dea
[1] 0.676

$design
   Factor1 Factor2 Factor3 Factor4
3        C       I     Low     Yes
6        B      II     Low     Yes
12       D     III     Low     Yes
16       D       I    High     Yes
19       C      II    High     Yes
21       A     III    High     Yes
25       A       I     Low      No
26       B       I     Low      No
29       A      II     Low      No
35       C     III     Low      No
39       C       I    High      No
44       D      II    High      No
46       B     III    High      No

$rows
 [1]  3  6 12 16 19 21 25 26 29 35 39 44 46
The output gives the values of the optimality criteria for the selected design, the factor levels of the chosen design points, and their row numbers in the candidate list.
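To make the idea of an exchange search concrete, here is a minimal base-R sketch, assuming a main-effects model coded with R's default treatment contrasts. It is a simplified first-improvement point exchange, not AlgDesign's implementation of Federov's algorithm, and the helpers logdet_info, simple_exchange and design_criteria are hypothetical. The search swaps a design row for a candidate row whenever that increases det(X'X), which is the same as decreasing the generalized variance that D-optimality minimises. The main-effects model has 1 + 3 + 2 + 1 + 1 = 8 parameters, so 13 trials leave 5 degrees of freedom. The D and A formulas used here, det(M)^(1/k) and trace(M^-1)/k with M = X'X/n, are stated as assumptions; because factor coding may differ from AlgDesign's, the numbers need not match the output above.

```r
cand.list <- expand.grid(Factor1 = c("A", "B", "C", "D"),
                         Factor2 = c("I", "II", "III"),
                         Factor3 = c("Low", "High"),
                         Factor4 = c("Yes", "No"))

# Main-effects model matrix (assumption: default treatment contrasts)
X <- model.matrix(~ ., data = cand.list)
ncol(X)   # 8 = intercept + 3 + 2 + 1 + 1

# Hypothetical helper: log det(X'X) for the chosen rows (-Inf if singular)
logdet_info <- function(X, rows) {
  d <- determinant(crossprod(X[rows, , drop = FALSE]), logarithm = TRUE)
  if (d$sign <= 0) -Inf else as.numeric(d$modulus)
}

# Hypothetical simplified exchange search (first improvement, not Federov's
# best-pair exchange): keep swapping while any single swap raises log det
simple_exchange <- function(X, n, max_iter = 100) {
  N <- nrow(X)
  stopifnot(n >= ncol(X))           # need at least as many runs as parameters
  tries <- 0
  repeat {                          # random start that estimates every parameter
    rows <- sample(N, n)
    current <- logdet_info(X, rows)
    tries <- tries + 1
    if (is.finite(current)) break
    if (tries >= 1000) stop("no non-singular random start found")
  }
  for (iter in seq_len(max_iter)) {
    improved <- FALSE
    for (i in seq_len(n)) {
      for (j in setdiff(seq_len(N), rows)) {
        trial <- rows
        trial[i] <- j               # swap design row i for candidate row j
        val <- logdet_info(X, trial)
        if (val > current + 1e-9) {
          rows <- trial
          current <- val
          improved <- TRUE
        }
      }
    }
    if (!improved) break            # no single swap helps: local optimum
  }
  list(rows = sort(rows), logdet = current)
}

# Hypothetical helper: normalised D and A criteria (assumed scaling M = X'X / n)
design_criteria <- function(X, rows) {
  Xd <- X[rows, , drop = FALSE]
  M <- crossprod(Xd) / nrow(Xd)
  k <- ncol(Xd)
  c(D = det(M)^(1 / k), A = sum(diag(solve(M))) / k)
}

set.seed(69)
res <- simple_exchange(X, n = 13)
res$rows
design_criteria(X, res$rows)

# The same criteria evaluated for the rows optFederov chose in the post
design_criteria(X, c(3, 6, 12, 16, 19, 21, 25, 26, 29, 35, 39, 44, 46))
```

A single-swap search like this can stop at a local optimum, which is one reason AlgDesign repeats its search from several random starts.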
Creating Customized Packages in SAS Software
It seems there is a little-known component called SAS Toolkit that enables you to create customized SAS commands.
I am still trying to find actual uses of this software, but essentially it lets you add your own customizations to SAS. The Toolkit reportedly costs 12,000 USD a year, but academics could be encouraged to write theses or projects implementing newer algorithms, using standard SAS discounts. In addition, there is currently no licensing constraint on reselling your customized SAS algorithms (but check with SAS in Cary, NC, or www.sas.com before you go ahead and develop).
So if you have an existing open source R package and someone wants to port it to the SAS language or SAS software, they could use the SAS Toolkit to re-implement the algorithm (the source code for which is, to my knowledge, mostly open in R) in one of the languages the Toolkit supports. Specific candidates are graphics, Hmisc, Pl.ier, or even lattice and clustering packages (like mclust). Or they could even license it.
Citation: http://www.sas.com/products/toolkit/index.html
SAS/TOOLKIT®
SAS/TOOLKIT software enables you to write your own customized SAS procedures (including graphics procedures), informats, formats, functions (including IML and DATA step functions), CALL routines, and database engines in several languages including C, FORTRAN, PL/I, and IBM assembler.
SAS Procedures
A SAS procedure is a program that interfaces with the SAS System to perform a given action. The SAS System provides services to the procedure such as:
- statement processing
- data set management
- memory allocation
SAS Informats, Formats, Functions, and CALL Routines (IFFCs)
You can use SAS/TOOLKIT software to write your own SAS informats, formats, functions, and CALL routines in the same choice of languages: C, FORTRAN, PL/I, and IBM assembler. Like procedures, user-written functions and CALL routines add capabilities to the SAS System that enable you to tailor the system to your site’s specific needs. Many of the same reasons for writing procedures also apply to writing SAS formats and CALL routines.
SAS/TOOLKIT Software and PROC FORMAT
You may wonder why you should use SAS/TOOLKIT software to create user-written formats and informats when base SAS software includes PROC FORMAT. SAS/TOOLKIT software enables you to create formats and informats that perform more than the simple table lookup functions provided by the FORMAT procedure. When you write formats and informats with SAS/TOOLKIT software, you can do the following:
- assign values according to an algorithm instead of looking up a value in a table.
- look up values in a database to assign formatted values.
Writing a SAS IFFC
The routines you are most likely to use when writing an IFFC perform the following tasks:
- provide a mechanism to interface with functions that are already written at your site
- use algorithms to implement existing programs
- handle problems specific to the SAS environment, such as missing values.
SAS Engines
SAS engines allow data to be presented to the SAS System so it appears to be a standard SAS data set. Engines supplied by SAS Institute consist of a large number of subroutines, all of which are called by the portion of the SAS System known as the engine supervisor.
However, with SAS/TOOLKIT software, an additional level of software, the engine middle-manager, simplifies how you write your user-written engine.
An Engine versus a Procedure
To process data from an external file, you can write either an engine or a SAS procedure. In general, it is a good idea to implement data extraction mechanisms as procedures instead of engines. If your applications need to read most or all of a data file, you should consider creating a procedure, but if they need random access to the file, you should consider creating an engine.
Writing SAS Engines
When you write an engine, you must include in your program a prescribed set of routines to perform the various tasks required to access the file and interact with the SAS System. These routines:
- open and close the data set
- obtain information about variables
- provide information about an external file or database
- read and write observations.
In addition, your program uses several structures defined by the SAS System for storing information needed by the engine and the SAS System. The SAS System interacts with your engine through the SAS engine middle-manager.
Using the USERPROC Procedure
Before you run your grammar, procedure, IFFC, or engine, use SAS/TOOLKIT software’s USERPROC procedure.
- For grammars, the USERPROC procedure produces a grammar function.
- For procedures, IFFCs, and engines, the USERPROC procedure produces a program constants object file, which is necessary for linking all of the compiled object files into an executable module.
Compile and link the output of PROC USERPROC with the SAS System so that the system can access the procedure, IFFC, or engine when a user invokes it.
Using User-Written Procedures, IFFCs, and Engines
After you have created a SAS procedure, IFFC, or engine, you need to tell the SAS System where to find the module in order to run it. You can store your executable modules in any appropriate library. Before you invoke the SAS System, use operating system control language to specify the fileref SASLIB for the directory or load library where your executables are stored. When you invoke the SAS System and use the name of your procedure, IFFC, or engine, the SAS System checks its own libraries first and then looks in the SASLIB library for a module with that name.
Debugging Capabilities
The TLKTDBG facility allows you to obtain debug information concerning SAS routines called by your code, and works with any of the supported programming languages. You can turn this facility on and off without having to recompile or relink your code. Debug messages are sent to the SAS log. In addition to the SAS/TOOLKIT internal debugger, the C language compiler used to create your extension to the SAS System can be used to debug your program.
The SAS/C Compiler, the VMS Compiler, and the dbx debugger for AIX can all be used.
NOTE: SAS/TOOLKIT software is used to develop procedures, IFFCs, and engines. Users do not need to license SAS/TOOLKIT software to run procedures developed with the software.
March 2008: Level B support is effective beginning January 1, 2008 until December 31, 2009.
March 2005: The SAS/C and SAS/C++ compiler and runtime components are reclassified as SAS Retired products for z/OS, VM/ESA and cross-compiler platforms. SAS has no plans to develop or deliver a new release of the SAS/C product.
The SAS/C and SAS/C++ family of products provides a versatile development environment for IBM zSeries® and System/390® processors. Enhancements and product features for SAS/C 7.50F include support for z/Architecture instructions and 64-bit addressing, IEEE floating-point, C99 math library and a number of C++ language enhancements and extensions. The SAS/C runtime library, optimizer and debugging environments have been updated and enhanced to fully support the breadth of C/C++ 64-bit addressing, IEEE and C++ product features.
Finally, the SAS/C and SAS/C++ 7.50.06 Cross-compiler products for Windows, Linux, Solaris and AIX incorporate the same enhancements and features that are provided with SAS/C and SAS/C++ 7.50F for z/OS.
Also see: http://support.sas.com/kb/15/647.html
Posted in Analytics Tagged: algorithms, base sas, R, r-project, rstats, SAS, sas toolkit, sas/c, statistical software, syntax, translating

News on R Commercial Development -Rattle- R Data Mining Tool
R RANT: While the European R Core leadership, led by the Great Dane, Peter Dalgaard, focuses on the small picture and virtually hands the whole commercial side to Prof Nie and David Smith at Revo Computing, other, smaller package developers have refused to be treated as cheap R and D developers for enterprise software. How are the book sales coming along, Prof Peter? Any plans to write another R book, or are you done with writing your version of Mathematica (Ref-Newton)? Running the R Core project team must be so hard that I recommend the Tarantino movie “Inglorious B…” for Herr Doktors. -END
I believe that individual R package creators like Prof Harrell (Hmisc) or Hadley Wickham (plyr) deserve a share of the royalties or REVENUE earned by Revolution Computing, or by ANY software company that uses R.
On this note, here is some updated news on Rattle, the data mining tool created by Dr Graham Williams. Once again, R development is being taken forward by the chaps Down Under while the Big Guys thrash out the road map across the Pond.
Data Mining Resources
Citation: http://datamining.togaware.com/
Rattle is a free and open source data mining toolkit written in the statistical language R using the Gnome graphical interface. It runs under GNU/Linux, Macintosh OS X, and MS/Windows. Rattle is being used in business, government, research and for teaching data mining in Australia and internationally. Rattle can be purchased on DVD (or made available as a downloadable CD image) as a standalone installation for $450USD ($560AUD).
The free and open source book, The Data Mining Desktop Survival Guide (ISBN 0-9757109-2-3), simply explains the otherwise complex algorithms and concepts of data mining, with examples to illustrate each algorithm using the statistical language R. The book is being written by Dr Graham Williams, based on his 20 years of research and consulting experience in machine learning and data mining. An electronic PDF version is available for a small fee from Togaware ($40AUD/$35USD to cover costs and ongoing development).
Other Resources
- The Data Mining Software Repository makes available a collection of free (as in libre) open source software tools for data mining
- The Data Mining Catalogue lists many of the free and commercial data mining tools that are available on the market.
- The Australasian Data Mining Conferences are supported by Togaware, which also hosts the web site.
- Information about the Pacific Asia Knowledge Discovery and Data Mining series of conferences is also available.
- A Data Mining course is taught at the Australian National University.
- See also the Canberra Analytics Practise Group.
- A Data Mining Course was held at the Harbin Institute of Technology Shenzhen Graduate School, China, 6 December – 13 December 2006. This course introduced the basic concepts and algorithms of data mining from an applications point of view and introduced the use of R and Rattle for data mining in practice.
- A Data Mining Workshop was held over two days at the University of Canberra, 27-28 November, 2006. This course introduced the basic concepts and algorithms for data mining and the use of R and Rattle.
Using R for Data Mining
The open source statistical programming language R (based on S) is in daily use in academia and in business and government. We use R for data mining within the Australian Taxation Office. Rattle is used by those wishing to interact with R through a GUI.
R is memory-based, so on 32-bit CPUs you are limited to smaller datasets (perhaps 50,000 up to 100,000, depending on what you are doing). Deploying R on 64-bit multiple-CPU (AMD64) servers running GNU/Linux with 32GB of main memory provides a powerful platform for data mining.
R is open source, which assures us that we will always have the opportunity to fix and tune things to suit our specific needs, rather than having to convince a vendor to fix or tune their product.
Also, because R is open source, we can be sure that the code will always be available, unlike some of the data mining products that have disappeared (e.g., IBM’s Intelligent Miner).
See earlier interview:
http://decisionstats.wordpress.com/2009/01/13/interview-dr-graham-williams/
Posted in Analytics Tagged: david smith, inference for R, nie, peter dalgaard, R, R core, rattle, revolution computing, SAS, SPSS

Probability of hypercubes…
…in R of course!
There is a handy function for these calculations. Normally (ahh!) you might resort to a symbolic calculation package (Maple, Mathematica, etc.), but that is no longer necessary: the sadmvn function in the mnormt package computes the probability that a multivariate normal vector falls inside a hyperrectangle. Relevant functions exist in other packages as well (R : Distributions).
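library(mnormt)   # provides sadmvn()
# x, y and z are not used by the sadmvn() call below
x <- seq(-2, 4, length = 21)
y <- 2*x + 10
z <- x + cos(y)
# mean vector and covariance matrix of a trivariate normal
mu <- c(1, 12, 2)
Sigma <- matrix(c(1, 2, 0, 2, 5, 0.5, 0, 0.5, 3), 3, 3)
# P(X1 < 2, X2 < 11) for the bivariate normal formed by the first two components
p2 <- sadmvn(lower = rep(-Inf, 2), upper = c(2, 11), mu[1:2], Sigma[1:2, 1:2])
> p2
[1] 0.3273202
attr(,"error")
[1] 2e-16
attr(,"status")
[1] "normal completion"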
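As a quick sanity check that needs only base R (added for illustration, not part of the original post), a Monte Carlo estimate should land close to the sadmvn value. The sketch draws points from the same bivariate normal, mean c(1, 12) and covariance Sigma[1:2, 1:2], by multiplying independent standard normals by a Cholesky factor, then counts the fraction that fall in the region X1 < 2, X2 < 11. The sample size of 200,000 is an arbitrary choice.

```r
set.seed(1)
mu <- c(1, 12)                          # first two components of mu above
Sigma <- matrix(c(1, 2, 2, 5), 2, 2)    # equals Sigma[1:2, 1:2] above
n <- 200000                             # number of simulated points (arbitrary)

R <- chol(Sigma)                        # upper triangular, t(R) %*% R equals Sigma
Z <- matrix(rnorm(n * 2), nrow = n)     # independent N(0, 1) draws
X <- sweep(Z %*% R, 2, mu, "+")         # each row is a draw from N(mu, Sigma)

inside <- X[, 1] < 2 & X[, 2] < 11      # points inside the (semi-infinite) box
p_mc <- mean(inside)
se_mc <- sqrt(p_mc * (1 - p_mc) / n)    # binomial standard error of the estimate
c(estimate = p_mc, std_error = se_mc)
```

The estimate should agree with 0.3273202 to within a few standard errors; sadmvn is far more accurate for the same effort, which is why it is the better tool in practice.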

PBSadmb for R
http://code.google.com/p/pbs-admb/
And here is yet another R to ADMB interface: http://r-forge.r-project.org/projects/r2admb/



