Wednesday, August 25, 2010

Drawing boxplots, violin plots using R Part 1

A box plot is a graphical display of the distribution of data, showing the quartiles and any possible outliers. Assuming the box plot is drawn vertically, the lower edge of the rectangular box denotes the first quartile (Q1), while the upper edge denotes the third quartile (Q3). The median is denoted by a line inside the box. Some versions also indicate the position of the arithmetic mean by a cross or a dot. Whiskers are drawn out to the most extreme data values lying within 1.5(fs) of the lower and upper quartiles, where fs is the fourth spread, the difference Q3 - Q1. Each data point beyond these limits is drawn individually and labelled an outlier. The box plot, however, cannot display the shape of the distribution, especially for multimodal data.
A box plot can be drawn for each column of a matrix.
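
The whisker rule described above can be sketched in a few lines of Python. This is only an illustration: the function name box_stats and the sample values are made up for this example, and the quartiles come from Python's statistics.quantiles with the "inclusive" method, which is an assumption and may differ slightly from the hinges that R's boxplot uses.

```python
import statistics

def box_stats(data, k=1.5):
    """Return quartiles, fourth spread, whisker ends and outliers of a sample."""
    xs = sorted(data)
    # Assumption: 'inclusive' quartiles; other conventions give slightly
    # different values for small samples.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    fs = q3 - q1                       # fourth spread, Q3 - Q1
    lower_fence = q1 - k * fs
    upper_fence = q3 + k * fs
    # Whiskers end at the most extreme data values still inside the fences.
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "fs": fs,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]   # made-up data with one extreme value
stats = box_stats(sample)
```

For this sample the value 30 lies beyond Q3 + 1.5(fs) and is therefore reported as an outlier, while the upper whisker stops at 9.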

The violin plot addresses this shortcoming of the box plot by adding a KDE (kernel density estimate) to outline the distribution of the data. Out of the box, R draws only the box plot. There are at least two packages which offer violin plots. One is the violinplot function from the UsingR package of Verzani. Another is the vioplot package, which offers the vioplot function.
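
To make the KDE idea concrete, here is a minimal Python sketch of a Gaussian kernel density estimate. It is not the code used by violinplot or vioplot; the function name gaussian_kde, the sample data and the hand-picked bandwidth of 0.5 are all assumptions made for illustration.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density of `data` at a point x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per data point.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]        # made-up bimodal sample
density = gaussian_kde(data, bandwidth=0.5)   # bandwidth chosen by hand
grid = [i * 0.1 for i in range(0, 71)]        # 0.0, 0.1, ..., 7.0
outline = [density(x) for x in grid]          # half-width of the "violin"
```

Mirroring the values in outline to the left and right of a vertical axis gives the violin shape; the two bumps near 1 and 5 are exactly the kind of multimodality a box plot hides.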

Here is an illustration of the differences between boxplot, violinplot and vioplot

library(UsingR)
library(vioplot)

png("box-viol.png", 6*72, 6*72)
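# Mix three normal samples (85 points in all). c() concatenates them,
# whereas rbind() would recycle the shorter samples to length 50.
X <- c(rnorm(50, 5, 2), rnorm(25, 1), rnorm(10, 3))
violinplot(X,X,X)
vioplot(X, at=2, col="green", add = TRUE)
boxplot(X, at=1, col="red", add = TRUE)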
dev.off()
Three violin plots are shown, with the boxplot and the vioplot superimposed on the first and second plot, respectively.
[Figure: Box and Violin Plots example]
Notice that, in its desire to look more like a violin, the vioplot will sometimes cut off at Q3 + 1.5 fs or at Q1 - 1.5 fs, which may hide outlier points!






             orientation?    positioning?   outliers?   matrix?   dataframe?
boxplot      both            yes            yes         yes       yes
violinplot   vertical only   no*            no          no        yes
vioplot      both            yes            no          no        no


In orientation, the box plot and vioplot can be drawn horizontally, and each can be positioned at a specific location on the x or y axis using the graphics parameter at="value". As the figure above shows, the vioplot may stop at the fence values, creating a flat top or flat bottom and hiding the extreme values, specifically the minimum and maximum of the data. The violin plot does show the minimum and maximum of the data, but it is hard to tell where the quartiles and the 1.5 fs limits lie. Neither violinplot nor vioplot can handle matrix input; you have to pass each column of the matrix to these functions separately.

The boxplot may have an optional notch to emphasize the location of the median.

In my opinion, a violin plot with a box plot superimposed is currently the best way to show the distribution and any multiple modalities of the data.

We are still deciding what input format to adopt for our online solver at extreme-solvers.blogspot.com, which we shall show in Part 2 of this article.

We hope that the developers of these plots will implement the features available in the others, such as vioplot accepting dataframes.

Monday, August 23, 2010

Drawing scatterplots and sunflower plots with R

Scatter plots and sunflower plots differ only in that the latter draws an extra petal (a short line segment) for each repeat of a data point. In other words, it makes it easy to see multiplicities of data points that are hidden in a plain scatter plot. Points are compared up to a specified number of significant digits.

Here are examples of plain scatter plot and sunflower plots for the same data (two column xy coordinates)


1 3
1 3
1 4
1.5 4
1.5 4
1.5 4
1.75 5
1.75 1.25
2 5
2 6
2 6
2 6
2 6
2 6
2 8


Here is the scatter plot drawn for the above data. Compare the number of points drawn with the number of points in the data. They don't match!


On the other hand, the sunflower plot gives the viewer, most of the time, an idea of the number of points at each location in the graph. Here is the sunflower plot for the same data.
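
The mismatch is easy to verify by counting. The following Python sketch tallies how often each (x, y) pair occurs in the data listed above, after rounding to a number of significant digits. The helper name multiplicities and the choice of 6 digits are assumptions for illustration, not the solver's code.

```python
from collections import Counter

# The fifteen (x, y) pairs listed in this post.
points = [
    (1, 3), (1, 3), (1, 4),
    (1.5, 4), (1.5, 4), (1.5, 4),
    (1.75, 5), (1.75, 1.25),
    (2, 5), (2, 6), (2, 6), (2, 6), (2, 6), (2, 6), (2, 8),
]

def multiplicities(pts, digits=6):
    """Count repeated points, comparing coordinates to `digits` significant digits."""
    def sig(v):
        return float("%.*g" % (digits, v))
    return Counter((sig(x), sig(y)) for x, y in pts)

counts = multiplicities(points)
for point, count in sorted(counts.items()):
    print(point, count)
```

The fifteen data rows collapse to eight distinct locations, which is why a plain scatter plot appears to show fewer points than the data contains; the point (2, 6) alone occurs five times.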


Our solver which offers a scatterplot/sunflower plot generation page is in
http://extreme.adorio-research.org/solvers/rplotpage/scatter/

Drawing three-dimensional plot specified by a function f(x,y) in R

Some of the plots described previously in this blog actually give visualizations of functions in two variables. Examples are the quiver and contour plots. Another one is a perspective plot which can be created by calling the appropriately named plotting function persp() of R.

Here is a simple example: f(x,y) = sin(x) + cos(y), whose figure is displayed below:


It can be quite messy to do this from the R console. The solver page provides a fill-in-the-blanks settings page which describes the figure. By way of illustration, here is the R code for the figure above, with line numbers. Intermediate output is shown interspersed, without a leading ">" prompt; in this listing it appears right after the summary() call on the z values.

0001 > png("tmpZzY7bz.png", width = (6)*72, height = (6)*72)
0002 > f <-function(x, y){sin(x) + cos(y)}
0003 > x <- seq(-10, 10, length.out = 100)
0004 > y <- seq(-10, 10, length.out = 100)
0005 > tridata <- outer(x, y, f)
0006 > z <- tridata
0007 >     summary(as.vector(z))
0008 Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0009 -1.99900 -0.79430 -0.04088 -0.06207  0.63990  1.99600
0010 >
0011 > z0   <- min(z) - (max(z) - min(z)) / 100
0012 > z    <- rbind(z0, cbind(z0, z, z0), z0)
0013 > x    <- c(min(x) - 1e-10, x, max(x) + 1e-10)
0014 > y    <- c(min(y) - 1e-10, y, max(y) + 1e-10)
0015 > fill <- matrix("green3", nr = nrow(z) - 1, nc = ncol(z) - 1)
0016 > fill[, i2 <- c(1, ncol(fill))] <- "gray"
0017 > fill[i1 <- c(1, nrow(fill)), ] <- "gray"
0018 > fcol <- fill
0019 > fcol[] <- terrain.colors(nrow(fcol))
0020 > fcol   <- fill
0021 > zi <- tridata[-1, -1] + tridata[-1, -ncol(tridata)] + tridata[-nrow(tridata), -1] + tridata[-nrow(tridata), -ncol(tridata)]
0022 >
0023 > fcol[-i1, -i2] <- topo.colors(20)[cut(zi, quantile(zi, seq(0, 1, len = 20 + 1)), include.lowest = TRUE)]
0024 >
0025 > persp(x, y, 1 * z, theta = 135, phi = 30, col = fcol, scale = FALSE, ltheta=-120,lphi=0,shade = 0.75, border = NA, ticktype = "simple", box = FALSE)
0026 > title(main = "3D Plot", font.main = 4)
0027 > par(bg = "slategray")
0028 >
0029 > graphics.off()
0030 > #img tmpZzY7bz.png
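
For readers more at home in Python, the grid evaluation in steps 0002 to 0006 of the listing can be sketched as follows. The helper names seq and outer are written here only to mimic the R functions of the same name; this is an illustration of the idea, not part of the solver.

```python
import math

def f(x, y):
    return math.sin(x) + math.cos(y)

def seq(lo, hi, length_out):
    """Evenly spaced values from lo to hi, like R's seq(lo, hi, length.out=n)."""
    step = (hi - lo) / (length_out - 1)
    return [lo + i * step for i in range(length_out)]

def outer(xs, ys, func):
    """Matrix with z[i][j] = func(xs[i], ys[j]), like R's outer(x, y, f)."""
    return [[func(x, y) for y in ys] for x in xs]

x = seq(-10, 10, 100)
y = seq(-10, 10, 100)
z = outer(x, y, f)

flat = [v for row in z for v in row]
z_min, z_max = min(flat), max(flat)
print(len(z), len(z[0]), z_min, z_max)
```

Since sin and cos each stay within [-1, 1], every z value lies within [-2, 2], which agrees with the range shown in the summary output of the listing.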

The solver page which generates 3d figures is at /solvers/rplotpage/trid.

The part of the menu page for the settings of 3d plots is shown here:

Drawing quantile-quantile qq plots with R

Quantile-quantile (QQ) plots are a visually appealing way to judge the extent of normality of a vector of data, or to compare the distributions of the two columns of a two-column matrix. Data is either a single column (Y) or two columns (X, Y). If nvars is 1, the page will display qqnorm(y). Otherwise, if nvars is 2, it will display qqplot(x,y). Data is always read row-wise (byrow=TRUE).

Optionally, a line passing through Q1 and Q3 is drawn if Line? is TRUE. If "data on X axis" is TRUE, the data values will be shown on the X axis; otherwise they are shown on the Y axis.
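
As a rough idea of what a normal QQ plot pairs up, here is a Python sketch. It assumes the plotting positions (i - a)/(n + 1 - 2a), with a = 3/8 for n <= 10 and a = 1/2 otherwise, which is how R's ppoints function is documented; the function names and the sample are made up for illustration.

```python
from statistics import NormalDist

def ppoints(n):
    """Plotting positions, following the rule documented for R's ppoints."""
    a = 3.0 / 8.0 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]

def qq_normal(sample):
    """Pair each sorted data value with the matching standard normal quantile."""
    ys = sorted(sample)
    xs = [NormalDist().inv_cdf(p) for p in ppoints(len(ys))]
    return list(zip(xs, ys))

sample = [4.1, 5.3, 2.2, 6.8, 5.0, 3.9, 4.7, 5.9, 4.4]   # made-up data
pairs = qq_normal(sample)
for theoretical, observed in pairs:
    print("%8.4f  %5.2f" % (theoretical, observed))
```

If the data are roughly normal, the printed pairs fall close to a straight line when plotted.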


A qq plot may be generated by our solver at /solvers/rplotpage/qq

Drawing quiver plots with R

Our extreme solvers site, at solvers/rplotpage/quiver,
allows one to draw a quiver plot (or arrows plot) over a displayed image of a function in two variables of the form $$fxy= f(x,y)$$. It is based on R routines by Ripley and Hand, as illustrated in the addictedtor.org gallery, on which our qpy code is based.

Here is an example for the function (or expression) $$fxy= \sin(x) + \cos(y)$$. Here we have the settings xlo = ylo = -2.0, xhi = yhi = 2.0 and xby = yby = 0.2. The contour color is gray and the palette for the image is terrain.colors. Here is the output image.
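
The arrows of a quiver plot follow the gradient of the function at each grid point. The Python sketch below computes them for the example settings above using central finite differences; note that this is a swapped-in technique for illustration, since the R code in this blog differentiates the expression symbolically with D(). All names here are hypothetical.

```python
import math

def f(x, y):
    return math.sin(x) + math.cos(y)

def gradient(func, x, y, h=1e-6):
    """Central-difference estimate of (df/dx, df/dy) at (x, y)."""
    gx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    gy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return gx, gy

def grid(lo, hi, by):
    """Values lo, lo + by, ..., hi, similar to R's seq(lo, hi, by=by)."""
    n = int(round((hi - lo) / by))
    return [lo + i * by for i in range(n + 1)]

xs = grid(-2.0, 2.0, 0.2)
ys = grid(-2.0, 2.0, 0.2)

# One (x, y, dx, dy) tuple per grid point: the tail and direction of an arrow.
arrows = [(x, y) + gradient(f, x, y) for x in xs for y in ys]
```

For this function the exact gradient is (cos(x), -sin(y)), so the arrow at the origin points along the positive x axis.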


Note that we have shown a test code to generate quiver plots previously.

Generating quiver plots using R under Python

Here is Python code to generate an R script which in turn does the actual image generation.

"""
quiver-test.py
"""

def quiverfunc(fxy, x0, x1, xby, y0, y1, yby, pallette="terrain.colors", contourcolor="gray"):
   S = """
par.uin <- function() 
  # determine scale of inches/userunits in x and y
  # from http://tolstoy.newcastle.edu.au/R/help/01c/2714.html
  # Brian Ripley Tue 20 Nov 2001 - 20:13:52 EST
 {
    u <- par("usr") 
    p <- par("pin") 
    c(p[1]/(u[2] - u[1]), p[2]/(u[4] - u[3]))
 }

quiver2 <- function(expr,
                     x,
                     y,
                     nlevels=20, 
                     length=0.05, 
                     ...){

    z <- expand.grid(x,y) 
    xx  <- x
    x   <- z[,1]
    yy  <- y
    y   <- z[,2]

    fxy <- eval(expr) 
    grad_x <- eval(D(expr, "x")) 
    grad_y <- eval(D(expr, "y")) 

  dim(fxy) <- c(length(xx), length(yy)) 
  dim(grad_x) <- dim(fxy) 
  dim(grad_y) <- dim(fxy) 

  maxlen <- min(diff(xx), diff(yy)) * .9 
  grad_x <- grad_x / max(grad_x) * maxlen 
  grad_y <- grad_y / max(grad_y) * maxlen 

  filled.contour(xx, yy, fxy, nlevels=nlevels, 
    plot.axes = { 
      contour(xx, yy, fxy, add=T, col="#contourcol", 
              nlevels=nlevels, drawlabels=FALSE) 

      arrows(x0  = x, 
             x1  = x + grad_x,
             y0  = y,
             y1  = y + grad_y,
             length = length*min(par.uin())) 

      axis(1) 
      axis(2) 
    },
    ...)
}

f <- expression( #expr) 
x <- seq(#x0, #x1,by= #xby) 
y <- seq(#y0, #y1,by= #yby) 
par(mar=c(3,3,3,3)) 
quiver2(f,x,y, color.palette=#pallette) 
graphics.off()
"""
   S = S.replace("#expr", fxy)
   S = S.replace("#x0", x0)
   S = S.replace("#x1", x1)
   S = S.replace("#xby",xby)
   S = S.replace("#y0", y0)
   S = S.replace("#y1", y1)
   S = S.replace("#yby",yby)
   S = S.replace("#pallette",pallette)
   S = S.replace("#contourcol", contourcolor)
   return S 

Rcode = 'png("temp.png", 6 *72, 6 *72)'
Rcode +=  quiverfunc("(3*x^2 + y) * exp(-x^2-y^2)", "-2", "2", "0.2", "-2", "2", "0.2", pallette="terrain.colors", contourcolor="gray")
print(Rcode)

Save the file as "quiver-test.py". Then, to run it, type python quiver-test.py > out.R, and then issue R --no-save < out.R. The generated image is in temp.png. Finally, display it using the ImageMagick tool: display temp.png.

Sunday, August 22, 2010

Drawing pie plots with R

Pie plots are the most unadorned graphs in R. The developers of R even warn against using pie plots for conveying information, and recommend a bar plot or dot chart instead.
Here is an example of a pie chart drawn using R.




In the page for pie chart generation, the radius of the pie is relative to the size of the graph and thus has a maximum value of 1.

Drawing polar plots in three ways using R

A polar plot gives r as a function of the angle $$\theta$$. The rectangular components can be
computed separately as $$x = r \cos(\theta)$$ and $$y = r \sin(\theta)$$.

Our solver page at http://extreme.adorio-research.org/solvers/rplotpage/polar/ has three different ways to present polar plots, namely
[polar, rectangular and space]. The range of theta can be specified, and the number of points can be set up to a limit of 15000, which allows fast drawing without overloading the server.

Here are the three possible images for the butterfly equation, exp(cos(theta)) - 2 * cos(4*theta) + sin(theta/12)^5, presented in Venables and Ripley's authoritative book, MASS ("Modern Applied Statistics with S").
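
As a sketch of the computation behind these images, the following Python code evaluates the butterfly equation and converts each (r, theta) pair to rectangular coordinates. The theta range of 0 to 24*pi and the 2000 points are the default settings of the solver form; the function names are made up for illustration.

```python
import math

def butterfly(theta):
    """The butterfly equation r = f(theta) quoted above."""
    return (math.exp(math.cos(theta))
            - 2 * math.cos(4 * theta)
            + math.sin(theta / 12) ** 5)

def polar_to_xy(r_func, theta0, theta1, ntheta):
    """Sample r_func at ntheta evenly spaced angles and return (x, y) pairs."""
    step = (theta1 - theta0) / (ntheta - 1)
    points = []
    for i in range(ntheta):
        theta = theta0 + i * step
        r = r_func(theta)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

curve = polar_to_xy(butterfly, 0.0, 24 * math.pi, 2000)
```

Joining successive points of curve with line segments gives the polar image; plotting r, x and y against theta gives the rectangular one.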








The last image was generated using ntheta = 15000 and pch=5.

Here is the page QP/QPY code for the polar plot page.

# pie.qpy
# 2006.09.02   0.0.1  first version
# 2006.09.20   0.0.2  split from pie,pie, dot page.
# 2010.08.23   0.0.3  third version.

__version__ = "0.0.3"
__date__    = "07.08.12"
__author__  = "eadorio@yahoo.com"
__title__   = "Polar plots using R"
__catalog__ = "POLAR-RPLOT-0127"
__url__     = "/solvers/rplotpage/polar/"
__author__  = "E.P. Adorio"


import time
import tempfile
import commands
import os

from   qp.fill.directory  import Directory
from   qp.fill.form       import Form, StringWidget, TextWidget,CheckboxWidget,SingleSelectWidget
from   qp.fill.css   import BASIC_FORM_CSS

from   qp.sites.extreme.lib.tmpfilesmanager import TmpFilesManager
from   qp.sites.extreme.lib.uicommon        import renderheader, renderfooter, processheader, processfooter
from   qp.pub.common                        import page
from   qp.sites.extreme.lib.checkinput      import checkInputs, getFormStrings
from   qp.sites.extreme.lib.qpyutils        import printRlines, showLogo
from   qp.sites.extreme.lib.webutils        import vecRead,GraphicsFile, as_R_cvector, as_R_vector, as_R_matrix, runRcode
from   qp.sites.extreme.lib import config

       

def Solve(fields):
    (gX, gY, theta0, theta1, ntheta, gtype, psize, equation, col, main, sub, xlab, ylab) = fields

    fname1    = GraphicsFile("png")
    barefile1 = fname1.split(str("/")) [-1]

    Rcode = "png('%s', width=%s*72, height=%s*72)\n" % (fname1, gX, gY)
    Rcode += """
theta  <- seq(%s, %s, len=%s)   
radius <- %s
x      <- radius * cos(theta)
y      <- radius * sin(theta)
""" % (theta0, theta1, ntheta, equation)

    if gtype == "polar":
       Rcode += """
plot(y, x, col="%s",type="l",
     main="%s",sub="%s",xlab="%s",ylab="%s", axes=FALSE)
"""    % (col, main, sub, xlab, ylab)
    elif gtype == "rect":
       Rcode += """
vlo = min(radius, x, y)
vhi = max(radius, x, y)
plot(theta, radius,  ylim=c(vlo, vhi), col="black",type="l", main="%s",sub="%s",xlab="%s",ylab="%s")
points(theta, x,  col="red",type="l")
points(theta, y,  col="blue",type="l")
""" % (main, sub, xlab, ylab)
    elif gtype == "3d":
       Rcode += """
library(scatterplot3d)
scatterplot3d(x, y, theta, highlight.3d=TRUE, col.axis="blue",
              col.grid="lightblue", main="Space Curve", pch=16, cex.symbols = %s)
""" % psize
    Rcode += """
graphics.off()
#img %s
""" % (barefile1)
    status, output = runRcode(Rcode)
    return output
    

class PolarplotPage(Directory):
    def get_exports(self):
        yield ('', 'index', 'PolarPlot', '')
           

    def index[html](self):
        form  = Form(enctype="multipart/form-data")  # enctype for file upload
        form.add(StringWidget,  name="gX",  title = "gX",  value="6", size=3)
        form.add(StringWidget,  name="gY", title = "gY", value="6",  size=3)

        form.add(StringWidget,  name="theta0",  title = "theta0", value = "0", size=5)
        form.add(StringWidget,  name="theta1",  title = "theta1", value = "24*pi", size=5)
        form.add(StringWidget,  name="ntheta",  title = "ntheta", value = "2000", size = 5)
        form.add(SingleSelectWidget,  name="gtype",   title = "Type",
                 value = "polar", options = [("polar", "polar"), ("rect", "rect"),("3d", "space")])
        form.add(StringWidget,  name="psize",  title = "Pt size", value = "0.2", size = 3)

        form.add(StringWidget,  name="equation",  title="Equation in theta r = f(theta)", 
                 value= "exp(cos(theta)) - 2 * cos(4*theta) + sin(theta/12)^5", size = 70)
        form.add(StringWidget,  name="col", title="Color", value="red", size=10)
        form.add(StringWidget,  name="main",  title="Main", value="Polar Plot", size=15)
        form.add(StringWidget,  name="sub",   title="Sub",  value="Example",  size = 15)
        form.add(StringWidget,  name="xlab",  title="xlab", value="X",     size=15)
        form.add(StringWidget,  name="ylab",  title="ylab", value="Y",   size=15)

        form.add_hidden("time",   value = time.time())
        form.add_submit("submit", "submit")

        def render [html] ():
            renderheader(__title__)

            """\
    
%s %s %s %s %s %s %s %s
""" % (form.get_widget("gX").render(),
       form.get_widget("gY").render(),
       form.get_widget("theta0").render(),
       form.get_widget("theta1").render(),
       form.get_widget("ntheta").render(),
       form.get_widget("gtype").render(),
       form.get_widget("col").render(),
       form.get_widget("psize").render(), )

            """\
%s
""" % ( form.get_widget("equation").render(), )

            """
%s %s %s %s
""" % (form.get_widget("main").render(),
       form.get_widget("sub").render(),
       form.get_widget("xlab").render(),
       form.get_widget("ylab").render(), )

            """
Draws 2D polar plots r vs theta or rectangular plots (r,x,y vs theta) or 3D space curves (x,y,theta) for equation involving theta. The size of the plot is approximately gX inch by gY inch (limited to 10). The point in a space curve will be drawn using filled circles with size determined by Pt size. The following bounds apply to the Open Solver version:
gX float 3, 10
gY float 3, 10
ntheta int 2, 15000
psize float 0.2, 10

Space curves are drawn using the R scatterplot3d package of Uwe Ligges. The butterfly polar function example is from Venables and Ripley, MASS.
"""
            renderfooter(form, __version__, __catalog__, __author__)

        if not form.is_submitted():
            return page('polarplotpage', render(), style= BASIC_FORM_CSS)

        def process [html] ():
            processheader(__title__)
            calctime_start = time.time()

            # Get the form input values
            def tmp1():
                return getFormStrings(form,
                    [ "gX", "gY", "theta0", "theta1", "ntheta", "gtype", "psize",
                      "equation", "col", "main", "sub", "xlab", "ylab"])
            (gX, gY, theta0, theta1, ntheta, gtype, psize, equation, col, main, sub, xlab, ylab) = tmp1()

            def tmp2():
                inflag = checkInputs(
                    [("gX", gX, ("float", 3, 10)),
                     ("gY", gY, ("float", 3, 10)),
                     ("theta0", theta0, ("eqn", "")),
                     ("theta1", theta1, ("eqn", "")),
                     ("ntheta", ntheta, ("int", 2, 15000)),
                     ("psize", psize, ("float", 0.2, 10)),
                     ("equation", equation, ("eqn",""))])
                return inflag
            inflag = tmp2()

            if inflag[0]:
               "

"
               inflag[1]
               "
"
            else:
               output = Solve((gX, gY, theta0, theta1, ntheta, gtype, psize, equation, col, main, sub, xlab, ylab))
               "
"
               printRlines(output)
               "
"
            showLogo("Rlogo.jpg")
            processfooter(form, calctime_start, "./", __url__)

        process()

Constructive criticisms from our readers are very welcome. Email the page author at ernesto.adorio@gmail.com or alternatively at eadorio@yahoo.com

Installing R packages inside R itself.

I am reviewing all my solvers one at a time. The scatterplot page has an option to draw in three dimensions! This is made possible by the scatterplot3d package. Alas, when I clicked to generate the graph, the library was not there, since we had updated to the latest and greatest version of R, 2.11.1!

Ah, OK. Assume you know the package name. Run R with root privileges (sudo R), then issue
install.packages("scatterplot3d", repos="http://cran.r-project.org/")
Then watch R do its installation procedure.




> install.packages ("scatterplot3d", repos="http://cran.r-project.org/")
trying URL 'http://cran.r-project.org/src/contrib/scatterplot3d_0.3-30.tar.gz'
Content type 'application/x-gzip' length 508829 bytes (496 Kb)
opened URL
==================================================
downloaded 496 Kb


Test whether the install was successful by issuing library("scatterplot3d"). There should be no error messages.

Drawing a matrix plot with R

Catalog MATRIX-RPLOT-0129
Version 0003
Date 08.23.10
URL /solvers/rplotpage/matrix
Source File /solvers/plots/rplot/matrix.qpy
Doc File /solvers/plots/rplot/doc/rplot.tex
XML-RPC TBD
Solvers Python, R
Author Dr. Ernesto P. Adorio


The matplot function of R allows the user to plot other vectors against a given vector.
Given an input matrix X, our matrix plot allows one to specify a "base" column against which the other
columns are plotted.

Various plot settings allow one to plot either points or lines for the other columns.
The main input box expects a matrix with one row written on each line.
A maximum of seven columns is allowed by this matrix plot routine.
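
The data handling behind a matrix plot can be sketched in Python: read the values row by row, optionally sort the rows by the base column, then separate the base column from the rest. The function names are hypothetical, and only the first four rows of the sample data shown later in this post are used.

```python
def as_matrix(flat, ncol):
    """Arrange a flat list into rows of ncol values (row-wise, like byrow=TRUE)."""
    if len(flat) % ncol != 0:
        raise ValueError("data length is not a multiple of ncol")
    return [flat[i:i + ncol] for i in range(0, len(flat), ncol)]

def split_against(rows, base, sort=True):
    """Return the base column and, per row, the remaining columns."""
    if sort:
        # Sorting matters when the other columns are drawn as lines.
        rows = sorted(rows, key=lambda r: r[base])
    xs = [r[base] for r in rows]
    others = [[v for j, v in enumerate(r) if j != base] for r in rows]
    return xs, others

flat = [21.69935, -11.798573, 104.36818,
        33.40331, -12.988921, 105.79699,
        48.8454,  -12.244453, 111.39495,
        44.10297,  -9.557104, 136.15997]
rows = as_matrix(flat, 3)
xs, others = split_against(rows, base=0)
```

After the sort, xs is in increasing order and each entry of others holds the y and z values that belong to the same row.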

Here is a view of the matrix plot page menu screen:


And here is the resulting plot drawn by our online solver.



The actual R code to generate the graph is

> png('tmpksstQF.png', width=6*72, height=6*72)
> X <- c(21.69935,-11.798573,104.36818,33.40331,-12.988921,105.79699,
+ 48.8454,-12.244453,111.39495,44.10297,-9.557104,136.15997,
+ 48.51514,-7.568557,150.82756,65.80403,-9.416977,147.99936,
+ 34.86882,-8.473734,93.32171,45.74425,-7.322524,83.93707,
+ 71.8046,-4.894993,151.96477,54.41312,-5.934022,43.50714,
+ 53.41098,-10.627767,91.71084,58.93975,-11.212206,118.9405,
+ 35.43727,-8.715245,55.91096,56.1509,-13.563171,40.80423,
+ 81.91758,-11.491009,95.21836,42.168,-13.77088,100.34292,
+ 55.15041,-7.522367,68.27453,35.32567,-8.874406,129.32671,
+ 49.89936,-16.804003,112.49538,43.73633,-8.427571,47.67621,
+ 52.84546,-8.812953,88.02893,55.83368,-10.618802,108.34818,
+ 37.9956,-11.34301,64.80144,47.22976,-14.296704,95.19047,
+ 62.69356,-11.104689,89.95547)
> M <- matrix(X, ncol = 3, byrow=T)
> M <- M[order(M[,1]),]
> M
> matplot( M[,1], M[,-1],type=c('p','p','p','p','p','p'),lty=c(2,2,2,2,2,2),
+ lwd=c(1,1,1,1,1,1),pch=c(1,2,3,4,5,6),col=c('black','blue','red','green','yellow','violet'),
+ cex=c(1,1,1,1,1,1),ylab=c('y,z'),xlab=c('x'),main="Matrix plot",sub="matplot generator")
> graphics.off()
> #img tmpksstQF.png

The Qpy code is presented here so that improvements may be facilitated faster.
# matrix.qpy
# 2006.09.02   0.0.1  first version
# 2006.09.20   0.0.2  split from pie, dot page.
# 2010.08.23   0.0.3  revised.


__version__ = "0.0.3 2010.08.23"
__author__  = "ernesto.adorio@gmail.com"
__title__   = "Matrix PLots using R"
__catalog__ = "MATPLOT-RPLOT-0129"
__url__     = "/solvers/rplotpage/matrix/"


import time
import tempfile
import commands
import os

from   qp.fill.directory  import Directory
from   qp.fill.form       import Form, StringWidget, TextWidget,CheckboxWidget,SingleSelectWidget
from   qp.fill.css   import BASIC_FORM_CSS

from   qp.sites.extreme.lib.tmpfilesmanager import TmpFilesManager
from   qp.sites.extreme.lib.uicommon        import renderheader, renderfooter, processheader, processfooter
from   qp.pub.common                        import page
from   qp.sites.extreme.lib.checkinput      import checkInputs, getFormStrings
from   qp.sites.extreme.lib.qpyutils        import printRlines, showLogo
from   qp.sites.extreme.lib.webutils        import vecRead, GraphicsFile, as_R_cvector, as_R_vector, as_R_matrix
from   qp.sites.extreme.lib import config

       

def getcarg(s, defaultval):
    s = s.strip()
    if s in ["", "None"]:
       return defaultval
    return as_R_cvector(s.split())
   

def getdarg(s, defaultval):
    s = s.strip()
    if s in ["", "None"]:
       return defaultval
    return as_R_vector(s.split())
   
def Solve(fields):
    # Get the fields.
    (gX, gY, matdata, colnames, against, sortq, plottype, lty, lwd, pch, cex, xlab, ylab, col,main, sub) = fields

    if colnames == "": return "ERROR: blank column names field"
    colnames = colnames.split()

    ncol     = len(colnames)
    if ncol > 7:
       return "ERROR: more than 7 columns specified."

    against = against.strip()     
    if against not in ["None",""]:
       try:
           apos = colnames.index(against) + 1
       except:
           raise ValueError, "Variable to plot against [%s]is not in column names." % against
    else:
       apos = 0

    matdata=matdata.strip() 
    if matdata in ["", "None"]: 
       raise ValueError, "ERROR: empty matrix data field."   
    X = vecRead(matdata)

    extraargs = ""
    # Defaults are spliced into the R call verbatim, so string defaults
    # must carry their own R quotes.
    extraargs += ',type=%s' % getcarg(plottype, '"p"')
    extraargs += ',lty=%s'   % getdarg(lty, 1)
    extraargs += ',lwd=%s'   % getdarg(lwd, 1)
    extraargs += ',pch=%s'   % getdarg(pch, "c(1,2,3,4,5,6,7)")
    extraargs += ',col=%s'   % getcarg(col, '"black"')
    extraargs += ',cex=%s'   % getdarg(cex, 1)
    extraargs += ',ylab=%s'  % getcarg(ylab, '""')
    extraargs += ',xlab=%s' % getcarg(xlab, '""')
    extraargs += ',main="%s"'  % main
    extraargs += ',sub="%s"'  % sub


    # Start of R code.
    fname1    = GraphicsFile(str("png"))
    barefile1 = fname1.split(str("/"))[-1]
    Rcode = """png('%s', width=6*72, height=6*72)\n""" % fname1
    Rcode += "X <- " + as_R_vector(X) + "\n"
    Rcode += """M <- matrix(X, ncol = %s, byrow=T)\n""" % ncol
    if apos != 0:
        if sortq:
           Rcode += "M <- M[order(M[,%s]),]\n" % apos
        Rcode += "M\n"
        Rcode += "matplot( M[,%s], M[,-%s] %s)" %(apos,apos, extraargs)
    else:
        Rcode += "matplot(M %s)" % extraargs

    Rcode += """
graphics.off()
#img %s
    """ % (barefile1,)

    # Write to temporary file.
    (f, name) = tempfile.mkstemp(suffix=str(".r"), prefix=str("tmp"), dir=config.tmp_dir)
    os.write(f, str(Rcode))
    (status, output) = commands.getstatusoutput(str("R -q --no-save < %s")  % name)
    os.close(f)  
    return output
    

class MatrixplotPage(Directory):
    def get_exports(self):
        yield ('', 'index', 'MatrixPlot', '')
           

    def index[html](self):
        form  = Form(enctype="multipart/form-data")  # enctype for file upload
        form.add(StringWidget, name="gX", title="gX", value = "6", size=3)
        form.add(StringWidget, name="gY", title="gY", value = "6", size=3)

        sample = """
    21.69935 -11.798573 104.36818
    33.40331 -12.988921 105.79699
    48.84540 -12.244453 111.39495
    44.10297  -9.557104 136.15997
    48.51514  -7.568557 150.82756
    65.80403  -9.416977 147.99936
    34.86882  -8.473734  93.32171
    45.74425  -7.322524  83.93707
    71.80460  -4.894993 151.96477
    54.41312  -5.934022  43.50714
    53.41098 -10.627767  91.71084
    58.93975 -11.212206 118.94050
    35.43727  -8.715245  55.91096
    56.15090 -13.563171  40.80423
    81.91758 -11.491009  95.21836
    42.16800 -13.770880 100.34292
    55.15041  -7.522367  68.27453
    35.32567  -8.874406 129.32671
    49.89936 -16.804003 112.49538
    43.73633  -8.427571  47.67621
    52.84546  -8.812953  88.02893
    55.83368 -10.618802 108.34818
    37.99560 -11.343010  64.80144
    47.22976 -14.296704  95.19047
    62.69356 -11.104689  89.95547
    """
        form.add(TextWidget,    name = "matdata", title="", \
                 value = sample, cols ="65", rows = "10")

        form.add(StringWidget,  name = "colnames", title="Column names (not blank!)",
                 value = "x y z", size = "35")

        form.add(StringWidget,  name = "against", title="Against", value = "x", size = "10")
        form.add(CheckboxWidget, name = "sortq",   title="Sort?")

        form.add(StringWidget,  name = "plottype", size = 35, value = "p p p p p p")
        form.add(StringWidget,  name = "lty",      size = 35, value = "2 2 2 2 2 2")
        form.add(StringWidget,  name = "lwd",      size = 35, value = "1 1 1 1 1 1")
        form.add(StringWidget,  name = "pch",      size = 35, value = "1 2 3 4 5 6")
        form.add(StringWidget,  name = "cex",      size = 35, value = "1 1 1 1 1 1")
        form.add(StringWidget,  name = "xlab",     size = 35, value = "x")
        form.add(StringWidget,  name = "ylab",     size = 35, value = "y,z")
        form.add(StringWidget,  name = "main",     size = 35, value = "Matrix plot")
        form.add(StringWidget,  name = "sub",      size = 35, value = "matplot generator")
        form.add(StringWidget,  name = "col",      size = 35, value = "black blue red green yellow violet")

        form.add_hidden("time",   value = time.time())
        form.add_submit("submit", "submit")

        def render [html] ():
            renderheader(__title__)

            """
            
%s%s%s%s%s
""" % (form.get_widget("gX").render(),
       form.get_widget("gY").render(),
       form.get_widget("colnames").render(),
       form.get_widget("against").render(),
       form.get_widget("sortq").render(), )

            form.get_widget("matdata").render()

            """
Plot type, type [p l b c o h s S n] %s
Line type, lty [1 2 3 4 5] %s
Line width, lwd %s
Point char, pch [any character]] %s
Char expand, cex %s
X label, xlab %s
Y label, ylab %s
Colors, col %s
Main title, main %s
Subtitle, sub %s
""" % (form.get_widget("plottype").render(),
       form.get_widget("lty").render(),
       form.get_widget("lwd").render(),
       form.get_widget("pch").render(),
       form.get_widget("cex").render(),
       form.get_widget("xlab").render(),
       form.get_widget("ylab").render(),
       form.get_widget("col").render(),
       form.get_widget("main").render(),
       form.get_widget("sub").render())

            """
Uses matplot() function of R to plot up to 7 columns of data. The image size is gX inch by gY inch. The user can specify various graphics parameters such as colors, plotting character and labels. Up to six columns can be plotted. When plotting against a specified column, it may be important to sort the matrix based on the specified column if plotting using lines.

When statistics, pairs plots and box plots are required, use the solver Cat STAT-MSS-0064 instead.
"""
            renderfooter(form, __version__, __catalog__, __author__)

        if not form.is_submitted():
            return page('matrixplotpage', render(), style= BASIC_FORM_CSS)

        def process [html] ():
            processheader(__title__)
            calctime_start = time.time()

            # Get the problem parameters
            (gX, gY, matdata, colnames, against, sortq, plottype, lty, lwd, pch, cex, xlab, ylab, col, main, sub) = getFormStrings(form,
                [ "gX", "gY", "matdata", "colnames", "against", "sortq","plottype", "lty", "lwd",
                  "pch", "cex", "xlab", "ylab", "col", "main", "sub" ])

            inflag = checkInputs(
                [("gX", gX, ("float", 3, 10)),
                 ("gY", gY, ("float", 3, 10)),
                ])
            if inflag[0]:
               "

"
               inflag[1]
               "
"
            else:
               output = Solve((gX, gY, matdata, colnames, against, sortq, plottype, lty, lwd, pch, cex, xlab, ylab, col, main, sub))
               "
"
               printRlines(output)
               "
"
            showLogo("Rlogo.jpg")
            processfooter(form, calctime_start, "./", __url__)

        process()

Constructive comments from our blog readers are always welcome.

Drawing a dotchart for dataframe in R

It has been suggested that bar charts and dot plots are excellent for displaying grouped data.
The dotchart function in R has the following syntax (shown by typing ?dotchart inside an R session):



Cleveland's Dot Plots

Description:

Draw a Cleveland dot plot.

Usage:

dotchart(x, labels = NULL, groups = NULL, gdata = NULL,
cex = par("cex"), pch = 21, gpch = 21, bg = par("bg"),
color = par("fg"), gcolor = par("fg"), lcolor = "gray",
xlim = range(x[is.finite(x)]),
main = NULL, xlab = NULL, ylab = NULL, ...)

Arguments:

x: either a vector or matrix of numeric values (‘NA’s are
allowed). If ‘x’ is a matrix the overall plot consists of
juxtaposed dotplots for each row.

labels: a vector of labels for each point. For vectors the default
is to use ‘names(x)’ and for matrices the row labels
‘dimnames(x)[[1]]’.

..... More! please read the rest from an actual R session.


If the input is a dataframe with column names (header) and row names, then the dot chart will use these
labels to group the data. Each column of data constitutes a group, and for each group a series of
dots is displayed for all the rows belonging to the group.

Surprisingly, R does not accept a dataframe natively here, in spite of the built-in VADeaths dataset (which is itself stored as a matrix) working. Copying the VADeaths dataset to a file and reading it back as a dataframe (replacing embedded spaces with underscores) resulted in R giving an error message that it expects a matrix or vector (roughly, a one-column matrix). So we typecast it to a matrix, for example with as.matrix, and were then able to use the dotchart function.

Here is the menu page of our dotchart generator:


and a sample output.



We hasten to add that our page for a dot chart generator is only for quick visualization of data.

Syntax highlighter problems.

The new(?) version of Alex Gorbatchev's SyntaxHighlighter, when installed in our extreme-solvers.blogspot blog, does not allow viewing the source code, copying to the clipboard, or even rapid printing.

We will check if something is amiss in our installation.

Drawing equations with Curveplot using R

You can generate plots for up to 6 equations using our http://extreme.adorio-research.org/solvers/rplotpage/curve/
The graphs can be generated separately or together in a composite graph.
Here is an example of the basic hyperbolic functions plotted together.


This is the solver page with the settings for the composite graph above.



Doubtless this solver needs much improvement. For example, the lines are too thin, and the grid (is there one?!) is drawn too thinly as well. As a reminder to the solver page writer, we include the source code
so that other people using QP/QPY may make constructive comments.

# file      curve.qpy
# 2006.09.02   0.0.1  first version
# 2006.09.20   0.0.2  split from pie, dot page.
# 2010.08.22   0.0.3  added checkboxes.

__file__    = "curve.qpy"
__version__ = "0.0.2"
__date__    = "2006.09.20"
__author__  = "ernesto.adorio@gmail.com"
__title__   = "XY Equation Curve plots using R"
__catalog__ = "CURVE-RPLOT-0125"
__url__     = "/solvers/rplotpage/curve"

import time
import tempfile
import commands
import os

from   qp.fill.directory  import Directory
from   qp.fill.form       import Form, StringWidget, TextWidget,CheckboxWidget,SingleSelectWidget
from   qp.fill.css   import BASIC_FORM_CSS

from   qp.sites.extreme.lib.tmpfilesmanager import TmpFilesManager
from   qp.sites.extreme.lib.uicommon        import renderheader, renderfooter, processheader, processfooter
from   qp.pub.common                        import page
from   qp.sites.extreme.lib.checkinput      import checkInputs, getFormStrings
from   qp.sites.extreme.lib.qpyutils        import printRlines, showLogo
from   qp.sites.extreme.lib.webutils        import vecRead, GraphicsFile, as_R_cvector, as_R_vector, as_R_matrix
from   qp.sites.extreme.lib import config
import qp.sites.extreme.lib.checkinput as check

_MAXEQN  = 6

def Solve(fields):
    
    # Buildup the R code.
    Rcode = 'source("%s")\n'  % (config.lib_dir + "/mylib.R",)
    (gX, gY, splitq, legendQ, bty, ptype, draweqnQ, eqn, xlo,  xhi, ylo, yhi,  n, col, main, xlab, ylab) = fields

    if splitq == "True":
        splitq = True
    else:
        splitq   = False

    legend = ""
    fname  = ""
    for i in range(_MAXEQN):
        if draweqnQ[i] is True:
           eqni = str(eqn[i]).strip(str(" "))
           if len(eqni) > 0:
              xlow  = float(xlo[i])
              xhigh = float(xhi[i])

              ylim  = ",ylim=%s" % as_R_vector([ylo[i], yhi[i]])

              nx    = int(n[i])
              if xhigh <= xlow:
                  return "ERROR: (xhi[%s] = %s) <= (xlo[%s] = %s)" % (i, xhigh, i, xlow)
              plottype = ptype[i]

              if splitq:
                 
                 fname    = GraphicsFile("png")
                 barefile = fname.split(str("/")) [-1]
                 Rcode += """png('%s',width=%s*72,height=%s*72)\n""" % (fname, gX, gY)

                 if main[i] != "":
                   mainlabel = ',main="%s"' % main[i]
                 else:
                   mainlabel = ''

                 if xlab[i] != "":
                    xlabel = ',xlab="%s"' % xlab[i]
                 else:
                   xlabel = ''

                 if ylab[i] != "":
                   ylabel = ',ylab="%s"' % ylab[i]
                 else:
                   ylabel = ''

                 Rcode += """
curve(%s,from=%s,to=%s,n=%s, col="%s", type="%s" %s %s %s %s)
grid(col="darkgray")
dev.off()
#img %s
"""            % (eqni, xlo[i], xhi[i], nx, col[i], plottype, ylim, mainlabel, xlabel, ylabel, barefile)
                 TmpFilesManager().add(fname)

                 fname    = GraphicsFile("png")
                 barefile = fname.split(str("/")) [-1]
              # Single (composite) plot.
              else:
                  if i > 0:
                     add = ",add=TRUE"
                  else:
                     add = ""
                  if fname == "":
                    fname    = GraphicsFile("png")
                    barefile = fname.split(str("/")) [-1]
                    Rcode += """png('%s',width=%s*72,height=%s*72)\n""" % (fname, gX, gY)
                    title = ',main="%s"' % main[i]
                    Rcode += """
curve(%s,from=%s,to=%s,n=%s, ylim=c(%s, %s), col="%s", type="%s", xlab="%s", ylab="%s", main="%s")
"""               % (eqni, xlo[i], xhi[i], n[i], ylo[i], yhi[i], col[i], plottype, xlab[i], ylab[i], main[i])
                  else:
                    Rcode += """
curve(%s,from=%s,to=%s,n=%s, ylim=c(%s, %s), col="%s", type="%s" %s)
"""               % (eqni, xlo[i], xhi[i], n[i], ylo[i], yhi[i], col[i], plottype,  add)
 
                  if legend == "":
                     legend="c(\"%s\"" % ylab[i]
                     legendcol= "c(\"%s\"" % col[i]
                  else:
                     legend    += ",\"%s\"" % ylab[i]
                     legendcol += ",\"%s\"" % col[i]
    if not splitq:
        if legendQ :
           Rcode += 'legend("%s", legend=%s), col=%s), lty=1,bty="%s")' % \
                         (legendQ, legend, legendcol,  bty)

    if not splitq:
        Rcode += """
grid()
#img %s""" % barefile
        TmpFilesManager().add(fname)
            
    # Write to temporary file.
    try:
       (f, name) = tempfile.mkstemp(suffix=str(".r"), prefix=str("tmp"), dir= config.tmp_dir)
       os.write(f, str(Rcode))
       (status, output) = commands.getstatusoutput(str("R -q --no-save < %s")  % name)
       os.close(f)   # closes the descriptor only; mkstemp does not delete the file.
    except:
        output = str(Rcode) + "\nError: evaluation error in R" 
    return output


class CurveplotPage(Directory):
    def get_exports(self):
        yield ('', 'index', 'xycurveplot', '')
           

    def index[html](self):
        form  = Form(enctype="multipart/form-data")  # enctype for file upload
        form.add(StringWidget, name = "gX",  title = "gX", value = "4", size =2)
        form.add(StringWidget, name = "gY",  title = "gY", value = "4", size =2)

        form.add(SingleSelectWidget, name = "splitq",
                 title = "graphs?",
                 value = "True",
                  options = [("True",  "Separate"),
                            ("False", "Single")
                           ]
                )

        form.add(SingleSelectWidget, name = "legendQ",
                 title = "Legend",
                 value = "",
                 options = [("",          "None"),
                            ("topright",  "topright"),
                            ("top",       "top"),
                            ("topleft",   "topleft"),
                            ("right",     "right"),
                            ("center",    "center"),
                            ("left",      "left"),
                            ("bottomright", "bottomright"),
                            ("bottom",    "bottom"),
                            ("bottomleft","bottomleft"),
                           ]
                )

        form.add(SingleSelectWidget, name = "bty",
                 title   = "Box",
                 value   = "o",
                 options = [("o", "o"),  ("n", "n")] 
                )

        form.add(StringWidget, name = "ptype", title="Plot Type", value="llllll", size = 4)
        for i in range(6):
            form.add(CheckboxWidget, name="eqn%sdrawQ" %i, value=False)        


        eqns = ["sin(x)", "cos(x)", "tan(x)", "sinh(x)", "cosh(x)", "tanh(x)"]
        cols = ["blue",   "red",    "black", "violet", "green", "violet"]
        for i in range(_MAXEQN):
            form.add(StringWidget,  name = "eqn%d" %i, title = "", value = "%s" % eqns[i], size = 32)
            form.add(StringWidget,  name = "xlo%d" %i, title = "", value = -10, size = 3)
            form.add(StringWidget,  name = "xhi%d" %i, title = "", value = 10,  size = 3)
            form.add(StringWidget,  name = "ylo%d" %i, title = "", value = -10, size = 3)
            form.add(StringWidget,  name = "yhi%d" %i, title = "", value = 10,  size = 3)
            form.add(StringWidget,  name = "main%d" %i, title = "", value = "Title%s" % (i+1), size = 5)
            form.add(StringWidget,  name = "xlab%d" %i, title = "", value = "X%s" % (i+1), size = 5)
            form.add(StringWidget,  name = "ylab%d" %i, title = "", value = "Y%s" % (i+1), size = 5)
            form.add(StringWidget,  name = "n%d"   %i, title = "", value = 100, size = 3)
            form.add(StringWidget,  name = "col%d" %i, title = "", value = "%s" % cols[i], size = 4)
        form.add_hidden("time",   value = time.time())
        form.add_submit("submit", "submit")

        def render [html] ():
            renderheader(__title__)

            """
            %s %s %s %s %s %s
            """ % (form.get_widget("gX").render(),
                   form.get_widget("gY").render(),
                   form.get_widget("splitq").render(),
                   form.get_widget("legendQ").render(),
                   form.get_widget("bty").render(),
                   form.get_widget("ptype").render(),
                  )

            # One header row, then one row of widgets per equation.
            "draw? Equation xlo xhi ylo yhi n color main xlab ylab"
            for i in range(_MAXEQN):
                """%s %s %s %s %s %s %s %s %s %s %s\n""" % \
                    (form.get_widget("eqn%sdrawQ" %i).render(),
                     form.get_widget("eqn%d" % i).render(),
                     form.get_widget("xlo%d" % i).render(),
                     form.get_widget("xhi%d" % i).render(),
                     form.get_widget("ylo%d" % i).render(),
                     form.get_widget("yhi%d" % i).render(),
                     form.get_widget("n%d" % i).render(),
                     form.get_widget("col%d" % i).render(),
                     form.get_widget("main%d" %i).render(),
                     form.get_widget("xlab%d" %i).render(),
                     form.get_widget("ylab%d" %i).render(),
                    )

            """
            This page enables you to plot up to 6 equations either in separate graphs or in a single
            composite graph. For the latter, the title is obtained from the main title of the first
            equation to be plotted.
            """
            renderfooter(form, __version__, __catalog__, __author__)

        if not form.is_submitted():
            return page('curveplotpage', render(), style= BASIC_FORM_CSS)

        def process [html] ():
            processheader(__title__)
            calctime_start = time.time()

            # Get form strings.
            (gX,gY,splitq,legendQ,bty,ptype)= check.getFormStrings(form,["gX","gY","splitq","legendQ","bty","ptype"])
            # Limit the plot size to 10 by 10 inches.
            gX = str(min(int(gX), 10))
            gY = str(min(int(gY), 10))

            # Get the equations and their parameters.
            draweqnQ = [True] * _MAXEQN
            eqn  = ["" for i in range(_MAXEQN)]
            xlo  = ["" for i in range(_MAXEQN)]
            xhi  = ["" for i in range(_MAXEQN)]
            ylo  = ["" for i in range(_MAXEQN)]
            yhi  = ["" for i in range(_MAXEQN)]
            main = ["" for i in range(_MAXEQN)]
            xlab = ["" for i in range(_MAXEQN)]
            ylab = ["" for i in range(_MAXEQN)]
            n    = ["" for i in range(_MAXEQN)]
            col  = ["" for i in range(_MAXEQN)]

            for i in range(_MAXEQN):
                draweqnQ[i] = drawQ = form.get("eqn%sdrawQ" % i)
                if drawQ:
                    eqn[i] = str(form.get("eqn%d" % i))
                    xlo[i] = str(form.get("xlo%d" % i))
                    xhi[i] = str(form.get("xhi%d" % i))
                    ylo[i] = str(form.get("ylo%d" % i))
                    yhi[i] = str(form.get("yhi%d" % i))
                    n[i]   = str(form.get("n%d" % i))
                    col[i] = str(form.get("col%d" % i))
                    if col[i] == "None":
                        col[i] = ""
                    main[i] = str(form.get("main%d" % i))
                    if main[i] == "None":
                        main[i] = ""
                    xlab[i] = str(form.get("xlab%d" % i))
                    if xlab[i] == "None":
                        xlab[i] = ""
                    ylab[i] = str(form.get("ylab%d" % i))
                    if ylab[i] == "None":
                        ylab[i] = ""

            # Check ptype:
            ptype = str(ptype).replace(str(" "), str(""))
            if len(ptype) == 0:
                ptype="llllll"
            elif len(ptype) == 1:
                ptype = ptype * _MAXEQN

            output = Solve((gX,gY,splitq,legendQ,bty,ptype,draweqnQ, eqn,xlo,xhi,ylo,yhi,n,col,main,xlab,ylab))
            printRlines(output)
            """
            Powered by
            """
            showLogo("Rlogo.jpg")
            processfooter(form, calctime_start, "./", __url__)
        process()

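For readers without QP/QPY, the core idea of Solve (build a string of R commands, then hand it to R) can be sketched in a few lines of plain Python 3. This is only a minimal sketch under our own assumptions, not the solver's code: the helper names curve_call, composite_script and run_r are made up for this illustration, and run_r assumes an R executable is on the PATH.

```python
import subprocess

def curve_call(expr, xlo, xhi, n=100, col="blue", ptype="l", add=False):
    """Return one R curve() call as a string (hypothetical helper)."""
    if float(xhi) <= float(xlo):
        raise ValueError("xhi must be greater than xlo")
    args = [expr, "from=%s" % xlo, "to=%s" % xhi, "n=%d" % int(n),
            'col="%s"' % col, 'type="%s"' % ptype]
    if add:
        # Later curves are drawn on top of the first one.
        args.append("add=TRUE")
    return "curve(%s)" % ", ".join(args)

def composite_script(png_path, equations, width=4, height=4):
    """Build an R script drawing all (expr, xlo, xhi) triples in one PNG."""
    lines = ['png("%s", width=%s*72, height=%s*72)' % (png_path, width, height)]
    for k, (expr, xlo, xhi) in enumerate(equations):
        lines.append(curve_call(expr, xlo, xhi, add=(k > 0)))
    lines += ["grid()", "dev.off()"]
    return "\n".join(lines) + "\n"

def run_r(rcode):
    """Feed the script to R on standard input; assumes R is installed."""
    done = subprocess.run(["R", "-q", "--no-save"], input=rcode,
                          text=True, capture_output=True)
    return done.returncode, done.stdout

script = composite_script("hyperbolic.png",
                          [("sinh(x)", -3, 3), ("cosh(x)", -3, 3), ("tanh(x)", -3, 3)])
print(script)
```

Only the script is printed here; calling run_r(script) would actually produce the PNG on a machine where R works.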
Also, both the server and the developer's own laptop have problems with locales. It seems Ubuntu is infected with this bug.


During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
5: Setting LC_PAPER failed, using "C"
6: Setting LC_MEASUREMENT failed, using "C"

Saturday, August 21, 2010

Drawing contour plots z = f(x,y) using R

Our equation contour plot drawing page requires an equation of the form z = f(x,y). Colors may be assigned to the levels of Z.

This solver page allows setup of the following parameters.

  • gx, gy - width and height of the contour plot, in inches.
  • xlo, xhi, nx - lower and upper limits for x and number of points for x.
  • ylo, yhi, ny - lower and upper limits for y and number of points for y.
  • colors - choices are [gray|rainbow|topo.color|terrain.color|heat.color]
  • equation - the formula for z as a function of both x and y.
  • image - whether to display plain level curves only or colored contours.
  • levels - maximum number of levels to display.
  • Title - title of the plot.

Here is an output for a contour plot whose equation is $$z = \exp((\sin x)^2 + \cos y).$$
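As a rough illustration of what has to be computed before any contour can be drawn, the sketch below evaluates z on an nx by ny grid in plain Python. It assumes the equation is read as z = exp(sin(x)^2 + cos(y)); the function names linspace, f and grid_values are ours and are not taken from the solver page.

```python
import math

def linspace(lo, hi, n):
    """Return n equally spaced values from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]

def f(x, y):
    # Assumed reading of the equation in the text.
    return math.exp(math.sin(x) ** 2 + math.cos(y))

def grid_values(xlo, xhi, nx, ylo, yhi, ny):
    """Evaluate f at every grid node; rows follow x, columns follow y."""
    xs = linspace(xlo, xhi, nx)
    ys = linspace(ylo, yhi, ny)
    return xs, ys, [[f(x, y) for y in ys] for x in xs]

xs, ys, z = grid_values(-3.0, 3.0, 7, -3.0, 3.0, 7)
zmin = min(min(row) for row in z)
zmax = max(max(row) for row in z)
print("z ranges from %.4f to %.4f" % (zmin, zmax))
```

The contour levels are then chosen between zmin and zmax; R's contour function does the equivalent work from its x, y and z arguments.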

Drawing bag plots with R

Bag plots are two-dimensional generalizations of the one-dimensional box plot.
In R, the aplpack package allows the drawing of bag plots.


Here is an example drawing generated by our solver at rplotpage/bag

Drawing Delaunay Triangulations and Voronoi Tessellations with R

These two plots find applications in computational geometry.
A Delaunay triangulation is one in which no point lies inside the circumcircle of any triangle of the triangulation. See the Wikipedia article on Delaunay triangulation.
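The empty-circumcircle property can be checked directly with a small determinant test. The following is a minimal sketch in plain Python, written for illustration only; it is not how the deldir package works internally, and the function names are our own.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc; positive if counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circle through a, b and c."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    # The sign of det flips with the orientation of a, b, c.
    return det * orientation(a, b, c) > 0

def is_delaunay(points, triangles):
    """Check that no point lies inside the circumcircle of any triangle."""
    for (i, j, k) in triangles:
        for m, p in enumerate(points):
            if m in (i, j, k):
                continue
            if in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True

points = [(0, 0), (4, 0), (2, 3), (2, -1)]
print(is_delaunay(points, [(0, 3, 2), (1, 2, 3)]))   # split along the vertical diagonal
print(is_delaunay(points, [(0, 1, 2), (0, 3, 1)]))   # split along the horizontal diagonal
```

The four points admit two triangulations; only the one split along the vertical diagonal passes the test.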

On the other hand, a Voronoi tessellation has a graph which may be considered the dual of the Delaunay triangulation. See the Wikipedia article on Voronoi diagram.


The R software has a deldir package which allows us to easily generate
Delaunay triangulations or Voronoi tessellations or even both!

We offer a web interface in Rplotpage/deldir.
Here are examples illustrating all choices in the deldir plotting package.



Drawing Chernoff faces / star plots with R.

One unusual chart available with R is the Chernoff faces plot, available from the aplpack library.
It draws a human face for each row of observations, using at most 15 columns to vary the following features of the face:
  1. height of face
  2. width of face
  3. structure of face
  4. height of mouth
  5. width of mouth
  6. smiling
  7. height of eyes
  8. width of eyes
  9. height of hair
  10. width of hair
  11. style of hair
  12. height of nose
  13. width of nose
  14. width of ear
  15. height of ear

Here is a sample figure generated by visiting our chernoff-faces solver page at rplotpage/chernoff



This page also allows you to draw a star plot. To see the difference, check the following figure:



We will have to update the star generation code.

Drawing two-dimensional convex hulls with R.

Our convex hull drawing page, accessible at rplotpage/chull, calls the chull function of R.
The convex hull of a set S of two-dimensional points is the smallest convex polygon enclosing
all points of S; its vertices are a subset of S.
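To make the definition concrete, here is a minimal convex hull sketch in plain Python using Andrew's monotone chain method. It is offered only as an illustration of the idea; we make no claim that R's chull uses this algorithm, and the function names are our own.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    # Build the lower hull from left to right.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    # Build the upper hull from right to left.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]

data = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
hull = convex_hull(data)
print(hull)
```

The interior point (1, 1) and the point (1, 0) lying on an edge are both dropped, so only the four corners remain.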

Here is a sample convex hull drawing using R.


As usual, the reader can specify the size of the plot (default is 6" by 6"). Data is always read as a sequence of (x,y) data pairs.

Drawing bar plot with R

Our extreme online solvers contain drawing pages to enable anyone with Internet connection to generate charts, graphs, and diagrams using R. Here we discuss the generation of bar graphs.

Our bar graph generation page is accessed at solvers/rplotpage/bar.
The page allows the user to specify the following:

  • gX, gY plot width and height in inches.
  • nvars - number of columns in data.
  • reading - Data is read rowwise if set to "by row", otherwise the reading is columnwise if reading is set to "by column"
  • type- a choice of whether the bars are to be drawn beside each other or stacked with each other.
  • orientation- whether to draw the bars vertically or horizontally.
  • axis - True if axis is to be drawn.
  • log? - use logarithm of values to display the bars.
  • border - color of the border.
  • data - text area for entering data.
  • color -whether to use color specified for each bar in the fields below.
  • density-whether to add hatching to each bar drawn.
  • Main - graph title
  • sub - subtitle
  • xlab - label along X-axis
  • ylab - label along Y-axis

Here is a snapshot of a sample input page:



And here is the resulting image when the submit button is clicked.


The default bar plot does not include 3-D types. R concentrates on providing excellent statistical analysis tools and quick visualizations, not on junk charts. For 3-D charts, we recommend that the reader use other free software like gnumeric, openoffice, kspread and others.

Saturday, August 14, 2010

Normal Population Sampling

Now that we have done a finite sampling demonstrator, it is easy to create a solver page for sampling from a normal population. This time, the mean and variance of the normal population, the sample size, and the total number of samples to generate are specified. For each sample a statistic is computed, which may be one of

  • mean
  • sum
  • variance
  • min
  • max
  • range
  • median
  • Q1
  • Q3

Currently only the mean and variance of the statistic over the samples are computed.
We will add descriptive statistics on the generated statistic of the samples in a future version.
This solver page will surely be revised soon!
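The procedure can be sketched with the Python standard library alone. This is a minimal sketch, not the solver's code: the function name sample_statistic, its parameters and the seed are our own choices, and Q1 and Q3 are left out to keep it short.

```python
import random
import statistics

# Statistics that may be computed for each sample.
STATISTICS = {
    "mean": statistics.mean,
    "sum": sum,
    "variance": statistics.variance,        # divisor is n - 1
    "min": min,
    "max": max,
    "range": lambda xs: max(xs) - min(xs),
    "median": statistics.median,
}

def sample_statistic(mu, sigma2, n, nsamples, stat="mean", seed=None):
    """Draw nsamples samples of size n from N(mu, sigma2); return the statistic of each."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    fn = STATISTICS[stat]
    return [fn([rng.gauss(mu, sigma) for _ in range(n)])
            for _ in range(nsamples)]

values = sample_statistic(mu=10.0, sigma2=4.0, n=25, nsamples=2000,
                          stat="mean", seed=1)
mean_of_stat = statistics.mean(values)
var_of_stat = statistics.variance(values)
print(mean_of_stat, var_of_stat)
```

For the sample mean, theory says the result should be close to mu = 10 with variance close to sigma2/n = 0.16, which gives a quick sanity check of the simulation.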

Friday, August 13, 2010

Finite sampling demonstrator

I have to install a LaTeX processor for my blog. Look for the mirrored article in
Sampling Demonstrator


Finite Sampling Demonstrator


Given a finite population S with N elements which may not be unique (some elements may be repeated),
we extract a finite sample X with n elements, where n < N. The n elements may be extracted in one of the following ways:
  1. Permutations without replacement.
     The ordering of the sample elements is important and a sample element once chosen in the sample may NOT be chosen again.
  2. Permutations with replacement.
     The ordering of the sample elements is important and a sample element once chosen in the sample may be chosen again.
  3. Combinations without replacement.
     The ordering of the sample elements is NOT important and a sample element once chosen to be in the sample is not available again.
  4. Combinations with replacement.
     The ordering of the sample elements is NOT important and an element chosen to be in the sample may still be chosen again.

The total number of samples for each type of finite sampling above is given in the following table:


Number of Samples
sampling              Combinations                     Permutations
without replacement   $$\frac{N!}{n! (N-n)!}$$         $$\frac{N!}{(N-n)!}$$
with replacement      $$\frac{(N+n-1)!}{n!(N-1)!}$$    $$N^n$$
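The four counts are easy to evaluate in Python (version 3.8 or later, for math.comb and math.perm). The sketch below uses the standard count (N+n-1)!/(n!(N-1)!) for combinations with replacement; the function name sample_counts is our own.

```python
from math import comb, perm

def sample_counts(N, n):
    """Number of possible samples of size n from a population of size N."""
    return {
        ("combinations", "without replacement"): comb(N, n),
        ("permutations", "without replacement"): perm(N, n),
        ("combinations", "with replacement"): comb(N + n - 1, n),
        ("permutations", "with replacement"): N ** n,
    }

counts = sample_counts(6, 3)
for key in sorted(counts):
    print(key, counts[key])
```

For N = 6 and n = 3 this gives 20 and 120 samples without replacement, and 56 and 216 samples with replacement.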



Let $$S = [s_0, s_1, s_2, \ldots, s_{N-1}]$$. Our generated sample X is actually $$X = [s_{i_0}, s_{i_1}, \ldots, s_{i_{n-1}}]$$,
where the indices $$i_0$$ to $$i_{n-1}$$ are generated sequentially by a combinatorial algorithm.
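One way to generate the index tuples is with the itertools module of the Python standard library. This is only a sketch of the idea; we do not claim it is the combinatorial algorithm used by the solver, and the function names are our own.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

def index_tuples(N, n, ordered, replacement):
    """Return an iterator over index tuples (i_0, ..., i_{n-1}) into a population of size N."""
    indices = range(N)
    if ordered and replacement:
        return product(indices, repeat=n)
    if ordered:
        return permutations(indices, n)
    if replacement:
        return combinations_with_replacement(indices, n)
    return combinations(indices, n)

def samples(S, n, ordered, replacement):
    """Generate every sample X = [S[i_0], ..., S[i_{n-1}]] in sequence."""
    for idx in index_tuples(len(S), n, ordered, replacement):
        yield [S[i] for i in idx]

S = [1, 2, 3, 4, 5, 6]
dice = list(samples(S, 3, ordered=True, replacement=True))
print(len(dice), dice[0], dice[-1])
```

Working with indices rather than values keeps repeated population elements distinct, which matters when S contains duplicates.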

We may be interested in one of the following statistics, each of which is a random variable over the totality of all samples:


  1. sample mean
  2. sample sum or total
  3. sample s.d. (standard deviation, divisor is n-1)
  4. population s.d. (divisor is n)
  5. sample variance (divisor is n-1)
  6. population variance (divisor is n)
  7. sample max (maximum value)
  8. sample min (minimum value)
  9. range (max - min)


For the desired statistic, we wish to gather data on ALL possible finite samples:
the mean and standard deviation and, of course, the distribution table for the statistic, which contains
columns for the statistic (x), frequency (f), relative frequency (rf), cumulative relative frequency (crf), and the product x rf.

Here is a complete example for the sum of the numbers which show up in a throw of three dice:
The population consists of [1,2,3,4,5,6].
The population size is 6.
The sample size is 3.
The ordering is considered important, for example [1,3,2] will be considered different from [3,1,2]. Thus it
is a permutation with replacement.
The "first" sample is [1,1,1] with a total of 3 and the "last" sample is [6,6,6] with a total of 18.
To help with our computations, we use our solvers hosted at www.extreme.adorio-research.org to do it for us!

We will show only the generated frequency distribution table, as the list of 216 generated samples is too long for this page.




Sampling Statistic Frequency Distribution Table
x      f     rf                crf               x rf             (x-mu)^2 rf
3.0    1     0.00462962962963  0.00462962962963  0.0138888888889  0.260416666667
4.0    3     0.0138888888889   0.0185185185185   0.0555555555556  0.586805555556
5.0    6     0.0277777777778   0.0462962962963   0.138888888889   0.840277777778
6.0    10    0.0462962962963   0.0925925925926   0.277777777778   0.9375
7.0    15    0.0694444444444   0.162037037037    0.486111111111   0.850694444444
8.0    21    0.0972222222222   0.259259259259    0.777777777778   0.607638888889
9.0    25    0.115740740741    0.375             1.04166666667    0.260416666667
10.0   27    0.125             0.5               1.25             0.03125
11.0   27    0.125             0.625             1.375            0.03125
12.0   25    0.115740740741    0.740740740741    1.38888888889    0.260416666667
13.0   21    0.0972222222222   0.837962962963    1.26388888889    0.607638888889
14.0   15    0.0694444444444   0.907407407407    0.972222222222   0.850694444444
15.0   10    0.0462962962963   0.953703703704    0.694444444444   0.9375
16.0   6     0.0277777777778   0.981481481481    0.444444444444   0.840277777778
17.0   3     0.0138888888889   0.99537037037     0.236111111111   0.586805555556
18.0   1     0.00462962962963  1.0               0.0833333333333  0.260416666667
Sum    216   1.0                                 mean=10.5        variance=8.75
                                                                  std.dev=2.95803989155
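The frequency table for the three-dice example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; it is not the solver's own code.

```python
from collections import Counter
from itertools import product

population = [1, 2, 3, 4, 5, 6]
n = 3

# Frequency of each possible total over all ordered samples with replacement.
freq = Counter(sum(sample) for sample in product(population, repeat=n))
total = sum(freq.values())

mean = sum(x * f for x, f in freq.items()) / total
variance = sum((x - mean) ** 2 * f for x, f in freq.items()) / total
std_dev = variance ** 0.5

# Build the rows: x, f, rf, crf, x rf, (x-mu)^2 rf.
rows = []
crf = 0.0
for x in sorted(freq):
    rf = freq[x] / total
    crf += rf
    rows.append((x, freq[x], rf, crf, x * rf, (x - mean) ** 2 * rf))

for row in rows:
    print("%4d %4d %.12f %.12f %.12f %.12f" % row)
print("mean=%s variance=%s std.dev=%s" % (mean, variance, std_dev))
```

Summing the x rf column gives the mean, and summing the (x-mu)^2 rf column gives the variance, which is how the last two columns of the table are used.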
Finite Population Parameters, Correction factor=0.774596669241
(N,n)    Mean   Pvar            Pstd            Svar   Sstd
(6, 3)   3.5    2.91666666667   1.70782512766   3.5    1.87082869339
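The population parameters in the last table can be checked with the statistics module. In this sketch we assume that Pvar and Pstd use the divisor N, that Svar and Sstd use the divisor N-1, and that the correction factor is sqrt((N-n)/(N-1)); that formula reproduces the value 0.774596669241 shown above.

```python
import math
import statistics

population = [1, 2, 3, 4, 5, 6]
N, n = len(population), 3

mean = statistics.mean(population)
pvar = statistics.pvariance(population)   # divisor N
pstd = statistics.pstdev(population)
svar = statistics.variance(population)    # divisor N - 1
sstd = statistics.stdev(population)

# Finite population correction factor (assumed formula, see the lead-in).
correction = math.sqrt((N - n) / (N - 1))

print((N, n), mean, pvar, pstd, svar, sstd)
print("Correction factor =", correction)
```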

Visit the solver Stats sampling

Be sure to specify the right parameters for the solver for the above example, see the screen below:


We will try to complete this solver with a graph of the relative frequency and the cumulative relative frequency vs. the statistic or random variable x.

We hope you will find this finite sampling demonstrator a quick tool for mastering statistical concepts!

Tuesday, August 10, 2010

Configuring QP and Apache2 to use scgi.

August 11.

We are now getting more familiar with configuring Apache2 and QP. We hope to contribute more documentation so more people will use QP/QPY with Apache2 (if Apache2 is their web server).



Configuring QP for scgi.


QP is actually a server with optional scgi capabilities. Due to its small size, the code runs quickly. There will be an article later on why QP should use scgi. Perhaps the main advantage is aesthetic: there is no ugly port number in the URL of the application. Another is that Apache can serve static content quickly.


In the QP configuration section, add an entry for scgi_address.


class SitePublisher (DurusPublisher):
    configuration = dict(
        durus_address=('localhost', 7003),
        http_address=('localhost', 8003),     # may be deleted if scgi_address is specified.
        as_https_address=('localhost', 9003),
        https_address=('localhost', 10003),
        scgi_address=('127.0.0.1', 3007)      # http_address will be ignored if included.
    )



Configuring Apache2 to use scgi

The configuration files for Apache are found in the /etc/apache2 directory. Create a configuration file proto (touch proto) inside the sites-available subdirectory. Then create a soft link proto inside the sites-enabled subdirectory pointing to sites-available/proto. Here is a virtual host template for the proto example which works on my Linux VPS host.
Replace mydomain with your own domain, say abc-services.

LoadModule scgi_module //mod_scgi.so

AddDefaultCharset utf-8
ServerName www.proto.mydomain
ServerAlias proto.mydomain

# Uncomment the following for special HTML error pages.
DocumentRoot /var/www/
#ErrorDocument 500 /500.html
#ErrorDocument 404 /404.html

# handle all requests through SCGI
SCGIMount / 127.0.0.1:3007



Issue sudo a2enmod mod_scgi to make Apache load mod_scgi permanently. Otherwise, add the following line before the settings:
LoadModule scgi_module /usr/lib/apache2/modules/mod_scgi.so

After saving the config file, don't forget to restart apache2!

sudo /etc/init.d/apache2 restart

Accessing the scgi/Apache2 powered web application

Use the URL http://www.proto.mydomain/ to access the proto example.


Serving static files


There is still a problem, though: the static files are not served by Apache2. That is, we want proto.mydomain/all.css to work, instead of proto.mydomain/scgi/all.css as wanted by the scgi client. We will worry about this later, when we have more than a thousand visitors! The impatient can try the following:


Alias /static "/path/to/static/files/dir"

SetHandler None
Options -Indexes +FollowSymLinks
allow from all



Remarks. It is recommended that the python-passfd package be installed. Use sudo easy_install python-passfd.



Acknowledgments: QP mailing list, Support at Rimuhosting for their helpful replies.

Experiment with the settings. And please share the knowledge gained with other Python programming language lovers.

Monday, August 9, 2010

The QP-QPY framework. The elegant, efficient, tiny Python web framework nobody uses?

I am surprised that, among Python web frameworks such as Turbogears and Django, QP/QPY pales in the number of deployments. Admittedly QP/QPY does not have many evangelists, and its documentation is scant (as befits its tiny code size!). Actually, QP/QPY does its job very well, and so quietly that no one notices.

The list of applications using QP/QPY includes some web sites which are not active anymore! They should be replaced so that visitors will not get the idea that QP/QPY is for experimentation only and not for reliable, heavy commercial web services of the kind exemplified by the heavyweight Java enterprise-class servers.

Our QP/QPY extreme solvers, which were formerly written using Quixote, will be revived soon! We will be busy, and to support this creative endeavor we will put ads on the site. I recall that the Google search engine ranked the solvers site (actually its educational reference material) highly.

To our readers: expect this QP/QPY powered site to exhibit growing pains and temporary instabilities, not of the mental kind, but hardware and software burps and hiccups.

Sunday, August 8, 2010

X11, xvfb and R problems.

August 9


In the good old days, my solvers were hosted on a site managed by my uber-geek former Linux system administrator, and I did not have many problems running our online solvers. Now that we are standing alone, I find it hard to juggle system maintenance, design of new solver pages, and debugging of current solvers.

Oh, yes, the Xvfb server is for those web-based apps which insist on having an X11 server. It is a virtual, headless X server where you basically define in-memory screens, as in my /etc/init.d/xvfb script with the following contents:


#! /bin/sh

Xvfb :1 -screen 0 1280x1024x16 -fbdir /tmp -fp /usr/share/fonts/X11/misc


It requires the font library to be installed also. The script is run only once, on startup. However, I have problems running R:

> x11()
Error in X11(d$display, d$width, d$height, d$pointsize, d$gamma, d$colortype,  : 
  unable to start device X11
In addition: Warning messages:
1: In x11() :
  locale not supported by Xlib: some X ops will operate in C locale
2: In x11() : X cannot set locale modifiers
3: In x11() : unable to open connection to X11 display ''
> 

I notice that the Rweb server, a Perl-R interface, is working properly, but its default graphics device is postscript. We can go that route later too. But we shall insist that the png device be supported, to avoid an expensive rewrite of our solver pages with graphics output. Here is the output with the png device:


> png()
Error in X11(paste("png::", filename, sep = ""), width, height, pointsize, :
unable to start device PNG
In addition: Warning messages:
1: In png() :
locale not supported by Xlib: some X ops will operate in C locale
2: In png() : X cannot set locale modifiers
3: In png() : no png support in this version of R
>


We will try to see if we can fix this in a week. Otherwise we shall rewrite our solvers using the ps device.

Here is our last set of installation commands:
sudo apt-get install libpng12-0 libpng12-devel
sudo apt-get install imagemagic
sudo apt-get install libcairo
sudo apt-get install libcairo-devel
sudo apt-get install pango
sudo apt-get install libpango
sudo apt-get install libpango1.0-dev
sudo apt-get install libcairo2-dev

After installation, we got the following upon running the R console:
> capabilities()
    jpeg      png     tiff    tcltk      X11     aqua http/ftp  sockets 
    TRUE     TRUE    FALSE    FALSE     TRUE    FALSE     TRUE     TRUE 
  libxml     fifo   cledit    iconv      NLS  profmem    cairo 
    TRUE     TRUE     TRUE     TRUE     TRUE    FALSE     TRUE 
> 
Where is bmp? Ah well, we will always try to improve this article.

A problem with Ubuntu: Setting up global (5.7.1-1) ... Name or service not known.

When I tried to install package locales, I got the following feedback:
sudo apt-get install locales
Reading package lists... Done
Building dependency tree
Reading state information... Done
locales is already the newest version.
locales set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
1 not fully installed or removed.
After this operation, 0B of additional disk space will be used.
Setting up global (5.7.1-1) ...
hostname: Name or service not known
dpkg: error processing global (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 global
E: Sub-process /usr/bin/dpkg returned an error code (1)

The fix is to add a second line, 127.0.0.1 [your_computer_name or domain_name], to your /etc/hosts file and then run find. I don't know why find has to be run, but it works! I will search for the reason.

Updates:

Aug. 28, 2010. Still have not solved the locale problem. When you do sudo dpkg-reconfigure locale
it will try to bring up a graphical screen. In a server environment, this is not possible. The fix for those packages which need a configuration screen is to install the package console-setup. It consumes 2806 KB. Maybe you can just remove it afterwards. My remaining problem is that dpkg reports that the en.US-8553-1 locale is not installed! Duh. This is getting on our nerves.

I am reading the info at http://blog.andrewbeacock.com/2007/01/how-to-change-your-default-locale-on.html. Hope it helps.

A problem using scipy and numpy

If you installed scipy and numpy using the Ubuntu command apt-get install [package-name], check if importing scipy.stats results in an error.

There was an error when I tried to import scipy.stats. It complained of wrong data type for
von_mises stats distribution function.

The fix is to remove them using apt-get remove and then reinstall them from the latest svn sources.

svn co http://svn.scipy.org/svn/numpy/trunk numpy
svn co http://svn.scipy.org/svn/scipy/trunk scipy

Install subversion if it is not yet installed.
Then cd to each directory and issue sudo python setup.py install to install each package.
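A quick way to check whether the reinstallation worked is to try the imports from a script. The following is a small sketch; the helper name try_import is our own, and it catches any exception because a mismatched binary build may fail with something other than ImportError.

```python
import importlib

def try_import(name):
    """Return (True, version) if the module imports, else (False, error text)."""
    try:
        module = importlib.import_module(name)
    except Exception as exc:
        return False, "%s: %s" % (type(exc).__name__, exc)
    return True, getattr(module, "__version__", "unknown version")

for name in ("numpy", "scipy", "scipy.stats"):
    ok, info = try_import(name)
    print(name, "OK" if ok else "FAILED", info)
```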

Is Apache running? If not then restart

Here is an sh script file to restart the Apache2 server in case it is not running.

#!/bin/sh
# Collect the process ids of any running apache2 processes.
run=`ps ax | grep /usr/sbin/apache2 | grep -v grep | cut -c1-5 | paste -s -`
if [ "$run" ];
then
    echo "Apache is running"
else
    sudo /etc/init.d/apache2 restart
fi

Save it to a file checkapache2.sh, then make it executable (sudo chmod ug+x checkapache2.sh).
Enter a cron entry (using crontab -e) to run the check every 3 minutes. Adjust if necessary. If you have a fast server, you can lower the time interval between checks.

References: lampdocs

Some problems with Ubuntu installed packages.

Our server has been upgraded with a bigger disk and more RAM, and with a new OS to boot! However, there are some problems with the installed packages.



scipy:

File "/usr/lib/python2.6/dist-packages/scipy/stats/distributions.py", line 27, in <module>
    import vonmises_cython
File "numpy.pxd", line 30, in scipy.stats.vonmises_cython (scipy/stats/vonmises_cython.c:2939)
ValueError: numpy.dtype does not appear to be the correct type object

These problems are fixed in our later post: A problem using scipy and numpy

Problem upgrading to MySQL5 in Ubuntu upgrade to Lucid Lynx

Oh, something went seriously wrong on the way to upgrading an Ubuntu server installation from Hardy to Lucid. The PHP applications, which I formerly considered robust, all have problems with the MySQL database! Our big Wordpress blog Digital Exploration is down due to

Error in Establishing a Database Connection!

which is kind of silly since all our previous MySQL database connections were working properly.

We hope that we will soon be back. All I can think of is to reinstall MySQL and Wordpress, in that order. As far as I can see, the MySQL server can run. It is the MySQL clients that cannot connect!

The problem was a misconfiguration of the my.cnf file; basically, the fix involved copying the default config file to /etc/mysql/my.cnf. I will expound further on this when we have the time.

Thanks to our VPS support team for fixing the error.

Saturday, August 7, 2010

Trouble in PHP paradise: Unable to establish a database connection

Our PHP apps (moodle, wordpress) are generally robust, but recently, in the upgrade to Ubuntu Lucid Lynx, we encountered the error message

Error establishing a database connection

Formerly, we would just reboot the server and it would come back immediately, but this time the most recent OS upgrade turned the normally robust apps into problematic ones.

Sunday, August 1, 2010

Upgrading R to Version 2.11.0 2010-04-22

August 11:

We just upgraded the R statistical software to the latest stable version, 2.11.0. Expect some solvers involving statistics to not be working.

We are also reliving the X11 problem. R expects to be able to use an X11 screen even in a server environment! We will dig up our old manuals to fix this and finally publish our results.

August 8: The graphics now work. I don't know what fixed it in the rush to update the OS from Hardy to Lucid Lynx. Our Rweb application (courtesy of Jeff Banfield and the R organization), for example, works correctly without much manual configuration on our part! Before, we thought we would spend endless nights fixing Xvfb, the X virtual frame buffer.