Monday, February 14, 2011

Correlation Test using R software

There is no reason why anyone interested in statistical computing will not choose R software, unless they are already wedded to existing software from their school days, or the choice was already made for them by their companies or their professors!. R boasts the most number of libraries and is open source and best of all FREE!
It takes time to master the syntax and we suggest that readers should try to download and test the R software in their machine if they had not done so.

In this post, we shall describe our online solver for testing correlation between two variables X and Y.
The solver is mainly a web interface to the cor.test function available in R. This function has the following spec for input arguments:


cor.test(x, y,
alternative = c("two.sided", "less", "greater"),
method = c("pearson", "kendall", "spearman"),
exact = NULL, conf.level = 0.95, continuity = FALSE, ...)


Here is how our solver for correlation test looks like at the moment.



When the user clicks on the submit button, the solver will call the R software, grabs its output, and print the results:

0001 > x <- c(44.4,45.9,41.9,53.3,44.7,44.1,50.7,45.2,60.1)

0002 > y <- c(2.6,3.1,2.5,5.0,3.6,4.0,5.2,2.8,3.8)

0003 >

0004 > cor.test(x,y, alternative="two.sided", method="pearson",conf.level=0.9,exact=TRUE, continuity=TRUE)

0005 

0006 Pearson's product-moment correlation

0007 

0008 data:  x and y

0009 t = 1.8411, df = 7, p-value = 0.1082

0010 alternative hypothesis: true correlation is not equal to 0

0011 90 percent confidence interval:

0012 -0.02223023  0.86697863

0013 sample estimates:

0014 cor

0015 0.5711816

0016 

0017 >

The printed p-value .1082 tells us not to reject the Null hypothesis that the correlation between
x and y is zero.

We will a scattergraph to the solver in the future.


Here is our interface code for the correlation test.

__version__ = "0.0.2"  
__catalog__ = "TEST-STAT-0036"
__date__    = "2011.02.14"
__author__  = "E.P. Adorio"


from   qp.fill.form     import Form, SingleSelectWidget, RadiobuttonsWidget, TextWidget, StringWidget, CheckboxWidget
from   qp.sites.extreme.lib.uicommon  import renderheader, renderfooter, processheader, processfooter
from   math             import sqrt
from   qp.sites.extreme.lib.webutils import vecRead
from   qp.sites.extreme.lib   import webutils
from   qp.fill.directory import Directory
from   qp.pub.common     import header, footer, page, redirect
from   qp.fill.css  import BASIC_FORM_CSS
from   qp.sites.extreme.lib.webutils import vecRead, matRead, matReadByColumn, runRcode
from   qp.sites.extreme.lib.qpyutils        import printRlines, showLogo


from   scipy          import stats

import time
from   scipy          import stats

import copy
import math


def getGrFile(grType):
    grfile   = GraphicsFile("%s" % grType)
    barefile = grfile.split(str("/"))[-1]
    return grfile, barefile

def getformDict(form):
    D={}
    for field in form.get_all_widgets():
        D[field.name] = form.get(field.name)
    return D    

def asColumns(S, ncolumns):
    """
    Args:
      ncolumns is number of columns.
      S is a string containing a space delimited columns of values.
    Return value
      a list consisting of the columns of S.
    """
    columns = [[] for i in range(ncolumns)]
    for i, v in enumerate(S.split()):
        columns[i % ncolumns].append( v)
    return columns

def asRvector(x):
    """
    Arg
      x - Python string array.
    Return value
      an R vector as a string c(v1,....vn)
    """
    S = "c(" 
    for v in x:
        S += str(v) + ","
    S += ")"
    return S.replace(",)", ")")

 


def SolveProblem(form):
    D    = getformDict(form)
    x, y = asColumns(D["pairedSample"], 2)
    alpha       = float(D["alpha"])
    method      = D["method"]
    alternative = D["alternative"]
    cont        = D["continuity_correction"]
    exact       = D["exact"]

    exact = "TRUE" if exact else "FALSE"
    cont  = "TRUE" if cont else  "FALSE"

    xstr = "x <- " + asRvector(x)+"\n"
    ystr = "y <- " + asRvector(y)+"\n"
    Rcode =xstr + ystr

    conflevel = 1-alpha

    Rcode += """
cor.test(x,y, alternative="%s", method="%s",conf.level=%s,exact=%s, continuity=%s)               
""" %(alternative, method,conflevel, exact, cont)
     
    # create the R command.
    Rcode = Rcode.replace(",)", ")")
    return runRcode(Rcode)


class CorrTestPage(Directory):
    def get_exports(self):
        yield ('',       'index',   'Correlation test for Paired Data', "R stat software corr.test for Correlation")

    def index[html](self):
        form = Form(enctype = "multipart/form-data")

        form.add(CheckboxWidget, name = "continuity_correction", \
                 value = True,
                )
        form.add(CheckboxWidget, name = "exact", \
                 value = True,
                )

        form.add(TextWidget,  name = "pairedSample",  \
                 rows = "10", cols = "30", value=""" 
          44.4 2.6
          45.9 3.1
          41.9 2.5
          53.3 5.0
          44.7 3.6
          44.1 4.0
          50.7 5.2
          45.2 2.8
          60.1 3.8
                """
                )

        form.add(StringWidget, name = "alpha", size = 5, value = "0.10")
        form.add(SingleSelectWidget, name="alternative", options=[
                 ("two.sided", "two.sided"),
                 ("less", "less"),
                 ("greater", "greater")])

        form.add(SingleSelectWidget, name="method", options=[
                 ("pearson", "Pearson"),
                 ("kendall", "Kendall"),
                 ("spearman", "Spearman")])


        form.add_hidden("time",   value = time.time())
        form.add_submit("submit", "submit")


        def render [html] ():
            renderheader("Correlation test using R-software")

            pairedSample = form.get_widget("pairedSample")
            alpha = form.get_widget("alpha")
            exact = form.get_widget("exact")
            method   = form.get_widget("method")
            alternative = form.get_widget("alternative")
            cont = form.get_widget("continuity_correction")
            """
            







MethodAlternativeAlphaContinuity CorrectionExact p value?
%s %s%s%s%s
X Y data
%s
""" %( method.render(), alternative.render(), alpha.render(),cont.render(), exact.render(), pairedSample.render())

renderfooter(form, __version__, __catalog__, __author__)

if not form.is_submitted():
return page('CorrTestPage', render(), style= BASIC_FORM_CSS)

def process [html] ():
processheader("Correlation test using R software corr.test() function")
calctime_start = time.time()

if True:
flag, output = SolveProblem(form)
printRlines(output)

if 0:
for widget in form.get_all_widgets():
"
%s %s
" % (widget.name, widget.value)
processfooter(form, calctime_start, "./", "./stest_t1samp")
process()

Tuesday, February 1, 2011

Creating a Unit Conversion Page

We had a unit conversion page before but I felt the initial code was too complicated to maintain. A rewrite was necessary and though the page is still at version 0.0.1, we feel that the rewrite is justified. Here is the code, using Python and QP. I need a web artist or graphic designer to add color and spice to the unit conversion page.

from qp.sites.extreme.lib.config import image_dir, homeurl



__version__ = "0.0.1 2011.01.28"
__date__    = "2011.01.28"
__catalog__ = "UNITS-CONV-001"
__author__  = "ernesto.adorio@gmail.com"
__url__     = homeurl + "solvers/units/"
__title__   = "Fundamental Units Conversion Page"


from qp.pub.common      import get_session
from qp.fill.css        import BASIC_FORM_CSS
from qp.fill.directory  import Directory
from quixote.errors     import PublishError
from quixote.util       import dump_request
from qp.fill.form       import Form
from qp.fill.widget    import CheckboxWidget,StringWidget,SingleSelectWidget, StringWidget, TextWidget
from qp.pub.common      import header, footer, page, redirect
from   qp.sites.extreme.lib.uicommon import renderheader, renderfooter, processheader, processfooter
import time
import urllib

from unitslib import Length, Mass, Time, convert


import urllib, urllib2, os

from  qp.sites.extreme.lib.webutils  import before, after
from  qp.sites.extreme.lib import config


def getstatusoutput(command):
     """
     Fredrick Lundh 
     http://mail.python.org/pipermail/python-list/2006-October/406444.html 
     """
     from subprocess import Popen, PIPE, STDOUT
     p = Popen(command, stdout=PIPE, stderr=STDOUT, shell=True)
     s = p.stdout.read().split("\n")
     return p.wait(), s


class UnitsDirectory(Directory):
    def get_exports(self):
        yield '', 'index', 'NetTools server', None



    def index (self):
        form  = Form(enctype = "multipart/form-data")  # enctype for file upload

 form.add(StringWidget, name = "qty_length", value = "1.0", size = 30)
        length_options = []
        for key in Length:
            length_options.append(key)
      
 form.add(SingleSelectWidget, name = "unit_length",
           value = "meter",  
           options=length_options
        )   


 form.add(StringWidget, name = "qty_mass", value = "1.0", size = 30)
        mass_options = []
        for key in Mass:
            mass_options.append(key)
      
 form.add(SingleSelectWidget, name = "unit_mass",
           value = "kilogram",  
           options=mass_options
        )   

 form.add(StringWidget, name = "qty_time", value = "1.0", size = 30)
        time_options = []
        for key in Time:
            time_options.append(key)
      
 form.add(SingleSelectWidget, name = "unit_time",
           value  = "second",  
           options=time_options
        )   

 form.add_hidden("time",   value = time.time()) 
        form.add_submit("submit", "submit")

 def render [html] ():
            renderheader(__title__)
            if 1:
              """
              

Length Unit

%s%s
""" % (form.get_widget("qty_length").render(), form.get_widget("unit_length").render()) """

Mass Unit

%s%s
""" % (form.get_widget("qty_mass").render(), form.get_widget("unit_mass").render()) """

Time Unit

%s%s
""" % (form.get_widget("qty_time").render(), form.get_widget("unit_time").render()) "The quantified unit will be converted in other equivalent units." renderfooter(form, __version__, __catalog__, __author__) if not form.is_submitted(): return page('UnitsDirectory', render(), style= BASIC_FORM_CSS) def process [html] (): processheader("Unit Utility") calctime_start = time.time() "" "" "" "" "
"
qty = float(form.get("qty_length"))
unit = form.get("unit_length")

"

Length Conversion

"


'' % (qty, unit) for u in Length.keys(): " " % (convert(Length, qty, unit, u),u) "
Input [%s %s]
%s%s
"
"
"
qty = float(form.get("qty_mass"))
unit = form.get("unit_mass")

"

Mass Conversion

"
'' % (qty, unit) for u in Mass.keys(): " " % (convert(Mass, qty, unit, u),u) "
Input [%s %s]
%s%s
"
"
"
qty = float(form.get("qty_time"))
unit = form.get("unit_time")
"

Time Conversion

"

'' % (qty, unit) for u in Time.keys(): " " % (convert(Time, qty, unit, u),u) "
Input [%s %s]
%s%s
"
"
" processfooter(form, calctime_start, homeurl, __url__) return process()

Yuck! the html output code is being translated by a browser! Use your browswerĊ› View Source facility to recover the original contents.

Codes are provided for educational use. Please make the proper attribution.


Only the fundamental units for Length, Mass and Time are considered here, we will add unit conversions for other quantities such as Area, Volume/Capacity, Work/Energy, Pressure and others.

The solver page may be accessed at http://extreme.adorio-research.org/solvers/units/.

The Python code units.py which is called by the above QP interface code above is at http://adorio-research.org/wordpress/?p=10377.

Our reference to conversion values is http://physics.nist.gov/Pubs/SP811/appenB9.html#LENGTH
for length units. The Mass and Time units are similarly obtained from NIST.

Sunday, January 23, 2011

Creating a network tools service page

I first wrote the following code in 2008 and it is still running. People who work in Linux are blessed with plenty of open source and free software tools for networking. Here are five tools which we use in our nettools service page:ping, whois, nslookup, traceroute, and dig.


We show the code, using the mean and lean QP web framework for future reference.
More details are avaiable from the link Digital Explorations, our main blog site.




from qp.sites.extreme.lib.config import image_dir, homeurl

__version__ = "0.0.1 2008.10.23"
__date__    = "2008.10.23"
__catalog__ = "SERV-NETTOOLS-001"
__author__  = "ernesto.adorio@gmail.com"
__url__     = homeurl + "services/nettools/"
__title__   = "NetTools Service Page"



from qp.fill.css        import BASIC_FORM_CSS
from qp.fill.directory  import Directory
from quixote.errors     import PublishError
from quixote.util       import dump_request
from qp.fill.form       import Form
from qp.fill.widget    import CheckboxWidget,StringWidget,SingleSelectWidget, StringWidget, TextWidget
from qp.pub.common      import header, footer, page, redirect
from   qp.sites.extreme.lib.uicommon import renderheader, renderfooter, processheader, processfooter
import time
import urllib



import urllib, urllib2, os

from  qp.sites.extreme.lib.webutils  import before, after
from  qp.sites.extreme.lib import config


def getstatusoutput(command):
     """
     Fredrick Lundh 
     http://mail.python.org/pipermail/python-list/2006-October/406444.html 
     """
     from subprocess import Popen, PIPE, STDOUT
     p = Popen(command, stdout=PIPE, stderr=STDOUT, shell=True)
     s = p.stdout.read().split("\n")
     return p.wait(), s

def isAddressOK(Address):
    """
    Check for bad addresses.
    """
    #Quickie checks.
    if " " in Address:
       return False,"Embedded blanks are not allowed inside addresses"
    if "." not in Address:
       return False
    if Address.startswith("-"): # attempt to hack?
       return False, "Addresses do not start with a dash (-)."

    #Pure numeric address?
    return True, "Ok"


class NetToolsPage(Directory):
    def get_exports(self):
        yield '', 'index', 'NetTools server', None



    def index (self):
        form  = Form(enctype = "multipart/form-data")  # enctype for file upload

 form.add(StringWidget, name = "Address", value = "www.adorio-research.org", size = 30)
        
 form.add(SingleSelectWidget, name = "service",
        value = "ping",  options=[("ping", "ping"),
        ("whois", "whois"), 
        ("nslookup", "nslookup"),
        ("traceroute","traceroute"),
        ("dig", "dig")])

                 
 form.add_hidden("time",   value = time.time()) 
        form.add_submit("submit", "submit")


 
 def render [html] ():
            renderheader(__title__)
     
            """
%sAddress or domain name
%sService command
Please wait for the output to be displayed after clicking on the submit button! """ % (form.get_widget("Address").render(), form.get_widget("service").render()) renderfooter(form, __version__, __catalog__, __author__) if not form.is_submitted(): return page('NetToolsPage', render(), style= BASIC_FORM_CSS) def process [html] (): processheader("NetTools Server") calctime_start = time.time() Address = form.get("Address") service = form.get("service") if service == "ping": command = "ping -c 5 %s" % Address else: command = "%s %s" % (service, Address) status = 0 addressOK, errormessage = isAddressOK(Address) if addressOK: status, output = getstatusoutput(command) if status == 0: """
"""
                  for line in output:
                     "\n%s
" % line
                  """
""" else: """ Something is wrong with the input Address. [%s] Setting status to -1. """ % errormessage status = -1 if status: "Error in processing command[%s], exit code =%s " % (command, status) """Check for valid addresses. Embedded blanks are not allowed and must contain a dot "." and must not start with a "-".
""" processfooter(form, calctime_start, homeurl, __url__) return process()

Visitors to this web service will be naturally curious what IP address of their ISP is. We will add this feature SOON!

We just did! Take a look at the first screen shot. The second screenshot is the output page.




I just imported the get_sesson from qp.pub.common and made the call
remote_address= get_session.get_remote_address()

You can see the result in the first image above.

Sunday, September 5, 2010

The online box/violin/vio plot generator

It is our adage in our work for our online extreme solvers not to wait for a perfect implementation but rather to present quickly a working solver in order to be more productive with our precious time. Our solver is part of rplotpage, a page dedicated to using R to make quick statistical graphics for analysis.

Our solver is in R and we present the QP/QPY code, not to impress but to remind the developer to add more features to make the solver more useful, robust, fast and reliable.

"""
file    boxviolin.qpy
version 2010.09.01   0.0.1  first version
"""

__version__ = "0.0.1 2010.09.01"
__author__  = "ernesto.adorio@gmail.com"
__title__   = "Box-Violin Plots using R"
__file__    = "matrix.qpy"
__catalog__ = "BOXVIOLINPLOT-xxxx"
__url__     = "/solvers/rplotpage/box-violin/"


import time
import tempfile
import commands
import os

from   qp.fill.directory  import Directory
from   qp.fill.form       import Form, StringWidget, TextWidget,CheckboxWidget,SingleSelectWidget
from   qp.fill.css   import BASIC_FORM_CSS

from   qp.sites.extreme.lib.tmpfilesmanager import TmpFilesManager
from   qp.sites.extreme.lib.uicommon        import renderheader, renderfooter, processheader, processfooter
from   qp.pub.common                        import page
from   qp.sites.extreme.lib.checkinput      import checkInputs, getFormStrings
from   qp.sites.extreme.lib.qpyutils        import printRlines, showLogo
from   qp.sites.extreme.lib.webutils        import vecRead, GraphicsFile, as_R_cvector, as_R_vector, as_R_matrix, str2file,runRcode

from   qp.sites.extreme.lib import config

from   qp.sites.extreme.lib import getlists

def Solve(fields):
    # Get the fields.
    (gX, gY, orientation, notchedQ,  color, input, output, main, sub, data)= fields
    
    color = color.strip()
    # Start of R code.
    fname1    = GraphicsFile(str("png"))
    barefile1 = fname1.split(str("/"))[-1]

    if input == "lists":
       values, names = getlists.getlists(data)
    else:
       str2file(data, "%s.datR" % barefile1)       
    
    Rcode = """
library("UsingR")
library("vioplot")
png('%s', width=%s*72, height=%s*72)\n""" % (fname1,gX,gY)

    if input == "lists":
       Rcode += values
       Rcode += "%s(" % output  
       n = len(names)
       for i, name in enumerate(names):      
         if i == 0:
            Rcode += "%s" % name
         else:
            Rcode += ",%s"  % name
    else:
       if output != "vioplot": 
          Rcode += 'D<- read.table("%s.datR", header= %s)\n' %(barefile1, "T" if input == "dataframeh" else "F")
          Rcode += "%s(D" % output  
       else:
          return "Error: vioplot do not work with dataframes (will fix this later)." 

    Rcode += ",col=\"%s\"" % color 

    if output != "violinplot":
       Rcode += ",horizontal= %s" %("T" if (orientation== "horizontal") else "F")
    if output== "boxplot": 
       if notchedQ == "True":
          Rcode += ",notch=T"
       Rcode += ",main=\"%s\",sub=\"%s\"" % (main, sub)
    Rcode += ")\n"
    Rcode += """
graphics.off()
#img %s
    """ % (barefile1,)

    (status, output) = runRcode(Rcode)
    return output
    

class BoxviolinPage(Directory):
    def get_exports(self):
        yield ('', 'index', 'MatrixPlot', '')
           


    def index[html](self):
        form  = Form(enctype="multipart/form-data")  # enctype for file upload
        form.add(StringWidget, name="gX", title="gX", value = "6", size=3)
        form.add(StringWidget, name="gY", title="gY", value = "6", size=3)

        sample = """
    A =  [0.48277807, 0.55883118, 1.16229686, -2.46396356,  0.51974668,-0.01998613,
  -0.86259931, -1.06209308, -0.15671515,  0.38586572, -0.58470602,  0.31188390,
 -1.68227059,  0.23231185,  0.31535337, -0.26056577, -0.79349169, -0.94405202,
   0.24571925, -0.09696371, -0.23873567, -0.04282470, -1.14515572, -0.27451771,
  -0.34858889,  0.82800299, -0.95087183,  0.96757912, -0.15727265,  0.18871157,
 -0.86204394,  0.38754598,  1.50002723,  1.12436546,  0.61330870,  1.06893060,
  -1.41018422,  0.51767624, -0.45544199, -2.51855547, -0.77679863, -1.15285965,
   1.63166143,  0.65999935, -0.32582916,  1.56306037,  0.64053237,  0.01575183,
   0.46375195, -0.59255240,  0.10008879, -1.84196389, -1.52021625, -0.65748902,
   1.37202728,  1.02064967, -0.67488492, -0.60657784, -1.03975969, -0.33201024,
  -0.21770026, -0.35978620, -1.27524339, -0.98302583, -0.14502137,  0.54930432,
  -0.62277989,  0.30322268,  0.37256666, -0.32351923,  0.29565189, -0.18387578,
  -0.19855784, -1.15357907, -0.22684307, -1.45764974, -1.10523354, -0.04629259,
  0.36703816, -0.74684309, -1.61969633,  0.58941017,  -0.64764459,  0.11335716,
  -0.57165179, -0.02908054, -2.99190083, -0.13697042, -0.93464799, -0.09097572,
   0.77899241,  0.91366189, -0.36055108,  0.53784267, -2.15995157,  0.58759839,
  -2.36184597, -0.77934578,  0.80640923, -0.28747470]

B = [28.833158, 25.579569, 24.135939, 22.743252, 29.847344, 25.390941, 26.578796,
 28.889899, 26.165342, 33.802780, 22.116170, 15.447919, 20.142984, 25.902806,
 19.445479, 28.119898, 26.801888, 29.305805, 30.587547, 34.293373, 28.956599,
 28.371756, 30.963548, 16.572734, 35.695663, 32.681236, 25.234438, 19.401117,
 23.782763, 26.520000, 39.802655, 21.715052, 25.242914, 21.716478, 26.979986,
 25.078014,  9.517322, 27.996393, 35.096322, 30.132288, 33.942315, 26.993927,
 27.792392, 15.718047, 36.352729, 28.949376, 20.445088, 29.874274, 29.586799,
 33.060320, 28.655216, 27.505567, 26.661354, 29.419386, 27.377346, 23.985406,
 15.868329, 17.621934, 35.456224, 26.697508, 28.179293, 27.151317, 28.227135,
 23.882481, 45.793041, 22.712121, 29.222936, 27.619567, 28.854152, 23.744545,
 23.856285, 34.919047, 40.032500, 32.566862, 34.253867, 31.959225, 29.008039,
 29.965751, 20.319337, 39.284185, 29.676313, 34.686862, 22.103798, 38.521644,
 30.967211, 18.150335, 21.622198, 24.717461, 29.424366, 34.169033, 27.881900,
 28.577999, 29.547534, 35.179072, 27.350809, 35.940215, 31.848857, 23.747476,
 27.135937, 29.275092]
        """
        form.add(TextWidget, name = "data",   title="Input", rows="25", cols = "90", value = sample)

        form.add(SingleSelectWidget, name= "input", title="Input", value = "lists", options=
           [ ("lists",      "R or Python lists"),
             ("dataframeh", "dataframe with headers"),
             ("dataframe",  "dataframe without headers")
           ])

        form.add(SingleSelectWidget, name="orientation", title="Orientation", value = "vertical", options= 
             [("vertical",   "vertical"), 
              ("horizontal", "horizontal")])

        form.add(CheckboxWidget, name="notched?", title="Notched?",value = True)

        form.add(SingleSelectWidget, name= "output", title="Output Graph ", value="boxplot", options=
           [("boxplot",     "Box plot"),
            ("violinplot",  "Violin plot"),  
            ("vioplot",     "Vioplot")
            # ("violinbox",   "Violin with boxplot"),
            # ("viobox",      "Vioplot with boxplot")
           ])

        form.add(StringWidget,  name = "color",  title="color", size = 35, value = "blue")
        form.add(StringWidget,  name = "main", title="Main Title",    size = 35, value = "Box plot")
        form.add(StringWidget,  name = "sub", title="Sub Title",     size = 35, value = "Extreme Computing")
          

        form.add_hidden("time",   value = time.time())
        form.add_submit("submit", "submit")

        def render [html] ():
            renderheader(__title__)

            """
            





%s %s%s %s %s
%s %s%s%s
%s
""" % (form.get_widget("gX").render(),
form.get_widget("gY").render(),
form.get_widget("orientation").render(),
form.get_widget("notched?").render(),
form.get_widget("color").render(),

form.get_widget("input").render(),
form.get_widget("output").render(),
form.get_widget("main").render(),
form.get_widget("sub").render(),
form.get_widget("data").render())
"""
Notched?, Main title and subtitle only works for boxplot.
Orientation only works for both box and vioplot.
"""

renderfooter(form, __version__, __catalog__, __author__)


if not form.is_submitted():
return page('Box Violin Plot page', render(), style= BASIC_FORM_CSS)

def process [html] ():
processheader(__title__)
calctime_start = time.time()

# Get the problem parameters
(gX, gY,orientation, notchedQ, color, input, output, main, sub, data) = getFormStrings(form,
[
"gX", "gY", "orientation", "notched?", "color",
"input", "output", "main", "sub",
"data"
])


output=Solve((gX, gY, orientation, notchedQ, color, input, output, main, sub, data))
"
"
            printRlines(output)
            "
"
showLogo("Rlogo.jpg")
processfooter(form, calctime_start, "./", __url__)
process()

Currently the box|violin|vio plot will only plot one of these types for each list.
The vioplot cannot be drawn for a dataframe and does not accept a main and a sub title.See previous posting on these topics.


Sep. 10: Vioplot can now process dataframes, by rewriting the code to draw the columns. If D is a dataframe with 3 columns for example, the code to draw a vioplot for this set is is vioplot(D[[1]], D[[2]], D[[3]]).




We will remove some of the limitation in a future version.

Wednesday, August 25, 2010

Drawing boxplots, violin plots using R Part 1

A box plot is a graphical display of the distribution of data, showing all the quartiles an possible possible outliers. Assuming the box plot is drawn
vertically, The rectangular box lower edge denotes the first quartile, while the upper edge denotes the third quartile. The median is denoted by a line inside the box. Some versions also indicate the position of the arithemetic mean by a cross or a dot. Whiskers are drawn up to the data within 1.5(fs) of the lower and upper quartiles. where fs is the fourth spread, the difference of Q3 and Q1. Data points beyond these minimum and upper ranges are drawn for each data beyond these range and are labelled outliers. The box plot however cannot display the distribution of the data especially for multimodal data.
A Boxplot can be drawn for each column of a matrix.

The violin plot removes any shortcomings of the boxplot by adding a KDE (kernel density estimator to outline the distribution of the data. R usually draws
only a boxplot for one vector only. There are at least two libraries which offers violinplots. One is the violinplot function from the UsingR package of Verzanni. Another is the vioplot library which offers the vioplot function.

Here is an illustration of the differences between boxplot, violinplot and vioplot

library(UsingR)
library(vioplot)

png("box-viol.png", 6*72, 6*72)
X <- rbind(rnorm(50, 5, 2), rnorm(25, 1), rnorm(10, 3))
X <- as.vector(X)
violinplot(X,X,X)
vioplot(X, at=2,col="green", add = T)
boxplot(X, at=1,col="red", add = T)
dev.off()
Three violinplots are shown and the boxplot and vioplot are superimposed on the first and second plot respectively.
Box and Violin Plots example
Notice that in the desire to look more a violin, the vioplot will sometimes cut off at the Q3 + 1.5 fs or at the Q1-1.5fs, which may hide any outlier points!






orientation? positioning? outliers? matrix?dataframe?
boxplotbothyesyesyesyes
violinplotvertical onlyno*nonoyes
vioplotbothyesnonono


In orientation, the box plot and vioplot can be drawn horizontally and each ca

n be positioned at a specific location on the x or y axes using the graphics parameter at="value". As we can see in the above figure, for outliers, the vioplot may stop at the fence values creating a flat top or flat bottom. and hiding the extreme values specifically the minimum and maximum value in the data.The violin plot does show the minimum and maximum of data, but it is hard to know where the fs spreads lie. Both violin plot and vioplot cannot handle input matrix data. You have to specify each column of the matrix to these functions.

The boxplot may have an optional notch to emphasize the location of the median.

In my opinion, a violin plot with a box plot superimposed is the current best way to show distribution and any muliple modalities of the data.

We are still wondering what input format we shall make for our online solver at extreme-solvers.blogspot.com, which we shall show in Part 2 of this article.

We hope that the developers of these plots will implement other features available in the others, like vioplot able to do dataframes.

Monday, August 23, 2010

Drawing scatterplots and sunflower pots with R

Scatter plot and sunflower plots differ only in that the latter draws a flower stem for each repeat of a data point. In other words, it is easy to see multipicities of data points hidden in a plain scatter plot. Points are compared up to a specified number of significant digits.

Here are examples of plain scatter plot and sunflower plots for the same data (two column xy coordinates)


1 3
1 3
1 4
1.5 4
1.5 4
1.5 4
1.75 5
1.75 1.25
2 5
2 6
2 6
2 6
2 6
2 6
2 8


Here is the scatter plot drawn for the above data. Count the points drawn with the points in the data. They don't match!


On the other hand, the sunflower plot will allow a viewer most of the time an idea of the number of points displayed in the graph. Here is the sunflower plot for the same data.


Our solver which offers a scatterplot/sunflower plot generation page is in
http://extreme.adorio-research.org/solvers/rplotpage/scatter/