Showing posts with label boxplot. Show all posts

Sunday, September 5, 2010

The online box/violin/vio plot generator

Our working rule for the online extreme solvers is not to wait for a perfect implementation, but to put up a working solver quickly and so make better use of our limited time. This solver is part of rplotpage, a page dedicated to using R to make quick statistical graphics for analysis.

The plotting itself is done in R. We present the QP/QPY code that drives it, not to impress, but as a reminder to the developer to add more features and make the solver more useful, robust, fast and reliable.

"""
file    boxviolin.qpy
version 2010.09.01   0.0.1  first version
"""

__version__ = "0.0.1 2010.09.01"
__author__  = "ernesto.adorio@gmail.com"
__title__   = "Box-Violin Plots using R"
__file__    = "boxviolin.qpy"
__catalog__ = "BOXVIOLINPLOT-xxxx"
__url__     = "/solvers/rplotpage/box-violin/"


import time
import tempfile
import commands
import os

from   qp.fill.directory  import Directory
from   qp.fill.form       import Form, StringWidget, TextWidget,CheckboxWidget,SingleSelectWidget
from   qp.fill.css   import BASIC_FORM_CSS

from   qp.sites.extreme.lib.tmpfilesmanager import TmpFilesManager
from   qp.sites.extreme.lib.uicommon        import renderheader, renderfooter, processheader, processfooter
from   qp.pub.common                        import page
from   qp.sites.extreme.lib.checkinput      import checkInputs, getFormStrings
from   qp.sites.extreme.lib.qpyutils        import printRlines, showLogo
from   qp.sites.extreme.lib.webutils        import vecRead, GraphicsFile, as_R_cvector, as_R_vector, as_R_matrix, str2file,runRcode

from   qp.sites.extreme.lib import config

from   qp.sites.extreme.lib import getlists

def Solve(fields):
    # Get the fields.
    (gX, gY, orientation, notchedQ,  color, input, output, main, sub, data)= fields
    
    color = color.strip()
    # Start of R code.
    fname1    = GraphicsFile(str("png"))
    barefile1 = fname1.split(str("/"))[-1]

    if input == "lists":
       values, names = getlists.getlists(data)
    else:
       str2file(data, "%s.datR" % barefile1)       
    
    Rcode = """
library("UsingR")
library("vioplot")
png('%s', width=%s*72, height=%s*72)\n""" % (fname1,gX,gY)

    if input == "lists":
       Rcode += values
       Rcode += "%s(" % output
       Rcode += ",".join(names)   # the list names become the plot arguments
    else:
       if output != "vioplot": 
          Rcode += 'D<- read.table("%s.datR", header= %s)\n' %(barefile1, "T" if input == "dataframeh" else "F")
          Rcode += "%s(D" % output  
       else:
          return "Error: vioplot does not work with dataframes (will fix this later)." 

    Rcode += ",col=\"%s\"" % color 

    if output != "violinplot":
       Rcode += ",horizontal= %s" %("T" if (orientation== "horizontal") else "F")
    if output== "boxplot": 
       if notchedQ == "True":
          Rcode += ",notch=T"
       Rcode += ",main=\"%s\",sub=\"%s\"" % (main, sub)
    Rcode += ")\n"
    Rcode += """
graphics.off()
#img %s
    """ % (barefile1,)

    (status, output) = runRcode(Rcode)
    return output
    

class BoxviolinPage(Directory):
    def get_exports(self):
        yield ('', 'index', 'MatrixPlot', '')
           


    def index[html](self):
        form  = Form(enctype="multipart/form-data")  # enctype for file upload
        form.add(StringWidget, name="gX", title="gX", value = "6", size=3)
        form.add(StringWidget, name="gY", title="gY", value = "6", size=3)

        sample = """
    A =  [0.48277807, 0.55883118, 1.16229686, -2.46396356,  0.51974668,-0.01998613,
  -0.86259931, -1.06209308, -0.15671515,  0.38586572, -0.58470602,  0.31188390,
 -1.68227059,  0.23231185,  0.31535337, -0.26056577, -0.79349169, -0.94405202,
   0.24571925, -0.09696371, -0.23873567, -0.04282470, -1.14515572, -0.27451771,
  -0.34858889,  0.82800299, -0.95087183,  0.96757912, -0.15727265,  0.18871157,
 -0.86204394,  0.38754598,  1.50002723,  1.12436546,  0.61330870,  1.06893060,
  -1.41018422,  0.51767624, -0.45544199, -2.51855547, -0.77679863, -1.15285965,
   1.63166143,  0.65999935, -0.32582916,  1.56306037,  0.64053237,  0.01575183,
   0.46375195, -0.59255240,  0.10008879, -1.84196389, -1.52021625, -0.65748902,
   1.37202728,  1.02064967, -0.67488492, -0.60657784, -1.03975969, -0.33201024,
  -0.21770026, -0.35978620, -1.27524339, -0.98302583, -0.14502137,  0.54930432,
  -0.62277989,  0.30322268,  0.37256666, -0.32351923,  0.29565189, -0.18387578,
  -0.19855784, -1.15357907, -0.22684307, -1.45764974, -1.10523354, -0.04629259,
  0.36703816, -0.74684309, -1.61969633,  0.58941017,  -0.64764459,  0.11335716,
  -0.57165179, -0.02908054, -2.99190083, -0.13697042, -0.93464799, -0.09097572,
   0.77899241,  0.91366189, -0.36055108,  0.53784267, -2.15995157,  0.58759839,
  -2.36184597, -0.77934578,  0.80640923, -0.28747470]

B = [28.833158, 25.579569, 24.135939, 22.743252, 29.847344, 25.390941, 26.578796,
 28.889899, 26.165342, 33.802780, 22.116170, 15.447919, 20.142984, 25.902806,
 19.445479, 28.119898, 26.801888, 29.305805, 30.587547, 34.293373, 28.956599,
 28.371756, 30.963548, 16.572734, 35.695663, 32.681236, 25.234438, 19.401117,
 23.782763, 26.520000, 39.802655, 21.715052, 25.242914, 21.716478, 26.979986,
 25.078014,  9.517322, 27.996393, 35.096322, 30.132288, 33.942315, 26.993927,
 27.792392, 15.718047, 36.352729, 28.949376, 20.445088, 29.874274, 29.586799,
 33.060320, 28.655216, 27.505567, 26.661354, 29.419386, 27.377346, 23.985406,
 15.868329, 17.621934, 35.456224, 26.697508, 28.179293, 27.151317, 28.227135,
 23.882481, 45.793041, 22.712121, 29.222936, 27.619567, 28.854152, 23.744545,
 23.856285, 34.919047, 40.032500, 32.566862, 34.253867, 31.959225, 29.008039,
 29.965751, 20.319337, 39.284185, 29.676313, 34.686862, 22.103798, 38.521644,
 30.967211, 18.150335, 21.622198, 24.717461, 29.424366, 34.169033, 27.881900,
 28.577999, 29.547534, 35.179072, 27.350809, 35.940215, 31.848857, 23.747476,
 27.135937, 29.275092]
        """
        form.add(TextWidget, name = "data",   title="Input", rows="25", cols = "90", value = sample)

        form.add(SingleSelectWidget, name= "input", title="Input", value = "lists", options=
           [ ("lists",      "R or Python lists"),
             ("dataframeh", "dataframe with headers"),
             ("dataframe",  "dataframe without headers")
           ])

        form.add(SingleSelectWidget, name="orientation", title="Orientation", value = "vertical", options= 
             [("vertical",   "vertical"), 
              ("horizontal", "horizontal")])

        form.add(CheckboxWidget, name="notched?", title="Notched?",value = True)

        form.add(SingleSelectWidget, name= "output", title="Output Graph ", value="boxplot", options=
           [("boxplot",     "Box plot"),
            ("violinplot",  "Violin plot"),  
            ("vioplot",     "Vioplot")
            # ("violinbox",   "Violin with boxplot"),
            # ("viobox",      "Vioplot with boxplot")
           ])

        form.add(StringWidget,  name = "color",  title="color", size = 35, value = "blue")
        form.add(StringWidget,  name = "main", title="Main Title",    size = 35, value = "Box plot")
        form.add(StringWidget,  name = "sub", title="Sub Title",     size = 35, value = "Extreme Computing")
          

        form.add_hidden("time",   value = time.time())
        form.add_submit("submit", "submit")

        def render [html] ():
            renderheader(__title__)

            """
            





%s %s%s %s %s
%s %s%s%s
%s
""" % (form.get_widget("gX").render(),
form.get_widget("gY").render(),
form.get_widget("orientation").render(),
form.get_widget("notched?").render(),
form.get_widget("color").render(),

form.get_widget("input").render(),
form.get_widget("output").render(),
form.get_widget("main").render(),
form.get_widget("sub").render(),
form.get_widget("data").render())
            """
            Notched?, main title and subtitle only work for boxplot.
            Orientation only works for boxplot and vioplot.
            """

            renderfooter(form, __version__, __catalog__, __author__)


        if not form.is_submitted():
            return page('Box Violin Plot page', render(), style= BASIC_FORM_CSS)

        def process [html] ():
            processheader(__title__)
            calctime_start = time.time()

            # Get the problem parameters
            (gX, gY, orientation, notchedQ, color, input, output, main, sub, data) = getFormStrings(form,
                [
                 "gX", "gY", "orientation", "notched?", "color",
                 "input", "output", "main", "sub",
                 "data"
                ])


            output = Solve((gX, gY, orientation, notchedQ, color, input, output, main, sub, data))
"
"
            printRlines(output)
            "
"
            showLogo("Rlogo.jpg")
            processfooter(form, calctime_start, "./", __url__)
        process()

Currently the solver draws only one of the box, violin or vio plot types at a time for the supplied lists.
The vioplot cannot be drawn for a dataframe and does not accept a main title or a subtitle. See the previous posting on these topics.


Sep. 10: Vioplot can now process dataframes; the code was rewritten to draw the columns one by one. If D is a dataframe with 3 columns, for example, the code to draw a vioplot for this set is vioplot(D[[1]], D[[2]], D[[3]]).
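
The column-by-column call described in the Sep. 10 note can be generated from Python, the language of the QPY page. The following is only a sketch: vioplot_call_for_dataframe is a hypothetical helper, not the code running on the solver, and it assumes the dataframe has already been read into an R variable named D.

```python
def vioplot_call_for_dataframe(ncols, color="blue", frame="D"):
    """Return R source text that draws one vioplot per dataframe column.

    Hypothetical helper for illustration; 'frame' is the assumed name
    of the R dataframe variable.
    """
    if ncols < 1:
        raise ValueError("need at least one column")
    # R indexes columns from 1, so D[[1]] is the first column.
    columns = ", ".join("%s[[%d]]" % (frame, i) for i in range(1, ncols + 1))
    return 'vioplot(%s, col="%s")' % (columns, color)


print(vioplot_call_for_dataframe(3))
```

The number of columns has to be known on the Python side, for instance by counting the fields in the first data line before the R code is assembled.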




We will remove some of these limitations in a future version.

Wednesday, August 25, 2010

Drawing boxplots, violin plots using R Part 1

A box plot is a graphical display of the distribution of data, showing the quartiles and any possible outliers. Assuming the box plot is drawn vertically, the lower edge of the rectangular box marks the first quartile (Q1), while the upper edge marks the third quartile (Q3). The median is marked by a line inside the box. Some versions also indicate the position of the arithmetic mean by a cross or a dot. Whiskers are drawn out to the most extreme data points lying within 1.5(fs) of the lower and upper quartiles, where fs is the fourth spread, the difference of Q3 and Q1. Data points beyond these lower and upper limits are drawn individually and are labelled outliers. The box plot, however, cannot display the shape of the distribution, especially for multimodal data.
A boxplot can be drawn for each column of a matrix.
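
To make the quantities in this description concrete, here is a small Python sketch that computes them for one sample. It is an illustration under stated assumptions, not R's own algorithm: boxplot_summary is a hypothetical name, and the quartiles come from statistics.quantiles (Python 3.8 or later) with the "inclusive" method, which can differ slightly from the hinges R uses.

```python
import statistics


def boxplot_summary(data):
    """Quartiles, fourth spread, whisker ends and outliers for one sample."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fs = q3 - q1                      # fourth spread (interquartile range)
    low_fence = q1 - 1.5 * fs
    high_fence = q3 + 1.5 * fs
    # Whiskers stop at the most extreme data points inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "fs": fs,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }


summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
for key in ("q1", "median", "q3", "fs", "whisker_low", "whisker_high", "outliers"):
    print(key, summary[key])
```

In this sample the value 100 lies above Q3 + 1.5 fs, so it is reported as an outlier and the upper whisker stops at 9.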

The violin plot addresses this shortcoming of the boxplot by adding a KDE (kernel density estimate) to outline the distribution of the data. R usually draws a boxplot for one vector only. There are at least two packages which offer violin plots. One is the violinplot function from the UsingR package of Verzani. Another is the vioplot package, which offers the vioplot function.
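
The outline of a violin is a kernel density estimate drawn on both sides of an axis. The Python sketch below shows the idea with a Gaussian kernel and a bandwidth picked by hand; gaussian_kde is a hypothetical helper, and the R packages select their own kernel settings, so this is not a reproduction of what violinplot or vioplot compute.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function estimating the density of 'sample' at a point x."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(sample)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per data point, centred on that point.
        return scale * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


# A bimodal sample: a box plot would hide the gap between the two groups.
sample = [0.0, 0.2, -0.1, 5.0, 5.1, 4.9]
density = gaussian_kde(sample, bandwidth=0.5)
for x in (0.0, 2.5, 5.0):
    print("%4.1f  %s" % (x, "#" * int(round(60 * density(x)))))
```

The estimate is high near 0 and near 5 and almost zero in between, which is the two-lobed shape a violin plot would draw for such data.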

Here is an illustration of the differences between boxplot, violinplot and vioplot:

library(UsingR)
library(vioplot)

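png("box-viol.png", 6*72, 6*72)          # 432 x 432 pixel canvas
# rbind recycles the two shorter samples to length 50, giving a 3 x 50 matrix.
X <- rbind(rnorm(50, 5, 2), rnorm(25, 1), rnorm(10, 3))
X <- as.vector(X)                        # flatten to one vector of 150 values
violinplot(X,X,X)                        # three identical violins
vioplot(X, at=2,col="green", add = T)    # overlay a vioplot on the second violin
boxplot(X, at=1,col="red", add = T)      # overlay a boxplot on the first violin
dev.off()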
Three violinplots are shown, with the boxplot and the vioplot superimposed on the first and second plots respectively.
Figure: Box and Violin Plots example
Notice that, in its desire to look more like a violin, the vioplot will sometimes cut off at Q3 + 1.5 fs or at Q1 - 1.5 fs, which may hide any outlier points!






              orientation?    positioning?   outliers?   matrix?   dataframe?
boxplot       both            yes            yes         yes       yes
violinplot    vertical only   no*            no          no        yes
vioplot       both            yes            no          no        no


In orientation, the box plot and vioplot can be drawn horizontally, and each can be positioned at a specific location on the x or y axis using the graphics parameter at="value". As we can see in the above figure, for outliers, the vioplot may stop at the fence values, creating a flat top or flat bottom and hiding the extreme values, specifically the minimum and maximum values in the data. The violin plot does show the minimum and maximum of the data, but it is hard to know where the fs spreads lie. Neither violinplot nor vioplot can handle matrix input. You have to pass each column of the matrix to these functions separately.

The boxplot may have an optional notch to emphasize the location of the median.
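
As a rough guide to how wide a notch is, R's documentation for boxplot.stats describes the notches as extending to +/-1.58 IQR/sqrt(n) around the median. The Python sketch below applies that formula; notch_limits is a hypothetical helper, and because it takes its quartiles from statistics.quantiles rather than from R's hinges, the numbers may differ a little from what R draws.

```python
import math
import statistics


def notch_limits(data):
    """Approximate notch limits: median +/- 1.58 * IQR / sqrt(n)."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


low, high = notch_limits([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print("notch from %.3f to %.3f" % (low, high))
```

The notch narrows as the sample grows, since its half-width is divided by the square root of the number of points.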

In my opinion, a violin plot with a box plot superimposed is currently the best way to show the distribution and any multiple modalities of the data.

We are still wondering which input format to use for our online solver at extreme-solvers.blogspot.com, which we shall show in Part 2 of this article.

We hope that the developers of these plots will implement the features available in the others, such as making vioplot able to handle dataframes.