RMEQ

A Tool for Computing Equivalence Groups in Repeated Measures Studies

Aaron M. Cohen and Shannon K. McWeeney

Introduction

A hallmark of bioinformatics tool and algorithm evaluation is the comparison of a large number of systems on the same set of test cases. Our tool, RMEQ (Repeated Measure Equivalents), makes it simple to analyze systems evaluated in this manner and to separate them into distinct performance groups whose members are statistically indistinguishable from one another.

Requirements

Python version 2.4 or higher (available at www.python.org).
R version 2.4.0 or higher (available at www.r-project.org).
RPy version 1.0RC2 or higher (available at rpy.sourceforge.net).

Note that the version of RPy required depends upon the versions of R and Python that you have or wish to install.

Usage

python rmeq.py datafile system-header test-case-header score-header alpha (min|max) (rank)?

The program takes several arguments:

datafile is the path to the R-formatted data file, which should have column headers and one sequentially numbered row for each system/test case combination.
The system-header, test-case-header, and score-header arguments designate the column headers that specify the system name, the test case name, and the score, respectively.
The alpha argument specifies the significance level (for example, 0.05) for the statistical tests.
The sixth argument is either “min” or “max” and indicates whether the best system is the one with the minimum or the maximum sum of scores across all test cases.
The seventh argument is optional. Its only valid value is “rank”, which specifies that a test case rank transformation be performed on the scores before the rest of the script runs.
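
To illustrate what a test case rank transformation can look like, here is a minimal Python 3 sketch. It is not taken from rmeq.py. The function name rank_within_test_cases is hypothetical, and two choices are assumptions made for the example: the lowest score in a test case receives rank 1, and tied scores share the average of the ranks they span. RMEQ itself may order ranks or handle ties differently.

```python
from collections import defaultdict


def rank_within_test_cases(rows):
    """Replace each score with its rank among the scores of the same test case.

    rows: iterable of (system, test_case, score) tuples.
    Returns a list of (system, test_case, rank) tuples.
    Assumption for this sketch: rank 1 is the lowest score, ties get average ranks.
    """
    # Group the (system, score) pairs by test case.
    by_case = defaultdict(list)
    for system, case, score in rows:
        by_case[case].append((system, score))

    ranked = []
    for case, entries in by_case.items():
        ordered = sorted(entries, key=lambda entry: entry[1])
        i = 0
        while i < len(ordered):
            # Find the end of the run of equal scores starting at position i.
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            # Positions i..j (0-based) correspond to ranks i+1..j+1; average them.
            average_rank = ((i + 1) + (j + 1)) / 2.0
            for k in range(i, j + 1):
                ranked.append((ordered[k][0], case, average_rank))
            i = j + 1
    return ranked
```

Under these assumptions, scores of 10, 5, and 7 for systems SA, SB, and SC on a single test case become ranks 3.0, 1.0, and 2.0. Ranking within each test case puts all test cases on a common scale, so a test case with unusually large score differences does not dominate the sums.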

Downloads

Download the Python source code here.

Sample Run

To get you started, here is some artificially created sample data in the correct input format:

	SYSTEM	TASK	SCORE
1	SA	T1	10
2	SB	T1	5
3	SC	T1	7
4	SA	T2	9
5	SB	T2	7
6	SC	T2	8
7	SA	T3	9
8	SB	T3	5
9	SC	T3	4
10	SA	T4	10
11	SB	T4	5
12	SC	T4	6

Download the sample data file here.
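
If you want to check a data file before passing it to RMEQ, the following Python 3 sketch reads the format shown above and sums the scores for each system. It is independent of rmeq.py and is not how RMEQ reads its input. The function names read_scores and total_by_system are hypothetical, and the sketch assumes the file is tab-delimited with an empty first header cell above the row numbers, as in the sample.

```python
import csv
import io
from collections import defaultdict

# The sample data from above, written out with explicit tab separators.
SAMPLE = "\n".join([
    "\tSYSTEM\tTASK\tSCORE",
    "1\tSA\tT1\t10", "2\tSB\tT1\t5", "3\tSC\tT1\t7",
    "4\tSA\tT2\t9", "5\tSB\tT2\t7", "6\tSC\tT2\t8",
    "7\tSA\tT3\t9", "8\tSB\tT3\t5", "9\tSC\tT3\t4",
    "10\tSA\tT4\t10", "11\tSB\tT4\t5", "12\tSC\tT4\t6",
])


def read_scores(handle, system_header, case_header, score_header):
    """Return a list of (system, test_case, score) tuples from a tab-delimited file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Assumption: the header row has an empty first cell, so header
    # positions line up with the data columns.
    system_index = header.index(system_header)
    case_index = header.index(case_header)
    score_index = header.index(score_header)
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        rows.append((fields[system_index], fields[case_index],
                     float(fields[score_index])))
    return rows


def total_by_system(rows):
    """Sum the scores of each system across all test cases."""
    totals = defaultdict(float)
    for system, _case, score in rows:
        totals[system] += score
    return dict(totals)


rows = read_scores(io.StringIO(SAMPLE), "SYSTEM", "TASK", "SCORE")
print(total_by_system(rows))
```

For the sample data the raw score sums are 38 for SA, 22 for SB, and 25 for SC, so SA has the maximum sum. To read a file on disk instead, pass an open file object in place of io.StringIO(SAMPLE).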

Running RMEQ on the above data produces the following command-line output on a Windows XP machine:

C:>python rmeq.py rmeq-sample-data.tsv.txt SYSTEM TASK SCORE 0.05 max rank

RHOME= C:\Program Files\R\R-2.4.0
RVERSION= 2.4.0
RVER= 2040
RUSER= C:\
Loading the R DLL C:\Program Files\R\R-2.4.0\bin\R.dll .. Done.
Loading Rpy version 2040 .. Done.
Creating the R object 'r' .. Done

RANK GROUP 1 (top = SA): ('SA',)
RANK GROUP 2 (top = None): ('SB', 'SC')

C:>

The first group of lines is produced by the RPy library when it initializes the interface to R. The following two lines show the output of RMEQ. The SA system was placed alone in the top rank group. Systems SB and SC were both placed in the second rank group.

Citation

To cite RMEQ, please use the following reference:

Cohen AM, McWeeney SK. RMEQ: A tool for computing equivalence groups in repeated measures studies. In: Linking Literature, Information and Knowledge for Biology: Proceedings of the BioLINK2008 Workshop; 2008; Toronto, ON; (in press).

References

Maxwell, S. E. and Delaney, H. D. (2003) Designing Experiments and Analyzing Data: A Model Comparison Perspective. Lawrence Erlbaum, Mahwah, New Jersey.
Moreira, W. and Warnes, G. R. (2007) RPy (R from Python). http://rpy.sourceforge.net/
Python Software Foundation (2007) Python Programming Language. http://www.python.org/
R Development Core Team (2007) R: A Language and Environment for Statistical Computing. http://www.R-project.org