ColdFusion in Context: Rank Order Correlation Coefficient

Suppose you want to test your supposition that there is a relationship between the rank that experimental subjects will achieve in two different settings, perhaps their rank in Math class and their rank in Science. The following procedure lets you enter values and computes their Rank Order Correlation Coefficient ("rho"), a value you can use to determine the likelihood that the observed relationship between the ranks of subjects in one setting and the ranks of the same subjects in a different setting occurred by chance. During this demonstration, you'll manipulate data entering and leaving a textarea field, sort pairs of values, determine their ranks (including the average rank for "ties"), and implement a multi-step algorithm in ColdFusion.

Overview

Everything will take place on a single page: rho.cfm. The bulk of it will be a form whose textarea fields are shaped into tall, thin columns and whose surroundings are formatted with table cells. The user will enter data into the first two columns and then click the calculate button. This will cause the page to be reloaded with the data, only this time the remaining columns and the bottom of the page will contain calculations based on the data.

Initialize and Begin the Form

Give a short explanation and set form field defaults for the fields that the user will enter directly.

Name the form and lay out the first row of its outer table. The first column will hold the raw pairs of input data. After a separator, remaining columns will display data based on these columns.

Rank Order Correlation Coefficient ("rho") of Two Measurements

<!--- Set defaults --->
<cfparam name="form.Xraw" default="">
<cfparam name="form.Yraw" default="">
<cfset form.Xrank="">
<cfset form.Yrank="">
<cfset form.Diff="">
<cfset form.DiffSquared="">

<!--- Begin form and overall table --->
<form name="Ranks" action="rho.cfm" method="post">
<input type="submit" name="doit" value="Submit">
<table border="1">
<tr><td>RAW PAIRS</td>
<td bgcolor="LightGrey"><hr width="0" height="0"></td>
<td>ORDERED</td>
<td>X RANK</td><td>Y RANK</td>
<td>DIFF "D"</td><td>D SQUARED</td></tr>

Display and Store Input Values

Within the first cell of the next row, create an inner table to hold X and Y values. Use the textarea tag to create tall, thin columns for data entry. Use the physical attribute to let line breaks neatly separate the entries. Because a user might use a space, linefeed [chr(10)], carriage return [chr(13)], or some combination of these to separate values, treat all of them as potential delimiters when parsing the data from the form into an array. Put the first point into the first cell, the second point into the second cell, and so on.

<!--- Begin inner table for raw data --->
<tr><td>
<table>
<tr><td>X</td><td>Y</td></tr>

<!--- Show X raw data --->
<tr><td><textarea name="Xraw" wrap="physical" rows="25" cols="3">
<cfoutput>#trim(form.Xraw)#</cfoutput></textarea><br><hr></td>

<!--- Store X raw data --->
<cfset XrawPoints=ArrayNew(1)>
<cfset XrawCount=0>
<cfloop list="#form.Xraw#" index="Value"
delimiters=" #chr(10)##chr(13)#">
<cfset XrawCount=XrawCount+1>
<cfset XrawPoints[XrawCount]=val(Value)>
</cfloop>

<!--- Show Y raw data; end inner table --->
<td><textarea name="Yraw" wrap="physical" rows="25" cols="3">
<cfoutput>#trim(form.Yraw)#</cfoutput></textarea><br><hr></td></tr>
</table>
</td>

<!--- Store Y raw data --->
<cfset YrawPoints=ArrayNew(1)>
<cfset YrawCount=0>
<cfloop list="#form.Yraw#" index="Value"
delimiters=" #chr(10)##chr(13)#">
<cfset YrawCount=YrawCount+1>
<cfset YrawPoints[YrawCount]=val(Value)>
</cfloop>

Sort Pairs Backwards by X Values Alone

Stop if you don't have the same number of values in each input column. When conditions are right, create triplets of strings that are right-justified numeric values (zero-filled from the left so a text sort will still treat them as numbers). Let the first member of the triplet be the X value. For the second member, add 1000 to the number of values and count backward so that the first value of 30 would be 1030 and the last would be 1001. This way, in a descending sort, ties will be settled with the first of the tied items appearing first! The last member of the triplet should be the Y value so that its link to the X value isn't lost. Separate the members with colons. Use a standard ColdFusion function, listsort, to sort the triplets.

<!--- Stop if columns aren't ready --->
<cfif XrawCount neq YrawCount>
  </table></table>
  Cannot proceed until the X and Y raw score columns have have the same number of values.
  <cfabort>  
</cfif>

<!--- Make, then sort triplets like 000X:{1000+Seq}:000Y by X alone --->
<cfset Seq=1000+YrawCount>
<cfset Xlist="">
<cfloop from="1" to="#YrawCount#" index="Point">
<cfset Seq=Seq-1>
<cfset Entry="#numberFormat(XrawPoints[Point], "0000")#:#Seq#:#numberFormat(YrawPoints[Point], "0000")#">
<cfset Xlist=listAppend(Xlist, Entry)>
</cfloop>
<cfset Xsort=listSort(Xlist, "text", "desc")>

Display the Sorted Pairs

After inserting a table cell separator to serve as a reminder that the pairs are not in the same sequence as the original input, construct and display sorted pairs from the newly sorted list. Each of the data points is extracted from the Xsort string whose elements are separated by commas but whose subelements (x:sequence:y) are separated by colons.The ShowByX field will hold XY pairs (separated by a tilde) for display. The chr(10) forces a newline between pairs to cause them to appear on separate lines of the display. The Xord variable will hold a list of X values to be ranked.

<!--- Separate input from processed data --->
<td bgcolor="LightGrey"><hr height="0" width="0"></td>

<!--- Format pairs for display; store X values for manipulation --->
<cfset Xord="">
<cfset form.ShowByX="">
<cfloop list="#Xsort#" index="Pair">
<cfset form.ShowByX=listAppend(form.ShowByX,
"#val(listGetAt(Pair,1,":"))#~#val(listGetAt(Pair,3,":"))#",
"#chr(10)#")>
<cfset Xord=listAppend(Xord,"#val(listGetAt(Pair,1,":"))#")>
</cfloop>

<!--- Display sorted pairs --->
<td>
<table>
<tr><td>by X</td></tr>
<tr><td><textarea name="ShowByX" wrap="physical" rows="25" cols="6">
<cfoutput>#trim(form.ShowByX)#</cfoutput></textarea><br><hr></td></tr>
</table>
</td>

Determine and Display Rank by X

When ranking for statistical purposes, you deal with "ties" by giving them the average of the ranks they would otherwise have had if they were different. For example, if the values are 8,6,6,4,2, then their ranks are 1 [first place],2.5,2.5 [the average of second and third place],4 [fourth place], and 5 [fifth place]. The tie values receive a rank equal to the average rank of the individual tied values. An easy way to do this is check each value to see how many times it's represented, sum ranks from the current rank forward that number of places, divide by that number of places to get the average rank, and then as you step forward, give those values that average rank. (With no ties, average rank defaults to exact rank, and you handle each value separately.)

Once this is done, formatting and displaying the ranks by X is similar to the process used earlier.

<!--- Get X Rank --->
<cfset Xarray=ArrayNew(1)>
<cfset Row=1>
<cfset Max=Listlen(Xord)>
<cfloop condition="Row le Max">

<!--- Get value to be ranked --->
<cfset Seek=val(listGetAt(Xord,Row))>

<!--- Count the duplicates --->
<cfset NrSame=(listValueCount(Xord,"#Seek#"))>

<!--- Sum the like row numbers --->
<cfset LikeRankSum=0>
<cfset LastLike=Row+NrSame-1>
<cfloop from="#Row#" to="#LastLike#" index="RawRank">
<cfset LikeRankSum=LikeRankSum+RawRank>
</cfloop>

<!--- Divide by the number of duplicates --->
<cfset Rank=LikeRankSum/NrSame>

<!--- Post to matching rows --->
<cfloop condition="#Row# le #LastLike#">
<cfset Xarray[Row]=Rank>
<cfset Row=Row+1>
</cfloop>

<!--- Continue until all rows have been processed --->

</cfloop>

<!--- Format X rank --->
<cfloop from="1" to="#listLen(Xord)#" index="Row">
<cfset form.Xrank=listAppend(form.Xrank,
"#Xarray[Row]#",
"#chr(10)#")>
</cfloop>

<td>
<table>
<tr>
<td>
1..N
</td>
<tr>
<td><textarea name="Xrank" wrap="physical" rows="25" cols="6">
<cfoutput>#trim(form.Xrank)#</cfoutput></textarea><br><hr></td>
</tr>
</table>
<td>
<table>
<tr>

Now Repeat for Y

Ordering the pairs by X affected the order of the Y values for the pairs. Now to rank the pairs by their Y order, make pairs consisting of the Y value (left-justified again) concatenated to their current sequence (also left-justified). Sort them in descending order. Then convert them back to regular numbers (subtracting the 1000 you added to the original sequence) for manipulation. Once this is done, the ranking process proceeds in the same manner as it did for the X values. Format and display the Y ranks.

<cfset Ymake="">
<cfset Row=1000>
<cfloop list="#Xsort#" index="Pair">
<cfset Row=Row+1>
<cfset Ymake=listAppend(Ymake, "#numberFormat(listGetAt(Pair,3,":"),'0000')#:#Row#")>
</cfloop>

<cfset Ysort=listSort(Ymake, "text", "desc")>

<!--- Store Y and Row values for manipulation --->
<cfset Yord="">
<cfset Yrealrow="">
<cfloop list="#Ysort#" index="Pair">
<cfset Yord=listAppend(Yord,"#val(listGetAt(Pair,1,":"))#")>
<cfset Yrealrow=listAppend(Yrealrow,"#evaluate(val(listGetAt(Pair,2,":"))-1000)#")>
</cfloop>

<!--- Get Y Rank --->
<cfset Yarray=ArrayNew(1)>
<cfset Row=1>
<cfset Max=Listlen(Yord)>
<cfloop condition="Row le Max">

<!--- Get value to be ranked --->
<cfset Seek=val(listGetAt(Yord,Row))>

<!--- Count the duplicates --->
<cfset NrSame=(listValueCount(Yord,"#Seek#"))>

<!--- Sum the like row numbers --->
<cfset LikeRankSum=0>
<cfset LastLike=Row+NrSame-1>
<cfloop from="#Row#" to="#LastLike#" index="RawRank">
<cfset LikeRankSum=LikeRankSum+RawRank>
</cfloop>

<!--- Divide by the number of duplicates --->
<cfset Rank=LikeRankSum/NrSame>

<!--- Post to matching rows --->
<cfloop condition="#Row# le #LastLike#">
<cfset Yarray[listGetAt(YrealRow,Row)]=Rank>
<cfset Row=Row+1>
</cfloop>

<!--- Continue until all rows have been processed --->

</cfloop>


<!--- Format Y rank --->
<cfloop from="1" to="#listLen(Yord)#" index="Row">
<cfset form.Yrank=listAppend(form.Yrank,
"#Yarray[Row]#",
"#chr(10)#")>
</cfloop>

<td>

1..N
</td>
<tr>
<td><textarea name="Yrank" wrap="physical" rows="25" cols="6">
<cfoutput>#trim(form.Yrank)#</cfoutput></textarea><br><hr></td>
</tr>
</table>
</td>
<td>
<table>
<tr>
<td>
1..N
</td>
<tr>

Subtract the Ranks

Because the point of this technique is to determine how the rank for one measurement stacks up against the rank for a different measurement, the next step is to subtract the ranks. Subtract the Y value from the X value. A negative difference indicates that the second rank is lower. Format and display the differences.

<cfset Diffs="">
<cfloop from="1" to="#listLen(Yord)#" index="Row">
<cfset DiffVal=Xarray[Row]-Yarray[Row]>
<cfset Diffs=listAppend(Diffs, DiffVal,"#chr(10)#")>
</cfloop>
<cfset form.Diff=Diffs>

<td><textarea name="Diff" wrap="physical" rows="25" cols="6">
<cfoutput>#trim(form.Diff)#</cfoutput></textarea><br><hr>
</td>
</tr>
</table>
</td>
<td>
<table>
<tr>
<td>
1..N
</td>

Square the Differences and End the Form

The next step is to square each difference, display the squared differences, and compute and display their sum. Close out the tables and the form.

<cfset DiffSqs="">
<cfset DiffSqSum=0>
<cfloop from="1" to="#listLen(Yord)#" index="Row">
<cfset DiffSq=listGetAt(Diffs,Row, "#chr(10)#")*listGetAt(Diffs,Row, "#chr(10)#")>
<cfset DiffSqSum=DiffSqSum+DiffSq>
<cfset DiffSqs=listAppend(DiffSqs, DiffSq, "#chr(10)#")>
</cfloop>
<cfset form.DiffSquared=DiffSqs>

<tr>
<td><textarea name="DiffSquared" wrap="physical" rows="25" cols="9">
<cfoutput>#trim(form.DiffSquared)#</cfoutput></textarea><br>
Sum: <cfoutput>#DiffSqSum#</cfoutput>
</td>
</tr>
</table>
</td>
<td>
</tr>
</table>
</form>

Perform and Display Remaining Calculations

If at least two measurements have not been entered, walk through the procedure for the calcluation. Let the ol tag produced an ordered list. To yield a result with precision (imprecision?) that can be compared to a set of tables, multiply by the desired number of places, round the result, and then divide by the same number of places.

<cfif Yrawcount lt 2>
  Further calculations cannot take place until at least two sets of scores have been entered.
  <cfabort>
</cfif>

<h4>Procedure to find Rho</h4>
<ol>
<cfset Nsquared=Yrawcount*Yrawcount>
<li>Square the number of scores <cfoutput>(#Yrawcount#) yielding #Nsquared#</cfoutput>.
<cfset NsquaredMinusOne=Nsquared-1>
<li>Subtract one, yielding <cfoutput>#NsquaredMinusOne#</cfoutput>.
<cfset Denominator=NsquaredMinusOne*Yrawcount>
<li>Multiply by the number of scores, yielding <cfoutput>#Denominator#</cfoutput>: the denominator.
<cfset Numerator=6*DiffSqSum>
<li>Multiply the sum of the squared differences <cfoutput>(#DiffSqSum#) 
by six yielding #Numerator#</cfoutput>: the numerator.
<cfset Ratio=Numerator/Denominator>
<li>Divide the denominator into the numerator yielding a ratio of <cfoutput>#Ratio#</cfoutput>.
<cfset Rho=1-Ratio>
<cfset RhoRound=round(Rho*10000)/10000>
<li>Subtract this from one, yielding Rho = <cfoutput>#Rho#</cfoutput>
which to four decimal places rounds to <cfoutput>#RhoRound#</cfoutput>.

Now What?

Enter values, compute rho, and look it up in a table of such values to determine the likelihood that an apparent relationship between these ranks is due to chance. If you're feeling brave, extend the technique to replace the table of rho values altogether. =Marty=