RE: data display (contours,etc)

Bob Ashcroft (cytomat@netcore.com.au)
Sun, 5 Oct 1997 21:10:12 +1000

Marty,

That is succinct, pithy and apposite!
beautiful!
your undying admirer, Bob
-----Original Message-----
From: BIGOS@Beadle.Stanford.EDU [SMTP:BIGOS@Beadle.Stanford.EDU]
Sent: Tuesday, September 30, 1997 9:27 PM
To: Cytometry Mailing List
Cc: CYTOMETRY:;;@Beadle.Stanford.EDU;
Subject: Re: data display (contours,etc)

I will make an attempt here to sort out the can of worms Alice G. called contour
maps. Hopefully we won't get any noggin clogs along the way.

Firstly, we must return to the smoothed data discussion. Assuming that one is
using a 64x64 or 128x128 grid to generate the 2D histogram to be contoured, it
is my experience that unless one has a large number of events in the histogram
(several hundred thousand) the contour lines are very messy. The noise in the
system and the relative sparseness of the data tax most contour-line generation
algorithms. So, in order to have readable contour maps one needs to either
collect very large data sets or smooth small data sets. On unsmoothed small data
sets, other graphic presentations will probably be more informative than contour
maps.
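As a minimal sketch of the smoothing step described above (the synthetic data, grid size, and box-filter kernel are my own illustrative choices, not a prescription):

```python
import numpy as np

# Hypothetical example: bin two fluorescence channels into a 64x64 grid,
# then smooth the 2D histogram with a simple moving-average (box) filter
# so that contour lines drawn on a sparse data set are less ragged.
rng = np.random.default_rng(0)
x = rng.normal(30, 5, 5000)   # synthetic channel 1 values
y = rng.normal(30, 5, 5000)   # synthetic channel 2 values

hist, _, _ = np.histogram2d(x, y, bins=64, range=[[0, 64], [0, 64]])

def smooth(h, passes=2):
    """Average each bin with its 8 neighbours, repeated `passes` times."""
    n, m = h.shape
    for _ in range(passes):
        padded = np.pad(h, 1, mode="edge")
        h = sum(padded[i:i+n, j:j+m] for i in range(3) for j in range(3)) / 9.0
    return h

smoothed = smooth(hist)
```

With only 5,000 events on a 64x64 grid, the raw histogram is mostly zeros and single counts; the smoothed version is what a contouring routine can actually work with.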

Secondly there is the question of how to choose the interval between contour
levels. Three are currently in common use in the flow community. The most common
one can be referred to as Linear, where the interval between contour lines is a
fixed density or number of events. Choosing this interval is the can of worms
that Alice refers to - make the interval too small and the contour map turns
into a big black smudge - make the interval too large and significant features
can disappear. I have not seen a good algorithm for automatically generating
this interval.
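The Linear scheme amounts to stepping in fixed increments of density up to the histogram peak. A sketch (the interval of 10 events per bin is a hypothetical user choice; as noted, picking it well is the hard part):

```python
import numpy as np

def linear_levels(hist2d, interval):
    """Contour levels at interval, 2*interval, ... up to the histogram peak."""
    peak = hist2d.max()
    return np.arange(interval, peak + interval, interval)

# Toy 2D histogram with a peak bin of 40 events.
hist = np.array([[0, 2, 5], [8, 40, 12], [1, 6, 3]], dtype=float)
print(linear_levels(hist, 10.0))  # [10. 20. 30. 40.]
```

Too small an interval yields hundreds of levels (the "black smudge"); too large an interval yields one or two, hiding minor populations.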

Another method of choosing contour intervals can be called Logarithmic. The user
specifies an interval, say 50%, and the algorithm finds the highest peak, puts
the first contour at 50% of that level, the next at 25%, and so forth until the
last contour is at the one-event level. This is an automatic process, and graphs
generated with this method will consistently show variation in the data that
occurs at low frequency, but they are poor at showing moderate- to high-frequency
features.
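The halving-down-to-one-event rule described above can be sketched directly (a minimal illustration, not any vendor's implementation):

```python
import numpy as np

def log_levels(hist2d, fraction=0.5):
    """Levels at fraction, fraction^2, ... of the peak, down to one event."""
    peak = hist2d.max()
    levels = []
    level = peak * fraction
    while level >= 1.0:          # stop at the one-event level
        levels.append(level)
        level *= fraction
    return sorted(levels)

# Toy histogram whose peak bin holds 16 events.
hist = np.zeros((64, 64))
hist[32, 32] = 16.0
print(log_levels(hist))  # [1.0, 2.0, 4.0, 8.0]
```

Because most of the levels sit near the bottom of the density range, low-frequency structure gets many contours while the region around the peak gets very few.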

The last method can be called Probability. This method was developed by Wayne
Moore in the early 1980s. Based on its use here for over 15 years I can
confidently say it provides an automatic way of contouring immunologic flow data
that shows all significant moderate and high level features. Combined with
outlying dots, it also allows viewing low-level features. Commercially this
method is available both in CellQuest (BD) and FlowJo (Treestar). Briefly,
here's how it works.

The user specifies a percent, which, like the Logarithmic method, determines the
number of contour levels - 10% results in nine, 5% results in 19, etc. The
algorithm picks the contour levels so that the specified percentage of events is
between each level. Note that if there are separate event populations (as there
usually are), then the number of events between corresponding contour levels on
each population will add up to the specified percentage. Mathematically this
means that any event, chosen at random, has an equal chance of falling between
any two adjacent contour levels - hence the name Probability contours.
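One simple way to realize equal-probability bands (a sketch of the idea only, not Wayne Moore's actual algorithm) is to sort the bins by density and pick the density at which the cumulative event fraction crosses each multiple of the chosen percentage:

```python
import numpy as np

def probability_levels(hist2d, percent=10.0):
    """Density thresholds so each band holds ~percent of the events."""
    counts = np.sort(hist2d.ravel())[::-1]        # densest bins first
    cum = np.cumsum(counts) / counts.sum()        # event fraction covered
    targets = np.arange(percent, 100.0, percent) / 100.0  # 0.1, 0.2, ... 0.9
    levels = []
    for t in targets:
        idx = np.searchsorted(cum, t)             # first bin crossing target
        levels.append(counts[idx])
    return sorted(set(levels))                    # duplicates merge on sparse data

# Synthetic bivariate-normal data for illustration.
rng = np.random.default_rng(1)
x = rng.normal(32, 6, 20000)
y = rng.normal(32, 6, 20000)
hist, _, _ = np.histogram2d(x, y, bins=64, range=[[0, 64], [0, 64]])
levels = probability_levels(hist, 10.0)           # up to nine levels
```

A 10% setting yields at most nine thresholds, matching the "10% results in nine" rule above; on very sparse histograms several targets can land on the same density, which is one reason smoothing matters here too.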

Using probability contours with outlying dots on smoothed data removes almost
all the uncertainty that Alice expressed for visualizing immunologic flow data.
For data that has populations with narrow sharp peaks, such as chromosomes,
Linear levels work much better.

Lastly, what is important is not contours per se, but the method of choosing
the contour line levels and the smoothness of the underlying distribution.
Having chosen the levels, other renderings of the data, such as the pseudo-color
plots on probability levels in FlowJo, can be just as informative as contour
lines. Nothing, however, will substitute for the experience and insight of the
researcher analyzing the data.

-Marty Bigos
Stanford Shared FACS Facility