An update! At last! Anyway, a solid way to quantify ion data consistently and without bias is through each ion's relation to the other ions within the same sample. What is the most basic of all relationships, i.e. the least biased sort of quantification one can possibly imagine? Bigger than or less than: given ion X and ion Y, which one is bigger? How frequently is one bigger than the other? Now, as far as filtering out ions goes, I think we can use Pearson's chi-square test rather effectively to see whether an ion-pair relationship is nonrandom enough to be included for testing. Essentially, we will use a chi-square distribution with 1 degree of freedom and a significance threshold of P = 0.05 or so (to be a little more inclusive) to decide whether an ion pair is useful for analysis.
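A minimal sketch of that pair filter, assuming each ion's abundance is recorded in the same set of samples and that, under the null hypothesis, X beats Y half the time. Ties are simply dropped (an assumption), and the 1-df chi-square tail probability uses the identity P(χ²₁ > s) = erfc(√(s/2)), so only the standard library is needed. The function names are hypothetical.

```python
import math


def pair_chi_square(x_vals, y_vals):
    """Chi-square (1 df) test of whether X > Y more or less often than chance.

    x_vals[i] and y_vals[i] are the abundances of ions X and Y in sample i.
    Returns (statistic, p_value). Tied samples are ignored.
    """
    x_wins = sum(1 for x, y in zip(x_vals, y_vals) if x > y)
    y_wins = sum(1 for x, y in zip(x_vals, y_vals) if y > x)
    n = x_wins + y_wins
    if n == 0:
        return 0.0, 1.0  # no untied samples: no evidence either way
    expected = n / 2.0  # null hypothesis: a 50/50 split
    stat = ((x_wins - expected) ** 2 + (y_wins - expected) ** 2) / expected
    # Survival function of chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def is_informative_pair(x_vals, y_vals, alpha=0.05):
    """Keep the pair only if its bigger-than relationship looks nonrandom."""
    _, p_value = pair_chi_square(x_vals, y_vals)
    return p_value < alpha
```

One caveat on the design: the chi-square approximation is usually trusted only when each expected count is at least about 5 (roughly 10 or more untied samples); with fewer samples an exact binomial (sign) test would be the safer swap.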
Finally back in action! Now here's a possible solution to your problem of creating a consensus stratified ion rank: for each ion, calculate the average difference in rank between it and every other ion in another sample, and then from there, settle on some threshold (mean - 2*SD or similar) as a cutoff for clustering ions into the same rank.
Here's an idea I'll probably never use: calculating the area of overlap between two MLE-derived PDFs. I may be able to use the overlap percentage to help quantify confidence. I do, however, already have this bootstrapped K-S test business, so the two may overlap in purpose. I'll have to sit on it a little longer and flesh it out more. The calculations will certainly be computationally intensive.
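A rough sketch of the overlap calculation, assuming both PDFs are gamma curves whose (shape, scale) parameters were already estimated by MLE. The overlap is the area under min(f₁, f₂), integrated here with a plain trapezoid rule; the helper names and the 20,000-step grid are illustrative choices, not part of the original idea. For shape < 1 the density diverges at 0, and this crude grid would then be inaccurate.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density with the given shape and scale; 0 for x <= 0."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def overlap_coefficient(params_a, params_b, steps=20000):
    """Area under min(f_a, f_b) via the trapezoid rule.

    params_a and params_b are (shape, scale) tuples, assumed to come from MLE fits.
    1.0 means identical curves; 0.0 means no overlap at all.
    """
    # Integrate from 0 out to mean + 10 SD of whichever curve reaches further
    # (gamma mean = shape*scale, SD = sqrt(shape)*scale).
    upper = max(k * t + 10 * math.sqrt(k) * t for k, t in (params_a, params_b))
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * min(gamma_pdf(x, *params_a), gamma_pdf(x, *params_b))
    return total * h
```

The cost is one pass over the grid per ion pair, which is cheap for a single comparison but adds up quickly if it has to be repeated inside a bootstrap.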
There are two primary things that cause an ion to register as having a high change: A) an actual change in the ion's rank and B) a change in the variability of the ion's position. These may reflect true biological differences in the ion shift, and they may indeed be independent events, but both nonetheless contribute to our ability to distinguish between metabolome states. Case A is clearly and cleanly detected in the shape parameter of a gamma curve estimated from the data, which may or may not be independent of the scale parameter. Case B is detected as a shift in the scale parameter. Usually, for a given ion, a mix of A and B contributes to the change. It is not uncommon in experimental data to see the scale parameter increase while the shape parameter drops significantly (what does this mean?). Furthermore, it is not uncommon to see a huge shift in the scale parameter without a corresponding shift in the shape parameter (an exclusive B event).
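A sketch of how the shape and scale shifts could be pulled apart for one ion, assuming strictly positive values for the ion in each group. Instead of a full iterative MLE, it uses a well-known closed-form approximation to the gamma shape MLE, then sets scale = mean / shape (the exact MLE for the scale given the shape). The helper names are hypothetical. It also reports the coefficient of variation, since for a gamma curve the mean is kθ, the variance is kθ², and CV = 1/√k; by that identity alone, a falling shape parameter means a wider spread relative to the mean.

```python
import math
from statistics import mean


def fit_gamma_approx(data):
    """Approximate gamma MLE (shape k, scale theta) for strictly positive data.

    Closed-form approximation:
        s = ln(mean(x)) - mean(ln x)
        k ~ (3 - s + sqrt((s - 3)**2 + 24*s)) / (12*s)
    then theta = mean(x) / k. Fails (division by zero) if all values are equal.
    """
    m = mean(data)
    s = math.log(m) - mean(math.log(x) for x in data)
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    return k, m / k


def describe_shift(control, treated):
    """Compare fitted gamma parameters between two groups (hypothetical helper)."""
    k0, t0 = fit_gamma_approx(control)
    k1, t1 = fit_gamma_approx(treated)
    return {
        "shape_ratio": k1 / k0,  # the movement the post attributes to case A
        "scale_ratio": t1 / t0,  # the movement the post attributes to case B
        "cv_control": 1 / math.sqrt(k0),  # coefficient of variation = 1/sqrt(k)
        "cv_treated": 1 / math.sqrt(k1),
    }
```

A shape ratio near 1 with a large scale ratio would correspond to the "exclusive B event" described above.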
I currently place more emphasis on A and give it precedence over B, but I think I need to look into this further and figure out exactly how much A and B each contribute to an ion's shift.
Good research always starts with a good question, or a viable need. Here's a good question: how do I figure out which ions are correlated with which other ions in an organism's metabolome for a given stimulus? Obviously, the network that forms will depend on the stimulus, so it's necessary to observe across time points and treatment levels. How can one go about figuring out which ions are correlated with what? This would be something of a Holy Grail of metabolomics, but let's keep it in the back of our minds for now.
Here's an idea that may start to address my previous post, and which, disturbingly enough, is inspired by the US News college rankings. If you look at these rankings, it is very common for more than one school to occupy the same rank. The same idea can be applied to my ranking algorithm: if metabolite abundances are close enough together, give them the same rank. How do you determine whether two (or three!) metabolites are close together? You can measure how far apart each successively ranked ion is from the next, take the arithmetic mean and standard deviation of those gaps, and "small enough" would simply be the distance determined by your usual mean+2*sd! Who knows if it'll work, but it's certainly worth a crack...
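A minimal sketch of gap-based tied ranking, assuming a single sample given as a mapping from ion to abundance, with rank 1 for the most abundant ion and at least three ions (so there are two or more gaps for a standard deviation). Ties use "competition" numbering (1, 1, 3, ...), which is how shared ranks usually appear in published rankings. The function name and the `k` parameter are hypothetical; `k=2` reproduces the mean+2*sd cutoff from the post.

```python
from statistics import mean, stdev


def tiered_ranks(abundances, k=2.0):
    """Assign shared ranks to ions whose successive abundance gap is small.

    abundances: dict mapping ion -> abundance (at least three ions).
    A gap below mean(gaps) + k * stdev(gaps) counts as "small enough",
    so the lower ion inherits the rank of the ion just above it.
    """
    ordered = sorted(abundances, key=abundances.get, reverse=True)
    values = [abundances[ion] for ion in ordered]
    gaps = [values[i] - values[i + 1] for i in range(len(values) - 1)]
    threshold = mean(gaps) + k * stdev(gaps)

    ranks = {ordered[0]: 1}
    for pos in range(1, len(ordered)):
        if gaps[pos - 1] < threshold:
            ranks[ordered[pos]] = ranks[ordered[pos - 1]]  # tie with the ion above
        else:
            ranks[ordered[pos]] = pos + 1  # competition ranking skips tied slots
    return ranks
```

Two behaviours worth knowing before trying it: ties chain, so ions several steps apart can end up sharing a rank, and with `k=2` most gaps fall under the cutoff, so ties become the norm and only unusually large gaps separate tiers. A smaller or negative `k` (e.g. -2, echoing the mean-2*SD idea from earlier) gives a much stricter criterion.

```python
ab = {"a": 100, "b": 99, "c": 50, "d": 49, "e": 48}
tiered_ranks(ab, k=0)  # {'a': 1, 'b': 1, 'c': 3, 'd': 3, 'e': 3}
```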
Another way to decide which ions get the same rank is to analyze multiple samples and create an aggregate rank list. If certain ions have a tendency to swap places, you assign them the same rank.
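One way this could look, assuming every sample ranks the same set of ions (rank 1 = most abundant): order the ions by their mean rank across samples, then, for each neighbouring pair in that consensus order, measure how often the pair appears swapped. The 0.3 swap threshold and the function name are hypothetical choices for illustration.

```python
from statistics import mean


def aggregate_ranks(samples, swap_threshold=0.3):
    """Build a consensus rank list with shared ranks for ions that often swap.

    samples: list of dicts mapping ion -> rank, all containing the same ions.
    Neighbouring ions in the consensus order that appear swapped in at least
    `swap_threshold` of the samples are given the same rank.
    """
    ions = list(samples[0])
    consensus = sorted(ions, key=lambda ion: mean(s[ion] for s in samples))

    shared = {consensus[0]: 1}
    for pos in range(1, len(consensus)):
        upper, lower = consensus[pos - 1], consensus[pos]
        # Fraction of samples in which the lower-ranked ion actually beat the upper one
        swap_rate = sum(1 for s in samples if s[lower] < s[upper]) / len(samples)
        if swap_rate >= swap_threshold:
            shared[lower] = shared[upper]
        else:
            shared[lower] = pos + 1
    return shared
```

Only neighbouring pairs are checked here, which keeps the cost linear in the number of ions; ions that swap with a non-neighbour would need a pairwise version.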
I'm trying to come up with a new algorithm that normalizes internally, like my current rank algorithm does, but at the same time preserves some of the relationships between the ions in a given sample. It's long been known that the abundances are anything but independent, and there's definitely a need for some sort of modified rank scheme that capitalizes on this feature. Clearly there will be some information loss involved, but the point is to retain some of these relationships and exploit them.
I've been waffling for a few days about what direction to take this new dendrogram analysis I've created, and I think I've come up with a good, if rather vague, idea to work on. So far I've only been doing pairwise comparisons, which is exactly what my algorithm was designed for, but I somehow need to come up with a scheme that analyzes entire groups together in a holistic, non-pairwise fashion. What I think my dendrogram has accomplished is narrowing down the sets that can be compared as a group. Clearly a lot more thinking has to go into this, but it's a start.
Here's a fresh idea: start working with the raw rank scores, not the rank change scores. Begin by "standardizing" the scores so that they all run from 0 to 1. How will this change things? How will it shift things around? You can then easily take the mean of a group's standardized rank scores and see where it goes from there.
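A small sketch of the standardization step, assuming "0 to 1" means min-max scaling of each sample's raw rank scores, so samples with different numbers of ranked ions land on the same scale. Ions missing from a sample are simply skipped when averaging (an assumption); the function names are hypothetical.

```python
from statistics import mean


def standardize_ranks(ranks):
    """Min-max scale one sample's raw rank scores onto [0, 1]."""
    lo, hi = min(ranks.values()), max(ranks.values())
    span = hi - lo
    return {ion: (r - lo) / span if span else 0.0 for ion, r in ranks.items()}


def group_mean_scores(group):
    """Mean standardized score of each ion across a group of samples."""
    scaled = [standardize_ranks(sample) for sample in group]
    ions = set().union(*scaled)
    return {ion: mean(s[ion] for s in scaled if ion in s) for ion in ions}
```

For example, an ion ranked 2nd of 3 in one sample and 3rd of 5 in another gets 0.5 in both, which raw ranks would not show.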
That said, you really do need to work on the data you've been given as well.
Here's another idea: there's always the problem of not having enough data to work with, and just as you can add dimensions when building an SVM, you can create new ones by formulating nonlinear combinations of the sample data. There could be some fundamental, conceptual problems with this, but it's worth thinking about.
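A minimal illustration of "nonlinear combinations", assuming the simplest case: appending every degree-2 product of a sample's features (the explicit version of what a polynomial kernel does implicitly in an SVM). The function name is hypothetical. Note that this expands the number of features per sample; it does not add samples, and n original features grow to n + n(n+1)/2.

```python
from itertools import combinations_with_replacement


def quadratic_features(row):
    """Return the original features followed by all degree-2 products."""
    expanded = list(row)
    for i, j in combinations_with_replacement(range(len(row)), 2):
        expanded.append(row[i] * row[j])
    return expanded
```

```python
quadratic_features([2, 3])  # [2, 3, 4, 6, 9]
```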
I've been thinking about how to handle ions that appear in both experimental sets but not in the control set. These ions are currently treated as simply not changing between the two sets, but in reality they certainly can and do change. Coming up with a similar scoring scheme for these ions has proven very difficult, as there is no common base score set to run K-S tests against, and furthermore, the sample sizes are far too small to draw any meaningful conclusions from. The only thing I believe I can extract from them is gamma distribution parameters, which may be useful enough for what I need. Nonetheless, the question isn't whether I can do it, but whether it's worth the effort, since things are in fairly good working order without it.