(8/13/2010) The metric distance algorithm has worked out incredibly well, and oddly enough it was my original scheme that proved most successful in producing a clean dendrogram. I still need to learn a bit more about dendrograms, though, to fully exploit their capabilities.
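For reference, a minimal sketch of the dendrogram step, assuming the metric distances get collected into a condensed pairwise matrix (the labels and values here are placeholders, not real data):

```python
# Build a dendrogram from pairwise metric distances via SciPy's
# hierarchical clustering. Distances below are illustrative stand-ins
# for my metric distance scores.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

labels = ["pert_A", "pert_B", "pert_C", "pert_D"]  # hypothetical perturbations
# Condensed distance matrix: d(A,B), d(A,C), d(A,D), d(B,C), d(B,D), d(C,D)
dists = np.array([0.2, 0.9, 0.8, 0.85, 0.75, 0.1])

Z = linkage(dists, method="average")  # UPGMA-style agglomeration
dendrogram(Z, labels=labels)
plt.show()
```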
As for my next direction, I think I need to start putting some effort toward that expanded High Confidence metabolite set I worked on a bit at the beginning of the summer. First and foremost, I need to generate the same graphs as before to verify that the expanded set is useful; then I'll move on to metric distance calculations with it. It's rather standard fare, but it must be done, I suppose.
As for even further directions, I need to start exploring the signed rank change distributions and the true characteristics of my rank change data. I need to look more into the Chi-square distribution and so forth, and gain a deeper mathematical understanding of it.
(8/02/2010) For my metric distance program, I really need to consider filtering out values that fall below the threshold dictated by the control data. It'll probably help clean things up considerably.
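A sketch of what I mean, assuming the control-vs-control distances define the noise floor (the 95th percentile cutoff is just an illustrative choice):

```python
import numpy as np

def filter_by_control(distances, control_distances, pct=95.0):
    """Zero out metric distances that fall below the noise floor
    estimated from control-vs-control comparisons."""
    threshold = np.percentile(control_distances, pct)
    return np.where(distances >= threshold, distances, 0.0)
```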
(7/29/2010) Here's a pretty good idea for identifying suitable (high confidence) ions for further analysis: for large groups where the differentiation is unclear, randomly split them up, compare the halves, then compare the comparisons, and filter ions based on their degree of fluctuation. It's a bootstrapped process, but it's definitely worth seeing how far it can be taken.
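Roughly what I have in mind, assuming `group` is a samples-by-ions array of ranks (the scoring is illustrative, not a final metric):

```python
import numpy as np

def fluctuation_scores(group, n_splits=200, seed=None):
    """Repeatedly split a group of samples in half at random, compute
    each ion's mean rank difference between the halves, and measure how
    much that comparison fluctuates across splits. Low-fluctuation ions
    are better high-confidence candidates."""
    rng = np.random.default_rng(seed)
    n = group.shape[0]
    diffs = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        half_a, half_b = group[idx[: n // 2]], group[idx[n // 2:]]
        diffs.append(half_a.mean(axis=0) - half_b.mean(axis=0))
    # Spread across splits = degree of fluctuation, per ion
    return np.std(diffs, axis=0)
```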
(7/23/2010) Here are some really neat ideas that I really should explore in the very, very near future:
Utilize the Chi-square distribution by assuming there is only one true DOF and any other detected DOFs are from outlier samples.
Explore rank shift (signed rank change) versus absolute rank change. This sort of data should yield more normal distributions, and it would solve the problem of normalization for fitting the Chi-square distribution (a rough sketch of both ideas follows this list).
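A sketch of how the two ideas could combine, assuming signed rank changes are approximately normal once standardized against control variability (the standardization and cutoff are assumptions, not settled choices):

```python
import numpy as np
from scipy import stats

def chi_square_outliers(signed_changes, control_sd, alpha=0.01):
    """Standardize signed rank changes against control variability,
    square them, and compare against a chi-square with one DOF; values
    far in the tail are the 'extra DOF' outlier candidates."""
    z = signed_changes / control_sd       # ~ N(0, 1) under the null
    stat = z ** 2                         # ~ chi-square(df=1)
    cutoff = stats.chi2.ppf(1 - alpha, df=1)
    return stat > cutoff                  # flag outlier samples
```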
Here are some slightly more loopy ideas that should be explored if there is still time:
Utilize the beta distribution for comparing the shape parameters of two gamma distributions, under the assumption of a constant scale parameter.
As an extension, perhaps even explore the Dirichlet distribution for comparing multiple gammas, possibly for the control data (see the simulation sketch below for the gamma-beta connection).
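The gamma-beta connection that makes the first idea work: if X ~ Gamma(a, θ) and Y ~ Gamma(b, θ) share a scale parameter, then X/(X+Y) ~ Beta(a, b). A quick simulation check (parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, theta = 2.0, 5.0, 1.3
x = rng.gamma(a, theta, size=100_000)
y = rng.gamma(b, theta, size=100_000)
ratio = x / (x + y)

# KS test against Beta(a, b) should fail to reject
print(stats.kstest(ratio, stats.beta(a, b).cdf))
```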
More ideas to explore:
What does an expected rank change value really mean? What does it physically translate to? The ion shifting from a set location?
Why does normalization by the control vs. control rank change work? Theoretically, wouldn't the absolute value of the change be of more use? Or is there some underlying factor whereby the magnitude of an ion's deviation under experimental perturbation is governed by its baseline propensity for fluctuation? Based on the empirical data, I'd lean towards the latter (a sketch of the normalization follows this list).
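A sketch of the normalization as I currently understand it, per ion (names are illustrative; `exp_change` and `ctrl_change` are comparisons-by-ions arrays):

```python
import numpy as np

def normalized_rank_change(exp_change, ctrl_change, eps=1e-9):
    """Scale each ion's experimental rank change by that ion's typical
    control-vs-control rank change, so ions that naturally jitter a lot
    don't dominate the score."""
    baseline = np.mean(np.abs(ctrl_change), axis=0)  # per-ion control jitter
    return np.abs(exp_change) / (baseline + eps)
```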
(7/14/2010) I have some qualms about including ions that aren't on the "approved list", even if they consistently show up in low percentages in the control data, because there's simply no verification process for them. If they show up in a given experimental sample set, it could be a complete fluke, a transient blip, etc. For now I guess I'll run with it, because there's no biological reason why I can't include them, but at the same time I'm pretty sure it'll artificially inflate my scores. We'll see how the metric calculations perform with and without the set.
(7/09/2010) I'm finding it difficult to justify choosing perturbations without regard to their sample origin. The most obvious reason to do it is to have enough combinations to ensure a well-sampled score at the end. If I were to pick groups by sample origin, I definitely wouldn't have enough combinations, nor would I have the constant sample size I was initially gunning for. On the other hand, doing things "sample blind" can be a good thing in itself; it's not as if every score from a given sample is going to be bad. Furthermore, I think sample-aware selection places a little too much confidence in the control data itself, so going sample-blind may be a very good thing indeed.
I can also incorporate an outlier algorithm into this, which wouldn't be too hard to implement, at least conceptually speaking.
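For my own reference, the two selection schemes side by side, assuming `data` is a measurements-by-ions array and `sample_ids` labels each row's sample of origin (all names illustrative):

```python
import numpy as np

def pick_blind(data, group_size, rng):
    """Sample-blind: draw a fixed-size group from all rows, ignoring origin."""
    idx = rng.choice(len(data), size=group_size, replace=False)
    return data[idx]

def pick_by_sample(data, sample_ids, sample):
    """Sample-aware: take only the rows sharing one sample label."""
    return data[sample_ids == sample]
```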
(7/08/2010) You know, perhaps it's not simply the case that the Kolmogorov-Smirnov test is highly sample size dependent, because the critical D value certainly scales with sample size (roughly as 1/√n). I think the issue is that when I run tests with large sample sizes (thousands of points), there is almost always some subset that throws the test off, and this subset is usually a non-negligible fraction of the dataset, which compounds the effect. This, more than anything, is probably the culprit.
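A quick check of that hunch: a fixed 5% off-center subset is invisible to the KS test at small n but triggers rejection once the count climbs into the thousands (the 0.5-sigma shift is just an illustrative contamination):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
for n in (50, 500, 5000, 50000):
    clean = rng.normal(0.0, 1.0, size=int(n * 0.95))
    shifted = rng.normal(0.5, 1.0, size=n - len(clean))  # the off subset
    sample = np.concatenate([clean, shifted])
    print(n, stats.kstest(sample, "norm").pvalue)  # p-value shrinks as n grows
```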
(7/07/2010) I'm debating whether it's better to select the raw data in random batches or to be "sample aware" and select via sample labels instead. It's a tough choice, and implementing both could be costly, though perhaps not prohibitively so. I'll implement the former first, see how it performs, and go from there.
Ah HA! I figured out the culprit behind the unusually low p-values I get with the Kolmogorov-Smirnov test! It seems the test is EXCEPTIONALLY sample count dependent: the higher the count, the more likely it is to reject! Alright, now I need some good way to normalize for this fact...may be tough!
(7/06/2010) I'm having a lot of trouble placing ions that appear (or disappear) between comparison groups in the same regime as ions that appear in both groups. There seems to be no good way to make them comparable. It's a rather taxing annoyance, and I don't think I'll ever find a good answer to it; it's been plaguing me since last summer. Oh well.
Ah, inspiration struck on my walk to the car. What I'm gonna do is assign the "impact score" for these sorts of ions based on the average impact score of the "influential ions" in the current comparison set. That way, it'll enhance what's already there, and won't make otherwise innocuous random blips in the unexciting sets go wild. We'll see how this works...
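A minimal sketch of the scheme, assuming each comparison already has per-ion impact scores and some cutoff for calling an ion "influential" (both the names and the cutoff are illustrative):

```python
import numpy as np

def score_appearing_ions(impact_scores, influence_cutoff):
    """Assign appearing/disappearing ions the mean impact score of the
    influential ions already in the comparison, so they amplify genuinely
    interesting sets instead of inflating dull ones."""
    influential = impact_scores[impact_scores >= influence_cutoff]
    if influential.size == 0:
        return 0.0                      # nothing notable to amplify
    return float(np.mean(influential))
```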
(7/05/2010) I've decided to turn my site into a place where I ramble on about random ideas that pop into my head during the course of my glorious research explorations! Exciting huh?