Past the construction site, across the deserted parking lot, and through the shrubbery, I finally arrived at the front entrance of Northeastern University for my first academic conference.
Over the next two days, with 270 brilliant minds, I learned a lot about machine learning and healthcare. More importantly, however, I discovered how to make the most of an academic conference. Specifically, all you have to do is avoid the seven deadly sins of conferences.
- Greed: I can have it all.
- Gluttony: Six cookies sounds like a good idea.
- Sloth: My chair is so comfortable.
- Pride: I am awesome.
- Wrath: Ugh, this is the worst.
- Lust: Look at that hotter research topic.
- Envy: I wish I had done that.
This past weekend, I learned a lot about machine learning and healthcare at MLHC 2017. Here are 30 thoughts and findings that I could squeeze into bullet points.
Please excuse (and contact me about) any errors in summarization or attribution.
What’s special about healthcare?
- Data are numerous but hard to access! Beth Israel Deaconess Medical Center alone handles 7 petabytes of patient data. And yet, due to data availability challenges or a focus on rare diseases, many papers presented used datasets of only thousands or even dozens of patients.
- FDA approval is hard but important. The initial process is arduous; afterward, minor updates (e.g. retraining a deep learning model) require only notification, while new models need reapproval. One way of convincing the FDA is to show that a model's accuracy falls within the variance of human experts.
- Freely accessible massive datasets have accelerated machine learning research in healthcare with many accepted papers using MIMIC data.
- Validity and reproducibility are of immediate concern in this growing field. Researchers who reproduced the cohorts of 38 experiments using MIMIC data found that in half of the experiments, the listed and reproduced cohort sizes differed by more than 25%.
No, not gumballs
Until I read the recent paper at ICML 2017, I hadn’t heard of the Gumbel trick. There is surprisingly little online about the Gumbel trick (a relative of the more popular Gumbel-max trick), so here we go.
We often want to characterize probabilistic models in discrete settings. The Gumbel trick lets us estimate an associated partition function $Z$ with relative ease. At a high level, finding $Z$ or even $\ln Z$ directly is very difficult; however, we can add some noise and compute the maximum a posteriori (MAP) configuration more easily through approximation methods. If we repeat this process enough times, we get a reliable estimate of $\ln Z$.
In complexity theory, finding the MAP is NP-hard but can often be approximated quickly in practice. Computing the partition function is harder still, containing #P-hard problems.
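The perturb-and-MAP idea above can be sketched in a few lines of NumPy. This is a minimal toy, not the paper's method: the log-potentials are made up for illustration, and the space is small enough to take the max exactly. The key fact is that if we perturb each configuration's log-potential with independent standard Gumbel noise, the maximum is Gumbel-distributed with location $\ln Z$, so averaging the maxima and subtracting the Euler-Mascheroni constant (the mean of a standard Gumbel) estimates $\ln Z$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unnormalized log-potentials phi(x) over a tiny discrete space.
log_potentials = np.array([1.0, 2.0, 0.5, -1.0])

# Exact log partition function, for comparison (feasible only because the
# space is tiny; in real models this sum is intractable).
true_log_z = np.log(np.sum(np.exp(log_potentials)))

# Gumbel trick: max_x (phi(x) + g_x) with g_x ~ Gumbel(0, 1) follows a
# Gumbel distribution with location ln Z, whose mean is ln Z + euler_gamma.
num_samples = 100_000
gumbel_noise = rng.gumbel(size=(num_samples, log_potentials.size))
maxima = np.max(log_potentials + gumbel_noise, axis=1)
estimate = np.mean(maxima) - np.euler_gamma

print(f"true ln Z: {true_log_z:.4f}, estimate: {estimate:.4f}")
```

In a realistic model the inner `max` would be replaced by an (approximate) MAP solver, which is exactly why the trick is useful: it turns the #P-hard partition function into repeated NP-hard MAP queries.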
Behold, a blog.
For a long time, I thought creating a personal website was too intimidating, too cumbersome, or too much trouble. When I read other people’s blogs, the authors never mentioned how they made the blog, which made me think it was effortless for them.
No more! Certainly every blog is different, but here’s how I made this one.
If I can do it, you (yes you!) can make a blog too.