30 things I learned at MLHC 2017

August 22, 2017 · 8 min read

This past weekend, I learned a lot about machine learning and healthcare at MLHC 2017. Here are 30 thoughts and findings that I could squeeze into bullet points.

Please excuse (and contact me about) any errors in summarization or attribution.

What’s special about healthcare?

  • Data are plentiful but hard to access! Beth Israel Deaconess Medical Center handles 7 petabytes of patient data. Yet many of the papers presented use datasets with only thousands or even dozens of patients, either because data are hard to obtain or because the work targets rare diseases.
  • FDA approval is hard but important. Although the initial process is arduous, minor updates (e.g. retraining deep learning models) only need notification, whereas new models need reapproval. One way to convince the FDA is to show that model accuracy falls within the variance of human experts.
  • Freely accessible massive datasets have accelerated machine learning research in healthcare, with many accepted papers using MIMIC data.
  • Validity and reproducibility are of immediate concern in this growing field. Researchers who tried to reproduce the cohorts from 38 experiments using MIMIC data found that, in half of the experiments, the listed and reproduced cohort sizes differed by more than 25%.


The Gumbel Trick

August 17, 2017 · 6 min read

No, not gumballs

Until I read the recent paper at ICML 2017, I hadn’t heard of the Gumbel trick. There is surprisingly little online about the Gumbel trick, which is related to the more popular Gumbel-max trick, so here we go.

We often want to characterize probabilistic models in discrete situations. The Gumbel trick allows us to estimate the associated partition function $Z$ with relative ease. At a high level, finding $Z$ or even $\ln Z$ is very difficult; however, we can add some noise and compute the maximum a posteriori (MAP) estimate more easily through approximation methods. If we repeat this process enough times, we get a reliable estimate of $Z$.
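
As a rough illustration of the idea above, here is a minimal sketch in Python with NumPy for a toy model small enough to enumerate. The potentials `phi`, the function name `gumbel_log_partition`, and the sample count are assumptions made for this example, and the brute-force maximum stands in for the approximate MAP solver a real model would need. The sketch relies on a standard fact: the maximum of potentials perturbed by independent standard Gumbel noise is itself Gumbel-distributed with location $\ln Z$, so its mean is $\ln Z$ plus the Euler-Mascheroni constant.

```python
import numpy as np


def gumbel_log_partition(phi, num_samples=10000, seed=0):
    """Estimate ln Z for unnormalized log-potentials phi via Gumbel perturbations.

    Hypothetical helper for illustration only; it enumerates every state,
    which a realistic model could not afford to do.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    # One independent standard Gumbel perturbation per state, per repetition.
    noise = rng.gumbel(loc=0.0, scale=1.0, size=(num_samples, phi.size))
    # "MAP" of each perturbed model; brute force here, approximate in practice.
    perturbed_max = (phi + noise).max(axis=1)
    # The max is Gumbel(ln Z, 1), whose mean is ln Z + Euler-Mascheroni constant.
    return perturbed_max.mean() - np.euler_gamma


# Toy log-potentials (assumed values) over four discrete states.
phi = np.array([1.0, 2.0, 0.5, -1.0])
exact = np.log(np.exp(phi).sum())      # exact ln Z by enumeration
estimate = gumbel_log_partition(phi)   # Monte Carlo estimate of ln Z
print(f"exact ln Z = {exact:.4f}, Gumbel estimate = {estimate:.4f}")
```

On a model this small the exact value is available by enumeration, so the two numbers can be compared directly. The trick only pays off when enumeration is infeasible but the perturbed MAP problem can still be solved or approximated.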

In complexity theory, we know that finding the MAP is NP-hard, but it can be approximated quickly in practice. Note that computing the partition function is harder still, since it includes #P-hard problems.


How to make this blog

July 28, 2017 · 10 min read

Behold, a blog.

For a long time, I thought creating a personal website was too intimidating, too cumbersome, or too much trouble. When I read other people’s blogs, the authors never mentioned how they made the blog, which made me think it was effortless for them.

No more! Certainly every blog is different, but here’s how I made this one.

If I can do it, you (yes you!) can make a blog too.
