30 things I learned at MLHC 2017
This past weekend, I learned a lot about machine learning and healthcare at MLHC 2017. Here are 30 thoughts and findings that I could squeeze into bullet points.
Please excuse (and contact me about) any errors in summarization or attribution.
What’s special about healthcare?
- Data are plentiful but hard to access. Beth Israel Deaconess Medical Center handles 7 petabytes of patient data, yet many of the papers presented work with datasets of thousands or even dozens of patients, either because data are hard to obtain or because the disease studied is rare.
- FDA approval is hard but important. The initial process is arduous; afterwards, minor updates (e.g. retraining a deep learning model) only require notification, while new models need reapproval. One way to convince the FDA is to show that a model’s accuracy falls within the variance of human experts.
- Freely accessible massive datasets have accelerated machine learning research in healthcare, with many accepted papers using MIMIC data.
- Validity and reproducibility are of immediate concern in this growing field. Researchers who tried to reproduce the datasets from 38 experiments using MIMIC data found that, in half of the experiments, the listed and reproduced cohort sizes differed by more than 25%.
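To make the 25% figure in the last bullet concrete, here is a minimal sketch of such a check. It assumes the difference is measured relative to the listed cohort size, which the talk summary does not specify, and the cohort sizes and experiment names are made up for illustration.

```python
def relative_difference(listed: int, reproduced: int) -> float:
    """Absolute difference as a fraction of the listed cohort size."""
    if listed <= 0:
        raise ValueError("listed cohort size must be positive")
    return abs(reproduced - listed) / listed


def flag_discrepancies(cohorts, threshold=0.25):
    """Return the experiments whose two cohort sizes differ by more than threshold."""
    return [
        name
        for name, (listed, reproduced) in cohorts.items()
        if relative_difference(listed, reproduced) > threshold
    ]


# Made-up (listed, reproduced) cohort sizes, not taken from any paper.
example_cohorts = {
    "experiment_a": (1000, 990),   # 1% difference
    "experiment_b": (2000, 1400),  # 30% difference
    "experiment_c": (500, 640),    # 28% difference
}

flagged = flag_discrepancies(example_cohorts)
print(flagged)
```

Measuring against the reproduced size instead would change which experiments cross the threshold, so the choice of denominator is worth stating whenever a number like this is reported.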
Interpretability is important
- “Rationales” are short, coherent, and predictive phrases that can explain a model’s predictions on beer reviews and pathology reports alike. Although beer reviews make for great annotated data, healthcare professionals care especially about understanding the why behind black-box methods applied to pathology reports.
- You can take interpretability even further and argue that interpretable models “should fit on a PowerPoint slide and can be calculated without a calculator.” Extending the Rashomon Effect, multiple models can have similar performance but different levels of interpretability. Why not find a model with both high performance and high interpretability?
Healthcare includes hospital operations
- Monitoring hand hygiene with depth sensors can help prevent hospital-acquired infections, which affect 1 in 25 patients. Using anonymized depth images, people were assessed on whether they correctly followed hand hygiene protocol before entering and after leaving a patient room.
- Predicting surgery duration more accurately helps increase hospital efficiency and balance costs: extra downtime is not as bad as a scheduling collision.
- Even seemingly simple machine learning methods can greatly improve hospital processes, like using OCR on incoming consent forms to direct them into the right patient file.
- An hour of untreated stroke ages a brain 3.6 years, which heightens the importance of quickly treating patients who have a stroke while in the hospital.
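The point about surgery scheduling above, that extra downtime is cheaper than a collision, can be sketched as an asymmetric cost. This is only one way to encode the idea, not necessarily what was presented: the 3-to-1 weighting, the function names, and the durations are all assumptions made for illustration.

```python
def scheduling_cost(scheduled, actual, idle_weight=1.0, overrun_weight=3.0):
    """Weighted minutes of error: overruns (collisions) cost more than idle time."""
    if actual > scheduled:
        return overrun_weight * (actual - scheduled)
    return idle_weight * (scheduled - actual)


def best_slot(candidates, past_durations, **weights):
    """Pick the candidate slot length with the lowest average cost on past cases."""
    def mean_cost(slot):
        total = sum(scheduling_cost(slot, d, **weights) for d in past_durations)
        return total / len(past_durations)

    return min(candidates, key=mean_cost)


# Made-up historical durations and candidate slot lengths, in minutes.
past_durations = [70, 90, 100, 120, 160]
candidates = [90, 100, 120, 160]

symmetric = best_slot(candidates, past_durations, idle_weight=1.0, overrun_weight=1.0)
asymmetric = best_slot(candidates, past_durations, idle_weight=1.0, overrun_weight=3.0)
print(symmetric, asymmetric)
```

With equal weights the best slot sits at the median of the past durations; once overruns are weighted more heavily, the best slot moves to a longer one, trading some downtime for fewer collisions.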
Reinforcement learning making moves
- In mobile health applications, one has to balance timely interventions against annoying notifications. A reinforcement learning (RL) approach can adapt to a user’s preferences and recommend activity suggestions and motivational messages. Among many statistical subtleties, RL must weigh the negative immediate effects of a treatment against its potentially large delayed benefits. One way to handle this is to reduce the posterior mean for treatment using proportional feedback control.
- Due to logistics, approval, and ethics, it can be difficult to conduct controlled trials. Instead, it is possible to learn treatment policies over continuous state spaces from observational data, which stand in for the repeated experiments of RL.
- In a randomized trial, patients receiving ranolazine showed no difference in death rates; a dedicated treatment strategy for high-risk patients, however, showed a tremendous difference in results. RL has shown success in constructing the optimal treatment policy.
Computer vision victories
- Video recordings, analyzed with neural network-based pose estimation, can accelerate and standardize the diagnosis of movement disorders such as Parkinson’s disease and ataxia.
- Another useful application of pose estimation is assessing surgeons’ technical skill using Hourglass Networks, with workers from Mechanical Turk providing annotated data through ensemble voting.
- Segmenting heart sonograms using a CNN allows cardiologists to save time and mental stamina.
- Social media post history embeddings can capture attributes of mental health and homophilic relations between users.
- A viral blog post and aggressive search engine optimization allowed one researcher to find other patients with the same genetic disease as his infant son.
- Crowdsourcing annotations of scientific articles can help create a knowledge graph of gene mutations and therapies. Approximately 6 laypeople together can annotate as well as 1 expert.
- It is especially crucial to use psychology in convincing clinicians to adopt systems: repeated positive feedback helps ease resistance to new systems or increased logging mechanisms.
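Two of the bullets above rely on combining labels from several non-expert annotators. A minimal sketch of the simplest aggregation rule, majority voting, follows; the actual voting schemes used in the papers were not described, and the labels and item names here are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label, or None when the top labels tie."""
    if not labels:
        return None
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear winner; a real pipeline might defer to an expert
    return counts[0][0]


# Hypothetical annotations: item id -> labels from six lay annotators.
crowd_labels = {
    "sentence_1": ["mutation", "mutation", "therapy", "mutation", "mutation", "mutation"],
    "sentence_2": ["therapy", "mutation", "therapy", "therapy", "mutation", "therapy"],
    "sentence_3": ["mutation", "therapy", "mutation", "therapy", "mutation", "therapy"],
}

consensus = {item: majority_vote(votes) for item, votes in crowd_labels.items()}
print(consensus)
```

Returning None on a tie keeps ambiguous items visible rather than silently picking a label, which matters when the consensus is later used as training data.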
Miscellaneous ML methods
- To deal with partially missing data labels, only backpropagate the loss from the subset of pieces that are annotated.
- Heteroscedasticity refers to the phenomenon in which the variability of a variable is unequal across the range of values of a second variable that predicts it. Surgery durations will never go negative, so we would not expect the errors to follow something like a Gaussian. Using a multilayer perceptron, we can estimate both the mean and the variance of each prediction.
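Both bullets above come down to small changes in the loss function, sketched here in plain Python. The first function computes a squared-error loss and its gradient over labeled entries only. The second is a Gaussian negative log-likelihood with a per-example variance, applied to log-durations so that predicted durations stay positive. That choice of likelihood is an assumption for illustration, not necessarily the one used in the work presented, and all values are made up.

```python
import math


def masked_mse(predictions, targets):
    """Mean squared error over annotated entries only.

    A target of None marks a missing label. Returns the loss and its
    gradient with respect to each prediction; unlabeled entries get a
    gradient of exactly zero, so nothing is backpropagated from them.
    """
    labeled = [i for i, t in enumerate(targets) if t is not None]
    grads = [0.0] * len(predictions)
    if not labeled:
        return 0.0, grads
    n = len(labeled)
    loss = sum((predictions[i] - targets[i]) ** 2 for i in labeled) / n
    for i in labeled:
        grads[i] = 2.0 * (predictions[i] - targets[i]) / n
    return loss, grads


def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var)).

    Working with log_var instead of the variance keeps the variance positive.
    """
    squared_error = (y - mean) ** 2
    return 0.5 * (math.log(2.0 * math.pi) + log_var + squared_error / math.exp(log_var))


# Partially labeled example: the middle target is missing.
loss, grads = masked_mse([1.0, 5.0, 3.0], [2.0, None, 3.0])

# Same prediction error, two different predicted variances.
y = math.log(120.0)         # observed log-duration of a 120-minute surgery
pred_mean = math.log(90.0)  # hypothetical model output for the mean
confident = gaussian_nll(y, pred_mean, log_var=math.log(0.01))
uncertain = gaussian_nll(y, pred_mean, log_var=math.log(0.1))
print(loss, grads, confident, uncertain)
```

For the same miss, the prediction that claimed a small variance is penalized more than the one that admitted uncertainty, which is what pushes a model trained this way to output a larger variance on cases it finds hard.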
Odds and ends
- Special thanks to Andreas Stuhlmuller for inspiring this piece with his summary of NIPS 2016.
- AWS got a hilarious number of shoutouts for its ease of use and HIPAA-compliant security.
- As with most conferences, MLHC had an active Twitter presence, which is how I met Zachary Lipton.
- Flour Bakery continues to have delicious sandwiches and scrumptious cookies.
- I’ll be curious to see where MLHC goes from here. I loved the format of every paper getting screen time, with big-name speakers anchoring each half-day. The conference has grown from 11 people several years ago to 270 attendees, with the audience mix skewing towards machine learning researchers. Onwards!