R25 VOICE Section 3 - Datasets



Papers discussed in this Section 3 podcast:

  • Liao, Fangzhou; Liang, Ming; Li, Zhe; Hu, Xiaolin; and Song, Sen. Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network. arXiv:1711.08324, 2017.
  • Pollard, T. J., and Johnson, A. E. W. The MIMIC-III Clinical Database. http://dx.doi.org/10.13026/C2XW26, 2016.
  • Pranav Rajpurkar, Jeremy Irvin, Aarti Bagul, Daisy Ding, Tony Duan, Hershel Mehta, Brandon Yang, Kaylie Zhu, Dillon Laird, Robyn L. Ball, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, and Andrew Ng. MURA Dataset: Towards Radiologist-Level Abnormality Detection in Musculoskeletal Radiographs. arXiv:1712.06957, 2017.
  • X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE CVPR (spotlight); arXiv:1705.02315, 2017.

Podcast Contents:

  • Why are datasets important?
  • Kinds of datasets
  • What's a gold standard?
  • Best practices in dataset descriptions
    • Sample distribution
    • Metadata
      • Patients
      • Radiologists
      • PACS Systems Used for Annotation
      • Images
  • Strategies for Labeling Data
    • Natural Language Processing
    • Amazon Mechanical Turk
    • Natural Language Processing Validation Sets
