Ongoing (practical) problems of ML development

  • Docker images are huge (often several GB, largely due to CUDA and ML framework libraries)

    • hard to iterate locally

    • CI is slow and flaky

  • Docker may be the only sane approach to reproducibility because of dependency hell

    • the ecosystem moves fast and libraries frequently break compatibility, so managing transitive dependencies takes a huge amount of effort

  • Hard to get decent performance on a low budget, because it requires knowledge of CPU/GPU hardware in addition to ML engineering

  • Dependencies lack support for some CPU architectures

    • ARM (e.g. Apple Silicon) vs x86 (Intel)

  • Versioning data is difficult and slow because datasets are large

  • Hard to spot errors because training is non-deterministic (random initialization, data shuffling, non-deterministic GPU kernels)

  • Using the same model for training and inference is costly, yet quantizing it for inference can cause significant accuracy loss
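The quantization point can be made concrete: accuracy loss comes from rounding weights onto a coarse integer grid. Below is a minimal sketch, assuming symmetric per-tensor int8 quantization (one common scheme, not the only one); `quantize_int8` and `dequantize_int8` are hypothetical helper names, not a library API.

```python
# Sketch: symmetric per-tensor int8 quantization and its round-trip error.
# Assumption: a single scale for the whole tensor, codes clamped to [-127, 127].


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes using one symmetric scale (hypothetical helper)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from int8 codes (hypothetical helper)."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    # One large outlier forces a coarse scale, so the small weights lose precision.
    weights = [0.001, -0.002, 0.0035, 0.5, -10.0]
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    for w, r in zip(weights, restored):
        print(f"{w:>8.4f} -> {r:>8.4f} (error {abs(w - r):.4f})")
```

With a single outlier of magnitude 10, every weight below roughly 0.04 rounds to zero; per-channel scales or calibration are common ways to reduce this effect.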
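For the point about non-determinism, one partial mitigation is to seed every random source so that a failing run can be replayed. A minimal sketch using only Python's standard `random` module; `shuffled_batches` is a hypothetical stand-in for a training loop's data order. Real setups also need framework-specific seeds (e.g. NumPy, PyTorch), and some GPU kernels remain non-deterministic even when seeded.

```python
import random


def shuffled_batches(data: list[int], batch_size: int, seed: int) -> list[list[int]]:
    """Shuffle data with a private, seeded RNG and split it into batches."""
    rng = random.Random(seed)  # local RNG, unaffected by other code using `random`
    order = data[:]  # copy so the caller's list is left untouched
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


if __name__ == "__main__":
    data = list(range(10))
    # Same seed -> identical batch order, so a problematic batch can be replayed.
    print(shuffled_batches(data, 3, seed=42) == shuffled_batches(data, 3, seed=42))
```

Using a dedicated `random.Random` instance rather than the global generator keeps the data order stable even if other code draws random numbers in between.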