Docker images are huge
Hard to iterate locally
CI is slow and flaky
Docker may be the only sane way to get reproducible environments, because of dependency hell
- The ecosystem moves fast and libraries break compatibility often, so managing transitive dependencies takes a lot of effort
Hard to get decent performance on a low budget, because it takes prior knowledge of CPUs/GPUs on top of ML engineering
Dependencies do not support enough hardware architectures
- ARM vs Intel
Versioning data is difficult and slow because of its size
Hard to spot errors because ML workloads are non-deterministic
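
One common mitigation for the non-determinism point is to pin random seeds, so that two runs can be compared directly. The sketch below is only an illustration: `noisy_training_run` is a hypothetical stand-in for a real training job, and it uses Python's standard `random` module rather than any ML framework. Seeding removes only one source of non-determinism; parallel GPU kernels and data-loading order can still vary between runs.

```python
import random


def noisy_training_run(seed=None):
    """Hypothetical stand-in for a training run.

    Returns a fake "loss" that depends only on random initialization.
    """
    # A dedicated generator keeps the run independent of global random state.
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    return sum(w * w for w in weights)


# With the same seed, two runs give exactly the same result,
# so any difference between them would point to a real bug.
run_a = noisy_training_run(seed=42)
run_b = noisy_training_run(seed=42)
print(run_a == run_b)

# Without a seed, each run starts from different random weights.
print(noisy_training_run(), noisy_training_run())
```
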
Using the same model for training and inference is costly, but quantizing it for inference brings significant accuracy loss
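
To make the quantization point concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The function names and the sample weights are made up for illustration, and real frameworks use more elaborate schemes (per-channel scales, calibration data), so treat this as a toy model of where the accuracy loss comes from, not as how any particular library does it.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate floats."""
    return [q * scale for q in quantized]


# Illustrative weights: one large value forces a coarse scale for all of them.
weights = [0.8123, -0.0041, 0.3307, -1.2700, 0.0009]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print(q)            # integer codes stored instead of the floats
print(max(errors))  # worst-case round-trip error, bounded by about scale / 2
```

In this example the two smallest weights are rounded to 0, because the single large weight sets the scale for the whole tensor. Small values losing all their information is one way quantization error turns into accuracy loss.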