<ul>
<li>Docker images are huge
<ul>
<li>hard to iterate locally
</li>
<li>CI is slow and flaky
</li>
</ul>
</li>
<li>Docker may be the only sane approach to reproducibility because of dependency hell
<ul>
<li>development moves fast and libraries break compatibility frequently, so managing transitive dependencies is a hell of an effort</li>
</ul>
</li>
<li>Hard to get started with decent performance on a low budget, because it requires prior knowledge of CPUs/GPUs in addition to ML engineering
</li>
<li>Not enough architecture support from dependencies
<ul>
<li>ARM vs Intel</li>
</ul>
</li>
<li>Versioning of data is difficult and slow due to its size
</li>
<li>Hard to spot errors due to the non-deterministic nature of ML workloads
</li>
<li>Using the same model for training and inference is costly, and quantizing it for inference can cause significant accuracy loss
</li>
</ul>
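
To make the quantization point above a bit more concrete, here is a minimal sketch in plain Python. It is not a real model: it squeezes a list of random "weights" into int8 and back, and reports the rounding error that this introduces. The helper names `quantize_int8` and `dequantize` are hypothetical, and symmetric per-tensor scaling is just one assumed scheme among several. The sketch only shows where the loss comes from; it does not measure model accuracy.

```python
import random


def quantize_int8(values):
    """Symmetric per-tensor quantization (assumed scheme): floats -> ints in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole list; fall back to 1.0 if every value is zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to floats using the same scale."""
    return [q * scale for q in quantized]


# A fixed seed keeps the run reproducible, which also helps with the
# non-determinism problem mentioned in the list.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each weight can be off by up to half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.6f} max round-trip error={max_error:.6f}")
```

A single large outlier weight stretches `scale` and makes every other weight coarser, which is one reason naive quantization can hurt more than expected.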

Ongoing (practical) problems of ML development

I am a passionate developer, interested in scalability, distributed systems and databases and I love reading and running. 🎨⚡️


Daily Wandering ⚡️
