ML Quantization Engineer

Permanent employee, Full-time · Dresden

About the Role
We’re SEMRON, a venture-backed startup focused on redefining AI hardware for edge devices. If you’re deep into quantization and enjoy working at the intersection of machine learning and hardware, we’d like to hear from you. In this role, you will be responsible for building a highly scalable inference framework for our future chip generations. You will take part in fundamental architectural decisions and have the opportunity to contribute to upstream open-source projects.
What you will do:
  • Develop and maintain an inference framework that’s tightly tuned for SEMRON hardware.
  • Collaborate directly with ML, compiler, and hardware teams to refine and adapt quantization algorithms for our specific needs.
  • Apply and innovate on the latest quantization methods like AdaRound, BRECQ, GPTQ, and QuaRot, bringing fresh ideas to SEMRON’s approach.
What you should bring:
  • Solid skills in PyTorch and experience with torch.fx, plus the know-how to write efficient custom CUDA kernels.
  • A solid understanding of current quantization research and hands-on experience with techniques that push performance.
Helpful but not required:
  • Experience with state-of-the-art neural network compression methods such as AdaRound, QDrop, QuIP, or GPTQ.
  • Experience with common ML tooling such as Hugging Face Transformers or DeepSpeed.