ML Quantization Engineer

Permanent employee, Full-time · Dresden

About the Role
In this role, you will be responsible for building a highly scalable inference framework for our future chip generations. You will participate in fundamental architectural decisions and have the opportunity to contribute to upstream open-source projects.
What you will do:
  • Research, develop, and maintain the machine learning inference framework used for SEMRON architectures.
  • Collaborate with machine learning, compiler, and hardware engineers and researchers to quantize and optimize neural networks for SEMRON’s hardware.
What you should bring:
  • Proficiency with machine learning frameworks such as TensorFlow and PyTorch in Linux environments, including the ability to extend those frameworks with custom C/C++/CUDA code
  • A deep understanding of machine learning algorithms and modeling techniques, including but not limited to semi-supervised or weakly supervised learning, transfer learning, quantization, optimization, and large language models
  • Knowledge of various operations methodologies, including MLOps, ModelOps, and LMOps
Helpful but not required:
  • Experience with state-of-the-art neural network compression methods such as AdaRound, QDrop, QuIP, or GPTQ
  • Experience with tools commonly used in ML environments, such as Hugging Face’s Transformers or DeepSpeed
Your application!
We appreciate your interest in Demo GmbH. Please fill in the following short form. Should you have any difficulties in uploading your files, please contact us by mail at demodaten@demo.de.