Postdoctoral Researcher

Alireza Olama

Information Technology Department
Faculty of Science and Engineering
Åbo Akademi University, Vaasa, Finland


Research Focus

  • Machine Learning Systems
  • Distributed Training
  • Sparsity & Pruning
  • GPU/HPC Computing
  • Energy-Efficient AI
  • Federated Learning

News & Updates

Recent activities, publications, and announcements.

Feb 2025

Paper submitted to ACM ICS 2026

PruneX: A Hierarchical Communication-Efficient System for Distributed CNN Training with Structured Pruning

Feb 2025

Paper submitted to ICLR 2026

Federated Learning With ℓ0 Constraint Via Probabilistic Gates For Sparsity

Spring 2025

Teaching: Parallel Programming & GPU Programming

New courses at Åbo Akademi University with hands-on access to LUMI, Puhti, and Mahti supercomputers

Dec 2023

Joined Åbo Akademi University

Started as Postdoctoral Researcher in Information Technology Department

About Me

Postdoctoral researcher focused on machine learning systems (MLSys) for energy-efficient, large-scale AI. My work addresses communication, memory, and energy bottlenecks in distributed training and inference of foundation models.

Biography

I design and implement scalable ML systems on HPC platforms, bridging algorithmic sparsity with distributed execution models. My research vision is that performance optimization is energy optimization: reducing communication, memory traffic, and redundant computation translates directly into sustainability gains for AI infrastructure.

Education

  • Ph.D. in Automation & Systems Engineering
    Federal University of Santa Catarina (UFSC), Brazil, 2023
  • M.Sc. in Electrical & Computer Engineering
    Shiraz University of Technology (SUTECH), Iran, 2017

Research Interests

  • Energy-efficient distributed training
  • Sparsity-aware optimization & structured pruning
  • Hardware–software co-design for inference
  • Communication-efficient parallelism
  • Federated & decentralized learning
  • GPU programming & HPC systems

Technical Skills

Deep expertise in high-performance computing, distributed systems, and ML infrastructure for building scalable, energy-efficient AI systems.

Programming Languages

  • Python: Proficient
  • C/C++: Proficient
  • CUDA: Proficient
  • HIP: Basic
  • Bash: Proficient

AI & Machine Learning

  • PyTorch: Proficient
  • Distributed PyTorch (DDP/FSDP): Proficient
  • Mixed-Precision Training (AMP): Proficient

Distributed Computing & HPC

  • Slurm: Proficient
  • Multi-Node, Multi-GPU Training: Proficient
  • NCCL, MPI, OpenMP: Proficient

AI Inference & Performance

  • ONNX & TensorRT: Basic
  • Pruning (Structured, N:M): Proficient
  • Quantization: Proficient
  • TVM & ML Compilers: Basic

GPU Kernel Optimization

  • Memory Hierarchy Optimization: Proficient
  • Nsight Systems / Nsight Compute: Proficient
  • Roofline & Bottleneck Analysis: Proficient
  • Custom CUDA Kernels (PyTorch C++): Proficient

Software Engineering

  • Git / GitHub: Proficient
  • Docker: Basic
  • CI/CD: Basic
  • Linux & HPC Ecosystems: Proficient

Research Timeline

2015–2017

M.Sc.

Shiraz Univ. of Tech., Iran

2019–2023

Ph.D.

UFSC, Brazil

2021–2022

Visiting Researcher

Univ. of Bologna, Italy

2023–Present

Postdoc

Åbo Akademi, Finland

Research Interests

My research focuses on making AI systems faster, more efficient, and scalable, from training foundation models to deploying them in production.


Efficient Training of Foundation Models

Scalable training strategies for LLMs and vision transformers using parallelism techniques:

  • Data Parallelism: Distributed gradient computation across devices
  • Tensor Parallelism: Splitting model layers across GPUs
  • Pipeline Parallelism: Micro-batch pipelining for deep models
  • Mixed Precision: FP16/BF16 training with loss scaling
  • ZeRO Optimization: Memory-efficient optimizer states
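The data-parallel pattern in the list above can be sketched in a few lines: each worker computes a gradient on its own data shard, the gradients are averaged (the all-reduce step), and every worker applies the same update. This is a minimal framework-free simulation; the toy loss, data, and worker count are illustrative assumptions, not part of any real training stack.

```python
# Simulate data parallelism: each worker computes a gradient on its own
# shard, then gradients are all-reduced (averaged) before a shared update.

def local_gradient(w, shard):
    # Gradient of the mean squared error 0.5*(w*x - y)^2 over one shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.1):
    grads = [local_gradient(w, s) for s in shards]  # runs in parallel in practice
    avg = sum(grads) / len(grads)                   # all-reduce: average gradients
    return w - lr * avg                             # identical update on all workers

# Two workers, each holding a shard of y = 2x data.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w = 0.0
for _ in range(100):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # converges toward 2.0
```

Because every worker sees the same averaged gradient, all replicas stay bit-identical, which is exactly what synchronous data parallelism (e.g. DDP) guarantees at scale.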

Efficient Inference

Reducing computational and memory costs for model deployment:

  • Sparsity: Structured/unstructured pruning, N:M patterns
  • Quantization: INT8/INT4 weights, activation quantization
  • Knowledge Distillation: Teacher-student model compression
  • Sparse Tensor Cores: Hardware-accelerated sparse ops
  • Dynamic Inference: Early exit, adaptive computation
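The N:M pattern mentioned above is easy to illustrate for the common 2:4 case: every contiguous group of four weights keeps only its two largest-magnitude entries, which is the layout sparse tensor cores accelerate. A pure-Python sketch (magnitude-based selection is one standard criterion, used here for illustration):

```python
def prune_2_of_4(weights):
    """Zero out the 2 smallest-magnitude weights in every group of 4 (2:4 sparsity)."""
    assert len(weights) % 4 == 0
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned

w = [0.1, -0.9, 0.05, 0.4, -0.3, 0.2, 0.8, -0.01]
print(prune_2_of_4(w))  # [0.0, -0.9, 0.0, 0.4, -0.3, 0.0, 0.8, 0.0]
```

The fixed 50% sparsity per group is what makes the pattern hardware-friendly: the kernel knows exactly how many nonzeros to expect in each block.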

Performance Optimization of AI Systems

System-level optimizations for maximum hardware utilization:

  • Communication Efficiency: Gradient compression, overlap
  • Kernel Fusion: Reducing memory bandwidth bottlenecks
  • Memory Optimization: Activation checkpointing, offloading
  • Hardware Profiling: CUDA profiling, roofline analysis
  • Compiler Optimization: XLA, TorchInductor, Triton
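The roofline analysis listed above reduces to one comparison: a kernel's arithmetic intensity (FLOPs per byte moved) against the machine balance. A toy calculation, with hardware numbers that are rough A100-class illustrations, not a profile of any specific device:

```python
def attainable_gflops(flops, bytes_moved, peak_gflops, mem_bw_gbs):
    """Roofline model: attainable performance is the lesser of peak compute
    and memory bandwidth times arithmetic intensity."""
    intensity = flops / bytes_moved          # FLOPs per byte
    return min(peak_gflops, mem_bw_gbs * intensity)

# Illustrative A100-like numbers: ~19500 GFLOP/s FP32, ~1555 GB/s HBM.
PEAK, BW = 19500.0, 1555.0

# Elementwise add: 1 FLOP per 12 bytes (two fp32 loads + one store) -> memory-bound.
add = attainable_gflops(1, 12, PEAK, BW)
# Well-tiled matmul with heavy reuse, say 64 FLOPs per byte -> compute-bound.
mm = attainable_gflops(64, 1, PEAK, BW)

print(round(add, 1), round(mm, 1))
```

The gap between the two cases is why kernel fusion pays off: fusing elementwise ops into a producer kernel raises arithmetic intensity and moves the workload up the memory-bound slope of the roofline.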

Distributed Optimization

Algorithms for decentralized and federated machine learning:

  • ADMM: Alternating direction method of multipliers
  • Distributed SGD: Synchronous/asynchronous variants
  • Consensus Optimization: Decentralized averaging methods
  • Federated Learning: Privacy-preserving distributed training
  • Sparse Optimization: L0/L1 regularized distributed problems
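The consensus-averaging idea above can be sketched as repeated mixing with a doubly stochastic matrix: on a ring, each agent averages its value with its two neighbors, and all values converge to the network mean without any central coordinator. The ring topology and uniform 1/3 weights are illustrative choices.

```python
def consensus_step(x, weight=1.0 / 3.0):
    """One round of decentralized averaging on a ring: each agent mixes its
    value with both neighbors using equal (doubly stochastic) weights."""
    n = len(x)
    return [weight * (x[(i - 1) % n] + x[i] + x[(i + 1) % n]) for i in range(n)]

x = [0.0, 4.0, 8.0, 12.0]        # initial local values, network mean = 6.0
for _ in range(50):
    x = consensus_step(x)
print([round(v, 3) for v in x])  # all entries approach 6.0
```

Because the mixing weights are doubly stochastic, the mean is preserved at every round, and the deviation from the mean contracts geometrically, which is the basic convergence argument behind decentralized averaging methods.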

Selected Publications

15 publications spanning machine learning systems, distributed optimization, and control theory.

[ML] Distributed ℓ₀ Sparse Aggregative Optimization
Olama, A., Carnevale, G., Notarstefano, G. & Camponogara, E.
IEEE CASE, 2024

[ML] A Tracking Augmented Lagrangian Method for ℓ₀ Sparse Consensus Optimization
Olama, A., Carnevale, G., Notarstefano, G. & Camponogara, E.
CoDIT, 2023

[OPT] Sparse Convex Optimization Toolkit: A Mixed-Integer Framework
Olama, A., Camponogara, E. & Kronqvist, J.
Optimization Methods and Software, 38(6), 2023

[OPT] A Distributed Primal Outer Approximation Algorithm for Sparse Convex Programming (DiPOA)
Olama, A., Camponogara, E. & Mendes, P.R.C.
Journal of Global Optimization, 86(3), 2023

[OPT] Relaxed Hybrid Consensus ADMM for Distributed Convex Optimisation with Coupling Constraints
Olama, A., Bastianello, N., Mendes, P.R.C. & Camponogara, E.
IET Control Theory & Applications, 2019

[CTRL] Lyapunov-based Hybrid Model Predictive Control for Energy Management of Microgrids
Olama, A., Mendes, P.R.C. & Camacho, E.F.
IET Generation, Transmission & Distribution, 2018

Talks & Presentations

Conference presentations, invited talks, and workshop lectures.

Workshop

HPC Workshop on Mahti Supercomputer

CSC Finland · Spring 2025

Hands-on parallel programming and performance optimization techniques for graduate students and researchers.

Conference

PruneX: Hierarchical Communication-Efficient Distributed Training

ACM ICS 2026 (Submitted)

Structured pruning co-designed with cluster topology for scalable multi-node GPU training.

Conference

Distributed ℓ₀ Sparse Aggregative Optimization

IEEE CASE 2024

Sparse consensus optimization in multi-agent networked systems.

Conference

Tracking Augmented Lagrangian Method for ℓ₀ Sparse Consensus

CoDIT 2023, Rome, Italy

Novel tracking-based approach for cardinality-constrained distributed optimization.

Lecture Series

GPU Programming for AI Systems

Åbo Akademi University · 2025+

Graduate course covering CUDA programming, memory optimization, and kernel development.

Teaching

Responsible for designing, modernizing, and delivering courses at Åbo Akademi University, with integration of national CSC HPC infrastructures (Puhti, Mahti, LUMI).

Åbo Akademi Courses

  • Database Systems (Bachelor) · Winter 2024
  • Parallel Programming (Master) · Spring 2025+
    Focus on data-center computing and AI
  • GPU Programming (Master) · Autumn 2025+
    CUDA, performance optimization

Workshops & Resources

  • HPC Workshop on Mahti · Spring 2025
    Parallel programming and performance optimization
  • HPC4AI YouTube Channel
    Recorded lectures and course materials
  • Hands-on HPC Labs
    Using Puhti, Mahti, LUMI supercomputers

Open-Source Software

Research software for distributed machine learning, sparse optimization, and HPC.

Research Impact

Citation metrics and academic footprint.

  • Publications: 15 (full list on Google Scholar)
  • h-index: 5 (as of 2026)
  • Students supervised: BSc, MSc, and PhD theses

Last updated: March 2026 · View Google Scholar Profile

Get in Touch

Open to collaboration on energy-efficient ML systems, distributed training, and HPC research.

Location

Vaasa, Finland

Affiliation

Åbo Akademi University
Information Technology Department