22-24 May 2024 Toulouse (France)

Talks & slides

 

Slides:
https://cloud.irsamc.ups-tlse.fr/index.php/s/rPPxaLgNqfknaei

Please upload your slides.

 

Paolo Bientinesi: Overview of the field

Overview of the field (existing software).
 

Edward Valeev: TiledArray: generic framework for efficient data-sparse tensor algebra on distributed-memory heterogeneous platforms

TiledArray (TA) is a framework for tensor computation. It has been used primarily for quantum many-body simulation on commodity as well as high-end machines. Although developed for the needs of quantum many-body simulation, it is a largely domain-agnostic and reusable tool whose design was guided by several core principles:

1. Any successful tool must be as efficient as possible for dense tensor computation, but must prioritize data-sparse tensors. Hence TA uses block-sparse tensors as the core model.

2. Dynamic sparsity (e.g., due to decay of operator matrix elements) is more common than static sparsity (e.g., due to point-group symmetry). Hence dynamic sparsity is the core model in TA; static sparsity is its subset.

3. Block sparsity is only one of many types of data sparsity; others (e.g., compression of blocks a la PNOs) must be supported via deep customizability, by accepting user-defined tile types and compute models.

4. No domain-specific concepts (e.g., spin) can be hardwired into the framework, but domain-specific extensions must be possible.

5. Expression-based programming (e.g., einsum) is not always sufficient; it must be possible to compose general _algorithms_ over mutable data, hence TA provides byte-level access to the data and tile-level composability.

6. The memory space of a single node or GPU is not enough; distributed memory must be supported from the outset.

In this talk I will discuss how these principles are reflected in the design and implementation of TiledArray and what they mean for standardization efforts.
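For illustration, a minimal sketch of the expression-level (einsum-style) interface described above, written in the style of TiledArray's documented C++ API; exact headers, types, and tiling setup may differ between versions:

    #include <tiledarray.h>

    int main(int argc, char** argv) {
      // Distributed-memory runtime: every array below is partitioned
      // into tiles that are spread over the MPI ranks.
      TA::World& world = TA::initialize(argc, argv);

      // 200 x 200 arrays, each split into 2 x 2 = 4 tiles of 100 x 100.
      TA::TiledRange trange({{0, 100, 200}, {0, 100, 200}});
      TA::TArrayD A(world, trange), B(world, trange), C;
      A.fill(1.0);
      B.fill(2.0);

      // Expression-based contraction; tile-level tasks are scheduled
      // and communicated automatically by the runtime.
      C("i,j") = A("i,k") * B("k,j");

      world.gop.fence();
      TA::finalize();
      return 0;
    }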
 

Miles Stoudenmire: The ITensor Tensor Network Software

ITensor is a library for rapidly prototyping tensor network algorithms, designed to reduce bugs while maintaining performance. Its unique interface hides index ordering, inspired by tensor diagrams. After reviewing its design and domain, I will discuss challenges and opportunities where we could collaborate with the broader computing community.
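A minimal sketch, in the style of ITensor's v3 C++ interface, of how named indices hide index ordering: the * operator contracts whatever indices two tensors share, so no axis numbering or transposition is ever written (exact API details may vary by version):

    #include "itensor/all.h"
    using namespace itensor;

    int main() {
      auto i = Index(2, "i");
      auto j = Index(3, "j");
      auto k = Index(4, "k");

      auto A = randomITensor(i, k);
      auto B = randomITensor(k, j);

      // '*' contracts over the shared index k, as in a tensor diagram;
      // the storage order of i, j, k never appears in user code.
      auto C = A * B;  // C carries indices i and j
      PrintData(C);
      return 0;
    }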
 

Devin Matthews: The Tensor-Based Library Instantiation Software (TBLIS)

Not surprisingly, tensor operations are ubiquitous in many science and engineering disciplines due to the need to represent and manipulate high-dimensional data. Mathematically, tensor operations extend familiar concepts from linear algebra to a multilinear setting, with operations such as tensor contraction or tensor-times-matrix multiplication (e.g., a simultaneous change of basis) mapping closely to matrix multiplication. However, attaining high performance often requires jumping through hoops to map physically as well as logically onto matrix operations in order to use existing software kernels such as the BLAS. Instead, TBLIS leverages concepts and formal derivation techniques from dense linear algebra (particularly matrix multiplication) to natively implement high-performance tensor contraction and related operations without relying on a "black-box" BLAS layer. This talk will present an overview of TBLIS as well as the BLIS framework on which it is based, and address related questions spanning multithreading, APIs, GPUs, distributed computing, tensor structure, and software modularity.
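To make the "map onto matrix operations" point concrete, here is a toy C++ sketch (not TBLIS code): with a fortunate memory layout, the contraction C[i,j] = sum_{k,l} A[i,k,l] * B[k,l,j] is exactly a matrix multiply over the fused index kl; with other layouts an explicit transpose/copy would be required first, which is the overhead TBLIS's native approach avoids:

    #include <cstddef>
    #include <vector>

    // C[i,j] = sum_{k,l} A[i,k,l] * B[k,l,j].
    // Since (k,l) is contiguous in both A and B here, this is logically a
    // GEMM of an (I x KL) matrix with a (KL x J) matrix, with kl = k*L + l.
    void contract(std::size_t I, std::size_t K, std::size_t L, std::size_t J,
                  const std::vector<double>& A,   // size I*K*L, row-major [i][k][l]
                  const std::vector<double>& B,   // size K*L*J, row-major [k][l][j]
                  std::vector<double>& C) {       // size I*J,   row-major [i][j]
      const std::size_t KL = K * L;               // fused contraction index
      for (std::size_t i = 0; i < I; ++i)
        for (std::size_t j = 0; j < J; ++j) {
          double sum = 0.0;
          for (std::size_t kl = 0; kl < KL; ++kl)
            sum += A[i * KL + kl] * B[kl * J + j];
          C[i * J + j] = sum;
        }
    }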
 

Justin Turney

The perspective of a tensor software application expert and developer of the Einsum tensor library.
 

Marco Trenti: Tensor contraction patterns in industrial applications with Tensor Network Machine Learning

In this talk, I will present my firsthand experience in developing and using tensor libraries for industrial AI applications. The presentation will outline the simple but special kinds of tensor operations inherent in these contexts, as well as the insights gained by coding these algorithms with different backends (such as ITensor C++, cuBLAS, cuTENSOR). Furthermore, I will explore the deployment of these algorithms across a wide range of hardware architectures: from accelerators and embedded systems to supercomputing clusters and quantum computers.
 

Alejandro Gallo: Tensor contractions for many-body solid-state physics

Many-body theories such as coupled cluster theory have gained much popularity in the context of materials science. These theories can bridge the gap where density functional theory, the workhorse of computational materials science, falls short of delivering the necessary accuracy for practical applications. The steep polynomial scaling of many-body theories means that very efficient implementations must be written in order to evaluate the complicated tensor contractions on tens of thousands of cores. This talk discusses advancing theories beyond density functional theory and highlights challenges in implementing efficient tensor contractions in codes like CC4S. It also addresses remaining work in tensor engines, the needs of solid-state physicists, and challenges with modern accelerators.
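As an illustrative example of the scaling involved (generic CCSD, not specific to CC4S), the particle-particle ladder contraction

    \sigma_{ij}^{ab} \mathrel{+}= \tfrac{1}{2} \sum_{cd} \langle ab \| cd \rangle \, t_{ij}^{cd}

costs O(n_o^2 n_v^4) floating-point operations for n_o occupied and n_v virtual orbitals; terms of this kind dominate the overall O(N^6) scaling of CCSD.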
 

Edward Stow: Compiler Architecture for Tensors

Tensor computations can be represented in different ways, from abstract mathematics to number crunching enormous arrays of data. The right representation for these problems depends on the context. This talk is about how we can use compiler transformations between representations to do the legwork so that users can spend more time innovating and less time coding. We propose building an ecosystem of representations, each a domain-specific language in its own right, that takes us from the highly abstract to the concrete in small lowering steps. This allows optimisations to be implemented more elegantly and, if done correctly, brings application scientists and engineers into the compiler, where their expertise can be shared. We illustrate these ideas with our work on storage layout optimisation, based on the Dagstuhl-Tensor-Language and Data-Layout-Trees dialects in the xDSL project.
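As a toy illustration of what "storage layout" means here (generic C++, not the xDSL/Data-Layout-Trees notation), a layout can be viewed as a map from a logical index to a memory offset, and layout optimisation as choosing between such maps:

    #include <cstddef>

    // Two layouts for the same logical 2D tensor: the data never changes
    // meaning, only where element (i, j) lives in memory.
    struct RowMajor {
      std::size_t rows, cols;
      std::size_t operator()(std::size_t i, std::size_t j) const {
        return i * cols + j;
      }
    };

    struct Blocked {  // b x b tiles; assumes rows and cols divisible by b
      std::size_t rows, cols, b;
      std::size_t operator()(std::size_t i, std::size_t j) const {
        std::size_t bi = i / b, bj = j / b;   // which tile
        std::size_t oi = i % b, oj = j % b;   // offset within the tile
        return ((bi * (cols / b) + bj) * b + oi) * b + oj;
      }
    };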
 
Cyclops Tensor Framework (CTF) is a library originally developed to perform distributed-memory tensor contractions for large-scale quantum chemistry applications. Since then, it has grown to offer a range of additional features, broadening its usefulness across different computational areas. In this talk, we introduce various features of CTF, including redistribution of tensors in distributed memory via virtual processor grids, tensor algebra with general element types and algebraic structures, and distributed-memory sparse tensor computations. CTF has been used to implement large-scale algorithms with applications in data science, scientific computing, and quantum chemistry. We present how CTF provides an intuitive interface for performing distributed-memory tensor computations efficiently, and compare the ease of use and efficiency of these operations with other available libraries.
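A minimal sketch in the style of CTF's documented C++ interface (exact constructor arguments may differ between versions):

    #include <ctf.hpp>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      {
        CTF::World dw(MPI_COMM_WORLD);   // maps MPI ranks onto a processor grid
        int n = 1000;
        CTF::Matrix<> A(n, n, dw), B(n, n, dw), C(n, n, dw);
        A.fill_random(-1.0, 1.0);
        B.fill_random(-1.0, 1.0);

        // Einstein-summation syntax; the contraction executes in distributed
        // memory, redistributing data across the grid as needed.
        C["ij"] = A["ik"] * B["kj"];
      }
      MPI_Finalize();
      return 0;
    }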
 

Oguz Kaya: Celeste: A task-parallel library for tensor computations on distributed heterogeneous architectures

Tensor contractions: distributed memory, sparse tensors, tensor trains.
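For reference, the tensor-train (TT) format mentioned above represents a d-way tensor as a product of small matrices selected by each index:

    T(i_1, i_2, \dots, i_d) = G_1(i_1) \, G_2(i_2) \cdots G_d(i_d),

where each G_k(i_k) is an r_{k-1} x r_k matrix (with r_0 = r_d = 1), so storage grows only linearly in d for bounded TT ranks r_k.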
 

Paul Springer

cuTENSOR: history, vision/ambition, applications, design decisions, current/future operations supported.
 

Christopher Millette: Introduction to hipTensor

Let’s explore the current implementation of the ROCm hipTensor library and its future direction, and discuss perspectives on tensor computing.
 

Alexander Heinecke: From Tensor Processing Primitive towards Tensor Compilers using upstream MLIR

During the past decade, Deep Learning (DL) algorithms, programming systems, and hardware have converged with their High Performance Computing (HPC) counterparts. Nevertheless, the programming methodology of DL and HPC systems is stagnant, relying on highly optimized, yet platform-specific and inflexible, vendor libraries. Such libraries provide close-to-peak performance on the specific platforms, kernels, and shapes to which vendors have dedicated optimization efforts, while they underperform in the remaining use cases, yielding non-portable codes with performance glass jaws. This talk will shed light on abstraction efforts, mainly targeting CPUs and widening to GPUs the closer the approaches get to DSLs/compilers. We will introduce the Tensor Processing Primitives (TPPs) as a virtual, software-defined ISA abstraction in the form of ukernels. Subsequently, we will cover programming abstractions on top of TPPs, carried out in two steps: 1) expressing the computational core using TPPs, a compact, versatile set of 2D-tensor operators; 2) expressing the logical loops around TPPs in a high-level, declarative fashion, where the exact instantiation (ordering, tiling, parallelization) is determined via simple knobs. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms. We will close the talk by demonstrating how TPPs can be the architectural target of a tensor compiler, which in turn is able to generate code with hand-coded performance.
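A toy C++ sketch (not the actual TPP/libxsmm API) of the two-step structure described above: a small 2D block kernel plays the role of the ukernel, and the outer loops form a schedule that can be reordered, tiled, or parallelized without touching the kernel:

    #include <cstddef>

    constexpr std::size_t BM = 32, BN = 32, BK = 32;

    // Step 1: the "ukernel" -- a fixed-size 2D block operation,
    // C_blk += A_blk * B_blk, the only place arithmetic happens.
    void gemm_ukernel(const double* A, const double* B, double* C,
                      std::size_t lda, std::size_t ldb, std::size_t ldc) {
      for (std::size_t i = 0; i < BM; ++i)
        for (std::size_t k = 0; k < BK; ++k)
          for (std::size_t j = 0; j < BN; ++j)
            C[i * ldc + j] += A[i * lda + k] * B[k * ldb + j];
    }

    // Step 2: the logical loops around the ukernel. Their ordering, tiling,
    // and parallelization are schedule decisions, separate from the kernel.
    // Assumes n divisible by BM/BN/BK and C zero-initialized.
    void gemm(const double* A, const double* B, double* C, std::size_t n) {
      for (std::size_t i = 0; i < n; i += BM)
        for (std::size_t k = 0; k < n; k += BK)
          for (std::size_t j = 0; j < n; j += BN)
            gemm_ukernel(A + i * n + k, B + k * n + j, C + i * n + j, n, n, n);
    }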
 

Juraj Hasik + Boris Ponsioen: Tensor network algorithms - challenges to high-performance implementations

In this talk we will sketch the basic tensor network (TN) algorithms underpinning applications to two-dimensional effective lattice models of strongly correlated materials. First, we will focus on the implementation: a high-level formulation built from low-level tensor-algebra primitives. Second, we turn to the differences between the basic dense formulation and the abelian-symmetric one, which leads to block-sparse tensors. In particular for the latter, the computational bottlenecks and parallelization opportunities depend crucially on the symmetries considered. We conclude by discussing future directions in terms of i) building code agnostic to tensor algebra primitives, and ii) parallelization and optimization strategies for PEPS.
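A toy C++ sketch (not any particular library's API) of the abelian-symmetric, block-sparse structure: dense blocks are keyed by conserved charges, only charge-compatible blocks are stored, and a contraction reduces to many independent dense multiplies over matching sectors, which is where the symmetry-dependent parallelism lives:

    #include <cstddef>
    #include <map>
    #include <utility>
    #include <vector>

    using Charge = int;  // e.g., a U(1) particle-number sector
    struct Block {
      std::size_t rows = 0, cols = 0;
      std::vector<double> data;  // dense, row-major, rows*cols
    };
    // One dense block per (row-charge, col-charge) sector; incompatible
    // sectors are identically zero and never stored.
    using BlockSparse = std::map<std::pair<Charge, Charge>, Block>;

    BlockSparse multiply(const BlockSparse& A, const BlockSparse& B) {
      BlockSparse C;
      for (const auto& [qa, a] : A)
        for (const auto& [qb, b] : B) {
          if (qa.second != qb.first) continue;  // charge mismatch: zero block
          Block& c = C[{qa.first, qb.second}];
          if (c.data.empty())
            c = {a.rows, b.cols, std::vector<double>(a.rows * b.cols, 0.0)};
          for (std::size_t i = 0; i < a.rows; ++i)      // dense GEMM per
            for (std::size_t k = 0; k < a.cols; ++k)    // matching sector
              for (std::size_t j = 0; j < b.cols; ++j)
                c.data[i * b.cols + j] +=
                    a.data[i * a.cols + k] * b.data[k * b.cols + j];
        }
      return C;
    }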
 

Michal Lesiuk: Tensor decomposition methods in coupled cluster theory

In this talk, an overview of tensor decomposition techniques useful in coupled cluster theory will be given. First, coupled cluster theory will be briefly introduced for the benefit of participants from other application domains of tensor software. Next, we will show how tensors naturally arise in the context of electronic many-body wavefunctions and how they quickly become the bottleneck as the number of correlated particles increases. Finally, tensor decomposition methods and formats useful in coupled cluster theory will be presented. It will be discussed how their use reduces the cost of electronic-structure calculations, as well as their current limitations.
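As one illustrative format (not necessarily those covered in the talk), a rank-R canonical polyadic (CP) decomposition of the doubles amplitudes

    t_{ij}^{ab} \approx \sum_{r=1}^{R} U_{ir} \, V_{jr} \, W_{ar} \, Z_{br}

replaces the O(n_o^2 n_v^2) amplitude tensor by four factor matrices of total size O((n_o + n_v) R), and contractions involving the amplitudes can then be rewritten in terms of the factors.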