Arm-Malaysia On Demand Training

Day 1

Introduction to ARM

ARM originally stood for Acorn RISC Machine and later for Advanced RISC Machines. Today, Arm is a family of processor architectures widely used in mobile devices and embedded systems, and increasingly in servers and desktops. Arm processors are known for their power efficiency and performance. Almost 100% of smartphones and tablets use Arm-based processors, making Arm a dominant force in the mobile computing market.

Arm is a British semiconductor and software design company. It designs the Arm architecture and licenses it to other companies such as Apple, Samsung, MediaTek, and many more.

Introduction to ARM CPU Architecture

The Arm CPU architecture specifies how a CPU behaves when software runs on it. It is distinct from the microarchitecture:

  • CPU architecture defines the basic instruction set and the exception and memory models that the operating system and hypervisor rely on.
  • CPU microarchitecture defines how a particular CPU implements the architecture, including the pipeline, caches, and other hardware features.

A-profile (Application profile) is used in high-performance applications such as smartphones, tablets, and laptops. It includes features like:

  • Support for virtual memory
  • Support for multiple cores
  • Support for SIMD (Single Instruction, Multiple Data) instructions

It mostly targets devices that run a full OS such as Linux.

  • Implementation: Cortex-A, Cortex-X and Neoverse
  • Latest Version: Armv9-A and Armv8-A

R-profile (Real-time profile) is used in real-time applications such as automotive and industrial control systems. It includes features like:

  • Low-latency interrupt handling
  • Deterministic execution

It targets embedded systems with real-time requirements.

  • Implementation: Cortex-R
  • Latest Version: Armv8-R

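M-profile (Microcontroller profile) is used in microcontrollers and embedded systems. It includes features like:

  • Low power consumption
  • Small code size

It targets small, low-power devices that need low-latency, deterministic operation.

  • Implementation: Cortex-M
  • Latest Version: Armv8-M (including the Armv8.1-M extension)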

Execution States

AArch32: The 32-bit execution state of the Arm architecture. It supports the A32 (formerly ARM) and T32 (formerly Thumb) instruction sets and is used in a wide range of applications, from embedded systems to mobile devices.

AArch64: The 64-bit execution state of the Arm architecture, introduced with Armv8-A. It supports the A64 instruction set and is designed for high-performance applications such as servers and desktops. It is also used in mobile devices that need a larger address space and more processing power.
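As a rough illustration of the two execution states, the sketch below guesses which state the current Python interpreter is running in. It is only a heuristic. The helper names `describe_execution_state` and `current_execution_state` are made up for this note, and the machine strings it checks (`aarch64`, `arm64`, `armv7l` and similar) are the values commonly reported by Linux, macOS, and Windows, not an exhaustive list.

```python
import platform
import struct


def describe_execution_state(machine: str, pointer_bits: int) -> str:
    """Map a machine string and pointer width to an Arm execution state.

    Hypothetical helper: the string matching is a heuristic, not an official API.
    """
    name = machine.lower()
    if name in ("aarch64", "arm64"):
        # A 64-bit kernel can still host 32-bit (AArch32) user processes,
        # so the pointer width of this process decides.
        return "AArch64" if pointer_bits == 64 else "AArch32"
    if name.startswith("arm"):
        # For example "armv7l", or "armv8l" for 32-bit userland.
        return "AArch32"
    return "not Arm"


def current_execution_state() -> str:
    # struct.calcsize("P") is the size in bytes of a C pointer in this process.
    return describe_execution_state(platform.machine(), struct.calcsize("P") * 8)


if __name__ == "__main__":
    print(current_execution_state())
```

On a 64-bit Arm Linux board this would be expected to print AArch64, and on an x86 development machine it prints "not Arm".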

big.LITTLE: Not an execution state, but a heterogeneous computing architecture that combines high-performance cores (big) with power-efficient cores (LITTLE) in a single system-on-chip (SoC). The scheduler can place or migrate tasks between the two core types based on workload requirements, balancing performance against power consumption.

R-Profile Architecture

  • No overlapping memory regions
  • New exception model compatible with the Armv8-A model
  • Virtualization support
  • Memory management unit (MMU) support

Cortex-R processors: Cortex-R82 adds memory management unit (MMU) support, which allows it to run a rich OS such as Linux.

M-Profile Architecture

Low-latency, highly deterministic operation

  • Armv8-M Baseline: The smallest subset of Armv8-M and the successor to Armv6-M, aimed at the most area- and power-constrained designs (for example Cortex-M23). The Armv8-M Security Extension (TrustZone) is optional.
  • Armv8-M Mainline: The full-featured variant of Armv8-M and the successor to Armv7-M (for example Cortex-M33). It is a superset of Baseline and also supports the Armv8-M Security Extension (TrustZone).

AMBA

The Advanced Microcontroller Bus Architecture (AMBA) is an open standard on-chip interconnect specification for the connection and management of functional blocks in SoC designs.

AMBA is used to connect different components of a system-on-chip (SoC) design, such as processors, memory, and peripherals. It provides a standard interface for communication between these components, allowing for easier integration and interoperability.

Why use AMBA?

  • Efficient IP reuse: IP reuse is an essential part of SoC design, and a common interface lets the same IP block be dropped into many different designs.
  • Flexibility: AMBA defines a family of protocols that scales from simple, low-bandwidth peripherals to high-performance, multi-core SoCs.
  • Compatibility: AMBA is a widely used industry standard, and many IP cores are designed to be compatible with it, which makes it easier to integrate components from different vendors.

Security in Cortex-A

Cortex-A processors are designed with security features to protect against various threats, including malware and unauthorized access. These features include:

  • TrustZone: A hardware-based security extension that creates a secure execution environment for sensitive applications, allowing them to run in isolation from the rest of the system.
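  • Memory Management Unit (MMU): A hardware component that translates virtual addresses and enforces memory access permissions, preventing unauthorized access to sensitive data. (A Memory Protection Unit, MPU, fills the permission-checking role on Cortex-R and Cortex-M.)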
  • Secure Boot: A process that ensures only trusted software is loaded during the boot process, preventing malicious code from executing.
  • Cryptographic Acceleration: Hardware support for cryptographic operations, such as encryption and decryption, to enhance security and performance.

Armv9-A Architecture for a Rich Secure IoT Ecosystem

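  • Memory tagging: The Memory Tagging Extension (MTE) tags memory allocations and the pointers to them, so that mismatched accesses such as buffer overflows and use-after-free can be detected.
  • Secure-EL2: Adds the EL2 (hypervisor) exception level to the Secure state, so that sensitive operations can run in partitions isolated from each other.
  • uArch side-band protection: Hardware-based protection mechanisms to secure microarchitecture elements.
  • Crypto support: Enhanced hardware support for cryptographic operations.
  • Branch target identifiers: Branch Target Identification (BTI) restricts where indirect branches may land, which mitigates jump-oriented programming (JOP) attacks.
  • Pointer authentication: Signs pointers such as return addresses with a pointer authentication code (PAC) and verifies it before use, which mitigates return-oriented programming (ROP) attacks.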
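To make the memory tagging idea concrete, here is a toy software model of a tag check. It is a sketch under stated assumptions, not how MTE is programmed on real hardware. The class `TaggedMemory` and its methods are invented for this note. The model does follow two architectural facts: tags are 4 bits wide and held in pointer bits [59:56], and memory is tagged in 16-byte granules.

```python
GRANULE = 16      # MTE tags memory in 16-byte granules
TAG_SHIFT = 56    # the 4-bit logical tag sits in pointer bits [59:56]
TAG_MASK = 0xF


class TaggedMemory:
    """Toy model of tag-checked memory (hypothetical, for illustration only)."""

    def __init__(self, size: int) -> None:
        assert size % GRANULE == 0, "size must be a whole number of granules"
        self.data = bytearray(size)
        self.tags = [0] * (size // GRANULE)

    def allocate(self, addr: int, length: int, tag: int) -> int:
        """Tag the granules covering [addr, addr + length) and return a tagged pointer."""
        assert addr % GRANULE == 0, "allocations start on a granule boundary"
        first = addr // GRANULE
        last = (addr + length + GRANULE - 1) // GRANULE
        for granule in range(first, last):
            self.tags[granule] = tag & TAG_MASK
        return ((tag & TAG_MASK) << TAG_SHIFT) | addr

    def load(self, pointer: int) -> int:
        """Read one byte, faulting if the pointer tag and memory tag differ."""
        tag = (pointer >> TAG_SHIFT) & TAG_MASK
        addr = pointer & ((1 << TAG_SHIFT) - 1)
        if self.tags[addr // GRANULE] != tag:
            raise MemoryError(f"tag check fault at address {addr:#x}")
        return self.data[addr]
```

If two neighbouring 16-byte buffers are allocated with different tags, reading one byte past the end of the first buffer lands in a granule whose tag does not match the pointer, and the model raises a fault instead of silently returning the neighbour's data.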

Cortex ISA Progression for ML

  • Armv7-A: NEON with 32 x 64-bit registers (Cortex-A7)
  • Armv8.0-A: NEON in both AArch32 and AArch64 (Cortex-A53)
  • Armv8.2-A: NEON with FP16 support (Cortex-A55)
  • Armv9.2-A: SVE2 with BF16 support (Cortex-A520)
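The FP16 and BF16 formats in the list above trade range for precision differently: FP16 has 5 exponent bits and 10 fraction bits, while BF16 keeps the 8 exponent bits of a 32-bit float and only 7 fraction bits. A BF16 value is therefore just the top 16 bits of a float32. The sketch below shows that conversion with round-to-nearest-even. The function names are made up for this note, and hardware conversions may handle corner cases differently.

```python
import struct


def float32_to_bf16_bits(value: float) -> int:
    """Round a value to BF16 (round-to-nearest-even) and return its 16 raw bits."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep the sign and exponent, and force a quiet-NaN fraction bit.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add just under half of the discarded range, plus the lowest kept bit,
    # so that exact ties round to the even neighbour.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float32(bits16: int) -> float:
    """Widen 16 raw BF16 bits back to a float by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]
```

For example, 1.0 maps to the bit pattern 0x3F80, and a value such as 3.14159 comes back as 3.140625 after a round trip, which shows how little precision BF16 keeps.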

SVE2: SIMD Architecture for Next Decade of Devices

  • Better Developer experience: SVE2 code is vector-length agnostic, so developers can write code once and run it on implementations with different vector widths.
  • Better Auto-Vectorization: Features such as per-lane predication let compilers vectorize more loops and generate more efficient SIMD code.
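The sketch below models, in plain Python, the predicated loop style that makes SVE2 friendly to compilers. It is a simulation, not real SVE2 code: `whilelt` imitates the idea of a predicate whose lanes are active while the index is below the loop limit, and `vla_saxpy` is an invented example kernel. The point is that the same loop is correct for any lane count and needs no scalar tail loop.

```python
from typing import List


def whilelt(start: int, limit: int, lanes: int) -> List[bool]:
    """Model of a loop predicate: lane i is active while start + i < limit."""
    return [start + i < limit for i in range(lanes)]


def vla_saxpy(a: float, x: List[float], y: List[float], lanes: int) -> List[float]:
    """Compute a * x + y in chunks of `lanes` elements, with no scalar tail loop."""
    assert len(x) == len(y) and lanes > 0
    out = [0.0] * len(x)
    i = 0
    while i < len(x):
        predicate = whilelt(i, len(x), lanes)
        for lane, active in enumerate(predicate):
            if active:  # inactive lanes neither load nor store
                out[i + lane] = a * x[i + lane] + y[i + lane]
        i += lanes
    return out
```

Calling `vla_saxpy` with 4, 8, or 64 lanes (the number of 32-bit elements in 128-bit, 256-bit, and 2048-bit vectors) gives the same result for a 10-element input, even though 10 is not a multiple of any of those lane counts.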

Scalable Matrix Extension (SME)

SME2 execution unit in the CPU cluster

  • Accelerates matrix outer-product operations
  • Shared between all the CPUs in the cluster, so every CPU has access to the engine
  • Transparent to the OS

(Source: Arm Scalable Matrix Extension (SME) introduction)
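The matrix outer product mentioned above is the building block for matrix multiplication: multiplying A by B equals summing the outer products of each column of A with the matching row of B. The plain-Python reference below illustrates only that arithmetic. The function names are invented for this note, and the list-of-lists accumulator `za` merely stands in for the accumulator storage that the hardware provides.

```python
from typing import List

Matrix = List[List[float]]


def outer_product_accumulate(za: Matrix, col: List[float], row: List[float]) -> None:
    """za[i][j] += col[i] * row[j] for every element of the accumulator tile."""
    for i, c in enumerate(col):
        for j, r in enumerate(row):
            za[i][j] += c * r


def matmul_by_outer_products(a: Matrix, b: Matrix) -> Matrix:
    """Multiply a (m x k) by b (k x n) as a sum of k outer products."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(r) == k for r in a), "inner dimensions must match"
    za = [[0.0] * n for _ in range(m)]
    for p in range(k):
        column_of_a = [a[i][p] for i in range(m)]
        outer_product_accumulate(za, column_of_a, b[p])
    return za
```

Each call to `outer_product_accumulate` updates every element of the result tile at once, which is why a single outer-product operation does so much useful work per instruction in matrix-heavy ML workloads.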

Benefits of SME2 in CPU

  • Performance: SME2 provides a significant performance boost for matrix operations, which are commonly used in machine learning and AI applications.
  • Flexibility & Future-Proof: SME2 is designed to be flexible and future-proof, allowing it to adapt to new workloads and use cases as they emerge.
  • Developer Efficiency: SME2 provides tools and features that enhance developer productivity, making it easier to write and optimize code for matrix operations.
  • Security: SME2 includes security features to protect against potential threats and vulnerabilities, ensuring the integrity and confidentiality of data processed by the engine.

Software Stack

The Cortex-A software stack for Hayas is composed of:

  • ML: TFLite runtime, ONNX, TVM, Arm NN, ACL (Arm Compute Library)
  • Computer Vision: OpenCV, ACL
  • Security: TF-A, OP-TEE, Secure Partition Manager

Arm Kleidi Libraries

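  • Arm KleidiAI: An open-source library of optimized micro-kernels that AI frameworks can call to accelerate inference on Arm CPUs.
  • Arm KleidiCV: An open-source library of optimized image-processing routines for Arm CPUs, designed to plug into computer vision frameworks such as OpenCV.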

Cortex-R Benefits

Cortex-R processors are designed for real-time applications, providing low-latency and deterministic performance. They are used in applications such as automotive, industrial control systems, and medical devices.

Key Benefits

  • Fast and deterministic response times: Interrupt and execution latencies are short and predictable, which is what real-time applications require.
  • Real-time performance: Achieved through tightly coupled memory (TCM), low-latency peripheral ports, and a low-latency interrupt controller.
  • High reliability and fault tolerance: Cortex-R processors are designed to detect and tolerate faults, which suits applications that must keep operating correctly when errors occur.

Applications of Cortex-R Processors

  • Automotive: Cortex-R processors are used in automotive applications such as advanced driver assistance systems (ADAS), electronic stability control (ESC), and anti-lock braking systems (ABS).
  • Industrial control systems: Cortex-R processors are used in industrial control systems such as robotics, factory automation, and process control.
  • Medical devices: Cortex-R processors are used in medical devices such as patient monitoring systems, infusion pumps, and imaging systems.
  • Other applications: Modems, storage, and more.

The latest Cortex-R82AE features 64-bit architecture with enhanced functional safety capabilities, virtual memory management, full system coherency, and native 64-bit instruction execution support.

Note: A high-level OS (HLOS) such as Linux or Android requires an MMU, which Cortex-A provides, while Cortex-R typically runs an RTOS such as FreeRTOS or Zephyr.

Functional Safety

  • Systematic Errors Protection
  • Integrated Safety Mechanisms
  • Fault Detection and Recovery

Cortex-M

Cortex-M is the most widely shipped processor family in the world, with over 200 billion units shipped since its introduction in 2004. It is used in a wide range of applications, including consumer electronics, automotive, industrial control systems, and medical devices.

Cortex-M processors are designed for low-power, low-cost applications, providing a balance between performance and power efficiency. They are used in applications such as IoT devices, wearables, and smart home devices.

Applications of Cortex-M Processors

  • Energy grid, automotive, consumer electronics, industrial control systems, healthcare, retail, smart city, VR/AR, connected clothing and more.

Factors to Choose Suitable Cortex-M

When selecting the appropriate Cortex-M processor for your application, consider these key factors:

  • Cost/Die Area: Smaller die size reduces manufacturing costs, making it ideal for high-volume consumer products
  • Low Power: Ultra-low power consumption extends battery life for IoT devices and wearables
  • DSP Capabilities: Built-in digital signal processing features for audio, sensor data, and control algorithms
  • Security: Hardware security features like TrustZone-M for secure boot and data protection
  • Performance: Processing speed and efficiency for real-time applications and responsiveness
  • ML Support: Machine learning acceleration for edge AI applications and intelligent sensing

Arm Helium

Reference: arm.com/resource/ebook/helium-mve-reference-book

Arm Helium is the M-Profile Vector Extension (MVE) for Cortex-M processors.

  • 2x datapath increase
  • Up to 4x uplift in DSP/ML performance

Open Source Libraries Supporting Helium

  • CMSIS-DSP: A library of DSP functions optimized for Cortex-M processors, providing efficient implementations of common DSP algorithms.
  • CMSIS-Stream and CMSIS-NN: CMSIS-NN provides optimized neural network kernels for Cortex-M processors, and CMSIS-Stream provides tools for describing and scheduling data-flow processing graphs.
  • Arm-2D Graphics Library: The Arm-2D graphics library is a lightweight, high-performance graphics library for Cortex-M processors, providing efficient rendering of 2D graphics.
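Many of the DSP kernels these libraries provide work on fixed-point data, which is where a vector extension with several 16-bit lanes pays off. The sketch below is a plain-Python reference for a Q15 dot product: values in [-1, 1) are stored as 16-bit integers, products are accumulated without loss, and the sum is scaled back at the end. It is similar in spirit to fixed-point kernels such as `arm_dot_prod_q15` in CMSIS-DSP, but it is not that function, and the helper names here are invented for this note.

```python
from typing import Sequence

Q15_MIN, Q15_MAX = -32768, 32767


def float_to_q15(value: float) -> int:
    """Quantise a value in [-1, 1) to Q15, saturating at the range limits."""
    scaled = int(round(value * 32768.0))
    return max(Q15_MIN, min(Q15_MAX, scaled))


def dot_q15(a: Sequence[int], b: Sequence[int]) -> int:
    """Dot product of two Q15 vectors; the wide accumulator holds a Q30 value."""
    assert len(a) == len(b)
    acc = 0
    for x, y in zip(a, b):
        acc += x * y  # Q15 * Q15 gives a Q30 product
    return acc


def q30_to_float(acc: int) -> float:
    """Scale a Q30 accumulator back to a floating-point value."""
    return acc / float(1 << 30)
```

As a usage example, four elements of 0.5 multiplied with four elements of 0.25 accumulate to 0.5 once the Q30 result is scaled back. A 128-bit vector unit can process eight such 16-bit elements per vector operation, which is the kind of loop these libraries optimize.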

Summary

  • Cortex-M spans the spectrum of embedded applications
  • Drives up energy efficiency
  • High performance