1. Overview
The NEORV32 RISC-V Processor is an open-source RISC-V compatible processor system that is intended as a ready-to-go auxiliary processor within larger SoC designs or as a stand-alone custom / customizable microcontroller.
The system is highly configurable and provides optional common peripherals like embedded memories, timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like memories, NoCs and other peripherals. On-line and in-system debugging is supported by an OpenOCD/gdb compatible on-chip debugger accessible via JTAG.
Special focus is placed on execution safety to provide defined and predictable behavior at any time. Therefore, the CPU ensures that all memory accesses are acknowledged and that no invalid/malformed instructions are executed. Whenever an unexpected situation occurs, the application code is informed via hardware exceptions.
The software framework of the processor comes with application makefiles, software libraries for all CPU and processor features, a bootloader, a runtime environment and several example programs - including a port of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as the default toolchain (prebuilt toolchains are also provided).
Check out the processor’s online User Guide that provides hands-on tutorials to get you started.
1.1. Rationale
Why did you make this?
For me, processor and CPU architecture designs are fascinating things: they are the magic frontier where software meets hardware. This project started as something like a journey into this realm to understand how things actually work down on the very low level and evolved over time into a quite capable system-on-chip.
When I started to dive into the emerging RISC-V ecosystem I felt overwhelmed by the complexity. As a beginner it is hard to get an overview - especially when you want to set up a minimal platform to tinker with… Which core to use? How to get the right toolchain? What features do I need? How does booting work? How do I create an actual executable? How to get that into the hardware? How to customize things? Where to start???
This project aims to provide a simple to understand and easy to use yet powerful and flexible platform that targets FPGA and RISC-V beginners as well as advanced users.
Why a soft-core processor?
As a matter of fact, soft-core processors cannot compete with discrete (ASIC) processors in terms of performance, energy efficiency and size. But they do fill a niche in the design space: for example, soft-core processors allow implementing the control flow part of certain applications (like communication protocol handling) in software such as plain C. This provides high flexibility as software can easily be changed, re-compiled and re-uploaded.
Furthermore, the concept of flexibility applies to all aspects of a soft-core processor. The user can add exactly the features that are required by the application: additional memories, custom interfaces, specialized co-processors and even user-defined instructions. These application-specific optimization capabilities compensate for many of the limitations of soft-core processors.
Why RISC-V?
RISC-V is a free and open ISA enabling a new era of processor innovation through open standard collaboration.
https://riscv.org/about/
Open-source is a great thing! While open-source has already become quite popular in software, hardware-focused projects still need to catch up. Although processors and CPUs are the heart of almost every digital system, having a true open-source platform is still a rarity. RISC-V aims to change that - and even if it is just one approach, it helps pave the road for future development.
Furthermore, I highly appreciate the community aspect of RISC-V. The ISA and everything beyond is developed in direct contact with the community: this includes businesses and professionals but also hobbyists, amateurs and enthusiasts. Everyone can join discussions and contribute to RISC-V in their very own way.
Finally, I really like the RISC-V ISA itself. It aims to be a clean, orthogonal and "intuitive" ISA that reflects the basic concepts of RISC: simple yet effective.
Yet another RISC-V core? What makes it special?
The NEORV32 is not based on another (RISC-V) core. It was built entirely from the ground up just following the official ISA specs. The project does not intend to replace certain RISC-V cores or beat existing ones in terms of performance or size. It was built with a different design goal in mind.
The project aims to provide another option in the RISC-V / soft-core design space with a different performance vs. size trade-off and a different focus: embrace concepts like documentation, platform-independence / portability, RISC-V compatibility, extensibility & customization and - last but not least - ease of use.
Furthermore, the NEORV32 puts special focus on execution safety using Full Virtualization. The CPU aims to provide fall-backs for everything that could go wrong. This includes malformed instruction words, privilege escalations and even memory accesses that are checked for address space holes and deterministic response times of memory-mapped devices. Precise exceptions allow a defined and fully-synchronized state of the CPU at any time and in every situation.
To summarize, this project pursues the following objectives (in rough order of importance):
-
RISC-V-compliance and -compatibility
-
Functionality and features
-
Extensibility
-
Safety and security
-
Minimal area
-
Short critical paths, high operating clock
-
Simplicity / easy to understand
-
Low-power design
-
High overall performance
A multi-cycle architecture?!
The primary goal of many mainstream CPUs is pure performance. Deep pipelines and out-of-order execution are some concepts to boost performance, while also increasing complexity. In contrast, most CPUs used for teaching are single-cycle designs since they are probably the easiest to understand. But what about something in-between?
In terms of energy, throughput, area and maximal clock frequency, multi-cycle architectures are somewhere in between single-cycle and fully-pipelined designs: they provide higher throughput and clock speed when compared to their single-cycle counterparts while having less hardware complexity (= area), and thus less performance, than fully-pipelined designs. So I decided to use the multi-cycle approach for the following reasons:
-
Multi-cycle architectures are quite small! There is no need for pipeline hazard detection/resolution logic (e.g. forwarding). Furthermore, you can "re-use" parts of the core to do several tasks (e.g. the ALU is used for actual data processing and also for address generation, branch condition check and branch target computation).
-
Single-cycle architectures require memories that can be read asynchronously - a thing that is not feasible to implement in real-world applications (i.e. FPGA block RAM is entirely synchronous). Furthermore, such designs usually have a very long critical path tremendously reducing maximal operating frequency.
-
Pipelined designs increase performance by having several instructions "in flight" at the same time. But this also means there is some kind of "out-of-order" behavior: if an instruction at the end of the pipeline causes an exception, all the instructions in earlier stages have to be invalidated. Potential architectural state changes have to be undone, requiring additional logic (Spectre and Meltdown…). In a multi-cycle architecture this situation cannot occur since only a single instruction is being processed ("in flight") at a time.
-
Having only a single instruction in flight does not only reduce hardware costs, it also simplifies simulation/verification/debugging, state preservation/restoring during exceptions and extensibility (no need to care about pipeline hazards) - but of course at the cost of reduced throughput.
To counteract the loss of performance implied by a pure multi-cycle architecture, the NEORV32 CPU uses a mixed approach: instruction-fetch (front-end) and instruction-execution (back-end) are de-coupled to operate independently of each other. Data is exchanged via a queue, building a simple 2-stage pipeline. Each "pipeline" stage in turn is implemented as a multi-cycle architecture to simplify the hardware and to provide precise state control (for example during exceptions).
1.2. Project Key Features
Project
-
all-in-one package: CPU + SoC + Software Framework & Tooling
-
completely described in behavioral, platform-independent VHDL - no vendor- or technology-specific primitives, attributes, macros, libraries, etc. are used at all
-
all-Verilog "version" available (auto-generated by GHDL)
-
extensive configuration options for adapting the processor to the requirements of the application
-
highly extensible hardware - on CPU, SoC and system level
-
aims to be as small as possible while being as RISC-V-compliant as possible - with a reasonable area-vs-performance trade-off
-
FPGA friendly (e.g. all internal memories can be mapped to block RAM - including the register file)
-
optimized for high clock frequencies to ease timing closure and integration
-
from zero to "hello world!" - completely open source and documented
-
easy to use even for FPGA/RISC-V starters – intended to work out of the box
NEORV32 CPU (the core)
-
32-bit RISC-V CPU
-
fully compatible to the RISC-V ISA specs. - checked by the official RISCOF architecture tests
-
base ISA + privileged ISA + several optional standard and custom ISA extensions
-
option to add user-defined RISC-V instructions as custom ISA extension
-
rich set of customization options (ISA extensions, design goal: performance / area / energy, tuning options, …)
-
Full Virtualization capabilities to increase execution safety
-
official RISC-V open source architecture ID
NEORV32 Processor (the SoC)
-
highly-configurable full-scale microcontroller-like processor system
-
based on the NEORV32 CPU
-
optional standard serial interfaces (UART, TWI, SPI (host and device), 1-Wire)
-
optional timers and counters (watchdog, system timer)
-
optional general purpose IO and PWM; a native NeoPixel(c)-compatible smart LED interface
-
optional embedded memories and caches for data, instructions and bootloader
-
optional external memory interface for custom connectivity
-
optional execute in-place (XIP) module to execute code directly from an external SPI flash
-
optional DMA controller for CPU-independent data transfers
-
optional CRC module to check data integrity
-
on-chip debugger compatible with OpenOCD and GDB including hardware trigger module and optional authentication
Software framework
-
GCC-based toolchain - prebuilt toolchains available; application compilation based on GNU makefiles
-
internal bootloader with serial user interface (via UART)
-
core libraries and HAL for high-level usage of the provided functions and peripherals
-
processor-specific runtime environment and several example programs
-
doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html
-
FreeRTOS port + demos available
Extensibility and Customization
The NEORV32 processor is designed to ease customization and extensibility and provides several options for adding application-specific custom hardware modules and accelerators. The three most common options for adding custom on-chip modules are listed below.
-
Processor-External Bus Interface (XBUS) to attach processor-external IP modules (memories and peripherals)
-
Custom Functions Subsystem (CFS) for tightly-coupled processor-internal co-processors
-
Custom Functions Unit (CFU) for custom RISC-V instructions
A more detailed comparison of the extension/customization options can be found in section Adding Custom Hardware Modules of the user guide.
1.3. Project Folder Structure
The root directory of the repository is considered the NEORV32 base or home folder (i.e. neorv32/).
neorv32 - Project home folder
│
├docs - Project documentation
│├datasheet - AsciiDoc sources for the NEORV32 data sheet
│├figures - Figures and logos
│├references - Data sheets and RISC-V specs
│├sources - Sources for the images in 'figures/'
│└userguide - AsciiDoc sources for the NEORV32 user guide
│
├rtl - VHDL sources
│├core - Core sources of the CPU & SoC
│├processor_templates - Pre-configured SoC wrappers
│├system_integration - System wrappers and bridges for advanced connectivity
│└test_setups - Minimal test setup "SoCs" used in the User Guide
│
├sim - Simulation files
│
└sw - Software framework
 ├bootloader - Sources of the processor-internal bootloader
 ├common - Linker script, crt0.S start-up code and central makefile
 ├example - Example programs for the core and the SoC modules
 │├eclipse - Pre-configured Eclipse IDE project
 │└... - Several example programs
 ├lib - Processor core library
 │├include - NEORV32 core library header files (*.h)
 │└source - NEORV32 core library source files (*.c)
 ├image_gen - Helper program to generate executables & memory images
 ├ocd_firmware - Firmware for the on-chip debugger's "park loop"
 ├openocd - OpenOCD configuration files
 └svd - Processor system view description file (CMSIS-SVD)
1.4. VHDL File Hierarchy
All required VHDL hardware source files are located in the project’s rtl/core folder.
VHDL Library
All core VHDL files from the list below have to be assigned to a new library named neorv32.
|
Compilation Order
See section File-List Files for more information.
|
neorv32_top.vhd - NEORV32 PROCESSOR/SOC TOP ENTITY
│
├neorv32_cpu.vhd - NEORV32 CPU TOP ENTITY
│├neorv32_cpu_alu.vhd - Arithmetic/logic unit
││├neorv32_cpu_cp_bitmanip.vhd - Bit-manipulation co-processor (B ext.)
││├neorv32_cpu_cp_cfu.vhd - Custom instructions co-processor (Zxcfu ext.)
││├neorv32_cpu_cp_cond.vhd - Integer conditional co-processor (Zicond ext.)
││├neorv32_cpu_cp_crypto.vhd - Scalar cryptographic co-processor (Zk*/Zbk* ext.)
││├neorv32_cpu_cp_fpu.vhd - Floating-point co-processor (Zfinx ext.)
││├neorv32_cpu_cp_muldiv.vhd - Mul/Div co-processor (M ext.)
││└neorv32_cpu_cp_shifter.vhd - Bit-shift co-processor (base ISA)
│├neorv32_cpu_control.vhd - CPU control, exception system and CSRs
││└neorv32_cpu_decompressor.vhd - Compressed instructions decoder (C ext.)
│├neorv32_cpu_lsu.vhd - Load/store unit
│├neorv32_cpu_pmp.vhd - Physical memory protection unit (Smpmp ext.)
│└neorv32_cpu_regfile.vhd - Data register file
│
├neorv32_boot_rom.vhd - Bootloader ROM
│└neorv32_bootloader_image.vhd - Bootloader ROM memory image (package)
├neorv32_bus.vhd - SoC bus infrastructure modules
├neorv32_cache.vhd - Generic cache module
├neorv32_cfs.vhd - Custom functions subsystem
├neorv32_clockgate.vhd - Generic clock gating switch
├neorv32_crc.vhd - Cyclic redundancy check unit
├neorv32_debug_dm.vhd - on-chip debugger: debug module
├neorv32_debug_auth.vhd - on-chip debugger: authentication module
├neorv32_debug_dtm.vhd - on-chip debugger: debug transfer module
├neorv32_dma.vhd - Direct memory access controller
├neorv32_dmem.vhd - Generic processor-internal data memory
├neorv32_fifo.vhd - Generic FIFO component
├neorv32_gpio.vhd - General purpose input/output port unit
├neorv32_gptmr.vhd - General purpose 32-bit timer
├neorv32_imem.vhd - Generic processor-internal instruction memory
│└neorv32_application_image.vhd - IMEM application initialization image (package)
├neorv32_mtime.vhd - Machine system timer
├neorv32_neoled.vhd - NeoPixel (TM) compatible smart LED interface
├neorv32_onewire.vhd - One-Wire serial interface controller
├neorv32_package.vhd - Main VHDL package file
├neorv32_pwm.vhd - Pulse-width modulation controller
├neorv32_sdi.vhd - Serial data interface controller (SPI device)
├neorv32_slink.vhd - Stream link interface
├neorv32_spi.vhd - Serial peripheral interface controller (SPI host)
├neorv32_sys.vhd - System infrastructure modules
├neorv32_sysinfo.vhd - System configuration information memory
├neorv32_trng.vhd - True random number generator
├neorv32_twi.vhd - Two wire serial interface controller
├neorv32_uart.vhd - Universal async. receiver/transmitter
├neorv32_wdt.vhd - Watchdog timer
├neorv32_xbus.vhd - External (Wishbone) bus interface gateways
├neorv32_xip.vhd - Execute in place module
└neorv32_xirq.vhd - External interrupt controller
Replacing Modules for Customization or Optimization
Any module of the core can be replaced by the user for customization purposes. For example, the default IMEM and DMEM
modules as well as the CPU’s register file can be replaced by technology-specific primitives to optimize energy, speed
and area utilization. The modules dedicated to customization (i.e. CFS and CFU) can be replaced by
user-defined modules to implement application-specific functionality.
|
1.4.1. File-List Files
Most of the RTL sources use entity instantiation. Hence, the RTL compile order might be relevant (depending on
the synthesis/simulation tool). Therefore, two file-list files are provided in the rtl folder that list all required
HDL files for the CPU core and for the entire processor and also represent their recommended compile order.
These file-list files can be consumed by EDA tools to simplify project setup.
-
file_list_cpu.f - HDL files and compile order for the CPU core; top module: neorv32_cpu
-
file_list_soc.f - HDL files and compile order for the entire processor/SoC; top module: neorv32_top
A simple bash script generate_file_lists.sh is provided for regenerating the file-lists (using GHDL’s elaborate command).
This script can also be invoked using the default application makefile (see Makefile Targets).
By default, the file-list files include a placeholder in the path of each included hardware source file. These placeholders need to be replaced by the actual path before being used. Example:
-
default: NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_package.vhd
-
adjusted: path/to/neorv32/rtl/core/neorv32_package.vhd
# (1) Path to the NEORV32 home folder (i.e. the root folder of the GitHub repository).
NEORV32_HOME = path/to/neorv32
# (2) Load the content of the file_list_soc.f file-list into a new variable NEORV32_SOC_FILE.
NEORV32_SOC_FILE = $(shell cat $(NEORV32_HOME)/rtl/file_list_soc.f)
# (3) Substitute the file-list file's path placeholder "NEORV32_RTL_PATH_PLACEHOLDER" by the actual path.
NEORV32_SOC_SRCS = $(subst NEORV32_RTL_PATH_PLACEHOLDER,$(NEORV32_HOME)/rtl,$(NEORV32_SOC_FILE))
set file_list_file [read [open "$neorv32_home/rtl/file_list_soc.f" r]]
set file_list [string map [list "NEORV32_RTL_PATH_PLACEHOLDER" "$neorv32_home/rtl"] $file_list_file]
puts "NEORV32 source files:"
puts $file_list
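The same placeholder substitution can also be sketched in Python, for example for a custom build script. This is only an illustration: the function name `resolve_file_list` and the shortened sample list are assumptions, and the sketch assumes that entries in the file-list are separated by whitespace and contain no spaces themselves.

```python
from typing import List

# Placeholder string used in the provided file-list files.
PLACEHOLDER = "NEORV32_RTL_PATH_PLACEHOLDER"


def resolve_file_list(file_list_text: str, rtl_path: str) -> List[str]:
    """Return the listed source files in their original order with the
    path placeholder replaced by the actual RTL folder path."""
    # Assumption: entries are whitespace-separated (hypothetical helper).
    return [entry.replace(PLACEHOLDER, rtl_path) for entry in file_list_text.split()]


if __name__ == "__main__":
    # Shortened sample; a real file-list contains all required HDL files.
    sample = (
        "NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_package.vhd\n"
        "NEORV32_RTL_PATH_PLACEHOLDER/core/neorv32_top.vhd\n"
    )
    for path in resolve_file_list(sample, "path/to/neorv32/rtl"):
        print(path)
```

Because the list order is preserved, the result can be handed to a tool in the recommended compile order.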
File-List Usage Examples
The provided file-list files are used by the GHDL-based simple simulation setup (sim/ghdl.setup.sh) as
well as by the Vivado IP packager script (rtl/system_integration/neorv32_vivado_ip.tcl).
|
1.5. VHDL Coding Style
-
The entire processor including the CPU core is written in platform-/technology-independent VHDL. The code makes minimal use of VHDL 2008 features to provide compatibility even for older EDA tools.
-
A single package / library file (neorv32_package.vhd) is used to provide global defines and helper functions. The specific user-defined configuration is done entirely by the generics of the top entity.
-
Internally, the generics are checked to ensure a correct configuration. Asserts and "sanity checks" are used to inform the user about the actual processor configuration and potentially illegal settings.
-
The code uses entity instantiation for all internal modules. However, if several "submodules" are specified within the same file, component instantiation is used for those.
-
When instantiating the processor top module (neorv32_top.vhd) in a custom design, either entity instantiation or component instantiation can be used as the NEORV32 package file / library already provides an according component declaration.
Verilog Version
A GHDL-generated all-Verilog version of the processor is available at https://github.com/stnolting/neorv32-verilog.
The provided setup generates a synthesizable Verilog netlist for a custom processor configuration.
|
1.6. FPGA Implementation Results
This section shows exemplary FPGA implementation results for the NEORV32 CPU and NEORV32 Processor modules.
The results are generated by manual synthesis runs. Hence, they might not represent the latest version of the processor.
CPU
HW version: | |
Top entity: | |
FPGA: | Intel Cyclone IV E |
Toolchain: | Quartus Prime Lite 21.1 |
Constraints: | no timing constraints, "balanced optimization", fmax from "Slow 1200mV 0C Model" |
CPU ISA Configuration | LEs | FFs | MEM bits | DSPs | fmax |
---|---|---|---|---|---|
| | 1223 | 607 | 1024 | 0 | 130 MHz |
| | 1578 | 773 | 1024 | 0 | 130 MHz |
| | 2087 | 983 | 1024 | 0 | 130 MHz |
| | 2338 | 992 | 1024 | 0 | 130 MHz |
| | 3175 | 1247 | 1024 | 0 | 130 MHz |
| | 3186 | 1254 | 1024 | 0 | 130 MHz |
| | 3187 | 1254 | 1024 | 0 | 130 MHz |
| | 4450 | 1906 | 1024 | 7 | 123 MHz |
| | 4825 | 2018 | 1024 | 7 | 123 MHz |
Goal-Driven Optimization
The CPU provides further options to reduce the area footprint or to increase performance.
See section Processor Top Entity - Generics for more information. Also, take a look at the User Guide section
Application-Specific Processor Configuration.
|
Processor - Modules
HW version: | |
Top entity: | |
FPGA: | Intel Cyclone IV E |
Toolchain: | Quartus Prime Lite 21.1 |
Constraints: | no timing constraints, "balanced optimization" |
Module | Description | LEs | FFs | MEM bits | DSPs |
---|---|---|---|---|---|
BOOT ROM | Bootloader ROM (4kB) | 2 | 2 | 32768 | 0 |
Bus switch (core) | SoC bus infrastructure | 28 | 15 | 0 | 0 |
Bus switch (DMA) | SoC bus infrastructure | 159 | 9 | 0 | 0 |
CFS | Custom functions subsystem (depends on custom design logic) | - | - | - | - |
CRC | Cyclic redundancy check unit | 130 | 117 | 0 | 0 |
dCACHE | Data cache (4 blocks, 64 bytes per block) | 300 | 167 | 2112 | 0 |
DM | On-chip debugger - debug module | 377 | 241 | 0 | 0 |
DTM | On-chip debugger - debug transfer module (JTAG) | 262 | 220 | 0 | 0 |
DMA | Direct memory access controller | 365 | 291 | 0 | 0 |
DMEM | Processor-internal data memory (8kB) | 6 | 2 | 65536 | 0 |
Gateway | SoC bus infrastructure | 215 | 91 | 0 | 0 |
GPIO | General purpose input/output ports | 102 | 98 | 0 | 0 |
GPTMR | General purpose timer | 150 | 105 | 0 | 0 |
IO Switch | SoC bus infrastructure | 217 | 0 | 0 | 0 |
iCACHE | Instruction cache (2x4 blocks, 64 bytes per block) | 458 | 296 | 4096 | 0 |
IMEM | Processor-internal instruction memory (16kB) | 7 | 2 | 131072 | 0 |
MTIME | Machine system timer | 307 | 166 | 0 | 0 |
NEOLED | Smart LED interface (NeoPixel/WS2812) (FIFO_depth=1) | 171 | 129 | 0 | 0 |
ONEWIRE | 1-wire interface | 105 | 77 | 0 | 0 |
PWM | Pulse-width modulation controller (4 channels) | 91 | 81 | 0 | 0 |
Reservation Set | Reservation set controller for LR/SC instructions | 52 | 33 | 0 | 0 |
SDI | Serial data interface | 103 | 77 | 512 | 0 |
SLINK | Stream link interface (RX/TX FIFO depth=32) | 96 | 73 | 2048 | 0 |
SPI | Serial peripheral interface | 137 | 97 | 1024 | 0 |
SYSINFO | System configuration information memory | 11 | 11 | 0 | 0 |
TRNG | True random number generator | 140 | 108 | 512 | 0 |
TWI | Two-wire interface | 93 | 64 | 0 | 0 |
UART0, UART1 | Universal asynchronous receiver/transmitter 0/1 (FIFO_depth=1) | 222 | 142 | 1024 | 0 |
WDT | Watchdog timer | 107 | 89 | 0 | 0 |
WISHBONE | External memory interface | 122 | 112 | 0 | 0 |
XIP | Execute in place module | 369 | 276 | 0 | 0 |
XIRQ | External interrupt controller (4 channels) | 35 | 29 | 0 | 0 |
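As a quick plausibility check, the "MEM bits" figures of the embedded memories in the table above follow directly from their configured sizes, assuming that "kB" denotes 1024 bytes. The helper name `kib_to_bits` is hypothetical; the sizes and bit counts are copied from the table.

```python
def kib_to_bits(size_kib: int) -> int:
    """Convert a memory size in units of 1024 bytes to a number of bits."""
    return size_kib * 1024 * 8


# (configured size in kB, "MEM bits" value listed in the table)
LISTED_MEMORIES = {
    "BOOT ROM": (4, 32768),
    "DMEM": (8, 65536),
    "IMEM": (16, 131072),
}

if __name__ == "__main__":
    for name, (size_kib, listed_bits) in LISTED_MEMORIES.items():
        computed = kib_to_bits(size_kib)
        status = "matches" if computed == listed_bits else "differs from"
        print(f"{name}: {computed} bits {status} the listed value")
```

The same arithmetic gives 4096 bits of payload for the instruction cache (2x4 blocks of 64 bytes each), which also matches its table entry.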
1.7. CPU Performance
The performance of the NEORV32 was tested and evaluated using the CoreMark CPU benchmark.
The according sources can be found in the sw/example/coremark folder.
The CoreMark score is the number of CoreMark iterations per second; dividing it by the clock frequency in MHz yields the normalized CoreMarks/MHz figure.
HW version: | |
Hardware: | 32kB int. IMEM, 16kB int. DMEM, no caches, 100MHz clock |
CoreMark: | 2000 iterations, MEM_METHOD is MEM_STACK |
Compiler: | RISCV32-GCC 10.2.0 (compiled with |
Compiler flags: | default but with |
CPU | CoreMark Score | CoreMarks/MHz | Average CPI |
---|---|---|---|
small ( | 33.89 | 0.3389 | 4.04 |
medium ( | 62.50 | 0.6250 | 5.34 |
performance ( | 95.23 | 0.9523 | 3.54 |
The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of several consecutive micro operations. The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on the available CPU extensions. More information regarding the execution time of each implemented instruction can be found in section Instruction Sets and Extensions.
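The relation between the two score columns of the CoreMark table can be reproduced with a few lines of Python. This is a minimal sketch: the function name `coremarks_per_mhz` is an assumption, while the scores and the 100 MHz clock are taken from the table and the hardware description above.

```python
CLOCK_MHZ = 100  # clock frequency of the benchmark setup

# CoreMark scores (iterations per second) as listed in the table
SCORES = {"small": 33.89, "medium": 62.50, "performance": 95.23}


def coremarks_per_mhz(score: float, clock_mhz: float) -> float:
    """Normalize a CoreMark score to the clock frequency in MHz."""
    return score / clock_mhz


if __name__ == "__main__":
    for config, score in SCORES.items():
        print(f"{config}: {coremarks_per_mhz(score, CLOCK_MHZ):.4f} CoreMarks/MHz")
```

Normalizing to the clock frequency makes the figures comparable across setups that run at different clock speeds.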
2. NEORV32 Processor (SoC)
The NEORV32 Processor is based on the NEORV32 CPU. Together with common peripheral interfaces and embedded memories it provides a RISC-V-based full-scale microcontroller-like SoC platform.
Key Features
-
optional processor-internal data and instruction memories (DMEM/IMEM)
-
optional caches (I-CACHE, D-CACHE, XIP-CACHE, XBUS-CACHE)
-
optional internal bootloader (BOOTROM) with UART console & SPI flash boot option
-
optional machine system timer (MTIME), RISC-V-compatible
-
optional two independent universal asynchronous receivers and transmitters (UART0, UART1) with optional hardware flow control (RTS/CTS)
-
optional serial peripheral interface host controller (SPI) with 8 dedicated CS lines
-
optional 8-bit serial data device interface (SDI)
-
optional two wire serial interface controller (TWI), compatible to the I²C standard
-
optional general purpose parallel IO port (GPIO), 64xOut, 64xIn
-
optional 32-bit external bus interface, Wishbone b4 / AXI4-Lite compatible (XBUS)
-
optional watchdog timer (WDT)
-
optional PWM controller with up to 16 individual channels (PWM)
-
optional ring-oscillator-based true random number generator (TRNG)
-
optional custom functions subsystem for custom co-processor extensions (CFS)
-
optional NeoPixel™/WS2812-compatible smart LED interface (NEOLED)
-
optional external interrupt controller with up to 32 channels and programmable interrupt triggers (XIRQ)
-
optional general purpose 32-bit timer (GPTMR)
-
optional execute in-place module (XIP)
-
optional 1-wire serial interface controller (ONEWIRE), compatible to the 1-wire standard
-
optional autonomous direct memory access controller (DMA)
-
optional stream link interface (SLINK), AXI4-Stream compatible
-
optional cyclic redundancy check unit (CRC)
-
optional on-chip debugger with JTAG TAP (OCD)
-
optional system configuration information memory to determine hardware configuration via software (SYSINFO)
2.1. Processor Top Entity - Signals
The following table shows all interface signals of the processor top entity (rtl/core/neorv32_top.vhd).
All signals are of type std_ulogic or std_ulogic_vector, respectively.
Default Values of Inputs
All optional input signals provide default values in case they are not explicitly assigned during instantiation.
The weak driver strengths of VHDL ('L' and 'H') are used to model a pull-down or pull-up resistor.
|
Variable-Sized Ports
Some peripherals allow configuring the number of implemented channels by a generic (for example the number
of PWM channels). The according input/output signals have a fixed size regardless of the actually configured
number of channels. If fewer than the maximum number of channels are configured, only the LSB-aligned channels are used:
in case of an input port the remaining bits/channels are left unconnected; in case of an output port the remaining
bits/channels are hardwired to zero.
|
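The LSB-aligned channel mapping described above can be modeled in software as follows. This is only a behavioral sketch and not the actual hardware description: the function names are hypothetical, and the default port width of 16 is an assumption taken from the PWM output port in the signal table.

```python
from typing import Sequence


def pack_output_port(channel_levels: Sequence[int], port_width: int = 16) -> int:
    """Pack the configured channels LSB-aligned into a fixed-size output port.
    Bits above the configured channel count stay zero."""
    if len(channel_levels) > port_width:
        raise ValueError("more channels configured than port bits available")
    value = 0
    for index, level in enumerate(channel_levels):
        if level:
            value |= 1 << index  # channel 0 maps to bit 0 (LSB)
    return value


def used_input_bits(port_value: int, num_channels: int) -> int:
    """Keep only the LSB-aligned input bits that belong to configured channels."""
    return port_value & ((1 << num_channels) - 1)


if __name__ == "__main__":
    # Four configured channels on a 16-bit port: upper 12 bits remain zero.
    print(f"{pack_output_port([1, 0, 1, 1]):016b}")
    print(f"{used_input_bits(0xFFFF, 4):016b}")
```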
Tri-State Interfaces
Some interfaces (like the TWI and the 1-Wire bus) require explicit tri-state drivers in the final top module.
|
Input/Output Registers
By default all output signals are driven by registers and all input signals are synchronized into the processor’s
clock domain also using registers. However, for ASIC implementations it is recommended to add another register stage
to all inputs and outputs so the synthesis tool can insert an explicit IO (boundary) scan chain.
|
Name | Width | Direction | Default | Description |
---|---|---|---|---|
Global Control (Processor Clocking and Processor Reset) | | | | |
| | 1 | in | none | global clock line, all registers triggering on rising edge |
| | 1 | in | none | global reset, asynchronous, low-active |
JTAG Access Port for On-Chip Debugger (OCD) | | | | |
| | 1 | in | | serial clock |
| | 1 | in | | serial data input |
| | 1 | out | - | serial data output |
| | 1 | in | | mode select |
| | 32 | out | - | destination address |
| | 32 | out | - | read data |
| | 3 | out | - | access tag |
| | 1 | out | - | write enable ('0' = read transfer) |
| | 4 | out | - | byte enable |
| | 1 | out | - | strobe |
| | 1 | out | - | valid cycle |
| | 32 | in | | write data |
| | 1 | in | | transfer acknowledge |
| | 1 | in | | transfer error |
| | 32 | in | | RX data |
| | 4 | in | | RX source routing information |
| | 1 | in | | RX data valid |
| | 1 | in | | RX last element of stream |
| | 1 | out | - | RX ready to receive |
| | 32 | out | - | TX data |
| | 4 | out | - | TX destination routing information |
| | 1 | out | - | TX data valid |
| | 1 | out | - | TX last element of stream |
| | 1 | in | | TX allowed to send |
| | 1 | out | - | chip select, low-active |
| | 1 | out | - | serial clock |
| | 1 | in | | serial data input |
| | 1 | out | - | serial data output |
| | 64 | out | - | general purpose parallel output |
| | 64 | in | | general purpose parallel input |
Primary Universal Asynchronous Receiver and Transmitter (UART0) | | | | |
| | 1 | out | - | serial transmitter |
| | 1 | in | | serial receiver |
| | 1 | out | - | RX ready to receive new char |
| | 1 | in | | TX allowed to start sending, low-active |
Secondary Universal Asynchronous Receiver and Transmitter (UART1) | | | | |
| | 1 | out | - | serial transmitter |
| | 1 | in | | serial receiver |
| | 1 | out | - | RX ready to receive new char |
| | 1 | in | | TX allowed to start sending, low-active |
| | 1 | out | - | controller clock line |
| | 1 | out | - | serial data output |
| | 1 | in | | serial data input |
| | 8 | out | - | select (low-active) |
| | 1 | in | | controller clock line |
| | 1 | out | - | serial data output |
| | 1 | in | | serial data input |
| | 1 | in | | chip select, low-active |
| | 1 | in | | serial data line sense input |
| | 1 | out | - | serial data line output (pull low only) |
| | 1 | in | | serial clock line sense input |
| | 1 | out | - | serial clock line output (pull low only) |
| | 1 | in | | 1-wire bus sense input |
| | 1 | out | - | 1-wire bus output (pull low only) |
| | 16 | out | - | pulse-width modulated channels |
| | 32 | in | | custom CFS input signal conduit |
| | 32 | out | - | custom CFS output signal conduit |
| | 1 | out | - | asynchronous serial data output |
| | 64 | out | - | MTIME system time output |
| | 32 | in | | external interrupt requests |
RISC-V Machine-Mode Processor Interrupts | | | | |
| | 1 | in | | machine timer interrupt (RISC-V), high-level-active; for chip-internal usage only |
| | 1 | in | | machine software interrupt (RISC-V), high-level-active; for chip-internal usage only |
| | 1 | in | | machine external interrupt (RISC-V), high-level-active; for chip-internal usage only |
2.2. Processor Top Entity - Generics
This section lists all configuration generics of the NEORV32 processor top entity (rtl/core/neorv32_top.vhd).
These generics allow configuring the system according to your needs. They are
used to control the implementation of certain CPU extensions and peripheral modules and even allow
optimizing the system for certain design goals like minimal area or maximum performance.
Default Values
All optional configuration generics provide default values in case they are not explicitly assigned during instantiation.
|
Software Discovery of Configuration
Software can determine the actual CPU configuration via the misa and mxisa CSRs. The SoC/Processor
configuration can be determined via the SYSINFO memory-mapped registers.
|
Excluded Modules and Extensions
If optional modules (like CPU extensions or peripheral devices) are not enabled, the corresponding hardware
will not be synthesized at all. Hence, the disabled modules do not increase area and power requirements
and do not impact timing.
|
Table Abbreviations
The generic type “suv(x:y)” is an abbreviation for “std_ulogic_vector(x downto y)”.
|
Name | Type | Default | Description |
---|---|---|---|
|
natural |
0 |
The clock frequency of the processor’s |
|
boolean |
false |
Enable clock gating when CPU is in sleep mode (see sections Sleep Mode and Processor Clocking). |
Core Identification |
|||
|
suv(31:0) |
x"00000000" |
The hart thread ID of the CPU (passed to |
|
suv(10:0) |
"00000000000" |
JEDEC ID; continuation codes plus vendor ID (passed to |
|
natural |
0 |
Boot mode select; see Boot Configuration. |
|
suv(31:0) |
x"00000000" |
Custom CPU boot address (available if |
|
boolean |
false |
Implement the on-chip debugger and the CPU debug mode. |
|
boolean |
false |
Implement Debug Authentication module. |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
true |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable |
|
boolean |
false |
Enable NEORV32-specific |
CPU Architecture Tuning Options |
|||
|
boolean |
false |
Implement fast but large full-parallel multipliers (trying to infer DSP blocks); see section CPU Arithmetic Logic Unit. |
|
boolean |
false |
Implement fast but large full-parallel barrel shifters; see section CPU Arithmetic Logic Unit. |
|
boolean |
false |
Implement full hardware reset for register file (use individual FFs instead of BRAM); see section CPU Register File. |
Physical Memory Protection ( |
|||
|
natural |
0 |
Number of implemented PMP regions (0..16). |
|
natural |
4 |
Minimal region granularity in bytes. Has to be a power of two, min 4. |
|
boolean |
true |
Implement support for top-of-region (TOR) mode. |
|
boolean |
true |
Implement support for naturally-aligned power-of-two (NAPOT & NA4) modes. |
Hardware Performance Monitors ( |
|||
|
natural |
0 |
Number of implemented hardware performance monitor counters (0..13). |
|
natural |
40 |
Total LSB-aligned size of each HPM counter. Min 0, max 64. |
Internal Instruction Memory (IMEM) |
|||
|
boolean |
false |
Implement the processor-internal instruction memory. |
|
natural |
16*1024 |
Size in bytes of the processor internal instruction memory (use a power of 2). |
Internal Data Memory (DMEM) |
|||
|
boolean |
false |
Implement the processor-internal data memory. |
|
natural |
8*1024 |
Size in bytes of the processor-internal data memory (use a power of 2). |
|
boolean |
false |
Implement the instruction cache. |
|
natural |
4 |
Number of blocks ("lines"). Has to be a power of two. |
|
natural |
64 |
Size in bytes of each block. Has to be a power of two. |
|
boolean |
false |
Implement the data cache. |
|
natural |
4 |
Number of blocks ("lines"). Has to be a power of two. |
|
natural |
64 |
Size in bytes of each block. Has to be a power of two. |
Processor-External Bus Interface (XBUS) (Wishbone b4 protocol) |
|||
|
boolean |
false |
Implement the external bus interface. |
|
natural |
255 |
Clock cycles after which a pending external bus access will auto-terminate and raise a bus fault exception. |
|
boolean |
false |
Implement XBUS register stages to ease timing closure. |
|
boolean |
false |
Implement the external bus cache. |
|
natural |
64 |
Number of blocks ("lines"). Has to be a power of two. |
|
natural |
32 |
Size in bytes of each block. Has to be a power of two. |
|
boolean |
false |
Implement the execute in-place module. |
|
boolean |
false |
Implement XIP cache. |
|
natural |
8 |
Number of blocks in XIP cache. Has to be a power of two. |
|
natural |
256 |
Number of bytes per XIP cache block. Has to be a power of two, min 4. |
|
natural |
0 |
Number of channels of the external interrupt controller. Valid values are 0..32. |
Peripheral/IO Modules |
|||
|
boolean |
false |
Disable System Configuration Information Memory (SYSINFO) module; ⚠️ not recommended - for advanced users only! |
|
natural |
0 |
Number of general purpose input/output pairs of the General Purpose Input and Output Port (GPIO). |
|
boolean |
false |
Implement the Machine System Timer (MTIME). |
|
boolean |
false |
Implement the Primary Universal Asynchronous Receiver and Transmitter (UART0). |
|
natural |
1 |
UART0 RX FIFO depth, has to be a power of two, minimum value is 1, max 32768. |
|
natural |
1 |
UART0 TX FIFO depth, has to be a power of two, minimum value is 1, max 32768. |
|
boolean |
false |
Implement the Secondary Universal Asynchronous Receiver and Transmitter (UART1). |
|
natural |
1 |
UART1 RX FIFO depth, has to be a power of two, minimum value is 1, max 32768. |
|
natural |
1 |
UART1 TX FIFO depth, has to be a power of two, minimum value is 1, max 32768. |
|
boolean |
false |
Implement the Serial Peripheral Interface Controller (SPI). |
|
natural |
1 |
Depth of the Serial Peripheral Interface Controller (SPI) FIFO. Has to be a power of two, min 1, max 32768. |
|
boolean |
false |
Implement the Serial Data Interface Controller (SDI). |
|
natural |
1 |
Depth of the Serial Data Interface Controller (SDI) FIFO. Has to be a power of two, min 1, max 32768. |
|
boolean |
false |
Implement the Two-Wire Serial Interface Controller (TWI). |
|
natural |
1 |
Depth of the Two-Wire Serial Interface Controller (TWI) FIFO. Has to be a power of two, min 1, max 32768. |
|
natural |
0 |
Number of channels of the Pulse-Width Modulation Controller (PWM) to implement (0..16). |
|
boolean |
false |
Implement the Watchdog Timer (WDT). |
|
boolean |
false |
Implement the True Random-Number Generator (TRNG). |
|
natural |
1 |
Depth of the TRNG data FIFO. Has to be a power of two, min 1, max 32768. |
|
boolean |
false |
Implement the Custom Functions Subsystem (CFS). |
|
suv(31:0) |
x"00000000" |
"Conduit" generic to pass user-defined flags to the Custom Functions Subsystem (CFS). |
|
natural |
32 |
Size of the Custom Functions Subsystem (CFS) input signal conduit ( |
|
natural |
32 |
Size of the Custom Functions Subsystem (CFS) output signal conduit ( |
|
boolean |
false |
Implement the Smart LED Interface (NEOLED). |
|
natural |
1 |
TX FIFO depth of the Smart LED Interface (NEOLED). Has to be a power of two, min 1, max 32768. |
|
boolean |
false |
Implement the General Purpose Timer (GPTMR). |
|
boolean |
false |
Implement the One-Wire Serial Interface Controller (ONEWIRE). |
|
boolean |
false |
Implement the Direct Memory Access Controller (DMA). |
|
boolean |
false |
Implement the Stream Link Interface (SLINK). |
|
natural |
1 |
SLINK RX FIFO depth, has to be a power of two, minimum value is 1, max 32768. |
|
natural |
1 |
SLINK TX FIFO depth, has to be a power of two, minimum value is 1, max 32768. |
|
boolean |
false |
Implement the Cyclic Redundancy Check (CRC) unit. |
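Several generics in the table above share the same value constraints (power of two, bounded range). The following Python sketch shows how such constraints could be checked on the host before synthesis. It is only an illustration: the function names are hypothetical and not part of the NEORV32 framework, and only the limits stated in the table are used.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; False for 0, negative and all other values."""
    return n > 0 and (n & (n - 1)) == 0


def fifo_depth_ok(depth: int) -> bool:
    """FIFO depth rule from the table: power of two, min 1, max 32768."""
    return is_power_of_two(depth) and 1 <= depth <= 32768


def pmp_granularity_ok(granularity: int) -> bool:
    """PMP granularity rule from the table: power of two, at least 4 bytes."""
    return is_power_of_two(granularity) and granularity >= 4


def pmp_regions_ok(regions: int) -> bool:
    """Number of PMP regions as listed in the table: 0..16."""
    return 0 <= regions <= 16
```

For example, `fifo_depth_ok(64)` accepts a 64-entry FIFO while `fifo_depth_ok(48)` rejects a depth that is not a power of two.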
2.3. Processor Clocking
The processor is implemented as a fully synchronous logic design using a single clock domain that is driven entirely
by the top’s clk_i
signal. This clock signal is used by all internal registers and memories. All of them trigger
on the rising edge of this clock signal - the only exception is the default Clock Gating module. External
"clocks" like the OCD’s JTAG clock or the SDI’s serial clock are synchronized into the processor’s clock domain
before being used as "general logic signals" (and not as dedicated clocks).
2.3.1. Clock Gating
The single clock domain of the processor can be split into an always-on clock domain and a switchable clock domain. The switchable clock domain is used to clock the CPU core, the CPU’s bus switch and - if implemented - the caches. This domain can be deactivated to reduce power consumption. The always-on clock domain is used to clock all other processor modules like peripherals, memories and IO devices. Hence, these modules can continue operation (e.g. a timer keeps running) even if the CPU is shut down.
The split into two clock domains is enabled by the CLOCK_GATING_EN
generic (Processor Top Entity - Generics).
When enabled, a generic clock switching gate is added to decouple the switchable clock from the always-on clock domain
(VHDL file neorv32_clockgate.vhd
). Whenever the CPU enters Sleep Mode the CPU clock domain is shut down.
Clock Switch Hardware
By default, a generic clock gate is used (rtl/core/neorv32_clockgate.vhd ) to shut down the CPU clock.
Especially for FPGA setups it is highly recommended to replace this default version by a technology-specific primitive
or macro wrapper to improve efficiency (clock skew, global clock tree usage, etc.).
|
2.3.2. Peripheral Clocks
Many processor modules like the UARTs or the timers provide a programmable time base for operations. In order to simplify
the hardware, the processor implements a global "clock generator" (neorv32_sys.vhd
) that provides single-cycle clock enables
for certain frequencies which are derived from the main clock. These clock enable signals are synchronous to the system’s
main clock. The processor modules can use these enables for sub-main-clock operations while still providing a single
clock domain only.
In total, 8 sub-main-clock signals are available. All processor modules that feature a time-based configuration provide a
programmable three-bit prescaler select in their control register to select one of the 8 available clocks. The
mapping of the prescaler select bits to the corresponding clock source is shown in the table below. Here, f represents the
processor main clock from the top entity’s clk_i
signal.
Prescaler bits: | | | | | | | | |
Resulting clock: | f/2 | f/4 | f/8 | f/64 | f/128 | f/1024 | f/2048 | f/4096 |
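The effect of the prescaler select can be sketched in a few lines of Python. This is a host-side illustration only. It assumes that the select values 0 to 7 map to the resulting clocks in the order in which the table lists them, and the function name is hypothetical.

```python
# Divisors in the order given by the table above (f/2 ... f/4096).
# Assumption: prescaler select value 0 picks the first entry, 7 the last.
PRESCALER_DIVISORS = (2, 4, 8, 64, 128, 1024, 2048, 4096)


def prescaled_clock_hz(f_main_hz: float, select: int) -> float:
    """Rate of the clock-enable pulses for a 3-bit prescaler select value."""
    if not 0 <= select <= 7:
        raise ValueError("prescaler select is a three-bit value (0..7)")
    return f_main_hz / PRESCALER_DIVISORS[select]
```

With a 100 MHz main clock, select value 0 would yield 50 MHz enables under this assumption, and select value 7 roughly 24.4 kHz.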
Power Saving
If no peripheral module requires a clock signal from the internal clock generator (all such modules are disabled by
clearing the enable bit in the respective module’s control register) the generator is automatically deactivated to reduce
dynamic power consumption.
|
2.4. Processor Reset
The NEORV32 processor includes a central reset sequencer (neorv32_sys.vhd
) that handles all reset requests
and controls the internal reset nets. The processor-wide reset (aka "system reset") can be triggered by any
of the following sources:
-
the asynchronous low-active rstn_i top entity input signal (external source)
-
the On-Chip Debugger (OCD) (internal source)
-
the Watchdog Timer (WDT) (internal source)
Processor Reset Signal
Make sure to connect the processor’s reset signal rstn_i to a valid reset source (a button, the "locked"
signal of a PLL, a dedicated reset controller, etc.).
|
Reset Cause
The actual reset cause can be determined via the Watchdog Timer (WDT).
|
If any of these sources triggers a reset, the internal system-wide reset will be active for at least 4 clock cycles ensuring
a valid reset of the entire processor. This system reset is asserted asynchronously if triggered by the external
rstn_i
signal and is asserted synchronously if triggered by an internal reset source. However, the system reset is
always de-asserted synchronously at the next rising clock edge.
Internally, all registers that are not meant for mapping to block RAM (like the register file) provide a dedicated low-active asynchronous hardware reset. This asynchronous reset ensures that the entire processor logic is reset to a defined state even if the main clock is not operational yet.
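The assert/de-assert behavior described above can be illustrated with a small behavioral model. The sketch below assumes a 4-stage shift register as the mechanism that stretches the reset. This is a common design pattern chosen for illustration, not a statement about the actual contents of neorv32_sys.vhd, and the class and method names are hypothetical.

```python
class ResetSequencer:
    """Behavioral sketch: async assert, sync release after 4 rising edges."""

    STAGES = 4  # "active for at least 4 clock cycles"

    def __init__(self) -> None:
        # Power-up state: all stages cleared, system is held in reset.
        self.stages = [0] * self.STAGES

    @property
    def sys_rstn(self) -> int:
        """Low-active system reset: 1 only when every stage has been filled."""
        return int(all(self.stages))

    def assert_external(self) -> None:
        """rstn_i pulled low: takes effect immediately, no clock edge needed."""
        self.stages = [0] * self.STAGES

    def clock(self, internal_request: bool = False) -> None:
        """One rising clock edge; internal sources (OCD, WDT) are sampled here."""
        if internal_request:
            self.stages = [0] * self.STAGES
        else:
            self.stages = self.stages[1:] + [1]
```

In this model the reset is released by the fourth rising edge after the last reset request, which matches the "asserted asynchronously or synchronously, always de-asserted synchronously" description.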
2.5. Processor Interrupts
The NEORV32 Processor provides several interrupt request signals (IRQs) for custom platform use.
Trigger Type
All interrupt request lines are level-triggered and high-active. Once set, the signal should remain high until
the interrupt request is explicitly acknowledged (e.g. writing to a memory-mapped register).
|
2.5.1. RISC-V Standard Interrupts
The processor setup features the standard machine-level RISC-V interrupt lines for "machine timer interrupt", "machine software interrupt" and "machine external interrupt". Their usage is defined by the RISC-V privileged architecture specifications. However, bare-metal systems can also repurpose these interrupts. See CPU section Traps, Exceptions and Interrupts for more information.
Top signal | Description |
---|---|
|
Machine timer interrupt from processor-external MTIME unit ( |
|
Machine software interrupt ( |
|
Machine external interrupt ( |
2.5.2. NEORV32-Specific Fast Interrupt Requests
As part of the NEORV32-specific CPU extensions, the processor core features 16 fast interrupt request signals
(FIRQ0
to FIRQ15
) providing dedicated bits in the mip
and mie
CSRs and custom mcause
trap codes.
The FIRQ signals are reserved for processor-internal modules only (for example for the communication
interfaces to signal "available incoming data" or "ready to send new data").
The mapping of the 16 FIRQ channels to the according processor-internal modules is shown in the following table (the channel number also corresponds to the according FIRQ priority: 0 = highest, 15 = lowest):
Channel | Source | Description |
---|---|---|
0 | | TRNG data available interrupt |
1 | | Custom functions subsystem (CFS) interrupt (user-defined) |
2 | | UART0 RX FIFO level interrupt |
3 | | UART0 TX FIFO level interrupt |
4 | | UART1 RX FIFO level interrupt |
5 | | UART1 TX FIFO level interrupt |
6 | | SPI FIFO level interrupt |
7 | | TWI FIFO level interrupt |
8 | | External interrupt controller interrupt |
9 | | NEOLED TX FIFO level interrupt |
10 | | DMA transfer done interrupt |
11 | | SDI FIFO level interrupt |
12 | | General purpose timer interrupt |
13 | | 1-wire idle interrupt |
14 | | SLINK RX FIFO level interrupt |
15 | | SLINK TX FIFO level interrupt |
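The priority rule (channel 0 = highest, channel 15 = lowest) can be expressed as a short Python sketch. It is a simplified model: bit n of each mask stands for FIRQ channel n, the actual bit positions inside the mip and mie CSRs are not modeled, and the function name is hypothetical.

```python
from typing import Optional

NUM_FIRQ = 16  # FIRQ0 .. FIRQ15


def next_firq(pending: int, enabled: int) -> Optional[int]:
    """Return the FIRQ channel that would be served next, or None.

    Simplification for this sketch: bit n of both masks represents channel n.
    """
    active = pending & enabled & ((1 << NUM_FIRQ) - 1)
    if active == 0:
        return None
    # Isolate the lowest set bit: the lowest channel number wins.
    return (active & -active).bit_length() - 1
```

If, for example, the UART0 RX channel (2) and the DMA channel (10) are pending and both are enabled, the model selects channel 2 first.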
2.6. Address Space
As a 32-bit architecture the NEORV32 can access a 4GB physical address space. By default, this address space is
split into six main regions. Each region provides specific physical memory attributes ("PMAs") that define
the access capabilities (rwxac; r = read permission, w = write permission, x = execute permission,
a = atomic access support, c = cached CPU access, p = privileged access only).
The "Void" (Unmapped Addresses)
All accesses to "unmapped" addresses (= "void") are redirected to the Processor-External Bus Interface (XBUS).
For example, if the internal IMEM is disabled, the accesses to the entire address space between 0x00000000 and
0x7FFFFFFF are converted into XBUS requests. If the XBUS interface is not enabled any access to the void will
raise a bus error exception.
|
# | Region | PMAs | Description |
---|---|---|---|
1 | Internal IMEM address space | | For instructions (=code) and constants; mapped to the internal Instruction Memory (IMEM). |
2 | Internal DMEM address space | | For application runtime data (heap, stack, etc.); mapped to the internal Data Memory (DMEM). |
3 | Memory-mapped XIP flash | | Memory-mapped access to the Execute In Place Module (XIP) SPI flash. |
4 | Bootloader address space | | Read-only memory for the internal Bootloader ROM (BOOTROM) containing the default Bootloader. |
5 | IO/peripheral address space | | Processor-internal peripherals / IO devices. |
6 | The "void" | | Unmapped address space. All accesses to this region are redirected to the Processor-External Bus Interface (XBUS) (if implemented). |
Privileged IO and BOOTROM Access Only
Only privileged accesses (M-mode) to the IO/peripheral and bootloader address spaces are allowed.
If an unprivileged application tries to access this address space a bus access error exception is raised.
|
Custom PMAs
Custom physical memory attributes enforced by the CPU’s physical memory protection (Smpmp ISA Extension)
can be used to further constrain the physical memory attributes.
|
2.6.1. Bus System
The CPU can access all of the 32-bit address space from the instruction fetch interface and also from the data access interface. Both CPU interfaces can be equipped with optional caches (Processor-Internal Data Cache (dCACHE) and Processor-Internal Instruction Cache (iCACHE)). The two CPU interfaces are multiplexed by a simple bus switch into a single processor-internal bus. Optionally, this bus is further switched by another instance of the bus switch so the Direct Memory Access Controller (DMA) can also access the entire address space. Accesses via the resulting SoC bus are split by the Bus Gateway that redirects accesses to the corresponding main address regions (see table above). Accesses to the processor-internal IO/peripheral devices are further redirected via a dedicated IO Switch.
Bus System Infrastructure
The components of the processor’s bus system infrastructure are located in rtl/core/neorv32_bus.vhd .
|
Bus Interface
See sections CPU Architecture and Bus Interface for more information regarding the CPU bus accesses.
|
2.6.2. Bus Gateway
The central bus gateway serves two purposes: redirect core accesses to the corresponding modules (e.g. memory accesses
vs. memory-mapped IO accesses) and monitor all bus transactions. The redirection of access requests is based on a
customizable memory map implemented via VHDL constants in the main package file (rtl/core/neorv32_package.vhd
):
-- Main Address Regions ---
constant mem_imem_base_c : std_ulogic_vector(31 downto 0) := x"00000000";
constant mem_dmem_base_c : std_ulogic_vector(31 downto 0) := x"80000000";
constant mem_xip_base_c : std_ulogic_vector(31 downto 0) := x"e0000000";
constant mem_xip_size_c : natural := 256*1024*1024;
constant mem_boot_base_c : std_ulogic_vector(31 downto 0) := x"ffffc000";
constant mem_boot_size_c : natural := 8*1024;
constant mem_io_base_c : std_ulogic_vector(31 downto 0) := x"ffffe000";
constant mem_io_size_c : natural := 8*1024;
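The region decoding implied by these constants can be sketched as a host-side Python model. It is an illustration under stated assumptions: each region is treated as the half-open range from base to base + size, the IMEM and DMEM sizes come from the corresponding generics (the table defaults of 16 kB and 8 kB are used here), and the enable flags default to true purely for the example. The function and parameter names are hypothetical.

```python
# Values copied from the VHDL constants above.
MEM_IMEM_BASE = 0x00000000
MEM_DMEM_BASE = 0x80000000
MEM_XIP_BASE = 0xE0000000
MEM_XIP_SIZE = 256 * 1024 * 1024
MEM_BOOT_BASE = 0xFFFFC000
MEM_BOOT_SIZE = 8 * 1024
MEM_IO_BASE = 0xFFFFE000
MEM_IO_SIZE = 8 * 1024


def decode_region(addr: int, imem_size: int = 16 * 1024, dmem_size: int = 8 * 1024,
                  imem_en: bool = True, dmem_en: bool = True,
                  xip_en: bool = True, bootrom_en: bool = True) -> str:
    """Map a 32-bit address to a main region name; unmapped means "void"."""
    addr &= 0xFFFFFFFF
    regions = (
        ("imem", MEM_IMEM_BASE, imem_size, imem_en),
        ("dmem", MEM_DMEM_BASE, dmem_size, dmem_en),
        ("xip", MEM_XIP_BASE, MEM_XIP_SIZE, xip_en),
        ("bootrom", MEM_BOOT_BASE, MEM_BOOT_SIZE, bootrom_en),
        ("io", MEM_IO_BASE, MEM_IO_SIZE, True),
    )
    for name, base, size, enabled in regions:
        if enabled and base <= addr < base + size:
            return name
    # Everything else is redirected to XBUS (if implemented).
    return "void"
```

With the IMEM disabled, `decode_region(0x00000000, imem_en=False)` returns `"void"`, which mirrors the statement that such accesses are turned into XBUS requests.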
Besides the delegation of bus requests the gateway also implements a bus monitor (aka "the bus keeper") that tracks all active bus transactions to ensure safe and deterministic operations.
Whenever a memory-mapped device is accessed (a real memory, a memory-mapped IO or some processor-external module) the bus
monitor starts an internal timer. The accessed module has to respond ("ACK") to the bus request within a specific
time window. This time window is defined by a global constant in the processor’s VHDL package file
(rtl/core/neorv32_package.vhd
).
constant bus_timeout_c : natural := 15;
This constant defines the maximum number of cycles after which a non-responding bus request (i.e. no ack
and no err
signal) will time out, raising a bus access fault exception. For example, this can happen when accessing
"address space holes" - addresses that are not mapped to any physical module. The resulting exception type corresponds
to the access type, i.e. instruction fetch access exception, load access exception or store access exception.
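The timeout behavior can be modeled with a simple cycle counter. The sketch below is a host-side illustration only: the names are hypothetical, the err response path is left out, and it assumes that a response arriving exactly in the last allowed cycle is still accepted (the exact boundary is not stated in this section).

```python
from typing import Optional

BUS_TIMEOUT = 15  # value of bus_timeout_c shown above

ACCESS_FAULT = {
    "fetch": "instruction fetch access exception",
    "load": "load access exception",
    "store": "store access exception",
}


def bus_access(access: str, response_delay: Optional[int],
               timeout: int = BUS_TIMEOUT) -> str:
    """Return "ack" or the fault raised when the device does not answer in time.

    response_delay is the cycle in which the device acknowledges (1 = first
    cycle); None models an address hole that never responds.
    """
    for cycle in range(1, timeout + 1):
        if response_delay is not None and cycle >= response_delay:
            return "ack"
    return ACCESS_FAULT[access]
```

A load from an unmapped address, `bus_access("load", None)`, ends in a load access exception once the timer expires.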
XIP Timeout
Accesses to the memory-mapped XIP flash (via the Execute In Place Module (XIP)) will never time out.
|
External Bus Interface Timeout
Accesses that are delegated to the external bus interface have a different maximum timeout value that is defined by a
dedicated processor generic. See section Processor-External Bus Interface (XBUS) for more information.
|
2.6.3. Reservation Set Controller
The reservation set controller is responsible for handling the load-reserved and store-conditional bus transactions that
are triggered by the lr.w
(LR) and sc.w
(SC) instructions from the CPU’s Zalrsc
ISA Extension.
A "reservation" defines an address or address range that provides a guarding mechanism to support atomic accesses. A new
reservation is registered by the LR instruction. The address provided by this instruction defines the memory location
that is now monitored for atomic accesses. The corresponding SC instruction evaluates the state of this reservation. If
the reservation is still valid the write access triggered by the SC instruction is finally executed and the instruction
returns a "success" state (rd
= 0). If the reservation has been invalidated the SC instruction will not write to memory
and will return a "failed" state (rd
= 1).
Reservation Set(s) and Granule
The reservation set controller supports only a single global reservation set with a word-aligned 4-byte granule.
|
The reservation is invalidated if…
-
an SC instruction is executed that accesses an address outside of the reservation set of the previous LR instruction. This SC instruction will fail (not writing to memory).
-
an SC instruction is executed that accesses an address inside of the reservation set of the previous LR instruction. This SC instruction will succeed (finally writing to memory).
-
a normal store operation accesses an address inside of the current reservation set (by the CPU or by the DMA).
-
a hardware reset is triggered.
Consecutive LR Instructions
If an LR instruction is followed by another LR instruction the reservation set of the former one is overridden
by the reservation set of the latter one.
|
Bus Access Errors
If the LR operation causes a bus access error (raising a load access exception) the reservation is registered anyway.
If the SC operation causes a bus access error (raising a store access exception) an already registered reservation set
is invalidated anyway.
|
Strong Semantic
The LR/SC mechanism follows the strong semantic approach: the LR/SC instruction pair fails only if there is a write
access to the referenced memory location between the LR and SC instructions (by the CPU itself or by the DMA).
Context changes, interrupts, traps, etc. do not affect or invalidate the reservation state at all.
|
Physical Memory Attributes
The reservation set can be set for any address (only constrained by the configured granularity). This also
includes cached memory, memory-mapped IO devices and processor-external address spaces.
|
Bus transactions triggered by the LR instruction register a new reservation set and are delegated to the addressed memory/device. Bus transactions triggered by the SC instruction remove a reservation set and are forwarded to the addressed memory/device only if the SC operation succeeds. Otherwise, the access request is not forwarded and a local ACK is generated to terminate the bus transaction.
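The rules of this section can be condensed into a small behavioral model. The Python sketch below is an illustration, not the hardware implementation: it tracks a single global reservation with a word-aligned 4-byte granule and ignores bus errors. The class and method names are hypothetical.

```python
from typing import Optional


class ReservationSet:
    """Single global reservation with a word-aligned 4-byte granule."""

    def __init__(self) -> None:
        self.granule: Optional[int] = None  # None = no valid reservation

    @staticmethod
    def _granule(addr: int) -> int:
        # Drop the two byte-offset bits to get the word-aligned granule.
        return addr & 0xFFFFFFFC

    def lr(self, addr: int) -> None:
        """LR registers a reservation; a second LR overrides the first."""
        self.granule = self._granule(addr)

    def sc(self, addr: int) -> int:
        """SC returns 0 (success) or 1 (failed); it always ends the reservation."""
        hit = self.granule is not None and self.granule == self._granule(addr)
        self.granule = None
        return 0 if hit else 1

    def store(self, addr: int) -> None:
        """A normal store (CPU or DMA) inside the set invalidates it."""
        if self.granule == self._granule(addr):
            self.granule = None

    def reset(self) -> None:
        """A hardware reset invalidates the reservation."""
        self.granule = None
```

A typical sequence is `lr(a)` followed by `sc(a)`, which returns 0. If a store to the same word happens in between, `sc(a)` returns 1 and the caller is expected to retry.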
LR/SC Bus Protocol
More information regarding the LR/SC bus transactions and the corresponding protocol can be found in section
Bus Interface / Atomic Accesses.
|
Cache Coherency
Atomic operations always bypass the cache using direct/uncached accesses. Care must be taken
to maintain data cache coherency (e.g. by using the fence instruction).
|
2.6.4. IO Switch
The IO switch further decodes the address when accessing the processor-internal IO/peripheral devices and forwards
the access request to the corresponding module. Note that a total address space size of 256 bytes is assigned to each
IO module in order to simplify address decoding. The IO-specific address map is also defined in the main VHDL
package file (rtl/core/neorv32_package.vhd
).
-- IO Address Map --
constant iodev_size_c : natural := 256; -- size of a single IO device (bytes)
constant base_io_cfs_c : std_ulogic_vector(31 downto 0) := x"ffffeb00";
constant base_io_slink_c : std_ulogic_vector(31 downto 0) := x"ffffec00";
constant base_io_dma_c : std_ulogic_vector(31 downto 0) := x"ffffed00";
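Because every IO module owns an aligned 256-byte window, the decoding reduces to masking off the lower eight address bits. The Python sketch below illustrates this using only the three base addresses shown above (the full map contains more modules). The names are hypothetical.

```python
from typing import Optional, Tuple

IODEV_SIZE = 256  # iodev_size_c

# Only the base addresses listed in the excerpt above.
IO_BASES = {
    "cfs": 0xFFFFEB00,
    "slink": 0xFFFFEC00,
    "dma": 0xFFFFED00,
}


def decode_io(addr: int) -> Optional[Tuple[str, int]]:
    """Return (module name, byte offset inside the module) or None."""
    addr &= 0xFFFFFFFF
    window_base = addr & ~(IODEV_SIZE - 1) & 0xFFFFFFFF
    for name, base in IO_BASES.items():
        if window_base == base:
            return name, addr - base
    return None
```

For instance, `decode_io(0xFFFFED04)` yields `("dma", 4)`, i.e. the second 32-bit word inside the DMA window.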
IO Access Latency
In order to shorten the critical path of the IO system, the IO switch contains a partial register stage that
buffers the address bus. Hence, accesses to the processor-internal IO region require an additional clock cycle
to complete.
|
2.7. Boot Configuration
The NEORV32 processor provides some pre-defined boot configurations to adjust system start-up to
the requirements of the application. The actual boot configuration is defined by the BOOT_MODE_SELECT
generic (see Processor Top Entity - Generics).
BOOT_MODE_SELECT | Name | Boot address | Description |
---|---|---|---|
0 (default) | Bootloader | Base of internal BOOTROM | Implement the processor-internal Bootloader ROM (BOOTROM) as pre-initialized ROM and boot from there. |
1 | Custom Address | | Start booting at user-defined address ( |
2 | IMEM Image | Base of internal IMEM | Implement the processor-internal Instruction Memory (IMEM) as pre-initialized ROM and boot from there. |
2.7.1. Booting via Bootloader
This is the most common and thus, the default boot configuration. When selected, the processor-internal
Bootloader ROM (BOOTROM) is enabled. This ROM contains the executable image (rtl/core/neorv32_bootloader_image.vhd
)
of the default NEORV32 Bootloader that will be executed right after reset. The bootloader provides an interactive
user console for executable upload as well as an automatic boot-configuration targeting external (SPI) memories.
If the processor-internal Instruction Memory (IMEM) is enabled it will be implemented as blank RAM.
2.7.2. Boot from Custom Address
This is the most flexible boot configuration as it allows the user to specify a custom boot address via the
BOOT_ADDR_CUSTOM
generic. Note that this address has to be aligned to a 4-byte boundary. The processor will
start executing from the defined address right after reset. For example, this boot configuration can be used to
execute a custom bootloader from a memory that is attached via the Processor-External Bus Interface (XBUS).
The Bootloader ROM (BOOTROM) is not enabled / implemented at all. If the processor-internal Instruction Memory (IMEM) is enabled it will be implemented as blank RAM.
2.7.3. Boot IMEM Image
This configuration will implement the Instruction Memory (IMEM) as pre-initialized read-only memory (ROM).
The ROM is initialized during synthesis with the according application image file (rtl/core/neorv32_application_image.vhd
).
After reset, the CPU will directly start executing this image. Since the IMEM is implemented as ROM, the executable cannot
be altered at runtime at all.
The Bootloader ROM (BOOTROM) is not enabled / implemented at all.
Internal IMEM is Required
This boot configuration requires the IMEM to be enabled (MEM_INT_IMEM_EN = true).
|
Simulation Setup
This boot configuration is handy for simulations as the application software is executed right away without the
need for an explicit initialization / executable upload.
|
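The three boot configurations can be summarized as a small selection function. The Python sketch below is a host-side illustration: the base addresses are taken from the address map constants in section Bus Gateway, the alignment check reflects the 4-byte requirement of the custom boot address, and the function name is hypothetical.

```python
BOOTROM_BASE = 0xFFFFC000  # mem_boot_base_c
IMEM_BASE = 0x00000000     # mem_imem_base_c


def boot_address(boot_mode_select: int, boot_addr_custom: int = 0,
                 imem_en: bool = True) -> int:
    """Return the address the CPU starts executing from after reset."""
    if boot_mode_select == 0:
        # Bootloader mode: boot from the internal BOOTROM.
        return BOOTROM_BASE
    if boot_mode_select == 1:
        # Custom address mode: the address must be 4-byte aligned.
        if boot_addr_custom % 4 != 0:
            raise ValueError("BOOT_ADDR_CUSTOM has to be 4-byte aligned")
        return boot_addr_custom & 0xFFFFFFFF
    if boot_mode_select == 2:
        # IMEM image mode: requires the internal IMEM to be enabled.
        if not imem_en:
            raise ValueError("IMEM image boot requires MEM_INT_IMEM_EN = true")
        return IMEM_BASE
    raise ValueError("unknown BOOT_MODE_SELECT value")
```

For example, `boot_address(1, 0x90000000)` models a custom bootloader located in memory attached via XBUS.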
2.8. Processor-Internal Modules
Privileged IO Access Only
Only privileged accesses (M-mode) to the IO/peripheral modules are allowed. If an unprivileged application
tries to access this address space a bus access error exception is raised.
|
Full-Word Write Accesses Only
All peripheral/IO devices should only be written in full-word mode (i.e. 32-bit). Byte or half-word (8/16-bit) write accesses
might cause undefined behavior.
|
Writing to Read-Only Registers
Unless otherwise specified, writing to registers that are listed as read-only does not trigger an exception.
The write access is simply ignored by the corresponding hardware module.
|
IO Module’s Address Space
Each peripheral/IO module occupies an address space of 256 bytes (64 words). Most devices do not fully utilize this address
space and will simply mirror the available interface registers across the entire 256 bytes of address space.
|
Unimplemented Modules / Address Holes
When accessing an IO device that has not been implemented (disabled via the corresponding generic)
or when accessing an address that is actually unused, a load/store access fault exception is raised.
|
Module Interrupts
Several peripheral/IO devices provide some kind of interrupt. These interrupts are mapped to the CPU’s
Custom Fast Interrupt Request Lines. See section Processor Interrupts for more information.
|
CMSIS System Description View (SVD)
A CMSIS-compatible System View Description (SVD) file including all peripherals is available in sw/svd .
|
2.8.1. Instruction Memory (IMEM)
Hardware source files: |
neorv32_imem.vhd |
default platform-agnostic instruction memory (RAM or ROM) |
neorv32_application_image.vhd |
initialization image (a VHDL package) |
|
Software driver files: |
none |
implicitly used |
Top entity ports: |
none |
|
Configuration generics: |
|
implement processor-internal IMEM when |
|
IMEM size in bytes (use a power of 2) |
|
|
implement IMEM as ROM when |
|
CPU interrupts: |
none |
|
Access restrictions: |
none / read-only if |
Overview
Implementation of the processor-internal instruction memory is enabled by the processor’s
MEM_INT_IMEM_EN
generic. The total memory size in bytes is defined via the MEM_INT_IMEM_SIZE
generic.
Note that this size should be a power of two to optimize physical implementation. If enabled,
the IMEM is mapped to base address 0x00000000
(see section Address Space).
By default the IMEM is implemented as true RAM so the content can be modified during run time. This is required when using the Bootloader (or the On-Chip Debugger (OCD)) so it can update the content of the IMEM at any time.
Alternatively, the IMEM can be implemented as pre-initialized read-only memory (ROM), so the processor can
directly boot from it after reset. This option is configured via the BOOT_MODE_SELECT
generic. See section
Boot Configuration for more information. The initialization image is embedded into the bitstream during synthesis.
The software framework provides an option to generate and override the default VHDL initialization file
rtl/core/neorv32_application_image.vhd
, which is automatically inserted into the IMEM (see Makefile Targets).
If the IMEM is implemented as RAM (default), the memory block will not be initialized at all.
Platform-Specific Memory Primitives
If required, the default IMEM can be replaced by a platform-/technology-specific primitive to
optimize area utilization, timing and power consumption.
|
Memory Size
If the configured memory size (via the MEM_INT_IMEM_SIZE generic) is not a power of two the actual memory
size will be auto-adjusted to the next power of two (e.g. configuring a memory size of 60kB will result in a
physical memory size of 64kB).
|
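The auto-adjustment described in the note above amounts to rounding up to the next power of two. A minimal Python sketch (the function name is hypothetical):

```python
def physical_mem_size(configured_bytes: int) -> int:
    """Round a configured memory size up to the next power of two."""
    if configured_bytes < 1:
        raise ValueError("memory size must be at least 1 byte")
    # bit_length of (n - 1) gives the exponent of the next power of two >= n.
    return 1 << (configured_bytes - 1).bit_length()
```

Calling `physical_mem_size(60 * 1024)` reproduces the example from the note: a configured size of 60kB results in 64kB of physical memory, while sizes that already are a power of two are left unchanged.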
Legacy HDL Style
If synthesis fails to infer block RAM for the IMEM, turn on the alt_style_c option inside
the memory’s VHDL source file. When enabled, a different HDL style is used to describe the memory core.
|
Read-Only Access
If the IMEM is implemented as ROM any write attempt to it will raise a store access fault exception.
|
2.8.2. Data Memory (DMEM)
Hardware source files: |
neorv32_dmem.vhd |
default platform-agnostic data memory |
Software driver files: |
none |
implicitly used |
Top entity ports: |
none |
|
Configuration generics: |
|
implement processor-internal DMEM when |
|
DMEM size in bytes (use a power of 2) |
|
CPU interrupts: |
none |
|
Access restrictions: |
none |
Overview
Implementation of the processor-internal data memory is enabled by the processor’s MEM_INT_DMEM_EN
generic. The total memory size in bytes is defined via the MEM_INT_DMEM_SIZE
generic. Note that this
size should be a power of two to optimize physical implementation. If the DMEM is implemented,
it is mapped to base address 0x80000000
by default (see section Address Space).
The DMEM is always implemented as true RAM.
Platform-Specific Memory Primitives
If required, the default DMEM can be replaced by a platform-/technology-specific primitive to
optimize area utilization, timing and power consumption.
|
Memory Size
If the configured memory size (via the MEM_INT_DMEM_SIZE generic) is not a power of two the actual memory
size will be auto-adjusted to the next power of two (e.g. configuring a memory size of 60kB will result in a
physical memory size of 64kB).
|
Legacy HDL Style
If synthesis fails to infer block RAM for the DMEM, turn on the alt_style_c option inside
the memory’s VHDL source file. When enabled, a different HDL style is used to describe the memory core.
|
Execute from RAM
The CPU can also execute code from arbitrary data memory.
|
2.8.3. Bootloader ROM (BOOTROM)
Hardware source files: |
neorv32_boot_rom.vhd |
default platform-agnostic bootloader ROM |
neorv32_bootloader_image.vhd |
initialization image (a VHDL package) |
|
Software driver files: |
none |
implicitly used |
Top entity ports: |
none |
|
Configuration generics: |
|
implement BOOTROM when |
CPU interrupts: |
none |
|
Access restrictions: |
privileged access only, read-only |
Overview
The boot ROM contains the executable image of the default NEORV32 Bootloader. When the Boot Configuration is set to bootloader mode (0) via the BOOT_MODE_SELECT generic, the boot ROM is automatically enabled and the CPU boot address is automatically adjusted to the base address of the boot ROM.
Bootloader Image
The boot ROM is initialized during synthesis with the default bootloader image (rtl/core/neorv32_bootloader_image.vhd). Note that the BOOTROM size is constrained to 4kB.
|
2.8.4. Processor-Internal Instruction Cache (iCACHE)
Hardware source files: |
neorv32_cache.vhd |
Generic cache module |
Software driver files: |
none |
implicitly used |
Top entity ports: |
none |
|
Configuration generics: |
|
implement processor-internal instruction cache when |
|
number of cache blocks (pages/lines) |
|
|
size of a cache block in bytes |
|
CPU interrupts: |
none |
|
Access restrictions: |
none |
Overview
The processor features an optional instruction cache to improve performance when using memories with high access latencies. The cache is connected directly to the CPU’s instruction fetch interface and provides fully transparent accesses. The cache is direct-mapped and read-only.
Cached/Uncached Accesses
The cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO (like the processor-internal IO/peripheral modules). All accesses that target the address range from 0xF0000000 to 0xFFFFFFFF will not be cached at all (see section Address Space). Direct/uncached accesses have lower priority than cache block operations to allow continuous burst transfer and also to maintain logical instruction forward progress / data coherency. Furthermore, atomic load-reserved and store-conditional instructions (Zalrsc ISA Extension) will always bypass the cache.
|
Caching Internal Memories
The instruction cache is intended to accelerate instruction fetches from processor-external memories.
The CPU cache(s) should not be implemented when using only processor-internal data and instruction memories.
|
Manual Cache Clear/Reload
By executing the fence(.i) instruction the cache is cleared and a reload from main memory is triggered.
|
Retrieve Cache Configuration from Software
Software can retrieve the cache configuration/layout from the SYSINFO - Cache Configuration register.
|
Bus Access Fault Handling
The cache always loads a complete cache block (aligned to the block size) every time a
cache miss is detected. Each cached word from this block provides a single status bit that indicates if the
according bus access was successful or caused a bus error. Hence, the whole cache block remains valid even
if certain addresses inside caused a bus error. If the CPU accesses any of the faulty cache words, an
instruction bus error exception is raised.
|
2.8.5. Processor-Internal Data Cache (dCACHE)
Hardware source files: |
neorv32_cache.vhd |
Generic cache module |
Software driver files: |
none |
implicitly used |
Top entity ports: |
none |
|
Configuration generics: |
|
implement processor-internal data cache when |
|
number of cache blocks (pages/lines) |
|
|
size of a cache block in bytes |
|
CPU interrupts: |
none |
|
Access restrictions: |
none |
Overview
The processor features an optional data cache to improve performance when using memories with high access latencies. The cache is connected directly to the CPU’s data access interface and provides fully transparent accesses. The cache is direct-mapped and uses "write-allocate" and "write-back" strategies.
Cached/Uncached Accesses
The data cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO (like the
processor-internal IO/peripheral modules). All accesses that target the address range from 0xF0000000 to 0xFFFFFFFF
will not be cached at all (see section Address Space). Direct/uncached accesses have lower priority than
cache block operations to allow continuous burst transfer and also to maintain logical instruction forward
progress / data coherency. Furthermore, atomic load-reserved and store-conditional instructions (Zalrsc ISA Extension)
will always bypass the cache.
|
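As an illustration of the direct-mapped organization and the uncached address range, the following sketch shows how an address could be split into block offset, block index and tag. It assumes the textbook direct-mapped layout; the type and function names are hypothetical and the actual bit slicing inside neorv32_cache.vhd may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: where an address lands in a direct-mapped cache. */
typedef struct {
    uint32_t offset; /* byte offset inside the cache block */
    uint32_t index;  /* cache block (line) the address maps to */
    uint32_t tag;    /* remaining upper address bits */
} cache_addr_t;

/* num_blocks and block_size (in bytes) are assumed to be powers of two. */
cache_addr_t cache_split(uint32_t addr, uint32_t num_blocks, uint32_t block_size) {
    cache_addr_t a;
    a.offset = addr % block_size;
    a.index  = (addr / block_size) % num_blocks;
    a.tag    = addr / block_size / num_blocks;
    return a;
}

/* Accesses to 0xF0000000..0xFFFFFFFF are never cached. */
bool cache_is_uncached(uint32_t addr) {
    return addr >= 0xF0000000u;
}
```

Two addresses that differ by a multiple of num_blocks * block_size map to the same index and therefore evict each other, which is the characteristic trade-off of a direct-mapped cache.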
Caching Internal Memories
The data cache is intended to accelerate data access to processor-external memories.
The CPU cache(s) should not be implemented when using only processor-internal data and instruction memories.
|
Manual Cache Flush/Clear/Reload
By executing the fence(.i) instruction the cache is flushed, cleared and a reload from main memory is triggered.
|
Retrieve Cache Configuration from Software
Software can retrieve the cache configuration/layout from the SYSINFO - Cache Configuration register.
|
Bus Access Fault Handling
The cache always loads a complete cache block (aligned to the block size) every time a
cache miss is detected. Each cached word from this block provides a single status bit that indicates if the
according bus access was successful or caused a bus error. Hence, the whole cache block remains valid even
if certain addresses inside caused a bus error. If the CPU accesses any of the faulty cache words, a
data bus error exception is raised.
|
2.8.6. Direct Memory Access Controller (DMA)
Hardware source files: |
neorv32_dma.vhd |
|
Software driver files: |
neorv32_dma.c |
|
neorv32_dma.h |
||
Top entity ports: |
none |
|
Configuration generics: |
|
implement DMA when |
CPU interrupts: |
fast IRQ channel 10 |
DMA transfer done (see Processor Interrupts) |
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The NEORV32 DMA provides a small-scale scatter/gather direct memory access controller that can transfer and modify data independently of the CPU. A single read/write transfer channel is implemented that is configured via memory-mapped registers. A configured transfer can either be triggered manually or by a programmable CPU FIRQ interrupt (see NEORV32-Specific Fast Interrupt Requests).
The DMA is connected to the central processor-internal bus system (see section Address Space) and can access the same address space as the CPU core. It operates in an interleaving mode, accessing the central processor bus only while the CPU is not requesting a bus access.
The controller can handle different data quantities (e.g. read bytes and write them back as sign-extended words) and can also change the Endianness of data while transferring.
DMA Demo Program
A DMA example program can be found in sw/example/demo_dma.
|
Theory of Operation
The DMA provides four memory-mapped interface registers: A status and control register CTRL and three registers for configuring the actual DMA transfer. The base address of the source data is programmed via the SRC_BASE register. Vice versa, the base address of the destination data is programmed via the DST_BASE register. The third configuration register TTYPE is used to configure the actual transfer type and the number of elements to transfer.
The DMA is enabled by setting the DMA_CTRL_EN bit of the control register. Manual trigger mode (i.e. the DMA transfer is triggered by writing to the TTYPE register) is selected if DMA_CTRL_AUTO is cleared. Alternatively, the DMA transfer can be triggered by a processor-internal FIRQ signal if DMA_CTRL_AUTO is set (see section below).
The DMA uses a load-modify-write data transfer process. Data is read from the bus system, internally modified and then written back to the bus system. This combination is implemented as an atomic process, so canceling the current transfer by clearing the DMA_CTRL_EN bit will stop the DMA right after the current load-modify-write operation.
If the DMA controller detects a bus error during operation, it will set either the DMA_CTRL_ERROR_RD (error during last read access) or the DMA_CTRL_ERROR_WR (error during last write access) flag and will terminate the current transfer. Software can read the SRC_BASE or DST_BASE register to retrieve the address that caused the according error. Alternatively, software can read back the DMA_TTYPE_NUM bits of the TTYPE register to determine the index of the element that caused the error. The error bits are automatically cleared when starting a new transfer.
When the DMA_CTRL_DONE flag is set the DMA has actually executed a transfer. However, the DMA_CTRL_ERROR_* flags should also be checked to verify that the executed transfer completed without errors. The DMA_CTRL_DONE flag is automatically cleared when writing the CTRL register.
DMA Access Privilege Level
Transactions performed by the DMA are executed as bus transactions with elevated machine-mode privilege level.
Note that any physical memory protection rules (Smpmp ISA Extension) are not applied to DMA transfers.
|
Transfer Configuration
If the DMA is set to manual trigger mode (DMA_CTRL_AUTO = 0) writing the TTYPE register will start the programmed DMA transfer. Once started, the DMA reads one data quantity from the source address, processes it internally and then writes it back to the destination address. The DMA_TTYPE_NUM bits of the TTYPE register define how many times this process is repeated by specifying the number of elements to transfer.
Optionally, the source and/or destination addresses can be incremented automatically according to the data quantities by setting the according DMA_TTYPE_SRC_INC and/or DMA_TTYPE_DST_INC bit.
Four different transfer quantities are available, which are configured via the DMA_TTYPE_QSEL bits:
- 00: Read source data as byte, write destination data as byte
- 01: Read source data as byte, write destination data as zero-extended word
- 10: Read source data as byte, write destination data as sign-extended word
- 11: Read source data as word, write destination data as word
Optionally, the DMA controller can automatically convert Endianness of the transferred data if the DMA_TTYPE_ENDIAN bit is set.
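The four transfer quantities can be modeled in software to predict what the DMA writes to the destination for a given source value. The sketch below is a behavioral model only: the function names are hypothetical, and since the text does not state at which point of the data path the Endianness conversion is applied, the byte swap is shown as a separate helper.

```c
#include <stdint.h>

/* Model of one DMA element conversion; qsel uses the 2-bit DMA_TTYPE_QSEL encoding.
 * For byte-to-byte transfers only the lowest 8 bits of the result are written. */
uint32_t dma_model_convert(uint32_t src, unsigned qsel) {
    uint32_t byte = src & 0xFFu;
    switch (qsel & 3u) {
    case 0u: return byte;                                          /* byte -> byte */
    case 1u: return byte;                                          /* byte -> zero-extended word */
    case 2u: return (byte & 0x80u) ? (byte | 0xFFFFFF00u) : byte;  /* byte -> sign-extended word */
    default: return src;                                           /* word -> word */
    }
}

/* Plain 32-bit byte-order reversal (what an Endianness conversion of a word does). */
uint32_t swap_endian32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}
```

For instance, the source byte 0x80 becomes 0x00000080 with quantity 01 and 0xFFFFFF80 with quantity 10.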
Address Alignment
Make sure to align the source and destination base addresses to the according transfer data quantities. For instance,
word-to-word transfers require that the two LSB of SRC_BASE and DST_BASE are cleared.
|
Writing to IO Device
When writing data to IO / peripheral devices (for example to the Cyclic Redundancy Check (CRC)) the destination
data quantity has to be set to word (32-bit) since all IO registers can only be written in full 32-bit word mode.
|
Automatic Trigger
As an alternative to the manual trigger mode, the DMA can be set to automatic trigger mode, which starts a pre-configured transfer if a specific processor-internal peripheral issues a FIRQ interrupt request. The automatic trigger mode is enabled by setting the CTRL register’s DMA_CTRL_AUTO bit. In this configuration no transfer is started when writing to the DMA’s TTYPE register.
The actual triggering FIRQ channel is configured via the control register’s DMA_CTRL_FIRQ_SEL bits. Writing a 0 will select FIRQ channel 0, writing a 1 will select FIRQ channel 1, and so on. See section Processor Interrupts for a list of all FIRQ channels and their according sources.
The FIRQ trigger can operate in two trigger modes configured via the DMA_CTRL_FIRQ_TYPE flag:
- DMA_CTRL_FIRQ_TYPE = 0: trigger the automatic DMA transfer on a rising edge of the selected FIRQ channel (i.e. trigger the DMA transfer only once)
- DMA_CTRL_FIRQ_TYPE = 1: trigger the automatic DMA transfer when the selected FIRQ channel is active (i.e. trigger the DMA transfer again and again)
FIRQ Trigger
In rising-edge mode the DMA transfer will start if a rising edge is detected on the configured FIRQ channel. Hence, the DMA is triggered only once even if the selected FIRQ channel remains pending.
|
Memory Barrier / Fence Operation
Optionally, the DMA can issue a FENCE request to the downstream memory system when a transfer has been completed without errors. This can be used to re-sync caches (flush and reload) and buffers to maintain data coherency. This automatic fencing is enabled by setting the control register’s DMA_CTRL_FENCE bit.
DMA Interrupt
The DMA features a single CPU interrupt that is triggered when the programmed transfer has completed. This interrupt is also triggered if the DMA encounters a bus error during operation. The interrupt will remain pending until the control register’s DMA_CTRL_DONE flag is cleared (this will happen upon any write access to the control register).
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
DMA module enable |
|
r/w |
Enable automatic mode (FIRQ-triggered) |
||
|
r/w |
Issue a downstream FENCE operation when DMA transfer completes (without errors) |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
Error during read access, clears when starting a new transfer |
||
|
r/- |
Error during write access, clears when starting a new transfer |
||
|
r/- |
DMA transfer in progress |
||
|
r/c |
Set if a transfer was executed; auto-clears on write-access |
||
|
r/- |
reserved, read as zero |
||
|
r/w |
Trigger on rising-edge ( |
||
|
r/w |
FIRQ trigger select (FIRQ0=0 … FIRQ15=15) |
||
|
r/- |
reserved, read as zero |
||
|
|
|
r/w |
Source base address (shows the last-accessed source address when read) |
|
|
|
r/w |
Destination base address (shows the last-accessed destination address when read) |
|
|
|
r/w |
Number of elements to transfer (shows the last-transferred element index when read) |
|
r/- |
reserved, read as zero |
||
|
r/w |
Quantity select ( |
||
|
r/w |
Constant ( |
||
|
r/w |
Constant ( |
||
|
r/w |
Swap Endianness when set |
2.8.7. Processor-External Bus Interface (XBUS)
Hardware source files: |
neorv32_xbus.vhd |
External bus gateway |
neorv32_cache.vhd |
Generic cache module |
|
Software driver files: |
none |
implicitly used |
Top entity ports: |
|
address output (32-bit) |
|
data output (32-bit) |
|
|
access tag (3-bit) |
|
|
write enable (1-bit) |
|
|
byte enable (4-bit) |
|
|
bus strobe (1-bit) |
|
|
valid cycle (1-bit) |
|
|
data input (32-bit) |
|
|
acknowledge (1-bit) |
|
|
bus error (1-bit) |
|
Configuration generics: |
|
enable external bus interface when |
|
number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled) |
|
|
implement XBUS register stages |
|
|
implement the external bus cache |
|
|
number of blocks ("lines"), has to be a power of two. |
|
|
size in bytes of each block, has to be a power of two. |
|
CPU interrupts: |
none |
|
Access restrictions: |
none |
Overview
The external bus interface provides a Wishbone b4-compatible on-chip bus interface that is implemented if the XBUS_EN generic is true. This bus interface can be used to attach processor-external modules like memories, custom hardware accelerators or additional peripheral devices.
An optional cache module ("XCACHE") can be enabled to improve memory access latency.
Address Mapping
The external interface is not mapped to a specific address space. Instead, all CPU memory accesses that
do not target a specific (and actually implemented) processor-internal address region (hence, accessing the "void";
see section Address Space) are redirected to the external bus interface.
|
AXI4-Lite Interface Bridge
A simple bridge that converts the processor’s XBUS into an AXI4-lite-compatible host interface can be found in rtl/system_integration (xbus2axi4lite_bridge.vhd).
|
AHB3-Lite Interface Bridge
A simple bridge that converts the processor’s XBUS into an AHB3-lite-compatible host interface can be found in rtl/system_integration (xbus2ahblite_bridge.vhd).
|
Wishbone Bus Protocol
The external bus interface complies with the pipelined Wishbone b4 protocol. Even though this protocol was explicitly designed to support pipelined transfers, only a single transfer will be "in flight" at once. Hence, just two types of bus transactions are generated by the XBUS controller (see images below).
Wishbone "Classic" Protocol
Native support for the "classic" Wishbone protocol has been deprecated.
However, classic mode can still be emulated by connecting the processor’s xbus_cyc_o directly to the
device’s / bus system’s cyc and stb signals (omitting the processor’s xbus_stb_o signal).
|
Endianness
Just like the processor itself the XBUS interface uses little-endian byte order.
|
Wishbone Specs.
A detailed description of the implemented Wishbone bus protocol and the according interface signals
can be found in the data sheet "Wishbone B4 - WISHBONE System-on-Chip (SoC) Interconnection
Architecture for Portable IP Cores". A copy of this document can be found in the docs folder of this
project.
|
An accessed XBUS/Wishbone device does not have to respond immediately to a bus request by sending an ACK. Instead, there is a time window in which the device has to acknowledge the transfer. This time window is configured by the XBUS_TIMEOUT generic and it defines the maximum time (in clock cycles) a bus access can be pending before it is automatically terminated, raising a bus fault exception. If XBUS_TIMEOUT is set to zero, the timeout is disabled and a bus access can take an arbitrary number of cycles to complete. Note that this is not recommended as a missing ACK will permanently stall the entire processor!
Furthermore, an accessed XBUS/Wishbone device can signal an error condition at any time by setting the ERR signal high for one cycle. This will also terminate the current bus transaction before raising a CPU bus fault exception.
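The interaction of ACK, ERR and the timeout can be summarized by a small behavioral model, for example for use in a C-based testbench. This is a sketch under assumptions: all names are hypothetical, and the exact boundary cycle at which the hardware terminates an access is not specified here, so the model simply treats a response in cycle XBUS_TIMEOUT or later as too late.

```c
#include <stdint.h>

typedef enum {
    XBUS_RES_OK,      /* device acknowledged in time */
    XBUS_RES_ERR,     /* device signaled ERR: bus fault exception */
    XBUS_RES_TIMEOUT, /* no response within the time window: bus fault exception */
    XBUS_RES_STALL    /* timeout disabled and no response: processor stalls forever */
} xbus_result_t;

/* ack_cycle / err_cycle: cycle (counted from the start of the access) in which the
 * device raises ACK / ERR; a negative value means "never".
 * timeout: value of the XBUS_TIMEOUT generic (0 = timeout disabled). */
xbus_result_t xbus_model_access(int32_t ack_cycle, int32_t err_cycle, uint32_t timeout) {
    int has_ack = ack_cycle >= 0;
    int has_err = err_cycle >= 0;
    if (!has_ack && !has_err) {
        return (timeout == 0) ? XBUS_RES_STALL : XBUS_RES_TIMEOUT;
    }
    /* pick whichever response arrives first (ACK wins a tie in this model) */
    int err_first = has_err && (!has_ack || err_cycle < ack_cycle);
    uint32_t when = (uint32_t)(err_first ? err_cycle : ack_cycle);
    if (timeout != 0 && when >= timeout) {
        return XBUS_RES_TIMEOUT;
    }
    return err_first ? XBUS_RES_ERR : XBUS_RES_OK;
}
```

The XBUS_RES_STALL case is the reason why disabling the timeout is discouraged: without a response there is nothing left that could terminate the access.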
Register Stage
An optional register stage can be added to the XBUS gateway to break up the critical path, which eases timing closure. When XBUS_REGSTAGE_EN is true all outgoing and incoming XBUS signals are registered, increasing access latency by two cycles. Furthermore, all outgoing signals (like the address) will be kept stable if there is no bus access being initiated.
|
Access Tag
The XBUS tag signal xbus_tag_o provides additional information about the current access cycle. It is compatible with the AXI4 ARPROT and AWPROT signals.
- xbus_tag_o(0) P: access is performed from privileged mode (machine-mode) when set
- xbus_tag_o(1) NS: this bit is hardwired to 0 indicating a secure access
- xbus_tag_o(2) I: access is an instruction fetch when set; access is a data access when cleared
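A device model or testbench written in C could decode the three tag bits as sketched below. The struct and function names are hypothetical; only the bit positions are taken from the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the 3-bit XBUS access tag. */
typedef struct {
    bool privileged;  /* bit 0 (P): access from machine-mode */
    bool non_secure;  /* bit 1 (NS): hardwired to 0 by the processor */
    bool instruction; /* bit 2 (I): instruction fetch (set) or data access (cleared) */
} xbus_tag_t;

xbus_tag_t xbus_tag_decode(uint8_t tag) {
    xbus_tag_t t;
    t.privileged  = ((tag >> 0) & 1u) != 0;
    t.non_secure  = ((tag >> 1) & 1u) != 0;
    t.instruction = ((tag >> 2) & 1u) != 0;
    return t;
}
```

A tag value of 0b101 therefore describes a privileged instruction fetch.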
External Bus Cache (X-CACHE)
The XBUS interface provides an optional internal cache that can be used to buffer processor-external accesses. The x-cache is enabled via the XBUS_CACHE_EN generic. The total size of the cache is split into the number of cache lines or cache blocks (XBUS_CACHE_NUM_BLOCKS generic) and the line or block size in bytes (XBUS_CACHE_BLOCK_SIZE generic).
Direct Access +----------+
/|------------------------->| Register |------------------------>|\
| | +----------+ | |
Core --->| | | |---> XBUS
| | +--------------+ +--------------+ +-------------+ | |
\|--->| Host Arbiter |--->| Cache Memory |<---| Bus Arbiter |--->|/
+--------------+ +--------------+ +-------------+
The cache uses a direct-mapped architecture that implements "write-allocate" and "write-back" strategies. The write-allocate strategy will fetch the entire referenced block from main memory when encountering a cache write-miss. The write-back strategy will gather all writes locally inside the cache until the according cache block is about to be replaced. In this case, the entire modified cache block is written back to main memory.
Cached/Uncached Accesses
The cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO. All accesses that target the address range from 0xF0000000 to 0xFFFFFFFF will not be cached at all (see section Address Space). Direct/uncached accesses have lower priority than cache block operations to allow continuous burst transfer and also to maintain logical instruction forward progress / data coherency. Furthermore, atomic load-reserved and store-conditional instructions (Zalrsc ISA Extension) will always bypass the cache.
|
2.8.8. Stream Link Interface (SLINK)
Hardware source files: |
neorv32_slink.vhd |
|
Software driver files: |
neorv32_slink.c |
|
neorv32_slink.h |
||
Top entity ports: |
|
RX link data (32-bit) |
|
RX routing information (4-bit) |
|
|
RX link data valid (1-bit) |
|
|
RX link last element of stream (1-bit) |
|
|
RX link ready to receive (1-bit) |
|
|
TX link data (32-bit) |
|
|
TX routing information (4-bit) |
|
|
TX link data valid (1-bit) |
|
|
TX link last element of stream (1-bit) |
|
|
TX link allowed to send (1-bit) |
|
Configuration generics: |
|
implement SLINK when true |
|
RX FIFO depth (1..32k), has to be a power of two, min 1 |
|
|
TX FIFO depth (1..32k), has to be a power of two, min 1 |
|
CPU interrupts: |
fast IRQ channel 14 |
RX SLINK IRQ (see Processor Interrupts) |
fast IRQ channel 15 |
TX SLINK IRQ (see Processor Interrupts) |
|
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The stream link interface provides independent RX and TX channels for sending and receiving stream data. Each channel features a configurable internal FIFO to buffer stream data (IO_SLINK_RX_FIFO for the RX FIFO, IO_SLINK_TX_FIFO for the TX FIFO). The SLINK interface provides higher bandwidth and lower latency than the external bus interface, making it ideally suited for coupling custom stream processors or streaming peripherals.
Example Program
An example program for the SLINK module is available in sw/example/demo_slink.
|
Interface & Protocol
The SLINK interface consists of four signals for each channel plus optional routing information:
- dat contains the actual data word
- val marks the current transmission cycle as valid
- lst marks the current transmission cycle as the last element of a stream
- rdy indicates that the receiver is ready to receive
- src and dst provide source/destination routing information (optional)
AXI4-Stream Compatibility
The interface names (except for src and dst) and the underlying protocol are compatible with the AXI4-Stream protocol standard.
A processor top entity with AXI4-Stream-compatible interfaces can be found in rtl/system_integration.
More information regarding this alternate top entity can be found in the user guide:
https://stnolting.github.io/neorv32/ug/#_packaging_the_processor_as_vivado_ip_block
|
Theory of Operation
The SLINK provides four interface registers. The control register (CTRL) is used to configure the module and to check its status. Two individual data registers (DATA and DATA_LAST) are used to send and receive the link’s actual data stream.
The DATA register provides direct access to the RX/TX FIFO buffers. Read accesses return data from the RX FIFO. After reading data from this register the control register’s SLINK_CTRL_RX_LAST flag can be checked to determine if the according data word has been marked as "end of stream" via the slink_rx_lst_i signal (this signal is also buffered by the link’s FIFO).
Writing to the DATA register will immediately write to the TX link FIFO. When writing to the DATA_LAST register the according data word will also be marked as "end of stream" via the slink_tx_lst_o signal (this signal is also buffered by the link’s FIFO).
The configured FIFO sizes can be retrieved by software via the control register’s SLINK_CTRL_RX_FIFO_* and SLINK_CTRL_TX_FIFO_* bits.
The SLINK is globally activated by setting the control register’s enable bit SLINK_CTRL_EN. Clearing this bit will reset all internal logic and will also clear both FIFOs. The FIFOs can also be cleared manually at any time by setting the SLINK_CTRL_RX_CLR and/or SLINK_CTRL_TX_CLR bits (these bits will auto-clear).
FIFO Overflow
Writing to the TX channel’s FIFO while it is full will have no effect. Reading from the RX channel’s FIFO while it
is empty will also have no effect and will return the last received data word. There is no overflow indicator
implemented yet.
|
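The FIFO behavior described above (writes to a full FIFO are dropped, reads from an empty FIFO return the last data word again) can be captured in a small software model. This is only a sketch for illustration; the type and function names as well as the example depth are hypothetical and do not mirror the actual hardware implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 4u /* example depth; has to be a power of two */

typedef struct {
    uint32_t mem[FIFO_DEPTH];
    uint32_t wr, rd; /* free-running write/read counters */
    uint32_t last;   /* last word that was read out */
} fifo_t;

void fifo_clear(fifo_t *f) { f->wr = 0; f->rd = 0; f->last = 0; }
bool fifo_empty(const fifo_t *f) { return f->wr == f->rd; }
bool fifo_half(const fifo_t *f)  { return (f->wr - f->rd) >= FIFO_DEPTH / 2u; }
bool fifo_full(const fifo_t *f)  { return (f->wr - f->rd) == FIFO_DEPTH; }

/* Writing while full has no effect; there is no overflow indicator. */
void fifo_write(fifo_t *f, uint32_t data) {
    if (fifo_full(f)) {
        return;
    }
    f->mem[f->wr % FIFO_DEPTH] = data;
    f->wr++;
}

/* Reading while empty has no effect and returns the last word again. */
uint32_t fifo_read(fifo_t *f) {
    if (!fifo_empty(f)) {
        f->last = f->mem[f->rd % FIFO_DEPTH];
        f->rd++;
    }
    return f->last;
}
```

Because neither case is reported, software should check the FIFO status flags before accessing the data registers if data loss or duplicated reads matter.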
The current status of the RX and TX FIFOs can be determined via the control register’s SLINK_CTRL_RX_* and SLINK_CTRL_TX_* flags.
Stream Routing Information
Both stream link interfaces provide an optional port for routing information: slink_tx_dst_o (AXI stream’s TDEST) can be used to set a destination address when using a switch/interconnect to access several stream sinks. slink_rx_src_i (AXI stream’s TID) can be used to determine the source when several sources can send data via a switch/interconnect.
The routing information can be set/read via the ROUTE interface register. Note that all routing information is also fully buffered by the internal RX/TX FIFOs. RX routing information has to be read after reading the according RX data. Vice versa, TX routing information has to be set before writing the according TX data.
Interrupts
The SLINK module provides two independent interrupt channels: one for RX events and one for TX events. The interrupt conditions are based on the according link’s FIFO status flags and are configured via the control register’s SLINK_CTRL_IRQ_* flags. The according interrupt will fire when the module is enabled (SLINK_CTRL_EN) and the selected interrupt conditions are met. Note that all enabled interrupt conditions are logically OR-ed per channel. If any enabled interrupt condition becomes active the interrupt will become pending until the interrupt-causing condition is resolved (e.g. by reading from the RX FIFO).
Register Map
Address | Name [C] | Bit(s) | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
SLINK global enable |
|
-/w |
Clear RX FIFO when set (bit auto-clears) |
||
|
-/w |
Clear TX FIFO when set (bit auto-clears) |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
Last word read from |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
RX FIFO empty |
||
|
r/- |
RX FIFO at least half full |
||
|
r/- |
RX FIFO full |
||
|
r/- |
TX FIFO empty |
||
|
r/- |
TX FIFO at least half full |
||
|
r/- |
TX FIFO full |
||
|
r/- |
reserved, read as zero |
||
|
r/w |
RX interrupt if RX FIFO not empty |
||
|
r/w |
RX interrupt if RX FIFO at least half full |
||
|
r/w |
RX interrupt if RX FIFO full |
||
|
r/w |
TX interrupt if TX FIFO empty |
||
|
r/w |
TX interrupt if TX FIFO not at least half full |
||
|
r/w |
TX interrupt if TX FIFO not full |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
log2(RX FIFO size) |
||
|
r/- |
log2(TX FIFO size) |
||
|
|
|
r/w |
TX destination routing information ( |
|
r/- |
RX source routing information ( |
||
|
-/- |
reserved |
||
|
|
|
r/w |
Write data to TX FIFO; read data from RX FIFO |
|
|
|
r/w |
Write data to TX FIFO (and also set "last" signal); read data from RX FIFO |
2.8.9. General Purpose Input and Output Port (GPIO)
Hardware source files: |
neorv32_gpio.vhd |
|
Software driver files: |
neorv32_gpio.c |
|
neorv32_gpio.h |
||
Top entity ports: |
|
64-bit parallel output port |
|
64-bit parallel input port |
|
Configuration generics: |
|
number of input/output pairs to implement (0..64) |
CPU interrupts: |
none |
|
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The general purpose parallel IO unit provides a simple parallel input and output port. These ports can be used chip-externally (for example to drive status LEDs, connect buttons, etc.) or chip-internally to provide control signals for other IP modules.
The actual number of input/output pairs is defined by the IO_GPIO_NUM generic. When set to zero, the GPIO module is excluded from synthesis and the output port gpio_o is tied to all-zero. If IO_GPIO_NUM is less than the maximum value of 64, only the LSB-aligned bits in gpio_o and gpio_i are actually connected while the remaining bits are tied to zero or are left unconnected, respectively.
Access Atomicity
The GPIO module uses two 32-bit memory-mapped registers each for accessing the input and the output signals. Since the CPU can only process 32 bits at once, the entire 64-bit output cannot be updated within a single clock cycle.
|
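The consequence of the split register access can be illustrated with a small model of the two output registers. The struct below is a hypothetical stand-in and not the register layout from neorv32_gpio.h; it only shows that a 64-bit update takes two separate 32-bit writes.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit output registers. */
typedef struct {
    volatile uint32_t out_lo; /* parallel output port pins 31:0 */
    volatile uint32_t out_hi; /* parallel output port pins 63:32 */
} gpio_model_t;

/* A 64-bit update consists of two individual 32-bit write accesses. */
void gpio_model_write64(gpio_model_t *g, uint64_t value) {
    g->out_lo = (uint32_t)(value & 0xFFFFFFFFu); /* first access */
    g->out_hi = (uint32_t)(value >> 32);         /* second access, at least one cycle later */
}

uint64_t gpio_model_read64(const gpio_model_t *g) {
    return ((uint64_t)g->out_hi << 32) | g->out_lo;
}
```

Between the two write accesses the pins show a mix of old and new values, so signals that have to change together should be placed within the same 32-bit half.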
Register Map
Address | Name [C] | Bit(s) | R/W | Function |
---|---|---|---|---|
| | | 31:0 | r/- | parallel input port pins 31:0 |
| | | 31:0 | r/- | parallel input port pins 63:32 |
| | | 31:0 | r/w | parallel output port pins 31:0 |
| | | 31:0 | r/w | parallel output port pins 63:32 |
2.8.10. Cyclic Redundancy Check (CRC)
Hardware source files: |
neorv32_crc.vhd |
|
Software driver files: |
neorv32_crc.c |
|
neorv32_crc.h |
||
Top entity ports: |
none |
|
Configuration generics: |
|
implement CRC module when |
CPU interrupts: |
none |
|
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The cyclic redundancy check unit provides a programmable checksum computation module. The unit operates on single bytes and can either compute CRC8, CRC16 or CRC32 checksums based on an arbitrary polynomial and start value.
CRC Demo Program
A CRC example program (also using CPU-independent DMA transfers) can be found in sw/example/crc_dma.
|
CPU-Independent Operation
The CRC unit can compute a checksum for an arbitrary memory array without any CPU overhead
by using the processor’s Direct Memory Access Controller (DMA).
|
Theory of Operation
The module provides four interface registers:
- MODE: selects either CRC8-, CRC16- or CRC32-mode
- POLY: programmable polynomial
- DATA: data input register (single bytes only)
- SREG: the CRC shift register; this register is used to define the start value and to obtain the final processing result
The MODE, POLY and SREG registers need to be programmed before the actual processing can be started. Writing a byte to DATA will update the current checksum in SREG.
Access Latency
Write accesses to the CRC module have an increased latency of 8 clock cycles. This additional latency ensures that the internal bit-serial processing of the current data byte has also been completed when the transfer is completed.
|
Data Size
For CRC8-mode only bits 7:0 of POLY and SREG are relevant; for CRC16-mode only bits 15:0 are used and for CRC32-mode all 32 bits of POLY and SREG are used.
|
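A software reference model is handy for cross-checking checksums obtained from the CRC unit. The sketch below assumes plain MSB-first bit-serial processing without input/output reflection and without a final XOR; whether the hardware uses exactly this convention is an assumption and should be verified against the example program. The function name crc_model is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-serial CRC model. width selects the mode (8, 16 or 32), poly and start
 * correspond to the POLY register and the start value written to SREG.
 * Returns the final shift register value. */
uint32_t crc_model(unsigned width, uint32_t poly, uint32_t start,
                   const uint8_t *data, size_t len) {
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t msb  = 1u << (width - 1u);
    uint32_t sreg = start & mask;
    poly &= mask;
    for (size_t i = 0; i < len; i++) {
        sreg ^= (uint32_t)data[i] << (width - 8u); /* feed one byte (like a write to DATA) */
        for (int bit = 0; bit < 8; bit++) {        /* eight shift steps per byte */
            sreg = (sreg & msb) ? ((sreg << 1) ^ poly) : (sreg << 1);
            sreg &= mask;
        }
    }
    return sreg;
}
```

The eight inner loop iterations correspond to the eight clock cycles of additional write latency mentioned above. With the common test string "123456789" this model yields 0xF4 for CRC8 (polynomial 0x07, start value 0x00) and 0x29B1 for CRC16 (polynomial 0x1021, start value 0xFFFF).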
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
CRC mode select ( |
|
r/- |
reserved, read as zero |
||
|
|
|
r/w |
CRC polynomial |
|
|
|
r/w |
data input (single byte) |
|
r/- |
reserved, read as zero, writes are ignored |
||
|
|
|
r/w |
current CRC shift register value (set start value on write) |
2.8.11. Watchdog Timer (WDT)
Hardware source files: |
neorv32_wdt.vhd |
|
Software driver files: |
neorv32_wdt.c |
|
neorv32_wdt.h |
||
Top entity ports: |
none |
|
Configuration generics: |
|
implement watchdog when |
CPU interrupts: |
none |
|
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The watchdog (WDT) provides a last resort for safety-critical applications. When a pre-programmed timeout value is reached a system-wide hardware reset is generated. The internal counter has to be reset explicitly by the application program every now and then to prevent a timeout.
Theory of Operation
The watchdog is enabled by setting the control register’s WDT_CTRL_EN bit. When this bit is cleared, the internal timeout counter is reset to zero and no system reset can be triggered by this module.
The internal 32-bit timeout counter is clocked at 1/4096th of the processor’s main clock (fWDT[Hz] = fmain[Hz] / 4096). Whenever this counter reaches the programmed timeout value (WDT_CTRL_TIMEOUT bits in the control register) a hardware reset is triggered.
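Based on the clock relation above, a timeout value for a desired time span can be computed as sketched below. The helper name is hypothetical, the result is rounded up so the watchdog never fires early, and the caller still has to make sure that the value fits into the WDT_CTRL_TIMEOUT bit field.

```c
#include <stdint.h>

#define WDT_PRESCALER 4096u /* f_WDT = f_main / 4096 */

/* Convert a desired timeout in milliseconds into watchdog counter ticks (rounded up). */
uint64_t wdt_ms_to_ticks(uint32_t f_main_hz, uint32_t timeout_ms) {
    uint64_t scaled = (uint64_t)f_main_hz * timeout_ms;  /* main clock cycles * 1000 */
    uint64_t div    = (uint64_t)WDT_PRESCALER * 1000u;
    return (scaled + div - 1u) / div;
}
```

For a 100 MHz main clock and a timeout of one second this gives 24415 ticks (100000000 / 4096 = 24414.06, rounded up).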
The watchdog’s timeout counter is reset ("feeding the watchdog") by writing the reset PASSWORD to the RESET register. The password is hardwired to hexadecimal 0x709D1AB3.
Watchdog Operation during Debugging
By default, the watchdog stops operation when the CPU enters debug mode and will resume normal operation after
the CPU has left debug mode again. This will prevent an unintended watchdog timeout during a debug session. However,
the watchdog can also be configured to keep operating even when the CPU is in debug mode by setting the control
register’s WDT_CTRL_DBEN bit.
|
Watchdog Operation during CPU Sleep
By default, the watchdog stops operating when the CPU enters sleep mode. However, the watchdog can also be configured
to keep operating even when the CPU is in sleep mode by setting the control register’s WDT_CTRL_SEN bit.
|
Configuration Lock
The watchdog control register can be locked to protect the current configuration from being modified. The lock is
activated by setting the WDT_CTRL_LOCK bit. In the locked state any write access to the control register is entirely
ignored (see table below, "writable if locked"). However, read accesses to the control register as well as watchdog resets
remain possible.
The lock bit can only be set if the WDT is already enabled (WDT_CTRL_EN is set). Furthermore, the lock bit can
only be cleared again by a system-wide hardware reset.
Strict Mode
The strict operation mode provides additional safety functions. If strict mode is enabled via the WDT_CTRL_STRICT
control register bit, an immediate hardware reset is enforced if
- the RESET register is written with an incorrect password, or
- the CTRL register is written while the WDT_CTRL_LOCK bit is set.
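The interplay of lock and strict mode can be summarized as a small decision model. This is a behavioral sketch only; the type and function names are hypothetical, and the assumption that a wrong password is silently ignored outside strict mode is not stated explicitly in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    WDT_ACCESS_OK,       /* write takes effect */
    WDT_ACCESS_IGNORED,  /* write has no effect */
    WDT_ACCESS_HW_RESET  /* write forces a hardware reset */
} wdt_access_t;

/* Outcome of a write to CTRL, depending on the lock and strict bits. */
wdt_access_t wdt_ctrl_write(bool locked, bool strict)
{
    if (!locked) {
        return WDT_ACCESS_OK;
    }
    return strict ? WDT_ACCESS_HW_RESET : WDT_ACCESS_IGNORED;
}

/* Outcome of a write to RESET.
 * ASSUMPTION: without strict mode a wrong password is simply ignored. */
wdt_access_t wdt_reset_write(uint32_t value, bool strict)
{
    if (value == 0x709D1AB3u) {
        return WDT_ACCESS_OK; /* correct password: timeout counter is reset */
    }
    return strict ? WDT_ACCESS_HW_RESET : WDT_ACCESS_IGNORED;
}
```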
Cause of last Hardware Reset
The cause of the last system hardware reset can be determined via the WDT_CTRL_RCAUSE_* bits:
- WDT_RCAUSE_EXT (0b00): Reset caused by external reset signal/pin
- WDT_RCAUSE_OCD (0b01): Reset caused by on-chip debugger
- WDT_RCAUSE_TMO (0b10): Reset caused by watchdog timeout
- WDT_RCAUSE_ACC (0b11): Reset caused by illegal watchdog access (strict mode)
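A start-up routine might translate the 2-bit reset cause into a readable message, for example as sketched below. The function expects the already extracted 2-bit field, since the position of the field inside the control register is not given here; `wdt_rcause_str` is a hypothetical helper and the enum merely mirrors the encodings listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Reset cause encodings as listed in the text. */
enum {
    WDT_RCAUSE_EXT = 0, /* external reset signal/pin */
    WDT_RCAUSE_OCD = 1, /* on-chip debugger */
    WDT_RCAUSE_TMO = 2, /* watchdog timeout */
    WDT_RCAUSE_ACC = 3  /* illegal watchdog access (strict mode) */
};

/* Map the 2-bit reset cause field to a description. */
const char *wdt_rcause_str(uint32_t rcause)
{
    switch (rcause & 3u) {
        case WDT_RCAUSE_EXT: return "external reset";
        case WDT_RCAUSE_OCD: return "on-chip debugger";
        case WDT_RCAUSE_TMO: return "watchdog timeout";
        default:             return "illegal watchdog access";
    }
}
```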
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Reset value | Writable if locked | Function |
---|---|---|---|---|---|---|
|
|
|
r/w |
|
no |
watchdog enable |
|
r/w |
|
no |
lock configuration when set, clears only on system reset, can only be set if enable bit is set already |
||
|
r/w |
|
no |
set to allow WDT to continue operation even when CPU is in debug mode |
||
|
r/w |
|
no |
set to allow WDT to continue operation even when CPU is in sleep mode |
||
|
r/w |
|
no |
set to enable strict mode (force hardware reset if reset password is incorrect or if write access to locked CTRL register) |
||
|
r/- |
|
- |
cause of last system reset; 0=external reset, 1=ocd-reset, 2=watchdog timeout, 3=illegal watchdog access (strict mode) |
||
|
r/- |
- |
- |
reserved, reads as zero |
||
|
r/w |
0 |
no |
24-bit watchdog timeout value |
||
|
|
|
-/w |
- |
yes |
Write PASSWORD to reset WDT timeout counter |
2.8.12. Machine System Timer (MTIME)
Hardware source files: |
neorv32_mtime.vhd |
|
Software driver files: |
neorv32_mtime.c |
|
neorv32_mtime.h |
||
Top entity ports: |
|
RISC-V machine timer IRQ if internal one is not implemented |
|
Current system time ( |
|
Configuration generics: |
|
implement machine timer when |
CPU interrupts: |
|
machine timer interrupt (see Processor Interrupts) |
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The MTIME module implements a memory-mapped machine system timer that is compatible with the RISC-V
privileged specifications. The 64-bit system time is accessed via individual TIME_LO and TIME_HI registers.
A 64-bit time compare register, which is accessible via individual TIMECMP_LO and TIMECMP_HI
registers, can be used to configure the CPU’s machine timer interrupt (MTI).
The interrupt is triggered whenever TIME (high & low part) is greater than or equal to TIMECMP (high & low part).
The interrupt remains active (=pending) until TIME becomes less than TIMECMP again (either by modifying
TIME or TIMECMP). The current system time is available for other SoC modules via the top’s mtime_time_o signal.
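Because the 64-bit values are split across 32-bit registers, software has to take some care when reading TIME and updating TIMECMP. The sketch below models the four registers with a hypothetical struct (ordered as in the register map) and uses the high/low/high read loop and the three-write compare update that the RISC-V privileged specification suggests for 32-bit systems. The names are illustrative and not the NEORV32 driver API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout, ordered as in the register map. */
typedef struct {
    volatile uint32_t TIME_LO;
    volatile uint32_t TIME_HI;
    volatile uint32_t TIMECMP_LO;
    volatile uint32_t TIMECMP_HI;
} mtime_regs_t;

/* Read the 64-bit time with 32-bit accesses; retry if the high word
 * changed in between (low word overflowed during the read). */
uint64_t mtime_get_time(const mtime_regs_t *r)
{
    uint32_t hi, lo;
    do {
        hi = r->TIME_HI;
        lo = r->TIME_LO;
    } while (hi != r->TIME_HI);
    return ((uint64_t)hi << 32) | lo;
}

/* Update TIMECMP without creating a transient compare value that is
 * smaller than both the old and the new value. */
void mtime_set_timecmp(mtime_regs_t *r, uint64_t cmp)
{
    r->TIMECMP_LO = 0xFFFFFFFFu;           /* temporarily maximize the low word */
    r->TIMECMP_HI = (uint32_t)(cmp >> 32); /* set the new high word */
    r->TIMECMP_LO = (uint32_t)cmp;         /* set the new low word */
}

/* Interrupt condition from the text: pending while TIME >= TIMECMP. */
int mtime_irq_pending(const mtime_regs_t *r)
{
    uint32_t cmp_hi = r->TIMECMP_HI;
    uint32_t cmp_lo = r->TIMECMP_LO;
    uint64_t cmp = ((uint64_t)cmp_hi << 32) | cmp_lo;
    return mtime_get_time(r) >= cmp;
}
```

On real hardware a pointer to such a struct would be placed at the module's base address; in a host-side test it can simply point to an ordinary variable.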
Hardware Reset
After a hardware reset the TIME and TIMECMP registers are reset to all-zero.
|
External MTIME Interrupt
If the internal MTIME module is disabled (IO_MTIME_EN = false) the machine timer interrupt becomes available
as external signal. The mtime_irq_i signal is level-triggered and high-active. Once set the signal has to stay
high until the interrupt request is explicitly acknowledged (e.g. writing to a user-defined memory-mapped register).
|
Register Map
Address | Name [C] | Bits | R/W | Function |
---|---|---|---|---|
|
|
31:0 |
r/w |
system time, low word |
|
|
31:0 |
r/w |
system time, high word |
|
|
31:0 |
r/w |
time compare, low word |
|
|
31:0 |
r/w |
time compare, high word |
2.8.13. Primary Universal Asynchronous Receiver and Transmitter (UART0)
Hardware source files: |
neorv32_uart.vhd |
|
Software driver files: |
neorv32_uart.c |
|
neorv32_uart.h |
||
Top entity ports: |
|
serial transmitter output |
|
serial receiver input |
|
|
flow control: RX ready to receive, low-active |
|
|
flow control: TX allowed to transmit, low-active |
|
Configuration generics: |
|
implement UART0 when |
|
RX FIFO depth (power of 2, min 1) |
|
|
TX FIFO depth (power of 2, min 1) |
|
CPU interrupts: |
fast IRQ channel 2 |
RX interrupt |
fast IRQ channel 3 |
TX interrupt (see Processor Interrupts) |
|
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The NEORV32 UART provides a standard serial interface with independent transmitter and receiver channels, each
equipped with a configurable FIFO. The transmission frame is fixed to 8N1: 8 data bits, no parity bit, 1 stop
bit. The actual transmission rate (Baud rate) is programmable via software. The module features two memory-mapped
registers: CTRL and DATA. These are used for configuration, status check and data transfer.
Standard Console
All default example programs and software libraries of the NEORV32 software framework (including the bootloader
and the runtime environment) use the primary UART (UART0) as default user console interface. Furthermore, UART0
is used to implement the "standard consoles" (STDIN, STDOUT and STDERR).
|
RX and TX FIFOs
The UART provides individual data FIFOs for RX and TX to allow data transmission without CPU intervention.
The sizes of these FIFOs can be configured via the corresponding configuration generics (UART0_RX_FIFO and UART0_TX_FIFO).
Both FIFOs are automatically cleared when disabling the module via the UART_CTRL_EN flag. However, the FIFOs can
also be cleared individually by setting the UART_CTRL_RX_CLR / UART_CTRL_TX_CLR flags.
Theory of Operation
The module is enabled by setting the UART_CTRL_EN bit in the UART0 control register CTRL. The Baud rate
is configured via a 10-bit UART_CTRL_BAUDx baud divisor (baud_div) and a 3-bit UART_CTRL_PRSCx
clock prescaler (clock_prescaler).
UART_CTRL_PRSCx | 0b000 | 0b001 | 0b010 | 0b011 | 0b100 | 0b101 | 0b110 | 0b111 |
---|---|---|---|---|---|---|---|---|
Resulting clock_prescaler | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 |
Baud rate = (fmain[Hz] / clock_prescaler) / (baud_div + 1)
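To make the equation concrete, the following sketch searches for the smallest prescaler whose divisor still fits into the 10-bit baud_div field and computes the resulting Baud rate. The helper names and the search strategy are assumptions for illustration; the NEORV32 driver may pick the configuration differently.

```c
#include <assert.h>
#include <stdint.h>

/* Prescaler values selected by UART_CTRL_PRSCx = 0b000 .. 0b111. */
static const uint16_t CLOCK_PRESCALER[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

typedef struct {
    uint8_t  prsc;     /* 3-bit prescaler select */
    uint16_t baud_div; /* 10-bit baud divisor */
} uart_baud_cfg_t;

/* Find the smallest prescaler for which (baud_div + 1) fits into 10 bits.
 * Returns 0 on success, -1 if the requested Baud rate cannot be configured. */
int uart_find_baud_cfg(uint32_t f_main_hz, uint32_t baud, uart_baud_cfg_t *cfg)
{
    assert(cfg != 0 && baud != 0);
    for (uint8_t p = 0; p < 8; p++) {
        /* div corresponds to (baud_div + 1) in the equation, rounded down */
        uint64_t div = f_main_hz / ((uint64_t)CLOCK_PRESCALER[p] * baud);
        if (div >= 1 && div <= 1024) {
            cfg->prsc = p;
            cfg->baud_div = (uint16_t)(div - 1);
            return 0;
        }
    }
    return -1;
}

/* Baud rate that results from a given configuration (integer division). */
uint32_t uart_actual_baud(uint32_t f_main_hz, uart_baud_cfg_t cfg)
{
    return (f_main_hz / CLOCK_PRESCALER[cfg.prsc]) / ((uint32_t)cfg.baud_div + 1u);
}
```

For a 100 MHz main clock and a target of 19200 Baud, this search selects prescaler 8 (select value 0b010) with baud_div = 650, which yields about 19201 Baud.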
The control register’s UART_CTRL_RX_* and UART_CTRL_TX_* flags provide information about the RX and TX FIFO fill level.
Disabling the module via the UART_CTRL_EN bit will also clear these FIFOs.
A new TX transmission is started by writing to the DATA register. The
transfer is completed when the UART_CTRL_TX_BUSY control register flag returns to zero. RX data is available when
the UART_CTRL_RX_NEMPTY flag becomes set. The UART_CTRL_RX_OVER flag will be set if the RX FIFO overflows. This flag
is cleared only by disabling the module via UART_CTRL_EN.
UART Interrupts
The UART module provides independent interrupt channels for RX and TX. These interrupts are triggered by certain RX and TX
FIFO levels. The actual configuration is programmed independently for the RX and TX interrupt channel via the control register’s
UART_CTRL_IRQ_RX_* and UART_CTRL_IRQ_TX_* bits:
- RX IRQ: The RX interrupt can be triggered by three different RX FIFO level states: If UART_CTRL_IRQ_RX_NEMPTY is set the interrupt fires if the RX FIFO is not empty (i.e. when incoming data is available). If UART_CTRL_IRQ_RX_HALF is set the interrupt fires if the RX FIFO is at least half-full. If UART_CTRL_IRQ_RX_FULL is set the interrupt fires if the RX FIFO is full. Note that all these programmable conditions are logically OR-ed (the interrupt fires if any enabled condition is true).
- TX IRQ: The TX interrupt can be triggered by two different TX FIFO level states: If UART_CTRL_IRQ_TX_EMPTY is set the interrupt fires if the TX FIFO is empty. If UART_CTRL_IRQ_TX_NHALF is set the interrupt fires if the TX FIFO is not at least half full. Note that all these programmable conditions are logically OR-ed (the interrupt fires if any enabled condition is true).
Once a UART interrupt has fired it remains pending until the actual cause of the interrupt is resolved; for
example if just the UART_CTRL_IRQ_RX_NEMPTY bit is set, the RX interrupt will keep firing until the RX FIFO is empty again.
RX/TX FIFO Size
Software can retrieve the configured sizes of the RX and TX FIFO via the according UART_DATA_RX_FIFO_SIZE and
UART_DATA_TX_FIFO_SIZE bits from the DATA register.
|
RTS/CTS Hardware Flow Control
The NEORV32 UART supports optional hardware flow control using the standard CTS uart0_cts_i ("clear to send") and RTS
uart0_rts_o ("ready to send" / "ready to receive (RTR)") signals. Both signals are low-active.
Hardware flow control is enabled by setting the UART_CTRL_HWFC_EN bit in the module’s control register CTRL.
When hardware flow control is enabled:
- The UART’s transmitter will not start a new transmission until the uart0_cts_i signal goes low. During this time, the UART busy flag UART_CTRL_TX_BUSY remains set.
- The UART will set the uart0_rts_o signal low if the RX FIFO is less than half full (to have a wide safety margin). As long as this signal is low, the connected device can send new data. uart0_rts_o is always low if hardware flow control is disabled. Disabling the UART (setting UART_CTRL_EN low) while hardware flow control is enabled will set uart0_rts_o high to signal that the UART is not capable of receiving new data.
Note that RTS and CTS signaling can only be activated together. If the RTS handshake is not required the signal can be left unconnected. If the CTS handshake is not required it has to be tied to zero. |
Simulation Mode
The UART provides a simulation-only mode to dump console data as well as raw data directly to a file. When the simulation
mode is enabled (by setting the UART_CTRL_SIM_MODE bit) there will be no physical transaction on the uart0_txd_o signal.
Instead, all data written to the DATA register is immediately dumped to a file. Data written to DATA[7:0] will be dumped as
ASCII chars to a file named neorv32.uart0_sim_mode.out. Additionally, the ASCII data is printed to the simulator console.
The file is created in the simulation’s home folder.
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
UART enable |
|
r/w |
enable simulation mode |
||
|
r/w |
enable RTS/CTS hardware flow-control |
||
|
r/w |
Baud rate clock prescaler select |
||
|
r/w |
10-bit Baud rate divisor value |
||
|
r/- |
RX FIFO not empty |
||
|
r/- |
RX FIFO at least half-full |
||
|
r/- |
RX FIFO full |
||
|
r/- |
TX FIFO empty |
||
|
r/- |
TX FIFO not at least half-full |
||
|
r/- |
TX FIFO full |
||
|
r/w |
fire IRQ if RX FIFO not empty |
||
|
r/w |
fire IRQ if RX FIFO at least half-full |
||
|
r/w |
fire IRQ if RX FIFO full |
||
|
r/w |
fire IRQ if TX FIFO empty |
||
|
r/w |
fire IRQ if TX FIFO not at least half full |
||
|
r/- |
reserved, read as zero |
||
|
r/w |
Clear RX FIFO, flag auto-clears |
||
|
r/w |
Clear TX FIFO, flag auto-clears |
||
|
r/- |
RX FIFO overflow; cleared by disabling the module |
||
|
r/- |
TX busy or TX FIFO not empty |
|
|
|
r/w |
receive/transmit data |
||
|
r/- |
log2(RX FIFO size) |
||
|
r/- |
log2(TX FIFO size) |
2.8.14. Secondary Universal Asynchronous Receiver and Transmitter (UART1)
Hardware source files: |
neorv32_uart.vhd |
|
Software driver files: |
neorv32_uart.c |
|
neorv32_uart.h |
||
Top entity ports: |
|
serial transmitter output |
|
serial receiver input |
|
|
flow control: RX ready to receive, low-active |
|
|
flow control: TX allowed to transmit, low-active |
|
Configuration generics: |
|
implement UART1 when |
|
RX FIFO depth (power of 2, min 1) |
|
|
TX FIFO depth (power of 2, min 1) |
|
CPU interrupts: |
fast IRQ channel 4 |
RX interrupt |
fast IRQ channel 5 |
TX interrupt (see Processor Interrupts) |
|
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The secondary UART (UART1) is functionally identical to the primary UART
(Primary Universal Asynchronous Receiver and Transmitter (UART0)). UART1 uses different addresses for the
control register (CTRL) and the data register (DATA). The registers’ bits/flags use the same bit positions and naming
as for the primary UART. The RX and TX interrupts of UART1 are mapped to different CPU fast interrupt (FIRQ) channels.
Simulation Mode
The secondary UART (UART1) provides the same simulation options as the primary UART (UART0). However, output data is
written to the UART1-specific file neorv32.uart1_sim_mode.out. This data is also printed to the simulator console.
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
… |
… |
Same as UART0 |
|
|
… |
… |
Same as UART0 |
2.8.15. Serial Peripheral Interface Controller (SPI)
Hardware source files: |
neorv32_spi.vhd |
|
Software driver files: |
neorv32_spi.c |
|
neorv32_spi.h |
||
Top entity ports: |
|
1-bit serial clock output |
|
1-bit serial data output |
|
|
1-bit serial data input |
|
|
8-bit dedicated chip select output (low-active) |
|
Configuration generics: |
|
implement SPI controller when |
|
FIFO depth, has to be a power of two, min 1 |
|
CPU interrupts: |
fast IRQ channel 6 |
configurable SPI interrupt (see Processor Interrupts) |
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The NEORV32 SPI module is a host transceiver. Hence, it is responsible for initiating all transmissions.
The module operates on a byte-wide data granularity, supports all 4 standard clock modes, provides a fine-tunable
SPI clock generator and offers up to 8 dedicated chip select signals via the top entity’s spi_csn_o signal.
An optional receive/transmit ring-buffer/FIFO can be configured via the IO_SPI_FIFO generic to support block-based
transmissions without CPU interaction.
Host-Mode Only
The NEORV32 SPI module only supports host mode. Transmissions are initiated only by the processor’s SPI module
and not by an external SPI module. If you are looking for a device-mode serial peripheral interface (transactions
initiated by an external host) check out the Serial Data Interface Controller (SDI).
|
The SPI module provides a single control register CTRL to configure the module and to check its status
and a single data register DATA for receiving/transmitting data.
Theory of Operation
The SPI module is enabled by setting the SPI_CTRL_EN bit in the CTRL control register. No transfer can be initiated
and no interrupt request will be triggered if this bit is cleared. Clearing this bit will reset the entire module, clear
the FIFO and terminate any transfer in progress.
The actual SPI transfer (receiving one byte while sending one byte) as well as control of the chip-select lines is handled
via the module’s DATA register. Note that this register will access the TX FIFO of the ring-buffer when writing and will
access the RX FIFO of the ring-buffer when reading.
The most significant bit of the DATA register (SPI_DATA_CMD) is used to select the purpose of the data being written.
When SPI_DATA_CMD is cleared, the lowest 8 bits represent the actual SPI TX data. This data will be transmitted by the
SPI bus engine. After completion, the received data is stored to the RX FIFO.
If SPI_DATA_CMD is set, the lowest 4 bits control the chip-select lines. In this case, bits 2:0 select one of the eight
chip-select lines. The selected line will become enabled when bit 3 is also set. If bit 3 is cleared, all chip-select
lines will be disabled.
Examples:
- Enable chip-select line 3: NEORV32_SPI->DATA = (1 << SPI_DATA_CMD) | (1 << 3) | 3;
- Enable chip-select line 7: NEORV32_SPI->DATA = (1 << SPI_DATA_CMD) | (1 << 3) | 7;
- Disable all chip-select lines: NEORV32_SPI->DATA = (1 << SPI_DATA_CMD) | (0 << 3);
- Send data byte 0xAB: NEORV32_SPI->DATA = (0 << SPI_DATA_CMD) | 0xAB;
Since all SPI operations are controlled via the FIFO, entire SPI sequences (chip-enable, data transmissions, chip-disable) can be "programmed". Thus, SPI operations can be executed without any CPU interaction at all.
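A "programmed" sequence can be prepared as a plain array of DATA words before it is pushed into the FIFO. The sketch below builds such an array (chip-enable, payload bytes, chip-disable). The helper names are hypothetical; SPI_DATA_CMD is set to 31 here because the text describes it as the most significant bit of the DATA register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPI_DATA_CMD 31 /* MSB of DATA: 1 = chip-select command, 0 = TX data */

/* Command word: enable chip-select line 0..7 (bit 3 = enable, bits 2:0 = line). */
uint32_t spi_cmd_cs_enable(unsigned line)
{
    assert(line < 8);
    return (1u << SPI_DATA_CMD) | (1u << 3) | (uint32_t)line;
}

/* Command word: disable all chip-select lines (bit 3 cleared). */
uint32_t spi_cmd_cs_disable(void)
{
    return 1u << SPI_DATA_CMD;
}

/* Data word: transmit a single byte. */
uint32_t spi_cmd_data(uint8_t byte)
{
    return (uint32_t)byte;
}

/* Build "chip-enable, n data bytes, chip-disable" into seq.
 * Returns the number of words written, or 0 if the buffer is too small. */
size_t spi_build_sequence(uint32_t *seq, size_t cap, unsigned line,
                          const uint8_t *tx, size_t n)
{
    if (cap < n + 2) {
        return 0;
    }
    size_t k = 0;
    seq[k++] = spi_cmd_cs_enable(line);
    for (size_t i = 0; i < n; i++) {
        seq[k++] = spi_cmd_data(tx[i]);
    }
    seq[k++] = spi_cmd_cs_disable();
    return k;
}
```

Application code would then write the resulting words to the DATA register one after another, waiting whenever the TX FIFO is full.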
Application software can check if any chip-select is enabled by reading the control register’s SPI_CS_ACTIVE flag.
SPI Clock Configuration
The SPI module supports all standard SPI clock modes (0, 1, 2, 3), which are configured via the two control register bits
SPI_CTRL_CPHA and SPI_CTRL_CPOL. The SPI_CTRL_CPHA bit defines the clock phase and the SPI_CTRL_CPOL
bit defines the clock polarity.
The SPI clock frequency (spi_clk_o) is programmed by the 3-bit SPI_CTRL_PRSCx clock prescaler for a coarse clock selection
and a 4-bit clock divider SPI_CTRL_CDIVx for a fine clock configuration.
The following clock prescalers (SPI_CTRL_PRSCx) are available:
SPI_CTRL_PRSCx | 0b000 | 0b001 | 0b010 | 0b011 | 0b100 | 0b101 | 0b110 | 0b111 |
---|---|---|---|---|---|---|---|---|
Resulting clock_prescaler | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 |
Based on the programmed clock configuration, the actual SPI clock frequency fSPI is derived from the processor’s main clock fmain according to the following equation:
fSPI = fmain[Hz] / (2 * clock_prescaler * (1 + SPI_CTRL_CDIVx))
Hence, the maximum SPI clock is fmain / 4 and the lowest SPI clock is fmain / 131072. The SPI clock is always symmetric having a duty cycle of exactly 50%.
High-Speed Mode
The SPI provides a high-speed mode to further boost the maximum SPI clock frequency. When enabled via the control
register’s SPI_CTRL_HIGHSPEED bit the clock prescaler configuration (SPI_CTRL_PRSCx bits) is overridden, setting it
to a minimal factor of 1. However, the clock speed can still be fine-tuned using the SPI_CTRL_CDIVx bits.
fSPI = fmain[Hz] / (2 * 1 * (1 + SPI_CTRL_CDIVx))
Hence, the maximum SPI clock is fmain / 2 when in high-speed mode.
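Both clock equations (normal and high-speed mode) can be checked with a few lines of code. The function below is a hypothetical helper for such calculations, not part of the NEORV32 driver; it uses integer division, so results are rounded down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Prescaler values selected by SPI_CTRL_PRSCx = 0b000 .. 0b111. */
static const uint16_t CLOCK_PRESCALER[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* f_SPI = f_main / (2 * clock_prescaler * (1 + cdiv));
 * in high-speed mode the prescaler is overridden with 1. */
uint32_t spi_clock_hz(uint32_t f_main_hz, unsigned prsc_sel, unsigned cdiv, bool highspeed)
{
    assert(prsc_sel < 8 && cdiv < 16);
    uint32_t prescaler = highspeed ? 1u : CLOCK_PRESCALER[prsc_sel];
    return f_main_hz / (2u * prescaler * (1u + cdiv));
}
```

For a 100 MHz main clock this gives 25 MHz for the fastest regular setting, 50 MHz in high-speed mode and about 762 Hz for the slowest setting.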
SPI Interrupt
The SPI module provides a set of programmable interrupt conditions based on the level of the RX/TX FIFO. The different
interrupt sources are enabled by setting the corresponding SPI_CTRL_IRQ_* bits of the control register. All enabled interrupt
conditions are logically OR-ed, so any enabled interrupt source will trigger the module’s interrupt signal.
Once the SPI interrupt has fired it remains pending until the actual cause of the interrupt is resolved; for
example if just the SPI_CTRL_IRQ_RX_AVAIL bit is set, the interrupt will keep firing until the RX FIFO is empty again.
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
SPI module enable |
|
r/w |
clock phase |
||
|
r/w |
clock polarity |
||
|
r/w |
3-bit clock prescaler select |
||
|
r/w |
4-bit clock divider for fine-tuning |
||
|
r/w |
high-speed mode enable (overriding |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
RX FIFO data available (RX FIFO not empty) |
||
|
r/- |
TX FIFO empty |
||
|
r/- |
TX FIFO not at least half full |
||
|
r/- |
TX FIFO full |
||
|
r/w |
Trigger IRQ if RX FIFO not empty |
||
|
r/w |
Trigger IRQ if TX FIFO empty |
||
|
r/w |
Trigger IRQ if TX FIFO not at least half full |
||
|
r/w |
Trigger IRQ if TX FIFO is empty and SPI bus engine is idle |
||
|
r/- |
FIFO depth; log2( |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
Set if any chip-select line is active |
||
|
r/- |
SPI module busy when set (serial engine operation in progress and TX FIFO not empty yet) |
||
|
|
|
r/w |
receive/transmit data (FIFO) |
|
r/- |
reserved, read as zero |
||
|
-/w |
data ( |
2.8.16. Serial Data Interface Controller (SDI)
Hardware source files: |
neorv32_sdi.vhd |
|
Software driver files: |
neorv32_sdi.c |
|
neorv32_sdi.h |
||
Top entity ports: |
|
1-bit serial clock input |
|
1-bit serial data output |
|
|
1-bit serial data input |
|
|
1-bit chip-select input (low-active) |
|
Configuration generics: |
|
implement SDI controller when |
|
data FIFO size, has to be a power of two, min 1 |
|
CPU interrupts: |
fast IRQ channel 11 |
configurable SDI interrupt (see Processor Interrupts) |
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The serial data interface module provides a device-class SPI interface that allows connecting the processor
to an external SPI host, which is responsible for performing the actual transmission - the SDI is entirely
passive. An optional receive/transmit ring buffer (FIFOs) can be configured via the IO_SDI_FIFO generic to
support block-based transmissions without CPU interaction.
Device-Mode Only
The NEORV32 SDI module only supports device mode. Transmissions are initiated by an external host and not by
the processor itself. If you are looking for a host-mode serial peripheral interface (transactions
performed by the NEORV32) check out the Serial Peripheral Interface Controller (SPI).
|
The SDI module provides a single control register CTRL to configure the module and to check its status
and a single data register DATA for receiving/transmitting data. Any access to the DATA register
actually accesses the internal ring buffer.
Theory of Operation
The SDI module is enabled by setting the SDI_CTRL_EN bit in the CTRL control register. Clearing this bit
resets the entire module and will also clear the entire RX/TX ring buffer.
The SDI operates on byte-level only. Data written to the DATA register will be pushed to the TX FIFO. Received
data can be retrieved by reading the RX FIFO via the DATA register. The current state of these FIFOs is available
via the control register’s SDI_CTRL_RX_* and SDI_CTRL_TX_* flags. If no data is available in the TX FIFO while
an external device performs a transmission the external device will read all-zero from the SDI controller.
Application software can check the current state of the SDI chip-select input via the control register’s
SDI_CTRL_CS_ACTIVE flag (the flag is set when the chip-select line is active, i.e. pulled low).
MSB-first Only
The NEORV32 SDI module only supports MSB-first mode.
|
In-Transmission Abort
If the external SPI controller aborts the transmission by setting the chip-select signal high again before
8 data bits have been transferred, no data is written to the RX FIFO.
|
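The MSB-first reception and the abort behavior can be pictured with a small behavioral model. This is a software sketch only and makes no claim about the actual hardware implementation; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of the SDI receive path. */
typedef struct {
    uint8_t  shift; /* bits received so far */
    unsigned count; /* number of bits received in the current byte */
} sdi_rx_model_t;

/* Shift in one bit (MSB first). Returns true and stores the byte in *out
 * once 8 bits are complete (this is where the RX FIFO would be written). */
bool sdi_model_clock_bit(sdi_rx_model_t *m, unsigned bit, uint8_t *out)
{
    m->shift = (uint8_t)((m->shift << 1) | (bit & 1u));
    m->count++;
    if (m->count == 8) {
        *out = m->shift;
        m->shift = 0;
        m->count = 0;
        return true;
    }
    return false;
}

/* Chip-select released by the host: an incomplete byte is discarded. */
void sdi_model_cs_release(sdi_rx_model_t *m)
{
    m->shift = 0;
    m->count = 0;
}
```

In this model a transfer that is aborted after three bits leaves no trace: the next complete byte is received exactly as sent.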
SDI Clocking
The SDI module supports both SPI clock polarity modes ("CPOL"), but only the "CPHA=0" clock phase is officially supported so far. However, experiments have shown that the SDI module can also handle both clock phase modes at slow SDI clock speeds.
All SDI operations are clocked by the external sdi_clk_i signal. This signal is synchronized to the processor’s
clock domain to simplify timing behavior. This clock synchronization requires that the external SDI clock
(sdi_clk_i) does not exceed 1/4 of the processor’s main clock.
SDI Interrupt
The SDI module provides a set of programmable interrupt conditions based on the level of the RX & TX FIFOs. The different
interrupt sources are enabled by setting the corresponding SDI_CTRL_IRQ_* bits of the control register. All enabled interrupt
conditions are logically OR-ed, so any enabled interrupt source will trigger the module’s interrupt signal.
Once the SDI interrupt has fired it will remain active until the actual cause of the interrupt is resolved; for
example if just the SDI_CTRL_IRQ_RX_AVAIL bit is set, the interrupt will keep firing until the RX FIFO is empty again.
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
SDI module enable |
|
r/- |
reserved, read as zero |
||
|
r/- |
FIFO depth; log2(IO_SDI_FIFO) |
||
|
r/- |
reserved, read as zero |
||
|
r/w |
fire interrupt if RX FIFO is not empty |
||
|
r/w |
fire interrupt if RX FIFO is at least half full |
||
|
r/w |
fire interrupt if RX FIFO is full |
||
|
r/w |
fire interrupt if TX FIFO is empty |
||
|
r/w |
fire interrupt if TX FIFO is not at least half full |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
RX FIFO data available (RX FIFO not empty) |
||
|
r/- |
RX FIFO at least half full |
||
|
r/- |
RX FIFO full |
||
|
r/- |
TX FIFO empty |
||
|
r/- |
TX FIFO not at least half full |
||
|
r/- |
TX FIFO full |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
Chip-select is active when set |
||
|
|
|
r/w |
receive/transmit data (FIFO) |
|
r/- |
reserved, read as zero |
2.8.17. Two-Wire Serial Interface Controller (TWI)
Hardware source files: |
neorv32_twi.vhd |
|
Software driver files: |
neorv32_twi.c |
|
neorv32_twi.h |
||
Top entity ports: |
|
1-bit serial data line sense input |
|
1-bit serial data line output (pull low only) |
|
|
1-bit serial clock line sense input |
|
|
1-bit serial clock line output (pull low only) |
|
Configuration generics: |
|
implement TWI controller when |
|
FIFO depth, has to be a power of two, min 1 |
|
CPU interrupts: |
fast IRQ channel 7 |
FIFO empty and module idle interrupt (see Processor Interrupts) |
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The NEORV32 TWI implements an I2C-compatible host controller to communicate with arbitrary I2C-devices. Note that peripheral-mode (controller acts as a device) and multi-controller mode are not supported yet.
The TWI controller provides two memory-mapped registers that are used for configuring the module and
for triggering operation: CTRL is the control and status register, DCMD is the command and data register.
Key features:
- Programmable clock speed
- Optional clock stretching
- Generate START / repeated-START and STOP conditions
- Sending & receiving 8 data bits including ACK/NACK
- Generating a host-ACK (ACK sent by the TWI controller)
- Configurable data/command FIFO to "program" large TWI sequences without further involvement of the CPU
Tristate Drivers
The TWI module requires two tristate drivers (actually: open-drain drivers; signals can only be actively driven low) for
the SDA and SCL lines, which have to be implemented by the user in the setup’s top module / IO ring. A generic VHDL example
is shown below (here, sda_io and scl_io are the actual TWI bus lines, which are of type std_logic).
sda_io <= '0' when (twi_sda_o = '0') else 'Z'; -- drive
scl_io <= '0' when (twi_scl_o = '0') else 'Z'; -- drive
twi_sda_i <= std_ulogic(sda_io); -- sense
twi_scl_i <= std_ulogic(scl_io); -- sense
TWI Clock Speed
The TWI clock frequency is programmed by two bit-fields in the device’s control register CTRL: a 3-bit TWI_CTRL_PRSCx
clock prescaler is used for a coarse clock configuration and a 4-bit clock divider TWI_CTRL_CDIVx is used for a fine
clock configuration.
TWI_CTRL_PRSCx | 0b000 | 0b001 | 0b010 | 0b011 | 0b100 | 0b101 | 0b110 | 0b111 |
---|---|---|---|---|---|---|---|---|
Resulting clock_prescaler | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 |
Based on the clock configuration, the actual TWI clock frequency fSCL is derived from the processor’s main clock fmain according to the following equation:
fSCL = fmain[Hz] / (4 * clock_prescaler * (1 + TWI_CTRL_CDIVx))
Hence, the maximum TWI clock is fmain / 8 and the lowest TWI clock is fmain / 262144. The generated TWI clock is always symmetric having a duty cycle of exactly 50%.
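The equation can be evaluated with a small helper to pick a suitable configuration. The function name is hypothetical and the result is rounded down by the integer division.

```c
#include <assert.h>
#include <stdint.h>

/* Prescaler values selected by TWI_CTRL_PRSCx = 0b000 .. 0b111. */
static const uint16_t CLOCK_PRESCALER[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* f_SCL = f_main / (4 * clock_prescaler * (1 + cdiv)) */
uint32_t twi_clock_hz(uint32_t f_main_hz, unsigned prsc_sel, unsigned cdiv)
{
    assert(prsc_sel < 8 && cdiv < 16);
    return f_main_hz / (4u * CLOCK_PRESCALER[prsc_sel] * (1u + cdiv));
}
```

With a 100 MHz main clock, for example, prescaler 64 (select value 0b011) and a clock divider of 3 result in roughly 97.7 kHz, which is just below the common 100 kHz standard-mode rate.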
Clock Stretching
An accessed peripheral can slow down/halt the controller’s bus clock by using clock stretching (= actively keeping the
SCL line low). The controller will halt operation in this case. Clock stretching is enabled by setting the
TWI_CTRL_CLKSTR bit in the module’s control register CTRL.
|
TWI Transfers
The TWI is enabled via the TWI_CTRL_EN bit in the CTRL control register. All TWI operations are controlled by
the DCMD register. The actual operation is selected by a 2-bit value that is written to the register’s TWI_DCMD_CMD_* bit-field:
- 00: NOP (no-operation); all further bit-fields in DCMD are ignored
- 01: Generate a (repeated) START condition; all further bit-fields in DCMD are ignored
- 10: Generate a STOP condition; all further bit-fields in DCMD are ignored
- 11: Trigger a data transmission; the data to be sent has to be written to the register’s TWI_DCMD_MSB : TWI_DCMD_LSB bit-field; if TWI_DCMD_ACK is set the controller will send a host-ACK in the ACK/NACK time slot; after the transmission is completed TWI_DCMD_MSB : TWI_DCMD_LSB contains the RX data and TWI_DCMD_ACK the device’s response if no host-ACK was configured (0 = ACK, 1 = NACK)
All operations/data written to the DCMD register are buffered by a configurable data/command FIFO. The depth of the FIFO is
configured by the IO_TWI_FIFO top generic. Software can retrieve this size by reading the control register’s TWI_CTRL_FIFO bits.
The command/data FIFO is internally split into a TX FIFO and an RX FIFO. Writing to DCMD will write to the TX FIFO while reading from
DCMD will read from the RX FIFO. The TX FIFO is full when the TWI_CTRL_TX_FULL flag is set. Accordingly, the RX FIFO contains valid
data when the TWI_CTRL_RX_AVAIL flag is set.
The control register’s busy flag TWI_CTRL_BUSY is set as long as the TX FIFO contains valid data (i.e. programmed TWI operations
that have not been executed yet) or if the TWI bus engine is still processing an operation.
An active transmission can be terminated at any time by disabling the TWI module. This will also clear the data/command FIFO. |
When reading data from a device, an all-one byte (0xFF) has to be written to the data field of the DCMD register
so the accessed device can actively pull-down SDA when required.
|
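A complete register read can be prepared as a list of DCMD words. The sketch below builds the sequence START, address+write, register index, repeated START, address+read, one data read, STOP. The bit positions used here (data in bits 7:0, ACK in bit 8, command in bits 10:9) are assumptions for illustration and should be taken from neorv32_twi.h in a real project; the helper names are hypothetical as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED bit positions inside DCMD (not given in this section). */
#define TWI_DCMD_ACK_BIT 8 /* host-ACK request */
#define TWI_DCMD_CMD_LSB 9 /* 2-bit command field starts here */

/* Command encodings as listed in the text. */
enum { TWI_CMD_NOP = 0, TWI_CMD_START = 1, TWI_CMD_STOP = 2, TWI_CMD_RTX = 3 };

uint32_t twi_cmd_start(void) { return (uint32_t)TWI_CMD_START << TWI_DCMD_CMD_LSB; }
uint32_t twi_cmd_stop(void)  { return (uint32_t)TWI_CMD_STOP << TWI_DCMD_CMD_LSB; }

/* Send a byte; no host-ACK, so the device's ACK/NACK is reported back. */
uint32_t twi_cmd_write(uint8_t byte)
{
    return ((uint32_t)TWI_CMD_RTX << TWI_DCMD_CMD_LSB) | byte;
}

/* Read a byte: transmit 0xFF so the device can pull SDA low;
 * optionally acknowledge the byte with a host-ACK. */
uint32_t twi_cmd_read(bool host_ack)
{
    return ((uint32_t)TWI_CMD_RTX << TWI_DCMD_CMD_LSB) |
           (host_ack ? (1u << TWI_DCMD_ACK_BIT) : 0u) | 0xFFu;
}

/* Build the DCMD sequence for reading one register of a 7-bit-addressed device.
 * seq must provide room for 7 words; returns the number of words written. */
size_t twi_build_reg_read(uint32_t seq[7], uint8_t addr7, uint8_t reg)
{
    assert(addr7 < 0x80);
    size_t k = 0;
    seq[k++] = twi_cmd_start();
    seq[k++] = twi_cmd_write((uint8_t)(addr7 << 1));        /* address + write */
    seq[k++] = twi_cmd_write(reg);                          /* register index */
    seq[k++] = twi_cmd_start();                             /* repeated START */
    seq[k++] = twi_cmd_write((uint8_t)((addr7 << 1) | 1u)); /* address + read */
    seq[k++] = twi_cmd_read(false);                         /* last byte: no host-ACK */
    seq[k++] = twi_cmd_stop();
    return k;
}
```

Such a sequence can be written to DCMD word by word as long as the TX FIFO is not full; the received byte and the device responses are then read back from DCMD once the RX FIFO reports available data.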
TWI Interrupt
The TWI module provides a single interrupt to signal "idle condition" to the CPU. The interrupt becomes active when the
TWI module is enabled (TWI_CTRL_EN = 1) and the TX FIFO is empty and the TWI bus engine is idle.
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
TWI enable, reset if cleared |
|
r/w |
3-bit clock prescaler select |
||
|
r/w |
4-bit clock divider |
||
|
r/w |
Enable (allow) clock stretching |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
FIFO depth; log2( |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
set if the TWI bus is claimed by any controller |
||
|
r/- |
RX FIFO data available |
||
|
r/- |
TWI bus engine busy or TX FIFO not empty |
||
|
|
|
r/w |
RX/TX data byte |
|
r/w |
write: ACK bit sent by controller; read: |
||
|
r/w |
TWI operation ( |
2.8.18. One-Wire Serial Interface Controller (ONEWIRE)
Hardware source files: |
neorv32_onewire.vhd |
|
Software driver files: |
neorv32_onewire.c |
|
neorv32_onewire.h |
||
Top entity ports: |
|
1-bit 1-wire bus sense input |
|
1-bit 1-wire bus output (pull low only) |
|
Configuration generics: |
|
implement ONEWIRE interface controller when |
CPU interrupts: |
fast IRQ channel 13 |
operation done interrupt (see Processor Interrupts) |
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The NEORV32 ONEWIRE module implements a single-wire interface controller that is compatible with the
Dallas/Maxim 1-Wire protocol, which is an asynchronous half-duplex bus requiring only a single signal wire
connected to onewire_io (plus ground).
The bus is based on a single open-drain signal. The controller and all the devices can only pull-down the bus actively. Hence, an external pull-up resistor is required. Recommended values are between 1kΩ and 4kΩ depending on the bus characteristics (wire length, number of devices, etc.). Furthermore, a series resistor (~100Ω) at the controller side is recommended to control the slew rate and to reduce signal reflections. Also, additional external ESD protection clamp diodes should be added to the bus line.
Tri-State Drivers
The ONEWIRE module requires a tri-state driver (actually, open-drain) for the 1-wire bus line, which has to be implemented
in the top module of the setup. A generic VHDL example is given below (onewire
is the actual 1-wire
bus signal, which is of type std_logic
).
onewire <= '0' when (onewire_o = '0') else 'Z'; -- drive
onewire_i <= std_ulogic(onewire); -- sense
Theory of Operation
The ONEWIRE controller provides two interface registers: CTRL
and DATA.
The control register (CTRL
)
is used to configure the module, to trigger bus transactions and to monitor the current state of the module.
The DATA
register is used to read/write data from/to the bus.
The module is enabled by setting the ONEWIRE_CTRL_EN
bit in the control register. If this bit is cleared, the
module is automatically reset and the bus is brought to high-level (due to the external pull-up resistor).
The basic timing configuration is programmed via the clock prescaler bits ONEWIRE_CTRL_PRSCx
and the
clock divider bits ONEWIRE_CTRL_CLKDIVx
(see next section).
The controller can execute three basic bus operations, which are triggered by setting one out of three specific control register bits (the bits auto-clear):
-
generate reset pulse and check for device presence; triggered when setting
ONEWIRE_CTRL_TRIG_RST
-
transfer a single-bit (read-while-write); triggered when setting
ONEWIRE_CTRL_TRIG_BIT
-
transfer a full-byte (read-while-write); triggered when setting
ONEWIRE_CTRL_TRIG_BYTE
Only one trigger bit may be set at once, otherwise undefined behavior might occur. |
When a single-bit operation has been triggered, the data previously written to DATA[0]
will be sent to the bus
and DATA[7]
will be sampled from the bus. Accordingly, a full-byte transmission will send the byte
previously written to DATA[7:0]
to the bus and will update DATA[7:0]
with the data read from the bus (LSB-first).
The triggered operation has completed when the module’s busy flag ONEWIRE_CTRL_BUSY
has cleared again.
Read from Bus
In order to read a single bit from the bus DATA[0] has to be set to 1 before triggering the bit transmission
operation to allow the accessed device to pull-down the bus. Accordingly, DATA has to be set to 0xFF before
triggering the byte transmission operation when the controller shall read a byte from the bus.
|
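The read-while-write behavior, and the reason for writing 1 bits before a read, can be illustrated with a small software model. This is only a behavioral sketch: the function names are hypothetical, bus timing is ignored entirely, and the right-shifting data register is an assumption chosen to match the DATA[0] / DATA[7] description above (it is not taken from the VHDL source).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical behavioral model of one ONEWIRE bit slot (no timing).
 * data       : current content of the DATA register
 * device_bit : level a device would put on the bus (0 = pull low, 1 = release)
 * Returns the modeled DATA content after the slot. */
uint8_t onewire_model_bit(uint8_t data, unsigned device_bit) {
    assert(device_bit <= 1u);
    unsigned tx  = data & 1u;        /* DATA[0] is sent to the bus */
    unsigned bus = tx & device_bit;  /* open-drain: any participant can pull low */
    /* ASSUMPTION: the register shifts right and the sampled bit enters at DATA[7] */
    return (uint8_t)((data >> 1) | (bus << 7));
}

/* Full-byte operation modeled as eight consecutive bit slots (LSB-first). */
uint8_t onewire_model_byte(uint8_t data, uint8_t device_byte) {
    for (int i = 0; i < 8; i++) {
        data = onewire_model_bit(data, (device_byte >> i) & 1u);
    }
    return data;
}
```

In this model, `onewire_model_byte(0xFF, x)` returns `x`, whereas `onewire_model_byte(0x00, x)` always returns zero because the controller itself keeps the bus low in every slot.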
The ONEWIRE_CTRL_PRESENCE
bit gets set if at least one device has sent a "presence" signal right after the
reset pulse.
Bus Timing
The control register provides a 2-bit clock prescaler select (ONEWIRE_CTRL_PRSCx
) and an 8-bit clock divider
(ONEWIRE_CTRL_CLKDIVx
) for timing configuration. Both are used to define the elementary base time Tbase.
All bus operations are timed using multiples of this elementary base time.
ONEWIRE_CTRL_PRSCx | 0b00 | 0b01 | 0b10 | 0b11 |
---|---|---|---|---|
Resulting clock_prescaler | 2 | 4 | 8 | 64 |
Together with the clock divider value (ONEWIRE_CTRL_CLKDIVx
bits = clock_divider
) the base time is defined by the
following formula:
Tbase = (1 / fmain[Hz]) * clock_prescaler * (clock_divider + 1)
Example:
-
fmain = 100MHz
-
clock prescaler select = 0b01 → clock_prescaler = 4
-
clock divider: clock_divider = 249
Tbase = (1 / 100000000Hz) * 4 * (249 + 1) = 10000ns = 10µs
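The base time formula can also be evaluated in software, for example to check a candidate configuration before writing it to the control register. The following is a minimal sketch; the function name is hypothetical and not part of the NEORV32 software library.

```c
#include <assert.h>
#include <stdint.h>

/* Base time T_base in nanoseconds (truncated to whole ns).
 * prsc_sel      : 2-bit value of ONEWIRE_CTRL_PRSCx
 * clock_divider : 8-bit value of ONEWIRE_CTRL_CLKDIVx */
uint64_t onewire_tbase_ns(uint32_t f_main_hz, unsigned prsc_sel, unsigned clock_divider) {
    static const uint32_t prescaler[4] = {2, 4, 8, 64};  /* see prescaler table */
    assert(f_main_hz > 0 && prsc_sel < 4 && clock_divider < 256);
    uint64_t cycles = (uint64_t)prescaler[prsc_sel] * (clock_divider + 1u);
    return (cycles * 1000000000ull) / f_main_hz;
}
```

For the example above, `onewire_tbase_ns(100000000, 1, 249)` evaluates to 10000 (ns).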
The base time is used to coordinate all bus interactions. Hence, all delays, time slots and points in time are quantized as multiples of the base time. The following images show the two basic operations of the ONEWIRE controller: single-bit (0 or 1) transaction and reset with presence detect. The relevant points in time are shown as absolute time (in multiples of the time base) with the bus' falling edge as reference point.
Single-bit data transmission (not to scale) |
Reset pulse and presence detect (not to scale) |
Symbol | Description | Multiples of Tbase | Time when Tbase = 10µs |
---|---|---|---|
Single-bit data transmission |||
t0 | Time until end of active low-phase when writing a 1 | 1 | 10µs |
t1 | Time until controller samples bus state (read operation) | 2 | 20µs |
t2 | Time until end of bit time slot (when writing a 0) | 7 | 70µs |
t3 | Time until end of inter-slot pause (= total duration of one bit) | 9 | 90µs |
Reset pulse and presence detect |||
t4 | Time until end of active reset pulse | 48 | 480µs |
t5 | Time until controller samples bus presence | 55 | 550µs |
t6 | Time until end of presence phase | 96 | 960µs |
The default values for base time multiples were chosen for stable and reliable bus operation (not for maximum throughput). |
The absolute points in time are hardwired in the VHDL code and cannot be changed during runtime. However, the timing parameters can be customized by editing the ONEWIRE’s VHDL source file:
neorv32_onewire.vhd
-- timing configuration (absolute time in multiples of the base tick time t_base) --
constant t_write_one_c : unsigned(6 downto 0) := to_unsigned( 1, 7); -- t0
constant t_read_sample_c : unsigned(6 downto 0) := to_unsigned( 2, 7); -- t1
constant t_slot_end_c : unsigned(6 downto 0) := to_unsigned( 7, 7); -- t2
constant t_pause_end_c : unsigned(6 downto 0) := to_unsigned( 9, 7); -- t3
constant t_reset_end_c : unsigned(6 downto 0) := to_unsigned(48, 7); -- t4
constant t_presence_sample_c : unsigned(6 downto 0) := to_unsigned(55, 7); -- t5
constant t_presence_end_c : unsigned(6 downto 0) := to_unsigned(96, 7); -- t6
Overdrive
The ONEWIRE controller does not support the overdrive mode. However, it can be implemented by reducing the base
time Tbase (and, if necessary, by changing the hardwired timing configuration in the VHDL source file).
|
Interrupt
A single interrupt is provided by the ONEWIRE module to signal "idle" condition to the CPU. Whenever the controller is idle (again) the interrupt becomes active.
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
ONEWIRE enable, reset if cleared |
|
r/w |
2-bit clock prescaler select |
||
|
r/w |
8-bit clock divider value |
||
|
-/w |
trigger reset pulse, auto-clears |
||
|
-/w |
trigger single bit transmission, auto-clears |
||
|
-/w |
trigger full-byte transmission, auto-clears |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
current state of the bus line |
||
|
r/- |
device presence detected after reset pulse |
||
|
r/- |
operation in progress when set |
||
|
|
|
r/w |
receive/transmit data (8-bit) |
2.8.19. Pulse-Width Modulation Controller (PWM)
Hardware source files: |
neorv32_pwm.vhd |
|
Software driver files: |
neorv32_pwm.c |
|
neorv32_pwm.h |
||
Top entity ports: |
|
PWM output channels (16-bit) |
Configuration generics: |
|
number of PWM channels to implement (0..16) |
CPU interrupts: |
none |
|
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The PWM module implements a pulse-width modulation controller with up to 16 independent channels. Duty cycle and
carrier frequency can be programmed individually for each channel. The total number of implemented channels is
defined by the IO_PWM_NUM_CH
generic. The PWM output signal pwm_o
has a static size of 16-bit. Channel 0
corresponds to bit 0, channel 1 to bit 1 and so on. If less than 16 channels are configured, only the LSB-aligned
channel bits are connected while the remaining ones are hardwired to zero.
Theory of Operation
The PWM module provides 16 configuration registers CHANNEL_CFG[0]
to
CHANNEL_CFG[15]
- one for each channel. Regardless of the configuration of IO_PWM_NUM_CH
all channel registers can
be accessed without raising an exception. However, registers above IO_PWM_NUM_CH-1
are read-only and hardwired to
all-zero.
Each configuration register provides a 1-bit enable flag to enable/disable the corresponding channel, an 8-bit field for setting the duty cycle and a 3-bit clock prescaler select as well as a 10-bit clock divider for coarse and fine tuning of the carrier frequency, respectively.
A channel is enabled by setting the PWM_CFG_EN
bit. If this bit is cleared the according PWM output is set to zero.
The duty cycle is programmed via the 8 PWM_CFG_DUTY
bits. Based on the value programmed to these bits, the resulting duty cycle
of the corresponding channel can be computed by the following formula:
Duty Cycle[%] = 100 * PWM_CFG_DUTY / 2^8
The PWM period (carrier frequency) is derived from the processor’s main clock (fmain). The PWM_CFG_PRSC
register
bits allow to select one out of eight pre-defined clock prescalers for a coarse clock scaling. The 10 PWM_CFG_CDIV
register
bits can be used to apply another fine clock scaling.
PWM_CFG_PRSC | 0b000 | 0b001 | 0b010 | 0b011 | 0b100 | 0b101 | 0b110 | 0b111 |
---|---|---|---|---|---|---|---|---|
Resulting clock_prescaler | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 |
The resulting PWM carrier frequency is defined by:
fPWM[Hz] = fmain[Hz] / (2^8 * clock_prescaler * (1 + PWM_CFG_CDIV))
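The duty cycle and carrier frequency relations can be evaluated with a few lines of C, which may help when choosing a prescaler/divider pair. This is a sketch only; both function names are hypothetical and not part of the NEORV32 software library.

```c
#include <assert.h>
#include <stdint.h>

/* PWM carrier frequency in Hz (integer result, truncated).
 * prsc_sel : 3-bit PWM_CFG_PRSC value, cdiv : 10-bit PWM_CFG_CDIV value */
uint32_t pwm_carrier_hz(uint32_t f_main_hz, unsigned prsc_sel, unsigned cdiv) {
    static const uint32_t prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};
    assert(prsc_sel < 8 && cdiv < 1024);
    uint64_t divisor = 256ull * prescaler[prsc_sel] * (1u + cdiv);  /* 2^8 * prsc * (1 + cdiv) */
    return (uint32_t)(f_main_hz / divisor);
}

/* Duty cycle in percent for an 8-bit PWM_CFG_DUTY value. */
double pwm_duty_percent(unsigned duty) {
    assert(duty < 256);
    return 100.0 * (double)duty / 256.0;
}
```

With a 100 MHz main clock, prescaler select 0b000 and a divider of 0, the highest carrier frequency is 100 MHz / 512, i.e. about 195.3 kHz. The largest programmable duty value (255) corresponds to roughly 99.6%.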
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
Channel 0: channel enabled when set |
|
r/w |
Channel 0: 3-bit clock prescaler select |
||
|
r/- |
Channel 0: reserved, hardwired to zero |
||
|
r/w |
Channel 0: 10-bit clock divider |
||
|
r/w |
Channel 0: 8-bit duty cycle |
||
|
|
… |
r/w |
Channels 1 to 14 |
|
|
|
r/w |
Channel 15: channel enabled when set |
|
r/w |
Channel 15: 3-bit clock prescaler select |
||
|
r/- |
Channel 15: reserved, hardwired to zero |
||
|
r/w |
Channel 15: 10-bit clock divider |
||
|
r/w |
Channel 15: 8-bit duty cycle |
2.8.20. True Random-Number Generator (TRNG)
Hardware source files: |
neorv32_trng.vhd |
|
Software driver files: |
neorv32_trng.c |
|
neorv32_trng.h |
||
Top entity ports: |
none |
|
Configuration generics: |
|
implement TRNG when |
|
data FIFO depth, min 1, has to be a power of two |
|
CPU interrupts: |
fast IRQ channel 0 |
TRNG data available interrupt (see Processor Interrupts) |
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The NEORV32 true random number generator provides physically true random numbers. It is based on free-running ring-oscillators that generate phase noise when being sampled by a constant clock. This phase noise is used as physical entropy source. The TRNG features a platform independent architecture without FPGA-specific primitives, macros or attributes so it can be synthesized for any FPGA.
In-Depth Documentation
For more information about the neoTRNG architecture and an analysis of its random quality check out the
neoTRNG repository: https://github.com/stnolting/neoTRNG
|
Inferring Latches
The synthesis tool might emit warnings regarding inferred latches or combinatorial loops. However, this
is not a design flaw as this is exactly what we want. ;)
|
Simulation
When simulating the processor the TRNG is automatically set to "simulation mode". In this mode the physical entropy
sources (the ring oscillators) are replaced by a simple pseudo RNG based on a LFSR providing only
deterministic pseudo-random data. The TRNG_CTRL_SIM_MODE flag of the control register is set if simulation
mode is active.
|
Theory of Operation
The TRNG features a single control register CTRL
for control, status check and data access. When the TRNG_CTRL_EN
bit is set, the TRNG is enabled and starts operation. As soon as the TRNG_CTRL_VALID
bit is set a new random data byte
is available and can be obtained from the lowest 8 bits of the CTRL
register. If this bit is cleared, there is no
valid data available and the lowest 8 bits of the CTRL
register are set to all-zero.
An internal entropy FIFO can be configured using the IO_TRNG_FIFO
generic. This FIFO automatically samples
new random data from the TRNG to provide some kind of random data pool for applications that require a large amount
of random data in a short time. The random data FIFO can be cleared at any time either by disabling the TRNG or by
setting the TRNG_CTRL_FIFO_CLR
flag. The FIFO depth can be retrieved by software via the TRNG_CTRL_FIFO_*
bits.
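The "check valid flag, then take the lowest 8 bits" access pattern can be sketched as a small decoding helper that operates on a value read from the CTRL register. The helper name is hypothetical, and the bit position of TRNG_CTRL_VALID used here is an assumption for illustration only; the authoritative definition is in neorv32_trng.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: bit position of the TRNG_CTRL_VALID flag (check neorv32_trng.h). */
#define TRNG_CTRL_VALID_BIT 31u

/* Decode one read of the CTRL register.
 * Returns true and stores the random byte if the valid flag is set;
 * returns false and leaves *byte untouched otherwise. */
bool trng_decode(uint32_t ctrl, uint8_t *byte) {
    assert(byte != NULL);
    if (((ctrl >> TRNG_CTRL_VALID_BIT) & 1u) == 0u) {
        return false;                 /* no valid data, low byte reads as zero */
    }
    *byte = (uint8_t)(ctrl & 0xFFu);  /* random data is in the lowest 8 bits */
    return true;
}
```

On the target, such a helper would typically be called in a polling loop on the value read from the TRNG control register until it returns true. The register is read only once per attempt so that the flag and the data byte stem from the same access.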
TRNG Interrupt
As the neoTRNG is a rather slow entropy source, a "data available" interrupt is provided to inform the application
software that new random data is available. This interrupt can be triggered by either of two conditions: trigger the
interrupt if any random data is available (i.e. the data FIFO is not empty; TRNG_CTRL_IRQ_SEL = 0
) or trigger
the interrupt if the random pool is full (i.e. the data FIFO is full; TRNG_CTRL_IRQ_SEL = 1
).
Once the TRNG interrupt has fired it remains pending until the actual cause of the interrupt is resolved.
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/- |
8-bit random data |
|
r/- |
reserved, read as zero |
||
|
r/- |
FIFO depth, log2( |
||
|
r/- |
reserved, read as zero |
||
|
r/w |
interrupt trigger select (0 = data available, 1 = FIFO full) |
||
|
-/w |
flush random data FIFO when set; flag auto-clears |
||
|
r/- |
simulation mode (PRNG!) |
||
|
r/w |
TRNG enable |
||
|
r/- |
random data is valid when set |
2.8.21. Custom Functions Subsystem (CFS)
Hardware source files: |
neorv32_cfs.vhd |
|
Software driver files: |
neorv32_cfs.c |
|
neorv32_cfs.h |
||
Top entity ports: |
|
custom input conduit |
|
custom output conduit |
|
Configuration generics: |
|
implement CFS when |
|
custom generic conduit |
|
|
size of |
|
|
size of |
|
CPU interrupts: |
fast IRQ channel 1 |
CFS interrupt (see Processor Interrupts) |
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The custom functions subsystem is meant for implementing custom tightly-coupled co-processors or interfaces.
It provides up to 64 32-bit memory-mapped read/write registers (REG
, see register map below) that can be
accessed by the CPU via normal load/store operations. The actual functionality of these registers has to be
defined by the hardware designer. Furthermore, the CFS provides two IO conduits to implement custom on-chip
or off-chip interfaces.
Just like any other externally-connected IP, logic implemented within the custom functions subsystem can operate independently of the CPU providing true parallel processing capabilities. Potential use cases might include dedicated hardware accelerators for en-/decryption (AES), signal processing (FFT) or AI applications (CNNs) as well as custom IO systems like fast memory interfaces (DDR) and mass storage (SDIO), networking (CAN) or real-time data transport (I2S).
If you like to implement custom instructions that are executed right within the CPU’s ALU
see the Zxcfu ISA Extension and the according Custom Functions Unit (CFU).
|
Take a look at the template CFS VHDL source file (rtl/core/neorv32_cfs.vhd ). The file is highly
commented to illustrate all aspects that are relevant for implementing custom CFS-based co-processor designs.
|
The CFS can also be used to replicate existing NEORV32 modules - for example to implement several TWI controllers. |
CFS Software Access
The CFS memory-mapped registers can be accessed by software using the provided C-language aliases (see
register map table below). Note that all interface registers are defined as 32-bit words of type uint32_t
.
// C-code CFS usage example
NEORV32_CFS->REG[0] = (uint32_t)some_data_array[i]; // write to CFS register 0
int temp = (int)NEORV32_CFS->REG[20]; // read from CFS register 20
CFS Interrupt
The CFS provides a single high-level-triggered interrupt request signal mapped to the CPU’s fast interrupt channel 1.
CFS Configuration Generic
By default, the CFS provides a single 32-bit std_ulogic_vector
configuration generic IO_CFS_CONFIG
that is available in the processor’s top entity. This generic can be used to pass custom configuration options
from the top entity directly down to the CFS. The actual definition of the generic and its usage inside the
CFS is left to the hardware designer.
CFS Custom IOs
By default, the CFS also provides two unidirectional input and output conduits cfs_in_i
and cfs_out_o
.
These signals are directly propagated to the processor’s top entity. These conduits can be used to implement
application-specific interfaces like memory or peripheral connections. The actual use case of these signals
has to be defined by the hardware designer.
The size of the input signal conduit cfs_in_i
is defined via the top’s IO_CFS_IN_SIZE
configuration
generic (default = 32-bit). The size of the output signal conduit cfs_out_o
is defined via the top’s
IO_CFS_OUT_SIZE
configuration generic (default = 32-bit). If the custom function subsystem is not implemented
(IO_CFS_EN
= false) the cfs_out_o
signal is tied to all-zero.
If the CFS output signals are to be used outside the chip, it is recommended to register these signals.
Register Map
Address | Name [C] | Bit(s) | R/W | Function |
---|---|---|---|---|
|
|
|
(r)/(w) |
custom CFS register 0 |
|
|
|
(r)/(w) |
custom CFS register 1 |
… |
… |
|
(r)/(w) |
… |
|
|
|
(r)/(w) |
custom CFS register 62 |
|
|
|
(r)/(w) |
custom CFS register 63 |
2.8.22. Smart LED Interface (NEOLED)
Hardware source files: |
neorv32_neoled.vhd |
|
Software driver files: |
neorv32_neoled.c |
|
neorv32_neoled.h |
||
Top entity ports: |
|
1-bit serial data output |
Configuration generics: |
|
implement NEOLED controller when |
|
TX FIFO depth, has to be a power of 2, min 1 |
|
CPU interrupts: |
fast IRQ channel 9 |
configurable NEOLED data FIFO interrupt (see Processor Interrupts) |
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The NEOLED module provides a dedicated interface for "smart RGB LEDs" like WS2812, WS2811 or any other compatible LEDs. These LEDs provide a single-wire interface that uses an asynchronous serial protocol for transmitting color data. Using the NEOLED module allows CPU-independent operation of an arbitrary number of smart LEDs. A configurable data buffer (FIFO) allows to utilize block transfer operation without requiring the CPU.
The NEOLED interface is compatible to the "Adafruit Industries NeoPixel™" products, which feature WS2812 (or older WS2811) smart LEDs. Other LEDs might be compatible as well when adjusting the controller’s programmable timing configuration. |
The interface provides a single 1-bit output neoled_o
to drive an arbitrary number of cascaded LEDs. Since the
NEOLED module provides 24-bit and 32-bit operating modes, a mixed setup with RGB LEDs (24-bit color)
and RGBW LEDs (32-bit color including a dedicated white LED chip) is possible.
Theory of Operation
The NEOLED module provides two accessible interface registers: the control register CTRL
and the write-only
TX data register DATA
. The NEOLED module is globally enabled via the control register’s
NEOLED_CTRL_EN
bit. Clearing this bit will terminate any current operation, clear the TX buffer, reset the module
and set the neoled_o
output to zero. The precise timing (e.g. implementing the WS2812 protocol) and transmission
mode are fully programmable via the CTRL
register to provide maximum flexibility.
RGB / RGBW Configuration
NeoPixel™ LEDs are available in two "color" versions: LEDs with three chips providing RGB color and LEDs with four chips providing RGB color plus a dedicated white LED chip (= RGBW). Since the intensity of every LED chip is defined via an 8-bit value, the RGB LEDs require a frame of 24 bits per module and the RGBW LEDs require a frame of 32 bits per module.
The data transfer quantity of the NEOLED module can be programmed via the NEOLED_MODE_EN
control
register bit. If this bit is cleared, the NEOLED interface operates in 24-bit mode and will transmit bits 23:0
of
the data written to DATA
to the LEDs. If NEOLED_MODE_EN
is set, the NEOLED interface operates in 32-bit
mode and will transmit bits 31:0
of the data written to DATA
to the LEDs.
The mode bit can be reconfigured before writing a new data word to DATA
in order to support an arbitrary setup/mixture
of RGB and RGBW LEDs.
Protocol
The interface of the WS2812 LEDs uses an 800kHz carrier signal. Data is transmitted in a serial manner starting with LSB-first. The intensity for each R, G & B (& W) LED chip (= color code) is defined via an 8-bit value. The actual data bits are transferred by modifying the duty cycle of the signal (the timings for the WS2812 are shown below). A RESET command is "sent" by pulling the data line LOW for at least 50μs.
Symbol | Time | Description |
---|---|---|
Ttotal (Tcarrier) | 1.25μs +/- 300ns | period for a single bit |
T0H | 0.4μs +/- 150ns | high-time for sending a 0 |
T0L | 0.8μs +/- 150ns | low-time for sending a 0 |
T1H | 0.85μs +/- 150ns | high-time for sending a 1 |
T1L | 0.45μs +/- 150ns | low-time for sending a 1 |
RESET | Above 50μs | low-time for sending a RESET command |
Timing Configuration
The basic carrier frequency (800kHz for the WS2812 LEDs) is configured via a 3-bit main clock prescaler
(NEOLED_CTRL_PRSC*
, see table below) that scales the main processor clock fmain and a 5-bit cycle
multiplier NEOLED_CTRL_T_TOT_*
.
NEOLED_CTRL_PRSCx | 0b000 | 0b001 | 0b010 | 0b011 | 0b100 | 0b101 | 0b110 | 0b111 |
---|---|---|---|---|---|---|---|---|
Resulting clock_prescaler | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 |
The duty-cycles (or more precisely: the high- and low-times for sending either a '1' bit or a '0' bit) are
defined via the 5-bit NEOLED_CTRL_T_ONE_H_*
and NEOLED_CTRL_T_ZERO_H_*
values, respectively. These programmable
timing constants allow to adapt the interface for a wide variety of smart LED protocols (for example WS2812 vs.
WS2811).
Timing Configuration - Example (WS2812)
Generate the base clock fTX for the NEOLED TX engine:
-
processor clock fmain = 100 MHz
-
NEOLED_CTRL_PRSCx = 0b001 = fmain / 4
fTX = fmain[Hz] / clock_prescaler = 100MHz / 4 = 25MHz
TTX = 1 / fTX = 40ns
Generate carrier period (Tcarrier) and high-times (duty cycle) for sending 0 (T0H) and 1 (T1H) bits:
-
NEOLED_CTRL_T_TOT = 0b11110 (= decimal 30)
-
NEOLED_CTRL_T_ZERO_H = 0b01010 (= decimal 10)
-
NEOLED_CTRL_T_ONE_H = 0b10100 (= decimal 20)
Tcarrier = TTX * NEOLED_CTRL_T_TOT = 40ns * 30 = 1.2µs
T0H = TTX * NEOLED_CTRL_T_ZERO_H = 40ns * 10 = 0.4µs
T1H = TTX * NEOLED_CTRL_T_ONE_H = 40ns * 20 = 0.8µs
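A timing configuration can be sanity-checked against the WS2812 tolerances from the protocol table before it is written to the control register. The sketch below converts tick counts into nanoseconds and checks the period, both high-times and the resulting low-times. All function names are hypothetical and not part of the NEORV32 software library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Duration in ns of `ticks` TX clock ticks, with T_TX = clock_prescaler / f_main. */
uint32_t neoled_ticks_to_ns(uint32_t f_main_hz, uint32_t clock_prescaler, uint32_t ticks) {
    assert(f_main_hz > 0 && ticks < 32);  /* timing fields are 5 bits wide */
    return (uint32_t)(((uint64_t)ticks * clock_prescaler * 1000000000ull) / f_main_hz);
}

/* True if value lies within nominal +/- tol (all in ns). */
bool within_ns(uint32_t value, uint32_t nominal, uint32_t tol) {
    return (value + tol >= nominal) && (value <= nominal + tol);
}

/* Check a configuration against the WS2812 timing table. */
bool neoled_ws2812_timing_ok(uint32_t f_main_hz, uint32_t clock_prescaler,
                             uint32_t t_tot, uint32_t t_zero_h, uint32_t t_one_h) {
    if (t_zero_h > t_tot || t_one_h > t_tot) {
        return false;  /* high-time cannot exceed the bit period */
    }
    uint32_t tot = neoled_ticks_to_ns(f_main_hz, clock_prescaler, t_tot);
    uint32_t t0h = neoled_ticks_to_ns(f_main_hz, clock_prescaler, t_zero_h);
    uint32_t t1h = neoled_ticks_to_ns(f_main_hz, clock_prescaler, t_one_h);
    return within_ns(tot, 1250, 300)
        && within_ns(t0h, 400, 150) && within_ns(tot - t0h, 800, 150)   /* T0H, T0L */
        && within_ns(t1h, 850, 150) && within_ns(tot - t1h, 450, 150);  /* T1H, T1L */
}
```

For fmain = 100 MHz, clock_prescaler = 4 and the tick values 30 / 10 / 20, the helper computes 1200 ns, 400 ns and 800 ns, all of which lie within the listed tolerances.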
The NEOLED SW driver library (neorv32_neoled.h ) provides a simplified configuration
function that configures all timing parameters for driving WS2812 LEDs based on the processor
clock frequency.
|
TX Data FIFO
The interface features a configurable TX data buffer (a FIFO) to allow more CPU-independent operation. The buffer
depth is configured via the IO_NEOLED_TX_FIFO
top generic (default = 1 entry). The FIFO size configuration can be
read via the NEOLED_CTRL_BUFS_x
control register bits, which return log2(IO_NEOLED_TX_FIFO).
When writing data to the DATA
register the data is automatically written to the TX buffer. Whenever
data is available in the buffer the serial transmission engine will take and transmit it to the LEDs.
The data transfer size (NEOLED_MODE_EN
) can be modified at any time since this control register bit is also buffered
in the FIFO. This allows an arbitrary mix of RGB and RGBW LEDs in the chain.
Software can check the FIFO fill level via the control register’s NEOLED_CTRL_TX_EMPTY
, NEOLED_CTRL_TX_HALF
and NEOLED_CTRL_TX_FULL
flags. The NEOLED_CTRL_TX_BUSY
flag additionally indicates whether the serial
transmit engine is still busy sending data.
Please note that the timing configurations (NEOLED_CTRL_PRSCx , NEOLED_CTRL_T_TOT_x ,
NEOLED_CTRL_T_ONE_H_x and NEOLED_CTRL_T_ZERO_H_x ) are NOT stored to the buffer. Changing
these values while the buffer is not empty or the TX engine is still busy will cause data corruption.
|
Strobe Command ("RESET")
According to the WS2812 specs the data written to the LED’s shift registers is strobed to the actual PWM driver registers when the data line is low for 50μs ("RESET" command, see table above). This can be implemented using busy-wait for at least 50μs. Obviously, this concept wastes a lot of processing power.
To circumvent this, the NEOLED module provides an option to automatically issue an idle time for creating the RESET
command. If the NEOLED_CTRL_STROBE
control register bit is set, all data written to the data FIFO (via DATA
,
the actually written data is irrelevant) will trigger an idle phase (neoled_o
= zero) of 127 periods (= Tcarrier).
This idle time will cause the LEDs to strobe the color data into the PWM driver registers.
Since the NEOLED_CTRL_STROBE
flag is also buffered in the TX buffer, the RESET command is treated just as another
data word being written to the TX buffer making busy wait concepts obsolete and allowing maximum refresh rates.
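Whether the automatically generated idle phase is long enough for a given timing configuration can be checked with a short calculation. This is a sketch with a hypothetical function name; it only evaluates 127 carrier periods as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Idle time in ns caused by one strobe entry: 127 carrier periods (T_carrier). */
uint64_t neoled_strobe_idle_ns(uint32_t f_main_hz, uint32_t clock_prescaler, uint32_t t_tot) {
    assert(f_main_hz > 0 && t_tot < 32);  /* NEOLED_CTRL_T_TOT is 5 bits wide */
    uint64_t t_carrier_ns = ((uint64_t)t_tot * clock_prescaler * 1000000000ull) / f_main_hz;
    return 127ull * t_carrier_ns;
}
```

With fmain = 100 MHz, clock_prescaler = 4 and NEOLED_CTRL_T_TOT = 30 this yields 152400 ns (about 152 µs), which is well above the 50 µs required for the RESET command.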
NEOLED Interrupt
The NEOLED module features a single interrupt that triggers based on the current TX buffer fill level.
The interrupt can only become pending if the NEOLED module is enabled. The specific interrupt condition
is configured via the NEOLED_CTRL_IRQ_CONF
bit in the unit’s control register.
If NEOLED_CTRL_IRQ_CONF
is set, the module’s interrupt is generated whenever the TX FIFO is less than half-full.
In this case software can write up to IO_NEOLED_TX_FIFO
/2 new data words to DATA
without checking the FIFO
status flags. If NEOLED_CTRL_IRQ_CONF
is cleared, an interrupt is generated when the TX FIFO is empty.
Once the NEOLED interrupt has fired it remains pending until the actual cause of the interrupt is resolved.
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
NEOLED enable |
|
r/w |
data transfer size; |
||
|
r/w |
|
||
|
r/w |
3-bit clock prescaler, bit 0 |
||
|
r/- |
4-bit log2(IO_NEOLED_TX_FIFO) |
||
|
r/w |
5-bit pulse clock ticks per total single-bit period (Ttotal) |
||
|
r/w |
5-bit pulse clock ticks per high-time for sending a zero-bit (T0H) |
||
|
r/w |
5-bit pulse clock ticks per high-time for sending a one-bit (T1H) |
||
|
r/w |
TX FIFO interrupt configuration: |
||
|
r/- |
TX FIFO is empty |
||
|
r/- |
TX FIFO is at least half full |
||
|
r/- |
TX FIFO is full |
||
|
r/- |
TX serial engine is busy when set |
||
|
|
|
-/w |
TX data (32- or 24-bit, depending on NEOLED_CTRL_MODE bit) |
2.8.23. External Interrupt Controller (XIRQ)
Hardware source files: |
neorv32_xirq.vhd |
|
Software driver files: |
neorv32_xirq.c |
|
neorv32_xirq.h |
||
Top entity ports: |
|
External interrupts input (32-bit) |
Configuration generics: |
|
Number of external IRQ channels to implement (0..32) |
CPU interrupts: |
fast IRQ channel 8 |
XIRQ (see Processor Interrupts) |
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The external interrupt controller provides a simple mechanism to implement up to 32 platform-level / processor-external interrupt request signals. The external IRQ requests are prioritized, queued and signaled to the CPU via a single CPU fast interrupt request channel.
Theory of Operation
The XIRQ provides up to 32 external interrupt channels configured via the XIRQ_NUM_CH
generic. Each bit in the
xirq_i
input signal vector represents one interrupt channel. If less than 32 channels are configured, only the
LSB-aligned channels are used while the remaining ones are left unconnected internally.
The external interrupt controller features four interface registers:
-
external interrupt channel enable (
EIE
) -
external interrupt source (
ESC
) -
trigger type configuration (
TTYP
) -
trigger polarity configuration (
TPOL
)
The actual interrupt trigger type can be configured individually for each channel using the TTYP
and TPOL
registers. TTYP
defines the actual trigger type (level-triggered or edge-triggered), while TPOL
defines
the trigger’s polarity (low-level/falling-edge or high-level/rising-edge). The position of each bit in these
registers corresponds to the respective XIRQ channel.
TTYP(i) | TPOL(i) | Resulting trigger of xirq_i(i) |
---|---|---|
0 | 0 | low-level |
0 | 1 | high-level |
1 | 0 | falling-edge |
1 | 1 | rising-edge |
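The four trigger configurations can be expressed as a small behavioral model for a single channel. The function name is hypothetical, and the bit encoding (TTYP: 0 = level, 1 = edge; TPOL: 0 = low/falling, 1 = high/rising) is an assumption that should be verified against the hardware documentation.

```c
#include <assert.h>
#include <stdbool.h>

/* Behavioral model of the trigger condition of one XIRQ channel.
 * ASSUMED encoding: ttyp 0 = level, 1 = edge; tpol 0 = low/falling, 1 = high/rising.
 * prev / now : input level in the previous and in the current clock cycle */
bool xirq_trigger_fires(unsigned ttyp, unsigned tpol, unsigned prev, unsigned now) {
    assert(ttyp <= 1u && tpol <= 1u && prev <= 1u && now <= 1u);
    if (ttyp == 0u) {
        return now == tpol;                  /* level: input matches the polarity */
    }
    return (prev != tpol) && (now == tpol);  /* edge: transition towards the polarity */
}
```

Note that a level trigger keeps firing for as long as the input stays at the configured level, while an edge trigger fires only once per transition.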
Each interrupt channel can be enabled or disabled individually using the EIE
register. If the trigger of a
disabled channel fires the interrupt request is entirely ignored.
If the configured trigger of an enabled channel fires, the according interrupt request is buffered internally
and an interrupt request is sent to the CPU. If more than one trigger fires at once, a prioritization is used:
the channels are prioritized in a static order, i.e. channel 0 (xirq_i(0)
) has the highest priority and channel
31 (xirq_i(31)
) has the lowest priority.
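The static prioritization scheme can be modeled as a search for the lowest-numbered channel that is both pending and enabled. This is a software sketch of the described behavior, not the hardware implementation; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the channel number with the highest priority (lowest index) that is
 * pending and enabled, or -1 if there is none.
 * num_ch : number of implemented channels (0..32), LSB-aligned */
int xirq_highest_priority(uint32_t pending, uint32_t enabled, int num_ch) {
    assert(num_ch >= 0 && num_ch <= 32);
    uint32_t active = pending & enabled;  /* disabled channels are ignored */
    for (int ch = 0; ch < num_ch; ch++) {
        if ((active >> ch) & 1u) {
            return ch;                    /* channel 0 wins over all others */
        }
    }
    return -1;
}
```

For example, if channels 2 and 4 fire at the same time and both are enabled, channel 2 is reported first.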
The CPU can determine the most prioritized external interrupt request by reading the interrupt source register ESC
.
This register provides a 5-bit wide ID (0..31) identifying the currently firing external interrupt source channel as
well as a single bit (the MSB) that indicates an XIRQ interrupt when set.
Writing any value to this register will acknowledge and clear the current CPU interrupt (so the XIRQ controller
can issue a new CPU interrupt).
Register Map
Address | Name [C] | Bit(s) | R/W | Description |
---|---|---|---|---|
|
|
|
r/w |
External interrupt enable register (one bit per channel, LSB-aligned) |
|
|
|
r/c |
XIRQ interrupt when set; write any value to this register to acknowledge the current XIRQ interrupt |
|
r/- |
reserved, read as zero |
||
|
r/c |
Interrupt source ID (0..31) of firing IRQ (prioritized!) |
||
|
|
|
r/w |
Trigger type select ( |
|
|
|
r/w |
Trigger polarity select ( |
2.8.24. General Purpose Timer (GPTMR)
Hardware source files: |
neorv32_gptmr.vhd |
|
Software driver files: |
neorv32_gptmr.c |
|
neorv32_gptmr.h |
||
Top entity ports: |
none |
|
Configuration generics: |
|
implement general purpose timer when |
CPU interrupts: |
fast IRQ channel 12 |
timer interrupt (see Processor Interrupts) |
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The general purpose timer module implements a simple yet universal 32-bit timer. It is implemented if the processor’s
IO_GPTMR_EN
top generic is set true
. The timer provides a pre-scaled counter register that can trigger an interrupt
when reaching a programmable threshold value.
The GPTMR provides three interface registers : a control register (CTRL
), a 32-bit counter register (COUNT
) and a
32-bit threshold register (THRES
). The timer is globally enabled by setting the GPTMR_CTRL_EN
bit in the module’s
control register. When the timer is enabled the COUNT
register will start incrementing from zero at a programmable
rate that scales the main processor clock. This pre-scaler is configured via the three GPTMR_CTRL_PRSCx
control register bits:
GPTMR_CTRL_PRSCx | 0b000 | 0b001 | 0b010 | 0b011 | 0b100 | 0b101 | 0b110 | 0b111 |
---|---|---|---|---|---|---|---|---|
Resulting clock_prescaler | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 |
Whenever the counter register COUNT
equals the programmable threshold value THRES
the module’s interrupt
signal becomes pending (indicated by GPTMR_CTRL_IRQ_PND
being set). Note that a pending interrupt has to be
cleared manually by writing a 1
to GPTMR_CTRL_IRQ_CLR
.
The control register’s GPTMR_CTRL_MODE
bit defines what will happen when COUNT == THRES
.
-
GPTMR_CTRL_MODE = 0
: single-shot mode - the COUNT
register will stop incrementing
-
GPTMR_CTRL_MODE = 1
: continuous mode - the COUNT
register is automatically reset and restarts incrementing from zero
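The relation between the main clock, the prescaler and the threshold value can be sketched as follows. The helper computes a THRES value for a desired interval; its name is hypothetical, and a possible off-by-one cycle between programming THRES and the actual match is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* THRES value for a desired interval in microseconds (truncated).
 * prsc_sel : 3-bit value of GPTMR_CTRL_PRSCx */
uint32_t gptmr_threshold(uint32_t f_main_hz, unsigned prsc_sel, uint32_t interval_us) {
    static const uint32_t prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};
    assert(prsc_sel < 8);
    /* counter increments per interval = (f_main / prescaler) * interval */
    uint64_t ticks = ((uint64_t)f_main_hz * interval_us)
                   / ((uint64_t)prescaler[prsc_sel] * 1000000ull);
    assert(ticks <= UINT32_MAX);  /* THRES is a 32-bit register */
    return (uint32_t)ticks;
}
```

With a 100 MHz main clock and prescaler select 0b010 (clock_prescaler = 8), a 1 ms interval corresponds to a threshold of 12500.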
Resetting the Counter
Disabling the GPTMR will also clear the COUNT register.
|
Interrupt
The GPTMR provides a single interrupt line that is triggered whenever COUNT
equals THRES
. Once triggered, the interrupt will
stay pending until explicitly cleared by writing a 1 to GPTMR_CTRL_IRQ_CLR
.
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
Timer enable flag |
|
r/w |
3-bit clock prescaler select |
||
|
r/w |
Operation mode (0=single-shot, 1=continuous) |
||
|
r/- |
reserved, read as zero |
||
|
-/w |
Write |
||
|
r/- |
Timer-match interrupt pending |
||
|
|
|
r/w |
Threshold value register |
|
|
|
r/- |
Counter register |
2.8.25. Execute In Place Module (XIP)
Hardware source files: |
neorv32_xip.vhd |
XIP module |
neorv32_cache.vhd |
Generic cache module |
|
Software driver files: |
neorv32_xip.c |
|
neorv32_xip.h |
||
Top entity ports: |
|
1-bit chip select, low-active |
|
1-bit serial clock output |
|
|
1-bit serial data input |
|
|
1-bit serial data output |
|
Configuration generics: |
|
implement XIP module when |
|
implement XIP cache when |
|
|
number of blocks in XIP cache; has to be a power of two |
|
|
number of bytes per XIP cache block; has to be a power of two, min 4 |
|
CPU interrupts: |
none |
|
Access restrictions: |
control registers: privileged access only, non-32-bit write accesses are ignored |
|
XIP data access: read-only |
Overview
The execute in-place (XIP) module allows code to be executed (and constant data to be read) directly from an external SPI flash memory. The standard serial peripheral interface (SPI) is used as transfer protocol. All bus requests issued by the CPU are converted transparently into SPI flash access commands. Hence, the external XIP flash behaves like a simple on-chip ROM.
From the CPU side, the module provides two independent interfaces: one for transparently accessing the XIP flash and another one for accessing the module’s control and status registers. The first interface provides the transparent gateway to the SPI flash, so the CPU can directly fetch and execute instructions and/or read constant data. Note that this interface is read-only. Any write access will raise a bus error exception. The second interface is mapped to the processor’s IO space and allows accesses to the XIP module’s configuration registers as well as conducting individual SPI transfers.
The XIP module provides an optional configurable cache to accelerate SPI flash accesses.
XIP Address Mapping
When XIP mode is enabled the flash is mapped to a fixed address space region starting at address
0xE0000000 (see section Address Space) supporting a maximum flash size of 256MB.
|
XIP Example Program
An example program is provided in sw/example/demo_xip that illustrates how to program and configure
an external SPI flash to run a program from it.
|
SPI Configuration
The XIP module accesses external flash using the standard SPI protocol. The module always sends data MSB-first and
provides all of the standard four clock modes (0..3), which are configured via the XIP_CTRL_CPOL
(clock polarity)
and XIP_CTRL_CPHA
(clock phase) control register bits, respectively. The flash’s "read command", which initiates
a read access, is defined by the XIP_CTRL_RD_CMD
control register bits. For most SPI flash memories this is 0x03
for normal SPI mode.
The SPI clock (xip_clk_o
) frequency is programmed by the 3-bit XIP_CTRL_PRSCx
clock prescaler for a coarse clock
selection and a 4-bit clock divider XIP_CTRL_CDIVx
for a fine clock selection.
The following clock prescalers (XIP_CTRL_PRSCx
) are available:
XIP_CTRL_PRSCx | 0b000 | 0b001 | 0b010 | 0b011 | 0b100 | 0b101 | 0b110 | 0b111 |
---|---|---|---|---|---|---|---|---|
Resulting clock_prescaler | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 |
Based on the programmed clock configuration, the actual SPI clock frequency fSPI is derived from the processor’s main clock fmain according to the following equation:
fSPI = fmain[Hz] / (2 * clock_prescaler * (1 + XIP_CTRL_CDIVx))
Hence, the maximum SPI clock is fmain / 4 and the lowest SPI clock is fmain / 131072. The SPI clock is always symmetric having a duty cycle of 50%.
High-Speed Mode
The XIP module provides a high-speed mode to further boost the maximum SPI clock frequency. When enabled via the control
register’s XIP_CTRL_HIGHSPEED
bit the clock prescaler configuration (XIP_CTRL_PRSCx
bits) is overridden setting it
to a minimal factor of 1. However, the clock speed can still be fine-tuned using the XIP_CTRL_CDIVx
bits.
fSPI = fmain[Hz] / (2 * 1 * (1 + XIP_CTRL_CDIVx))
Hence, the maximum SPI clock when in high-speed mode is fmain / 2.
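The two clock equations (normal and high-speed mode) can be checked with a short host-side C sketch. The function name `xip_spi_clock_hz` and its parameters are hypothetical; only the prescaler factors and the formula are taken from this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Prescaler factors selected by the 3-bit XIP_CTRL_PRSCx field. */
static const uint32_t xip_prsc_factor[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* Hypothetical helper: resulting SPI clock in Hz (integer division, rounded down). */
static uint32_t xip_spi_clock_hz(uint32_t f_main_hz, unsigned prsc_sel,
                                 unsigned cdiv, bool highspeed)
{
    assert(prsc_sel < 8); /* 3-bit prescaler select */
    assert(cdiv < 16);    /* 4-bit clock divider */
    /* high-speed mode overrides the prescaler with a factor of 1 */
    uint32_t prescaler = highspeed ? 1u : xip_prsc_factor[prsc_sel];
    return f_main_hz / (2u * prescaler * (1u + cdiv));
}
```

With a 100 MHz main clock this gives 25 MHz for the fastest normal setting (prescaler 2, divider 0) and 50 MHz in high-speed mode, matching the fmain / 4 and fmain / 2 limits stated above.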
Direct SPI Access
The XIP module allows direct SPI transactions to be initiated. This feature can be used to configure the attached SPI
flash or to perform direct read and write accesses to the flash memory. Two data registers DATA_LO
and
DATA_HI
are provided to send up to 64 bits of SPI data. The DATA_HI
register is write-only,
so only 32 bits of receive data are available. Note that the module handles the chip-select
line (xip_csn_o
) by itself so it is not possible to construct larger consecutive transfers.
The actual data transmission size in bytes is defined by the control register’s XIP_CTRL_SPI_NBYTES
bits.
Any configuration from 1 byte to 8 bytes is valid. Any other value will result in unpredictable behavior.
Since data is always transferred MSB-first, the data in DATA_HI:DATA_LO
also has to be MSB-aligned. Receive data is
available in DATA_LO
only since DATA_HI
is write-only. Writing to DATA_HI
triggers the actual SPI transmission.
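The MSB alignment of the transmit data can be sketched as follows. The struct and function names are hypothetical and the code only computes the two register values on the host; it does not access any hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two data register values. */
typedef struct {
    uint32_t data_hi; /* value for DATA_HI (upper 32 bits of the frame) */
    uint32_t data_lo; /* value for DATA_LO (lower 32 bits of the frame) */
} xip_spi_words_t;

/* Pack 1..8 bytes MSB-aligned: bytes[0] is sent first and ends up in bits 63:56. */
static xip_spi_words_t xip_pack_msb_aligned(const uint8_t *bytes, size_t nbytes)
{
    assert(bytes != NULL);
    assert(nbytes >= 1 && nbytes <= 8);
    uint64_t frame = 0;
    for (size_t i = 0; i < nbytes; i++) {
        frame |= (uint64_t)bytes[i] << (56 - 8 * i);
    }
    xip_spi_words_t w = { (uint32_t)(frame >> 32), (uint32_t)frame };
    return w;
}
```

Since writing DATA_HI starts the transmission, a driver built on this sketch would write `data_lo` first and `data_hi` last.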
The XIP_CTRL_PHY_BUSY
control register flag indicates a transmission being in progress.
The chip-select line of the XIP module (xip_csn_o
) will only become asserted (enabled, pulled low) if the
XIP_CTRL_SPI_CSEN
control register bit is set. If this bit is cleared, xip_csn_o
is always disabled
(pulled high).
Direct SPI mode is only possible when the module is enabled (setting XIP_CTRL_EN ) but before the actual
XIP mode is enabled via XIP_CTRL_XIP_EN .
|
When the XIP mode is not enabled, the XIP module can also be used as additional general purpose SPI controller with a transfer size of up to 64 bits per transmission. |
Using the XIP Mode
The XIP module is globally enabled by setting the XIP_CTRL_EN
bit in the device’s CTRL
control register.
Clearing this bit will reset the whole module and will also terminate any pending SPI transfer.
Since there is a wide variety of SPI flash components with different sizes, the XIP module allows to specify
the address width of the flash: the number of address bytes used for addressing flash memory content has to be
configured using the control register’s XIP_CTRL_XIP_ABYTES bits. These two bits contain the number of SPI
address bytes (minus one). For example, for an SPI flash with 24-bit addresses these bits have to be set to
0b10
.
The transparent XIP accesses are transformed into SPI transmissions with the following format (starting with the MSB):
- 8-bit command: configured by the XIP_CTRL_RD_CMD control register bits ("SPI read command")
- 8 to 32 bits address: defined by the XIP_CTRL_XIP_ABYTES control register bits ("number of address bytes")
- 32-bit data: sending zeros and receiving the according flash word (32-bit)
Hence, the maximum XIP transmission size is 72-bit, which has to be configured via the XIP_CTRL_SPI_NBYTES
control register bits. Note that the 72-bit transmission size is only available in XIP mode. The transmission
size of the direct SPI accesses is limited to 64-bit.
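The relation between the flash address width, the XIP_CTRL_XIP_ABYTES field and the total transfer size can be sketched in C. Both helper names are hypothetical; the arithmetic follows the frame format listed above (one command byte, the address bytes, four data bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Value for the 2-bit XIP_CTRL_XIP_ABYTES field: number of address bytes minus one. */
static unsigned xip_abytes_field(unsigned addr_bytes)
{
    assert(addr_bytes >= 1 && addr_bytes <= 4);
    return addr_bytes - 1u;
}

/* Total XIP transfer size in bytes: 1 command byte + address bytes + 4 data bytes. */
static unsigned xip_frame_bytes(unsigned addr_bytes)
{
    assert(addr_bytes >= 1 && addr_bytes <= 4);
    return 1u + addr_bytes + 4u;
}
```

For a flash with 24-bit addresses this yields a field value of 2 (0b10) and a transfer size of 8 bytes; four address bytes give the 9-byte (72-bit) maximum.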
When using four SPI flash address bytes, the most significant 4 bits of the address are always hardwired to zero allowing a maximum accessible flash size of 256MB. |
The XIP module always fetches a full naturally aligned 32-bit word from the SPI flash. Any sub-word data masking or alignment will be performed by the CPU core logic. |
The XIP mode requires the 4-byte data words in the flash to be ordered in little-endian byte order. |
After the SPI properties (including the number of address bytes and the total number of SPI transfer bytes)
and XIP address mapping are configured, the actual XIP mode can be enabled by setting
the control register’s XIP_CTRL_XIP_EN
bit. This will enable the "transparent SPI access port" of the module and thus,
the transparent conversion of access requests into proper SPI flash transmissions. Hence, any access to the processor’s
memory-mapped XIP region (0xE0000000
to 0xEFFFFFFF
) will be converted into SPI flash accesses.
Make sure XIP_CTRL_SPI_CSEN
is also set so the module can actually select/enable the attached SPI flash.
No more direct SPI accesses via DATA_HI:DATA_LO
are possible when the XIP mode is enabled. However, the
XIP mode can be disabled at any time.
If the XIP module is disabled (XIP_CTRL_EN = 0 ) any accesses to the memory-mapped XIP flash address region
will raise a bus access exception. If the XIP module is enabled (XIP_CTRL_EN = 1 ) but XIP mode is not enabled
yet (XIP_CTRL_XIP_EN = 0 ) any access to the programmed XIP memory segment will also raise a bus access exception.
|
It is highly recommended to enable the Processor-Internal Instruction Cache (iCACHE) to cover some of the SPI access latency. |
XIP Cache
Since every single instruction fetch request from the CPU is translated into serial SPI transmissions the access latency is very high resulting in a low throughput. In order to improve performance, the XIP module provides an optional cache that allows to buffer recently-accessed data. The cache is implemented as a simple direct-mapped read-only cache with a configurable cache layout:
- XIP_CACHE_EN: when set to true the XIP cache is implemented
- XIP_CACHE_NUM_BLOCKS defines the number of cache blocks (or lines)
- XIP_CACHE_BLOCK_SIZE defines the size in bytes of each cache block
When the cache is implemented, the XIP module operates in burst mode utilizing the flash’s incremental read capabilities.
Thus, several bytes (= XIP_CACHE_BLOCK_SIZE
) are read consecutively from the flash using a single read command.
The XIP cache is cleared when the XIP module is disabled (XIP_CTRL_EN = 0
), when XIP mode is disabled
(XIP_CTRL_XIP_EN = 0
) or when the CPU issues a fence(.i)
instruction.
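How a direct-mapped cache with power-of-two geometry typically splits an address can be sketched in C. This shows the generic textbook decomposition; the exact bit assignment inside the XIP cache hardware is not stated in this section and is assumed here, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of splitting an address for a direct-mapped cache. */
typedef struct {
    uint32_t offset; /* byte offset within the cache block */
    uint32_t index;  /* selected cache block (line) */
    uint32_t tag;    /* remaining upper address bits */
} cache_addr_t;

static cache_addr_t cache_split(uint32_t addr, uint32_t num_blocks, uint32_t block_size)
{
    /* both parameters have to be powers of two; block size is at least 4 bytes */
    assert(num_blocks != 0 && (num_blocks & (num_blocks - 1)) == 0);
    assert(block_size >= 4 && (block_size & (block_size - 1)) == 0);
    cache_addr_t a;
    a.offset = addr & (block_size - 1);
    a.index  = (addr / block_size) & (num_blocks - 1);
    a.tag    = addr / block_size / num_blocks;
    return a;
}
```

Here `num_blocks` and `block_size` play the roles of XIP_CACHE_NUM_BLOCKS and XIP_CACHE_BLOCK_SIZE. A miss in such a scheme would fetch `block_size` consecutive bytes, which corresponds to the burst read described above.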
Register Map
Address | Name [C] | Bit(s), Name [C] | R/W | Function |
---|---|---|---|---|
|
|
|
r/w |
XIP module enable |
|
r/w |
3-bit SPI clock prescaler select |
||
|
r/w |
SPI clock polarity |
||
|
r/w |
SPI clock phase |
||
|
r/w |
Number of bytes in SPI transaction (1..9) |
||
|
r/w |
XIP mode enable |
||
|
r/w |
Number of address bytes for XIP flash (minus 1) |
||
|
r/w |
Flash read command |
||
|
r/w |
Allow SPI chip-select to be actually asserted when set |
||
|
r/w |
enable SPI high-speed mode (ignoring |
||
|
r/- |
4-bit clock divider for fine-tuning |
||
|
r/- |
reserved, read as zero |
||
|
r/- |
SPI PHY busy when set |
||
|
r/- |
XIP access in progress when set |
||
|
reserved |
|
r/- |
reserved, read as zero |
|
|
|
r/w |
Direct SPI access - data register low |
|
|
|
-/w |
Direct SPI access - data register high; write access triggers SPI transfer |
2.8.26. System Configuration Information Memory (SYSINFO)
Hardware source files: |
neorv32_sysinfo.vhd |
|
Software driver files: |
neorv32_sysinfo.h |
|
Top entity ports: |
none |
|
Configuration generics: |
* |
most of the top’s configuration generics |
CPU interrupts: |
none |
|
Access restrictions: |
privileged access only, non-32-bit write accesses are ignored |
Overview
The SYSINFO module allows the application software to determine the setting of most of the Processor Top Entity - Generics that are related to CPU and processor/SoC configuration. This device is always implemented - regardless of the actual hardware configuration since the NEORV32 software framework requires information from this device for correct operation. However, advanced users that do not want to use the default NEORV32 software framework can choose to disable the entire SYSINFO module. This might also be suitable for setups that use the processor just as a wrapper for a CPU-only configuration.
Disabling the SYSINFO Module
Setting the IO_DISABLE_SYSINFO top entity generic to true will remove the SYSINFO module from the design.
This option is suitable for advanced users who wish to use a CPU-only setup that still contains the bus infrastructure.
As a result, large parts of the NEORV32 software framework no longer work (e.g. most IO drivers, the RTE and the bootloader).
Hence, this option is not recommended.
|
Register Map
All registers of this module are read-only except for the CLK
register. Upon reset, the CLK
register is initialized
from the CLOCK_FREQUENCY
top entity generic. Application software can override this default value in order, for example,
to take into account a dynamic frequency scaling of the processor.
Address | Name [C] | R/W | Description |
---|---|---|---|
|
|
r/w |
clock frequency in Hz (initialized from top’s |
|
|
r/- |
internal memory configuration (see SYSINFO - Memory Configuration) |
|
|
r/- |
specific SoC configuration (see SYSINFO - SoC Configuration) |
|
|
r/- |
cache configuration information (see SYSINFO - Cache Configuration) |
SYSINFO - Memory Configuration
Bit fields in this register are set to all-zero if the according memory system is not implemented. |
Byte | Name [C] | Description |
---|---|---|
|
|
log2(internal IMEM size in bytes), via top’s |
|
|
log2(internal DMEM size in bytes), via top’s |
|
- |
reserved, read as zero |
|
|
boot mode configuration, via top’s |
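Decoding one of the log2-encoded size fields can be sketched in C. The function name is hypothetical, and the byte position of each field is passed as a parameter because the byte column of the table is not reproduced here; the caller has to supply the correct index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode a log2-encoded size from one byte of the
 * memory configuration register. Returns 0 if the field is all-zero
 * (memory not implemented). */
static uint32_t sysinfo_mem_size_bytes(uint32_t mem_reg, unsigned byte_idx)
{
    assert(byte_idx < 4);
    uint32_t field = (mem_reg >> (8u * byte_idx)) & 0xFFu;
    if (field == 0u) {
        return 0u;
    }
    assert(field < 32u); /* 1 << field has to fit into 32 bits */
    return (uint32_t)1u << field;
}
```

For example, a field value of 14 decodes to 16384 bytes (16 kB).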
SYSINFO - SoC Configuration
Bit | Name [C] | Description |
---|---|---|
|
|
set if processor-internal bootloader is implemented (via top’s |
|
|
set if external Wishbone bus interface is implemented (via top’s |
|
|
set if processor-internal DMEM is implemented (via top’s |
|
|
set if processor-internal IMEM is implemented (via top’s |
|
|
set if on-chip debugger is implemented (via top’s |
|
|
set if processor-internal instruction cache is implemented (via top’s |
|
|
set if processor-internal data cache is implemented (via top’s |
|
|
set if CPU clock gating is implemented (via top’s |
|
|
set if external bus interface cache is implemented (via top’s |
|
|
set if XIP module is implemented (via top’s |
|
|
set if XIP cache is implemented (via top’s |
|
|
set if on-chip debugger authentication is implemented (via top’s |
|
|
set if processor-internal IMEM is implemented as pre-initialized ROM (via top’s |
|
- |
reserved, read as zero |
|
|
set if direct memory access controller is implemented (via top’s |
|
|
set if GPIO is implemented (via top’s |
|
|
set if MTIME is implemented (via top’s |
|
|
set if primary UART0 is implemented (via top’s |
|
|
set if SPI is implemented (via top’s |
|
|
set if TWI is implemented (via top’s |
|
|
set if PWM is implemented (via top’s |
|
|
set if WDT is implemented (via top’s |
|
|
set if custom functions subsystem is implemented (via top’s |
|
|
set if TRNG is implemented (via top’s |
|
|
set if SDI is implemented (via top’s |
|
|
set if secondary UART1 is implemented (via top’s |
|
|
set if NEOLED is implemented (via top’s |
|
|
set if XIRQ is implemented (via top’s |
|
|
set if GPTMR is implemented (via top’s |
|
|
set if stream link interface is implemented (via top’s |
|
|
set if ONEWIRE interface is implemented (via top’s |
|
|
set if cyclic redundancy check unit is implemented (via top’s |
SYSINFO - Cache Configuration
Bit fields in this register are set to all-zero if the according cache is not implemented. |
Bit | Name [C] | Description |
---|---|---|
|
|
log2(i-cache block size in bytes), via top’s |
|
|
log2(i-cache number of cache blocks), via top’s |
|
|
log2(d-cache block size in bytes), via top’s |
|
|
log2(d-cache number of cache blocks), via top’s |
|
|
log2(xip-cache block size in bytes), via top’s |
|
|
log2(xip-cache number of cache blocks), via top’s |
|
|
log2(xbus-cache block size in bytes), via top’s |
|
|
log2(xbus-cache number of cache blocks), via top’s |
3. NEORV32 Central Processing Unit (CPU)
The NEORV32 CPU is an area-optimized RISC-V core implementing the rv32i_zicsr_zifencei
base (privileged) ISA and
supporting several additional/optional ISA extensions. The CPU’s micro architecture is based on a von-Neumann
machine built upon a mixture of multi-cycle and pipelined execution schemes.
This chapter assumes that the reader is familiar with the official RISC-V User and Privileged Architecture specifications. |
Section Structure
3.1. RISC-V Compatibility
The NEORV32 CPU passes the tests of the official RISCOF RISC-V Architecture Test Framework. This framework is used to check RISC-V implementations for compatibility to the official RISC-V user/privileged ISA specifications. The NEORV32 port of this test framework is available in a separate repository at GitHub: https://github.com/stnolting/neorv32-riscof
Unsupported ISA Extensions
Executing instructions or accessing CSRs from yet unsupported ISA extensions will raise an illegal
instruction exception (see section Full Virtualization).
|
Incompatibility Issues and Limitations
time[h] CSRs (Wall Clock Time)
The CPU does not implement the time[h] registers. Any access to these registers will trap. It is
recommended that the trap handler software provides a means of accessing the platform-defined Machine System Timer (MTIME).
|
No Hardware Support of Misaligned Memory Accesses
The CPU does not resolve unaligned memory accesses in hardware (this is not a
RISC-V-incompatibility issue but an important thing to know!). Any kind of unaligned memory access
will raise an exception, which allows the application to provide a software-based emulation.
Such an emulation can be implemented using the NEORV32 runtime environment. See section Application Context Handling
for more information.
|
3.2. CPU Top Entity - Signals
The following table shows all interface signals of the CPU top entity rtl/core/neorv32_cpu.vhd
. The
type of all signals is std_ulogic or std_ulogic_vector, respectively. The "Dir." column shows the signal
direction as seen from the CPU.
Signal | Width/Type | Dir | Description |
---|---|---|---|
Global Signals |
|||
|
1 |
in |
Global clock line, all registers triggering on rising edge, this clock can be switched off during Sleep Mode |
|
1 |
in |
Always-on clock, used to keep the sleep control active when |
|
1 |
in |
Global reset, low-active |
|
1 |
out |
CPU is in Sleep Mode when set |
|
1 |
out |
CPU is in debug mode when set |
Interrupts (Traps, Exceptions and Interrupts) |
|||
|
1 |
in |
RISC-V machine software interrupt |
|
1 |
in |
RISC-V machine external interrupt |
|
1 |
in |
RISC-V machine timer interrupt |
|
16 |
in |
Custom fast interrupt request signals |
|
1 |
in |
Request CPU to halt and enter debug mode (RISC-V On-Chip Debugger (OCD)) |
Instruction Bus Interface |
|||
|
|
out |
Instruction fetch bus request |
|
|
in |
Instruction fetch bus response |
Data Bus Interface |
|||
|
|
out |
Data access (load/store) bus request |
|
|
in |
Data access (load/store) bus response |
Bus Interface Protocol
See section Bus Interface for the instruction fetch and data access interface protocol and the
according interface types (bus_req_t and bus_rsp_t ).
|
3.3. CPU Top Entity - Generics
Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section Processor Top Entity - Generics) and are not listed here. However, the CPU provides some specific generics that are used to configure the CPU for the NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user defined configuration. The specific generics are listed below.
Table Abbreviations
The generic type "suv(x:y)" represents a std_ulogic_vector(x downto y) .
|
Name | Type | Description |
---|---|---|
|
suv(31:0) |
Value for the |
|
suv(31:0) |
CPU reset address. See section Address Space. |
|
suv(31:0) |
"Park loop" entry address for the On-Chip Debugger (OCD), has to be 4-byte aligned. |
|
suv(31:0) |
"Exception" entry address for the On-Chip Debugger (OCD), has to be 4-byte aligned. |
|
boolean |
Implement RISC-V-compatible "debug" CPU operation mode required for the On-Chip Debugger (OCD). |
|
boolean |
Implement RISC-V-compatible trigger module. See section On-Chip Debugger (OCD). |
|
boolean |
Implement RISC-V-compatible physical memory protection (PMP). See section |
3.4. Architecture
The CPU implements a pipelined multi-cycle architecture: each instruction is executed as a series of consecutive micro-operations. In order to increase performance, the CPU’s front-end (instruction fetch) and back-end (instruction execution) are decoupled via a FIFO (the instruction prefetch buffer). Thus, the front-end can already fetch new instructions while the back-end is still processing the previously-fetched instructions.
Basically, the CPU’s micro architecture is somewhere between a classical pipelined architecture, where each stage requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes every single instruction (including fetch) in a series of consecutive micro-operations. The combination of these two design paradigms allows an increased instruction throughput in contrast to a pure multi-cycle approach (due to overlapping operation of fetch and execute) at a reduced hardware footprint (due to the multi-cycle concept).
As a Von-Neumann machine, the CPU provides independent interfaces for instruction fetch and data access. However, these two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses have higher priority). Hence, all memory addresses including peripheral devices are mapped to a single unified 32-bit Address Space.
Linear/In-Order Execution Only
The CPU does not perform any speculative/out-of-order operations at all. Hence, it is not vulnerable to security issues
caused by speculative execution (like Spectre or Meltdown).
|
3.4.1. CPU Register File
The data register file contains the general purpose architecture registers x0
to x31
. For the rv32e
ISA only the lower
16 registers are implemented. Register zero (x0
/zero
) always reads as zero and any write access to it has no effect.
Up to four individual synchronous read ports allow to fetch up to 4 register operands at once. The write and read accesses
are mutually exclusive as they happen in separate cycles. Hence, there is no need to consider things like "read-during-write"
behavior.
The register file provides two different implementation options configured via the top’s REGFILE_HW_RST
generic.
- REGFILE_HW_RST = false (default): In this configuration the register file is implemented as plain memory array without a dedicated hardware reset. This architecture allows to infer FPGA block RAM for the entire register file resulting in minimal general logic utilization.
- REGFILE_HW_RST = true: This configuration is based on individual FFs that do provide a dedicated hardware reset. Hence, the register file cannot be mapped to FPGA block RAM. This option can be selected if the application requires a reset of the register file (e.g. for security reasons) or if the design shall be synthesized for an ASIC implementation. Using individual FFs for the register file might also improve timing as no long routing lines are required to connect to block RAM primitives.
The state of this configuration generic can be checked by software via the mxisa
CSR.
FPGA Implementation
Enabling the REGFILE_HW_RST option for FPGA implementation is not recommended as this will massively increase the amount
of required logic resources.
|
Implementation of the zero Register within FPGA Block RAM
Register zero is also mapped to a physical memory location within the register file’s block RAM. By this, there is no need
to add a further multiplexer to "insert" zero if reading from register zero reducing logic requirements and shortening the
critical path. However, this also requires that the physical storage bits of register zero are explicitly initialized (set
to zero) by the hardware. This is done transparently by the CPU control requiring no additional processing overhead.
|
Block RAM Ports
The default register file configuration uses two access ports: a read-only port for reading register rs2 (second source operand)
and a read/write port for reading register rs1 (first source operand) and for writing processing results to register rd
(destination register). Hence, a simple dual-port RAM can be used to implement the entire register file. From a functional point
of view, read and write accesses to the register file never occur in the same clock cycle, so no bypass logic is required at all.
|
3.4.2. CPU Arithmetic Logic Unit
The arithmetic/logic unit (ALU) is used for actual data processing as well as generating memory and branch addresses.
All "simple" I
ISA Extension computational instructions (like add
and or
) are implemented as plain combinatorial logic
requiring only a single cycle to complete. More sophisticated instructions like shift operations or multiplications are processed
by so-called "ALU co-processors".
The co-processors are implemented as iterative units that require several cycles to complete processing. Besides the base ISA’s
shift instructions, the co-processors are used to implement all further processing-based ISA extensions (e.g. M
ISA Extension
and B
ISA Extension).
Multi-Cycle Execution Monitor
The CPU control will raise an illegal instruction exception if a multi-cycle functional unit (like the Custom Functions Unit (CFU))
does not complete processing in a bound amount of time (configured via the package’s monitor_mc_tmo_c constant; default = 512 clock cycles).
|
Tuning Options
The ALU architecture can be tuned for an application-specific area-vs-performance trade-off. The FAST_MUL_EN and FAST_SHIFT_EN
generics can be used to implement DSP-based multipliers and performance-optimized barrel shifters, respectively. See sections I ISA Extension,
B ISA Extension and M ISA Extension for specific examples.
|
3.4.3. CPU Bus Unit
The bus unit takes care of handling data memory accesses via load and store instructions. It handles data adjustment when accessing
sub-word data quantities (16-bit or 8-bit) and performs sign-extension for signed load operations. The bus unit also includes the optional
Smpmp
ISA Extension that performs permission checks for all data and instruction accesses.
A list of the bus interface signals and a detailed description of the protocol can be found in section Bus Interface. All bus interface signals are driven/buffered by registers; so even a complex SoC interconnection bus network will not affect the maximum operating frequency.
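The sub-word data adjustment and sign-extension performed for load operations can be sketched in C. This is a behavioral illustration only: the function name is hypothetical and the little-endian byte-lane selection via the two lowest address bits is an assumption based on the RISC-V memory model, not a description of the actual VHDL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract a byte (size = 1) or half-word (size = 2) from an aligned 32-bit bus word. */
static uint32_t load_subword(uint32_t word, uint32_t addr, unsigned size, bool is_signed)
{
    assert(size == 1 || size == 2);
    assert(size == 1 || (addr & 1u) == 0); /* half-words have to be 2-byte aligned */
    unsigned shift = 8u * (addr & 3u);     /* select the little-endian byte lane */
    if (size == 1) {
        uint32_t b = (word >> shift) & 0xFFu;
        /* XOR/subtract trick replicates bit 7 into bits 31:8 */
        return is_signed ? (b ^ 0x80u) - 0x80u : b;
    }
    uint32_t h = (word >> shift) & 0xFFFFu;
    /* same trick for bit 15 of the half-word */
    return is_signed ? (h ^ 0x8000u) - 0x8000u : h;
}
```

A signed byte load of 0x80 therefore yields 0xFFFFFF80, while the unsigned variant yields 0x00000080.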
Unaligned Accesses
The CPU does not support a hardware-based handling of unaligned memory accesses! Any unaligned access will raise a bus load/store unaligned
address exception. The exception handler can be used to emulate unaligned memory accesses in software.
See the NEORV32 Runtime Environment’s Application Context Handling section for more information.
|
3.4.4. CPU Control Unit
The CPU control unit is responsible for generating all the control signals for the different CPU modules. The control unit is split into a "front-end" and a "back-end".
Front-End
The front-end is responsible for fetching instructions in chunks of 32 bits. This can be a single aligned 32-bit instruction,
two aligned 16-bit instructions or a mixture of those. The instructions including control and exception information are stored
to a FIFO queue - the instruction prefetch buffer (IPB). This FIFO has a depth of two entries by default but can be customized
via the ipb_depth_c
VHDL package constant.
The FIFO allows the front-end to do "speculative" instruction fetches, as it keeps fetching the next consecutive instruction all the time. This also allows to decouple front-end (instruction fetch) and back-end (instruction execution) so both modules can operate in parallel to increase performance. However, all potential side effects that are caused by this "speculative" instruction fetch are already handled by the CPU front-end ensuring a defined execution stage while preventing security side attacks.
Back-End
Instruction data from the instruction prefetch buffer is decompressed (if the C
ISA extension is enabled) and sent to the
CPU back-end for actual execution. Execution is conducted by a state-machine that controls all of the CPU modules. The back-end also
includes the Control and Status Registers (CSRs) as well as the trap controller.
3.4.5. Sleep Mode
The NEORV32 CPU provides a single sleep mode that can be entered to power-down the core reducing
dynamic power consumption. Sleep mode is entered by executing the wfi
("wait for interrupt") instruction.
Execution Details
The wfi instruction will raise an illegal instruction exception when executed in user-mode
if TW in mstatus is set. When executed in debug-mode or during single-stepping wfi will behave as
simple nop without entering sleep mode.
|
After executing the wfi
instruction the CPU’s sleep_o
signal (CPU Top Entity - Signals) will become set
as soon as the CPU has fully halted ("CPU is sleeping"):
CPU-external modules like memories, timers and peripheral interfaces are not affected by this. Furthermore, the CPU will
continue to buffer/enqueue incoming interrupts. The CPU will leave sleep mode as soon as any enabled interrupt source (via mie
)
becomes pending or if a debug session is started.
Power-Down Mode
Optionally, the sleep mode can also be used to shut down the CPU’s main clock to further reduce power consumption
by halting the core’s clock tree. This clock gating mode is enabled by the CLOCK_GATING_EN
generic
(Processor Top Entity - Generics). See section Processor Clocking for more information.
3.4.6. Full Virtualization
Just like the RISC-V ISA, the NEORV32 aims to provide maximum virtualization capabilities on CPU and SoC level to allow a high standard of execution safety. The CPU supports all traps specified by the official RISC-V specifications. Thus, the CPU provides defined hardware fall-backs via traps for any expected and unexpected situations (e.g. executing a malformed or not supported instruction or accessing a non-allocated memory address). For any kind of trap the core is always in a defined and fully synchronized state throughout the whole system (i.e. there are no out-of-order operations that might have to be reverted). This allows a defined and predictable execution behavior at any time improving overall execution safety.
3.5. Bus Interface
The NEORV32 CPU provides separated instruction fetch and data access interfaces making it a Harvard Architecture:
the instruction fetch interface (i_bus_*
signals) is used for fetching instructions and the data access interface
(d_bus_*
signals) is used to access data via load and store operations. Each of these interfaces can access an address
space of up to 2^32 bytes (4GB).
The bus interface uses two custom interface types: bus_req_t
is used to propagate the bus access requests. These
signals are driven by the accessing device (i.e. the CPU core). bus_rsp_t
is used to return the bus response and
is driven by the accessed device or bus system (i.e. a processor-internal memory or IO device).
Signal | Width | Description |
---|---|---|
|
32 |
Access address (byte addressing) |
|
32 |
Write data |
|
4 |
Byte-enable for each byte in |
|
1 |
Request trigger ("strobe", single-shot) |
|
1 |
Access direction ( |
|
1 |
Access source ( |
|
1 |
Set if privileged (M-mode) access |
|
1 |
Set if current access is a reservation-set operation ( |
|
1 |
Data/instruction fence operation; valid without |
Signal | Width | Description |
---|---|---|
|
32 |
Read data (single-shot) |
|
1 |
Transfer acknowledge / success (single-shot) |
|
1 |
Transfer error / fail (single-shot) |
3.5.1. Bus Interface Protocol
Transactions are triggered entirely by the request bus. A new bus request is initiated by setting the strobe
signal stb
high for exactly one cycle. All remaining signals of the bus are set together with stb
and will
remain unchanged until the transaction is completed.
The transaction is completed when the accessed device returns a response via the response interface:
ack
is high for exactly one cycle if the transaction was completed successfully. err
is high for exactly
one cycle if the transaction failed to complete. These two signals are mutually exclusive. In case of a read
access the read data is returned together with the ack
signal. Otherwise, the return data signal is
kept at all-zero allowing wired-or interconnection of all response buses.
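The wired-or interconnection of response buses can be illustrated with a small C model. The struct is a simplified, hypothetical stand-in for the response type (only data, ack and err are modeled); it shows why idle devices have to drive all-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a bus response. */
typedef struct {
    uint32_t data; /* read data, all-zero unless ack is set for a read access */
    bool     ack;  /* transfer success (high for one cycle) */
    bool     err;  /* transfer failure (high for one cycle) */
} bus_rsp_model_t;

/* OR-combine the responses of all devices for the current cycle. */
static bus_rsp_model_t bus_rsp_merge(const bus_rsp_model_t *rsp, size_t n)
{
    assert(rsp != NULL || n == 0);
    bus_rsp_model_t out = {0u, false, false};
    for (size_t i = 0; i < n; i++) {
        out.data |= rsp[i].data;
        out.ack   = out.ack || rsp[i].ack;
        out.err   = out.err || rsp[i].err;
    }
    assert(!(out.ack && out.err)); /* ack and err are mutually exclusive */
    return out;
}
```

Because every non-addressed device returns zeros, the merged response equals the response of the single device that actually answers, without any multiplexer.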
The figure below shows three exemplary bus accesses:
- A read access to address A_addr returning rdata after several cycles (slow response; ACK arrives after several cycles).
- A write access to address B_addr writing wdata (fastest response; ACK arrives right in the next cycle).
- A failing read access to address C_addr (slow response; ERR arrives after several cycles).
Adding Register Stages
Arbitrary pipeline stages can be added to the request and response buses at any point to relax timing (at the cost of
additional latency). However, all bus signals (request and response) need to be registered.
|
3.5.2. Atomic Accesses
The load-reservate (lr.w
) and store-conditional (sc.w
) instructions from the Zalrsc
ISA Extension execute as standard
load/store bus transactions but with the rvso
("reservation set operation") signal being set. It is the task of the
Reservation Set Controller to handle these LR/SC bus transactions accordingly. Note that these reservation set operations
are intended for processor-internal usage only (i.e. the reservation state is not available for processor-external modules yet).
Reservation Set Controller
See section Address Space / Reservation Set Controller for more information.
|
The figure below shows three exemplary bus accesses (1 to 3 from left to right). The req
signal record represents
the CPU-side of the bus interface. For easier understanding the current state of the reservation set is added as rvs_valid
signal.
1. A load-reservate (LR) instruction using addr as address. This instruction returns the loaded data rdata via rsp.data and also registers a reservation for the address addr (rvs_valid becomes set).
2. A store-conditional (SC) instruction attempts to write wdata1 to address addr. This SC operation succeeds, so wdata1 is actually written to address addr. The successful operation is indicated by a 0 being returned via rsp.data together with ack. As the LR/SC is completed the registered reservation is invalidated (rvs_valid becomes cleared).
3. Another store-conditional (SC) instruction attempts to write wdata2 to address addr. As the reservation set is already invalidated (rvs_valid is 0) the store access fails, so wdata2 is not written to address addr at all. The failed operation is indicated by a 1 being returned via rsp.data together with ack.
Store-Conditional Status
The "normal" load data mechanism is used to return success/failure of the sc.w instruction to the CPU (via the LSB of rsp.data ).
|
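The three accesses described above can be replayed with a small behavioral model in C. This is a sketch under stated assumptions, not the actual Reservation Set Controller: it models a single reservation and a single memory word, the type and function names are hypothetical, and it assumes that every SC clears the reservation regardless of its outcome.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical software model: one reservation plus one memory word. */
typedef struct {
    bool     rvs_valid; /* a reservation is currently registered */
    uint32_t rvs_addr;  /* address covered by the reservation */
    uint32_t mem;       /* the single memory word modelled here */
} rvs_model_t;

/* lr.w: return the loaded data and register a reservation for addr. */
static uint32_t model_lr_w(rvs_model_t *m, uint32_t addr)
{
    m->rvs_valid = true;
    m->rvs_addr  = addr;
    return m->mem;
}

/* sc.w: returns 0 on success (data written) or 1 on failure (memory untouched). */
static uint32_t model_sc_w(rvs_model_t *m, uint32_t addr, uint32_t wdata)
{
    bool ok = m->rvs_valid && (m->rvs_addr == addr);
    if (ok) {
        m->mem = wdata;
    }
    m->rvs_valid = false; /* assumption: any SC ends the reservation */
    return ok ? 0u : 1u;
}
```

Calling `model_lr_w`, then `model_sc_w` twice on the same address reproduces the sequence from the list: the first SC returns 0 and updates the word, the second returns 1 and leaves it unchanged.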
Cache Coherency
Atomic operations always bypass the CPU caches using direct/uncached accesses. Care must be taken
to maintain data cache coherency (e.g. by using the fence instruction).
|
3.6. Instruction Sets and Extensions
The NEORV32 CPU provides several optional RISC-V-compliant and custom/user-defined ISA extensions. The extensions can be enabled/configured via the corresponding Processor Top Entity - Generics. This chapter gives a brief overview of all available ISA extensions.
Name | Description | Enabled by Generic |
---|---|---|
Bit manipulation instructions |
Implicitly enabled |
|
Compressed (16-bit) instructions |
||
Embedded CPU extension (reduced register file size) |
||
Integer base ISA |
Enabled if |
|
Integer multiplication and division instructions |
||
Less-privileged user mode extension |
||
Platform-specific / NEORV32-specific extension |
Always enabled |
|
Atomic reservation-set instructions |
||
Shifted-add bit manipulation instructions |
||
Basic bit manipulation instructions |
||
Scalar cryptographic bit manipulation instructions |
||
Scalar cryptographic carry-less multiplication instructions |
||
Scalar cryptographic crossbar permutation instructions |
||
Single-bit bit manipulation instructions |
||
Floating-point instructions using integer registers |
||
Instruction stream synchronization instruction |
Always enabled |
|
Base counters extension |
||
Integer conditional operations |
||
Control and status register access instructions |
Always enabled |
|
Hardware performance monitors extension |
||
Scalar cryptographic NIST algorithm suite |
Implicitly enabled |
|
Scalar cryptographic NIST AES decryption instructions |
||
Scalar cryptographic NIST AES encryption instructions |
||
Scalar cryptographic NIST hash function instructions |
||
Data independent execution time (of cryptographic operations) |
Implicitly enabled |
|
Scalar cryptographic ShangMi algorithm suite |
Implicitly enabled |
|
Scalar cryptographic ShangMi block cipher instructions |
||
Scalar cryptographic ShangMi hash instructions |
||
Integer multiplication-only instructions |
||
Custom / user-defined instructions |
||
Physical memory protection (PMP) extension |
||
External debug support extension |
||
Trigger module extension |
RISC-V ISA Specification
For more information regarding the RISC-V ISA extensions please refer to the "RISC-V Instruction Set Manual - Volume
I: Unprivileged ISA" and "The RISC-V Instruction Set Manual Volume II: Privileged Architecture". A copy of these
documents can be found in the project's docs/references folder.
|
Discovering ISA Extensions
Software can discover available ISA extensions via the misa and mxisa CSRs or by executing an instruction
and checking for an illegal instruction exception (i.e. Full Virtualization).
|
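As a sketch of the first discovery method, the helper below decodes a misa value that has already been read from the CSR. It relies only on the standard RISC-V misa layout (one flag per single-letter extension, bit 0 for A up to bit 25 for Z); the function name is hypothetical and the CSR read itself is not shown. Multi-letter extensions such as the Z* ones are not represented in misa.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if the single-letter extension `ext` ('A'..'Z') is flagged in a misa value. */
static bool misa_has_ext(uint32_t misa, char ext)
{
    if (ext < 'A' || ext > 'Z') {
        return false; /* not a single-letter extension */
    }
    return ((misa >> (ext - 'A')) & 1u) != 0u;
}
```

For example, a misa value with the C, I and M bits set makes `misa_has_ext(misa, 'M')` return true and `misa_has_ext(misa, 'E')` return false.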
Instruction Cycles
This chapter shows the CPI values (cycles per instruction) for each individual instruction/type. Note that
values reflect optimal conditions (i.e. no additional memory delay, no cache misses, no pipeline waits, etc.).
To benchmark a certain processor configuration for its setup-specific CPI value please refer to the
sw/example/performance_tests test programs.
|
3.6.1. B
ISA Extension
The B
ISA extension adds instructions for bit-manipulation operations.
This ISA extension cannot be enabled by a specific generic. Instead, it is enabled if a specific set of
bit-manipulation sub-extensions are enabled.
The B
extension is shorthand for the following set of other extensions:
-
Zba
ISA Extension - Address-generation / shifted-add instructions. -
Zbb
ISA Extension - Basic bit manipulation instructions. -
Zbs
ISA Extension - Single-bit operations.
A processor configuration which implements B
must implement all of the above extensions.
3.6.2. C
ISA Extension
The "compressed" ISA extension provides 16-bit encodings of commonly used instructions to reduce code space size.
Class | Instructions | Execution cycles |
---|---|---|
ALU |
|
2 |
ALU |
|
3 + 1..32; FAST_SHIFT: 4 |
Branches |
|
taken: 6; not taken: 3 |
Jumps / calls |
|
6 |
Memory access |
|
4 |
System |
|
3 |
3.6.3. E
ISA Extension
The "embedded" ISA extension reduces the size of the general purpose register file from 32 entries to 16 entries to
shrink hardware size. It provides the same instructions as the base I
ISA extension.
Alternative MABI
Due to the reduced register file size an alternate toolchain ABI (ilp32e* ) is required.
|
3.6.4. I
ISA Extension
The I
ISA extension is the base RISC-V integer ISA that is always enabled.
Class | Instructions | Execution cycles |
---|---|---|
ALU |
|
2 |
No-operation |
“nop” |
2 |
ALU shifts |
|
3 + 1..32; FAST_SHIFT: 4 |
Branches |
|
taken: 6; not taken: 3 |
Jump/call |
|
6 |
Load/store |
|
5 |
System |
|
3 |
Data fence |
|
5 |
System |
|
3 |
System |
|
5 |
Illegal inst. |
- |
3 |
fence Instruction
The fence instruction word’s predecessor and successor bits (used for memory ordering) are not evaluated
by the hardware at all. For the NEORV32 the fence instruction behaves exactly like the fence.i instruction
(see Zifencei ISA Extension). However, software should still use distinct fence and fence.i to provide
platform-compatibility and to indicate the actual intention of the according fence instruction(s).
|
wfi Instruction
The wfi instruction is used to enter Sleep Mode. Executing the wfi instruction in user-mode
will raise an illegal instruction exception if the TW bit of mstatus is set.
|
Barrel Shifter
The shift operations are implemented as a multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_shifter.vhd).
These operations can be accelerated (at the cost of additional logic resources) by enabling the FAST_SHIFT_EN
configuration option that will replace the (time-variant) bit-serial shifter by a (time-constant) barrel shifter.
|
3.6.5. M
ISA Extension
Hardware-accelerated integer multiplication and division operations are available via the RISC-V M
ISA extension.
This ISA extension is implemented as a multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_muldiv.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
Multiplication |
|
36; FAST_MUL: 4 |
Division |
|
36 |
DSP Blocks
Multiplication operations can be accelerated (at the cost of additional logic resources) by enabling the FAST_MUL_EN
configuration option that will replace the (time-variant) bit-serial multiplier by (time-constant) FPGA DSP blocks.
|
3.6.6. U
ISA Extension
In addition to the highest-privileged machine-mode, the user-mode ISA extension adds a second less-privileged operation mode. Code executed in user-mode has reduced CSR access rights. Furthermore, user-mode accesses to the address space (like peripheral/IO devices) can be constrained via the physical memory protection. Any kind of privilege rights violation will raise an exception to allow Full Virtualization.
3.6.7. X
ISA Extension
The NEORV32-specific ISA extension X is always enabled. The most important points of the NEORV32-specific extension are:
* The CPU provides 16 fast interrupt requests (FIRQ), which are controlled via custom bits in the mie and mip CSRs. These extensions are mapped to CSR bits that are available for custom use according to the RISC-V specs. Also, custom trap codes for mcause are implemented.
* All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception (see Full Virtualization).
* There are NEORV32-Specific CSRs.
3.6.8. Zalrsc
ISA Extension
The Zalrsc
ISA extension is a sub-extension of the RISC-V atomic memory access (A
) ISA extension and includes
instructions for reservation-set operations (load-reservate lr
and store-conditional sc
) only.
It is enabled by the top’s RISCV_ISA_Zalrsc
generic.
AMO / A Emulation
The atomic memory access / read-modify-write operations of the A ISA extension can be emulated using the
LR and SC operations (quote from the RISC-V spec.: "Any AMO can be emulated by an LR/SC pair.").
The NEORV32 Core Libraries provide an emulation wrapper for emulating AMO/read-modify-write instructions that is
based on LR/SC pairs. A demo/program can be found in sw/example/atomic_test.
|
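The principle behind such an emulation is a retry loop: load with reservation, compute the new value, and repeat until the store-conditional reports success. The sketch below is not the wrapper from the NEORV32 Core Libraries; all names are hypothetical. The LR/SC accesses are passed in as function pointers so that the loop can be tried on a host machine, and the `demo_*` stand-ins model one memory word whose first SC fails on purpose.

```c
#include <stdint.h>

/* Hypothetical access hooks; on the real target these would wrap lr.w / sc.w. */
typedef uint32_t (*lr_fn_t)(uint32_t addr);
typedef uint32_t (*sc_fn_t)(uint32_t addr, uint32_t wdata); /* returns 0 on success */

/* Emulated atomic fetch-and-add: retry until the store-conditional succeeds. */
static uint32_t amo_add_emulated(lr_fn_t lr, sc_fn_t sc, uint32_t addr, uint32_t inc)
{
    uint32_t old;
    do {
        old = lr(addr);                  /* load and register the reservation */
    } while (sc(addr, old + inc) != 0u); /* non-zero status: reservation lost, retry */
    return old;                          /* like an AMO, return the previous value */
}

/* Host-side stand-in for one memory word; the first SC is forced to fail. */
static uint32_t demo_mem = 40u;
static int demo_fail_once = 1;

static uint32_t demo_lr(uint32_t addr)
{
    (void)addr;
    return demo_mem;
}

static uint32_t demo_sc(uint32_t addr, uint32_t wdata)
{
    (void)addr;
    if (demo_fail_once) {
        demo_fail_once = 0;
        return 1u; /* simulated loss of the reservation */
    }
    demo_mem = wdata;
    return 0u;
}
```

Other read-modify-write operations (swap, and, or, min, max) follow the same pattern with a different expression in place of `old + inc`.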
Class | Instructions | Execution cycles |
---|---|---|
Load-reservate word |
|
5 |
Store-conditional word |
|
5 |
aq and rl Bits
The aq and rl memory ordering bits are not evaluated by the hardware at all.
|
Atomic Memory Access on Hardware Level
More information regarding the atomic memory accesses and the according reservation
sets can be found in section Reservation Set Controller.
|
3.6.9. Zifencei
ISA Extension
The Zifencei
CPU extension allows manual synchronization of the instruction stream. This extension is always enabled.
NEORV32 Fence Instructions
The NEORV32 treats both fence instructions (fence = data fence, fence.i = instruction fence) in exactly the same way.
Both instructions cause a flush of the CPU’s instruction prefetch buffer and also send a fence request via the system
bus (see Bus Interface). This system bus fence operation will, for example, clear/flush all downstream caches.
|
Class | Instructions | Execution cycles |
---|---|---|
Instruction fence |
|
5 |
3.6.10. Zfinx
ISA Extension
The Zfinx floating-point extension is an alternative to the standard F floating-point ISA extension.
It uses the integer register file x to store and operate on floating-point data
instead of a dedicated floating-point register file. Thus, the Zfinx extension requires
fewer hardware resources and features faster context changes. This also implies that there are NO dedicated f
register file-related load/store or move instructions. The Zfinx extension’s floating-point unit is controlled
via dedicated Floating-Point CSRs.
This ISA extension is implemented as a multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_fpu.vhd).
Fused / Multiply-Add Instructions
Fused multiply-add instructions f[n]m[add/sub].s are not supported. A special GCC switch is used to prevent the
compiler from emitting contracted/fused floating-point operations (see Default Compiler Flags).
|
Division and Square Root Instructions
Division fdiv.s and square root fsqrt.s instructions are not supported yet.
|
Subnormal Number
Subnormal numbers ("de-normalized" numbers, i.e. exponent = 0) are not supported by the NEORV32 FPU.
Subnormal numbers are flushed to zero setting them to +/- 0 before being processed by any FPU operation.
If a computational instruction generates a subnormal result it is also flushed to zero during normalization.
|
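The flush-to-zero behavior can be expressed on the raw IEEE 754 single-precision bit pattern. The following is a minimal sketch of the rule as described above, not the FPU's actual implementation; the function name is hypothetical. A value is subnormal when its 8-bit exponent field is zero while the mantissa is non-zero, and flushing keeps only the sign bit.

```c
#include <stdint.h>

/* Flush a subnormal single-precision bit pattern to +/-0, keeping the sign. */
static uint32_t flush_subnormal(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;   /* 8-bit exponent field */
    uint32_t mantissa = bits & 0x007FFFFFu;     /* 23-bit mantissa field */
    if (exponent == 0u && mantissa != 0u) {
        return bits & 0x80000000u;              /* signed zero */
    }
    return bits;                                /* normal, zero, inf or NaN: unchanged */
}
```

Software that compares FPU results against a host reference needs to apply the same rule to operands and results, otherwise small values will show up as mismatches.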
Class | Instructions | Execution cycles |
---|---|---|
Arithmetic |
|
110 |
Arithmetic |
|
112 |
Arithmetic |
|
22 |
Compare |
|
13 |
Conversion |
|
48 |
Misc |
|
12 |
3.6.11. Zicntr
ISA Extension
The Zicntr
ISA extension adds the basic cycle[h]
, mcycle[h]
, instret[h]
and minstret[h]
counter CSRs. Section (Machine) Counter and Timer CSRs shows a list of all Zicntr
-related CSRs.
Time CSRs
The user-mode time[h] CSRs are not implemented. Any access will trap allowing the trap handler to
retrieve system time from the Machine System Timer (MTIME).
|
Mandatory Extension
This extension is stated as mandatory by the RISC-V spec. However, area-constrained setups may remove
support for these counters.
|
Constrained Access
User-level access to the counter CSRs can be constrained by the mcounteren CSR.
|
3.6.12. Zicond
ISA Extension
The Zicond
ISA extension adds integer conditional move primitives that allow branch-less
control flow to be implemented. It is enabled by the top’s RISCV_ISA_Zicond
generic.
This ISA extension is implemented as multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_cond.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
Conditional |
|
3 |
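To illustrate how such primitives replace a branch, the sketch below models the two conditional-zero operations defined by the ratified RISC-V Zicond specification (czero.eqz and czero.nez; these mnemonics are taken from that specification, not from the table above) as plain C functions and builds a select from them. The function names are hypothetical and this is a host-side model, not target code.

```c
#include <stdint.h>

/* czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1 */
static uint32_t czero_eqz(uint32_t rs1, uint32_t rs2)
{
    return (rs2 == 0u) ? 0u : rs1;
}

/* czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1 */
static uint32_t czero_nez(uint32_t rs1, uint32_t rs2)
{
    return (rs2 != 0u) ? 0u : rs1;
}

/* Branch-less select: returns a if cond is non-zero, otherwise b. */
static uint32_t select_branchless(uint32_t cond, uint32_t a, uint32_t b)
{
    /* exactly one of the two terms is zeroed, so OR merges them */
    return czero_eqz(a, cond) | czero_nez(b, cond);
}
```

On the target, the select corresponds to two conditional-zero instructions followed by an OR, with no branch involved.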
3.6.13. Zicsr
ISA Extension
This ISA extension provides instructions for accessing the Control and Status Registers (CSRs) as well as further privileged-architecture extensions. This extension is mandatory and cannot be disabled. Hence, there is no generic for enabling/disabling this ISA extension.
Side-Effects if Destination is Zero-Register
If rd=x0 for the csrrw[i] instructions there will be no actual read access to the according CSR.
However, access privileges are still enforced so these instruction variants do cause side-effects
(the RISC-V spec. states that these combinations "shall" not cause any side-effects).
|
Class | Instructions | Execution cycles |
---|---|---|
System |
|
3 |
3.6.14. Zihpm
ISA Extension
In addition to the base counters, the NEORV32 CPU provides up to 13 hardware performance monitors (HPM 3..15),
which can be used to benchmark applications. Each HPM consists of an N-bit wide counter (split in a high-word 32-bit
CSR and a low-word 32-bit CSR), where N is defined via the top’s HPM_CNT_WIDTH
generic and a corresponding event
configuration CSR.
The event configuration CSR defines the architectural events that lead to an increment of the associated HPM counter. See section Hardware Performance Monitors (HPM) CSRs for a list of all HPM-related CSRs and event configurations.
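Because a counter is only N bits wide, benchmark code has to account for the wrap-around at 2^N when computing differences. The helpers below are a sketch with hypothetical names; they operate on the high and low words after these have been read from the CSRs, and `width` stands for the value of the HPM_CNT_WIDTH generic (assumed here to be in the range 1 to 64).

```c
#include <stdint.h>

/* Bit mask covering the lowest `width` bits. */
static uint64_t hpm_mask(unsigned width)
{
    return (width >= 64u) ? UINT64_MAX : ((UINT64_C(1) << width) - 1u);
}

/* Join the high and low CSR words into one value limited to `width` bits. */
static uint64_t hpm_join(uint32_t hi, uint32_t lo, unsigned width)
{
    return ((((uint64_t)hi) << 32) | lo) & hpm_mask(width);
}

/* Events counted between two samples, tolerating one counter wrap-around. */
static uint64_t hpm_elapsed(uint64_t start, uint64_t end, unsigned width)
{
    return (end - start) & hpm_mask(width);
}
```

With a 40-bit counter, a start sample of `0xFFFFFFFFF0` and an end sample of `0x10` therefore yield 32 counted events instead of a huge negative difference.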
Machine-Mode HPMs Only
Note that only the machine-mode hardware performance counter CSRs are available (mhpmcounter*[h]).
Accessing any user-mode HPM CSR (hpmcounter*[h] ) will raise an illegal instruction exception.
|
Increment Inhibit
The event-driven increment of the HPMs can be deactivated individually via the mcountinhibit CSR.
|
3.6.15. Zba
ISA Extension
The Zba
sub-extension is part of the RISC-V bit manipulation ISA specification (B
ISA Extension)
and adds shifted-add / address-generation instructions. It is enabled by the top’s
RISCV_ISA_Zba
generic. This ISA extension is implemented as multi-cycle
ALU co-processor (rtl/core/neorv32_cpu_cp_bitmanip.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
Shifted-add |
|
4 |
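The shifted-add operations of the ratified Zba specification (sh1add, sh2add and sh3add; the mnemonics come from that specification) compute `rs2 + (rs1 << n)` for n = 1, 2 or 3. The following C model is a sketch with a hypothetical function name that shows the arithmetic and its typical use for array address generation.

```c
#include <stdint.h>

/* Model of shNadd rd, rs1, rs2 for n = 1, 2 or 3: rd = rs2 + (rs1 << n). */
static uint32_t shadd(uint32_t rs1, uint32_t rs2, unsigned n)
{
    return rs2 + (rs1 << n);
}
```

For a table of 32-bit words starting at `0x1000`, the address of element 5 is `shadd(5, 0x1000, 2)`, i.e. the base plus the index scaled by four, computed by one instruction instead of a shift followed by an add.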
3.6.16. Zbb
ISA Extension
The Zbb
sub-extension is part of the RISC-V bit manipulation ISA specification (B
ISA Extension)
and adds the basic bit manipulation instructions. It is enabled by the top’s RISCV_ISA_Zbb
generic. This ISA extension is implemented as multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_bitmanip.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
Logic with negate |
|
4 |
Count leading/trailing zeros |
|
6 + 1..32; FAST_SHIFT: 4 |
Count population |
|
6 + 32; FAST_SHIFT: 4 |
Integer maximum/minimum |
|
4 |
Sign/zero extension |
|
4 |
Bitwise rotation |
|
6 + shift_amount; FAST_SHIFT: 4 |
OR-combine |
|
4 |
Byte-reverse |
|
4 |
Shift Operations
Shift operations can be accelerated (at the cost of additional logic resources) by enabling the FAST_SHIFT_EN
configuration option that will replace the (time-variant) bit-serial shifter by a (time-constant) barrel shifter.
|
3.6.17. Zbs
ISA Extension
The Zbs
sub-extension is part of the RISC-V bit manipulation ISA specification (B
ISA Extension)
and adds single-bit operations. It is enabled by the top’s RISCV_ISA_Zbs
generic.
This ISA extension is implemented as multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_bitmanip.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
Single-bit | sbset[i] sbclr[i] sbinv[i] sbext[i] | 4 |
3.6.18. Zbkb
ISA Extension
The Zbkb
sub-extension is part of the RISC-V scalar cryptography ISA specification and extends the RISC-V bit manipulation
ISA extension with additional instructions. It is enabled by the top’s RISCV_ISA_Zbkb
generic.
Note that enabling this extension will also enable the Zbb
basic bit-manipulation ISA extension (which is extended by Zbkb
).
This ISA extension is implemented as multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_bitmanip.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
Packing |
|
4 |
Interleaving |
|
4 |
Byte-wise bit reversal |
|
4 |
3.6.19. Zbkc
ISA Extension
The Zbkc
sub-extension is part of the RISC-V scalar cryptography ISA extension and adds carry-less multiplication instructions.
It is enabled by the top’s RISCV_ISA_Zbkc
generic.
This ISA extension is implemented as multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_bitmanip.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
Carry-less multiply |
|
6 + 32 |
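A carry-less multiplication is a multiplication in which the partial products are combined with XOR instead of addition. The reference model below follows the definitions of clmul and clmulh from the RISC-V specification for a 32-bit register width; the function names are hypothetical. The loop over 32 bit positions mirrors a bit-serial evaluation, although nothing is claimed here about how the hardware is actually structured.

```c
#include <stdint.h>

/* Lower 32 bits of the carry-less product of rs1 and rs2 (clmul). */
static uint32_t clmul32(uint32_t rs1, uint32_t rs2)
{
    uint32_t result = 0u;
    for (unsigned i = 0u; i < 32u; i++) {
        if ((rs2 >> i) & 1u) {
            result ^= rs1 << i;          /* XOR in the shifted partial product */
        }
    }
    return result;
}

/* Upper 32 bits of the carry-less product of rs1 and rs2 (clmulh). */
static uint32_t clmulh32(uint32_t rs1, uint32_t rs2)
{
    uint32_t result = 0u;
    for (unsigned i = 1u; i < 32u; i++) {
        if ((rs2 >> i) & 1u) {
            result ^= rs1 >> (32u - i);  /* bits shifted out of the lower word */
        }
    }
    return result;
}
```

As a quick check, 3 times 3 is 9 in ordinary arithmetic but 5 in carry-less arithmetic, because `0b011 ^ 0b110` equals `0b101`.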
3.6.20. Zbkx
ISA Extension
The Zbkx
sub-extension is part of the RISC-V scalar cryptography ISA specification and adds crossbar permutation instructions.
It is enabled by the top’s RISCV_ISA_Zbkx
generic.
This ISA extension is implemented as multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_crypto.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
Crossbar permutation |
|
4 |
3.6.21. Zkn
ISA Extension
The Zkn
ISA extension is part of the RISC-V scalar cryptography ISA specification and defines the "NIST algorithm suite".
This ISA extension cannot be enabled by a specific generic. Instead, it is enabled if a specific set of cryptography-related
sub-extensions is enabled.
The Zkn
extension is shorthand for the following set of other extensions:
-
Zbkb
ISA Extension - Bit manipulation instructions for cryptography. -
Zbkc
ISA Extension - Carry-less multiply instructions. -
Zbkx
ISA Extension - Cross-bar permutation instructions. -
Zkne
ISA Extension - AES encryption instructions. -
Zknd
ISA Extension - AES decryption instructions. -
Zknh
ISA Extension - SHA2 hash function instructions.
A processor configuration which implements Zkn
must implement all of the above extensions.
3.6.22. Zknd
ISA Extension
The Zknd
sub-extension is part of the RISC-V scalar cryptography ISA specification and adds NIST AES decryption instructions.
It is enabled by the top’s RISCV_ISA_Zknd
generic.
This ISA extension is implemented as multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_crypto.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
AES decryption |
|
6 |
3.6.23. Zkne
ISA Extension
The Zkne
sub-extension is part of the RISC-V scalar cryptography ISA specification and adds NIST AES encryption instructions.
It is enabled by the top’s RISCV_ISA_Zkne
generic.
This ISA extension is implemented as multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_crypto.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
AES encryption |
|
6 |
3.6.24. Zknh
ISA Extension
The Zknh
sub-extension is part of the RISC-V scalar cryptography ISA specification and adds NIST hash function instructions.
It is enabled by the top’s RISCV_ISA_Zknh
generic.
This ISA extension is implemented as multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_crypto.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
sha256 |
|
4 |
sha512 |
|
4 |
3.6.25. Zks
ISA Extension
The Zks
ISA extension is part of the RISC-V scalar cryptography ISA specification and defines the "ShangMi algorithm suite".
This ISA extension cannot be enabled by a specific generic. Instead, it is enabled if a specific set of cryptography-related
sub-extensions is enabled.
The Zks
extension is shorthand for the following set of other extensions:
-
Zbkb
ISA Extension - Bit manipulation instructions for cryptography. -
Zbkc
ISA Extension - Carry-less multiply instructions. -
Zbkx
ISA Extension - Cross-bar permutation instructions. -
Zksed
ISA Extension - SM4 block cipher instructions. -
Zksh
ISA Extension - SM3 hash function instructions.
A processor configuration which implements Zks
must implement all of the above extensions.
3.6.26. Zksed
ISA Extension
The Zksed
sub-extension is part of the RISC-V scalar cryptography ISA specification and adds ShangMi block cipher
and key schedule instructions. It is enabled by the top’s RISCV_ISA_Zksed
generic.
This ISA extension is implemented as multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_crypto.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
Block ciphers |
|
6 |
Key schedule |
|
6 |
3.6.27. Zksh
ISA Extension
The Zksh
sub-extension is part of the RISC-V scalar cryptography ISA specification and adds ShangMi hash function instructions.
It is enabled by the top’s RISCV_ISA_Zksh
generic.
This ISA extension is implemented as multi-cycle ALU co-processor (rtl/core/neorv32_cpu_cp_crypto.vhd
).
Class | Instructions | Execution cycles |
---|---|---|
Hash |
|
6 |
3.6.28. Zkt
ISA Extension
The Zkt
sub-extension is part of the RISC-V scalar cryptography ISA specification and guarantees data independent execution
times of cryptographic and cryptography-related instructions. The ISA extension cannot be enabled by a specific generic.
Instead, it is enabled implicitly by certain CPU configurations.
The RISC-V Zkt
specification provides a list of instructions that are included within this specification.
However, not all instructions are required to be implemented. Rather, every one of these instructions that the
core does implement must adhere to the requirements of Zkt
.
Parent extension | Instructions | Data independent execution time? |
---|---|---|
|
|
yes |
|
yes if |
|
|
|
yes |
|
|
yes |
|
yes if |
|
|
|
yes |
|
|
yes |
|
yes if |
3.6.29. Zmmul
ISA Extension
This is a sub-extension of the M
ISA Extension. It implements only the multiplication operations
of the M
extension and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
Note that the Zmmul ISA Extension and the M ISA Extension are mutually exclusive.
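When only Zmmul is available, divisions are carried out by software routines that the toolchain normally supplies. To give an idea of what such a routine does, the sketch below implements an unsigned 32-bit shift-and-subtract division. It is an illustration with a hypothetical function name, not the routine the toolchain actually links in.

```c
#include <stdint.h>
#include <stddef.h>

/* Unsigned 32-bit shift-and-subtract division; stores the remainder via rem if not NULL. */
static uint32_t udiv32_soft(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0u;
    uint32_t r = 0u;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u); /* bring down the next dividend bit */
        if (r >= d) {                   /* divisor fits: subtract, set quotient bit */
            r -= d;
            q |= 1u << i;
        }
    }
    if (rem != NULL) {
        *rem = r;
    }
    return q;
}
```

The loop always runs for 32 iterations, which makes it considerably slower than the hardware divider of the M extension. That is the trade-off Zmmul accepts in exchange for the smaller hardware footprint.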
3.6.30. Zxcfu
ISA Extension
The Zxcfu
is a NEORV32-specific ISA extension. It adds the Custom Functions Unit (CFU) to
the CPU core, which allows custom RISC-V instructions to be added to the processor core.
For detailed information regarding the CFU, its hardware and the according software interface
see section Custom Functions Unit (CFU).
Software can utilize the custom instructions by using intrinsics, which are basically inline assembly functions that behave like regular C functions but that evaluate to a single custom instruction word (no calling overhead at all).
CFU Execution Time
The actual CFU execution time depends on the logic being implemented. The CPU architecture requires a minimal execution
time of 3 cycles (purely combinatorial CFU operation) and automatically terminates execution after 512 cycles if the CFU
does not complete operation within this time window.
|
Class | Instructions | Execution cycles |
---|---|---|
Custom instructions |
Instruction words with |
3 … 3+512 |
3.6.31. Smpmp
ISA Extension
The NEORV32 physical memory protection (PMP) provides an elementary memory protection mechanism that can be used to constrain read, write and execute rights of arbitrary memory regions. The NEORV32 PMP is fully compatible with the RISC-V Privileged Architecture Specifications. In general, the PMP can grant permissions to user mode, which by default has none, and can revoke permissions from M-mode, which by default has full permissions. The PMP is configured via the Machine Physical Memory Protection CSRs.
Several Processor Top Entity - Generics are provided to fine-tune the CPU’s PMP capabilities:
* PMP_NUM_REGIONS
defines the number of implemented PMP regions
* PMP_MIN_GRANULARITY
defines the minimal granularity of each region
* PMP_TOR_MODE_EN
controls the implementation of the top-of-region (TOR) mode
* PMP_NAP_MODE_EN
controls the implementation of the naturally-aligned-power-of-two (NA4 and NAPOT) modes
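For the NAPOT mode, the RISC-V privileged specification encodes base address and region size together in a single address register value: the base address shifted right by two, with the low bits filled by a run of ones whose length selects the size. The helper below is a sketch of that encoding with a hypothetical name. It assumes that the size is a power of two of at least 8 bytes, that the base is aligned to the size, and that the configured minimal granularity permits the chosen size.

```c
#include <stdint.h>

/* Encode a naturally aligned power-of-two region (size >= 8 bytes) as a PMP address value. */
static uint32_t pmp_napot_addr(uint32_t base, uint32_t size)
{
    /* (size >> 3) - 1 yields the run of trailing ones that selects the region size */
    return (base >> 2) | ((size >> 3) - 1u);
}
```

A 4 KiB region at `0x80000000` is therefore written as `0x200001FF`; the nine trailing ones select 2^(9+3) = 4096 bytes.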
PMP Rules when in Debug Mode
When in debug-mode all PMP rules are ignored making the debugger have maximum access rights.
|
Protected Instruction Fetches
New instruction fetches are always triggered even when denied by a certain PMP rule. However, the fetched instruction(s)
will not be executed and will not change CPU core state. Instead, they will raise a bus exception when reaching the CPU’s
execution stage.
|
3.6.32. Sdext
ISA Extension
This ISA extension enables the RISC-V-compatible "external debug support" by implementing the CPU "debug mode", which is required for the on-chip debugger. See section On-Chip Debugger (OCD) / CPU Debug Mode for more information.
Class | Instructions | Execution cycles |
---|---|---|
System |
|
5 |
3.6.33. Sdtrig
ISA Extension
This ISA extension implements the RISC-V-compatible "trigger module". See section On-Chip Debugger (OCD) / Trigger Module for more information.
3.7. Custom Functions Unit (CFU)
The Custom Functions Unit (CFU) is the central part of the NEORV32-specific Zxcfu
ISA Extension and
represents the actual hardware module that can be used to implement custom RISC-V instructions.
The CFU is intended for operations that are inefficient in terms of performance, latency, energy consumption or program memory requirements when implemented entirely in software. Some potential application fields and exemplary use-cases might include:
-
AI: sub-word / vertical vector/SIMD operations like processing all four sub-bytes of a 32-bit data word individually
-
Cryptographic: bit substitution and permutation
-
Communication: data conversions like binary to gray-code
-
Arithmetic: BCD (binary-coded decimal) operations; multiply-add operations; shift-and-add algorithms like CORDIC or BKM
-
Image processing: look-up-tables for color space transformations
-
implementing instructions from other RISC-V ISA extensions that are not yet supported by NEORV32
The NEORV32 CFU supports two different instruction formats (R3-type and R4-type; see CFU Instruction Formats) and also allows up to 4 CFU-internal custom control and status registers to be implemented (see CFU Control and Status Registers (CFU-CSRs)).
CFU Complexity
The CFU is not intended for complex and CPU-independent functional units that implement complete accelerators
(like block-based AES encryption). This kind of accelerator should be implemented as a memory-mapped co-processor via the
Custom Functions Subsystem (CFS) to allow CPU-independent operation. A comparative survey of all NEORV32-specific
hardware extension/customization options is provided in the user guide section
Adding Custom Hardware Modules.
|
Default CFU Hardware Example
The default CFU module (rtl/core/neorv32_cpu_cp_cfu.vhd ) implements the Extended Tiny Encryption Algorithm (XTEA)
as "real world" application example.
|
3.7.1. CFU Instruction Formats
The custom instructions executed by the CFU utilize a specific opcode space in the rv32
32-bit instruction
encoding space that has been explicitly reserved for user-defined extensions by the RISC-V specifications ("Guaranteed
Non-Standard Encoding Space"). The NEORV32 CFU uses the custom-0
and custom-1
opcodes to identify the instruction
implemented by the CFU and to differentiate between the predefined instruction formats.
The NEORV32 CFU utilizes these two opcodes to support user-defined R3-type instructions (2 source registers, 1 destination register) and R4-type instructions (3 source registers, 1 destination register). Both instruction formats are compliant to the RISC-V specification.
- custom-0: 0001011, RISC-V standard, used for NEORV32 CFU R3-Type Instructions (3x register addresses)
- custom-1: 0101011, RISC-V standard, used for NEORV32 CFU R4-Type Instructions (4x register addresses)
The provided instructions formats are predefined to allow an easy integration framework. However, system designers are free to ignore these and use their own instruction types and formats. |
CFU R3-Type Instructions
The R3-type CFU instructions operate on two source registers rs1
and rs2
and return the processing result to
the destination register rd
. The actual operation can be defined by using the funct7
and funct3
bit fields.
These immediates can also be used to pass additional data to the CFU like offsets, look-up table addresses or
shift-amounts. However, the actual functionality is entirely user-defined. Note that all immediate values are
always compile-time-static.
Example operation: rd ⇐ rs1 xnor rs2
(bit-wise logical XNOR)
- funct7: 7-bit immediate (immediate data or function select)
- rs2: address of second source register (providing 32-bit source data)
- rs1: address of first source register (providing 32-bit source data)
- funct3: 3-bit immediate (immediate data or function select)
- rd: address of destination register (32-bit processing result)
- opcode: 0001011 (RISC-V custom-0 opcode)
Instruction encoding space
By using the funct7 and funct3 bit fields entirely for selecting the actual operation a total of 1024 custom
R3-type instructions can be implemented (7-bit + 3-bit = 10 bit → 1024 different values).
|
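The field list above follows the standard RISC-V R-type layout (funct7 in bits 31:25, rs2 in 24:20, rs1 in 19:15, funct3 in 14:12, rd in 11:7, opcode in 6:0). As a sketch, the helper below assembles such an instruction word on a host machine, and a second function models the XNOR example operation. Both function names are hypothetical; the intrinsics of the software framework build the word through inline assembly instead.

```c
#include <stdint.h>

/* Assemble a 32-bit R3-type CFU instruction word (custom-0 opcode, 0b0001011). */
static uint32_t cfu_r3_encode(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                              uint32_t funct3, uint32_t rd)
{
    return ((funct7 & 0x7Fu) << 25) | /* 7-bit immediate / function select */
           ((rs2    & 0x1Fu) << 20) | /* second source register address */
           ((rs1    & 0x1Fu) << 15) | /* first source register address */
           ((funct3 & 0x07u) << 12) | /* 3-bit immediate / function select */
           ((rd     & 0x1Fu) <<  7) | /* destination register address */
           0x0Bu;                     /* custom-0 opcode */
}

/* Reference model of the example operation: rd <= rs1 xnor rs2. */
static uint32_t cfu_xnor_model(uint32_t rs1, uint32_t rs2)
{
    return ~(rs1 ^ rs2);
}
```

For instance, `cfu_r3_encode(0, 12, 11, 5, 10)` describes an instruction with funct3 = 5 that reads x11 and x12 and writes x10.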
CFU R4-Type Instructions
The R4-type CFU instructions operate on three source registers rs1, rs2 and rs3 and return the processing
result to the destination register rd. The actual operation can be defined by using the funct3 bit field.
Alternatively, this immediate can also be used to pass additional data to the CFU like offsets, look-up table
addresses or shift-amounts. However, the actual functionality is entirely user-defined. Note that all immediate
values are always compile-time-static.
Example operation: rd ⇐ (rs1 * rs2 + rs3)[31:0]
(multiply-and-accumulate; "MAC")
- rs3: address of third source register (providing 32-bit source data)
- rs2: address of second source register (providing 32-bit source data)
- rs1: address of first source register (providing 32-bit source data)
- funct3: 3-bit immediate (immediate data or function select)
- rd: address of destination register (32-bit processing result)
- opcode: 0101011 (RISC-V custom-1 opcode)
- ⚠️ bits [26:25] of the R4-type instruction word are unused. However, these bits are ignored by the CPU’s instruction decoder and can be retrieved via the CFU’s funct7_i(6 downto 5) signal.
Instruction encoding space
By using the funct3 bit field entirely for selecting the actual operation a total of 8 custom R4-type
instructions can be implemented (3-bit → 8 different values).
|
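The R4-type word can be assembled in the same way as a standard RISC-V R4-type instruction, with rs3 in bits 31:27 and bits 26:25 left at zero. The sketch below shows the encoding together with a reference model of the multiply-and-accumulate example; both function names are hypothetical and intended for host-side experiments only.

```c
#include <stdint.h>

/* Assemble a 32-bit R4-type CFU instruction word (custom-1 opcode, 0b0101011). */
static uint32_t cfu_r4_encode(uint32_t rs3, uint32_t rs2, uint32_t rs1,
                              uint32_t funct3, uint32_t rd)
{
    return ((rs3    & 0x1Fu) << 27) | /* third source register address */
           /* bits 26:25 are unused by the predefined format and left at zero */
           ((rs2    & 0x1Fu) << 20) | /* second source register address */
           ((rs1    & 0x1Fu) << 15) | /* first source register address */
           ((funct3 & 0x07u) << 12) | /* 3-bit immediate / function select */
           ((rd     & 0x1Fu) <<  7) | /* destination register address */
           0x2Bu;                     /* custom-1 opcode */
}

/* Reference model of the example operation: rd <= (rs1 * rs2 + rs3)[31:0]. */
static uint32_t cfu_mac_model(uint32_t rs1, uint32_t rs2, uint32_t rs3)
{
    return rs1 * rs2 + rs3; /* unsigned arithmetic wraps, keeping the low 32 bits */
}
```

Since only funct3 is left for function selection, a model like this is also a convenient place to enumerate the up to eight R4-type operations of a concrete CFU design.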
Re-purposing R4-type instructions as additional R3-type instructions
Advanced users can use the custom-1 opcode to implement additional R3-type instructions instead of the
predefined R4-type instructions.
|
3.7.2. Using Custom Instructions in Software
The custom instructions provided by the CFU can be used in plain C code by using intrinsics. Intrinsics behave like "normal" C functions but under the hood they are a set of macros that hide the complexity of inline assembly, which is used to construct the custom 32-bit instruction words. Using intrinsics removes the need to modify the compiler, built-in libraries or the assembler when using custom instructions. Each intrinsic will be compiled into a single 32-bit instruction word without any overhead providing maximum code efficiency.
The NEORV32 software framework provides two pre-defined prototypes for custom instructions, which are defined in sw/lib/include/neorv32_cpu_cfu.h:
uint32_t neorv32_cfu_r3_instr(funct7, funct3, rs1, rs2); // R3-type instructions
uint32_t neorv32_cfu_r4_instr(funct3, rs1, rs2, rs3); // R4-type instructions
The intrinsic functions always return a 32-bit value of type uint32_t (the processing result), which can be
discarded if not needed. Each intrinsic function requires several arguments depending on the instruction type/format:
- funct7 - 7-bit immediate (R3-type)
- funct3 - 3-bit immediate (R3-type, R4-type)
- rs1 - source operand 1, 32-bit (R3-type, R4-type)
- rs2 - source operand 2, 32-bit (R3-type, R4-type)
- rs3 - source operand 3, 32-bit (R4-type)
The funct3 and funct7 bit-fields are used to pass 3-bit or 7-bit literals to the CFU. The rs1, rs2 and rs3
arguments pass the actual data to the CFU via register addresses. These register arguments can be populated
with variables or literals; the compiler will add the required code to move the data into a register before
passing it to the CFU. The following example shows how to pass arguments:
uint32_t tmp = some_function();
...
uint32_t res = neorv32_cfu_r3_instr(0b0000000, 0b101, tmp, 123);
uint32_t foo = neorv32_cfu_r4_instr(0b011, tmp, res, (uint32_t)some_array[i]);
neorv32_cfu_r3_instr(0b0100100, 0b001, tmp, foo); // discard result
CFU Example Program
There is an example program for the CFU, which shows how to use the default CFU hardware module.
This example program is located in sw/example/demo_cfu.
|
3.7.3. CFU Control and Status Registers (CFU-CSRs)
The CPU provides up to four control and status registers (cfureg*) to be used within the CFU.
These CSRs are mapped to the "custom user-mode read/write" CSR address space, which is explicitly reserved for
platform-specific applications by the RISC-V spec. For example, these CSRs can be used to pass additional operands
to the CFU, to obtain additional results, to check processing status or to configure operation modes.
neorv32_cpu_csr_write(CSR_CFUREG0, 0xabcdabcd); // write data to CFU CSR 0
uint32_t tmp = neorv32_cpu_csr_read(CSR_CFUREG3); // read data from CFU CSR 3
Additional CFU-internal CSRs
If more than four CFU-internal CSRs are required the designer can implement an "indirect access mechanism" based
on just two of the default CSRs: one CSR is used to configure the index while the other is used as alias to exchange
data with the indexed CFU-internal CSR. This concept is similar to the RISC-V Indirect CSR Access Extension
Specification (Smcsrind).
|
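The indirect access mechanism described above can be sketched as a small software model of the CFU-side logic. Everything here is an assumption made for illustration: the bank size of 16, the choice of CSR 0 as index and CSR 1 as data alias, and all identifiers. The real behavior is whatever the user implements in the CFU hardware module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFU_NUM_INTERNAL 16u  /* number of CFU-internal registers (assumed) */

/* Software model of the CFU-side logic (illustrative only) */
typedef struct {
    uint32_t index;                   /* backs the "index" CSR  */
    uint32_t bank[CFU_NUM_INTERNAL];  /* CFU-internal registers */
} cfu_indirect_t;

enum { CFU_CSR_INDEX = 0, CFU_CSR_DATA = 1 };  /* roles of two default CSRs */

static void cfu_csr_write(cfu_indirect_t *cfu, int csr, uint32_t value)
{
    assert(cfu != NULL);
    if (csr == CFU_CSR_INDEX) {
        cfu->index = value % CFU_NUM_INTERNAL;  /* wrap out-of-range indices */
    } else if (csr == CFU_CSR_DATA) {
        cfu->bank[cfu->index] = value;          /* alias to selected register */
    }
}

static uint32_t cfu_csr_read(const cfu_indirect_t *cfu, int csr)
{
    assert(cfu != NULL);
    if (csr == CFU_CSR_INDEX) {
        return cfu->index;
    }
    if (csr == CFU_CSR_DATA) {
        return cfu->bank[cfu->index];
    }
    return 0u;  /* other CSRs read as zero in this model */
}
```

On the target, software would perform the same two-step sequence with neorv32_cpu_csr_write() and neorv32_cpu_csr_read(): first select the index, then access the data alias.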
Security Considerations
The CFU CSRs are mapped to the user-mode CSR space so software running at any privilege level can access these
CSRs.
|
3.7.4. Custom Instructions Hardware
The actual functionality of the CFU’s custom instructions is defined by the user-defined logic inside the CFU
hardware module (rtl/core/neorv32_cpu_cp_cfu.vhd). This file is highly commented to explain the interface and
to illustrate hardware design considerations.
CFU operations can be entirely combinatorial (like bit-reversal) so the result is available at the end of the current clock cycle. However, operations can also take several clock cycles to complete (like multiplications) and may also include internal states and memories.
CFU Hardware Resource Requirements
Enabling the CFU and actually implementing R4-type instructions (or more precisely, using the third register
source rs3) adds an additional read port to the core’s register file, increasing the resource requirements
of the register file by 50%.
|
CFU Execution Time
The CFU has to complete computation within a bound time window (default = 512 clock cycles). Otherwise,
the CFU operation is terminated by the CPU execution logic and an illegal instruction exception is raised. See section
CPU Arithmetic Logic Unit for more information.
|
CFU Exception
The CFU can intentionally raise an illegal instruction exception by never asserting its done signal, which causes an
execution timeout. For example, this can be used to signal invalid configurations/operations to the runtime
environment. See the documentation in the CFU’s VHDL source file for more information.
|
3.8. Control and Status Registers (CSRs)
The following table shows a summary of all available NEORV32 CSRs. The address field defines the CSR address for
the CSR access instructions. The "Name [ASM]" column provides the CSR name aliases that can be used in (inline) assembly.
The "Name [C]" column lists the name aliases that are defined by the NEORV32 core library. These can be used in plain C code.
The "Access" column shows the minimum privilege mode required for accessing the according CSR (M = machine-mode,
U = user-mode, D = debug-mode) and the read/write capabilities (RW = read-write, RO = read-only).
Unused, Reserved, Unimplemented and Disabled CSRs
All CSRs and CSR bits that are not listed in the table below are unimplemented and are hardwired to zero. Additionally,
CSRs that are unavailable ("disabled") because the according ISA extension is not enabled are also considered unimplemented
and are also hardwired to zero. Any access to such a CSR will raise an illegal instruction exception. All writable CSRs provide
WARL behavior (write any values; reads legal values). Application software should always read back a CSR after writing
to check if the targeted bits can actually be modified.
|
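The read-back recommendation above can be illustrated with a small host-side model of a WARL register. The struct, the writable-bit mask and the function names are assumptions made for this sketch; on the target, the same write-then-read pattern would be done with neorv32_cpu_csr_write() and neorv32_cpu_csr_read().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a WARL register: only bits set in `writable` can change */
typedef struct {
    uint32_t value;
    uint32_t writable;
} warl_csr_t;

static void warl_write(warl_csr_t *csr, uint32_t data)
{
    assert(csr != NULL);
    csr->value = (csr->value & ~csr->writable) | (data & csr->writable);
}

/* Write, read back, and report whether all requested bits took effect */
static int warl_write_checked(warl_csr_t *csr, uint32_t data)
{
    warl_write(csr, data);
    return csr->value == data;
}
```

A return value of 0 tells the caller that at least one targeted bit is not implemented or not writable, which is exactly the situation the read-back is meant to detect.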
Address | Name [ASM] | Name [C] | Access | Description |
---|---|---|---|---|
0x001 |
|
URW |
Floating-point accrued exceptions |
|
0x002 |
|
URW |
Floating-point dynamic rounding mode |
|
0x003 |
|
URW |
Floating-point control and status |
|
0x300 |
|
MRW |
Machine status register - low word |
|
0x301 |
|
MRW |
Machine CPU ISA and extensions |
|
0x304 |
|
MRW |
Machine interrupt enable register |
|
0x305 |
|
MRW |
Machine trap-handler base address for ALL traps |
|
0x306 |
|
MRW |
Machine counter-enable register |
|
0x310 |
|
MRW |
Machine status register - high word |
|
0x30a |
|
MRW |
Machine environment configuration register - low word |
|
0x31a |
|
MRW |
Machine environment configuration register - high word |
|
0x320 |
|
MRW |
Machine counter-inhibit register |
|
0x340 |
|
MRW |
Machine scratch register |
|
0x341 |
|
MRW |
Machine exception program counter |
|
0x342 |
|
MRW |
Machine trap cause |
|
0x343 |
|
MRW |
Machine trap value |
|
0x344 |
|
MRW |
Machine interrupt pending register |
|
0x34a |
|
MRW |
Machine trap instruction |
|
0x3a0 .. 0x3a3 |
|
MRW |
Physical memory protection configuration registers |
|
0x3b0 .. 0x3bf |
|
MRW |
Physical memory protection address registers |
|
0x7a0 |
|
MRW |
Trigger select register |
|
0x7a1 |
|
MRW |
Trigger data register 1 |
|
0x7a2 |
|
MRW |
Trigger data register 2 |
|
0x7a4 |
|
MRW |
Trigger information register |
|
0x7b0 |
- |
DRW |
Debug control and status register |
|
0x7b1 |
- |
DRW |
Debug program counter |
|
0x7b2 |
- |
DRW |
Debug scratch register 0 |
|
0x800 .. 0x803 |
|
URW |
Custom CFU registers 0 to 3 |
|
0xb00 |
|
MRW |
Machine cycle counter low word |
|
0xb02 |
|
MRW |
Machine instruction-retired counter low word |
|
0xb80 |
|
MRW |
Machine cycle counter high word |
|
0xb82 |
|
MRW |
Machine instruction-retired counter high word |
|
0xc00 |
|
URO |
Cycle counter low word |
|
0xc02 |
|
URO |
Instruction-retired counter low word |
|
0xc80 |
|
URO |
Cycle counter high word |
|
0xc82 |
|
URO |
Instruction-retired counter high word |
|
0x323 .. 0x32f |
|
MRW |
Machine performance-monitoring event select for counter 3..15 |
|
0xb03 .. 0xb0f |
|
MRW |
Machine performance-monitoring counter 3..15 low word |
|
0xb83 .. 0xb8f |
|
MRW |
Machine performance-monitoring counter 3..15 high word |
|
0xf11 |
|
MRO |
Machine vendor ID |
|
0xf12 |
|
MRO |
Machine architecture ID |
|
0xf13 |
|
MRO |
Machine implementation ID / version |
|
0xf14 |
|
MRO |
Machine hardware thread ID |
|
0xf15 |
|
MRO |
Machine configuration pointer register |
|
0xfc0 |
|
MRO |
NEORV32-specific "eXtended" machine CPU ISA and extensions |
3.8.1. Floating-Point CSRs
fflags
Name |
Floating-point accrued exceptions |
Address |
|
Reset value |
|
ISA |
|
Description |
FPU status flags. |
Bit | R/W | Function |
---|---|---|
0 |
r/w |
NX: inexact |
1 |
r/w |
UF: underflow |
2 |
r/w |
OF: overflow |
3 |
r/w |
DZ: division by zero |
4 |
r/w |
NV: invalid operation |
frm
Name |
Floating-point dynamic rounding mode |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
Bit | R/W | Function |
---|---|---|
2:0 |
r/w |
Rounding mode |
fcsr
Name |
Floating-point control and status register |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
Bit | R/W | Function |
---|---|---|
4:0 |
r/w |
Accrued exception flags ( |
7:5 |
r/w |
Rounding mode ( |
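The relation between the three floating-point CSRs can be sketched as follows: fcsr is the concatenation of the rounding mode (bits 7:5) and the accrued exception flags (bits 4:0). The identifiers below are illustrative and are not the names defined by the NEORV32 core library.

```c
#include <assert.h>
#include <stdint.h>

/* Accrued exception flags, bit positions as in the fflags table */
enum {
    FP_FLAG_NX = 1u << 0,  /* inexact           */
    FP_FLAG_UF = 1u << 1,  /* underflow         */
    FP_FLAG_OF = 1u << 2,  /* overflow          */
    FP_FLAG_DZ = 1u << 3,  /* division by zero  */
    FP_FLAG_NV = 1u << 4   /* invalid operation */
};

/* Combine a 5-bit fflags value and a 3-bit frm value into an fcsr value */
static uint32_t fcsr_pack(uint32_t fflags, uint32_t frm)
{
    assert(fflags < 32u && frm < 8u);
    return (frm << 5) | fflags;
}

/* Extract the accrued exception flags (bits 4:0) */
static uint32_t fcsr_get_fflags(uint32_t fcsr)
{
    return fcsr & 0x1Fu;
}

/* Extract the rounding mode (bits 7:5) */
static uint32_t fcsr_get_frm(uint32_t fcsr)
{
    return (fcsr >> 5) & 0x7u;
}
```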
3.8.2. Machine Trap Setup CSRs
mstatus
Name |
Machine status register - low word |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
Bit | Name [C] | R/W | Function |
---|---|---|---|
3 |
|
r/w |
MIE: Machine-mode interrupt enable flag |
7 |
|
r/w |
MPIE: Previous machine-mode interrupt enable flag state |
12:11 |
|
r/w |
MPP: Previous machine privilege mode, |
17 |
|
r/w |
MPRV: Effective privilege mode for load/stores; use |
21 |
|
r/w |
TW: Trap on execution of |
If the core is in user-mode, machine-mode interrupts are globally enabled even if mstatus.mie is cleared:
"Interrupts for higher-privilege modes, y>x, are always globally enabled regardless of the setting of the global yIE
bit for the higher-privilege mode." - RISC-V ISA Spec.
|
misa
Name |
ISA and extensions |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
The NEORV32 misa CSR is read-only. Hence, active CPU extensions are entirely defined by pre-synthesis configurations
and cannot be switched on/off during runtime. For compatibility reasons any write access to this CSR is simply ignored and
will not cause an illegal instruction exception.
|
Bit | Name [C] | R/W | Function |
---|---|---|---|
1 |
|
r/- |
B: CPU extension (bit-manipulation) available, set when |
2 |
|
r/- |
C: CPU extension (compressed instruction) available, set when |
4 |
|
r/- |
E: CPU extension (embedded) available, set when |
8 |
|
r/- |
I: CPU base ISA, cleared when |
12 |
|
r/- |
M: CPU extension (mul/div) available, set when |
20 |
|
r/- |
U: CPU extension (user mode) available, set when |
23 |
|
r/- |
X: bit is always set to indicate non-standard / NEORV32-specific extensions |
31:30 |
|
r/- |
MXL: 32-bit architecture indicator (always |
Machine-mode software can discover available Z* sub-extensions (like Zicsr or Zfinx) by checking the NEORV32-specific
mxisa CSR.
|
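The misa bit positions in the table above follow the usual RISC-V rule that extension letter 'A' maps to bit 0, 'B' to bit 1 and so on. A minimal sketch of a decoder, using hypothetical helper names and operating on a value that software has already read from the CSR:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the single-letter extension `ext` ('A'..'Z') is flagged in misa */
static int misa_has(uint32_t misa, char ext)
{
    assert(ext >= 'A' && ext <= 'Z');
    return (int)((misa >> (ext - 'A')) & 1u);
}

/* MXL field (bits 31:30); the RISC-V spec encodes a 32-bit architecture as 1 */
static uint32_t misa_mxl(uint32_t misa)
{
    return misa >> 30;
}
```

For example, `misa_has(misa, 'M')` tests bit 12 and `misa_has(misa, 'X')` tests bit 23, matching the table rows for the M and X flags.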
mie
Name |
Machine interrupt-enable register |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
Bit | Name [C] | R/W | Function |
---|---|---|---|
3 |
|
r/w |
MSIE: Machine software interrupt enable |
7 |
|
r/w |
MTIE: Machine timer interrupt enable (from Machine System Timer (MTIME)) |
11 |
|
r/w |
MEIE: Machine external interrupt enable |
31:16 |
|
r/w |
Fast interrupt channel 15..0 enable |
mtvec
Name |
Machine trap-handler base address |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
Bit | R/W | Function |
---|---|---|
1:0 |
r/w |
MODE: mode configuration, |
31:2 |
r/w |
BASE: in DIRECT mode = 4-byte-aligned base address of trap base handler, all traps jump to |
Interrupt Latency
The vectored mtvec mode is useful for reducing the time between interrupt request (IRQ) and servicing it (ISR).
As software does not need to determine the interrupt cause the reduction in latency can be 5 to 10 times and as low as 26 cycles.
|
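The difference between direct and vectored mode can be made concrete with a small model of the trap entry address. The sketch follows the generic rule of the RISC-V privileged specification; any NEORV32-specific alignment requirement on BASE is not modeled, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MCAUSE_INTERRUPT (1u << 31)  /* mcause bit 31: trap is an interrupt */

/* Trap entry address according to the RISC-V privileged spec:
 * MODE = 0 (direct):   all traps enter at BASE
 * MODE = 1 (vectored): interrupts enter at BASE + 4 * cause, exceptions at BASE */
static uint32_t trap_entry(uint32_t mtvec, uint32_t mcause)
{
    uint32_t mode = mtvec & 0x3u;
    uint32_t base = mtvec & ~0x3u;
    assert(mode <= 1u);  /* modes 2 and 3 are reserved */
    if (mode == 1u && (mcause & MCAUSE_INTERRUPT)) {
        return base + 4u * (mcause & 0x1Fu);  /* exception code = bits 4:0 */
    }
    return base;
}
```

In vectored mode every interrupt lands on its own 4-byte slot, which typically holds a jump to the dedicated handler. This is why the handler does not have to inspect mcause first.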
mcounteren
Name |
Machine counter enable |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
Bit | Name [C] | R/W | Function |
---|---|---|---|
0 |
|
r/w |
CY: User-mode is allowed to read |
1 |
- |
r/- |
TM: not implemented, hardwired to zero |
2 |
|
r/w |
IR: User-mode is allowed to read |
31:3 |
- |
mstatush
Name |
Machine status register - high word |
Address |
|
Reset value |
|
ISA |
|
Description |
The features of this CSR are not implemented yet. The register is read-only and always returns zero. |
3.8.3. Machine Trap Handling CSRs
mscratch
Name |
Scratch register for machine trap handlers |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
mepc
Name |
Machine exception program counter |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
mepc[0] is hardwired to zero. If IALIGN = 32 (i.e. C ISA Extension is disabled) then mepc[1] is also hardwired to zero.
|
mcause
Name |
Machine trap cause |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
Bit | R/W | Function |
---|---|---|
4:0 |
r/w |
Exception code: see NEORV32 Trap Listing |
31 |
r/w |
Interrupt: |
mtval
Name |
Machine trap value |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
Read-Only
Note that the NEORV32 mtval CSR is updated by the hardware only and cannot be written from software.
However, any write-access will be ignored and will not cause an exception to maintain RISC-V compatibility.
|
mip
Name |
Machine interrupt pending |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
Bit | Name [C] | R/W | Function |
---|---|---|---|
3 |
|
r/- |
MSIP: Machine software interrupt pending; cleared by platform-defined mechanism |
7 |
|
r/- |
MTIP: Machine timer interrupt pending; cleared by platform-defined mechanism |
11 |
|
r/- |
MEIP: Machine external interrupt pending; cleared by platform-defined mechanism |
31:16 |
|
r/- |
FIRQxP: Fast interrupt channel 15..0 pending; cleared by platform-defined mechanism |
FIRQ Channel Mapping
See section NEORV32-Specific Fast Interrupt Requests for the mapping of the FIRQ channels and the according
interrupt-triggering processor module.
|
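The mie and mip CSRs share the same bit layout, so one set of masks serves both. The following sketch uses made-up macro and function names and only relies on the bit positions listed in the two tables (MSI = 3, MTI = 7, MEI = 11, FIRQ channel n = 16 + n).

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions shared by the mie and mip CSRs */
#define IRQ_BIT_MSI       3u
#define IRQ_BIT_MTI       7u
#define IRQ_BIT_MEI       11u
#define IRQ_BIT_FIRQ_BASE 16u  /* FIRQ channel n maps to bit 16 + n */

/* Bit mask of a fast interrupt channel (0..15) */
static uint32_t firq_mask(unsigned channel)
{
    assert(channel < 16u);
    return 1u << (IRQ_BIT_FIRQ_BASE + channel);
}

/* Interrupts that are both pending (mip) and enabled (mie) */
static uint32_t irq_active(uint32_t mip, uint32_t mie)
{
    return mip & mie;
}
```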
mtinst
Name |
Machine trap instruction |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
Read-Only
Note that the NEORV32 mtinst CSR is updated by the hardware only and cannot be written from software.
However, any write-access will be ignored and will not cause an exception to maintain RISC-V compatibility.
|
Instruction Transformation
The RISC-V priv. spec. suggests that the instruction word written to mtinst by the hardware should be "transformed".
However, the NEORV32 mtinst CSR uses a simplified transformation scheme: if the trap-causing instruction is a
standard 32-bit instruction, mtinst contains the exact instruction word that caused the trap. If the trap-causing
instruction is a compressed instruction, mtinst contains the de-compressed 32-bit equivalent with bit 1 being cleared
while all remaining bits represent the pre-decoded 32-bit instruction equivalent.
|
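Based on the simplified transformation scheme described above, a trap handler could tell from mtinst whether the trapping instruction was compressed. This is a sketch under that assumption; the function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if mtinst holds the de-compressed form of a 16-bit instruction.
 * Regular 32-bit RISC-V instructions have bits 1:0 = 0b11; per the note above,
 * bit 1 is cleared when the trapping instruction was compressed. */
static int mtinst_from_compressed(uint32_t mtinst)
{
    return (mtinst & 0x2u) == 0u;
}

/* Size in bytes of the original instruction in memory */
static uint32_t mtinst_instr_size(uint32_t mtinst)
{
    return mtinst_from_compressed(mtinst) ? 2u : 4u;
}
```

A handler that wants to resume after the trapping instruction could add this size to the saved mepc value.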
3.8.4. Machine Configuration CSRs
menvcfg
Name |
Machine environment configuration register - low word |
Address |
|
Reset value |
|
ISA |
|
Description |
Currently, the features of this CSR are not supported. Hence, the entire register is hardwired to all-zero. |
menvcfgh
Name |
Machine environment configuration register - high word |
Address |
|
Reset value |
|
ISA |
|
Description |
Currently, the features of this CSR are not supported. Hence, the entire register is hardwired to all-zero. |
3.8.5. Machine Physical Memory Protection CSRs
The physical memory protection system is configured via the PMP_NUM_REGIONS and PMP_MIN_GRANULARITY top entity
generics. PMP_NUM_REGIONS defines the total number of implemented regions. Note that the maximum number of regions
is constrained to 16. If trying to access a PMP-related CSR beyond PMP_NUM_REGIONS, no illegal instruction exception
is triggered. The according CSRs are read-only (writes are ignored) and always return zero.
See section Smpmp ISA Extension for more information.
pmpcfg
Name |
PMP region configuration registers |
Address |
|
|
|
|
|
|
|
Reset value |
|
ISA |
|
Description |
Configuration of physical memory protection regions. Each region provides an individual 8-bit array in these CSRs. |
Bit | Name [C] | R/W | Function |
---|---|---|---|
0 |
|
r/w |
R: Read permission |
1 |
|
r/w |
W: Write permission |
2 |
|
r/w |
X: Execute permission |
4:3 |
|
r/w |
A: Mode configuration ( |
7 |
|
r/w |
L: Lock bit, prevents further write accesses, also enforces access rights in machine-mode, can only be cleared by CPU reset |
Implemented Modes
In order to reduce the CPU size certain PMP modes (A bits) can be excluded from synthesis.
Use the PMP_TOR_MODE_EN and PMP_NAP_MODE_EN Processor Top Entity - Generics to control
implementation of the according modes.
|
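To illustrate how the 8-bit region entries are packed into the pmpcfg CSRs, the sketch below builds one entry and inserts it into a 32-bit pmpcfg value. The A-field encodings (OFF, TOR, NA4, NAPOT) are taken from the RISC-V privileged specification; all identifiers are illustrative and not the names used by the NEORV32 core library.

```c
#include <assert.h>
#include <stdint.h>

/* Permission and lock bits of one 8-bit pmpcfg entry */
#define PMP_R (1u << 0)
#define PMP_W (1u << 1)
#define PMP_X (1u << 2)
#define PMP_L (1u << 7)

/* A field (bits 4:3), encodings as defined by the RISC-V privileged spec */
enum pmp_mode { PMP_OFF = 0, PMP_TOR = 1, PMP_NA4 = 2, PMP_NAPOT = 3 };

/* Build one 8-bit configuration entry from permission bits and mode */
static uint8_t pmp_entry(uint32_t perm, enum pmp_mode mode)
{
    assert((perm & ~(PMP_R | PMP_W | PMP_X | PMP_L)) == 0u);
    return (uint8_t)(perm | ((uint32_t)mode << 3));
}

/* Region n lives in byte (n % 4) of CSR pmpcfg(n / 4) on a 32-bit core */
static uint32_t pmpcfg_insert(uint32_t pmpcfg, unsigned region, uint8_t entry)
{
    unsigned shift = 8u * (region % 4u);
    return (pmpcfg & ~(0xFFu << shift)) | ((uint32_t)entry << shift);
}
```

Because of the lock bit and the WARL behavior of these CSRs, software should read the CSR back after writing the updated value.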
pmpaddr
The pmpaddr* CSRs are used to configure the region’s address boundaries.
Name |
Physical memory protection address registers |
Address |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reset value |
|
ISA |
|
Description |
Region address configuration. The two MSBs of each CSR are hardwired to zero (= bits 33:32 of the physical address). |
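As an example of what is written to a pmpaddr CSR, the following sketch encodes a naturally aligned power-of-two (NAPOT) region according to the RISC-V privileged specification. It assumes a 32-bit base address and does not model the PMP_MIN_GRANULARITY constraint; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a NAPOT region into a pmpaddr value: the CSR holds address bits 33:2
 * and the number of trailing one-bits selects the region size (minimum 8 bytes). */
static uint32_t pmpaddr_napot(uint32_t base, uint32_t size)
{
    assert(size >= 8u && (size & (size - 1u)) == 0u);  /* power of two */
    assert((base & (size - 1u)) == 0u);                /* naturally aligned */
    return (base >> 2) | ((size >> 3) - 1u);
}
```

For a 4 KiB region the low nine bits of the result are set, which encodes a size of 2^(9+3) bytes.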
3.8.6. Custom Functions Unit (CFU) CSRs
cfureg
Name |
Custom (user-defined) CFU CSRs |
Address |
|
|
|
|
|
|
|
Reset value |
|
ISA |
|
Description |
User-defined CSRs to be used within the Custom Functions Unit (CFU). |
3.8.7. (Machine) Counter and Timer CSRs
time[h] CSRs (Wall Clock Time)
The time[h] registers are not implemented. Any access to these registers will trap.
It is recommended that the trap handler software provides a means of accessing the platform-defined Machine System Timer (MTIME).
|
Instruction Retired Counter Increment
The [m]instret[h] counter always increments when an instruction enters the pipeline’s execute stage no matter
if this instruction is actually going to retire or if it causes an exception.
|
cycle[h]
Name |
Cycle counter |
Address |
|
|
|
Reset value |
|
ISA |
|
Description |
The |
instret[h]
Name |
Instructions-retired counter |
Address |
|
|
|
Reset value |
|
ISA |
|
Description |
The |
mcycle[h]
Name |
Machine cycle counter |
Address |
|
|
|
Reset value |
|
ISA |
|
Description |
If not halted via the |
minstret[h]
Name |
Machine instructions-retired counter |
Address |
|
|
|
Reset value |
|
ISA |
|
Description |
If not halted via the |
Instruction Retiring
Note that all executed instructions increment the [m]instret[h] counters even if they do not retire
(e.g. if the instruction causes an exception).
|
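Since each 64-bit counter is split into a low-word and a high-word CSR, software has to guard against a low-word overflow between the two reads. The usual RISC-V idiom is to read high, low, and high again, and to retry if the high word changed. The sketch below demonstrates this on the host with a fake counter; the callback type, the fake counter and all names are assumptions made for illustration.

```c
#include <stdint.h>

/* Callback that returns the low (high == 0) or high (high != 0) counter word */
typedef uint32_t (*counter_read_fn)(void *ctx, int high);

/* Read a 64-bit counter from two 32-bit halves; retry if the high word
 * changed in between, i.e. the low word overflowed during the read */
static uint64_t counter_read64(counter_read_fn read_word, void *ctx)
{
    uint32_t hi, lo, hi_check;
    do {
        hi       = read_word(ctx, 1);
        lo       = read_word(ctx, 0);
        hi_check = read_word(ctx, 1);
    } while (hi != hi_check);
    return ((uint64_t)hi << 32) | lo;
}

/* Host-side stand-in for a free-running counter: advances on every access */
typedef struct {
    uint64_t value;
    uint64_t step;
} fake_counter_t;

static uint32_t fake_counter_read(void *ctx, int high)
{
    fake_counter_t *cnt = (fake_counter_t *)ctx;
    uint32_t word = high ? (uint32_t)(cnt->value >> 32) : (uint32_t)cnt->value;
    cnt->value += cnt->step;
    return word;
}
```

On the target, the callback would wrap neorv32_cpu_csr_read() calls for the low-word and high-word CSR of the counter in question.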
3.8.8. Hardware Performance Monitors (HPM) CSRs
Machine-Mode HPMs Only
Note that only the machine-mode hardware performance counter CSRs are available (mhpmcounter*[h]).
Accessing any user-mode HPM CSR (hpmcounter*[h]) will raise an illegal instruction exception.
|
The actual number of implemented hardware performance monitors is configured via the HPM_NUM_CNTS top entity generic.
Note that all 13 HPM counter and configuration registers (mhpmcounter*[h]) are always implemented, but
only the actually configured ones are implemented as "real" physical registers - the remaining ones will be hardwired to zero.
If trying to access an HPM-related CSR beyond HPM_NUM_CNTS no illegal instruction exception is
triggered. These CSRs are read-only, writes are ignored and reads always return zero.
The total counter width of the HPMs can be configured before synthesis via the HPM_CNT_WIDTH generic (0..64-bit).
If HPM_CNT_WIDTH is less than 64, all remaining MSB-aligned bits are hardwired to zero.
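The effect of a reduced counter width can be sketched as a bit mask over the 64-bit counter value: only the lower HPM_CNT_WIDTH bits are implemented, everything above reads as zero. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of the implemented (LSB-aligned) counter bits for a given width */
static uint64_t hpm_width_mask(unsigned width)
{
    assert(width <= 64u);
    if (width == 0u) {
        return 0u;
    }
    if (width == 64u) {
        return UINT64_MAX;
    }
    return (UINT64_C(1) << width) - 1u;
}
```

With a width of 40, for instance, the low-word CSR is fully implemented while only the lowest 8 bits of the high-word CSR can be non-zero, so such a counter wraps after 2^40 events.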
mhpmevent
Name |
Machine hardware performance monitor event select |
Address |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reset value |
|
ISA |
|
Description |
The values in these CSRs define the architectural events that cause an increment of the according |
Bit | Name [C] | R/W | Event Description |
---|---|---|---|
RISC-V-compatible |
|||
0 |
|
r/w |
active clock cycle (CPU not in Sleep Mode) |
1 |
|
r/- |
not implemented, hardwired to zero |
2 |
|
r/w |
any executed instruction (16-bit/compressed or 32-bit/uncompressed) |
NEORV32-specific |
|||
3 |
|
r/w |
any executed 16-bit/compressed ( |
4 |
|
r/w |
instruction dispatch wait cycle (wait for instruction prefetch-buffer refill (CPU Control Unit IPB); caused by a fence instruction, a control flow transfer or an instruction fetch bus wait cycle) |
5 |
|
r/w |
any delay/wait cycle caused by a multi-cycle CPU Arithmetic Logic Unit operation |
6 |
|
r/w |
any executed branch instruction (unconditional, conditional-taken or conditional-not-taken) |
7 |
|
r/w |
any control transfer operation (unconditional jump, taken conditional branch or trap entry/exit) |
8 |
|
r/w |
any executed load operation (including atomic memory operations, |
9 |
|
r/w |
any executed store operation (including atomic memory operations, |
10 |
|
r/w |
any memory/bus/cache/etc. delay/wait cycle while executing any load or store operation (caused by a data bus wait cycle) |
11 |
|
r/w |
starting processing of any trap (Traps, Exceptions and Interrupts) |
Instruction Retiring ("Retired == Executed")
The CPU HPM/counter logic treats all executed instructions as "retired" even if they raise an exception,
cause an interrupt, trigger a privilege mode change or were not meant to retire (i.e. claimed by the RISC-V spec.).
|
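An event-select value for an mhpmevent CSR is simply a bit mask built from the bit positions in the table above. The sketch below assumes that a counter with several events selected increments when any of them occurs; the enum identifiers are illustrative and are not the names used by the NEORV32 core library.

```c
#include <stdint.h>

/* Event bit positions from the mhpmevent table (names are hypothetical) */
enum hpm_event {
    EV_CYCLE = 0, EV_INSTR = 2, EV_COMPR_INSTR = 3, EV_DISPATCH_WAIT = 4,
    EV_ALU_WAIT = 5, EV_BRANCH = 6, EV_CTRL_TRANSFER = 7, EV_LOAD = 8,
    EV_STORE = 9, EV_MEM_WAIT = 10, EV_TRAP = 11
};

/* Build an event-select value that covers all listed events */
static uint32_t hpm_event_select(const enum hpm_event *events, unsigned count)
{
    uint32_t sel = 0u;
    for (unsigned i = 0u; i < count; i++) {
        sel |= 1u << (unsigned)events[i];
    }
    return sel;
}
```

Selecting EV_LOAD and EV_STORE together, for example, would let a single counter track all executed memory access instructions.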
mhpmcounter[h]
Name |
Machine hardware performance monitor (HPM) counter |
Address |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reset value |
|
ISA |
|
Description |
If not halted via the |
3.8.9. Machine Counter Setup CSRs
mcountinhibit
Name |
Machine counter-inhibit register |
Address |
|
Reset value |
|
ISA |
|
Description |
Set bit to halt the according counter CSR. |
Bit | Name [C] | R/W | Description |
---|---|---|---|
0 |
|
r/w |
CY: Set to |
1 |
- |
r/- |
TM: Hardwired to zero as |
2 |
|
r/w |
IR: Set to |
15:3 |
|
r/w |
HPMx: Set to |
3.8.10. Machine Information CSRs
mvendorid
Name |
Machine vendor ID |
Address |
|
Reset value |
|
ISA |
|
Description |
Vendor ID (JEDEC identifier, lowest 11 bits), assigned via the |
marchid
Name |
Machine architecture ID |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
mimpid
Name |
Machine implementation ID |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
mhartid
Name |
Machine hardware thread ID |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
mconfigptr
Name |
Machine configuration pointer register |
Address |
|
Reset value |
|
ISA |
|
Description |
The features of this CSR are not implemented yet. The register is read-only and always returns zero. |
3.8.11. NEORV32-Specific CSRs
All NEORV32-specific CSRs are mapped to addresses that are explicitly reserved for custom Machine-Mode, read-only CSRs (assured by the RISC-V privileged specifications). Hence, these CSRs can only be accessed when in machine-mode. Any access outside of machine-mode will raise an illegal instruction exception. |
mxisa
Name |
Machine extended ISA and extensions register |
Address |
|
Reset value |
|
ISA |
|
Description |
The |
Bit | Name [C] | R/W | Description |
---|---|---|---|
0 |
|
r/- |
|
1 |
|
r/- |
|
2 |
|
r/- |
|
3 |
|
r/- |
|
4 |
|
r/- |
|
5 |
|
r/- |
|
6 |
|
r/- |
|
7 |
|
r/- |
|
8 |
|
r/- |
|
9 |
|
r/- |
|
10 |
|
r/- |
|
11 |
|
r/- |
|
12 |
|
r/- |
|
13 |
|
r/- |
|
14 |
|
r/- |
|
15 |
|
r/- |
|
16 |
|
r/- |
|
17 |
|
r/- |
|
18 |
|
r/- |
|
19 |
|
r/- |
|
20 |
|
r/- |
|
21 |
|
r/- |
|
22 |
|
r/- |
|
23 |
|
r/- |
|
24 |
|
r/- |
|
25 |
|
r/- |
|
27:26 |
- |
r/- |
reserved, hardwired to zero |
28 |
|
r/- |
full hardware reset of register file available when set ( |
29 |
|
r/- |
fast multiplication available when set ( |
30 |
|
r/- |
fast shifts available when set ( |
31 |
|
r/- |
set if CPU is being simulated (⚠️ not guaranteed) |
3.9. Traps, Exceptions and Interrupts
In this document the following terminology is used (derived from the RISC-V trace specification available at https://github.com/riscv-non-isa/riscv-trace-spec):
- exception: an unusual condition occurring at run time associated (i.e. synchronous) with an instruction in a RISC-V hart
- interrupt: an external asynchronous event that may cause a RISC-V hart to experience an unexpected transfer of control
- trap: the transfer of control to a trap handler caused by either an exception or an interrupt
Whenever an exception or interrupt is triggered, the CPU switches to machine-mode (if not already in machine-mode)
and continues operation at the address being stored in the mtvec CSR. The cause of the trap can be determined via the
mcause CSR. A list of all implemented mcause values and the according description can be found below in section
NEORV32 Trap Listing. The address that reflects the current program counter when a trap was taken is stored to the
mepc CSR. Additional information regarding the cause of the trap can be retrieved from the mtval and mtinst CSRs.
The traps are prioritized. If several exceptions occur at once only the one with highest priority is triggered while all remaining exceptions are ignored and discarded. If several interrupts trigger at once, the one with highest priority is serviced first while the remaining ones stay pending. After completing the interrupt handler the interrupt with the second highest priority will get serviced and so on until no further interrupts are pending.
Interrupts when in User-Mode
If the core is currently operating in less privileged user-mode, interrupts are globally enabled
even if mstatus.mie is cleared.
|
Interrupt Signal Requirements - Standard RISC-V Interrupts
All interrupt request signals are high-active. Once triggered, an interrupt request line should stay high
until it is explicitly acknowledged by a source-specific mechanism (for example by writing to a specific memory-mapped register).
|
Instruction Atomicity and Forward-Progress
All instructions execute as atomic operations - interrupts can only trigger between consecutive instructions.
Additionally, if there is a permanent interrupt request, exactly one instruction from the interrupted program will be executed before
another interrupt handler can start. This allows program progress even if there are permanent interrupt requests.
|
3.9.1. Memory Access Exceptions
If a load operation causes any exception, the instruction’s destination register is not written at all. Furthermore, exceptions caused by a misaligned memory address or a physical memory protection fault do not trigger a memory access request at all.
For 32-bit-only instructions (= no C extension) the misaligned instruction exception is raised if bit 1 of the fetch
address is set (i.e. not on a 32-bit boundary). If the C extension is implemented there will never be a misaligned
instruction exception at all.
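The alignment rules above can be expressed as two small predicates. This is a host-side sketch with hypothetical function names; it only mirrors the conditions stated in the text and in the trap listing (natural alignment for data accesses, bit 1 of the fetch address for instruction fetches without the C extension).

```c
#include <assert.h>
#include <stdint.h>

/* Data access: misaligned if the address is not a multiple of the access size */
static int data_misaligned(uint32_t addr, uint32_t size)
{
    assert(size == 1u || size == 2u || size == 4u);
    return (addr & (size - 1u)) != 0u;
}

/* Instruction fetch: without the C extension, bit 1 of the fetch address must
 * be clear; with the C extension the exception never occurs */
static int fetch_misaligned(uint32_t addr, int c_ext_enabled)
{
    return !c_ext_enabled && ((addr & 0x2u) != 0u);
}
```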
3.9.2. Custom Fast Interrupt Request Lines
As a custom extension, the NEORV32 CPU features 16 fast interrupt request (FIRQ) lines via the firq_i CPU top
entity signals. These interrupts have custom configuration and status flags in the mie and mip CSRs and also
provide custom trap codes in mcause. These FIRQs are reserved for NEORV32 processor-internal usage only.
3.9.3. NEORV32 Trap Listing
The following tables show all traps that are currently supported by the NEORV32 CPU. It also shows the prioritization and the CSR side-effects.
Table Annotations
The "Prio." column shows the priority of each trap with the highest priority being 1. The "RTE Trap ID" aliases are
defined by the NEORV32 core library (the runtime environment RTE) and can be used in plain C code when interacting
with the pre-defined RTE function. The mcause
, mepc
, mtval
and mtinst
columns show the value being
written to the according CSRs when a trap is triggered:
- I-PC - address of intercepted instruction (instruction has not been executed yet)
- PC - address of instruction that caused the trap (instruction has been executed)
- ADR - bad data memory access address that caused the trap
- INS - the transformed/decompressed instruction word that caused the trap
- 0 - zero
Prio. | mcause |
RTE Trap ID | Cause | mepc |
mtval |
mtinst |
---|---|---|---|---|---|---|
Exceptions (synchronous to instruction execution) |
||||||
1 |
|
|
instruction access fault |
I-PC |
0 |
INS |
2 |
|
|
illegal instruction |
PC |
0 |
INS |
3 |
|
|
instruction address misaligned |
PC |
0 |
INS |
4 |
|
|
environment call from M-mode |
PC |
0 |
INS |
5 |
|
|
environment call from U-mode |
PC |
0 |
INS |
6 |
|
|
software breakpoint / trigger firing |
PC |
0 |
INS |
7 |
|
|
store address misaligned |
PC |
ADR |
INS |
8 |
|
|
load address misaligned |
PC |
ADR |
INS |
9 |
|
|
store access fault |
PC |
ADR |
INS |
10 |
|
|
load access fault |
PC |
ADR |
INS |
Interrupts (asynchronous to instruction execution) |
||||||
11 |
|
|
fast interrupt request channel 0 |
I-PC |
0 |
0 |
12 |
|
|
fast interrupt request channel 1 |
I-PC |
0 |
0 |
13 |
|
|
fast interrupt request channel 2 |
I-PC |
0 |
0 |
14 |
|
|
fast interrupt request channel 3 |
I-PC |
0 |
0 |
15 |
|
|
fast interrupt request channel 4 |
I-PC |
0 |
0 |
16 |
|
|
fast interrupt request channel 5 |
I-PC |
0 |
0 |
17 |
|
|
fast interrupt request channel 6 |
I-PC |
0 |
0 |
18 |
|
|
fast interrupt request channel 7 |
I-PC |
0 |
0 |
19 |
|
|
fast interrupt request channel 8 |
I-PC |
0 |
0 |
20 |
|
|
fast interrupt request channel 9 |
I-PC |
0 |
0 |
21 |
|
|
fast interrupt request channel 10 |
I-PC |
0 |
0 |
22 |
|
|
fast interrupt request channel 11 |
I-PC |
0 |
0 |
23 |
|
|
fast interrupt request channel 12 |
I-PC |
0 |
0 |
24 |
|
|
fast interrupt request channel 13 |
I-PC |
0 |
0 |
25 |
|
|
fast interrupt request channel 14 |
I-PC |
0 |
0 |
26 |
|
|
fast interrupt request channel 15 |
I-PC |
0 |
0 |
27 |
|
|
machine external interrupt (MEI) |
I-PC |
0 |
0 |
28 |
|
|
machine software interrupt (MSI) |
I-PC |
0 |
0 |
29 |
|
|
machine timer interrupt (MTI) |
I-PC |
0 |
0 |
Trap ID [C] | Triggered when … |
---|---|
|
bus timeout, bus access error or PMP rule violation during instruction fetch |
|
trying to execute an invalid instruction word (malformed or not supported) or on a privilege violation |
|
fetching a 32-bit instruction word that is not 32-bit-aligned (see note below) |
|
executing |
|
executing |
|
executing |
|
storing data to an address that is not naturally aligned to the data size (half/word) |
|
loading data from an address that is not naturally aligned to the data size (half/word) |
|
bus timeout, bus access error or PMP rule violation during load data operation |
|
bus timeout, bus access error or PMP rule violation during store data operation |
|
caused by interrupt-condition of processor-internal modules, see NEORV32-Specific Fast Interrupt Requests |
|
machine external interrupt (via dedicated Processor Top Entity - Signals) |
|
machine software interrupt (via dedicated Processor Top Entity - Signals) |
|
machine timer interrupt (internal Machine System Timer (MTIME) or via dedicated Processor Top Entity - Signals) |
Resumable Exceptions
Note that not all exceptions are resumable. For example, the "instruction access fault" exception or the "instruction
address misaligned" exception are not resumable in most cases. These exceptions might indicate a fatal memory hardware failure.
|
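The interrupt part of the priority scheme in the trap listing can be modeled as a simple arbiter over the pending and enabled interrupt bits. The sketch below assumes the mip/mie bit positions (FIRQ channel n at bit 16 + n, MEI at bit 11, MSI at bit 3, MTI at bit 7) and the order given in the "Prio." column; it does not model the global enable via mstatus or the privilege mode. The function name is hypothetical.

```c
#include <stdint.h>

/* Returns the mip bit index of the highest-priority pending and enabled
 * interrupt, or -1 if there is none. Order per the trap listing:
 * FIRQ 0..15 first, then MEI, MSI and MTI. */
static int next_interrupt(uint32_t mip, uint32_t mie)
{
    static const int order_tail[] = { 11, 3, 7 };  /* MEI, MSI, MTI */
    uint32_t active = mip & mie;

    for (int ch = 0; ch < 16; ch++) {
        if (active & (1u << (16 + ch))) {
            return 16 + ch;
        }
    }
    for (unsigned i = 0u; i < sizeof(order_tail) / sizeof(order_tail[0]); i++) {
        if (active & (1u << order_tail[i])) {
            return order_tail[i];
        }
    }
    return -1;
}
```

Calling the function repeatedly while clearing the serviced bit reproduces the behavior described earlier: the highest-priority interrupt is serviced first and the remaining ones stay pending.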
4. Software Framework
The NEORV32 project comes with a complete software ecosystem called the "software framework" which is based on the C-language RISC-V GCC port and consists of the following parts:
Software Documentation
All core libraries and example programs are documented "in-code" using Doxygen.
The documentation is automatically built and deployed to GitHub pages and is available online
at https://stnolting.github.io/neorv32/sw/files.html.
|
Example Programs
A collection of annotated example programs illustrating how to use certain CPU functions
and peripheral/IO modules can be found in sw/example .
|
4.1. Compiler Toolchain
The toolchain for this project is based on the free and open RISC-V GCC-port. You can find the compiler sources and build instructions in the official RISC-V GNU toolchain GitHub repository: https://github.com/riscv/riscv-gnu-toolchain.
Toolchain Installation
More information regarding the toolchain (building from scratch or downloading prebuilt ones) can be found in the
user guide section Software Toolchain Setup.
|
4.2. Core Libraries
The NEORV32 project provides a set of pre-defined C libraries that allow an easy integration of the processor/CPU features
(also called "HAL" - hardware abstraction layer). All driver and runtime-related files are located in
sw/lib. These library files are automatically included and linked by adding the following include statement:
#include <neorv32.h> // NEORV32 HAL, core and runtime libraries
The NEORV32 HAL consists of the following files.
C source file | C header file | Description |
---|---|---|
- |
|
Main NEORV32 library file |
|
|
General auxiliary/helper function |
|
|
|
|
|
|
|
|
|
|
|
Emulation functions for the read-modify-write |
|
Control and Status Registers (CSRs) definitions |
|
|
|
|
|
|
|
|
|
|
|
|
|
- |
|
Macros for intrinsics and custom instructions |
- |
|
Legacy compatibility layer / wrappers (do not use for new designs) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Primary Universal Asynchronous Receiver and Transmitter (UART0) and UART1 HAL |
|
|
|
|
|
|
|
|
|
|
- |
Platform-specific system calls for newlib |
Core Library Documentation
The Doxygen-based documentation of the software framework including all core libraries is available online at
https://stnolting.github.io/neorv32/sw/files.html.
|
4.3. System View Description File (SVD)
A CMSIS-SVD-compatible System View Description (SVD) file including all peripherals is available in sw/svd
.
4.4. Application Makefile
Application compilation is based on a centralized GNU makefile (sw/common/common.mk
). Each software project
(for example the ones in sw/example
folder) should provide a local makefile that just includes the central makefile:
# Set path to NEORV32 root directory
NEORV32_HOME ?= ../../..
# Include the main NEORV32 makefile
include $(NEORV32_HOME)/sw/common/common.mk
Thus, the functionality of the central makefile (including all targets) becomes available for the project. The project-local makefile should be used to define all setup-relevant configuration options instead of changing the central makefile to keep the code base clean. Setting variables in the project-local makefile will override the default configuration. Most example projects already provide a makefile that lists all relevant configuration options.
The following example shows all relevant configuration variables:
# Override the default CPU ISA
MARCH = rv32imc_zicsr_zifencei
# Override the default RISC-V GCC prefix
RISCV_PREFIX ?= riscv-none-elf-
# Override default optimization goal
EFFORT = -Os
# Add extended debug symbols
USER_FLAGS += -ggdb -gdwarf-3
# Additional sources
APP_SRC += $(wildcard ./*.c)
APP_INC += -I .
# Adjust processor IMEM size
USER_FLAGS += -Wl,--defsym,__neorv32_rom_size=16k
# Adjust processor DMEM size
USER_FLAGS += -Wl,--defsym,__neorv32_ram_size=8k
# Adjust maximum heap size
USER_FLAGS += -Wl,--defsym,__neorv32_heap_size=1k
# Additional compiler flags (append to this variable)
#USER_FLAGS += ...
# Set path to NEORV32 root directory
NEORV32_HOME ?= ../../..
# Include the main NEORV32 makefile
include $(NEORV32_HOME)/sw/common/common.mk
New Project
When creating a new project, copy an existing project folder or at least the makefile to the new project folder.
It is recommended to create new projects also in sw/example to keep the file dependencies. However, these
dependencies can be manually configured via makefile variables if the new project is located somewhere else.
|
4.4.1. Makefile Targets
Invoking a project-local makefile (executing make or make help) will show the help menu that lists all
available targets as well as all variables including their current settings.
neorv32/sw/example/hello_world$ make
NEORV32 Software Makefile
Find more information at https://github.com/stnolting/neorv32
Targets:
help - show this text
check - check toolchain
info - show makefile/toolchain configuration
gdb - start GNU debugging session
asm - compile and generate <main.asm> assembly listing file for manual debugging
elf - compile and generate <main.elf> ELF file
exe - compile and generate <neorv32_exe.bin> executable image file for bootloader upload (includes a HEADER!)
bin - compile and generate <neorv32_raw_exe.bin> executable memory image
hex - compile and generate <neorv32_raw_exe.hex> executable memory image
coe - compile and generate <neorv32_raw_exe.coe> executable memory image
mem - compile and generate <neorv32_raw_exe.mem> executable memory image
mif - compile and generate <neorv32_raw_exe.mif> executable memory image
image - compile and generate VHDL IMEM application boot image <neorv32_application_image.vhd> in local folder
install - compile, generate and install VHDL IMEM application boot image <neorv32_application_image.vhd>
sim - in-console simulation using default/simple testbench and GHDL
hdl_lists - regenerate HDL file-lists (*.f) in NEORV32_HOME/rtl
all - exe + install + hex + bin + asm
elf_info - show ELF layout info
elf_sections - show ELF sections
clean - clean up project home folder
clean_all - clean up whole project, core libraries and image generator
bl_image - compile and generate VHDL BOOTROM bootloader boot image <neorv32_bootloader_image.vhd> in local folder
bootloader - compile, generate and install VHDL BOOTROM bootloader boot image <neorv32_bootloader_image.vhd>
Variables:
USER_FLAGS - Custom toolchain flags [append only]: "-ggdb -gdwarf-3 -Wl,--defsym,__neorv32_rom_size=16k -Wl,--defsym,__neorv32_ram_size=8k"
USER_LIBS - Custom libraries [append only]: ""
EFFORT - Optimization level: "-Os"
MARCH - Machine architecture: "rv32i_zicsr_zifencei"
MABI - Machine binary interface: "ilp32"
APP_INC - C include folder(s) [append only]: "-I ."
APP_SRC - C source folder(s) [append only]: "./main.c "
ASM_INC - ASM include folder(s) [append only]: "-I ."
RISCV_PREFIX - Toolchain prefix: "riscv32-unknown-elf-"
NEORV32_HOME - NEORV32 home folder: "../../.."
GDB_ARGS - GDB (connection) arguments: "-ex target extended-remote localhost:3333"
GHDL_RUN_FLAGS - GHDL simulation run arguments: ""
4.4.2. Default Compiler Flags
The central makefile uses specific compiler flags to tune the code to the NEORV32 hardware. Hence, these flags should not be altered. However, experienced users can modify them to further tune compilation.
|
Enable all compiler warnings. |
|
Put functions in independent sections. This allows a code optimization as dead code can be easily removed. |
|
Put data segment in independent sections. This allows a code optimization as unused data can be easily removed. |
|
Do not use the default start code. Instead, the NEORV32-specific start-up code ( |
|
Use built-in software functions for floating-point divisions and square roots (since the according instructions are not supported yet). |
|
Unaligned memory accesses cannot be resolved by the hardware and require emulation. |
|
Branching costs a lot of cycles. |
|
Make the linker perform dead code elimination. |
|
Disable floating-point expression contraction. |
|
Add (simple) debug information. |
Checking Compiler Flags from a Compiled Program
The makefile’s CC_OPTS is exported as define to be available within a C program; for example
neorv32_uart0_printf("%s\n", CC_OPTS); .
|
|
Include/link with |
|
Search for the standard C library when linking. |
|
Make sure we have no unresolved references to internal GCC library subroutines. |
Advanced Debug Symbols
By default, only "simple" symbols are added to the ELF (-g ). Extended debug flags (e.g. for Eclipse) can be added
using the USER_FLAGS variable (e.g. USER_FLAGS += -ggdb -gdwarf-3 ). Note that other debug flags may be required
depending on the GCC/GDB version.
|
4.5. Linker Script
The NEORV32-specific linker script (sw/common/neorv32.ld) is used to link the compiled sources according to the
processor’s Address Space. For the final executable, only two memory segments are required:
Memory section | Description |
---|---|
|
Instruction memory address space (processor-internal Instruction Memory (IMEM) and/or external memory) |
|
Data memory address space (processor-internal Data Memory (DMEM) and/or external memory) |
These two sections are configured by several variables defined in the linker script and exposed to the build framework (aka the makefile). These variables allow customizing the RAM/ROM sizes and base addresses. Additionally, a certain amount of the RAM can be reserved for the software-managed heap (see RAM Layout).
Memory section | Description | Default |
---|---|---|
|
"ROM" size (instruction memory / IMEM) |
16kB |
|
"RAM" size (data memory / DMEM) |
8kB |
|
"ROM" base address (instruction memory / IMEM) |
|
|
"RAM" base address (data memory / DMEM) |
|
|
Maximum heap size; part of the "RAM" |
0kB |
Each variable provides a default value (e.g. "16K" for the instruction memory /ROM /IMEM size). These defaults can
be overridden by setup-specific values to take the user-defined processor configuration into account (e.g. a different IMEM
size). The USER_FLAGS
variable provided by the Application Makefile can also be used to customize the memory
configuration. For example, the following line can be added to a project-specific local makefile to adjust the memory
sizes:
USER_FLAGS += "-Wl,--defsym,__neorv32_rom_size=64k -Wl,--defsym,__neorv32_ram_size=32k"
Memory Configuration Constraints
Memory sizes have to be a multiple of 4 bytes. Memory base addresses have to be 32-bit-aligned.
|
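The two constraints above can be checked mechanically before custom values are handed to the linker. The following is a minimal sketch in plain C that runs on any host; all function names are hypothetical and are not part of the NEORV32 software framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of the NEORV32 HAL):
 * a memory size is acceptable if it is a multiple of 4 bytes. */
bool mem_size_is_valid(uint32_t size_bytes) {
  return (size_bytes % 4u) == 0u;
}

/* Hypothetical helper: a base address is acceptable if it is
 * 32-bit-aligned, i.e. its two least-significant bits are zero. */
bool mem_base_is_valid(uint32_t base_addr) {
  return (base_addr & 0x3u) == 0u;
}

/* Abort early (in debug builds) if a configuration violates the constraints. */
void mem_config_require_valid(uint32_t base_addr, uint32_t size_bytes) {
  assert(mem_base_is_valid(base_addr));
  assert(mem_size_is_valid(size_bytes));
}
```

For example, a size of 64k passes the check while a size of 1022 bytes does not.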
4.5.1. RAM Layout
The default NEORV32 linker script uses the defined RAM size to map several sections. Note that depending on the application some sections might have zero size.
- Constant data (.data): The constant data section is placed right at the beginning of the RAM. For example, this section contains explicitly initialized global variables. This section is initialized by the Start-Up Code (crt0).
- Dynamic data (.bss): The constant data section is followed by the dynamic data section that contains uninitialized data like global variables without explicit initialization. This section is cleared by the Start-Up Code (crt0).
- Heap (.heap): The heap is used for dynamic memory that is managed by functions like malloc() and free(). The heap grows upwards. This section is not initialized at all.
- Stack: The stack starts at the end of the RAM at the last 16-byte aligned address. According to the RISC-V ABI / calling convention the stack is 128-bit-aligned before procedure entry. The stack grows downwards.
Heap Size
The maximum size of the heap is defined by the __neorv32_heap_size variable. The default heap size is zero, so this
variable has to be explicitly set to a non-zero value in order to use dynamic memory allocation at all.
|
Heap-Stack Collision
Take care when using dynamic memory to avoid collision of the heap and stack memory areas. There is no compile-time
protection mechanism available as the actual heap and stack sizes are defined by runtime data.
|
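To get a feeling for how much room is left for the stack, the RAM layout can be modeled on the host. The sketch below is an illustration only: the struct and function names are hypothetical, section alignment padding is ignored, and "last 16-byte aligned address" is interpreted as the highest 16-byte aligned address inside the RAM, which is an assumption. The RAM base address used in the usage note is just an example value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the RAM layout (illustration only). */
typedef struct {
  uint32_t data_start; /* .data begins at the RAM base */
  uint32_t bss_start;  /* .bss follows .data */
  uint32_t heap_start; /* heap follows .bss and grows upwards */
  uint32_t heap_end;   /* heap_start + maximum heap size */
  uint32_t stack_top;  /* initial stack pointer; stack grows downwards */
} ram_layout_t;

ram_layout_t ram_layout_compute(uint32_t ram_base, uint32_t ram_size,
                                uint32_t data_size, uint32_t bss_size,
                                uint32_t heap_size) {
  assert(ram_size >= 16u);
  ram_layout_t l;
  l.data_start = ram_base;
  l.bss_start  = l.data_start + data_size;
  l.heap_start = l.bss_start + bss_size;
  l.heap_end   = l.heap_start + heap_size;
  /* ASSUMPTION: highest 16-byte aligned address inside the RAM. */
  l.stack_top  = (ram_base + ram_size - 1u) & ~(uint32_t)0xFu;
  return l;
}

/* Bytes available to the stack if the heap is used up to its maximum size.
 * Returns 0 if the sections already reach the stack top. */
uint32_t ram_layout_stack_budget(const ram_layout_t *l) {
  assert(l != 0);
  return (l->stack_top > l->heap_end) ? (l->stack_top - l->heap_end) : 0u;
}
```

With an 8kB RAM at the example base 0x80000000, 256 bytes of .data, 512 bytes of .bss and a 1kB heap, the model places the stack top at 0x80001FF0 and leaves 0x18F0 bytes for the stack. Since the real stack depth depends on runtime behavior, such a budget is only a rough guide.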
4.6. C Standard Library
The default software framework relies on newlib as default C standard library. Newlib provides hooks for common
"system calls" (like file handling and standard input/output) that are used by other C libraries like stdio
.
These hooks are available in sw/lib/source/newlib.c
and were adapted for the NEORV32 processor.
Standard Consoles
The UART0
is used to implement all the standard input, output and error consoles (STDIN , STDOUT and STDERR ).
Note that \n (newline) is automatically converted to \r\n (carriage-return and newline).
|
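The newline conversion mentioned above can be illustrated with a small host-side model. This is only a sketch of the described behavior, not the framework's actual implementation; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy src to dst, emitting "\r\n" for every '\n'.
 * Returns the number of characters written (excluding the terminating NUL).
 * The output is truncated if dst is too small. */
size_t console_expand_newlines(const char *src, char *dst, size_t dst_size) {
  assert(src != NULL && dst != NULL && dst_size > 0u);
  size_t n = 0u;
  for (; *src != '\0'; src++) {
    size_t needed = (*src == '\n') ? 2u : 1u;
    if (n + needed >= dst_size) { /* keep room for the terminating NUL */
      break;
    }
    if (*src == '\n') {
      dst[n++] = '\r'; /* insert carriage return before the newline */
    }
    dst[n++] = *src;
  }
  dst[n] = '\0';
  return n;
}
```

Terminal programs should therefore be configured to expect \r\n line endings from the processor.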
Constructors and Destructors
Constructors and destructors for plain C code or for C++ applications are supported by the software framework.
See sw/example/hello_cpp for a minimal example.
|
Newlib Test/Demo Program
A simple test and demo program that uses some of newlib’s system functions (like malloc /free and read /write )
is available in sw/example/demo_newlib .
|
4.7. Start-Up Code (crt0)
The CPU and also the processor require a minimal start-up and initialization code to bring the hardware into an
operational state. Furthermore, the C runtime requires an initialization before compiled code can be executed.
This setup is done by the start-up code (sw/common/crt0.S
) which is automatically linked with every application
program and gets mapped before the actual application code so it gets executed right after boot.
The crt0.S
start-up performs the following operations:
- Clear mstatus CSR.
- Clear mie CSR disabling all interrupt sources.
- Install an Early Trap Handler to mtvec CSR.
- Initialize the global pointer gp and the stack pointer sp according to the RAM Layout provided by the linker script.
- Initialize all integer registers x1 - x31 (only x1 - x15 if the E CPU extension is enabled).
- Set up the .data section to configure initialized variables.
- Clear the .bss section.
- Call all constructors (if there are any).
- Call the application’s main() function (with no arguments; argc = argv = 0).
- If main() returns:
  - All interrupt sources are disabled by clearing mie CSR.
  - The return value of main() is copied to the mscratch CSR to allow inspection by the debugger.
  - Call all destructors (if there are any).
  - Re-install an Early Trap Handler to mtvec CSR. If any destructor causes an exception the Early Trap Handler is used for handling.
  - The CPU enters sleep mode executing the wfi instruction in an endless loop.
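The .data setup and .bss clear steps of the list above can be sketched as a host-side C model. The real start-up code is written in assembly (crt0.S); the function names below are hypothetical, and the model assumes that the initial values of .data are copied word by word from a load location (for example in ROM) to RAM.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the ".data" setup step: copy the initial values from their
 * load location to the run-time location in RAM (ASSUMPTION: word copy). */
void crt0_model_copy_data(const uint32_t *load_src,
                          uint32_t *dst_begin, uint32_t *dst_end) {
  assert(dst_begin <= dst_end);
  while (dst_begin < dst_end) {
    *dst_begin++ = *load_src++;
  }
}

/* C model of the ".bss" clear step: zero all words in [bss_begin, bss_end). */
void crt0_model_clear_bss(uint32_t *bss_begin, uint32_t *bss_end) {
  assert(bss_begin <= bss_end);
  while (bss_begin < bss_end) {
    *bss_begin++ = 0u;
  }
}
```

Both loops run before main() is called, which is why initialized globals already hold their values and uninitialized globals read as zero when the application starts.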
4.7.1. Early Trap Handler
The start-up code provides a very basic trap handler for the early boot phase. This handler does nothing but
try to move on to the next linear instruction whenever an interrupt or synchronous exception is encountered.
This simple trap handler does not interact with the stack at all as it just uses a single register that is backed up
using the mscratch CSR.
4.8. Executable Image Formats
The compiled and linked executable (ELF file) is further processed by the NEORV32 image generator (sw/image_gen
) to
generate the final executable file. The image generator can generate several types of executable file formats selected
by a flag when calling the generator.
Note that all these options are managed by the makefile (see Makefile Targets).
|
Generates an executable binary file (including a bootloader header) for upload via the bootloader. |
|
Generates an executable VHDL memory initialization image for the processor-internal IMEM. |
|
Generates an executable VHDL memory initialization image for the processor-internal BOOT ROM. |
|
Generates a raw 8x ASCII hex-char file for custom purpose. |
|
Generates a raw binary file for custom purpose. |
|
Generates a raw COE file for FPGA memory initialization. |
|
Generates a raw MEM file for FPGA memory initialization. |
|
Generates a raw MIF file for FPGA memory initialization. |
Image Generator Compilation
The sources of the image generator are automatically compiled when invoking the makefile
(requiring a native GCC installation).
|
Executable Header
For the app_bin option the image generator adds a small header to the executable. This header is required by the
Bootloader to identify and manage the executable. The header consists of three 32-bit words located right
at the beginning of the file. The first word of the executable is the signature word and is always 0x4788cafe .
Based on this word the bootloader can identify a valid image file. The next word represents the size of the
actual program image in bytes. A simple complement checksum of the actual program image is given by the third word.
This provides a simple protection against data transmission or storage errors.
Note that this executable format cannot be used for direct execution (e.g. via XIP or direct memory access).
|
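The three-word header can be modeled with a few lines of host-side C. This is a sketch under stated assumptions, not the image generator's actual code: the type and function names are hypothetical, and the "complement checksum" is assumed here to be the two's complement of the 32-bit sum of all image words, so that sum plus checksum wraps to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXE_SIGNATURE 0x4788CAFEu /* signature word stated in the text */

/* Hypothetical in-memory view of the three header words. */
typedef struct {
  uint32_t signature;  /* word 0: identifies a valid image */
  uint32_t size_bytes; /* word 1: size of the program image in bytes */
  uint32_t checksum;   /* word 2: complement checksum of the image */
} exe_header_t;

/* 32-bit sum of all image words (wraps modulo 2^32). */
uint32_t exe_sum_words(const uint32_t *image, size_t num_words) {
  uint32_t sum = 0u;
  for (size_t i = 0u; i < num_words; i++) {
    sum += image[i];
  }
  return sum;
}

exe_header_t exe_header_build(const uint32_t *image, size_t num_words) {
  assert(image != NULL || num_words == 0u);
  exe_header_t hdr;
  hdr.signature  = EXE_SIGNATURE;
  hdr.size_bytes = (uint32_t)(num_words * sizeof(uint32_t));
  /* ASSUMPTION: checksum is the two's complement of the word sum. */
  hdr.checksum   = 0u - exe_sum_words(image, num_words);
  return hdr;
}

bool exe_header_check(const exe_header_t *hdr,
                      const uint32_t *image, size_t num_words) {
  assert(hdr != NULL);
  if (hdr->signature != EXE_SIGNATURE) {
    return false;
  }
  if (hdr->size_bytes != (uint32_t)(num_words * sizeof(uint32_t))) {
    return false;
  }
  return (uint32_t)(exe_sum_words(image, num_words) + hdr->checksum) == 0u;
}
```

A single flipped bit in the image changes the word sum, so the check fails. This matches the stated purpose of protecting against transmission or storage errors.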
4.9. Bootloader
Pre-Built Bootloader Image
This section refers to the default NEORV32 bootloader. A pre-compiled memory image for the processor-internal
Bootloader ROM (BOOTROM) is available in the project’s rtl folder: rtl/core/neorv32_bootloader_image.vhd .
This image is automatically inserted into the boot ROM when synthesizing the processor with the bootloader being
enabled. Note that the default bootloader image was compiled for a minimal rv32i + priv. ISA!
|
The NEORV32 bootloader (sw/bootloader/bootloader.c) provides an optional built-in firmware that
allows uploading new application executables at any time without the need to re-synthesize the FPGA’s bitstream.
A UART connection is used to provide a simple text-based user interface that allows uploading executables.
Furthermore, the bootloader provides options to store an executable to a processor-external SPI flash. An "auto boot" feature can optionally fetch this executable right after reset if there is no user interaction via UART. This makes it possible to build processor setups with non-volatile application storage while maintaining the option to update the application software at any time.
4.9.1. Bootloader SoC/CPU Requirements
The bootloader requires certain CPU and SoC extensions and modules to be enabled in order to operate correctly.
REQUIRED |
The Boot Configuration ( |
REQUIRED |
The bootloader requires the privileged architecture CPU extension ( |
REQUIRED |
At least 512 bytes of data memory (processor-internal DMEM or processor-external DMEM) are required for the bootloader’s stack and global variables. |
RECOMMENDED |
For user interaction via the Bootloader Console (like uploading executables) the primary UART (Primary Universal Asynchronous Receiver and Transmitter (UART0)) is required. |
RECOMMENDED |
The default bootloader uses bit 0 of the General Purpose Input and Output Port (GPIO) output port to drive a high-active "heart beat" status LED. |
RECOMMENDED |
The Machine System Timer (MTIME) is used to control blinking of the status LED and also to automatically trigger the Auto Boot Sequence. |
OPTIONAL |
The SPI controller (Serial Peripheral Interface Controller (SPI)) is needed to store/load executable from external flash using the Auto Boot Sequence. |
OPTIONAL |
The XIP controller (Execute In Place Module (XIP)) is needed to boot/execute code directly from a pre-programmed SPI flash. |
4.9.2. Bootloader Flash Requirements
The bootloader can access an SPI-compatible flash via the processor’s top entity SPI port. By default, the flash
chip-select line is driven by spi_csn_o(0)
and the SPI clock uses 1/8 of the processor’s main clock as clock frequency.
The SPI flash has to support single-byte read and write operations, 24-bit addresses and at least the following standard commands:
- 0x02: Program page (write byte)
- 0x03: Read data (byte)
- 0x04: Write disable (for volatile status register)
- 0x05: Read (first) status register
- 0x06: Write enable (for volatile status register)
- 0xAB: Wake-up from sleep mode (optional)
- 0xD8: Block erase (64kB)
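The command set above maps naturally to a small set of constants. The following host-side sketch shows how an addressed command (opcode plus 24-bit address) could be assembled. The enum and function names are hypothetical, and sending the address most-significant byte first is an assumption based on common SPI NOR flash devices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command opcodes taken from the list above (names are hypothetical). */
enum spi_flash_cmd {
  SPI_FLASH_CMD_PAGE_PROGRAM  = 0x02,
  SPI_FLASH_CMD_READ          = 0x03,
  SPI_FLASH_CMD_WRITE_DISABLE = 0x04,
  SPI_FLASH_CMD_READ_STATUS   = 0x05,
  SPI_FLASH_CMD_WRITE_ENABLE  = 0x06,
  SPI_FLASH_CMD_WAKE          = 0xAB,
  SPI_FLASH_CMD_BLOCK_ERASE   = 0xD8
};

/* Build the 4-byte header of an addressed command: opcode + 24-bit address.
 * ASSUMPTION: the address is sent most-significant byte first.
 * Returns the number of bytes written to frame. */
size_t spi_flash_build_cmd(uint8_t opcode, uint32_t addr, uint8_t frame[4]) {
  assert(frame != NULL);
  assert(addr <= 0x00FFFFFFu); /* only 24-bit addresses are supported */
  frame[0] = opcode;
  frame[1] = (uint8_t)(addr >> 16);
  frame[2] = (uint8_t)(addr >> 8);
  frame[3] = (uint8_t)addr;
  return 4u;
}
```

For a read at address 0x123456 this produces the byte sequence 0x03, 0x12, 0x34, 0x56.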
Custom Configuration
Most properties (like chip select line, flash address width, SPI clock frequency, …) of the default bootloader can be reconfigured
without the need to change the source code. Custom configuration can be made using command line switches (defines) when recompiling
the bootloader. See the User Guide https://stnolting.github.io/neorv32/ug/#_customizing_the_internal_bootloader for more information.
|
4.9.3. Bootloader Console
To interact with the bootloader, connect the primary UART (UART0) signals (uart0_txd_o and uart0_rxd_i) of the processor’s top
entity via a serial port (adapter) to your computer. Hardware flow control is not used, so the according interface signals can be
ignored. Configure your terminal program using the following settings and perform a reset of the processor.
Terminal console settings (19200-8-N-1):
- 19200 Baud
- 8 data bits
- no parity bit
- 1 stop bit
- newline on \r\n (carriage return, newline)
- no transfer protocol / control flow protocol - just raw bytes
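With 8-N-1 framing each byte occupies 10 bit times on the wire (1 start bit, 8 data bits, 1 stop bit), so the raw upload time of an executable can be estimated from its size. The helper below is a hypothetical host-side sketch and ignores any terminal or bootloader processing overhead.

```c
#include <assert.h>
#include <stdint.h>

/* 8-N-1 framing: 1 start bit + 8 data bits + 1 stop bit. */
#define UART_BITS_PER_BYTE_8N1 10u

/* Estimated raw transfer time in milliseconds, rounded up. */
uint32_t uart_transfer_time_ms(uint32_t num_bytes, uint32_t baud_rate) {
  assert(baud_rate > 0u);
  uint64_t bits = (uint64_t)num_bytes * UART_BITS_PER_BYTE_8N1;
  return (uint32_t)((bits * 1000u + baud_rate - 1u) / baud_rate);
}
```

At 19200 Baud, 1920 bytes take one second, so a 4kB executable needs a little over two seconds of pure transfer time.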
Terminal Program
Any terminal program that can connect to a serial port should work. However, make sure the program
can transfer data in raw byte mode without any protocol overhead (e.g. XMODEM). Some terminal programs struggle with
transmitting files larger than 4kB (see https://github.com/stnolting/neorv32/pull/215). Try a different terminal program
if uploading of a binary does not work.
|
The bootloader uses the LSB of the top entity’s gpio_o
output port as high-active status LED. All other
output pins are set to low level and won’t be altered. After reset, the status LED will start blinking at 2Hz and the
following intro screen shows up:
<< NEORV32 Bootloader >>
BLDV: Mar 7 2023
HWV: 0x01080107
CLK: 0x05f5e100
MISA: 0x40901106
XISA: 0xc0000fab
SOC: 0xffff402f
IMEM: 0x00008000
DMEM: 0x00002000
Autoboot in 10s. Press any key to abort.
The start-up screen gives some brief information about the bootloader and several system configuration parameters:
|
Bootloader version (built date). |
|
Processor hardware version (the |
|
Processor clock speed in Hz (via the |
|
RISC-V CPU extensions ( |
|
NEORV32-specific CPU extensions ( |
|
Processor configuration (via the |
|
Internal IMEM size in byte (via the |
|
Internal DMEM size in byte (via the |
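The values on the intro screen are printed as raw 32-bit hexadecimal numbers. As a reading aid, the following host-side sketch converts such a line into a number; the function name is hypothetical. For the sample screen above, CLK: 0x05f5e100 corresponds to 100000000 Hz, IMEM: 0x00008000 to 32768 bytes and DMEM: 0x00002000 to 8192 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the 32-bit hex value from a line such as
 * "CLK: 0x05f5e100". Returns 0 if the line contains no "0x" prefix. */
uint32_t bootinfo_parse_hex(const char *line) {
  assert(line != NULL);
  const char *hex = strstr(line, "0x");
  if (hex == NULL) {
    return 0u;
  }
  /* strtoul with base 16 accepts the optional "0x" prefix. */
  return (uint32_t)strtoul(hex, NULL, 16);
}
```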
Now you have 10 seconds to press any key. Otherwise, the bootloader starts the Auto Boot Sequence. When you press any key within the 10 seconds, the actual bootloader user console starts:
<< NEORV32 Bootloader >>
BLDV: Mar 7 2023
HWV: 0x01080107
CLK: 0x05f5e100
MISA: 0x40901106
XISA: 0xc0000fab
SOC: 0xffff402f
IMEM: 0x00008000
DMEM: 0x00002000
Autoboot in 10s. Press any key to abort. (1)
Aborted.
Available CMDs:
h: Help
r: Restart
u: Upload
s: Store to flash
l: Load from flash
x: Boot from flash (XIP)
e: Execute
CMD:>
1 | Auto boot sequence aborted due to user console input. |
The auto boot countdown is stopped and the bootloader’s user console is ready to receive one of the following commands:
- h: Show the help text (again)
- r: Restart the bootloader and the auto-boot sequence
- u: Upload new program executable (neorv32_exe.bin) via UART into the instruction memory
- s: Store executable to SPI flash at spi_csn_o(0) (little-endian byte order)
- l: Load executable from SPI flash at spi_csn_o(0) (little-endian byte order)
- x: Boot program directly from flash via XIP (requires a pre-programmed image)
- e: Start the application, which is currently stored in the instruction memory (IMEM)
A new executable can be uploaded via UART by executing the u
command. After that, the executable can be directly
executed via the e
command. To store the recently uploaded executable to an attached SPI flash press s
. To
directly load an executable from the SPI flash press l
. The bootloader and the auto-boot sequence can be
manually restarted via the r
command.
Executable Upload
Make sure to upload the NEORV32 executable neorv32_exe.bin . Uploading any other file (like main.bin )
will cause an ERR_EXE bootloader error (see Bootloader Error Codes).
|
Booting via XIP
The bootloader allows executing an application right from flash using the Execute In Place Module (XIP).
This requires a pre-programmed flash. The bootloader’s "store" option cannot be used to program an XIP image.
|
SPI Flash Power Down Mode
The bootloader will issue a "wake-up" command prior to using the SPI flash to ensure it is not
in sleep mode / power-down mode (see https://github.com/stnolting/neorv32/pull/552).
|
Default Configuration
More information regarding the default SPI, GPIO, XIP, etc. configuration can be found in the User Guide
section https://stnolting.github.io/neorv32/ug/#_customizing_the_internal_bootloader.
|
SPI Flash Programming
For detailed information on using an SPI flash for application storage see User Guide section
Programming an External SPI Flash via the Bootloader.
|
4.9.4. Auto Boot Sequence
When you reset the NEORV32 processor, the bootloader waits 10 seconds for a UART console input before it
starts the automatic boot sequence. This sequence tries to fetch a valid boot image from the external SPI
flash, connected to SPI chip select spi_csn_o(0)
. If a valid boot image is found that can be successfully
transferred into the instruction memory, it is automatically started. If no SPI flash is detected or if there
is no valid boot image found, an error code will be shown.
4.9.5. Bootloader Error Codes
If something goes wrong during bootloader operation an error code and a short message are shown. In this case the processor is halted (entering Sleep Mode), the bootloader status LED is permanently activated and the processor has to be reset manually.
In many cases the error source is just temporary (like some HF spike during a UART upload). Just try again. |
|
If you try to transfer an invalid executable (via UART or from the external SPI flash), this error message shows up. There might be a transfer protocol configuration error in the terminal program or maybe just the wrong file was selected. Also, if no SPI flash was found during an auto-boot attempt, this message will be displayed. |
|
Your program is way too big for the processor’s internal instruction memory. Increase the memory size or reduce your application code. |
|
This indicates a checksum error. Something went wrong during the transfer of the program image (upload via UART or loading from the external SPI flash). If the error was caused by a UART upload, just try it again. When the error was generated during a flash access, the stored image might be corrupted. |
|
This error occurs if the attached SPI flash cannot be accessed. Make sure you have the right type of flash and that it is properly connected to the NEORV32 SPI port using chip select #0. |
|
The bootloader encountered an unexpected exception during operation. This might be caused when it tries to access peripherals that were not implemented during synthesis. Example: executing commands |
4.10. NEORV32 Runtime Environment
The NEORV32 software framework provides a minimal runtime environment (abbreviated "RTE") that takes care of a stable and safe execution environment by handling all traps (exceptions & interrupts). The RTE simplifies trap handling by wrapping the CPU’s privileged architecture (i.e. trap-related CSRs) into a unified software API.
Once initialized, the RTE provides Default RTE Trap Handlers that catch all possible traps. These default handlers just output a message via UART to inform the user when a certain trap has been triggered. The default handlers can be overridden by the application code to install application-specific handler functions for each trap.
Using the RTE is optional but highly recommended. The RTE provides a simple and comfortable way of delegating traps to application-specific handlers while making sure that all traps (even though they are not explicitly used by the application) are handled correctly. Performance-optimized applications or embedded operating systems may not use the RTE at all in order to reduce trap response latency. |
4.10.1. RTE Operation
The RTE manages the trap-related CSRs of the CPU’s privileged architecture (Machine Trap Handling CSRs).
It initializes the mtvec
CSR in DIRECT mode, which then provides the base entry point for all traps. The address
stored to this register defines the address of the first-level trap handler, which is provided by the
NEORV32 RTE. Whenever an exception or interrupt is triggered this first-level trap handler is executed.
The first-level handler performs a complete context save, analyzes the source of the trap and calls the according second-level trap handler, which takes care of the actual exception/interrupt handling. The RTE manages a private look-up table to store the addresses of the according second-level trap handlers.
After the initial RTE setup, each entry in the RTE’s trap handler look-up table is initialized with a default handler (see Default RTE Trap Handlers). These default handlers do not execute any trap-related operations - they just output a message via the primary UART (UART0) to inform the user that a trap has occurred, which is not (yet) handled by the actual application. After sending this message, the RTE tries to continue executing the actual program by resolving the trap cause.
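The look-up table mechanism can be illustrated with a host-side model. This is a simplified sketch and not the RTE's actual implementation: all model_ names are hypothetical, the context save and trap cause analysis of the first-level handler are omitted, the table size of 29 follows the trap IDs 0 to 28 defined in sw/lib/include/neorv32_rte.h, and a return value of 0 for success is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NUM_TRAPS 29 /* trap IDs 0..28 */

typedef void (*model_handler_t)(void);

static model_handler_t model_table[MODEL_NUM_TRAPS]; /* private look-up table */
int model_default_hits = 0; /* counts default handler invocations */
int model_timer_hits   = 0; /* counts example handler invocations */

/* Default second-level handler: the real one prints a message via UART0. */
static void model_default_handler(void) { model_default_hits++; }

/* Example application-specific second-level handler. */
void model_timer_handler(void) { model_timer_hits++; }

/* Initialize every table entry with the default handler. */
void model_rte_setup(void) {
  for (size_t i = 0u; i < MODEL_NUM_TRAPS; i++) {
    model_table[i] = model_default_handler;
  }
}

/* ASSUMPTION: returns 0 on success, non-zero on invalid arguments. */
int model_rte_handler_install(uint8_t id, model_handler_t handler) {
  if (id >= MODEL_NUM_TRAPS || handler == NULL) {
    return 1;
  }
  model_table[id] = handler;
  return 0;
}

int model_rte_handler_uninstall(uint8_t id) {
  if (id >= MODEL_NUM_TRAPS) {
    return 1;
  }
  model_table[id] = model_default_handler; /* restore the default handler */
  return 0;
}

/* Stand-in for the first-level handler: call the second-level handler.
 * model_rte_setup() has to be called before the first dispatch. */
void model_rte_dispatch(uint8_t id) {
  assert(id < MODEL_NUM_TRAPS);
  model_table[id]();
}
```

In this model, dispatching trap ID 11 before any handler is installed increments model_default_hits, and after installing model_timer_handler for ID 11 the same dispatch increments model_timer_hits instead.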
4.10.2. Using the RTE
All provided RTE functions can be called only from machine-mode code. |
The NEORV32 RTE is part of the default NEORV32 software framework. However, it has to be explicitly enabled by calling the RTE’s setup function:
void neorv32_rte_setup(void);
The RTE should be enabled right at the beginning of the application’s main function.
|
It is recommended to not use the mscratch CSR when using the RTE as this register is used to provide services
for Application Context Handling (i.e. modifying the registers of application code that caused a trap).
|
As mentioned above, all traps will just trigger execution of the RTE’s Default RTE Trap Handlers at first. To use application-specific handlers, which actually "handle" a trap, the default handlers can be overridden by installing user-defined ones:
int neorv32_rte_handler_install(uint8_t id, void (*handler)(void));
The first argument id
defines the "trap ID" (for example a certain interrupt request) that shall be handled
by the user-defined handler. These IDs are defined in sw/lib/include/neorv32_rte.h
:
enum NEORV32_RTE_TRAP_enum {
RTE_TRAP_I_MISALIGNED = 0, /**< Instruction address misaligned */
RTE_TRAP_I_ACCESS = 1, /**< Instruction (bus) access fault */
RTE_TRAP_I_ILLEGAL = 2, /**< Illegal instruction */
RTE_TRAP_BREAKPOINT = 3, /**< Breakpoint (EBREAK instruction) */
RTE_TRAP_L_MISALIGNED = 4, /**< Load address misaligned */
RTE_TRAP_L_ACCESS = 5, /**< Load (bus) access fault */
RTE_TRAP_S_MISALIGNED = 6, /**< Store address misaligned */
RTE_TRAP_S_ACCESS = 7, /**< Store (bus) access fault */
RTE_TRAP_UENV_CALL = 8, /**< Environment call from user mode (ECALL instruction) */
RTE_TRAP_MENV_CALL = 9, /**< Environment call from machine mode (ECALL instruction) */
RTE_TRAP_MSI = 10, /**< Machine software interrupt */
RTE_TRAP_MTI = 11, /**< Machine timer interrupt */
RTE_TRAP_MEI = 12, /**< Machine external interrupt */
RTE_TRAP_FIRQ_0 = 13, /**< Fast interrupt channel 0 */
RTE_TRAP_FIRQ_1 = 14, /**< Fast interrupt channel 1 */
RTE_TRAP_FIRQ_2 = 15, /**< Fast interrupt channel 2 */
RTE_TRAP_FIRQ_3 = 16, /**< Fast interrupt channel 3 */
RTE_TRAP_FIRQ_4 = 17, /**< Fast interrupt channel 4 */
RTE_TRAP_FIRQ_5 = 18, /**< Fast interrupt channel 5 */
RTE_TRAP_FIRQ_6 = 19, /**< Fast interrupt channel 6 */
RTE_TRAP_FIRQ_7 = 20, /**< Fast interrupt channel 7 */
RTE_TRAP_FIRQ_8 = 21, /**< Fast interrupt channel 8 */
RTE_TRAP_FIRQ_9 = 22, /**< Fast interrupt channel 9 */
RTE_TRAP_FIRQ_10 = 23, /**< Fast interrupt channel 10 */
RTE_TRAP_FIRQ_11 = 24, /**< Fast interrupt channel 11 */
RTE_TRAP_FIRQ_12 = 25, /**< Fast interrupt channel 12 */
RTE_TRAP_FIRQ_13 = 26, /**< Fast interrupt channel 13 */
RTE_TRAP_FIRQ_14 = 27, /**< Fast interrupt channel 14 */
RTE_TRAP_FIRQ_15 = 28 /**< Fast interrupt channel 15 */
};
The second argument *handler
is the actual function that implements the user-defined trap handler.
The custom handler functions need to have a specific format without any arguments and with no return value:
void custom_trap_handler_xyz(void) {
// handle trap...
}
Custom Trap Handler Attributes
Do NOT use the interrupt attribute for the application trap handler functions! This
will place an mret instruction at the end of the function, making it impossible to return to the first-level
trap handler of the RTE core, which will cause stack corruption.
|
The following example shows how to install a custom handler (custom_mtime_irq_handler
) for handling
the RISC-V machine timer (MTIME) interrupt:
neorv32_rte_handler_install(RTE_TRAP_MTI, custom_mtime_irq_handler);
User-defined trap handlers can also be un-installed. This will remove the user’s trap handler from the RTE core and will re-install the Default RTE Trap Handlers for the specific trap.
int neorv32_rte_handler_uninstall(uint8_t id);
The argument id
defines the identifier of the according trap that shall be un-installed.
The following example shows how to un-install the custom handler custom_mtime_irq_handler
from the
RISC-V machine timer (MTIME) interrupt:
neorv32_rte_handler_uninstall(RTE_TRAP_MTI);
The current RTE configuration can be printed to UART0 using the neorv32_rte_info function.
|
4.10.3. Default RTE Trap Handlers
The default RTE trap handlers are executed when a certain trap is triggered that is not (yet) handled by an application-defined trap handler. The default handler will output a message giving additional debug information via the Primary Universal Asynchronous Receiver and Transmitter (UART0) to inform the user and it will also try to resume normal program execution. Some exemplary RTE outputs are shown below.
Continuing Execution
In most cases the RTE can successfully continue operation - for example if it catches an interrupt request
that is not handled by the actual application program. However, if the RTE catches an un-handled trap like
a bus access fault exception, continuing execution will most likely fail and crash the CPU. Some exceptions
cannot be resolved by the default debug trap handlers and will halt the CPU (see example below).
|
<NEORV32-RTE> [M] Illegal instruction @ PC=0x000002d6, MTINST=0x000000FF, MTVAL=0x00000000 </NEORV32-RTE> (1)
<NEORV32-RTE> [U] Illegal instruction @ PC=0x00000302, MTINST=0x00000000, MTVAL=0x00000000 </NEORV32-RTE> (2)
<NEORV32-RTE> [U] Load address misaligned @ PC=0x00000440, MTINST=0x01052603, MTVAL=0x80000101 </NEORV32-RTE> (3)
<NEORV32-RTE> [M] Fast IRQ 0x00000003 @ PC=0x00000820, MTINST=0x00000000, MTVAL=0x00000000 </NEORV32-RTE> (4)
<NEORV32-RTE> [M] Instruction access fault @ PC=0x90000000, MTINST=0x42078b63, MTVAL=0x00000000 !!FATAL EXCEPTION!! Halting CPU. </NEORV32-RTE>\n (5)
1 | Illegal 32-bit instruction MTINST=0x000000FF at address PC=0x000002d6 while the CPU was in machine-mode ([M] ). |
2 | Illegal 16-bit instruction MTINST=0x00000000 at address PC=0x00000302 while the CPU was in user-mode ([U] ). |
3 | Misaligned load access at address PC=0x00000440 caused by instruction MTINST=0x01052603 (trying to load a full 32-bit word from address MTVAL=0x80000101 ) while the CPU was in user-mode ([U] ). |
4 | Fast interrupt request from channel 3 before executing instruction at address PC=0x00000820 while the CPU was in machine-mode ([M] ). |
5 | Instruction bus access fault at address PC=0x90000000 while executing instruction MTINST=0x42078b63 while the CPU was in machine-mode ([M] ); this is fatal for the default debug trap handler. |
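When such messages are captured in a UART log, the fields can be recovered mechanically. The following is a minimal host-side sketch that assumes the line layout shown in the examples above; the type `rte_msg_t` and the function `rte_msg_parse` are hypothetical names and not part of the NEORV32 software framework.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields recovered from one RTE debug line (type and names are hypothetical). */
typedef struct {
  char     mode;       /* 'M' = machine-mode, 'U' = user-mode */
  char     cause[48];  /* trap message, e.g. "Illegal instruction" */
  uint32_t pc;
  uint32_t mtinst;
  uint32_t mtval;
} rte_msg_t;

/* Parse one line; returns 1 on success, 0 if the layout does not match. */
int rte_msg_parse(const char *line, rte_msg_t *out) {
  unsigned int pc, mtinst, mtval;
  assert(line != NULL && out != NULL);
  int n = sscanf(line,
                 "<NEORV32-RTE> [%c] %47[^@]@ PC=0x%x, MTINST=0x%x, MTVAL=0x%x",
                 &out->mode, out->cause, &pc, &mtinst, &mtval);
  if (n != 5) {
    return 0;
  }
  /* The message is captured up to '@', so drop the trailing blank(s). */
  size_t len = strlen(out->cause);
  while (len > 0 && out->cause[len - 1] == ' ') {
    out->cause[--len] = '\0';
  }
  out->pc     = (uint32_t)pc;
  out->mtinst = (uint32_t)mtinst;
  out->mtval  = (uint32_t)mtval;
  return 1;
}
```

Anything after the MTVAL value (such as the closing tag or the fatal-exception notice) is ignored by this sketch.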
The specific message right at the beginning of the debug trap handler message corresponds to the trap code
obtained from the mcause
CSR (see NEORV32 Trap Listing). A full list of all messages and the according
mcause
trap codes is shown below.
Trap identifier | According mcause CSR value |
---|---|
"Instruction address misaligned" | |
"Instruction access fault" | |
"Illegal instruction" | |
"Breakpoint" | |
"Load address misaligned" | |
"Load access fault" | |
"Store address misaligned" | |
"Store access fault" | |
"Environment call from U-mode" | |
"Environment call from M-mode" | |
"Machine software IRQ" | |
"Machine timer IRQ" | |
"Machine external IRQ" | |
"Fast IRQ 0x00000000" | |
"Fast IRQ 0x00000001" | |
"Fast IRQ 0x00000002" | |
"Fast IRQ 0x00000003" | |
"Fast IRQ 0x00000004" | |
"Fast IRQ 0x00000005" | |
"Fast IRQ 0x00000006" | |
"Fast IRQ 0x00000007" | |
"Fast IRQ 0x00000008" | |
"Fast IRQ 0x00000009" | |
"Fast IRQ 0x0000000a" | |
"Fast IRQ 0x0000000b" | |
"Fast IRQ 0x0000000c" | |
"Fast IRQ 0x0000000d" | |
"Fast IRQ 0x0000000e" | |
"Fast IRQ 0x0000000f" | |
"Unknown trap cause" | undefined |
4.10.4. Application Context Handling
Upon trap entry the RTE backs up the entire application context (i.e. all x
general purpose registers)
to the stack. The context is restored automatically after trap completion. The base address of the according
stack frame is copied to the mscratch
CSR. By having this information available, the RTE provides dedicated
functions for accessing and altering the application context:
// Prototypes
uint32_t neorv32_rte_context_get(int x); // read register x
void neorv32_rte_context_put(int x, uint32_t data); // write data to register x
// Examples
uint32_t tmp = neorv32_rte_context_get(9); // read register 'x9'
neorv32_rte_context_put(28, tmp); // write 'tmp' to register 'x28'
RISC-V E Extension
Registers x16..x31 are not available if the RISC-V E ISA Extension is enabled.
|
The context access functions can be used by application-specific trap handlers to emulate unsupported CPU / SoC features like unimplemented IO modules, unsupported instructions and even unaligned memory accesses.
Demo Program: Emulate Unaligned Memory Access
A demo program, which showcases how to emulate unaligned memory accesses using the NEORV32 runtime environment
can be found in sw/example/demo_emulate_unaligned .
|
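As a rough illustration of the emulation idea (this is not the code of the demo program), the sketch below decodes the load instruction from RTE example 3 above (MTINST=0x01052603) and assembles a 32-bit little-endian word from single bytes, so no misaligned access is needed. The names `load_fields_t`, `decode_load` and `load_word_bytewise` are hypothetical. In a real trap handler the resulting value would then be written to the destination register via `neorv32_rte_context_put`.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of a RISC-V I-type load instruction (type is hypothetical). */
typedef struct {
  uint8_t rd;      /* destination register index */
  uint8_t rs1;     /* base address register index */
  uint8_t funct3;  /* access width/type, 2 = LW */
  int32_t imm;     /* sign-extended 12-bit offset */
} load_fields_t;

/* Split a load instruction word into its fields. */
load_fields_t decode_load(uint32_t inst) {
  assert((inst & 0x7fu) == 0x03u); /* base opcode LOAD */
  load_fields_t f;
  f.rd     = (uint8_t)((inst >> 7) & 0x1fu);
  f.funct3 = (uint8_t)((inst >> 12) & 0x7u);
  f.rs1    = (uint8_t)((inst >> 15) & 0x1fu);
  uint32_t raw = inst >> 20;                 /* 12-bit immediate */
  f.imm = (int32_t)(raw ^ 0x800u) - 0x800;   /* portable sign extension */
  return f;
}

/* Read a little-endian 32-bit word byte by byte from an arbitrary offset. */
uint32_t load_word_bytewise(const uint8_t *mem, uint32_t offset) {
  assert(mem != 0);
  return  (uint32_t)mem[offset]
       | ((uint32_t)mem[offset + 1] << 8)
       | ((uint32_t)mem[offset + 2] << 16)
       | ((uint32_t)mem[offset + 3] << 24);
}
```

For the example instruction this yields a word load (`funct3` = 2) into x12 with base register x10 and offset 16.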
5. On-Chip Debugger (OCD)
The NEORV32 Processor features an on-chip debugger (OCD) implementing the execution-based debugging scheme
compatible with the Minimal RISC-V Debug Specification. A copy of the specification is available in docs/references
.
The on-chip debugger is implemented via the OCD_EN
processor top generic.
Key Features
-
standard 4-wire JTAG access port
-
full control of the CPU: halting, single-stepping and resuming
-
indirect access to all core registers and the entire processor address space (via program buffer)
-
compatible with upstream OpenOCD and GDB
-
optional trigger module for hardware breakpoints
-
optional authentication for increased security
Hands-On Tutorial
A simple example on how to use the NEORV32 on-chip debugger in combination with OpenOCD and the GNU debugger is shown in
section Debugging using the On-Chip Debugger
of the User Guide.
|
Section Structure
The NEORV32 on-chip debugger is based on five hardware modules:
-
Debug Transport Module (DTM): JTAG access tap to allow an external adapter to interface with the debug module (DM).
-
Debug Module (DM): RISC-V debug module that is configured by the DTM. From the CPU’s perspective this module behaves as another memory-mapped peripheral that can be accessed via the processor-internal bus. The memory-mapped registers provide an internal data buffer for data transfer from/to the DM, a code ROM containing the "park loop" code, a program buffer to allow the debugger to execute small programs defined by the DM and a status register that is used to communicate exception, halt, resume and execute requests/acknowledges from/to the DM.
-
Debug Authentication: Authenticator module to secure on-chip debugger access. This module implements a very simple authentication mechanism as an example. Users can modify/replace this default logic to implement an arbitrary authentication mechanism.
-
CPU Debug Mode ISA extension: This ISA extension provides the "debug execution mode" as another operation mode that is used to execute the park loop code from the DM. This mode also provides additional CSRs and instructions.
-
CPU Trigger Module: This module provides a single hardware breakpoint.
Theory of Operation
When debugging the system using the OCD, the debugger (like GDB) issues a halt request to the CPU to make it enter debug mode. In this mode the application-defined architectural state of the system/CPU is "frozen" so the debugger can monitor it without interfering with the actual application. However, the OCD can also modify the entire architectural state at any time. While in debug mode, the debugger has full control over the entire CPU and processor, operating in the highest-privileged mode.
While in debug mode, the CPU executes the "park loop" code from the code ROM of the debug module (DM). This park loop implements an endless loop, where the CPU polls a memory-mapped Status Register that is controlled by the DM. The flags in this register are used to communicate requests from the DM and to acknowledge them by the CPU: trigger execution of the program buffer or resume the halted application. Furthermore, the CPU uses this register to signal that the CPU has halted after a halt request or to signal that an exception has been raised while being in debug mode.
5.1. Debug Transport Module (DTM)
The debug transport module "DTM" (VHDL module: rtl/core/neorv32_debug_dtm.vhd
) provides a standard 4-wire JTAG test
access port ("tap") via the following top-level ports:
Name | Width | Direction | Description |
---|---|---|---|
jtag_tck_i | 1 | in | serial clock |
jtag_tdi_i | 1 | in | serial data input |
jtag_tdo_o | 1 | out | serial data output |
 | 1 | in | mode select |
Maximum JTAG Clock
All JTAG signals are synchronized to the processor’s clock domain. Hence, no additional clock domain is required for the DTM.
However, this constrains the maximum JTAG clock frequency (jtag_tck_i ) to be less than or equal to 1/5 of the processor
clock frequency (clk_i ).
|
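The clock constraint can be checked when configuring a debug adapter. The helpers below are a small sketch with hypothetical names; frequencies are given in Hz.

```c
#include <assert.h>
#include <stdint.h>

/* Highest allowed JTAG clock for a given processor clock (both in Hz). */
uint32_t jtag_tck_max_hz(uint32_t clk_hz) {
  assert(clk_hz > 0);
  return clk_hz / 5u;
}

/* Returns 1 if tck_hz <= clk_hz / 5, evaluated without rounding errors. */
int jtag_tck_is_valid(uint32_t clk_hz, uint32_t tck_hz) {
  return ((uint64_t)tck_hz * 5u) <= (uint64_t)clk_hz;
}
```

For a 100 MHz processor clock this gives an upper limit of 20 MHz for the JTAG clock.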
JTAG TAP Reset
The NEORV32 JTAG TAP does not provide a dedicated reset signal ("TRST"). However, the missing TRST is not a problem,
since JTAG-level resets can be triggered using TMS signaling.
|
Maintaining JTAG Chain
If the on-chip debugger is disabled the JTAG serial input jtag_tdi_i is directly
connected to the JTAG serial output jtag_tdo_o to maintain the JTAG chain.
|
JTAG accesses are based on a single 5-bit instruction register IR
and several data registers DR
with different sizes. The individual data registers are accessed by writing the according address to the instruction
register. The following table shows the available data registers and their addresses:
Address (via IR) | Name | Size (bits) | Description |
---|---|---|---|
|
|
32 |
identifier, version and part ID fields are hardwired to zero, manufacturer ID is assigned via the |
|
|
32 |
debug transport module control and status register (see below) |
|
|
41 |
debug module interface: 7-bit address, 32-bit read/write data, 2-bit operation ( |
others |
|
1 |
default JTAG bypass register |
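The 41-bit debug module interface register can be modeled as a packed word. The sketch below assumes the field order of the RISC-V debug specification (address in the upper bits, then data, operation in the two least-significant bits) and its operation encoding (0 = no operation, 1 = read, 2 = write); neither is spelled out in the table above, so treat both as assumptions. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* DMI operation codes as defined by the RISC-V debug specification. */
enum { DMI_OP_NOP = 0, DMI_OP_READ = 1, DMI_OP_WRITE = 2 };

/* Pack a 7-bit address, 32-bit data and 2-bit operation into 41 bits. */
uint64_t dmi_pack(uint8_t addr, uint32_t data, uint8_t op) {
  assert(addr < 0x80u && op < 4u);
  return ((uint64_t)addr << 34) | ((uint64_t)data << 2) | (uint64_t)op;
}

/* Field extractors for a 41-bit DMI word. */
uint8_t  dmi_addr(uint64_t word) { return (uint8_t)((word >> 34) & 0x7fu); }
uint32_t dmi_data(uint64_t word) { return (uint32_t)(word >> 2); }
uint8_t  dmi_op(uint64_t word)   { return (uint8_t)(word & 0x3u); }
```

A debugger front end would shift such a word through the data register selected by the instruction register; the shifting itself is not shown here.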
Bit(s) | Name | R/W | Description |
---|---|---|---|
31:18 |
- |
r/- |
reserved, hardwired to zero |
17 |
|
r/w |
setting this bit will reset the debug module interface; this bit auto-clears |
16 |
|
r/w |
setting this bit will clear the sticky error state; this bit auto-clears |
15 |
- |
r/- |
reserved, hardwired to zero |
14:12 |
|
r/- |
recommended idle states (= 0, no idle states required) |
11:10 |
|
r/- |
DMI status: |
9:4 |
|
r/- |
number of address bits in |
3:0 |
|
r/- |
|
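The bit positions of the control and status register listed above can be decoded as follows. This is a sketch only: the field names (`dmihardreset`, `dmireset`, `idle`, `dmistat`, `abits`, `version`) are taken from the RISC-V debug specification and are an assumption here, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DTMCS_DMIHARDRESET (1u << 17) /* reset the debug module interface */
#define DTMCS_DMIRESET     (1u << 16) /* clear the sticky error state */

/* Read-only fields of the DTM control and status register. */
typedef struct {
  uint8_t idle;     /* bits 14:12, recommended idle states */
  uint8_t dmistat;  /* bits 11:10, DMI status */
  uint8_t abits;    /* bits 9:4, number of DMI address bits */
  uint8_t version;  /* bits 3:0 */
} dtmcs_fields_t;

/* Split a value read from the register into its fields. */
dtmcs_fields_t dtmcs_decode(uint32_t dtmcs) {
  assert((dtmcs >> 18) == 0); /* bits 31:18 are hardwired to zero */
  dtmcs_fields_t f;
  f.idle    = (uint8_t)((dtmcs >> 12) & 0x7u);
  f.dmistat = (uint8_t)((dtmcs >> 10) & 0x3u);
  f.abits   = (uint8_t)((dtmcs >> 4) & 0x3fu);
  f.version = (uint8_t)(dtmcs & 0xfu);
  return f;
}
```

Writing `DTMCS_DMIRESET` or `DTMCS_DMIHARDRESET` to the register triggers the respective action; both bits auto-clear according to the table.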
5.2. Debug Module (DM)
The debug module "DM" (VHDL module: rtl/core/neorv32_debug_dm.vhd
) acts as a translation interface between abstract
operations issued by the debugger application (like GDB) and the platform-specific debugger hardware.
It supports the following features:
-
Gives the debugger necessary information about the implementation.
-
Allows the hart to be halted/resumed/reset and provides the current status.
-
Provides abstract read and write access to the halted hart’s general purpose registers.
-
Provides access to a reset signal that allows debugging from the very first instruction after reset.
-
Provides a program buffer to force the hart to execute arbitrary instructions.
-
Allows memory access from a hart’s point of view.
-
Optionally implements an authentication mechanism to secure on-chip debugger access.
The NEORV32 DM follows the "Minimal RISC-V External Debug Specification" to provide full debugging capabilities while keeping resource/area requirements at a minimum. It implements the execution-based debugging scheme for a single hart and provides the following architectural core features:
-
program buffer with 2 entries and an implicit ebreak instruction
-
indirect bus access via the CPU using the program buffer
-
abstract commands: "access register" plus auto-execution
-
halt-on-reset capability
-
optional authentication
DM Spec. Version
The NEORV32 DM complies with the RISC-V DM spec version 1.0.
|
From the DTM’s point of view, the DM implements a set of DM Registers that are used to control and monitor the debugging session. From the CPU’s point of view, the DM implements several memory-mapped registers that are used for communicating debugging control and status (DM CPU Access).
5.2.1. DM Registers
The DM is controlled via a set of registers that are accessed via the DTM. The following registers are implemented:
Unimplemented Registers
Write accesses to registers that are not implemented are simply ignored and read accesses
to these registers will always return zero.
|
Address | Name | Description |
---|---|---|
0x04 | data0 | Abstract data 0, used for data transfer between debugger and processor |
0x10 | dmcontrol | Debug module control |
0x11 | dmstatus | Debug module status |
0x12 | hartinfo | Hart information |
0x16 | abstractcs | Abstract control and status |
0x17 | command | Abstract command |
0x18 | | Abstract command auto-execution |
0x1d | | Base address of next DM; reads as zero to indicate there is only one DM |
0x20 | | Program buffer 0 |
0x21 | | Program buffer 1 |
0x30 | | Data to/from the authentication module |
0x38 | | System bus access control and status; reads as zero to indicate there is no system bus access |
data0
0x04 |
Abstract data 0 |
|
Reset value: |
||
Basic read/write data exchange register to be used with abstract commands (for example to read/write data from/to CPU GPRs). |
dmcontrol
0x10 |
Debug module control register |
|
Reset value: |
||
Control of the overall debug module and the hart. The following table shows all implemented bits. All remaining bits/bit-fields are configured as "zero" and are read-only. Writing '1' to these bits/fields will be ignored. |
Bit | Name [RISC-V] | R/W | Description |
---|---|---|---|
31 |
|
-/w |
set/clear hart halt request |
30 |
|
-/w |
request hart to resume |
28 |
|
-/w |
write |
1 |
|
r/w |
put whole system (except OCD) into reset state when |
0 |
|
r/w |
DM enable; writing |
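Based on the bit positions in the table above, values for the debug module control register could be composed as sketched below. The bit names (`haltreq`, `resumereq`, `ackhavereset`, `ndmreset`, `dmactive`) follow the RISC-V debug specification and are an assumption here; `dmcontrol_build` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define DMCONTROL_HALTREQ      (1u << 31) /* set/clear hart halt request */
#define DMCONTROL_RESUMEREQ    (1u << 30) /* request hart to resume */
#define DMCONTROL_ACKHAVERESET (1u << 28)
#define DMCONTROL_NDMRESET     (1u << 1)  /* reset whole system except OCD */
#define DMCONTROL_DMACTIVE     (1u << 0)  /* DM enable */

/* Compose a register value; the DM is always kept enabled in this sketch. */
uint32_t dmcontrol_build(int halt, int resume, int ndmreset) {
  assert(!(halt && resume)); /* requesting both at once makes no sense */
  uint32_t value = DMCONTROL_DMACTIVE;
  if (halt)     { value |= DMCONTROL_HALTREQ; }
  if (resume)   { value |= DMCONTROL_RESUMEREQ; }
  if (ndmreset) { value |= DMCONTROL_NDMRESET; }
  return value;
}
```

A halt request with the DM enabled therefore corresponds to the value 0x80000001.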
dmstatus
0x11 |
Debug module status register |
|
Reset value: |
||
Current status of the overall debug module and the hart. The entire register is read-only. |
Bit | Name [RISC-V] | Description |
---|---|---|
31:23 |
reserved |
reserved; zero |
22 |
|
|
21:20 |
reserved |
reserved; zero |
19 |
|
|
18 |
|
|
17 |
|
|
16 |
|
|
15 |
|
zero to indicate the hart is always existent |
14 |
|
|
13 |
|
|
12 |
|
|
11 |
|
|
10 |
|
|
9 |
|
|
8 |
|
|
7 |
|
set if authentication passed; see Debug Authentication |
6 |
|
set if authentication is busy, see Debug Authentication |
5 |
|
|
4 |
|
|
3:0 |
|
|
hartinfo
0x12 |
Hart information |
|
Reset value: see below |
||
This register gives information about the hart. The entire register is read-only. |
Bit | Name [RISC-V] | Description |
---|---|---|
31:24 |
reserved |
reserved; zero |
23:20 |
|
|
19:17 |
reserved |
reserved; zero |
16 |
|
|
15:12 |
|
|
11:0 |
|
= |
abstractcs
0x16 |
Abstract control and status |
|
Reset value: |
||
Command execution info and status. |
Bit | Name [RISC-V] | R/W | Description |
---|---|---|---|
31:29 |
reserved |
r/- |
reserved; zero |
28:24 |
|
r/- |
|
23:13 |
reserved |
r/- |
reserved; zero |
12 |
|
r/- |
set when a command is being executed |
11 |
|
r/- |
|
10:8 |
|
r/w |
error during command execution (see below); has to be cleared by writing |
7:4 |
reserved |
r/- |
reserved; zero |
3:0 |
|
r/- |
|
Error codes in cmderr (highest priority first):
-
000 - no error
-
100 - command cannot be executed since hart is not in expected state
-
011 - exception during command execution
-
010 - unsupported command
-
001 - invalid DM register read/write while command is/was executing
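A debugger-side helper could translate these codes into readable text. The sketch below uses the bit positions from the register table (busy flag in bit 12, cmderr in bits 10:8) and the error codes listed above; the function names are hypothetical, and `abstractcs` is used as the name of the register at address 0x16.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 3-bit cmderr field (bits 10:8). */
uint8_t abstractcs_cmderr(uint32_t abstractcs) {
  return (uint8_t)((abstractcs >> 8) & 0x7u);
}

/* Returns 1 while a command is being executed (bit 12). */
int abstractcs_busy(uint32_t abstractcs) {
  return (int)((abstractcs >> 12) & 0x1u);
}

/* Map a cmderr code to the description given in this section. */
const char *cmderr_str(uint8_t cmderr) {
  assert(cmderr < 8u);
  switch (cmderr) {
    case 0:  return "no error";
    case 1:  return "invalid DM register read/write while command is/was executing";
    case 2:  return "unsupported command";
    case 3:  return "exception during command execution";
    case 4:  return "command cannot be executed since hart is not in expected state";
    default: return "code not listed in this section";
  }
}
```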
command
0x17 |
Abstract command |
|
Reset value: |
||
Writing this register will trigger the execution of an abstract command. New command can only be executed if
|
The NEORV32 DM only supports Access Register abstract commands. These commands can only access the
hart’s GPRs x0 - x15/31 (abstract command register index 0x1000 - 0x101f ).
|
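An Access Register command word for a GPR could be assembled as in the following sketch. The register index range 0x1000 - 0x101f is taken from the note above; the field layout (cmdtype in bits 31:24, aarsize in bits 22:20, transfer in bit 17, write in bit 16, regno in bits 15:0) is the one defined by the RISC-V debug specification and should be checked against the register table of this section. `access_reg_cmd` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Build an "access register" abstract command for GPR x<gpr>.
 * write = 0: copy the GPR to data0, write = 1: copy data0 to the GPR. */
uint32_t access_reg_cmd(unsigned int gpr, int write) {
  assert(gpr < 32u);
  uint32_t cmd = 0;
  cmd |= 0u << 24;                 /* cmdtype = 0: access register */
  cmd |= 2u << 20;                 /* aarsize = 2: 32-bit access */
  cmd |= 1u << 17;                 /* transfer = 1: perform the data transfer */
  cmd |= (write ? 1u : 0u) << 16;  /* transfer direction */
  cmd |= 0x1000u + gpr;            /* regno: GPR index range starts at 0x1000 */
  return cmd;
}
```

Reading x10, for example, corresponds to the command word 0x0022100a.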
Bit | Name [RISC-V] | R/W | Description / required value |
---|