

1. Overview

The NEORV32[1] is an open-source RISC-V compatible processor system that is intended as a ready-to-go auxiliary processor within larger SoC designs or as a stand-alone, customizable microcontroller.

The system is highly configurable and provides optional common peripherals like embedded memories, timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like memories, NoCs and other peripherals. On-line and in-system debugging is supported by an OpenOCD/gdb compatible on-chip debugger accessible via JTAG.

The software framework of the processor comes with application makefiles, software libraries for all CPU and processor features, a bootloader, a runtime environment and several example programs – including a port of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as default toolchain (prebuilt toolchains are also provided).

Check out the processor’s online User Guide, which provides hands-on tutorials to get you started.
The project’s change log is available in CHANGELOG.md in the root directory of the NEORV32 repository. Please also check out the Legal section.

Structure

Links in this document are highlighted.

1.1. Rationale

Why did you make this?

I am fascinated by processor and CPU architecture design: it is the magic frontier where software meets hardware. This project started as something like a journey into this magic realm to understand how things actually work down on this very low level.

But there is more! When I started to dive into the emerging RISC-V ecosystem I felt overwhelmed by the complexity. As a beginner it is hard to get an overview, especially when you want to set up a minimal platform to tinker with: Which core to use? How to get the right toolchain? What features do I need? How does the booting work? How do I create an actual executable? How to get that into the hardware? How to customize things? Where to start?

So this project aims to provide a simple-to-understand and easy-to-use yet powerful and flexible platform that targets FPGA and RISC-V beginners as well as advanced users. Join me and us on this journey! 🙃

Why a soft-core processor?

As a matter of fact, soft-core processors cannot compete with discrete or FPGA hard-macro processors in terms of performance, energy and size. But they do fill a niche in the FPGA design space. For example, soft-core processors make it possible to implement the control flow part of certain applications (like communication protocol handling) in software, for example in plain C. This provides high flexibility, as software can easily be changed, re-compiled and re-uploaded.

Furthermore, the concept of flexibility applies to all aspects of a soft-core processor. The user can add exactly the features that are required by the application: additional memories, custom interfaces, specialized IP and even user-defined instructions.

Why RISC-V?

RISC-V is a free and open ISA enabling a new era of processor innovation through open standard collaboration.

— RISC-V International
https://riscv.org/about/

I love the idea of open-source. Knowledge can help best if it is freely available. While open-source has already become quite popular in software, hardware projects still need to catch up. Admittedly, there has been quite a development, but mainly in terms of platforms and applications (so schematics, PCBs, etc.). Although processors and CPUs are the heart of almost every digital system, truly open-source silicon is still a rarity. RISC-V aims to change that. Even if it is just one approach, it helps pave the road for future development.

Furthermore, I welcome the community aspect of RISC-V. The ISA and everything beyond is developed in direct contact with the community: this includes businesses and professionals but also hobbyists, amateurs and people that are just curious. Everyone can join discussions and contribute to RISC-V in their very own way.

Finally, I really like the RISC-V ISA itself. It aims to be a clean, orthogonal and "intuitive" ISA that reflects the basic concepts of RISC: simple yet effective.

Yet another RISC-V core? What makes it special?

The NEORV32 is not based on another RISC-V core. It was built entirely from the ground up (just following the official ISA specs) with a different design goal in mind. The project does not intend to replace certain RISC-V cores or just beat existing ones like VexRiscv in terms of performance or SERV in terms of size.

The project aims to provide another option in the RISC-V / soft-core design space with a different performance vs. size trade-off and a different focus: embrace concepts like documentation, platform-independence / portability, RISC-V compatibility, customization and ease of use. See the Project Key Features below.

1.2. Project Key Features

  • open-source and documented; including user guides to get started

  • completely described in behavioral, platform-independent VHDL (yet platform-optimized modules are provided)

  • fully synchronous design, no latches, no gated clocks

  • small hardware footprint and high operating frequency for easy integration

  • NEORV32 CPU: 32-bit rv32i RISC-V CPU

    • RISC-V compatibility: passes the official architecture tests

    • base architecture + privileged architecture (optional) + ISA extensions (optional)

    • rich set of customization options (ISA extensions, design goal: performance / area (/ energy), …​)

    • official RISC-V open source architecture ID

  • NEORV32 Processor (SoC): highly-configurable full-scale microcontroller-like processor system

    • based on the NEORV32 CPU

    • optional serial interfaces (UARTs, TWI, SPI)

    • optional timers and counters (WDT, MTIME, NCO)

    • optional general purpose IO and PWM and native NeoPixel (c) compatible smart LED interface

    • optional embedded memories / caches for data, instructions and bootloader

    • optional external memory interface for custom connectivity (Wishbone or AXI4-Lite)

    • on-chip debugger compatible with OpenOCD and gdb

  • Software framework

    • GCC-based toolchain - prebuilt toolchains available; application compilation based on GNU makefiles

    • internal bootloader with serial user interface

    • core libraries for high-level usage of the provided functions and peripherals

    • runtime environment and several example programs

    • doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html

    • FreeRTOS port + demos available

For more in-depth details regarding the features provided by the hardware, see the corresponding sections: NEORV32 Central Processing Unit (CPU) and NEORV32 Processor (SoC).

1.3. Project Folder Structure

neorv32            - Project home folder
├.ci              - Scripts for continuous integration
├setups           - Example setups for various FPGA boards and toolchains
│└...
├CHANGELOG.md     - Project change log
├docs             - Project documentation
│├doxygen_build  - Software framework documentation (generated by doxygen)
│├src_adoc       - AsciiDoc sources for this document
│├references     - Data sheets and RISC-V specs.
│└figures        - Figures and logos
├riscv-arch-test  - Port files for the official RISC-V architecture tests
├rtl              - VHDL sources
│├core           - Sources of the CPU & SoC
│└templates      - Alternate/additional top entities/wrappers
│ ├processor      - Processor wrappers
│ └system         - System wrappers for advanced connectivity
├sim              - Simulation files
│└rtl_modules    - Processor modules for simulation-only
└sw               - Software framework
 ├bootloader      - Sources and scripts for the NEORV32 internal bootloader
 ├common          - Linker script and crt0.S start-up code
 ├example         - Various example programs
 │└...
 ├ocd_firmware    - source code for on-chip debugger's "park loop"
 ├openocd         - OpenOCD on-chip debugger configuration files
 ├image_gen       - Helper program to generate NEORV32 executables
 └lib             - Processor core library
  ├include        - Header files (*.h)
  └source         - Source files (*.c)
There are further files and folders starting with a dot which – for example – contain data/configurations only relevant for git or for the continuous integration framework (.ci).

1.4. VHDL File Hierarchy

All necessary VHDL hardware description files are located in the project’s rtl/core folder. The top entity of the entire processor including all the required configuration generics is neorv32_top.vhd.

All core VHDL files from the list below have to be assigned to a new design library named neorv32. Additional files, like alternative top entities, can be assigned to any library.
neorv32_top.vhd                      - NEORV32 Processor top entity
├neorv32_boot_rom.vhd               - Bootloader ROM
│└neorv32_bootloader_image.vhd     - Bootloader boot ROM memory image
├neorv32_busswitch.vhd              - Processor bus switch for CPU buses (I&D)
├neorv32_bus_keeper.vhd             - Processor-internal bus monitor
├neorv32_icache.vhd                 - Processor-internal instruction cache
├neorv32_cfs.vhd                    - Custom functions subsystem
├neorv32_cpu.vhd                    - NEORV32 CPU top entity
│├neorv32_package.vhd              - Processor/CPU main VHDL package file
│├neorv32_cpu_alu.vhd              - Arithmetic/logic unit
│├neorv32_cpu_bus.vhd              - Bus interface unit + physical memory protection
│├neorv32_cpu_control.vhd          - CPU control, exception/IRQ system and CSRs
││└neorv32_cpu_decompressor.vhd   - Compressed instructions decoder
│├neorv32_cpu_cp_fpu.vhd           - Floating-point co-processor (Zfinx extension)
│├neorv32_cpu_cp_muldiv.vhd        - Mul/Div co-processor (M extension)
│└neorv32_cpu_regfile.vhd          - Data register file
├neorv32_debug_dm.vhd               - on-chip debugger: debug module
├neorv32_debug_dtm.vhd              - on-chip debugger: debug transfer module
├neorv32_dmem.vhd                   - Processor-internal data memory
├neorv32_gpio.vhd                   - General purpose input/output port unit
├neorv32_imem.vhd                   - Processor-internal instruction memory
│└neorv32_application_image.vhd    - IMEM application initialization image
├neorv32_mtime.vhd                  - Machine system timer
├neorv32_nco.vhd                    - Numerically-controlled oscillator
├neorv32_neoled.vhd                 - NeoPixel (TM) compatible smart LED interface
├neorv32_pwm.vhd                    - Pulse-width modulation controller
├neorv32_spi.vhd                    - Serial peripheral interface controller
├neorv32_sysinfo.vhd                - System configuration information memory
├neorv32_trng.vhd                   - True random number generator
├neorv32_twi.vhd                    - Two wire serial interface controller
├neorv32_uart.vhd                   - Universal async. receiver/transmitter
├neorv32_wdt.vhd                    - Watchdog timer
└neorv32_wb_interface.vhd           - External (Wishbone) bus interface

1.5. FPGA Implementation Results

This chapter shows exemplary implementation results of the NEORV32 CPU and Processor. Please note that the provided results are only a relative measure: logic functions of different modules might be merged across entity boundaries, so the actual utilization results might vary a bit.

1.5.1. CPU

Hardware version: 1.5.5.5
Top entity:       rtl/core/neorv32_cpu.vhd

CPU                                LEs   FFs   MEM bits  DSPs  fmax
rv32i                              980   409   1024      0     123 MHz
rv32i_Zicsr                        1835  856   1024      0     124 MHz
rv32im_Zicsr                       2443  1134  1024      0     124 MHz
rv32imc_Zicsr                      2669  1149  1024      0     125 MHz
rv32imac_Zicsr                     2685  1156  1024      0     124 MHz
rv32imac_Zicsr + debug_mode        3058  1225  1024      0     120 MHz
rv32imac_Zicsr + u                 2698  1162  1024      0     124 MHz
rv32imac_Zicsr_Zifencei + u        2715  1162  1024      0     122 MHz
rv32imac_Zicsr_Zifencei_Zfinx + u  4004  1812  1024      7     121 MHz

1.5.2. Processor Modules

Hardware version: 1.5.5.9
Top entity:       rtl/core/neorv32_top.vhd

Table 1. Hardware utilization by the processor modules (mandatory core modules in bold)
Module     Description                                          LEs  FFs  MEM bits  DSPs
Boot ROM   Bootloader ROM (4kB)                                 3    1    32768     0
BUSKEEPER  Processor-internal bus monitor                       11   6    0         0
BUSSWITCH  Bus mux for CPU instr. and data interface            49   8    0         0
CFS        Custom functions subsystem                           -    -    -         -
DMEM       Processor-internal data memory (8kB)                 18   2    65536     0
DM         On-chip debugger - debug module                      493  240  0         0
DTM        On-chip debugger - debug transfer module (JTAG)      254  218  0         0
GPIO       General purpose input/output ports                   67   65   0         0
iCACHE     Instruction cache (1x4 blocks, 256 bytes per block)  220  154  8192      0
IMEM       Processor-internal instruction memory (16kB)         6    2    131072    0
MTIME      Machine system timer                                 289  200  0         0
NCO        Numerically-controlled oscillator                    254  226  0         0
NEOLED     Smart LED Interface (NeoPixel/WS2812) [4xFIFO]       347  309  0         0
PWM        Pulse-width modulation controller (4 channels)       71   69   0         0
SPI        Serial peripheral interface                          138  124  0         0
SYSINFO    System configuration information memory              10   10   0         0
TRNG       True random number generator                         132  105  0         0
TWI        Two-wire interface                                   77   44   0         0
UART0/1    Universal asynchronous receiver/transmitter 0/1      176  132  0         0
WDT        Watchdog timer                                       60   45   0         0
WISHBONE   External memory interface                            129  104  0         0

1.5.3. Exemplary Setups

Check out the example setups in the setups folder (@GitHub: https://github.com/stnolting/neorv32/tree/master/setups), which provides (script-based) demo setups for various FPGA boards and toolchains.

The following table shows exemplary NEORV32 processor implementation results for different FPGA platforms. Most setups use the default peripheral configuration (like no CFS, no caches and no TRNG), no external memory interface and only internal instruction and data memories (IMEM uses 16kB and DMEM uses 8kB memory space).

Hardware version: 1.4.9.0

Table 2. Hardware utilization for exemplary NEORV32 setups
Vendor   FPGA                            Board             Toolchain                CPU                            LUT         FF          DSP     Memory                         f
Intel    Cyclone IV EP4CE22F17-C6N       Terasic DE0-Nano  Quartus Prime Lite 20.1  rv32imcu_Zicsr_Zifencei + PMP  3813 (17%)  1890 (8%)   0 (0%)  Memory bits: 231424 (38%)      119 MHz
Lattice  iCE40 UltraPlus iCE40UP5KSG48I  Upduino v3.0      Radiant 2.1              rv32icu_Zicsr_Zifencei         5123 (97%)  1972 (37%)  0 (0%)  EBR: 12 (40%) SPRAM: 4 (100%)  24 MHz
Xilinx   Artix-7 XC7A35TICSG324-1L       Arty A7-35T       Vivado 2019.2            rv32imcu_Zicsr_Zifencei + PMP  2465 (12%)  1912 (5%)   0 (0%)  BRAM: 8 (16%)                  100 MHz

Notes

  • The Lattice iCE40 UltraPlus setup uses the FPGA’s SPRAM memory primitives for the internal IMEM and DMEM (each 64kB).

  • The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32 bootloader to store and automatically boot an application program after reset (both tested successfully).

  • The setups with PMP implement 2 regions with a minimal granularity of 64kB.

  • No HPM counters are used.

1.6. CPU Performance

1.6.1. CoreMark Benchmark

Table 3. Configuration
Hardware:        32kB IMEM, 16kB DMEM, no caches, 100MHz clock
CoreMark:        2000 iterations, MEM_METHOD is MEM_STACK
Compiler:        RISCV32-GCC 10.1.0
Peripherals:     UART for printing the results
Compiler flags:  default, see makefile

The performance of the NEORV32 was tested and evaluated using the CoreMark CPU benchmark. This benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole system. The corresponding source code and the SW project can be found in the sw/example/coremark folder.

The resulting CoreMark score is defined as CoreMark iterations per second. The execution time is determined via the RISC-V [m]cycle[h] CSRs. The relative CoreMark score is defined as CoreMark score divided by the CPU’s clock frequency in MHz.
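
The two definitions above can be written down as a short calculation. The following is a minimal sketch, not part of the NEORV32 software framework: the function name and the cycle count in the usage example are hypothetical and were chosen only to illustrate the arithmetic for a 100 MHz clock.

```python
def coremark_scores(iterations, cycles, clock_hz):
    """Return (CoreMark score, CoreMarks/MHz) as defined in the text.

    iterations: number of CoreMark iterations that were run
    cycles:     elapsed clock cycles, e.g. read from the [m]cycle[h] CSRs
    clock_hz:   CPU clock frequency in Hz
    """
    seconds = cycles / clock_hz                    # execution time
    score = iterations / seconds                   # iterations per second
    relative = score / (clock_hz / 1_000_000)      # score per MHz
    return score, relative


# Hypothetical measurement: 2000 iterations taking 5.5e9 cycles at 100 MHz.
score, relative = coremark_scores(2000, 5_500_000_000, 100_000_000)
print(f"{score:.2f} CoreMarks, {relative:.4f} CoreMarks/MHz")
```

Note that the benchmark's own timing resolution determines the reported score, so values computed from raw cycle counts may differ slightly from the figures printed by CoreMark itself.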

Hardware version: 1.4.9.8

Table 4. CoreMark results
CPU (incl. Zicsr)                      Executable size  CoreMark score  CoreMarks/MHz
rv32i                                  28756 bytes      36.36           0.3636
rv32im                                 27516 bytes      68.97           0.6897
rv32imc                                22008 bytes      68.97           0.6897
rv32imc + FAST_MUL_EN                  22008 bytes      86.96           0.8696
rv32imc + FAST_MUL_EN + FAST_SHIFT_EN  22008 bytes      90.91           0.9091

All executables were generated using maximum optimization -O3. The FAST_MUL_EN configuration uses DSPs for the multiplier of the M extension (enabled via the FAST_MUL_EN generic). The FAST_SHIFT_EN configuration uses a barrel shifter for CPU shift operations (enabled via the FAST_SHIFT_EN generic).

1.6.2. Instruction Timing

The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of several consecutive micro operations. Hence, each instruction requires several clock cycles to execute.

The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on the available CPU extensions. The following table shows the performance results for successfully (!) running 2000 CoreMark iterations.

The average CPI is computed by dividing the total number of required clock cycles (only the timed core to avoid distortion due to IO wait cycles) by the number of executed instructions ([m]instret[h] CSRs). The executables were generated using optimization -O3.

Hardware version: 1.4.9.8

Table 5. CoreMark instruction timing
CPU (incl. Zicsr)                      Required clock cycles  Executed instructions  Average CPI
rv32i                                  5595750503             1466028607             3.82
rv32im                                 2966086503             598651143              4.95
rv32imc                                2981786734             611814918              4.87
rv32imc + FAST_MUL_EN                  2399234734             611814918              3.92
rv32imc + FAST_MUL_EN + FAST_SHIFT_EN  2265135174             611814948              3.70

The FAST_MUL_EN configuration uses DSPs for the multiplier of the M extension (enabled via the FAST_MUL_EN generic). The FAST_SHIFT_EN configuration uses a barrel shifter for CPU shift operations (enabled via the FAST_SHIFT_EN generic).
More information regarding the execution time of each implemented instruction can be found in chapter Instruction Timing.
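
As a cross-check, the average CPI values can be recomputed from the cycle and instruction counts listed in Table 5. The snippet below is only an illustrative sketch; the dictionary and function names are made up for this example, while the numbers are taken from the table.

```python
# (required clock cycles, executed instructions) per configuration, from Table 5
TABLE_5 = {
    "rv32i": (5595750503, 1466028607),
    "rv32im": (2966086503, 598651143),
    "rv32imc": (2981786734, 611814918),
    "rv32imc + FAST_MUL_EN": (2399234734, 611814918),
    "rv32imc + FAST_MUL_EN + FAST_SHIFT_EN": (2265135174, 611814948),
}


def average_cpi(cycles, instructions):
    """Average cycles per instruction: total cycles / retired instructions."""
    return cycles / instructions


for cpu, (cycles, instret) in TABLE_5.items():
    print(f"{cpu}: {average_cpi(cycles, instret):.2f}")
```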

2. NEORV32 Processor (SoC)

The NEORV32 Processor is based on the NEORV32 CPU. Together with common peripheral interfaces and embedded memories it provides a RISC-V-based full-scale microcontroller-like SoC platform.

neorv32 processor

Key Features

  • optional processor-internal data and instruction memories (DMEM/IMEM) + cache (iCACHE)

  • optional internal bootloader (BOOTROM) with UART console & SPI flash boot option

  • optional machine system timer (MTIME), RISC-V-compatible

  • optional two independent universal asynchronous receivers and transmitters (UART0, UART1) with optional hardware flow control (RTS/CTS)

  • optional 8/16/24/32-bit serial peripheral interface controller (SPI) with 8 dedicated CS lines

  • optional two wire serial interface controller (TWI), compatible to the I²C standard

  • optional general purpose parallel IO port (GPIO), 32xOut, 32xIn

  • optional 32-bit external bus interface, Wishbone b4 / AXI4-Lite compatible (WISHBONE)

  • optional watchdog timer (WDT)

  • optional PWM controller with up to 60 channels & 8-bit duty cycle resolution (PWM)

  • optional ring-oscillator-based true random number generator (TRNG)

  • optional custom functions subsystem for custom co-processor extensions (CFS)

  • optional numerically-controlled oscillator (NCO) with 3 independent channels

  • optional NeoPixel™/WS2812-compatible smart LED interface (NEOLED)

  • optional on-chip debugger with JTAG TAP (OCD)

  • system configuration information memory to check HW configuration via software (SYSINFO)

2.1. Processor Top Entity - Signals

The following table shows all interface ports of the processor top entity (rtl/core/neorv32_top.vhd). The type of all signals is std_ulogic or std_ulogic_vector, respectively.

A wrapper for the NEORV32 Processor setup providing resolved port signals can be found in rtl/templates/processor/neorv32_ProcessorTop_stdlogic.vhd.
Signal       Width  Dir.   Function

Global Control
clk_i        1      in     global clock line, all registers triggering on rising edge
rstn_i       1      in     global reset, asynchronous, low-active

JTAG Access Port for On-Chip Debugger (OCD)
jtag_trst_i  1      in     TAP reset, low-active (optional[2])
jtag_tck_i   1      in     serial clock
jtag_tdi_i   1      in     serial data input
jtag_tdo_o   1      out    serial data output[3]
jtag_tms_i   1      in     mode select

External Bus Interface (WISHBONE)
wb_tag_o     3      out    tag (access type identifier)
wb_adr_o     32     out    destination address
wb_dat_i     32     in     read data
wb_dat_o     32     out    write data
wb_we_o      1      out    write enable ('0' = read transfer)
wb_sel_o     4      out    byte enable
wb_stb_o     1      out    strobe
wb_cyc_o     1      out    valid cycle
wb_lock_o    1      out    exclusive access request
wb_ack_i     1      in     transfer acknowledge
wb_err_i     1      in     transfer error

Advanced Memory Control Signals
fence_o      1      out    indicates an executed fence instruction
fencei_o     1      out    indicates an executed fence.i instruction

General Purpose Inputs & Outputs (GPIO)
gpio_o       32     out    general purpose parallel output
gpio_i       32     in     general purpose parallel input

Primary Universal Asynchronous Receiver/Transmitter (UART0)
uart0_txd_o  1      out    UART0 serial transmitter
uart0_rxd_i  1      in     UART0 serial receiver
uart0_rts_o  1      out    UART0 RX ready to receive new char
uart0_cts_i  1      in     UART0 TX allowed to start sending

Secondary Universal Asynchronous Receiver/Transmitter (UART1)
uart1_txd_o  1      out    UART1 serial transmitter
uart1_rxd_i  1      in     UART1 serial receiver
uart1_rts_o  1      out    UART1 RX ready to receive new char
uart1_cts_i  1      in     UART1 TX allowed to start sending

Serial Peripheral Interface Controller (SPI)
spi_sck_o    1      out    SPI controller clock line
spi_sdo_o    1      out    SPI serial data output
spi_sdi_i    1      in     SPI serial data input
spi_csn_o    8      out    SPI dedicated chip select (low-active)

Two-Wire Interface Controller (TWI)
twi_sda_io   1      inout  TWI serial data line
twi_scl_io   1      inout  TWI serial clock line

Custom Functions Subsystem (CFS)
cfs_in_i     32     in     custom CFS input signal conduit
cfs_out_o    32     out    custom CFS output signal conduit

Pulse-Width Modulation Channels (PWM)
pwm_o        4      out    pulse-width modulated channels

Numerically-Controlled Oscillator (NCO)
nco_o        3      out    NCO output channels

Smart LED Interface - NeoPixel™ compatible (NEOLED)
neoled_o     1      out    asynchronous serial data output

System time (MTIME)
mtime_i      64     in     machine timer time (to time[h] CSRs) from external MTIME unit if the processor-internal MTIME unit is NOT implemented
mtime_o      64     out    machine timer time from internal MTIME unit if processor-internal MTIME unit IS implemented

Processor Interrupts
nm_irq_i     1      in     non-maskable interrupt
soc_firq_i   6      in     platform fast interrupt channels (custom)
mtime_irq_i  1      in     machine timer interrupt (RISC-V)
msw_irq_i    1      in     machine software interrupt (RISC-V)
mext_irq_i   1      in     machine external interrupt (RISC-V)

2.2. Processor Top Entity - Generics

This is a list of all configuration generics of the NEORV32 processor top entity rtl/core/neorv32_top.vhd. The generic name is shown in orange, followed by the type printed in black and the default value printed in light gray.

The NEORV32 generics make it possible to configure the system according to your needs. The generics control the implementation of certain CPU extensions and peripheral modules, and can even be used to optimize the system for certain design goals like minimal area or maximum performance.
Privileged software can determine the actual CPU and processor configuration via the misa and mzext (see Machine Trap Setup and NEORV32-Specific Custom CSRs) CSRs and via the memory-mapped SYSINFO module (see System Configuration Information Memory (SYSINFO)), respectively.
If optional modules (like CPU extensions or peripheral devices) are not enabled the according circuitry will not be synthesized at all. Hence, the disabled modules do not increase area and power requirements and do not impact the timing.
Not all configuration combinations are valid. The processor RTL code provides sanity checks to inform the user during synthesis/simulation if an invalid combination has been detected.

Generic Description

The description of each generic provides the following summary:

Table 6. Generic description

Generic

type

default value

Description

2.2.1. General

See section System Configuration Information Memory (SYSINFO) for more information.

CLOCK_FREQUENCY

CLOCK_FREQUENCY

natural

0

The clock frequency of the processor’s clk_i input port in Hertz (Hz).

INT_BOOTLOADER_EN

INT_BOOTLOADER_EN

boolean

true

Implement the processor-internal boot ROM, pre-initialized with the default bootloader image when true. This will also change the processor’s boot address from the beginning of the instruction memory address space (default = 0x00000000) to the base address of the boot ROM. See section Boot Configuration for more information.

USER_CODE

USER_CODE

std_ulogic_vector(31 downto 0)

x"00000000"

Custom user code that can be read by software via the SYSINFO module.

HW_THREAD_ID

HW_THREAD_ID

natural

0

The hart ID of the CPU. Can be read via the mhartid CSR. Hart IDs must be unique within a system.

ON_CHIP_DEBUGGER_EN

ON_CHIP_DEBUGGER_EN

boolean

false

Implement on-chip debugger (OCD). See chapter On-Chip Debugger (OCD).

2.2.2. RISC-V CPU Extensions

See section Instruction Sets and Extensions for more information.

CPU_EXTENSION_RISCV_A

CPU_EXTENSION_RISCV_A

boolean

false

Implement atomic memory access operations when true.

CPU_EXTENSION_RISCV_C

CPU_EXTENSION_RISCV_C

boolean

false

Implement compressed instructions (16-bit) when true.

CPU_EXTENSION_RISCV_E

CPU_EXTENSION_RISCV_E

boolean

false

Implement the embedded CPU extension (only implement the first 16 data registers) when true.

CPU_EXTENSION_RISCV_M

CPU_EXTENSION_RISCV_M

boolean

false

Implement integer multiplication and division instructions when true.

CPU_EXTENSION_RISCV_U

CPU_EXTENSION_RISCV_U

boolean

false

Implement less-privileged user mode when true.

CPU_EXTENSION_RISCV_Zfinx

CPU_EXTENSION_RISCV_Zfinx

boolean

false

Implement the 32-bit single-precision floating-point extension (using integer registers) when true. For more information see section Zfinx Single-Precision Floating-Point Operations.

CPU_EXTENSION_RISCV_Zicsr

CPU_EXTENSION_RISCV_Zicsr

boolean

true

Implement the control and status register (CSR) access instructions when true. Note: When this option is disabled, the complete privileged architecture / trap system will be excluded from synthesis. Hence, no interrupts, no exceptions and no machine information will be available.

CPU_EXTENSION_RISCV_Zifencei

CPU_EXTENSION_RISCV_Zifencei

boolean

false

Implement the instruction fetch synchronization instruction fence.i. For example, this option is required for self-modifying code (and/or for i-cache flushes).

2.2.3. Extension Options

See section Instruction Sets and Extensions for more information.

FAST_MUL_EN

FAST_MUL_EN

boolean

false

When this generic is enabled, the multiplier of the M extension is realized using DSP blocks instead of an iterative bit-serial approach. This generic is only relevant when the multiplier and divider CPU extension is enabled (CPU_EXTENSION_RISCV_M is true).

FAST_SHIFT_EN

FAST_SHIFT_EN

boolean

false

When this generic is enabled, the shifter unit of the CPU’s ALU is implemented as a fast barrel shifter (requiring more hardware resources).

TINY_SHIFT_EN

TINY_SHIFT_EN

boolean

false

If this generic is enabled, the shifter unit of the CPU’s ALU is implemented as a (slow but tiny) single-bit iterative shifter. A shift operation then requires up to 32 clock cycles, but the hardware footprint is reduced. The configuration of this generic is ignored if FAST_SHIFT_EN is true.
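
The trade-off between the two shifter options can be illustrated with a small behavioral model. This is only a sketch in Python, not the actual RTL: the function names are hypothetical, and the step count is a model-level quantity that ignores any additional control cycles the real hardware may need.

```python
MASK32 = 0xFFFFFFFF  # 32-bit data path


def shift_left_iterative(value, shamt):
    """Single-bit iterative shifter model: one bit position per step.

    Returns (result, steps), where steps grows with the shift amount.
    """
    result = value & MASK32
    steps = 0
    for _ in range(shamt & 0x1F):  # RV32 uses the lower 5 bits as shift amount
        result = (result << 1) & MASK32
        steps += 1
    return result, steps


def shift_left_barrel(value, shamt):
    """Barrel shifter model: the full shift is done in a single step."""
    return ((value & MASK32) << (shamt & 0x1F)) & MASK32


result, steps = shift_left_iterative(1, 31)
print(hex(result), steps)
print(hex(shift_left_barrel(1, 31)))
```

Both models produce the same result; the iterative variant trades execution time for less logic, which is the intent behind TINY_SHIFT_EN.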

CPU_CNT_WIDTH

CPU_CNT_WIDTH

natural

0

This generic configures the total size of the CPU’s cycle and instret CSRs (low word + high word). The maximum value is 64, the minimal is 0. See section (Machine) Counters and Timers for more information. Note: Configurations with CPU_CNT_WIDTH less than 64 are not RISC-V compliant.

2.2.4. Physical Memory Protection (PMP)

See section PMP Physical Memory Protection for more information.

PMP_NUM_REGIONS

PMP_NUM_REGIONS

natural

0

Total number of implemented protection regions (0..64). If this generic is zero, no physical memory protection logic will be implemented at all. Setting PMP_NUM_REGIONS > 0 will set the CSR_MZEXT_PMP flag in the mzext CSR.

PMP_MIN_GRANULARITY

PMP_MIN_GRANULARITY

natural

64*1024

Minimal region granularity in bytes. Has to be a power of two. Has to be at least 8 bytes.
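
The constraints on the two PMP generics can be summarized as a small check. This is a hedged sketch for illustration only; the helper names are hypothetical, and the processor's own RTL sanity checks remain the authoritative reference.

```python
def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def check_pmp_config(num_regions, min_granularity):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if not 0 <= num_regions <= 64:
        problems.append("PMP_NUM_REGIONS must be in 0..64")
    if not is_power_of_two(min_granularity):
        problems.append("PMP_MIN_GRANULARITY must be a power of two")
    if min_granularity < 8:
        problems.append("PMP_MIN_GRANULARITY must be at least 8 bytes")
    return problems


# 2 regions with 64kB granularity, as used by the exemplary setups with PMP
print(check_pmp_config(2, 64 * 1024))
# a granularity below the 8-byte minimum
print(check_pmp_config(2, 4))
```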

2.2.5. Hardware Performance Monitors (HPM)

See section HPM Hardware Performance Monitors for more information.

HPM_NUM_CNTS

HPM_NUM_CNTS

natural

0

Total number of implemented hardware performance monitor counters (0..29). If this generic is zero, no hardware performance monitor logic will be implemented at all. Setting HPM_NUM_CNTS > 0 will set the CSR_MZEXT_HPM flag in the mzext CSR.

HPM_CNT_WIDTH

HPM_CNT_WIDTH

natural

40

This generic defines the total LSB-aligned size of each HPM counter (size([m]hpmcounter*[h])). The maximum value is 64, the minimal is 0. If the size is less than 64 bits, the unused MSB-aligned counter bits are hardwired to zero.
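
The effect of a reduced counter width on the values seen by software can be modeled as a simple bit mask. The sketch below is illustrative only; the function name is hypothetical, and it assumes the usual RV32 split of a 64-bit counter into a low word ([m]hpmcounter*) and a high word ([m]hpmcounter*h).

```python
def hpm_counter_words(raw_value, cnt_width):
    """Model how a counter of cnt_width bits appears as (low word, high word).

    Bits at or above cnt_width are hardwired to zero.
    """
    if not 0 <= cnt_width <= 64:
        raise ValueError("HPM_CNT_WIDTH must be in 0..64")
    value = raw_value & ((1 << cnt_width) - 1)  # keep the LSB-aligned bits only
    low_word = value & 0xFFFFFFFF
    high_word = value >> 32
    return low_word, high_word


# With the default width of 40, only 8 bits of the high word are implemented.
low, high = hpm_counter_words(0xFFFFFFFFFFFFFFFF, 40)
print(hex(low), hex(high))
```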

2.2.6. Internal Instruction Memory

See sections Address Space and Instruction Memory (IMEM) for more information.

MEM_INT_IMEM_EN

MEM_INT_IMEM_EN

boolean

true

Implement processor internal instruction memory (IMEM) when true.

MEM_INT_IMEM_SIZE

MEM_INT_IMEM_SIZE

natural

16*1024

Size in bytes of the processor internal instruction memory (IMEM). Has no effect when MEM_INT_IMEM_EN is false.

2.2.7. Internal Data Memory

See sections Address Space and Data Memory (DMEM) for more information.

MEM_INT_DMEM_EN

MEM_INT_DMEM_EN

boolean

true

Implement processor internal data memory (DMEM) when true.

MEM_INT_DMEM_SIZE

MEM_INT_DMEM_SIZE

natural

8*1024

Size in bytes of the processor-internal data memory (DMEM). Has no effect when MEM_INT_DMEM_EN is false.

2.2.8. Internal Cache Memory

See section Processor-Internal Instruction Cache (iCACHE) for more information.

ICACHE_EN

ICACHE_EN

boolean

false

Implement processor internal instruction cache when true.

ICACHE_NUM_BLOCKS

ICACHE_NUM_BLOCKS

natural

4

Number of blocks (cache "pages" or "lines") in the instruction cache. Has to be a power of two. Has no effect when ICACHE_EN is false.

ICACHE_BLOCK_SIZE

ICACHE_BLOCK_SIZE

natural

64

Size in bytes of each block in the instruction cache. Has to be a power of two. Has no effect when ICACHE_EN is false.

ICACHE_ASSOCIATIVITY

ICACHE_ASSOCIATIVITY

natural

1

Associativity (= number of sets) of the instruction cache. Has to be a power of two. Allowed configurations: 1 = 1 set, direct mapped; 2 = 2-way set-associative. Has no effect when ICACHE_EN is false.
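
To get a feeling for how the three cache generics interact, the data storage of a configuration can be estimated as sets x blocks x block size. The following is a rough sketch with hypothetical helper names; it assumes that the data memory is simply the product of the three generics (tag and status bits are not counted), which is consistent with the 8192 memory bits listed in Table 1 for 1x4 blocks of 256 bytes.

```python
def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def icache_data_bits(num_blocks, block_size, associativity):
    """Estimate the cache data memory in bits after checking the constraints."""
    if not is_power_of_two(num_blocks):
        raise ValueError("ICACHE_NUM_BLOCKS has to be a power of two")
    if not is_power_of_two(block_size):
        raise ValueError("ICACHE_BLOCK_SIZE has to be a power of two")
    if associativity not in (1, 2):
        raise ValueError("ICACHE_ASSOCIATIVITY has to be 1 or 2")
    return associativity * num_blocks * block_size * 8


print(icache_data_bits(4, 256, 1))  # configuration from Table 1
print(icache_data_bits(4, 64, 1))   # default generic values
```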

2.2.9. External Memory Interface

MEM_EXT_EN

MEM_EXT_EN

boolean

false

Implement external bus interface (WISHBONE) when true.

MEM_EXT_TIMEOUT

MEM_EXT_TIMEOUT

natural

255

Clock cycles after which a pending external bus access will auto-terminate and raise a bus fault exception. Set to 0 to disable auto-timeout.
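
The timeout behavior can be pictured with a small cycle-based model. This is a simplified sketch and not derived from the RTL: the function name, the return values and the exact cycle at which the timeout fires are assumptions made for illustration.

```python
def simulate_bus_access(ack_cycle, timeout, max_sim_cycles=10_000):
    """Model a pending external bus access.

    ack_cycle: cycle in which the device acknowledges, or None if it never does
    timeout:   MEM_EXT_TIMEOUT value; 0 disables the auto-timeout
    Returns (outcome, cycle), with outcome 'ack', 'bus_fault' or 'stalled'.
    """
    for cycle in range(1, max_sim_cycles + 1):
        if ack_cycle is not None and cycle == ack_cycle:
            return "ack", cycle
        if timeout != 0 and cycle == timeout:
            return "bus_fault", cycle
    # Without a timeout, an unanswered access keeps the bus pending.
    return "stalled", max_sim_cycles


print(simulate_bus_access(ack_cycle=10, timeout=255))
print(simulate_bus_access(ack_cycle=None, timeout=255))
print(simulate_bus_access(ack_cycle=None, timeout=0, max_sim_cycles=1000))
```

The last call shows why disabling the timeout should be done with care: an access to an address that no device answers would never terminate.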

2.2.10. Processor Peripheral/IO Modules

See section Processor-Internal Modules for more information.

IO_GPIO_EN

IO_GPIO_EN

boolean

true

Implement general purpose input/output port unit (GPIO) when true. See section General Purpose Input and Output Port (GPIO) for more information.

IO_MTIME_EN

IO_MTIME_EN

boolean

true

Implement machine system timer (MTIME) when true. See section Machine System Timer (MTIME) for more information.

IO_UART0_EN

IO_UART0_EN

boolean

true

Implement primary universal asynchronous receiver/transmitter (UART0) when true. See section Primary Universal Asynchronous Receiver and Transmitter (UART0) for more information.

IO_UART1_EN

IO_UART1_EN

boolean

true

Implement secondary universal asynchronous receiver/transmitter (UART1) when true. See section Secondary Universal Asynchronous Receiver and Transmitter (UART1) for more information.

IO_SPI_EN

IO_SPI_EN

boolean

true

Implement serial peripheral interface controller (SPI) when true. See section Serial Peripheral Interface Controller (SPI) for more information.

IO_TWI_EN

IO_TWI_EN

boolean

true

Implement two-wire interface controller (TWI) when true. See section Two-Wire Serial Interface Controller (TWI) for more information.

IO_PWM_NUM_CH

IO_PWM_NUM_CH

natural

4

Number of pulse-width modulation (PWM) channels (0..60) to implement. The PWM controller is not implemented if zero. See section Pulse-Width Modulation Controller (PWM) for more information.

IO_WDT_EN

IO_WDT_EN

boolean

true

Implement watchdog timer (WDT) when true. See section Watchdog Timer (WDT) for more information.

IO_TRNG_EN

IO_TRNG_EN

boolean

false

Implement true-random number generator (TRNG) when true. See section True Random-Number Generator (TRNG) for more information.

IO_CFS_EN

IO_CFS_EN

boolean

false

Implement custom functions subsystem (CFS) when true. See section Custom Functions Subsystem (CFS) for more information.

IO_CFS_CONFIG

IO_CFS_CONFIG

std_ulogic_vector(31 downto 0)

x"00000000"

This is a "conduit" generic that can be used to pass user-defined CFS implementation flags to the custom functions subsystem entity. See section Custom Functions Subsystem (CFS) for more information.

IO_CFS_IN_SIZE

IO_CFS_IN_SIZE

positive

32

Defines the size of the CFS input signal conduit (cfs_in_i). See section Custom Functions Subsystem (CFS) for more information.

IO_CFS_OUT_SIZE

IO_CFS_OUT_SIZE

positive

32

Defines the size of the CFS output signal conduit (cfs_out_o). See section Custom Functions Subsystem (CFS) for more information.

IO_NCO_EN

IO_NCO_EN

boolean

true

Implement numerically-controlled oscillator (NCO) when true. See section Numerically-Controlled Oscillator (NCO) for more information.

IO_NEOLED_EN

IO_NEOLED_EN

boolean

true

Implement smart LED interface (WS2812 / NeoPixel™-compatible) (NEOLED) when true. See section Smart LED Interface (NEOLED) for more information.

2.3. Processor Interrupts

The interrupt request signals have specific mie CSR bits (see Machine Trap Setup), specific mip CSR bits (see Machine Trap Handling) and specific mcause CSR trap codes and trap priorities. For more information (also regarding the signaling protocol) see section Traps, Exceptions and Interrupts.

RISC-V Standard Interrupts

The processor setup features the standard RISC-V interrupt lines for "machine timer interrupt", "machine software interrupt" and "machine external interrupt". The software and external interrupt lines are available via the processor’s top entity. By default, the timer interrupt is connected to the internal machine system timer unit (Machine System Timer (MTIME)). If this module has not been enabled for synthesis, the machine timer interrupt is also available via the processor’s top entity.

NEORV32-Specific Fast Interrupt Requests

As part of the custom/NEORV32-specific CPU extensions, the CPU features 16 fast interrupt request signals (FIRQ0 – FIRQ15).

The fast interrupt request signals are divided into two groups. The FIRQs with higher priority (FIRQ0 – FIRQ9) are dedicated for processor-internal usage. The FIRQs with lower priority (FIRQ10 – FIRQ15) are available for custom usage via the processor’s top entity signal soc_firq_i.

The mapping of the 16 FIRQ channels is shown in the following table (the channel number corresponds to the FIRQ priority):

Table 7. NEORV32 fast interrupt channel mapping
Channel | Source | Description
0 | WDT | watchdog timeout interrupt
1 | CFS | custom functions subsystem (CFS) interrupt (user-defined)
2 | UART0 (RXD) | UART0 data received interrupt (RX complete)
3 | UART0 (TXD) | UART0 sending done interrupt (TX complete)
4 | UART1 (RXD) | UART1 data received interrupt (RX complete)
5 | UART1 (TXD) | UART1 sending done interrupt (TX complete)
6 | SPI | SPI transmission done interrupt
7 | TWI | TWI transmission done interrupt
8 | GPIO | GPIO input pin-change interrupt
9 | NEOLED | NEOLED buffer TX empty / not full interrupt
10:15 | soc_firq_i(5:0) | Custom platform use; available via processor’s top signal
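
The channel mapping can be captured in a small lookup, shown here as a host-side C sketch rather than as part of the NEORV32 software library. The helper names firq_source_name and firq_highest_priority are hypothetical; the only facts used are the source names from the table and the rule that a lower channel number means a higher priority.

```c
#include <assert.h>

/* Sources of the processor-internal FIRQ channels 0..9 (from the mapping table). */
static const char *const firq_internal_source[10] = {
    "WDT", "CFS", "UART0 (RXD)", "UART0 (TXD)", "UART1 (RXD)",
    "UART1 (TXD)", "SPI", "TWI", "GPIO", "NEOLED"
};

/* Hypothetical helper: source driving a FIRQ channel (0..15). */
const char *firq_source_name(unsigned channel)
{
    assert(channel < 16);
    return (channel < 10) ? firq_internal_source[channel] : "soc_firq_i";
}

/* Hypothetical helper: bit i of pending_mask set means FIRQi is pending.
   Returns the pending channel with the highest priority (lowest number),
   or -1 if nothing is pending. */
int firq_highest_priority(unsigned pending_mask)
{
    for (int ch = 0; ch < 16; ch++) {
        if (pending_mask & (1u << ch)) {
            return ch;
        }
    }
    return -1;
}
```

For example, with FIRQ2 and FIRQ8 pending at the same time, the sketch selects channel 2 (UART0 RX) before channel 8 (GPIO).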

Non-Maskable Interrupt

The NEORV32 features a single non-maskable interrupt source via the nm_irq_i top entity signal that can be used to signal critical system conditions. This interrupt source cannot be disabled. Hence, it does not provide configuration/status flags in the mie and mip CSRs. The RISC-V-compatible mcause value 0x80000000 is used to indicate the non-maskable interrupt.

2.4. Address Space

The NEORV32 Processor provides 32-bit physical addresses accessing up to 4GB of address space. By default, this address space is divided into four main regions:

  1. Instruction address space – for instructions (=code) and constants. A configurable section of this address space is used by internal and/or external instruction memory (IMEM).

  2. Data address space – for application runtime data (heap, stack, etc.). A configurable section of this address space is used by internal and/or external data memory (DMEM).

  3. Bootloader address space. A fixed section of this address space is used by internal bootloader memory (BOOTLDROM).

  4. IO/peripheral address space – for the processor-internal IO/peripheral devices (e.g., UART).

These four memory regions are handled by the linker when compiling a NEORV32 executable. See section Executable Image Format for more information.
Figure 1. NEORV32 processor - address space (default configuration)

2.4.1. CPU Data and Instruction Access

The CPU can access all of the 4GB address space from the instruction fetch interface (I) and also from the data access interface (D). These two CPU interfaces are multiplexed by a simple bus switch (rtl/core/neorv32_busswitch.vhd) into a single processor-internal bus. All processor-internal memories, peripherals and also the external memory interface are connected to this bus. Hence, both CPU interfaces (instruction fetch & data access) have access to the same (identical) address space making the setup a modified von-Neumann architecture.

Figure 2. Processor-internal bus architecture
The internal processor bus might appear as a bottleneck. To reduce congestion on this bus (when the instruction fetch and data access interfaces request the bus at the same time), the instruction fetch interface of the CPU is equipped with a prefetch buffer. Instruction fetches can be further buffered using the i-cache. Furthermore, data accesses (loads and stores) have higher priority than instruction fetch accesses.
Please note that all processor-internal components including the peripheral/IO devices can also be accessed from programs running in less-privileged user mode. For example, if the system relies on a periodic interrupt from the MTIME timer unit, user-level programs could alter the MTIME configuration and corrupt this interrupt. Security issues of this kind can be mitigated using the PMP system (see Machine Physical Memory Protection).

2.4.2. Address Space Layout

The general address space layout consists of two main configuration constants: ispace_base_c defining the base address of the instruction memory address space and dspace_base_c defining the base address of the data memory address space. Both constants are defined in the NEORV32 VHDL package file rtl/core/neorv32_package.vhd:

-- Architecture Configuration ----------------------------------------------------
-- ----------------------------------------------------------------------------------
constant ispace_base_c : std_ulogic_vector(31 downto 0) := x"00000000";
constant dspace_base_c : std_ulogic_vector(31 downto 0) := x"80000000";

The default configuration assumes the instruction memory address space starting at address 0x00000000 and the data memory address space starting at 0x80000000. Both values can be modified for a specific setup, and the two address spaces may overlap or even be completely identical. Make sure that both base addresses are aligned to a 4-byte boundary.

The base addresses of the internal bootloader (at 0xFFFF0000) and the internal IO region (at 0xFFFFFE00) for peripheral devices are also defined in the package and are fixed. These address regions cannot be used for other applications – even if the bootloader or all IO devices are not implemented – without modifying the core’s hardware sources.

2.4.3. Physical Memory Attributes

The processor setup defines fixed attributes for the four processor-internal address space regions. Accessing a memory region in a way that violates any of these attributes will raise a corresponding access exception.

  • r – read access (from CPU data access interface, e.g. via "load")

  • w – write access (from CPU data access interface, e.g. via "store")

  • x – execute access (from CPU instruction fetch interface)

  • a – atomic access (from CPU data access interface)

  • 8 – byte (8-bit)-accessible (when writing)

  • 16 – half-word (16-bit)-accessible (when writing)

  • 32 – word (32-bit)-accessible (when writing)

Read accesses (i.e. loads) can always access data in word, half-word and byte quantities (requiring an accordingly aligned address).

The following table shows the provided physical memory attributes of each region. Additional attributes (for example controlling certain rights for specific address space regions) can be provided using the RISC-V Machine Physical Memory Protection extension.

# | Region | Base address | Size | Attributes
4 | IO/peripheral devices | 0xfffffe00 | 512 bytes | r/w/a/32
3 | bootloader ROM | 0xffff0000 | up to 32kB | r/x/a
2 | DMEM | 0x80000000 | up to 2GB (-64kB) | r/w/x/a/8/16/32
1 | IMEM | 0x00000000 | up to 2GB | r/w/x/a/8/16/32
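
As an illustration of how the attribute table reads, the following C sketch models the check as a pure function. It is a simplified host-side model and not the hardware implementation: the type and function names are hypothetical, atomic accesses are not modeled, and the size restriction is applied to writes only, as stated in the attribute list.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { REGION_IMEM, REGION_DMEM, REGION_BOOTROM, REGION_IO } region_t;
typedef enum { ACC_READ, ACC_WRITE, ACC_EXECUTE } access_t;

/* Returns true if the access is permitted by the fixed attributes of the
   region; false corresponds to an access fault exception. */
bool pma_access_allowed(region_t region, access_t access, unsigned size_bytes)
{
    assert(size_bytes == 1 || size_bytes == 2 || size_bytes == 4);
    switch (region) {
    case REGION_IMEM:
    case REGION_DMEM:
        return true;                    /* r/w/x/a/8/16/32 */
    case REGION_BOOTROM:
        return access != ACC_WRITE;     /* r/x/a: read-only memory */
    case REGION_IO:
        if (access == ACC_EXECUTE) {
            return false;               /* r/w/a/32: no execute attribute */
        }
        if (access == ACC_WRITE) {
            return size_bytes == 4;     /* writes must be full words */
        }
        return true;                    /* reads are not size constrained */
    }
    return false;
}
```

Under this model a byte store to an IO register is rejected, while a byte load from the same register is accepted.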

2.4.4. Memory Configuration

The NEORV32 Processor was designed to provide maximum flexibility for the memory configuration. The processor can populate the instruction address space and/or the data address space with internal memories for instructions (IMEM) and data (DMEM). Processor-external memories can be used as an alternative or even in combination with the internal ones. The figure below shows some exemplary memory configurations.

Figure 3. Exemplary memory configurations
Internal Memories

The processor-internal memories (Instruction Memory (IMEM) and Data Memory (DMEM)) are enabled (=implemented) via the MEM_INT_IMEM_EN and MEM_INT_DMEM_EN generics. Their sizes are configured via the corresponding MEM_INT_IMEM_SIZE and MEM_INT_DMEM_SIZE generics.

If the processor-internal IMEM is implemented, it is located right at the base address of the instruction address space (default ispace_base_c = 0x00000000). Vice versa, the processor-internal data memory is located right at the beginning of the data address space (default dspace_base_c = 0x80000000) when implemented.

The default processor setup uses only internal memories.
If the IMEM (internal or external) is less than the (default) maximum size (2GB), there is a "dead address space" between it and the DMEM. This provides an additional safety feature since data corrupting scenarios like stack overflow cannot directly corrupt the content of the IMEM: any access to the "dead address space" in between will raise an exception that can be caught by the runtime environment.
External Memories

If external memories (or further IP modules) are to be connected via the processor’s external bus interface, the interface has to be enabled by setting the MEM_EXT_EN generic to true. More information regarding this interface can be found in section Processor-External Memory Interface (WISHBONE) (AXI4-Lite).

Any CPU access (data or instructions), which does not fulfill at least one of the following conditions, is forwarded via the processor’s bus interface to external components:

  • access to the processor-internal IMEM and processor-internal IMEM is implemented

  • access to the processor-internal DMEM and processor-internal DMEM is implemented

  • access to the bootloader ROM and beyond → addresses >= BOOTROM_BASE (default 0xFFFF0000) will never be forwarded to the external memory interface

If no (or not all) processor-internal memories are implemented, the according base addresses are mapped to external memories. For example, if the processor-internal IMEM is not implemented (MEM_INT_IMEM_EN = false), the processor will forward any access to the instruction address space (starting at ispace_base_c) via the external bus interface to the external memory system.

If the external interface is deactivated, any access exceeding the internal memory address space (instruction, data, bootloader) or the internal peripheral address space will trigger a bus access fault exception.
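
The forwarding rules can be summarized as an address decoder. The C sketch below is a host-side model under the default base addresses; the names (mem_config_t, decode_address, the TGT_ values) are hypothetical and only mirror the generics mentioned in the text. It treats the whole range from the bootloader base up to the IO base as bootloader region and does not model faults caused by unimplemented bootloader ROM or IO devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Default base addresses as given in the text. */
#define ISPACE_BASE  0x00000000u  /* ispace_base_c */
#define DSPACE_BASE  0x80000000u  /* dspace_base_c */
#define BOOTROM_BASE 0xFFFF0000u
#define IO_BASE      0xFFFFFE00u

typedef enum {
    TGT_IMEM, TGT_DMEM, TGT_BOOTROM, TGT_IO, TGT_EXTERNAL, TGT_BUS_FAULT
} target_t;

typedef struct {
    bool     imem_en;     /* MEM_INT_IMEM_EN   */
    uint32_t imem_size;   /* MEM_INT_IMEM_SIZE */
    bool     dmem_en;     /* MEM_INT_DMEM_EN   */
    uint32_t dmem_size;   /* MEM_INT_DMEM_SIZE */
    bool     mem_ext_en;  /* MEM_EXT_EN        */
} mem_config_t;

static bool in_region(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr >= base && (addr - base) < size;
}

/* Decide which component serves an access to addr. */
target_t decode_address(const mem_config_t *cfg, uint32_t addr)
{
    assert(cfg != NULL);
    /* Addresses >= BOOTROM_BASE are never forwarded to the external bus. */
    if (addr >= IO_BASE)      return TGT_IO;
    if (addr >= BOOTROM_BASE) return TGT_BOOTROM;
    if (cfg->imem_en && in_region(addr, ISPACE_BASE, cfg->imem_size)) return TGT_IMEM;
    if (cfg->dmem_en && in_region(addr, DSPACE_BASE, cfg->dmem_size)) return TGT_DMEM;
    /* Everything else goes to the external interface, if there is one. */
    return cfg->mem_ext_en ? TGT_EXTERNAL : TGT_BUS_FAULT;
}
```

With the default 16kB IMEM and a disabled external interface, address 0x00004000 lies in the "dead address space" and decodes to a bus fault in this model; enabling MEM_EXT_EN turns the same access into an external bus access.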

2.4.5. Boot Configuration

Due to the flexible memory configuration concept, the NEORV32 Processor provides several different boot mechanisms. The figure below shows details and further options of the two most common boot scenarios.

Figure 4. NEORV32 boot configurations
The configuration of internal or external data memory (DMEM; MEM_INT_DMEM_EN = true / false) is not further relevant for the boot configuration itself. Hence, it is not further illustrated here.

The general boot scenario (1 or 2) is configured via the INT_BOOTLOADER_EN generic. If this generic is set true, boot scenario 1 is used. If it is set false, boot scenario 2 is used.

Please note that the provided boot scenarios are just exemplary setups that (should) fit most common requirements. Much more sophisticated boot scenarios are possible by combining internal and external memories. For example, the default internal bootloader could be used as first-level bootloader that loads (from external SPI flash) a second-level bootloader that is placed and executed in internal IMEM. This second-level bootloader could then fetch the actual application, store it to external data memory and transfer CPU control to it.
Boot Scenario 1

Boot scenarios 1a and 1b use the processor-internal Bootloader. This general setup is enabled by setting the INT_BOOTLOADER_EN generic to true, which will implement the processor-internal Bootloader ROM (BOOTROM). This read-only memory is pre-initialized with the default bootloader and is mapped to the processor bootloader address space.

The bootloader provides several options to upload an executable (via UART or from external SPI flash) and store it to the instruction address space so the CPU can execute it. Boot scenario 1a uses the processor-internal IMEM (MEM_INT_IMEM_EN = true). This scenario implements the internal Instruction Memory (IMEM) as non-initialized true RAM so the bootloader can write the actual executable to it.

Boot scenario 1b uses a processor-external IMEM (MEM_INT_IMEM_EN = false) that is connected via the processor’s bus interface. In this scenario the internal Instruction Memory (IMEM) is not implemented at all and the bootloader will write the executable to the processor-external memory.

Boot Scenario 2

Boot scenarios 2a and 2b do not use the processor-internal bootloader. Hence, the INT_BOOTLOADER_EN generic is set to false. In this configuration the Bootloader ROM (BOOTROM) is not implemented at all and a "pre-initialization" mechanism is required in order to provide an executable in memory.

Boot scenario 2a uses the processor-internal IMEM (MEM_INT_IMEM_EN = true) that is implemented as read-only memory in this scenario. It is pre-initialized (by the bitstream) with the actual application executable.

In contrast, boot scenario 2b uses a processor-external IMEM (MEM_INT_IMEM_EN = false). In this scenario the system designer is responsible for providing an initialized external memory that contains the actual application to be executed.
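
The four scenarios follow directly from two generics, which the following C sketch spells out. The enum and function names are hypothetical; the mapping itself is taken from the descriptions of scenarios 1a, 1b, 2a and 2b.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { BOOT_1A, BOOT_1B, BOOT_2A, BOOT_2B } boot_scenario_t;
typedef enum { IMEM_INTERNAL_RAM, IMEM_INTERNAL_ROM, IMEM_EXTERNAL } imem_kind_t;

/* Select the boot scenario from INT_BOOTLOADER_EN and MEM_INT_IMEM_EN. */
boot_scenario_t boot_scenario(bool int_bootloader_en, bool mem_int_imem_en)
{
    if (int_bootloader_en) {
        return mem_int_imem_en ? BOOT_1A : BOOT_1B;
    }
    return mem_int_imem_en ? BOOT_2A : BOOT_2B;
}

/* How the instruction memory is provided in each scenario. */
imem_kind_t boot_imem_kind(boot_scenario_t scenario)
{
    switch (scenario) {
    case BOOT_1A: return IMEM_INTERNAL_RAM;  /* written by the bootloader      */
    case BOOT_2A: return IMEM_INTERNAL_ROM;  /* pre-initialized via bitstream  */
    case BOOT_1B:
    case BOOT_2B: return IMEM_EXTERNAL;      /* attached via the bus interface */
    }
    assert(!"invalid boot scenario");
    return IMEM_EXTERNAL;
}
```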

2.5. Processor-Internal Modules

Basically, the processor is a SoC consisting of the NEORV32 CPU, peripheral/IO devices, embedded memories, an external memory interface and a bus infrastructure to interconnect all units. Additionally, the system implements an internal reset generator and a global clock generator/divider.

Internal Reset Generator

Most processor-internal modules – except for the CPU and the watchdog timer – do not have a dedicated reset signal. However, all devices can be reset by software by clearing the corresponding unit’s control register. The automatically included application start-up code (crt0.S) will perform a software-reset of all modules to ensure a clean system reset state.

The hardware reset signal of the processor can either be triggered via the external reset pin (rstn_i, low-active) or by the internal watchdog timer (if implemented). Before the external reset signal is applied to the system, it is extended to have a minimal duration of eight clock cycles.

Internal Clock Divider

An internal clock divider generates 8 clock signals derived from the processor’s main clock input clk_i. These derived clock signals are not actual clock signals. Instead, they are derived from a simple counter and are used as "clock enable" signals by the different processor modules. Thus, the whole design operates using only the main clock signal (single clock domain). Some of the processor peripherals like the Watchdog or the UARTs can select one of the derived clock enable signals for their internal operation. If none of the connected modules requires a clock signal from the divider, it is automatically deactivated to reduce dynamic power.

The peripheral devices that feature a time-based configuration provide a three-bit prescaler select in their control register to select one of the eight available clocks. The mapping of the prescaler select bits to the resulting clock is shown in the table below. Here, f represents the processor main clock from the top entity’s clk_i signal.

Prescaler bits: | 0b000 | 0b001 | 0b010 | 0b011 | 0b100 | 0b101 | 0b110 | 0b111
Resulting clock: | f/2 | f/4 | f/8 | f/64 | f/128 | f/1024 | f/2048 | f/4096
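
To make the prescaler mapping concrete, the following C sketch computes the resulting clock-enable rate for a given main clock. The function name prescaled_clock_hz is hypothetical and not part of the NEORV32 software library; only the divisor values are taken from the table.

```c
#include <assert.h>
#include <stdint.h>

/* Divisors selected by the three-bit prescaler field (0b000 .. 0b111). */
static const uint32_t prescaler_divisor[8] = {
    2, 4, 8, 64, 128, 1024, 2048, 4096
};

/* Rate (in Hz) of the clock enable selected by prsc_sel for a main clock
   of f_hz; integer division truncates any fractional part. */
uint32_t prescaled_clock_hz(uint32_t f_hz, unsigned prsc_sel)
{
    assert(prsc_sel < 8);
    return f_hz / prescaler_divisor[prsc_sel];
}
```

For a 100 MHz main clock, selecting 0b011 yields 100000000 / 64 = 1562500 Hz.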

Peripheral / IO Devices

The processor-internal peripheral/IO devices are located at the end of the 32-bit address space at base address 0xFFFFFE00. A region of 512 bytes is reserved for these devices. Hence, all peripheral/IO devices are accessed using a memory-mapped scheme. A special linker script as well as the NEORV32 core software library abstract the specific memory layout for the user.

When accessing an IO device that has not been implemented (via the corresponding IO_x_EN generic), a load/store access fault exception is triggered.
The peripheral/IO devices can only be written in full-word mode (i.e. 32-bit). Byte or half-word (8/16-bit) writes will trigger a store access fault exception. Read accesses are not size constrained. Processor-internal memories as well as modules connected to the external memory interface can still be written with a byte-wide granularity.
You should use the provided core software library to interact with the peripheral devices. This prevents incompatibilities with future versions, since the hardware driver functions handle all the register and register bit accesses.
Most of the IO devices do not have a hardware reset. Instead, the devices are reset via software by writing zero to the unit’s control register. A general software-based reset of all devices is done by the application start-up code crt0.S.

Nomenclature for the Peripheral / IO Devices Listing

Each peripheral device chapter features a register map showing accessible control and data registers of the according device including the implemented control and status bits. You can directly interact with these registers/bits via the provided C-code defines. These defines are set in the main processor core library include file sw/lib/include/neorv32.h. The registers and/or register bits, which can be accessed directly using plain C-code, are marked with a "[C]".

Not all registers or register bits can be arbitrarily read/written. The following read/write access types are available:

  • r/w registers / bits can be read and written

  • r/- registers / bits are read-only; any write access to them has no effect

  • -/w these registers / bits are write-only; they auto-clear in the next cycle and are always read as zero

Bits / registers that are not listed in the register map tables are not (yet) implemented. These registers / bits are always read as zero. A write access to them has no effect, but user programs should only write zero to them to remain compatible with future extensions.
When writing to read-only registers, the access is nevertheless acknowledged, but no actual data is written. When reading data from a write-only register the result is undefined.

2.5.1. Instruction Memory (IMEM)

Hardware source file(s):

neorv32_imem.vhd

Software driver file(s):

none

implicitly used

Top entity port:

none

Configuration generics:

MEM_INT_IMEM_EN

implement processor-internal IMEM when true

MEM_INT_IMEM_SIZE

IMEM size in bytes

INT_BOOTLOADER_EN

use internal bootloader when true (IMEM is implemented as RAM); IMEM is implemented as pre-initialized ROM when false

CPU interrupts:

none

The default neorv32_imem.vhd HDL source file provides a generic memory design that infers embedded memory for larger memory configurations. You might need to replace/modify the source file in order to use platform-specific features (like advanced memory resources) or to improve technology mapping and/or timing.

Implementation of the processor-internal instruction memory is enabled via the processor’s MEM_INT_IMEM_EN generic. The size in bytes is defined via the MEM_INT_IMEM_SIZE generic. If the IMEM is implemented, the memory is mapped into the instruction memory space and located right at the beginning of the instruction memory space (default ispace_base_c = 0x00000000).

By default, the IMEM is implemented as RAM, so the content can be modified during run time. This is required when using a bootloader that can update the content of the IMEM at any time. If you do not need the bootloader anymore – since your application development has completed and you want the program to permanently reside in the internal instruction memory – the IMEM is automatically implemented as pre-initialized ROM when the processor-internal bootloader is disabled (INT_BOOTLOADER_EN = false).

When the IMEM is implemented as ROM, it will be initialized during synthesis with the actual application program image. The compiler toolchain will generate a VHDL initialization file rtl/core/neorv32_application_image.vhd, which is automatically inserted into the IMEM. If the IMEM is implemented as RAM (default), the memory will not be initialized at all.

2.5.2. Data Memory (DMEM)

Hardware source file(s):

neorv32_dmem.vhd

Software driver file(s):

none

implicitly used

Top entity port:

none

Configuration generics:

MEM_INT_DMEM_EN

implement processor-internal DMEM when true

MEM_INT_DMEM_SIZE

DMEM size in bytes

CPU interrupts:

none

The default neorv32_dmem.vhd HDL source file provides a generic memory design that infers embedded memory for larger memory configurations. You might need to replace/modify the source file in order to use platform-specific features (like advanced memory resources) or to improve technology mapping and/or timing.

Implementation of the processor-internal data memory is enabled via the processor’s MEM_INT_DMEM_EN generic. The size in bytes is defined via the MEM_INT_DMEM_SIZE generic. If the DMEM is implemented, the memory is mapped into the data memory space and located right at the beginning of the data memory space (default dspace_base_c = 0x80000000). The DMEM is always implemented as RAM.

2.5.3. Bootloader ROM (BOOTROM)

Hardware source file(s):

neorv32_boot_rom.vhd

Software driver file(s):

none

implicitly used

Top entity port:

none

Configuration generics:

INT_BOOTLOADER_EN

implement processor-internal bootloader when true

CPU interrupts:

none

The default neorv32_boot_rom.vhd HDL source file provides a generic memory design that infers embedded memory for larger memory configurations. You might need to replace/modify the source file in order to use platform-specific features (like advanced memory resources) or to improve technology mapping and/or timing.

This HDL module provides a read-only memory that contains the executable code image of the bootloader. If the INT_BOOTLOADER_EN generic is true, this module will be implemented and the CPU boot address is modified to directly execute the code from the bootloader ROM after reset.

The bootloader ROM is located at address 0xFFFF0000 and can occupy an address space of up to 32kB. The base address as well as the maximum address space size are fixed and cannot (should not!) be modified as this might cause address collisions with other processor modules.

The bootloader memory is read-only and is automatically initialized with the bootloader executable image rtl/core/neorv32_bootloader_image.vhd during synthesis. The actual physical size of the ROM is also determined via synthesis and expanded to the next power of two. For example, if the bootloader code requires 10kB of storage, a ROM with 16kB will be generated. The maximum size must not exceed 32kB.
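
The size rule can be reproduced with a few lines of C. This is only a sketch of the arithmetic, not the synthesis-time VHDL; the function name bootrom_physical_size is hypothetical, and it assumes that an image whose size is already a power of two is not expanded further.

```c
#include <assert.h>
#include <stdint.h>

#define BOOTROM_MAX_SIZE (32u * 1024u)  /* maximum bootloader ROM size in bytes */

/* Physical ROM size in bytes for a bootloader image of image_bytes:
   the smallest power of two that is >= image_bytes. */
uint32_t bootrom_physical_size(uint32_t image_bytes)
{
    assert(image_bytes > 0 && image_bytes <= BOOTROM_MAX_SIZE);
    uint32_t size = 1;
    while (size < image_bytes) {
        size <<= 1;
    }
    return size;
}
```

A 10kB image (10240 bytes) results in a 16kB ROM (16384 bytes), matching the example in the text.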

Bootloader - Software
See section Bootloader for more information regarding the actual bootloader software/executable itself.
Boot Configuration
See section Boot Configuration for more information regarding the processor’s different boot scenarios.

2.5.4. Processor-Internal Instruction Cache (iCACHE)

Hardware source file(s):

neorv32_icache.vhd

Software driver file(s):

none

implicitly used

Top entity port:

none

Configuration generics:

ICACHE_EN

implement processor-internal instruction cache when true

ICACHE_NUM_BLOCKS

number of cache blocks (pages/lines)

ICACHE_BLOCK_SIZE

size of a cache block in bytes

ICACHE_ASSOCIATIVITY

associativity / number of sets

CPU interrupts:

none

The default neorv32_icache.vhd HDL source file provides a generic memory design that infers embedded memory. You might need to replace/modify the source file in order to use platform-specific features (like advanced memory resources) or to improve technology mapping and/or timing.

The processor features an optional cache for instructions to compensate memories with high latency. The cache is directly connected to the CPU’s instruction fetch interface and provides a full-transparent buffering of instruction fetch accesses to the entire 4GB address space.

The instruction cache is intended to accelerate instruction fetch via the external memory interface. Since all processor-internal memories provide an access latency of one cycle (by default), caching internal memories does not bring any performance gain. However, it might reduce traffic on the processor-internal bus.

The cache is implemented if the ICACHE_EN generic is true. The size of the cache memory is defined via ICACHE_BLOCK_SIZE (the size of a single cache block/page/line in bytes; has to be a power of two and >= 4 bytes), ICACHE_NUM_BLOCKS (the total amount of cache blocks; has to be a power of two and >= 1) and the actual cache associativity ICACHE_ASSOCIATIVITY (number of sets; 1 = direct-mapped, 2 = 2-way set-associative, has to be a power of two and >= 1).
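
The constraints on the cache generics can be checked before synthesis with a small helper, sketched here in C. The function names are hypothetical. For the associativity the sketch follows the generic description, which lists 1 and 2 as the allowed configurations, and it assumes 32-bit (4-byte) words when counting the words per block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint32_t v)
{
    return v != 0 && (v & (v - 1u)) == 0;
}

/* Check ICACHE_NUM_BLOCKS, ICACHE_BLOCK_SIZE and ICACHE_ASSOCIATIVITY
   against the documented constraints. */
bool icache_config_valid(uint32_t num_blocks, uint32_t block_size,
                         uint32_t associativity)
{
    return is_pow2(num_blocks) &&                      /* power of two, >= 1 */
           is_pow2(block_size) && block_size >= 4 &&   /* power of two, >= 4 */
           (associativity == 1 || associativity == 2); /* direct-mapped or 2-way */
}

/* Number of 32-bit words that make up one cache block. */
uint32_t icache_words_per_block(uint32_t block_size)
{
    assert(is_pow2(block_size) && block_size >= 4);
    return block_size / 4u;
}
```

The default configuration (4 blocks of 64 bytes, associativity 1) passes the check and gives 16 words per block.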

If the cache associativity (ICACHE_ASSOCIATIVITY) is > 1 the LRU replacement policy (least recently used) is used.

Keep the features of the targeted FPGA’s memory resources (block RAM) in mind when configuring the cache size/layout to maximize and optimize resource utilization.

By executing the fence.i instruction (Zifencei CPU extension) the cache is cleared and a reload from main memory is forced. Among other things, this allows self-modifying code to be implemented.

Bus Access Fault Handling

The cache always loads a complete cache block (ICACHE_BLOCK_SIZE bytes), aligned to the size of a cache block, if a miss is detected. If any of the accessed addresses within a single block is not successfully acknowledged (i.e. the access raises an error signal or times out), the whole cache block is invalidated and any access to an address within this cache block will also raise an instruction fetch bus error fault exception.
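
The consequence of block-wise loading is that a single faulty address affects every fetch from the same block. The C sketch below models this on the host; icache_block_base and block_load_faults are hypothetical names, and the model considers exactly one faulty address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base address of the cache block that contains addr.
   block_size corresponds to ICACHE_BLOCK_SIZE (power of two, >= 4). */
uint32_t icache_block_base(uint32_t addr, uint32_t block_size)
{
    assert(block_size >= 4 && (block_size & (block_size - 1u)) == 0);
    return addr & ~(block_size - 1u);
}

/* True if loading the block for a miss at miss_addr touches faulty_addr,
   in which case the whole block is invalidated and the fetch faults. */
bool block_load_faults(uint32_t miss_addr, uint32_t block_size,
                       uint32_t faulty_addr)
{
    return icache_block_base(miss_addr, block_size) ==
           icache_block_base(faulty_addr, block_size);
}
```

With 64-byte blocks, a fetch from 0x1204 triggers a load of the block starting at 0x1200, so a faulty address 0x123C in the same block makes the fetch fail, whereas a faulty address 0x1240 in the next block does not.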

2.5.5. Processor-External Memory Interface (WISHBONE) (AXI4-Lite)

Hardware source file(s):

neorv32_wishbone.vhd

Software driver file(s):

none

implicitly used

Top entity port:

wb_tag_o

request tag output (3-bit)

wb_adr_o

address output (32-bit)

wb_dat_i

data input (32-bit)

wb_dat_o

data output (32-bit)

wb_we_o

write enable (1-bit)

wb_sel_o

byte enable (4-bit)

wb_stb_o

strobe (1-bit)

wb_cyc_o

valid cycle (1-bit)

wb_lock_o

exclusive access request (1-bit)

wb_ack_i

acknowledge (1-bit)

wb_err_i

bus error (1-bit)

fence_o

an executed fence instruction

fencei_o

an executed fence.i instruction

Configuration generics:

MEM_EXT_EN

enable external memory interface when true

MEM_EXT_TIMEOUT

number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled)

Configuration constants in VHDL package file neorv32_package.vhd:

wb_pipe_mode_c

when false (default): classic/standard Wishbone protocol; when true: pipelined Wishbone protocol

wb_big_endian_c

byte-order (Endianness) of external memory interface; true=BIG, false=little (default)

wb_rx_buffer_c

enable register buffer for RX path (default)

CPU interrupts:

none

The external memory interface uses the Wishbone interface protocol. The external interface port is available when the MEM_EXT_EN generic is true. This interface can be used to attach external memories, custom hardware accelerators, additional IO devices or any other kind of IP block. All memory accesses from the CPU that do not target the internal bootloader ROM, the internal IO region or the internal data/instruction memories (if implemented at all) are forwarded to the Wishbone gateway and thus to the external memory interface.

When using the default processor setup, all access addresses between 0x00000000 and 0xffff0000 (= beginning of processor-internal BOOT ROM) are delegated to the external memory / bus interface if they are not targeting the (actually enabled/implemented) processor-internal instruction memory (IMEM) or the (actually enabled/implemented) processor-internal data memory (DMEM). See section Address Space for more information.

Wishbone Bus Protocol

The external memory interface either uses standard ("classic") Wishbone transactions (default) or pipelined Wishbone transactions. The transaction protocol is configured via the wb_pipe_mode_c constant in the main VHDL package file (rtl/core/neorv32_package.vhd):

-- external bus interface --
constant wb_pipe_mode_c : boolean := false;

When wb_pipe_mode_c is disabled (false), all bus control signals including STB are active (and stable) until the transfer is acknowledged/terminated. If wb_pipe_mode_c is enabled (true), all bus control signals except STB are active (and stable) until the transfer is acknowledged/terminated. In this case, STB is active only during the very first bus clock cycle.
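
The difference between the two modes is confined to the STB signal, which the following C sketch renders as a per-cycle waveform string. This is a host-side illustration only; wb_stb_waveform is a hypothetical helper, and n_cycles stands for the number of bus cycles up to and including the cycle in which the transfer is acknowledged or terminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Write the STB level ('1' or '0') for each of n_cycles bus cycles into out,
   followed by a terminating NUL. out must hold at least n_cycles + 1 chars.
   pipe_mode mirrors the wb_pipe_mode_c constant. */
void wb_stb_waveform(bool pipe_mode, size_t n_cycles, char *out)
{
    assert(out != NULL && n_cycles >= 1);
    for (size_t i = 0; i < n_cycles; i++) {
        /* classic: STB stays high; pipelined: STB is high in cycle 0 only */
        bool stb = pipe_mode ? (i == 0) : true;
        out[i] = stb ? '1' : '0';
    }
    out[n_cycles] = '\0';
}
```

For a transfer that takes four cycles, the classic mode yields "1111" and the pipelined mode yields "1000".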

Table 8. Exemplary Wishbone bus accesses using "classic" and "pipelined" protocol

Classic Wishbone read access

Pipelined Wishbone write access

A detailed description of the implemented Wishbone bus protocol and the according interface signals can be found in the data sheet "Wishbone B4 – WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores". A copy of this document can be found in the docs folder of this project.

Interface Latency

By default, the Wishbone gateway introduces two additional latency cycles: processor-outgoing ("TX") and processor-incoming ("RX") signals are fully registered. Thus, any access from the CPU to a processor-external device via Wishbone requires at least 2 additional clock cycles (more, depending on the device’s latency).

If the attached Wishbone network / peripheral already provides output registers or if the Wishbone network is not relevant for timing closure, the default buffering of incoming ("RX") data within the gateway can be disabled. The configuration is done via the wb_rx_buffer_c constant in the main VHDL package file (rtl/neorv32_package.vhd):

-- external bus interface --
constant wb_rx_buffer_c : boolean := false; -- false to implement "async" RX (non-default)

Bus Access Timeout

The Wishbone bus interface provides an option to configure a bus access timeout counter. The MEM_EXT_TIMEOUT top generic is used to specify the maximum time (in clock cycles) a bus access can be pending before it is automatically terminated. If MEM_EXT_TIMEOUT is set to zero, the timeout is disabled and a bus access can take an arbitrary number of cycles to complete.

When MEM_EXT_TIMEOUT is greater than zero, the Wishbone adapter starts an internal countdown whenever the CPU accesses a memory address via the external memory interface. If the accessed memory / device does not acknowledge (via wb_ack_i) or terminate (via wb_err_i) the transfer within MEM_EXT_TIMEOUT clock cycles, the bus access is automatically canceled (setting wb_cyc_o low again) and a load/store/instruction fetch bus access fault exception is raised.

This feature can be used as a safety guard if the external memory system does not check for "address space holes": accesses to addresses that do not belong to any memory or device then cannot permanently stall the processor with an unacknowledged/unterminated bus access. If the external memory system can guarantee to acknowledge or terminate any bus access (even if it targets an unimplemented address), the timeout feature should be disabled (MEM_EXT_TIMEOUT = 0).

Wishbone Tag

The 3-bit Wishbone wb_tag_o signal provides additional information regarding the access type. This signal is compatible with the AXI4 AxPROT signal.

  • wb_tag_o(0) 1: privileged access (CPU is in machine mode); 0: unprivileged access

  • wb_tag_o(1) always zero (indicating "secure access")

  • wb_tag_o(2) 1: instruction fetch access, 0: data access
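
The tag encoding listed above can be illustrated with a small C helper. This is only a sketch for a software model or testbench: the type wb_tag_info_t and the function wb_tag_decode are hypothetical names introduced here and are not part of the NEORV32 sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch; not part of the NEORV32 sources. */
typedef struct {
    bool privileged;   /* wb_tag_o(0): 1 = privileged (machine-mode) access */
    bool non_secure;   /* wb_tag_o(1): always 0, indicating "secure access" */
    bool instruction;  /* wb_tag_o(2): 1 = instruction fetch, 0 = data access */
} wb_tag_info_t;

/* Split a 3-bit tag value into its three flags. */
static wb_tag_info_t wb_tag_decode(uint8_t tag)
{
    wb_tag_info_t info;
    info.privileged  = (tag & 0x1u) != 0u;
    info.non_secure  = (tag & 0x2u) != 0u;
    info.instruction = (tag & 0x4u) != 0u;
    return info;
}
```

For example, a tag value of 0b101 decodes to a privileged instruction fetch.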

Exclusive / Atomic Bus Access

If the atomic memory access CPU extension (via CPU_EXTENSION_RISCV_A) is enabled, the CPU can request an atomic/exclusive bus access via the external memory interface.

The load-reserved instruction (lr.w) will set the wb_lock_o signal, telling the bus interconnect to establish a reservation for the currently accessed address (start of an exclusive access). This signal will stay asserted until another memory access instruction is executed (for example a sc.w).

The memory system has to make sure that no other entity can access the reserved address until wb_lock_o is released again. If this attempt fails, the memory system has to assert wb_err_i in order to indicate that the reservation was broken.

See section Bus Interface for the CPU bus interface protocol.

Endianness

The NEORV32 CPU and the Processor setup are little-endian architectures. To allow direct connection to a big-endian memory system the external bus interface provides an Endianness configuration. The Endianness (of the external memory interface) can be configured via the global wb_big_endian_c constant in the main VHDL package file (rtl/neorv32_package.vhd). By default, the external memory interface uses little-endian byte-order.

-- external bus interface --
constant wb_big_endian_c : boolean := true;

Application software can check the Endianness configuration of the external bus interface via the SYSINFO_FEATURES_MEM_EXT_ENDIAN flag in the processor’s SYSINFO module (see section System Configuration Information Memory (SYSINFO) for more information).
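
To make the difference between the two byte orders concrete, the following C sketch reverses the byte order of a 32-bit word. It is only a software model of what a little-endian/big-endian conversion means for a data word; the function name swap_bytes32 is an assumption of this example and does not appear in the NEORV32 software library.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word (little-endian <-> big-endian). */
static uint32_t swap_bytes32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) <<  8) |
           ((w & 0x00FF0000u) >>  8) |
           ((w & 0xFF000000u) >> 24);
}
```

Applying the function twice returns the original word, so the same routine converts in both directions.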

AXI4-Lite Connectivity

The AXI4-Lite wrapper (rtl/templates/system/neorv32_SystemTop_axi4lite.vhd) provides a Wishbone-to-AXI4-Lite bridge, compatible with Xilinx Vivado (IP packager and block design editor). All entity signals of this wrapper are of type std_logic or std_logic_vector, respectively.

The AXI Interface has been verified using Xilinx Vivado IP Packager and Block Designer. The AXI interface port signals are automatically detected when packaging the core.

Figure 5. Example AXI SoC using Xilinx Vivado

Using the auto-termination timeout feature (MEM_EXT_TIMEOUT greater than zero) is not AXI4 compliant as the AXI protocol does not support canceling of bus transactions. Therefore, the NEORV32 top wrapper with AXI4-Lite interface (rtl/templates/system/neorv32_SystemTop_axi4lite) configures MEM_EXT_TIMEOUT = 0 by default.

2.5.6. General Purpose Input and Output Port (GPIO)

Hardware source file(s):

neorv32_gpio.vhd

Software driver file(s):

neorv32_gpio.c

neorv32_gpio.h

Top entity port:

gpio_o

32-bit parallel output port

gpio_i

32-bit parallel input port

Configuration generics:

IO_GPIO_EN

implement GPIO port when true

CPU interrupts:

FIRQ channel 8

pin-change interrupt (see Processor Interrupts)

Theory of Operation

The general purpose parallel IO port unit provides a simple 32-bit parallel input port and a 32-bit parallel output port. These ports can be used chip-externally (for example to drive status LEDs, connect buttons, etc.) or system-internally to provide control signals for other IP modules. When the module is disabled for implementation, the GPIO output port is tied to zero.

Pin-Change Interrupt

The parallel input port gpio_i features a single pin-change interrupt. Whenever an input pin has a low-to-high or high-to-low transition, the interrupt is triggered. By default, the pin-change interrupt is disabled; it is enabled by writing a bit mask to the GPIO_INPUT register. Each set bit in this mask enables the pin-change interrupt for the corresponding input pin. If more than one input pin is enabled for triggering the pin-change interrupt, any transition on one of the enabled input pins will trigger the CPU’s pin-change interrupt. If the module is disabled for implementation, the pin-change interrupt is also permanently disabled.
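
The trigger condition can be summarized as "any enabled pin changed between two samples". The following C function is a minimal behavioral model of that rule, not the hardware implementation; the function name and the idea of comparing two consecutive input samples are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if at least one enabled input pin differs between two
 * consecutive samples of gpio_i (either edge direction counts). */
static bool gpio_pin_change_irq(uint32_t prev_in, uint32_t curr_in, uint32_t enable_mask)
{
    return ((prev_in ^ curr_in) & enable_mask) != 0u;
}
```

With enable_mask = 0x00000001, only a transition on pin 0 triggers the interrupt; changes on all other pins are ignored.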

Table 9. GPIO unit register map
Address     Name [C]     Bit(s)  R/W  Function
0xffffff80  GPIO_INPUT   31:0    r/-  parallel input port
                         31:0    -/w  parallel input pin-change IRQ enable mask
0xffffff84  GPIO_OUTPUT  31:0    r/w  parallel output port

2.5.7. Watchdog Timer (WDT)

Hardware source file(s):

neorv32_wdt.vhd

Software driver file(s):

neorv32_wdt.c

neorv32_wdt.h

Top entity port:

none

Configuration generics:

IO_WDT_EN

implement watchdog timer when true

CPU interrupts:

FIRQ channel 0

watchdog timer overflow (see Processor Interrupts)

Theory of Operation

The watchdog (WDT) provides a last resort for safety-critical applications. The WDT has an internal 20-bit wide counter that needs to be reset every now and then by the user program. If the counter overflows, either a system reset or an interrupt is generated (depending on the configured operation mode).

Configuration of the watchdog is done by a single control register WDT_CT. The watchdog is enabled by setting the WDT_CT_EN bit. The clock used to increment the internal counter is selected via the 3-bit WDT_CT_CLK_SELx prescaler:

WDT_CT_CLK_SELx  Main clock prescaler  Timeout period in clock cycles
0b000            2                     2 097 152
0b001            4                     4 194 304
0b010            8                     8 388 608
0b011            64                    67 108 864
0b100            128                   134 217 728
0b101            1024                  1 073 741 824
0b110            2048                  2 147 483 648
0b111            4096                  4 294 967 296
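
The timeout periods in the table follow from the 20-bit counter: the period equals the clock prescaler multiplied by 2^20. The C sketch below reproduces these numbers and converts them into milliseconds for a given main clock. The function names and the use of milliseconds are choices of this example, not part of the NEORV32 WDT driver.

```c
#include <stdint.h>

/* Clock prescalers selected by WDT_CT_CLK_SELx = 0b000 .. 0b111 (table above). */
static const uint32_t wdt_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* Timeout period in main clock cycles: prescaler * 2^20 (20-bit counter). */
static uint64_t wdt_timeout_cycles(unsigned clk_sel)
{
    return (uint64_t)wdt_prescaler[clk_sel & 7u] << 20;
}

/* Timeout period in milliseconds (rounded down) for a main clock given in Hz. */
static uint64_t wdt_timeout_ms(unsigned clk_sel, uint32_t f_main_hz)
{
    return (wdt_timeout_cycles(clk_sel) * 1000u) / f_main_hz;
}
```

At an assumed main clock of 100 MHz, the shortest timeout (0b000) is about 20 ms and the longest (0b111) is about 42.9 s.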

Whenever the internal timer overflows the watchdog executes one of two possible actions: Either a hard processor reset is triggered or an interrupt is requested at CPU’s fast interrupt channel #0. The WDT_CT_MODE bit defines the action to be taken on an overflow: When cleared, the Watchdog will trigger an IRQ, when set the WDT will cause a system reset. The configured actions can also be triggered manually at any time by setting the WDT_CT_FORCE bit. The watchdog is reset by setting the WDT_CT_RESET bit.

The cause of the last action of the watchdog can be determined via the WDT_CT_RCAUSE flag. If this flag is zero, the processor has been reset via the external reset signal. If this flag is set the last system reset was initiated by the watchdog.

The watchdog control register can be locked in order to protect the current configuration. The lock is activated by setting bit WDT_CT_LOCK. In the locked state, any write access to the configuration flags is ignored (see table below, "Writable if locked"). Read accesses to the control register are not affected. The lock can only be removed by a system reset (via external reset signal or via a watchdog reset action).

Table 10. WDT register map
Address     Name [C]  Bit(s), Name [C]   R/W  Writable if locked  Function
0xffffff8c  WDT_CT    0 WDT_CT_EN        r/w  no                  watchdog enable
                      1 WDT_CT_CLK_SEL0  r/w  no                  3-bit clock prescaler select
                      2 WDT_CT_CLK_SEL1  r/w  no
                      3 WDT_CT_CLK_SEL2  r/w  no
                      4 WDT_CT_MODE      r/w  no                  overflow action: 1=reset, 0=IRQ
                      5 WDT_CT_RCAUSE    r/-  -                   cause of last system reset: 0=caused by external reset signal, 1=caused by watchdog
                      6 WDT_CT_RESET     -/w  yes                 watchdog reset when set, auto-clears
                      7 WDT_CT_FORCE     -/w  yes                 force configured watchdog action when set, auto-clears
                      8 WDT_CT_LOCK      r/w  no                  lock access to configuration when set, clears only on system reset (via external reset signal OR watchdog reset action = reset)

2.5.8. Machine System Timer (MTIME)

Hardware source file(s):

neorv32_mtime.vhd

Software driver file(s):

neorv32_mtime.c

neorv32_mtime.h

Top entity port:

mtime_i

System time input from external MTIME

mtime_o

System time output (64-bit) for SoC

Configuration generics:

IO_MTIME_EN

implement MTIME when true

CPU interrupts:

MTI

machine timer interrupt (see Processor Interrupts)

Theory of Operation

The MTIME machine system timer implements the memory-mapped MTIME timer from the official RISC-V specifications. This unit features a 64-bit system timer incremented with the primary processor clock. The current system time can also be obtained using the time[h] CSRs and is made available for processor-external use via the top’s mtime_o signal.

If the processor-internal MTIME unit is NOT implemented, the top’s mtime_i input signal is used to update the time[h] CSRs and the MTI (machine timer interrupt) CPU interrupt is directly connected to the top’s mtime_irq_i input.

The 64-bit system time can be accessed via the MTIME_LO and MTIME_HI memory-mapped registers (read/write) and also via the CPU’s time[h] CSRs (read-only). A 64-bit time compare register, accessible via the memory-mapped MTIMECMP_LO and MTIMECMP_HI registers, is used to configure an interrupt to the CPU. The interrupt is triggered whenever MTIME (high & low part) >= MTIMECMP (high & low part) and is directly forwarded to the CPU’s MTI interrupt.

The interrupt request is a single-shot signal, so the CPU is triggered once if the system time is greater than or equal to the compare time. Hence, another MTIME IRQ is only possible when updating MTIMECMP.

The 64-bit counter and the 64-bit comparator are implemented as 2×32-bit counters and comparators with a registered carry to prevent a 64-bit carry chain and thus, to simplify timing closure.
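
Because the 64-bit time is exposed as two 32-bit registers, software that reads MTIME_LO and MTIME_HI separately has to cope with a carry between the two reads. The C sketch below shows the common "read high, read low, re-read high" technique together with the compare condition stated above. The struct mtime_regs_t is a stand-in for the register pair, and both function names are hypothetical; this is not necessarily how neorv32_mtime.c is implemented.

```c
#include <stdint.h>

/* Stand-in for the MTIME_LO / MTIME_HI register pair (hypothetical type). */
typedef struct {
    volatile uint32_t lo;
    volatile uint32_t hi;
} mtime_regs_t;

/* Read a consistent 64-bit time value from two 32-bit halves.
 * The high word is read before and after the low word; if it changed,
 * the low word wrapped around in between and the read is repeated. */
static uint64_t mtime_read64(const mtime_regs_t *regs)
{
    uint32_t hi, lo;
    do {
        hi = regs->hi;
        lo = regs->lo;
    } while (hi != regs->hi);
    return ((uint64_t)hi << 32) | lo;
}

/* Interrupt condition as described in the text: MTIME >= MTIMECMP. */
static int mtime_irq_condition(uint64_t mtime, uint64_t mtimecmp)
{
    return mtime >= mtimecmp;
}
```

On real hardware, a pointer to the memory-mapped registers would take the place of the stand-in struct.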

Table 11. MTIME register map
Address     Name [C]     Bits  R/W  Function
0xffffff90  MTIME_LO     31:0  r/w  machine system time, low word
0xffffff94  MTIME_HI     31:0  r/w  machine system time, high word
0xffffff98  MTIMECMP_LO  31:0  r/w  time compare, low word
0xffffff9c  MTIMECMP_HI  31:0  r/w  time compare, high word

2.5.9. Primary Universal Asynchronous Receiver and Transmitter (UART0)

Hardware source file(s):

neorv32_uart.vhd

Software driver file(s):

neorv32_uart.c

neorv32_uart.h

Top entity port:

uart0_txd_o

serial transmitter output UART0

uart0_rxd_i

serial receiver input UART0

uart0_rts_o

flow control: RX ready to receive

uart0_cts_i

flow control: TX allowed to send

Configuration generics:

IO_UART0_EN

implement UART0 when true

CPU interrupts:

fast IRQ channel 2

RX done interrupt

fast IRQ channel 3

TX done interrupt (see Processor Interrupts)

Please note that ALL default example programs and software libraries of the NEORV32 software framework (including the bootloader and the runtime environment) use the primary UART (UART0) as default user console interface. For compatibility, all C-language function calls to neorv32_uart_* are mapped to the according primary UART (UART0) neorv32_uart0_* functions.

Theory of Operation

In most cases, the UART is a standard interface used to establish a communication channel between the computer/user and an application running on the processor platform. The NEORV32 UARTs feature a standard frame configuration: 8 data bits, an optional parity bit (even or odd) and 1 stop bit. The parity and the actual Baudrate are configurable by software.

The UART0 is enabled by setting the UART_CT_EN bit in the UART control register UART0_CT. The actual transmission Baudrate (like 19200) is configured via the 12-bit UART_CT_BAUDxx baud prescaler (baud_rate) and the 3-bit UART_CT_PRSCx clock prescaler.

Table 12. UART prescaler configuration
UART_CT_PRSCx              0b000  0b001  0b010  0b011  0b100  0b101  0b110  0b111
Resulting clock_prescaler  2      4      8      64     128    1024   2048   4096

Baudrate = (fmain[Hz] / clock_prescaler) / (baud_rate + 1)
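
The formula can be inverted to find a configuration for a desired Baudrate. The C sketch below picks the smallest clock prescaler for which baud_rate still fits into the 12-bit UART_CT_BAUDxx field. The type uart_baud_cfg_t, the function names and the "smallest prescaler first" strategy are assumptions of this example; the setup routine of the NEORV32 UART driver may work differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clock prescalers selected by UART_CT_PRSCx = 0b000 .. 0b111 (Table 12). */
static const uint32_t uart_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* Hypothetical result type for this sketch. */
typedef struct {
    uint8_t  prsc;       /* value for the 3-bit UART_CT_PRSCx field */
    uint16_t baud_rate;  /* value for the 12-bit UART_CT_BAUDxx field */
} uart_baud_cfg_t;

/* Find the smallest prescaler for which baud_rate fits into 12 bits.
 * Returns false if the requested Baudrate cannot be represented. */
static bool uart_baud_config(uint32_t f_main_hz, uint32_t baud, uart_baud_cfg_t *cfg)
{
    if (baud == 0u) {
        return false;
    }
    for (uint8_t p = 0; p < 8; p++) {
        uint64_t denom = (uint64_t)uart_prescaler[p] * baud;
        uint64_t div = f_main_hz / denom; /* corresponds to baud_rate + 1 */
        if (div >= 1u && div <= 4096u) {
            cfg->prsc = p;
            cfg->baud_rate = (uint16_t)(div - 1u);
            return true;
        }
    }
    return false;
}

/* Baudrate that results from a configuration, using the formula above. */
static uint32_t uart_actual_baud(uint32_t f_main_hz, uart_baud_cfg_t cfg)
{
    return (f_main_hz / uart_prescaler[cfg.prsc & 7u]) / ((uint32_t)cfg.baud_rate + 1u);
}
```

For an assumed main clock of 100 MHz and a target of 19200 Baud, this yields UART_CT_PRSCx = 0b000 and baud_rate = 2603, which corresponds to roughly 19201 Baud.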

A new transmission is started by writing the data byte to be sent to the lowest byte of the UART0_DATA register. The transfer is completed when the UART_CT_TX_BUSY control register flag returns to zero. A newly received byte is available when the UART_DATA_AVAIL flag of the UART0_DATA register is set. A "frame error" in a received byte (broken stop bit) is indicated via the UART_DATA_FERR flag in the UART0_DATA register.

RX Double-Buffering

The UART receive engine provides a simple data buffer with two entries. These two entries are transparent for the user. The transmitting device can send up to 2 chars to the UART without risking data loss. If another char is sent before at least one char has been read from the buffer data loss occurs. This situation can be detected via the receiver overrun flag UART_DATA_OVERR in the UART0_DATA register. The flag is automatically cleared after reading UART0_DATA.

Parity Modes

A parity bit is added to each frame if the UART_CT_PMODE1 flag is set. When UART_CT_PMODE0 is zero, the UART operates in "even parity" mode. If this flag is set, the UART operates in "odd parity" mode. Parity errors in received data are indicated via the UART_DATA_PERR flag in the UART0_DATA register. This flag is updated with each new received character. A frame error in the received data (i.e. stop bit is not set) is indicated via the UART_DATA_FERR flag in the UART0_DATA register. This flag is also updated with each new received character.

Hardware Flow Control – RTS/CTS

The UART supports hardware flow control using the standard CTS (clear to send) and/or RTS (ready to send / ready to receive "RTR") signals. Both hardware control flow mechanisms can be individually enabled.

If RTS hardware flow control is enabled by setting the UART_CT_RTS_EN control register flag, the UART will pull the uart0_rts_o signal low if the UART’s receiver is idle and no received data is waiting to get read by application software. As long as this signal is low the connected device can send new data. uart0_rts_o is always LOW if the UART is disabled.

The RTS line is de-asserted (going high) as soon as the start bit of a new incoming char has been detected. The transmitting device continues sending the current char and can also send another char (due to the RX double-buffering), which is done by most terminal programs. Any additional data sent while RTS is still de-asserted (high) will overwrite the RX input buffer, causing data loss. This will set the UART_DATA_OVERR flag in the UART0_DATA register. Any read access to this register clears the flag again.

If CTS hardware flow control is enabled by setting the UART_CT_CTS_EN control register flag, the UART’s transmitter will not start sending a new char until the uart0_cts_i signal goes low. If new data to be sent is written to the UART data register while uart0_cts_i is high, the UART will wait for uart0_cts_i to go low before sending starts. During this time, the UART busy flag UART_CT_TX_BUSY remains set.

As long as uart0_cts_i is high, no new data transmission will be started by the UART. Signal changes on uart0_cts_i during an active transmission are ignored, so the state of uart0_cts_i has no effect on a transmission that is already in progress. Application software can check the current state of the uart0_cts_i input signal via the UART_CT_CTS control register flag.

Please note that – just like the RXD and TXD signals – the RTS and CTS signals have to be cross-coupled between devices.

Interrupts

The UART features two interrupts: the "TX done interrupt" is triggered when a transmit operation (sending) has finished. The "RX done interrupt" is triggered when a data byte has been received. If the UART0 is not implemented, the UART0 interrupts are permanently tied to zero.

The UART’s RX interrupt is always triggered when a new data word has arrived – regardless of the state of the RX double-buffer.

Simulation Mode

The default UART0 operation will transmit any data written to the UART0_DATA register via the serial TX line at the defined baud rate. Even though the default testbench provides a simulated UART0 receiver, which outputs any received char to the simulator console, such a transmission takes a lot of time. To accelerate UART0 output during simulation (and also to dump large amounts of data for further processing like verification) the UART0 features a simulation mode.

The simulation mode is enabled by setting the UART_CT_SIM_MODE bit in the UART0’s control register UART0_CT. Any other UART0 configuration bits are irrelevant, but the UART0 has to be enabled via the UART_CT_EN bit. When the simulation mode is enabled, any written char to UART0_DATA (bits 7:0) is directly output as ASCII char to the simulator console. Additionally, all text is also stored to a text file neorv32.uart0.sim_mode.text.out in the simulation home folder. Furthermore, the whole 32-bit word written to UART0_DATA is stored as plain 8-char hexadecimal value to a second text file neorv32.uart0.sim_mode.data.out also located in the simulation home folder.

If the UART is configured for simulation mode there will be NO physical UART0 transmissions via uart0_txd_o at all. Furthermore, no interrupts (RX done or TX done) will be triggered in any situation.

More information regarding the simulation mode of the UART0 can be found in section Simulating the Processor.

Table 13. UART0 register map
Address     Name [C]    Bit(s), Name [C]                   R/W  Function
0xffffffa0  UART0_CT    11:0 UART_CT_BAUDxx                r/w  12-bit BAUD configuration value
                        12 UART_CT_SIM_MODE                r/w  enable simulation mode
                        20 UART_CT_RTS_EN                  r/w  enable RTS hardware flow control
                        21 UART_CT_CTS_EN                  r/w  enable CTS hardware flow control
                        22 UART_CT_PMODE0                  r/w  parity bit enable and configuration (00/01=no parity; 10=even parity; 11=odd parity)
                        23 UART_CT_PMODE1                  r/w
                        24 UART_CT_PRSC0                   r/w  3-bit baudrate clock prescaler select
                        25 UART_CT_PRSC1                   r/w
                        26 UART_CT_PRSC2                   r/w
                        27 UART_CT_CTS                     r/-  current state of UART’s CTS input signal
                        28 UART_CT_EN                      r/w  UART enable
                        31 UART_CT_TX_BUSY                 r/-  transmitter busy flag
0xffffffa4  UART0_DATA  7:0 UART_DATA_MSB : UART_DATA_LSB  r/w  receive/transmit data (8-bit)
                        31:0 -                             -/w  simulation data output
                        28 UART_DATA_PERR                  r/-  RX parity error
                        29 UART_DATA_FERR                  r/-  RX data frame error (stop bit not set)
                        30 UART_DATA_OVERR                 r/-  RX data overrun
                        31 UART_DATA_AVAIL                 r/-  RX data available when set
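
Since reading UART0_DATA clears the overrun flag, it is safest to read the register once and then evaluate all flags from that single value. The C sketch below decodes such a 32-bit word. The bit positions are copied from the register map above but are defined locally for this example rather than taken from the NEORV32 header files; the type uart_rx_t and the function uart_decode_rx are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the UART0_DATA flags, copied from the register map above.
 * Defined locally for this sketch, not taken from the NEORV32 headers. */
enum {
    UART_DATA_PERR  = 28, /* RX parity error */
    UART_DATA_FERR  = 29, /* RX frame error */
    UART_DATA_OVERR = 30, /* RX data overrun */
    UART_DATA_AVAIL = 31  /* RX data available */
};

/* Hypothetical result type for this sketch. */
typedef struct {
    bool    avail;
    bool    parity_error;
    bool    frame_error;
    bool    overrun;
    uint8_t data;
} uart_rx_t;

/* Decode one 32-bit word that was read from the data register. */
static uart_rx_t uart_decode_rx(uint32_t word)
{
    uart_rx_t rx;
    rx.avail        = ((word >> UART_DATA_AVAIL) & 1u) != 0u;
    rx.parity_error = ((word >> UART_DATA_PERR)  & 1u) != 0u;
    rx.frame_error  = ((word >> UART_DATA_FERR)  & 1u) != 0u;
    rx.overrun      = ((word >> UART_DATA_OVERR) & 1u) != 0u;
    rx.data         = (uint8_t)(word & 0xFFu);
    return rx;
}
```

A word of 0x80000041, for example, decodes to an available character 'A' (0x41) with no error flags set.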

2.5.10. Secondary Universal Asynchronous Receiver and Transmitter (UART1)

Hardware source file(s):

neorv32_uart.vhd

Software driver file(s):

neorv32_uart.c

neorv32_uart.h

Top entity port:

uart1_txd_o

serial transmitter output UART1

uart1_rxd_i

serial receiver input UART1

uart1_rts_o

flow control: RX ready to receive

uart1_cts_i

flow control: TX allowed to send

Configuration generics:

IO_UART1_EN

implement UART1 when true

CPU interrupts:

fast IRQ channel 4

RX done interrupt

fast IRQ channel 5

TX done interrupt (see Processor Interrupts)

Theory of Operation

The secondary UART (UART1) is functionally identical to the primary UART (see Primary Universal Asynchronous Receiver and Transmitter (UART0)). UART1 has different addresses for the control register (UART1_CT) and the data register (UART1_DATA); see the register map below. However, the register bits/flags use the same bit positions and naming. Furthermore, the "RX done" and "TX done" interrupts are mapped to different CPU fast interrupt channels.

Simulation Mode

The secondary UART (UART1) provides the same simulation options as the primary UART. However, output data is written to UART1-specific files: neorv32.uart1.sim_mode.text.out is used to store plain ASCII text and neorv32.uart1.sim_mode.data.out is used to store full 32-bit hexadecimal encoded data words.

Table 14. UART1 register map
Address     Name [C]    Bit(s), Name [C]                   R/W  Function
0xffffffd0  UART1_CT    11:0 UART_CT_BAUDxx                r/w  12-bit BAUD configuration value
                        12 UART_CT_SIM_MODE                r/w  enable simulation mode
                        20 UART_CT_RTS_EN                  r/w  enable RTS hardware flow control
                        21 UART_CT_CTS_EN                  r/w  enable CTS hardware flow control
                        22 UART_CT_PMODE0                  r/w  parity bit enable and configuration (00/01=no parity; 10=even parity; 11=odd parity)
                        23 UART_CT_PMODE1                  r/w
                        24 UART_CT_PRSC0                   r/w  3-bit baudrate clock prescaler select
                        25 UART_CT_PRSC1                   r/w
                        26 UART_CT_PRSC2                   r/w
                        27 UART_CT_CTS                     r/-  current state of UART’s CTS input signal
                        28 UART_CT_EN                      r/w  UART enable
                        31 UART_CT_TX_BUSY                 r/-  transmitter busy flag
0xffffffd4  UART1_DATA  7:0 UART_DATA_MSB : UART_DATA_LSB  r/w  receive/transmit data (8-bit)
                        31:0 -                             -/w  simulation data output
                        28 UART_DATA_PERR                  r/-  RX parity error
                        29 UART_DATA_FERR                  r/-  RX data frame error (stop bit not set)
                        30 UART_DATA_OVERR                 r/-  RX data overrun
                        31 UART_DATA_AVAIL                 r/-  RX data available when set

2.5.11. Serial Peripheral Interface Controller (SPI)

Hardware source file(s):

neorv32_spi.vhd

Software driver file(s):

neorv32_spi.c

neorv32_spi.h

Top entity port:

spi_sck_o

1-bit serial clock output

spi_sdo_o

1-bit serial data output

spi_sdi_i

1-bit serial data input

spi_csn_o

8-bit dedicated chip select (low-active)

Configuration generics:

IO_SPI_EN

implement SPI controller when true

CPU interrupts:

fast IRQ channel 6

transmission done interrupt (see Processor Interrupts)

Theory of Operation

SPI is a synchronous serial transmission interface. The NEORV32 SPI transceiver allows 8-, 16-, 24- and 32-bit long transmissions. The unit provides 8 dedicated chip select signals via the top entity’s spi_csn_o signal.

The SPI unit is enabled via the SPI_CT_EN bit in the SPI_CT control register. The idle clock polarity is configured via the SPI_CT_CPHA bit and can be low (0) or high (1) during idle. The data quantity to be transferred within a single transmission is defined via the SPI_CT_SIZEx bits. The unit supports 8-bit (00), 16-bit (01), 24-bit (10) and 32-bit (11) transfers. Whenever a transfer is completed, the "transmission done interrupt" is triggered. A transmission is still in progress as long as the SPI_CT_BUSY flag is set.

The SPI controller features 8 dedicated chip-select lines. These lines are controlled via the control register’s SPI_CT_CSx bits. When a specific SPI_CT_CSx bit is set, the corresponding chip select line spi_csn_o(x) goes low (low-active chip select lines).

The SPI clock frequency is defined via the 3-bit SPI_CT_PRSCx clock prescaler. The following prescalers are available:

Table 15. SPI prescaler configuration
SPI_CT_PRSCx               0b000  0b001  0b010  0b011  0b100  0b101  0b110  0b111
Resulting clock_prescaler  2      4      8      64     128    1024   2048   4096

Based on the SPI_CT_PRSCx configuration, the actual SPI clock frequency fSPI is derived from the processor’s main clock fmain and is determined by:

fSPI = fmain[Hz] / (2 * clock_prescaler)

A transmission is started when writing data to the SPI_DATA register. The data must be LSB-aligned: if the SPI transceiver is configured for transfers of less than 32 bits, the transmit data must be placed into the lowest 8/16/24 bits of SPI_DATA. Likewise, the received data is always LSB-aligned.
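
The clock formula and the LSB alignment rule can be captured in two small C helpers. This is an illustrative sketch only; the function names are hypothetical and not part of neorv32_spi.c.

```c
#include <stdint.h>

/* Clock prescalers selected by SPI_CT_PRSCx = 0b000 .. 0b111 (Table 15). */
static const uint32_t spi_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* fSPI = fmain / (2 * clock_prescaler), rounded down to whole Hz. */
static uint32_t spi_clock_hz(uint32_t f_main_hz, unsigned prsc)
{
    return f_main_hz / (2u * spi_prescaler[prsc & 7u]);
}

/* Mask covering the LSB-aligned payload; size_sel is the 2-bit SPI_CT_SIZEx
 * value (00 = 8-bit, 01 = 16-bit, 10 = 24-bit, 11 = 32-bit). */
static uint32_t spi_data_mask(unsigned size_sel)
{
    unsigned bits = 8u * ((size_sel & 3u) + 1u);
    return (bits == 32u) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}
```

With an assumed 100 MHz main clock, the fastest setting (0b000) gives a 25 MHz SPI clock. Received data can be ANDed with spi_data_mask() to discard the bits above the configured transfer size.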

Table 16. SPI register map
Address     Name [C]  Bit(s), Name [C]  R/W  Function
0xffffffa8  SPI_CT    0 SPI_CT_CS0      r/w  Direct chip-select 0..7; setting spi_csn_o(x) low when set
                      1 SPI_CT_CS1      r/w
                      2 SPI_CT_CS2      r/w
                      3 SPI_CT_CS3      r/w
                      4 SPI_CT_CS4      r/w
                      5 SPI_CT_CS5      r/w
                      6 SPI_CT_CS6      r/w
                      7 SPI_CT_CS7      r/w
                      8 SPI_CT_EN       r/w  SPI enable
                      9 SPI_CT_CPHA     r/w  polarity of spi_sck_o when idle
                      10 SPI_CT_PRSC0   r/w  3-bit clock prescaler select
                      11 SPI_CT_PRSC1   r/w
                      12 SPI_CT_PRSC2   r/w
                      14 SPI_CT_SIZE0   r/w  transfer size (00=8-bit, 01=16-bit, 10=24-bit, 11=32-bit)
                      15 SPI_CT_SIZE1   r/w
                      31 SPI_CT_BUSY    r/-  transmission in progress when set
0xffffffac  SPI_DATA  31:0              r/w  receive/transmit data, LSB-aligned

2.5.12. Two-Wire Serial Interface Controller (TWI)

Hardware source file(s):

neorv32_twi.vhd

Software driver file(s):

neorv32_twi.c

neorv32_twi.h

Top entity port:

twi_sda_io

1-bit bi-directional serial data

twi_scl_io

1-bit bi-directional serial clock

Configuration generics:

IO_TWI_EN

implement TWI controller when true

CPU interrupts:

fast IRQ channel 7

transmission done interrupt (see Processor Interrupts)

Theory of Operation

The two-wire interface, also called "I²C", is a widely used interface for connecting several on-board components. Since this interface only needs two signals (the serial data line twi_sda_io and the serial clock line twi_scl_io), regardless of the number of connected devices, it allows easy interconnection of several peripheral nodes.

The NEORV32 TWI implements a TWI controller. It features "clock stretching" (if enabled via the control register), so a slow peripheral can halt the transmission by pulling the SCL line low. Currently, no multi-controller support is available. Also, the NEORV32 TWI unit cannot operate in peripheral mode.

The TWI is enabled via the TWI_CT_EN bit in the TWI_CT control register. The user program can start / stop a transmission by issuing a START or STOP condition. These conditions are generated by setting the according bits (TWI_CT_START or TWI_CT_STOP) in the control register.

Data is sent by writing a byte to the TWI_DATA register. Received data can also be read from this register. The TWI controller is busy (transmitting data or performing a START or STOP condition) as long as the TWI_CT_BUSY bit in the control register is set.

An accessed peripheral has to acknowledge each transferred byte. When the TWI_CT_ACK bit is set after a completed transmission, the accessed peripheral has sent an acknowledge. If it is cleared after a transmission, the peripheral has sent a not-acknowledge (NACK). The NEORV32 TWI controller can also send an ACK by itself ("controller acknowledge MACK") after a transmission by pulling SDA low during the ACK time slot. Set the TWI_CT_MACK bit to activate this feature. If this bit is cleared, the ACK/NACK of the peripheral is sampled in this time slot instead (normal mode).

In summary, the following independent TWI operations can be triggered by the application program:

  • send START condition (also as REPEATED START condition)

  • send STOP condition

  • send (at least) one byte while also sampling one byte from the bus

The serial clock (SCL) and the serial data (SDA) lines can only be actively driven low by the controller. Hence, external pull-up resistors are required for these lines.

The TWI clock frequency is defined via the 3-bit TWI_CT_PRSCx clock prescaler. The following prescalers are available:

Table 17. TWI prescaler configuration
TWI_CT_PRSCx               0b000  0b001  0b010  0b011  0b100  0b101  0b110  0b111
Resulting clock_prescaler  2      4      8      64     128    1024   2048   4096

Based on the TWI_CT_PRSCx configuration, the actual TWI clock frequency fSCL is derived from the processor main clock fmain and is determined by:

fSCL = fmain[Hz] / (4 * clock_prescaler)
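
As a worked example of this formula, the C sketch below computes fSCL for each prescaler setting and selects the fastest setting that stays at or below a requested bus speed (for instance the 100 kHz or 400 kHz commonly used on I²C buses). The function names are hypothetical and the selection strategy is an assumption of this example.

```c
#include <stdint.h>

/* Clock prescalers selected by TWI_CT_PRSCx = 0b000 .. 0b111 (Table 17). */
static const uint32_t twi_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* fSCL = fmain / (4 * clock_prescaler), rounded down to whole Hz. */
static uint32_t twi_scl_hz(uint32_t f_main_hz, unsigned prsc)
{
    return f_main_hz / (4u * twi_prescaler[prsc & 7u]);
}

/* Smallest TWI_CT_PRSCx value whose SCL frequency does not exceed max_scl_hz.
 * Returns -1 if even the largest prescaler is still too fast. */
static int twi_pick_prescaler(uint32_t f_main_hz, uint32_t max_scl_hz)
{
    for (int p = 0; p < 8; p++) {
        if (twi_scl_hz(f_main_hz, (unsigned)p) <= max_scl_hz) {
            return p;
        }
    }
    return -1;
}
```

With an assumed 100 MHz main clock, a 400 kHz limit selects 0b011 (390625 Hz) and a 100 kHz limit selects 0b101 (24414 Hz), because the prescaler steps are coarse.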

Table 18. TWI register map
Address     Name [C]  Bit(s), Name [C]                 R/W  Function
0xffffffb0  TWI_CT    0 TWI_CT_EN                      r/w  TWI enable
                      1 TWI_CT_START                   r/w  generate START condition
                      2 TWI_CT_STOP                    r/w  generate STOP condition
                      3 TWI_CT_PRSC0                   r/w  3-bit clock prescaler select
                      4 TWI_CT_PRSC1                   r/w
                      5 TWI_CT_PRSC2                   r/w
                      6 TWI_CT_MACK                    r/w  generate controller ACK for each transmission ("MACK")
                      7 TWI_CT_CKSTEN                  r/w  allow clock-stretching by peripherals when set
                      30 TWI_CT_ACK                    r/-  ACK received when set
                      31 TWI_CT_BUSY                   r/-  transfer/START/STOP in progress when set
0xffffffb4  TWI_DATA  7:0 TWI_DATA_MSB : TWI_DATA_LSB  r/w  receive/transmit data

2.5.13. Pulse-Width Modulation Controller (PWM)

Hardware source file(s):

neorv32_pwm.vhd

Software driver file(s):

neorv32_pwm.c

neorv32_pwm.h

Top entity port:

pwm_o

up to 60 PWM output channels (1-bit per channel)

Configuration generics:

IO_PWM_NUM_CH

number of PWM channels to implement (0..60)

CPU interrupts:

none

The PWM controller implements a pulse-width modulation controller with up to 60 independent channels and 8-bit resolution per channel. The actual number of implemented channels is defined by the IO_PWM_NUM_CH generic. Setting this generic to zero will completely remove the PWM controller from the design.

The PWM controller is based on an 8-bit base counter with a programmable threshold comparator for each channel that defines the actual duty cycle. The controller can be used to drive RGB LEDs with 24-bit true color, to dim LCD back-lights or even for "analog" control. An external integrator (RC low-pass filter) can be used to smooth the generated "analog" signals.

Theory of Operation

The PWM controller is activated by setting the PWM_CT_EN bit in the module’s control register PWM_CT. When this bit is cleared, the unit is reset and all PWM output channels are set to zero. The duty cycle of each channel, which represents the channel’s "intensity", is defined via an 8-bit value. The module provides up to 15 duty cycle registers PWM_DUTY0 to PWM_DUTY14 (depending on the number of implemented channels). Each register contains the duty cycle configuration for 4 consecutive channels. For example, the duty cycle of channel 0 is defined via bits 7:0 in PWM_DUTY0. The duty cycle of channel 1 is defined via bits 15:8 in PWM_DUTY0. Channel 4’s duty cycle is defined via bits 7:0 in PWM_DUTY1 and so on.
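
The mapping from a channel number to its duty cycle register and bit field follows directly from "4 channels per register, 8 bits per channel". The C sketch below spells this out; the function names are hypothetical and the code operates on a plain 32-bit value instead of the memory-mapped registers.

```c
#include <stdint.h>

/* Index x of the PWM_DUTYx register that holds a channel (4 channels per register). */
static unsigned pwm_duty_reg_index(unsigned channel)
{
    return channel / 4u;
}

/* Bit offset of the channel's 8-bit field inside that register. */
static unsigned pwm_duty_bit_offset(unsigned channel)
{
    return (channel % 4u) * 8u;
}

/* Return reg_value with the duty cycle field of `channel` replaced by `duty`. */
static uint32_t pwm_duty_update(uint32_t reg_value, unsigned channel, uint8_t duty)
{
    unsigned shift = pwm_duty_bit_offset(channel);
    reg_value &= ~(0xFFu << shift);
    reg_value |= (uint32_t)duty << shift;
    return reg_value;
}
```

Channel 59, for example, maps to bits 31:24 of PWM_DUTY14, which matches the last row of the register map.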

Regardless of the configuration of IO_PWM_NUM_CH all module registers can be accessed without raising an exception. Software can discover the number of available channels by writing 0xff to all duty cycle configuration bytes and reading those values back. The duty-cycle of channels that were not implemented always reads as zero.

Based on the configured duty cycle the according intensity of the channel can be computed by the following formula:

Intensity_x = PWM_DUTY_CHx / 2^8

The base frequency of the generated PWM signals is defined by the PWM core clock. This clock is derived from the main processor clock and divided by a prescaler via the 3-bit PWM_CT_PRSCx in the unit’s control register. The following prescalers are available:

Table 19. PWM prescaler configuration
PWM_CT_PRSCx               0b000  0b001  0b010  0b011  0b100  0b101  0b110  0b111
Resulting clock_prescaler  2      4      8      64     128    1024   2048   4096

The resulting PWM base frequency is defined by:

fPWM = fmain[Hz] / (2^8 * clock_prescaler)
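
Both PWM formulas (base frequency and channel intensity) can be evaluated with integer arithmetic, as in the following C sketch. The function names and the choice to express the intensity in percent are assumptions of this example.

```c
#include <stdint.h>

/* Clock prescalers selected by PWM_CT_PRSCx = 0b000 .. 0b111 (Table 19). */
static const uint32_t pwm_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* fPWM = fmain / (2^8 * clock_prescaler), rounded down to whole Hz. */
static uint32_t pwm_base_freq_hz(uint32_t f_main_hz, unsigned prsc)
{
    return f_main_hz / (256u * pwm_prescaler[prsc & 7u]);
}

/* Channel intensity in percent (rounded down): duty / 2^8 * 100. */
static uint32_t pwm_intensity_percent(uint8_t duty)
{
    return ((uint32_t)duty * 100u) / 256u;
}
```

With an assumed 100 MHz main clock, the PWM base frequency ranges from about 195 kHz (0b000) down to about 95 Hz (0b111). A duty value of 128 corresponds to 50 % intensity.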

Table 20. PWM register map
Address     Name [C]    Bit(s), Name [C]  R/W  Function
0xfffffe80  PWM_CT      0 PWM_CT_EN       r/w  PWM enable
                        1 PWM_CT_PRSC0    r/w  3-bit clock prescaler select
                        2 PWM_CT_PRSC1    r/w
                        3 PWM_CT_PRSC2    r/w
0xfffffe84  PWM_DUTY0   7:0               r/w  8-bit duty cycle for channel 0
                        15:8              r/w  8-bit duty cycle for channel 1
                        23:16             r/w  8-bit duty cycle for channel 2
                        31:24             r/w  8-bit duty cycle for channel 3
…           …           …                 r/w  …
0xfffffebc  PWM_DUTY14  7:0               r/w  8-bit duty cycle for channel 56
                        15:8              r/w  8-bit duty cycle for channel 57
                        23:16             r/w  8-bit duty cycle for channel 58
                        31:24             r/w  8-bit duty cycle for channel 59

2.5.14. True Random-Number Generator (TRNG)

Hardware source file(s):

neorv32_trng.vhd

Software driver file(s):

neorv32_trng.c

neorv32_trng.h

Top entity port:

none

Configuration generics:

IO_TRNG_EN

implement TRNG when true

CPU interrupts:

none

Theory of Operation

The NEORV32 true random number generator provides physical true random numbers for your application. Instead of using a pseudo RNG like an LFSR, the TRNG of the processor uses simple, straightforward ring oscillators as physical entropy source. Hence, voltage and thermal fluctuations are used to provide true physical random data.

The TRNG features a platform independent architecture without FPGA-specific primitives, macros or attributes.

Architecture

The NEORV32 TRNG is based on simple ring oscillators, which are implemented as an inverter chain with an odd number of inverters. A latch is used to decouple each individual inverter. Basically, this architecture is some kind of asynchronous LFSR.

The outputs of several ring oscillators are synchronized using two registers and are XORed together. The resulting output is de-biased using a von-Neumann randomness extractor. This de-biased output is further processed by a simple 8-bit Fibonacci LFSR to improve whitening. After at least 8 clock cycles the state of the LFSR is sampled and provided as final data output.
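
To illustrate the de-biasing step, here is a small software model of a von-Neumann randomness extractor in C. It is only a sketch of the general technique, not a transcription of the VHDL: the function name von_neumann_extract is hypothetical, and the convention of emitting the first bit of each unequal pair is an assumption (the hardware may use the opposite mapping).

```c
#include <stddef.h>
#include <stdint.h>

// Software model of a von-Neumann randomness extractor.
// The input bits are consumed pairwise: an unequal pair emits one output bit
// (here: the first bit of the pair), an equal pair is discarded.
// 'in' and 'out' hold one bit (0 or 1) per array element.
// Returns the number of bits written to 'out' (at most n_in / 2).
static inline size_t von_neumann_extract(const uint8_t *in, size_t n_in, uint8_t *out) {
  size_t n_out = 0;
  for (size_t i = 0; i + 1 < n_in; i += 2) {
    uint8_t a = in[i] & 1u;
    uint8_t b = in[i + 1] & 1u;
    if (a != b) {
      out[n_out++] = a; // pairs 01 and 10 each yield one bit
    }
  }
  return n_out;
}
```

Because equal pairs are dropped, the output rate depends on the input statistics, which fits the statement that a new random byte needs at least (not exactly) 8 clock cycles.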

To prevent the synthesis tool from doing logic optimization and thus, removing all but one inverter, the TRNG uses simple latches to decouple an inverter and its actual output. The latches are reset when the TRNG is disabled and are enabled one by one by a "real" shift register when the TRNG is activated. This construct can be synthesized for any FPGA platform. Thus, the NEORV32 TRNG provides a platform independent architecture.

TRNG Configuration

The TRNG uses several ring-oscillators, where the next oscillator provides a slightly longer chain (more inverters) than the one before. This increment is constant for all implemented oscillators. This setup can be customized by modifying the "Advanced Configuration" constants in the TRNG’s VHDL file:

  • The num_roscs_c constant defines the total number of ring oscillators in the system. num_inv_start_c defines the number of inverters used by the first ring oscillator (has to be an odd number). Each additional ring oscillator provides num_inv_inc_c more inverters than the one before (has to be an even number).

  • The LFSR-based post-processing can be deactivated using the lfsr_en_c constant. The polynomial tap mask of the LFSR can be customized using lfsr_taps_c.

Using the TRNG

The TRNG features a single register for status and data access. When the TRNG_CT_EN control register bit is set, the TRNG is enabled and starts operation. As soon as the TRNG_CT_VALID bit is set, the currently sampled 8-bit random data byte can be obtained from the lowest 8 bits of the TRNG_CT register (TRNG_CT_DATA_MSB : TRNG_CT_DATA_LSB). The TRNG_CT_VALID bit is automatically cleared when reading the control register.

The TRNG needs at least 8 clock cycles to generate a new random byte. During this sampling time the current output random data is kept stable in the output register until a valid sampling of the new byte has completed.
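
Because reading TRNG_CT clears TRNG_CT_VALID, software should read the register once and evaluate both the valid flag and the data byte from that single copy. The following C sketch shows this decoding step on an already-read register value. The function name trng_decode is hypothetical; the bit positions are taken from the TRNG register map.

```c
#include <stdbool.h>
#include <stdint.h>

// Decode one value read from TRNG_CT.
// Returns true and stores the random byte (bits 7:0) in *data if
// TRNG_CT_VALID (bit 31) is set; otherwise returns false and leaves
// *data untouched.
static inline bool trng_decode(uint32_t ct_value, uint8_t *data) {
  if ((ct_value >> 31) & 1u) {
    *data = (uint8_t)(ct_value & 0xffu);
    return true;
  }
  return false;
}
```

A polling loop on the target would repeatedly read TRNG_CT into a local variable and pass that variable to trng_decode until it returns true.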

Randomness "Quality": I have not verified the quality of the generated random numbers (for example using NIST test suites). The quality is strongly affected by the actual configuration of the TRNG and the resulting FPGA mapping/routing. However, larger histograms of the generated random numbers show an equal distribution (binary average of the random numbers = 127). A simple evaluation test/demo program can be found in sw/example/demo_trng.

Table 21. TRNG register map
Address | Name [C] | Bit(s), Name [C] | R/W | Function
0xffffff88 | TRNG_CT | 7:0 TRNG_CT_DATA_MSB : TRNG_CT_DATA_LSB | r/- | 8-bit random data output
0xffffff88 | TRNG_CT | 30 TRNG_CT_EN | r/w | TRNG enable
0xffffff88 | TRNG_CT | 31 TRNG_CT_VALID | r/- | random data output is valid when set

2.5.15. Custom Functions Subsystem (CFS)

Hardware source file(s):

neorv32_cfs.vhd

Software driver file(s):

neorv32_cfs.c

neorv32_cfs.h

Top entity port:

cfs_in_i

custom input conduit

cfs_out_o

custom output conduit

Configuration generics:

IO_CFS_EN

implement CFS when true

IO_CFS_CONFIG

custom generic conduit

IO_CFS_IN_SIZE

size of cfs_in_i

IO_CFS_OUT_SIZE

size of cfs_out_o

CPU interrupts:

fast IRQ channel 1

CFS interrupt (see Processor Interrupts)

Theory of Operation

The custom functions subsystem can be used to implement application-specific user-defined co-processors (like encryption or arithmetic accelerators) or peripheral/communication interfaces. In contrast to connecting custom hardware accelerators via the external memory interface, the CFS provides a convenient and low-latency extension and customization option.

The CFS provides up to 32x 32-bit memory-mapped registers (see register map table below). The actual functionality of these registers has to be defined by the hardware designer.

Take a look at the template CFS VHDL source file (rtl/core/neorv32_cfs.vhd). The file is highly commented to illustrate all aspects that are relevant for implementing custom CFS-based co-processor designs.

CFS Software Access

The CFS memory-mapped registers can be accessed by software using the provided C-language aliases (see register map table below). Note that all interface registers provide 32-bit access data of type uint32_t.

// C-code CFS usage example
CFS_REG_0 = (uint32_t)some_data_array[i]; // write element i of an array to CFS register 0
uint32_t temp = CFS_REG_20; // read from CFS register 20

CFS Interrupt

The CFS provides a single one-shot interrupt request signal mapped to the CPU’s fast interrupt channel 1. See section Processor Interrupts for more information.

CFS Configuration Generic

By default, the CFS provides a single 32-bit std_(u)logic_vector configuration generic IO_CFS_CONFIG that is available in the processor’s top entity. This generic can be used to pass custom configuration options from the top entity down to the CFS entity.

CFS Custom IOs

By default, the CFS also provides two unidirectional input and output conduits cfs_in_i and cfs_out_o. These signals are propagated to the processor’s top entity. The actual use of these signals has to be defined by the hardware designer. The size of the input signal conduit cfs_in_i is defined via the (top’s) IO_CFS_IN_SIZE configuration generic (default = 32-bit). The size of the output signal conduit cfs_out_o is defined via the (top’s) IO_CFS_OUT_SIZE configuration generic (default = 32-bit). If the custom function subsystem is not implemented (IO_CFS_EN = false) the cfs_out_o signal is tied to all-zero.

Table 22. CFS register map
Address | Name [C] | Bit(s) | R/W | Function
0xfffffe00 | CFS_REG_0 | 31:0 | (r)/(w) | custom CFS interface register 0
0xfffffe04 | CFS_REG_1 | 31:0 | (r)/(w) | custom CFS interface register 1
… | … | 31:0 | (r)/(w) | …
0xfffffe78 | CFS_REG_30 | 31:0 | (r)/(w) | custom CFS interface register 30
0xfffffe7c | CFS_REG_31 | 31:0 | (r)/(w) | custom CFS interface register 31
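
The register map follows a regular pattern: register n is located 4 * n bytes above the address of CFS_REG_0. As a small sketch of that address arithmetic (the function name cfs_reg_address is hypothetical; application code would normally use the CFS_REG_x aliases instead):

```c
#include <stdint.h>

// Address of CFS interface register 'index' (0..31), derived from the
// address of CFS_REG_0 (0xfffffe00) and the 4-byte register spacing.
static inline uint32_t cfs_reg_address(uint32_t index) {
  return 0xfffffe00u + 4u * (index & 31u);
}
```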

2.5.16. Numerically-Controlled Oscillator (NCO)

Hardware source file(s):

neorv32_nco.vhd

Software driver file(s):

neorv32_nco.c

neorv32_nco.h

Top entity port:

nco_o

NCO output (3x 1-bit channels)

Configuration generics:

IO_NCO_EN

implement NCO when true

CPU interrupts:

none

Theory of Operation

The numerically-controlled oscillator (NCO) provides a precise arbitrary linear frequency generator with three independent channels. Based on a direct digital synthesis core, the NCO features a 20-bit wide accumulator that is incremented with a programmable "tuning word". Whenever the accumulator overflows, a flip flop is toggled that provides the actual frequency output. The accumulator increment is driven by one of eight configurable clock sources, which are derived from the processor’s main clock.

The NCO features four accessible registers: the control register NCO_CT and three NCO_TUNE_CHi registers for the tuning word of each channel i. The NCO is globally enabled by setting the NCO_CT_EN bit in the control register. If this bit is cleared, the accumulators of all channels are reset. The clock source for each channel i is selected via the three NCO_CT_CHi_PRSCx prescaler bits. The resulting clock is generated from the main processor clock (fmain) divided by the selected prescaler.

Table 23. NCO prescaler configuration
NCO_CT_CHi_PRSCx          | 0b000 | 0b001 | 0b010 | 0b011 | 0b100 | 0b101 | 0b110 | 0b111
Resulting clock_prescaler | 2     | 4     | 8     | 64    | 128   | 1024  | 2048  | 4096

The resulting output frequency of each channel i is defined by the following equation:

fNCO(i) = ( fmain[Hz] / clock_prescaler(i) ) * ( tuning_word(i) / (2 * 2^(20+1)) )

The maximum NCO frequency fNCOmax is obtained when using the minimal clock prescaler and a maximum all-one tuning word:

fNCOmax = ( fmain[Hz] / 2 ) * ( (2^20 - 1) / (2 * 2^(20+1)) )

The minimum "frequency" is always 0 Hz when the tuning word is zero. The frequency resolution fNCOres is defined using the maximum clock prescaler and a minimal non-zero tuning word (= 1):

fNCOres = ( fmain[Hz] / 4096 ) * ( 1 / (2 * 2^(20+1)) )

Assuming a processor frequency of fmain = 100 MHz the maximum NCO output frequency is fNCOmax = 12.499 MHz with an NCO frequency resolution of fNCOres = 0.00582 Hz.
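
The numbers above can be reproduced with a short C sketch of the frequency equation. The names nco_clock_prescaler and nco_frequency_hz are hypothetical and not part of the NEORV32 software library. The divisor 2 * 2^(20+1) = 2^22 and the all-one tuning word 0xfffff (20 bits) are assumptions chosen because they reproduce the 12.499 MHz and 0.00582 Hz figures.

```c
#include <stdint.h>

// Clock prescaler values selected by the 3-bit NCO_CT_CHi_PRSCx field.
static const uint32_t nco_clock_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

// Output frequency of one NCO channel in Hz:
// (fmain / clock_prescaler) * (tuning_word / (2 * 2^(20+1)))
static inline double nco_frequency_hz(uint32_t f_main_hz, uint32_t prsc, uint32_t tuning_word) {
  const double divisor = 4194304.0; // 2 * 2^(20+1) = 2^22
  double f_clk = (double)f_main_hz / (double)nco_clock_prescaler[prsc & 7u];
  return f_clk * (double)tuning_word / divisor;
}
```

For example, nco_frequency_hz(100000000, 0, 0xfffff) evaluates the maximum frequency and nco_frequency_hz(100000000, 7, 1) the frequency resolution for a 100 MHz processor clock.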

Advanced Configuration

The idle polarity of each channel is configured via the NCO_CT_CHi_IDLE_POL flag and can be either 0 (idle low) or 1 (idle high), which basically allows inverting the NCO output. If the NCO is globally disabled by clearing the NCO_CT_EN flag, output bit nco_o(i) is set to the corresponding NCO_CT_CHi_IDLE_POL value.

The current state of each NCO channel output can be read by software via the NCO_CT_CHi_OUTPUT bit. The NCO frequency output is normally available via the top nco_o output signal. A channel’s output can be permanently set to zero by clearing the corresponding NCO_CT_CHi_OE bit.

Each NCO channel can operate either in standard mode or in pulse mode. The mode is configured via the channel’s NCO_CT_CHi_MODE control register bit.

Standard Operation Mode

If the NCO_CT_CHi_MODE bit of channel i is cleared, the channel operates in standard mode providing a frequency with exactly 50% duty cycle (Thigh = Tlow).

Pulse Operation Mode

If the NCO_CT_CHi_MODE bit of channel i is set, the channel operates in pulse mode. In this mode, the duty cycle can be modified to generate active pulses with variable length. Note that the "active" pulse polarity is defined by the inverted NCO_CT_CHi_IDLE_POL bit.

Eight different pulse lengths are available. The active pulse length is defined as a number of NCO clock cycles, where the NCO clock is defined via the clock prescaler bits NCO_CT_CHi_PRSCx. The pulse length of channel i is programmed by the 3-bit NCO_CT_CHi_PULSEx configuration:

Table 24. NCO pulse length configuration
NCO_CT_CHi_PULSEx                  | 0b000 | 0b001 | 0b010 | 0b011 | 0b100 | 0b101 | 0b110 | 0b111
Pulse length (in NCO clock cycles) | 2     | 4     | 8     | 16    | 32    | 64    | 128   | 256

If NCO_CT_CHi_IDLE_POL is cleared, Thigh is defined by the NCO_CT_CHi_PULSEx configuration and Tlow = T – Thigh. If NCO_CT_CHi_IDLE_POL is set, Tlow is defined by the NCO_CT_CHi_PULSEx configuration and Thigh = T – Tlow.

The actual output frequency of the channel (defined via the clock prescaler and the tuning word) is not affected by the pulse configuration.

For simple PWM applications, that do not require a precise frequency but a more flexible duty cycle configuration, see section Pulse-Width Modulation Controller (PWM).

Table 25. NCO register map
Address | Name [C] | Bit(s), Name [C] | R/W | Function
0xffffffc0 | NCO_CT | 0 NCO_CT_EN | r/w | NCO enable
Channel 0 nco_o(0)
0xffffffc0 | NCO_CT | 1 NCO_CT_CH0_MODE | r/w | output mode (0=fixed 50% duty cycle; 1=pulse mode)
0xffffffc0 | NCO_CT | 2 NCO_CT_CH0_IDLE_POL | r/w | output idle polarity
0xffffffc0 | NCO_CT | 3 NCO_CT_CH0_OE | r/w | enable output to nco_o(0)
0xffffffc0 | NCO_CT | 4 NCO_CT_CH0_OUTPUT | r/- | current state of nco_o(0)
0xffffffc0 | NCO_CT | 7:5 NCO_CT_CH0_PRSC2 : NCO_CT_CH0_PRSC0 | r/w | 3-bit clock prescaler select
0xffffffc0 | NCO_CT | 10:8 NCO_CT_CH0_PULSE2 : NCO_CT_CH0_PULSE0 | r/w | 3-bit pulse length select
Channel 1 nco_o(1)
0xffffffc0 | NCO_CT | 11 NCO_CT_CH1_MODE | r/w | output mode (0=fixed 50% duty cycle; 1=pulse mode)
0xffffffc0 | NCO_CT | 12 NCO_CT_CH1_IDLE_POL | r/w | output idle polarity
0xffffffc0 | NCO_CT | 13 NCO_CT_CH1_OE | r/w | enable output to nco_o(1)
0xffffffc0 | NCO_CT | 14 NCO_CT_CH1_OUTPUT | r/- | current state of nco_o(1)
0xffffffc0 | NCO_CT | 17:15 NCO_CT_CH1_PRSC2 : NCO_CT_CH1_PRSC0 | r/w | 3-bit clock prescaler select
0xffffffc0 | NCO_CT | 20:18 NCO_CT_CH1_PULSE2 : NCO_CT_CH1_PULSE0 | r/w | 3-bit pulse length select
Channel 2 nco_o(2)
0xffffffc0 | NCO_CT | 21 NCO_CT_CH2_MODE | r/w | output mode (0=fixed 50% duty cycle; 1=pulse mode)
0xffffffc0 | NCO_CT | 22 NCO_CT_CH2_IDLE_POL | r/w | output idle polarity
0xffffffc0 | NCO_CT | 23 NCO_CT_CH2_OE | r/w | enable output to nco_o(2)
0xffffffc0 | NCO_CT | 24 NCO_CT_CH2_OUTPUT | r/- | current state of nco_o(2)
0xffffffc0 | NCO_CT | 27:25 NCO_CT_CH2_PRSC2 : NCO_CT_CH2_PRSC0 | r/w | 3-bit clock prescaler select
0xffffffc0 | NCO_CT | 30:28 NCO_CT_CH2_PULSE2 : NCO_CT_CH2_PULSE0 | r/w | 3-bit pulse length select
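
In the register map each channel occupies ten consecutive bits of NCO_CT, starting at bit 1 + 10 * i for channel i. The following C sketch builds the writable bits of one channel from that layout. The function name nco_ct_channel_bits is hypothetical and not part of the NEORV32 software library; it only computes a value and does not access the hardware.

```c
#include <stdint.h>

// Build the writable NCO_CT bits of one channel (0..2).
// Layout per channel, relative to bit 1 + 10 * channel:
//   +0 MODE, +1 IDLE_POL, +2 OE, +3 OUTPUT (read-only), +4..+6 PRSC, +7..+9 PULSE
static inline uint32_t nco_ct_channel_bits(uint32_t channel, uint32_t mode, uint32_t idle_pol,
                                           uint32_t oe, uint32_t prsc, uint32_t pulse) {
  uint32_t base = 1u + 10u * channel;
  uint32_t bits = 0;
  bits |= (mode     & 1u) << (base + 0u); // NCO_CT_CHi_MODE
  bits |= (idle_pol & 1u) << (base + 1u); // NCO_CT_CHi_IDLE_POL
  bits |= (oe       & 1u) << (base + 2u); // NCO_CT_CHi_OE
  bits |= (prsc     & 7u) << (base + 4u); // NCO_CT_CHi_PRSC2 : PRSC0
  bits |= (pulse    & 7u) << (base + 7u); // NCO_CT_CHi_PULSE2 : PULSE0
  return bits;
}
```

A complete control word would OR the results for all used channels together with the NCO_CT_EN bit (bit 0).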

2.5.17. Smart LED Interface (NEOLED)

Hardware source file(s):

neorv32_neoled.vhd

Software driver file(s):

neorv32_neoled.c

neorv32_neoled.h

Top entity port:

neoled_o

1-bit serial data

Configuration generics:

IO_NEOLED_EN

implement NEOLED when true

CPU interrupts:

fast IRQ channel 9

NEOLED interrupt (see Processor Interrupts)

Theory of Operation

The NEOLED module provides a dedicated interface for "smart RGB LEDs" like the WS2812 or WS2811. These LEDs provide a single interface wire that uses an asynchronous serial protocol for transmitting color data. Basically, data is transferred via LED-internal shift registers, which allows cascading an unlimited number of smart LEDs. The protocol provides a RESET command to strobe the transmitted data into the LED PWM driver registers after data has shifted throughout all LEDs in a chain.

The NEOLED interface is compatible with the "Adafruit Industries NeoPixel" products, which feature WS2812 (or older WS2811) smart LEDs (see https://learn.adafruit.com/adafruit-neopixel-uberguide).

The interface provides a single 1-bit output neoled_o to drive an arbitrary number of LEDs. Since the NEOLED module provides 24-bit and 32-bit operating modes, a mixed setup with RGB LEDs (24-bit color) and RGBW LEDs (32-bit color including a dedicated white LED chip) is also possible.

Theory of Operation – Protocol

The interface of the WS2812 LEDs uses an 800kHz carrier signal. Data is transmitted serially, MSB-first. The intensity for each R, G & B LED chip (= color code) is defined via an 8-bit value. The actual data bits are transferred by modifying the duty cycle of the signal (the timings for the WS2812 are shown below). A RESET command is "sent" by pulling the data line LOW for at least 50μs.

Figure 6. WS2812 bit-level protocol - taken from the "Adafruit NeoPixel Überguide"

Table 26. WS2812 interface timing
Ttotal (Tcarrier) | 1.25μs +/- 300ns | period for a single bit
T0H | 0.4μs +/- 150ns | high-time for sending a 0
T0L | 0.8μs +/- 150ns | low-time for sending a 0
T1H | 0.85μs +/- 150ns | high-time for sending a 1
T1L | 0.45μs +/- 150ns | low-time for sending a 1
RESET | Above 50μs | low-time for sending a RESET command

Theory of Operation – NEOLED Module

The NEOLED module provides two accessible interface registers: the control register NEOLED_CT and the TX data register NEOLED_DATA. The NEOLED module is globally enabled via the control register’s NEOLED_CT_EN bit. Clearing this bit will terminate any current operation, reset the module and set the neoled_o output to zero. The precise timing (implementing the WS2812 protocol) and transmission mode are fully programmable via the NEOLED_CT register to provide maximum flexibility.

Timing Configuration

The basic carrier frequency (800kHz for the WS2812 LEDs) is configured via a 3-bit main clock prescaler (NEOLED_CT_PRSCx, see table below) that scales the main processor clock fmain and a 5-bit cycle multiplier NEOLED_CT_T_TOT_x.

Table 27. NEOLED prescaler configuration
NEOLED_CT_PRSCx           | 0b000 | 0b001 | 0b010 | 0b011 | 0b100 | 0b101 | 0b110 | 0b111
Resulting clock_prescaler | 2     | 4     | 8     | 64    | 128   | 1024  | 2048  | 4096

The duty-cycles (or more precisely: the high- and low-times for sending either a '1' bit or a '0' bit) are defined via the 5-bit NEOLED_CT_T_ONE_H_x and NEOLED_CT_T_ZERO_H_x values, respectively. These programmable timing constants allow adapting the interface to a wide variety of smart LED protocols (for example WS2812 vs. WS2811).

Timing Configuration – Example (WS2812)

Generate the base clock fTX for the NEOLED TX engine:

  • processor clock fmain = 100 MHz

  • NEOLED_CT_PRSCx = 0b001 = fmain / 4

fTX = fmain[Hz] / clock_prescaler = 100MHz / 4 = 25MHz

TTX = 1 / fTX = 40ns

Generate carrier period (Tcarrier) and high-times (duty cycle) for sending 0 (T0H) and 1 (T1H) bits:

  • NEOLED_CT_T_TOT = 0b11110 (= decimal 30)

  • NEOLED_CT_T_ZERO_H = 0b01010 (= decimal 10)

  • NEOLED_CT_T_ONE_H = 0b10100 (= decimal 20)

Tcarrier = TTX * NEOLED_CT_T_TOT = 40ns * 30 = 1.2µs

T0H = TTX * NEOLED_CT_T_ZERO_H = 40ns * 10 = 0.4µs

T1H = TTX * NEOLED_CT_T_ONE_H = 40ns * 20 = 0.8µs
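
The same arithmetic can be written as a short C sketch, which is convenient for checking other clock frequencies or tick settings against the WS2812 timing table. The type neoled_timing_t and the function neoled_timing are hypothetical names used only for this illustration; they are not the configuration function of the NEOLED driver library.

```c
#include <stdint.h>

// Clock prescaler values selected by the 3-bit NEOLED_CT_PRSCx field.
static const uint32_t neoled_clock_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

typedef struct {
  double t_tx_ns;      // period of the TX engine base clock (TTX)
  double t_carrier_ns; // total period of a single bit (Tcarrier)
  double t_0h_ns;      // high-time for sending a 0 (T0H)
  double t_1h_ns;      // high-time for sending a 1 (T1H)
} neoled_timing_t;

// Compute the resulting timings in nanoseconds from the processor clock,
// the prescaler select and the three 5-bit tick counts.
static inline neoled_timing_t neoled_timing(uint32_t f_main_hz, uint32_t prsc,
                                            uint32_t t_tot, uint32_t t_zero_h, uint32_t t_one_h) {
  neoled_timing_t t;
  t.t_tx_ns      = 1.0e9 * (double)neoled_clock_prescaler[prsc & 7u] / (double)f_main_hz;
  t.t_carrier_ns = t.t_tx_ns * (double)(t_tot & 31u);
  t.t_0h_ns      = t.t_tx_ns * (double)(t_zero_h & 31u);
  t.t_1h_ns      = t.t_tx_ns * (double)(t_one_h & 31u);
  return t;
}
```

With the example values (100 MHz, NEOLED_CT_PRSCx = 0b001, 30 / 10 / 20 ticks) the sketch yields 40 ns, 1200 ns, 400 ns and 800 ns.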

The NEOLED SW driver library (neorv32_neoled.h) provides a simplified configuration function that configures all timing parameters for driving WS2812 LEDs based on the processor clock configuration.

RGB / RGBW Configuration

NeoPixels are available in two "color" versions: LEDs with three chips providing RGB color and LEDs with four chips providing RGB color plus a dedicated white LED chip (= RGBW). Since the intensity of every LED chip is defined via an 8-bit value, the RGB LEDs require a 24-bit frame per module and the RGBW LEDs require a 32-bit frame per module.

The data transfer quantity of the NEOLED module can be configured via the NEOLED_CT_MODE control register bit. If this bit is cleared, the NEOLED interface operates in 24-bit mode and will transmit bits 23:0 of the data written to NEOLED_DATA. If NEOLED_CT_MODE is set, the NEOLED interface operates in 32-bit mode and will transmit bits 31:0 of the data written to NEOLED_DATA.

TX Data FIFO

The interface features a TX data buffer (a FIFO) to allow CPU-independent operation. The buffer depth is configured via the tx_buffer_entries_c constant (default = 4 entries) in the module’s VHDL source file rtl/core/neorv32_neoled.vhd. The current configuration can be read via the NEOLED_CT_BUFS_x control register bits, which return log2(tx_buffer_entries_c).
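
Since the control register reports the logarithm of the buffer depth, software has to convert it back to a number of entries. A minimal C sketch of that conversion, operating on an already-read NEOLED_CT value (the function name neoled_tx_buffer_entries is hypothetical; the bit positions 9:6 are taken from the NEOLED register map):

```c
#include <stdint.h>

// Number of TX buffer entries encoded in a NEOLED_CT value:
// NEOLED_CT_BUFS3..0 (bits 9:6) hold log2(tx_buffer_entries_c).
static inline uint32_t neoled_tx_buffer_entries(uint32_t ct_value) {
  uint32_t log2_entries = (ct_value >> 6) & 0xfu;
  return 1u << log2_entries;
}
```

For the default configuration of 4 entries the field reads 2, so the function returns 1 << 2 = 4.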

When writing data to the NEOLED_DATA register the data is automatically written to the TX buffer. Whenever data is available in the buffer the serial transmission engine will take it and transmit it to the LEDs.

The data transfer size (NEOLED_CT_MODE) can be modified at any time since this control register bit is also buffered in the FIFO. This allows arbitrary mixing of RGB and RGBW LEDs in the chain.

Please note that the timing configurations (NEOLED_CT_PRSCx, NEOLED_CT_T_TOT_x, NEOLED_CT_T_ONE_H_x and NEOLED_CT_T_ZERO_H_x) are NOT stored to the buffer. Changing these values while the buffer is not empty or the TX engine is still sending will cause data corruption.

Status Configuration

The NEOLED module features two read-only status bits in the control register: NEOLED_CT_BUSY and NEOLED_CT_TX_STATUS.

If the NEOLED_CT_TX_STATUS flag is set, the serial TX engine is still busy sending serial data to the LED strips. If the flag is cleared, the TX engine is idle and the serial data output neoled_o is set LOW.

The NEOLED_CT_BUSY flag provides a programmable option to check for the TX buffer state. The control register’s NEOLED_CT_BSCON bit is used to configure the "meaning" of the NEOLED_CT_BUSY flag. The condition for sending an interrupt request (IRQ) to the CPU is also configured via the NEOLED_CT_BSCON bit.

NEOLED_CT_BSCON | NEOLED_CT_BUSY | Sending an IRQ when …​
0 | the busy flag will clear if there IS at least one free entry in the TX buffer | the IRQ will fire if at least one entry GETS free in the TX buffer
1 | the busy flag will clear if the whole TX buffer IS empty | the IRQ will fire if the whole TX buffer GETS empty

When NEOLED_CT_BSCON is set, the CPU can write up to tx_buffer_entries_c new data words to NEOLED_DATA once the busy flag NEOLED_CT_BUSY has cleared, without checking the flag again in between. This greatly relaxes the time constraints for sending a continuous data stream to the LEDs (as an idle time beyond 50μs will trigger the LEDs’ RESET command).

Table 28. NEOLED register map
Address | Name [C] | Bit(s), Name [C] | R/W | Function
0xffffffd8 | NEOLED_CT | 0 NEOLED_CT_EN | r/w | NEOLED enable
0xffffffd8 | NEOLED_CT | 1 NEOLED_CT_MODE | r/w | data transfer size; 0=24-bit; 1=32-bit
0xffffffd8 | NEOLED_CT | 2 NEOLED_CT_BSCON | r/w | busy flag / IRQ trigger configuration (see table above)
0xffffffd8 | NEOLED_CT | 3 NEOLED_CT_PRSC0 | r/w | 3-bit clock prescaler, bit 0
0xffffffd8 | NEOLED_CT | 4 NEOLED_CT_PRSC1 | r/w | 3-bit clock prescaler, bit 1
0xffffffd8 | NEOLED_CT | 5 NEOLED_CT_PRSC2 | r/w | 3-bit clock prescaler, bit 2
0xffffffd8 | NEOLED_CT | 9:6 NEOLED_CT_BUFS3 : NEOLED_CT_BUFS0 | r/- | 4-bit log2(tx_buffer_entries_c)
0xffffffd8 | NEOLED_CT | 14:10 NEOLED_CT_T_TOT_4 : NEOLED_CT_T_TOT_0 | r/w | 5-bit pulse clock ticks per total single-bit period (Ttotal)
0xffffffd8 | NEOLED_CT | 24:20 NEOLED_CT_ONE_H_4 : NEOLED_CT_ONE_H_0 | r/w | 5-bit pulse clock ticks per high-time for sending a one-bit (TH1)
0xffffffd8 | NEOLED_CT | 30 NEOLED_CT_TX_STATUS | r/- | transmit engine busy when 1
0xffffffd8 | NEOLED_CT | 31 NEOLED_CT_BUSY | r/- | busy / buffer status flag; configured via NEOLED_CT_BSCON (see table above)
0xffffffdc | NEOLED_DATA | 31:0 / 23:0 | -/w | TX data (32-/24-bit)

2.5.18. System Configuration Information Memory (SYSINFO)

Hardware source file(s):

neorv32_sysinfo.vhd

Software driver file(s):

(neorv32.h)

Top entity port:

none

Configuration generics:

*

most of the top’s configuration generics

CPU interrupts:

none

Theory of Operation

The SYSINFO allows the application software to determine the setting of most of the processor’s top entity generics that are related to processor/SoC configuration. All registers of this unit are read-only.

This device is always implemented – regardless of the actual hardware configuration. The bootloader as well as the NEORV32 software runtime environment require information from this device (like memory layout and default clock speed) for correct operation.

Table 29. SYSINFO register map
Address | Name [C] | Function
0xffffffe0 | SYSINFO_CLK | clock speed in Hz (via top’s CLOCK_FREQUENCY generic)
0xffffffe4 | SYSINFO_USER_CODE | custom user code, assigned via top’s USER_CODE generic
0xffffffe8 | SYSINFO_FEATURES | specific hardware configuration (see next table)
0xffffffec | SYSINFO_CACHE | cache configuration information (see next table)
0xfffffff0 | SYSINFO_ISPACE_BASE | instruction address space base (defined via ispace_base_c constant in the neorv32_package.vhd file)
0xfffffff4 | SYSINFO_IMEM_SIZE | internal IMEM size in bytes (defined via top’s MEM_INT_IMEM_SIZE generic)
0xfffffff8 | SYSINFO_DSPACE_BASE | data address space base (defined via dspace_base_c constant in the neorv32_package.vhd file)
0xfffffffc | SYSINFO_DMEM_SIZE | internal DMEM size in bytes (defined via top’s MEM_INT_DMEM_SIZE generic)

Table 30. SYSINFO_FEATURES bits
Bit | Name [C] | Function
0 | SYSINFO_FEATURES_BOOTLOADER | set if the processor-internal bootloader is implemented (via top’s INT_BOOTLOADER_EN generic)
1 | SYSINFO_FEATURES_MEM_EXT | set if the external Wishbone bus interface is implemented (via top’s MEM_EXT_EN generic)
2 | SYSINFO_FEATURES_MEM_INT_IMEM | set if the processor-internal IMEM is implemented (via top’s MEM_INT_IMEM_EN generic)
3 | SYSINFO_FEATURES_MEM_INT_DMEM | set if the processor-internal DMEM is implemented (via top’s MEM_INT_DMEM_EN generic)
4 | SYSINFO_FEATURES_MEM_EXT_ENDIAN | set if external bus interface uses BIG-endian byte-order (via package’s wb_big_endian_c constant)
5 | SYSINFO_FEATURES_ICACHE | set if processor-internal instruction cache is implemented (via ICACHE_EN generic)
14 | SYSINFO_FEATURES_OCD | set if on-chip debugger is implemented (via ON_CHIP_DEBUGGER_EN generic)
15 | SYSINFO_FEATURES_HW_RST | set if a dedicated hardware reset of all core registers is implemented (via package’s dedicated_reset_c constant)
16 | SYSINFO_FEATURES_IO_GPIO | set if the GPIO is implemented (via top’s IO_GPIO_EN generic)
17 | SYSINFO_FEATURES_IO_MTIME | set if the MTIME is implemented (via top’s IO_MTIME_EN generic)
18 | SYSINFO_FEATURES_IO_UART0 | set if the primary UART0 is implemented (via top’s IO_UART0_EN generic)
19 | SYSINFO_FEATURES_IO_SPI | set if the SPI is implemented (via top’s IO_SPI_EN generic)
20 | SYSINFO_FEATURES_IO_TWI | set if the TWI is implemented (via top’s IO_TWI_EN generic)
21 | SYSINFO_FEATURES_IO_PWM | set if the PWM is implemented (via top’s IO_PWM_EN generic)
22 | SYSINFO_FEATURES_IO_WDT | set if the WDT is implemented (via top’s IO_WDT_EN generic)
23 | SYSINFO_FEATURES_IO_CFS | set if the custom functions subsystem is implemented (via top’s IO_CFS_EN generic)
24 | SYSINFO_FEATURES_IO_TRNG | set if the TRNG is implemented (via top’s IO_TRNG_EN generic)
25 | SYSINFO_FEATURES_IO_NCO | set if the NCO is implemented (via top’s IO_NCO_EN generic)
26 | SYSINFO_FEATURES_IO_UART1 | set if the secondary UART1 is implemented (via top’s IO_UART1_EN generic)
27 | SYSINFO_FEATURES_IO_NEOLED | set if the NEOLED is implemented (via top’s IO_NEOLED_EN generic)
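
Application software typically uses these bits to check whether a peripheral was synthesized before accessing it. The following C sketch shows such a check on an already-read SYSINFO_FEATURES value. The enumerator names and the function sysinfo_has_feature are hypothetical and chosen for this illustration only; the bit positions are copied from the table above.

```c
#include <stdbool.h>
#include <stdint.h>

// Bit positions copied from the SYSINFO_FEATURES table (illustrative subset).
enum {
  FEATURE_BIT_IO_GPIO   = 16,
  FEATURE_BIT_IO_PWM    = 21,
  FEATURE_BIT_IO_CFS    = 23,
  FEATURE_BIT_IO_TRNG   = 24,
  FEATURE_BIT_IO_NCO    = 25,
  FEATURE_BIT_IO_NEOLED = 27
};

// True if the given feature bit is set in a value read from SYSINFO_FEATURES.
static inline bool sysinfo_has_feature(uint32_t features, unsigned bit) {
  return ((features >> bit) & 1u) != 0u;
}
```

On the target, the features value would be read from the SYSINFO_FEATURES register at 0xffffffe8.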

3. NEORV32 Central Processing Unit (CPU)

Key Features

  • 32-bit pipelined/multi-cycle in-order rv32 RISC-V CPU

  • Optional RISC-V extensions:

    • A - atomic memory access operations

    • C - 16-bit compressed instructions

    • I - integer base ISA (always enabled)

    • E - embedded CPU version (reduced register file size)

    • M - integer multiplication and division hardware

    • U - less-privileged user mode

    • Zfinx - single-precision floating-point unit

    • Zicsr - control and status register access (privileged architecture)

    • Zifencei - instruction stream synchronization

    • PMP - physical memory protection

    • HPM - hardware performance monitors

    • DB - debug mode

  • Compatible to the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications – passes the official RISC-V Architecture Tests (v2+)

  • Official RISC-V open-source architecture ID

  • Standard RISC-V interrupts (external, timer, software) plus 16 fast interrupts and 1 non-maskable interrupt

  • Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions

  • Optional physical memory protection (PMP), compatible to the RISC-V specifications

  • Optional hardware performance monitors (HPM) for application benchmarking

  • Separated interfaces for instruction fetch and data access (merged into single bus via a bus switch for the NEORV32 processor)

  • little-endian byte order

  • Configurable hardware reset

  • No hardware support of unaligned data/instruction accesses – they will trigger an exception. If the C extension is enabled, instructions can also be 16-bit aligned, so a misaligned instruction address exception is no longer possible

It is recommended to use the NEORV32 Processor as default top instance even if you only want to use the actual CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This setup also lets you keep using the default bootloader and software framework. From this base you can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode.
This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications.

3.1. Architecture

The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture specifications. The following figure shows the simplified architecture of the CPU.

neorv32 cpu

The CPU uses a pipelined architecture with basically two main stages. The first stage (IF – instruction fetch) is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is stored to a FIFO – the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit instruction words for the next pipeline stage. Compressed instructions – if enabled – are also decompressed in this stage. The second stage (EX – execution) is responsible for actually executing the fetched instructions via the execute engine.

These two pipeline stages are based on a multi-cycle processing engine, so processing a certain operation in either stage can take several cycles. Since the IF and EX stages are decoupled via the instruction prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI (cycles per instruction) is 2, but it can be significantly higher, for instance when executing loads/stores, multi-cycle operations like divisions, or when the instruction fetch engine has to reload the prefetch buffer due to a taken branch.

Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes every single instruction in a series of consecutive micro-operations. The combination of these two classical design paradigms allows an increased instruction throughput compared to a pure multi-cycle approach (due to the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach).

The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral devices are mapped to a single 32-bit address space making the architecture a modified Von-Neumann Architecture.

3.2. RISC-V Compatibility

The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the NEORV32 processor are located in the repository’s riscv-arch-test folder. See section RISC-V Architecture Test Framework for information on how to run the tests on the NEORV32.

RISC-V rv32_m/C Tests
Check cadd-01           ... OK
Check caddi-01          ... OK
Check caddi16sp-01      ... OK
Check caddi4spn-01      ... OK
Check cand-01           ... OK
Check candi-01          ... OK
Check cbeqz-01          ... OK
Check cbnez-01          ... OK
Check cebreak-01        ... OK
Check cj-01             ... OK
Check cjal-01           ... OK
Check cjalr-01          ... OK
Check cjr-01            ... OK
Check cli-01            ... OK
Check clui-01           ... OK
Check clw-01            ... OK
Check clwsp-01          ... OK
Check cmv-01            ... OK
Check cnop-01           ... OK
Check cor-01            ... OK
Check cslli-01          ... OK
Check csrai-01          ... OK
Check csrli-01          ... OK
Check csub-01           ... OK
Check csw-01            ... OK
Check cswsp-01          ... OK
Check cxor-01           ... OK
--------------------------------
OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32

RISC-V rv32_m/I Tests
Check add-01            ... OK
Check addi-01           ... OK
Check and-01            ... OK
Check andi-01           ... OK
Check auipc-01          ... OK
Check beq-01            ... OK
Check bge-01            ... OK
Check bgeu-01           ... OK
Check blt-01            ... OK
Check bltu-01           ... OK
Check bne-01            ... OK
Check fence-01          ... OK
Check jal-01            ... OK
Check jalr-01           ... OK
Check lb-align-01       ... OK
Check lbu-align-01      ... OK
Check lh-align-01       ... OK
Check lhu-align-01      ... OK
Check lui-01            ... OK
Check lw-align-01       ... OK
Check or-01             ... OK
Check ori-01            ... OK
Check sb-align-01       ... OK
Check sh-align-01       ... OK
Check sll-01            ... OK
Check slli-01           ... OK
Check slt-01            ... OK
Check slti-01           ... OK
Check sltiu-01          ... OK
Check sltu-01           ... OK
Check sra-01            ... OK
Check srai-01           ... OK
Check srl-01            ... OK
Check srli-01           ... OK
Check sub-01            ... OK
Check sw-align-01       ... OK
Check xor-01            ... OK
Check xori-01           ... OK
--------------------------------
OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32
RISC-V rv32_m/M Tests
Check div-01            ... OK
Check divu-01           ... OK
Check mul-01            ... OK
Check mulh-01           ... OK
Check mulhsu-01         ... OK
Check mulhu-01          ... OK
Check rem-01            ... OK
Check remu-01           ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32
RISC-V rv32_m/privilege Tests
Check ebreak            ... OK
Check ecall             ... OK
Check misalign-beq-01   ... OK
Check misalign-bge-01   ... OK
Check misalign-bgeu-01  ... OK
Check misalign-blt-01   ... OK
Check misalign-bltu-01  ... OK
Check misalign-bne-01   ... OK
Check misalign-jal-01   ... OK
Check misalign-lh-01    ... OK
Check misalign-lhu-01   ... OK
Check misalign-lw-01    ... OK
Check misalign-sh-01    ... OK
Check misalign-sw-01    ... OK
Check misalign1-jalr-01 ... OK
Check misalign2-jalr-01 ... OK
--------------------------------
OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32
RISC-V rv32_m/Zifencei Tests
Check Fencei            ... OK
--------------------------------
OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32

3.2.1. RISC-V Incompatibility Issues and Limitations

This list shows the currently known issues regarding full RISC-V-compatibility. More specific information can be found in section Instruction Sets and Extensions.

The misa CSR is read-only. It shows the synthesized CPU extensions. Hence, all implemented CPU extensions are always active and cannot be enabled/disabled dynamically during runtime. Any write access to it (in machine mode) is ignored and will not cause any exception or side-effects.
The mip CSR is read-only. Pending IRQs can be cleared using the mie CSR.
The mtval CSR is read-only.
The physical memory protection (see section Machine Physical Memory Protection) currently supports only the modes OFF and NAPOT, with a minimal granularity of 8 bytes per region.
The A CPU extension (atomic memory access) currently implements only the lr.w and sc.w instructions. However, these instructions are sufficient to emulate all further AMO operations.

3.3. CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity rtl/core/neorv32_cpu.vhd. The type of all signals is std_ulogic or std_ulogic_vector, respectively. The "Dir." column shows the signal direction seen from the CPU.

Table 31. NEORV32 CPU top entity signals
Signal Width Dir. Function

Global Signals

clk_i

1

in

global clock line, all registers triggering on rising edge

rstn_i

1

in

global reset, low-active

sleep_o

1

out

CPU is in sleep mode when set

Instruction Bus Interface (Bus Interface)

i_bus_addr_o

32

out

destination address

i_bus_rdata_i

32

in

read data

i_bus_wdata_o

32

out

write data (always zero)

i_bus_ben_o

4

out

byte enable

i_bus_we_o

1

out

write transaction (always zero)

i_bus_re_o

1

out

read transaction

i_bus_lock_o

1

out

exclusive access request (always zero)

i_bus_ack_i

1

in

bus transfer acknowledge from accessed peripheral

i_bus_err_i

1

in

bus transfer terminate from accessed peripheral

i_bus_fence_o

1

out

indicates an executed fence.i instruction

i_bus_priv_o

2

out

current CPU privilege level

Data Bus Interface (Bus Interface)

d_bus_addr_o

32

out

destination address

d_bus_rdata_i

32

in

read data

d_bus_wdata_o

32

out

write data

d_bus_ben_o

4

out

byte enable

d_bus_we_o

1

out

write transaction

d_bus_re_o

1

out

read transaction

d_bus_lock_o

1

out

exclusive access request

d_bus_ack_i

1

in

bus transfer acknowledge from accessed peripheral

d_bus_err_i

1

in

bus transfer terminate from accessed peripheral

d_bus_fence_o

1

out

indicates an executed fence instruction

d_bus_priv_o

2

out

current CPU privilege level

System Time (see time[h] CSR)

time_i

64

in

system time input (from MTIME)

Non-Maskable Interrupt (Traps, Exceptions and Interrupts)

nm_irq_i

1

in

non-maskable interrupt

Interrupts, RISC-V-compatible (Traps, Exceptions and Interrupts)

msw_irq_i

1

in

RISC-V machine software interrupt

mext_irq_i

1

in

RISC-V machine external interrupt

mtime_irq_i

1

in

RISC-V machine timer interrupt

Fast Interrupts, NEORV32-specific (Traps, Exceptions and Interrupts)

firq_i

16

in

fast interrupt request signals

firq_ack_o

16

out

fast interrupt acknowledge signals

Enter Debug Mode Request (On-Chip Debugger (OCD))

db_halt_req_i

1

in

request CPU to halt and enter debug mode

3.4. CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section Processor Top Entity - Generics) and are not listed here. However, the CPU provides some specific generics that are used to configure the CPU for the NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration. The specific generics are listed below.

CPU_BOOT_ADDR

std_ulogic_vector(31 downto 0)

0x00000000

This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction memory (IMEM) if the bootloader is disabled (INT_BOOTLOADER_EN = false). See section Address Space for more information.

CPU_DEBUG_ADDR

std_ulogic_vector(31 downto 0)

0x00000000

This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address of the debugger memory. See section On-Chip Debugger (OCD) for more information.

CPU_EXTENSION_RISCV_DEBUG

boolean

false

Implement RISC-V-compatible "debug" CPU operation mode. See section CPU Debug Mode for more information.

3.5. Instruction Sets and Extensions

The NEORV32 is a RISC-V rv32i architecture that provides several optional RISC-V CPU and ISA (instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please see The RISC-V Instruction Set Manual – Volume I: Unprivileged ISA and The RISC-V Instruction Set Manual – Volume II: Privileged Architecture, which are available in the project’s docs/references folder.

The CPU can discover available ISA extensions via the misa and mzext CSRs or by executing an instruction and checking for an illegal instruction exception.

3.5.1. A - Atomic Memory Access

Atomic memory access instructions (for implementing semaphores and mutexes) are available when the CPU_EXTENSION_RISCV_A configuration generic is true. In this case the following additional instructions are available:

  • lr.w: load-reserved

  • sc.w: store-conditional

Even though only the lr.w and sc.w instructions are implemented so far, all further atomic operations (read-modify-write instructions) can be emulated using these two instructions. Furthermore, the instructions’ ordering flags (aq and rl) are ignored by the CPU hardware. Using any other (not yet implemented) AMO (atomic memory operation) will trigger an illegal instruction exception.
The atomic instructions have special requirements for the memory system / bus interconnect. More information can be found in sections Bus Interface and Processor-External Memory Interface (WISHBONE) (AXI4-Lite), respectively.
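As a rough illustration of how a read-modify-write operation can be built from lr.w and sc.w, the following host-side C sketch models the reservation mechanism in plain software. It is not NEORV32 code: the type `lrsc_mem_t` and the functions `model_lr_w`, `model_sc_w` and `emulated_amoadd_w` are hypothetical names introduced only for this example. On real hardware the two model functions would be replaced by the actual instructions, and the reservation could also be invalidated by other bus accesses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one memory word plus its reservation state. */
typedef struct {
    uint32_t data;     /* the memory word itself */
    bool     reserved; /* true while a reservation set by lr.w is still valid */
} lrsc_mem_t;

/* Model of lr.w: load the word and register a reservation for it. */
uint32_t model_lr_w(lrsc_mem_t *m) {
    m->reserved = true;
    return m->data;
}

/* Model of sc.w: store only if the reservation is still valid.
 * Returns 0 on success and a non-zero value on failure, like sc.w does. */
uint32_t model_sc_w(lrsc_mem_t *m, uint32_t value) {
    if (!m->reserved) {
        return 1u; /* reservation lost: nothing is written */
    }
    m->data = value;
    m->reserved = false; /* a store-conditional always ends the reservation */
    return 0u;
}

/* Emulated atomic add: retry the lr/sc pair until the store succeeds.
 * Returns the value that was in memory before the addition. */
uint32_t emulated_amoadd_w(lrsc_mem_t *m, uint32_t increment) {
    uint32_t old;
    do {
        old = model_lr_w(m);
    } while (model_sc_w(m, old + increment) != 0u);
    return old;
}
```

Other read-modify-write operations (swap, and, or, min, max) follow the same pattern; only the expression passed to the store-conditional step changes.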

3.5.2. C - Compressed Instructions

Compressed 16-bit instructions are available when the CPU_EXTENSION_RISCV_C configuration generic is true. In this case the following instructions are available:

  • c.addi4spn, c.lw, c.sw, c.nop, c.addi, c.jal, c.li, c.addi16sp, c.lui, c.srli, c.srai, c.andi, c.sub, c.xor, c.or, c.and, c.j, c.beqz, c.bnez, c.slli, c.lwsp, c.jr, c.mv, c.ebreak, c.jalr, c.add, c.swsp

When the compressed instructions extension is enabled, branches to an unaligned and uncompressed address require an additional instruction fetch to load the required second half-word of that instruction. The performance can be increased again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC -falign-functions=4, -falign-labels=4, -falign-loops=4 and -falign-jumps=4 compile flags (via the makefile).

3.5.3. E - Embedded CPU

The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to reduce hardware requirements. This extension is enabled when the CPU_EXTENSION_RISCV_E configuration generic is true. Accesses to registers beyond x15 will raise an illegal instruction exception.

Due to the reduced register file an alternate ABI (ilp32e) is required for the toolchain.

3.5.4. I - Base Integer ISA

The CPU always supports the complete rv32i base integer instruction set. This base set is always enabled regardless of the setting of the remaining extensions. The base instruction set includes the following instructions:

  • immediates: lui, auipc

  • jumps: jal, jalr

  • branches: beq, bne, blt, bge, bltu, bgeu

  • memory: lb, lh, lw, lbu, lhu, sb, sh, sw

  • alu: addi, slti, sltiu, xori, ori, andi, slli, srli, srai, add, sub, sll, slt, sltu, xor, srl, sra, or, and

  • environment: ecall, ebreak, fence

In order to keep the hardware footprint low, the CPU’s shift unit uses a hybrid parallel/serial approach. Shift operations are split into coarse shifts (multiples of 4) and a final fine shift (0 to 3). The total execution time depends on the shift amount. Alternatively, the shift operations can be processed completely in parallel by a fast (but large) barrel shifter when the FAST_SHIFT_EN generic is true. In that case, shift operations complete within 2 cycles regardless of the shift amount. Shift operations can also be executed in a purely serial manner when the TINY_SHIFT_EN generic is true. In that case, shift operations take up to 32 cycles depending on the shift amount.
Internally, the fence instruction does not perform any operation inside the CPU. It only sets the top’s d_bus_fence_o signal high for one cycle to inform the memory system that a fence instruction has been executed. Any flags within the fence instruction word are ignored by the hardware.
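The dependency of the execution time on the shift amount can be made concrete with a small calculation. The sketch below simply evaluates the expression given for the default (hybrid) shifter in the instruction timing table, 3 + SA/4 + SA%4, where SA is the shift amount. The function name `shift_cycles_default` is a hypothetical helper for this example, and the result assumes a filled pipeline as stated for that table.

```c
#include <stdint.h>

/* Cycle count of a shift instruction on the default hybrid shifter,
 * evaluated as 3 + SA/4 + SA%4: SA/4 coarse steps (4 bit positions each)
 * followed by SA%4 fine steps (1 bit position each). */
uint32_t shift_cycles_default(uint32_t shift_amount) {
    uint32_t sa = shift_amount & 31u; /* rv32 shifts use only the lowest 5 bits */
    return 3u + (sa / 4u) + (sa % 4u);
}
```

For instance, a shift by 4 needs one coarse step and no fine step, whereas a shift by 31 needs seven coarse steps and three fine steps.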

3.5.5. M - Integer Multiplication and Division

Hardware-accelerated integer multiplication and division instructions are available when the CPU_EXTENSION_RISCV_M configuration generic is true. In this case the following instructions are available:

  • multiplication: mul, mulh, mulhsu, mulhu

  • division: div, divu, rem, remu

By default, multiplication and division operations are executed in a bit-serial approach. Alternatively, the multiplier core can be implemented using DSP blocks if the FAST_MUL_EN generic is true, allowing faster execution. Multiplications and divisions always require a fixed number of cycles to complete, regardless of the input operands.

3.5.6. U - Less-Privileged User Mode

Adds the less-privileged user mode when the CPU_EXTENSION_RISCV_U configuration generic is true. For instance, user-level code cannot access machine-mode CSRs. Furthermore, access to the address space (like peripheral/IO devices) can be limited via the physical memory protection (PMP) unit for code running in user mode.

3.5.7. X - NEORV32-Specific (Custom) Extensions

The NEORV32-specific extensions are always enabled and are indicated by the set X bit in the misa CSR.

The CPU provides 16 fast interrupt request channels (FIRQ), which are controlled via custom bits in the mie and mip CSRs. This extension is mapped to bits that are available for custom use (according to the RISC-V specs). Also, custom trap codes for mcause are implemented.
The CPU provides a single non-maskable interrupt (NMI) that also provides a custom trap code for mcause.
A custom CSR mzext is available that can be used to check for implemented Z* CPU extensions (for example Zifencei). This CSR is mapped to the official "custom CSR address region".
All undefined/unimplemented/malformed/illegal instructions do raise an illegal instruction exception (see Execution Safety).

3.5.8. Zfinx Single-Precision Floating-Point Operations

The Zfinx floating-point extension is an alternative to the F floating-point extension that uses the integer register file x to store and operate on floating-point data (hence, F-in-x). Since no dedicated floating-point f register file exists, the Zfinx extension requires fewer hardware resources and features faster context changes. This also implies that there are NO dedicated f register file related load/store or move instructions. The official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx

The NEORV32 floating-point unit used by the Zfinx extension is compatible to the IEEE-754 specifications.

The Zfinx extension currently supports only single-precision (.s suffix) operations (so it is a direct alternative to the F extension). The Zfinx extension is implemented when the CPU_EXTENSION_RISCV_Zfinx configuration generic is true. In this case the following instructions and CSRs are available:

  • conversion: fcvt.s.w, fcvt.s.wu, fcvt.w.s, fcvt.wu.s

  • comparison: fmin.s, fmax.s, feq.s, flt.s, fle.s

  • computational: fadd.s, fsub.s, fmul.s

  • sign-injection: fsgnj.s, fsgnjn.s, fsgnjx.s

  • number classification: fclass.s

  • additional CSRs: fcsr, frm, fflags

Fused multiply-add instructions f[n]m[add/sub].s are not supported! Division fdiv.s and square root fsqrt.s instructions are not supported yet!
Subnormal numbers (also "de-normalized" numbers) are not supported by the NEORV32 FPU. Subnormal numbers (exponent = 0) are flushed to zero (setting them to +/- 0) before entering the FPU’s processing core. If a computational instruction (like fmul.s) generates a subnormal result, the result is also flushed to zero during normalization.
The Zfinx extension is not yet officially ratified, but is expected to stay unchanged. There is no software support for the Zfinx extension in the upstream GCC RISC-V port yet. However, an intrinsic library is provided to utilize the provided Zfinx floating-point extension from C-language code (see sw/example/floating_point_test).
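The flush-to-zero behavior for subnormal numbers can be pictured on the raw single-precision bit pattern. The following sketch mirrors the rule stated above (exponent field equal to zero means the value is replaced by a zero of the same sign). It is a software illustration only: `flush_subnormal_to_zero` is a hypothetical helper name and does not describe how the FPU hardware is actually structured.

```c
#include <stdint.h>

/* Flush a single-precision bit pattern to +0 or -0 if its exponent field
 * is zero (subnormal or already zero); other values pass through unchanged. */
uint32_t flush_subnormal_to_zero(uint32_t bits) {
    const uint32_t sign_mask     = 0x80000000u; /* bit 31 */
    const uint32_t exponent_mask = 0x7F800000u; /* bits 30:23 */
    if ((bits & exponent_mask) == 0u) {
        return bits & sign_mask; /* keep only the sign bit */
    }
    return bits;
}
```

Software that compares results against a host FPU with full subnormal support may want to apply the same rule to its reference values before comparing.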

3.5.9. Zicsr Control and Status Register Access / Privileged Architecture

The CSR access instructions as well as the exception and interrupt system (= the privileged architecture) is implemented when the CPU_EXTENSION_RISCV_Zicsr configuration generic is true. In this case the following instructions are available:

  • CSR access: csrrw, csrrs, csrrc, csrrwi, csrrsi, csrrci

  • environment: mret, wfi

If the Zicsr extension is disabled the CPU does not provide any kind of interrupt or exception support at all. In order to provide the full spectrum of functions and to allow a secure execution environment, the Zicsr extension should always be enabled.
The "wait for interrupt instruction" wfi works like a sleep command. When executed, the CPU is halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to be enabled via the mie CSR and the global interrupt enable flag in mstatus has to be set.

3.5.10. Zifencei Instruction Stream Synchronization

The Zifencei CPU extension is implemented if the CPU_EXTENSION_RISCV_Zifencei configuration generic is true. It allows manual synchronization of the instruction stream via the following instruction:

  • fence.i

The fence.i instruction resets the CPU’s internal instruction fetch engine and flushes the prefetch buffer. This allows a clean re-fetch of modified data from memory. Also, the top’s i_bus_fence_o signal is set high for one cycle to inform the memory system. Any additional flags within the fence.i instruction word are ignored by the hardware.
If the Zifencei extension is disabled (CPU_EXTENSION_RISCV_Zifencei generic = false), a fence.i instruction is executed as a nop (it will not trap) and none of the functions described above are performed.

3.5.11. PMP Physical Memory Protection

The NEORV32 physical memory protection (PMP) is compatible to the PMP specified by the RISC-V specs. The CPU PMP currently supports only NAPOT mode, with a minimal region size (granularity) of 8 bytes. Larger minimal sizes can be configured via the top PMP_MIN_GRANULARITY generic to reduce hardware requirements. The physical memory protection system is implemented when the PMP_NUM_REGIONS configuration generic is >0. In this case the following additional CSRs are available:

  • pmpcfg* (0..15, depending on configuration): PMP configuration registers

  • pmpaddr* (0..63, depending on configuration): PMP address registers

See section Machine Physical Memory Protection for more information regarding the PMP CSRs.
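Since only NAPOT (naturally aligned power-of-two) regions are supported, software has to encode the base address and the size of a region into a single pmpaddr* value. The sketch below follows the NAPOT encoding rule of the RISC-V privileged specification for a 32-bit address space. The helper names `pmp_napot_valid` and `pmp_napot_addr` are hypothetical and are not part of the NEORV32 software library.

```c
#include <stdbool.h>
#include <stdint.h>

/* A NAPOT region needs a power-of-two size of at least 8 bytes and a base
 * address that is aligned to that size. */
bool pmp_napot_valid(uint32_t base, uint32_t size) {
    bool power_of_two = (size != 0u) && ((size & (size - 1u)) == 0u);
    return power_of_two && (size >= 8u) && ((base & (size - 1u)) == 0u);
}

/* Encode a valid NAPOT region into the value written to a pmpaddr* CSR.
 * The CSR holds address bits 33:2; the number of trailing one bits encodes
 * the region size (no trailing ones = 8 bytes, one trailing one = 16 bytes). */
uint32_t pmp_napot_addr(uint32_t base, uint32_t size) {
    return (base | ((size >> 1) - 1u)) >> 2;
}
```

A caller would typically check `pmp_napot_valid` first, write the result of `pmp_napot_addr` to the selected pmpaddr* CSR and then enable the region in NAPOT mode via the matching pmpcfg* entry.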

Configuration

The actual number of regions and the minimal region granularity are defined via the top entity PMP_MIN_GRANULARITY and PMP_NUM_REGIONS generics. PMP_MIN_GRANULARITY defines the minimal available granularity of each region in bytes. PMP_NUM_REGIONS defines the total number of implemented regions and thus, the number of available pmpcfg* and pmpaddr* CSRs.

When implementing more PMP regions than a certain critical limit, an additional register stage is automatically inserted into the CPU’s memory interfaces to reduce the critical path length. Unfortunately, this will also increase the latency of instruction fetches and data accesses by one cycle.

The critical limit can be adapted for custom use by a constant from the main VHDL package file (rtl/core/neorv32_package.vhd). The default value is 8:

-- "critical" number of PMP regions --
constant pmp_num_regions_critical_c : natural := 8;

Operation

Any memory access address (from the CPU’s instruction fetch or data access interface) is tested if it is accessing any of the specified (configured via pmpaddr* and enabled via pmpcfg*) PMP regions. If an address accesses one of these regions, the configured access rights (attributes in pmpcfg*) are checked:

  • a write access (store) will fail if no write attribute is set

  • a read access (load) will fail if no read attribute is set

  • an instruction fetch access will fail if no execute attribute is set

If an access to a protected region does not have the according access rights (attributes) it will raise the according instruction/load/store access fault exception.

By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical memory protection also for machine-level programs, you need to activate the locked bit in the according pmpcfg* configuration.
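The access rules described above can be summarized as a small decision function. The following is a behavioral sketch, not the hardware implementation: it assumes that the accessed address has already been matched to one region, and it uses the pmpcfg bit positions defined by the RISC-V privileged specification (R = bit 0, W = bit 1, X = bit 2, L = bit 7). The names `pmp_access_t` and `pmp_access_allowed` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* pmpcfg attribute bits as defined by the RISC-V privileged specification */
#define PMP_CFG_R 0x01u /* read permission */
#define PMP_CFG_W 0x02u /* write permission */
#define PMP_CFG_X 0x04u /* execute permission */
#define PMP_CFG_L 0x80u /* locked: also enforce the rules in machine mode */

typedef enum { PMP_ACCESS_READ, PMP_ACCESS_WRITE, PMP_ACCESS_EXEC } pmp_access_t;

/* Decide whether an access that hits a region with configuration 'cfg'
 * is allowed. A 'false' result corresponds to an access fault exception. */
bool pmp_access_allowed(uint8_t cfg, bool machine_mode, pmp_access_t type) {
    if (machine_mode && ((cfg & PMP_CFG_L) == 0u)) {
        return true; /* unlocked regions do not restrict machine-level code */
    }
    switch (type) {
        case PMP_ACCESS_READ:  return (cfg & PMP_CFG_R) != 0u;
        case PMP_ACCESS_WRITE: return (cfg & PMP_CFG_W) != 0u;
        case PMP_ACCESS_EXEC:  return (cfg & PMP_CFG_X) != 0u;
    }
    return false; /* unknown access type: deny */
}
```

With a read-only, unlocked region, a user-level store is rejected while a machine-level store still passes; setting the locked bit makes both fail.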

After updating the address configuration registers pmpaddr* the system requires up to 33 cycles for internal (iterative) computations before the configuration becomes valid.
For more information regarding RISC-V physical memory protection see the official The RISC-V Instruction Set Manual – Volume II: Privileged Architecture specifications.

3.5.12. HPM Hardware Performance Monitors

In addition to the mandatory cycle ([m]cycle[h]) and instruction ([m]instret[h]) counters, the NEORV32 CPU provides up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an N-bit wide counter (split into a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top’s HPM_CNT_WIDTH generic (0..64-bit), and a corresponding event configuration CSR. The event configuration CSR defines the architectural events that lead to an increment of the associated HPM counter.

The cycle, time and instructions-retired counters ([m]cycle[h], time[h], [m]instret[h]) are mandatory performance monitors on every RISC-V platform and have a fixed increment event. For example, the instructions-retired counter increments with each executed instruction. The actual hardware performance monitors are optional and can be configured to increment on arbitrary hardware events. The number of available HPMs is configured via the top’s HPM_NUM_CNTS generic at synthesis time. Assigning a zero will exclude all HPM logic from the design.

Depending on the configuration, the following additional CSRs are available:

  • counters: [m]hpmcounter*[h] (3..31, depending on configuration)

  • event configuration: mhpmevent* (3..31, depending on configuration)

User-level access to the counter registers hpmcounter*[h] can be individually restricted via the mcounteren CSR. Auto-increment of the HPMs can be individually deactivated via the mcountinhibit CSR.

If HPM_NUM_CNTS is lower than the maximum value (=29) the remaining HPMs are not implemented. However, accessing their associated CSRs will not raise an illegal instruction exception. These CSRs are read-only and will always return 0.

For a list of all allocated HPM-related CSRs and all provided event configurations see section Hardware Performance Monitors (HPM).
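Because an HPM counter is read as two 32-bit words and may be narrower than 64 bits, benchmark code usually has to combine the words and account for a counter wrap-around. The sketch below shows one way to do this on already-read values. It assumes that the counter wraps modulo 2^N (N = HPM_CNT_WIDTH) and that at most one wrap occurs between the two samples; the helper names are hypothetical and no CSR access is performed here.

```c
#include <stdint.h>

/* Bit mask covering an N-bit counter (N = 0..64). */
uint64_t hpm_counter_mask(unsigned width) {
    if (width == 0u) {
        return 0u;
    }
    if (width >= 64u) {
        return UINT64_MAX;
    }
    return (UINT64_C(1) << width) - 1u;
}

/* Combine the high-word and low-word CSR values into one 64-bit value. */
uint64_t hpm_combine(uint32_t high_word, uint32_t low_word) {
    return ((uint64_t)high_word << 32) | (uint64_t)low_word;
}

/* Number of events between two samples of an N-bit counter, tolerating
 * a single wrap-around between 'start' and 'end'. */
uint64_t hpm_delta(uint64_t start, uint64_t end, unsigned width) {
    return (end - start) & hpm_counter_mask(width);
}
```

On the target, `start` and `end` would each be built with `hpm_combine` from the corresponding high-word and low-word counter CSRs.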

3.6. Instruction Timing

The instruction timing listed in the table below shows the required clock cycles for executing a certain instruction. These instruction cycles assume a bus access without additional wait states and a filled pipeline.

Average CPI (cycles per instruction) values for "real applications", such as executing the CoreMark benchmark with different CPU configurations, are presented in CPU Performance.

Table 32. Clock cycles per instruction
Class ISA Instruction(s) Execution cycles

ALU

I/E

addi slti sltiu xori ori andi add sub slt sltu xor or and lui auipc

2

ALU

C

c.addi4spn c.nop c.addi c.li c.addi16sp c.lui c.andi c.sub c.xor c.or c.and c.add c.mv

2

ALU

I/E

slli srli srai sll srl sra

3 + SA[4]/4 + SA%4; FAST_SHIFT[5]: 4; TINY_SHIFT[6]: 2..32

ALU

C

c.srli c.srai c.slli

3 + SA[7]/4 + SA%4; FAST_SHIFT[8]: 4; TINY_SHIFT[9]: 2..32

Branches

I/E

beq bne blt bge bltu bgeu

Taken: 5 + ML[10]; Not taken: 3

Branches

C

c.beqz c.bnez

Taken: 5 + ML[11]; Not taken: 3

Jumps / Calls

I/E

jal jalr

4 + ML

Jumps / Calls

C

c.jal c.j c.jr c.jalr

4 + ML

Memory access

I/E

lb lh lw lbu lhu sb sh sw

4 + ML

Memory access

C

c.lw c.sw c.lwsp c.swsp

4 + ML

Memory access

A

lr.w sc.w

4 + ML

Multiplication

M

mul mulh mulhsu mulhu

2+31+3; FAST_MUL[12]: 5

Division

M

div divu rem remu

22+32+4

Bit-manipulation - arithmetic/logic

B(Zbb)

sext.b sext.h min minu max maxu andn orn xnor zext(pack) rev8(grevi) orc.b(gorci)

3

Bit-manipulation - shifts

B(Zbb)

clz ctz

3 + 0..32

Bit-manipulation - shifts

B(Zbb)

cpop

3 + 32

Bit-manipulation - shifts

B(Zbb)

rol ror rori

3 + SA

Bit-manipulation - single-bit

B(Zbs)

sbset[i] sbclr[i] sbinv[i] sbext[i]

3

Bit-manipulation - shifted-add

B(Zba)

sh1add sh2add sh3add

3

CSR access

Zicsr

csrrw csrrs csrrc csrrwi csrrsi csrrci

4

System

I/E+Zicsr

ecall ebreak

4

System

I/E

fence

3

System

C+Zicsr

c.ebreak

4

System

Zicsr

mret wfi

5

System

Zifencei

fence.i

5

Floating-point - arithmetic

Zfinx

fadd.s

110

Floating-point - arithmetic

Zfinx

fsub.s

112

Floating-point - arithmetic

Zfinx

fmul.s

22

Floating-point - compare

Zfinx

fmin.s fmax.s feq.s flt.s fle.s

13

Floating-point - misc

Zfinx

fsgnj.s fsgnjn.s fsgnjx.s fclass.s

12

Floating-point - conversion

Zfinx

fcvt.w.s fcvt.wu.s

47

Floating-point - conversion

Zfinx

fcvt.s.w fcvt.s.wu

48

The presented values of the floating-point execution cycles are average values – obtained from 4096 instruction executions using pseudo-random input values. The execution time for emulating the instructions (using pure-software libraries) is ~17..140 times higher.

3.7. Control and Status Registers (CSRs)

The following table shows a summary of all available CSRs. The address field defines the CSR address for the CSR access instructions. The [ASM] name can be used for (inline) assembly code and is directly understood by the assembler/compiler. The [C] names are defined by the NEORV32 core library and can be used as immediate in plain C code. The R/W column shows whether the CSR can be read and/or written. The NEORV32-specific CSRs are mapped to the official "custom CSRs" CSR address space.

The CSRs, the CSR-related instructions as well as the complete exception/interrupt processing system are only available when the CPU_EXTENSION_RISCV_Zicsr generic is true.
When trying to write to a read-only CSR (like the time CSR) or when trying to access a nonexistent CSR or when trying to access a machine-mode CSR from less-privileged user-mode an illegal instruction exception is raised.
CSR reset value: Please note that most of the CSRs do NOT provide a dedicated reset. Hence, these CSRs are not initialized by a hardware reset and keep an UNDEFINED value until they are explicitly initialized by the software (normally, this is already done by the NEORV32-specific crt0.S start-up code). For more information see section CPU Hardware Reset.

CSR Listing

The description of each single CSR provides the following summary:

Table 33. CSR description

Address

Description

ASM alias

Reset value: CSR content after hardware reset (also see CPU Hardware Reset)

Detailed description

Not Implemented CSRs / CSR Bits
All CSR bits that are unused / not implemented / not shown are hardwired to zero. All CSRs that are not implemented at all (and are not "disabled" using certain configuration generics) will trigger an exception on access. The CSRs that are implemented within the NEORV32 might cause an exception if they are disabled. See the according CSR description for more information.
Debug Mode CSRs
The debug mode CSRs are not listed here since they are only accessible in debug mode and not during normal CPU operation. See section CPU Debug Mode CSRs.

CSR Listing Notes

CSRs with the following notes …​

  • X: custom - have or are a custom CPU-specific extension (that is allowed by the RISC-V specs)

  • R: read-only - are read-only (in contrast to the originally specified r/w capability)

  • C: constrained - have a constrained compatibility, not all specified bits are implemented

Table 34. NEORV32 Control and Status Registers (CSRs)
Address Name [ASM] Name [C] R/W Function Note

Floating-Point CSRs

0x001

fflags

CSR_FFLAGS

r/w

Floating-point accrued exceptions

0x002

frm

CSR_FRM

r/w

Floating-point dynamic rounding mode

0x003

fcsr

CSR_FCSR

r/w

Floating-point control and status (frm + fflags)

Machine Trap Setup

0x300

mstatus

CSR_MSTATUS

r/w

Machine status register

C

0x301

misa

CSR_MISA

r/-

Machine CPU ISA and extensions

R

0x304

mie

CSR_MIE

r/w

Machine interrupt enable register

X

0x305

mtvec

CSR_MTVEC

r/w

Machine trap-handler base address (for ALL traps)

0x306

mcounteren

CSR_MCOUNTEREN

r/w

Machine counter-enable register

C

Machine Trap Handling

0x340

mscratch

CSR_MSCRATCH

r/w

Machine scratch register

0x341

mepc

CSR_MEPC

r/w

Machine exception program counter

0x342

mcause

CSR_MCAUSE

r/w

Machine trap cause

X

0x343

mtval

CSR_MTVAL

r/-

Machine bad address or instruction

R

0x344

mip

CSR_MIP

r/-

Machine interrupt pending register

XR

Machine Physical Memory Protection

0x3a0 .. 0x3af

pmpcfg0 .. pmpcfg15

CSR_PMPCFG0 .. CSR_PMPCFG15

r/w

Physical memory protection config. for region 0..63

C

0x3b0 .. 0x3ef

pmpaddr0 .. pmpaddr63

CSR_PMPADDR0 .. CSR_PMPADDR63

r/w

Physical memory protection addr. register region 0..63

(Machine) Counters and Timers

0xb00

mcycle

CSR_MCYCLE

r/w

Machine cycle counter low word

0xb02

minstret

CSR_MINSTRET

r/w

Machine instruction-retired counter low word

0xb80

mcycle[h]

CSR_MCYCLEH

r/w

Machine cycle counter high word

0xb82

minstret[h]

CSR_MINSTRETH

r/w

Machine instruction-retired counter high word

0xc00

cycle

CSR_CYCLE

r/-

Cycle counter low word

0xc01

time

CSR_TIME

r/-

System time (from MTIME) low word

0xc02

instret

CSR_INSTRET

r/-

Instruction-retired counter low word

0xc80

cycle[h]

CSR_CYCLEH

r/-

Cycle counter high word

0xc81

time[h]

CSR_TIMEH

r/-

System time (from MTIME) high word

0xc82

instret[h]

CSR_INSTRETH

r/-

Instruction-retired counter high word

Hardware Performance Monitors (HPM)

0x323 .. 0x33f

mhpmevent3 .. mhpmevent31

CSR_MHPMEVENT3 .. CSR_MHPMEVENT31

r/w

Machine performance-monitoring event selector 3..31

X

0xb03 .. 0xb1f

mhpmcounter3 .. mhpmcounter31

CSR_MHPMCOUNTER3 .. CSR_MHPMCOUNTER31

r/w

Machine performance-monitoring counter 3..31 low word

0xb83 .. 0xb9f

mhpmcounter3h .. mhpmcounter31h

CSR_MHPMCOUNTER3H .. CSR_MHPMCOUNTER31H

r/w

Machine performance-monitoring counter 3..31 high word

Machine Counter Setup

0x320

mcountinhibit

CSR_MCOUNTINHIBIT

r/w

Machine counter-inhibit register

Machine Information Registers

0xf11

mvendorid

CSR_MVENDORID

r/-

Vendor ID

0xf12

marchid

CSR_MARCHID

r/-

Architecture ID

0xf13

mimpid

CSR_MIMPID

r/-

Machine implementation ID / version

0xf14

mhartid

CSR_MHARTID

r/-

Machine thread ID

NEORV32-Specific Custom CSRs

0xfc0

mzext

CSR_MZEXT

r/-

Available Z* CPU extensions
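The indexed CSR groups in the table above occupy contiguous address ranges, so the address of a particular register can be computed from its index. The following sketch derives the addresses purely from the ranges listed in the table. The helper functions are hypothetical and are not part of the NEORV32 core library; returning -1 for an index outside the listed range is an assumption made only for this example.

```c
/* Return base + index if index is within [first, last], otherwise -1. */
int csr_indexed(int base, int first, int last, int index) {
    return (index >= first && index <= last) ? (base + index) : -1;
}

/* mhpmevent3 .. mhpmevent31: 0x323 .. 0x33f */
int csr_mhpmevent(int n) { return csr_indexed(0x320, 3, 31, n); }

/* mhpmcounter3 .. mhpmcounter31 (low word): 0xb03 .. 0xb1f */
int csr_mhpmcounter(int n) { return csr_indexed(0xb00, 3, 31, n); }

/* mhpmcounter3h .. mhpmcounter31h (high word): 0xb83 .. 0xb9f */
int csr_mhpmcounterh(int n) { return csr_indexed(0xb80, 3, 31, n); }

/* pmpcfg0 .. pmpcfg15: 0x3a0 .. 0x3af */
int csr_pmpcfg(int n) { return csr_indexed(0x3a0, 0, 15, n); }

/* pmpaddr0 .. pmpaddr63: 0x3b0 .. 0x3ef */
int csr_pmpaddr(int n) { return csr_indexed(0x3b0, 0, 63, n); }
```

Note that CSR access instructions encode the address as an immediate, so such computed addresses are mainly useful for tables, debug output or code generators rather than for run-time register selection.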

3.7.1. Floating-Point CSRs

These CSRs are available if the Zfinx extension is enabled (CPU_EXTENSION_RISCV_Zfinx is true). Otherwise any access to the floating-point CSRs will raise an illegal instruction exception.

fflags

0x001

Floating-point accrued exceptions

fflags

Reset value: UNDEFINED

The fflags CSR is compatible to the RISC-V specifications. It shows the accrued ("accumulated") exception flags in the lowest 5 bits. This CSR is only available if a floating-point CPU extension is enabled. See the RISC-V ISA spec for more information.

frm

0x002

Floating-point dynamic rounding mode

frm

Reset value: UNDEFINED

The frm CSR is compatible to the RISC-V specifications and is used to configure the rounding modes using the lowest 3 bits. This CSR is only available if a floating-point CPU extension is enabled. See the RISC-V ISA spec for more information.

fcsr

0x003

Floating-point control and status register

fcsr

Reset value: UNDEFINED

The fcsr CSR is compatible to the RISC-V specifications. It provides combined read/write access to the fflags and frm CSRs. This CSR is only available if a floating-point CPU extension is enabled. See the RISC-V ISA spec for more information.
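The relation between the three floating-point CSRs can be expressed with a few bit operations. The sketch below uses the field layout from the RISC-V ISA specification (fflags in bits 4:0 and frm in bits 7:5 of fcsr) and works on plain integer values. The function names are hypothetical and no CSR access is performed.

```c
#include <stdint.h>

/* Build an fcsr value from a rounding mode (3 bits) and exception flags (5 bits). */
uint32_t fcsr_pack(uint32_t frm, uint32_t fflags) {
    return ((frm & 0x7u) << 5) | (fflags & 0x1Fu);
}

/* Extract the rounding mode field (bits 7:5) from an fcsr value. */
uint32_t fcsr_get_frm(uint32_t fcsr) {
    return (fcsr >> 5) & 0x7u;
}

/* Extract the accrued exception flags (bits 4:0) from an fcsr value. */
uint32_t fcsr_get_fflags(uint32_t fcsr) {
    return fcsr & 0x1Fu;
}
```

Reading frm or fflags individually returns the same information as extracting the respective field from fcsr.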

3.7.2. Machine Trap Setup

mstatus

0x300

Machine status register

mstatus

Reset value: 0x00000000

The mstatus CSR is compatible to the RISC-V specifications. It shows the CPU’s current execution state. The following bits are implemented (all remaining bits are always zero and are read-only).

Table 35. Machine status register
Bit Name [C] R/W Function

12:11

CSR_MSTATUS_MPP_H : CSR_MSTATUS_MPP_L

r/w

Previous machine privilege level, 11 = machine (M) level, 00 = user (U) level

7

CSR_MSTATUS_MPIE

r/w

Previous machine global interrupt enable flag state

3

CSR_MSTATUS_MIE

r/w

Machine global interrupt enable flag

When entering an exception/interrupt, the MIE flag is copied to MPIE and cleared afterwards. When leaving the exception/interrupt (via the mret instruction), MPIE is copied back to MIE.
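The handling of the two interrupt enable flags can be modeled on a plain 32-bit value. The following sketch implements exactly the two steps described above, using the bit positions from the table (MIE = bit 3, MPIE = bit 7). It deliberately leaves the MPP field untouched and does not model any further updates the hardware may perform; the function names are hypothetical.

```c
#include <stdint.h>

#define MSTATUS_MIE_BIT  3u /* machine global interrupt enable */
#define MSTATUS_MPIE_BIT 7u /* previous machine global interrupt enable */

/* Trap entry: copy MIE to MPIE, then clear MIE. */
uint32_t mstatus_on_trap_entry(uint32_t mstatus) {
    uint32_t mie = (mstatus >> MSTATUS_MIE_BIT) & 1u;
    mstatus &= ~((1u << MSTATUS_MPIE_BIT) | (1u << MSTATUS_MIE_BIT));
    mstatus |= mie << MSTATUS_MPIE_BIT;
    return mstatus;
}

/* Trap exit (mret): copy MPIE back to MIE. */
uint32_t mstatus_on_mret(uint32_t mstatus) {
    uint32_t mpie = (mstatus >> MSTATUS_MPIE_BIT) & 1u;
    mstatus &= ~(1u << MSTATUS_MIE_BIT);
    mstatus |= mpie << MSTATUS_MIE_BIT;
    return mstatus;
}
```

The model shows why interrupts stay disabled inside a trap handler unless the handler sets MIE again, and why the previous enable state is restored after mret.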

misa

0x301

ISA and extensions

misa

Reset value: configuration dependent

The misa CSR gives information about the actual CPU features. The lowest 26 bits show the implemented CPU extensions. The following bits are implemented (all remaining bits are always zero and are read-only).

The misa CSR is not fully RISC-V-compatible as it is read-only. Hence, implemented CPU extensions cannot be switched on/off during runtime. For compatibility reasons any write access to this CSR is simply ignored and will NOT cause an illegal instruction exception.
Table 36. Machine ISA and extension register
Bit Name [C] R/W Function

31:30

CSR_MISA_MXL_HI_EXT : CSR_MISA_MXL_LO_EXT

r/-

32-bit architecture indicator (always 01)

23

CSR_MISA_X_EXT

r/-

X extension bit is always set to indicate custom non-standard extensions

20

CSR_MISA_U_EXT

r/-

U CPU extension (user mode) available, set when CPU_EXTENSION_RISCV_U enabled

12

CSR_MISA_M_EXT

r/-

M CPU extension (mul/div) available, set when CPU_EXTENSION_RISCV_M enabled

8

CSR_MISA_I_EXT

r/-

I CPU base ISA, cleared when CPU_EXTENSION_RISCV_E enabled

4

CSR_MISA_E_EXT

r/-

E CPU extension (embedded) available, set when CPU_EXTENSION_RISCV_E enabled

2

CSR_MISA_C_EXT

r/-

C CPU extension (compressed instruction) available, set when CPU_EXTENSION_RISCV_C enabled

0

CSR_MISA_A_EXT

r/-

A CPU extension (atomic memory access) available, set when CPU_EXTENSION_RISCV_A enabled

Information regarding the implemented RISC-V Z* sub-extensions (like Zicsr or Zfinx) can be found in the mzext CSR.
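
Software can use the misa bits listed above to discover the CPU configuration at runtime. The sketch below decodes a misa value that has already been read from the CSR; it runs on any host, and the helper names are hypothetical (only the bit positions are taken from the table).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extension bit positions as listed in the misa table above. */
static const struct { unsigned bit; char letter; } misa_ext[] = {
    {0, 'A'}, {2, 'C'}, {4, 'E'}, {8, 'I'}, {12, 'M'}, {20, 'U'}, {23, 'X'},
};

/* Hypothetical helper: write the letters of all set extension bits into buf.
   Returns the number of letters written (buf is always NUL-terminated). */
static size_t misa_to_string(uint32_t misa, char *buf, size_t len) {
    const size_t count = sizeof(misa_ext) / sizeof(misa_ext[0]);
    size_t n = 0;
    assert(buf != NULL && len >= count + 1);
    for (size_t i = 0; i < count; i++) {
        if (misa & (UINT32_C(1) << misa_ext[i].bit)) {
            buf[n++] = misa_ext[i].letter;
        }
    }
    buf[n] = '\0';
    return n;
}

/* MXL field (bits 31:30); according to the table it always reads 01 (32-bit). */
static unsigned misa_mxl(uint32_t misa) {
    return (unsigned)(misa >> 30);
}
```

For a configuration with the C and M extensions enabled and the E extension disabled, the decoded string would contain C, I, M and X.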
mie

0x304

Machine interrupt-enable register

mie

Reset value: UNDEFINED

The mie CSR is compatible to the RISC-V specifications and features custom extensions for the fast interrupt channels. It is used to enable specific interrupt sources. Please note that interrupts also have to be globally enabled via the CSR_MSTATUS_MIE flag of the mstatus CSR. The following bits are implemented (all remaining bits are always zero and are read-only):

Table 37. Machine interrupt-enable register
Bit Name [C] R/W Function

31:16

CSR_MIE_FIRQ15E : CSR_MIE_FIRQ0E

r/w

Fast interrupt channel 15..0 enable

11

CSR_MIE_MEIE

r/w

Machine external interrupt enable

7

CSR_MIE_MTIE

r/w

Machine timer interrupt enable (from MTIME)

3

CSR_MIE_MSIE

r/w

Machine software interrupt enable
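
The two-level enabling scheme (per-source bit in mie plus the global CSR_MSTATUS_MIE flag) can be illustrated with a small host-side model. Bit positions are taken from the tables of this section; the helper names are hypothetical, and the pending mask is assumed to use the same bit positions as mie.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the mstatus and mie tables. */
#define MSTATUS_MIE_BIT 3
#define MIE_MSIE_BIT    3
#define MIE_MTIE_BIT    7
#define MIE_MEIE_BIT    11
#define MIE_FIRQ0E_BIT  16

/* Enable bit of fast interrupt channel 0..15. */
static uint32_t mie_firq_mask(unsigned channel) {
    assert(channel < 16);
    return UINT32_C(1) << (MIE_FIRQ0E_BIT + channel);
}

/* A source can trigger only if it is pending, individually enabled and globally enabled. */
static int irq_can_trigger(uint32_t mstatus, uint32_t mie, uint32_t pending, uint32_t source_mask) {
    int global = (int)((mstatus >> MSTATUS_MIE_BIT) & 1u);
    return global && ((mie & pending & source_mask) != 0);
}
```

With the global flag cleared, a pending and individually enabled timer interrupt does not trigger; it triggers as soon as CSR_MSTATUS_MIE is set.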

mtvec

0x305

Machine trap-handler base address

mtvec

Reset value: UNDEFINED

The mtvec CSR is compatible to the RISC-V specifications. It stores the base address for ALL machine traps. Thus, it defines the main entry point for exception/interrupt handling regardless of the actual trap source. The lowest two bits of this register are always zero and cannot be modified (= fixed address mode).

Table 38. Machine trap-handler base address
Bit R/W Function

31:2

r/w

4-byte aligned base address of trap base handler

1:0

r/-

Always zero

mcounteren

0x306

Machine counter enable

mcounteren

Reset value: UNDEFINED

The mcounteren CSR is compatible to the RISC-V specifications. The bits of this CSR define which counter/timer CSRs can be accessed (read) from code running in less-privileged modes. For example, if user-level code tries to read from a counter/timer CSR without enabled access, an illegal instruction exception is raised. If user mode is not implemented (CPU_EXTENSION_RISCV_U = false) all bits of the mcounteren CSR are tied to zero.

Table 39. Machine counter enable register
Bit Name [C] R/W Function

31:16

-

r/-

User-level code is not allowed to read HPM counter

2

CSR_MCOUNTEREN_IR

r/w

User-level code is allowed to read instret[h] CSRs when set

1

CSR_MCOUNTEREN_TM

r/w

User-level code is allowed to read time[h] CSRs when set

0

CSR_MCOUNTEREN_CY

r/w

User-level code is allowed to read cycle[h] CSRs when set

3.7.3. Machine Trap Handling

mscratch

0x340

Scratch register for machine trap handlers

mscratch

Reset value: UNDEFINED

The mscratch CSR is compatible to the RISC-V specifications. It is a general purpose scratch register that can be used by the exception/interrupt handler. The content of this register after reset is undefined.

mepc

0x341

Machine exception program counter

mepc

Reset value: UNDEFINED

The mepc CSR is compatible to the RISC-V specifications. For exceptions (like an illegal instruction) this register provides the address of the exception-causing instruction. For interrupts (like a machine timer interrupt) this register provides the address of the next not-yet-executed instruction.

mcause

0x342

Machine trap cause

mcause

Reset value: UNDEFINED

The mcause CSR is compatible to the RISC-V specifications. It shows the cause ID of a taken trap.

Table 40. Machine trap cause register
Bit R/W Function

31

r/w

1 if the trap is caused by an interrupt (0 if the trap is caused by an exception)

30:5

r/-

Reserved, read as zero

4:0

r/w

Trap ID, see NEORV32 Trap Listing
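
A trap handler typically splits mcause into the interrupt flag and the trap ID. The following sketch shows this decoding for the field layout of the table above; the type and function names are hypothetical and the code works on a value that has already been read from the CSR.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int      is_interrupt; /* bit 31: 1 = interrupt, 0 = exception */
    unsigned trap_id;      /* bits 4:0 */
} mcause_info_t;

/* Split an mcause value into its two implemented fields. */
static mcause_info_t mcause_decode(uint32_t mcause) {
    mcause_info_t info;
    info.is_interrupt = (int)(mcause >> 31);
    info.trap_id = (unsigned)(mcause & 0x1Fu);
    return info;
}

/* Inverse operation; the reserved bits 30:5 stay zero. */
static uint32_t mcause_encode(int is_interrupt, unsigned trap_id) {
    assert(trap_id < 32);
    return ((uint32_t)(is_interrupt != 0) << 31) | trap_id;
}
```

For example, the value 0x80000007 decodes to an interrupt with trap ID 7, and 0x00000002 decodes to an exception with trap ID 2.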

mtval

0x343

Machine bad address or instruction

mtval

Reset value: UNDEFINED

The mtval CSR is compatible to the RISC-V specifications. When a trap is triggered, the CSR shows either the faulting address (for misaligned/faulting load/stores/fetch) or the faulting instruction itself (for illegal instructions). For interrupts the CSR is set to zero.

Table 41. Machine bad address or instruction register
Trap cause mtval content

misaligned instruction fetch address or instruction fetch access fault

address of faulting instruction fetch

breakpoint

program counter (= address) of faulting instruction itself

misaligned load address, load access fault, misaligned store address or store access fault

program counter (= address) of faulting instruction itself

illegal instruction

actual instruction word of faulting instruction

anything else including interrupts

0x00000000 (always zero)

The NEORV32 mtval CSR is read-only. A write access will raise an illegal instruction exception.

mip

0x344

Machine interrupt pending

mip

Reset value: 0x00000000

The mip CSR is partly compatible to the RISC-V specifications and also provides custom extensions. It shows currently pending interrupts. Since this register is read-only, a pending interrupt can only be cleared by disabling and re-enabling the according mie CSR bit. Writing to this CSR will raise an illegal instruction exception. The following CSR bits are implemented (all remaining bits are always zero and are read-only).

Table 42. Machine interrupt pending register
Bit Name [C] R/W Function

31:16

CSR_MIP_FIRQ15P : CSR_MIP_FIRQ0P

r/-

fast interrupt channel 15..0 pending

11

CSR_MIP_MEIP

r/-

machine external interrupt pending

7

CSR_MIP_MTIP

r/-

machine timer interrupt pending

3

CSR_MIP_MSIP

r/-

machine software interrupt pending

3.7.4. Machine Physical Memory Protection

The available physical memory protection logic is configured via the PMP_NUM_REGIONS and PMP_MIN_GRANULARITY top entity generics. PMP_NUM_REGIONS defines the number of implemented protection regions and thus, the availability of the according pmpcfg* and pmpaddr* CSRs.

If trying to access a PMP-related CSR beyond PMP_NUM_REGIONS no illegal instruction exception is triggered. The according CSRs are read-only (writes are ignored) and always return zero.
The RISC-V-compatible NEORV32 physical memory protection only implements the NAPOT (naturally aligned power-of-two region) mode with a minimal region granularity of 8 bytes.
pmpcfg

0x3a0 - 0x3af

Physical memory protection configuration registers

pmpcfg0 - pmpcfg15

Reset value: 0x00000000

The pmpcfg* CSRs are compatible to the RISC-V specifications. They are used to configure the protected regions, where each pmpcfg* CSR provides configuration bits for four regions. The following bits (for the first PMP configuration entry) are implemented (all remaining bits are always zero and are read-only):

Table 43. Physical memory protection configuration register entry
Bit RISC-V name R/W Function

7

L

r/w

lock bit, can be set – but not be cleared again (only via CPU reset)

6:5

-

r/-

reserved, read as zero

4:3

A

r/w

mode configuration; only OFF (00) and NAPOT (11) are supported

2

X

r/w

execute permission

1

W

r/w

write permission

0

R

r/w

read permission

pmpaddr

0x3b0 - 0x3ef

Physical memory protection address registers

pmpaddr0 - pmpaddr63

Reset value: UNDEFINED

The pmpaddr* CSRs are compatible to the RISC-V specifications. They are used to configure the base address and the region size.

When configuring PMP make sure to set pmpaddr* before activating the according region via pmpcfg*. When changing the PMP configuration, deactivate the according region via pmpcfg* before modifying pmpaddr*.
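
Since only NAPOT regions are supported, base address and region size are encoded together in one pmpaddr* value. The sketch below assumes the standard RISC-V NAPOT encoding (address bits shifted right by two, region size encoded in the trailing one-bits) and a minimal granularity of 8 bytes; a larger PMP_MIN_GRANULARITY would further restrict the usable sizes. All helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Configuration entry bits from the pmpcfg table above. */
#define PMPCFG_R       (1u << 0)
#define PMPCFG_W       (1u << 1)
#define PMPCFG_X       (1u << 2)
#define PMPCFG_A_NAPOT (3u << 3)
#define PMPCFG_L       (1u << 7)

/* Encode a naturally aligned power-of-two region (assumed standard RISC-V NAPOT format). */
static uint32_t pmp_napot_addr(uint32_t base, uint32_t size) {
    assert(size >= 8 && (size & (size - 1)) == 0); /* power of two, at least 8 bytes */
    assert((base & (size - 1)) == 0);              /* base is aligned to the region size */
    return (base >> 2) | ((size >> 3) - 1);
}

/* Insert the 8-bit configuration of entry 0..3 into a pmpcfg word. */
static uint32_t pmpcfg_set_entry(uint32_t pmpcfg, unsigned entry, uint8_t cfg) {
    assert(entry < 4);
    unsigned shift = 8u * entry;
    return (pmpcfg & ~(UINT32_C(0xFF) << shift)) | ((uint32_t)cfg << shift);
}
```

Following the recommendation above, the value from pmp_napot_addr would be written to pmpaddr* first, and the region would be activated afterwards by writing the updated pmpcfg* word.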

3.7.5. (Machine) Counters and Timers

The CPU_CNT_WIDTH generic defines the total size of the CPU’s cycle[h] and instret[h] / mcycle[h] and minstret[h] counter CSRs (low and high words combined); the time CSRs are not affected by this generic. Any configuration with CPU_CNT_WIDTH less than 64 is not RISC-V compliant.
If CPU_CNT_WIDTH is less than 64 (64 is the default value) and greater than or equal to 32, the according MSBs of [m]cycleh and [m]instreth are read-only and always read as zero. This configuration will also set the ZXSCNT flag in the mzext CSR.

If CPU_CNT_WIDTH is less than 32 and greater than 0, the [m]cycleh and [m]instreth do not exist and any access will raise an illegal instruction exception. Furthermore, the according MSBs of [m]cycle and [m]instret are read-only and always read as zero. This configuration will also set the ZXSCNT flag in the mzext CSR.

If CPU_CNT_WIDTH is 0, cycle[h] and instret[h] / mcycle[h] and minstret[h] do not exist and any access will raise an illegal instruction exception. This configuration will also set the ZXNOCNT flag in the mzext CSR.
cycle[h]

0xc00

Cycle counter - low word

cycle

0xc80

Cycle counter - high word

cycleh

Reset value: UNDEFINED

The cycle[h] CSR is compatible to the RISC-V specifications. It shows the lower/upper 32-bit of the 64-bit cycle counter. The cycle[h] CSR is a read-only shadowed copy of the mcycle[h] CSR.
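
Because the 64-bit counter is split into two 32-bit CSRs, the low word may overflow between the two read accesses. A common idiom is to read the high word, then the low word, then the high word again, and to retry if the high word has changed. The sketch below shows this idiom with function pointers as stand-ins for the actual CSR reads, so it can run on a host; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*csr_read_fn)(void);

/* Read a 64-bit counter from two 32-bit halves without tearing. */
static uint64_t counter_read64(csr_read_fn read_lo, csr_read_fn read_hi) {
    uint32_t hi, lo, hi2;
    assert(read_lo && read_hi);
    do {
        hi  = read_hi();
        lo  = read_lo();
        hi2 = read_hi();
    } while (hi != hi2); /* low word wrapped between the reads: try again */
    return ((uint64_t)hi << 32) | lo;
}

/* Host-side stand-in for the hardware counter:
   the simulated counter advances by one on every low-word read. */
static uint64_t sim_counter;

static uint32_t sim_read_lo(void) {
    uint32_t value = (uint32_t)sim_counter;
    sim_counter += 1;
    return value;
}

static uint32_t sim_read_hi(void) {
    return (uint32_t)(sim_counter >> 32);
}
```

On the target, read_lo and read_hi would wrap the CSR read instructions for cycle and cycleh (or instret and instreth).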

time[h]

0xc01

System time - low word

time

0xc81

System time - high word

timeh

Reset value: UNDEFINED

The time[h] CSR is compatible to the RISC-V specifications. It shows the lower/upper 32-bit of the 64-bit system time. The system time is either generated by the processor-internal MTIME system timer unit (if IO_MTIME_EN = true) or can be provided by an external timer unit via the processor’s mtime_i signal (if IO_MTIME_EN = false). This CSR is read-only. Change the system time via the MTIME unit.

instret[h]

0xc02

Instructions-retired counter - low word

instret

0xc82

Instructions-retired counter - high word

instreth

Reset value: UNDEFINED

The instret[h] CSR is compatible to the RISC-V specifications. It shows the lower/upper 32-bit of the 64-bit retired instructions counter. The instret[h] CSR is a read-only shadowed copy of the minstret[h] CSR.

mcycle[h]

0xb00

Machine cycle counter - low word

mcycle

0xb80

Machine cycle counter - high word

mcycleh

Reset value: UNDEFINED

The mcycle[h] CSR is compatible to the RISC-V specifications. It shows the lower/upper 32-bit of the 64-bit cycle counter. The mcycle[h] CSR can also be written when in machine mode and is copied to the cycle[h] CSR.

minstret[h]

0xb02

Machine instructions-retired counter - low word

minstret

0xb82

Machine instructions-retired counter - high word

minstreth

Reset value: UNDEFINED

The minstret[h] CSR is compatible to the RISC-V specifications. It shows the lower/upper 32-bit of the 64-bit retired instructions counter. The minstret[h] CSR can also be written when in machine mode and is copied to the instret[h] CSR.

3.7.6. Hardware Performance Monitors (HPM)

The available hardware performance logic is configured via the HPM_NUM_CNTS top entity generic, which defines the number of implemented performance monitors and thus, the availability of the according mhpmcounter*[h] and mhpmevent* CSRs.

The HPM system only implements machine-mode access. Hence, the hpmcounter*[h] CSRs are not implemented and any access (even from machine mode) will raise an exception. Furthermore, the according bits of mcounteren used to configure user-mode access to hpmcounter*[h] are hard-wired to zero.

The total counter size of the HPMs can be configured before synthesis via the HPM_CNT_WIDTH generic (0..64-bit).

If trying to access an HPM-related CSR beyond HPM_NUM_CNTS no illegal instruction exception is triggered. The according CSRs are read-only (writes are ignored) and always return zero.
The total LSB-aligned HPM counter size (low word CSR + high word CSR) is defined via the HPM_CNT_WIDTH generic (0..64-bit). If HPM_CNT_WIDTH is less than 64, all unused MSB-aligned bits are hardwired to zero.
mhpmevent

0x323 - 0x33f

Machine hardware performance monitor event selector

mhpmevent3 - mhpmevent31

Reset value: UNDEFINED

The mhpmevent* CSRs are compatible to the RISC-V specifications. The configuration of these CSRs defines the architectural events that cause the according [m]hpmcounter*[h] counters to increment. All available events are listed in the table below. If more than one event is selected, the according counter will increment if any of the enabled events is observed (logical OR). Note that the counter will only increment by 1 step per clock cycle even if more than one event is observed. If the CPU is in sleep mode, no HPM counter will increment at all.

Table 44. HPM event selector
Bit Name [C] R/W Event

0

HPMCNT_EVENT_CY

r/w

active clock cycle (not in sleep)

1

-

r/-

not implemented, always read as zero

2

HPMCNT_EVENT_IR

r/w

retired instruction

3

HPMCNT_EVENT_CIR

r/w

retired compressed instruction

4

HPMCNT_EVENT_WAIT_IF

r/w

instruction fetch memory wait cycle (if more than 1 cycle memory latency)

5

HPMCNT_EVENT_WAIT_II

r/w

instruction issue pipeline wait cycle (if more than 1 cycle latency), caused by pipeline flushes (like taken branches)

6

HPMCNT_EVENT_WAIT_MC

r/w

multi-cycle ALU operation wait cycle

7

HPMCNT_EVENT_LOAD

r/w

load operation

8

HPMCNT_EVENT_STORE

r/w

store operation

9

HPMCNT_EVENT_WAIT_LS

r/w

load/store memory wait cycle (if more than 1 cycle memory latency)

10

HPMCNT_EVENT_JUMP

r/w

unconditional jump

11

HPMCNT_EVENT_BRANCH

r/w

conditional branch (taken or not taken)

12

HPMCNT_EVENT_TBRANCH

r/w

taken conditional branch

13

HPMCNT_EVENT_TRAP

r/w

entered trap

14

HPMCNT_EVENT_ILLEGAL

r/w

illegal instruction exception
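
The counting rule described above (logical OR of all selected events, at most one increment per clock cycle, no increment while sleeping) can be expressed as a single-cycle update function. This is a behavioral sketch for illustration, not the actual hardware description; the event constants mirror a few bit positions of the table and are redefined locally so the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of some event bit positions from the HPM event selector table. */
enum {
    HPMCNT_EVENT_CY    = 0,
    HPMCNT_EVENT_IR    = 2,
    HPMCNT_EVENT_LOAD  = 7,
    HPMCNT_EVENT_STORE = 8
};

/* Build the selector mask for one event bit. */
static uint32_t hpm_event_mask(unsigned bit) {
    assert(bit <= 14 && bit != 1); /* bit 1 is not implemented */
    return UINT32_C(1) << bit;
}

/* One clock cycle of a modeled HPM counter.
   observed: events seen in this cycle, using the same bit positions as mhpmevent. */
static uint64_t hpm_step(uint64_t counter, uint32_t mhpmevent, uint32_t observed, int sleeping) {
    if (!sleeping && (mhpmevent & observed) != 0) {
        counter += 1; /* at most one increment per cycle */
    }
    return counter;
}
```

Selecting both HPMCNT_EVENT_LOAD and HPMCNT_EVENT_STORE therefore yields a counter for all data memory accesses.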

mhpmcounter[h]

0xb03 - 0xb1f

Machine hardware performance monitor - counter low

mhpmcounter3 - mhpmcounter31

0xb83 - 0xb9f

Machine hardware performance monitor - counter high

mhpmcounter3h - mhpmcounter31h

Reset value: UNDEFINED

The mhpmcounter*[h] CSRs are compatible to the RISC-V specifications. These CSRs provide the lower/upper 32 bits of arbitrary event counters. The event(s) that trigger an increment of these counters are selected via the according mhpmevent* CSR bits.

3.7.7. Machine Counter Setup

mcountinhibit

0x320

Machine counter-inhibit register

mcountinhibit

Reset value: UNDEFINED

The mcountinhibit CSR is compatible to the RISC-V specifications. The bits in this register define which counter/timer CSRs are allowed to perform an automatic increment. Automatic update is enabled if the according bit in mcountinhibit is cleared. The following bits are implemented (all remaining bits are always zero and are read-only).

Table 45. Machine counter-inhibit register
Bit Name [C] R/W Function

0

CSR_MCOUNTINHIBIT_CY

r/w

the [m]cycle[h] CSRs will auto-increment with each clock cycle (if CPU is not in sleep state) when cleared

2

CSR_MCOUNTINHIBIT_IR

r/w

the [m]instret[h] CSRs will auto-increment with each committed instruction when cleared

31:3

CSR_MCOUNTINHIBIT_HPM31 : CSR_MCOUNTINHIBIT_HPM3

r/w

the [m]hpmcounter*[h] CSRs will auto-increment according to the configured mhpmevent* selector when cleared

3.7.8. Machine Information Registers

mvendorid

0xf11

Machine vendor ID

mvendorid

Reset value: 0x00000000

The mvendorid CSR is compatible to the RISC-V specifications. It is read-only and always reads zero.

marchid

0xf12

Machine architecture ID

marchid

Reset value: 0x00000013

The marchid CSR is compatible to the RISC-V specifications. It is read-only and shows the NEORV32 official RISC-V open-source architecture ID (decimal: 19, 32-bit hexadecimal: 0x00000013).

mimpid

0xf13

Machine implementation ID

mimpid

Reset value: HW version number

The mimpid CSR is compatible to the RISC-V specifications. It is read-only and shows the version of the NEORV32 as BCD-coded number (example: mimpid = 0x01020312 → 01.02.03.12 → version 1.2.3.12).
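
Decoding the BCD-coded version number takes only a few lines of C. The sketch below converts each byte of mimpid into its two-digit decimal value and formats the result; the function names are hypothetical and the code operates on a value that has already been read from the CSR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Convert one BCD-coded byte (two decimal digits) to its numeric value. */
static unsigned bcd_byte_to_uint(uint8_t bcd) {
    assert((bcd >> 4) <= 9 && (bcd & 0x0F) <= 9);
    return (unsigned)(bcd >> 4) * 10u + (unsigned)(bcd & 0x0F);
}

/* Format mimpid as "a.b.c.d"; returns the number of characters written (snprintf semantics). */
static int mimpid_to_string(uint32_t mimpid, char *buf, size_t len) {
    return snprintf(buf, len, "%u.%u.%u.%u",
                    bcd_byte_to_uint((uint8_t)(mimpid >> 24)),
                    bcd_byte_to_uint((uint8_t)(mimpid >> 16)),
                    bcd_byte_to_uint((uint8_t)(mimpid >> 8)),
                    bcd_byte_to_uint((uint8_t)mimpid));
}
```

With the example value from the text, mimpid_to_string(0x01020312, ...) produces the string 1.2.3.12.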

mhartid

0xf14

Machine hardware thread ID

mhartid

Reset value: HW_THREAD_ID generic

The mhartid CSR is compatible to the RISC-V specifications. It is read-only and shows the core’s hart ID, which is assigned via the CPU’s HW_THREAD_ID generic.

3.7.9. NEORV32-Specific Custom CSRs

mzext

0xfc0

Available Z* extensions

mzext

Reset value: 0x00000000

The mzext CSR is a custom read-only CSR that shows the implemented Z* extensions. The following bits are implemented (all remaining bits are always zero).

Table 46. Available Z* extensions register
Bit Name [C] R/W Function

0

CPU_MZEXT_ZICSR

r/-

Zicsr extensions available (enabled via CPU_EXTENSION_RISCV_Zicsr generic)

1

CPU_MZEXT_ZIFENCEI

r/-

Zifencei extensions available (enabled via CPU_EXTENSION_RISCV_Zifencei generic)

5

CPU_MZEXT_ZFINX

r/-

Zfinx extensions available (enabled via CPU_EXTENSION_RISCV_Zfinx generic)

6

CPU_MZEXT_ZXSCNT

r/-

custom extension: "Small CPU counters": cycle[h] & instret[h] CSRs have less than 64-bit when set (when CPU_CNT_WIDTH generic is less than 64)

7

CPU_MZEXT_ZXNOCNT

r/-

custom extension: "NO CPU counters": cycle[h] & instret[h] CSRs are not available at all when set (when CPU_CNT_WIDTH generic is 0)

8

CSR_MZEXT_PMP

r/-

PMP (physical memory protection) extension available (PMP_NUM_REGIONS generic > 0)

9

CSR_MZEXT_HPM

r/-

HPM (hardware performance monitors) extension available (HPM_NUM_CNTS generic > 0)

10

CSR_MZEXT_DEBUGMODE

r/-

RISC-V "CPU debug mode" extension available (enabled via CPU_EXTENSION_RISCV_DEBUG generic)

3.7.10. Execution Safety

The hardware of the NEORV32 CPU was designed for maximum execution safety. If the Zicsr CPU extension is enabled, the core supports all traps specified by the official RISC-V specifications (obviously, not the ones that are related to yet unimplemented extensions/features). Thus, the CPU provides well-defined hardware fall-backs for (nearly) everything that can go wrong. Even if any kind of trap is triggered, the core is always in a defined and fully synchronized state throughout the whole architecture (i.e. there is no need to undo out-of-order operations), which allows predictable execution behavior at any time.

Core Safety Features

  • Due to the acknowledged memory accesses the CPU is always in sync with the memory system (no speculative execution / out-of-order states).

  • The CPU supports all bus exceptions including bus access exceptions that are triggered if an accessed address does not respond or encounters an internal error during access (which is a rare feature in many open-source RISC-V cores).

  • The CPU raises an illegal instruction trap for all unimplemented/malformed/illegal instructions (to support full virtualization).

  • If user-level code tries to read from machine-level-only CSRs (like mstatus) an illegal instruction exception is raised. The result of this operation is always zero (though, machine-level code handling this exception can modify the target register of the illegal access-causing instruction to allow full virtualization). Illegal write accesses to machine CSRs will not write any data at all.

  • Illegal user-level memory accesses to protected addresses or address regions (via physical memory protection) will not be conducted at all (no actual write and no actual read; prevents triggering of memory-mapped devices). Illegal load operations will not return any data (the instruction’s destination register will not be written at all).

3.7.11. Traps, Exceptions and Interrupts

In this document a (maybe) special nomenclature regarding traps is used:

  • interrupt = asynchronous exceptions

  • exceptions = synchronous exceptions

  • traps = exceptions + interrupts (synchronous or asynchronous exceptions)

Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the mtvec CSR. The cause of the according interrupt or exception can be determined via the content of the mcause CSR. The address that reflected the current program counter when a trap was taken is stored to mepc. Additional information regarding the cause of the trap can be retrieved from mtval.

The traps are prioritized. If several exceptions occur at once only the one with highest priority is triggered. If several interrupts trigger at once, the one with highest priority is triggered while the remaining ones are queued. After completing the interrupt handler, the interrupt with the second highest priority will be issued, and so on.

Memory Access Exceptions

If a load operation causes any exception, the destination register is not written at all. Exceptions caused by a misalignment or a physical memory protection fault do not trigger a bus read-operation at all. Exceptions caused by a store address misalignment or a store physical memory protection fault do not trigger a bus write-operation at all.

Instruction Atomicity

All instructions execute as atomic operations – interrupts can only trigger between two instructions.

Custom Fast Interrupt Request Lines

As a custom extension, the NEORV32 CPU features 16 fast interrupt request lines via the firq_i CPU (/Processor) top entity signals. These interrupts have custom configuration and status flags in the mie and mip CSRs and also provide custom trap codes in mcause.

Non-Maskable Interrupt

The NEORV32 CPU features a single non-maskable interrupt source via the nm_irq_i CPU (/Processor) top entity signal that can be used to signal critical system conditions. This interrupt source cannot be disabled at all (not even in interrupt service routines). Hence, it does not provide configuration/status flags in the mie and mip CSRs. The RISC-V-compatible mcause value 0x80000000 is used to indicate the non-maskable interrupt.

All CPU/Processor interrupt request signals are triggered when the signal is high for exactly one cycle (being high for several cycles might cause multiple triggering of the interrupt).
NEORV32 Trap Listing
Table 47. NEORV32 trap listing
Prio. mcause [RISC-V] ID [C] Cause mepc mtval

1

0x80000000

1.0

TRAP_CODE_NMI

non-maskable interrupt

I-PC

0

2

0x8000000B

1.11

TRAP_CODE_MEI

machine external interrupt

I-PC

0

3

0x80000003

1.3

TRAP_CODE_MSI

machine software interrupt

I-PC

0

4

0x80000007

1.7

TRAP_CODE_MTI

machine timer interrupt

I-PC

0

5

0x80000010

1.16

TRAP_CODE_FIRQ_0

fast interrupt request channel 0

I-PC

0

6

0x80000011

1.17

TRAP_CODE_FIRQ_1

fast interrupt request channel 1

I-PC

0

7

0x80000012

1.18

TRAP_CODE_FIRQ_2

fast interrupt request channel 2

I-PC

0

8

0x80000013

1.19

TRAP_CODE_FIRQ_3

fast interrupt request channel 3

I-PC

0

9

0x80000014

1.20

TRAP_CODE_FIRQ_4

fast interrupt request channel 4

I-PC

0

10

0x80000015

1.21

TRAP_CODE_FIRQ_5

fast interrupt request channel 5

I-PC

0

11

0x80000016

1.22

TRAP_CODE_FIRQ_6

fast interrupt request channel 6

I-PC

0

12

0x80000017

1.23

TRAP_CODE_FIRQ_7

fast interrupt request channel 7

I-PC

0

13

0x80000018

1.24

TRAP_CODE_FIRQ_8

fast interrupt request channel 8

I-PC

0

14

0x80000019

1.25

TRAP_CODE_FIRQ_9

fast interrupt request channel 9

I-PC

0

15

0x8000001a

1.26

TRAP_CODE_FIRQ_10

fast interrupt request channel 10

I-PC

0

16

0x8000001b

1.27

TRAP_CODE_FIRQ_11

fast interrupt request channel 11

I-PC

0

17

0x8000001c

1.28

TRAP_CODE_FIRQ_12

fast interrupt request channel 12

I-PC

0

18

0x8000001d

1.29

TRAP_CODE_FIRQ_13

fast interrupt request channel 13

I-PC

0

19

0x8000001e

1.30

TRAP_CODE_FIRQ_14

fast interrupt request channel 14

I-PC

0

20

0x8000001f

1.31

TRAP_CODE_FIRQ_15

fast interrupt request channel 15

I-PC

0

21

0x00000001

0.1

TRAP_CODE_I_ACCESS

instruction access fault

B-ADR

PC

22

0x00000002

0.2

TRAP_CODE_I_ILLEGAL

illegal instruction

PC

Inst

23

0x00000000

0.0

TRAP_CODE_I_MISALIGNED

instruction address misaligned

B-ADR

PC

24

0x0000000B

0.11

TRAP_CODE_MENV_CALL

environment call from M-mode (ECALL in machine-mode)

PC

PC

25

0x00000008

0.8

TRAP_CODE_UENV_CALL

environment call from U-mode (ECALL in user-mode)

PC

PC

26

0x00000003

0.3

TRAP_CODE_BREAKPOINT

breakpoint (EBREAK)

PC

PC

27

0x00000006

0.6

TRAP_CODE_S_MISALIGNED

store address misaligned

B-ADR

B-ADR

28

0x00000004

0.4

TRAP_CODE_L_MISALIGNED

load address misaligned

B-ADR

B-ADR

29

0x00000007

0.7

TRAP_CODE_S_ACCESS

store access fault

B-ADR

B-ADR

30

0x00000005

0.5

TRAP_CODE_L_ACCESS

load access fault

B-ADR

B-ADR

Notes

The "Prio." column shows the priority of each trap. The highest priority is 1. The “mcause” column shows the cause ID of the according trap that is written to mcause CSR. The "[RISC-V]" columns show the interrupt/exception code value from the official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (sw/lib/include/neorv32.h) and can be used in plain C code. The “mepc” and “mtval” columns show the value written to mepc and mtval CSRs when a trap is triggered:

  • I-PC - address of interrupted instruction (instruction has not been executed/completed yet)

  • B-ADR - bad memory access address that caused the trap

  • PC - address of instruction that caused the trap

  • 0 - zero

  • Inst - the faulting instruction itself
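
The interrupt part of the trap listing can be turned into a small priority resolver. The following sketch picks the highest-priority interrupt from a pending mask and an enable mask (both using the mip/mie bit positions) and returns the matching mcause value. It is a software model for illustration only; the names are hypothetical, and the non-maskable interrupt is left out because it has no mie/mip flag.

```c
#include <assert.h>
#include <stdint.h>

/* mip/mie bit positions in descending priority, following the trap listing:
   MEI (11), MSI (3), MTI (7), then fast interrupt channels 0..15 (16..31). */
static const uint8_t irq_priority_order[] = {
    11, 3, 7, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
};

/* Return the mcause value of the highest-priority pending and enabled interrupt.
   Returns 0 if there is none (0 has the interrupt flag cleared, so it is not an interrupt cause). */
static uint32_t next_interrupt_mcause(uint32_t mip, uint32_t mie) {
    uint32_t active = mip & mie;
    assert(sizeof(irq_priority_order) == 19);
    for (unsigned i = 0; i < sizeof(irq_priority_order); i++) {
        unsigned bit = irq_priority_order[i];
        if (active & (UINT32_C(1) << bit)) {
            return UINT32_C(0x80000000) | bit; /* interrupt flag + trap ID */
        }
    }
    return 0;
}
```

If, for example, the machine timer interrupt and fast interrupt channel 0 are pending at the same time, the resolver returns 0x80000007 first; channel 0 (0x80000010) is served once the timer interrupt is no longer pending.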

3.7.12. Bus Interface

The CPU provides two independent bus interfaces: One for fetching instructions (i_bus_*) and one for accessing data (d_bus_*) via load and store operations. Both interfaces use the same interface protocol.

Address Space

The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard Architecture. Each of these interfaces can access an address space of up to 2^32 bytes (4GB). The memory system is based on 32-bit words with a minimal granularity of 1 byte. Please note that the NEORV32 CPU does not support unaligned memory accesses in hardware – however, a software-based handling can be implemented as any unaligned memory access will trigger an according exception.

Interface Signals

The following table shows the signals of the data and instruction interfaces seen from the CPU (*_o signals are driven by the CPU / outputs, *_i signals are read by the CPU / inputs).

Table 48. CPU bus interface
Signal Size Function

bus_addr_o

32

access address

bus_rdata_i

32

data input for read operations

bus_wdata_o

32

data output for write operations

bus_ben_o

4

byte enable signal for write operations

bus_we_o

1

bus write access

bus_re_o

1

bus read access

bus_lock_o

1

exclusive access request

bus_ack_i

1

accessed peripheral indicates a successful completion of the bus transaction

bus_err_i

1

accessed peripheral indicates an error during the bus transaction

bus_fence_o

1

this signal is set for one cycle when the CPU executes a data/instruction fence operation

bus_priv_o

2

current CPU privilege level

Currently, there are no pipelined or overlapping operations implemented within the same bus interface. So only a single transfer request can be "on the fly".
Protocol

A bus request is triggered either by the bus_re_o signal (for reading data) or by the bus_we_o signal (for writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is completed when the accessed peripheral either sets the bus_ack_i signal (→ successful completion) or the bus_err_i signal is set (→ failed completion). All these control signals are only active (= high) for one single cycle. An error indicated via the bus_err_i signal during a transfer will trigger the according instruction bus access fault or load/store bus access fault exception.

The transfer can be completed directly in the same cycle as it was initiated (via the bus_re_o or bus_we_o signal) if the peripheral sets bus_ack_i or bus_err_i high for one cycle. However, in order to shorten the critical path such "asynchronous" completion should be avoided. The default processor-internal modules provide exactly one cycle delay between initiation and completion of transfers.
Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency
Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle). However, the bus transaction has to be completed (= acknowledged) within a certain response time window. This time window is defined by the global max_proc_int_response_time_c constant (default = 15 cycles) from the processor’s VHDL package file (rtl/neorv32_package.vhd). It defines the maximum number of cycles after which an unacknowledged processor-internal bus transfer will time out and raise a bus fault exception. The BUSKEEPER hardware module (rtl/core/neorv32_bus_keeper.vhd) keeps track of all internal bus transactions. If any bus operation times out (for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the according instruction fetch or data access bus exception. Note that the bus keeper does not track external accesses via the external memory bus interface. However, the external memory bus interface also provides an optional bus timeout (see section Processor-External Memory Interface (WISHBONE) (AXI4-Lite)).
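
The interplay of request, acknowledge, error and timeout can be illustrated with a cycle-based software model. This is a behavioral sketch only: the exact cycle at which the real bus keeper starts and stops counting is an assumption here, and the peripheral models as well as all names are hypothetical.

```c
#include <assert.h>

typedef enum { BUS_OK, BUS_ERROR, BUS_TIMEOUT } bus_result_t;

/* Default value of max_proc_int_response_time_c as stated above. */
#define MAX_RESPONSE_CYCLES 15

/* Peripheral model: called once per cycle after the request was issued.
   Returns 1 when it raises ack, -1 when it raises err, 0 while it is still busy. */
typedef int (*peripheral_fn)(unsigned cycle);

/* Issue one transfer and wait for its completion or for the timeout. */
static bus_result_t bus_transfer(peripheral_fn respond, unsigned max_cycles) {
    assert(respond);
    for (unsigned cycle = 1; cycle <= max_cycles; cycle++) {
        int r = respond(cycle);
        if (r > 0) return BUS_OK;    /* bus_ack_i: successful completion */
        if (r < 0) return BUS_ERROR; /* bus_err_i: failed completion */
    }
    return BUS_TIMEOUT;              /* no response: the bus keeper signals a bus error */
}

/* Example peripherals for the model. */
static int periph_fast(unsigned cycle)   { return cycle == 1; }           /* one cycle latency */
static int periph_slow(unsigned cycle)   { return cycle == 20; }          /* answers too late */
static int periph_faulty(unsigned cycle) { return (cycle == 3) ? -1 : 0; } /* reports an error */
```

In this model, periph_slow behaves like an "address space hole" for the default time window and only completes successfully if the window is enlarged.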

Exemplary Bus Accesses

Table 49. Example bus accesses: see read/write access description below
[Timing diagrams: read access, write access]

Write Access

For a write access, the accessed address (bus_addr_o), the data to be written (bus_wdata_o) and the byte enable signals (bus_ben_o) are set when bus_we_o goes high. These three signals are kept stable until the transaction is completed. In the example the accessed peripheral cannot answer directly in the next cycle after issuing. Here, the transaction is successful and the peripheral sets the bus_ack_i signal several cycles after issuing.

Read Access

For a read access, the accessed address (bus_addr_o) is set when bus_re_o goes high. The address is kept stable until the transaction is completed. In the example the accessed peripheral cannot answer directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as the bus transaction is completed (here, the transaction is successful and the peripheral sets the bus_ack_i signal).

Access Boundaries

The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit) and word (= 32-bit) boundaries.

Exclusive (Atomic) Access

The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional combination. Normally, these combinations should target the same memory address.

The CPU starts an exclusive access to memory via the load-reserved instruction (lr.w). This instruction will set the CPU-internal exclusive access lock, which directly drives the d_bus_lock_o signal. It is the task of the memory system to manage this exclusive access reservation by storing the according access address and the source of the access itself (for example via the CPU ID in a multi-core system).

When the CPU executes a store-conditional instruction (sc.w) the CPU-internal exclusive access lock is evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back zero and will allow the according store operation to the memory system. If the lock is broken, the instruction will write-back non-zero and will not generate an actual memory store operation.

The CPU-internal exclusive access lock is broken if at least one of the following situations occurs:

  • when executing any other memory-access operation than lr.w

  • when any trap (sync. or async.) is triggered (for example to force a context switch)

  • when the memory system signals a bus error (via the bus_err_i signal)

For more information regarding the SoC-level behavior and requirements of atomic operations see section Processor-External Memory Interface (WISHBONE) (AXI4-Lite).
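
The lock behavior described above can be illustrated with a small software model. This is only a sketch: the type and function names are hypothetical and not part of the NEORV32 sources, memory is reduced to a small word array, and the model assumes that sc.w itself also consumes the lock (since it is a memory access other than lr.w).

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the CPU-internal exclusive access lock (illustrative only;
 * all names are hypothetical and not taken from the NEORV32 sources). */
typedef struct {
  bool     lock;    /* models the internal lock that drives d_bus_lock_o */
  uint32_t mem[16]; /* tiny word-indexed stand-in for the memory system */
} cpu_model_t;

/* lr.w: load a word and set the exclusive access lock */
uint32_t model_lr_w(cpu_model_t *cpu, uint32_t index) {
  cpu->lock = true;
  return cpu->mem[index];
}

/* Any memory access other than lr.w breaks the lock */
uint32_t model_lw(cpu_model_t *cpu, uint32_t index) {
  cpu->lock = false;
  return cpu->mem[index];
}

/* A trap (sync. or async.) or a bus error also breaks the lock */
void model_trap_or_bus_error(cpu_model_t *cpu) {
  cpu->lock = false;
}

/* sc.w: store only if the lock is still intact.
 * Returns zero on success and non-zero on failure (no store is performed). */
uint32_t model_sc_w(cpu_model_t *cpu, uint32_t index, uint32_t data) {
  bool ok = cpu->lock;
  cpu->lock = false; /* ASSUMPTION of this model: sc.w consumes the lock */
  if (ok) {
    cpu->mem[index] = data;
    return 0;
  }
  return 1;
}
```

A typical atomic read-modify-write would repeat the lr.w / sc.w pair until model_sc_w returns zero.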

Memory Barriers

Whenever the CPU executes a fence instruction, the corresponding interface signal is set high for one cycle (d_bus_fence_o for a fence instruction; i_bus_fence_o for a fence.i instruction). It is the task of the memory system to perform the necessary operations (like a cache flush and refill).

3.7.13. CPU Hardware Reset

In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use a dedicated hardware reset. "Uncritical registers" in this context means that the initial value of these registers after power-up is not relevant for a defined CPU boot process.

Rationale

A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage of the engine features an N-bit data register and a 1-bit status register. The status register is set when the data in the corresponding data register is valid. At the end of the pipeline the status register might trigger a writeback of the processing result to some kind of memory. The initial state of the data registers after power-up is irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in the pipeline’s data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not control the actual operation (in contrast to the status registers). This makes the pipeline data registers from this example "uncritical registers".

NEORV32 CPU Reset

In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even control and status registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The pipeline registers will get initialized by the CPU’s internal state machines, which are initialized from the main control engine that actually features a defined reset. The initialization of most of the CPU’s core CSRs (like interrupt control) is done by software (to be more specific, this is done by the crt0.S start-up code).

During the very early boot process (where crt0.S is running) there is no chance for undefined behavior due to the lack of dedicated hardware resets of certain CSRs. For example, the machine interrupt-enable CSR (mie) does not provide a dedicated reset. The value of this register after reset is uncritical as interrupts cannot fire: the global interrupt enable flag in the status register (mstatus(mie)) provides a dedicated hardware reset that sets it to low (globally disabling interrupts).

Reset Configuration

Most CPU-internal registers do feature an asynchronous reset in the VHDL code, but the "don’t care" value (VHDL '-') is used for initialization of the uncritical registers, effectively generating flip-flops without a reset. However, certain applications or situations (like advanced gate-level / timing simulations) might require a more deterministic reset state. For this case, a defined reset level (reset-to-low) of all registers can be enabled via a constant in the main VHDL package file (rtl/core/neorv32_package.vhd):

-- "critical" number of PMP regions --
constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value
-- for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW),
-- default; TRUE=defined LOW reset value)

4. Software Framework

To make actual use of the NEORV32 processor, the project comes with a complete software ecosystem. This ecosystem is based on the RISC-V port of the GNU Compiler Collection (GCC) and consists of the following elementary parts:

Application/bootloader start-up code

sw/common/crt0.S

Application/bootloader linker script

sw/common/neorv32.ld

Core hardware driver libraries

sw/lib/include/ & sw/lib/source/

Makefiles

e.g. sw/example/blink_led/makefile

Auxiliary tool for generating NEORV32 executables

sw/image_gen/

Default bootloader

sw/bootloader/bootloader.c

Last but not least, the NEORV32 ecosystem provides some example programs for testing the hardware, for illustrating the usage of peripherals and for getting started with the project in general (sw/example).

4.1. Compiler Toolchain

The toolchain for this project is based on the free RISC-V GCC port. You can find the compiler sources and build instructions on the official RISC-V GNU toolchain GitHub page: https://github.com/riscv/riscv-gnu-toolchain.

The NEORV32 implements a 32-bit base integer architecture (rv32i) and a 32-bit integer and soft-float ABI (ilp32), so make sure you build a matching toolchain.

Alternatively, you can download my prebuilt rv32i/e toolchains for 64-bit x86 Linux from: https://github.com/stnolting/riscv-gcc-prebuilt

The default toolchain prefix used by the project’s makefiles is riscv32-unknown-elf (this can be changed in the makefiles).

More information regarding the toolchain (building from scratch or downloading the prebuilt ones) can be found in the User Guide’s section Software Toolchain Setup.

4.2. Core Libraries

The NEORV32 project provides a set of C libraries that make the processor/CPU features easy to use. Just include the main NEORV32 library file in your application’s source file(s):

#include <neorv32.h>

Together with the makefile, this will automatically include all the processor’s header files located in sw/lib/include into your application. The actual source files of the core libraries are located in sw/lib/source and are automatically included into the source list of your software project. The following files are currently part of the NEORV32 core library:

C source file     C header file         Description
-                 neorv32.h             main NEORV32 definitions and library file
neorv32_cfs.c     neorv32_cfs.h         HW driver (stub)[13] functions for the custom functions subsystem
neorv32_cpu.c     neorv32_cpu.h         HW driver functions for the NEORV32 CPU
neorv32_gpio.c    neorv32_gpio.h        HW driver functions for the GPIO
-                 neorv32_intrinsics.h  macros for custom intrinsics/instructions
neorv32_mtime.c   neorv32_mtime.h       HW driver functions for the MTIME
neorv32_nco.c     neorv32_nco.h         HW driver functions for the NCO
neorv32_neoled.c  neorv32_neoled.h      HW driver functions for the NEOLED
neorv32_pwm.c     neorv32_pwm.h         HW driver functions for the PWM
neorv32_rte.c     neorv32_rte.h         NEORV32 runtime environment and helpers
neorv32_spi.c     neorv32_spi.h         HW driver functions for the SPI
neorv32_trng.c    neorv32_trng.h        HW driver functions for the TRNG
neorv32_twi.c     neorv32_twi.h         HW driver functions for the TWI
neorv32_uart.c    neorv32_uart.h        HW driver functions for the UART0 and UART1
neorv32_wdt.c     neorv32_wdt.h         HW driver functions for the WDT

Documentation
All core library software sources are thoroughly documented using doxygen. See section [Building the Software Framework Documentation]. The documentation is automatically built and deployed to GitHub pages by the CI workflow (https://stnolting.github.io/neorv32/sw/files.html).

4.3. Application Makefile

Application compilation is based on GNU makefiles. Each project in the sw/example folder features a makefile. All these makefiles are identical. When creating a new project, copy an existing project folder or at least the makefile to your new project folder. I suggest creating new projects in sw/example as well to keep the file dependencies intact. Of course, these dependencies can be manually configured via makefile variables when your project is located somewhere else.

Before you can use the makefiles, you need to install the RISC-V GCC toolchain. Also, you have to add the installation folder of the compiler to your system’s PATH variable. More information can be found in chapter [_lets_get_it_started].

The makefile is invoked by simply executing make in your console:

neorv32/sw/example/blink_led$ make

4.3.1. Targets

Executing make without a target shows the help menu, which lists all available targets:

help

Show a short help text explaining all available targets.

check

Check the compiler toolchain. You should run this target at least once after installing the toolchain.

info

Show the makefile configuration (see section Configuration).

exe

Compile all sources and generate application executable for upload via bootloader.

install

Compile all sources, generate executable (via exe target) for upload via bootloader and generate and install IMEM VHDL initialization image file rtl/core/neorv32_application_image.vhd.

all

Execute exe and install.

clean

Remove all generated files in the current folder.

clean_all

Remove all generated files in the current folder and also remove the compiled core libraries and the compiled image generator tool.

bootloader

Compile all sources, generate executable and generate and install BOOTROM VHDL initialization image file rtl/core/neorv32_bootloader_image.vhd. This target modifies the ROM origin and length in the linker script by setting the make_bootloader define.

upload

Upload the NEORV32 executable to the bootloader via serial port.

An assembly listing file (main.asm) is created by the compilation flow for further analysis or debugging purposes.

4.3.2. Configuration

The compilation flow is configured via variables right at the beginning of the makefile:

# *****************************************************************************
# USER CONFIGURATION
# *****************************************************************************
# User's application sources (*.c, *.cpp, *.s, *.S); add additional files here
APP_SRC ?= $(wildcard ./*.c) $(wildcard ./*.s) $(wildcard ./*.cpp) $(wildcard ./*.S)
# User's application include folders (don't forget the '-I' before each entry)
APP_INC ?= -I .
# User's application include folders - for assembly files only (don't forget the '-I' before each entry)
ASM_INC ?= -I .
# Optimization
EFFORT ?= -Os
# Compiler toolchain
RISCV_TOOLCHAIN ?= riscv32-unknown-elf
# CPU architecture and ABI
MARCH ?= -march=rv32i
MABI  ?= -mabi=ilp32
# User flags for additional configuration (will be added to compiler flags)
USER_FLAGS ?=
# Serial port for executable upload via bootloader
COM_PORT ?= /dev/ttyUSB0
# Relative or absolute path to the NEORV32 home folder
NEORV32_HOME ?= ../../..
# *****************************************************************************

APP_SRC

The source files of the application (.c, .cpp, .S and .s files are allowed; files of these types in the project folder are automatically added via wildcards). Additional files can be added, separated by white spaces.

APP_INC

Include file folders; separated by white spaces; must be defined with -I prefix

ASM_INC

Include file folders that are used only for the assembly source files (.S/.s).

EFFORT

Optimization level, optimize for size (-Os) is default; legal values: -O0, -O1, -O2, -O3, -Os

RISCV_TOOLCHAIN

The toolchain prefix to be used; follows the naming convention "architecture-vendor-output"

MARCH

The targeted RISC-V architecture/ISA. Only rv32 is supported by the NEORV32. Enable compiler support of optional CPU extensions by adding the corresponding extension letter (e.g. rv32im for the M CPU extension). See section [_enabling_risc_v_cpu_extensions].

MABI

The default 32-bit integer ABI.

USER_FLAGS

Additional flags that will be forwarded to the compiler tools

NEORV32_HOME

Relative or absolute path to the NEORV32 project home folder. Adapt this if the makefile/project is not in the project’s sw/example folder.

COM_PORT

Default serial port for executable upload to bootloader.

4.3.3. Default Compiler Flags

The following default compiler flags are used for compiling an application. These flags are defined via the CC_OPTS variable. Custom flags can be appended via the USER_FLAGS variable to the CC_OPTS variable.

-Wall

Enable all compiler warnings.

-ffunction-sections

Put functions and data segment in independent sections. This allows a code optimization as dead code and unused data can be easily removed.

-nostartfiles

Do not use the default start code. The makefiles use the NEORV32-specific start-up code instead (sw/common/crt0.S).

-Wl,--gc-sections

Make the linker perform dead code elimination.

-lm

Link with the math library (libm), which implements the functions declared in math.h.

-lc

Search for the standard C library when linking.

-lgcc

Make sure we have no unresolved references to internal GCC library subroutines.

-mno-fdiv

Use builtin software functions for floating-point divisions and square roots (since the according instructions are not supported yet).

-falign-functions=4, -falign-labels=4, -falign-loops=4, -falign-jumps=4

Force a 32-bit alignment of functions and labels (branch/jump/call targets). This increases performance as it simplifies instruction fetch when using the C extension. As a drawback this will also slightly increase the program code size.

The makefile configuration variables can be (re-)defined directly when invoking the makefile. For example: $ make MARCH=-march=rv32ic clean_all exe

4.4. Executable Image Format

In order to generate a file that can be executed by the processor, all source files have to be compiled, linked and packed into a final executable.

4.4.1. Linker Script

When all the application sources have been compiled, they need to be linked in order to generate a unified program file. For this purpose the makefile uses the NEORV32-specific linker script sw/common/neorv32.ld for linking all object files that were generated during compilation.

The linker script defines three memory sections: rom, ram and iodev. Each section provides specific access attributes: read access (r), write access (w) and executable (x).

Table 50. Linker memory sections - general
Memory section  Attributes  Description
ram             rwx         Data memory address space (processor-internal/external DMEM)
rom             rx          Instruction memory address space (processor-internal/external IMEM) or internal bootloader ROM
iodev           rw          Processor-internal memory-mapped IO/peripheral devices address space

These sections are defined right at the beginning of the linker script:

Listing 1. Linker memory sections - cut-out from linker script neorv32.ld
MEMORY
{
  ram  (rwx) : ORIGIN = 0x80000000, LENGTH = 8*1024
  rom   (rx) : ORIGIN = DEFINED(make_bootloader) ? 0xFFFF0000 : 0x00000000, LENGTH = DEFINED(make_bootloader) ? 32K : 2048M
  iodev (rw) : ORIGIN = 0xFFFFFE00, LENGTH = 512
}

Each memory section provides a base address ORIGIN and a size LENGTH. The base address and size of the iodev section are fixed and must not be altered. The base addresses and sizes of the ram and rom regions correspond to the total available instruction and data memory address space (see section Address Space Layout).

ORIGIN of the ram section always has to be identical to the processor’s dspace_base_c hardware configuration. Additionally, ORIGIN of the rom section always has to be identical to the processor’s ispace_base_c hardware configuration.

The size of the ram section has to be equal to the size of the physically available data memory. For example, if the processor setup only uses processor-internal DMEM (MEM_INT_DMEM_EN = true and no external data memory attached) the LENGTH parameter of this memory section has to be equal to the size configured by the MEM_INT_DMEM_SIZE generic.

The size of the rom section is a little bit more complicated. The default linker script configuration assumes a maximum of 2GB for this memory space, which is also the default configuration of the processor’s hardware instruction memory address space. This size does not have to reflect the actual physical size of the instruction memory (internal IMEM and/or processor-external memory). It just provides a maximum limit. When uploading a new executable via the bootloader, the bootloader itself checks if sufficient physical instruction memory is available. If a new executable is embedded right into the internal IMEM, the synthesis tool will check if the configured instruction memory size (e.g., via the MEM_INT_IMEM_SIZE generic) is sufficient.

The rom region uses a conditional assignment (via the make_bootloader symbol) for ORIGIN and LENGTH. This places either a "normal" executable (i.e. for the IMEM) or the bootloader image into the corresponding memory.

The linker maps all the regions from the compiled object files into four final sections: .text, .rodata, .data and .bss. These four regions contain everything required for the application to run:

Table 51. Linker memory regions
Region   Description
.text    Executable instructions generated from the start-up code and all application sources.
.rodata  Constants (like strings) from the application; also the initial data for initialized variables.
.data    This section is required for the address generation of fixed (= global) variables only.
.bss     This section is required for the address generation of dynamic memory constructs only.

The .text and .rodata sections are mapped to the processor’s instruction memory space and the .data and .bss sections are mapped to the processor’s data memory space. Finally, the .text, .rodata and .data sections are extracted and concatenated into a single file main.bin.
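
As a rough illustration of how application objects end up in these sections, consider the following fragment. The placement shown in the comments is what GCC typically does and is an assumption here, not something taken from the NEORV32 linker script; the generated main.asm listing or a linker map file shows the actual result for your build. All identifiers are made up for this example.

```c
#include <stdint.h>

/* Typical section placement (assumption; verify with main.asm or a map file) */
const char greeting[] = "hello"; /* constant data           -> .rodata (rom) */
uint32_t   boot_count = 5;       /* initialized global      -> .data (ram), initial value stored in rom */
uint32_t   scratch[16];          /* zero-initialized global -> .bss (ram), cleared by the start-up code */

/* The machine code of this function goes to .text (rom) */
uint32_t next_boot_count(void) {
  scratch[0] = boot_count; /* remember the previous value */
  return ++boot_count;
}
```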

4.4.2. Executable Image Generator

The main.bin file is packed by the NEORV32 image generator (sw/image_gen) to generate the final executable file.

The sources of the image generator are automatically compiled when invoking the makefile.

The image generator can generate three types of executables, selected by a flag when calling the generator:

-app_bin

Generates an executable binary file neorv32_exe.bin (for UART uploading via the bootloader).

-app_img

Generates an executable VHDL memory initialization image for the processor-internal IMEM. This option generates the rtl/core/neorv32_application_image.vhd file.

-bld_img

Generates an executable VHDL memory initialization image for the processor-internal BOOT ROM. This option generates the rtl/core/neorv32_bootloader_image.vhd file.

All these options are managed by the makefile. The normal application compilation flow will generate the neorv32_exe.bin executable to be uploaded via UART to the NEORV32 bootloader.

The image generator adds a small header to the neorv32_exe.bin executable, which consists of three 32-bit words located right at the beginning of the file. The first word of the executable is the signature word and is always 0x4788cafe. Based on this word the bootloader can identify a valid image file. The next word represents the size of the actual program image in bytes. A simple "complement" checksum of the actual program image is given by the third word. This provides a simple protection against data transmission or storage errors.
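
The header check can be sketched as follows. This is not the bootloader's actual code: the function and constant names are hypothetical, the executable is assumed to be available as already-decoded 32-bit words (so byte order in the file is left out), and the "complement" checksum is assumed to be the two's complement of the 32-bit sum of all image words. Consult sw/image_gen and sw/bootloader/bootloader.c for the real definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define EXE_SIGNATURE 0x4788cafeu /* signature word as stated above */

/* Result codes of this sketch (hypothetical names) */
enum { EXE_OK = 0, EXE_BAD_SIGNATURE = 1, EXE_BAD_SIZE = 2, EXE_BAD_CHECKSUM = 3 };

/* Validate an executable given as already-decoded 32-bit words:
 * words[0] = signature, words[1] = image size in bytes, words[2] = checksum,
 * words[3..] = program image.
 * ASSUMPTION: checksum = two's complement of the 32-bit sum of all image
 * words, so that sum + checksum wraps around to zero. */
int exe_check(const uint32_t *words, size_t num_words) {
  if (num_words < 3 || words[0] != EXE_SIGNATURE) {
    return EXE_BAD_SIGNATURE;
  }
  uint32_t size_bytes = words[1];
  if ((size_bytes % 4u) != 0u || (size_bytes / 4u) != (num_words - 3)) {
    return EXE_BAD_SIZE;
  }
  uint32_t sum = 0;
  for (size_t i = 3; i < num_words; i++) {
    sum += words[i]; /* unsigned arithmetic wraps modulo 2^32 */
  }
  return ((uint32_t)(sum + words[2]) == 0u) ? EXE_OK : EXE_BAD_CHECKSUM;
}
```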

4.4.3. Start-Up Code (crt0)

The CPU and also the processor require a minimal start-up and initialization code to bring the CPU (and the SoC) into a stable and initialized state and to initialize the C runtime environment before the actual application can be executed. This start-up code is located in sw/common/crt0.S and is automatically linked with every application program and placed right before the actual application code so it gets executed right after reset.

The crt0.S start-up performs the following operations:

  1. Initialize all integer registers x1 - x31 (or just x1 - x15 when using the E CPU extension) to a defined value.

  2. Initialize the global pointer gp and the stack pointer sp according to the .data segment layout provided by the linker script.

  3. Initialize all CPU core CSRs and also install a default "dummy" trap handler for all traps. This handler catches all traps during the early boot phase.

  4. Clear IO area: Write zero to all memory-mapped registers within the IO region (iodev section). If certain devices have not been implemented, a bus access fault exception will occur. This exception is captured by the dummy trap handler.

  5. Clear the .bss section defined by the linker script.

  6. Copy read-only data from the .text section to the .data section to set initialized variables.

  7. Call the application’s main function (with no arguments: argc = argv = 0).

  8. If the main function returns, crt0 can call an "after-main handler" (see below).

  9. If there is no after-main handler or after returning from the after-main handler the processor goes to an endless sleep mode (using a simple loop or via the wfi instruction if available).
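
Steps 5 and 6 of the list above can be modeled in C as shown below. The real start-up code is written in assembly and takes the region boundaries from symbols provided by the linker script; the function and parameter names used here are hypothetical stand-ins for those symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* C model of crt0 steps 5 and 6 (illustrative; names are hypothetical) */
void crt0_model_init_ram(uint32_t *bss_start, uint32_t *bss_end,
                         const uint32_t *data_image, /* initial values kept in instruction memory */
                         uint32_t *data_start, uint32_t *data_end) {
  /* step 5: clear the .bss section word by word */
  for (uint32_t *p = bss_start; p < bss_end; p++) {
    *p = 0;
  }
  /* step 6: copy the initial values of initialized variables into .data */
  for (uint32_t *p = data_start; p < data_end; p++) {
    *p = *data_image++;
  }
}
```

Because this runs before main, application code can rely on zeroed uninitialized globals and correctly initialized globals from its first instruction on.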

After-Main Handler

If the application’s main() function actually returns, an after-main handler can be executed. This handler can be a normal function since the C runtime is still available when it is executed. If this handler uses any kind of peripheral/IO modules, make sure these are already initialized within the application; otherwise you have to initialize them inside the handler.

Listing 2. After-main handler - function prototype
void __neorv32_crt0_after_main(int32_t return_code);

The function has exactly one argument (return_code) that provides the return value of the application’s main function. For instance, this variable contains -1 if the main function returned with return -1;.

A simple printf can be used to inform the user when the application’s main function returns (this example assumes that UART0 has already been properly configured in the actual application):

Listing 3. After-main handler - example
#include <neorv32.h> // NEORV32 core library (UART print function)

void __neorv32_crt0_after_main(int32_t return_code) {

  neorv32_uart_printf("Main returned with code: %i\n", return_code);
}

4.5. Bootloader

The default bootloader (sw/bootloader/bootloader.c) of the NEORV32 processor allows you to upload new program executables at any time. If there is an external SPI flash connected to the processor (like the FPGA’s configuration memory), the bootloader can store the program executable to it. After reset, the bootloader can directly boot from the flash without any user interaction.

The bootloader is only implemented when the INT_BOOTLOADER_EN generic is true and requires the CSR access CPU extension (CPU_EXTENSION_RISCV_Zicsr generic is true).
The bootloader requires the primary UART (UART0) for user interaction (IO_UART0_EN generic is true).
For the automatic boot from an SPI flash, the SPI controller has to be implemented (IO_SPI_EN generic is true) and the machine system timer MTIME has to be implemented (IO_MTIME_EN generic is true), too, to allow an auto-boot timeout counter.

To interact with the bootloader, connect the primary UART (UART0) signals (uart0_txd_o and uart0_rxd_o) of the processor’s top entity to your computer via a serial port (adapter). Hardware flow control is not used, so the corresponding interface signals can be ignored. Configure your terminal program using the following settings and perform a reset of the processor.

Terminal console settings (19200-8-N-1):

  • 19200 Baud

  • 8 data bits

  • no parity bit

  • 1 stop bit

  • newline on \r\n (carriage return, newline)

  • no transfer protocol / flow control protocol - just raw bytes

The bootloader uses the LSB of the top entity’s gpio_o output port as high-active status LED (all other output pins are set to low level by the bootloader). After reset, this LED will start blinking at ~2Hz and the following intro screen should show up in your terminal:

<< NEORV32 Bootloader >>

BLDV: Mar 23 2021
HWV:  0x01050208
CLK:  0x05F5E100
USER: 0x10000DE0
MISA: 0x40901105
ZEXT: 0x00000023
PROC: 0x0EFF0037
IMEM: 0x00004000 bytes @ 0x00000000
DMEM: 0x00002000 bytes @ 0x80000000

Autoboot in 8s. Press key to abort.

This start-up screen also gives some brief information about the bootloader and several system configuration parameters:

BLDV

Bootloader version (build date).

HWV

Processor hardware version (from the mimpid CSR) in BCD format (example: 0x01040606 = v1.4.6.6).

USER

Custom user code (from the USER_CODE generic).

CLK

Processor clock speed in Hz (via the SYSINFO module, from the CLOCK_FREQUENCY generic).

MISA

CPU extensions (from the misa CSR).

ZEXT

CPU sub-extensions (from the mzext CSR)

PROC

Processor configuration (via the SYSINFO module, from the IO_* and MEM_* configuration generics).

IMEM

IMEM base address and size in bytes (from the MEM_INT_IMEM_SIZE generic).

DMEM

DMEM base address and size in bytes (from the MEM_INT_DMEM_SIZE generic).
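
The hexadecimal values on the start-up screen can be decoded by hand or with a few lines of C. The following sketch (helper names are made up for this example) splits the BCD-coded HWV word into its four version fields and converts the CLK value to MHz.

```c
#include <stdint.h>

/* Decode one BCD-coded byte (two decimal digits) */
static uint32_t bcd_byte(uint32_t b) {
  return ((b >> 4) & 0xFu) * 10u + (b & 0xFu);
}

/* Split the HWV word into its four version fields, most significant byte first */
void hwv_decode(uint32_t hwv, uint32_t field[4]) {
  for (int i = 0; i < 4; i++) {
    field[i] = bcd_byte((hwv >> (24 - 8 * i)) & 0xFFu);
  }
}

/* CLK is the clock frequency in Hz printed as a hexadecimal number */
uint32_t clk_to_mhz(uint32_t clk_hz) {
  return clk_hz / 1000000u;
}
```

For the screen shown above, HWV 0x01050208 decodes to the fields 1, 5, 2, 8 (v1.5.2.8) and CLK 0x05F5E100 corresponds to 100 MHz.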

Now you have 8 seconds to press any key. Otherwise, the bootloader starts the auto boot sequence. When you press any key within the 8 seconds, the actual bootloader user console starts:

<< NEORV32 Bootloader >>

BLDV: Mar 23 2021
HWV:  0x01050208
CLK:  0x05F5E100
USER: 0x10000DE0
MISA: 0x40901105
ZEXT: 0x00000023
PROC: 0x0EFF0037
IMEM: 0x00004000 bytes @ 0x00000000
DMEM: 0x00002000 bytes @ 0x80000000

Autoboot in 8s. Press key to abort.
Aborted.

Available commands:
h: Help
r: Restart
u: Upload
s: Store to flash
l: Load from flash
e: Execute
CMD:>

The auto-boot countdown is stopped and now you can enter a command from the list to perform the corresponding operation:

  • h: Show the help text (again)

  • r: Restart the bootloader and the auto-boot sequence

  • u: Upload new program executable (neorv32_exe.bin) via UART into the instruction memory

  • s: Store executable to SPI flash at spi_csn_o(0)

  • l: Load executable from SPI flash at spi_csn_o(0)

  • e: Start the application, which is currently stored in the instruction memory (IMEM)

  • #: Shortcut for executing u and e afterwards (not shown in help menu)

A new executable can be uploaded via UART by executing the u command. After that, the executable can be directly executed via the e command. To store the recently uploaded executable to an attached SPI flash press s. To directly load an executable from the SPI flash press l. The bootloader and the auto-boot sequence can be manually restarted via the r command.

The CPU is in machine level privilege mode after reset. When the bootloader boots an application, this application is also started in machine level privilege mode.

4.5.1. External SPI Flash for Booting

If you want the NEORV32 bootloader to automatically fetch and execute an application at system start, you can store it to an external SPI flash. The advantage of the external memory is to have a non-volatile program storage, which can be re-programmed at any time just by executing some bootloader commands. Thus, no FPGA bitstream recompilation is required at all.

SPI Flash Requirements

The bootloader can access an SPI-compatible flash via the processor top entity’s SPI port; the flash has to be connected to chip select spi_csn_o(0). The flash must be capable of operating at least at 1/8 of the processor’s main clock. Only single-byte read and write operations are used. The address has to be 24 bits long. Furthermore, the SPI flash has to support at least the following commands:

  • READ (0x03)

  • READ STATUS (0x05)

  • WRITE ENABLE (0x06)

  • PAGE PROGRAM (0x02)

  • SECTOR ERASE (0xD8)

  • READ ID (0x9E)

Compatible (FPGA configuration) SPI flash memories are for example the "Winbond W25Q64FV" or the "Micron N25Q032A".
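
To illustrate the 24-bit addressing, the following sketch assembles the four bytes that start a READ access. The function name is hypothetical and the code does not talk to any hardware. It assumes that the address is sent most significant byte first, which is common for SPI flashes but should be checked against the data sheet of the actual device.

```c
#include <stdint.h>

#define SPI_FLASH_CMD_READ 0x03u /* READ command from the list above */

/* Fill the 4-byte command frame for a read at a 24-bit flash address.
 * ASSUMPTION: address bytes are transmitted most significant byte first. */
void spi_flash_read_frame(uint32_t addr, uint8_t frame[4]) {
  frame[0] = (uint8_t)SPI_FLASH_CMD_READ;
  frame[1] = (uint8_t)((addr >> 16) & 0xFFu); /* address bits 23:16 */
  frame[2] = (uint8_t)((addr >> 8) & 0xFFu);  /* address bits 15:8 */
  frame[3] = (uint8_t)(addr & 0xFFu);         /* address bits 7:0 */
}
```

After these four bytes have been shifted out, the flash returns the data byte stored at the given address during the next transfer.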

SPI Flash Configuration

The base address SPI_FLASH_BOOT_ADR for the executable image inside the SPI flash is defined in the "user configuration" section of the bootloader source code (sw/bootloader/bootloader.c). Most FPGAs that use an external configuration flash store the golden configuration bitstream at base address 0. Make sure there is no address collision between the FPGA bitstream and the application image. You need to change the default sector size if the sector size of your flash is not 64kB:

/** SPI flash boot image base address */
#define SPI_FLASH_BOOT_ADR 0x00800000
/** SPI flash sector size in bytes */
#define SPI_FLASH_SECTOR_SIZE (64*1024)

More information regarding customization of the bootloader can be found in section Customizing the Internal Bootloader of the NEORV32 user guide. This guide also provides a tutorial on how to program an external SPI flash to automatically boot from it after reset (Programming an External SPI Flash via the Bootloader).
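
When choosing these two values, a quick calculation helps to rule out an address collision. The helpers below are hypothetical and only use the default values from the listing above. They assume that the FPGA bitstream starts at flash address 0.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPI_FLASH_BOOT_ADR    0x00800000u    /* default from the listing above */
#define SPI_FLASH_SECTOR_SIZE (64u * 1024u)  /* default from the listing above */

/* Number of flash sectors an image of the given size occupies (rounded up) */
uint32_t image_sector_count(uint32_t image_bytes) {
  return (image_bytes + SPI_FLASH_SECTOR_SIZE - 1u) / SPI_FLASH_SECTOR_SIZE;
}

/* True if a bitstream stored at address 0 ends at or below the boot image base address */
bool bitstream_fits_below_image(uint32_t bitstream_bytes) {
  return bitstream_bytes <= SPI_FLASH_BOOT_ADR;
}
```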

4.5.2. Auto Boot Sequence

When you reset the NEORV32 processor, the bootloader waits 8 seconds for a user console input before it starts the automatic boot sequence. This sequence tries to fetch a valid boot image from the external SPI flash, connected to SPI chip select spi_csn_o(0). If a valid boot image is found and can be successfully transferred into the instruction memory, it is automatically started. If no SPI flash was detected or if there was no valid boot image found, the bootloader stalls and the status LED is permanently activated.

4.5.3. Bootloader Error Codes

If something goes wrong during bootloader operation, an error code is shown. In this case the processor stalls, a bell command and one of the following error codes are sent to the terminal, the bootloader status LED is permanently activated and the system must be reset manually.

ERROR_0

If you try to transfer an invalid executable (via UART or from the external SPI flash), this error message shows up. There might be a transfer protocol configuration error in the terminal program. See section [_uploading_and_starting_of_a_binary_executable_image_via_uart] for more information. Also, if no SPI flash was found during an auto-boot attempt, this message will be displayed.

ERROR_1

Your program is too big for the processor’s internal instruction memory. Increase the memory size or reduce (optimize!) your application code.

ERROR_2

This indicates a checksum error. Something went wrong during the transfer of the program image (upload via UART or loading from the external SPI flash). If the error was caused by a UART upload, just try it again. When the error was generated during a flash access, the stored image might be corrupted.

ERROR_3

This error occurs if the attached SPI flash cannot be accessed. Make sure you have the right type of flash and that it is properly connected to the NEORV32 SPI port using chip select #0.

ERROR_4

This error pops up when an unexpected exception or interrupt was triggered. The cause of the trap (mcause CSR) is displayed for further investigation. This can happen if an ISA extension is used that has not been synthesized.

ERROR_?

Something really bad happened when there is no specific error code available :(

4.6. NEORV32 Runtime Environment

The NEORV32 provides a minimal runtime environment (RTE) that ensures a stable and safe execution environment by handling all traps (including interrupts).

Using the RTE is optional. The RTE provides a simple and comfortable way of delegating traps while making sure that all traps (even though they are not explicitly used by the application) are handled correctly. Performance-optimized applications or embedded operating systems should not use the RTE for delegating traps.

When execution enters the application’s main function, the actual runtime environment is responsible for catching all implemented exceptions and interrupts. To activate the NEORV32 RTE execute the following function:

void neorv32_rte_setup(void);

This setup initializes the mtvec CSR, which provides the base entry point for all trap handlers. The address stored to this register reflects the first-level exception handler provided by the NEORV32 RTE. Whenever an exception or interrupt is triggered, this first-level handler is called.

The first-level handler performs a complete context save, analyzes the source of the exception/interrupt and calls the corresponding second-level exception handler, which actually takes care of the exception/interrupt handling. For this, the RTE manages a private look-up table to store the addresses of the individual trap handlers.

After the initial setup of the RTE, each entry in the trap handler’s look-up table is initialized with a debug handler that outputs detailed hardware information via the primary UART (UART0) when triggered. This is intended as a fall-back for debugging or for accidentally triggered exceptions/interrupts. For instance, an illegal instruction exception caught by the RTE debug handler might look like this in the UART0 output:

<RTE> Illegal instruction @0x000002d6, MTVAL=0x00001537 </RTE>

To install the application’s own trap handlers, the NEORV32 RTE provides functions for installing and un-installing a trap handler for each implemented exception/interrupt source.

int neorv32_rte_exception_install(uint8_t id, void (*handler)(void));
ID name [C] Description / trap causing entry

RTE_TRAP_I_MISALIGNED

instruction address misaligned

RTE_TRAP_I_ACCESS

instruction (bus) access fault

RTE_TRAP_I_ILLEGAL

illegal instruction

RTE_TRAP_BREAKPOINT

breakpoint (ebreak instruction)

RTE_TRAP_L_MISALIGNED

load address misaligned

RTE_TRAP_L_ACCESS

load (bus) access fault

RTE_TRAP_S_MISALIGNED

store address misaligned

RTE_TRAP_S_ACCESS

store (bus) access fault

RTE_TRAP_MENV_CALL

environment call from machine mode (ecall instruction)

RTE_TRAP_UENV_CALL

environment call from user mode (ecall instruction)

RTE_TRAP_MTI

machine timer interrupt

RTE_TRAP_MEI

machine external interrupt

RTE_TRAP_MSI

machine software interrupt

RTE_TRAP_FIRQ_0 : RTE_TRAP_FIRQ_15

fast interrupt channel 0..15

When installing a custom handler function for any of these exception/interrupts, make sure the function uses no attributes (especially no interrupt attribute!), has no arguments and no return value like in the following example:

void handler_xyz(void) {

  // handle exception/interrupt...
}
Do NOT use the interrupt attribute for the application exception handler functions! This will place an mret instruction at the end of the function, making it impossible to return to the first-level exception handler of the RTE, which will cause stack corruption.

Example: Installation of the MTIME interrupt handler:

neorv32_rte_exception_install(RTE_TRAP_MTI, handler_xyz);

To remove a previously installed exception handler, call the according un-install function from the NEORV32 runtime environment. This will replace the previously installed handler by the initial debug handler, so even un-installed exceptions and interrupts are still captured.

int neorv32_rte_exception_uninstall(uint8_t id);

Example: Removing the MTIME interrupt handler:

neorv32_rte_exception_uninstall(RTE_TRAP_MTI);
More information regarding the NEORV32 runtime environment can be found in the doxygen software documentation (also available online at GitHub pages).
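
The look-up-table mechanism described in this section can be modeled on a host PC. The following is only a minimal sketch and not the actual RTE implementation: all model_* names, the table size, the hit counters and the return-value convention (0 = success, 1 = invalid ID) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TRAPS 4  /* hypothetical table size for this sketch */

typedef void (*trap_handler_t)(void);

int debug_hits = 0;  /* counts calls of the fall-back debug handler */
int user_hits  = 0;  /* counts calls of the application handler */

/* Fall-back handler; the real RTE prints trap details via UART0 instead. */
void debug_handler(void) { debug_hits++; }

/* Application handler: no attributes, no arguments, no return value. */
void user_handler(void) { user_hits++; }

static trap_handler_t vector_table[NUM_TRAPS];

/* Models the RTE setup: every entry points to the debug handler. */
void model_rte_setup(void) {
  for (size_t i = 0; i < NUM_TRAPS; i++) {
    vector_table[i] = debug_handler;
  }
}

/* Install a second-level handler for trap <id>. */
int model_rte_install(uint8_t id, trap_handler_t handler) {
  if (id >= NUM_TRAPS) { return 1; }
  vector_table[id] = handler;
  return 0;
}

/* Un-install: the entry falls back to the debug handler again. */
int model_rte_uninstall(uint8_t id) {
  if (id >= NUM_TRAPS) { return 1; }
  vector_table[id] = debug_handler;
  return 0;
}

/* Models the first-level handler: look up and call the second-level handler. */
void model_rte_dispatch(uint8_t id) {
  if (id < NUM_TRAPS) { vector_table[id](); }
}
```

In this model, a trap that has no installed handler (or whose handler was un-installed) always ends up in debug_handler, which mirrors the fall-back behavior described above.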

5. On-Chip Debugger (OCD)

The NEORV32 Processor features an on-chip debugger (OCD) implementing execution-based debugging that is compatible with the Minimal RISC-V Debug Specification Version 0.13.2. Please refer to this spec for in-depth information. A copy of the specification is available in docs/references/riscv-debug-release.pdf. The NEORV32 OCD provides the following key features:

  • JTAG test access port

  • run-control of the CPU: halting, single-stepping and resuming

  • executing arbitrary programs during debugging

  • accessing core registers (direct access to GPRs, indirect access to CSRs via program buffer)

  • indirect access to the whole processor address space (via program buffer)

  • compatible with the RISC-V port of OpenOCD; pre-built binaries can be obtained for example from SiFive

OCD Security Note
Access via the OCD is always authenticated (dmstatus.authenticated == 1). Hence, the whole system can always be accessed via the on-chip debugger.
The OCD requires additional hardware resources and might also lengthen the critical path, reducing the achievable clock frequency. If the OCD is not required for the final implementation, it can be disabled and thus discarded from implementation. In this case all circuitry of the debugger is completely removed (no impact on area, energy or timing at all).
A simple example on how to use NEORV32 on-chip debugger in combination with OpenOCD and gdb is shown in chapter [_debugging_using_the_on_chip_debugger].

The NEORV32 on-chip debugger complex is based on three hardware modules:

neorv32 ocd complex
Figure 7. NEORV32 on-chip debugger complex
  1. Debug Transport Module (DTM) (rtl/core/neorv32_debug_dtm.vhd): External JTAG access tap to allow an external adapter to interface with the debug module (DM) using the debug module interface (dmi).

  2. Debug Module (DM) (rtl/core/neorv32_debug_dm.vhd): Debugger control unit that is configured by the DTM via the dmi. From the CPU’s "point of view" this module behaves as a memory-mapped "peripheral" that can be accessed via the processor-internal bus. The memory-mapped registers provide an internal data buffer for data transfer from/to the DM, a code ROM containing the "park loop" code, a program buffer to allow the debugger to execute small programs defined by the DM and a status register that is used to communicate halt, resume and execute requests/acknowledges from/to the DM.

  3. CPU Debug Mode extension (part of rtl/core/neorv32_cpu_control.vhd): This extension provides the "debug execution mode" which executes the "park loop" code from the DM. The mode also provides additional CSRs.

Theory of Operation

When debugging the system using the OCD, the debugger issues a halt request to the CPU (via the CPU’s db_halt_req_i signal) to make the CPU enter debug mode. In this state, the application-defined architectural state of the system/CPU is "frozen" so the debugger can monitor and even modify it. While in debug mode, the CPU executes the "park loop" code from the code ROM of the DM. This park loop implements an endless loop, in which the CPU polls the memory-mapped status register that is controlled by the debug module (DM). The flags of this register are used to communicate requests from the DM and to acknowledge them by the CPU: trigger execution of the program buffer or resume the halted application.

5.1. Debug Transport Module (DTM)

The debug transport module (VHDL module: rtl/core/neorv32_debug_dtm.vhd) provides a JTAG test access port (TAP). The DTM is the first entity in the debug system, which connects an external debugger via JTAG to the next debugging entity: the debug module (DM). External JTAG access is provided by the following top-level ports.

Table 52. JTAG top level signals
Name Width Direction Description

jtag_trst_i

1

in

TAP reset (low-active); this signal is optional, make sure to pull it high if it is not used

jtag_tck_i

1

in

serial clock

jtag_tdi_i

1

in

serial data input

jtag_tdo_o

1

out

serial data output

jtag_tms_i

1

in

mode select

JTAG Clock
The actual JTAG clock signal is not used as primary clock. Instead it is used to synchronize JTAG accesses, while all internal operations trigger on the system clock. Hence, no additional clock domain is required for integration of this module. However, this constrains the maximum JTAG clock (jtag_tck_i) frequency to be less than or equal to 1/4 of the system clock (clk_i) frequency.
If the on-chip debugger is disabled (ON_CHIP_DEBUGGER_EN = false) the JTAG serial input jtag_tdi_i is directly connected to the JTAG serial output jtag_tdo_o to maintain the JTAG chain.
The NEORV32 JTAG TAP does not provide a boundary scan function (yet?). Hence, physical device pins cannot be accessed.
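
The JTAG clock constraint noted above (jtag_tck_i at most 1/4 of clk_i) can be checked with a few lines of C when configuring a debug adapter. This is only an illustrative sketch; the function names are hypothetical and not part of the NEORV32 software framework.

```c
#include <stdint.h>

/* Highest permitted jtag_tck_i frequency for a given clk_i frequency (in Hz). */
uint32_t max_jtag_tck_hz(uint32_t clk_hz) {
  return clk_hz / 4u;
}

/* Returns 1 if tck_hz satisfies tck <= clk / 4, otherwise 0.
 * The comparison is done in 64 bit to avoid overflow of tck_hz * 4. */
int jtag_tck_is_valid(uint32_t clk_hz, uint32_t tck_hz) {
  return ((uint64_t)tck_hz * 4u) <= (uint64_t)clk_hz;
}
```

For a 100 MHz system clock this yields a maximum JTAG clock of 25 MHz.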

The DTM uses the "debug module interface (dmi)" to access the actual debug module (DM). These accesses are controlled by TAP-internal registers. Each register is selected by the JTAG instruction register (IR) and accessed through the JTAG data register (DR).

The DTM’s instruction and data registers can be accessed using OpenOCD’s irscan and drscan commands. The RISC-V port of OpenOCD also provides low-level commands (riscv dmi_read & riscv dmi_write) to access the dmi debug module interface.

JTAG access is conducted via the instruction register IR, which is 5 bit wide, and several data registers DR with different sizes. The data registers are accessed by writing the according address to the instruction register. The following table shows the available data registers:

Table 53. JTAG TAP registers
Address (via IR) Name Size [bits] Description

00001

IDCODE

32

identifier, default: 0x0CAFE001 (configurable via package’s jtag_tap_idcode_* constants)

10000

DTMCS

32

debug transport module control and status register

10001

DMI

41

debug module interface (dmi); 7-bit address, 32-bit read/write data, 2-bit operation (00 = NOP; 10 = write; 01 = read)

others

BYPASS

1

default JTAG bypass register

See the RISC-V debug specification for more information regarding the data registers and operations. A local copy can be found in docs/references.
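
As an illustration of the 41-bit DMI data register, the following sketch packs an address, a data word and an operation into one scan value. The field order (address in bits 40:34, data in bits 33:2, operation in bits 1:0) is taken from the RISC-V debug specification and is not stated in the table above; the function and macro names are hypothetical.

```c
#include <stdint.h>

/* Operation encodings as listed in the JTAG TAP register table. */
#define DMI_OP_NOP   0u
#define DMI_OP_READ  1u
#define DMI_OP_WRITE 2u

/* Pack a 41-bit DMI scan value:
 * 7-bit address (40:34), 32-bit data (33:2), 2-bit operation (1:0). */
uint64_t dmi_pack(uint8_t addr, uint32_t data, uint8_t op) {
  return ((uint64_t)(addr & 0x7Fu) << 34) |
         ((uint64_t)data << 2) |
         (uint64_t)(op & 0x3u);
}

/* Extract the 32-bit data field from a scanned-out DMI value. */
uint32_t dmi_unpack_data(uint64_t dmi) {
  return (uint32_t)((dmi >> 2) & 0xFFFFFFFFu);
}
```

A write of 0x80000001 to DM register 0x10, for example, corresponds to the scan value 0x4200000006.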

5.2. Debug Module (DM)

According to the RISC-V debug specification, the DM (VHDL module: rtl/core/neorv32_debug_dm.vhd) acts as a translation interface between abstract operations issued by the debugger and the platform-specific debugger implementation. It supports the following features (excerpt from the debug spec):

  • Gives the debugger necessary information about the implementation.

  • Allows the hart to be halted and resumed and provides status of the current state.

  • Provides abstract read and write access to the halted hart’s GPRs.

  • Provides access to a reset signal that allows debugging from the very first instruction after reset.

  • Provides a mechanism to allow debugging the hart immediately out of reset. (still experimental)

  • Provides a Program Buffer to force the hart to execute arbitrary instructions.

  • Allows memory access from a hart’s point of view.

The NEORV32 DM follows the "Minimal RISC-V External Debug Specification" to provide full debugging capabilities while keeping resource (area) requirements at a minimum level. It implements the execution based debugging scheme for a single hart and provides the following hardware features:

  • program buffer with 2 entries and implicit ebreak instruction afterwards

  • no direct bus access (indirect bus access via the CPU)

  • abstract commands: "access register" plus auto-execution

  • no dedicated halt-on-reset capabilities yet (but can be emulated)

The DM provides two "sides of access": access from the DTM via the debug module interface (dmi) and access from the CPU via the processor-internal bus. From the DTM’s point of view, the DM implements a set of DM Registers that are used to control and monitor the actual debugging. From the CPU’s point of view, the DM implements several memory-mapped registers (within the normal address space) that are used for communicating debugging control and status (DM CPU Access).

5.2.1. DM Registers

The DM is controlled via a set of registers that are accessed via the DTM’s dmi. The "Minimal RISC-V Debug Specification" requires only a subset of the registers specified in the spec. The following registers are implemented. Write accesses to any other registers are ignored and read accesses will always return zero. Register names that are encapsulated in "( )" are not actually implemented; however, they are listed to explicitly show their functionality.

Table 54. Available DM registers
Address Name Description

0x04

data0

Abstract data 0, used for data transfer between debugger and processor

0x10

dmcontrol

Debug module control

0x11

dmstatus

Debug module status

0x12

hartinfo

Hart information

0x16

abstractcs

Abstract control and status

0x17

command

Abstract command

0x18

abstractauto

Abstract command auto-execution

0x1d

(nextdm)

Base address of next DM; read as zero to indicate there is only one DM

0x20

progbuf0

Program buffer 0

0x21

progbuf1

Program buffer 1

0x38

(sbcs)

System bus access control and status; read as zero to indicate there is no direct system bus access

0x40

haltsum0

Halt summary 0

data

0x04

Abstract data 0

data0

Reset value: UNDEFINED

Basic read/write registers to be used with abstract command (for example to read/write data from/to CPU GPRs).

dmcontrol

0x10

Debug module control register

dmcontrol

Reset value: 0x00000000

Control of the overall debug module and the hart. The following table shows all implemented bits. All remaining bits/bit-fields are configured as "zero" and are read-only. Writing '1' to these bits/fields will be ignored.

Table 55. dmcontrol - debug module control register bits
Bit Name [RISC-V] R/W Description

31

haltreq

-/w

set/clear hart halt request

30

resumereq

-/w

request hart to resume

28

ackhavereset

-/w

write 1 to clear *havereset flags

1

ndmreset

r/w

put whole processor into reset when 1

0

dmactive

r/w

DM enable; writing 0-1 will reset the DM
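
The dmcontrol bits listed above can be expressed as C masks, for example in a host-side debug tool. This is a sketch only; the macro and function names are hypothetical, and keeping dmactive set in every request is an assumption about typical debugger behavior.

```c
#include <stdint.h>

/* Bit masks of the implemented dmcontrol bits. */
#define DMCONTROL_HALTREQ      (1u << 31)
#define DMCONTROL_RESUMEREQ    (1u << 30)
#define DMCONTROL_ACKHAVERESET (1u << 28)
#define DMCONTROL_NDMRESET     (1u << 1)
#define DMCONTROL_DMACTIVE     (1u << 0)

/* dmcontrol value that requests a halt while keeping the DM enabled. */
uint32_t dmcontrol_halt(void) {
  return DMCONTROL_DMACTIVE | DMCONTROL_HALTREQ;
}

/* dmcontrol value that requests a resume while keeping the DM enabled. */
uint32_t dmcontrol_resume(void) {
  return DMCONTROL_DMACTIVE | DMCONTROL_RESUMEREQ;
}
```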

dmstatus

0x11

Debug module status register

dmstatus

Reset value: 0x00000000

Current status of the overall debug module and the hart. The entire register is read-only.

Table 56. dmstatus - debug module status register bits
Bit Name [RISC-V] Description

31:23

reserved

reserved; always zero

22

impebreak

always 1; indicates an implicit ebreak instruction after the last program buffer entry

21:20

reserved

reserved; always zero

19

allhavereset

1 when the hart is in reset

18

anyhavereset

17

allresumeack

1 when the hart has acknowledged a resume request

16

anyresumeack

15

allnonexistent

always zero to indicate the hart is always existent

14

anynonexistent

13

allunavail

1 when the DM is disabled to indicate the hart is unavailable

12

anyunavail

11

allrunning

1 when the hart is running

10

anyrunning

9

allhalted

1 when the hart is halted

8

anyhalted

7

authenticated

always 1; there is no authentication

6

authbusy

always 0; there is no authentication

5

hasresethaltreq

always 0; halt-on-reset is not supported (directly)

4

confstrptrvalid

always 0; no configuration string available

3:0

version

0010 - DM is compatible with version 0.13
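
A value read from dmstatus can be decoded according to the bit positions in the table above. The following is a minimal sketch; the struct and function names are hypothetical, and only the all* flags are decoded since the document lists the any* flags without a separate description.

```c
#include <stdint.h>

typedef struct {
  unsigned impebreak;
  unsigned allhavereset;
  unsigned allresumeack;
  unsigned allunavail;
  unsigned allrunning;
  unsigned allhalted;
  unsigned authenticated;
  unsigned version;
} dmstatus_t;

/* Extract a single bit from a 32-bit word. */
static unsigned dmstatus_bit(uint32_t raw, unsigned pos) {
  return (raw >> pos) & 1u;
}

/* Decode a raw dmstatus value into its individual flags. */
dmstatus_t dmstatus_decode(uint32_t raw) {
  dmstatus_t s;
  s.impebreak     = dmstatus_bit(raw, 22);
  s.allhavereset  = dmstatus_bit(raw, 19);
  s.allresumeack  = dmstatus_bit(raw, 17);
  s.allunavail    = dmstatus_bit(raw, 13);
  s.allrunning    = dmstatus_bit(raw, 11);
  s.allhalted     = dmstatus_bit(raw, 9);
  s.authenticated = dmstatus_bit(raw, 7);
  s.version       = raw & 0xFu;
  return s;
}
```

As an example, the (constructed) value 0x00400382 decodes to a halted hart with impebreak and authenticated set and version = 2.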

hartinfo

0x12

Hart information

hartinfo

Reset value: see below

This register gives information about the hart. The entire register is read-only.

Table 57. hartinfo - hart information register bits
Bit Name [RISC-V] Description

31:24

reserved

reserved; always zero

23:20

nscratch

0001, number of dscratch* CPU registers = 1

19:17

reserved

reserved; always zero

16

dataccess

0, the data registers are shadowed in the hart’s address space

15:12

datasize

0001, number of 32-bit words in the address space dedicated to shadowing the data registers = 1

11:0

dataaddr

= dm_data_base_c(11:0), signed base address of data words (see address map in DM CPU Access)
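
Because dataaddr is a signed 12-bit value, a debugger has to sign-extend it to obtain the full 32-bit address of the data buffer. The sketch below shows this step; the function name is hypothetical and the example value 0x00101900 is constructed from the field values listed in the table together with dm_data_base_c = 0xfffff900.

```c
#include <stdint.h>

/* Extract hartinfo.dataaddr (bits 11:0) and sign-extend it to 32 bits. */
uint32_t hartinfo_data_base(uint32_t hartinfo) {
  uint32_t field = hartinfo & 0xFFFu;
  if (field & 0x800u) {      /* bit 11 is the sign bit */
    field |= 0xFFFFF000u;    /* replicate the sign into bits 31:12 */
  }
  return field;
}
```

With dataaddr = 0x900 the sign extension yields 0xfffff900, which is the base address of the data buffer in the CPU's address space.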

abstractcs

0x16

Abstract control and status

abstractcs

Reset value: see below

Command execution info and status.

Table 58. abstractcs - abstract control and status register bits
Bit Name [RISC-V] R/W Description

31:29

reserved

r/-

reserved; always zero

28:24

progbufsize

r/-

0010; size of the program buffer (progbuf) = 2 entries

23:13

reserved

r/-

reserved; always zero

12

busy

r/-

1 when a command is being executed

11

reserved

r/-

reserved; always zero

10:8

cmderr

r/w

error during command execution (see below); has to be cleared by writing 111

7:4

reserved

r/-

reserved; always zero

3:0

datacount

r/-

0001; number of implemented data registers for abstract commands = 1

Error codes in cmderr (highest priority first):

  • 000 - no error

  • 100 - command cannot be executed since hart is not in expected state

  • 011 - exception during command execution

  • 010 - unsupported command

  • 001 - invalid DM register read/write while command is/was executing
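
The fields of the abstract control and status register at DM address 0x16 (named abstractcs in the RISC-V debug specification) and the cmderr codes listed above can be decoded as sketched below. All function and macro names are hypothetical.

```c
#include <stdint.h>

/* Field extraction according to the register bit table. */
unsigned abstractcs_progbufsize(uint32_t raw) { return (raw >> 24) & 0x1Fu; }
unsigned abstractcs_busy(uint32_t raw)        { return (raw >> 12) & 0x1u; }
unsigned abstractcs_cmderr(uint32_t raw)      { return (raw >> 8) & 0x7u; }
unsigned abstractcs_datacount(uint32_t raw)   { return raw & 0xFu; }

/* Value to write in order to clear cmderr (write 111 to bits 10:8). */
#define ABSTRACTCS_CMDERR_CLEAR (0x7u << 8)

/* Translate a cmderr code into a short description. */
const char *cmderr_to_string(unsigned cmderr) {
  switch (cmderr) {
    case 0:  return "no error";
    case 1:  return "invalid DM register read/write while command is/was executing";
    case 2:  return "unsupported command";
    case 3:  return "exception during command execution";
    case 4:  return "hart is not in expected state";
    default: return "unknown";
  }
}
```

A debugger would typically read the register after each abstract command, report a non-zero cmderr and then write ABSTRACTCS_CMDERR_CLEAR before issuing the next command.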

command

0x17

Abstract command

command

Reset value: 0x00000000

Writing this register will trigger the execution of an abstract command. A new command can only be executed if cmderr is zero. The entire register is write-only (reads will return zero).

The NEORV32 DM only supports Access Register abstract commands. These commands can only access the hart’s GPRs (abstract command register index 0x1000 - 0x101f).
Table 59. command - abstract command register - "access register" commands only
Bit Name [RISC-V] Description / required value

31:24

cmdtype

00000000 to indicate "access register" command

23

reserved

reserved, has to be 0 when writing

22:20

aarsize

010 to indicate 32-bit accesses

19

aarpostincrement

0, postincrement is not supported

18

postexec

if set the program buffer is executed after the command

17

transfer

if set, the operation selected by write is conducted

16

write

1: copy data0 to [regno]; 0: copy [regno] to data0

15:0

regno

GPR-access only; has to be 0x1000 - 0x101f
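
Based on the field values required above, an "access register" command word for a GPR transfer can be assembled as follows. This is a sketch with hypothetical macro and function names; it covers plain 32-bit GPR reads and writes without postexec.

```c
#include <stdint.h>

#define CMD_AARSIZE_32 (2u << 20)  /* aarsize = 010: 32-bit access */
#define CMD_POSTEXEC   (1u << 18)  /* execute program buffer afterwards */
#define CMD_TRANSFER   (1u << 17)  /* conduct the register transfer */
#define CMD_WRITE      (1u << 16)  /* 1: data0 -> GPR, 0: GPR -> data0 */
#define CMD_REGNO_GPR0 0x1000u     /* register index of x0 */

/* Command word that copies GPR x<gpr> to data0. */
uint32_t cmd_read_gpr(unsigned gpr) {
  return CMD_AARSIZE_32 | CMD_TRANSFER | (CMD_REGNO_GPR0 + (gpr & 0x1Fu));
}

/* Command word that copies data0 to GPR x<gpr>. */
uint32_t cmd_write_gpr(unsigned gpr) {
  return cmd_read_gpr(gpr) | CMD_WRITE;
}
```

Reading x5, for instance, results in the command word 0x00221005; writing x5 results in 0x00231005.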

abstractauto

0x18

Abstract command auto-execution

abstractauto

Reset value: 0x00000000

Register to configure when a read/write access to a DM register repeats execution of the last abstract command.

Table 60. abstractauto - Abstract command auto-execution register bits
Bit Name [RISC-V] R/W Description

17

autoexecprogbuf[1]

r/w

when set reading/writing from/to progbuf1 will execute command again

16

autoexecprogbuf[0]

r/w

when set reading/writing from/to progbuf0 will execute command again

0

autoexecdata[0]

r/w

when set reading/writing from/to data0 will execute command again

progbuf

0x20

Program buffer 0

progbuf0

0x21

Program buffer 1

progbuf1

Reset value: NOP-instruction

General purpose program buffer for the DM.

haltsum0

0x40

Halt summary 0

haltsum0

Reset value: UNDEFINED

Bit 0 of this register is set if the hart is halted (all remaining bits are always zero). The entire register is read-only.

5.2.2. DM CPU Access

From the CPU’s point of view, the DM behaves as a memory-mapped peripheral that includes

  • a small ROM that contains the code for the "park loop", which is executed when the CPU is in debug mode.

  • a program buffer populated by the debugger host to execute small programs

  • a data buffer to transfer data between the processor and the debugger host

  • a status register to communicate debugging requests

Park Loop Code Sources
The assembly sources of the park loop code are available in sw/ocd-firmware/park_loop.S. Please note that these sources are not intended to be changed by the user. Hence, the makefile does not provide an automatic option to compile and "install" the debugger ROM code into the HDL sources; this requires a manual copy (see sw/ocd-firmware/README.md).

The DM uses a total address space of 128 words of the CPU’s address space (= 512 bytes) divided into four sections of 32 words (= 128 bytes) each. Please note that the program buffer, the data buffer and the status register only use a few effective words in this address space. However, these effective addresses are mirrored to fill up the whole 128 bytes of the section. Hence, any CPU access within this address space will succeed.

Table 61. DM CPU access - address map (divided into four sections)
Base address Name [VHDL package] Actual size Description

0xfffff800

dm_code_base_c (= dm_base_c)

128 bytes

Code ROM for the "park loop" code

0xfffff880

dm_pbuf_base_c

16 bytes

Program buffer, provided by DM

0xfffff900

dm_data_base_c

4 bytes

Data buffer (dm.data0)

0xfffff980

dm_sreg_base_c

4 bytes

Control and status register

From the CPU’s point of view, the DM is mapped to an "unused" address range within the processor’s Address Space, right between the bootloader ROM (BOOTROM) and the actual processor-internal IO space, at addresses 0xfffff800 - 0xfffff9ff.

When the CPU enters or re-enters (for example via ebreak in the DM’s program buffer) debug mode, it jumps to the beginning of the DM’s "park loop" code ROM at dm_code_base_c. This is the normal entry point for the park loop code. If an exception is encountered during debug mode, the CPU jumps to dm_code_base_c + 4, which is the exception entry point.
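
The address map and the two park loop entry points can be captured in a few C definitions, for example for a simulation model or a test bench checker. This is an illustrative sketch; the macro, enum and function names are hypothetical, while the addresses are those from the address map table.

```c
#include <stdint.h>

#define DM_BASE         0xFFFFF800u          /* dm_base_c = dm_code_base_c */
#define DM_SECTION_SIZE 0x80u                /* 32 words = 128 bytes */
#define DM_SIZE         (4u * DM_SECTION_SIZE)

#define DM_ENTRY_NORMAL    (DM_BASE + 0u)    /* normal park loop entry */
#define DM_ENTRY_EXCEPTION (DM_BASE + 4u)    /* exception entry */

typedef enum {
  DM_SEC_NONE = -1,  /* address is outside of the DM */
  DM_SEC_CODE = 0,   /* code ROM (park loop) */
  DM_SEC_PBUF = 1,   /* program buffer */
  DM_SEC_DATA = 2,   /* data buffer */
  DM_SEC_SREG = 3    /* status register */
} dm_section_t;

/* Return the DM section that a CPU address belongs to. */
dm_section_t dm_section_of(uint32_t addr) {
  if ((addr < DM_BASE) || (addr >= (DM_BASE + DM_SIZE))) {
    return DM_SEC_NONE;
  }
  return (dm_section_t)((addr - DM_BASE) / DM_SECTION_SIZE);
}
```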

Status Register

The status register provides a direct communication channel between the CPU executing the park loop and the host-controlled controller of the DM. Note that all bits that can be written by the CPU (acknowledge flags) cause a single-shot (1-cycle) signal to the DM controller and auto-clear (always read as zero). The bits that are driven by the DM controller are read-only to the CPU and keep their state until the CPU acknowledges the according request.

Table 62. DM CPU access - status register
Bit Name CPU access Description

0

halt_ack

-/w

Set by the CPU to indicate that the CPU is halted and keeps iterating in the park loop

1

resume_req

r/-

Set by the DM to tell the CPU to resume normal operation (leave parking loop and leave debug mode via dret instruction)

2

resume_ack

-/w

Set by the CPU to acknowledge that the CPU is now going to leave parking loop & debug mode

3

execute_req

r/-

Set by the DM to tell the CPU to leave debug mode and execute the instructions from the program buffer; CPU will re-enter parking loop afterwards

4

execute_ack

-/w

Set by the CPU to acknowledge that the CPU is now going to execute the program buffer

5

exception_ack

-/w

Set by the CPU to inform the DM that an exception occurred during execution of the park loop or during execution of the program buffer
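
The request/acknowledge handshake via the status register can be modeled in C as shown below. This is a simplified sketch and not a translation of the actual park loop assembly code: the function name is hypothetical, and the priority of execute requests over resume requests is an assumption.

```c
#include <stdint.h>

/* Status register bits as seen by the CPU. */
#define SREG_HALT_ACK      (1u << 0)
#define SREG_RESUME_REQ    (1u << 1)
#define SREG_RESUME_ACK    (1u << 2)
#define SREG_EXECUTE_REQ   (1u << 3)
#define SREG_EXECUTE_ACK   (1u << 4)
#define SREG_EXCEPTION_ACK (1u << 5)

/* One polling iteration of the park loop model: given the value read from
 * the status register, return the acknowledge bit the CPU writes back,
 * or 0 if no request is pending and polling continues. */
uint32_t park_loop_ack(uint32_t sreg) {
  if (sreg & SREG_EXECUTE_REQ) {
    return SREG_EXECUTE_ACK;  /* CPU will run the program buffer next */
  }
  if (sreg & SREG_RESUME_REQ) {
    return SREG_RESUME_ACK;   /* CPU will leave debug mode via dret next */
  }
  return 0u;
}
```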

5.3. CPU Debug Mode

The NEORV32 CPU Debug Mode DB (part of rtl/core/neorv32_cpu_control.vhd) is compatible with the "Minimal RISC-V Debug Specification 0.13.2". It is enabled/implemented by setting the CPU generic CPU_EXTENSION_RISCV_DEBUG to "true" (done by setting processor generic ON_CHIP_DEBUGGER_EN). It provides a new operation mode called "debug mode". When enabled, three additional CSRs are available (section CPU Debug Mode CSRs) and the "return from debug mode" instruction dret is available when the CPU is "in" debug mode.

The CPU debug mode requires the Zicsr CPU extension to be implemented (top generic CPU_EXTENSION_RISCV_Zicsr = true).

The CPU debug mode is entered when one of the following events appear:

  1. executing ebreak instruction (when dcsr.ebreakm is set and in machine mode OR when dcsr.ebreaku is set and in user mode)

  2. debug halt request from external DM (via CPU signal db_halt_req_i, high-active, triggering on rising-edge)

  3. finished executing of a single instruction while in single-step debugging mode (enabled via dcsr.step)

From a hardware point of view, these "entry conditions" are special synchronous (ebreak instruction) or asynchronous (single-stepping "interrupt"; halt request "interrupt") traps, that are handled invisibly by the control logic.

Whenever the CPU enters debug mode it performs the following operations:

  • move pc to dpc

  • copy the hart’s current privilege level to dcsr.prv

  • set dcsr.cause according to the cause why debug mode is entered

  • no update of mepc, mcause, mtval and mstatus CSRs

  • load the address configured via the CPU’s CPU_DEBUG_ADDR generic to the pc to jump to the "debugger park loop" code in the debug module (DM)

When the CPU is in debug mode the following things are important:

  • while in debug mode, the CPU executes the parking loop and the program buffer provided by the DM if requested

  • effective CPU privilege level is machine mode, PMP is not active

  • if an exception occurs:

      • if the exception was caused by any debug-mode entry action, the CPU jumps to the normal entry point ( = CPU_DEBUG_ADDR) of the park loop again (for example when executing ebreak in debug mode)

      • for all other exception sources, the CPU jumps to the exception entry point ( = CPU_DEBUG_ADDR + 4) to signal an exception to the DM and restarts the park loop again afterwards

  • interrupts including non-maskable interrupts are disabled; however, they will be buffered and executed when the CPU has left debug mode

  • if the DM makes a resume request, the park loop exits and the CPU leaves debug mode (executing dret)

Debug mode is left either by executing the dret instruction [14] (in debug mode) or by performing a hardware reset of the CPU. Executing dret outside of debug mode will raise an illegal instruction exception. Whenever the CPU leaves debug mode the following things happen:

  • set the hart’s current privilege level according to dcsr.prv

  • restore pc from dpc

  • resume normal operation at pc
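
The state changes on debug mode entry and exit listed above can be summarized in a small behavioral model. This is a sketch under stated assumptions: the struct and function names are hypothetical, CPU_DEBUG_ADDR is assumed to equal dm_code_base_c (0xfffff800), and the cause encodings are taken from the RISC-V debug specification since they are not listed in this section.

```c
#include <stdint.h>

#define CPU_DEBUG_ADDR 0xFFFFF800u  /* assumed: start of the park loop ROM */

/* dcsr.cause encodings (from the RISC-V debug specification). */
enum { DCSR_CAUSE_EBREAK = 1, DCSR_CAUSE_HALTREQ = 3, DCSR_CAUSE_STEP = 4 };

typedef struct {
  uint32_t pc;          /* program counter */
  uint32_t dpc;         /* debug program counter CSR */
  uint8_t  priv;        /* current privilege level: 0 = user, 3 = machine */
  uint8_t  dcsr_prv;    /* dcsr.prv field */
  uint8_t  dcsr_cause;  /* dcsr.cause field */
  uint8_t  in_debug;    /* 1 while the hart is in debug mode */
} hart_model_t;

/* Debug mode entry: save pc and privilege level, jump to the park loop. */
void debug_enter(hart_model_t *h, uint8_t cause) {
  h->dpc        = h->pc;
  h->dcsr_prv   = h->priv;
  h->dcsr_cause = cause;
  h->priv       = 3;  /* effective privilege level is machine mode */
  h->pc         = CPU_DEBUG_ADDR;
  h->in_debug   = 1;
}

/* Debug mode exit (dret): restore privilege level and pc. */
void debug_leave(hart_model_t *h) {
  h->priv     = h->dcsr_prv;
  h->pc       = h->dpc;
  h->in_debug = 0;
}
```

Note that the model does not touch any machine-mode trap CSRs, which reflects that debug mode entry is invisible to the application.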

5.3.1. CPU Debug Mode CSRs

Two additional CSRs are required by the Minimal RISC-V Debug Specification: the debug mode control and status register dcsr and the debug program counter dpc. Providing a general purpose scratch register for debug mode (dscratch0) allows faster execution of programs provided by the debugger, since one general purpose register can be backed up and used directly.

The debug-mode control and status registers (CSRs) are only accessible when the CPU is in debug mode. If these CSRs are accessed outside of debug mode (for example when in machine mode) an illegal instruction exception is raised.
dcsr

0x7b0

Debug control and status register

dcsr

Reset value: 0x00000000

The dcsr CSR is compatible with the RISC-V debug spec. It is used to configure debug mode and provides additional status information. The following bits are implemented. The remaining bits are read-only and always read as zero.

Table 63. Debug control and status register bits
Bit Name [RISC-V] R/W Event

31:28

xdebugver

r/-

always 0100 - indicates external debug support exists

27:16

-

r/-

reserved, read as zero

15

ebreakm

r/w

ebreak instructions in machine mode will enter debug mode when set

14

ebreakh

r/-

0 - hypervisor mode not supported

13

ebreaks

r/-

0 - supervisor mode not supported

12

ebreaku

r/w

ebreak instructions in user mode will enter debug mode when set

11

stepie

r/-

0 - IRQs are disabled during single-stepping

10

stopcount

r/-

0 - counters increment as usual

9

stoptime

r/-

0 - timers increment as usual

8:6

cause

r/-

cause identifier - why was debug mode entered

5

-

r/-

reserved, read as zero

4

mprven

r/-

0 - mstatus.mprv is ignored when in debug mode

3

nmip

r/-

set when the non-maskable CPU/processor interrupt is pending

2

step

r/w

enable single-stepping when set

1:0

prv

r/w

CPU privilege level before/after debug mode
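
The dcsr bit positions from the table above translate into the following C helpers, which a debugger-provided program or a host tool might use. This is only a sketch; all names are hypothetical.

```c
#include <stdint.h>

/* Writable control bits of dcsr. */
#define DCSR_EBREAKM (1u << 15)
#define DCSR_EBREAKU (1u << 12)
#define DCSR_STEP    (1u << 2)

/* Read-only and status fields. */
unsigned dcsr_get_xdebugver(uint32_t dcsr) { return (dcsr >> 28) & 0xFu; }
unsigned dcsr_get_cause(uint32_t dcsr)     { return (dcsr >> 6) & 0x7u; }
unsigned dcsr_get_prv(uint32_t dcsr)       { return dcsr & 0x3u; }

/* Return a copy of dcsr with single-stepping enabled or disabled. */
uint32_t dcsr_set_step(uint32_t dcsr, int enable) {
  return enable ? (dcsr | DCSR_STEP) : (dcsr & ~DCSR_STEP);
}
```

For the constructed example value 0x400080c3 this gives xdebugver = 4, ebreakm set, cause = 3 and prv = 3.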

dpc

0x7b1

Debug program counter

dpc

Reset value: UNDEFINED

The dpc CSR is compatible with the RISC-V debug spec. It is used to store the current program counter when debug mode is entered. The dret instruction will return to dpc by moving dpc to pc.

dscratch0

0x7b2

Debug scratch register 0

dscratch0

Reset value: UNDEFINED

The dscratch0 CSR is compatible with the RISC-V debug spec. It provides a general purpose debug mode-only scratch register.

License

BSD 3-Clause License

Copyright (c) 2021, Stephan Nolting. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF

The NEORV32 RISC-V Processor
Copyright (c) 2021, by Dipl.-Ing. Stephan Nolting. All rights reserved.
HQ: https://github.com/stnolting/neorv32
Contact: stnolting@gmail.com
made in Hanover, Germany

Proprietary Notice

  • "GitHub" is a Subsidiary of Microsoft Corporation.

  • "Vivado" and "Artix" are trademarks of Xilinx Inc.

  • "AXI" and "AXI4-Lite" are trademarks of Arm Holdings plc.

  • "ModelSim" is a trademark of Mentor Graphics – A Siemens Business.

  • "Quartus Prime" and "Cyclone" are trademarks of Intel Corporation.

  • "iCE40", "UltraPlus" and "Radiant" are trademarks of Lattice Semiconductor Corporation.

  • "Windows" is a trademark of Microsoft Corporation.

  • "Tera Term" copyright by T. Teranishi.

  • Timing diagrams made with WaveDrom Editor.

  • "NeoPixel" is a trademark of Adafruit Industries.

  • Documentation made with asciidoctor.

Disclaimer

This project is released under the BSD 3-Clause license. No copyright infringement intended. Other implied or used projects might have different licensing – see their documentation to get more information.

This document contains links to the websites of third parties ("external links"). As the content of these websites is not under our control, we cannot assume any liability for such external content. In all cases, the provider of information of the linked websites is liable for the content and accuracy of the information provided. At the point in time when the links were placed, no infringements of the law were recognizable to us. As soon as an infringement of the law becomes known to us, we will immediately remove the link in question.

Citing

If you are using the NEORV32 or parts of the project in some kind of publication, please cite it as follows:

Listing 4. BibTeX
@misc{nolting20,
  author       = {Nolting, S.},
  title        = {The NEORV32 RISC-V Processor},
  year         = {2020},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/stnolting/neorv32}}
}

Acknowledgments

A big shoutout to all contributors, who helped improving this project! ❤️

RISC-V - instruction sets want to be free!


1. Pronounced "neo-R-V-thirty-two" or "neo-risc-five-thirty-two" in its long form.
2. Pull high if not used.
3. If the on-chip debugger is not implemented (ON_CHIP_DEBUGGER_EN = false) jtag_tdi_i is directly forwarded to jtag_tdo_o to maintain the JTAG chain.
4. Shift amount.
5. Barrel shift when FAST_SHIFT_EN is enabled.
6. Serial shift when TINY_SHIFT_EN is enabled.
7. Shift amount.
8. Barrel shift when FAST_SHIFT_EN is enabled.
9. Serial shift when TINY_SHIFT_EN is enabled.
10. Memory latency.
11. Memory latency.
12. DSP-based multiplication; enabled via FAST_MUL_EN.
13. This driver file only represents a stub, since the real CFS drivers are defined by the actual CFS implementation.
14. dret should only be executed inside the debugger "park loop" code (→ code ROM in the debug module (DM).)