piolib: data transfers slower than expected #116

Open
@jepler

In my work for Adafruit, I've implemented a PIO program for driving HUB75-style displays.

The data transfer to the PIO peripheral is slower than anticipated, topping out at about 10MB/s. That's too bad, as we would ideally like to run at several times that speed. (Naively, I'd expected that, as on rp2, we could keep it fed with data every cycle even at the highest PIO frequencies.)

Here is a simple reproducer that requires no hardware -- it just does an out x, 32 every cycle, consuming a FIFO entry each time. So e.g., at 1MHz it should consume 4MB/s, at 5MHz it should consume 20MB/s, etc. However, the transfer speed tops out at around 10MB/s:

$ for frequency in 1e6 2e6 5e6 10e6 20e6 200e6; do for xfersize in 65532; do ./build/examples/bench $frequency $xfersize ; done; done 2>/dev/null
{"frequency": 1e+06, "xfer_size": 65532, "rate": 3.99201e+06}
{"frequency": 2e+06, "xfer_size": 65532, "rate": 7.96719e+06}
{"frequency": 5e+06, "xfer_size": 65532, "rate": 1.07482e+07}
{"frequency": 1e+07, "xfer_size": 65532, "rate": 1.07461e+07}
{"frequency": 2e+07, "xfer_size": 65532, "rate": 1.07484e+07}
{"frequency": 2e+08, "xfer_size": 65532, "rate": 1.0746e+07}

Notice how the top rate is about 1e7 (i.e., 10MB/s), and does not continue increasing as the clock rate increases.

Firmware & kernel:

$ uname -a
Linux m5 6.6.70-v8+ #1 SMP PREEMPT Fri Jan 10 13:53:47 UTC 2025 aarch64 GNU/Linux
$ vcgencmd version
2025/01/08 17:52:48 
Copyright (c) 2012 Broadcom
version 97facbf4 (release) (embedded)

My test program:

#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include "piolib.h"
#include "ws2812.pio.h"

#define bench_wrap_target 0
#define bench_wrap 0

static const uint16_t bench_program_instructions[] = {
            // .wrap_target
    0x6020, // out x, 32
            // .wrap
};

static const struct pio_program bench_program = {
    .instructions = bench_program_instructions,
    .length = 1,
    .origin = -1,
};

static inline pio_sm_config bench_program_get_default_config(uint offset) {
    pio_sm_config c = pio_get_default_sm_config();
    sm_config_set_wrap(&c, offset + bench_wrap_target, offset + bench_wrap);
    sm_config_set_sideset(&c, 1, false, false);
    return c;
}

static inline float bench_program_init(PIO pio, int sm, int offset, float freq) {
    pio_sm_config c = bench_program_get_default_config(offset);
    sm_config_set_out_shift(&c, false, true, 32);
    sm_config_set_fifo_join(&c, PIO_FIFO_JOIN_TX);
    float div = clock_get_hz(clk_sys) / freq;
    if(div < 1) div = 1;
    if(div > 65535) div = 65535;
    int div_int = (int)div;
    int div_frac = (int)((div - div_int) * 256);
    sm_config_set_clkdiv_int_frac(&c, div_int, div_frac);
    pio_sm_init(pio, sm, offset, &c);
    pio_sm_set_enabled(pio, sm, true);
    return clock_get_hz(clk_sys) / (div_int + div_frac / 256.);
}


double monotonic() {
    struct timespec tv;
    clock_gettime(CLOCK_MONOTONIC, &tv);
    return tv.tv_sec + tv.tv_nsec * 1e-9;
}

long databuf[1048576];

int main(int argc, const char **argv)
{
    float frequency = argc > 1 ? atof(argv[1]) : 10e6;
    size_t xfer_size = argc > 2 ? atoi(argv[2]) : 256;
    PIO pio;
    int sm;
    uint offset;

    pio = pio0;
    sm = pio_claim_unused_sm(pio, true);
    pio_sm_config_xfer(pio, sm, PIO_DIR_TO_SM, xfer_size, 1);

    offset = pio_add_program(pio, &bench_program);
    fprintf(stderr, "Loaded program at %u, using sm %d\n", offset, sm);

    float actual_frequency = bench_program_init(pio, sm, offset, frequency);
    fprintf(stderr, "Actual frequency %fMHz\n", actual_frequency/1e6);
    pio_sm_clear_fifos(pio, sm);

    double t0 = monotonic();
    size_t xfer = 0;
    do {
        pio_sm_xfer_data(pio, sm, PIO_DIR_TO_SM, sizeof(databuf), databuf);
        xfer += sizeof(databuf);
    } while(monotonic() - t0 < 1);
    double t1 = monotonic();
    double dt = t1 - t0;
    double rate = xfer / dt; // bytes per second
    fprintf(stderr, "%zu bytes in %.1fms (%.1fMiB/s)\n",
        xfer, dt*1e3, rate / 1048576);
    printf("{\"frequency\": %g, \"xfer_size\": %zu, \"rate\": %g}\n",
        actual_frequency, xfer_size, rate);
    return 0;
}

P.S. Is there a more appropriate repo in which to report this issue?
