Coyote accelerator backend #1347
Conversation
```cpp
while (coyote_thread.checkCompleted(coyote::CoyoteOper::LOCAL_TRANSFER) != batch_size) {
    std::this_thread::sleep_for(std::chrono::nanoseconds(50));
}
while (coyote_thread.checkCompleted(coyote::CoyoteOper::LOCAL_TRANSFER) != batch_size) {}
```
Wouldn't this cause 100% CPU usage while the program is polling?
On one of the cores, yes.
But sleeping for less than 50us is not well-defined on most Linux platforms. Hence, the measured latency can go from ~4us to >50us even though the "true" execution latency is still 4us.
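A common compromise between the two polling styles discussed here is a hybrid wait: busy-spin for a short bounded window (keeping latency in the microsecond range), then fall back to sleeping between polls so a long wait does not peg a core. The sketch below is illustrative only; `check_completed` and `target` are hypothetical stand-ins for `coyote_thread.checkCompleted(...)` and `batch_size`.

```python
import time

def wait_for_completion(check_completed, target, spin_ns=50_000, sleep_s=50e-6):
    """Hybrid poll: spin briefly for low latency, then sleep between polls.

    check_completed/target are placeholders for the Coyote completion
    check; spin_ns bounds the busy-wait phase, sleep_s is chosen at the
    ~50 us granularity that Linux sleeps can reliably honour.
    """
    deadline = time.perf_counter_ns() + spin_ns
    # Phase 1: pure busy-wait -- burns one core but reacts within ~us
    while time.perf_counter_ns() < deadline:
        if check_completed() == target:
            return True
    # Phase 2: sleep between polls -- OS sleep granularity now dominates,
    # trading reaction time for CPU usage on long waits
    while check_completed() != target:
        time.sleep(sleep_s)
    return True

# Toy usage: a counter that "completes" after a few polls
state = {'n': 0}
def fake_check():
    state['n'] += 1
    return min(state['n'], 8)

done = wait_for_completion(fake_check, 8)
```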
…-backend (fastmachinelearning#1347) Merge branch 'init_interval_fix_zeropad_maxpooling' into coyote-accelerator-and-pooling
A few misc comments based on trying to run the CoyoteAccelerator for a dummy model. Right now, I am stuck with a Python import error:

which is puzzling because I do have jinja2 installed in my environment and the same import works fine in an interactive Python session.
Also, can you fix the pre-commit issues?
```python
filedir = os.path.dirname(os.path.abspath(__file__))
srcpath = os.path.join(filedir, '../contrib/Coyote/')
dstpath = f'{model.config.get_output_dir()}/Coyote'
copytree(srcpath, dstpath)
```
Do we want to use the dirs_exist_ok argument here? In the current version, this fails when running for the same project twice.
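To illustrate the suggestion: `shutil.copytree` (Python 3.8+) raises `FileExistsError` on a second run unless `dirs_exist_ok=True` is passed, which copies over the existing destination instead. A minimal self-contained sketch using throwaway temp directories:

```python
import os
import tempfile
from shutil import copytree

# dirs_exist_ok=True lets a second copytree into the same destination
# overwrite in place instead of raising FileExistsError.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'Coyote')
    dst = os.path.join(tmp, 'prj', 'Coyote')
    os.makedirs(src)
    open(os.path.join(src, 'CMakeLists.txt'), 'w').close()

    copytree(src, dst)                      # first run: fine
    copytree(src, dst, dirs_exist_ok=True)  # re-run: does not raise
    copied = os.path.exists(os.path.join(dst, 'CMakeLists.txt'))
```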
```python
output_dir='hls4ml_prj_coyote',
backend='CoyoteAccelerator',
board='u55c')
hls4ml.build(bitfile=True)
```
This should probably be hls_model instead of hls4ml
```python
if not os.path.exists(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw'):
    os.mkdir(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw')
```
I think this needs to use os.makedirs() because the build folder doesn't exist already.
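For reference, `os.mkdir()` only creates the leaf directory and raises `FileNotFoundError` when a parent (here, `build/`) is missing; `os.makedirs()` creates all intermediates, and `exist_ok=True` additionally makes repeated runs idempotent. A minimal sketch with a placeholder project path:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # 'build' does not exist yet, so os.mkdir(hw_dir) would raise
    # FileNotFoundError; os.makedirs creates every missing parent.
    hw_dir = os.path.join(tmp, 'build', 'myproject_cyt_hw')
    os.makedirs(hw_dir, exist_ok=True)
    os.makedirs(hw_dir, exist_ok=True)  # second call is a no-op
    created = os.path.isdir(hw_dir)
```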
```
Example
======================

Similar to the ``VivadoAccelerator`` backend, we first generate a bitstream from a Keras model ``model`` and a config.
```
Documentation should mention that hls4ml needs to be cloned with the submodules checked out to get Coyote, and that a Vitis installation needs to be present.
vloncar left a comment:
Looks good. I have mostly questions for better understanding and minor nitpicks that I don't feel are crucial, rather optional.
```
.. code-block:: Python

    from hls4ml.backends.coyote_accelerator.coyote_accelerator_overlay import CoyoteOverlay
```
I was wondering if we need to call this coyote_accelerator or just coyote; it would be cleaner for the user to import from a shorter path, rather than going all the way into coyote_accelerator_overlay. Thus, from hls4ml.backends.coyote import CoyoteOverlay would be neat.
```python
register_backend('Catapult', CatapultBackend)
register_backend('SymbolicExpression', SymbolicExpressionBackend)
register_backend('oneAPI', OneAPIBackend)
register_backend('CoyoteAccelerator', CoyoteAcceleratorBackend)
```
How about just calling it Coyote?
```python
if not os.path.exists(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw'):
    os.mkdir(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw')
os.chdir(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw')
os.system(cmake_cmd)
```
I would argue that new code (despite being inspired by existing code) should use subprocess instead of os.system and we gradually move towards phasing out os.system since it has limitations on tracking status.
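To sketch the suggested replacement: `subprocess.run()` exposes the return code, stdout, and stderr, and `check=True` turns a failed build step into a `CalledProcessError` instead of a silently ignored status. The Python interpreter is used below as a portable stand-in for the real `cmake_cmd` invocation; `cwd=` can replace the `os.chdir()` call as well.

```python
import subprocess
import sys

# subprocess.run surfaces status and output, unlike os.system which
# returns only a raw exit code and captures nothing.
result = subprocess.run(
    [sys.executable, '-c', 'print("configure ok")'],  # stand-in for cmake_cmd
    capture_output=True,
    text=True,
    check=True,   # raises CalledProcessError on non-zero exit
    # cwd=build_dir  # would replace the os.chdir() call
)
```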
```python
self.coyote_lib.free_model_inference.argtypes = [ctypes.POINTER(ctypes.c_void_p)]

def program_hacc_fpga(self):
```
Is HACC the (former) name of Xilinx's academic partnership program, or the specific instance at ETH? I thought there are other HACCs around (one at UIUC for example)
```python
if len(X.shape) == 1:
    X = np.array([X])
if not (isinstance(X.dtype, float) or isinstance(X.dtype, np.float32)):
    logging.warning('CoyoteOverlay only supports (for now) floating-point inputs; casting input data to float')
```
Have we completely switched to the logging module, or do we still use warnings? Do these two play along nicely?
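As an aside on the quoted check: `isinstance(X.dtype, float)` is always `False`, because `X.dtype` is a `np.dtype` object rather than a Python float, so the cast branch would never fire as intended. A hedged sketch of a corrected check using `np.issubdtype`, with `warnings` (which can be routed into `logging` via `logging.captureWarnings(True)`, so the two do play along):

```python
import warnings
import numpy as np

def as_float32(X):
    """Illustrative input check; as_float32 is a hypothetical helper."""
    X = np.atleast_2d(np.asarray(X))
    # np.issubdtype inspects the dtype correctly;
    # isinstance(X.dtype, float) never matches.
    if not np.issubdtype(X.dtype, np.floating):
        warnings.warn('casting input data to float32')
        X = X.astype(np.float32)
    return X

out = as_float32([1, 2, 3])  # integer input: warns and casts
```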
```cpp
#include "defines.h"
#include "host_libs.hpp"

#include <boost/program_options.hpp>
```
Does this require us to have Boost library installed on the host or that comes with Coyote?
```cpp
avg_latency += (time / 1e3);
avg_throughput += (batch_size / (time * 1e-9));

// Functional correctness
```
Do you really need this? Is there a way to just run without checks, as in production?
```cpp
}

std::cout << "Batches processed: " << total_batches << std::endl;
```
Similar comment to the Python side with predict()
```python
filedir = os.path.dirname(os.path.abspath(__file__))

f = open(os.path.join(filedir, '../templates/vivado/firmware/myproject.cpp'))
```
Imagine how cool it would be if we used pathlib.Path (which is imported) and resource management (with) like the other modern backends do... 😄
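A minimal sketch of the suggested style, with a throwaway temp directory standing in for the real template path: `Path` handles the joining, and the `with` block guarantees the file handle is closed.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filedir = Path(tmp)   # stands in for Path(__file__).parent
    src = filedir / 'templates' / 'vivado' / 'firmware' / 'myproject.cpp'
    src.parent.mkdir(parents=True)
    src.write_text('// firmware stub\n')

    with src.open() as f:  # closed automatically on exiting the block
        contents = f.read()
```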
This is several months old now, do you want to update it?
Description
Generally, Coyote offers several advantages compared to some other shells, including:
The backend is briefly described in Section 9.7 of the paper: https://arxiv.org/pdf/2504.21538.
Type of change
Tests
This backend was compared against a modified* version of the VivadoAccelerator backend: the backend was modified to run HLS synthesis with Vitis instead of Vivado (also using Vitis templates and optimizers), while the rest of the backend infrastructure (drivers, data movers) remained the same, since they also work in newer versions of Vivado. Results are attached below, clearly indicating an advantage for Coyote, for two reasons: (1) optimised data movement, bypassing card memory, and (2) an optimised host-side library (Python, C++).
In principle, the correct test would be to compare against VitisAccelerator (#991), but only after the io_parallel issues are resolved. However, the expectation is that the results will stay mostly the same, since the underlying platform requires a data copy between host and card memory.
Will add some more results, also for io_stream CNNs, and comparisons to VitisAccelerator.
Figure above: comparison of CoyoteAccelerator with modified Vivado Accelerator for the UNSW-NB15 dataset in io_parallel.
Checklist
pre-commit on the files I edited or added.