Coyote accelerator backend #1347
Conversation
```cpp
while (coyote_thread.checkCompleted(coyote::CoyoteOper::LOCAL_TRANSFER) != batch_size) {
    std::this_thread::sleep_for(std::chrono::nanoseconds(50));
}
while (coyote_thread.checkCompleted(coyote::CoyoteOper::LOCAL_TRANSFER) != batch_size) {}
```
Wouldn't this cause 100% CPU usage while the program is polling?
On one of the cores, yes.
But sleeping for less than 50us is not well-defined on most Linux platforms. Hence, the measured latency can go from ~4us to >50us even though the "true" execution latency is still 4us.
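A common compromise between the two polling styles discussed here is a hybrid wait: busy-spin for a short bounded window (keeping latency in the microsecond range), then fall back to sleeping between polls so a long wait does not peg a core. The sketch below is illustrative only; `check_completed` and `target` are hypothetical stand-ins for `coyote_thread.checkCompleted(...)` and `batch_size`.

```python
import time

def wait_for_completion(check_completed, target, spin_ns=50_000, sleep_s=50e-6):
    """Hybrid poll: spin briefly for low latency, then sleep between polls.

    check_completed/target are placeholders for the Coyote completion
    check; spin_ns bounds the busy-wait phase, sleep_s is chosen at the
    ~50 us granularity that Linux sleeps can reliably honour.
    """
    deadline = time.perf_counter_ns() + spin_ns
    # Phase 1: pure busy-wait -- burns one core but reacts within ~us
    while time.perf_counter_ns() < deadline:
        if check_completed() == target:
            return True
    # Phase 2: sleep between polls -- OS sleep granularity now dominates,
    # trading reaction time for CPU usage on long waits
    while check_completed() != target:
        time.sleep(sleep_s)
    return True

# Toy usage: a counter that "completes" after a few polls
state = {'n': 0}
def fake_check():
    state['n'] += 1
    return min(state['n'], 8)

done = wait_for_completion(fake_check, 8)
```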
…-backend (fastmachinelearning#1347) Merge branch 'init_interval_fix_zeropad_maxpooling' into coyote-accelerator-and-pooling
A few misc comments based on trying to run the CoyoteAccelerator for a dummy model. Right now, I am stuck with a Python import error:

which is puzzling because I do have jinja2 installed in my environment and the same import works fine in an interactive Python session.
Also, can you fix the pre-commit issues?
```python
filedir = os.path.dirname(os.path.abspath(__file__))
srcpath = os.path.join(filedir, '../contrib/Coyote/')
dstpath = f'{model.config.get_output_dir()}/Coyote'
copytree(srcpath, dstpath)
```
Do we want to use the dirs_exist_ok argument here? In the current version, this fails when running for the same project twice.
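To illustrate the suggestion: `shutil.copytree` (Python 3.8+) raises `FileExistsError` on a second run unless `dirs_exist_ok=True` is passed, which copies over the existing destination instead. A minimal self-contained sketch using throwaway temp directories:

```python
import os
import tempfile
from shutil import copytree

# dirs_exist_ok=True lets a second copytree into the same destination
# overwrite in place instead of raising FileExistsError.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'Coyote')
    dst = os.path.join(tmp, 'prj', 'Coyote')
    os.makedirs(src)
    open(os.path.join(src, 'CMakeLists.txt'), 'w').close()

    copytree(src, dst)                      # first run: fine
    copytree(src, dst, dirs_exist_ok=True)  # re-run: does not raise
    copied = os.path.exists(os.path.join(dst, 'CMakeLists.txt'))
```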
```python
output_dir='hls4ml_prj_coyote',
backend='CoyoteAccelerator',
board='u55c')
hls4ml.build(bitfile=True)
```
This should probably be hls_model instead of hls4ml
```python
if not os.path.exists(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw'):
    os.mkdir(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw')
```
I think this needs to use os.makedirs() because the build folder doesn't exist already.
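For reference, `os.mkdir()` only creates the leaf directory and raises `FileNotFoundError` when a parent (here, `build/`) is missing; `os.makedirs()` creates all intermediates, and `exist_ok=True` additionally makes repeated runs idempotent. A minimal sketch with a placeholder project path:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # 'build' does not exist yet, so os.mkdir(hw_dir) would raise
    # FileNotFoundError; os.makedirs creates every missing parent.
    hw_dir = os.path.join(tmp, 'build', 'myproject_cyt_hw')
    os.makedirs(hw_dir, exist_ok=True)
    os.makedirs(hw_dir, exist_ok=True)  # second call is a no-op
    created = os.path.isdir(hw_dir)
```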
```
Example
======================

Similar to the ``VivadoAccelerator`` backend, we first generate a bitstream from a Keras model ``model`` and a config.
```
Documentation should mention that hls4ml needs to be cloned with the submodules checked out to get Coyote, and that a Vitis installation needs to be present.
vloncar left a comment:
Looks good. I have mostly questions for better understanding and minor nitpicks that I don't feel are crucial, rather optional.
```
.. code-block:: Python

    from hls4ml.backends.coyote_accelerator.coyote_accelerator_overlay import CoyoteOverlay
```
I was wondering if we need to call this coyote_accelerator or just coyote; it would be cleaner for the user to import from a shorter path, rather than going all the way into coyote_accelerator_overlay. Thus, from hls4ml.backends.coyote import CoyoteOverlay would be neat.
```python
register_backend('Catapult', CatapultBackend)
register_backend('SymbolicExpression', SymbolicExpressionBackend)
register_backend('oneAPI', OneAPIBackend)
register_backend('CoyoteAccelerator', CoyoteAcceleratorBackend)
```
How about just calling it Coyote?
```python
if not os.path.exists(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw'):
    os.mkdir(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw')
os.chdir(f'{model.config.get_output_dir()}/build/{model.config.get_project_name()}_cyt_hw')
os.system(cmake_cmd)
```
I would argue that new code (despite being inspired by existing code) should use subprocess instead of os.system and we gradually move towards phasing out os.system since it has limitations on tracking status.
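To sketch the suggested replacement: `subprocess.run()` exposes the return code, stdout, and stderr, and `check=True` turns a failed build step into a `CalledProcessError` instead of a silently ignored status. The Python interpreter is used below as a portable stand-in for the real `cmake_cmd` invocation; `cwd=` can replace the `os.chdir()` call as well.

```python
import subprocess
import sys

# subprocess.run surfaces status and output, unlike os.system which
# returns only a raw exit code and captures nothing.
result = subprocess.run(
    [sys.executable, '-c', 'print("configure ok")'],  # stand-in for cmake_cmd
    capture_output=True,
    text=True,
    check=True,   # raises CalledProcessError on non-zero exit
    # cwd=build_dir  # would replace the os.chdir() call
)
```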
```python
self.coyote_lib.free_model_inference.argtypes = [ctypes.POINTER(ctypes.c_void_p)]

def program_hacc_fpga(self):
```
Is HACC the (former) name of Xilinx's academic partnership program, or the specific instance at ETH? I thought there are other HACCs around (one at UIUC for example)
```python
if len(X.shape) == 1:
    X = np.array([X])
if not (isinstance(X.dtype, float) or isinstance(X.dtype, np.float32)):
    logging.warning('CoyoteOverlay only supports (for now) floating-point inputs; casting input data to float')
```
Have we completely switched to the logging module, or do we still use warnings? Do these two play along nicely?
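As an aside on the quoted check: `isinstance(X.dtype, float)` is always `False`, because `X.dtype` is a `np.dtype` object rather than a Python float, so the cast branch would never fire as intended. A hedged sketch of a corrected check using `np.issubdtype`, with `warnings` (which can be routed into `logging` via `logging.captureWarnings(True)`, so the two do play along):

```python
import warnings
import numpy as np

def as_float32(X):
    """Illustrative input check; as_float32 is a hypothetical helper."""
    X = np.atleast_2d(np.asarray(X))
    # np.issubdtype inspects the dtype correctly;
    # isinstance(X.dtype, float) never matches.
    if not np.issubdtype(X.dtype, np.floating):
        warnings.warn('casting input data to float32')
        X = X.astype(np.float32)
    return X

out = as_float32([1, 2, 3])  # integer input: warns and casts
```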
```cpp
#include "defines.h"
#include "host_libs.hpp"

#include <boost/program_options.hpp>
```
Does this require us to have Boost library installed on the host or that comes with Coyote?
```cpp
avg_latency += (time / 1e3);
avg_throughput += (batch_size / (time * 1e-9));

// Functional correctness
```
Do you really need this? Is there a way to just run without checks, as in production?
```cpp
}

std::cout << "Batches processed: " << total_batches << std::endl;
```
Similar comment to the Python side with predict()
```python
filedir = os.path.dirname(os.path.abspath(__file__))

f = open(os.path.join(filedir, '../templates/vivado/firmware/myproject.cpp'))
```
Imagine how cool it would be if we used pathlib.Path (which is imported) and resource management (with) like the other modern backends do... 😄
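A minimal sketch of the suggested style, with a throwaway temp directory standing in for the real template path: `Path` handles the joining, and the `with` block guarantees the file handle is closed.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filedir = Path(tmp)   # stands in for Path(__file__).parent
    src = filedir / 'templates' / 'vivado' / 'firmware' / 'myproject.cpp'
    src.parent.mkdir(parents=True)
    src.write_text('// firmware stub\n')

    with src.open() as f:  # closed automatically on exiting the block
        contents = f.read()
```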
This is several months old now, do you want to update it?
Description
Generally, Coyote offers several advantages compared to some other shells, including:
The backend is briefly described in Section 9.7 of the paper: https://arxiv.org/pdf/2504.21538.
Type of change
Tests
This backend was compared against a modified* version of the VivadoAccelerator backend: the backend was modified to run HLS synthesis with Vitis instead of Vivado (also using Vitis templates and optimizers), while the rest of the backend infrastructure (drivers, data movers) remained the same, since they also work in newer versions of Vivado. Results are attached below, clearly indicating an advantage for Coyote, for two reasons: (1) optimised data movement, bypassing card memory, and (2) an optimised host-side library (Python, C++).
In principle, the correct test would be to compare against VitisAccelerator (#991), but only after the io_parallel issues are resolved. However, the expectation is that the results will stay mostly the same, since the underlying platform requires a data copy between host and card memory.
Will add some more results, also for io_stream CNNs, and comparisons to VitisAccelerator.
Figure above: comparison of CoyoteAccelerator with modified Vivado Accelerator for the UNSW-NB15 dataset in io_parallel.
Checklist
pre-commit on the files I edited or added.