We propose a simple running example (based on a stencil computation) that is iteratively improved using FPGA-specific optimizations, such as loop unrolling, kernel replication, High Bandwidth Memory ...