

# Combination of Altera OpenCL kernel with video IP cores

# Kostyantyn Pelekh GlobalLogic Inc.



- Solution Overview
- Design Details
- Summary
- Additional information



# **Design Challenge**



Many companies consider the Intel® Cyclone® V SoC and boards based on it because they are very affordable.

Despite the low cost, the device still has to do real computation, for example in a medical device that scans part of the body or a network device that filters packets. OpenCL helps here.

On the other hand, in most cases the device must also provide a UI and let the user operate it, for example through a touch screen.

This creates a dilemma: should the FPGA be programmed for video or for calculations? We investigate a solution that combines both.



#### **Hardware Description**



The Cyclone® V SoC Development Kit offers a quick and simple approach to developing custom ARM® processor-based SoC designs accompanied by Altera's low-power, low-cost Cyclone V FPGA fabric.



The DE0-Nano-SoC-HDMI Development Kit presents a robust hardware design platform built around the Altera System-on-Chip (SoC) FPGA, which combines the latest dual-core Cortex-A9 embedded cores with industry-leading programmable logic for ultimate design flexibility.



The Terasic HDMI\_TX\_HSMC is an HDMI transmitter daughter board with an HSMC (High Speed Mezzanine Connector) interface. Host boards with HSMC-compliant connectors can control the HDMI\_TX\_HSMC daughter board through the HSMC interface.



## **Altera Video Design Framework**

This is a combination of IP cores, interface standards and system-level design tools developed to enable a plug-and-play video system design flow. Altera provides a comprehensive suite of video function IP blocks that can be connected together to build video systems, and any Altera IP block can be replaced with your own custom function block.





# Altera OpenCL SDK

- Allows a software development flow to be used instead of the traditional hardware FPGA development flow
- Provides tools that make it easy to develop applications for FPGAs by abstracting away the complexities of FPGA design
- Allows software programmers to write hardware-accelerated kernel functions in OpenCL C



#### Programming model:

- Device programming language based on C99 with extensions to support parallelism
- Application Programming Interface (API) for device management

#### Performance:

- CPU offload
- Software to leverage silicon acceleration
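The slides do not show the Mandelbrot kernel itself, so the following is only a plain C reference sketch of the per-pixel escape-time calculation that such a kernel would offload. The function names, the mapping of pixels onto the complex plane, and the nested loops standing in for an OpenCL NDRange launch are all assumptions made for illustration.

```c
/* Escape-time iteration for one point c = (cx, cy).
 * Returns the number of iterations performed before |z|^2 exceeds 4,
 * capped at max_iter. */
unsigned mandelbrot_iters(float cx, float cy, unsigned max_iter)
{
    float zx = 0.0f, zy = 0.0f;
    unsigned i = 0;

    while (i < max_iter && zx * zx + zy * zy <= 4.0f) {
        float t = zx * zx - zy * zy + cx;  /* real part of z^2 + c */
        zy = 2.0f * zx * zy + cy;          /* imaginary part of z^2 + c */
        zx = t;
        ++i;
    }
    return i;
}

/* Host-side stand-in for a kernel launch: one "work-item" per pixel.
 * Assumed view: real axis [-2, 1), imaginary axis [-1, 1). */
void render(unsigned *out, int width, int height, unsigned max_iter)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float cx = -2.0f + 3.0f * (float)x / (float)width;
            float cy = -1.0f + 2.0f * (float)y / (float)height;
            out[y * width + x] = mandelbrot_iters(cx, cy, max_iter);
        }
    }
}
```

In an OpenCL C kernel the two loops in `render` would typically disappear, with each work-item computing its own `x` and `y` from its global ID.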



# **Resulting Solution**



DE0-Nano-SoC-HDMI is a design in which the OpenCL hardware is combined with a customized video subsystem based on the Altera VIP framework, so that the calculation results can be shown on a display. The design demonstrates a framework for rapid development of video and image processing systems using the Altera VIP Suite.





- Accelerates calculation with any algorithm that fits the Cyclone V SX
- Outputs calculation results to an HDMI / DVI display
- Supports eight common resolutions from VGA to HD1080
- Changes resolution on the fly without interrupting the calculation
- Turns hardware acceleration on and off
- Displays the frame rate in real time



#### **Performance Metrics of Mandelbrot**








## **Top-Level Design**



- Common, open Avalon data and control interfaces make it easier to connect the chain of video functions and to model the video system
- The video submodule is connected to the HPS and DDR memory through the H2F Lightweight and F2H ports, respectively
- The operating system configures all VIP and PLL cores through the HPS ports to match the selected screen resolution
- The operating system prepares a frame window in a specific DDR memory region for the video output stream
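As a rough illustration of the last point, the sketch below sizes a frame window and fills it with pixels derived from calculation results. The slides do not state the pixel format or colour mapping, so the 32-bit pixel layout, the grey-scale mapping and all function names here are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 bytes per pixel (0x00RRGGBB). */
enum { BYTES_PER_PIXEL = 4 };

/* Size in bytes of one frame window for the given resolution. */
size_t frame_bytes(unsigned width, unsigned height)
{
    return (size_t)width * height * BYTES_PER_PIXEL;
}

/* Hypothetical colour map: points that never escaped are black,
 * all others are a grey level proportional to the iteration count. */
uint32_t iter_to_pixel(unsigned iters, unsigned max_iter)
{
    uint32_t g = (iters >= max_iter) ? 0u : (uint32_t)(255u * iters / max_iter);
    return (g << 16) | (g << 8) | g;
}

/* Copy results into the frame window. On the target, `frame` would point
 * at the DDR region reserved for the video output stream. */
void fill_frame(uint32_t *frame, const unsigned *iters,
                unsigned width, unsigned height, unsigned max_iter)
{
    size_t n = (size_t)width * height;
    for (size_t i = 0; i < n; ++i)
        frame[i] = iter_to_pixel(iters[i], max_iter);
}
```

Under these assumptions a 1920x1080 frame occupies 8,294,400 bytes, which gives a feel for how much DDR the largest mode needs per buffer.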



# **Use HPS For Configuration**



- HPS initializes all possible resolution modes in the Clocked Video Output (CVO) core
- HPS changes the frame window size in the Framereader VIP core, which causes an automatic mode switch in the CVO
- HPS sets the appropriate video sync frequency by configuring the PLL
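The frame window change described above might look roughly like the following from the HPS side. This is a sketch only: the register offsets are hypothetical placeholders and not the documented VIP register map, and the function and struct names are invented for illustration. On real hardware the `fr` pointer would come from mapping the core's register window through the HPS bridge; here it can just as well point at a plain array for testing.

```c
#include <stdint.h>

/* Hypothetical word offsets inside the Framereader register window.
 * Consult the VIP Suite documentation for the real map. */
enum {
    FR_CONTROL   = 0,  /* 1 = run, 0 = stop (assumed) */
    FR_WIDTH     = 1,
    FR_HEIGHT    = 2,
    FR_BASE_ADDR = 3
};

struct video_mode {
    uint32_t width;
    uint32_t height;
};

/* Stop the reader, program the new frame window, then restart it.
 * Per the slides, the CVO then switches mode automatically. */
void frame_reader_set_mode(volatile uint32_t *fr,
                           const struct video_mode *m,
                           uint32_t frame_addr)
{
    fr[FR_CONTROL]   = 0;
    fr[FR_WIDTH]     = m->width;
    fr[FR_HEIGHT]    = m->height;
    fr[FR_BASE_ADDR] = frame_addr;
    fr[FR_CONTROL]   = 1;
}
```

Reprogramming the PLL for the matching video sync frequency would be a separate step and is not shown.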



#### **FPGA Design Partitions**

Floorplan figure: regions `mandelbrot_kernel_system_region` (labels 18390 and 732) and `video_system_region` (label 4629); legend: HPS, Mandelbrot Kernel, OpenCL Kernel, Video Subsystem.

| Resource                | Used / Available    | Utilization |
|-------------------------|---------------------|-------------|
| Logic (ALM)             | 20,833 / 41,910     | 50 %        |
| Total block memory bits | 405,656 / 5,662,720 | 7 %         |
| Total RAM Blocks        | 99 / 553            | 18 %        |
| Total DSP Blocks        | 28 / 112            | 25 %        |
| Total PLLs              | 4 / 12              | 33 %        |
| Total DLLs              | 1 / 4               | 25 %        |



#### **Lessons Learned**

#### **Clock setup issues**

- The Framereader VIP core and the Clocked Video Output core should be initialized in the Qsys editor with their maximum ratings. If the design has several resolution modes, initialize the cores with the settings for the highest resolution.
- All VIP cores should be connected to one clock domain. The main clock of the cores should be equal to or higher than the video sync clock.
- To work with high resolutions, the PLL should be initialized in the Qsys editor to generate an integer frequency value, with no fractional part.
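The clock rule above can be checked with simple arithmetic. The sketch below assumes that what the slides call the video sync clock is the pixel clock, computed as total pixels per line times total lines per frame times refresh rate; the function names are invented for illustration.

```c
#include <stdint.h>

/* Pixel clock in Hz from the total (active + blanking) frame timings. */
uint64_t pixel_clock_hz(uint32_t h_total, uint32_t v_total, uint32_t refresh_hz)
{
    return (uint64_t)h_total * v_total * refresh_hz;
}

/* Rule from the list above: the VIP core clock must be equal to or
 * higher than the video clock. Returns 1 if satisfied, 0 otherwise. */
int core_clock_ok(uint64_t core_hz, uint64_t pixel_hz)
{
    return core_hz >= pixel_hz;
}
```

For example, standard 1080p at 60 Hz uses totals of 2200 x 1125, which gives 148,500,000 Hz, so a hypothetical 150 MHz core clock would pass the check while a 100 MHz one would not.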

#### **FPGA space optimizations**

- Removed the additional Nios-based submodule that configured the PLL, HDMI and VIP cores during HPS launch, and moved all configuration procedures to the HPS.






## **Summary and Next Steps**

- Motivated the need for a combined OpenCL and video framebuffer design
- Demonstrated the combined design architecture
- Described the benefits and performance metrics
- Provided development constraints and limitations
- The next steps
  - Identify application areas and define use cases
  - Create OpenCL kernels for other purposes
  - Optimize the solution for a smaller FPGA footprint and better performance
  - Add video layering functionality






#### **Additional Sources of Information**

- Altera OpenCL SDK: https://www.altera.com/products/design-software/embedded-software-developers/opencl/overview.html
- Altera Video Design Framework: https://www.altera.com/solutions/technology/dsp/dsp-video-solutions/dsp-1080p-hd-video.html
- Mandelbrot algorithm: https://en.wikipedia.org/wiki/Mandelbrot_set

