Encoder-Decoder Transformer Architecture

Context-Aware Pedestrian Trajectory Prediction with Multimodal Transformer

Abstract: We propose a novel solution for predicting future trajectories of pedestrians. Our method uses a multimodal encoder-decoder transformer architecture, which takes as input both pedestrian ...

Tech Times

Baidu OCR Breaks Long-Document Memory Wall: New Architecture Beats DeepSeek

Open-source OCR from Baidu eliminates the GPU memory wall that limits long-document parsing. Unlimited OCR uses a constant KV ...

GitHub

KAT: A Knowledge Augmented Transformer for Vision-and-Language

Can multimodal transformers leverage explicit knowledge in their reasoning? Existing, primarily unimodal, methods have explored approaches under the paradigm of knowledge retrieval followed by answer ...

IEEE

ColonFormer: An Efficient Transformer Based Method for Colon Polyp Segmentation

Abstract: Identifying polyps is challenging for automatic analysis of endoscopic images in computer-aided clinical support systems. Models based on convolutional networks (CNN), transformers, and ...

GitHub

HunyuanVideo: A Systematic Framework For Large Video Generation Model

We present HunyuanVideo, a novel open-source video foundation model that exhibits performance in video generation that is comparable to, if not superior to, leading closed-source models. In order to ...

InfoQ

Gemma 4 12B Enables On-Device, Multimodal Agentic Workflows with an Encoder-free Architecture
