Encoder/Decoder Transformer Model

Baidu OCR Breaks Long-Document Memory Wall: New Architecture Beats DeepSeek

Open-source OCR from Baidu eliminates the GPU memory wall that limits long-document parsing. Unlimited OCR uses a constant KV ...

marktechpost

Google DeepMind Releases Gemma 4 12B: An Encoder-Free Multimodal Model with Native Audio That Runs on a 16 GB Laptop

Gemma 4 12B is a 12-billion-parameter decoder-only transformer. It handles text, images, audio, and video natively. There are no separate vision or audio encoders. The decoder uses the same structure ...

IEEE

Temporal Convolutional and Fusional Transformer Model With Bi-LSTM Encoder-Decoder for Multi-Time-Window Remaining Useful Life Prediction

Abstract: Health prediction is crucial for ensuring reliability, minimizing downtime, and optimizing maintenance in industrial systems. Remaining Useful Life (RUL) prediction is a key component of ...

IEEE

MaS-TransUNet: A Multiattention Swin Transformer U-Net for Medical Image Segmentation

Abstract: U-shaped encoder-decoder models have excelled in automatic medical image segmentation due to their hierarchical feature learning capabilities, robustness, and upgradability. Purely CNN-based ...

marktechpost

Zyphra Releases Zamba2-VL: Hybrid Mamba2–Transformer Vision-Language Models That Cut Time-to-First-Token by About an Order of Magnitude

Zyphra has released Zamba2-VL, a family of open vision-language models. The release covers three sizes: 1.2B, 2.7B, and 7B parameters. Each model is built on the Zamba2 hybrid SSM–Transformer backbone ...

GitHub

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

SwiftMSeg: lightweight multi-scale local–global context modeling with transformer for medical image segmentation

NVIDIA-AI-IOT/nanosam

DPCrossU-Net: a dual-branch parallel CNN–Transformer network for lung nodule segmentation