Open-source OCR from Baidu eliminates the GPU memory wall that limits long-document parsing. Unlimited OCR uses a constant KV ...
Gemma 4 12B is a 12-billion-parameter decoder-only transformer. It handles text, images, audio, and video natively. There are no separate vision or audio encoders. The decoder uses the same structure ...
Abstract: Health prediction is crucial for ensuring reliability, minimizing downtime, and optimizing maintenance in industrial systems. Remaining Useful Life (RUL) prediction is a key component of ...
Abstract: U-shaped encoder-decoder models have excelled in automatic medical image segmentation due to their hierarchical feature learning capabilities, robustness, and upgradability. Purely CNN-based ...
Zyphra has released Zamba2-VL, a family of open vision-language models. The release covers three sizes: 1.2B, 2.7B, and 7B parameters. Each model is built on the Zamba2 hybrid SSM–Transformer backbone ...
This repository contains the implementation for the paper: MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers by Yawar Siddiqui, Antonio Alliegro, Alexey Artemov, Tatiana Tommasi, ...
The authors thank Dr. Khadija Tul Kubra and Dr. Sahria Bakar for validating the test images and evaluating the results of the models. The authors are also thankful to the AI for Medical Imaging (AIM) ...
NanoSAM is a Segment Anything (SAM) model variant that is capable of running in 🔥 real-time 🔥 on NVIDIA Jetson Orin Platforms with NVIDIA TensorRT. NanoSAM is trained by distilling the MobileSAM ...
We propose DPCrossU-Net, a dual-branch parallel encoder–decoder network that integrates convolutional and Vision Transformer representations. The encoder employs parallel CNN and ViT branches with a ...