Implementation of the MCNN-14 model for fashion image classification, achieving 93.08% accuracy on Fashion-MNIST. Based on our paper “An Efficient Multiple Convolutional Neural Network Model (MCNN-14) ...
GroupViT is a framework for learning semantic segmentation purely from text captions without using any mask supervision. It learns to perform bottom-up hierarchical spatial grouping of ...