Operation Blossom

Vision Transformers (ViTs) are an up-and-coming technology, with claims that they will replace Convolutional Neural Networks (CNNs). We want to test this claim by experimenting ourselves with a dataset of flower images. The task is to identify the flower in an image. We will build a Streamlit app that runs both a CNN and a ViT and compares their accuracy and running time. The app will display each model's predictions along with the training and testing results of both the CNN and the ViT.
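The comparison comes down to two numbers per model: accuracy on held-out images and how long predictions take. Below is a minimal sketch in Python (the language Streamlit apps are written in). It assumes each trained model is wrapped in a hypothetical `predict(image) -> label` function. The names `evaluate` and `compare` are illustrative, not part of any library. It times inference only. Training duration could be measured the same way by wrapping the training call with `time.perf_counter()`.

```python
import time
from typing import Any, Callable, Dict, Sequence, Tuple

# Hypothetical interface: each model is wrapped in a function that takes one
# preprocessed image and returns a predicted class label.
Predictor = Callable[[Any], str]


def evaluate(predict: Predictor, samples: Sequence[Tuple[Any, str]]) -> Dict[str, float]:
    """Accuracy and wall-clock inference time of one model on labeled samples."""
    correct = 0
    start = time.perf_counter()
    for image, label in samples:
        if predict(image) == label:
            correct += 1
    elapsed = time.perf_counter() - start
    n = len(samples)
    return {
        "accuracy": correct / n if n else 0.0,
        "total_seconds": elapsed,
        "seconds_per_image": elapsed / n if n else 0.0,
    }


def compare(models: Dict[str, Predictor],
            samples: Sequence[Tuple[Any, str]]) -> Dict[str, Dict[str, float]]:
    """Run the same evaluation for every model so the numbers are comparable."""
    return {name: evaluate(predict, samples) for name, predict in models.items()}


if __name__ == "__main__":
    # Stand-in models (placeholders, not real CNN/ViT predictions).
    test_set = [("img1", "rose"), ("img2", "tulip"), ("img3", "daisy"), ("img4", "rose")]
    stub_cnn = lambda image: "rose"
    stub_vit = {"img1": "rose", "img2": "tulip", "img3": "daisy", "img4": "tulip"}.get
    for name, stats in compare({"CNN": stub_cnn, "ViT": stub_vit}, test_set).items():
        print(f"{name}: accuracy={stats['accuracy']:.2f}, "
              f"{stats['seconds_per_image']:.6f} s/image")
```

Running both models through the same function on the same test set keeps the accuracy and timing numbers directly comparable.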


  1. Run and tune a CNN model
  2. Run and tune a ViT model
  3. Build a Streamlit app that lets users upload images
  4. Build a pipeline that takes users' images and makes predictions
  5. Display the training process of both the CNN and the ViT
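The last stage of the prediction pipeline turns each model's raw scores for an uploaded image into something a user can read. Below is a minimal sketch that assumes each model returns one logit per class. It converts the logits to probabilities with a softmax and lists the top-k flower classes for the CNN and the ViT side by side. `CLASS_NAMES`, `top_k` and `side_by_side` are hypothetical names. Image loading, resizing and the models themselves are left out.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical class list; the post does not say which flower dataset is used.
CLASS_NAMES = ["daisy", "dandelion", "rose", "sunflower", "tulip"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], class_names: Sequence[str],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most likely (class, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def side_by_side(model_logits: Dict[str, Sequence[float]], class_names: Sequence[str],
                 k: int = 3) -> Dict[str, List[Tuple[str, float]]]:
    """Top-k predictions for each model on the same uploaded image."""
    return {name: top_k(logits, class_names, k) for name, logits in model_logits.items()}


if __name__ == "__main__":
    # Made-up logits standing in for the CNN and ViT outputs on one image.
    example = {"CNN": [0.1, 2.0, 0.5, -1.0, 0.3], "ViT": [0.2, 0.4, 3.1, 0.0, 1.5]}
    for model, preds in side_by_side(example, CLASS_NAMES, k=2).items():
        print(model, [(label, round(p, 3)) for label, p in preds])
```

In the app, a file from `st.file_uploader` would go through each model's preprocessing and forward pass, and the returned lists could then be shown with `st.table` or a bar chart.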

Ren W

Tariq R

  • Twitter: @taraqur
  • Discord: the_proton_crusher#8317
  • ETH: 0x32aE0C5b4e34340e4e1550a92d0B6206c68663A3

It appears that this topic has already been investigated in some detail. Have you looked at, for instance, https://towardsdatascience.com/vision-transformers-or-convolutional-neural-networks-both-de1a2c3c62e4
or https://towardsdatascience.com/are-transformers-better-than-cnns-at-image-recognition-ced60ccc7c8, and the links in them?

Thanks for the info. We were aware that it is not a novel idea, but we are trying to implement it ourselves. We plan to display the results interactively in Streamlit.