A sneak peek at TorchVision v0.11 – Memoirs of a TorchVision developer – 2

October 10, 2021
Vasilis Vryniotis


The last couple of weeks were super busy in “PyTorch Land” as we are frantically preparing the release of PyTorch v1.10 and TorchVision v0.11. In this 2nd instalment of the series, I’ll cover some of the upcoming features that are currently included in the release branch of TorchVision.

Disclaimer: Though the upcoming release is packed with numerous enhancements and bug/test/documentation improvements, here I'm highlighting new "user-facing" features in domains I'm personally interested in. After writing the blog post, I also noticed a bias towards features I reviewed, wrote, or whose development I followed closely. Covering (or not covering) a feature says nothing about its importance. Opinions expressed are solely my own.

New Models

The new release is packed with new models:

Kai Zhang has added an implementation of the RegNet architecture along with pre-trained weights for 14 variants that closely reproduce the results of the original paper.
I’ve recently added an implementation of the EfficientNet architecture along with pre-trained weights for variants B0-B7 provided by Luke Melas-Kyriazi and Ross Wightman.

New Data Augmentations

A few new Data Augmentation techniques have been added to the latest version:

Samuel Gabriel has contributed TrivialAugment, a new, simple but highly effective strategy that appears to provide better results than AutoAugment.
I've added the RandAugment method to the auto-augmentations.
I've provided an implementation of the Mixup and CutMix transforms in the references. These will be moved to transforms in the next release, once their API is finalized.
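The idea behind the two batch-level transforms fits in a few lines of PyTorch. The functions below (`mixup` and `cutmix`, hypothetical names) are not the code in TorchVision's references; they are a minimal sketch that pairs each sample with its neighbour in the batch, draws the mixing ratio from a Beta(alpha, alpha) distribution, and mixes the one-hot targets with the same ratio. The `alpha` defaults are illustrative assumptions.

```python
import math

import torch
import torch.nn.functional as F


def mixup(images, labels, num_classes, alpha=0.2):
    """Blend every sample with its neighbour in the batch (batch rolled by one)."""
    one_hot = F.one_hot(labels, num_classes).to(images.dtype)
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    mixed_images = lam * images + (1.0 - lam) * images.roll(1, dims=0)
    mixed_labels = lam * one_hot + (1.0 - lam) * one_hot.roll(1, dims=0)
    return mixed_images, mixed_labels


def cutmix(images, labels, num_classes, alpha=1.0):
    """Paste a random rectangle from the rolled batch and mix labels by area."""
    one_hot = F.one_hot(labels, num_classes).to(images.dtype)
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    _, _, h, w = images.shape
    # Box whose area is roughly (1 - lam) of the image, centred on a random pixel.
    cut_h = int(h * math.sqrt(1.0 - lam))
    cut_w = int(w * math.sqrt(1.0 - lam))
    cy = int(torch.randint(h, (1,)))
    cx = int(torch.randint(w, (1,)))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed_images = images.clone()
    mixed_images[:, :, y1:y2, x1:x2] = images.roll(1, dims=0)[:, :, y1:y2, x1:x2]
    # Recompute lam from the clipped box so the labels match the pasted area.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_labels = lam * one_hot + (1.0 - lam) * one_hot.roll(1, dims=0)
    return mixed_images, mixed_labels
```

Because the targets become soft label vectors, the loss has to accept class probabilities; `nn.CrossEntropyLoss` supports probability targets from PyTorch 1.10 onwards.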

New Operators and Layers

A number of new operators and layers have been included:

Allen Goodman and Aditya Oke have ported an operator from DETR that converts semantic segmentation masks to bounding boxes.
Victor Fomin has contributed the backward implementations of bilinear and bicubic interpolation with the anti-alias option for CPUs and GPUs.
Kai Zhang and I have refactored common building blocks of models and written reusable implementations for the Squeeze-and-Excitation and Conv-Norm-Activation layers.
I've added an implementation of the Stochastic Depth layer.
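The mask-to-box conversion is simple enough to sketch directly: for every binary mask, take the minimum and maximum column and row indices of its non-zero pixels. The function below (`masks_to_boxes_sketch`, a hypothetical name) is a minimal illustration of that idea in PyTorch, returning boxes in `(x1, y1, x2, y2)` format; the shipped operator is `torchvision.ops.masks_to_boxes`. The sketch assumes every mask contains at least one foreground pixel.

```python
import torch


def masks_to_boxes_sketch(masks: torch.Tensor) -> torch.Tensor:
    """Return one (x1, y1, x2, y2) box per mask; masks has shape [N, H, W]."""
    boxes = torch.zeros((masks.shape[0], 4), dtype=torch.float)
    for i, mask in enumerate(masks):
        # Row (y) and column (x) indices of the foreground pixels.
        ys, xs = torch.where(mask != 0)
        # Inclusive extremes; an empty mask would make min()/max() raise.
        boxes[i] = torch.stack([xs.min(), ys.min(), xs.max(), ys.max()]).float()
    return boxes
```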

References / Training Recipes

Though the improvement of our reference scripts is a continuous effort, here are a few new features included in the upcoming version:

Prabhat Roy has added support for Exponential Moving Average in our classification recipe.
I've updated our references to support Label Smoothing, which was recently introduced in PyTorch core by Joel Schlosser and Thomas J. Fan.
I’ve included the option to perform Learning Rate Warmup, using the latest LR schedulers developed by Ilqar Ramazanli.
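The three recipe features map onto PyTorch core primitives available from v1.10. The snippet below is a minimal sketch, not TorchVision's reference script: it trains a toy `nn.Linear` model on random data with `CrossEntropyLoss(label_smoothing=...)`, a linear warmup (`LinearLR`) chained to cosine annealing via `SequentialLR`, and an EMA copy of the weights built on `AveragedModel` with a custom `avg_fn`. The smoothing value, warmup length and decay are arbitrary assumptions. `AveragedModel` averages parameters only, not buffers such as BatchNorm statistics (later PyTorch releases added a `use_buffers` option).

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR
from torch.optim.swa_utils import AveragedModel

torch.manual_seed(0)
model = nn.Linear(8, 3)  # stand-in for a real classifier
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Linear warmup from 1% of the base LR, then cosine decay for the remaining epochs.
warmup_epochs, total_epochs = 3, 10
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.01, total_iters=warmup_epochs),
        CosineAnnealingLR(optimizer, T_max=total_epochs - warmup_epochs),
    ],
    milestones=[warmup_epochs],
)

# EMA of the weights: avg <- decay * avg + (1 - decay) * current.
decay = 0.99
ema_model = AveragedModel(
    model, avg_fn=lambda avg_p, p, num_averaged: decay * avg_p + (1 - decay) * p
)

lrs = []
for epoch in range(total_epochs):
    inputs, targets = torch.randn(16, 8), torch.randint(0, 3, (16,))
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    ema_model.update_parameters(model)  # first call copies, later calls average
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()
```

At evaluation time one would run `ema_model` instead of `model`; the EMA weights usually change more smoothly than the raw ones.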

Other improvements

Here are some other notable improvements added in the release:

Alexander Soare and Francisco Massa have developed an FX-based utility which allows extracting arbitrary intermediate features from model architectures.
Nikita Shulga has added support for CUDA 11.3 to TorchVision.
Zhongkai Zhu has fixed the dependency issues with the JPEG library, which had caused major headaches for many of our users.

In-progress & Next-up

There are lots of exciting new features under development that didn't make it into this release. Here are a few:

Moto Hira, Parmeet Singh Bhatia and I have drafted an RFC, which proposes a new mechanism for Model Versioning and for handling metadata associated with pre-trained weights. This will enable us to support multiple pre-trained weights for each model and to attach associated information, such as labels and preprocessing transforms, to the models.
I’m currently working on using the primitives added by the “Batteries Included” project in order to improve the accuracy of our pre-trained models. The target is to achieve best-in-class results for the most popular pre-trained models provided by TorchVision.
Philip Meier and Francisco Massa are working on an exciting prototype for TorchVision’s new Dataset and Transforms API.
Prabhat Roy is working on extending PyTorch Core's AveragedModel class to support the averaging of buffers in addition to parameters. The lack of this feature is commonly reported as a bug, and adding it will enable numerous downstream libraries and frameworks to remove their custom EMA implementations.
Aditya Oke wrote a utility which allows plotting the results of Keypoint models on the original images (the feature didn’t make it to the release as we got swamped and couldn’t review it in time 🙁 )
I'm building a prototype FX utility that aims to detect residual connections in arbitrary model architectures and modify the network to add regularization blocks (such as StochasticDepth).
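To make the last item concrete, here is what such a modified block might look like if written by hand: a residual block whose branch is wrapped in row-mode stochastic depth, which drops the entire branch for a random subset of samples during training and rescales the kept ones by 1/(1 - p). The `stochastic_depth` helper and `ResidualBlock` class are hypothetical illustrations, not the output of the prototype; TorchVision's own operator is `torchvision.ops.StochasticDepth(p, mode)`.

```python
import torch
from torch import nn


def stochastic_depth(x: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Row mode: drop the whole tensor slice of each sample with probability p."""
    if not training or p == 0.0:
        return x
    if p == 1.0:
        return torch.zeros_like(x)
    survival = 1.0 - p
    # One Bernoulli draw per sample, broadcast over channels and spatial dims.
    shape = (x.shape[0],) + (1,) * (x.dim() - 1)
    mask = torch.empty(shape, dtype=x.dtype, device=x.device).bernoulli_(survival)
    # Rescale kept samples so the expected output matches evaluation mode.
    return x * mask / survival


class ResidualBlock(nn.Module):
    def __init__(self, channels: int, p: float = 0.2):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The identity path is always kept; only the residual branch is dropped.
        return x + stochastic_depth(self.branch(x), self.p, self.training)
```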

Finally, there are a few new features in our backlog (PRs coming soon):

Nicholas Hug is working to add the RAFT model for Optical Flow.
Yiwen Song is planning to implement Vision Transformer (ViT) in TorchVision.

I hope you found the above summary interesting. Any ideas on how to adapt the format of the blog series are very welcome. Hit me up on LinkedIn or Twitter.