PyTorch review: A deep learning framework built for speed

PyTorch 1.0 shines for rapid prototyping with dynamic neural networks, auto-differentiation, deep Python integration, and strong support for GPUs

Page 2 of 2
# -*- coding: utf-8 -*-
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
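# reduction='sum' adds up the squared errors instead of averaging them
# (it replaces the deprecated size_average=False argument).
loss_fn = torch.nn.MSELoss(reduction='sum')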

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because, by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()
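
The comment about accumulated gradients is easier to appreciate with a tiny experiment. The following is a minimal sketch, not part of the original tutorial code; the scalar w and the factor 3 are arbitrary choices made for illustration.

```python
import torch

# A single learnable scalar, chosen only for illustration.
w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(3*w)/dw = 3.
(3 * w).backward()
first = w.grad.item()

# Second backward pass without zeroing: the new gradient is added to the old one.
(3 * w).backward()
accumulated = w.grad.item()

# Zero the stored gradient in place, then run backward again.
w.grad.zero_()
(3 * w).backward()
after_zero = w.grad.item()

print(first, accumulated, after_zero)
```

Without the reset, the second gradient is twice the true value, which is why the training loop calls optimizer.zero_grad() on every iteration.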

PyTorch installation

You can install PyTorch on Linux, macOS, or Windows using conda or pip, or by building from source code, on Python 2.7, 3.5, or 3.6. PyTorch supports CUDA 8, 9, and 9.1, as well as CPU-only operation. On a Mac, as shown in the figure below, CUDA support requires building from source. That’s worse than it sounds, because the latest versions of PyTorch, CUDA, and Xcode are incompatible.

Figure: PyTorch installation (IDG)

The PyTorch home page provides a GUI for generating the correct command lines for installing PyTorch with different operating systems, package managers, Python versions, and CUDA versions.

I successfully installed the CPU-only, Python 2.7 version of PyTorch 0.4.0 on a MacBook Pro in about eight seconds using pip:

Martins-Retina-MacBook:~ martinheller$ time sudo pip install torch torchvision

Installing collected packages: torch, pillow, torchvision
  Found existing installation: Pillow 3.3.0
    Uninstalling Pillow-3.3.0:
      Successfully uninstalled Pillow-3.3.0
Successfully installed pillow-5.1.0 torch-0.4.0 torchvision-0.2.1

real    0m8.133s
user    0m3.452s
sys     0m1.490s

Unfortunately, the protobuf package version installed for PyTorch is incompatible with the version of TensorFlow I had installed; I’ll need to update TensorFlow before I can run it again. By the way, this kind of conflict is the motivation for Anaconda’s virtual Python environments. In any case, the PyTorch installation checked out:

Martins-Retina-MacBook:~ martinheller$ python
Python 2.7.10 (default, Oct  6 2017, 22:29:07)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.rand(5,3)
>>> print(x)
tensor([[ 0.3790,  0.9947,  0.9051],
        [ 0.1091,  0.0228,  0.8803],
        [ 0.8954,  0.6769,  0.9435],
        [ 0.5566,  0.8588,  0.1373],
        [ 0.9341,  0.7344,  0.3667]])
>>> print(x.size())
(5, 3)
>>> print(torch.cuda.is_available())
False
>>>

To do serious model training, you’ll want to run PyTorch on a server or workstation-class machine or VM with one or more recent Nvidia GPUs. AWS, Google Cloud Platform, and Azure all support PyTorch 1.0 in their machine learning services and deep learning virtual machine images; IBM Cloud also supports PyTorch in Kubernetes clusters. Even if the image you want to use doesn’t already have the latest PyTorch, installation with pip or conda is easy and quick, as I saw on my laptop.
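
Code written on a CPU-only laptop can move to a GPU machine with little change. Here is a minimal sketch of the usual pattern, assuming a small linear layer as a stand-in for a real model; the layer sizes are illustrative only.

```python
import torch

# Pick a GPU when one is visible to PyTorch; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the model's parameters to the chosen device.
model = torch.nn.Linear(1000, 10).to(device)

# Create the input directly on the same device as the model.
x = torch.randn(64, 1000, device=device)

y_pred = model(x)
print(device, y_pred.shape)
```

On my laptop this would report the CPU, since torch.cuda.is_available() returned False there; on a GPU server the same script should pick up the GPU without edits.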

Adapting PyTorch models for production

For PyTorch 1.0, the project contributors will complete the work of marrying PyTorch and Caffe2, and will add a few additional features. The production goals include:

  • Exporting to C++ runtimes for use in larger projects
  • Optimizing for mobile on iPhone, Android, Qualcomm, and other systems
  • Using more efficient data layouts and performing kernel fusion to do faster inference (saving 10 percent in speed or memory at scale is a big win)
  • Performing quantized inference (such as 8-bit inference) to allow models to run faster and use less power on constrained hardware
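
To make the idea of 8-bit inference concrete, here is a minimal sketch of min/max affine quantization in plain Python. It is a generic textbook scheme, not necessarily what PyTorch or Caffe2 does internally, and the function names quantize and dequantize are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats onto integer codes in [0, 2**num_bits - 1]."""
    levels = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    # Guard against a zero range when all values are identical.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [int(round((v - lo) / scale)) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [c * scale + lo for c in codes]


# Illustrative weights; a real model would have far more of them.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# Rounding to the nearest code bounds the error at about half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, max_error)
```

Each weight now fits in one byte instead of four, at the cost of a small rounding error; that trade is what lets quantized models run faster and use less power.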

Facebook already supports all of these with Caffe2. One of the ways PyTorch is getting this level of production support without any sacrifice in hackability is through torch.jit, a just-in-time (JIT) compiler that takes your PyTorch models at runtime and rewrites them to run at production efficiency. The JIT compiler can also export your model to run in a C++ runtime based on Caffe2 bits.
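
As a rough illustration of the torch.jit workflow, the sketch below traces a small network and serializes the result. It assumes a PyTorch release that ships torch.jit.trace and torch.jit.save (1.0 or later); the two-layer model and the in-memory buffer are stand-ins chosen for the example.

```python
import io

import torch

# A small stand-in model with the same shape as the earlier example.
model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)
model.eval()
example = torch.randn(1, 1000)

# Record the operations the model runs on the example input.
traced = torch.jit.trace(model, example)

# The traced module should give the same result as the eager one.
with torch.no_grad():
    same = torch.allclose(model(example), traced(example))

# Serialize the traced module; a file path would work in place of the buffer.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
print(same, len(buffer.getvalue()))
```

Tracing only records the path taken for the example input, so a model whose control flow depends on the data may need a different approach.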

Although PyTorch is still in beta, the API seems to be stable, and the package is roughly as capable as TensorFlow, CNTK, and MXNet and performs about as well. Because PyTorch APIs all execute immediately, PyTorch models are a bit easier to debug than models that build an acyclic graph to be evaluated later in a session, the way TensorFlow works by default.
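
Immediate execution means intermediate results are ordinary values that you can print or branch on. The following is a minimal sketch of that debugging style; the layer sizes and the branch condition are arbitrary and only for illustration.

```python
import torch

x = torch.randn(4, 3)
hidden = torch.nn.Linear(3, 5)(x)

# The result exists as soon as the line runs, so it can be inspected directly.
print(hidden.shape, hidden.mean().item())

# Ordinary Python control flow can depend on tensor values.
if hidden.mean().item() > 0:
    out = torch.relu(hidden)
else:
    out = torch.tanh(hidden)

print(out.shape)
```

The same inspection works from a debugger breakpoint, since there is no separate graph-building step between writing a line and running it.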

While I wouldn’t rush to convert existing deep learning projects to PyTorch just yet, I’d certainly use it for training new models. If the current progress is anything to go by, PyTorch should be as good as any deep learning framework by the time of the PyTorch 1.0 release later this summer.

At a Glance
  • Pros

    • NumPy-like tensor computations with strong GPU support
    • Dynamic neural networks
    • Tape-based automatic differentiation
    • Compatible with standard Python libraries
    • Strong collection of neural network layers, optimization algorithms, and loss functions

  • Cons

    • Production macOS binaries don’t support GPUs
    • Building from source can be tricky

Copyright © 2018 IDG Communications, Inc.
