PyTorch introduction1 | lovelyfrog's blog

PyTorch

Deep Learning

Published: 2019-04-26

Word count: 2k

This post covers some PyTorch basics, such as tensors and operations, as well as PyTorch's automatic differentiation mechanism
It is based on DEEP LEARNING WITH PYTORCH: A 60 MINUTE BLITZ

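What is PyTorch

PyTorch is a scientific computing package that serves as:

a replacement for NumPy (one that can use GPUs)
a deep learning framework that offers maximum flexibility and speed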

Getting started

tensors

tensors are similar to NumPy's ndarrays, with the addition that they can run on a GPU to accelerate computation

import torch

Construct an uninitialized 5x3 matrix

x = torch.empty(5,3)
print(x)

tensor([[0.0000e+00, 8.5899e+09, 0.0000e+00],
        [8.5899e+09, 5.6052e-45, 1.4714e-43],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [1.6956e-43, 0.0000e+00, 4.5318e-22]])

Construct a randomly initialized matrix

x = torch.rand(5,3)
print(x)

tensor([[0.5588, 0.6994, 0.2045],
        [0.4634, 0.0390, 0.9279],
        [0.8828, 0.4020, 0.3207],
        [0.0759, 0.3997, 0.0225],
        [0.5889, 0.9263, 0.1755]])

Construct a matrix filled with zeros and of dtype long

x = torch.zeros(5,3,dtype=torch.long)
print(x)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

Construct a tensor directly from data

x = torch.tensor([1,2])
print(x)

tensor([1, 2])

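You can also create a tensor from an existing tensor. These methods reuse the input tensor's properties, such as dtype, unless new values are provided

x = x.new_ones(5,3, dtype=torch.double)    # new_* methods take sizes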
print(x)
x = torch.randn_like(x, dtype=torch.float)  # override dtype
print(x)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 0.5565, -3.5219,  1.5671],
        [ 0.5087,  0.6783,  0.6356],
        [ 0.7736,  1.0188,  0.7899],
        [ 0.4047,  1.5271, -0.0099],
        [-0.9604, -1.4309,  1.8827]])

Get the size of x

print(x.size())

torch.Size([5, 3])

note:
torch.Size is in fact a tuple, so it supports all tuple operations
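
To make the note concrete, here is a minimal sketch of ordinary tuple idioms applied to a size. The names `size`, `rows` and `cols` are only illustrative.

```python
import torch

x = torch.rand(5, 3)
size = x.size()

# torch.Size subclasses tuple, so unpacking, len() and comparison all work
rows, cols = size
print(isinstance(size, tuple))   # True
print(rows, cols, len(size))     # 5 3 2
print(size == (5, 3))            # True
```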

operations

There are multiple syntaxes for operations; let's look at addition first

addition: syntax 1

y = torch.rand(5,3)
print(x+y)

tensor([[ 1.0673, -3.4936,  2.1493],
        [ 0.8217,  1.6515,  1.5904],
        [ 1.2576,  1.8584,  0.9448],
        [ 1.1150,  1.9759,  0.3460],
        [-0.8035, -0.8520,  1.9764]])

addition: syntax 2

print(torch.add(x,y))

tensor([[ 1.0673, -3.4936,  2.1493],
        [ 0.8217,  1.6515,  1.5904],
        [ 1.2576,  1.8584,  0.9448],
        [ 1.1150,  1.9759,  0.3460],
        [-0.8035, -0.8520,  1.9764]])

addition: providing an output tensor as an argument

result = torch.empty(5,3)
torch.add(x, y, out=result)
print(result)

tensor([[ 1.0673, -3.4936,  2.1493],
        [ 0.8217,  1.6515,  1.5904],
        [ 1.2576,  1.8584,  0.9448],
        [ 1.1150,  1.9759,  0.3460],
        [-0.8035, -0.8520,  1.9764]])

addition：in-place

y.add_(x)
print(y)

tensor([[ 1.0673, -3.4936,  2.1493],
        [ 0.8217,  1.6515,  1.5904],
        [ 1.2576,  1.8584,  0.9448],
        [ 1.1150,  1.9759,  0.3460],
        [-0.8035, -0.8520,  1.9764]])

Any operation that mutates a tensor in place is post-fixed with an _. For example, x.copy_(y) and x.t_() will change x
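
A small sketch of the difference between the two kinds of call, assuming nothing beyond the methods named above (`z` is just an illustrative name):

```python
import torch

x = torch.ones(2, 3)
y = torch.zeros(2, 3)

z = x.add(y)      # out-of-place: returns a new tensor, x is unchanged
x.add_(1)         # in-place: x itself is now all 2s
y.copy_(x)        # in-place: y now holds a copy of x's values
x.t_()            # in-place transpose: x's shape becomes (3, 2)

print(z)                    # still all 1s
print(x.size(), y.size())   # torch.Size([3, 2]) torch.Size([2, 3])
```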

You can use any NumPy-like indexing on a tensor

print(x[:,1])

tensor([-3.5219,  0.6783,  1.0188,  1.5271, -1.4309])

resizing: if you want to resize/reshape a tensor, you can use the tensor's view method (x.view); a size of -1 is inferred from the other dimensions

x = torch.randn(4,4)
y = x.view(16)
z = x.view(-1,8)
print(x.size(), y.size(), z.size())
print(x)
print(y)
print(z)

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
tensor([[-0.6481,  0.7575, -0.7781, -0.1815],
        [ 0.9559, -0.8391, -0.7656, -1.6857],
        [-0.6311, -0.5878, -1.5626,  1.7300],
        [-0.1870,  0.3486, -0.5177, -0.6897]])
tensor([-0.6481,  0.7575, -0.7781, -0.1815,  0.9559, -0.8391, -0.7656, -1.6857,
        -0.6311, -0.5878, -1.5626,  1.7300, -0.1870,  0.3486, -0.5177, -0.6897])
tensor([[-0.6481,  0.7575, -0.7781, -0.1815,  0.9559, -0.8391, -0.7656, -1.6857],
        [-0.6311, -0.5878, -1.5626,  1.7300, -0.1870,  0.3486, -0.5177, -0.6897]])
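
One point the output above does not show is that a view shares memory with the tensor it came from. The following sketch illustrates this on a tensor of zeros so the effect is easy to see:

```python
import torch

x = torch.zeros(4, 4)
y = x.view(16)       # the same 16 elements, seen as a flat vector
z = x.view(-1, 8)    # -1 is inferred: 16 / 8 = 2 rows

y[0] = 1.0                            # writing through the view ...
print(x[0, 0].item())                 # ... is visible in x: 1.0
print(z.size())                       # torch.Size([2, 8])
print(x.data_ptr() == y.data_ptr())   # True: same underlying storage
```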

If you have a one-element tensor, use .item() to get its value as a Python number

x = torch.randn(1)
print(x)
print(x.item())

tensor([-0.6449])
-0.6449040174484253

Read Later
more operations

Numpy bridge

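Converting a torch tensor to a NumPy array (and vice versa) is easy. As long as the tensor lives on the CPU, the two share the underlying memory location, so changing one also changes the other

Converting a torch tensor to a NumPy array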

a = torch.ones(5)
print(a)

tensor([1., 1., 1., 1., 1.])

b = a.numpy()
print(b)
b

[1. 1. 1. 1. 1.]





array([1., 1., 1., 1., 1.], dtype=float32)

See how the NumPy array changes in value

a.add_(1)
print(a)
print(b)

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]

Converting a NumPy array to a torch tensor

See how the torch tensor changes along with it

import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

All tensors on the CPU except CharTensor support conversion to NumPy and back

CUDA Tensors

If you have CUDA you can try this out yourself, but I don't have it
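
For readers who do have a GPU, a minimal sketch of moving tensors between devices with .to might look like the following. It is guarded by torch.cuda.is_available() so it also runs unchanged on a CPU-only machine; the variable names are illustrative.

```python
import torch

# pick the GPU if one is visible, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(3)
y = torch.ones_like(x, device=device)  # create a tensor directly on the device
x = x.to(device)                       # or move an existing tensor with .to
z = x + y                              # computed on `device`

print(z.device)
print(z.to("cpu", torch.double))       # .to can also change dtype in the same call
```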

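Autograd: automatic differentiation

autograd is the package at the heart of neural networks in PyTorch: it provides automatic differentiation for all operations on tensors. It is a define-by-run framework, which means that your backprop is defined by how your code runs, and every iteration can be different.
Let's look at this through a few examples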

Tensor

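torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it tracks all operations performed on it. When you finish your computation, calling .backward() computes all the gradients automatically, and the gradient for this tensor is accumulated into its .grad attribute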
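
The gradient is added to .grad rather than overwriting it, which is easy to miss. A minimal sketch, in which `first` and `second` are illustrative names for snapshots of the gradient:

```python
import torch

x = torch.ones(2, requires_grad=True)

(x * 3).sum().backward()
first = x.grad.clone()
print(first)     # tensor([3., 3.])

(x * 3).sum().backward()
second = x.grad.clone()
print(second)    # tensor([6., 6.]): the second call added to the first

x.grad.zero_()   # reset if accumulation is not wanted
print(x.grad)    # tensor([0., 0.])
```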

To stop a tensor from tracking history, you can call .detach() to detach it from the computation history and prevent future computation from being tracked
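
As a rough illustration of .detach(), with `d` and `w` as illustrative names:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 2
d = y.detach()            # same values as y, but cut off from the graph

print(y.requires_grad, d.requires_grad)   # True False
print(d.grad_fn)                          # None
w = d * 3                 # later operations on d are not tracked
print(w.requires_grad)                    # False
```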

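To prevent history tracking (and the memory it uses), you can wrap a code block in with torch.no_grad():. This is especially helpful when evaluating a model: the model may have trainable parameters with requires_grad=True, but we do not need their gradients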
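
A sketch of that evaluation pattern follows. The nn.Linear layer and the random batch are hypothetical stand-ins for a real trained model and real data.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)        # hypothetical stand-in for a trained model
inputs = torch.randn(8, 4)     # hypothetical evaluation batch

model.eval()
with torch.no_grad():
    preds = model(inputs)      # no graph is recorded for this forward pass

print(preds.requires_grad)         # False
print(model.weight.requires_grad)  # True: the parameters themselves are unchanged
```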

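Another class that is very important to the autograd implementation is Function

Tensor and Function are interconnected and build up an acyclic graph that encodes the complete history of the computation. Each Tensor has a .grad_fn attribute that references the Function that created it (except for Tensors created by the user, whose grad_fn is None)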
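
The graph can be inspected directly. This is only a sketch: the exact objects printed for grad_fn and next_functions vary between PyTorch versions.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)   # created by the user
y = x + 2                                  # created by an operation
out = (y * y).mean()

print(x.grad_fn, x.is_leaf)      # None True
print(y.grad_fn is not None)     # True
# each grad_fn links back to the functions that produced its inputs
print(out.grad_fn.next_functions)
```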

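If you want to compute derivatives, call .backward() on a Tensor. If the Tensor is a scalar (that is, it holds a single element), you do not need to pass any arguments to backward(). If it has more elements, you need to pass a gradient argument, which is a tensor of the same shape

Create a tensor and set requires_grad=True to track computation with it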

x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Do a tensor operation

y = x + 2

print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

y was created by an operation (addition), so it has a grad_fn

print(y.grad_fn)

<AddBackward0 object at 0x11c1ef4a8>

Do more operations on y

z = y * y * 3
out = z.mean()

print(z)
print(out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
tensor(27., grad_fn=<MeanBackward1>)

.requires_grad_(...) changes an existing Tensor's requires_grad flag in place. A tensor's requires_grad flag is False by default if it is not specified when the tensor is created

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x11c1f85f8>

Gradients

Let's backprop now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.))

out.backward()

Print the gradients d(out)/dx

print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
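
The value 4.5 can be checked by hand. Writing \(o\) for out (a symbol introduced here only for brevity), and using the definitions of y, z and out from the code above:

\[
o = \frac{1}{4}\sum_i z_i, \qquad z_i = 3(x_i+2)^2, \qquad z_i\big|_{x_i=1} = 27
\]

\[
\frac{\partial o}{\partial x_i} = \frac{1}{4}\cdot 6(x_i+2) = \frac{3}{2}(x_i+2), \qquad \frac{\partial o}{\partial x_i}\bigg|_{x_i=1} = \frac{9}{2} = 4.5
\]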

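If we have a vector-valued function \(\vec{y}=f(\vec{x})\), then the gradient of \(\vec{y}\) with respect to \(\vec{x}\) is a Jacobian matrix \(J\) with entries \(J_{ij} = \frac{\partial y_i}{\partial x_j}\)

torch.autograd is an engine for computing vector-Jacobian products. If \(v\) is the gradient of a scalar function \(l=g(\vec{y})\), that is \(v = (\frac{\partial{l}}{\partial{y_1}}, \ldots, \frac{\partial{l}}{\partial{y_m}})\), then by the chain rule the vector-Jacobian product \(J^{T} v\) is the gradient of \(l\) with respect to \(\vec{x}\)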
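
Written out, with \(n\) assumed here to be the length of \(\vec{x}\) and \(J\) denoting the Jacobian of \(\vec{y}\) with respect to \(\vec{x}\):

\[
J = \begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix}
\]

\[
\left(J^{T} v\right)_j = \sum_{i=1}^{m} \frac{\partial y_i}{\partial x_j}\,\frac{\partial l}{\partial y_i} = \frac{\partial l}{\partial x_j}, \qquad j = 1, \ldots, n
\]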

This property of the vector-Jacobian product makes it easy to feed external gradients into a model. Let's look at an example of a vector-Jacobian product

x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

tensor([ 805.7248, -670.8773,  -32.1244], grad_fn=<MulBackward0>)

Now y is no longer a scalar, so torch.autograd cannot compute the full Jacobian directly. If we only want the vector-Jacobian product, though, we just pass the vector to backward as an argument

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])
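
In the run above, each entry of x.grad is 512 times the matching entry of v, because y is x multiplied elementwise by 2 a total of nine times, so the Jacobian is 512 times the identity. For a less trivial check, the sketch below compares backward(v) with an explicitly built Jacobian. The function `f` is a hypothetical example, and torch.autograd.functional.jacobian is assumed to be available, which requires a PyTorch release newer than the one this post was written against.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # hypothetical elementwise function; its Jacobian is diag(2 * x)
    return x ** 2

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

# vector-Jacobian product as computed by backward
f(x).backward(v)
vjp_from_backward = x.grad.clone()

# the same product from the full 3x3 Jacobian
J = jacobian(f, x.detach())
vjp_explicit = J.t() @ v

print(vjp_from_backward)
print(torch.allclose(vjp_from_backward, vjp_explicit))   # True
```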

You can also stop autograd from tracking history on tensors with .requires_grad=True by wrapping the code block in with torch.no_grad():

print(x.requires_grad)
print(y.requires_grad)
print((x * 2).requires_grad)

with torch.no_grad():
    print((x * 2).requires_grad)

True
True
True
False

Read Later:
Documentation of autograd and Function

lovelyfrog

http://lovelyfrog.github.io/2019/04/26/PyTorch_intro1/

Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit lovelyfrog as the source when reposting!
