A Detailed Look at torch.nn.Linear in PyTorch, with Examples
2022-06-21 18:03:53 Source: 易采站长站
Contents
Preface
1. How nn.Linear works
2. Using nn.Linear
3. Source definition of nn.Linear
Addendum: details worth spelling out
Summary

Preface
While studying the transformer, I ran into nn.Linear() again and again, so this article goes through nn.Linear in detail.
Reference: https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html
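1. How nn.Linear works:
As the name suggests, nn.Linear performs a linear transformation. Its prototype is the linear function from elementary mathematics: y=kx+b
In deep learning, however, the variables are multi-dimensional tensors, the multiplication is a matrix multiplication, and the addition is a (broadcast) matrix addition, so the computation nn.Linear() actually performs is:
output = input @ weight.T + bias
@: matrix multiplication in Python
input: the input Tensor; it may have any number of dimensions
weight: the learnable weights, shape=(out_features, in_features)
bias: the learnable bias, shape=(out_features)
in_features: the first argument of the nn.Linear constructor, i.e. the size of the last dimension of the input Tensor
out_features: the second argument of the nn.Linear constructor, i.e. the size of the last dimension of the returned Tensor
output: the output Tensor; it has the same leading dimensions as the input, and only the last dimension changes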
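The relationship between nn.Linear and the matrix formula can be checked numerically. The following is a minimal sketch, assuming a working PyTorch install; the layer sizes and the variable names `layer`, `x`, and `manual` are chosen only for illustration. It relies on the fact, stated in the source docstring, that the layer computes `y = xA^T + b`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(in_features=4, out_features=3)
x = torch.randn(5, 4)  # a batch of 5 samples with 4 features each

# weight is stored as (out_features, in_features), so the input is
# multiplied by the transpose of the weight, then the bias is broadcast.
manual = x @ layer.weight.T + layer.bias
output = layer(x)

print(output.shape)
print(torch.allclose(output, manual, atol=1e-6))
```

Because the weight has shape (out_features, in_features), writing `weight @ input` would only work for a single column vector; the transposed form handles batched inputs of any leading shape.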
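2. Using nn.Linear:
Common import: import torch.nn as nn
Initializing nn.Linear():
nn.Linear(in_features, out_features, bias=True)
in_features: int, the size of the last dimension of the input Tensor passed to forward
out_features: int, the size of the last dimension of the Tensor returned by forward
bias: bool, whether the linear transformation adds a bias term; defaults to True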
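To make the constructor arguments concrete, the sketch below builds one layer with the default bias and one with `bias=False`, then counts their parameters. The helper `count_params` is a hypothetical name introduced here for illustration; it is not part of PyTorch.

```python
import torch.nn as nn

with_bias = nn.Linear(20, 40)                  # bias defaults to True
without_bias = nn.Linear(20, 40, bias=False)   # no additive bias is learned


def count_params(module: nn.Module) -> int:
    """Hypothetical helper: total number of learnable scalars in a module."""
    return sum(p.numel() for p in module.parameters())


print(tuple(with_bias.weight.shape))   # (out_features, in_features)
print(tuple(with_bias.bias.shape))     # (out_features,)
print(without_bias.bias)               # None when bias=False
print(count_params(with_bias), count_params(without_bias))
```

With the bias enabled the layer holds out_features * in_features + out_features values; without it, only the weight matrix.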
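Running nn.Linear() (that is, calling its forward function):
m = nn.Linear(in_features, out_features)
output = m(input)
input: the input Tensor; it may have any number of dimensions, and its last dimension must equal in_features
output: the output Tensor; same shape as the input except that the last dimension equals out_features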
Examples:
2-D Tensor:
import torch
import torch.nn as nn
m = nn.Linear(20, 40)
input = torch.randn(128, 20)
output = m(input)
print(output.size())  # torch.Size([128, 40])
4-D Tensor:
m = nn.Linear(128, 64)
input = torch.randn(512, 3, 128, 128)
output = m(input)
print(output.size())  # torch.Size([512, 3, 128, 64])
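The two examples generalize: only the last dimension is transformed, and every leading dimension is carried through unchanged. The sketch below, with shapes chosen arbitrarily for illustration, also shows what happens when the last dimension does not match in_features.

```python
import torch
import torch.nn as nn

m = nn.Linear(128, 64)

# Any number of leading dimensions is accepted; only the last one changes.
for shape in [(128,), (10, 128), (2, 3, 128), (2, 3, 5, 128)]:
    out = m(torch.randn(*shape))
    print(shape, "->", tuple(out.shape))

# A last dimension different from in_features is rejected at call time.
try:
    m(torch.randn(10, 100))
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)
```

The error appears when forward is called, not when the layer is constructed, because the layer does not see the input shape until then.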
3. Source definition of nn.Linear:
import math
import torch
import torch.nn as nn
from torch import Tensor
from torch.nn.parameter import Parameter, UninitializedParameter
from torch.nn import functional as F
from torch.nn import init
# from .lazy import LazyModuleMixin


# A copy of nn.Linear with trace prints added, to show when each method runs.
class myLinear(nn.Module):
    r"""Applies a linear transformation to the incoming data: :math:`y = xA^T + b`

    This module supports :ref:`TensorFloat32<tf32_on_ampere>`.

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        bias: If set to ``False``, the layer will not learn an additive bias.
            Default: ``True``

    Shape:
        - Input: :math:`(*, H_{in})` where :math:`*` means any number of
          dimensions including none and :math:`H_{in} = \text{in\_features}`.
        - Output: :math:`(*, H_{out})` where all but the last dimension
          are the same shape as the input and :math:`H_{out} = \text{out\_features}`.

    Attributes:
        weight: the learnable weights of the module of shape
            :math:`(\text{out\_features}, \text{in\_features})`. The values are
            initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in\_features}}`
        bias: the learnable bias of the module of shape :math:`(\text{out\_features})`.
            If :attr:`bias` is ``True``, the values are initialized from
            :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{1}{\text{in\_features}}`

    Examples::

        >>> m = nn.Linear(20, 30)
        >>> input = torch.randn(128, 20)
        >>> output = m(input)
        >>> print(output.size())
        torch.Size([128, 30])
    """
    __constants__ = ['in_features', 'out_features']
    in_features: int
    out_features: int
    weight: Tensor

    def __init__(self, in_features: int, out_features: int, bias: bool = True,
                 device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(myLinear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
        if bias:
            self.bias = Parameter(torch.empty(out_features, **factory_kwargs))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # Setting a=sqrt(5) in kaiming_uniform is the same as initializing with
        # uniform(-1/sqrt(in_features), 1/sqrt(in_features)). For details, see
        # https://github.com/pytorch/pytorch/issues/57109
        print("333")  # trace marker: printed once, during construction
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            init.uniform_(self.bias, -bound, bound)

    def forward(self, input: Tensor) -> Tensor:
        print("111")  # trace marker: printed on every forward call
        print("self.weight.shape =", tuple(self.weight.shape))
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self) -> str:
        print("www")  # trace marker: printed when the module is printed
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )


# m = myLinear(20, 40)
# input = torch.randn(128, 40, 20)
# output = m(input)
# print(output.size())

m = myLinear(128, 64)
input = torch.randn(512, 3, 128, 128)
output = m(input)
print(output.size())  # torch.Size([512, 3, 128, 64])
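The comment in reset_parameters says that `kaiming_uniform_` with `a=sqrt(5)` amounts to sampling from uniform(-1/sqrt(in_features), 1/sqrt(in_features)). The sketch below checks that bound empirically on a freshly built nn.Linear; the sizes are arbitrary and the small tolerance is an assumption to absorb float32 rounding.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
in_features, out_features = 128, 64
m = nn.Linear(in_features, out_features)

# Expected bound for both weight and bias: 1 / sqrt(fan_in).
bound = 1 / math.sqrt(in_features)
w_max = m.weight.abs().max().item()
b_max = m.bias.abs().max().item()

print(f"bound={bound:.4f}  max|w|={w_max:.4f}  max|b|={b_max:.4f}")
```

The reasoning behind the bound: with a = sqrt(5) the Kaiming gain is sqrt(2 / (1 + 5)) = sqrt(1/3), so the uniform limit sqrt(3) * gain / sqrt(fan_in) simplifies to 1 / sqrt(fan_in).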
4. Official source code of nn.Linear:
import math

import torch
from torch import Tensor
from torch.nn.parameter import Parameter, UninitializedParameter
from .. import functional as F
from .. import init
from .module import Module
from .lazy import LazyModuleMixin


class Identity(Module):
    r"""A placeholder identity operator that is argument-insensitive.

    Args:
        args: any argument (unused)
        kwargs: any keyword argument (unused)

    Shape:
        - Input: :math:`(*)`, where :math:`*` means any number of dimensions.
        - Output: :math:`(*)`, same shape as the input.

    Examples::

        >>> m = nn.Identity(54, unused_argument1=0.1, unused_argument2=False)
        >>> input = torch.randn(128, 20)
        >>> output = m(input)
        >>> print(output.size())
        torch.Size([128, 20])

    """
    def __init__(self, *args, **kwargs):
        super(Identity, self).__init__()

    def forward(self, input: Tensor) -> Tensor:
        return input


class Linear(Module):
    r"""Applies a linear transformation to the incoming data: :math:`y = xA^T + b`

    This module supports :ref:`TensorFloat32<tf32_on_ampere>`.

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        bias: If set to ``False``, the layer will not learn an additive bias.
            Default: ``True``

    Shape:
        - Input: :math:`(*, H_{in})` where :math:`*` means any number of
          dimensions including none and :math:`H_{in} = \text{in\_features}`.
        - Output: :math:`(*, H_{out})` where all but the last dimension
          are the same shape as the input and :math:`H_{out} = \text{out\_features}`.

    Attributes:
        weight: the learnable weights of the module of shape
            :math:`(\text{out\_features}, \text{in\_features})`. The values are
            initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in\_features}}`
        bias: the learnable bias of the module of shape :math:`(\text{out\_features})`.
            If :attr:`bias` is ``True``, the values are initialized from
            :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{1}{\text{in\_features}}`

    Examples::

        >>> m = nn.Linear(20, 30)
        >>> input = torch.randn(128, 20)
        >>> output = m(input)
        >>> print(output.size())
        torch.Size([128, 30])
    """
    __constants__ = ['in_features', 'out_features']
    in_features: int
    out_features: int
    weight: Tensor

    def __init__(self, in_features: int, out_features: int, bias: bool = True,
                 device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
        if bias:
            self.bias = Parameter(torch.empty(out_features, **factory_kwargs))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # Setting a=sqrt(5) in kaiming_uniform is the same as initializing with
        # uniform(-1/sqrt(in_features), 1/sqrt(in_features)). For details, see
        # https://github.com/pytorch/pytorch/issues/57109
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            init.uniform_(self.bias, -bound, bound)

    def forward(self, input: Tensor) -> Tensor:
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self) -> str:
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )


# This class exists solely to avoid triggering an obscure error when scripting
# an improperly quantized attention layer. See this issue for details:
# https://github.com/pytorch/pytorch/issues/58969
# TODO: fail fast on quantization API usage error, then remove this class
# and replace uses of it with plain Linear
class NonDynamicallyQuantizableLinear(Linear):
    def __init__(self, in_features: int, out_features: int, bias: bool = True,
                 device=None, dtype=None) -> None:
        super().__init__(in_features, out_features, bias=bias,
                         device=device, dtype=dtype)


class Bilinear(Module):
    r"""Applies a bilinear transformation to the incoming data:
    :math:`y = x_1^T A x_2 + b`

    Args:
        in1_features: size of each first input sample
        in2_features: size of each second input sample
        out_features: size of each output sample
        bias: If set to False, the layer will not learn an additive bias.
            Default: ``True``

    Shape:
        - Input1: :math:`(*, H_{in1})` where :math:`H_{in1}=\text{in1\_features}` and
          :math:`*` means any number of additional dimensions including none. All but the last dimension
          of the inputs should be the same.
        - Input2: :math:`(*, H_{in2})` where :math:`H_{in2}=\text{in2\_features}`.
        - Output: :math:`(*, H_{out})` where :math:`H_{out}=\text{out\_features}`
          and all but the last dimension are the same shape as the input.

    Attributes:
        weight: the learnable weights of the module of shape
            :math:`(\text{out\_features}, \text{in1\_features}, \text{in2\_features})`.
            The values are initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in1\_features}}`
        bias: the learnable bias of the module of shape :math:`(\text{out\_features})`.
            If :attr:`bias` is ``True``, the values are initialized from
            :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in1\_features}}`

    Examples::

        >>> m = nn.Bilinear(20, 30, 40)
        >>> input1 = torch.randn(128, 20)
        >>> input2 = torch.randn(128, 30)
        >>> output = m(input1, input2)
        >>> print(output.size())
        torch.Size([128, 40])
    """
    __constants__ = ['in1_features', 'in2_features', 'out_features']
    in1_features: int
    in2_features: int
    out_features: int
    weight: Tensor

    def __init__(self, in1_features: int, in2_features: int, out_features: int, bias: bool = True,
                 device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(Bilinear, self).__init__()
        self.in1_features = in1_features
        self.in2_features = in2_features
        self.out_features = out_features
        self.weight = Parameter(torch.empty((out_features, in1_features, in2_features), **factory_kwargs))

        if bias:
            self.bias = Parameter(torch.empty(out_features, **factory_kwargs))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        bound = 1 / math.sqrt(self.weight.size(1))
        init.uniform_(self.weight, -bound, bound)
        if self.bias is not None:
            init.uniform_(self.bias, -bound, bound)

    def forward(self, input1: Tensor, input2: Tensor) -> Tensor:
        return F.bilinear(input1, input2, self.weight, self.bias)

    def extra_repr(self) -> str:
        return 'in1_features={}, in2_features={}, out_features={}, bias={}'.format(
            self.in1_features, self.in2_features, self.out_features, self.bias is not None
        )


class LazyLinear(LazyModuleMixin, Linear):
    r"""A :class:`torch.nn.Linear` module where `in_features` is inferred.

    In this module, the `weight` and `bias` are of :class:`torch.nn.UninitializedParameter`
    class. They will be initialized after the first call to ``forward`` is done and the
    module will become a regular :class:`torch.nn.Linear` module. The ``in_features`` argument
    of the :class:`Linear` is inferred from the ``input.shape[-1]``.

    Check the :class:`torch.nn.modules.lazy.LazyModuleMixin` for further documentation
    on lazy modules and their limitations.

    Args:
        out_features: size of each output sample
        bias: If set to ``False``, the layer will not learn an additive bias.
            Default: ``True``

    Attributes:
        weight: the learnable weights of the module of shape
            :math:`(\text{out\_features}, \text{in\_features})`. The values are
            initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in\_features}}`
        bias: the learnable bias of the module of shape :math:`(\text{out\_features})`.
            If :attr:`bias` is ``True``, the values are initialized from
            :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{1}{\text{in\_features}}`


    """

    cls_to_become = Linear  # type: ignore[assignment]
    weight: UninitializedParameter
    bias: UninitializedParameter  # type: ignore[assignment]

    def __init__(self, out_features: int, bias: bool = True,
                 device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        # bias is hardcoded to False to avoid creating tensor
        # that will soon be overwritten.
        super().__init__(0, 0, False)
        self.weight = UninitializedParameter(**factory_kwargs)
        self.out_features = out_features
        if bias:
            self.bias = UninitializedParameter(**factory_kwargs)

    def reset_parameters(self) -> None:
        if not self.has_uninitialized_params() and self.in_features != 0:
            super().reset_parameters()

    def initialize_parameters(self, input) -> None:  # type: ignore[override]
        if self.has_uninitialized_params():
            with torch.no_grad():
                self.in_features = input.shape[-1]
                self.weight.materialize((self.out_features, self.in_features))
                if self.bias is not None:
                    self.bias.materialize((self.out_features,))
                self.reset_parameters()
# TODO: PartialLinear - maybe in sparse?
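The LazyLinear class in the source above defers the choice of in_features until the first forward call. A minimal sketch of that behavior follows; the input shape is arbitrary, and the warning filter is an assumption added because some PyTorch versions emit a UserWarning when a lazy module is created.

```python
import warnings

import torch
import torch.nn as nn
from torch.nn.parameter import UninitializedParameter

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # some versions warn that lazy modules are experimental
    lazy = nn.LazyLinear(out_features=64)

# Before the first forward call the weight has no shape yet.
was_uninitialized = isinstance(lazy.weight, UninitializedParameter)

x = torch.randn(8, 3, 128)
y = lazy(x)  # the first call sets in_features = x.shape[-1] and materializes the parameters

print(was_uninitialized, lazy.in_features, tuple(lazy.weight.shape), tuple(y.shape))
```

This is convenient when the size of the previous layer's output is awkward to compute by hand, at the cost of needing one forward pass before the parameters exist.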
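Addendum: details worth spelling out
1) nn.Linear is a class; to use it, you instantiate it.
2) When instantiating, nn.Linear takes two required arguments: in_features is the number of neurons in the previous layer, and out_features is the number of neurons in this layer.
3) You do not need to define w and b yourself. Layers with learnable parameters, such as nn.Linear, generate random initial values for w and b at instantiation, so afterwards you can read the weight and bias attributes to inspect them. w is always generated; whether b is generated is under our control. nn.Linear has a bias parameter that defaults to True; if it is set to False, the layer does not learn an additive bias.
4) Because w and b are generated randomly, running the same code several times gives different results. To control the randomness, use the torch.random module, for example: torch.random.manual_seed(420)  # set the random seed manually
5) Because the constant term b does not have to be defined separately, the feature tensor does not need a column reserved for multiplying with the constant; just pass in the feature tensor itself.
6) There is only one input layer, and its structure (number of neurons) is determined by the feature tensor X, so the input layer does not need to be defined when building a neural network in PyTorch.
7) After instantiation, pass the feature tensor to the instance.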
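Points 4) and 5) can both be checked in a few lines. The sketch below uses `torch.manual_seed`, which is the same function exposed as `torch.random.manual_seed`; the names `X_aug` and `W_aug` are hypothetical and only illustrate the "column of ones" formulation that the bias makes unnecessary.

```python
import torch
import torch.nn as nn

# Point 4: resetting the same seed reproduces the same initial parameters.
torch.manual_seed(420)
a = nn.Linear(2, 3)
torch.manual_seed(420)
b = nn.Linear(2, 3)
same_init = torch.equal(a.weight, b.weight) and torch.equal(a.bias, b.bias)

# Point 5: the bias plays the role of a constant column in the features.
X = torch.randn(4, 2)
X_aug = torch.cat([X, torch.ones(4, 1)], dim=1)            # features plus a column of ones
W_aug = torch.cat([a.weight, a.bias.unsqueeze(1)], dim=1)  # [W | b], shape (3, 3)
equivalent = torch.allclose(a(X), X_aug @ W_aug.T, atol=1e-6)

print(same_init, equivalent)
```

Since the layer adds the bias itself, the plain feature tensor `X` is all that needs to be passed in.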
Summary
That concludes this detailed walkthrough of torch.nn.Linear in PyTorch.