【BERT】Explaining the model's return values
Take a call to a pretrained BERT model as an example:
outputs = self.bert(input_ids,
                    attention_mask=attention_mask,
                    token_type_ids=token_type_ids)
outputs contains up to four elements: sequence_output, pooled_output, (hidden_states), (attentions).
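Below is a minimal, self-contained sketch of such a call. The model name 'bert-base-uncased' and the example sentence are assumptions, not part of the original snippet; on recent transformers versions, return_dict=False is needed to get the plain tuple described in this post (older versions return the tuple by default).

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Tokenize a single sentence; the tokenizer supplies input_ids,
# attention_mask and token_type_ids as PyTorch tensors.
encoded = tokenizer("BERT return values, explained.", return_tensors='pt')

with torch.no_grad():
    outputs = model(input_ids=encoded['input_ids'],
                    attention_mask=encoded['attention_mask'],
                    token_type_ids=encoded['token_type_ids'],
                    return_dict=False)  # assumption: recent transformers; older versions return a tuple by default

sequence_output, pooled_output = outputs[0], outputs[1]
print(sequence_output.shape)  # (batch_size, sequence_length, hidden_size)
print(pooled_output.shape)    # (batch_size, hidden_size)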
The official explanation of BERT's return values (from the transformers docstring):
Return:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BertConfig`) and inputs:
last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
Sequence of hidden-states at the output of the last layer of the model.
pooler_output (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, hidden_size)`):
Last layer hidden-state of the first token of the sequence (classification token)
further processed by a Linear layer and a Tanh activation function. The Linear
layer weights are trained from the next sentence prediction (classification)
objective during pre-training.
This output is usually *not* a good summary
of the semantic content of the input, you're often better with averaging or pooling
the sequence of hidden-states for the whole input sequence.
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attention weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
- last_hidden_state: a tensor of shape (batch_size, sequence_length, hidden_size), with hidden_size = 768 for bert-base. It is the sequence of hidden states output by the last layer of the model.
- pooler_output: a tensor of shape (batch_size, hidden_size). It is the last-layer hidden state of the first token of the sequence ([CLS]), further processed by a linear layer and a Tanh activation. This output is usually not a good summary of the semantic content of the input; averaging or pooling the hidden states over the whole input sequence usually works better (see the sketch after this list).
- hidden_states: an optional output, returned only when config.output_hidden_states=True. It is a tuple whose first element is the embedding output and whose remaining elements are the outputs of each layer; every element has shape (batch_size, sequence_length, hidden_size).
- attentions: also optional, returned only when config.output_attentions=True. It is a tuple containing, for each layer, the attention weights after the softmax, which are used to compute the weighted average in the self-attention heads; each element has shape (batch_size, num_heads, sequence_length, sequence_length).
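Here is a hedged sketch of these two optional outputs plus the mean-pooling alternative to pooler_output mentioned above. The model name, example sentence, and variable names are assumptions; output_hidden_states and output_attentions can also be set on the config object instead of being passed to from_pretrained.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased',
                                  output_hidden_states=True,   # adds hidden_states to the outputs
                                  output_attentions=True)      # adds attentions to the outputs

encoded = tokenizer("BERT return values, explained.", return_tensors='pt')
with torch.no_grad():
    outputs = model(**encoded, return_dict=False)

sequence_output, pooled_output, hidden_states, attentions = outputs

print(len(hidden_states))      # embeddings + one entry per layer: 13 for bert-base
print(hidden_states[0].shape)  # embedding output: (batch_size, sequence_length, hidden_size)
print(attentions[-1].shape)    # last layer: (batch_size, num_heads, sequence_length, sequence_length)

# Mean pooling over the real tokens (padding masked out) as a sentence
# representation, usually preferable to pooler_output.
mask = encoded['attention_mask'].unsqueeze(-1).float()                     # (batch_size, sequence_length, 1)
sentence_embedding = (sequence_output * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # (batch_size, hidden_size)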