BERT Model Primer Series (Part 2): Implementing the Attention Model

Overview:

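In the previous article, "BERT Model Primer Series: Introduction to the Attention Mechanism", I used machine translation as an analogy to explain the basic principles of the Encoder-Decoder model and the Attention model. This article is a companion to that one. It implements the models discussed there and explains them in detail, to help deepen our understanding. There is a lot of content, but the underlying principles are fairly simple. The example is based on Python and TensorFlow 2.4. Even if you have not used either before, that is fine: the key parts of the code are explained in detail, and together with the official TensorFlow documentation they should be easy to follow.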

Before we start, here is the order in which things are implemented. Having a clear goal makes it easier to learn:

Text preprocessing
Encoder implementation
Decoder implementation
Attention model implementation
Model training
English -> Chinese translation

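The code for this article has been pushed to github.com/rotbit/nmt.… . It is not intended as a usable commercial product, so it does not chase the best possible results. If it helps you better understand the Encoder, Decoder, and attention models, it has done its job.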

1. Text preprocessing

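Text preprocessing is actually very simple: it converts each sentence into a vector of numbers. For example, "你吃饭了嘛？" ("Have you eaten?") is turned into the vector [2,543,56,12,76]. We do this because computers cannot read characters. They only understand numbers (ultimately 0s and 1s), so the text has to be converted into a form the computer can work with. How is this done? Let's look at the overall flow first.

The flow boils down to a few steps: read the file, preprocess the text, build the dictionary, and convert the text into vectors. A tall building starts from the ground, so let's begin with the most basic functions.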

seg_char: splitting Chinese text into characters

import re

# Split a sentence into single Chinese characters without breaking up English words
# e.g. "我爱tensorflow" -> ['我', '爱', 'tensorflow']
def seg_char(sent):
    # Split around every Chinese character (CJK range \u4e00-\u9fa5); the capturing
    # group keeps the characters themselves, and English runs stay in one piece
    pattern = re.compile(r'([\u4e00-\u9fa5])')
    chars = pattern.split(sent)
    # Drop the empty / whitespace-only pieces produced by the split
    chars = [w for w in chars if len(w.strip()) > 0]
    return chars

For this function, all we need to know is what goes in and what comes out.

preprocess_sentence: sentence preprocessing

import re

# Preprocess text so that tokens are separated by spaces
# w     the text to process
# type  text type, 0: English, 1: Chinese
def preprocess_sentence(w, type):
    if type == 0:
        # Put spaces around punctuation: "tensorflow." -> "tensorflow . "
        w = re.sub(r"([?.!,])", r" \1 ", w)
        # Collapse runs of spaces (and double quotes) into a single space
        w = re.sub(r'[" "]+', " ", w)
    if type == 1:
        # seg_list = jieba.cut(w)
        seg_list = seg_char(w)
        w = " ".join(seg_list)
    w = '<start> ' + w + ' <end>'
    return w

Let's run it and look at the input and output:

en = "I love tensorflow."
pre_en = preprocess_sentence(en, 0)
print("pre_en=", pre_en)
cn = "我爱tensorflow"
pre_cn = preprocess_sentence(cn, 1)
print("pre_cn=", pre_cn)

Output:

pre_en= <start> I love tensorflow .  <end>
pre_cn= <start> 我 爱 tensorflow <end>

Looking at this output, the input text has been separated by spaces, and <start> and <end> have been added at the beginning and end. These markers are used later in training to mark where a text begins and ends.

create_dataset: loading and preprocessing the text

import io

# path          path of the data file
# num_examples  number of lines to read
# Load the text
def create_dataset(path, num_examples):
    with io.open(path, encoding='UTF-8') as f:
        lines = f.read().strip().split('\n')
    # English sentences
    english_words = []
    # Chinese sentences
    chinese_words = []
    for l in lines[:num_examples]:
        # Columns are tab-separated: English, Chinese, attribution
        word_arrs = l.split('\t')
        if len(word_arrs) < 2:
            continue
        english_w = preprocess_sentence(word_arrs[0], 0)
        chinese_w = preprocess_sentence(word_arrs[1], 1)
        english_words.append(english_w)
        chinese_words.append(chinese_w)
    # Returns two index-aligned lists, e.g.
    # ['<start> Hi .  <end>', ...] and ['<start> 嗨 。 <end>', ...]
    return english_words, chinese_words

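The dataset used here, cmn.txt, is available for download. Let's look at a few lines from it. Each line is one sample, split into three tab-separated columns: the first is the English sentence, the second is its Chinese translation, and the third (attribution) is not needed and is simply discarded. create_dataset reads text like this and returns the processed English and Chinese lists.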

Hi.	嗨。	CC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #891077 (Martha)
Hi.	你好。	CC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #4857568 (musclegirlxyp)
Run.	你用跑的。	CC-BY 2.0 (France) Attribution: tatoeba.org #4008918 (JSakuragi) & #3748344 (egg0073)
Wait!	等等！	CC-BY 2.0 (France) Attribution: tatoeba.org #1744314 (belgavox) & #4970122 (wzhd)
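The column layout above is all that create_dataset relies on. As a minimal sketch of just the parsing step, here is the same split-on-tab logic applied to an in-memory io.StringIO instead of the real file. The helper name read_pairs is hypothetical, and no preprocessing is applied, so the raw sentence pairs are visible:

```python
import io

# Three lines in the cmn.txt layout: English \t Chinese \t attribution (the last one is malformed)
SAMPLE = (
    "Hi.\t嗨。\tCC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #891077 (Martha)\n"
    "Run.\t你用跑的。\tCC-BY 2.0 (France) Attribution: tatoeba.org #4008918 (JSakuragi) & #3748344 (egg0073)\n"
    "malformed line without tabs\n"
)

def read_pairs(f, num_examples=None):
    """Return (english, chinese) pairs, dropping the attribution column."""
    lines = f.read().strip().split("\n")
    pairs = []
    for line in lines[:num_examples]:  # lines[:None] keeps every line
        cols = line.split("\t")
        if len(cols) < 2:  # skip lines that lack a translation column
            continue
        pairs.append((cols[0], cols[1]))
    return pairs

pairs = read_pairs(io.StringIO(SAMPLE))
print(pairs)  # [('Hi.', '嗨。'), ('Run.', '你用跑的。')]
```

Lines without a tab are skipped rather than raising an error, which mirrors the `len(word_arrs) < 2` check in create_dataset.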

As usual, let's run create_dataset and see what its output looks like.

# Read 4 records from cmn.txt
inp_lang, targ_lang = create_dataset('cmn.txt', 4)
print("inp_lang={}, targ_lang={}".format(inp_lang, targ_lang))

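Output: the English and Chinese sentences come back as two separate lists whose entries are aligned by index. For example, inp_lang[0] = '<start> Hi .  <end>' corresponds to the Chinese translation targ_lang[0] = '<start> 嗨 。 <end>'.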

inp_lang=[
    '<start> Hi .  <end>',
    '<start> Hi .  <end>',
    '<start> Run .  <end>',
    '<start> Wait !  <end>'
],
targ_lang=[
    '<start> 嗨 。 <end>',
    '<start> 你 好 。 <end>',
    '<start> 你 用 跑 的 。 <end>',
    '<start> 等 等 ！ <end>'
]

load_dataset, tokenize: building the dictionary and converting text to vectors

import tensorflow as tf

# Convert text into integer vectors
def tokenize(lang):
    # filters='' keeps punctuation and the <start>/<end> markers as tokens
    lang_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
    # Build the dictionary (word -> integer id)
    lang_tokenizer.fit_on_texts(lang)
    tensor = lang_tokenizer.texts_to_sequences(lang)
    # Pad every sequence with 0 at the end, up to the longest length
    tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor,
                                                           padding='post')
    return tensor, lang_tokenizer

def load_dataset(path, num_examples=None):
    inp_lang, targ_lang = create_dataset(path, num_examples)
    input_tensor, inp_lang_tokenizer = tokenize(inp_lang)
    target_tensor, targ_lang_tokenizer = tokenize(targ_lang)
    return input_tensor, target_tensor, inp_lang_tokenizer, targ_lang_tokenizer

Let's run load_dataset:

inp_tensor, targ_tensor, inp_tokenizer, targ_tokenizer = load_dataset("cmn.txt", 4)
print("inp_tensor={}, inp_tokenizer={}".format(inp_tensor, inp_tokenizer.index_word))

Here is the output:

inp_tensor=[[1 4 3 2]
 [1 4 3 2]
 [1 5 3 2]
 [1 6 7 2]], inp_tokenizer={1: '<start>', 2: '<end>', 3: '.', 4: 'hi', 5: 'run', 6: 'wait', 7: '!'}

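inp_tokenizer is the dictionary that was built: every word is assigned a unique integer id, with more frequent words getting smaller ids, and 0 is left unused because it is reserved for padding. inp_tensor is the text converted into vectors; each element of a vector is the id of a word in the dictionary.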
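To make the dictionary construction concrete, here is a plain-Python sketch of what, as far as I understand, Keras' Tokenizer (with filters='') and pad_sequences(padding='post') do in this example: lowercase, count words, number them by descending frequency starting from 1 (ties keep first-seen order), and pad with 0 at the end. The helper names build_word_index and texts_to_padded_ids are hypothetical; the sketch is meant only to reproduce the inp_tokenizer and inp_tensor printed above.

```python
from collections import Counter

def build_word_index(sentences):
    """Give each word an id starting at 1, most frequent first; 0 stays free for padding."""
    counts = Counter()
    for s in sentences:
        # split() with no argument also drops the empty strings produced by double spaces
        counts.update(s.lower().split())
    # sorted() is stable (also with reverse=True), so equal counts keep first-seen order
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {word: i for i, word in enumerate(ordered, start=1)}

def texts_to_padded_ids(sentences, word_index):
    """Map words to ids, then pad every sequence with 0 at the end ('post' padding)."""
    seqs = [[word_index[w] for w in s.lower().split()] for s in sentences]
    max_len = max(len(q) for q in seqs)
    return [q + [0] * (max_len - len(q)) for q in seqs]

inp_lang = ['<start> Hi .  <end>', '<start> Hi .  <end>',
            '<start> Run .  <end>', '<start> Wait !  <end>']
word_index = build_word_index(inp_lang)
print(word_index)
# {'<start>': 1, '<end>': 2, '.': 3, 'hi': 4, 'run': 5, 'wait': 6, '!': 7}
print(texts_to_padded_ids(inp_lang, word_index))
# [[1, 4, 3, 2], [1, 4, 3, 2], [1, 5, 3, 2], [1, 6, 7, 2]]
```

Because ids start at 1 and 0 is reserved for padding, the vocabulary size passed to the model later is len(word_index) + 1.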

That completes the text preprocessing.

2. Encoder implementation:

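The role of the Encoder was introduced in "BERT Model Primer Series: Introduction to the Attention Mechanism", so I won't repeat it here. In the implementation below, the Encoder consists of two parts: an Embedding layer and an RNN layer. Let's look at the code first.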

import tensorflow as tf

# encoder
class Encoder(tf.keras.Model):
    # vocab_size:    size of the vocabulary
    # embedding_dim: word-embedding dimension
    # enc_units:     number of units in the encoder RNN
    # batch_sz:      batch size
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz  # batch size
        self.enc_units = enc_units  # number of encoder units (RNN units)
        # Embedding turns each integer into a fixed-length dense vector
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        # Create an RNN layer that returns the full output sequence and the final state
        self.rnn = tf.keras.layers.SimpleRNN(self.enc_units,
                                             return_sequences=True,
                                             return_state=True)

    def call(self, x, hidden):
        x = self.embedding(x)
        output, state = self.rnn(x, initial_state=hidden)
        return output, state

    # For the concept of tensors (tf.Tensor), see https://www.tensorflow.org/guide/tensor
    def initialize_hidden_state(self):
        return tf.zeros((self.batch_sz, self.enc_units))

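Let's go through what the parameters mean.

Parameters of __init__:

vocab_size: the vocabulary size, i.e. the number of distinct words in the dictionary we built with load_dataset. When the model is created later it is set to len(word_index) + 1, because ids start at 1 and 0 is used for padding.

embedding_dim: the word-embedding dimension. As mentioned earlier, each word is represented by an integer id (equivalent to a sparse one-hot vector). This encoding has a drawback, though: it cannot capture relationships between words. So after being integer-encoded, the input also passes through an Embedding layer that re-encodes each word as a fixed-length dense vector; embedding_dim is the dimension of that vector. For why integer-encoded words still need an Embedding, see the material on word embeddings.

enc_units: the number of output units of the encoder RNN. This example uses a single RNN layer, but several stacked RNN layers are also possible; enc_units is the number of units in the final (output) layer.

batch_sz: the batch size. In deep learning, the loss used for each parameter update is not computed from a single {data: label} pair but aggregated (typically averaged) over a group of {data: label} pairs; the size of that group is batch_size.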
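An embedding lookup is just row indexing into a (vocab_size, embedding_dim) weight matrix, which gives the same result as multiplying a sparse one-hot vector by that matrix. The NumPy sketch below illustrates this idea with random weights standing in for trained ones and arbitrarily chosen small sizes; it is not tf.keras.layers.Embedding itself.

```python
import numpy as np

vocab_size, embedding_dim = 8, 4
rng = np.random.default_rng(0)
# Stand-in for the Embedding layer's trainable weights: one dense row per word id
E = rng.normal(size=(vocab_size, embedding_dim))

ids = np.array([[1, 4, 3, 2],
                [1, 5, 3, 2]])          # (batch, seq_len) integer-encoded text

dense = E[ids]                          # lookup: (batch, seq_len, embedding_dim)

# The same result via sparse one-hot vectors times the weight matrix
one_hot = np.eye(vocab_size)[ids]       # (batch, seq_len, vocab_size)
via_matmul = one_hot @ E                # (batch, seq_len, embedding_dim)

print(dense.shape)                      # (2, 4, 4)
print(np.allclose(dense, via_matmul))   # True
```

The lookup form avoids ever materialising the vocab_size-wide one-hot vectors, which is why embedding layers are implemented as a table lookup.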

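Besides the __init__ initializer, the Encoder has a call function, which contains the logic that actually performs the encoding. Let's look at its parameters.

Parameters and output of call:

x: the training samples, i.e. the vectorized text returned by load_dataset. It is a BATCH_SIZE × sequence-length matrix, i.e. x holds BATCH_SIZE samples.

hidden: a BATCH_SIZE × enc_units matrix. The hidden-layer value of a recurrent neural network depends not only on the current input x but also on the previous hidden value, so the previous hidden state has to be passed in. Here call is invoked from the initial state, so we only need to supply an initial value (initialize_hidden_state returns all zeros).

This raises a question: why is hidden a BATCH_SIZE × enc_units matrix?

Put simply, during training we feed in BATCH_SIZE samples, and our RNN is defined with enc_units units; in other words, every input word produces enc_units output values. The RNN's output sequence therefore has shape BATCH_SIZE × word_size × enc_units, where word_size is the number of words in a sample.

The hidden state, however, describes only a single time step, so it has no word_size dimension. The hidden argument of call is therefore a BATCH_SIZE × enc_units matrix, which is exactly what initialize_hidden_state creates with tf.zeros((self.batch_sz, self.enc_units)).

Output: output has shape BATCH_SIZE × word_size × enc_units, where word_size is the number of words in a sample; state, the hidden state after the last word, has shape BATCH_SIZE × enc_units.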
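These shapes follow directly from the recurrence a simple RNN computes, h_t = tanh(x_t·W + h_{t-1}·U + b). The NumPy sketch below writes that recurrence out with random weights; it illustrates the math and is not Keras' SimpleRNN implementation. The sizes (BATCH_SIZE 32, 36 time steps, embedding_dim 256, 512 units) are borrowed from the encoder run later in this section. It shows that the state carried between steps is always (batch, units), that only the stacked per-step outputs gain the word_size axis, and that the returned state is simply the last step's output.

```python
import numpy as np

def simple_rnn(x, h0, W, U, b):
    """Basic RNN forward pass: h_t = tanh(x_t W + h_{t-1} U + b).

    x:  (batch, seq_len, input_dim)  embedded input
    h0: (batch, units)               initial hidden state
    Returns per-step outputs (batch, seq_len, units) and the last state (batch, units).
    """
    h = h0
    outputs = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W + h @ U + b)  # state stays (batch, units)
        outputs.append(h)
    return np.stack(outputs, axis=1), h

batch, seq_len, input_dim, units = 32, 36, 256, 512
rng = np.random.default_rng(0)
x = rng.normal(size=(batch, seq_len, input_dim))
W = rng.normal(scale=0.05, size=(input_dim, units))
U = rng.normal(scale=0.05, size=(units, units))
b = np.zeros(units)
h0 = np.zeros((batch, units))  # same shape as initialize_hidden_state()

output, state = simple_rnn(x, h0, W, U, b)
print(output.shape, state.shape)          # (32, 36, 512) (32, 512)
print(np.allclose(output[:, -1], state))  # True
```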

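Understanding the inputs and outputs helps a great deal in understanding the code. After all that explanation, here is a diagram that sums it up.

(Figure: Encoder data flow diagram)

With the explanation above, you should now have a good sense of the data going in and out, so let's run it and look at the output.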

import tensorflow as tf
from sklearn.model_selection import train_test_split
import preprocess
import encoder

# Load the sample data
input_tensor, target_tensor, inp_lang, targ_lang = preprocess.load_dataset("./cmn.txt", 30000)
# Split into training and validation sets at 80/20
input_tensor_train, input_tensor_val, target_tensor_train, target_tensor_val = \
    train_test_split(input_tensor, target_tensor, test_size=0.2)
# Build a tf.data dataset
BUFFER_SIZE = len(input_tensor_train)
BATCH_SIZE = 32
steps_per_epoch = len(input_tensor_train) // BATCH_SIZE
embedding_dim = 256  # embedding dimension
units = 512
vocab_inp_size = len(inp_lang.word_index) + 1
vocab_tar_size = len(targ_lang.word_index) + 1
dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
# Take one batch to try the encoder on
example_input_batch, example_target_batch = next(iter(dataset))
# Create the encoder (from here on the name refers to the instance, not the module)
encoder = encoder.Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)
# Initialize a hidden state
sample_hidden = encoder.initialize_hidden_state()
# Run the encoder
sample_output, sample_hidden = encoder(example_input_batch, sample_hidden)
print('output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print('Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))

Encoder output:

output shape: (batch size, sequence length, units) (32, 36, 512)
Hidden state shape: (batch size, units) (32, 512)

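That wraps up the Encoder implementation, but... we're not done yet.

Now let's talk about implementing the Decoder. The Decoder turns the text encoded by the Encoder into the target text. Yes, that is all it does. We implement the Decoder with an RNN as well. Without further ado, let's look at the code.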

3. Decoder implementation

import tensorflow as tf

class Decoder(tf.keras.Model):
    # vocab_size     vocabulary size
    # embedding_dim  word-embedding dimension
    # dec_units      number of output units of the decoder RNN
    # batch_sz       batch size
    # attention      the attention layer that computes the context vector
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz, attention):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.rnn = tf.keras.layers.SimpleRNN(self.dec_units,
                                             return_sequences=True,
                                             return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)
        self.attention = attention

    # x is the target-side input word for this step (teacher forcing);
    # it is an integer, the word's index in the vocabulary
    def call(self, x, hidden, enc_output):
        # enc_output shape == (batch size, max length, hidden size)
        # context_vector shape == (batch size, hidden size)
        # attention_weights shape == (batch size, max length, 1)
        context_vector, attention_weights = self.attention(hidden, enc_output)
        # x after the embedding layer == (batch size, 1, embedding dim)
        x = self.embedding(x)
        # x after concatenation == (batch size, 1, hidden size + embedding dim) [feature concatenation]
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
        # Feed the concatenated vector to the RNN, which expects (batch_size, time_step, feature).
        # Note: no initial_state is passed, so the RNN starts from zeros on every call;
        # the previous hidden state reaches the decoder only through the attention query.
        output, state = self.rnn(x)
        # output shape == (batch size, 1, hidden size); the length-1 time_step axis
        # carries no information for the dense layer, so reshape to (batch size, hidden size)
        output = tf.reshape(output, (-1, output.shape[2]))
        # output shape == (batch size, vocab): an unnormalized score (logit) for every word
        x = self.fc(output)
        return x, state, attention_weights
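The trickiest part of Decoder.call is keeping track of shapes. The NumPy sketch below replays the expand_dims / concat / reshape / Dense steps with random arrays so each intermediate shape can be checked. Batch 32, units 512 and embedding_dim 256 match this article; the vocabulary size 1000 is an arbitrary placeholder, and random arrays stand in for the real attention, RNN, and Dense outputs.

```python
import numpy as np

batch, units, embedding_dim, vocab = 32, 512, 256, 1000  # vocab is a placeholder value
rng = np.random.default_rng(0)

context_vector = rng.normal(size=(batch, units))         # from the attention layer
x_embedded = rng.normal(size=(batch, 1, embedding_dim))  # one embedded target word per sample

# tf.expand_dims(context_vector, 1) -> (batch, 1, units), then concat on the last axis
x = np.concatenate([context_vector[:, None, :], x_embedded], axis=-1)
print(x.shape)       # (32, 1, 768) = (batch, 1, units + embedding_dim)

# Stand-in for the RNN output of the single time step: (batch, 1, units)
rnn_out = rng.normal(size=(batch, 1, units))
flat = rnn_out.reshape(-1, rnn_out.shape[2])  # drop the length-1 time axis
print(flat.shape)    # (32, 512)

W_fc = rng.normal(size=(units, vocab))        # stands in for Dense(vocab_size)
logits = flat @ W_fc
print(logits.shape)  # (32, 1000): one score per vocabulary word
```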

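Let's go through the Decoder's parameters as well.

Parameters of call

x: the translation produced at the previous step. For example, for "machine learning" => "机器学习":

1. If the word currently being translated is "machine", x is the marker "<start>";

2. If the word currently being translated is "learning", x is "机器" ("machine").

During training, the input at each step is the ground-truth target word from the previous step rather than the model's own prediction. This is called teacher forcing, a fast and effective way to train recurrent neural networks; interested readers can refer to "Professor Forcing: A New Algorithm for Training Recurrent Networks".

hidden: the hidden state, with shape BATCH_SIZE × enc_units. On the first step it is the hidden state returned by the Encoder; on later steps it is the hidden state the Decoder returned at the previous step.

enc_output: the Encoder's encoding result, with shape BATCH_SIZE × word_size × enc_units.

The Decoder also takes an attention parameter, which is the layer that computes attention and is passed in as an argument. How attention is computed was covered in "BERT Model Primer Series: Introduction to the Attention Mechanism", so I won't repeat it here; this article computes attention with the dot product.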
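Teacher forcing can be seen most clearly by listing which token goes in and which token is expected out at each step. The plain-Python sketch below (teacher_forcing_pairs is a hypothetical helper) pairs the decoder input at step t with its label: the input is the ground-truth token at position t-1 and the label is the token at position t. This is the pattern that train_step later follows with `for t in range(1, targ.shape[1])` and `dec_input = tf.expand_dims(targ[:, t], 1)`. The target is split into single characters because seg_char tokenizes Chinese that way.

```python
def teacher_forcing_pairs(target_ids):
    """Pair each decoder input with the label it should predict under teacher forcing.

    At step t the decoder is fed the ground-truth token at position t-1
    (not its own previous prediction) and is trained to predict position t.
    """
    return [(target_ids[t - 1], target_ids[t]) for t in range(1, len(target_ids))]

target = ["<start>", "机", "器", "学", "习", "<end>"]
for dec_input, label in teacher_forcing_pairs(target):
    print(f"input {dec_input!r:10} -> predict {label!r}")
```

Because every step's input comes from the data, one bad prediction early in the sentence does not derail the inputs for the remaining steps during training.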

4. Attention model implementation:

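import tensorflow as tf

class DotProductAttention(tf.keras.layers.Layer):
    def __init__(self):
        super(DotProductAttention, self).__init__()

    # query: decoder hidden state, (batch size, hidden size)
    # value: encoder output, (batch size, max length, hidden size)
    def call(self, query, value):
        # (batch size, hidden size, 1), e.g. 32 * 512 * 1
        hidden = tf.expand_dims(query, -1)
        # Dot product of every encoder position with the query: (batch size, max length, 1)
        score = tf.matmul(value, hidden)
        # Normalize over the source positions (axis 1)
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = attention_weights * value
        # Weighted sum over positions: (batch size, hidden size)
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights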
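To see what DotProductAttention computes, here is the same sequence of operations in NumPy on a tiny hand-made example: one sample, three source positions, and 2-dimensional hidden vectors. The numbers are made up for illustration. The query [2, 0] has dot products [2, 0, 2] with the three encoder outputs, so positions 0 and 2 receive equal, larger weights, and the context vector is dominated by them.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(query, value):
    """NumPy mirror of DotProductAttention.call.

    query: (batch, units)           decoder hidden state
    value: (batch, max_len, units)  encoder outputs
    """
    score = value @ query[:, :, None]        # (batch, max_len, 1)
    weights = softmax(score, axis=1)         # normalize over source positions
    context = (weights * value).sum(axis=1)  # (batch, units)
    return context, weights

value = np.array([[[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]]])
query = np.array([[2.0, 0.0]])

context, weights = dot_product_attention(query, value)
print(weights[0, :, 0].round(3))
print(context.round(3))
```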

With the main parts all defined, let's run them together:

import tensorflow as tf
import decoder
import attention
import encoder
import preprocess

# Load and preprocess the data
input_tensor, target_tensor, inp_lang, targ_lang = preprocess.load_dataset("./cmn.txt", 30000)
# Shared hyperparameters
BUFFER_SIZE = len(input_tensor)
BATCH_SIZE = 32
steps_per_epoch = len(input_tensor) // BATCH_SIZE
embedding_dim = 256  # word-vector dimension
units = 512
vocab_inp_size = len(inp_lang.word_index) + 1
vocab_tar_size = len(targ_lang.word_index) + 1
# Dataset
dataset = tf.data.Dataset.from_tensor_slices((input_tensor, target_tensor)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
# Take one batch to try the models on
example_input_batch, example_target_batch = next(iter(dataset))
# Define the encoder
encoder = encoder.Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)
sample_hidden = encoder.initialize_hidden_state()
sample_output, sample_hidden = encoder(example_input_batch, sample_hidden)
print('encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print('encoder Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))
# Define the attention layer
attention_layer = attention.DotProductAttention()
context_vector, attention_weights = attention_layer(sample_hidden, sample_output)
print('context_vector shape: {}'.format(context_vector.shape))
print('attention_weights shape: {}'.format(attention_weights.shape))
# Define the decoder; every sample starts decoding from <start>
dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)
decoder = decoder.Decoder(vocab_tar_size, embedding_dim, units, BATCH_SIZE, attention_layer)
dec_output, dec_state, attention_weights = decoder(dec_input, sample_hidden, sample_output)
print('decoder output shape: (batch size, vocab size) {}'.format(dec_output.shape))
print('decoder Hidden state shape: (batch size, units) {}'.format(dec_state.shape))

5. Model training:

We have now implemented the Encoder, Decoder, and Attention, so we can define the training process. During data preprocessing we already used

 dataset.batch(BATCH_SIZE, drop_remainder=True)

to group the training data into batches of BATCH_SIZE, so the smallest unit of training is one batch of BATCH_SIZE samples. Let's look at the training process in detail.
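drop_remainder=True matters here because the encoder's initial state and the decoder's first <start> input are both built with a fixed BATCH_SIZE, so every batch must have exactly that many rows. The plain-Python sketch below mimics that batching behavior on a list; it is an illustration, not tf.data itself, and the helper name batch is hypothetical. It also shows why steps_per_epoch = len(data) // BATCH_SIZE matches the number of batches.

```python
def batch(items, batch_size, drop_remainder=True):
    """Group items into consecutive batches, optionally dropping a short last batch."""
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches

samples = list(range(10))
BATCH_SIZE = 3
batches = batch(samples, BATCH_SIZE)
steps_per_epoch = len(samples) // BATCH_SIZE
print(batches)          # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(steps_per_epoch)  # 3
```

The leftover sample (9) is simply skipped for that epoch; with shuffling before batching, a different sample is usually left out each epoch.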

Training a single batch

import tensorflow as tf
import optimizer  # repository module providing loss_function and optimizer (not shown in this article)

# Train the model on a single batch
# encoder     the encoder model defined above
# decoder     the decoder model defined above
# inp         training data: tensor of the text to translate
# targ        training data: tensor of the target text
# targ_lang   dictionary (tokenizer) of the target language
# enc_hidden  initial hidden state passed to the encoder
def train_step(encoder, decoder, inp, targ, targ_lang, enc_hidden, BATCH_SIZE):
    loss = 0
    with tf.GradientTape() as tape:
        enc_output, enc_hidden = encoder(inp, enc_hidden)
        dec_hidden = enc_hidden
        dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)
        # Step through every word position of the target text
        for t in range(1, targ.shape[1]):
            # Pass the encoder output (enc_output) to the decoder
            predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
            # Loss for this step, computed over the whole batch
            loss += optimizer.loss_function(targ[:, t], predictions)
            # Teacher forcing: feed the ground-truth target word as the next input
            dec_input = tf.expand_dims(targ[:, t], 1)
    batch_loss = (loss / int(targ.shape[1]))
    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.optimizer.apply_gradients(zip(gradients, variables))
    return batch_loss
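train_step relies on optimizer.loss_function and optimizer.optimizer from the repository's optimizer.py, which this article does not show. A common choice, used for example in the TensorFlow NMT tutorial, is a sparse categorical cross-entropy computed from logits that ignores padded positions (id 0), since most sentences in a batch are shorter than targ.shape[1]. The NumPy sketch below illustrates that masked loss for a single decoding step; it is an assumption about what loss_function might do, not the author's code, and masked_sparse_ce is a hypothetical name.

```python
import numpy as np

def masked_sparse_ce(real, logits, pad_id=0):
    """Mean cross-entropy for one decoding step, with padded positions zeroed out.

    real:   (batch,) integer target ids for step t (like targ[:, t])
    logits: (batch, vocab) unnormalized decoder scores (like `predictions`)
    """
    shifted = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(real)), real]           # -log p(correct word)
    mask = (real != pad_id).astype(per_sample.dtype)              # 0 where the target is padding
    return (per_sample * mask).mean()

real = np.array([3, 1, 0])   # the last sample is already padding at this step
logits = np.zeros((3, 5))    # uniform scores over a 5-word vocabulary
print(masked_sparse_ce(real, logits))  # about 1.073, i.e. 2 * ln(5) / 3
```

As in the tutorial's version, the mean is taken over the whole batch including masked entries, so padded samples contribute zero rather than being excluded from the denominator.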

The overall training loop:

import train_function  # train_step above is assumed to live in train_function.py

# Model training
# Uses encoder, decoder, dataset, targ_lang, BATCH_SIZE and steps_per_epoch from the script above
def train(epochs):
    EPOCHS = epochs
    for epoch in range(EPOCHS):
        enc_hidden = encoder.initialize_hidden_state()
        total_loss = 0
        # With drop_remainder=True the dataset yields exactly steps_per_epoch batches
        for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
            batch_loss = train_function.train_step(encoder, decoder, inp, targ,
                                                   targ_lang, enc_hidden, BATCH_SIZE)
            total_loss += batch_loss
            if batch % 100 == 0:
                print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1,
                                                             batch,
                                                             batch_loss.numpy()))

6. English -> Chinese translation

import tensorflow as tf
import preprocess

# Uses encoder, decoder, inp_lang, targ_lang, units, input_tensor and target_tensor from above.
# Maximum sequence lengths, taken from the padded tensors built by load_dataset
max_length_inp = input_tensor.shape[1]
max_length_targ = target_tensor.shape[1]

# Predict the target sentence word by word
def evaluate(sentence):
    sentence = preprocess.preprocess_sentence(sentence, 0)
    # The tokenizer lowercased its vocabulary, and split() skips the empty strings left by
    # double spaces; a word never seen in training raises KeyError here
    inputs = [inp_lang.word_index[w.lower()] for w in sentence.split()]
    inputs = tf.keras.preprocessing.sequence.pad_sequences([inputs],
                                                           maxlen=max_length_inp,
                                                           padding='post')
    inputs = tf.convert_to_tensor(inputs)
    result = ''
    hidden = tf.zeros((1, units))
    enc_out, enc_hidden = encoder(inputs, hidden)
    dec_hidden = enc_hidden
    dec_input = tf.expand_dims([targ_lang.word_index['<start>']], 0)
    # max_length_targ: maximum length of the target tensor
    for t in range(max_length_targ):
        predictions, dec_hidden, attention_weights = decoder(dec_input,
                                                             dec_hidden,
                                                             enc_out)
        predicted_id = tf.argmax(predictions[0]).numpy()
        result += targ_lang.index_word[predicted_id] + ' '
        if targ_lang.index_word[predicted_id] == '<end>':
            return result, sentence
        # Feed the predicted ID back into the model
        dec_input = tf.expand_dims([predicted_id], 0)
    return result, sentence

# Translate
def translate(sentence):
    result, sentence = evaluate(sentence)
    print('Input: %s' % (sentence))
    print('Predicted translation: {}'.format(result))
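At translation time there is no target sentence to feed in, so evaluate feeds each step's argmax prediction back in as the next input (greedy decoding) and stops at <end> or after max_length_targ steps. The plain-Python sketch below isolates that loop. The step function, the toy score table, and the ids are all made up for illustration; a real step would call decoder(dec_input, dec_hidden, enc_out).

```python
def greedy_decode(step_fn, start_id, end_id, max_len):
    """Greedy decoding: feed back the argmax token until <end> or max_len steps.

    step_fn(prev_id, state) -> (scores, new_state) stands in for one decoder call.
    """
    result, prev_id, state = [], start_id, None
    for _ in range(max_len):
        scores, state = step_fn(prev_id, state)
        predicted = max(range(len(scores)), key=scores.__getitem__)  # argmax
        result.append(predicted)
        if predicted == end_id:
            break
        prev_id = predicted  # the prediction becomes the next input
    return result

# Toy "decoder": fixed next-token scores indexed by the previous token id.
# ids: 1=<start>, 2=<end>, 3=你, 4=好, 5=。
TABLE = {
    1: [0, 0, 0, 9, 1, 0],  # after <start>, prefer 你
    3: [0, 0, 0, 0, 9, 0],  # after 你, prefer 好
    4: [0, 0, 0, 0, 0, 9],  # after 好, prefer 。
    5: [0, 0, 9, 0, 0, 0],  # after 。, prefer <end>
}

def toy_step(prev_id, state):
    return TABLE[prev_id], state

print(greedy_decode(toy_step, start_id=1, end_id=2, max_len=10))  # [3, 4, 5, 2]
```

Greedy decoding is the simplest choice; it commits to one word per step, so an early mistake cannot be undone later in the sentence.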

Let's run training and then translate two sentences:

train(20)
translate("hello")
translate("he is swimming in the river")

Output:

Epoch 20 Batch 300 Loss 0.5712
Epoch 20 Batch 400 Loss 0.4970
Epoch 20 Batch 500 Ltensorflow版别oss 0.5692
Epoch 20 Batch 600 Loss 0.6004
Epoch 20 Batch 700 Loss 0.6078
Input: <start> hello <end>
Predicted translation: 你 好 。 <end>
Input: <start> he is swimming in the river <end>
Predicted translation: 我 <end>

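Finally done. The results from this example are not very good, probably for the following reasons:

1. Not enough data: the code loads at most 30,000 sentence pairs, which is small for training a translation model.

2. Not enough training: the number of iterations could be increased further.

3. The Attention model still has room for improvement: it only uses dot-product attention, and other scoring methods may work better.

4. The model uses a plain RNN; it could be replaced with an LSTM, GRU, or similar network for experimentation.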

References:

www.tensorflow.org/tutorials/t…

github.com/rotbit/nmt.…
