Requirements
- NumPy >= 1.13.1
- tensorflow-gpu >= 1.2.1
- tqdm
- nltk
Construction Details
A translation system can be turned into a conversational model simply by replacing its pairs of sentences in two languages with pairs of questions and answers. After all, the basic conversational model, "Sequence-to-Sequence", was developed from translation systems. So why not use a translation model to improve how efficiently a conversational model generates dialogue?
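To make the idea of swapping translation pairs for question-answer pairs concrete, here is a minimal sketch. It assumes each dialogue is a list of utterances and pairs every utterance with the one that follows it; the function name `make_qa_pairs` and this pairing rule are illustrative assumptions, not the format required by this repository's sample datasets.

```python
def make_qa_pairs(turns):
    """Pair each utterance with the next one (hypothetical pairing rule).

    The first element of each pair plays the role of the source sentence
    in translation, the second the role of the target sentence.
    """
    return [(turns[i], turns[i + 1]) for i in range(len(turns) - 1)]


dialogue = ["Hi, how are you?", "Fine, thanks. And you?", "Pretty good."]
pairs = make_qa_pairs(dialogue)

for question, answer in pairs:
    print("Q:", question, "| A:", answer)
```

A dialogue with three utterances yields two (question, answer) pairs, which can then be fed to the model in the same way as (source, target) sentence pairs.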
The Transformer is the core of our model. Its structure can be split into several parts:
- First, the input datasets: batches are drawn from a generator, and each sentence is represented as a list of token ids in this experiment.
- Second, the embedding layers, which have two parts: Dataset Embedding and Positional Embedding.
  - Dataset Embedding transforms each input token id into a one-hot vector whose size is the length of the vocabulary.
  - Positional Embedding, also called positional encoding, uses the index of each word in the sentence as its position symbol.
- Third, a multi-head attention model splits the output of the embedding layers into several pieces and runs them through different attention models in parallel. The outputs of all the models are then concatenated to get the result.
- Finally, the result passes through a feed-forward layer and is combined with residual connections to produce the output.
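The four parts above can be sketched in plain NumPy. This is an illustrative sketch and not the code in `train.py`: the sizes, the random weights, and all function names are assumptions, the positional encoding uses the sinusoidal formula from "Attention Is All You Need", and layer normalization, masking, and dropout are left out for brevity.

```python
import numpy as np


def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding (assumes d_model is even)."""
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    i = np.arange(d_model)[None, :]                    # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])               # even dimensions
    pe[:, 1::2] = np.cos(angle[:, 1::2])               # odd dimensions
    return pe


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)            # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention: split into heads, attend in parallel, concatenate."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    heads = softmax(scores) @ v                          # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o


def feed_forward(x, w1, b1, w2, b2):
    """Position-wise feed-forward layer with a ReLU in between."""
    return np.maximum(0, x @ w1 + b1) @ w2 + b2


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.RandomState(0)
vocab_size, d_model, num_heads, d_ff = 50, 16, 4, 32
token_ids = np.array([3, 17, 8, 42, 5])                # 1. one input sentence

# 2. Embedding: one-hot vectors times an embedding matrix, plus positions.
one_hot = np.eye(vocab_size)[token_ids]                # (seq_len, vocab_size)
embedding = rng.randn(vocab_size, d_model) * 0.1
x = one_hot @ embedding + positional_encoding(len(token_ids), d_model)

# 3. Multi-head attention with a residual connection.
w_q, w_k, w_v, w_o = (rng.randn(d_model, d_model) * 0.1 for _ in range(4))
x = x + multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)

# 4. Feed-forward layer with a residual connection.
w1, b1 = rng.randn(d_model, d_ff) * 0.1, np.zeros(d_ff)
w2, b2 = rng.randn(d_ff, d_model) * 0.1, np.zeros(d_model)
out = x + feed_forward(x, w1, b1, w2, b2)

print(out.shape)
```

Multiplying the one-hot vectors by the embedding matrix is equivalent to looking up one row per token id, which is how embedding layers are usually implemented in practice.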
Usage
- STEP 1. Download a dialogue corpus in the same format as the sample datasets and extract it to the `data/` folder.
- STEP 2. Adjust the hyperparameters in `params.py` if you want.
- STEP 3. Run `make_dic.py` to generate vocabulary files in a new folder named `dictionary`.
- STEP 4. Run `train.py` to build the model. Checkpoints are stored in the `checkpoint` folder, and the TensorFlow event files can be found in `logdir`.
- STEP 5. Run `eval.py` to evaluate the result with the test data. Results are stored in the `Results` folder.
Results
Comparison
Feed-forward implemented with fully connected layers.
- Training Accuracy
- Training Loss
Feed-forward implemented with one-dimensional convolution.
- Training Accuracy
- Training Loss
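The two feed-forward variants compared above are closely related. The sketch below assumes a convolution kernel of width 1, which is an assumption (the kernel size used in this project is not stated here); under that assumption a 1-D convolution along the sequence axis computes the same thing as a fully connected layer applied to each position. The helper names `dense` and `conv1d` are illustrative.

```python
import numpy as np


def dense(x, w, b):
    """Fully connected layer applied to every position.

    x: (seq_len, d_in), w: (d_in, d_out), b: (d_out,)
    """
    return x @ w + b


def conv1d(x, kernel, b):
    """1-D convolution along the sequence axis, without padding.

    x: (seq_len, d_in), kernel: (width, d_in, d_out), b: (d_out,)
    """
    width, _, d_out = kernel.shape
    out_len = x.shape[0] - width + 1
    out = np.zeros((out_len, d_out))
    for t in range(out_len):
        window = x[t:t + width]                        # (width, d_in)
        # Sum over the window positions and the input channels.
        out[t] = np.tensordot(window, kernel, axes=([0, 1], [0, 1])) + b
    return out


rng = np.random.RandomState(0)
x = rng.randn(6, 8)      # 6 positions, 8 features each
w = rng.randn(8, 4)
b = rng.randn(4)

# A width-1 kernel is the dense weight matrix with an extra leading axis.
same = np.allclose(dense(x, w, b), conv1d(x, w[None], b))
print(same)
```

With a wider kernel the convolution would also mix neighbouring positions, so the two variants would no longer be equivalent.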
Reference:
Thanks to Transformer
- This project adds the positional encoding described in "Attention Is All You Need" to the original module.