In this work, we propose a near-lossless method for encoding long sequences of text, as well as all of their sub-sequences, into feature-rich representations. We test our method on sentiment analysis and show good performance across all sub-sentence and sentence embeddings.
This work also demonstrates the use of knowledge distillation and quantization to compress the original Transformer model [Vaswani et al.]. To the best of our knowledge, we are the first to show that 8-bit quantization of the Transformer's weights can achieve the same BLEU score as the full-precision model.
Furthermore, when we combine knowledge distillation with weight quantization, we can train smaller Transformer networks and achieve up to …

Chapter 1 introduces the machine learning concepts for natural language processing that are essential to understanding both papers presented in this thesis. Chapters 2 and 3 each cover one paper, and Chapter 4 concludes.