Weekly NLP News Cypher

T5 | The New SOTA Transformer from Google

Google yesterday unveiled T5, a new entrant in the transformer school of hard knocks. The model set a new SOTA on the SuperGLUE leaderboard with an overall score of 88.9, just 0.9 points shy of human performance.

The model comes in 5 sizes:

  • T5-Small (60 million params)
  • T5-Base (220 million params)
  • T5-Large (770 million params)
  • T5-3B (3 billion params)
  • T5-11B (11 billion params)
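Whatever the size, T5 casts every task as text-to-text: inputs and targets are always plain strings, with the task named in a textual prefix. A minimal sketch of that prompt construction (the prefixes follow the T5 paper's convention; `build_t5_input` is a hypothetical helper, not part of any library):

```python
# T5 frames every NLP task as text-to-text: a task prefix is prepended
# to the input string and the model emits the answer as a string.
# Prefixes below follow the T5 paper; build_t5_input is our own helper.

T5_PREFIXES = {
    "summarize": "summarize: ",
    "translate_en_de": "translate English to German: ",
    "cola": "cola sentence: ",  # grammatical acceptability (GLUE CoLA)
}

def build_t5_input(task: str, text: str) -> str:
    """Prepend the task prefix so one model can serve many tasks."""
    return T5_PREFIXES[task] + text

print(build_t5_input("translate_en_de", "That is good."))
# -> translate English to German: That is good.
```

The payoff of this framing is that one checkpoint, one loss, and one decoding procedure cover translation, summarization, and classification alike.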


Facebook AI’s RoBERTa Distilled by Hugging Face

Smaller models are easier to deploy and cost less $$ in cloud compute.

“95% of RoBERTa-base's performance on GLUE, twice as fast as RoBERTa while being 35% smaller.” — Hugging Face

Below are the results on the GLUE dev sets:

Hugging Face
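The compression trick behind these numbers is knowledge distillation: the small student is trained to match the teacher's temperature-softened output distribution, not just the hard labels. A NumPy sketch of that soft-target loss (Hugging Face's full recipe also mixes in the usual supervised loss and other terms; the function names here are ours):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T flattens the distribution."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions
    (the soft-target part of the distillation objective)."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    # T**2 keeps gradient magnitudes comparable across temperatures
    return -(p_teacher * log_p_student).sum(axis=-1).mean() * T**2

# Toy check: a student whose logits track the teacher scores a lower loss.
teacher = np.array([[4.0, 1.0, 0.5]])
good_student = np.array([[3.9, 1.1, 0.4]])
bad_student = np.array([[0.5, 4.0, 1.0]])
assert distillation_loss(good_student, teacher) < distillation_loss(bad_student, teacher)
```

Because the softened teacher distribution carries information about how wrong each incorrect class is, the student learns more per example than it would from one-hot labels alone.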


Multiprocessing vs. Threading

Understanding the difference between multiprocessing and threading is important when deploying machine learning models. FloydHub’s new article goes in-depth:
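The short version: in CPython, threads share memory but take turns on the GIL, so they rarely speed up CPU-bound pure-Python work; processes sidestep the GIL at the cost of pickling inputs and outputs. A stdlib-only sketch of the same job run both ways (a simplified illustration, not FloydHub's code):

```python
# Same CPU-bound job run with threads vs. processes.
# ThreadPoolExecutor: shared memory, but the GIL serializes Python bytecode,
# so pure-Python number crunching won't parallelize.
# ProcessPoolExecutor: each worker is its own interpreter (no shared GIL),
# but arguments and results must be picklable.
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound(n: int) -> int:
    return sum(i * i for i in range(n))

def run(executor_cls, jobs):
    with executor_cls(max_workers=4) as ex:
        return list(ex.map(cpu_bound, jobs))

if __name__ == "__main__":
    jobs = [200_000] * 4
    threaded = run(ThreadPoolExecutor, jobs)
    forked = run(ProcessPoolExecutor, jobs)
    assert threaded == forked  # same answers; only the wall-clock differs
```

For model serving the usual split is: threads (or async I/O) for request handling, processes for CPU-heavy preprocessing or inference workers.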

Fine-Tuning BERT, a Tutorial

Chris McCormick’s blog shows us how to use Hugging Face’s PyTorch library to fine-tune BERT for sentence classification:
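The fine-tuning recipe boils down to: put a linear classification head on BERT's pooled [CLS] vector and train the whole thing with a small learning rate. A minimal PyTorch sketch of that loop, with a random tensor standing in for BERT's pooled output (the actual tutorial uses Hugging Face's pretrained `BertForSequenceClassification`; the stand-in here is our own simplification):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for BERT: in the real tutorial this tensor is the pooled
# [CLS] output of the pretrained encoder (hidden size 768).
hidden_size, num_labels, batch = 768, 2, 8
pooled = torch.randn(batch, hidden_size)           # fake encoder output
labels = torch.randint(0, num_labels, (batch,))

# Fine-tuning head: a single linear layer over the pooled vector.
head = nn.Linear(hidden_size, num_labels)
optimizer = torch.optim.AdamW(head.parameters(), lr=2e-5)  # typical BERT LR
loss_fn = nn.CrossEntropyLoss()

losses = []
for step in range(20):
    optimizer.zero_grad()
    logits = head(pooled)                # (batch, num_labels)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

assert losses[-1] < losses[0]  # the head is fitting the toy batch
```

In the real setup the encoder's weights are updated too, which is why the tiny learning rate (on the order of 2e-5) matters: large steps would wreck the pretrained representations.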

Microsoft’s UniLM AI Improves Summarization

Microsoft’s new model, UniLM, is jointly pre-trained on unidirectional, sequence-to-sequence, and bidirectional prediction, which improves performance on several NLP tasks. Code and pre-trained models can be found here:
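What makes the three objectives coexist in one network is that each is just a different self-attention mask over the same transformer. A NumPy sketch of the three mask patterns (the shapes and naming are our own; the pattern follows the UniLM paper):

```python
import numpy as np

def unilm_mask(src_len: int, tgt_len: int, mode: str):
    """Self-attention mask (1 = may attend) for UniLM's three LM objectives.
    Hypothetical helper; the masking pattern follows the UniLM paper."""
    n = src_len + tgt_len
    if mode == "bidirectional":        # BERT-style: every token sees every token
        return np.ones((n, n), dtype=int)
    if mode == "unidirectional":       # GPT-style: causal lower triangle
        return np.tril(np.ones((n, n), dtype=int))
    if mode == "seq2seq":
        m = np.zeros((n, n), dtype=int)
        m[:, :src_len] = 1             # every token may attend to the source
        # target tokens additionally see their left context in the target
        m[src_len:, src_len:] = np.tril(np.ones((tgt_len, tgt_len), dtype=int))
        return m
    raise ValueError(mode)

# source tokens attend only within the source; target tokens attend to
# the full source plus earlier target tokens
print(unilm_mask(2, 2, "seq2seq"))
```

Because only the mask changes, one set of shared transformer weights can be pre-trained for all three prediction styles, which is what lets UniLM handle understanding and generation tasks alike.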

This is a weekly round-up of NLP News and Code drops from Techies worldwide.

Follow us on Twitter for more NLP News, Code & Demos: @Quantum_Stat


Written by

We Build NLP for the Bravehearts ✌ quantumstat.com
