Quantum Stat
The Old Bridge | Robert


Welcome back! We have a long newsletter this week, since many new NLP repos were published as tech nerds returned from their summer vacation. 😁

I’ll add close to 150 new NLP repos to the NLP Index, so stay tuned: the update will drop this week.

Welcome to the Matrix

Six Degrees of Wikipedia

just explore…


Embeddinghub is a database built for machine learning embeddings. It is built with four goals in mind.

  • Store embeddings durably and with high availability
  • Allow for approximate nearest neighbor operations
  • Enable other operations like partitioning, sub-indices, and averaging
  • Manage versioning, access control, and rollbacks painlessly
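
To make the nearest-neighbor and averaging goals concrete, here is a minimal sketch in plain Python. It is not Embeddinghub's API: the function names (`cosine_similarity`, `nearest_neighbors`, `average_embedding`) and the toy data are hypothetical, and the search is an exact brute-force scan, whereas an embedding database would typically use an approximate index to stay fast at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_neighbors(store, query, k=2):
    """Return the k (key, score) pairs most similar to `query`.

    Exact brute-force scan over every stored vector (hypothetical helper);
    an approximate index would avoid comparing against the whole store.
    """
    scored = [(key, cosine_similarity(vec, query)) for key, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def average_embedding(vectors):
    """Element-wise mean of a non-empty collection of equal-length vectors."""
    vectors = list(vectors)
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Toy in-memory "store" standing in for a real embedding database.
toy_store = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(nearest_neighbors(toy_store, [1.0, 0.0, 0.0], k=2))
    print(average_embedding([toy_store["cat"], toy_store["dog"]]))
```

The brute-force scan costs one similarity computation per stored vector, which is exactly the cost that approximate nearest neighbor methods trade a little accuracy to avoid.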

Rubrix | Open Sourced NLP Data Explorer/Annotator

This library is…

Hubert Robert


Hey, welcome back! A flood of EMNLP 2021 papers came in this week, so today’s newsletter should be loads of fun! 😋

But first, a meme search engine:

The Missing Text Phenomenon

An article on The Gradient had an interesting take on NLU. It describes how a neural network’s capacity for NLU inference is inherently bounded by its background knowledge (which is usually highly limited relative to a human’s). I would add a bit more nuance, though: this is only a problem for a model that is not localized for its user, meaning a model that wasn’t fine-tuned/prompted…

Nova melting hypothetical planet | Bonestell


Way back in February of 2020, someone on Twitter posted that they had FOIA’d the NSA, aka the National Security Agency. This actor, by the name of ‘cupcake’, was able to retrieve a 400-page printout of their COMP 3321 training course (😂). It was OCR’d and uploaded to the cloud, totaling 118MB of absolute FOIA madness: Python learning material courtesy of the Men in Black by way of Fort Meade. Enjoy!


Don’t click on this 👇 and definitely don’t type “help” and press “enter”.

Windows 96 Bruh!

Want to access your favorite OS of all time: Windows 96??

Graph Research

A graph neural network paper list for…

The Embarkation of the Queen of Sheba | Lorrain


Hey Welcome Back!

We have a new CLIP implementation from Max Woolf. It allows for faster experimentation and has some new features like using weighted prompts and using icons for priming the model to improve generation quality. It was released today, try it out! It’s trippy:

Textual (Text User Interface)

From the maker of the Rich library, Will McGugan, Textual is a new project where you can create some amazing apps in the terminal. 😎😎

Giant Flying Mocca Cup with an Inexplicable Five Metre Appendage | Dali


Welcome back! This week’s Cypher will be a bit shorter than usual, since it was a slow week in NLP land. But first, I want to update you on the BlenderBot 2.0 situation. In last week’s Cypher, the last hurdle to getting BlenderBot inference running was the search server (which gives the bot the ability to comb the web to answer factoid-type questions). Well, we finally have a search server repo to work with!

Thank you to Jules Gagnon-Marchand for creating a wonderful repo that provides a seamless integration with ParlAI’s library for creating your…



Sometimes… cool things happen. A new chatbot from Facebook AI was released this Friday with remarkable features. This chatbot, BlenderBot 2.0, is an improvement on their previous bot from last year. The bot has better long-term memory and can search the internet for information during conversation! This is a convenient improvement over traditional bots, since information is not statically “memorized” but can instead be dynamic and “stay up to date” via the internet. 🤯

I’ve recently tested the model, trying out the smaller 400M variant. Currently, there exist two variants:

  • BlenderBot 2.0 400M: --model-file zoo:blenderbot2/blenderbot2_400M/model
  • BlenderBot 2.0…

Assumption of the Virgin | Correggio


Welcome back! Hope you had a great week. We have a new leader on the SuperGLUE benchmark: a new Ernie model from Baidu comprising 10 billion parameters, trained on a 4TB corpus. FYI, the human baseline was already beaten by Microsoft’s DeBERTa model at the beginning of the year… time for a new SuperSuperGLUE benchmark???


The Codex Paper

BTW, if you are still interested in GitHub’s CoPilot, I stumbled upon the Codex paper this week:


DeepMind’s Perceiver

DeepMind’s Perceiver transformer can take a variety of modalities (vision, audio, text) as its input and is able to achieve competitive outcomes in…

Stellaris Art


Hey, welcome back! Want to wish everyone in the US a happy 4th of July🎆🎇! Also, want to quickly mention that the NLP Index has doubled in size (since its inception) and now houses over 6,000 repos, pretty cool!!! 😎 And as always, it gets updated weekly. But first, this week we asked 100 NLP developers: Name one thing Microsoft got for paying $7.5 billi for GitHub, and $1 billi to OpenAI? SURVEY SAYS:

7.5B + 1B = GitHub CoPilot 👍

If you want to hear GitHub’s take on their new code-generating assistant, read here:

Also… it turns…

The Voyage of Life: Youth | Cole


Welcome back! EleutherAI has a brand new (and big) GPT model that was open-sourced this past week. The model (JAX-based) was trained for 5 weeks on the Pile dataset, Eleuther’s own ~800GB data dump. The model is called GPT-J, a 6 billion parameter model that rivals the performance of a GPT-3 model of the same size. And apparently it performs well on code generation:

Here’s a comparison of all the major language models on various datasets:

Baptism of Christ (aka there's a giant UFO in the sky) | Gelder (1710)


Welcome back to the simulation ✌ . So the ACL 2021 data dump happened, and now we have a huge list of repos to get through in the Repo Cypher this week. 😁

Also, we are updating the NLP Index very soon with 100+ new repos (many of which are mentioned here) alongside 30+ new NLP notebooks like this one 👇 . If you would like to get an email alert for future newsletters and asset updates, you can sign up here.

Thank you, Niels Rogge!

So let us start with incoming awesomeness. Heard of the Graph4NLP library??? If you want to…

