The Embarkation of the Queen of Sheba | Lorrain

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

The NLP Cypher | 08.08.21

Time to Pretend

5 min read · Aug 8, 2021


Hey Welcome Back!

We have a new CLIP implementation from Max Woolf. It allows for faster experimentation and adds new features like weighted prompts and icon priming to improve generation quality. It was released today; try it out! It’s trippy:

Textual (Text User Interface)

From Will McGugan, the maker of the Rich library, Textual is a new project for building some amazing apps in the terminal. 😎😎

Ciphey | NLP in Encryption

Looks like NLP has arrived for cracking encryption. Let’s say you wanted to know “How was X encrypted?” Ciphey was built to answer this question.

Under the hood:

“Ciphey uses a custom built artificial intelligence module (AuSearch) with a Cipher Detection Interface to approximate what something is encrypted with. And then a custom-built, customisable natural language processing Language Checker Interface, which can detect when the given text becomes plaintext.”
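The core idea behind that "Language Checker Interface" can be illustrated with a toy example (a hand-rolled sketch, not Ciphey's actual code): brute-force a Caesar cipher and keep the candidate that scores highest on a crude English-likeness check. The word list and scoring here are simplified assumptions for illustration.

```python
import string

# Toy stand-in for a real language model / dictionary check.
COMMON_WORDS = {"the", "and", "was", "with", "this", "that", "text",
                "hello", "world", "is", "a", "of"}

def caesar_shift(text, k):
    """Shift each letter k positions through the alphabet, leaving other chars alone."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def looks_like_english(text):
    """Crude 'language checker': fraction of tokens that are common English words."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in COMMON_WORDS for w in words) / len(words)

def crack_caesar(ciphertext):
    """Try all 26 shifts and return the candidate that reads most like plaintext."""
    return max((caesar_shift(ciphertext, k) for k in range(26)),
               key=looks_like_english)
```

Ciphey generalizes this loop across many cipher families, using AuSearch to decide which decoders are worth trying first.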

Recent papers you need to read:

These three papers cover prompting, question answering and the fragility of evaluation benchmarks.

Pre-train, Prompt, and Predict: A Systematic Survey of
Prompting Methods in Natural Language Processing

QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

The Benchmark Lottery

Collection of Repos for Parsing PDFs

Document Layout Analysis resources for development with PdfPig. It’s in C#, sorry Python lovers.

Resources 📚

New NLP Videos by Dan Jurafsky dropped:

Machine learning education content aggregated from 1,300 questions asked in an ML course.

Pretty cool site with very simple and intuitive answers to technical ML questions. If you are looking for more math-heavy material, look elsewhere.

Here’s an example:

What do dropout layers do?

Dropout layers throw things away. Now you might ask: why would I want my model to throw data away? It turns out that randomly dropping activations during training can drastically improve a model’s performance at test time (where nothing is thrown away).

When to use dropout layers?

When you feel like your model is overfitting the input, increase the dropout probability. Dropout is often applied generously because it usually makes a model more robust to noisy inputs.

https://rentruewang.github.io/learning-machine/intro.html

Free PDF download for 2nd edition of Introduction to Stat Learning

CI/CD Tools Review Used in Machine Learning

A breakdown of the most used tools for CI/CD, including free and paid variants. You know you love Jenkins. (just saying 😂)

Stack Overflow Developer Survey

Breaks down tech stacks by median salary, among other things 😎…

Summary Explorer: For Exploring Datasets and Models for Summarization

Get access to 50+ summarization models, including their papers, repos and ROUGE scores. (In addition to visualizing a few summarization datasets.)
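For reference, the ROUGE-1 F1 scores reported there measure unigram overlap between a generated and a reference summary. A minimal sketch (illustrative only; real evaluations use stemming and the full ROUGE-1/2/L family):

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a candidate summary and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```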

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

EmailSum Dataset

The Email Thread Summarization (EmailSum) dataset contains human-annotated short summaries of 2,549 email threads (each containing 3 to 10 emails) across a wide variety of topics.

Connected Papers 📈

We build amazing NLP software for companies worldwide. If you are looking for software development, check out our site and reach out to us here: info [at] quantumstat com


Ricky Costa

Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟