The Embarkation of the Queen of Sheba | Lorrain

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

The NLP Cypher | 08.08.21

Time to Pretend

5 min read · Aug 8, 2021


Hey Welcome Back!

We have a new CLIP implementation from Max Woolf. It allows for faster experimentation and adds new features like weighted prompts and icon priming to improve generation quality. It was released today; try it out! It’s trippy:

Textual (Text User Interface)

From Will McGugan, the maker of the Rich library, Textual is a new project for building some amazing apps in the terminal. 😎😎

Ciphey | NLP in Encryption

Looks like NLP has arrived for cracking encryption. Let’s say you wanted to know “How was X encrypted?” Ciphey was built to answer this question.

Under the hood:

“Ciphey uses a custom built artificial intelligence module (AuSearch) with a Cipher Detection Interface to approximate what something is encrypted with. And then a custom-built, customisable natural language processing Language Checker Interface, which can detect when the given text becomes plaintext.”
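The core idea behind that "Language Checker Interface" can be illustrated with a toy example (a hand-rolled sketch, not Ciphey's actual code): brute-force a Caesar cipher and keep the candidate that scores highest on a crude English-likeness check. The word list and scoring here are simplified assumptions for illustration.

```python
import string

# Toy stand-in for a real language model / dictionary check.
COMMON_WORDS = {"the", "and", "was", "with", "this", "that", "text",
                "hello", "world", "is", "a", "of"}

def caesar_shift(text, k):
    """Shift each letter k positions through the alphabet, leaving other chars alone."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def looks_like_english(text):
    """Crude 'language checker': fraction of tokens that are common English words."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in COMMON_WORDS for w in words) / len(words)

def crack_caesar(ciphertext):
    """Try all 26 shifts and return the candidate that reads most like plaintext."""
    return max((caesar_shift(ciphertext, k) for k in range(26)),
               key=looks_like_english)
```

Ciphey generalizes this loop across many cipher families, using AuSearch to decide which decoders are worth trying first.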

Recent papers you need to read:

These three papers cover prompting, question answering and the fragility of evaluation benchmarks.

Pre-train, Prompt, and Predict: A Systematic Survey of
Prompting Methods in Natural Language Processing

QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

The Benchmark Lottery

Collection of Repos for Parsing PDFs

Document Layout Analysis resources for development with PdfPig. It’s in C#, sorry Python lovers.

Resources 📚

New NLP Videos by Dan Jurafsky dropped:

Machine learning education content aggregated from 1,300 questions asked in an ML course.

Pretty cool site with very simple and intuitive answers to technical ML questions. If you are looking for more math-heavy material, look elsewhere.

Here’s an example:

What do dropout layers do?

Dropout layers throw things away. Now you might ask: why would I want my model to throw data away? It turns out that randomly dropping activations during training can drastically improve a model’s performance at test time (where nothing is thrown away).

When to use dropout layers?

When you feel like your model is overfitting the input, increase the dropout probability. Dropout is often applied generously because it usually makes a model more robust to noisy inputs.

https://rentruewang.github.io/learning-machine/intro.html

Free PDF download for 2nd edition of Introduction to Stat Learning

CI/CD Tools Review Used in Machine Learning

A breakdown of the most used tools for CI/CD, including free and paid variants. You know you love Jenkins. (just saying 😂)

Stack Overflow Developer Survey

Breaks down tech stacks by median salary, among other things 😎…

Summary Explorer: For Exploring Datasets and Models for Summarization

Get access to 50+ summarization models, including their papers, repos and ROUGE scores. (In addition to visualizing a few summarization datasets.)
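For reference, the ROUGE-1 F1 scores reported there measure unigram overlap between a generated and a reference summary. A minimal sketch (illustrative only; real evaluations use stemming and the full ROUGE-1/2/L family):

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a candidate summary and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```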

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

EmailSum Dataset

The Email Thread Summarization (EmailSum) dataset contains human-annotated short summaries of 2,549 email threads (each containing 3 to 10 emails) across a wide variety of topics.

Connected Papers 📈

We build amazing NLP software for companies worldwide. If you are looking for software development, check out our site and reach out to us here: info [at] quantumstat com


Ricky Costa

Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟