Hubert Robert

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

The NLP Cypher | 09.05.21

Omega

5 min read · Sep 5, 2021


Hey, welcome back! A flood of EMNLP 2021 papers came in this week, so today’s newsletter should be loads of fun! 😋

But first, a meme search engine:

The Missing Text Phenomenon

An article on The Gradient had an interesting take on NLU. It describes how a neural network’s capacity for NLU inference is inherently bounded by its background knowledge (which is usually highly limited relative to a human’s). The article calls this the missing text phenomenon (MTP). I would add a bit more nuance: this is mainly a problem for a model that is not localized for its user, meaning a model that wasn’t fine-tuned or prompted for a specific user. For information that is general and has a ground truth (e.g., rain is wet, or rain falls down to the ground), the MTP isn’t a big issue given a large enough dataset and model.

I think a bigger issue in text-only NLU is when the data doesn’t match the complexity of the real world, meaning there isn’t enough information in the text-only modality. Humans by default use a multi-modal approach (text, audio, visual, etc.) when interpreting the world around us, which helps us with inference. Multi-modal learning can be a viable approach to the MTP examples discussed in the article.

Document Parsing Goes Multi-Lingual

For those into document (PDF) parsing 👇. Includes the 2nd version of LayoutLM and also its multi-lingual cousin LayoutXLM.

…And there’s already a repo built on top of these models! 👌

Papers to Read 📚

https://arxiv.org/pdf/2108.13048.pdf
https://arxiv.org/pdf/2108.13300.pdf
https://arxiv.org/pdf/2108.08877.pdf
https://arxiv.org/pdf/2108.10197.pdf

StackOverflow Survey Full Dataset Released

I mentioned the highlights (the shorter version) in a previous newsletter; now you can get the full dataset:

GNN Intro

A long and awesome introduction to graph neural networks.

The Machine & Deep Learning Compendium

Holy Moly 🤯

The Compendium covers over 500 topics in ML and has been in the works for over 4 years. It’s now offered in an interactive web-based format.

Repo Cypher 👨‍💻

A collection of recently released repos that caught our 👁

FinQA | Financial Dataset

The dataset contains 8,281 financial QA pairs, along with their numerical reasoning processes. Eleven finance professionals collectively constructed FinQA based on the earnings reports of S&P 500 companies.

Connected Papers 📈
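To make the idea of a "numerical reasoning process" in FinQA a bit more concrete, here is a minimal sketch in Python. It assumes a reasoning process can be written as a short list of arithmetic steps, where a later step may refer back to an earlier result. The step format, the `run_steps` helper, and the example numbers are all hypothetical and chosen for illustration; they are not taken from the dataset or its repo.

```python
from typing import List, Tuple, Union

# Hypothetical step format (an assumption, not the dataset's schema):
# (operation, left operand, right operand). An operand is either a number
# or a string like "#0" that points at the result of an earlier step.
Operand = Union[float, int, str]
Step = Tuple[str, Operand, Operand]

OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}


def run_steps(steps: List[Step]) -> float:
    """Evaluate the steps in order and return the result of the last one."""
    if not steps:
        raise ValueError("at least one step is required")
    results: List[float] = []

    def resolve(operand: Operand) -> float:
        # "#k" means "reuse the result of step k"; anything else is a number.
        if isinstance(operand, str) and operand.startswith("#"):
            return results[int(operand[1:])]
        return float(operand)

    for op, left, right in steps:
        results.append(OPS[op](resolve(left), resolve(right)))
    return results[-1]


# Made-up question: revenue went from 200 to 250; what was the relative change?
steps = [("subtract", 250, 200), ("divide", "#0", 200)]
answer = run_steps(steps)
print(answer)
```

Keeping each step explicit is what lets a reader (or a model evaluation script) check the reasoning and not just the final number.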

Thermostat

Thermostat is a large collection of NLP model explanations and accompanying analysis tools.

  • Combines explainability methods from the captum library with Hugging Face's datasets and transformers.
  • Mitigates repetitive execution of common experiments in Explainable NLP and thus reduces the environmental impact and financial roadblocks.
  • Increases comparability and replicability of research.
  • Reduces the implementational burden.

Connected Papers 📈

Emotion Recognition in Conversation (ERC)

EmoBERTa can learn intra- and inter-speaker states and context to predict the emotion of the current speaker in an end-to-end manner.

Connected Papers 📈
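For the ERC entry above, one simple way to expose speaker identity and conversational context to a single text classifier is to flatten the dialogue into one string: prefix each utterance with its speaker's name and place a few past and future turns around the current one. The sketch below illustrates that idea only. The `build_input` helper, the window sizes, and the separator string are assumptions made for illustration, not the EmoBERTa repo's actual preprocessing.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker name, utterance text)


def build_input(
    dialogue: List[Turn],
    index: int,
    n_past: int = 2,
    n_future: int = 1,
    sep: str = " </s> ",  # assumed separator; a real tokenizer defines its own
) -> str:
    """Return 'past <sep> current <sep> future' for the turn at `index`."""

    def fmt(turn: Turn) -> str:
        speaker, text = turn
        return f"{speaker}: {text}"

    # Slice out the context windows around the current turn.
    past = dialogue[max(0, index - n_past):index]
    future = dialogue[index + 1:index + 1 + n_future]

    segments = [
        " ".join(fmt(t) for t in past),
        fmt(dialogue[index]),
        " ".join(fmt(t) for t in future),
    ]
    return sep.join(segments)


dialogue = [
    ("Joey", "Hey!"),
    ("Rachel", "What happened?"),
    ("Joey", "I lost my job."),
    ("Rachel", "Oh no."),
]
text = build_input(dialogue, index=2)
print(text)
```

The resulting string would then be tokenized and passed to a sequence classifier, so the same model sees both who is speaking and what was said around the current turn.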

WebQA: Multihop and Multimodal QA

WebQA is a new benchmark for multi-modal, multi-hop reasoning in which systems are presented with the same style of data as humans when searching the web: snippets and images.

Connected Papers 📈

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

For complete coverage, follow our Twitter: @Quantum_Stat

Quantum Stat


Ricky Costa

Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟