Hey, Welcome Back!
We have a new CLIP implementation from Max Woolf. It allows for faster experimentation and adds new features such as weighted prompts and icon-based priming of the model to improve generation quality. It was released today, so try it out; it’s trippy.
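For readers wondering what "weighted prompts" means in CLIP-guided generation, here is a minimal sketch of one common formulation. We have not checked that Max Woolf's code uses this exact formula: each prompt contributes the cosine similarity between the image embedding and its text embedding, scaled by a weight, and a negative weight pushes the image away from that prompt. The 2-D vectors are hypothetical stand-ins for real CLIP embeddings, and `weighted_prompt_loss` is an illustrative name, not a library API.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def weighted_prompt_loss(image_emb: Sequence[float],
                         prompts: List[Tuple[Sequence[float], float]]) -> float:
    """Loss to minimize: -sum_i w_i * cos(image, text_i).

    Positive weights pull the image toward a prompt; negative weights push it away.
    """
    return -sum(weight * cosine(image_emb, text_emb) for text_emb, weight in prompts)


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings" standing in for real CLIP outputs.
    prompts = [([1.0, 0.0], 1.0),   # main prompt, full weight
               ([0.0, 1.0], 0.5)]   # secondary prompt, half weight
    print(weighted_prompt_loss([1.0, 0.0], prompts))  # -1.0 (matches the main prompt)
    print(weighted_prompt_loss([0.0, 1.0], prompts))  # -0.5 (matches the secondary prompt)
```

In a real pipeline the image embedding would come from CLIP's image encoder on the current generated image, and an optimizer would update the generator to lower this loss.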
Textual (Text User Interface)
From Will McGugan, maker of the Rich library, comes Textual, a new project for building some amazing apps in the terminal. 😎😎
GitHub - willmcgugan/textual: Textual is a TUI (Text User Interface) framework for Python inspired…
Textual is a TUI (Text User Interface) framework for Python inspired by modern web development. Currently a work in…
Ciphey | NLP in Encryption
Looks like NLP has arrived for cracking encryption. Let’s say you wanted to know “How was X encrypted?” Ciphey was built to answer this question.
Under the hood:
“Ciphey uses a custom built artificial intelligence module (AuSearch) with a Cipher Detection Interface to approximate what something is encrypted with. And then a custom-built, customisable natural language processing Language Checker Interface, which can detect when the given text becomes plaintext.”
GitHub - Ciphey/Ciphey: ⚡ Automatically decrypt encryptions without knowing the key or cipher…
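Ciphey's modules are far more general, but the loop the quote describes (search over candidate decryptions, and let a language checker decide when the output has become plaintext) can be sketched in a few lines. This toy version is our own illustration, not Ciphey's code: it only handles the Caesar cipher, and the chi-squared letter-frequency score, the approximate English frequency table, and the 150.0 threshold in the hypothetical `is_probably_english` helper are stand-ins for Ciphey's Language Checker Interface.

```python
import string
from typing import Tuple

# Approximate relative frequencies (%) of a..z in English text.
ENGLISH_FREQ = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
                6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074]


def caesar_shift(text: str, shift: int) -> str:
    """Rotate ASCII letters by `shift` positions, leaving everything else untouched."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def chi_squared_english(text: str) -> float:
    """Toy language checker: a lower score means letter counts look more like English."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    if not letters:
        return float("inf")
    n = len(letters)
    score = 0.0
    for letter, freq in zip(string.ascii_lowercase, ENGLISH_FREQ):
        expected = n * freq / 100
        score += (letters.count(letter) - expected) ** 2 / expected
    return score


def is_probably_english(text: str, max_chi2: float = 150.0) -> bool:
    """Hypothetical threshold for deciding the text has 'become plaintext'."""
    return chi_squared_english(text) < max_chi2


def crack_caesar(ciphertext: str) -> Tuple[int, str, float]:
    """Try all 26 keys and keep the candidate the checker scores as most English-like."""
    scored = []
    for key in range(26):
        candidate = caesar_shift(ciphertext, -key)
        scored.append((key, candidate, chi_squared_english(candidate)))
    return min(scored, key=lambda item: item[2])


if __name__ == "__main__":
    plain = ("it was the best of times it was the worst of times "
             "it was the age of wisdom it was the age of foolishness")
    secret = caesar_shift(plain, 3)
    key, recovered, score = crack_caesar(secret)
    print(key, recovered)  # 3 it was the best of times ...
```

Brute force is only viable here because the Caesar key space has 26 entries; for richer cipher families a tool needs a smarter search, which is the role the quote gives to AuSearch.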
Recent papers you need to read:
These three papers cover prompting, question answering and the fragility of evaluation benchmarks.
Collection of Repos for Parsing PDFs
Document Layout Analysis resources for development with PdfPig. It’s in C#, sorry Python lovers.
GitHub - BobLd/DocumentLayoutAnalysis: Document Layout Analysis resources repos for development…
Document Layout Analysis repos for development with PdfPig. From Wikipedia: Document layout analysis is the process of…
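For the Python lovers, here is a rough sketch of recursive XY-cut, one of the classic page-segmentation algorithms (PdfPig has its own C# implementation). This is our simplified take, not a port of PdfPig's code. It assumes word bounding boxes as hypothetical `(x0, y0, x1, y1)` tuples with the origin at the top-left, and it splits the page at the widest whitespace band, recursing until no band is at least `min_gap` wide.

```python
from typing import List, Optional, Tuple

# Hypothetical word box: (x0, y0, x1, y1), origin at top-left, y grows downward.
Box = Tuple[float, float, float, float]


def widest_gap(boxes: List[Box], axis: int) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the widest empty band along an axis (0 = x, 1 = y), or None."""
    spans = sorted((b[axis], b[axis + 2]) for b in boxes)
    best: Optional[Tuple[float, float]] = None
    reach = spans[0][1]  # furthest edge covered by the boxes seen so far
    for lo, hi in spans[1:]:
        if lo > reach and (best is None or lo - reach > best[1] - best[0]):
            best = (reach, lo)
        reach = max(reach, hi)
    return best


def xy_cut(boxes: List[Box], min_gap: float = 8.0) -> List[List[Box]]:
    """Recursively split boxes at the widest whitespace gap; each leaf is a layout block."""
    if len(boxes) <= 1:
        return [boxes]
    cuts = []
    for axis in (1, 0):  # y (horizontal cut) listed first so it wins ties in max()
        gap = widest_gap(boxes, axis)
        if gap is not None and gap[1] - gap[0] >= min_gap:
            cuts.append((gap[1] - gap[0], axis, gap))
    if not cuts:
        return [boxes]  # no gap wide enough: treat the group as one block
    _, axis, (_, end) = max(cuts, key=lambda c: c[0])
    # No box crosses the gap, so each box lies wholly before it or wholly after it.
    before = [b for b in boxes if b[axis] < end]
    after = [b for b in boxes if b[axis] >= end]
    return xy_cut(before, min_gap) + xy_cut(after, min_gap)


if __name__ == "__main__":
    # Two columns with two text lines each (hypothetical coordinates).
    words = [(0, 0, 40, 10), (0, 15, 40, 25), (60, 0, 100, 10), (60, 15, 100, 25)]
    print(xy_cut(words, min_gap=8))  # two blocks, one per column
```

`min_gap` controls granularity: with `min_gap=5` the same input splits further into four single-line blocks, because the 5-unit line spacing then counts as a cut.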
New NLP Videos by Dan Jurafsky dropped:
Machine learning education content built by aggregating 1,300 questions from an ML course.
A pretty cool site with simple, intuitive answers to technical ML questions. If you’re looking for math-heavy material, go elsewhere.
Here’s an example:
What do dropout layers do?
Dropout layers throw things away. Now you may be asking: why would I want my model to throw data away? It turns out that randomly throwing things away while training a model can drastically improve its performance at test time (when nothing is thrown away).
When to use dropout layers?
When you feel your model is overfitting the input, make the dropout probability higher. Often you use as much dropout as you can before the model starts to underfit, because dropout usually makes a model more robust to noisy inputs.
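The train/test split described above can be sketched with "inverted" dropout, the variant PyTorch's `nn.Dropout` uses: during training each activation is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, so the expected activation is unchanged and nothing needs rescaling at test time. This plain-Python version is illustrative; the `dropout` function name and signature are our own, not a library API.

```python
import random
from typing import List, Optional


def dropout(x: List[float], p: float, training: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each element with probability p during training and
    scale survivors by 1 / (1 - p); at test time the input passes through untouched."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(x)  # test time: nothing is thrown away
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # rng.random() >= p happens with probability 1 - p, i.e. the element is kept.
    return [v * scale if rng.random() >= p else 0.0 for v in x]


if __name__ == "__main__":
    rng = random.Random(0)
    activations = [1.0] * 10
    print(dropout(activations, p=0.5, training=True, rng=rng))  # mix of 0.0 and 2.0
    print(dropout(activations, p=0.5, training=False))          # unchanged
```

Raising `p` is the "make the probability of dropping out higher" knob from the answer above.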
Free PDF download of the 2nd edition of An Introduction to Statistical Learning
An Introduction to Statistical Learning
Winner of the 2014 Eric Ziegel award from Technometrics. As the scale and scope of data collection continue to increase…
Review of CI/CD Tools Used in Machine Learning
A breakdown of the most-used CI/CD tools, including free and paid options. You know you love Jenkins. (just saying 😂)
Continuous Integration and Continuous Deployment (CI/CD) Tools for Machine Learning - neptune.ai
In modern software development teams, continuous integration (CI) and continuous deployment (CD) are standard…
Stack Overflow Developer Survey
Breaks down tech stacks by median salary, among other things 😎…
Stack Overflow Developer Survey 2021
Even though Engineering managers, SREs, DevOps specialist roles pay the most, we see they also have, on average, over…
Summary Explorer: For Exploring Datasets and Models for Summarization
Browse 50+ summarization models along with their papers, repos and ROUGE scores, plus visualizations of a few summarization datasets.
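The ROUGE scores listed for each model measure n-gram overlap between a generated summary and a human reference. Below is a simplified ROUGE-N sketch, assuming plain lowercased whitespace tokenization; the official ROUGE scripts and Google's `rouge-score` package add their own tokenization and optional stemming, so numbers from this sketch won't match published scores exactly.

```python
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Dict[str, float]:
    """Simplified ROUGE-N: clipped n-gram overlap as precision, recall and F1."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # '&' keeps the min count of each n-gram
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    summary = "the cat sat on the mat"
    reference = "the cat is on the mat"
    print(rouge_n(summary, reference, n=1))  # 5 of 6 unigrams overlap
    print(rouge_n(summary, reference, n=2))  # 3 of 5 bigrams overlap
```

Recall-oriented ROUGE was the original emphasis, which is why many papers report recall or F1 rather than precision alone.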
Repo Cypher 👨💻
A collection of recently released repos that caught our 👁
MTVR, a large-scale multilingual video moment retrieval dataset, containing 218K English and Chinese queries from 21.8K TV show video clips.
GitHub - jayleicn/mTVRetrieval: [ACL 2021] mTVR: Multilingual Video Moment Retrieval
mTVR: Multilingual Moment Retrieval in Videos. ACL 2021 We introduce MTVR, a large-scale multilingual video moment…
The official implementation of StyleGAN-NADA, a non-adversarial domain adaptation for image generators. Includes Colab.
GitHub - rinongal/StyleGAN-nada
[Project Website] StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators Rinon Gal, Or Patashnik, Haggai…
InferWiki16k and InferWiki64k datasets for the knowledge graph completion task.
GitHub - TaoMiner/inferwiki
This is the dataset of the paper Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion…
Email Thread Summarization (EMAILSUM) dataset, which contains human-annotated short summaries of 2,549 email threads (each containing 3 to 10 emails) covering a wide variety of topics.
GitHub - ZhangShiyue/EmailSum: The data and code for EmailSum
This repository contains the data and code for the following paper: EmailSum: Abstractive Email Thread Summarization…
We build amazing NLP software for companies worldwide. If you are looking for software development, check out our site and reach out to us here: info [at] quantumstat com