James Briggs
The articles and videos that make up the course

For the past few months I’ve been working with Pinecone on a series of articles and videos covering the essentials of vector similarity search.

The course introduces the idea and theory behind vector search, how to implement several algorithms in plain Python, and how to implement everything we learn efficiently…

Photo by Jeremy Bezanger on Unsplash

Question-answering (Q&A) transformers are widely applicable, insanely cool applications of modern NLP.

At first glance, most of us would view building something like this as a feat of great difficulty. Fortunately, most of us would be wrong.

Transformers are — despite their incredible performance — surprisingly straightforward to train or fine-tune…

Getting Started


Taking those first steps into interacting with the web using Python can seem daunting, but it need not be. It is a surprisingly simple process, with well-established rules and guidelines.

We’ll cover the absolute essentials for getting started, including:

- Application Programming Interfaces (APIs)
- JavaScript Object Notation (JSON)
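
As a minimal sketch of how these two ideas meet in Python, the snippet below parses a JSON string of the kind a web API might return. The payload and its field names are made up for illustration, and no real API is called.

```python
import json

# Hypothetical response body, shaped like something a web API could return.
# The field names here are illustrative assumptions, not a real service.
response_text = '{"user": "example", "followers": 42, "tags": ["python", "api"]}'

# json.loads converts JSON text into Python dicts, lists, strings and numbers.
data = json.loads(response_text)

print(data["user"])
print(data["tags"][0])

# json.dumps goes the other way, for example to build a request body.
request_body = json.dumps({"query": "python", "limit": 5})
print(request_body)
```

In practice the JSON text would come from an HTTP response rather than a hard-coded string, but the parsing step looks the same.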


HF Datasets is an essential tool for NLP practitioners, hosting over 1.4K (mainly) high-quality language-focused datasets and an easy-to-use treasure trove of functions for building efficient pre-processing pipelines.

This article will look at the massive repository of datasets available and explore some of the library's brilliant data processing capabilities.

Notes from Industry

Scalable search with Facebook AI — original article on Pinecone.io — image by author

Facebook AI Similarity Search (Faiss) is one of the most popular implementations of efficient similarity search, but what is it — and how can we use it?

What is it that makes Faiss special? How do we make the best use of this incredible tool?

Fortunately, it’s a brilliantly simple…


HuggingFace’s transformers library is the de facto standard for NLP. Used by practitioners worldwide, it’s powerful, flexible, and easy to use. It achieves this through a fairly large (and complex) codebase, which has resulted in the question:

Why are there so many tokenization methods in HuggingFace transformers?

Tokenization is the…

Image by author — original article on Pinecone.io

Similarity search is one of the fastest-growing domains in AI and machine learning. At its core, it is the process of matching relevant pieces of information together.
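
To make "matching relevant pieces of information" a little more concrete, here is a rough sketch of brute-force similarity search in plain Python. It assumes items are already represented as small numeric vectors and uses cosine similarity as the score. The vectors, the names, and the `search` helper are all invented for illustration and are not taken from the article.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query, vectors, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Toy, hand-written vectors (hypothetical; real systems use learned embeddings).
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
}

results = search([1.0, 0.0, 0.0], vectors, k=2)
print(results)
```

This compares the query against every stored vector, which is fine for a handful of items but becomes slow at scale.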

There’s a strong chance that you found this article through a search engine — most likely Google. Maybe you searched something like “what…

James Briggs

Freelance ML engineer learning and writing about everything.
