James Briggs
The articles and videos that make up the course

For the past few months I’ve been working with Pinecone on a series of articles and videos covering the essentials of vector similarity search.

The course introduces the idea and theory behind vector search, how to implement several algorithms in plain Python, and how to implement everything we learn efficiently…

Image by author

I’ve been discussing NLP with Ismail Ashraq from the Maldives, a beautiful archipelago in the Indian Ocean with an incredibly stable average low-to-high temperature of 25.2–31.6°C (for those of you stuck in the 18th century, that’s 77.4–88.9°F).

Ashraq introduced me to the language of Dhivehi (or Maldivian), which is fascinating. It…

Photo by Jeremy Bezanger on Unsplash

Question-answering (Q&A) transformers are widely applicable, insanely cool applications of modern NLP.

At first glance, most of us would view building something like this as a feat of great difficulty. Fortunately, most of us would be wrong.

Transformers are — despite their incredible performance — surprisingly straightforward to train or fine-tune…

Getting Started

Image by author

Taking those first steps into interacting with the web using Python can seem daunting, but it need not be. It is a surprisingly simple process, with well-established rules and guidelines.

We’ll cover the absolute essentials for getting started, including:

- Application Programming Interfaces (APIs)
- JavaScript Object Notation (JSON)
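
As a small taste of the second item, here is a minimal sketch of turning a JSON string into Python objects with the standard-library `json` module. The response body, the field names, and the `parse_response` helper are all made up for illustration; they do not come from any real API.

```python
import json

# Hypothetical response body, shaped like something a web API might return.
# The fields ("user", "languages", "active") are invented for this sketch.
raw_body = '{"user": {"name": "Ada", "languages": ["Python", "JavaScript"]}, "active": true}'


def parse_response(body: str) -> dict:
    """Decode a JSON string into plain Python dicts, lists, strings and booleans."""
    return json.loads(body)


data = parse_response(raw_body)

# JSON objects become dicts and JSON arrays become lists, so normal indexing works.
name = data["user"]["name"]
languages = data["user"]["languages"]

# Going the other way: serialize the Python objects back into a JSON string.
round_trip = json.dumps(data, sort_keys=True)
```

In practice the string would arrive as the body of an HTTP response rather than a hard-coded literal, but the decoding step looks the same.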

Image by author

HF Datasets is an essential tool for NLP practitioners, hosting over 1.4K (mainly) high-quality language-focused datasets and offering an easy-to-use treasure trove of functions for building efficient pre-processing pipelines.

This article will look at the massive repository of datasets available and explore some of the library's brilliant data processing capabilities.

Hands-on Tutorials

Image by author

Building a transformer model from scratch is often the only option for more specific use cases. Although BERT and other transformer models have been pre-trained for many languages and domains, they do not cover everything.

Often, these less common use cases stand to gain the most from having…

Notes from Industry

Scalable search with Facebook AI — original article on Pinecone.io — image by author

Facebook AI Similarity Search (Faiss) is one of the most popular implementations of efficient similarity search, but what is it — and how can we use it?

What is it that makes Faiss special? How do we make the best use of this incredible tool?

Fortunately, it’s a brilliantly simple…

Image by author

HuggingFace’s transformers library is the de facto standard for NLP. Used by practitioners worldwide, it’s powerful, flexible, and easy to use. It achieves this through a fairly large (and complex) codebase, which has resulted in the question:

Why are there so many tokenization methods in HuggingFace transformers?

Tokenization is the…

James Briggs

Freelance ML engineer learning and writing about everything.
