James Briggs

All you need to create a custom tokenizer using HF transformers

Image by author

Now, of course, there’s always some complication. Maybe the data has some strange property that neither you nor anyone else has ever seen before, one that makes the data a nightmare to preprocess. But as for model setup, we can usually get going with an existing pre-trained model.

Now, that’s great, but what if there is no pre-trained model that aligns with our specific requirements?

Maybe we’d like our model to understand a less common language. For example, how many transformer models out there have been trained on Piemontese…

Quick-fire guide to training a transformer

Form like this requires pretraining — image by author.

Sometimes, this is all we need — we take the model and roll with it as is.

But at other times, we find that we really need to fine-tune the model. We need to train it a little bit more on our specific use case.

Each transformer model is different, and fine-tuning for different use-cases is different too — so we’ll…

A rundown of the coolest features

Pattern by Alexander Ant on Unsplash

We’ll cover some of the most interesting additions to Python: structural pattern matching, parenthesized context managers, more typing features, and the new and improved error messages.

Structural Pattern Matching

Structural pattern matching is an incredible feature to be added to Python — truly awesome.

Imagine an if-else statement that looks like this:

If-else logic in Python 3.9

You take that and you modify the syntax so it looks more…

Easy fine-tuning with transformers and PyTorch

Deadlifts, BERT’s favorite. Image by author

Although NSP (and MLM) are used to pre-train BERT models, we can use these exact methods to fine-tune our models to better understand the specific style of language in our own use-cases.

So, in this article, we’ll cover exactly how we take an unstructured body of text, and use it to fine-tune a BERT model using NSP.


So, how can we fine-tune a model using NSP?

First, we need data. Because we’re essentially just switching between consecutive sentences…
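To make the data step a little more concrete, here is a minimal sketch of how sentence pairs for NSP could be built from a list of consecutive sentences. The function name `build_nsp_pairs`, the 50/50 split, and the seed are assumptions made for this example, not the article's exact code. The label convention (0 for a true next sentence, 1 for a random one) follows the one used by Hugging Face's `BertForNextSentencePrediction`.

```python
import random


def build_nsp_pairs(sentences, seed=0):
    """Return (sentence_a, sentence_b, label) triples.

    label 0 -> sentence_b really follows sentence_a
    label 1 -> sentence_b is a randomly chosen sentence
    """
    if len(sentences) < 3:
        raise ValueError("need at least 3 sentences to sample a random pair")

    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # keep the true consecutive sentence
            pairs.append((sentences[i], sentences[i + 1], 0))
        else:
            # swap in any sentence other than the current or the true next one
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((sentences[i], rng.choice(candidates), 1))
    return pairs
```

Note that this sketch treats sentences by position only, so a corpus with duplicate sentences could still produce a "random" pair that happens to match the true next sentence.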

The other half to pretraining BERT

Chest + tris with BERT — Image by author (sorry)

Where MLM teaches BERT to understand relationships between words, NSP teaches BERT to understand longer-term dependencies across sentences.

Without NSP, BERT performs worse on every single metric [1] — so it’s important.

Now, when we use a pre-trained BERT model, training with NSP and MLM has already been done, so why do we need to know about it?

Well, we can actually fine-tune these pre-trained BERT models so that they better understand the language used in…

Fine-tune your models on any dataset

BERT’s bidirectional biceps — image by author.

BERT has enjoyed unparalleled success in NLP thanks to two unique training approaches: masked-language modeling (MLM) and next sentence prediction (NSP).

In many cases, we might be able to take the pre-trained BERT model out-of-the-box and apply it successfully to our own language tasks.

But often, we might need to fine-tune the model.

How MLM works

Further training with MLM allows us to…
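For intuition, here is a simplified sketch of the masking step behind MLM, written in plain Python on a list of tokens. The function name `mask_tokens` and the token strings are assumptions for illustration. The 15% masking rate matches the original BERT recipe, but the full recipe also replaces some selected tokens with random tokens or leaves them unchanged, which this sketch skips.

```python
import random

MASK_TOKEN = "[MASK]"
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels).

    labels holds the original token at each masked position and
    None everywhere else, so the loss is only computed on masked tokens.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        # never mask special tokens such as [CLS] or [SEP]
        if tok not in SPECIAL_TOKENS and rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels
```

During training, the model sees `masked` as input and is asked to predict the original tokens stored in `labels`.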

Overpowered entity extraction using RoBERTa

Image by author

“Apple reached an all-time high stock price of 143 dollars this January.”

We might want to extract the key pieces of information — or ‘entities’ — and categorize each of those entities. Like so:

Apple — Organization

143 dollars — Monetary Value

this January — Date

For us humans, this is easy. But how can we teach a machine to distinguish between a Granny Smith apple and the Apple we trade on NASDAQ?

(No, we can’t rely on the ‘A’…

High-performance semantic similarity with BERT

Image by author

A big part of NLP relies on similarity in high-dimensional spaces. Typically, an NLP solution will take some text, process it to create a big vector/array representing that text, and then perform several transformations.

It’s high-dimensional magic.

Sentence similarity is one of the clearest examples of how powerful high-dimensional magic can be.

The logic is this:

  • Take a sentence, convert it into a…

Write code that goes the extra mile

Image by the author

Euclidean distance, dot product, and cosine similarity

Image by the author

When we convert language into a machine-readable format, the standard approach is to use dense vectors.

A neural network typically generates dense vectors. They allow us to convert words and sentences into high-dimensional vectors, organized so that each vector's geometric position carries meaning.
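To show how the three metrics in the title relate to one another, here is a small sketch using only the standard library. The vectors `a` and `b` are made-up toy values, not real embeddings, and the function names are chosen for this example.

```python
import math


def dot_product(a, b):
    # sum of element-wise products; grows with both direction and magnitude
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    # straight-line distance between the two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    # dot product normalized by both vector lengths, so only direction matters
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


# toy vectors: b points in the same direction as a but is twice as long
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
```

With these toy vectors, the cosine similarity is 1 (up to floating-point error) because the direction is identical, while the Euclidean distance is non-zero because the magnitudes differ.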

James Briggs

Data scientist learning and writing about everything.
