Do the markets reflect rational behavior or human irrationality? Mass psychology may not be the only factor driving the markets, but its effect is unquestionably significant.
This fascinating quality is something that we can measure and use to predict market movement with surprising accuracy.
With the real-time information available to us on massive social media platforms like Twitter, we have all the data we could ever need to create these predictions. …
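To make that concrete, here is a toy sketch (mine, not the method from the article) of how tweet sentiment might be turned into a crude market-mood signal, assuming the Hugging Face transformers library and its sentiment-analysis pipeline:

```python
# Toy sketch: score a batch of tweets with a pretrained sentiment model,
# then average the signed scores into a single crowd-mood number.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")

tweets = [
    "Earnings beat expectations, loving this rally!",
    "Selling everything, this market looks grim.",
]

# map POSITIVE/NEGATIVE labels to signed scores and average them
scores = [
    r["score"] if r["label"] == "POSITIVE" else -r["score"]
    for r in sentiment(tweets)
]
mood = sum(scores) / len(scores)  # > 0 leans bullish, < 0 leans bearish (toy heuristic)
print(f"crowd mood: {mood:+.2f}")
```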
How many lines of code does it take to use one of Google AI’s top-performing language models and apply it to your own text summarization project?
Or how about intelligent language generation? You start writing, and OpenAI’s GPT-2 finishes — how many lines of code?
Must be a lot? No — both can be built in just seven lines of Python.
In my opinion, that is absolutely ridiculous. Seven lines of code to apply models that are the culmination of decades of work, produced by some of the most intelligent people on Earth, using millions of dollars of research funds.
Already, my mind is blown. It’s incredible, but it’s true. …
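For a sense of what those seven lines might look like, here is a minimal summarization sketch using the Hugging Face transformers pipeline API (one plausible route to the claim; the article’s exact code may differ):

```python
from transformers import pipeline

# load a pretrained summarization model (weights download on first run)
summarizer = pipeline("summarization")

text = (
    "Transformers are deep learning models that rely on attention to process "
    "language. They power modern summarization, translation, and generation."
)
summary = summarizer(text, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```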
Kaggle is the world’s platform for everything data science: a strange social network full of data scientists, with Jupyter notebooks everywhere.
It’s a great platform to learn and compete, thanks to the huge number of competitions posted by companies looking for solutions to their data science problems without spending too much.
This ecosystem unsurprisingly produces a lot of datasets — which is why you’re here. You want to download data from Kaggle with Python, and that is exactly what we will do.
If you prefer video, we cover the Kaggle API setup and use here…
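As a preview, here is a minimal sketch of the download step using the official kaggle package, assuming you have already placed your kaggle.json API token in ~/.kaggle/ (the dataset slug below is just a well-known public example):

```python
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads the kaggle.json token from ~/.kaggle/

# download and unzip a public dataset into the current directory
api.dataset_download_files("zynicide/wine-reviews", path=".", unzip=True)
```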
Transformers are, without a doubt, one of the biggest advances in NLP in the past decade. They have (quite fittingly) transformed the landscape of language-based ML.
Despite this, there are no built-in implementations of transformer models in the core TensorFlow or PyTorch frameworks. To use them, you either need to apply for the relevant Ph.D. program (we’ll see you in three years), or you `pip install transformers`.
That is simplifying the process a little, but in reality it is incredibly easy to get up and running with some of the most cutting-edge models out there (think BERT and GPT-2). …
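As a quick illustration of how little code that takes, here is a sketch of GPT-2 text generation via the transformers pipeline API (the model name and parameters are standard, but treat this as a sketch rather than the article’s code):

```python
from transformers import pipeline

# load GPT-2 for text generation (weights download on first use)
generator = pipeline("text-generation", model="gpt2")

prompt = "Transformers have changed NLP because"
result = generator(prompt, max_length=40, num_return_sequences=1)
print(result[0]["generated_text"])
```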
The second alpha version of Python 3.10 was released at the beginning of November, and with it we get a glimpse of what’s next for Python.
Some exciting moves are being made that will likely push the Python ecosystem towards more explicit, readable code, while maintaining the ease of use that we all know and love.
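The full feature list is in the article, but as one example of that push towards more explicit, readable code: Python 3.10 implements PEP 604, which lets you write union type hints with the `|` operator (this example is mine, not necessarily one from the article):

```python
# Python 3.10+ (PEP 604): write unions with `|` instead of typing.Union/Optional
def parse_id(raw: str) -> int | None:
    return int(raw) if raw.isdigit() else None

print(parse_id("42"))   # 42
print(parse_id("n/a"))  # None
```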
Data pipelines are the less glamorous but still fundamental building blocks in any scalable, production-quality ML solution. Indeed, the vast majority of ML is actually data wrangling — so it makes sense that a strong pipeline is a big factor in building strong solutions.
Focusing on TensorFlow 2, we have a wonderful thing called the `Dataset` object built into the library. Using `Dataset` objects, we can design efficient data pipelines with significantly less effort; the result is a cleaner, more logical, and highly optimized pipeline.
We will be doing a deep dive on the `Dataset` object. …
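As a taste of what the deep dive covers, here is a small self-contained sketch of a typical `Dataset` pipeline (toy data; the exact steps in the article may differ):

```python
import numpy as np
import tensorflow as tf

# toy data: 1,000 8x8 "images" with pixel values 0-255, plus integer labels
features = np.random.randint(0, 256, size=(1000, 8, 8)).astype("float32")
labels = np.random.randint(0, 10, size=1000)

# build the pipeline: slice -> shuffle -> normalize -> batch -> prefetch
data = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=1024)
    .map(lambda x, y: (x / 255.0, y))
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

for batch_x, batch_y in data.take(1):
    print(batch_x.shape, batch_y.shape)  # (32, 8, 8) (32,)
```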
GPT-3, BERT, XLNet, all of these are the current state of the art in natural language processing (NLP) — and all of them use a special architecture component called a Transformer.
Without a doubt, transformers are one of the biggest developments in AI in the past decade — bringing us ever closer to the unimaginable future of humanity.
Nonetheless, despite their popularity, transformers can seem confusing at first. Part of this is because many ‘introductions’ to transformers skip the context, which is just as important as the model architecture itself.
How can anyone be expected to grasp the concept of a transformer without first understanding attention and where attention came from? …
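To ground the term before going further, here is a toy NumPy illustration of scaled dot-product attention, the operation at the heart of every transformer layer (an illustration of mine, not code from the article):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted mix of values

Q = np.random.rand(4, 8)  # 4 queries of dimension 8
K = np.random.rand(6, 8)  # 6 keys
V = np.random.rand(6, 8)  # 6 values
print(attention(Q, K, V).shape)  # (4, 8): one blended value per query
```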
API development is a huge domain with a thriving ecosystem. Unsurprisingly, Python is a big player in this space.
That also means there are many frameworks designed for, or supporting, API development in Python. A few big names seem to dominate, though, like Flask, Django, and FastAPI.
Both Flask and Django are general web development frameworks, and they’re great at what they do — API development included.
FastAPI, on the other hand, is a smaller project focused solely on API development, and in that domain it simply excels. …
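To show what that focus buys you, here is a minimal FastAPI sketch (the endpoint and field names are mine): run it with `uvicorn main:app --reload` and you get automatic validation plus interactive docs at /docs for free.

```python
from typing import Optional

from fastapi import FastAPI

app = FastAPI()

@app.get("/items/{item_id}")
def read_item(item_id: int, q: Optional[str] = None):
    # FastAPI parses item_id as an int and q from the query string,
    # returning a 422 error automatically if validation fails
    return {"item_id": item_id, "q": q}
```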
K-means clustering is an unsupervised ML algorithm that we can use to split our dataset into logical groupings — called clusters. Because it is unsupervised, we don’t need to rely on having labeled data to train with.
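A quick sketch with scikit-learn’s KMeans shows the idea (one standard implementation, not necessarily the article’s): fit on unlabeled points, read off cluster assignments and centroids.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)  # 200 unlabeled 2-D points

# fit k-means with k=3 clusters; no labels needed
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.labels_[:10])       # cluster assignment for the first 10 points
print(kmeans.cluster_centers_)   # the three learned centroids
```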
Attention is all you need. That is the name of the 2017 paper that introduced attention as an independent learning model, the herald of our now transformer-dominant world in natural language processing (NLP).
Transformers are the new cutting-edge in NLP, and they may seem somewhat abstract — but when we look at the past decade of developments in NLP they begin to make sense.
We will cover these developments and look at how they led to the transformers used today. …