## XLNet Fine-Tuning Tutorial with PyTorch

## BERT Fine-Tuning Tutorial with PyTorch

## BERT Word Embeddings Tutorial

## Broyden’s Method in Python

## Root-Finding Algorithms Tutorial in Python: Line Search, Bisection, Secant, Newton-Raphson, Inverse Quadratic Interpolation, Brent’s Method

## Statistical Learning Theory: VC Dimension, Structural Risk Minimization

## DropConnect Implementation in Python and TensorFlow

## Style Transfer with Tensorflow

## Flexible Python: Product of a List

## The Box-Cox Transformation

Another one! This is nearly the same as the BERT fine-tuning post but uses the updated huggingface library. (There are also a few differences in preprocessing XLNet requires.)

Here’s another post I co-authored with Chris McCormick on how to quickly and easily create a SOTA text classifier by fine-tuning BERT in PyTorch. This transfer learning approach is well worth a look if you’re interested in creating a high-performance NLP model.

Please check out the post I co-authored with Chris McCormick on BERT Word Embeddings here. In it, we take an in-depth look at the word embeddings produced by BERT, show you how to create your own in a Google Colab notebook, and share tips on how to implement and use these embeddings in your production pipeline. Check it out!

In a previous post we looked at root-finding methods for single-variable equations. In this post we’ll extend quasi-Newton methods to the multivariable case and look at one of the more widely used algorithms today: Broyden’s Method.
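To make the idea concrete, here is a minimal sketch of the "good" Broyden update in NumPy. It is an illustration under stated assumptions, not the code from the post: the names `broyden`, `finite_difference_jacobian`, and `system` are made up for this example, and the initial Jacobian is estimated once by finite differences.

```python
import numpy as np

def finite_difference_jacobian(f, x, eps=1e-6):
    """Estimate the Jacobian of f at x with forward differences (assumed helper)."""
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - fx) / eps
    return J

def broyden(f, x0, tol=1e-10, max_iter=50):
    """Find a root of f: R^n -> R^n using Broyden's rank-one Jacobian updates."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    J = finite_difference_jacobian(f, x)  # computed once, then only updated
    for _ in range(max_iter):
        dx = np.linalg.solve(J, -fx)      # quasi-Newton step
        x_new = x + dx
        f_new = f(x_new)
        if np.linalg.norm(f_new) < tol:
            return x_new
        df = f_new - fx
        # Rank-one update so that the new J satisfies the secant condition J @ dx = df
        J = J + np.outer(df - J @ dx, dx) / (dx @ dx)
        x, fx = x_new, f_new
    raise RuntimeError("Broyden iteration did not converge")

def system(v):
    """Example system: a circle of radius sqrt(2) intersected with the line x = y."""
    x, y = v
    return np.array([x**2 + y**2 - 2.0, x - y])

root = broyden(system, [1.5, 1.2])
print(root)
```

The point of the update is that the Jacobian is never recomputed after the first iteration; each step only needs one new evaluation of the function.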

**Motivation**

How do you find the roots of a continuous polynomial function? Well, if we want to find the roots of something like:

Sometimes our models overfit, sometimes they underfit.

A model’s **capacity** is, informally, its ability to fit a wide variety of functions. As a simple example, a linear regression model with a single parameter has a much lower capacity than a linear regression model with multiple polynomial parameters. Different datasets demand models of different capacity, and each time we apply a model to a dataset we run the risk of overfitting or underfitting our data.
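As a rough illustration of capacity, the sketch below fits polynomials of increasing degree to the same small noisy dataset. The data-generating function, noise level, and degrees are all assumptions chosen for this example. For least-squares fits, training error cannot increase as the degree grows; how the held-out error behaves depends on the data.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples from a quadratic (an assumed toy target)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.2, size=n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

results = {}
for degree in (1, 2, 9):  # low, matching, and high capacity
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")
```

Comparing the two columns of the printout is the simplest way to see whether a given degree is too small or too large for this dataset.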


I wouldn’t expect DropConnect to appear in TensorFlow, Keras, or Theano since, as far as I know, it’s used pretty rarely and seems neither as well studied as its cousin, Dropout, nor demonstrably more useful. However, there don’t seem to be any implementations out there, so I’ll provide a few ways of implementing it.
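For a sense of what such an implementation involves, here is a minimal NumPy sketch of a dense layer with DropConnect. It is not one of the post’s implementations: `dropconnect_dense` is a hypothetical name, only the weights (not the bias) are masked, and the `1 / keep_prob` rescaling is a simplification borrowed from inverted dropout rather than the inference scheme described in the DropConnect paper.

```python
import numpy as np

def dropconnect_dense(x, W, b, keep_prob=0.5, training=True, rng=None):
    """Dense layer that randomly drops individual weights during training.

    Dropout zeroes activations; DropConnect zeroes entries of W instead.
    """
    if not training:
        return x @ W + b  # assumption: use the full weights at inference time
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(W.shape) < keep_prob  # keep each weight with prob. keep_prob
    # Rescale so the expected pre-activation matches the unmasked layer
    return x @ (W * mask / keep_prob) + b

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))  # batch of 4 inputs with 3 features
W = rng.normal(size=(3, 2))
b = np.zeros(2)

train_out = dropconnect_dense(x, W, b, keep_prob=0.5, training=True, rng=rng)
eval_out = dropconnect_dense(x, W, b, training=False)
```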

“A Neural Algorithm of Artistic Style” is an accessible and intriguing paper about the distinction and separability of image content and image style using convolutional neural networks (CNNs). In this post we’ll explain the paper and then run a few of our own experiments.

To begin, consider van Gogh’s “The Starry Night”:

How many different ways can we multiply the elements of a variable-length list in Python?
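Three common answers are sketched below: an explicit loop, `functools.reduce` with `operator.mul`, and `math.prod` (available from Python 3.8). The function names are illustrative and not necessarily the ones used in the post.

```python
import math
import operator
from functools import reduce

def product_loop(values):
    """Multiply the elements with an explicit accumulator."""
    result = 1
    for value in values:
        result *= value
    return result

def product_reduce(values):
    """Fold the list with multiplication; 1 is the identity for an empty list."""
    return reduce(operator.mul, values, 1)

def product_math(values):
    """Use the standard-library helper (Python 3.8+)."""
    return math.prod(values)

numbers = [1, 2, 3, 4]
print(product_loop(numbers), product_reduce(numbers), product_math(numbers))
```

All three return 1 for an empty list, which is the conventional value of an empty product.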

The Box-Cox transformation is a family of power transform functions that are used to stabilize variance and make a dataset look more like a normal distribution. Lots of useful tools require normal-like data in order to be effective, so by using the Box-Cox transformation on your wonky-looking dataset you can then utilize some of these tools.

Here’s the transformation in its basic form. For value $x$ and parameter $\lambda$:
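The usual one-parameter form maps a positive value x to (x^λ − 1)/λ when λ ≠ 0 and to ln x when λ = 0. A minimal Python sketch of that definition follows; `box_cox` is an illustrative name, and the check for positive input reflects the standard requirement that the data be strictly positive.

```python
import math

def box_cox(x, lam):
    """One-parameter Box-Cox transform of a single positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)  # limiting case as lam -> 0
    return (x**lam - 1) / lam

values = [0.5, 1.0, 2.0, 4.0]
transformed = [box_cox(v, 0.5) for v in values]
print(transformed)
```

In practice λ is usually chosen by maximum likelihood; `scipy.stats.boxcox` does this when no value for λ is passed.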