Pipeline Abstractions for Deep Learning

A unifying open-source development framework for the entire PyTorch workflow


PADL makes working with deep learning models intuitive, simple and fun

Spend less time racking your brains about how to structure your deep learning projects, workflows and code, and more time focusing on the science.

An AI development standard
Simpler deployment of models to production
Build all computations and data into your pipeline
Save and link all objects needed by a model
Cross compatibility with the entire PyTorch and scientific python ecosystem
Greater reproducibility and lineage of experiments and models
... much more

A powerful extension of traditional models

PADL works together with PyTorch models for an optimal developer experience

Using and reproducing deep learning models in practice involves much more than writing individual PyTorch layers; it also means specifying how those layers intercommunicate, how inputs are prepared as tensors, which pieces of auxiliary data are needed to prepare those inputs, and how tensor outputs are post-processed into something useful for the target application. PADL's "Pipeline" captures all of these additional things.

word_predict = (
    clean
    >> tokenize
    >> to_tensor
    >> batch
    >> (dropout >> transformer) + right_shift
    >> cross_entropy_loss
)
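
A pipeline like this is applied to raw data end to end. A minimal sketch of a training loop, assuming sentences is a list of raw strings and an optimizer defined elsewhere, using the "train_apply" method shown further down this page:

for loss in word_predict.train_apply(sentences, batch_size=32):
    optimizer.zero_grad()     # "optimizer" is assumed to be defined elsewhere
    loss.backward()           # standard PyTorch backward pass
    optimizer.step()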




Functionality from experimentation to production

Convert Python code to PADL with one keyword

PADL allows developers to create composable and reusable blocks with a simple decorator: "transform". Decorated code becomes a "Transform" instance, a powerful abstraction encompassing the full range of computations required for the preprocessing, forward-pass and postprocessing steps; a "Transform" can also carry auxiliary data such as PyTorch layer weights, lookup tables and much more.

import torch

from padl import transform

@transform
def add(x, y):  # this is a Transform
    return x + y

transform(lambda x: x + 1000)  # this is a Transform

@transform
class MLP(torch.nn.Module):
    def __init__(self, n_in, hidden, n_out):
        ...

    def forward(self, x):
        ...

mlp = MLP(10, 10, 10)  # this is a Transform
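
Auxiliary data travels with the code that uses it. A minimal sketch, assuming a vocabulary lookup table captured from the enclosing scope (the names vocab and tokenize are hypothetical); PADL's serializer tracks such variables down and saves them alongside the pipeline:

vocab = {'hello': 1, 'world': 2}  # a lookup table as auxiliary data

@transform
def tokenize(text):
    # the lookup table is captured from the enclosing scope and
    # saved together with the code when the pipeline is serialized
    return [vocab.get(word, 0) for word in text.split()]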

Link distinct steps of pre-processing, forward pass and post-processing

"Transform" instances may be conveniently linked to become "Pipelines" using a small set of functional operators >> , + , / and ~. This allows developers to manage complex branching and interdependent models with ease. Pre-processing, forward pass and post-processing are concisely delineated using batch and unbatch

from padl import batch, transform, unbatch
from torchvision import models, transforms

# note: the torchvision callables are assumed to be wrapped as
# Transforms (e.g. with padl's transform) before composing
my_classifier_transform = (
    load_image                 # preprocessing ...
    >> transforms.ToTensor()   #
    >> batch                   # ... stage
    >> models.resnet18()       # forward pass
    >> unbatch                 # postprocessing ...
    >> classify                # ... stage
)
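
The remaining operators cover branching and mapping; a minimal sketch of their semantics (the transforms named here are hypothetical):

pair = tokenize + tag                  # +: apply both to one input -> (tokens, tags)
coded = encode_tokens / encode_tags    # /: apply each to its element of a tuple
embedded = ~embed                      # ~: apply embed to every element of the input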

Iterate interactively in notebooks, but serve statically

Data scientists love quick and dirty notebooks; production engineers love clean and orderly code. With PADL's inbuilt serializer it's possible to have both: it tracks down the minimal code and data necessary to instantiate the full "Pipeline" and compiles these into a compact module and a set of data artifacts, ready for shipping to your production environment.

From PyTorch to PADL

Iterating over data

PyTorch

class MyModel(torch.nn.Module):
    ...

class MyDataSet(torch.utils.data.Dataset):
    def __init__(self, raw_data):
        self.data = raw_data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, item):
        x, target = self.data[item]
        x = preprocessing_step_1(x)
        x = preprocessing_step_2(x)
        return x, target

data_set = MyDataSet(raw_data_lines)
data_loader = torch.utils.data.DataLoader(
    data_set,
    batch_size=10
)
model = MyModel()

for batch, target in data_loader:
    output = model(batch)
    loss = loss_function(output, target)
    ...

PADL

from padl import batch, identity, transform

@transform
class MyModel(torch.nn.Module):
    ...

model_transform = (
    preprocessing_step_1
    >> preprocessing_step_2
    >> batch
    >> MyModel()
)

train_transform = (
    model_transform / identity   # pass the target through unchanged
    >> loss_function
)

for loss in train_transform.train_apply(raw_data_lines,
                                        batch_size=100):
    ...
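
At inference time the same pipeline serves single data points. A sketch assuming PADL's "infer_apply" method, which adds and removes the batch dimension for a single input automatically:

prediction = model_transform.infer_apply('one raw input line')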

Saving models

PyTorch

import dill
import torch

class PreProcessor:
    ...  # lots of code with lots of subroutines etc.

class PostProcessor:
    ...  # lots of code with lots of subroutines etc.

class Model(torch.nn.Module):
    ...  # layers as usual

model = Model()
torch.save(model.state_dict(), 'mydirectory/layer.pt')

with open('mydirectory/preprocess.pkl', 'wb') as f:
    dill.dump(preprocessor, f)  # lots of errors / uncertainty

with open('mydirectory/postprocess.pkl', 'wb') as f:
    dill.dump(postprocessor, f)  # lots of errors / uncertainty

PADL

from padl import save

save(mytransform, 'mydirectory.padl')
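
Loading the pipeline back, with its code, weights and auxiliary data, is the mirror image:

from padl import load

mytransform = load('mydirectory.padl')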

Complex branching

PyTorch

class Model(torch.nn.Module):
    def __init__(self, la, lb, lc, ld):
        super().__init__()
        self.la = la
        self.lb = lb
        self.lc = lc
        self.ld = ld

    def forward(self, x, y):
        lhs = self.la(x)
        rhs_1 = self.lb(y)
        rhs_2 = self.lc(y)
        return self.ld(lhs, (rhs_1, rhs_2))

model = Model(layer_a, layer_b, layer_c, layer_d)

PADL

model = (
    layer_a / (layer_b + layer_c)
    >> layer_d
)
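
Here / routes x through layer_a and y through the rollout (layer_b + layer_c), which applies both layers to y and collects the results in a tuple; >> layer_d then consumes the combined output, exactly as in the hand-written forward method above.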

Get started

Streamline your entire deep learning workflow, from experimentation to deployment.