Spend less time racking your brains about how to structure your deep learning projects, workflows and code, and more time focusing on the science.
Using and reproducing deep learning models in practice is about much more than writing individual PyTorch layers; it also means specifying how those layers intercommunicate, how inputs are prepared as tensors, which pieces of auxiliary data are needed to prepare those inputs, and how tensor outputs are post-processed and converted to make them useful for the target application. PADL's "Pipeline" specifies all of these additional things.
word_predict = (
    clean
    >> tokenize
    >> to_tensor
    >> batch
    >> (dropout >> transformer)
        + right_shift
    >> cross_entropy_loss
)
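For intuition, here is a minimal sketch (with made-up lambda Transforms, and assuming infer_apply can be called on a Pipeline that has no batch stage) of what the + operator does: it sends the same input through each branch and groups the results, which is why (dropout >> transformer) + right_shift yields the (prediction, target) pair that cross_entropy_loss consumes.
from padl import transform

double = transform(lambda x: x * 2)
square = transform(lambda x: x ** 2)

both = double + square     # "+": the same input goes through both branches
both.infer_apply(3)        # returns the pair of branch outputs, roughly (6, 9)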
PADL allows developers to create composable and reusable blocks with a simple decorator: "transform". Decorated code becomes a "Transform" instance, a powerful abstraction covering the full range of computations required for the preprocessing, forward-pass and postprocessing steps; a "Transform" can also carry auxiliary data such as PyTorch layer weights, lookup tables and much more.
import torch
from padl import transform

@transform
def add(x, y):  # this is a Transform
    return x + y

transform(lambda x: x + 1000)  # this is a Transform

@transform
class MLP(torch.nn.Module):
    def __init__(self, n_in, hidden, n_out):
        ...

    def forward(self, x):
        ...

mlp = MLP(10, 10, 10)  # this is a Transform
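A hedged sketch of how these pieces could then be combined (assuming batch can be imported from padl as in the Pipeline example above, and that the elided MLP layers accept a float tensor of size 10):
from padl import batch

to_float_tensor = transform(lambda x: torch.tensor(x, dtype=torch.float))  # hypothetical preprocessing

mlp_pipeline = (
    to_float_tensor   # preprocessing: runs per item
    >> batch          # everything after this point runs on batches
    >> mlp            # the decorated torch.nn.Module instance from above
)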
"Transform" instances may be conveniently linked to become "Pipelines" using a small set of functional operators >> , + , / and ~. This allows developers to manage complex branching and interdependent models with ease. Pre-processing, forward pass and post-processing are concisely delineated using batch and unbatch
my_classifier_transform = (
    load_image                 # preprocessing ...
    >> transforms.ToTensor()   #
    >> batch                   # ... stage
    >> models.resnet18()       # forward pass
    >> unbatch                 # postprocessing ...
    >> classify                # ... stage
)
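For illustration, a hedged sketch of running such a Pipeline (the file names are placeholders; it assumes load_image accepts a path, classify maps model outputs to labels, and PADL's infer_apply / eval_apply methods behave as in its documentation):
prediction = my_classifier_transform.infer_apply('img.jpg')   # single item; batching handled internally

for pred in my_classifier_transform.eval_apply(['img_1.jpg', 'img_2.jpg'], batch_size=2):
    ...   # batched evaluation, without gradients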
Data scientists love quick and dirty notebooks; production engineers love clean and orderly code. PADL's inbuilt serializer makes it possible to have both: it tracks down the minimal code and data necessary to instantiate the full "Pipeline" and compiles these into a compact module and a set of data artifacts, ready for shipping to your production environment.
# Plain PyTorch: preprocessing, dataset, data loader and training loop wired up by hand
class MyModel(torch.nn.Module):
    ...

class MyDataSet(torch.utils.data.Dataset):
    def __init__(self, raw_data):
        self.data = raw_data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, item):
        x, target = self.data[item]
        x = preprocessing_step_1(x)
        x = preprocessing_step_2(x)
        return x, target

data_set = MyDataSet(raw_data_lines)
data_loader = torch.utils.data.DataLoader(
    data_set,
    batch_size=10
)
model = MyModel()
for batch, target in data_loader:
    output = model(batch)
    loss = loss_function(output, target)
    ...
# With PADL: the same preprocessing, batching and training loop as a single Pipeline
@transform
class MyModel(torch.nn.Module):
    ...

model_transform = (
    preprocessing_step_1
    >> preprocessing_step_2
    >> batch
    >> MyModel()
)

train_transform = (
    model_transform / identity
    >> loss_function
)

for loss in train_transform.train_apply(raw_data_lines, batch_size=100):
    ...
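Because the same blocks are reusable, an inference Pipeline can be assembled from them; a hedged sketch, with a hypothetical postprocess Transform and assuming unbatch is imported from padl:
infer_transform = (
    model_transform     # reuse the (trained) blocks defined above
    >> unbatch
    >> postprocess      # hypothetical postprocessing Transform
)
prediction = infer_transform.infer_apply(raw_line)   # raw_line: a single, unprocessed input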
# Without PADL: save the layer, the preprocessor and the postprocessor separately
import dill
import torch

class PreProcessor:
    ...  # lots of code with lots of subroutines etc.

class PostProcessor:
    ...  # lots of code with lots of subroutines etc.

class Model(torch.nn.Module):
    ...  # layers as usual

torch.save(model.state_dict(), 'mydirectory/layer.pt')
with open('mydirectory/preprocess.pkl', 'wb') as f:
    dill.dump(preprocessor, f)  # lots of errors / uncertainty
with open('mydirectory/postprocess.pkl', 'wb') as f:
    dill.dump(postprocessor, f)  # lots of errors / uncertainty
# With PADL: one call serializes the entire Pipeline (code, weights and data artifacts)
from padl import save

save(mytransform, 'mydirectory.padl')
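The counterpart to save is padl's load; a short sketch (variable names are illustrative):
from padl import load

mytransform = load('mydirectory.padl')   # re-instantiates the full Pipeline from code and data
# mytransform.infer_apply(...) is then ready to serve predictions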
# Without PADL: branching must be wired up manually inside forward()
class Model(torch.nn.Module):
    def __init__(self, la, lb, lc, ld):
        super().__init__()
        self.la = la
        self.lb = lb
        self.lc = lc
        self.ld = ld

    def forward(self, x, y):
        lhs = self.la(x)
        rhs_1 = self.lb(y)
        rhs_2 = self.lc(y)
        return self.ld(lhs, (rhs_1, rhs_2))

model = Model(layer_a, layer_b, layer_c, layer_d)
# With PADL: the same branching expressed functionally with / and +
model = (
    layer_a / (layer_b + layer_c)
    >> layer_d
)
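A hedged usage sketch (hypothetical tensors; it assumes / routes the two elements of a tuple input to its two branches, mirroring the forward(x, y) signature above):
import torch

x, y = torch.randn(8), torch.randn(8)
out = model.infer_apply((x, y))   # layer_a receives x; layer_b and layer_c receive y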