Python Type Annotations
I have to admit, 40% of the reason I use Haskell lies in static type checking. Another 40% lies in strong, ironclad immutability. Only about 20% of my love of Haskell has anything to do with language expressiveness. Make no mistake, the language has expressiveness out the wazoo, but I truly love static typing and immutability.
One day recently turned into a disaster. One problem led to another in a task that should have been trivial and instead involved four hours of beating my head against the desk. Part of the problem was that my system under test had only acceptance tests, executed only with all of standard out and standard error shunted away to /dev/null. Either way, after I got my changes made, I decided to step out of the office to think.
Python has expressiveness. It has neither static type checking nor immutable values, and the language developers get really holy about this. I have no interest in arguing with them, as I believe they have decided to abandon all safety and flood the world with code that might blow up in production because maybe in some corner case that a STATIC TYPE CHECKER could have detected at compile time, they inadvertently passed an Integer to a function that expected a String. So, I will not change Python the language, but I do want to make things nicer. Even though my solution will not get checked at compile time, it can certainly make debugging easier when a function crashes immediately due to a blown data type assertion, rather than propagating that incorrect data type potentially quite some distance.
I have put the code in a repository. You can get this particular file at type_checking.py
hg clone https://gitlab.com/savannidgerinel/python-tools
The gruesome way
You could do it like this. I have done this.
def authenticate(username, password):
    assert isinstance(username, types.StringType)
    assert isinstance(password, types.StringType)
    ... do a bunch of authentication stuff and talking to databases and many things that belong in the IO monad ...
    assert isinstance(ret, UserClass)
    return ret
Ew. It will work in a pinch… but ew. This gets especially bad if I have several places from which I can return. Yes, it improves the readability of the input parameters, but it does little for the return value beyond acting as a postcondition.
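To make the complaint concrete, here is a minimal sketch of the bare-assert style (using Python 3's built-in str where the article uses types.StringType, and a placeholder body): the assertion fires immediately, but the AssertionError carries no message at all, so the traceback alone has to tell you what went wrong.

```python
def authenticate(username, password):
    # Bare asserts: a failure raises AssertionError with an empty message.
    assert isinstance(username, str)
    assert isinstance(password, str)
    return {'user': username}  # placeholder for the real authentication work

try:
    authenticate(42, 'secret')
except AssertionError as exc:
    print('AssertionError:', repr(str(exc)))  # prints: AssertionError: ''
```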
Slightly better
Assertions are for things that “should never happen in code”. So, technically, an AssertionError is not a good thing to throw in the case of a type error. Python actually provides TypeError to indicate that a data type error has occurred. That is convenient. So, instead of calling assert, let’s create a function that will do the job and raise a better exception. And let’s build in a way to mark a parameter as optional.
def typecheck(param, type_, optional=False):
    if not optional and param is None:
        raise TypeError('None is not permitted for this value')
    if param is not None and not isinstance(param, type_):
        raise TypeError('Expected type %r, got %r' % (type_, type(param)))
With this, your above code would look like this:
def authenticate(username, password):
    typecheck(username, types.StringType)
    typecheck(password, types.StringType)
    ... do a bunch of authentication stuff and talking to databases and many things that belong in the IO monad ...
    typecheck(ret, UserClass)
    return ret
This doesn’t improve the code much, but it does make for more descriptive error messages. I’m rather liking this improvement. But I can do better.
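To see the improved error message in action, here is a quick sketch (Python 3, so the built-in str stands in for types.StringType; typecheck reproduced from above):

```python
def typecheck(param, type_, optional=False):
    # Reject None unless the caller marked the parameter optional.
    if not optional and param is None:
        raise TypeError('None is not permitted for this value')
    # Reject values of the wrong type; None already passed the optional check.
    if param is not None and not isinstance(param, type_):
        raise TypeError('Expected type %r, got %r' % (type_, type(param)))

typecheck('savanni', str)            # passes silently
typecheck(None, int, optional=True)  # passes: None allowed when optional

try:
    typecheck(42, str)
except TypeError as exc:
    print(exc)  # prints: Expected type <class 'str'>, got <class 'int'>
```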
Decorative rescue
I once read this round of pejoratives about static type users, and I wondered for a while what that meant. I looked things up, found a few references to using decorators to “decorate” type checks on functions, but I did not like the solutions. Maybe they were good solutions, but I wanted to solve it myself. Also, the typecheck module for Python appears to be almost seven years dead.
So I present some code that I wrote in an hour yesterday.
First, I played a bit with the syntax, and then I put the syntax into a unit test. You do test your code, don’t you?
class TestTypeChecker(unittest.TestCase):
    @unittest.skip('disabled')
    def testNoParamsReturnString(self):
        @accepts()
        @returns(types.StringType)
        def f():
            return 'abcd'
        f()
        self.assertRaises(TypeError, lambda: f('a'))

    @unittest.skip('disabled')
    def testParams(self):
        @accepts(types.StringType, types.IntType)
        @returns(types.NoneType)
        def f(var1, var2):
            return None
        f('abcd', 15)
        self.assertRaises(TypeError, lambda: f('abcd', 'efgh'))
        self.assertRaises(TypeError, lambda: f(15, 'efgh'))
        self.assertRaises(TypeError, lambda: f())
Here you can see the syntax. Before each declaration of f(), I put an @accepts block and a @returns block. The desired data types get passed into @accepts and @returns as though these two calls are function calls. As it happens, they are.
Additionally, I wanted to flag a parameter as optional. Not optional in that it can be omitted, but optional in that I could pass None instead of the declared type.
    def testMaybeParams(self):
        @accepts(types.StringType, Maybe(types.IntType))
        @returns(types.NoneType)
        def f(var1, var2):
            return None
        self.assertRaises(TypeError, lambda: f('abcd', 'efgh'))
        f('abcd', None)
        f('abcd', 15)
    def testOptions(self):
        @accepts(types.StringType, Options(types.NoneType, types.StringType, types.IntType))
        @returns(Options(types.NoneType, types.IntType))
        def f(var1, var2):
            if var1 == 'None':
                return None
            else:
                return 15
        f('abcd', 15)
        f('abcd', '15')
        f('abcd', None)
        self.assertRaises(TypeError, lambda: f('abcd', 5.5))
        self.assertRaises(TypeError, lambda: f(None, 'abcd'))
Note Maybe. Declarations will come soon, but I created Maybe as a class that accepts a data type as a single parameter. If either of the decorators sees that the parameter type is Maybe, it will allow either None or the type passed in to Maybe in the corresponding parameter. And then, some time later, I created Options as a way to specify that a parameter can be any of several data types, including None.
So, finally, it is time to present the code itself. First, my two support classes. They are delightfully short.
class Maybe(object):
    def __init__(self, var_type):
        self.var_type = var_type

    def __repr__(self):
        return 'Maybe(%s)' % self.var_type

    def check(self, param):
        if param is None:
            return True
        if isinstance(param, self.var_type):
            return True
        return False

class Options(object):
    def __init__(self, *args):
        self.var_options = args

    def __repr__(self):
        return 'Options(%s)' % ','.join(map(repr, self.var_options))

    def check(self, param):
        for type_ in self.var_options:
            if isinstance(param, type_):
                return True
        return False
Both of these exist to give expressiveness to the type system, as above. In both cases, it became simplest to create a check operation that actually runs the check against a parameter and returns whether the parameter passes.
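To illustrate, here are the two classes condensed into boolean expressions (behaviorally the same as the versions above) along with what check returns for a few sample values:

```python
class Maybe(object):
    """Accept None or a single declared type."""
    def __init__(self, var_type):
        self.var_type = var_type

    def check(self, param):
        return param is None or isinstance(param, self.var_type)

class Options(object):
    """Accept any one of several declared types."""
    def __init__(self, *args):
        self.var_options = args

    def check(self, param):
        return any(isinstance(param, t) for t in self.var_options)

print(Maybe(int).check(None))        # True
print(Maybe(int).check(15))          # True
print(Maybe(int).check('fifteen'))   # False
print(Options(str, int).check(15))   # True
print(Options(str, int).check(5.5))  # False
```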
The actual guts of the type checking happens in a series of standalone functions.
def format_param_mismatch(idx, arg_type, expected_type):
    return 'Incorrect type in parameter %d: got %s, expected %s' % (idx, arg_type, expected_type)
First, I have a function, format_param_mismatch, to provide a good error message in the case of a parameter type mismatch. Note that it requires the index of the parameter, the argument type, and the expected argument type. I included the index because I found it necessary to say “Hey, a parameter doesn’t match, and it is this parameter!”
def check_param(param, expected):
    if getattr(expected, 'check', None):
        return expected.check(param)
    return isinstance(param, expected)
This function is pretty simple. It only returns True or False. If the “expected” type has a check method, i.e., it is Maybe, Options, or some other supporting class that I have not created yet, it gets the result by calling that check method. Otherwise, it just runs isinstance.
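The duck-typed dispatch can be seen with a tiny example. Even here is a hypothetical checker of my own, standing in for Maybe or Options: anything with a check method gets consulted, and everything else falls through to isinstance.

```python
def check_param(param, expected):
    # Duck typing: any object with a check method gets to decide for itself.
    if getattr(expected, 'check', None):
        return expected.check(param)
    return isinstance(param, expected)

class Even(object):
    """Hypothetical checker: accepts only even integers."""
    def check(self, param):
        return isinstance(param, int) and param % 2 == 0

print(check_param('abcd', str))  # True  -- plain isinstance path
print(check_param(4, Even()))    # True  -- delegated to Even.check
print(check_param(5, Even()))    # False
```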
def accepts(*var_types):
    def checked_function(f):
        def checker(*args, **kwargs):
            mismatches = [
                (idx, type(arg), var_type)
                for (idx, var_type, arg) in zip(itertools.count(), var_types, args)
                if not check_param(arg, var_type)]
            if len(mismatches) != 0:
                raise TypeError('\n'.join(map(lambda x: format_param_mismatch(*x), mismatches)))
            return f(*args, **kwargs)
        return checker
    return checked_function
Decorators are complicated to code.
First, the decorator itself takes parameters. Those are the *var_types, and they are what allow the syntax above. Calling accepts returns checked_function.
Second, checked_function then gets applied to your original function, and the magic plumbing of the decorator replaces your binding with this new function that wraps your original function.
Third, the decorator needs to return a function, and that function takes your original function as a parameter. So, at function definition time, the original function and the types all get linked together, and your function binding gets replaced with the function that runs this check.
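To make the three layers concrete, here is a simplified sketch of accepts (plain isinstance checks only, no Maybe or Options support, written for Python 3) showing that the @ syntax is just a rebinding of the function name:

```python
def accepts(*var_types):                    # layer 1: takes the types
    def checked_function(f):                # layer 2: takes your function
        def checker(*args, **kwargs):       # layer 3: replaces your function
            for idx, (var_type, arg) in enumerate(zip(var_types, args)):
                if not isinstance(arg, var_type):
                    raise TypeError(
                        'Incorrect type in parameter %d: got %s, expected %s'
                        % (idx, type(arg), var_type))
            return f(*args, **kwargs)
        return checker
    return checked_function

# The decorator form...
@accepts(str)
def greet(name):
    return 'hello, ' + name

# ...is just sugar for rebinding the name by hand:
def shout(name):
    return name.upper()
shout = accepts(str)(shout)

print(greet('world'))  # hello, world
print(shout('hi'))     # HI
```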
def returns(return_type):
    def checked_function(f):
        def checker(*args, **kwargs):
            val = f(*args, **kwargs)
            if not check_param(val, return_type):
                raise TypeError('Incorrect return type: returned %s, expected %r' % (type(val), return_type))
            return val
        return checker
    return checked_function
returns works in exactly the same way as accepts, but applies the data type check to the return value. With this, no matter how many return statements you have in your code, the actual returned value gets checked. Admittedly, it gets checked after whatever side effects your function had, so if you return invalid data from your database update, your database has already been updated and potentially corrupted.
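A minimal sketch of returns at work (Python 3, with the plain isinstance path of check_param inlined, and a deliberately buggy function to trip it):

```python
def returns(return_type):
    def checked_function(f):
        def checker(*args, **kwargs):
            # Run the wrapped function first, then validate what it returned.
            val = f(*args, **kwargs)
            if not isinstance(val, return_type):
                raise TypeError('Incorrect return type: returned %s, expected %r'
                                % (type(val), return_type))
            return val
        return checker
    return checked_function

@returns(int)
def parse_port(text):
    # Deliberate bug: one code path forgets to convert to int.
    return int(text) if text.isdigit() else text

print(parse_port('8080'))  # 8080

try:
    parse_port('default')
except TypeError as exc:
    print(exc)  # Incorrect return type: returned <class 'str'>, expected <class 'int'>
```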
Limits
You have some limits still.
First, and I think this part is critical: you still will not know that you have a data type error until you actually exercise a code path that exhibits it. On the other hand, at least when you do, you find out very quickly that your error is not some other kind of logic error.
Also, not quite obviously, I do not have a way for you to check arbitrary argument lists. Any part of *args that does not get captured by a named parameter will not be checked, and none of **kwargs will be checked. The decorator syntax is simply too limited to be able to describe such a check without the entire declaration becoming very cumbersome.
Generally, I would suggest avoiding arbitrary keyword arguments. They are not always a problem, but they do tend to lead to necessary but undocumented parameters. If you must use them, use them only for cases where the arbitrary keyword arguments name optional arbitrary data fields, and give every genuinely necessary parameter an explicit name in your function declaration.
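If you do need checked keyword arguments, one possible direction (purely a sketch of my own, not part of the library above) is a companion decorator that takes expected types by name; accepts_kw and its parameters here are hypothetical:

```python
def accepts_kw(**var_types):
    """Hypothetical companion to accepts: checks keyword arguments by name."""
    def checked_function(f):
        def checker(*args, **kwargs):
            for name, value in kwargs.items():
                expected = var_types.get(name)
                # Only names declared in the decorator get checked.
                if expected is not None and not isinstance(value, expected):
                    raise TypeError('Keyword %r: got %s, expected %s'
                                    % (name, type(value), expected))
            return f(*args, **kwargs)
        return checker
    return checked_function

@accepts_kw(timeout=int)
def connect(host, timeout=30):
    return (host, timeout)

print(connect('example.com', timeout=10))  # ('example.com', 10)

try:
    connect('example.com', timeout='ten')
except TypeError as exc:
    print(exc)
```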
Overall, however, using these decorators liberally will help significantly with the task of tracking down problems that are ultimately data type errors. Additionally, the presence of the decorator helps document the API for the next person to come along, making explicit things that otherwise a programmer would have to dig into the code to find out.
If you are like me and data type errors are the most common error you make, these decorators are going to be a big help.
Python Type Annotations by Savanni D'Gerinel is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. You can link to it, copy it, redistribute it, and modify it, but don't sell it or the modifications and don't take my name from it.