Wednesday, October 26, 2011

Refactoring now possible for dynamic languages

There's an interesting post here from Bob Nystrom about getting used to the optional typing in Dart. But it did have one bit in particular that irked me.

"If it knows the type, then thanks to the previous point, it knows what you can do with it. Ta-da: auto-complete and refactoring are now possible for a dynamic language."

I don't want to pick on Bob; this is just one example of the widespread belief that you can't do refactoring in a dynamically typed language, despite the fact that much of the early work on it was done in Smalltalk. The term was actually coined with respect to Forth, as Brian Foote points out here, but it was popularized by the work of Bill Opdyke, John Brant and Don Roberts. The Brant and Roberts Refactoring Browser is currently the standard browser in VisualWorks and ObjectStudio, and it is the first example I know of automated refactoring support. Thanks to Don Roberts and Brian Foote, who happen to be here at the Splash conference and so were available to provide the historical information.

It's true that in a dynamic language you have a bit less information to use during refactoring. If we have a polymorphic message and we want to rename the method in only some of its implementations, then in a dynamic language we don't have a reliable way to know which senders refer to the implementations we want to rename and which refer to the others. So, if we wanted to rename MyClass>>printString, we don't have a way to know reliably which senders of printString mean MyClass.

The problem is that we don't have a way to know that reliably in a statically typed language either. We will have more information that might be helpful in some circumstances. But suppose that we use a generic collection List. If we send any messages to the objects in that collection we don't know who the receiver is. So if we want to refactor, it's hard to make any assumptions about who that might be sent to.

Even with inheritance this sort of situation can arise. Suppose I want to rename the method printString in a subclass B whose superclass A also defines that method. If I find that message sent to something whose static type is B, I can change the sender. But if that message is sent to something whose static type is A, what do I do? The problem is that in the presence of multiple polymorphic implementations of the same method, renaming may not be a behaviour-preserving transformation.
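To make the inheritance case concrete, here is a minimal Java sketch. The names A, B and printString follow the text; BBefore, BAfter, displayString and describe are hypothetical names invented for the illustration, and it assumes the rename is applied to B's method only, leaving A and its senders alone.

```java
public class RenameOverrideSketch {
    // Hypothetical stand-in for the superclass A of the text.
    public static class A {
        public String printString() { return "an A"; }
    }

    // B before the rename: it overrides A.printString().
    public static class BBefore extends A {
        @Override
        public String printString() { return "a B"; }
    }

    // B after renaming only its own method to displayString() (assumed name):
    // the override is gone, so A.printString() is inherited again.
    public static class BAfter extends A {
        public String displayString() { return "a B"; }
    }

    // A sender whose receiver has static type A. The static type alone
    // does not say whether this call was meant to reach B's implementation.
    public static String describe(A receiver) {
        return receiver.printString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new BBefore())); // prints "a B"
        System.out.println(describe(new BAfter()));  // prints "an A"
    }
}
```

The sender in describe compiles both before and after the rename, yet its result changes for instances of B, which is the sense in which the rename is not behaviour-preserving here.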

I suspect that people using refactoring tools in statically typed languages don't notice these issues because in practice refactoring works fine for them in most normal circumstances. But the same thing is true for people using dynamically typed languages.

And in closing I'll add one comment from Don Roberts, that when he and John Brant looked at refactoring in Java they found that although the static types did give you some more information, the difficulty of satisfying the bookkeeping of the static type system ended up making it more difficult.


  1. Can you elaborate on what you mean about sending messages to objects in a generic collection? I can't think of a statically typed object-oriented programming language where you would not "know" the receiver type.

    For example, in Java's generic collections you can only call methods that are statically known to be available in the element type. You can't call MyClass methods on the elements of a List<Object>, but you can call them on the elements of a List<MyClass>. Even before Java had generics you'd have to cast the (Object) elements in the collection to MyClass in order to call MyClass methods. Either way, you know the receiver type at the point of invocation. (This information is even in the bytecode, BTW. The JVM invokevirtual instruction's parameters include the class/interface the method comes from.)

    As for the inheritance example: if the method you're renaming in B overrides a method from its superclass then this isn't really a "rename" refactoring at all. You're adding a new method, and changing the behavior of the old method (to be the same as the super class). I'd expect a good refactoring tool to at least warn me about this fact.

    For me this is the fundamental difference between refactoring in a static language versus a dynamic one: in a static language there are certain refactorings that you can prove are safe. Ones that cannot be proven safe can raise warnings, but those are generally less common. (eg: I rename methods all the time, but I can't remember the last time I renamed an override) With a dynamic language you can't prove any of them to be correct, so every refactoring should really require a warning.

  2. What I was talking about was not a user of a generic collection sending a message to one of its elements, but rather about a generic class which sends messages to the objects it contains. List is probably not a particularly good example, because it probably sends very few messages to the objects it contains, and they'd be very generic.

    So, feeling lucky on Google for a generics example I found

        public class Member<T> {
            private T id;
            // ...
        }

    where we have a Member whose "id" field might be "String, Integer, etc."

    So if the Member class sends a message to its "id" variable, we don't know what type it's going to be when the generic class is instantiated.

    As far as "safe" vs. "unsafe" refactorings I think that's a slightly different statement of what I was saying. Refactorings are defined as being behavior-preserving transformations. Some of them always are (I can't think of a situation that would make renaming a local variable or method parameter non-behavior-preserving) but others may not be, depending on how they're applied.

    If a good refactoring tool would warn you about such circumstances, then that makes me wonder which refactoring tools out there do so currently?

  3. I think you have some misunderstandings about how generics actually work in Java. In your example, Java will not allow Member<T> to call any methods on “id” except for those declared in Object.

    The only way around it would be for Member<T> to cast “id” to a particular type, eg:

        int x = ((Integer) id).intValue();

    Here “id” is cast to Integer, and so the method intValue (which exists in Integer but not Object) can be called. Either way, the receiver type can be determined statically.

    Another way to call a method on a generic field that isn't from Object is to put an upper bound on your type variable when you declare it (a bounded type parameter). For example:

        public class Baz<T extends Bar> {
            private T bar;
            public T getBar(){
                return bar;
            }
        }

    Now the “bar” field is a sub-type of Bar, but it's generic so if someone creates a Baz<FooBar>, and later they call getBar(), they'll know that they've got a FooBar and not some other kind of Bar. Within Baz, however, only Bar methods can be called, not FooBar or WineBar or SandBar methods.

    Eclipse generates a warning and requests confirmation if you attempt to perform a rename refactoring on an override. If you rename the base declaration instead then there is no warning and the rename affects all of the overrides as well.

  4. Indeed it does appear that I don't have a sufficient understanding of Java generics. The situation I described definitely does arise with C++ templates, which do not have the restriction of requiring a cast in order to send a non-Object message.

    In Java I'm getting badly lost in a maze of twisty little wildcards and generic methods, so I'll take your word for it that there isn't a way to do the equivalent of the C++ case.

    In general, it still seems to boil down to some refactorings being provably safe and some not being, and the set of which is which depends on details. A particular static type system will make some additional ones provably safe, but not all.

  5. And I should say thank you to Michael Lucas-Smith for actually writing the C++ test case and trying to write a Java one.
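
For readers following the generics thread above, here is a minimal Java sketch of the two cases discussed in comment 3. Member, Baz, Bar and FooBar echo the names used in the comments; the methods describe, barName, name and fooOnly, and the constructors, are hypothetical additions made only so the example is self-contained.

```java
public class GenericsReceiverSketch {
    // Hypothetical types; only the class names come from the comments.
    public static class Bar {
        public String name() { return "Bar"; }
    }

    public static class FooBar extends Bar {
        @Override
        public String name() { return "FooBar"; }
        public String fooOnly() { return "foo"; }
    }

    // Unbounded T: inside Member, id offers only the methods of Object.
    public static class Member<T> {
        private final T id;
        public Member(T id) { this.id = id; }
        public String describe() {
            // id.intValue() would not compile here; toString() is declared in Object.
            return id.toString();
        }
    }

    // Bounded T: inside Baz, bar is known to be at least a Bar.
    public static class Baz<T extends Bar> {
        private final T bar;
        public Baz(T bar) { this.bar = bar; }
        public T getBar() { return bar; }
        public String barName() {
            // bar.fooOnly() would not compile here; name() is declared in Bar.
            return bar.name();
        }
    }

    public static void main(String[] args) {
        System.out.println(new Member<>(42).describe()); // prints "42"
        Baz<FooBar> baz = new Baz<>(new FooBar());
        System.out.println(baz.barName());               // prints "FooBar"
        System.out.println(baz.getBar().fooOnly());      // prints "foo"
    }
}
```

Inside Baz the receiver's static type is Bar, so the compiler knows which declaration the call refers to, while the implementation that actually runs is still chosen at run time (FooBar's here).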