Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation notes #45

Open
nikic opened this issue Jan 23, 2020 · 1 comment
Open

Implementation notes #45

nikic opened this issue Jan 23, 2020 · 1 comment

Comments

@nikic
Copy link
Collaborator

nikic commented Jan 23, 2020

I've done some initial protoyping for (reified) generics in nikic/php-src#3. I want to discuss some implementation aspects here. Some additional reading on relevant issues:

The prototype focussed on making something minimal work, which is "purely abstract" generics involving simple inheritance:

abstract class WithParam<T> {
    public function method(T $param) {
        var_dump($param);
    }
}
class WithInt extends WithParam<int> {}

function acceptsWithParamInt(WithParam<int> $param) {
    $param->method(42);
}

acceptsWithParamInt(new WithInt);

Note that only the abstract class has unbound type parameters. The concrete class doesn't, so there are no generic parameters on the object instantiation.

Generic arguments are stored using the same structure as union types:

typedef struct {
	uint32_t num_types;
	zend_type types[1];
} zend_type_list;

A big part of implementing generics is replacing existing zend_class_entry* pointers with a structure that holds both zend_class_entry* and the list of generic type arguments:

typedef struct _zend_class_reference {
	zend_class_entry *ce;
	zend_type_list args;
} zend_class_reference;

As most class references are expected to be to classes without generic arguments, we apply the following optimization for that case: Each class entry is allocated with an additional header, that effectively looks like this:

struct real_class_entry {
    zend_class_entry *ce; /* = &self.normal_class_entry */
    uitn32_t num_types; /* = 0 */
    zend_class_entry normal_class_entry;
};

This allows cheaply converting any class entry into a class reference without generic arguments. Additionally, accesses to the class entry through such a reference should be rather cheap, because the header will be in the same cache line as start of the class entry.

Unfortunately, there are quite a few cases where we don't work on class entries, but on string names of classes. Important examples are unresolved type declarations, as well as pre-inheritance parent/interface references and class references inside opcodes. The structure for this type of reference is principally the same, just replacing a class entry with a name:

typedef struct _zend_name_reference {
	zend_string *name;
	zend_type_list args;
} zend_name_reference;

However, in this case we do not have such a convenient way to represent name references without arguments. After all, we do not want to add an additional header to each interned string.

Instead, we define an additional "packed name reference" (abbreviated PNR in macros and function names):

typedef uintptr_t zend_packed_name_reference;

This is an align-tagged union between a simple zend_string* (no arguments) or zend_name_reference (arguments).

A quick grep tells me that php-src has 1500 references to zend_class_entry, and probably quite a few more uses than that through members, and a large part of these need migration to zend_class_reference. I only migrated parent references and type declarations in my prototype.

Apart from storing generic type arguments on class references, we also need to deal with references to generic type parameters, such as in public function method(T $param). Each generic parameter is assigned an parameter ID, which is stored as a new zend_type kind.

We now need to consider how we can access the actual bound type argument efficiently. Consider the following example:

abstract class A<T1> {
    public function method_a(T1 $param) {}
}
abstract class B<T2> extends A<int> {
    public function method_b(T2 $param) {}
}
abstract class C<T3> extends B<string> {
    public function method_c(T3 $param) {}
}
class D extends C<array> {}

Pre-inheritance, the generic parameter ID of T1, T2 and T3 is 0, as they are each the first generic parameter of their respective class. During inheritance, we shift the generic parameter ID by the number of bound generic parameters on the parent. This means that T1 has ID 0, T2 has ID 1 and T3 has ID 2. Class D would store the following array of bound generic arguments:

bound_generic_args[0]: int
bound_generic_args[1]: string
bound_generic_args[2]: array

This includes the necessary elements for parent and grandparent classes as well.

While this scheme is sufficient for the purpose of simple abstract inheritance, as implemented in the protoype, we should also consider how we can extend it to other situations.

First, for interfaces we do not have to worry about resolution of generic type parameters at runtime, as interfaces are only checked during inheritance.

For traits, generic parameter IDs cannot be shifted in this manner, as there may be multiple traits. Instead, we would probably actually perform some form of monomorphization for this particular case, where we copy the trait methods and replace type parameters.

One larger problem is what to do with generic parameter references in opcodes, for example for $x instanceof T. As opcodes are immutable during inheritance, we cannot shift the parameter in this case. I don't have an idea on how to make this really efficient. I believe this will involve looking up the correct offset to use at runtime, for the particular declaring class and the particular calling class.

Finally, this only considers generic arguments bound in the class declaration. What about free generic arguments bound on instantiation? These would be stored in the class reference on the object. A generic parameter lookup would have to picked one or the other using something like:

uint32_t num_bound_generic_args = ce_ref->ce->num_bound_generic_args;
if (param_id < num_bound_generic_args) {
    type = &ce_ref->ce->bound_generic_args[param_id];
} else {
    type = &ce_ref->args.types[param_id - num_bound_generic_args];
}

The alternative is to copy the class arguments in the object class reference as well. This might be a non-trivial increase in memory usage though, if deep hierarchies are involved.

@nikic
Copy link
Collaborator Author

nikic commented Jan 27, 2020

Another thing to consider is how we create and memoize the zend_class_references. In my prototype they're only used for the list of parent references, of which there are few, so I just allocate them. If they're used for each object (those class has free generic parameters), these will have to be memoized.

For that purpose, I would suggest to use a hashtable mapping from type string to class reference on a best-effort basis. The class references will not be unique, for example Collection<Foo> and Collection<Bar> might refer to an equal class reference if Foo and Bar are aliases.

The zend_name_reference could get an extra field that stores the full type string and will be used to look up an existing class reference, before running full type resolution. This class reference could also be cached in an opline runtime cache easily, because the opline doesn't own it.

A caveat here are generic parameters used inside the type arguments, such as Collection<T>. I think the way to go about that would be to use an encoded string like `"Collection<\0{generic-param-id}>" and look up the class reference that still contains the parameter ID (as that can be cached) and then create the actual class reference from that ... somehow. Possibly such a class reference would hold a hashtable for specific type substitutions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant