Doctrine 2 Internals

Benjamin Eberlei

FrOSCon 2010

About me

  • Benjamin Eberlei
  • direkt effekt GmbH (digital marketing)
  • Open Source contributor
    • Doctrine2
    • Zend Framework
    • Zeta, PHPUnit, Symfony2 and pet projects
  • Twitter @beberlei
  • Blog http://www.whitewashing.de

Object-Relational-Mapping

  • Translation between the relational and the object world
  • Object-relational impedance mismatch
  • Mapping every class by hand is nearly impossible to maintain
  • General solution requires Metadata Mapping

ORM with PHP

  • Object Instantiation
  • Primitive PHP <-> Database Type Conversions
  • Object references <-> single associations (1:1 m:1)
  • Arrays (?) <-> collection associations (1:n m:n)
  • Access object state through:
    • Interface (ezcPersistentObject)
    • Methods (Propel, Doctrine 1)
    • Reflection (Doctrine 2)
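The "primitive PHP <-> database type conversion" step boils down to a pair of functions per mapping type. The sketch below is hypothetical: `DateTimeTypeSketch` and the `Y-m-d H:i:s` storage format are assumptions for illustration, not Doctrine's type API. It converts a `datetime` column between a PHP `DateTime` and a database string.

```php
<?php
// Hypothetical converter for a "datetime" column (not Doctrine's Type API)
class DateTimeTypeSketch
{
    // Assumed storage format of the DATETIME column
    const FORMAT = 'Y-m-d H:i:s';

    // PHP -> database: DateTime becomes a string, null stays null
    public function convertToDatabaseValue($value)
    {
        return ($value === null) ? null : $value->format(self::FORMAT);
    }

    // database -> PHP: string becomes a DateTime, null stays null
    public function convertToPHPValue($value)
    {
        if ($value === null) {
            return null;
        }
        return DateTime::createFromFormat(self::FORMAT, $value);
    }
}

$type = new DateTimeTypeSketch();
$dbValue = $type->convertToDatabaseValue(new DateTime('2010-08-21 14:30:00'));
$phpValue = $type->convertToPHPValue($dbValue);
```

Every mapped field goes through such a conversion in both directions, once when persisting and once when hydrating.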

What is Doctrine 2?

  • Object-Relational-Mapper (ORM)
  • Data Mapper, not Active Record
  • PHP 5.3 only
  • Currently Beta 3; Beta 4 due on September 1st
  • Final release planned for September/October 2010

ORM with Doctrine 2

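<?php
use Doctrine\Common\Collections\ArrayCollection;

// A plain PHP class: no base class, no persistence code
class Order
{
    private $id;
    private $created;
    private $customer;
    private $items;

    public function __construct(Customer $cust)
    {
        $this->customer = $cust;
        $this->created = new DateTime("now");
        $this->items = new ArrayCollection();
    }

    // Adds an OrderItem to the collection of this order
    public function addItem(OrderItem $item)
    {
        $this->items->add($item);
    }
}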

ORM with Doctrine 2

<?php
/** @Entity @Table(name="orders") */
class Order
{
    /** @Id @Column(type="integer")
     *  @GeneratedValue */
    private $id;
    /** @Column(type="datetime") */
    private $created;
    /** @ManyToOne(targetEntity="Customer", ..) */
    private $customer;
    /** @OneToMany(targetEntity="OrderItem", ..) */
    private $items;
}

Runtime Mapping

  • Mapping Drivers: Annotations, XML, YAML, PHP
  • Populate a ClassMetadata instance for each entity
  • ClassMetadata is cached (APC, Memcache, ...)

<?php
class ClassMetadataInfo
{
    public $fieldMappings;
    public $associationMappings;
    public $reflClass;
    public $reflFields;
    public $identifier;
    public $idGenerator;
    // ... 10-20 other properties
}
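The "build once, then cache" flow for metadata can be sketched as follows. `MetadataSketch`, `MetadataFactorySketch`, the `$driver` callable and the `ArrayObject` standing in for APC/Memcache are all hypothetical names; the point is that metadata is stored serialized, so a cache hit skips the mapping driver entirely.

```php
<?php
// Hypothetical stand-in for ClassMetadataInfo with a few public properties
class MetadataSketch
{
    public $name;
    public $fieldMappings = array();
    public $identifier = array();
}

class MetadataFactorySketch
{
    private $driver;           // callable: class name -> MetadataSketch (annotations, XML, ...)
    private $cache;            // ArrayObject standing in for APC/Memcache
    private $loaded = array(); // instances already used in this request

    public function __construct($driver, ArrayObject $cache)
    {
        $this->driver = $driver;
        $this->cache = $cache;
    }

    public function getMetadataFor($className)
    {
        if (isset($this->loaded[$className])) {
            return $this->loaded[$className];
        }
        $key = 'metadata:' . $className;
        if (isset($this->cache[$key])) {
            // cache hit: unserialize instead of running the mapping driver again
            $metadata = unserialize($this->cache[$key]);
        } else {
            $metadata = call_user_func($this->driver, $className);
            $this->cache[$key] = serialize($metadata);
        }
        $this->loaded[$className] = $metadata;
        return $metadata;
    }
}

$driverCalls = 0;
$driver = function ($className) use (&$driverCalls) {
    $driverCalls++;
    $metadata = new MetadataSketch();
    $metadata->name = $className;
    $metadata->fieldMappings = array('id' => array('type' => 'integer'));
    $metadata->identifier = array('id');
    return $metadata;
};

$cache = new ArrayObject();
$first = new MetadataFactorySketch($driver, $cache);  // "request 1": driver runs
$first->getMetadataFor('Order');
$second = new MetadataFactorySketch($driver, $cache); // "request 2": served from cache
$metadata = $second->getMetadataFor('Order');
```

Because the cached value is a serialized string, the cost of a cache hit is dominated by `unserialize()`, which is why the serialization format of ClassMetadata matters for performance.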

ORM with Doctrine 2

<?php
$product = $entityManager->find("Product", $pid);

$customer = new Customer("kontakt@beberlei.de");

$order = new Order($customer);
$item = new OrderItem($product, $amount);
$order->addItem($item);

$entityManager->persist($customer);
$entityManager->persist($order);
$entityManager->flush();

Mapping using Reflection

<?php
// once per request
$reflClass = new ReflectionClass("Order");
$reflField = $reflClass->getProperty("id");
$reflField->setAccessible(true);

// once per hydrated object and field
$reflField->setValue($order, $row['id']);

// once per persisted object and field
$id = $reflField->getValue($order);

Not calling the constructor

<?php
public function newInstance()
{
    if ($this->_prototype === null) {
        $this->_prototype = unserialize(
            sprintf(
                'O:%d:"%s":0:{}',
                strlen($this->name),
                $this->name
            )
        );
    }
    return clone $this->_prototype;
}
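To check that the unserialize() trick really skips the constructor, here is a self-contained sketch. `Invoice` and `createWithoutConstructor()` are made-up names; the serialized format `O:<length>:"<class>":0:{}` is the one built by `newInstance()` above. On PHP 5.4 and later, `ReflectionClass::newInstanceWithoutConstructor()` does the same without string tricks.

```php
<?php
// Example entity whose constructor must not run during hydration
class Invoice
{
    public static $constructorCalls = 0;
    public $number;

    public function __construct($number)
    {
        self::$constructorCalls++;
        $this->number = $number;
    }
}

// Same idea as newInstance(): unserialize an empty object once, then clone it
function createWithoutConstructor($className)
{
    static $prototypes = array();
    if (!isset($prototypes[$className])) {
        $prototypes[$className] = unserialize(
            sprintf('O:%d:"%s":0:{}', strlen($className), $className)
        );
    }
    return clone $prototypes[$className];
}

$a = createWithoutConstructor('Invoice');
$b = createWithoutConstructor('Invoice');

// PHP >= 5.4 alternative
$reflClass = new ReflectionClass('Invoice');
$c = $reflClass->newInstanceWithoutConstructor();
```

Cloning the prototype is cheaper than parsing the serialized string again for every hydrated object.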

Lazy-Loading

  • Doctrine allows traversal of complete object/entity graph
  • On demand lazy-loading of unloaded objects
    • Proxies are generated PHP code
    • ArrayCollection wrapped by PersistentCollection
  • Naive usage of Doctrine will cause N+1 problem
  • Usage of public properties breaks lazy-loading
  • Solution: Use DQL to fetch-join associations

Proxies and Lazy-Loading

<?php
$order = $em->find("Order", $myOrderId);

// `Order::$customer` is a proxy
$me = $order->getCustomer(); 

// lazy load fired
echo "Customer: " . $me->getEmail();

Proxies and Lazy-Loading

<?php
class CustomerProxy extends Customer implements Proxy
{
    private function _load()
    {
        // lazy loading code
    }

    public function getEmail()
    {
        $this->_load();
        return parent::getEmail();
    }
    // .. other public methods of Customer
}
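A runnable version of the proxy idea, under simplifying assumptions: `CustomerProxySketch` receives a hypothetical `$loader` callable that returns the row data (generated Doctrine proxies load through the EntityManager instead), and the `Customer` properties are `protected` so the subclass can fill them. `_load()` runs at most once.

```php
<?php
class Customer
{
    protected $id;
    protected $email;

    public function __construct($email)
    {
        $this->email = $email;
    }

    public function getId() { return $this->id; }
    public function getEmail() { return $this->email; }
}

// Hand-written stand-in for a generated proxy class
class CustomerProxySketch extends Customer
{
    private $loader;
    private $initialized = false;

    public function __construct($id, $loader)
    {
        // only the identifier is known up front; Customer::__construct is not called
        $this->id = $id;
        $this->loader = $loader;
    }

    private function _load()
    {
        if (!$this->initialized) {
            $this->initialized = true;
            $data = call_user_func($this->loader, $this->id);
            $this->email = $data['email'];
        }
    }

    public function getEmail()
    {
        $this->_load();
        return parent::getEmail();
    }
}

$queries = 0;
$loader = function ($id) use (&$queries) {
    $queries++; // pretend: SELECT ... FROM customers WHERE id = ?
    return array('email' => 'kontakt@beberlei.de');
};

$proxy = new CustomerProxySketch(42, $loader);
$id = $proxy->getId();           // identifier is known, no load needed
$queriesBeforeEmail = $queries;
$email = $proxy->getEmail();     // fires the lazy load
$proxy->getEmail();              // already initialized, no second query
```

The load only happens because `getEmail()` is overridden. Reading a public property directly would bypass `_load()`, which is why public properties break lazy-loading.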

Collections and Lazy-Loading

<?php
$order = $em->find("Order", $myOrderId);
// `Order::$items` is PersistentCollection

$items = $order->getItems();
$sum = 0;

// loop fires lazy load of all related Items
foreach ($items AS $item) { 
    // product lazy load
    $price = $item->getProduct()->getPrice(); 

    $sum += $item->getAmount() * $price;
}
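PersistentCollection defers loading until the collection is first used. A minimal sketch of such a wrapper, assuming a hypothetical `$loader` callable that returns the related items in one query (the real class wraps an ArrayCollection and offers many more methods):

```php
<?php
// Simplified stand-in for PersistentCollection: loads its elements on first use
class LazyCollectionSketch implements IteratorAggregate
{
    private $loader;            // callable returning the related elements
    private $elements = array();
    private $initialized = false;

    public function __construct($loader)
    {
        $this->loader = $loader;
    }

    #[\ReturnTypeWillChange]
    public function getIterator()
    {
        if (!$this->initialized) {
            $this->initialized = true;
            $this->elements = call_user_func($this->loader);
        }
        return new ArrayIterator($this->elements);
    }
}

$queries = 0;
$items = new LazyCollectionSketch(function () use (&$queries) {
    $queries++; // pretend: SELECT ... FROM order_items WHERE order_id = ?
    return array(
        array('amount' => 2, 'price' => 10),
        array('amount' => 1, 'price' => 5),
    );
});

$queriesBeforeLoop = $queries;
$sum = 0;
foreach ($items as $item) {
    $sum += $item['amount'] * $item['price'];
}
foreach ($items as $item) {
    // second loop: elements are already loaded, no new query
}
```

Loading the collection costs one query, but in the slide each `$item->getProduct()` is still a separate proxy, so the loop issues one more query per product: the N+1 problem that a DQL fetch-join avoids.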

Further Mapping Work

1. Composite Keys

<?php
class Reference
{
    /** @Id @ManyToOne(targetEntity="Article") */
    private $source;

    /** @Id @ManyToOne(targetEntity="Article") */
    private $destination;

    /** @Column(type="string") */
    private $title;
    /** @Column(type="datetime") */
    private $created;
}

2. Mapping Arrays

<?php
class Product
{
    /**
     * @ElementCollection(table="product_details")
     */
    private $details = array();
}
CREATE TABLE product_details (product_id INT,
    col_key VARCHAR(255), col_value VARCHAR(255),
    PRIMARY KEY (product_id, col_key));
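How an array could map onto the table above: each key/value pair becomes one row, keyed by product and key. `detailsToRows()` and `rowsToDetails()` are hypothetical helpers; element collections are listed here as further mapping work, so this only illustrates the table layout.

```php
<?php
// Turn a Product's $details array into rows for product_details
function detailsToRows($productId, array $details)
{
    $rows = array();
    foreach ($details as $key => $value) {
        $rows[] = array(
            'product_id' => $productId,
            'col_key'    => (string) $key,
            'col_value'  => (string) $value,
        );
    }
    return $rows;
}

// And back: the rows of one product become the array again
function rowsToDetails(array $rows)
{
    $details = array();
    foreach ($rows as $row) {
        $details[$row['col_key']] = $row['col_value'];
    }
    return $details;
}

$rows = detailsToRows(7, array('color' => 'red', 'size' => 'XL'));
```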

3. Mapping Value Objects

<?php
class User
{
    private $id;
    private $email;
    /** @Embedded(class="Address") */
    private $address;
}

class Address
{
    private $street;
    private $city;
    private $country;
}
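One way an embedded value object could be stored is as extra columns in the owning entity's table. The sketch assumes a `<field>_<property>` column naming scheme (e.g. `address_street`), which is an illustration only; `flattenEmbedded()` is a hypothetical helper that reads the private fields via reflection.

```php
<?php
class Address
{
    private $street;
    private $city;
    private $country;

    public function __construct($street, $city, $country)
    {
        $this->street = $street;
        $this->city = $city;
        $this->country = $country;
    }
}

// Flatten an embedded value object into prefixed columns of the owning table
function flattenEmbedded($prefix, $valueObject)
{
    $columns = array();
    $reflObject = new ReflectionObject($valueObject);
    foreach ($reflObject->getProperties() as $reflProperty) {
        $reflProperty->setAccessible(true); // needed for private fields before PHP 8.1
        $columns[$prefix . '_' . $reflProperty->getName()] = $reflProperty->getValue($valueObject);
    }
    return $columns;
}

$columns = flattenEmbedded('address', new Address('Example Street 1', 'Bonn', 'DE'));
```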

UnitOfWork Pattern

A bird's-eye view

  • Keeps track of changes to all Objects
  • Is a Transaction for Objects
  • Aggregates Writes ("Transactional Write Behind")
  • Topological Sorting of Foreign Key Dependencies
  • Contains the Identity Map

UnitOfWork Lifecycle

Fetching Objects

  • Register Object as Managed
  • Identity Map: $id => $instance
  • Save Original Data relying on PHP's Copy-on-Write
    • No entity changes => no memory overhead
  • Load Associations
    • Re-use existing entities from identity map
    • Otherwise create lazy loading proxies
    • Wrap Collections inside lazy PersistentCollection
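The identity map and the original-data snapshot can be sketched together. `IdentityMapSketch`, `getOrCreate()` and the `$factory` callable are hypothetical names; the sketch shows that a second fetch of the same id returns the already managed instance, and that the loaded row is kept as a snapshot.

```php
<?php
class IdentityMapSketch
{
    private $identityMap = array();  // class name => array(id => entity)
    private $originalData = array(); // object hash => row as loaded

    // $factory: callable that builds an entity from a row
    public function getOrCreate($className, $id, array $row, $factory)
    {
        if (isset($this->identityMap[$className][$id])) {
            // already managed: re-use the instance, ignore the new row
            return $this->identityMap[$className][$id];
        }
        $entity = call_user_func($factory, $row);
        $this->identityMap[$className][$id] = $entity;
        // arrays are copy-on-write, so this snapshot shares memory with $row
        // until one of the two copies is modified
        $this->originalData[spl_object_hash($entity)] = $row;
        return $entity;
    }

    public function getOriginalData($entity)
    {
        return $this->originalData[spl_object_hash($entity)];
    }
}

$factory = function (array $row) {
    $customer = new stdClass();
    $customer->id = $row['id'];
    $customer->email = $row['email'];
    return $customer;
};

$map = new IdentityMapSketch();
$row = array('id' => 1, 'email' => 'kontakt@beberlei.de');
$first = $map->getOrCreate('Customer', 1, $row, $factory);
$second = $map->getOrCreate('Customer', 1, $row, $factory); // same object, no new instance
```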

Entity Lifecycle

Persist New Entity

  • Call EntityManager::persist($entity)
  • Register Object as managed
  • Generate ID if possible (sequences only; auto-increment IDs are only known after the INSERT)
  • No INSERT performed by this operation!

Remove Entity

  • Call EntityManager::remove($entity)
  • Schedule object for deletion
  • No DELETE performed by this operation!

Entity Lifecycle

Flush

  • Call EntityManager::flush()
  • Compute changesets of all managed entities
  • Calculate CommitOrder (Foreign Keys)
  • Start Transaction
  • Execute INSERT for all new entities
  • Execute UPDATE for all changed entities
  • Execute DELETE/UPDATE for all changed collections
  • Execute DELETE for all removed entities
  • Commit Transaction
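Calculating the commit order is a topological sort: a class must be inserted after every class it holds a foreign key to. The sketch uses depth-first search; `commitOrder()` and its input format (class => referenced classes) are hypothetical, not Doctrine's CommitOrderCalculator API, and this version simply rejects cycles.

```php
<?php
// $dependencies: class => list of classes it holds a foreign key to
function commitOrder(array $dependencies)
{
    $order = array();
    $state = array(); // class => 'visiting' | 'done'

    $visit = function ($class) use (&$visit, &$order, &$state, $dependencies) {
        if (isset($state[$class])) {
            if ($state[$class] === 'visiting') {
                throw new RuntimeException("Cyclic dependency at $class");
            }
            return; // already placed in $order
        }
        $state[$class] = 'visiting';
        $deps = isset($dependencies[$class]) ? $dependencies[$class] : array();
        foreach ($deps as $dep) {
            $visit($dep);
        }
        $state[$class] = 'done';
        $order[] = $class; // every referenced class is already in $order
    };

    foreach (array_keys($dependencies) as $class) {
        $visit($class);
    }
    return $order;
}

$order = commitOrder(array(
    'OrderItem' => array('Order', 'Product'),
    'Order'     => array('Customer'),
    'Customer'  => array(),
    'Product'   => array(),
));
```

INSERTs follow this order so referenced rows exist first; DELETEs walk it in reverse so referencing rows disappear first.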

UnitOfWork Changesets

  • Three different strategies (slow to fast)
    • Iterate all managed entities and compare fields (Deferred Implicit, the default)
    • Only compare explicitly marked entities (Deferred Explicit)
    • Entities explicitly notify the UnitOfWork through a listener (Notify)
  • Configurable on a per-entity basis
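The default strategy compares each managed entity's current field values with the snapshot taken at load time. A minimal sketch; `computeChangeSet()` is a hypothetical helper that returns `field => array(old, new)` for every changed field.

```php
<?php
function computeChangeSet(array $originalData, array $currentData)
{
    $changeSet = array();
    foreach ($currentData as $field => $newValue) {
        $oldValue = array_key_exists($field, $originalData) ? $originalData[$field] : null;
        // strict comparison: objects are compared by identity, so a DateTime
        // modified in place is NOT detected, while '1' vs 1 counts as a change
        if ($oldValue !== $newValue) {
            $changeSet[$field] = array($oldValue, $newValue);
        }
    }
    return $changeSet;
}

$original = array('email' => 'a@example.com', 'status' => 'active');
$current  = array('email' => 'a@example.com', 'status' => 'inactive');
$changes = computeChangeSet($original, $current);
```

Running this for every managed entity and every mapped field on each flush is what makes the default policy cost "managed entities x field mappings"; the explicit and notify policies shrink the set of entities that get compared.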

Doctrine Query Language

  • DQL is not SQL!
  • Object Query Language
  • Utilizes Runtime Mapping Information
  • Supports powerful queries using familiar SQL features:
    • GROUP BY and HAVING
    • Subselects
    • ORDER BY
    • Functions and Aggregates
    • UPDATE and DELETE
  • Allows Object, Array or Scalar Results
  • A real parser, built by hand from the EBNF grammar

Doctrine Query Language

DQL Parser Details

  • Top-Down
  • Recursive-Descent
  • Left to Right
  • Arbitrary Lookahead - LL(*)
  • No backtracking, no memoization
  • Hooks for AST transformation

DQL EBNF to Parser/AST

Each EBNF grammar rule translates to a parser method

SelectStatement ::= SelectClause FromClause
                    [WhereClause] [GroupByClause]
                    [HavingClause] [OrderByClause]

DQL EBNF to Parser/AST

<?php
class Parser
{
    public function SelectStatement()
    { 
        $selectStatement = new AST\SelectStatement(
            $this->SelectClause(), $this->FromClause()
        );
        $selectStatement->whereClause =
            $this->_lexer->isNextToken(Lexer::T_WHERE) ?
            $this->WhereClause() : null;

        //...
    }
}
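The "one method per grammar rule" idea can be tried out on a toy grammar. Everything here is hypothetical: `ToyParser`, its regex lexer and the tiny DQL subset in the comment. Unlike the real LL(*) parser it only needs one token of lookahead, and the AST nodes are plain arrays.

```php
<?php
// Toy recursive-descent parser for a tiny DQL subset:
//   SelectStatement ::= SelectClause FromClause [WhereClause]
//   SelectClause    ::= "SELECT" Identifier
//   FromClause      ::= "FROM" Identifier Identifier
//   WhereClause     ::= "WHERE" Identifier "." Identifier "=" Value
class ToyParser
{
    private $tokens;
    private $pos = 0;

    public function __construct($dql)
    {
        // lexer: positional parameters (?1), words/numbers, and the symbols . and =
        preg_match_all('/\?\d+|\w+|[.=]/', $dql, $matches);
        $this->tokens = $matches[0];
    }

    private function peek()
    {
        return isset($this->tokens[$this->pos]) ? $this->tokens[$this->pos] : null;
    }

    private function isNext($keyword)
    {
        return $this->peek() !== null && strcasecmp($this->peek(), $keyword) === 0;
    }

    // consume the next token, optionally checking it against an expected keyword/symbol
    private function expect($expected = null)
    {
        $token = $this->peek();
        if ($token === null || ($expected !== null && strcasecmp($token, $expected) !== 0)) {
            $wanted = ($expected === null) ? 'a token' : "'" . $expected . "'";
            throw new RuntimeException('Expected ' . $wanted . ' at position ' . $this->pos);
        }
        $this->pos++;
        return $token;
    }

    public function SelectStatement()
    {
        $select = $this->SelectClause();
        $from = $this->FromClause();
        $where = $this->isNext('WHERE') ? $this->WhereClause() : null;
        if ($this->peek() !== null) {
            throw new RuntimeException("Unexpected token '" . $this->peek() . "'");
        }
        return array('select' => $select, 'from' => $from, 'where' => $where);
    }

    private function SelectClause()
    {
        $this->expect('SELECT');
        return $this->expect(); // the selected alias
    }

    private function FromClause()
    {
        $this->expect('FROM');
        $entity = $this->expect();
        $alias = $this->expect();
        return array('entity' => $entity, 'alias' => $alias);
    }

    private function WhereClause()
    {
        $this->expect('WHERE');
        $alias = $this->expect();
        $this->expect('.');
        $field = $this->expect();
        $this->expect('=');
        $value = $this->expect();
        return array('alias' => $alias, 'field' => $field, 'value' => $value);
    }
}

$parser = new ToyParser('SELECT u FROM User u WHERE u.country = ?1');
$ast = $parser->SelectStatement();
```

Optional clauses are decided by peeking at the next token, so the parser never backtracks.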

DQL Examples 1

Fetch all User objects

SELECT u FROM User u

Restrict Users

SELECT u
FROM User u
WHERE u.age > 20 AND u.country = ?1

Update Users

UPDATE User u
SET u.status = 'inactive'
WHERE u.lastLogin < ?1

DQL Examples 2

Join Address

SELECT u FROM User u JOIN u.address a

Fetch-Join Address

SELECT u, a FROM User u JOIN u.address a

Count Groups per User-Email

SELECT u.email, COUNT(g.id)
FROM User u
JOIN u.groups g
GROUP BY u.email

Hydration

  • Process of building the result of a query
  • Different result types supported:
    • Objects
    • Deep Nested Arrays
    • Rows
    • Single Scalars
    • Mixed Objects and Scalars
    • Custom
  • Result-Cursor iteration possible

Profiling

  • ORM performance generally sucks
  • We care about performance!
  • Using Xdebug, KCachegrind and Xhprof
  • Our bottlenecks:
    • DQL to SQL Generation
    • UnitOfWork
    • Hydration
    • Metadata Access
    • Metadata (Un-)Serialization

DQL Parsing Performance

  • DQL String transformed into AST
  • AST transformed into SQL
  • Slow for complex queries
  • Sentences in the DQL are very short
  • Can be fully cached!
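Because the generated SQL depends only on the DQL string (and the metadata), it can be cached with the DQL as key. A hypothetical sketch: `QueryCacheSketch` and the `$compiler` callable (standing in for DQL -> AST -> SQL) are assumptions, and the array stands in for APC/Memcache.

```php
<?php
class QueryCacheSketch
{
    private $cache = array(); // stands in for APC/Memcache
    public $misses = 0;

    // $compiler: callable DQL string -> SQL string (the expensive parse + SQL generation)
    public function getSql($dql, $compiler)
    {
        $key = md5($dql);
        if (!isset($this->cache[$key])) {
            $this->misses++;
            $this->cache[$key] = call_user_func($compiler, $dql);
        }
        return $this->cache[$key];
    }
}

$compiler = function ($dql) {
    // placeholder result for the real DQL -> AST -> SQL pipeline
    return 'SELECT u0_.id, u0_.email FROM users u0_ WHERE u0_.country = ?';
};

$queryCache = new QueryCacheSketch();
$sql1 = $queryCache->getSql('SELECT u FROM User u WHERE u.country = ?1', $compiler);
$sql2 = $queryCache->getSql('SELECT u FROM User u WHERE u.country = ?1', $compiler);
```

Parameter values are bound separately (`?1`), so the same statement with different values still hits the cache.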

UnitOfWork

  • Changeset Calculations are slow
  • Optimize data structures for use with PHP's internal functions
  • Default Complexity: Managed Entities x Number of Field Mappings
  • Optimize with Change Tracking Policies

Hydration

  • Complexity at least: Number of Rows x Number of Columns
  • For Objects: 2 x Number of Rows x Number of Columns
  • For each iteration and column:
    • Convert Database to PHP Value
    • Pass value to Object
  • Very high number of method calls and field accesses
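The rows x columns loop of object hydration, as a hypothetical sketch: `OrderRow`, `hydrateAll()` and the `$converters` map (field => conversion callable) are illustrative names. Reflection setup happens once per query; the inner loop converts and assigns each column of each row. `newInstanceWithoutConstructor()` requires PHP 5.4 or later.

```php
<?php
class OrderRow
{
    private $id;
    private $created;

    public function getId() { return $this->id; }
    public function getCreated() { return $this->created; }
}

function hydrateAll($className, array $rows, array $converters)
{
    // once per query: reflection objects for every mapped field
    $reflClass = new ReflectionClass($className);
    $reflFields = array();
    foreach (array_keys($converters) as $field) {
        $reflFields[$field] = $reflClass->getProperty($field);
        $reflFields[$field]->setAccessible(true);
    }

    $result = array();
    foreach ($rows as $row) {                          // rows ...
        $entity = $reflClass->newInstanceWithoutConstructor();
        foreach ($converters as $field => $convert) {  // ... x columns
            $reflFields[$field]->setValue($entity, call_user_func($convert, $row[$field]));
        }
        $result[] = $entity;
    }
    return $result;
}

$converters = array(
    'id'      => 'intval',
    'created' => function ($value) { return new DateTime($value); },
);
$orders = hydrateAll('OrderRow', array(
    array('id' => '1', 'created' => '2010-08-21 10:00:00'),
    array('id' => '2', 'created' => '2010-08-22 11:30:00'),
), $converters);
```

Each cell costs one conversion call plus one `setValue()` call, which is where the high method-call count comes from.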

Metadata Access

  • Using public properties instead of getter methods
  • In PHP 5.3, property access is about 4 times faster than a method call

Serialization

Public properties serialize faster than protected and private ones

<?php
class ClassMetadata
{
    public $fieldMappings;
    protected $associationMappings;
    private $identifier;
}

echo serialize(new ClassMetadata());

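Yields (NUL bytes shown as \0; protected and private property names are mangled with a prefix, which makes them longer):

O:13:"ClassMetadata":3:{s:13:"fieldMappings";N;s:22:"\0*\0associationMappings";N;s:25:"\0ClassMetadata\0identifier";N;}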

Questions?

Thank you!