Brace yourselves, this one's a doozy. Drupal's database system is probably one of the least understood parts, and perhaps one of the toughest to understand.
Before we jump into the code, let's take a quick step back.
If we're trying to explain how Drupal talks to databases, we first need to define the major building blocks in general terms. Here are the three big pieces that need to exist before we have a useful database abstraction layer.
- The connection. How is a connection established, maintained, and closed?
- The query. How is a query built and sent to the database?
- The result. How are query results returned in a format which we can use?
Thankfully, PDO ("PHP Data Objects") exists for PHP, which has stock solutions for each of those three. There are two classes here that we need to be aware of.
- The
PDO
class handles creating, maintaining, and closing the connection between PHP and whatever database you're using. - The
PDOStatement
class handles querying the database (using "prepared statements" which is where the name comes from) as well as returning results.
That's it. Those are the two major pieces of PDO, and you can see that each of our three major pieces in the list above above are addressed in one of those two classes.
Note: there are more PDO
classes than just those two. There is also a PDOException
class and some driver classes for different databases, for example, but those aren't important for the purposes of our understanding right now.
Congratulations, you know enough about PDO to understand how Drupal is using it! In fact, a lot of PDO will look familiar to you if you've done some querying with Drupal. For example, fetchAll()
and rowCount()
are core functions of the PDOStatement
class.
With that out of the way, let's talk about what Drupal's doing. This one's a bit tough to summarize, because at a high level it's extremely simple, and at a low level, it's pretty dang complex.
The bird's eye view is that Drupal's database system is not much more than some extensions on top of PDO. Drupal has some classes which extend from the core PDO classes to add support for advanced features (table prefixing is the most obvious example) and fancy helper functions, but at the end of the day it's all just dressing on top of PDO.
Let's start with a quick rundown of the major classes in use. Note that this isn't an exhaustive list of all DB-related classes in Drupal, but it's enough to get you the big picture.
Here are the big three, all of which live along with a few others in includes/database/database.inc
:
DatabaseConnection
- extends thePDO
class to manage the connectionDatabaseStatementBase
- extends thePDOStatement
class to send queries and fetch resultsDatabase
- a standalone class that isn't meant to be instantiated or extended, which contains some functions used to manage the connection withDatabaseConnection
without having to resort to global variables (this will make more sense later).
And then we have the classes for specific query types, which live in includes/database/query.inc
:
Query
- base class that doesn't do much on its ownInsertQuery
- extends theQuery
class forINSERT
queriesUpdateQuery
- extends theQuery
class forUPDATE
queriesDeleteQuery
- extends theQuery
class forDELETE
queriesMergeQuery
- extends theQuery
class forMERGE
queriesTruncateQuery
- extends theQuery
class forTRUNCATE
queriesSelectQuery
- extends theQuery
class forSELECT
queries. Note that this one is special enough to exist in its own file (includes/database/select.inc
) with its owninterface
to go along with it.
And to close the loop, since this is supposed to be a summary after all, here's the general process (at a VERY high level, even though it may not seem like it):
- A query function such as
db_select()
is called when a query is being requested. - That function will call the
getConnection()
method of theDatabase
class which will fetch the DB driver and connection info fromsettings.php
. - Once we have the connection info, we can instantiate the database driver class for our chosen database, such as
DatabaseConnection_mysql
, and hand it our connection info. - The driver class constructor function will call the class's parent constructor (i.e.,
DatabaseConnection
's constructor) which calls its own parent's constructor (i.e.,PDO
's constructor), which creates and returns the connection. - Back in the
db_select()
function with a connection in hand, we can use function chaining to instantiate theSelectQuery
class on top of it and build the query object. - Any other functions chained after our original
db_select()
, such ascondition()
orrange()
ororderBy()
, will be functions inside our query class (i.e.,SelectQuery
) and will alter the instantiated query object's attributes. - The
execute()
method of theSelectQuery
class runs, which converts our query object to aSQL
string, and runs it throughquery()
method of theDatabaseConnection
class - That ends up running the
execute()
method of theDatabaseStatementBase
class - Finally, that function runs the
execute()
method if its parent class, which isPDOStatement
, which actually executes the query against the target database. - Our query has run, and we can fetch results using one of many
fetch*()
functions provided byPDOStatement
andDatabaseStatementBase
.
Too detailed? Ok fine, here it is at an EVEN HIGHER level:
- A query function like
db_select()
runs. - It creates a connection (thanks
PDO
!). - Then it creates a query object.
- Then it executes the query (thanks again
PDO!
). - Then it fetches results however you ask it to.
That's about as helpful a summary as we're going to get without diving into the details a bit. Speaking of details...
Rather than going function by function on the hundreds of database-related functions, I think it's more helpful (or perhaps less boring) to trace the path of a query from start to finish.
For George's request to about-us
, a bunch of queries are going to run, and the vast majority of them are going to be SELECT
queries, so we'll use one of those as our example. Let's see what happens when this very typical looking query runs (assuming $node
is the node currently being viewed).
$result = db_select('node', 'n')
->fields('n')
->condition('nid', $node->nid, '=')
->execute()
->fetchAssoc();
Pretty basic query, right? Let's walk through it, line by line, just to get our bearings. Bear with me if you're a veteran; we'll get there.
$result = db_select('node', 'n')
We're calling db_select()
on the node
table, and aliasing it to n
.
->fields('n')
We're telling it that instead of fetching specific fields in the node
table, we want all of them.
->condition('nid', $node->nid, '=')
We're saying we want to filter out any results where the nid
column isn't equal to $node->nid
.
->execute()
We're done building the query, and we're ready to run it.
->fetchAssoc();
We want the results to be formatted into an associative array. And we're done!
So you can see that even this very basic query has a lot of information baked in. Specifically, we told it:
- What kind of query we want to run
- What our base table is
- What fields we want to retrive from it
- What conditions we want to include
- How we want the results formatted
And all of that in one convenient little chained method call!
Let's dig in! Starting at the beginning, we're calling db_select()
, which is a very simple function.
function db_select($table, $alias = NULL, array $options = array()) {
if (empty($options['target'])) {
$options['target'] = 'default';
}
return Database::getConnection($options['target'])->select($table, $alias, $options);
}
So basically, use the target database unless otherwise specified, then grab a connection using getConnection
, and run select()
on it.
First things first, we need to connect to a database before we can do anything useful. Notice that Drupal doesn't just open a connection for the heck of it during the database bootstrap. No connections are opened until they're actually needed for a query. Here's what happens when we're ready for that.
Firstly, each possible query function (in our case, db_select()
, but just as true for db_update()
, db_insert()
, etc.) includes a call to the getConnection()
function of the Database
class before calling the actual query. We're calling db_select()
so we get this (copied from above):
return Database::getConnection($options['target'])->select($table, $alias, $options);
If you dive into that Database::getConnection
call, it'll take you to Database::openConnection()
.
But wait, there's more! If you dive into that openConnection()
call, you're taken to Database::parseConnectionInfo()
. This very important function populates the $databaseInfo
variable with the information from the global $databases
variable.
You may or may not remember that that global variable was created in the DRUPAL_BOOTSTRAP_CONFIGURATION
phase of the bootstrap process, courtesy of the drupal_settings_initialize()
function.
Anyways, as a result of the Database::openConnection()
function, we have a fully populated $databaseInfo
variable, and we pass that into the constructor for the driver class we're using.
if (!$driver = self::$databaseInfo[$key][$target]['driver']) {
throw new DatabaseDriverNotSpecifiedException('Driver not specified for this database connection: ' . $key);
}
$driver_class = 'DatabaseConnection_' . $driver;
require_once DRUPAL_ROOT . '/includes/database/' . $driver . '/database.inc';
$new_connection = new $driver_class(self::$databaseInfo[$key][$target]);
All we're really doing here is grabbing the driver name from $databaseInfo
, and using that to try to instantiate a class for it, which should be named DatabaseConnection_<driver>
and should be located in /includes/database/<driver>/database.inc
. If found, we pass in the database info so that the driver class can make use of it.
From here, the driver class that we've just instantiated will do some driver-specific stuff (whatever is needed to prepare a connection for whatever database driver you're using) along with calling its parent class's constructor. Its parent class happens to be DatabaseConnection
, which we've already talked about as being extended from the core PHP PDO
class, which creates our connection when instantiated.
And we have our connection. To summarize: any given query will call getConnection()
which calls openConnection()
which, after finding our DB connection info from the global $databases
variable, calls the constructor for our DB driver's class. That constructor calls DatabaseConnection
's constructor which calls PDO
's constructor, which creates our connection.
And with an open connection ripe and ready for the picking, we can move on and do something useful.
Remember that this is the line of code we're looking at, from the db_select()
function.
return Database::getConnection($options['target'])->select($table, $alias, $options);
Now that getConnection()
is complete and has given us a DB connection to use, we can move on to the second part of that statement.
The select()
function is also very simple:
$class = $this->getDriverClass('SelectQuery', array('query.inc', 'select.inc'));
return new $class($table, $alias, $this, $options);
This just gives the DB drivers a chance to override the default query classes to add or alter their behavior so that they play nice with the database in use. For example, PostgreSQL has to have its own version of SelectQuery to change the way orderBy()
and orderRandom()
behave.
All this is doing behind the scenes is checking to see if there is a driver-specific SelectQuery
class, and returning that if so, otherwise it just returns the default. To do this, it just looks for a class named SelectQuery_<drivername>
such as SelectQuery_mysql
. In our case, it won't find one, because SelectQuery_mysql
doesn't exist. But if it were an INSERT
query, it would find InsertQuery_mysql
, which does exist. Or if we were using SQLite, then it would find SelectQuery_sqlite
, which also exists.
Query objects, such as our SelectQuery
, are really complicated little things. They extend the Query
class but they don't really get much from it. Most of the logic belongs to the individual query classes themselves. Our SelectQuery
has functions for all of the things you might want to do to it, such as:
SelectQuery::having()
SelectQuery::isNotNull()
SelectQuery::where()
- Anything else you might chain onto a call to
db_select()
All of the things you can chain onto db_select()
functions to write complicated queries end up running functions defined in the SelectQuery
class. And all of those functions basically just alter our current instantiated SelectQuery
object by updating its attributes.
I know this all seems very vague and confusing. As a specific example, let's see what happens when we walk through our original query, given what we know about Query objects.
$result = db_select('node', 'n')
->fields('n')
->condition('nid', $node->nid, '=')
->execute()
->fetchAssoc();
Remember that the first line creates the connection and the SelectQuery
object. We've already covered that.
The second line calls the SelectQuery::fields()
(remember, all chained functions are defined in the class for the query object) function which looks like this:
public function fields($table_alias, array $fields = array()) {
if ($fields) {
foreach ($fields as $field) {
// We don't care what alias was assigned.
$this->addField($table_alias, $field);
}
}
else {
// We want all fields from this table.
$this->tables[$table_alias]['all_fields'] = TRUE;
}
return $this;
}
See what I mean about just altering the SelectQuery
object? It either runs SelectQuery::addField()
on each field if there are specific fields listed, or it sets all_fields
to TRUE
if not. Remember, all of this is just so that by the time we're ready to execute the query, we have a fully built query object that we can convert into an SQL statement. More on that in a bit.
In the meantime, the 3rd line (the one with condition()
) happens to be important enough that it deserves its own section.
The third line calls condition()
which is yet another function belonging to SelectQuery
, but it's a little more involved than most of the others. Here's what it looks like:
public function condition($field, $value = NULL, $operator = NULL) {
$this->where->condition($field, $value, $operator);
return $this;
}
The tricky part is that $this->where
is actually an instance of the DatabaseCondition
class (yes, conditions get their own class - are you overwhelmed yet?). This instantiation happened in the __construct()
function of SelectQuery
, like so:
$this->where = new DatabaseCondition('AND');
As a result, $this->where->conditions()
becomes an array of conditions in the following format:
array(
'field' => $field,
'value' => $value,
'operator' => $operator,
);
Then, at execution time, that gets compiled along with the rest of the query into raw SQL.
So, by adding a condition to our query using the chained condition()
function, what it's really doing is calling the condition()
function of the DatabaseCondition
class, which adds an array to the list of conditions on our query object.
Don't forget the original code that kicked all of this off:
$result = db_select('node', 'n')
->fields('n')
->condition('nid', $node->nid, '=')
->execute()
->fetchAssoc();
We're done with those first 3 lines, so now it's time to run execute()
, which is yet another function of the SelectQuery
class. It looks like this:
public function execute() {
// If validation fails, simply return NULL.
// Note that validation routines in preExecute() may throw exceptions instead.
if (!$this->preExecute()) {
return NULL;
}
$args = $this->getArguments();
return $this->connection->query((string) $this, $args, $this->queryOptions);
}
The SelectQuery::preExecute()
function does a few very important things:
- If the query has any tags added by the
addTag()
function (which we don't in our example) then it runs the alter hook(s) specified by those tags. - It also runs
preExecute()
on any defined subqueries - Finally, it runs
preExecute()
on any defined unions
With preExecute()
complete, we move onto the getArguments()
. This is the part where WHERE
and HAVING
clauses are compiled from objects to string SQL syntax, along with any UNION
s or subqueries.
And finally, we run the dang query! Note that that last line, the one that runs query()
, casts $this
(i.e., the query object) to a string, which by default PHP behavior means that it runs its __toString()
function.
Lucky for us, SelectQuery
has a great big __toString()
function which converts that hefty object into a regular old prepared statement, which looks a lot more like the raw SQL that our database expects. That __toString()
function doesn't really do anything magical - it's just a matter of asking our query object if it has this or that, and adding the appropriate SQL to the string. For example, here's how GROUP BY
clauses are added:
// GROUP BY
if ($this->group) {
$query .= "\nGROUP BY " . implode(', ', $this->group);
}
See? Not so bad.
At the end of __toString()
we have a fully built SQL string which can be passed on to run the query, by calling the query()
method of the DatabaseConnection
class.
This is the code from that function that ends up doing the magic:
$this->expandArguments($query, $args);
$stmt = $this->prepareQuery($query);
$stmt->execute($args, $options);
That first expandArguments()
line is mostly just housekeeping. For any arguments that are arrays, such as if we had something like ->condition('nid', $nids_array)
, it will convert the array to a comma delimited list.
The second line, the one that runs prepareQuery()
isn't very fun either. It calls the prefixTables()
which literally just runs a str_replace
to add in table prefixes if you're using them. If not, it just removes the curly brackets around table names in the query and moves on.
And we have arrived at the last line. Remember that we're working with the DatabaseConnection
class which extends from the core PDO
class. This last line just calls the execute()
function of the core PDO
class, which takes over and runs the query against our database.
Our query has run, folks! We're ready to see what it returned! It's been a long journey but we're not done yet!
The next bit of code from the DatabaseConnection::query()
function is important:
switch ($options['return']) {
case Database::RETURN_STATEMENT:
return $stmt;
case Database::RETURN_AFFECTED:
return $stmt->rowCount();
case Database::RETURN_INSERT_ID:
return $this->lastInsertId();
case Database::RETURN_NULL:
return;
default:
throw new PDOException('Invalid return directive: ' . $options['return']);
}
This is when our Drupal decides what exactly to return.
SELECT
queries just return the results object as given to us by PDO.UPDATE
,DELETE
,MERGE
, andTRUNCATE
queries return the count of affected rowsINSERT
queries return the ID of the last inserted item- You'll almost never see
RETURN_NULL
set - it happens as an edge case forINSERT
queries running against PostgreSQL or if manually specified in the code that runs the query.
Let's take a peek back at our original code:
$result = db_select('node', 'n')
->fields('n')
->condition('nid', $node->nid, '=')
->execute()
->fetchAssoc();
All of the stuff we just talked about happens as part of execute()
. Now that that's all complete, we can fetch the results in whatever format we want them.
We used fetchAssoc()
, but we could have easily used any of the other methods defined in our DatabaseStatementBase
class (which, remember, extends PDOStatement
), such as:
fetchAssoc()
- returns the next row as an associative arrayfetchAllAssoc()
- returns the entire result set as an associative array keyed by the given fieldfetchAllKeyed()
- returns a single associative array (only useful for two-column result sets)fetchCol()
- returns an entire single column of the result set as an indexed arrayfetchField()
- returns a single field from the next record of a result set
Or, we can also choose from these, which come straight from the PDOStatement
class, so Drupal can use them out of the box:
fetch()
- returns the next row from the result setfetchObject()
- returns the next row as an objectfetchAll()
- returns an array containing all rows of a result setfetchColumn()
- returns a single column (you probably should never use this since Drupal'sfetchField()
exists which calls this behind the scenes)
There's not much to say about how these fetch*
functions do what they do. The implementation of the PDO
functions is outside of the scope of this book, and the ones that exist in Drupal basically each just call the PDO
functions with a certain set of parameters, or a little bit of processing.
As an example of "a little bit of processing", the fetchAllAssoc()
function just looks like this:
foreach ($this as $record) {
$record_key = is_object($record) ? $record->$key : $record [$key];
$return [$record_key] = $record;
}
return $return;
And as an example of "just calling a PDO
function", we can look at what fetchCol()
does internally:
return $this->fetchAll(PDO::FETCH_COLUMN, $index);
See? Nothing very special there. Thankfully, this is one part of Drupal's database layer that isn't very tough to grasp.
And thus we have queried the database and fetched our results, and we're ready to do something useful with them. The database layer's work is done, for that particular query.
All that is well and good, but how does the database table structure get built in the first place? How does Drupal know which tables and fields and indexes to create?
For those who may not be aware, this is a good time to note that modules primarily define their table structure inside a hook_schema()
implementation in modulename.install
. Let's walk through exactly how that gets called on a module (in our case, we'll say the node
module) and what Drupal does with it.
When a module is enabled, the module_enable
runs this little chunk of code (which I have removed comments from):
if (drupal_get_installed_schema_version($module, TRUE) == SCHEMA_UNINSTALLED) {
drupal_install_schema($module);
$versions = drupal_get_schema_versions($module);
$version = $versions ? max($versions) : SCHEMA_INSTALLED;
drupal_set_installed_schema_version($module, $version);
module_invoke($module, 'install');
}
In plain English, this code checks to see if the module has ever been installed before, and if not (meaning the schema_version
column in the system
table for this module is equal to SCHEMA_UNINSTALLED
which is -1
), it installs the schema, then saves the current schema version to the system table.
Note that the above code only runs if the module has never been installed. If it was previously installed, then disabled, and is being re-enabled again, that code does not run at all. Therefore, any schema changes which need to be made after the module was previously installed will have to happen in hook_update_N
and will run the next time database updates are applied.
Obviously, the magic happens in drupal_install_schema($module)
. Let's take a look at what's going on there.
function drupal_install_schema($module) {
$schema = drupal_get_schema_unprocessed($module);
_drupal_schema_initialize($schema, $module, FALSE);
foreach ($schema as $name => $table) {
db_create_table($name, $table);
}
}
Let's take it line by line.
First, we run drupal_get_schema_unprocessed($module)
. This extremely simple function literally just loads the module's install file, invokes its hook_schema()
implementation, and returns that result, without really doing any other processing on it. Not bad.
This gives us $schema
, which is an array of tables, which themselves are arrays of properties.
Next, we run _drupal_schema_initialize()
which is also fairly simple. It just loops through the $schema
array and makes sure that each table has $table['module']
and $table['name']
set, and sets it if needed.
And finally, we just loop through the $schema
array and run db_create_table()
on each table inside it. This ends up running DatabaseSchema::createTable()
on the table, which looks like this:
public function createTable($name, $table) {
if ($this->tableExists($name)) {
throw new DatabaseSchemaObjectExistsException(t('Table @name already exists.', array('@name' => $name)));
}
$statements = $this->createTableSql($name, $table);
foreach ($statements as $statement) {
$this->connection->query($statement);
}
}
Basically, we just check to make sure the table doesn't already exist, then create it if so.
The createTableSql()
is probably the only interesting part of that snippet. It's a driver-specific function, meaning there is a separate version of it depending on which database driver you're using. Here's the MySQL version of it, if you'd like to take a peek.
No matter which driver you're using, however, the idea is the same. We use the createTableSql()
function to create the raw SQL for the CREATE TABLE
statement based on the table array, and then run the SQL to. The function creates a string for the CREATE TABLE
statement, then loops through each of the defined fields and calls createFieldSql()
(another driver-specific function) on each of them, adding the result to the SQL string. At the end of the function, we should have a fully built CREATE TABLE
statement complete with all defined fields and their options, ready to run.