Support mapping time series collections (#2687)
* Store metadata information for time series collections

* Specify time series options when creating collection

* Use named arguments for attributes

* Don't skip empty metadata field names

* Leave encoding of enum values to the MongoDB driver

* Remove unused isTimeSeries option

* Disable early exit requirement in XML driver

* Use explicit closure instead of empty() checks

* Support bucketMaxSpanSeconds and bucketRoundingSeconds in time series collections

* Read bucket options for time series in XML driver

* Add attribute documentation for time series collections

* Add cookbook entry for time series data

* Update documentation links

* Expand time series cookbook to use multiple measurements

* Simplify markAsTimeSeries tests with granularity and bucket options

* Apply wording suggestions from code review

Co-authored-by: Jeremy Mikola <jmikola@gmail.com>

alcaeus and jmikola authored Oct 21, 2024
1 parent da17214 commit e9a8e78
Showing 17 changed files with 570 additions and 6 deletions.
132 changes: 132 additions & 0 deletions docs/en/cookbook/time-series-data.rst
@@ -0,0 +1,132 @@
Storing Time Series Data
========================

.. note::

    Support for mapping time series data was added in ODM 2.10.

`Time series data <https://www.mongodb.com/docs/manual/core/timeseries-collections/>`__
is a sequence of data points in which insights are gained by analyzing changes
over time.

Time series data is generally composed of these components:

-
    Time when the data point was recorded

-
    Metadata, which is a label, tag, or other data that identifies a data series
    and rarely changes

-
    Measurements, which are the data points tracked at increments in time

A time series document always contains a time value, and one or more measurement
fields. Metadata is optional, but cannot be added to a time series collection
after creating it. When using an embedded document for metadata, fields can be
added to this document after creating the collection.

.. note::

    Support for time series collections was added in MongoDB 5.0. Attempting to
    use this functionality on older server versions will result in an error on
    schema creation.

Creating The Model
------------------

For this example, we'll be storing data from multiple sensors measuring
temperature and humidity. Other examples for time series include stock data,
price information, website visitors, or vehicle telemetry (speed, position,
etc.).

First, we define the model for our data:

.. code-block:: php

    <?php

    use DateTimeImmutable;
    use Doctrine\ODM\MongoDB\Mapping\Annotations as ODM;
    use MongoDB\BSON\ObjectId;

    #[ODM\Document]
    readonly class Measurement
    {
        #[ODM\Id]
        public string $id;

        public function __construct(
            #[ODM\Field(type: 'date_immutable')]
            public DateTimeImmutable $time,
            #[ODM\Field(type: 'int')]
            public int $sensorId,
            #[ODM\Field(type: 'float')]
            public float $temperature,
            #[ODM\Field(type: 'float')]
            public float $humidity,
        ) {
            $this->id = (string) new ObjectId();
        }
    }

Note that we defined the entire model as readonly. While we could theoretically
change values in the document, in this example we'll assume that the data will
not change.

Now we can mark the document as a time series document. To do so, we use the
``TimeSeries`` attribute, configuring appropriate values for the time and
metadata field, which in our case stores the ID of the sensor reporting the
measurement:

.. code-block:: php

    <?php

    // ...

    #[ODM\Document]
    #[ODM\TimeSeries(timeField: 'time', metaField: 'sensorId')]
    readonly class Measurement
    {
        // ...
    }

Once we create the schema, we can store our measurements in this time series
collection and let MongoDB optimize the storage for faster queries:

.. code-block:: php

    <?php

    $measurement = new Measurement(
        time: new DateTimeImmutable(),
        sensorId: $sensorId,
        temperature: $temperature,
        humidity: $humidity,
    );

    $documentManager->persist($measurement);
    $documentManager->flush();

Note that other functionality such as querying, using aggregation pipelines, or
removing data works the same as with other collections.
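
If a series is identified by more than one value, the metadata can be stored in
an embedded document instead of a scalar field. The following is a minimal
sketch of such a mapping, not a recommended layout: the ``SensorMetadata`` and
``Reading`` classes and the ``location`` field are hypothetical names used only
for illustration.

.. code-block:: php

    <?php

    use Doctrine\ODM\MongoDB\Mapping\Annotations as ODM;

    // Hypothetical embedded document holding everything that identifies a series
    #[ODM\EmbeddedDocument]
    class SensorMetadata
    {
        public function __construct(
            #[ODM\Field(type: 'int')]
            public int $sensorId,
            #[ODM\Field(type: 'string')]
            public string $location,
        ) {
        }
    }

    // Hypothetical time series document using the embedded metadata
    #[ODM\Document]
    #[ODM\TimeSeries(timeField: 'time', metaField: 'metadata')]
    class Reading
    {
        #[ODM\Id]
        public string $id;

        public function __construct(
            #[ODM\Field(type: 'date_immutable')]
            public DateTimeImmutable $time,
            #[ODM\EmbedOne(targetDocument: SensorMetadata::class)]
            public SensorMetadata $metadata,
            #[ODM\Field(type: 'float')]
            public float $temperature,
        ) {
        }
    }

Because the metadata is an embedded document here, further fields can be added
to ``SensorMetadata`` after the collection has been created.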

Considerations
--------------

With the mapping above, data is stored with the default granularity of seconds.
If measurements come in less often, we can switch to a coarser granularity of
minutes or hours. Doing so changes the bucket size MongoDB uses to store the
data internally, which in turn affects storage requirements and query
performance.

For example, with the default ``seconds`` granularity, each bucket groups
documents for one hour. If each sensor only reports data every few minutes, we'd
do well to configure ``minutes`` granularity. This reduces the
number of buckets created, reducing storage and making queries more efficient.
However, if we were to choose ``hours`` for granularity, readings for a whole
month would be grouped into one bucket, resulting in slower queries as more
entries have to be traversed when reading data.

More details on granularity and other considerations can be found in the
`MongoDB documentation <https://www.mongodb.com/docs/manual/core/timeseries/timeseries-considerations/>`__.
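
To make the trade-off more concrete, the following sketch estimates how many
measurements of a single sensor fall into the time span of one bucket. It is an
illustration only: ``BUCKET_SPAN_SECONDS`` and ``measurementsPerBucket()`` are
hypothetical names, and the spans used (one hour, 24 hours, and 30 days) are the
default bucket spans MongoDB documents for each granularity. The result only
considers the time span; the server applies further limits to buckets.

.. code-block:: php

    <?php

    declare(strict_types=1);

    // Default maximum bucket span per granularity, as documented by MongoDB
    const BUCKET_SPAN_SECONDS = [
        'seconds' => 3600,    // 1 hour
        'minutes' => 86400,   // 24 hours
        'hours'   => 2592000, // 30 days
    ];

    /**
     * Hypothetical helper: number of measurements one sensor reports within
     * the time span of a single bucket when it reports every $intervalSeconds.
     */
    function measurementsPerBucket(string $granularity, int $intervalSeconds): int
    {
        if (! array_key_exists($granularity, BUCKET_SPAN_SECONDS) || $intervalSeconds < 1) {
            throw new InvalidArgumentException('Unknown granularity or invalid interval');
        }

        return intdiv(BUCKET_SPAN_SECONDS[$granularity], $intervalSeconds);
    }

    // A sensor reporting every 5 minutes (300 seconds)
    echo measurementsPerBucket('seconds', 300), "\n"; // 12
    echo measurementsPerBucket('minutes', 300), "\n"; // 288
    echo measurementsPerBucket('hours', 300), "\n";   // 8640

With a five-minute reporting interval, ``seconds`` granularity leaves each
bucket with only a handful of entries, while ``hours`` spreads thousands of
readings over a single bucket's time span.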
58 changes: 58 additions & 0 deletions docs/en/reference/attributes-reference.rst
@@ -1144,6 +1144,64 @@ for sharding the document collection.
//...
}
#[TimeSeries]
-------------

This attribute may be used at the class level to mark a collection as containing
:doc:`time-series data <../cookbook/time-series-data>`.

.. code-block:: php

    <?php

    use Doctrine\ODM\MongoDB\Mapping\TimeSeries\Granularity;

    #[Document]
    #[TimeSeries(timeField: 'time', metaField: 'metadata', granularity: Granularity::Seconds)]
    class Measurements
    {
        #[Id]
        public string $id;

        #[Field]
        public DateTimeImmutable $time;

        #[EmbedOne(targetDocument: MeasurementMetadata::class)]
        public MeasurementMetadata $metadata;

        #[Field]
        public int $measurement;
    }

The ``timeField`` attribute is required and denotes the field where the time of
a time series entry is stored. The following optional attributes may be set:

-
    ``metaField`` - The name of the field which contains metadata in each time
    series document. The field can be of any data type.

-
    ``granularity`` - Set the granularity to the value that most closely matches
    the time between consecutive incoming timestamps. This allows MongoDB to
    optimize how data is stored. Note: this attribute cannot be combined with
    ``bucketMaxSpanSeconds`` and ``bucketRoundingSeconds``.

-
    ``bucketMaxSpanSeconds`` - Used with ``bucketRoundingSeconds`` as an
    alternative to ``granularity``. Sets the maximum time between timestamps
    in the same bucket. Possible values are 1 - 31536000.

-
    ``bucketRoundingSeconds`` - Used with ``bucketMaxSpanSeconds``, must be set
    to the same value as ``bucketMaxSpanSeconds``. When a document requires a
    new bucket, MongoDB rounds down the document's timestamp value by this
    interval to set the minimum time for the bucket.

-
    ``expireAfterSeconds`` - Enables the automatic deletion of documents in a
    time series collection by specifying the number of seconds after which
    documents expire. MongoDB deletes these expired documents automatically.
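
The constraints between these attributes can be summarised in code. The
following sketch is not part of ODM: ``buildTimeSeriesCreateOptions()`` is a
hypothetical helper that checks the combinations described above and returns
the options in the shape of the ``timeseries`` and ``expireAfterSeconds``
options of MongoDB's ``create`` command. How ODM handles these options
internally may differ.

.. code-block:: php

    <?php

    declare(strict_types=1);

    /**
     * Hypothetical helper, not part of ODM: validates the documented option
     * combinations and returns them in the shape of the "create" command.
     *
     * @return array<string, mixed>
     */
    function buildTimeSeriesCreateOptions(
        string $timeField,
        ?string $metaField = null,
        ?string $granularity = null,
        ?int $expireAfterSeconds = null,
        ?int $bucketMaxSpanSeconds = null,
        ?int $bucketRoundingSeconds = null,
    ): array {
        $hasBucketOptions = $bucketMaxSpanSeconds !== null || $bucketRoundingSeconds !== null;

        if ($granularity !== null && $hasBucketOptions) {
            throw new InvalidArgumentException('granularity cannot be combined with bucket options');
        }

        if ($hasBucketOptions) {
            // Both bucket options must be given and must hold the same value
            if ($bucketMaxSpanSeconds === null || $bucketMaxSpanSeconds !== $bucketRoundingSeconds) {
                throw new InvalidArgumentException('bucket options must be set to the same value');
            }

            if ($bucketMaxSpanSeconds < 1 || $bucketMaxSpanSeconds > 31536000) {
                throw new InvalidArgumentException('bucket options must be between 1 and 31536000');
            }
        }

        // Drop unset options so that the server applies its defaults
        $timeseries = array_filter([
            'timeField' => $timeField,
            'metaField' => $metaField,
            'granularity' => $granularity,
            'bucketMaxSpanSeconds' => $bucketMaxSpanSeconds,
            'bucketRoundingSeconds' => $bucketRoundingSeconds,
        ], static fn (mixed $value): bool => $value !== null);

        $options = ['timeseries' => $timeseries];
        if ($expireAfterSeconds !== null) {
            $options['expireAfterSeconds'] = $expireAfterSeconds;
        }

        return $options;
    }

    // One-hour buckets, with documents expiring after one day
    $options = buildTimeSeriesCreateOptions(
        timeField: 'time',
        metaField: 'metadata',
        bucketMaxSpanSeconds: 3600,
        bucketRoundingSeconds: 3600,
        expireAfterSeconds: 86400,
    );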

#[UniqueIndex]
--------------

25 changes: 25 additions & 0 deletions doctrine-mongo-mapping.xsd
@@ -104,6 +104,7 @@
      <xs:element name="shard-key" type="odm:shard-key" minOccurs="0" />
      <xs:element name="read-preference" type="odm:read-preference" minOccurs="0" />
      <xs:element name="schema-validation" type="odm:schema-validation" minOccurs="0" />
      <xs:element name="time-series" type="odm:time-series" minOccurs="0" />
    </xs:choice>

    <xs:attribute name="db" type="xs:NMTOKEN" />
@@ -634,4 +635,28 @@
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="time-series">
    <xs:attribute name="time-field" type="xs:NMTOKEN" use="required" />
    <xs:attribute name="meta-field" type="xs:NMTOKEN" />
    <xs:attribute name="granularity" type="odm:time-series-granularity" />
    <xs:attribute name="expire-after-seconds" type="xs:integer" />
    <xs:attribute name="bucket-max-span-seconds" type="odm:time-series-group-seconds" />
    <xs:attribute name="bucket-rounding-seconds" type="odm:time-series-group-seconds" />
  </xs:complexType>

  <xs:simpleType name="time-series-granularity">
    <xs:restriction base="xs:token">
      <xs:enumeration value="seconds" />
      <xs:enumeration value="minutes" />
      <xs:enumeration value="hours" />
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="time-series-group-seconds">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1" />
      <xs:maxInclusive value="31536000" />
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
29 changes: 29 additions & 0 deletions lib/Doctrine/ODM/MongoDB/Mapping/Annotations/TimeSeries.php
@@ -0,0 +1,29 @@
<?php

declare(strict_types=1);

namespace Doctrine\ODM\MongoDB\Mapping\Annotations;

use Attribute;
use Doctrine\Common\Annotations\Annotation\NamedArgumentConstructor;
use Doctrine\ODM\MongoDB\Mapping\TimeSeries\Granularity;

/**
 * Marks a document or superclass as a time series document
 *
 * @Annotation
 * @NamedArgumentConstructor
 */
#[Attribute(Attribute::TARGET_CLASS)]
final class TimeSeries implements Annotation
{
    public function __construct(
        public readonly string $timeField,
        public readonly ?string $metaField = null,
        public readonly ?Granularity $granularity = null,
        public readonly ?int $expireAfterSeconds = null,
        public readonly ?int $bucketMaxSpanSeconds = null,
        public readonly ?int $bucketRoundingSeconds = null,
    ) {
    }
}
23 changes: 23 additions & 0 deletions lib/Doctrine/ODM/MongoDB/Mapping/ClassMetadata.php
@@ -13,6 +13,7 @@
use Doctrine\Instantiator\InstantiatorInterface;
use Doctrine\ODM\MongoDB\Id\IdGenerator;
use Doctrine\ODM\MongoDB\LockException;
use Doctrine\ODM\MongoDB\Mapping\Annotations\TimeSeries;
use Doctrine\ODM\MongoDB\Types\Incrementable;
use Doctrine\ODM\MongoDB\Types\Type;
use Doctrine\ODM\MongoDB\Types\Versionable;
@@ -799,6 +800,9 @@
     */
    public $isReadOnly;

    /** READ ONLY: stores metadata about the time series collection */
    public ?TimeSeries $timeSeriesOptions = null;

    private InstantiatorInterface $instantiator;

    private ReflectionService $reflectionService;
@@ -2174,6 +2178,13 @@ public function markViewOf(string $rootClass): void
        $this->rootClass = $rootClass;
    }

    public function markAsTimeSeries(TimeSeries $options): void
    {
        $this->validateTimeSeriesOptions($options);

        $this->timeSeriesOptions = $options;
    }

    public function getFieldNames(): array
    {
        return array_keys($this->fieldMappings);
@@ -2527,6 +2538,7 @@ public function __sleep()
            'idGenerator',
            'indexes',
            'shardKey',
            'timeSeriesOptions',
        ];

        // The rest of the metadata is only serialized if necessary.
@@ -2758,4 +2770,15 @@ private function validateAndCompleteTypedManyAssociationMapping(array $mapping):

        return $mapping;
    }

    private function validateTimeSeriesOptions(TimeSeries $options): void
    {
        if (! $this->hasField($options->timeField)) {
            throw MappingException::timeSeriesFieldNotFound($this->name, $options->timeField, 'time');
        }

        if ($options->metaField !== null && ! $this->hasField($options->metaField)) {
            throw MappingException::timeSeriesFieldNotFound($this->name, $options->metaField, 'metadata');
        }
    }
}
7 changes: 7 additions & 0 deletions lib/Doctrine/ODM/MongoDB/Mapping/Driver/AttributeDriver.php
@@ -10,6 +10,7 @@
use Doctrine\ODM\MongoDB\Mapping\Annotations\AbstractIndex;
use Doctrine\ODM\MongoDB\Mapping\Annotations\SearchIndex;
use Doctrine\ODM\MongoDB\Mapping\Annotations\ShardKey;
use Doctrine\ODM\MongoDB\Mapping\Annotations\TimeSeries;
use Doctrine\ODM\MongoDB\Mapping\ClassMetadata;
use Doctrine\ODM\MongoDB\Mapping\MappingException;
use Doctrine\Persistence\Mapping\ClassMetadata as PersistenceClassMetadata;
@@ -288,6 +289,12 @@ public function loadMetadataForClass($className, PersistenceClassMetadata $metad
            $this->setShardKey($metadata, $classAttributes[ShardKey::class]);
        }

        // Mark as time series only after mapping all fields
        if (isset($classAttributes[TimeSeries::class])) {
            assert($classAttributes[TimeSeries::class] instanceof TimeSeries);
            $metadata->markAsTimeSeries($classAttributes[TimeSeries::class]);
        }

        foreach ($reflClass->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
            /* Filter for the declaring class only. Callbacks from parent
             * classes will already be registered.
30 changes: 26 additions & 4 deletions lib/Doctrine/ODM/MongoDB/Mapping/Driver/XmlDriver.php
@@ -4,8 +4,10 @@

 namespace Doctrine\ODM\MongoDB\Mapping\Driver;

+use Doctrine\ODM\MongoDB\Mapping\Annotations\TimeSeries;
 use Doctrine\ODM\MongoDB\Mapping\ClassMetadata;
 use Doctrine\ODM\MongoDB\Mapping\MappingException;
+use Doctrine\ODM\MongoDB\Mapping\TimeSeries\Granularity;
 use Doctrine\ODM\MongoDB\Utility\CollectionHelper;
 use Doctrine\Persistence\Mapping\Driver\FileDriver;
 use DOMDocument;
@@ -78,6 +80,7 @@ public function __construct($locator, $fileExtension = self::DEFAULT_FILE_EXTENS
         parent::__construct($locator, $fileExtension);
     }

+    // phpcs:disable SlevomatCodingStandard.ControlStructures.EarlyExit.EarlyExitNotUsed
     public function loadMetadataForClass($className, \Doctrine\Persistence\Mapping\ClassMetadata $metadata)
     {
         assert($metadata instanceof ClassMetadata);
@@ -335,15 +338,34 @@ public function loadMetadataForClass($className, \Doctrine\Persistence\Mapping\C
             }
         }

-        if (! isset($xmlRoot->{'also-load-methods'})) {
-            return;
+        if (isset($xmlRoot->{'also-load-methods'})) {
+            foreach ($xmlRoot->{'also-load-methods'}->{'also-load-method'} as $alsoLoadMethod) {
+                $metadata->registerAlsoLoadMethod((string) $alsoLoadMethod['method'], (string) $alsoLoadMethod['field']);
+            }
         }

-        foreach ($xmlRoot->{'also-load-methods'}->{'also-load-method'} as $alsoLoadMethod) {
-            $metadata->registerAlsoLoadMethod((string) $alsoLoadMethod['method'], (string) $alsoLoadMethod['field']);
+        if (isset($xmlRoot->{'time-series'})) {
+            $attributes = $xmlRoot->{'time-series'}->attributes();
+
+            $metaField = isset($attributes['meta-field']) ? (string) $attributes['meta-field'] : null;
+            $granularity = isset($attributes['granularity']) ? Granularity::from((string) $attributes['granularity']) : null;
+            $expireAfterSeconds = isset($attributes['expire-after-seconds']) ? (int) $attributes['expire-after-seconds'] : null;
+            $bucketMaxSpanSeconds = isset($attributes['bucket-max-span-seconds']) ? (int) $attributes['bucket-max-span-seconds'] : null;
+            $bucketRoundingSeconds = isset($attributes['bucket-rounding-seconds']) ? (int) $attributes['bucket-rounding-seconds'] : null;
+
+            $metadata->markAsTimeSeries(new TimeSeries(
+                timeField: (string) $attributes['time-field'],
+                metaField: $metaField,
+                granularity: $granularity,
+                expireAfterSeconds: $expireAfterSeconds,
+                bucketMaxSpanSeconds: $bucketMaxSpanSeconds,
+                bucketRoundingSeconds: $bucketRoundingSeconds,
+            ));
         }
     }

+    // phpcs:enable SlevomatCodingStandard.ControlStructures.EarlyExit.EarlyExitNotUsed
+
     /**
      * @param ClassMetadata<object> $class
      * @phpstan-param FieldMappingConfig $mapping