Define Models
A Basic Definition
Every model inherits from BaseModel, and needs at least a hash key:
>>> from bloop import BaseModel, Column, UUID
>>> class User(BaseModel):
...     id = Column(UUID, hash_key=True)
...
>>> User
<Model[User]>
>>> User.id
<Column[User.id=hash]>
Let's add some columns, a range key, and a GSI:
>>> from bloop import (
...     BaseModel, Boolean, Column, DateTime,
...     GlobalSecondaryIndex, String, UUID)
>>> class User(BaseModel):
...     id = Column(UUID, hash_key=True)
...     version = Column(String, range_key=True)
...     email = Column(String)
...     created_on = Column(DateTime)
...     verified = Column(Boolean)
...     profile = Column(String)
...     by_email = GlobalSecondaryIndex(projection="keys", hash_key="email")
...
>>> User
<Model[User]>
>>> User.by_email
<GSI[User.by_email=keys]>
Then create the table in DynamoDB:
>>> from bloop import Engine
>>> engine = Engine()
>>> engine.bind(User)
Hint
Alternatively, we could have called engine.bind(BaseModel) to bind all non-abstract models that subclass BaseModel. If any model doesn't match its backing table, TableMismatch is raised.
Note
Models must be hashable. If you implement __eq__ without __hash__, Bloop will inject the first hash method it finds by walking the model's class.mro().
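To see why this note matters, here is a minimal plain-Python sketch that does not use Bloop at all. Defining __eq__ without __hash__ makes Python set __hash__ to None on that class, and a helper can recover a usable hash by walking the MRO. The names Base, Point, and first_hash_in_mro are hypothetical, and Bloop's own lookup may differ in detail.

```python
class Base:
    def __hash__(self):
        return 42


class Point(Base):
    def __init__(self, x):
        self.x = x

    # Defining __eq__ without __hash__ makes Python set Point.__hash__ = None.
    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x


def first_hash_in_mro(cls):
    """Return the first non-None __hash__ defined by a class in cls.mro()."""
    for base in cls.mro():
        hash_fn = base.__dict__.get("__hash__")
        if hash_fn is not None:
            return hash_fn
    raise TypeError(f"{cls.__name__} has no usable __hash__")


assert Point.__hash__ is None              # instances are unhashable right now
Point.__hash__ = first_hash_in_mro(Point)  # hypothetical injection step; found on Base
assert hash(Point(1)) == 42
```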
Creating Instances
The default __init__ takes **kwargs and applies them by each column's model name:
>>> import datetime, uuid
>>> now = datetime.datetime.now(datetime.timezone.utc)
>>> user = User(
...     id=uuid.uuid4(),
...     version="1",
...     email="user@domain.com",
...     created_on=now)
>>> user.email
'user@domain.com'
>>> user
User(created_on=datetime.datetime(2016, 10, 29, ...), ...)
A local object's hash and range keys don't need values until you're ready to interact with DynamoDB:
>>> user = User(email="u@d.com", version="1")
>>> engine.save(user)
MissingKey: User(email='u@d.com') is missing hash_key: 'id'
>>> user.id = uuid.uuid4()
>>> engine.save(user)
Metadata: Table Configuration
You can provide an inner Meta class to configure the model's DynamoDB table:
>>> from bloop import Integer
>>> class Tweet(BaseModel):
...     class Meta:
...         table_name = "custom-table-name"
...         read_units = 200
...     user = Column(Integer, hash_key=True)
...
>>> Tweet.Meta.read_units
200
>>> Tweet.Meta.keys
{<Column[Tweet.user=hash]>}
>>> Tweet.Meta.indexes
set()
Table configuration defaults are:
class Meta:
    abstract = False
    table_name = __name__  # model class name
    read_units = None  # uses DynamoDB value, or 1 for new tables
    write_units = None  # uses DynamoDB value, or 1 for new tables
    stream = None
    ttl = None
    encryption = None
    backups = None
    billing = None
abstract
If abstract is true, no backing table will be created in DynamoDB. Instances of abstract models can't be saved or loaded. You can use abstract models, or even plain classes with Columns and Indexes, as mixins. Derived models never copy their parents' Meta value. For more information, see the Inheritance and Mixins section.
table_name
The default table_name is simply the model's __name__. This property is useful for mapping a model to an existing table, or mapping multiple models to the same table:
class Employee(BaseModel):
    class Meta:
        table_name = "employees-uk"
    ...
Changed in version 2.0.0: Engines can customize table names using table_name_template. This does not change the value of Meta.table_name. For example, the template "dev-{table_name}" would cause the Employee model above to use the table "dev-employees-uk".
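The substitution described in the version note can be sketched in plain Python. This only illustrates the string templating, assuming the template uses a "{table_name}" placeholder as in the example above; render_table_name is a hypothetical helper, not part of Bloop.

```python
def render_table_name(template, table_name):
    """Apply a '{table_name}' style template to a model's Meta.table_name."""
    return template.format(table_name=table_name)


meta_table_name = "employees-uk"  # the value of Employee.Meta.table_name
rendered = render_table_name("dev-{table_name}", meta_table_name)

assert rendered == "dev-employees-uk"
assert meta_table_name == "employees-uk"  # Meta.table_name itself is unchanged
```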
read_units, write_units
Default read_units and write_units are None. These do not include provisioned throughput for any GlobalSecondaryIndex, which has its own read and write units.
If you do not specify the read or write units of a table or GSI, the existing values in DynamoDB are used. When the table or GSI does not exist, they fall back to 1.
Changed in version 1.2.0: Previously, read_units and write_units defaulted to 1. This was inconvenient when throughput is controlled by an external script, and totally broken with the new auto-scaling features.
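The fallback order for throughput can be written as a small function. This is a sketch of the rule as stated in this section, not Bloop's implementation; resolve_units is a hypothetical name, and the sketch does not model what happens when a declared value differs from the value already in DynamoDB.

```python
def resolve_units(declared, existing):
    """Pick throughput: the declared value, else the existing DynamoDB value, else 1."""
    if declared is not None:
        return declared
    if existing is not None:
        return existing
    return 1


assert resolve_units(200, None) == 200   # Meta.read_units = 200 on a new table
assert resolve_units(None, 5) == 5       # nothing declared: keep the DynamoDB value
assert resolve_units(None, None) == 1    # nothing declared, table does not exist
```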
backups
You can use backups to enable Continuous Backups and Point-in-Time Recovery. By default continuous backups are not enabled, and this is None. To enable continuous backups, use:
class Meta:
    backups = {
        "enabled": True
    }
billing
You can use billing to enable On-Demand Billing or explicitly require provisioned throughput. By default billing is None.
If you do not specify the billing mode, the existing configuration in DynamoDB is used. When the table does not exist and billing mode is None, the table is created using provisioned throughput.
class Meta:
    billing = {
        "mode": "on_demand"
    }
class Meta:
    billing = {
        "mode": "provisioned"  # if not specified, provisioned billing is used for new tables
    }
encryption
You can use encryption to enable Server-Side Encryption. By default encryption is not enabled, and this is None. To enable server-side encryption, use:
class Meta:
    encryption = {
        "enabled": True
    }
stream
You can use stream to enable DynamoDB Streams on the table. By default streaming is not enabled, and this is None. To enable a stream with both new and old images, use:
class Meta:
    stream = {
        "include": ["new", "old"]
    }
See the Streams section of the user guide to get started. Streams are awesome.
ttl
You can use ttl to enable the TTL feature on the table. By default a TTL attribute is not set, and this is None. To enable a ttl on the attribute "delete_after", use:
class Meta:
    ttl = {
        "column": "delete_after"
    }
The Column.typedef of the ttl column must be Number and, per the DynamoDB documentation, must represent the deletion time as the number of seconds since the epoch. The Timestamp type is provided for your convenience, and is used as a datetime.datetime:
class TemporaryPaste(BaseModel):
    id = Column(UUID, hash_key=True)
    private = Column(Boolean)
    delete_after = Column(Timestamp)
    class Meta:
        ttl = {"column": "delete_after"}
Like DateTime, bloop.ext exposes drop-in replacements for Timestamp for each of three popular Python datetime libraries: arrow, delorean, and pendulum.
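For reference, "seconds since the epoch" is the value a TTL attribute ultimately holds. The following standard-library sketch shows that conversion for a timezone-aware datetime; to_epoch_seconds is a hypothetical helper and is not how Bloop's Timestamp type is implemented.

```python
import datetime


def to_epoch_seconds(dt):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(dt.timestamp())


expires = datetime.datetime(2016, 10, 31, tzinfo=datetime.timezone.utc)
assert to_epoch_seconds(expires) == 1477872000
```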
Metadata: Model Introspection
When a new model is created, a number of attributes are computed and stored in Meta. These can be used to generalize conditions for any model, or find columns by their name in DynamoDB.
These top-level properties can be used to describe the model in broad terms:
model -- The model this Meta is attached to
columns -- The set of all columns in the model
columns_by_name -- Dictionary of model Column objects by their name attribute
keys -- The set of all table keys in the model (hash key, or hash and range keys)
indexes -- The set of all indexes (gsis, lsis) in the model
Additional properties break down the broad categories, such as splitting indexes into gsis and lsis:
hash_key -- The table hash key
range_key -- The table range key or None
gsis -- The set of all GlobalSecondaryIndex in the model
lsis -- The set of all LocalSecondaryIndex in the model
projection -- A pseudo-projection for the table, providing API parity with an Index
Here's the User model we just defined:
>>> User.Meta.hash_key
<Column[User.id=hash]>
>>> User.Meta.gsis
{<GSI[User.by_email=keys]>}
>>> User.Meta.keys
{<Column[User.version=range]>,
<Column[User.id=hash]>}
>>> User.Meta.columns
{<Column[User.created_on]>,
<Column[User.profile]>,
<Column[User.verified]>,
<Column[User.id=hash]>,
<Column[User.version=range]>,
<Column[User.email]>}
Metadata: Using Generic Models
A common pattern involves saving an item only if it doesn't exist. Instead of creating a specific condition for every model, we can use Meta.keys to make a function for any model:
from bloop import Condition
def if_not_exist(obj):
    condition = Condition()
    for key in obj.Meta.keys:
        condition &= key.is_(None)
    return condition
Now, saving only when an object doesn't exist is as simple as:
engine.save(some_obj, condition=if_not_exist(some_obj))
(This is also available in the patterns section of the user guide.)
Columns
Every Column must have a Type that is used to load and dump values to and from DynamoDB. The typedef argument can be a type class, or a type instance. When you provide a class, the Column will create an instance by calling the constructor without args. This is a convenience for common types that do not require much configuration. The following are functionally equivalent:
Column(Integer)
Column(Integer())
Some types require an argument, such as Set. Sets must have an inner type so they can map to a string set, number set, or binary set. For example:
# FAILS: Set must have a type
Column(Set)
# GOOD: Set will instantiate the inner type
Column(Set(Integer))
Column(Set(Integer()))
To make a column the model's hash or range key, use hash_key=True or range_key=True. The usual rules apply: a column can't be both, there can't be more than one of each, and there must be a hash key.
class Impression(BaseModel):
    referrer = Column(String, hash_key=True)
    version = Column(Integer, range_key=True)
By default values will be stored in DynamoDB under the name of the column in the model definition (its name). If you want to conserve read and write units, you can use shorter names for attributes in DynamoDB (attribute names are counted against your provisioned throughput). Like the table_name in Meta, the optional dynamo_name parameter lets you use descriptive model names without binding you to those names in DynamoDB. This is also convenient when mapping an existing table, or multi-model tables where an attribute can be interpreted multiple ways.
The following model is identical to the one just defined, except that each attribute is stored using a short name:
class Impression(BaseModel):
    referrer = Column(String, hash_key=True, dynamo_name="ref")
    version = Column(Integer, range_key=True, dynamo_name="v")
Locally, the model names "referrer" and "version" are still used. An instance would be constructed as usual:
>>> click = Impression(
...     referrer="google.com",
...     version=get_current_version())
>>> engine.save(click)
Default Values
You can provide a default value or a no-arg function that returns a default value when specifying a Column:
class User(BaseModel):
    id = Column(UUID, hash_key=True)
    verified = Column(Boolean, default=False)
    created = Column(DateTime, default=lambda: datetime.datetime.now())
Defaults are only applied when new instances are created locally by the default BaseModel.__init__ method. When new instances are created as part of a Query, Scan, or iterating a Stream, defaults are not applied. This is because a projection query may not include an existing value; applying the default locally would then overwrite the existing value in DynamoDB on the next save.
import datetime
import uuid
def two_days_later():
    offset = datetime.timedelta(days=2)
    now = datetime.datetime.now()
    return now + offset
class TemporaryPaste(BaseModel):
    class Meta:
        ttl = {"column": "delete_after"}
    id = Column(UUID, hash_key=True, default=uuid.uuid4)
    delete_after = Column(Timestamp, default=two_days_later)
    verified = Column(Boolean, default=False)
    views = Column(Integer, default=1)
Like default function arguments in Python, the provided value is not copied but used directly. For example, a default value of [1, 2, 3] will use the same list object on each new instance of the model. If you want a copy of a mutable value, you should wrap it in a lambda: lambda: [1, 2, 3].
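The difference is easy to see in plain Python. The sketch below mimics how a default might be applied (call it if it is callable, otherwise use it directly); apply_default is a hypothetical stand-in and not Bloop's code.

```python
def apply_default(default):
    """Mimic applying a column default: call it if callable, else use it directly."""
    return default() if callable(default) else default


shared = [1, 2, 3]
a = apply_default(shared)
b = apply_default(shared)
a.append(4)
assert b == [1, 2, 3, 4]  # a and b are the same list object

c = apply_default(lambda: [1, 2, 3])
d = apply_default(lambda: [1, 2, 3])
c.append(4)
assert d == [1, 2, 3]     # each call built a fresh list
```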
If you don't want to set a default value, you can return the special sentinel bloop.missing from your function:
import datetime
import random
from bloop import missing
specials = [
    "one free latte",
    "50% off chai for a month",
    "free drip coffee for a year",
]
offer_ends = datetime.datetime.now() + datetime.timedelta(hours=8)
def limited_time_offer():
    now = datetime.datetime.now()
    if now < offer_ends:
        return random.choice(specials)
    return missing
class User(BaseModel):
    id = Column(UUID, hash_key=True)
    active_coupon = Column(String, default=limited_time_offer)
In this example, a random special is applied to new users for the next 8 hours. Afterwards, the limited_time_offer function will return bloop.missing and the user won't have an active coupon.
Returning bloop.missing tells Bloop not to set the value, which is different from setting the value to None. An explicit None will clear any existing value on save, while not setting it leaves the value as-is.
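The distinction between "not set" and "set to None" can be modeled with a sentinel object. This is a plain-Python sketch of the behavior described above, using a dict in place of a stored item; MISSING and apply_update are hypothetical stand-ins, not Bloop internals.

```python
MISSING = object()  # hypothetical stand-in for bloop.missing


def apply_update(stored, name, value):
    """Return a copy of the stored item after saving one attribute value."""
    updated = dict(stored)
    if value is MISSING:
        return updated           # not set locally: leave the stored value alone
    if value is None:
        updated.pop(name, None)  # explicit None: clear the stored value
    else:
        updated[name] = value
    return updated


stored = {"id": "u1", "active_coupon": "one free latte"}
assert apply_update(stored, "active_coupon", MISSING) == stored
assert apply_update(stored, "active_coupon", None) == {"id": "u1"}
```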
Indexes
Indexes provide additional ways to query and scan your data. If you have not used indexes, you should first read the Developer's Guide on Improving Data Access with Secondary Indexes.
A single GSI or LSI can be used by two models with different projections, so long as the projections that each model expects are a subset of the actual projection. This can be a useful way to restrict which columns are loaded by, e.g., a partially hydrated version of a model, while the table's underlying index still provides access to all attributes.
GlobalSecondaryIndex
Every GlobalSecondaryIndex must declare a projection, which describes the columns projected into the index. Only projected columns are loaded from queries and scans on the index, and non-projected columns can't be used in filter expressions. A projection can be "all" for all columns in the model; "keys" for the hash and range columns of the model and the index; or a set of Column objects or their model names. If you specify a set of columns, key columns will always be included.
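The three projection forms can be summarized as a function from the projection argument to a set of column names. This is a sketch of the rules stated above using plain strings; projected_names is a hypothetical helper, and the column names are illustrative.

```python
def projected_names(projection, all_columns, key_columns):
    """Resolve a projection spec into the set of column names on the index."""
    if projection == "all":
        return set(all_columns)
    if projection == "keys":
        return set(key_columns)
    # An explicit set of names: key columns are always included.
    return set(projection) | set(key_columns)


columns = {"id", "email", "username", "created_on", "profile"}
keys = {"id", "created_on"}  # table hash key plus the index hash key

assert projected_names("all", columns, keys) == columns
assert projected_names("keys", columns, keys) == {"id", "created_on"}
assert projected_names({"email", "username"}, columns, keys) == {
    "id", "created_on", "email", "username"}
```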
class HeavilyIndexed(BaseModel):
    ...
    by_email = GlobalSecondaryIndex("all", hash_key="email")
    by_username = GlobalSecondaryIndex("keys", hash_key="username")
    by_create_date = GlobalSecondaryIndex(
        {"email", "username"}, hash_key="created_on")
A GlobalSecondaryIndex must have a hash_key, and can optionally have a range_key. This can either be the name of a column, or the column object itself:
class Impression(BaseModel):
    id = Column(UUID, hash_key=True)
    referrer = Column(String)
    version = Column(Integer)
    created_on = Column(DateTime)
    by_referrer = GlobalSecondaryIndex("all", hash_key=referrer)
    by_version = GlobalSecondaryIndex("keys", hash_key="version")
Unlike LocalSecondaryIndex, a GSI does not share its throughput with the table. You can specify the read_units and write_units of the GSI. If you don't specify the throughput and the GSI already exists, the values will be read from DynamoDB. If the table doesn't exist, the GSI's read and write units will instead default to 1.
GlobalSecondaryIndex("all", hash_key=version, read_units=500, write_units=20)
As with Column, you can provide a dynamo_name for the GSI in DynamoDB. This can be used to map to an existing index while still using a pythonic model name locally:
class Impression(BaseModel):
    ...
    by_email = GlobalSecondaryIndex("keys", hash_key=email, dynamo_name="index_email")
See also
Global Secondary Indexes in the DynamoDB Developer Guide
LocalSecondaryIndex
LocalSecondaryIndex is similar to GlobalSecondaryIndex in its use, but has different requirements. LSIs always have the same hash key as the model, and it can't be changed. The model must have a range key, and the LSI must specify a range_key:
LocalSecondaryIndex("all", range_key=created_on)
You can specify a name to use in DynamoDB, just like Column and GSI:
class Impression(BaseModel):
    url = Column(String, hash_key=True)
    user_agent = Column(String, range_key=True, dynamo_name="ua")
    visited_at = Column(DateTime, dynamo_name="at")
    by_date = LocalSecondaryIndex(
        "keys", range_key=visited_at, dynamo_name="index_date")
The final optional parameter is strict, which defaults to True. This controls whether DynamoDB may incur additional reads on the table when querying the LSI for columns outside the projection. Bloop enforces this by evaluating the key, filter, and projection conditions against the index's allowed columns and raises an exception if it finds any non-projected columns.
It is recommended that you leave strict=True, to prevent accidentally consuming twice as many read units with an errant projection or filter condition. Since this is local to Bloop and not part of the index definition in DynamoDB, you can always disable and re-enable it in the future.
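The strict check amounts to a set difference between the columns a query touches and the columns projected into the index. The sketch below illustrates that idea with plain strings; check_strict is a hypothetical helper, and ValueError is a placeholder for whatever exception Bloop actually raises.

```python
def check_strict(requested, projected, strict=True):
    """Raise if a strict index query touches columns outside the projection."""
    extra = set(requested) - set(projected)
    if strict and extra:
        raise ValueError(f"not projected into the index: {sorted(extra)}")
    return extra


projected = {"url", "user_agent", "visited_at"}

assert check_strict({"url", "visited_at"}, projected) == set()
# With strict disabled the query is allowed, at the cost of extra table reads.
assert check_strict({"url", "referrer"}, projected, strict=False) == {"referrer"}
try:
    check_strict({"url", "referrer"}, projected)
except ValueError as error:
    assert "referrer" in str(error)
else:
    raise AssertionError("expected the strict check to fail")
```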
See also
Local Secondary Indexes in the DynamoDB Developer Guide
Inheritance and Mixins
Your models will often have identical constructs, especially when sharing a table. Rather than define these repeatedly in each model, Bloop provides the ability to derive Columns and Indexes from base classes. Consider a set of models that each has a UUID and sorts on a DateTime:
class HashRangeBase(BaseModel):
    id = Column(UUID, hash_key=True, dynamo_name="i")
    date = Column(DateTime, range_key=True, dynamo_name="d")
    class Meta:
        abstract = True
class User(HashRangeBase):
    pass
class Upload(HashRangeBase):
    class Meta:
        write_units = 50
        read_units = 10
Mixins don't have to subclass BaseModel; doing so is optional, and provides early validation against missing columns/indexes. Mixins do not need to be specified in any particular order:
class IndexedEmail:
    by_email = GlobalSecondaryIndex(projection="keys", hash_key="email")
class WithEmail:
    email = Column(String)
class User(BaseModel, IndexedEmail, WithEmail):
    id = Column(Integer, hash_key=True)
assert User.by_email.hash_key is User.email  # True
assert User.email is not WithEmail.email  # True
Even though the by_email Index requires the email Column to exist, the mixin that defines the index can come first in User's bases.
Modify Derived Columns
Bloop uses the __copy__ method to create shallow copies of the base Columns and Indexes. You can override this to modify derived Columns and Indexes:
class MyColumn(Column):
    def __copy__(self):
        copy = super().__copy__()
        copy.derived = True
        return copy  # without this, the derived column would be None
class WithEmail:
    email = MyColumn(String)
class User(BaseModel, WithEmail):
    id = Column(String, hash_key=True)
assert User.email.derived  # True
assert not hasattr(WithEmail.email, "derived")  # True
Conflicting Derived Values
A model cannot derive from two base models or mixins that define the same column or index, or that have an overlapping dynamo_name. Consider the following mixins:
class Id:
    id = Column(String)
class AlsoId:
    id = Column(String, dynamo_name="shared-id")
class AnotherId:
    some_id = Column(String, dynamo_name="shared-id")
Each of the following is invalid, and will fail:
# Id, AlsoId have the same column name "id"
class Invalid(BaseModel, Id, AlsoId):
    hash = Column(String, hash_key=True)
# AlsoId, AnotherId have the same column dynamo_name "shared-id"
class AlsoInvalid(BaseModel, AlsoId, AnotherId):
    hash = Column(String, hash_key=True)
For simplicity, Bloop also disallows subclassing more than one model or mixin that defines a hash key, a range key, or an Index (either by name or dynamo_name).
However, a derived class may always overwrite an inherited column or index. The following is valid:
class SharedIds:
    hash = Column(String, hash_key=True)
    range = Column(Integer, range_key=True)
class CustomHash(BaseModel, SharedIds):
    hash = Column(Integer, hash_key=True)
# Column instantiates a type class, so typedef is an instance, not the class
assert isinstance(CustomHash.hash.typedef, Integer)  # True
assert isinstance(SharedIds.hash.typedef, String)  # True; mixin column is unchanged
assert isinstance(CustomHash.range.typedef, Integer)  # Still inherited
This also allows you to hide or omit a derived column:
class SharedColumns:
    foo = Column(String)
    bar = Column(String)
class MyModel(BaseModel, SharedColumns):
    id = Column(Integer, hash_key=True)
    foo = None
assert MyModel.foo is None  # True
assert isinstance(MyModel.bar.typedef, String)  # True
assert {MyModel.id, MyModel.bar} == MyModel.Meta.columns  # True