cyberangles blog

How to Use MongoEngine Document Class Methods for Custom Validation & Pre-Save Hooks: A Guide to Auto-Generating IDs and Field Validation Logic

MongoDB, a leading NoSQL database, is renowned for its flexibility and scalability. When working with MongoDB in Python, MongoEngine—an Object-Document Mapper (ODM)—simplifies interactions by allowing you to define structured data models (called Document classes) that map to MongoDB collections. While MongoEngine provides built-in validation and default behaviors (like auto-generating ObjectId for _id), real-world applications often require custom logic: ensuring data consistency, auto-generating fields (e.g., slugs or UUIDs), or enforcing business rules (e.g., password complexity).

In this guide, we’ll explore how to leverage MongoEngine’s Document class methods to implement custom validation and pre-save hooks. We’ll dive into practical examples, including auto-generating custom IDs, validating fields, and modifying data before it’s saved to the database. By the end, you’ll be equipped to build robust, self-validating data models tailored to your application’s needs.

2026-02

Table of Contents

  1. Introduction to MongoEngine’s Document Class
  2. Custom Validation with clean() and validate()
  3. Pre-Save Hooks: Modifying Data Before Save
  4. Auto-Generating Custom IDs
  5. Advanced Field Validation with Custom Validators
  6. Practical Example: Building a Self-Validating Blog Post Model
  7. Best Practices
  8. Conclusion
  9. References

1. Introduction to MongoEngine’s Document Class

At the core of MongoEngine is the Document class, which serves as the base for all your data models. A Document subclass defines the structure of your MongoDB collection, including fields, validation rules, and custom methods.

Basic Document Structure

Here’s a simple example of a User document:

from mongoengine import Document, StringField, EmailField, IntField  
 
class User(Document):  
    username = StringField(required=True, max_length=50, unique=True)  
    email = EmailField(required=True, unique=True)  
    age = IntField(min_value=13)  # Built-in validation  
 
    # Optional: Custom method  
    def greet(self):  
        return f"Hello, {self.username}!"  

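This model maps to a MongoDB collection named user. By default, MongoEngine derives the collection name from the class name, converted to lowercase with underscores (so a BlogPost class would map to blog_post). The fields carry built-in validation (e.g., required=True, max_length, min_value). However, to enforce more complex rules (e.g., "username must not contain spaces"), we need custom logic.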

2. Custom Validation with clean() and validate()

MongoEngine triggers validation automatically when you call save() or validate() on a document. For custom validation, override the clean() method or use the validate() method.

The clean() Method

The clean() method is called during validation and is ideal for cross-field validation (e.g., ensuring two fields are consistent) or complex rules that can’t be handled by built-in field validators.

How to Use clean():

  • Override clean() in your Document subclass.
  • Raise a ValidationError if validation fails.
  • Use self to access the document’s fields.

Example: Username and Email Validation

from mongoengine import Document, StringField, EmailField, IntField, ValidationError

DISPOSABLE_DOMAINS = {"mailinator.com", "tempmail.com"}

class User(Document):
    username = StringField(required=True, max_length=50, unique=True)
    email = EmailField(required=True, unique=True)
    age = IntField(min_value=13)

    def clean(self):
        # clean() runs before the field-level checks, so required
        # fields may still be None here; fall back to empty strings
        username = self.username or ""
        email = self.email or ""

        # Ensure username has no spaces
        if " " in username:
            raise ValidationError("Username cannot contain spaces.")

        # Ensure email domain is allowed (e.g., no disposable emails)
        if email.split("@")[-1].lower() in DISPOSABLE_DOMAINS:
            raise ValidationError("Disposable email addresses are not allowed.")

        # Cross-field validation: If under 18, email must be parental
        if self.age is not None and self.age < 18 and not email.endswith(".parental.com"):
            raise ValidationError("Minors must use a parental email.")

Now, when you call user.save() or user.validate(), clean() runs, and invalid data raises a ValidationError.
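
Because clean() is ordinary Python, the same rules can also be checked outside the model. The sketch below restates the three checks as a standalone function so they can be exercised without a database. The helper name user_rule_errors and the choice to return a list of messages are illustrative assumptions, not part of MongoEngine.

```python
# Hypothetical helper: restates the clean() rules as a pure function.
DISPOSABLE_DOMAINS = {"mailinator.com", "tempmail.com"}


def user_rule_errors(username, email, age=None):
    """Return a list of rule violations (an empty list means the data passes)."""
    errors = []
    username = username or ""
    email = email or ""
    if " " in username:
        errors.append("Username cannot contain spaces.")
    if email.rsplit("@", 1)[-1].lower() in DISPOSABLE_DOMAINS:
        errors.append("Disposable email addresses are not allowed.")
    if age is not None and age < 18 and not email.endswith(".parental.com"):
        errors.append("Minors must use a parental email.")
    return errors


print(user_rule_errors("jane doe", "jane@mailinator.com", 30))
print(user_rule_errors("jane", "jane@example.com", 30))
```

Inside clean(), a model could call such a helper and raise ValidationError with the first message it returns.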

The validate() Method

The validate() method is the entry point for validation. Called as validate(clean=True), which is the default, it runs clean() first and then checks each field's own validation, collecting the failures into a single ValidationError. You can override validate() for full control over the validation workflow, but this is rarely needed; clean() is sufficient for most cases.

3. Pre-Save Hooks: Modifying Data Before Save

Pre-save hooks let you modify or enrich data before it’s saved to MongoDB. Common use cases include:

  • Auto-generating timestamps (created_at, updated_at).
  • Logging save events.
  • Sanitizing input (e.g., stripping whitespace from strings).

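MongoEngine implements hooks as signals, which depend on the blinker library (pip install blinker). The Document class has no overridable pre_save() method and there is no @pre_save decorator: you write a handler that accepts sender and document, then register it with signals.pre_save.connect(). The pre_save signal is sent at the start of save(), before validation runs, so values set in a handler are validated like any other field data.

Using a Handler Method on the Document

Define the handler as a classmethod on the document and connect it after the class body.

Example: Auto-Setting Timestamps

from datetime import datetime, timezone

from mongoengine import Document, StringField, DateTimeField, signals

class User(Document):
    username = StringField(required=True)
    created_at = DateTimeField()
    updated_at = DateTimeField()

    @classmethod
    def set_timestamps(cls, sender, document, **kwargs):
        now = datetime.now(timezone.utc)
        # Set created_at on first save (if not already set)
        if not document.created_at:
            document.created_at = now
        # Update updated_at on every save
        document.updated_at = now

# Register the handler so it only fires for User documents
signals.pre_save.connect(User.set_timestamps, sender=User)

Now, whenever user.save() is called:

  • created_at is set to the current UTC time if it has not been set yet.
  • updated_at is updated to the current UTC time.

Connecting a Standalone Handler

To keep hook logic outside the model class, or to attach several handlers to one model, connect a module-level function instead:

from datetime import datetime, timezone

from mongoengine import Document, StringField, DateTimeField, signals

class User(Document):
    username = StringField(required=True)
    created_at = DateTimeField()
    updated_at = DateTimeField()

def set_timestamps(sender, document, **kwargs):
    now = datetime.now(timezone.utc)
    if not document.created_at:
        document.created_at = now
    document.updated_at = now

signals.pre_save.connect(set_timestamps, sender=User)

This achieves the same result, but keeps the logic separate from the model class.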
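
A pre-save handler is a plain function that receives the sender class and the document, so its logic can be tried without a database. The following is a minimal sketch under that assumption: types.SimpleNamespace stands in for a document instance, which is purely illustrative. With MongoEngine and blinker installed, the same function would be registered with signals.pre_save.connect(set_timestamps, sender=User).

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def set_timestamps(sender, document, **kwargs):
    """Fill created_at once and refresh updated_at on every call."""
    now = datetime.now(timezone.utc)
    if not document.created_at:
        document.created_at = now
    document.updated_at = now


# SimpleNamespace is a stand-in for a document (assumption for illustration).
doc = SimpleNamespace(created_at=None, updated_at=None)
set_timestamps(sender=None, document=doc)
first_created = doc.created_at

set_timestamps(sender=None, document=doc)  # simulate a second save
print(doc.created_at == first_created)     # created_at is left unchanged
print(doc.updated_at >= first_created)     # updated_at never moves backwards
```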

4. Auto-Generating Custom IDs

MongoDB uses ObjectId as the default _id field, but you may need custom IDs (e.g., slugs, UUIDs, or sequential numbers). Use pre-save hooks to generate these IDs automatically.

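Example 1: Auto-Generating Slugs

Slugs are URL-friendly versions of a title (e.g., "my-blog-post" from "My Blog Post"). Generate a slug from a title field in a pre_save signal handler:

from mongoengine import Document, StringField, signals
from slugify import slugify  # Install with: pip install python-slugify

class BlogPost(Document):
    title = StringField(required=True)
    slug = StringField(primary_key=True)  # Use slug as _id
    content = StringField(required=True)

    @classmethod
    def generate_slug(cls, sender, document, **kwargs):
        # Generate slug if not already set (and a title is available)
        if document.slug or not document.title:
            return
        base_slug = slugify(document.title)
        candidate, suffix = base_slug, 1
        # Try "my-blog-post", then "my-blog-post-2", "my-blog-post-3", ...
        while cls.objects(slug=candidate).count() > 0:
            suffix += 1
            candidate = f"{base_slug}-{suffix}"
        document.slug = candidate

signals.pre_save.connect(BlogPost.generate_slug, sender=BlogPost)

  • primary_key=True replaces the default ObjectId with slug as the unique identifier; a primary key is already unique, so unique=True is not needed.
  • slugify converts the title to lowercase and replaces spaces with hyphens.
  • We check each candidate slug for an exact match to avoid duplicates (e.g., "my-blog-post" becomes "my-blog-post-2" if the first exists). Counting slugs that only share a prefix would also count unrelated posts such as "my-blog-post-extended".
  • The check and the save are separate operations, so two concurrent saves can still pick the same slug.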
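
The suffix logic can be separated from the database query, which makes the edge cases easier to see. Counting slugs that merely share a prefix can over-count (for example, "my-blog-post-extended" also starts with "my-blog-post"), so the sketch below checks exact candidates instead. The function name next_free_slug and the in-memory collection of taken slugs are assumptions for illustration; in a real model the lookup would be a query.

```python
def next_free_slug(base_slug, taken):
    """Return base_slug, or base_slug-N with the smallest unused N >= 2."""
    taken = set(taken)
    if base_slug not in taken:
        return base_slug
    suffix = 2
    while f"{base_slug}-{suffix}" in taken:
        suffix += 1
    return f"{base_slug}-{suffix}"


existing = ["my-blog-post", "my-blog-post-extended"]
print(next_free_slug("my-blog-post", existing))  # my-blog-post-2
print(next_free_slug("another-post", existing))  # another-post
```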

Example 2: Auto-Generating UUIDs

For globally unique identifiers (UUIDs), use Python’s uuid module:

import uuid

from mongoengine import Document, StringField, signals

class Product(Document):
    product_id = StringField(primary_key=True)
    name = StringField(required=True)

    @classmethod
    def assign_id(cls, sender, document, **kwargs):
        if not document.product_id:
            document.product_id = str(uuid.uuid4())  # Generate UUID4

signals.pre_save.connect(Product.assign_id, sender=Product)

This ensures each Product has a unique product_id like 3f2b8c1e-9d4a-4e7b-8a6f-1c2d3e4f5a6b.
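
To see what the generated identifier looks like, the ID logic can be run on its own. This is a small sketch using only the standard library; the helper name new_product_id is an assumption for illustration.

```python
import uuid


def new_product_id():
    """Return a random (version 4) UUID as a 36-character string."""
    return str(uuid.uuid4())


product_id = new_product_id()
parsed = uuid.UUID(product_id)  # raises ValueError if the string is not a valid UUID
print(product_id)
print(parsed.version)           # 4
print(len(product_id))          # 36
```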

Example 3: Sequential IDs (Advanced)

For sequential IDs (e.g., order-001, order-002), use a counter collection to track the last ID and increment it:

from mongoengine import Document, StringField, IntField, FloatField, signals

class Counter(Document):
    name = StringField(required=True, unique=True)
    value = IntField(default=0)

class Order(Document):
    order_id = StringField(primary_key=True)
    total = FloatField(required=True)

    @classmethod
    def assign_order_id(cls, sender, document, **kwargs):
        if not document.order_id:
            # Atomically create the counter if needed, increment it,
            # and return the updated counter document
            counter = Counter.objects(name="order_counter").modify(
                upsert=True, new=True, inc__value=1
            )
            document.order_id = f"order-{counter.value:03d}"  # Format as 001, 002, etc.

signals.pre_save.connect(Order.assign_order_id, sender=Order)

Note: modify() issues a single atomic find-and-modify operation in MongoDB, which avoids race conditions when incrementing the counter.
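
The formatting and the increment-then-read pattern can be illustrated without MongoDB. In this sketch an in-memory counter guarded by a lock stands in for the counter collection; the class InMemoryCounter and the function format_order_id are hypothetical names, and the lock only imitates the atomicity that the database provides.

```python
import threading


class InMemoryCounter:
    """Stand-in for the counter document: increments under a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def next_value(self):
        with self._lock:  # plays the role of MongoDB's atomic update
            self._value += 1
            return self._value


def format_order_id(n):
    """Zero-pad to at least three digits."""
    return f"order-{n:03d}"


counter = InMemoryCounter()
ids = [format_order_id(counter.next_value()) for _ in range(3)]
print(ids)                    # ['order-001', 'order-002', 'order-003']
print(format_order_id(1000))  # order-1000
```

The :03d format sets a minimum width rather than a cap, so IDs keep working past order-999, although they no longer sort alphabetically.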

5. Advanced Field Validation with Custom Validators

For field-specific validation (e.g., password complexity), pass a callable to the field's validation parameter. The callable checks the field’s value and raises ValidationError if it is invalid.

Step 1: Define a Validator Function

A validator function takes a single argument, the field’s value:

from mongoengine import ValidationError

def password_complexity(value):
    if len(value) < 8:
        raise ValidationError("Password must be at least 8 characters long.")
    if not any(c.isupper() for c in value):
        raise ValidationError("Password must contain an uppercase letter.")
    if not any(c.isdigit() for c in value):
        raise ValidationError("Password must contain a number.")

Step 2: Attach the Validator to a Field

Pass the function to a StringField through the validation parameter (a single callable, not a list):

from mongoengine import Document, StringField

class User(Document):
    username = StringField(required=True)
    password = StringField(
        required=True,
        validation=password_complexity  # Attach custom validator
    )

Now, when a User is validated or saved, password_complexity runs automatically.

Built-in Validators

MongoEngine does not ship a separate validators module; its built-in format checks come from specialized field types such as EmailField and URLField. For example:

from mongoengine import Document, URLField

class Profile(Document):
    website = URLField()  # Ensures URL format

6. Practical Example: Building a Self-Validating Blog Post Model

Let’s combine everything into a BlogPost model with:

  • Auto-generated slug (ID).
  • Timestamps (created_at, updated_at).
  • Custom validation (title length, content checks).
  • Field validators (author name format).

Full Example Code

from datetime import datetime, timezone
import re

from mongoengine import Document, StringField, DateTimeField, ValidationError, signals
from slugify import slugify

# Custom validator for author name (letters and spaces only)
def validate_author(value):
    if not re.match(r'^[A-Za-z\s]+$', value):
        raise ValidationError("Author name can only contain letters and spaces.")

class BlogPost(Document):
    title = StringField(required=True, max_length=200)
    slug = StringField(primary_key=True)  # Custom ID
    content = StringField(required=True)
    author = StringField(required=True, validation=validate_author)
    created_at = DateTimeField()
    updated_at = DateTimeField()

    def clean(self):
        # clean() runs before field checks, so guard against missing values
        # Validate title length
        if len(self.title or "") < 5:
            raise ValidationError("Title must be at least 5 characters long.")
        # Validate content is not empty
        if not (self.content or "").strip():
            raise ValidationError("Content cannot be empty.")

# Pre-save handler to generate slug and timestamps
def blog_post_pre_save(sender, document, **kwargs):
    # Generate slug if missing (skip when there is no title to build it from)
    if not document.slug and document.title:
        base_slug = slugify(document.title)
        candidate, suffix = base_slug, 1
        while BlogPost.objects(slug=candidate).count() > 0:
            suffix += 1
            candidate = f"{base_slug}-{suffix}"
        document.slug = candidate

    # Set timestamps
    now = datetime.now(timezone.utc)
    if not document.created_at:
        document.created_at = now
    document.updated_at = now

signals.pre_save.connect(blog_post_pre_save, sender=BlogPost)

# Usage example (assumes a connection was opened earlier with mongoengine.connect())
try:
    post = BlogPost(
        title="Getting Started with MongoEngine",
        content="MongoEngine is a powerful ODM for MongoDB...",
        author="Jane Doe"
    )
    post.save()
    print(f"Created post with slug: {post.slug}")  # e.g. "getting-started-with-mongoengine"
except ValidationError as e:
    print(f"Validation failed: {e}")

Key Features Explained:

  • Slug Generation: the pre_save signal handler generates a URL-friendly slug from the title, appending a number if the slug already exists.
  • Timestamps: created_at is set on first save, and updated_at updates on every save.
  • Validation:
    • clean() ensures the title is long enough and content isn’t empty.
    • validate_author ensures the author’s name contains only letters and spaces.

7. Best Practices

  1. Prefer Built-in Validators: Use MongoEngine’s built-in validators (e.g., max_length, EmailField) for simplicity before writing custom logic.
  2. Keep clean() Focused: Use clean() for cross-field validation; use field validators for single-field rules.
  3. Test Validation Logic: Write unit tests for custom validators and clean() methods to catch edge cases.
  4. Avoid Side Effects in Hooks: Pre-save hooks should modify the document or log, not perform external actions (e.g., API calls).
  5. Document Custom Logic: Add docstrings to clean(), pre-save handlers, and validators to explain their purpose.
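
As an illustration of the third practice, a field validator can be unit tested in isolation because it is a plain function. The sketch below tests a password rule like the one in section 5. To keep it runnable without MongoEngine installed, it defines a local ValidationError stand-in, which is an assumption; in a project you would import ValidationError from mongoengine instead.

```python
import unittest


class ValidationError(Exception):
    """Stand-in for mongoengine.ValidationError so the sketch runs on its own."""


def password_complexity(value):
    if len(value) < 8:
        raise ValidationError("Password must be at least 8 characters long.")
    if not any(c.isupper() for c in value):
        raise ValidationError("Password must contain an uppercase letter.")
    if not any(c.isdigit() for c in value):
        raise ValidationError("Password must contain a number.")


class PasswordComplexityTests(unittest.TestCase):
    def test_accepts_strong_password(self):
        self.assertIsNone(password_complexity("Secur3Pass"))

    def test_rejects_short_password(self):
        with self.assertRaises(ValidationError):
            password_complexity("Ab1")

    def test_rejects_missing_uppercase(self):
        with self.assertRaises(ValidationError):
            password_complexity("lowercase1")

    def test_rejects_missing_digit(self):
        with self.assertRaises(ValidationError):
            password_complexity("NoDigitsHere")


# Run the tests programmatically so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PasswordComplexityTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```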

8. Conclusion

MongoEngine’s Document class methods empower you to build robust, self-validating data models. By overriding clean() for custom validation, using pre_save hooks for pre-save logic, and leveraging validators for field-specific rules, you can ensure data integrity and enforce business requirements. Auto-generating custom IDs (like slugs or UUIDs) further tailors your models to application needs.

With these tools, you’ll create MongoDB collections that are consistent, secure, and easy to maintain.

9. References