Skip to content

Instant Clone for Multi-Team Development

Overview

This tutorial demonstrates how MatrixOne's Copy-on-Write cloning enables efficient multi-team collaboration on large production datasets. Learn how teams can work independently without storage bloat or time-consuming data copies.

Real-World Scenario: Multiple Teams Need Production Data

Your organization has large production databases, and multiple teams need isolated environments for testing:

  • 📊 Data Science: Train ML models on production data
  • 🧪 QA Team: Run destructive integration tests
  • 👨‍💻 Dev Team: Experiment with schema changes
  • Time-Travel: Test against historical snapshots

The Challenge:

  • 🐢 Slow: Traditional copy takes 30-60 minutes for 1TB
  • 💸 Expensive: Each copy doubles storage (1TB → 2TB → 3TB...)
  • 🔒 Risky: Teams can't work independently without conflicts
  • 📦 Wasteful: Identical data stored multiple times

MatrixOne's Solution:

  • Instant: Clone 1TB database in < 5 seconds
  • 💰 Efficient: 1TB stays 1TB, not 4TB (Copy-on-Write)
  • 🔓 Isolated: Each team gets independent environment
  • 🗑️ Safe: Delete clones without affecting source

Why This Matters for Teams

Traditional Approach Problems:

Production DB (1TB) → Full Copy → Storage Explosion

Team 1: Copy 1TB (30 min) → 2TB total storage
Team 2: Copy 1TB (30 min) → 3TB total storage
Team 3: Copy 1TB (30 min) → 4TB total storage

Result: 4TB storage, 90 minutes, $$$$ costs

MatrixOne Approach:

Production DB (1TB) → Instant Clone → Minimal Storage

Team 1: Clone (5 sec) → 1TB storage (copy-on-write)
Team 2: Clone (5 sec) → 1TB storage (copy-on-write)
Team 3: Clone (5 sec) → 1TB storage (copy-on-write)

Result: ~1.02TB storage, 15 seconds, 💰 75% savings!

Comparison Table:

Aspect Traditional Copy MatrixOne Clone
1TB Clone Time 30-60 minutes < 5 seconds
3 Team Copies Storage 4TB (1 + 1 + 1 + 1) ~1.02TB (base + deltas) 💰
Team Isolation Separate databases Independent Copy-on-Write
Delete Clone Drop database Drop without affecting source
CI/CD Friendly Too slow Perfect for automation
Cost for Cloud 4x storage cost ~1x storage cost

Key Benefits

For Development Teams 👨‍💻

  • Experiment Freely: Test schema changes without risk
  • Parallel Development: Multiple branches, multiple clones
  • Fast Iteration: Create → Test → Delete in seconds
  • No Conflicts: Each developer gets isolated environment

For QA Teams 🧪

  • Destructive Testing: Run tests that modify/delete data
  • Parallel Testing: Multiple test suites, multiple clones
  • Fresh State: New clone for each test run
  • Production Parity: Test on real production data

For Data Science Teams 📊

  • Large Datasets: Clone TB-scale data instantly
  • Experiment Tracking: One clone per experiment
  • Model Training: Full production data for ML
  • No Interference: Train models without affecting prod

For CI/CD Pipelines 🔄

  • Fast Provisioning: Spin up test DB in seconds
  • Cost Effective: No storage explosion
  • Automated Testing: Clone → Test → Delete
  • Scalable: Handle hundreds of parallel jobs

Multi-Team Workflow Diagram

graph TD
    Production["🏢 Production Database<br>1TB - User Behavior Logs<br>Millions of rows"]

    Snapshot["📸 Optional: Snapshot<br>Point-in-time backup<br>< 1 second"]

    DSClone["📊 Data Science Clone<br>⚡ 5 seconds<br>💰 +10MB storage"]
    QAClone["🧪 QA Clone<br>⚡ 5 seconds<br>💰 +5MB storage"]
    DevClone["👨‍💻 Dev Clone<br>⚡ 5 seconds<br>💰 +8MB storage"]
    TTClone["⏰ Time-Travel Clone<br>From snapshot<br>💰 +0MB storage"]

    DSWork["ML Model Training<br>Modify 100 rows<br>Add predictions"]
    QAWork["Integration Tests<br>Delete test data<br>Run destructive tests"]
    DevWork["Schema Changes<br>Add indexes<br>Insert test data"]
    TTWork["Historical Testing<br>Yesterday's data<br>Regression tests"]

    DSDelete["🗑️ Delete DS Clone<br>Production unaffected"]
    QADelete["🗑️ Delete QA Clone<br>Other clones unaffected"]
    DevDelete["🗑️ Delete Dev Clone<br>Work independently"]

    Production --> Snapshot
    Production --> DSClone
    Production --> QAClone
    Production --> DevClone
    Snapshot --> TTClone

    DSClone --> DSWork
    QAClone --> QAWork
    DevClone --> DevWork
    TTClone --> TTWork

    DSWork --> DSDelete
    QAWork --> QADelete
    DevWork --> DevDelete

    style Production fill:#d4edda,stroke:#28a745,stroke-width:3px
    style Snapshot fill:#fff3cd
    style DSClone fill:#d1ecf1
    style QAClone fill:#d1ecf1
    style DevClone fill:#d1ecf1
    style TTClone fill:#d1ecf1
    style DSWork fill:#e2e3e5
    style QAWork fill:#e2e3e5
    style DevWork fill:#e2e3e5
    style TTWork fill:#e2e3e5

Workflow Explanation

Step Action Time Storage Team Isolation
1️⃣ Production Large database running - 1TB Source data
2️⃣ Clone DS Data Science team 5s +0MB Independent
3️⃣ Clone QA QA team 5s +0MB Independent
4️⃣ Clone Dev Dev team 5s +0MB Independent
5️⃣ Modify Each team works - +deltas Isolated
6️⃣ Delete Clean up clones 1s Freed No impact

Key Points:

  • 🟢 Green: Production database (untouched)
  • 🔵 Blue: Team clones (instant, isolated)
  • Gray: Independent modifications
  • All clones can be deleted without affecting production or each other!

Copy-on-Write Magic

How It Works:

When you clone a database:

  1. No data copying: Only metadata created (< 5 seconds)
  2. Shared storage: All clones read from same underlying data
  3. Write isolation: Modified data stored separately (Copy-on-Write)
  4. Independent lifecycle: Delete clones without affecting source

Example:

Production: 1TB
+ DS Clone: 0MB (shared read)
+ QA Clone: 0MB (shared read)
+ Dev Clone: 0MB (shared read)

After work:
+ DS modifies 100 rows → +10MB
+ QA deletes 500 rows → +5MB
+ Dev adds 200 rows → +8MB

Total storage: 1.023TB (not 4TB!)
Savings: 75% storage cost 💰

MatrixOne Python SDK Documentation

For complete API reference, see MatrixOne Python SDK Documentation

Before You Start

Prerequisites

  • MatrixOne database installed and running
  • Python 3.7 or higher
  • MatrixOne Python SDK installed
pip3 install matrixone-python-sdk

Import Required Libraries

from matrixone import Client, SnapshotLevel
from matrixone.config import get_connection_params
from matrixone.orm import declarative_base
from matrixone.sqlalchemy_ext import create_vector_column
from sqlalchemy import BigInteger, Column, String, Integer, Float, Text
from datetime import datetime
import time
import numpy as np

Complete Working Example

Phase 1: Setup Production Database

Connect to Database

from matrixone import Client
from matrixone.config import get_connection_params

# Connect to MatrixOne
host, port, user, password, database = get_connection_params()
client = Client()
client.connect(host=host, port=port, user=user, password=password, database=database)

print(f"Connected to {host}:{port}/{database}")

Create Production Database with Large Table

from matrixone.orm import declarative_base
from matrixone.sqlalchemy_ext import create_vector_column
from sqlalchemy import BigInteger, Column, String, Integer, Float

Base = declarative_base()

# Define large production table
class UserBehavior(Base):
    """Large production table: user behavior logs"""
    __tablename__ = "user_behavior"

    id = Column(BigInteger, primary_key=True, autoincrement=True)
    user_id = Column(BigInteger)
    product_id = Column(BigInteger)
    action = Column(String(50))  # view, click, purchase
    timestamp = Column(BigInteger)
    session_id = Column(String(100))
    device_type = Column(String(50))
    price = Column(Float)
    quantity = Column(Integer)
    embedding = create_vector_column(128, "f32")  # Behavior embedding

# Create production database
prod_db = "production_data"
client.execute(f"CREATE DATABASE IF NOT EXISTS {prod_db}")

# Connect to production
prod_client = Client()
prod_client.connect(host=host, port=port, user=user, password=password, database=prod_db)

# Create table
prod_client.create_table(UserBehavior)
print(f"Created production table: {UserBehavior.__tablename__}")

Insert Large Dataset

import numpy as np

# Simulate large production dataset (1000 rows = millions in production)
actions = ["view", "click", "add_to_cart", "purchase", "review"]
devices = ["mobile", "desktop", "tablet"]

batch_data = []
for i in range(1000):
    batch_data.append({
        "user_id": (i % 100) + 1,
        "product_id": (i % 50) + 1,
        "action": actions[i % len(actions)],
        "timestamp": int(time.time() * 1000) + i,
        "session_id": f"session_{i // 10}",
        "device_type": devices[i % len(devices)],
        "price": round(10 + (i % 500) * 1.5, 2),
        "quantity": (i % 5) + 1,
        "embedding": np.random.rand(128).astype(np.float32).tolist()
    })

prod_client.batch_insert(UserBehavior, batch_data)

total_records = prod_client.query(UserBehavior).count()
print(f"Production ready: {total_records:,} records (~100MB, represents 10GB+ in production)")

prod_client.disconnect()

Phase 2: Data Science Team - Clone for ML Training

Instant Clone for Data Science

# ⚡ INSTANT CLONE: 1TB database cloned in < 5 seconds
# 💰 ZERO STORAGE OVERHEAD: 1TB stays 1TB (Copy-on-Write)

ds_db = "datasci_experiment_ml"
clone_start = time.time()

# Clone production database - instant operation!
client.clone.clone_database(
    target_db=ds_db,
    source_db=prod_db
)

clone_time = time.time() - clone_start
print(f"Data Science clone completed in {clone_time:.2f} seconds")
print(f"No data copied - metadata operation only")
print(f"Storage: ~0 MB additional (Copy-on-Write)")

Data Science Work: ML Model Training

# Connect to DS database
ds_client = Client()
ds_client.connect(host=host, port=port, user=user, password=password, database=ds_db)

# Verify clone has same data
ds_count = ds_client.query(UserBehavior).count()
print(f"DS clone verified: {ds_count:,} records")

# Add ML model predictions (triggers Copy-on-Write)
for i in range(100):
    ds_client.query(UserBehavior).filter(
        UserBehavior.id == i + 1
    ).update(price=999.99)  # Mark as processed by ML model

print(f"Updated 100 records with ML predictions")
print(f"Only modified rows are stored (Copy-on-Write)")
print(f"Production data: completely unaffected")

ds_client.disconnect()

Phase 3: QA Team - Clone for Integration Testing

Instant Clone for QA

qa_db = "qa_integration_test"

# ⚡ Another instant clone - still no storage overhead!
client.clone.clone_database(
    target_db=qa_db,
    source_db=prod_db
)

print(f"QA clone created: {qa_db}")
print(f"QA can run destructive tests safely")

QA Work: Destructive Testing

# Connect to QA database
qa_client = Client()
qa_client.connect(host=host, port=port, user=user, password=password, database=qa_db)

# Run destructive tests - delete data
qa_client.query(UserBehavior).filter(
    UserBehavior.action == "purchase"
).delete().execute()

qa_count = qa_client.query(UserBehavior).count()
print(f"QA deleted purchase records for testing")
print(f"QA database now: {qa_count:,} records")
print(f"Production: unaffected")
print(f"DS clone: unaffected")

qa_client.disconnect()

Phase 4: Dev Team - Clone for Schema Experimentation

Instant Clone for Development

dev_db = "dev_schema_experiment"

# ⚡ Third instant clone - still efficient!
client.clone.clone_database(
    target_db=dev_db,
    source_db=prod_db
)

print(f"Dev clone created: {dev_db}")

Dev Work: Schema Changes and Testing

# Connect to Dev database
dev_client = Client()
dev_client.connect(host=host, port=port, user=user, password=password, database=dev_db)

# Experiment with vector index
dev_client.vector_ops.create_ivf(
    UserBehavior,
    "idx_embedding_test",
    "embedding",
    lists=10,
    op_type="vector_l2_ops"
)

print(f"Created IVF index on embedding column")
print(f"Testing vector search performance")

# Insert test data
test_records = []
for i in range(50):
    test_records.append({
        "user_id": 999,  # Test user
        "product_id": 999,
        "action": "test_action",
        "timestamp": int(time.time() * 1000),
        "session_id": f"test_session_{i}",
        "device_type": "test_device",
        "price": 0.01,
        "quantity": 1,
        "embedding": np.random.rand(128).astype(np.float32).tolist()
    })

dev_client.batch_insert(UserBehavior, test_records)

dev_count = dev_client.query(UserBehavior).count()
print(f"Dev inserted {len(test_records)} test records")
print(f"Dev database: {dev_count:,} records")

dev_client.disconnect()

Phase 5: Verify Production Integrity

# Reconnect to production
prod_client = Client()
prod_client.connect(host=host, port=port, user=user, password=password, database=prod_db)

prod_count = prod_client.query(UserBehavior).count()
test_users = prod_client.query(UserBehavior).filter(UserBehavior.user_id == 999).count()

print(f"\n Production Database Integrity Check:")
print(f"Original records: {total_records:,}")
print(f"Current records: {prod_count:,}")
print(f"Data integrity: {'PRESERVED' if prod_count == total_records else 'MODIFIED'}")
print(f"Test records: {test_users} (expected: 0)")
print(f"\n All team clones are completely isolated!")

prod_client.disconnect()

Phase 6: Storage Efficiency Analysis

print("\n"+"="* 70)
print("Storage Efficiency Comparison")
print("="* 70)

print("\n Traditional Copy Approach:")
print(f"Production: 10 GB")
print(f"DS Clone: 10 GB (full copy)")
print(f"QA Clone: 10 GB (full copy)")
print(f"Dev Clone: 10 GB (full copy)")
print(f"─────────────────────")
print(f"Total: 40 GB")
print(f"Time: ~90 minutes (30 min × 3)")

print("\n MatrixOne Clone (Copy-on-Write):")
print(f"Production: 10 GB")
print(f"DS Clone: ~0 GB + 10 MB (modified data)")
print(f"QA Clone: ~0 GB + 5 MB (modified data)")
print(f"Dev Clone: ~0 GB + 8 MB (modified data)")
print(f"─────────────────────")
print(f"Total: ~10.023 GB")
print(f"Time: ~15 seconds (5 sec × 3)")

print("\n Savings:")
print(f"Storage: 75% saved (29.977 GB)")
print(f"Time: 99.7% faster (89.75 min saved)")
print(f"Cloud cost: ~75% reduction")
print(f"Team productivity: Unlimited!")

Phase 7: Independent Clone Deletion

# Delete QA clone - production and other clones unaffected
client.execute(f"DROP DATABASE {qa_db}")
print(f"\n Dropped QA database: {qa_db}")

# Verify production still intact
prod_client = Client()
prod_client.connect(host=host, port=port, user=user, password=password, database=prod_db)
prod_count_final = prod_client.query(UserBehavior).count()

print(f"Production after QA clone deletion:")
print(f"Records: {prod_count_final:,} (unchanged)")
print(f"Other clones (DS, Dev): still accessible")

prod_client.disconnect()

Phase 8: Time-Travel Testing with Snapshots

# Create snapshot of current state
snapshot_name = f"prod_snapshot_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
client.snapshots.create(
    name=snapshot_name,
    level=SnapshotLevel.DATABASE,
    database=prod_db
)

print(f"\n Created snapshot: {snapshot_name}")

# Simulate production getting new data (today's data)
prod_client = Client()
prod_client.connect(host=host, port=port, user=user, password=password, database=prod_db)

new_data = []
for i in range(100):
    new_data.append({
        "user_id": 200 + i,
        "product_id": 1,
        "action": "new_purchase",
        "timestamp": int(time.time() * 1000),
        "session_id": f"new_session_{i}",
        "device_type": "mobile",
        "price": 99.99,
        "quantity": 1,
        "embedding": np.random.rand(128).astype(np.float32).tolist()
    })

prod_client.batch_insert(UserBehavior, new_data)
new_total = prod_client.query(UserBehavior).count()
print(f"Production received 100 new records")
print(f"Production now: {new_total:,} records")

prod_client.disconnect()

# Clone from snapshot (yesterday's data)
timetravel_db = "test_yesterday_data"
client.clone.clone_database_with_snapshot(
    target_db=timetravel_db,
    source_db=prod_db,
    snapshot_name=snapshot_name
)

# Verify historical data
tt_client = Client()
tt_client.connect(host=host, port=port, user=user, password=password, database=timetravel_db)
tt_count = tt_client.query(UserBehavior).count()

print(f"\n Time-travel clone created:")
print(f"Production (today): {new_total:,} records")
print(f"Clone (snapshot): {tt_count:,} records")
print(f"Testing against historical data!")

tt_client.disconnect()

Clone Operations Reference

Basic Clone Operations

Clone Current Database

# Clone current state of database
client.clone.clone_database(
    target_db="new_database_name",
    source_db="source_database"
)

Use Cases:

  • Quick test environment
  • Parallel development branches
  • Data Science experiments
  • QA testing

Clone from Snapshot

# Clone historical state from snapshot
client.clone.clone_database_with_snapshot(
    target_db="historical_clone",
    source_db="production",
    snapshot_name="prod_snapshot_20250110"
)

Use Cases:

  • Time-travel testing
  • Compare before/after states
  • Regression testing
  • Historical analysis

Snapshot Operations

Create Snapshot

client.snapshots.create(
    name="my_snapshot",
    level=SnapshotLevel.DATABASE,
    database="production"
)

List Snapshots

snapshots = client.snapshots.list()
for snap in snapshots:
    print(f"Snapshot: {snap.name}, Created: {snap.created_at}")

Delete Snapshot

client.snapshots.delete("my_snapshot")

Best Practices

1. Use Clones for Every Test Run

Fresh Start Every Time

def run_test_suite():
    """Create fresh clone for each test run"""
    test_db = f"test_run_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

    # Clone production
    client.clone.clone_database(
        target_db=test_db,
        source_db="production"
    )

    # Run tests
    try:
        run_tests_on(test_db)
    finally:
        # Clean up
        client.execute(f"DROP DATABASE {test_db}")

2. Leverage Copy-on-Write for CI/CD

Parallel CI Jobs

# Each CI job gets its own clone - no storage penalty!

# Job 1: Unit tests
client.clone.clone_database(target_db="ci_job_1", source_db="prod")

# Job 2: Integration tests
client.clone.clone_database(target_db="ci_job_2", source_db="prod")

# Job 3: Performance tests
client.clone.clone_database(target_db="ci_job_3", source_db="prod")

# Total time: ~15 seconds for all 3
# Total storage: ~production size (not 3x!)

3. Use Snapshots for Time-Travel

Historical Testing

# Daily snapshot
client.snapshots.create(
    name=f"daily_{datetime.now().strftime('%Y%m%d')}",
    level=SnapshotLevel.DATABASE,
    database="production"
)

# Clone from last week's snapshot for regression test
client.clone.clone_database_with_snapshot(
    target_db="regression_test",
    source_db="production",
    snapshot_name="daily_20250103"
)

4. Clean Up Clones Regularly

Automated Cleanup

def cleanup_old_clones(prefix="test_", days_old=7):
    """Drop clones older than N days"""
    cutoff = datetime.now() - timedelta(days=days_old)

    # List all databases
    databases = client.execute("SHOW DATABASES")

    for db in databases:
        if db['name'].startswith(prefix):
            # Check creation time and drop if old
            # (implementation depends on metadata tracking)
            pass

5. Name Clones Descriptively

def generate_clone_name(team, purpose):
    """Generate descriptive clone name"""
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    return f"{team}_{purpose}_{timestamp}"

# Examples
ds_clone = generate_clone_name("datasci", "ml_training")
# → "datasci_ml_training_20250110_143052"

qa_clone = generate_clone_name("qa", "integration_test")
# → "qa_integration_test_20250110_143052"

Performance Characteristics

Clone Operations

Real-World Performance:

Database Size Clone Time Storage After Clone Storage After Modifications
100GB < 3 seconds 100GB (no change) 100GB + deltas
1TB < 5 seconds 1TB (no change) 1TB + deltas
10TB < 10 seconds 10TB (no change) 10TB + deltas

Multi-Team Scenario:

Production: 1TB
+ 10 team clones created: ~10 seconds total
+ Storage after cloning: 1TB (unchanged!)

After 1 week of work:
+ Each team modifies ~1% of data: +10GB per clone
+ Total storage: 1TB + (10 × 10GB) = 1.1TB
+ Traditional approach: 1TB × 11 = 11TB

Savings: 90% storage cost! 💰

Snapshot Operations

Create Snapshot:

  • ⚡ < 2 seconds for any size database
  • 📦 Metadata operation only
  • 💾 No storage overhead initially

Clone from Snapshot:

  • ⚡ Same as regular clone (< 5 seconds)
  • 📅 Access historical data instantly
  • 💾 Copy-on-Write applies

Use Cases and Examples

Use Case 1: Data Science Experimentation

# Each data scientist gets their own clone
for scientist in ["alice", "bob", "charlie"]:
    clone_db = f"datasci_{scientist}_experiment"
    client.clone.clone_database(
        target_db=clone_db,
        source_db="production"
    )
    print(f"Created clone for {scientist}")

# Result: 3 full production copies in 15 seconds
# Storage: ~production size (not 3x!)

Use Case 2: Blue-Green Deployment Testing

# Current production (blue)
blue_db = "production_v1"

# Create green environment for v2 testing
green_db = "production_v2_candidate"
client.clone.clone_database(
    target_db=green_db,
    source_db=blue_db
)

# Test v2 changes on green
test_results = run_v2_tests(green_db)

if test_results.success:
    # Promote green to production
    client.execute(f"RENAME DATABASE {green_db} TO {blue_db}")
else:
    # Discard green, keep blue
    client.execute(f"DROP DATABASE {green_db}")

Use Case 3: Parallel A/B Testing

# Create multiple variants for A/B testing
variants = ["control", "variant_a", "variant_b", "variant_c"]

for variant in variants:
    clone_db = f"ab_test_{variant}"
    client.clone.clone_database(
        target_db=clone_db,
        source_db="production"
    )

    # Apply variant-specific changes
    apply_variant_changes(clone_db, variant)

    # Run tests
    metrics = collect_metrics(clone_db)

print(f"4 parallel A/B tests completed")
print(f"Time: ~20 seconds")
print(f"Storage: ~production size")

Use Case 4: CI/CD Pipeline Integration

# .github/workflows/ci.yml equivalent in Python

def ci_pipeline(branch_name):
    """CI pipeline with isolated database"""

    # 1. Create test database for this branch
    test_db = f"ci_{branch_name}_{int(time.time())}"

    client.clone.clone_database(
        target_db=test_db,
        source_db="production_snapshot"
    )

    # 2. Run migrations
    apply_migrations(test_db)

    # 3. Run tests
    test_results = run_test_suite(test_db)

    # 4. Clean up
    client.execute(f"DROP DATABASE {test_db}")

    return test_results

# Each PR gets isolated test environment
# No conflicts between parallel CI jobs
# No storage explosion

Troubleshooting

Issue: Clone Takes Longer Than Expected

Symptoms: Clone operation takes > 10 seconds

Possible Causes:

  • Network latency
  • Database has many small files
  • First clone after MatrixOne restart

Solution:

# Subsequent clones should be faster
# First clone may take longer to warm up metadata

Issue: Cannot Drop Clone Database

Symptoms: Error when trying to drop cloned database

Possible Causes:

  • Active connections to clone
  • Clone being used by another process

Solution:

# Disconnect all clients first
client.disconnect()

# Then drop database
client.execute(f"DROP DATABASE {clone_db}")

Issue: Storage Growing Faster Than Expected

Symptoms: Storage usage higher than expected with Copy-on-Write

Possible Causes:

  • Many modifications to cloned data
  • Large bulk inserts/updates

Explanation:

# Copy-on-Write stores deltas
# If you modify 50% of cloned data, storage grows by 50%
# This is still better than traditional copy (100% overhead)

# Example:
# Production: 1TB
# Clone + modify 50%: 1TB + 0.5TB = 1.5TB
# Traditional copy: 1TB + 1TB = 2TB
# Still 25% savings!

Solution:

  • Drop clones you no longer need
  • Use snapshots for read-only historical access
  • Consider storage budget when planning modifications

Summary

MatrixOne's Copy-on-Write cloning enables:

Instant Cloning

  • 1TB database cloned in < 5 seconds
  • No waiting for data copy
  • Perfect for rapid iteration

Storage Efficiency

  • 10 clones ≈ 1x storage (not 10x!)
  • Copy-on-Write stores only changes
  • 75-90% storage savings

Team Productivity

  • Each team gets isolated environment
  • No conflicts between teams
  • Parallel testing and development

Cost Reduction

  • Massive cloud storage savings
  • Reduced infrastructure costs
  • Better resource utilization

CI/CD Friendly

  • Fast test database provisioning
  • Parallel job execution
  • Automated workflows

Key Operations:

# Clone current database
client.clone.clone_database(target_db="new_db", source_db="source_db")

# Clone from snapshot
client.clone.clone_database_with_snapshot(
    target_db="historical_clone",
    source_db="production",
    snapshot_name="my_snapshot"
)

# Create snapshot
client.snapshots.create(name="backup", level=SnapshotLevel.DATABASE, database="prod")

# Drop clone (no impact on source)
client.execute("DROP DATABASE clone_db")

Perfect For:

  • 👨‍💻 Multi-team development
  • 🧪 CI/CD pipelines
  • 📊 Data Science experiments
  • 🔬 Schema migrations
  • 🧪 Integration testing
  • ⏰ Time-travel debugging

Start leveraging MatrixOne's efficient cloning today and transform your team's workflow! 🚀