Instant Clone for Multi-Team Development
Overview
This tutorial demonstrates how MatrixOne's Copy-on-Write cloning enables efficient multi-team collaboration on large production datasets. Learn how teams can work independently without storage bloat or time-consuming data copies.
Real-World Scenario: Multiple Teams Need Production Data
Your organization has large production databases, and multiple teams need isolated environments for testing:
- 📊 Data Science: Train ML models on production data
- 🧪 QA Team: Run destructive integration tests
- 👨💻 Dev Team: Experiment with schema changes
- ⏰ Time-Travel: Test against historical snapshots
The Challenge:
- 🐢 Slow: Traditional copy takes 30-60 minutes for 1TB
- 💸 Expensive: Each copy doubles storage (1TB → 2TB → 3TB...)
- 🔒 Risky: Teams can't work independently without conflicts
- 📦 Wasteful: Identical data stored multiple times
MatrixOne's Solution:
- ⚡ Instant: Clone 1TB database in < 5 seconds
- 💰 Efficient: 1TB stays 1TB, not 4TB (Copy-on-Write)
- 🔓 Isolated: Each team gets independent environment
- 🗑️ Safe: Delete clones without affecting source
Why This Matters for Teams
Traditional Approach Problems:
```
Production DB (1TB) → Full Copy → Storage Explosion

Team 1: Copy 1TB (30 min) → 2TB total storage
Team 2: Copy 1TB (30 min) → 3TB total storage
Team 3: Copy 1TB (30 min) → 4TB total storage

Result: 4TB storage, 90 minutes, $$$$ costs
```
MatrixOne Approach:
```
Production DB (1TB) → Instant Clone → Minimal Storage

Team 1: Clone (5 sec) → 1TB storage (copy-on-write)
Team 2: Clone (5 sec) → 1TB storage (copy-on-write)
Team 3: Clone (5 sec) → 1TB storage (copy-on-write)

Result: ~1TB + MB-scale deltas, 15 seconds, 💰 ~75% savings!
```
Comparison Table:
| Aspect | Traditional Copy | MatrixOne Clone |
|---|---|---|
| 1TB Clone Time | 30-60 minutes | < 5 seconds ⚡ |
| 3 Team Copies Storage | 4TB (1 + 1 + 1 + 1) | ~1TB base + MB-scale deltas 💰 |
| Team Isolation | Separate databases | Independent Copy-on-Write |
| Delete Clone | Drop database | Drop without affecting source |
| CI/CD Friendly | Too slow | Perfect for automation |
| Cost for Cloud | 4x storage cost | ~1x storage cost |
Key Benefits
For Development Teams 👨💻
- ✅ Experiment Freely: Test schema changes without risk
- ✅ Parallel Development: Multiple branches, multiple clones
- ✅ Fast Iteration: Create → Test → Delete in seconds
- ✅ No Conflicts: Each developer gets isolated environment
For QA Teams 🧪
- ✅ Destructive Testing: Run tests that modify/delete data
- ✅ Parallel Testing: Multiple test suites, multiple clones
- ✅ Fresh State: New clone for each test run
- ✅ Production Parity: Test on real production data
For Data Science Teams 📊
- ✅ Large Datasets: Clone TB-scale data instantly
- ✅ Experiment Tracking: One clone per experiment
- ✅ Model Training: Full production data for ML
- ✅ No Interference: Train models without affecting prod
For CI/CD Pipelines 🔄
- ✅ Fast Provisioning: Spin up test DB in seconds
- ✅ Cost Effective: No storage explosion
- ✅ Automated Testing: Clone → Test → Delete
- ✅ Scalable: Handle hundreds of parallel jobs
Multi-Team Workflow Diagram
```mermaid
graph TD
    Production["🏢 Production Database<br>1TB - User Behavior Logs<br>Millions of rows"]
    Snapshot["📸 Optional: Snapshot<br>Point-in-time backup<br>< 1 second"]
    DSClone["📊 Data Science Clone<br>⚡ 5 seconds<br>💰 +10MB storage"]
    QAClone["🧪 QA Clone<br>⚡ 5 seconds<br>💰 +5MB storage"]
    DevClone["👨‍💻 Dev Clone<br>⚡ 5 seconds<br>💰 +8MB storage"]
    TTClone["⏰ Time-Travel Clone<br>From snapshot<br>💰 +0MB storage"]
    DSWork["ML Model Training<br>Modify 100 rows<br>Add predictions"]
    QAWork["Integration Tests<br>Delete test data<br>Run destructive tests"]
    DevWork["Schema Changes<br>Add indexes<br>Insert test data"]
    TTWork["Historical Testing<br>Yesterday's data<br>Regression tests"]
    DSDelete["🗑️ Delete DS Clone<br>Production unaffected"]
    QADelete["🗑️ Delete QA Clone<br>Other clones unaffected"]
    DevDelete["🗑️ Delete Dev Clone<br>Work independently"]

    Production --> Snapshot
    Production --> DSClone
    Production --> QAClone
    Production --> DevClone
    Snapshot --> TTClone
    DSClone --> DSWork
    QAClone --> QAWork
    DevClone --> DevWork
    TTClone --> TTWork
    DSWork --> DSDelete
    QAWork --> QADelete
    DevWork --> DevDelete

    style Production fill:#d4edda,stroke:#28a745,stroke-width:3px
    style Snapshot fill:#fff3cd
    style DSClone fill:#d1ecf1
    style QAClone fill:#d1ecf1
    style DevClone fill:#d1ecf1
    style TTClone fill:#d1ecf1
    style DSWork fill:#e2e3e5
    style QAWork fill:#e2e3e5
    style DevWork fill:#e2e3e5
    style TTWork fill:#e2e3e5
```
Workflow Explanation
| Step | Action | Time | Storage | Team Isolation |
|---|---|---|---|---|
| 1️⃣ Production | Large database running | - | 1TB | Source data |
| 2️⃣ Clone DS | Data Science team | 5s | +0MB | Independent |
| 3️⃣ Clone QA | QA team | 5s | +0MB | Independent |
| 4️⃣ Clone Dev | Dev team | 5s | +0MB | Independent |
| 5️⃣ Modify | Each team works | - | +deltas | Isolated |
| 6️⃣ Delete | Clean up clones | 1s | Freed | No impact |
Key Points:
- 🟢 Green: Production database (untouched)
- 🔵 Blue: Team clones (instant, isolated)
- ⚪ Gray: Independent modifications
- All clones can be deleted without affecting production or each other!
Copy-on-Write Magic
How It Works:
When you clone a database:
- ✅ No data copying: Only metadata created (< 5 seconds)
- ✅ Shared storage: All clones read from same underlying data
- ✅ Write isolation: Modified data stored separately (Copy-on-Write)
- ✅ Independent lifecycle: Delete clones without affecting source
Example:
```
Production:  1TB
+ DS Clone:  0MB (shared read)
+ QA Clone:  0MB (shared read)
+ Dev Clone: 0MB (shared read)

After work:
+ DS modifies 100 rows → +10MB
+ QA deletes 500 rows  → +5MB
+ Dev adds 200 rows    → +8MB

Total storage: 1TB + 23MB (not 4TB!)
Savings: ~75% storage cost 💰
```
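The arithmetic above generalizes to any number of clones. A quick back-of-the-envelope helper (illustrative only; `storage_comparison` is our own sketch, not part of the SDK):

```python
def storage_comparison(base_gb, clone_deltas_gb):
    """Compare full-copy storage with Copy-on-Write storage.

    base_gb:         size of the source database in GB
    clone_deltas_gb: list with the modified-data delta of each clone, in GB
    """
    n = len(clone_deltas_gb)
    full_copy = base_gb * (n + 1)          # source + one full copy per clone
    cow = base_gb + sum(clone_deltas_gb)   # source + only the modified deltas
    savings = 1 - cow / full_copy
    return full_copy, cow, savings

# Numbers from the example above: 1TB base, 10/5/8 MB deltas (in GB)
full, cow, saved = storage_comparison(1024, [0.010, 0.005, 0.008])
print(f"Full copy: {full:.0f} GB, CoW: {cow:.3f} GB, saved: {saved:.1%}")
```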
MatrixOne Python SDK Documentation
For complete API reference, see MatrixOne Python SDK Documentation
Before You Start
Prerequisites
- MatrixOne database installed and running
- Python 3.7 or higher
- MatrixOne Python SDK installed
```bash
pip3 install matrixone-python-sdk
```
Import Required Libraries
```python
from matrixone import Client, SnapshotLevel
from matrixone.config import get_connection_params
from matrixone.orm import declarative_base
from matrixone.sqlalchemy_ext import create_vector_column
from sqlalchemy import BigInteger, Column, String, Integer, Float, Text
from datetime import datetime, timedelta
import time
import numpy as np
```
Complete Working Example
Phase 1: Setup Production Database
Connect to Database
```python
from matrixone import Client
from matrixone.config import get_connection_params

# Connect to MatrixOne
host, port, user, password, database = get_connection_params()
client = Client()
client.connect(host=host, port=port, user=user, password=password, database=database)
print(f"Connected to {host}:{port}/{database}")
```
Create Production Database with Large Table
```python
from matrixone.orm import declarative_base
from matrixone.sqlalchemy_ext import create_vector_column
from sqlalchemy import BigInteger, Column, String, Integer, Float

Base = declarative_base()

# Define large production table
class UserBehavior(Base):
    """Large production table: user behavior logs"""
    __tablename__ = "user_behavior"

    id = Column(BigInteger, primary_key=True, autoincrement=True)
    user_id = Column(BigInteger)
    product_id = Column(BigInteger)
    action = Column(String(50))  # view, click, purchase
    timestamp = Column(BigInteger)
    session_id = Column(String(100))
    device_type = Column(String(50))
    price = Column(Float)
    quantity = Column(Integer)
    embedding = create_vector_column(128, "f32")  # Behavior embedding

# Create production database
prod_db = "production_data"
client.execute(f"CREATE DATABASE IF NOT EXISTS {prod_db}")

# Connect to production
prod_client = Client()
prod_client.connect(host=host, port=port, user=user, password=password, database=prod_db)

# Create table
prod_client.create_table(UserBehavior)
print(f"Created production table: {UserBehavior.__tablename__}")
```
Insert Large Dataset
```python
import numpy as np

# Simulate a large production dataset (1,000 rows stands in for millions)
actions = ["view", "click", "add_to_cart", "purchase", "review"]
devices = ["mobile", "desktop", "tablet"]

batch_data = []
for i in range(1000):
    batch_data.append({
        "user_id": (i % 100) + 1,
        "product_id": (i % 50) + 1,
        "action": actions[i % len(actions)],
        "timestamp": int(time.time() * 1000) + i,
        "session_id": f"session_{i // 10}",
        "device_type": devices[i % len(devices)],
        "price": round(10 + (i % 500) * 1.5, 2),
        "quantity": (i % 5) + 1,
        "embedding": np.random.rand(128).astype(np.float32).tolist()
    })

prod_client.batch_insert(UserBehavior, batch_data)

total_records = prod_client.query(UserBehavior).count()
print(f"Production ready: {total_records:,} records (sample data; stands in for a TB-scale table)")

prod_client.disconnect()
```
Phase 2: Data Science Team - Clone for ML Training
Instant Clone for Data Science
```python
# ⚡ INSTANT CLONE: a 1TB database clones in < 5 seconds
# 💰 ZERO STORAGE OVERHEAD: 1TB stays 1TB (Copy-on-Write)
ds_db = "datasci_experiment_ml"

clone_start = time.time()

# Clone production database - instant operation!
client.clone.clone_database(
    target_db=ds_db,
    source_db=prod_db
)

clone_time = time.time() - clone_start
print(f"Data Science clone completed in {clone_time:.2f} seconds")
print("No data copied - metadata operation only")
print("Storage: ~0 MB additional (Copy-on-Write)")
```
Data Science Work: ML Model Training
```python
# Connect to DS database
ds_client = Client()
ds_client.connect(host=host, port=port, user=user, password=password, database=ds_db)

# Verify the clone has the same data
ds_count = ds_client.query(UserBehavior).count()
print(f"DS clone verified: {ds_count:,} records")

# Add ML model predictions (triggers Copy-on-Write)
for i in range(100):
    ds_client.query(UserBehavior).filter(
        UserBehavior.id == i + 1
    ).update(price=999.99)  # Mark as processed by ML model

print("Updated 100 records with ML predictions")
print("Only modified rows are stored (Copy-on-Write)")
print("Production data: completely unaffected")

ds_client.disconnect()
```
Phase 3: QA Team - Clone for Integration Testing
Instant Clone for QA
```python
qa_db = "qa_integration_test"

# ⚡ Another instant clone - still no storage overhead!
client.clone.clone_database(
    target_db=qa_db,
    source_db=prod_db
)

print(f"QA clone created: {qa_db}")
print("QA can run destructive tests safely")
```
QA Work: Destructive Testing
```python
# Connect to QA database
qa_client = Client()
qa_client.connect(host=host, port=port, user=user, password=password, database=qa_db)

# Run destructive tests - delete data
qa_client.query(UserBehavior).filter(
    UserBehavior.action == "purchase"
).delete().execute()

qa_count = qa_client.query(UserBehavior).count()
print("QA deleted purchase records for testing")
print(f"QA database now: {qa_count:,} records")
print("Production: unaffected")
print("DS clone: unaffected")

qa_client.disconnect()
```
Phase 4: Dev Team - Clone for Schema Experimentation
Instant Clone for Development
```python
dev_db = "dev_schema_experiment"

# ⚡ Third instant clone - still efficient!
client.clone.clone_database(
    target_db=dev_db,
    source_db=prod_db
)

print(f"Dev clone created: {dev_db}")
```
Dev Work: Schema Changes and Testing
```python
# Connect to Dev database
dev_client = Client()
dev_client.connect(host=host, port=port, user=user, password=password, database=dev_db)

# Experiment with a vector index
dev_client.vector_ops.create_ivf(
    UserBehavior,
    "idx_embedding_test",
    "embedding",
    lists=10,
    op_type="vector_l2_ops"
)
print("Created IVF index on embedding column")
print("Testing vector search performance")

# Insert test data
test_records = []
for i in range(50):
    test_records.append({
        "user_id": 999,  # Test user
        "product_id": 999,
        "action": "test_action",
        "timestamp": int(time.time() * 1000),
        "session_id": f"test_session_{i}",
        "device_type": "test_device",
        "price": 0.01,
        "quantity": 1,
        "embedding": np.random.rand(128).astype(np.float32).tolist()
    })

dev_client.batch_insert(UserBehavior, test_records)

dev_count = dev_client.query(UserBehavior).count()
print(f"Dev inserted {len(test_records)} test records")
print(f"Dev database: {dev_count:,} records")

dev_client.disconnect()
```
Phase 5: Verify Production Integrity
```python
# Reconnect to production
prod_client = Client()
prod_client.connect(host=host, port=port, user=user, password=password, database=prod_db)

prod_count = prod_client.query(UserBehavior).count()
test_users = prod_client.query(UserBehavior).filter(UserBehavior.user_id == 999).count()

print("\nProduction Database Integrity Check:")
print(f"Original records: {total_records:,}")
print(f"Current records:  {prod_count:,}")
print(f"Data integrity: {'PRESERVED' if prod_count == total_records else 'MODIFIED'}")
print(f"Test records: {test_users} (expected: 0)")
print("\nAll team clones are completely isolated!")

prod_client.disconnect()
```
Phase 6: Storage Efficiency Analysis
```python
print("\n" + "=" * 70)
print("Storage Efficiency Comparison")
print("=" * 70)

print("\nTraditional Copy Approach:")
print("Production: 10 GB")
print("DS Clone:   10 GB (full copy)")
print("QA Clone:   10 GB (full copy)")
print("Dev Clone:  10 GB (full copy)")
print("─────────────────────")
print("Total: 40 GB")
print("Time:  ~90 minutes (30 min × 3)")

print("\nMatrixOne Clone (Copy-on-Write):")
print("Production: 10 GB")
print("DS Clone:   ~0 GB + 10 MB (modified data)")
print("QA Clone:   ~0 GB + 5 MB (modified data)")
print("Dev Clone:  ~0 GB + 8 MB (modified data)")
print("─────────────────────")
print("Total: ~10.023 GB")
print("Time:  ~15 seconds (5 sec × 3)")

print("\nSavings:")
print("Storage: ~75% saved (29.977 GB)")
print("Time: 99.7% faster (89.75 min saved)")
print("Cloud cost: ~75% reduction")
print("Team productivity: Unlimited!")
```
Phase 7: Independent Clone Deletion
```python
# Delete QA clone - production and other clones unaffected
client.execute(f"DROP DATABASE {qa_db}")
print(f"\nDropped QA database: {qa_db}")

# Verify production is still intact
prod_client = Client()
prod_client.connect(host=host, port=port, user=user, password=password, database=prod_db)

prod_count_final = prod_client.query(UserBehavior).count()
print("Production after QA clone deletion:")
print(f"Records: {prod_count_final:,} (unchanged)")
print("Other clones (DS, Dev): still accessible")

prod_client.disconnect()
```
Phase 8: Time-Travel Testing with Snapshots
```python
# Create snapshot of current state
snapshot_name = f"prod_snapshot_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
client.snapshots.create(
    name=snapshot_name,
    level=SnapshotLevel.DATABASE,
    database=prod_db
)
print(f"\nCreated snapshot: {snapshot_name}")

# Simulate production receiving new data (today's data)
prod_client = Client()
prod_client.connect(host=host, port=port, user=user, password=password, database=prod_db)

new_data = []
for i in range(100):
    new_data.append({
        "user_id": 200 + i,
        "product_id": 1,
        "action": "new_purchase",
        "timestamp": int(time.time() * 1000),
        "session_id": f"new_session_{i}",
        "device_type": "mobile",
        "price": 99.99,
        "quantity": 1,
        "embedding": np.random.rand(128).astype(np.float32).tolist()
    })

prod_client.batch_insert(UserBehavior, new_data)

new_total = prod_client.query(UserBehavior).count()
print("Production received 100 new records")
print(f"Production now: {new_total:,} records")
prod_client.disconnect()

# Clone from snapshot (yesterday's data)
timetravel_db = "test_yesterday_data"
client.clone.clone_database_with_snapshot(
    target_db=timetravel_db,
    source_db=prod_db,
    snapshot_name=snapshot_name
)

# Verify historical data
tt_client = Client()
tt_client.connect(host=host, port=port, user=user, password=password, database=timetravel_db)

tt_count = tt_client.query(UserBehavior).count()
print("\nTime-travel clone created:")
print(f"Production (today): {new_total:,} records")
print(f"Clone (snapshot):   {tt_count:,} records")
print("Testing against historical data!")

tt_client.disconnect()
```
Clone Operations Reference
Basic Clone Operations
Clone Current Database
```python
# Clone the current state of a database
client.clone.clone_database(
    target_db="new_database_name",
    source_db="source_database"
)
```
Use Cases:
- Quick test environment
- Parallel development branches
- Data Science experiments
- QA testing
Clone from Snapshot
```python
# Clone a historical state from a snapshot
client.clone.clone_database_with_snapshot(
    target_db="historical_clone",
    source_db="production",
    snapshot_name="prod_snapshot_20250110"
)
```
Use Cases:
- Time-travel testing
- Compare before/after states
- Regression testing
- Historical analysis
Snapshot Operations
Create Snapshot
```python
client.snapshots.create(
    name="my_snapshot",
    level=SnapshotLevel.DATABASE,
    database="production"
)
```
List Snapshots
```python
snapshots = client.snapshots.list()
for snap in snapshots:
    print(f"Snapshot: {snap.name}, Created: {snap.created_at}")
```
Delete Snapshot
```python
client.snapshots.delete("my_snapshot")
```
Best Practices
1. Use Clones for Every Test Run
Fresh Start Every Time
```python
def run_test_suite():
    """Create a fresh clone for each test run"""
    test_db = f"test_run_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

    # Clone production
    client.clone.clone_database(
        target_db=test_db,
        source_db="production"
    )

    # Run tests
    try:
        run_tests_on(test_db)
    finally:
        # Clean up
        client.execute(f"DROP DATABASE {test_db}")
```
2. Leverage Copy-on-Write for CI/CD
Parallel CI Jobs
```python
# Each CI job gets its own clone - no storage penalty!

# Job 1: Unit tests
client.clone.clone_database(target_db="ci_job_1", source_db="prod")

# Job 2: Integration tests
client.clone.clone_database(target_db="ci_job_2", source_db="prod")

# Job 3: Performance tests
client.clone.clone_database(target_db="ci_job_3", source_db="prod")

# Total time: ~15 seconds for all 3
# Total storage: ~production size (not 3x!)
```
3. Use Snapshots for Time-Travel
Historical Testing
```python
# Daily snapshot
client.snapshots.create(
    name=f"daily_{datetime.now().strftime('%Y%m%d')}",
    level=SnapshotLevel.DATABASE,
    database="production"
)

# Clone from last week's snapshot for a regression test
client.clone.clone_database_with_snapshot(
    target_db="regression_test",
    source_db="production",
    snapshot_name="daily_20250103"
)
```
4. Clean Up Clones Regularly
Automated Cleanup
```python
def cleanup_old_clones(prefix="test_", days_old=7):
    """Drop clones older than N days.

    Assumes clones follow the naming convention used in this tutorial,
    ending in a _YYYYMMDD_HHMMSS timestamp suffix.
    """
    cutoff = datetime.now() - timedelta(days=days_old)

    # List all databases
    databases = client.execute("SHOW DATABASES")
    for db in databases:
        name = db['name']
        if not name.startswith(prefix):
            continue
        # Recover the creation time from the timestamp suffix
        try:
            created = datetime.strptime("_".join(name.split("_")[-2:]), "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # Name doesn't follow the convention - leave it alone
        if created < cutoff:
            client.execute(f"DROP DATABASE {name}")
```
5. Name Clones Descriptively
```python
def generate_clone_name(team, purpose):
    """Generate a descriptive, timestamped clone name"""
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    return f"{team}_{purpose}_{timestamp}"

# Examples
ds_clone = generate_clone_name("datasci", "ml_training")
# → "datasci_ml_training_20250110_143052"

qa_clone = generate_clone_name("qa", "integration_test")
# → "qa_integration_test_20250110_143052"
```
Performance Characteristics
Clone Operations
Real-World Performance:
| Database Size | Clone Time | Storage After Clone | Storage After Modifications |
|---|---|---|---|
| 100GB | < 3 seconds | 100GB (no change) | 100GB + deltas |
| 1TB | < 5 seconds | 1TB (no change) | 1TB + deltas |
| 10TB | < 10 seconds | 10TB (no change) | 10TB + deltas |
Multi-Team Scenario:
```
Production: 1TB
+ 10 team clones created: ~10 seconds total
+ Storage after cloning: 1TB (unchanged!)

After 1 week of work:
+ Each team modifies ~1% of data: +10GB per clone
+ Total storage: 1TB + (10 × 10GB) = 1.1TB
+ Traditional approach: 1TB × 11 = 11TB

Savings: 90% storage cost! 💰
```
Snapshot Operations
Create Snapshot:
- ⚡ < 2 seconds for any size database
- 📦 Metadata operation only
- 💾 No storage overhead initially
Clone from Snapshot:
- ⚡ Same as regular clone (< 5 seconds)
- 📅 Access historical data instantly
- 💾 Copy-on-Write applies
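To verify these numbers in your own environment, a small timing helper can wrap any clone call. Note that `timed` below is our own sketch, not an SDK feature:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print the wall-clock time of the wrapped block - useful for
    confirming that clones really are metadata-only operations."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.2f}s")

# Usage with the client from earlier in this tutorial:
# with timed("clone production -> perf_check"):
#     client.clone.clone_database(target_db="perf_check", source_db="production")
```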
Use Cases and Examples
Use Case 1: Data Science Experimentation
```python
# Each data scientist gets their own clone
for scientist in ["alice", "bob", "charlie"]:
    clone_db = f"datasci_{scientist}_experiment"
    client.clone.clone_database(
        target_db=clone_db,
        source_db="production"
    )
    print(f"Created clone for {scientist}")

# Result: 3 full production copies in ~15 seconds
# Storage: ~production size (not 3x!)
```
Use Case 2: Blue-Green Deployment Testing
```python
# Current production (blue)
blue_db = "production_v1"

# Create green environment for v2 testing
green_db = "production_v2_candidate"
client.clone.clone_database(
    target_db=green_db,
    source_db=blue_db
)

# Test v2 changes on green
test_results = run_v2_tests(green_db)

if test_results.success:
    # Promote green to production
    client.execute(f"RENAME DATABASE {green_db} TO {blue_db}")
else:
    # Discard green, keep blue
    client.execute(f"DROP DATABASE {green_db}")
```
Use Case 3: Parallel A/B Testing
```python
# Create multiple variants for A/B testing
variants = ["control", "variant_a", "variant_b", "variant_c"]

results = {}
for variant in variants:
    clone_db = f"ab_test_{variant}"
    client.clone.clone_database(
        target_db=clone_db,
        source_db="production"
    )
    # Apply variant-specific changes
    apply_variant_changes(clone_db, variant)
    # Run tests and keep per-variant metrics
    results[variant] = collect_metrics(clone_db)

print("4 parallel A/B tests completed")
print("Time: ~20 seconds")
print("Storage: ~production size")
```
Use Case 4: CI/CD Pipeline Integration
```python
# .github/workflows/ci.yml equivalent in Python
def ci_pipeline(branch_name):
    """CI pipeline with an isolated database per branch"""
    # 1. Create test database for this branch
    test_db = f"ci_{branch_name}_{int(time.time())}"
    client.clone.clone_database(
        target_db=test_db,
        source_db="production_snapshot"
    )

    # 2. Run migrations
    apply_migrations(test_db)

    # 3. Run tests
    test_results = run_test_suite(test_db)

    # 4. Clean up
    client.execute(f"DROP DATABASE {test_db}")
    return test_results

# Each PR gets an isolated test environment
# No conflicts between parallel CI jobs
# No storage explosion
```
Troubleshooting
Issue: Clone Takes Longer Than Expected
Symptoms: Clone operation takes > 10 seconds
Possible Causes:
- Network latency
- Database has many small files
- First clone after MatrixOne restart
Solution:
```python
# The first clone after a MatrixOne restart may take longer
# while metadata warms up; subsequent clones should be faster.
```
Issue: Cannot Drop Clone Database
Symptoms: Error when trying to drop cloned database
Possible Causes:
- Active connections to clone
- Clone being used by another process
Solution:
```python
# Disconnect all clients that are connected to the clone first
# (clone_client here stands for whichever client holds the connection)
clone_client.disconnect()

# Then drop the database from another connection
client.execute(f"DROP DATABASE {clone_db}")
```
Issue: Storage Growing Faster Than Expected
Symptoms: Storage usage higher than expected with Copy-on-Write
Possible Causes:
- Many modifications to cloned data
- Large bulk inserts/updates
Explanation:
```python
# Copy-on-Write stores deltas:
# if you modify 50% of the cloned data, storage grows by 50%.
# That is still better than a traditional copy (100% overhead).
#
# Example:
#   Production:         1TB
#   Clone + modify 50%: 1TB + 0.5TB = 1.5TB
#   Traditional copy:   1TB + 1TB   = 2TB
#   Still 25% savings!
```
Solution:
- Drop clones you no longer need
- Use snapshots for read-only historical access
- Consider storage budget when planning modifications
Summary
MatrixOne's Copy-on-Write cloning enables:
✅ Instant Cloning
- 1TB database cloned in < 5 seconds
- No waiting for data copy
- Perfect for rapid iteration
✅ Storage Efficiency
- 10 clones ≈ 1x storage (not 10x!)
- Copy-on-Write stores only changes
- 75-90% storage savings
✅ Team Productivity
- Each team gets isolated environment
- No conflicts between teams
- Parallel testing and development
✅ Cost Reduction
- Massive cloud storage savings
- Reduced infrastructure costs
- Better resource utilization
✅ CI/CD Friendly
- Fast test database provisioning
- Parallel job execution
- Automated workflows
Key Operations:
```python
# Clone current database
client.clone.clone_database(target_db="new_db", source_db="source_db")

# Clone from snapshot
client.clone.clone_database_with_snapshot(
    target_db="historical_clone",
    source_db="production",
    snapshot_name="my_snapshot"
)

# Create snapshot
client.snapshots.create(name="backup", level=SnapshotLevel.DATABASE, database="prod")

# Drop clone (no impact on source)
client.execute("DROP DATABASE clone_db")
```
Perfect For:
- 👨💻 Multi-team development
- 🧪 CI/CD pipelines
- 📊 Data Science experiments
- 🔬 Schema migrations
- 🧪 Integration testing
- ⏰ Time-travel debugging
Start leveraging MatrixOne's efficient cloning today and transform your team's workflow! 🚀