Skip to content

Fulltext Boolean Search Demo

Overview

This tutorial demonstrates Boolean mode fulltext search in MatrixOne Python SDK. Boolean mode provides precise control over search logic using operators, perfect for advanced search interfaces and filtering.

Boolean Operators:

  • MUST (+): Required terms (AND logic)
  • MUST_NOT (-): Excluded terms (NOT logic)
  • 📈 ENCOURAGE: Optional terms that boost relevance
  • 📉 DISCOURAGE (~): Terms that reduce relevance
  • 🔤 PHRASE (""): Exact phrase matching

Perfect For:

  • Advanced search interfaces
  • Precise query control
  • Power user features
  • Complex filtering logic
  • Professional search tools

MatrixOne Python SDK Documentation

For complete API reference, see MatrixOne Python SDK Documentation

Before You Start

Prerequisites

  • MatrixOne database installed and running
  • Python 3.7 or higher
  • MatrixOne Python SDK installed

Install SDK

pip3 install matrixone-python-sdk

Complete Working Example

Save this as boolean_search_demo.py and run with python3 boolean_search_demo.py:

from matrixone import Client, FulltextIndex, FulltextAlgorithmType
from matrixone.config import get_connection_params
from matrixone.sqlalchemy_ext import boolean_match
from matrixone.orm import declarative_base
from sqlalchemy import BigInteger, Column, String, Text, Integer, Float

print("="* 70)
print("MatrixOne Boolean Mode Fulltext Search Demo")
print("="* 70)

# Connect to database
host, port, user, password, database = get_connection_params(database='demo')
client = Client()
client.connect(host=host, port=port, user=user, password=password, database=database)
print(f"Successfully connected to database")

# Define table structure
Base = declarative_base()

class Article(Base):
    """Article table with BM25 fulltext search"""
    __tablename__ = "boolean_search_articles"

    id = Column(BigInteger, primary_key=True)
    title = Column(String(200))
    content = Column(Text)
    category = Column(String(100))
    author = Column(String(100))
    views = Column(Integer)
    rating = Column(Float)

    __table_args__ = (
        FulltextIndex("idx_article_search", ["title", "content"],
                     algorithm=FulltextAlgorithmType.BM25),
    )

# Create table
client.fulltext_index.enable_fulltext()
client.execute('SET ft_relevancy_algorithm = "BM25"')
client.drop_table(Article)
client.create_table(Article)
print("Table created with BM25 index")

# Insert sample articles
sample_articles = [
    {
        "id": 1,
        "title": "Introduction to Machine Learning",
        "content": "Machine learning is a subset of AI...",
        "category": "AI",
        "author": "Alice Johnson",
        "views": 1500,
        "rating": 4.5
    },
    # ... more articles
]

client.batch_insert(Article, sample_articles)
print(f"Inserted {len(sample_articles)} articles")

# MUST search - both terms required
print("\nMUST search: 'machine' AND 'learning'")
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).must("machine", "learning")
).execute()

for row in results.fetchall():
    print(f"- {row.title}")

# MUST_NOT search - exclude terms
print("\nMUST 'programming' but NOT 'legacy'")
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("programming")
    .must_not("legacy")
).execute()

for row in results.fetchall():
    print(f"- {row.title}")

# Cleanup
client.disconnect()
print("\n Demo completed!")

Boolean Operators Explained

1. MUST (+) - Required Terms

All specified terms must appear in the document (AND logic):

# Both 'machine' AND 'learning' must be present
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).must("machine", "learning")
).execute()

Equivalent to traditional syntax: +machine +learning

2. MUST_NOT (-) - Excluded Terms

Specified terms must not appear in the document:

# Must contain 'programming' but must NOT contain 'legacy'
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("programming")
    .must_not("legacy")
).execute()

Equivalent to traditional syntax: +programming -legacy

3. ENCOURAGE - Boost Relevance

Optional terms that increase relevance score if present:

# Must have 'Python', prefer articles with 'data' or 'science'
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("Python")
    .encourage("data", "science")
).execute()

Behavior: Articles with encouraged terms rank higher, but articles without them still match.

4. DISCOURAGE (~) - Reduce Relevance

Terms that decrease relevance score if present:

# Must have 'Python', discourage 'legacy' (but still matches)
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("Python")
    .encourage("practices")
    .discourage("legacy")
).execute()

Equivalent to traditional syntax: +Python practices ~legacy

5. PHRASE - Exact Phrase Match

Match exact phrase (multiple words in order):

# Find exact phrase "neural networks"
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).phrase("neural networks")
).execute()

# Find exact phrase "best practices"
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).phrase("best practices")
).execute()

Equivalent to traditional syntax: "neural networks"

Usage Examples

Simple MUST Searches

Single Required Term

# Find all articles about 'Python'
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).must("Python")
).execute()

Multiple Required Terms

# Must contain both 'machine' and 'learning'
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).must("machine", "learning")
).execute()

Excluding Terms with MUST_NOT

Exclude Single Term

# Contains 'learning' but excludes 'deep'
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("learning")
    .must_not("deep")
).execute()

Exclude Multiple Terms

# Contains 'programming' but excludes both 'legacy' and 'deprecated'
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("programming")
    .must_not("legacy", "deprecated")
).execute()

Boosting with ENCOURAGE

# Find Python articles, boost those about data science
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("Python")
    .encourage("data", "science")
).execute()

# Articles with 'data' or 'science' will rank higher

Reducing Relevance with DISCOURAGE

# Find programming articles, lower rank for legacy code
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("programming")
    .discourage("legacy", "deprecated")
).execute()

# Legacy articles still match but appear lower in results

Exact Phrase Matching

# Find exact phrase
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).phrase("neural networks")
).execute()

# Only matches documents with "neural" followed by "networks"

Complex Combinations

Combine multiple operators for sophisticated queries:

# MUST 'learning' + ENCOURAGE 'machine'/'deep' + MUST_NOT 'legacy'
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("learning")
    .encourage("machine", "deep")
    .must_not("legacy")
).execute()

Combine with WHERE Conditions

Filter by category, author, rating, etc.:

# Boolean search + category filter
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).must("learning")
).filter(
    Article.category == "AI"
).filter(
    Article.rating >= 4.5
).execute()

# Boolean search + author filter
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).must("machine")
).filter(
    Article.author == "Alice Johnson"
).execute()

# Boolean search + views filter
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).must("programming")
).filter(
    Article.views > 1000
).execute()

Sorting Results

# Sort by views (descending)
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).encourage("programming", "development")
).order_by(Article.views.desc()).limit(5).execute()

# Sort by rating (descending)
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).must("learning")
).order_by(Article.rating.desc()).limit(5).execute()

# Multiple sort fields
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).must("Python")
).order_by(
    Article.rating.desc(),
    Article.views.desc()
).execute()

Sample Data

Complete sample data with 8 articles covering various topics:

sample_articles = [
    {
        "id": 1,
        "title": "Introduction to Machine Learning",
        "content": "Machine learning is a subset of artificial intelligence that enables computers to learn from data without explicit programming.",
        "category": "AI",
        "author": "Alice Johnson",
        "views": 1500,
        "rating": 4.5
    },
    {
        "id": 2,
        "title": "Python Programming Best Practices",
        "content": "Python is a versatile programming language. This article covers best practices for writing clean, maintainable Python code.",
        "category": "Programming",
        "author": "Bob Smith",
        "views": 2300,
        "rating": 4.8
    },
    {
        "id": 3,
        "title": "Deep Learning with Neural Networks",
        "content": "Deep learning uses multi-layer neural networks to learn complex patterns in data for tasks like image recognition.",
        "category": "AI",
        "author": "Alice Johnson",
        "views": 3200,
        "rating": 4.7
    },
    {
        "id": 4,
        "title": "Web Development with JavaScript",
        "content": "JavaScript is essential for modern web development. Learn DOM manipulation, async programming, and popular frameworks.",
        "category": "Web",
        "author": "Carol Williams",
        "views": 1800,
        "rating": 4.3
    },
    {
        "id": 5,
        "title": "Database Design Principles",
        "content": "Good database design is crucial for application performance. Covers normalization, indexing, and query optimization.",
        "category": "Database",
        "author": "David Brown",
        "views": 1200,
        "rating": 4.6
    },
    {
        "id": 6,
        "title": "Machine Learning Algorithms",
        "content": "Comprehensive guide to machine learning algorithms including supervised learning, unsupervised learning techniques.",
        "category": "AI",
        "author": "Alice Johnson",
        "views": 2800,
        "rating": 4.9
    },
    {
        "id": 7,
        "title": "Python Data Science Tools",
        "content": "Python offers powerful libraries for data science including NumPy, Pandas, Matplotlib for data analysis.",
        "category": "Data Science",
        "author": "Emma Davis",
        "views": 2100,
        "rating": 4.4
    },
    {
        "id": 8,
        "title": "Legacy Code Maintenance",
        "content": "Working with legacy code requires patience and careful refactoring to maintain functionality.",
        "category": "Programming",
        "author": "Frank Miller",
        "views": 900,
        "rating": 3.8
    },
]

Boolean vs Natural Language Mode

When to Use Boolean Mode

Use Boolean mode when:

  • ✅ You need precise control over term inclusion/exclusion
  • ✅ Building advanced search interfaces with filters
  • ✅ Users are familiar with search operators
  • ✅ You need exact phrase matching
  • ✅ Fine-tuning relevance is important

When to Use Natural Language Mode

Use Natural Language mode when:

  • ✅ Building simple, user-friendly search boxes
  • ✅ Users are not technical (like Google search)
  • ✅ Automatic stopword removal is desired
  • ✅ Question-like queries are common
  • ✅ Simplicity is more important than precision

Comparison Table

Feature Boolean Mode Natural Language Mode
Operators Explicit (+, -, ~, "") None needed
Control Precise Automatic
Stopwords Manual Automatic removal
Use Case Advanced users General users
Relevance Fine-tunable Auto-optimized
Example +python -legacy python programming

Advanced Use Cases

class Product(Base):
    __tablename__ = "products"
    name = Column(String(200))
    description = Column(Text)
    brand = Column(String(100))
    price = Column(Float)

    __table_args__ = (
        FulltextIndex("idx_product_search", ["name", "description"],
                     algorithm=FulltextAlgorithmType.BM25),
    )

# Find laptops, prefer gaming, exclude refurbished
results = client.query(Product).filter(
    boolean_match(Product.name, Product.description)
    .must("laptop")
    .encourage("gaming")
    .must_not("refurbished")
).filter(
    Product.price < 2000
).execute()
class JobPosting(Base):
    __tablename__ = "jobs"
    title = Column(String(200))
    description = Column(Text)
    requirements = Column(Text)

    __table_args__ = (
        FulltextIndex("idx_job_search", ["title", "description", "requirements"],
                     algorithm=FulltextAlgorithmType.BM25),
    )

# Find Python jobs, prefer senior, exclude internships
results = client.query(JobPosting).filter(
    boolean_match(JobPosting.title, JobPosting.description, JobPosting.requirements)
    .must("Python")
    .encourage("senior", "lead")
    .must_not("intern", "junior")
).execute()
class ResearchPaper(Base):
    __tablename__ = "papers"
    title = Column(String(500))
    abstract = Column(Text)
    keywords = Column(String(500))

    __table_args__ = (
        FulltextIndex("idx_paper_search", ["title", "abstract", "keywords"],
                     algorithm=FulltextAlgorithmType.BM25),
    )

# Find ML papers, prefer deep learning, exclude surveys
results = client.query(ResearchPaper).filter(
    boolean_match(ResearchPaper.title, ResearchPaper.abstract)
    .must("machine learning")
    .encourage("deep learning", "neural")
    .must_not("survey", "review")
).execute()

4. News Article Search

# Recent tech news, exclude politics
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("technology")
    .encourage("AI", "blockchain", "cloud")
    .must_not("politics", "election")
).filter(
    Article.published_date > "2025-01-01"
).order_by(Article.published_date.desc()).execute()

Best Practices

1. Use MUST for Required Terms

# Good: Explicitly mark required terms
boolean_match(Article.title, Article.content).must("Python", "tutorial")

# Avoid: Relying only on ENCOURAGE for critical terms
boolean_match(Article.title, Article.content).encourage("Python", "tutorial")

2. Combine Operators Wisely

# Good: Clear search intent
boolean_match(Article.title, Article.content)
.must("programming")        # Required
.encourage("Python", "Go")  # Prefer these languages
.must_not("deprecated")     # Exclude old content

# Avoid: Too many MUST terms (overly restrictive)
boolean_match(Article.title, Article.content)
.must("programming", "Python", "tutorial", "beginner", "guide")  # Too strict

3. Use Filters to Narrow Results

# Good: Filter by category first
results = client.query(Article).filter(
    Article.category == "AI"  # Fast indexed filter
).filter(
    boolean_match(Article.title, Article.content).must("learning")
).execute()

4. Sort for Better UX

# Sort by relevance + popularity
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).must("Python")
).order_by(Article.views.desc()).limit(10).execute()

5. Use PHRASE for Exact Matches

# When term order matters
boolean_match(Article.title, Article.content).phrase("machine learning")

# Better than:
boolean_match(Article.title, Article.content).must("machine", "learning")
# ^ This matches "learning machine" too

Operator Chaining Examples

# Find AI articles, prefer deep learning, exclude beginner content
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("artificial intelligence")
    .encourage("deep learning", "neural networks")
    .must_not("beginner", "introduction")
).filter(
    Article.rating >= 4.0
).execute()
# Find Python tutorials, prefer advanced, exclude legacy
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("Python", "tutorial")
    .encourage("advanced", "expert")
    .must_not("legacy", "deprecated", "Python 2")
).execute()

Example 3: Research Papers

# Find ML research, prefer transformers, exclude surveys
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("machine learning")
    .encourage("transformer", "attention", "BERT")
    .must_not("survey", "review paper")
).order_by(Article.views.desc()).execute()

Example 4: Product Reviews

# Find product reviews, must be positive, exclude negative terms
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content)
    .must("review")
    .encourage("excellent", "amazing", "great")
    .must_not("terrible", "awful", "broken")
).filter(
    Article.rating >= 4.0
).execute()

Troubleshooting

Issue: "No results with MUST operator"

Solution: Terms might be too restrictive, try ENCOURAGE instead

# Too restrictive - might return 0 results
boolean_match(Article.title, Article.content).must("machine", "learning", "deep", "neural")

# Better: Use MUST for key terms, ENCOURAGE for optional
boolean_match(Article.title, Article.content)
.must("machine", "learning")
.encourage("deep", "neural")

Issue: "Phrase search not working"

Solution: Ensure exact phrase exists in content

# Check if phrase exists
results = client.query(Article).filter(
    Article.content.like("%neural networks%")
).execute()

# Then try phrase search
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).phrase("neural networks")
).execute()

Issue: "ENCOURAGE not affecting ranking"

Solution: Ensure BM25 algorithm is set

# Set BM25 for proper relevance scoring
client.execute('SET ft_relevancy_algorithm = "BM25"')

# Verify it's set
result = client.execute("SHOW VARIABLES LIKE 'ft_relevancy_algorithm'")

Issue: "Search ignoring MUST_NOT"

Solution: Verify filter syntax is correct

# Correct: Use must_not() method
boolean_match(Article.title, Article.content)
.must("Python")
.must_not("legacy")  # ✅ Correct

# Wrong: Trying to use negative operator in string
boolean_match(Article.title, Article.content, query="+Python -legacy")  # ❌ Wrong

Performance Optimization

1. Add Category Filters

# Fast: Filter by category first
results = client.query(Article).filter(
    Article.category.in_(["AI", "Programming"])
).filter(
    boolean_match(Article.title, Article.content).must("learning")
).execute()

2. Use LIMIT

# Always limit results for pagination
results = client.query(Article).filter(
    boolean_match(Article.title, Article.content).must("Python")
).limit(20).execute()

3. Select Specific Columns

# Don't fetch all columns if not needed
results = client.query(
    Article.id, Article.title, Article.category
).filter(
    boolean_match(Article.title, Article.content).must("Python")
).execute()

Reference

Summary

Boolean mode fulltext search provides:

Precise Control: Use +, -, ~ operators for exact logic ✅ Complex Queries: Combine multiple operators ✅ Phrase Matching: Exact phrase search with "" ✅ Relevance Tuning: ENCOURAGE/DISCOURAGE for ranking ✅ Professional: Perfect for advanced search interfaces

Comparison:

  • Natural Language Mode: User-friendly, automatic (like Google)
  • Boolean Mode: Powerful, precise (like advanced search)

Choose based on your users' needs! 🚀