PDF Document Processing

Learn how to upload PDF documents to your knowledge base and extract insights through intelligent search and summarization.

What You’ll Learn

This guide demonstrates how to:

Upload PDF documents to your knowledge base
Check processing status for document content
Search within uploaded documents
Generate summaries from PDF content

Prerequisites

A Senso API key
A PDF file to upload (max size: 20MB)
Basic understanding of multipart form data uploads
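
A quick local check can catch an oversized or non-PDF file before any request is sent. The sketch below is illustrative rather than part of the Senso API: checkPdfBeforeUpload and MAX_PDF_BYTES are hypothetical names, the file bytes are assumed to be in memory as a Uint8Array, and the 20MB limit is assumed to mean 20 × 1024 × 1024 bytes.

```javascript
// Assumption: "20MB" is interpreted as 20 * 1024 * 1024 bytes.
const MAX_PDF_BYTES = 20 * 1024 * 1024;

// Hypothetical helper: returns a list of problems found in the file bytes.
// An empty list means the file passed both checks.
function checkPdfBeforeUpload(bytes) {
  const problems = [];

  if (bytes.length > MAX_PDF_BYTES) {
    const sizeMb = (bytes.length / 1024 / 1024).toFixed(2);
    problems.push(`File is ${sizeMb}MB; the limit is 20MB`);
  }

  // Well-formed PDF files begin with the ASCII signature "%PDF-".
  const signature = String.fromCharCode(...bytes.slice(0, 5));
  if (signature !== '%PDF-') {
    problems.push('File does not start with the %PDF- signature');
  }

  return problems;
}
```

Calling checkPdfBeforeUpload on the bytes of a file and refusing to upload when the returned list is non-empty avoids a round trip for files that would be rejected anyway.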

Uploading a PDF Document

const API_URL = 'https://sdk.senso.ai/api/v1';
const API_KEY = 'YOUR_API_KEY';

async function uploadPDFDocument() {
  try {
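    // Prepare form data
    const formData = new FormData();
    // Load the PDF as a Blob. A relative URL works in the browser;
    // in Node.js, read the file from disk instead.
    const pdfFile = await fetch('./documents/product-guide.pdf');
    const pdfBlob = await pdfFile.blob();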
    
    formData.append('file', pdfBlob, 'product-guide.pdf');
    formData.append('title', 'Product Guide 2024');
    formData.append('summary', 'Comprehensive guide to our product features and capabilities');
    
    // Upload the PDF document
    const response = await fetch(`${API_URL}/content/file`, {
      method: 'POST',
      headers: {
        'X-API-Key': API_KEY
      },
      body: formData
    });
    
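    // Stop early if the upload request itself was rejected
    if (!response.ok) {
      throw new Error(`Upload failed with HTTP ${response.status}`);
    }

    const content = await response.json();
    console.log('PDF uploaded successfully!');
    console.log('Content ID:', content.id);
    console.log('Processing status:', content.processing_status);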
    
    // Poll until processing completes; stop early if it fails
    let processedContent = content;
    while (processedContent.processing_status !== 'completed') {
      if (processedContent.processing_status === 'failed') {
        throw new Error('PDF processing failed');
      }
      await new Promise(resolve => setTimeout(resolve, 3000));

      const statusResponse = await fetch(`${API_URL}/content/${content.id}`, {
        headers: { 'X-API-Key': API_KEY }
      });
      processedContent = await statusResponse.json();
      console.log('Status:', processedContent.processing_status);
    }
    
    console.log('PDF processing completed!');
    
    // Search within the PDF
    const searchResponse = await fetch(`${API_URL}/search`, {
      method: 'POST',
      headers: {
        'X-API-Key': API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        query: 'What are the main product features?',
        max_results: 5
      })
    });
    
    const searchResult = await searchResponse.json();
    console.log('\nSearch Results:');
    console.log('Answer:', searchResult.answer);
    console.log('Number of sources:', searchResult.results.length);
    
    // Generate a summary
    const summaryResponse = await fetch(`${API_URL}/generate`, {
      method: 'POST',
      headers: {
        'X-API-Key': API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        content_type: 'product documentation PDF',
        instructions: 'Create a concise executive summary of the key points',
        max_results: 10
      })
    });
    
    const summary = await summaryResponse.json();
    console.log('\nGenerated Summary:');
    console.log(summary.generated_text);
    
  } catch (error) {
    console.error('Error processing PDF:', error);
  }
}

uploadPDFDocument();
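
The example above loads the PDF with a relative fetch, which only works in a browser. As a hedged sketch for Node.js 18 or later (where fetch, FormData, and Blob are globals), the form can be built from bytes read off disk. buildPdfFormData and loadPdfFormData are hypothetical helper names; the form field names file and title are the ones used in this guide.

```javascript
// Sketch for Node.js 18+; helper names are illustrative, not part of the Senso API.

// Wrap raw PDF bytes in a multipart form using the field names from this guide.
function buildPdfFormData(bytes, fileName, title) {
  const formData = new FormData();
  const pdfBlob = new Blob([bytes], { type: 'application/pdf' });
  formData.append('file', pdfBlob, fileName);
  formData.append('title', title);
  return formData;
}

// Read a PDF from disk and build the upload form for it.
async function loadPdfFormData(filePath, title) {
  // Dynamic import works from both CommonJS and ES modules.
  const { readFile } = await import('node:fs/promises');
  const { basename } = await import('node:path');
  const bytes = await readFile(filePath);
  return buildPdfFormData(bytes, basename(filePath), title);
}
```

The resulting form can be passed as the body of the POST to `${API_URL}/content/file` exactly as in the browser example, for instance with a form returned by loadPdfFormData('./documents/product-guide.pdf', 'Product Guide 2024').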

Working with Large PDFs

Larger PDF documents take longer to process. Instead of polling at a fixed interval, increase the delay between status checks and stop as soon as processing fails:

// Upload a large PDF, log its size, and poll for status with increasing delays
async function uploadLargePDF(filePath) {
  const formData = new FormData();
  const fileResponse = await fetch(filePath);
  const fileBlob = await fileResponse.blob();
  const fileSize = fileBlob.size / 1024 / 1024;
  
  console.log(`Uploading ${fileSize.toFixed(2)}MB PDF...`);
  
  formData.append('file', fileBlob, 'large-document.pdf');
  formData.append('title', 'Large Document');
  
  const response = await fetch(`${API_URL}/content/file`, {
    method: 'POST',
    headers: { 'X-API-Key': API_KEY },
    body: formData
  });
  
  const content = await response.json();
  
  // Poll for status with exponential backoff
  let delay = 2000;
  while (content.processing_status === 'processing') {
    await new Promise(resolve => setTimeout(resolve, delay));
    
    const statusResponse = await fetch(`${API_URL}/content/${content.id}`, {
      headers: { 'X-API-Key': API_KEY }
    });
    const status = await statusResponse.json();
    
    if (status.processing_status === 'failed') {
      throw new Error('PDF processing failed');
    }
    
    content.processing_status = status.processing_status;
    delay = Math.min(delay * 1.5, 30000); // Max 30 seconds
  }
  
  return content;
}
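
The polling loop above has no upper bound on total wait time. A minimal sketch of the same backoff schedule with an overall time budget follows. waitForProcessing and its options are hypothetical names, and the 15-minute default budget is an arbitrary assumption; the status values completed and failed are the ones used elsewhere in this guide.

```javascript
// Hypothetical helper: poll until processing completes, fails, or the budget runs out.
// getStatus is any async function that resolves to an object with a
// processing_status field (for example, a GET on /content/{id}).
async function waitForProcessing(getStatus, options = {}) {
  const {
    initialDelayMs = 2000,
    factor = 1.5,
    maxDelayMs = 30000,
    timeoutMs = 15 * 60 * 1000, // assumed budget, tune for your documents
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let delay = initialDelayMs;
  let waited = 0;

  while (waited < timeoutMs) {
    const status = await getStatus();
    if (status.processing_status === 'completed') return status;
    if (status.processing_status === 'failed') {
      throw new Error('PDF processing failed');
    }

    await sleep(delay);
    waited += delay;
    delay = Math.min(delay * factor, maxDelayMs); // grow the delay, capped at maxDelayMs
  }

  throw new Error(`Timed out after ${waited}ms waiting for processing`);
}
```

Passing getStatus and sleep as parameters keeps the helper independent of any particular HTTP client and makes the schedule easy to exercise without real delays.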

Understanding PDF Processing

When you upload a PDF, Senso:

Extracts text content from all pages
Preserves document structure (headings, lists, tables)
Chunks the content for optimal search performance
Creates embeddings for semantic search
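
To make the chunking step concrete, here is a generic illustration of splitting text into overlapping fixed-size chunks. This guide does not document how Senso actually chunks content, so chunkText, its sizes, and the character-based strategy are assumptions for illustration only.

```javascript
// Illustrative only: NOT Senso's implementation.
// Splits text into chunks of chunkSize characters, where each chunk
// repeats the last `overlap` characters of the previous one.
function chunkText(text, chunkSize = 200, overlap = 40) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }

  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Overlap is a common choice in retrieval pipelines because it reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.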

The upload response includes the content ID and the current processing status:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "document",
  "title": "Product Guide 2024",
  "file_name": "product-guide.pdf",
  "mime_type": "application/pdf",
  "processing_status": "processing",
  "created_at": "2024-01-15T10:30:00Z"
}
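
A client typically branches on processing_status after reading a response like the one above. The following sketch maps a content record to a next step. nextAction and the action labels are hypothetical; the only status values taken from this guide are processing, completed, and failed, and treating any other value as still in progress is an assumption.

```javascript
// Hypothetical helper: decide what to do with a content record.
function nextAction(content) {
  switch (content.processing_status) {
    case 'completed':
      return 'search'; // content is indexed and can be queried
    case 'failed':
      return 'report-error'; // surface the failure instead of polling forever
    case 'processing':
      return 'poll'; // check again after a delay
    default:
      return 'poll'; // assumption: unknown statuses mean "not ready yet"
  }
}
```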

Best Practices

File size limits: Keep PDFs within the 20MB upload limit
Text-based PDFs: Ensure PDFs contain selectable text (not scanned images)
Descriptive titles: Use clear titles to help identify content later
Processing time: Allow 1-2 minutes per MB
Error handling: Confirm that processing_status is completed before searching, and handle the failed status
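
The processing-time guideline translates directly into a rough wait estimate. The sketch below applies the 1-2 minutes per MB rule of thumb from the list above; estimateProcessingMinutes is a hypothetical name, and a megabyte is assumed to be 1024 × 1024 bytes.

```javascript
// Hypothetical helper: apply the "1-2 minutes per MB" rule of thumb.
// Returns a rough range, not a guarantee of actual processing time.
function estimateProcessingMinutes(sizeBytes) {
  const sizeMb = sizeBytes / 1024 / 1024; // assumption: 1MB = 1024 * 1024 bytes
  return {
    minMinutes: sizeMb * 1,
    maxMinutes: sizeMb * 2,
  };
}
```

For a 5MB file this gives a range of 5 to 10 minutes, which can inform how long a client keeps polling before giving up.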

Common Use Cases

Extract Specific Information

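import requests

# Same base URL and key as in the JavaScript examples
API_URL = 'https://sdk.senso.ai/api/v1'
API_KEY = 'YOUR_API_KEY'

# Restrict the search to one category
result = requests.post(
    f'{API_URL}/search',
    headers={'X-API-Key': API_KEY},
    json={
        'query': 'What are the technical specifications?',
        'category_id': 'technical-docs-category'
    }
).json()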

Generate FAQ from PDF

import requests

# API_URL and API_KEY hold the same values as in the upload example
faq = requests.post(
    f'{API_URL}/generate',
    headers={'X-API-Key': API_KEY},
    json={
        'content_type': 'technical documentation',
        'instructions': 'Generate 10 frequently asked questions with answers',
        'save': True
    }
).json()
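
The search and generate calls in this guide share the same headers and JSON handling, so they can be wrapped in one small client. This is a sketch under stated assumptions: createSensoClient, search, and generate are hypothetical wrapper names, and only the endpoint paths (/search, /generate), the X-API-Key header, and the body fields shown in this guide come from the examples above.

```javascript
// Hypothetical wrapper around the two JSON endpoints used in this guide.
// fetchImpl defaults to the global fetch (browser or Node.js 18+).
function createSensoClient({ apiUrl, apiKey, fetchImpl = fetch }) {
  async function postJson(path, body) {
    const response = await fetchImpl(`${apiUrl}${path}`, {
      method: 'POST',
      headers: {
        'X-API-Key': apiKey,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });
    if (!response.ok) {
      throw new Error(`POST ${path} failed with HTTP ${response.status}`);
    }
    return response.json();
  }

  return {
    // extra carries optional fields such as max_results or category_id
    search: (query, extra = {}) => postJson('/search', { query, ...extra }),
    // extra carries optional fields such as max_results or save
    generate: (contentType, instructions, extra = {}) =>
      postJson('/generate', { content_type: contentType, instructions, ...extra }),
  };
}
```

With a client created from API_URL and API_KEY, the FAQ example becomes client.generate('technical documentation', 'Generate 10 frequently asked questions with answers', { save: true }).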

Next Steps

Create categories and topics to organize your PDFs
Use templates to structure extracted information
Generate content based on your PDF knowledge base
Build reusable snippets from PDF content
