Learn how to upload PDF documents to your knowledge base and extract insights through intelligent search and summarization.

What You’ll Learn

This guide demonstrates how to:

  • Upload PDF documents to your knowledge base
  • Check processing status for document content
  • Search within uploaded documents
  • Generate summaries from PDF content

Prerequisites

  • A Senso API key
  • A PDF file to upload (max size: 20MB)
  • Basic understanding of multipart form data uploads
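
A quick local check before uploading can catch oversized or non-PDF files early. The sketch below is illustrative only: `validatePdfBytes` and `MAX_PDF_BYTES` are hypothetical names, not part of the Senso API, and it assumes the 20MB limit means 20 * 1024 * 1024 bytes.

```javascript
// Assumption: "20MB" means 20 * 1024 * 1024 bytes; adjust if the API counts differently.
const MAX_PDF_BYTES = 20 * 1024 * 1024;

// Hypothetical helper (not part of the Senso API): checks raw file bytes before upload.
function validatePdfBytes(bytes) {
  if (bytes.length === 0) {
    return { ok: false, reason: 'File is empty' };
  }
  if (bytes.length > MAX_PDF_BYTES) {
    return { ok: false, reason: 'File exceeds the 20MB limit' };
  }
  // PDF files begin with the ASCII marker "%PDF-"
  const header = String.fromCharCode(...bytes.subarray(0, 5));
  if (header !== '%PDF-') {
    return { ok: false, reason: 'File does not start with a %PDF- header' };
  }
  return { ok: true, reason: null };
}
```

Given a Blob, the bytes can be obtained with `new Uint8Array(await blob.arrayBuffer())` and passed to the helper before building the form data.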

Uploading a PDF Document

const API_URL = 'https://sdk.senso.ai/api/v1';
const API_KEY = 'YOUR_API_KEY';

async function uploadPDFDocument() {
  try {
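    // Prepare form data. Fetching a relative path works in the browser;
    // in Node.js, read the file from disk and wrap it in a Blob instead.
    const formData = new FormData();
    const pdfFile = await fetch('./documents/product-guide.pdf');
    const pdfBlob = await pdfFile.blob();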
    
    formData.append('file', pdfBlob, 'product-guide.pdf');
    formData.append('title', 'Product Guide 2024');
    formData.append('summary', 'Comprehensive guide to our product features and capabilities');
    
    // Upload the PDF document
    const response = await fetch(`${API_URL}/content/file`, {
      method: 'POST',
      headers: {
        'X-API-Key': API_KEY
      },
      body: formData
    });
    
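    // Fail fast on HTTP errors instead of parsing an error body as content
    if (!response.ok) {
      throw new Error(`Upload failed with HTTP ${response.status}`);
    }

    const content = await response.json();
    console.log('PDF uploaded successfully!');
    console.log('Content ID:', content.id);
    console.log('Processing status:', content.processing_status);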
    
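    // Poll every 3 seconds until processing completes; stop if it fails
    let processedContent = content;
    while (processedContent.processing_status !== 'completed') {
      if (processedContent.processing_status === 'failed') {
        throw new Error('PDF processing failed');
      }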
      await new Promise(resolve => setTimeout(resolve, 3000));
      
      const statusResponse = await fetch(`${API_URL}/content/${content.id}`, {
        headers: { 'X-API-Key': API_KEY }
      });
      processedContent = await statusResponse.json();
      console.log('Status:', processedContent.processing_status);
    }
    
    console.log('PDF processing completed!');
    
    // Search the knowledge base, which now includes the PDF
    const searchResponse = await fetch(`${API_URL}/search`, {
      method: 'POST',
      headers: {
        'X-API-Key': API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        query: 'What are the main product features?',
        max_results: 5
      })
    });
    
    const searchResult = await searchResponse.json();
    console.log('\nSearch Results:');
    console.log('Answer:', searchResult.answer);
    console.log('Number of sources:', searchResult.results.length);
    
    // Generate a summary
    const summaryResponse = await fetch(`${API_URL}/generate`, {
      method: 'POST',
      headers: {
        'X-API-Key': API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        content_type: 'product documentation PDF',
        instructions: 'Create a concise executive summary of the key points',
        max_results: 10
      })
    });
    
    const summary = await summaryResponse.json();
    console.log('\nGenerated Summary:');
    console.log(summary.generated_text);
    
  } catch (error) {
    console.error('Error processing PDF:', error);
  }
}

uploadPDFDocument();
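
An HTTP error (for example, an invalid API key) comes back as a non-2xx status, which `fetch` does not treat as an exception. A small guard, sketched here as a hypothetical `ensureOk` helper, can be applied to each response before calling `.json()`. The error message format is an assumption, not something the API defines.

```javascript
// Hypothetical helper: throws on non-2xx responses, otherwise returns the response unchanged.
// `step` is a free-form label used only in the error message.
function ensureOk(response, step) {
  if (!response.ok) {
    throw new Error(`${step} failed with HTTP ${response.status}`);
  }
  return response;
}
```

For example, `const content = await ensureOk(response, 'Upload').json();` keeps the happy path on one line while surfacing failures through the surrounding `try`/`catch`.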

Working with Large PDFs

For larger PDF documents, processing may take longer. Here’s how to handle it efficiently:

// Upload a large PDF, log its size, then poll for completion with backoff
async function uploadLargePDF(filePath) {
  const formData = new FormData();
  const fileResponse = await fetch(filePath);
  const fileBlob = await fileResponse.blob();
  const fileSize = fileBlob.size / 1024 / 1024;
  
  console.log(`Uploading ${fileSize.toFixed(2)}MB PDF...`);
  
  formData.append('file', fileBlob, 'large-document.pdf');
  formData.append('title', 'Large Document');
  
  const response = await fetch(`${API_URL}/content/file`, {
    method: 'POST',
    headers: { 'X-API-Key': API_KEY },
    body: formData
  });
  
  const content = await response.json();
  
  // Poll for status with exponential backoff
  let delay = 2000;
  while (content.processing_status === 'processing') {
    await new Promise(resolve => setTimeout(resolve, delay));
    
    const statusResponse = await fetch(`${API_URL}/content/${content.id}`, {
      headers: { 'X-API-Key': API_KEY }
    });
    const status = await statusResponse.json();
    
    if (status.processing_status === 'failed') {
      throw new Error('PDF processing failed');
    }
    
    content.processing_status = status.processing_status;
    delay = Math.min(delay * 1.5, 30000); // Max 30 seconds
  }
  
  return content;
}
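
The loop above grows its delay by 1.5x up to a 30-second cap but has no overall time limit. One way to bound the wait is to precompute the delays that fit inside a budget. This is a sketch under assumptions: `backoffSchedule` is a hypothetical helper, and the 10-minute default budget is an arbitrary choice, not a documented API limit.

```javascript
// Hypothetical helper: lists the waits a poller would use under a total time budget.
// Defaults mirror the loop above (2s start, x1.5 growth, 30s cap); maxTotalMs is an assumed budget.
function backoffSchedule({ initialMs = 2000, factor = 1.5, capMs = 30000, maxTotalMs = 600000 } = {}) {
  if (initialMs <= 0 || factor < 1) {
    throw new Error('initialMs must be positive and factor must be at least 1');
  }
  const delays = [];
  let delay = initialMs;
  let elapsed = 0;
  // Keep adding waits while the next one still fits in the budget
  while (elapsed + delay <= maxTotalMs) {
    delays.push(delay);
    elapsed += delay;
    delay = Math.min(delay * factor, capMs);
  }
  return delays;
}
```

A poller could iterate over the returned delays, check the status after each wait, and throw a timeout error if the list runs out before processing completes.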

Understanding PDF Processing

When you upload a PDF, Senso:

  1. Extracts text content from all pages
  2. Preserves document structure (headings, lists, tables)
  3. Chunks the content for optimal search performance
  4. Creates embeddings for semantic search
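
This guide does not describe how Senso chunks content internally. As a generic illustration of what step 3 means, fixed-size chunking with overlap can be sketched as follows. The `chunkText` name, the default sizes, and character-based splitting are all assumptions made for this example.

```javascript
// Generic illustration only; not Senso's actual chunking strategy.
// Splits text into chunks of `chunkSize` characters, each sharing `overlap` characters with the previous one.
function chunkText(text, chunkSize = 200, overlap = 50) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end, to avoid emitting a redundant tail
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated text.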

The upload response includes the content ID and the current processing status:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "document",
  "title": "Product Guide 2024",
  "file_name": "product-guide.pdf",
  "mime_type": "application/pdf",
  "processing_status": "processing",
  "created_at": "2024-01-15T10:30:00Z"
}
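
The `processing_status` field determines what a client should do next. The sketch below maps the three status values that appear in this guide to an action. `nextAction` and its return labels are hypothetical, and the API may return other statuses, which are reported as unknown rather than guessed at.

```javascript
// Status values taken from this guide: 'processing', 'completed', 'failed'.
// Hypothetical helper: any other value is reported as 'unknown'.
function nextAction(content) {
  switch (content.processing_status) {
    case 'completed':
      return 'search'; // content is indexed and can be queried
    case 'processing':
      return 'wait'; // poll again later
    case 'failed':
      return 'abort'; // stop polling and surface the error
    default:
      return 'unknown';
  }
}
```

Applied to the sample response above, `nextAction` returns `'wait'` because the status is still `"processing"`.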

Best Practices

  1. File size limits: Keep PDFs within the 20MB upload limit
  2. Text-based PDFs: Ensure PDFs contain selectable text (not scanned images)
  3. Descriptive titles: Use clear titles to help identify content later
  4. Processing time: Allow 1-2 minutes per MB for processing
  5. Error handling: Confirm that processing_status is completed, and handle failed, before searching
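
The 1-2 minutes per MB guideline can be turned into a rough wait estimate, for example to choose a polling timeout. This is a minimal sketch: `estimateProcessingMinutes` is a hypothetical helper, and it assumes 1MB is 1024 * 1024 bytes, matching the size calculation used in the large-PDF example.

```javascript
// Assumption: 1MB = 1024 * 1024 bytes
const BYTES_PER_MB = 1024 * 1024;

// Hypothetical helper: applies the 1-2 minutes per MB guideline to a file size in bytes.
function estimateProcessingMinutes(sizeBytes) {
  const sizeMb = sizeBytes / BYTES_PER_MB;
  return { min: sizeMb * 1, max: sizeMb * 2 };
}
```

A 5MB file would give an estimate of 5 to 10 minutes. Treat the upper bound as a starting point for a timeout, not a guarantee.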

Common Use Cases

Extract Specific Information

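import requests

API_URL = 'https://sdk.senso.ai/api/v1'
API_KEY = 'YOUR_API_KEY'

# Limit the search to one category; replace the ID with your own
response = requests.post(
    f'{API_URL}/search',
    headers={'X-API-Key': API_KEY},
    json={
        'query': 'What are the technical specifications?',
        'category_id': 'technical-docs-category'
    }
)
response.raise_for_status()
result = response.json()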

Generate FAQ from PDF

# Assumes requests is imported and API_URL / API_KEY are defined
faq = requests.post(
    f'{API_URL}/generate',
    headers={'X-API-Key': API_KEY},
    json={
        'content_type': 'technical documentation',
        'instructions': 'Generate 10 frequently asked questions with answers',
        'save': True
    }
).json()
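
The same two requests can be expressed in JavaScript, the language of the main example in this guide. The sketch below only builds the `fetch` options and sends nothing. `jsonPostOptions` is a hypothetical helper, and the payload fields are copied from the Python snippets above.

```javascript
// Hypothetical helper: builds fetch() options for a JSON POST authenticated with X-API-Key.
function jsonPostOptions(apiKey, payload) {
  return {
    method: 'POST',
    headers: {
      'X-API-Key': apiKey,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(payload)
  };
}

// Options for a category-scoped search
const searchOptions = jsonPostOptions('YOUR_API_KEY', {
  query: 'What are the technical specifications?',
  category_id: 'technical-docs-category'
});

// Options for generating an FAQ with the save flag set
const faqOptions = jsonPostOptions('YOUR_API_KEY', {
  content_type: 'technical documentation',
  instructions: 'Generate 10 frequently asked questions with answers',
  save: true
});
```

Each options object can then be passed to `fetch` with the matching endpoint, for example `` fetch(`${API_URL}/search`, searchOptions) ``.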

Next Steps