Quick Apply: Building a Serverless AI Interview Assistant
Job interviews are ripe for AI disruption. What if an AI could handle the initial screening conversation with candidates, freeing up human recruiters until it's time to engage the top prospects? I set out to explore this idea by building Upstash Quick Apply, a serverless, LLM-powered interview assistant that transforms the traditional hiring process.
In this post, I'll share why this problem matters and how I built a solution using Upstash Redis and Upstash Vector – all while focusing on developer experience and rapid iteration. The result? A chatbot that collects job applications through a conversational interface, evaluates answers in real-time, and even lets candidates ask questions about the company.
Why Traditional Hiring Doesn’t Scale
As a developer, I'm always looking for processes that can be improved with automation and creativity. Hiring is one of those processes: recruiting teams often sift through hundreds of applications, asking the same basic questions to each applicant. Why not delegate those repetitive conversations to an AI assistant?
From the candidate's side, a chat-based application process could be more engaging than filling out yet another form. Candidates can get immediate feedback and even ask questions like "What's the company culture like?" in real time. It transforms the dreaded multi-step form into a conversational experience.
What is Upstash Quick Apply?
Upstash Quick Apply is essentially a smart chatbot interviewer deployed as a web app. Here's what this application can do:
🎯 Conversational Q&A Flow: Instead of static forms, the candidate experiences a friendly back-and-forth interview. The assistant sequentially asks predefined questions (from a JSON config) and records the answers.
📊 Real-time Answer Evaluation: As answers come in, the app assigns scores based on keywords or criteria set by the recruiter (configured in `job-config.json`), tallying up a total score out of 100. This score is included in the final output to the recruiter. A sample config is sketched just after this list.
📚 Document-powered Responses: If the candidate asks follow-up questions about the company or position, the assistant retrieves relevant info from company documents (indexed in a vector database) and responds factually. If it doesn't know an answer from the docs, it flags the recruiter instead of hallucinating.
🔗 Seamless Integrations: After the Q&A, candidates can upload their CV PDF, which is saved to a Google Drive folder. All Q&A data and scores are logged to a Google Sheet and emailed to recruiters.
🎨 Modern UX: The interface supports light/dark mode and is mobile responsive, ensuring a smooth candidate experience.
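For context on how these questions and scores are configured, here is a hedged sketch of what `job-config.json` might contain. The question texts, keywords, and point values are illustrative, but the shape mirrors the `ScoringConfig` interface shown in the scoring section later in this post:

```json
{
  "questions": [
    {
      "id": "experience",
      "question": "How many years of backend development experience do you have?",
      "type": "keyword",
      "scoring": { "5 years": 20, "3 years": 15, "1 year": 5 }
    },
    {
      "id": "redis",
      "question": "Have you worked with Redis before?",
      "type": "exact",
      "scoring": { "yes": 20, "no": 0 }
    },
    {
      "id": "motivation",
      "question": "Why do you want to join our team?",
      "type": "llm",
      "scoring": {}
    }
  ]
}
```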
Architecture: Serverless by Design
The beauty of this project is that it shows how to combine Large Language Models (LLMs) with serverless data services to create a dynamic, production-ready app without managing infrastructure. The architecture consists of three main components: serverless backend APIs, Upstash data services (Redis and Vector), and third-party integrations (Google Drive, Sheets, and email).
Backend APIs (Serverless Functions)
A single serverless POST handler acts as the assistant's brain: it re-hydrates chat history from Redis, logs the new user message, and branches on the `stage` flag, steering the structured interview flow in `questioning` mode or invoking vector-search Q&A in `qa` mode. After crafting a response, it writes the assistant turn back to Redis, so the next request starts with an up-to-date transcript. This switchboard keeps interview phases cleanly separated while delivering a seamless, coherent conversation.
```typescript
// Main chat handler API
export async function POST(request: Request) {
  const { message, sessionId, stage } = await request.json()

  // Get conversation history from Redis (available as context for the stage handlers)
  const history = await getSessionHistory(sessionId)

  // Save user message
  await saveMessage(sessionId, {
    role: 'user',
    content: message,
    timestamp: Date.now()
  })

  // Branch on the interview stage
  let response
  switch (stage) {
    case 'questioning':
      response = await handleQuestioningStage(sessionId, message)
      break
    case 'qa':
      response = await handleUserQuery(message, sessionId)
      break
    default:
      response = await generateNextQuestion(sessionId)
  }

  // Save assistant response
  await saveMessage(sessionId, {
    role: 'assistant',
    content: response,
    timestamp: Date.now()
  })

  return Response.json({ response })
}
```
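The handler references helpers that appear throughout this post (`handleQuestioningStage`, `handleUserQuery`, `generateNextQuestion`). As one illustration, here is a minimal sketch of what `generateNextQuestion` might look like, assuming the Redis client and state hash introduced in the next section plus the config shape sketched above; the project's exact logic may differ.

```typescript
// Minimal sketch of generateNextQuestion (illustrative, not the project's exact code).
// Assumes the `redis` client from the next section and the job-config shape above.
import config from "./job-config.json"

export async function generateNextQuestion(sessionId: string): Promise<string> {
  const key = `session:${sessionId}:state`

  // Read the current question pointer (null for a brand-new session)
  const state = await redis.hgetall<Record<string, string>>(key)
  const index = state ? Number(state.currentQuestion) : 0

  if (index >= config.questions.length) {
    // Out of questions: hand the session over to the document-grounded Q&A stage
    await redis.hset(key, { stage: 'qa' })
    return "That's all my questions. Do you have anything you'd like to ask about the role or company?"
  }

  // Advance the pointer and ask the next configured question
  await redis.hset(key, { currentQuestion: index + 1, stage: 'questioning' })
  return config.questions[index].question
}
```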
Upstash Databases (Redis & Vector)
These are the global data layers that make the app stateful and intelligent.
Upstash Redis: Chat Memory That Scales
To keep every interview coherent, each session stores its turn-by-turn dialogue in a Redis list, while a lightweight Redis hash tracks live metadata such as the active question, running score, and current stage. Lists preserve chronological order; hashes update atomically, so even a surge of simultaneous candidates won’t collide. A 30-day TTL enforces automatic data expiry for privacy, and because Redis is serverless and HTTP-native on Upstash, this pattern scales effortlessly from a handful of demos to thousands of parallel interviews without extra infrastructure.
Here's how we implement session management:
```typescript
import { Redis } from "@upstash/redis";

// Initialize Redis client (uses env vars automatically)
const redis = Redis.fromEnv();

// Save messages to conversation history
export async function saveMessage(sessionId: string, message: any) {
  const key = `session:${sessionId}:messages`
  // Use a Redis list to maintain conversation order
  await redis.rpush(key, JSON.stringify(message))
  // Set TTL for privacy (30 days)
  await redis.expire(key, 30 * 24 * 60 * 60)
}

// Retrieve conversation history for context
export async function getSessionHistory(sessionId: string) {
  const key = `session:${sessionId}:messages`
  const history = await redis.lrange(key, 0, -1)
  // The @upstash/redis client auto-deserializes JSON values on read,
  // so only parse entries that come back as raw strings
  return history.map(item =>
    typeof item === 'string' ? JSON.parse(item) : item
  )
}

// Track interview state
export async function updateInterviewState(sessionId: string, state: any) {
  const key = `session:${sessionId}:state`
  await redis.hset(key, {
    currentQuestion: state.currentQuestion,
    score: state.score,
    stage: state.stage
  })
  await redis.expire(key, 30 * 24 * 60 * 60)
}
```
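The write path pairs naturally with a read helper. Here is a small sketch of what reading the state back might look like; the field parsing is an assumption, and `hgetall` returns the whole hash in one call:

```typescript
// Sketch of a read counterpart to updateInterviewState (illustrative)
export async function getInterviewState(sessionId: string) {
  const key = `session:${sessionId}:state`
  const state = await redis.hgetall<Record<string, string>>(key)

  // Default state for sessions that haven't started yet
  if (!state) return { currentQuestion: 0, score: 0, stage: 'questioning' }

  return {
    currentQuestion: Number(state.currentQuestion),
    score: Number(state.score),
    stage: state.stage
  }
}
```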
The serverless architecture and pay-as-you-go model of Upstash mean that if we have a spike of 10,000 concurrent interviews, the service will scale automatically and we only pay for the throughput used. As the Upstash team says, with serverless Redis "you don't have to deal with anything, just do your business."
Upstash Vector: Document-Grounded Answers
Quick Apply's standout feature is its ability to answer candidate questions with facts drawn straight from company documents, a capability powered by Upstash Vector. During an offline indexing pass, every PDF or handbook is sliced into ~1,000-character chunks; each chunk is converted to a semantic embedding with OpenAI's text-embedding-ada-002 model and upserted into the vector store alongside its source text and a unique ID. This pipeline turns static files into a searchable index of vectors, so at runtime the assistant can retrieve the most relevant passages and feed them to the language model in a classic Retrieval-Augmented Generation (RAG) loop. By grounding its replies in verified snippets, the system slashes hallucinations and ensures that every answer traces back to a real document.
Document Indexing Pipeline
During setup, we process company documents and create embeddings:
```typescript
import { Index } from "@upstash/vector";
import { OpenAI } from "openai";

const vectorIndex = new Index({
  url: process.env.UPSTASH_VECTOR_REST_URL!,
  token: process.env.UPSTASH_VECTOR_REST_TOKEN!
});

const openai = new OpenAI();

// Process and index documents
export async function indexDocuments(documents: string[]) {
  for (const docText of documents) {
    // Split document into ~1,000-character chunks
    const chunks = splitTextIntoChunks(docText, 1000)

    for (let i = 0; i < chunks.length; i++) {
      const chunk = chunks[i]

      // Generate an embedding for the chunk
      const embeddingResponse = await openai.embeddings.create({
        model: "text-embedding-ada-002",
        input: chunk,
      })
      const embedding = embeddingResponse.data[0].embedding

      // Store in the vector database with its source text as metadata
      await vectorIndex.upsert({
        id: `doc_${Date.now()}_${i}`,
        vector: embedding,
        metadata: {
          text: chunk,
          source: "company_docs"
        }
      })
    }
  }
}
```
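The `splitTextIntoChunks` helper is referenced above but not shown. A naive fixed-size version might look like this sketch; production chunkers usually split on sentence or paragraph boundaries and overlap adjacent chunks:

```typescript
// Naive fixed-size chunker (a sketch, not the project's exact implementation)
function splitTextIntoChunks(text: string, chunkSize: number): string[] {
  const chunks: string[] = []
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize))
  }
  return chunks
}
```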
Semantic Search for Contextual Answers
When a candidate asks a question, Quick Apply switches into RAG mode: it embeds the query, runs a cosine-similarity search against Upstash Vector, and retrieves the three passages whose scores clear a 0.7 relevance bar. Those snippets are stitched together and handed to GPT-4 with a guardrail prompt that forbids straying beyond the provided context; if no passage meets the threshold, the assistant politely promises a follow-up instead. This workflow delivers precise, document-grounded answers and virtually eliminates hallucinations, providing exactly the reliability enterprises need without sacrificing user experience.
```typescript
// Handle user queries with RAG (Retrieval-Augmented Generation)
export async function handleUserQuery(question: string, sessionId: string) {
  try {
    // Generate an embedding for the question
    const questionEmbedding = await openai.embeddings.create({
      model: "text-embedding-ada-002",
      input: question,
    })

    // Search for the most relevant document chunks
    const searchResults = await vectorIndex.query({
      vector: questionEmbedding.data[0].embedding,
      topK: 3,
      includeVectors: false,
      includeMetadata: true
    })

    // Bail out if nothing clears the relevance threshold
    if (searchResults.length === 0 || searchResults[0].score < 0.7) {
      return "That's a great question! I'll make sure our team follows up with you directly about that."
    }

    // Stitch the matching chunks into a single context block
    const context = searchResults
      .map(match => match.metadata.text)
      .join('\n\n')

    // Generate a response using the LLM, constrained to the retrieved context
    const completion = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [
        {
          role: "system",
          content: "You are an HR assistant. Answer the candidate's question based ONLY on the provided context. If you cannot answer from the context, say you'll forward the question to the team."
        },
        {
          role: "user",
          content: `Context: ${context}\n\nQuestion: ${question}`
        }
      ],
    })

    return completion.choices[0].message.content
  } catch (error) {
    console.error('Error in handleUserQuery:', error)
    return "I'll make sure our team gets back to you on that question."
  }
}
```
By using Upstash Vector this way, Quick Apply provides accurate, specific answers and avoids the common pitfall of LLMs making things up. This RAG pattern is essential for enterprise use cases.
Real-Time Scoring System
Quick Apply scores every answer on the spot: keyword and exact matches handle closed questions, while an LLM grades open-ended ones. The running total is streamed to Redis, keeping evaluation fast, transparent, and bias-resistant.
```typescript
// Scoring configuration (from job-config.json)
interface ScoringConfig {
  questions: Array<{
    id: string
    question: string
    scoring: Record<string, number>
    type: 'keyword' | 'exact' | 'llm'
  }>
}

// Evaluate candidate answers
export async function evaluateAnswer(
  question: any,
  answer: string,
  sessionId: string
): Promise<number> {
  let score = 0

  switch (question.type) {
    case 'keyword':
      score = evaluateKeywordMatch(answer, question.scoring)
      break
    case 'exact':
      score = evaluateExactMatch(answer, question.scoring)
      break
    case 'llm':
      score = await evaluateWithLLM(question.question, answer, question.scoring)
      break
  }

  // Update running score in Redis
  await updateCandidateScore(sessionId, score)
  return score
}

function evaluateKeywordMatch(answer: string, scoring: Record<string, number>): number {
  const lowerAnswer = answer.toLowerCase()
  for (const [keyword, points] of Object.entries(scoring)) {
    if (lowerAnswer.includes(keyword.toLowerCase())) {
      return points
    }
  }
  return 0
}

async function evaluateWithLLM(
  question: string,
  answer: string,
  scoring: Record<string, number> // (unused here; the LLM rating is scaled instead)
): Promise<number> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: "Rate this interview answer on a scale of 1-10 based on relevance and quality. Reply with only the number."
      },
      {
        role: "user",
        content: `Question: ${question}\nAnswer: ${answer}`
      }
    ],
  })

  // parseInt returns NaN for non-numeric replies; fall back to 0
  const rating = parseInt(completion.choices[0].message.content || "0", 10) || 0
  return Math.min(rating * 2, 20) // Convert the 1-10 rating to a 20-point scale
}
```
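`evaluateExactMatch` and `updateCandidateScore` are referenced above without being shown. Here is a plausible sketch of each; the normalization rule and the choice of `hincrby` are assumptions, though an atomic increment fits the "running score in Redis" described earlier:

```typescript
// Sketches of the two remaining helpers (illustrative, not the project's exact code)

// Exact match: the whole normalized answer must equal a scoring key
function evaluateExactMatch(answer: string, scoring: Record<string, number>): number {
  const normalized = answer.trim().toLowerCase()
  for (const [expected, points] of Object.entries(scoring)) {
    if (normalized === expected.toLowerCase()) {
      return points
    }
  }
  return 0
}

// Atomically add the earned points to the session's running score
async function updateCandidateScore(sessionId: string, points: number) {
  const key = `session:${sessionId}:state`
  await redis.hincrby(key, 'score', points)
}
```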
The Complete Interview Flow
Quick Apply orchestrates the interview in five stages: greeting, questioning, CV upload, Q&A, and finalization, with each transition tracked in Redis to keep the flow smooth. Questions run sequentially, answers are scored in real time with instant encouragement for high performers, and `waitForUserResponse` bridges asynchronous pauses before `finalizeApplication` wraps up the session. The result feels like talking to a human recruiter while scaling effortlessly across thousands of concurrent interviews.
```typescript
// Main interview orchestration
export async function conductInterview(sessionId: string, config: ScoringConfig) {
  let totalScore = 0

  // 1. Greeting phase
  await saveMessage(sessionId, {
    role: 'assistant',
    content: "Hi! Thanks for your interest in our position. I'll be conducting a quick screening interview."
  })

  // 2. Question phase
  for (let i = 0; i < config.questions.length; i++) {
    const question = config.questions[i]

    // Ask the question
    await saveMessage(sessionId, {
      role: 'assistant',
      content: question.question
    })

    // Wait for and score the answer
    const answer = await waitForUserResponse(sessionId)
    const score = await evaluateAnswer(question, answer, sessionId)
    totalScore += score

    // Optional: provide encouraging feedback on strong answers
    if (score > 15) {
      await saveMessage(sessionId, {
        role: 'assistant',
        content: "Great answer! Let's move to the next question."
      })
    }
  }

  // 3. CV upload phase
  await requestCVUpload(sessionId)

  // 4. Q&A phase
  await saveMessage(sessionId, {
    role: 'assistant',
    content: 'Do you have any questions about the role or company?'
  })

  // 5. Finalization
  await finalizeApplication(sessionId, totalScore)
}
```
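`waitForUserResponse` has to bridge the gap until the candidate's next message arrives. One naive way to sketch it is to poll the session's Redis message list until a new user turn appears; the project's actual mechanism is more likely driven by the POST handler's request/response cycle, so treat this purely as an illustration:

```typescript
// Polling sketch of waitForUserResponse (an assumption; a serverless deployment
// would more likely resume the flow from the next incoming POST request)
async function waitForUserResponse(sessionId: string): Promise<string> {
  const key = `session:${sessionId}:messages`
  const baseline = await redis.llen(key)

  // Check once per second, up to a five-minute timeout
  for (let attempt = 0; attempt < 300; attempt++) {
    if ((await redis.llen(key)) > baseline) {
      const [latest] = await redis.lrange(key, -1, -1)
      const message = typeof latest === 'string' ? JSON.parse(latest) : latest
      if (message.role === 'user') return message.content
    }
    await new Promise(resolve => setTimeout(resolve, 1000))
  }
  throw new Error('Timed out waiting for candidate response')
}
```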
Integration with Google Sheets and Email Services
Rather than dropping data into a bespoke dashboard, Quick Apply writes each finished interview straight into Google Sheets, authenticating with a service account so the HR team only sees a familiar spreadsheet URL. Every row carries the session ID, an ISO-8601 timestamp, the aggregate score, and a JSON blob of question-answer pairs, giving recruiters an instant, filter-friendly view of the funnel without exporting CSVs or wrestling with APIs.

In parallel, a richly formatted email lands in the hiring inbox summarizing the candidate's score, full transcript, and a Drive link to the uploaded CV, so decision-makers can triage applications from their phone. Because the notification helper is just a thin wrapper, swapping Gmail for Slack, Teams, or an ATS webhook is a one-line change, ensuring the AI assistant snaps cleanly into whatever tooling the organization already trusts. This tight coupling with existing workflows removes adoption friction, keeps data in systems designed for audit and compliance, and lets recruiters benefit from AI without learning a new interface.
```typescript
// Google Sheets integration (google-spreadsheet v3-style service account auth)
import { GoogleSpreadsheet } from 'google-spreadsheet'

export async function saveToGoogleSheet(
  sessionId: string,
  answers: any[],
  score: number
) {
  const doc = new GoogleSpreadsheet(process.env.GOOGLE_SHEET_ID!)

  // Authenticate with a service account so no user login is required
  await doc.useServiceAccountAuth({
    client_email: process.env.GOOGLE_CLIENT_EMAIL!,
    private_key: process.env.GOOGLE_PRIVATE_KEY!.replace(/\\n/g, '\n'),
  })

  await doc.loadInfo()
  const sheet = doc.sheetsByIndex[0]

  await sheet.addRow({
    'Session ID': sessionId,
    'Timestamp': new Date().toISOString(),
    'Score': score,
    'Answers': JSON.stringify(answers),
    'Status': 'Completed'
  })
}

// Email notification
export async function sendRecruiterNotification(
  answers: any[],
  score: number,
  cvUrl?: string
) {
  const emailContent = `
New candidate application received!

Score: ${score}/100

Answers:
${answers.map(a => `Q: ${a.question}\nA: ${a.answer}`).join('\n\n')}

${cvUrl ? `CV: ${cvUrl}` : 'No CV uploaded'}
`

  // Send email using your preferred service
  await sendEmail({
    to: process.env.RECRUITER_EMAIL,
    subject: `New Application - Score: ${score}/100`,
    text: emailContent
  })
}
```
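The `sendEmail` helper is deliberately left abstract. As one possibility, here is a thin nodemailer-based sketch; the SMTP settings and env var names are assumptions. Because everything else depends only on this wrapper, swapping in Slack, Teams, or an ATS webhook means replacing just this one function:

```typescript
// One possible sendEmail wrapper using nodemailer (SMTP settings are assumptions)
import nodemailer from 'nodemailer'

const transporter = nodemailer.createTransport({
  host: process.env.SMTP_HOST,
  port: 587,
  auth: {
    user: process.env.SMTP_USER,
    pass: process.env.SMTP_PASS,
  },
})

export async function sendEmail(options: { to?: string; subject: string; text: string }) {
  await transporter.sendMail({
    from: process.env.SMTP_USER,
    ...options,
  })
}
```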
Developer Experience Wins 🏆
Building Upstash Quick Apply highlighted several developer experience advantages:
⚡ Rapid Prototyping: With Upstash and a serverless frontend, I got a working prototype up quickly. Creating the Redis and Vector databases took just a few clicks, and each came with an immediate REST endpoint.
📈 Effortless Scalability: With Upstash's multi-tenant, globally available infrastructure, the app can serve users worldwide with low latency. If 1000 users try Quick Apply simultaneously, Upstash handles the load automatically.
🔒 Integrated Security: All communication to Upstash services uses secure REST calls with tokens. No VPC or complex networking required - just environment variables.
💰 Cost Efficiency: Usage-based pricing meant essentially zero costs during development. Costs scale with usage, making it easy to justify.
Results and Impact
The Quick Apply system demonstrates significant improvements:
- Consistent Screening: Every candidate gets the same questions and evaluation criteria
- Instant Engagement: Candidates receive immediate responses instead of waiting days
- Time Savings: Recruiters only engage with pre-qualified candidates
- Better Data: Structured, searchable responses with automatic scoring
- Scalable Process: Handle hundreds of applications without additional resources
Conclusion
We started with the question: Why not let AI handle the tedious first round of interviews? With Upstash Quick Apply, I demonstrated that it's not only feasible, but also relatively straightforward given the right tools.
The result is a conversational, intelligent interview assistant that saves time for recruiters and provides instant engagement for candidates. By using Upstash Redis for session management and Upstash Vector for document retrieval, we created a serverless AI stack that scales automatically.
The key takeaway for developers is that services like Upstash can dramatically cut down the complexity of implementing state and search in AI applications. In the age of LLMs, small projects can punch above their weight: with just a few hundred lines of code, we created an app that converses, evaluates, searches, and integrates with external systems.
The entire project is open source on GitHub with a live demo. Whether you're an HR tech enthusiast or a developer excited about AI, I hope this story gave you inspiration and practical insights for your own projects.
Ready to build your own AI-powered application? Start with Upstash Redis and Upstash Vector to get serverless data infrastructure that scales with your needs.