ARTIFICIAL INTELLIGENCE ASSISTANT

JARVIS AI
Multi-Model Assistant

Advanced AI assistant system integrating multiple language models (Groq, Cohere, Hugging Face) with voice recognition, natural language processing, task automation, real-time web search, image generation, and intelligent decision-making. Features speech-to-text, text-to-speech, and comprehensive system control.

3 AI APIs

Integrated

9 Modules

Components

Voice I/O

STT + TTS

Project Type Full-Stack AI System

Primary Language Python 3.x

AI Models Llama3-70B, Command-R-Plus, SD-XL

APIs Used Groq, Cohere, Hugging Face

Status ✓ Functional

01 System Overview

🎯

Project Vision

Inspired by the fictional JARVIS from Iron Man, this project implements a real-world AI assistant capable of understanding voice commands, executing system tasks, generating content, searching the web, creating images, and maintaining natural conversations through advanced language models.

🧠

Multi-AI Architecture

Integrates three distinct AI systems: Groq's Llama3-70B for conversational intelligence and real-time search, Cohere's Command-R-Plus for decision-making and query classification, and Hugging Face's Stable Diffusion XL for high-quality image generation.

⚙️

Core Capabilities

Voice recognition with multi-language support, natural language understanding, automated task execution (app control, web browsing, YouTube playback), content generation (essays, code, letters), real-time information retrieval, and AI-powered image creation.

03 Core Modules & Components

Module 1

✓ Decision Engine

Model.py - Intent Classification Layer

AI Model Cohere Command-R-Plus

Purpose Query Type Detection

Function FirstLayerDMM()

Output Task List with Categories

Temperature 0.7 (Balanced)

✓ Classifies queries into 12+ task types

✓ Handles multi-task requests

✓ Routes to appropriate modules

→ Recognized intents: general, realtime, open, close, play, image generation, system control, content creation

→ Streaming response for real-time classification

→ Context-aware with chat history

Module 2

✓ Conversational AI

Chatbot.py - General Conversation Handler

AI Model Groq Llama3-70B-8192

Context 8,192 tokens

Function ChatBot()

Max Tokens 1,024 per response

Storage ChatLog.json (persistent)

✓ Maintains conversation history

✓ Real-time date/time awareness

✓ Context-aware responses

→ Streaming response generation

→ Automatic error recovery (chat log reset)

→ Professional English responses

Module 3

✓ Web Search

RealtimeSearchEngine.py - Live Information Retrieval

Search API Google Search (googlesearch-python)

AI Model Groq Llama3-70B

Results Top 5 search results

Function RealtimeSearchEngine()

Max Tokens 2,048 per response

✓ Fetches live web data

✓ AI-powered answer synthesis

✓ Source attribution

→ Combines Google search with AI summarization

→ Real-time date/time injection

→ Professional formatting

Module 4

✓ Task Control

Automation.py - System & App Control

App Control AppOpener (open/close apps)

Web Control webbrowser, pywhatkit

System keyboard (volume control)

Content Groq Mixtral-8x7B

Async Parallel task execution

✓ Open/close applications

✓ YouTube playback & search

✓ Google search automation

✓ Content generation (essays, code, letters)

✓ System volume control

→ AsyncIO for concurrent task execution

→ Fuzzy app name matching

→ Auto-saves generated content to Notepad

Module 5

✓ Image AI

ImageGeneration.py - AI Image Creation

Model Stable Diffusion XL Base 1.0

API Hugging Face Inference API

Batch Size 4 images per prompt

Quality 4K, Ultra High Details

Async Parallel generation

✓ Generate 4 variations per prompt

✓ Random seed diversity

✓ Auto-display generated images

→ AsyncIO concurrent image generation

→ High-quality 4K output

→ Automatic file management

Module 6

✓ Voice Input

SpeechToText.py - Voice Recognition

Engine Web Speech API (WebKit)

Driver Selenium WebDriver (Chrome)

Translation mtranslate (multi-language)

Mode Headless browser

Languages Configurable (English default)

✓ Continuous voice recognition

✓ Multi-language support

✓ Auto-translation to English

→ Query formatting & punctuation

→ Question detection & modification

→ Real-time status updates

Module 7

✓ Voice Output

TextToSpeech.py - Natural Voice Synthesis

Engine edge-tts (Microsoft Edge TTS)

Playback pygame mixer

Voice Configurable (multiple accents)

Pitch +5Hz (natural tone)

Speed +13% (optimized)

✓ Natural-sounding speech

✓ Long text handling

✓ Async audio generation

→ Smart truncation for long responses

→ Fallback messages for extended text

→ Audio cleanup & file management

05 Technology Stack & Dependencies

🐍

Python Libraries

AI/ML: groq, cohere, requests
Voice: edge-tts, pygame, mtranslate
Web: selenium, webdriver-manager
Automation: AppOpener, pywhatkit, keyboard
Parsing: beautifulsoup4, googlesearch-python
Async: asyncio (built-in)
Utils: python-dotenv, Pillow, rich

🔑

API Integration

Groq Cloud: Llama3-70B-8192, Mixtral-8x7B-32768
Cohere: Command-R-Plus (decision-making)
Hugging Face: Stable Diffusion XL Base 1.0
Google Search: Real-time information
Microsoft Edge TTS: Natural voice synthesis
Web Speech API: Browser-based STT

⚙️

System Features

Architecture: Modular, async-capable
Storage: JSON chat logs, file-based state
Security: .env for API keys
Error Handling: Auto-recovery mechanisms
Performance: Parallel task execution
Scalability: Streaming responses

06 Technical Achievements

Multi-AI Integration

Successfully integrated three distinct AI systems (Groq, Cohere, Hugging Face) into a unified architecture. Implemented intelligent routing between models based on query type, leveraging each AI's strengths: Llama3 for conversation, Command-R-Plus for decision-making, and Stable Diffusion for image generation.

Voice Intelligence

Developed comprehensive voice I/O system combining Web Speech API for recognition with edge-tts for natural synthesis. Supports multi-language input with automatic translation, intelligent query formatting, and context-aware voice responses with smart truncation for long-form content.

Task Automation Engine

Built robust automation framework capable of controlling system applications, managing web browsers, playing YouTube content, generating professional documents, and executing system commands. Implements parallel task execution using AsyncIO for efficient multi-task handling.

07 Real-World Applications

💬 Conversational AI Assistant

Natural language conversations with context retention. Can discuss topics, answer questions, provide recommendations, and maintain multi-turn dialogues with persistent chat history.

Example: "What's the weather like today?" → Real-time search + AI summary

Example: "Tell me about quantum computing" → Conversational response

🔍 Information Retrieval

Real-time web search with AI-powered synthesis. Fetches current information from Google, processes multiple sources, and generates comprehensive answers with proper attribution.

Example: "Who won the latest Formula 1 race?" → Live search + summary

Example: "What are the top tech news today?" → Multi-source aggregation

🖼️ Content Creation

Generate text documents (essays, applications, letters, code) and AI images. Text generation uses Mixtral-8x7B for high-quality content, saved automatically to Notepad. Image generation produces 4 high-quality variations using Stable Diffusion XL.

Example: "Write an application for leave" → Generated document in Notepad

Example: "Generate image of futuristic city" → 4 unique 4K images

⚙️ System Control

Voice-controlled system automation. Open/close applications, control YouTube playback, manage volume, execute Google searches, and navigate the web—all through natural voice commands.

Example: "Open Chrome and play music on YouTube" → Multi-task execution

Example: "Mute volume" → System command execution

JARVIS AI
Multi-Model Assistant

01 System Overview

Project Vision

Multi-AI Architecture

Core Capabilities

02 System Architecture

03 Core Modules & Components

Model.py - Intent Classification Layer

Chatbot.py - General Conversation Handler

RealtimeSearchEngine.py - Live Information Retrieval

Automation.py - System & App Control

ImageGeneration.py - AI Image Creation

SpeechToText.py - Voice Recognition

TextToSpeech.py - Natural Voice Synthesis

04 Interactive System Demo

05 Technology Stack & Dependencies

Python Libraries

API Integration

System Features

06 Technical Achievements

Multi-AI Integration

Voice Intelligence

Task Automation Engine

07 Real-World Applications

💬 Conversational AI Assistant

🔍 Information Retrieval

🖼️ Content Creation

⚙️ System Control

Explore More Engineering Projects

JARVIS AI Multi-Model Assistant

01 System Overview

Project Vision

Multi-AI Architecture

Core Capabilities

02 System Architecture

03 Core Modules & Components

Model.py - Intent Classification Layer

Chatbot.py - General Conversation Handler

RealtimeSearchEngine.py - Live Information Retrieval

Automation.py - System & App Control

ImageGeneration.py - AI Image Creation

SpeechToText.py - Voice Recognition

TextToSpeech.py - Natural Voice Synthesis

04 Interactive System Demo

05 Technology Stack & Dependencies

Python Libraries

API Integration

System Features

06 Technical Achievements

Multi-AI Integration

Voice Intelligence

Task Automation Engine

07 Real-World Applications

💬 Conversational AI Assistant

🔍 Information Retrieval

🖼️ Content Creation

⚙️ System Control

Explore More Engineering Projects

JARVIS AI
Multi-Model Assistant