
🚀 LinkedIn Agent

A LinkedIn scraping agent that extracts contact information from LinkedIn profiles and company pages using Playwright browser automation, with optional AI summarization via OpenAI.


✨ What This Does

🚀 Quick Start (5 Minutes Setup)

Prerequisites

1. Clone and Setup

# Clone the repository
git clone https://github.com/Jensinjames/linkedin-agent.git
cd linkedin-agent

# Run the automated setup (creates .env, directories, installs dependencies)
make setup-dev

2. Install Python Dependencies

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install all required packages
cd backend
pip install -r requirements.txt

# Install Playwright browsers (required for web scraping)
playwright install

3. Configure Your Settings

# Edit the .env file with your API keys (optional)
nano .env

Optional settings in .env:

# Optional: Add your OpenAI API key for AI summarization
OPENAI_API_KEY=your-openai-api-key-here

# Optional: Add Apify token for enhanced proxy support
APIFY_TOKEN=your-apify-token-here
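
Because both keys are optional, any code that consumes them has to cope with their absence. The following is a minimal sketch of that pattern, not the project's actual configuration loader. The function name `load_settings` is hypothetical, and the sketch assumes the variables are already present in the process environment (how the project itself reads `.env` is not shown in this README).

```python
import os


def load_settings(env=None):
    """Return the two optional settings, using None for unset or empty values.

    `load_settings` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    return {
        # Only needed for AI summarization.
        "openai_api_key": env.get("OPENAI_API_KEY") or None,
        # Only needed for Apify proxy support.
        "apify_token": env.get("APIFY_TOKEN") or None,
    }


if __name__ == "__main__":
    for name, value in load_settings().items():
        print(f"{name}: {'set' if value else 'not set'}")
```

Accepting the environment as a parameter keeps the helper easy to test without touching real credentials.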

4. Test Your Setup

# Test with the example LinkedIn company page
source .venv/bin/activate
cd backend

# Copy example input to accessible location
cp ../examples/input.json ../storage/data/input.json

# Run the scraper
python simple_main.py ../storage/data/input.json

✅ You should see scraped LinkedIn data in JSON format!


📁 Project Structure

linkedin-agent/
├── 📁 backend/                   # Core LinkedIn scraping engine
│   ├── src/                     # Main Python source code
│   ├── tests/                   # Backend tests
│   ├── requirements.txt         # Python dependencies
│   ├── simple_main.py           # Simple execution mode
│   └── Dockerfile*              # Backend containers
├── 📁 frontend/                  # React admin dashboard
│   ├── src/                     # React components
│   └── package.json             # Frontend dependencies
├── 📁 infrastructure/            # Deployment & DevOps
│   ├── docker/                  # Docker configurations
│   ├── scripts/                 # Utility scripts
│   └── monitoring/              # Monitoring configs
├── 📁 examples/                  # Sample inputs & configs
│   ├── input.json               # Example LinkedIn URL
│   ├── input.csv                # Example CSV batch input
│   └── env.example              # Environment template
├── 📁 storage/                   # Runtime data (auto-created)
│   ├── data/jobs/               # Job results
│   └── data/logs/               # Application logs
├── .env                         # Your API keys (auto-created)
├── Makefile                     # Development commands
└── README.md                    # This file

🎯 Usage Examples

Basic Usage: Scrape a LinkedIn Company Page

# 1. Activate your virtual environment
source .venv/bin/activate
cd backend

# 2. Create input file with LinkedIn URL
echo '{
  "query": "https://www.linkedin.com/company/microsoft/",
  "maxDepth": 2,
  "includeSocials": true
}' > ../storage/data/my_input.json

# 3. Run the scraper
python simple_main.py ../storage/data/my_input.json

Expected Output:

{
  "url": "https://www.linkedin.com/company/microsoft/",
  "contacts": [
    {
      "name": "Microsoft",
      "title": "Microsoft | LinkedIn",
      "company": null,
      "location": "Redmond, Washington",
      "emails": [],
      "phones": [],
      "social_links": {
        "linkedin.com": "https://www.linkedin.com/company/microsoft/"
      },
      "linkedin_url": "https://www.linkedin.com/company/microsoft/",
      "website": "https://www.microsoft.com",
      "description": "Microsoft is a technology company..."
    }
  ]
}
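
Downstream code usually needs only a few fields from this structure. As a rough sketch, assuming the output keeps the shape shown above (a top-level `contacts` list whose entries carry `name`, `website`, `emails`, and `phones`), a result could be summarized like this. The helper name `summarize_contacts` is hypothetical.

```python
import json


def summarize_contacts(result):
    """Return one short summary dict per contact in a scraper result."""
    summaries = []
    for contact in result.get("contacts", []):
        summaries.append({
            "name": contact.get("name"),
            "website": contact.get("website"),
            # Treat missing or null lists as empty.
            "email_count": len(contact.get("emails") or []),
            "phone_count": len(contact.get("phones") or []),
        })
    return summaries


# A trimmed copy of the expected output shown above.
sample = json.loads("""
{
  "url": "https://www.linkedin.com/company/microsoft/",
  "contacts": [
    {"name": "Microsoft", "emails": [], "phones": [],
     "website": "https://www.microsoft.com"}
  ]
}
""")
print(summarize_contacts(sample))
```

Using `.get()` throughout keeps the sketch tolerant of contacts that omit a field, since the README does not state which fields are guaranteed.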

Batch Processing with CSV Files

# 1. From the backend/ directory, create a CSV file with multiple LinkedIn URLs
cd backend
echo "linkedin_url,company_name
https://www.linkedin.com/company/microsoft/,Microsoft
https://www.linkedin.com/company/google/,Google" > ../storage/data/companies.csv

# 2. Process the batch (requires additional setup)
python -m src.cli ../storage/data/companies.csv

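For larger batches, generating the CSV programmatically avoids quoting mistakes. The sketch below writes the same two columns used in the shell example (`linkedin_url`, `company_name`) with Python's standard `csv` module. The helper name `write_batch_csv` is hypothetical, and whether the batch processor accepts additional columns is not stated here.

```python
import csv
import io

# Column names taken from the shell example above.
FIELDNAMES = ["linkedin_url", "company_name"]


def write_batch_csv(rows, handle):
    """Write a header plus one line per row to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"linkedin_url": "https://www.linkedin.com/company/microsoft/",
     "company_name": "Microsoft"},
    {"linkedin_url": "https://www.linkedin.com/company/google/",
     "company_name": "Google"},
]
buffer = io.StringIO()
write_batch_csv(rows, buffer)
print(buffer.getvalue())
```

The `csv` module quotes values automatically, so a company name that contains a comma still lands in a single column.
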
API Mode (Advanced)

# 1. Start the API server (requires Redis)
cd backend
python src/server.py

# 2. In another terminal, submit a job
curl -X POST "http://localhost:8000/submit" \
  -F "owner_email=you@example.com" \
  -F "input_file=@../examples/input.json"

# 3. Check job status
curl "http://localhost:8000/status/{job_id}"
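
The status check can also be scripted. The following is a sketch using only the standard library, with the base URL and the `/status/{job_id}` path taken from the curl commands above. The response format is not documented in this README, so the completion test is left to a caller-supplied `is_done` function. The names `status_url`, `fetch_status`, and `wait_for` are hypothetical.

```python
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # taken from the curl examples above


def status_url(job_id, base_url=BASE_URL):
    """Build the status endpoint URL for a job."""
    return f"{base_url.rstrip('/')}/status/{job_id}"


def fetch_status(job_id, base_url=BASE_URL, timeout=10):
    """Return the raw response body as text (schema is an unknown here)."""
    with urllib.request.urlopen(status_url(job_id, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def wait_for(job_id, is_done, fetch=fetch_status, attempts=10, delay=2.0):
    """Poll until is_done(body) is true; return that body, or None if attempts run out."""
    for attempt in range(attempts):
        body = fetch(job_id)
        if is_done(body):
            return body
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```

Passing `fetch` as a parameter lets the polling loop be exercised without a running server.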

✨ Key Features

🛠️ Development Commands

# Show all available commands
make help

# Backend development
make dev                    # Start backend services
make backend-test          # Run backend tests
make backend-lint          # Run backend linting

# Frontend development
make frontend-dev          # Start frontend development
make frontend-test         # Run frontend tests
make frontend-lint         # Run frontend linting

# Full stack development
make fullstack-dev         # Start both backend and frontend

# Production
make deploy                # Deploy to production
make stop                  # Stop all services
make clean                 # Clean up containers

# Utilities
make status                # Check service status
make logs                  # View service logs
make backup                # Create backup
make health                # Health check

📚 Documentation

🔧 Configuration

Environment Variables

Copy the example environment file and configure:

cp examples/env.example .env

Key variables:

Input Formats

The system supports multiple input formats:

  1. JSON Input: Direct LinkedIn URLs
  2. CSV/Excel: Batch processing with LinkedIn URLs
  3. REST API: Programmatic job submission

Excel Templates

For Excel batch processing, use the provided templates:

Excel files must have tabs named:

See docs/EXCEL_FORMAT.md for detailed format requirements.

See examples/ directory for sample inputs.
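
A JSON input file can also be generated from Python rather than typed by hand. This sketch reuses the field names from the Basic Usage example (`query`, `maxDepth`, `includeSocials`). The helpers `build_input` and `write_input` are hypothetical, and any fields beyond these three are not covered.

```python
import json
import tempfile
from pathlib import Path


def build_input(query, max_depth=2, include_socials=True):
    """Build a JSON-input dict using the field names from the Basic Usage example."""
    return {
        "query": query,
        "maxDepth": max_depth,
        "includeSocials": bool(include_socials),
    }


def write_input(path, payload):
    """Write the payload as pretty-printed JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return path


# Usage: write to a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    written = write_input(
        Path(tmp) / "my_input.json",
        build_input("https://www.linkedin.com/company/microsoft/"),
    )
    content = written.read_text(encoding="utf-8")
    print(content)
```

In a real checkout the target path would be something like `storage/data/my_input.json`, matching the earlier examples.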

🚦 Usage Examples

The LinkedIn Agent supports multiple execution modes depending on your needs:

Execution Modes

  1. CLI Mode (Direct execution)
    cd backend
    python -m src.cli ../examples/input.json

  2. API Mode (REST server)
    cd backend
    python src/server.py  # Requires Redis running

  3. Worker Mode (Queue processing)
    cd backend
    python src/worker.py  # Requires Redis running

  4. Simple Mode (No external dependencies)
    cd backend
    python simple_main.py ../examples/input.json

REST API Examples

# Submit job
curl -F "owner_email=user@example.com" \
  -F "input_file=@examples/input.csv" \
  http://localhost:8000/submit

# Check status
curl http://localhost:8000/status/1

# Download results
curl -OJ http://localhost:8000/result/1

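The submit call sends a multipart form with two parts, `owner_email` and `input_file`. For readers who want to script it without curl, here is a sketch that builds an equivalent request using only the Python standard library. The helper names `encode_multipart` and `build_submit_request` are hypothetical, the `application/octet-stream` part type is an assumption, and the server's response format is not documented here, so the sketch stops at constructing the request.

```python
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Encode text fields and file parts as multipart/form-data.

    fields: {name: str}; files: {name: (filename, bytes)}.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    for name, (filename, data) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_submit_request(owner_email, filename, data,
                         base_url="http://localhost:8000"):
    """Build a POST request for /submit with the two form parts used by curl."""
    body, content_type = encode_multipart(
        {"owner_email": owner_email},
        {"input_file": (filename, data)},
    )
    return urllib.request.Request(
        f"{base_url}/submit",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


request = build_submit_request("user@example.com", "input.csv", b"linkedin_url\n")
print(request.get_method(), request.full_url)
```

With the API server running, passing `request` to `urllib.request.urlopen` would send it.
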

Batch Processing

cd backend

# Enhanced processor with multi-tab support and URL validation
./src/batch_scrape_excel_enhanced.sh ../examples/sample_input.xlsx ../examples/input.json

# Legacy processor (single sheet only)
./src/batch_scrape_excel.sh ../examples/sample_input.xlsx ../examples/input.json
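
The enhanced processor is described as validating URLs, but its rules are not listed in this README. As an illustration only, a basic check might look like the sketch below. The allowed hosts and the `/company/` and `/in/` path prefixes are assumptions, and `is_linkedin_url` is a hypothetical name, not a function from the project.

```python
from urllib.parse import urlparse

# Assumptions for illustration; the project's real rules may differ.
ALLOWED_HOSTS = {"linkedin.com", "www.linkedin.com"}
ALLOWED_PREFIXES = ("/company/", "/in/")


def is_linkedin_url(url):
    """Return True for http(s) LinkedIn company or profile URLs."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    # Compare the exact hostname so look-alike domains are rejected.
    if (parsed.hostname or "").lower() not in ALLOWED_HOSTS:
        return False
    return parsed.path.startswith(ALLOWED_PREFIXES)


for candidate in (
    "https://www.linkedin.com/company/microsoft/",
    "https://linkedin.com.evil.example/in/someone",
):
    print(candidate, is_linkedin_url(candidate))
```

Matching on the parsed hostname rather than a substring avoids accepting URLs that merely contain "linkedin.com" somewhere in them.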

🏗️ Architecture Overview

Backend Components

Frontend Components

Infrastructure

🔒 Security

📊 Monitoring & Health

Health Endpoints

Monitoring

🚀 Deployment

Development

make setup-dev
make dev

Production

make setup-prod
make deploy

Docker Compose

# Development
cd infrastructure/docker
docker-compose up -d

# Production
docker-compose -f docker-compose.prod.yml up -d

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

Development Workflow

# Setup development environment (FIRST TIME ONLY)
# This installs all dependencies and creates .env file
make setup-dev

# Edit .env with your API keys (optional, as noted in Quick Start step 3)
nano .env

# Start development services
make dev

# Make changes in backend/src/ or frontend/src/

# Test your changes
make backend-test
make frontend-test

# Format code
make backend-lint
make frontend-lint

# Start frontend development server (optional, in new terminal)
make frontend-dev

# Commit and push
git add .
git commit -m "feat: your feature description"
git push origin feature/your-feature

📄 License

MIT License - see LICENSE file for details.

🆘 Support


🎉 Welcome to the clean, organized LinkedIn Agent!

This restructured project makes development, deployment, and maintenance much easier. The separation of concerns and clear documentation will help you get up and running quickly.