A powerful LinkedIn scraping agent that extracts contact information from LinkedIn profiles and company pages using AI-powered automation.
```bash
# Clone the repository
git clone https://github.com/Jensinjames/linkedin-agent.git
cd linkedin-agent

# Run the automated setup (creates .env, directories, installs dependencies)
make setup-dev
```
```bash
# Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install all required packages
cd backend
pip install -r requirements.txt

# Install Playwright browsers (required for web scraping)
playwright install
```
```bash
# Edit the .env file with your API keys (optional)
nano .env
```
Settings in `.env` (both are optional):

```bash
# Optional: add your OpenAI API key for AI summarization
OPENAI_API_KEY=your-openai-api-key-here

# Optional: add your Apify token for enhanced proxy support
APIFY_TOKEN=your-apify-token-here
```
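Since both keys are optional, code that uses them should degrade gracefully when they are absent. A minimal sketch of that pattern (the variable names match the `.env` entries above; the fallback behavior shown is an assumption, not the project's documented logic):

```python
import os

def get_optional_keys() -> dict:
    """Read the optional API keys from the environment, defaulting to None."""
    return {
        "openai_api_key": os.environ.get("OPENAI_API_KEY"),  # enables AI summarization
        "apify_token": os.environ.get("APIFY_TOKEN"),        # enables proxy support
    }

keys = get_optional_keys()
if keys["openai_api_key"] is None:
    print("No OpenAI key set; AI summarization will be skipped")
```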
```bash
# Test with the example LinkedIn company page
source .venv/bin/activate
cd backend

# Copy the example input to an accessible location
cp ../examples/input.json ../storage/data/input.json

# Run the scraper
python simple_main.py ../storage/data/input.json
```
✅ You should see scraped LinkedIn data in JSON format!
```
linkedin-agent/
├── 📁 backend/              # Core LinkedIn scraping engine
│   ├── src/                 # Main Python source code
│   ├── tests/               # Backend tests
│   ├── requirements.txt     # Python dependencies
│   ├── simple_main.py       # Simple execution mode
│   └── Dockerfile*          # Backend containers
├── 📁 frontend/             # React admin dashboard
│   ├── src/                 # React components
│   └── package.json         # Frontend dependencies
├── 📁 infrastructure/       # Deployment & DevOps
│   ├── docker/              # Docker configurations
│   ├── scripts/             # Utility scripts
│   └── monitoring/          # Monitoring configs
├── 📁 examples/             # Sample inputs & configs
│   ├── input.json           # Example LinkedIn URL
│   ├── input.csv            # Example CSV batch input
│   └── env.example          # Environment template
├── 📁 storage/              # Runtime data (auto-created)
│   ├── data/jobs/           # Job results
│   └── data/logs/           # Application logs
├── .env                     # Your API keys (auto-created)
├── Makefile                 # Development commands
└── README.md                # This file
```
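The `storage/` tree is created automatically at runtime. A minimal sketch of the idea (the directory names come from the layout above; the helper itself is hypothetical, not the project's actual setup code):

```python
from pathlib import Path

def ensure_storage(root: str = "storage") -> list:
    """Create the runtime data directories if they don't already exist."""
    dirs = [Path(root) / "data" / "jobs", Path(root) / "data" / "logs"]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)  # idempotent: safe to run on every startup
    return dirs
```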
```bash
# 1. Activate your virtual environment
source .venv/bin/activate
cd backend

# 2. Create an input file with a LinkedIn URL
echo '{
  "query": "https://www.linkedin.com/company/microsoft/",
  "maxDepth": 2,
  "includeSocials": true
}' > ../storage/data/my_input.json

# 3. Run the scraper
python simple_main.py ../storage/data/my_input.json
```
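Instead of a shell heredoc, the same input file can be written from Python. The keys mirror the JSON shown above; whether the scraper accepts any additional fields is an assumption:

```python
import json
from pathlib import Path

def write_input(path: str, url: str, max_depth: int = 2,
                include_socials: bool = True) -> dict:
    """Write a scraper input file in the format above and return the payload."""
    payload = {"query": url, "maxDepth": max_depth, "includeSocials": include_socials}
    Path(path).write_text(json.dumps(payload, indent=2))
    return payload
```

Usage would look like `write_input("../storage/data/my_input.json", "https://www.linkedin.com/company/microsoft/")` before invoking `simple_main.py`.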
Expected output of the scraper:

```json
{
  "url": "https://www.linkedin.com/company/microsoft/",
  "contacts": [
    {
      "name": "Microsoft",
      "title": "Microsoft | LinkedIn",
      "company": null,
      "location": "Redmond, Washington",
      "emails": [],
      "phones": [],
      "social_links": {
        "linkedin.com": "https://www.linkedin.com/company/microsoft/"
      },
      "linkedin_url": "https://www.linkedin.com/company/microsoft/",
      "website": "https://www.microsoft.com",
      "description": "Microsoft is a technology company..."
    }
  ]
}
```
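Downstream code can consume this result directly. A minimal sketch that reduces a result dict shaped like the example above to a few fields per contact (the field names are taken from the sample output; real results may vary):

```python
def summarize_contacts(result: dict) -> list:
    """Reduce a scraper result to name / website / email-count per contact."""
    summary = []
    for c in result.get("contacts", []):
        summary.append({
            "name": c.get("name"),
            "website": c.get("website"),
            "email_count": len(c.get("emails", [])),
        })
    return summary
```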
```bash
# 1. Create a CSV file with multiple LinkedIn URLs
echo "linkedin_url,company_name
https://www.linkedin.com/company/microsoft/,Microsoft
https://www.linkedin.com/company/google/,Google" > ../storage/data/companies.csv

# 2. Process the batch (requires additional setup)
cd backend
python -m src.cli ../storage/data/companies.csv
```
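The CSV columns above can also be expanded into per-row scraper payloads in Python. A sketch, assuming each row maps to the same `query` field used in single-URL mode (the CLI's actual CSV handling may differ):

```python
import csv
import io

def rows_to_inputs(csv_text: str) -> list:
    """Turn a linkedin_url,company_name CSV into a list of scraper input payloads."""
    inputs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        inputs.append({
            "query": row["linkedin_url"].strip(),
            "company_name": row["company_name"].strip(),  # carried along for reporting
        })
    return inputs
```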
```bash
# 1. Start the API server (requires Redis)
cd backend
python src/server.py

# 2. In another terminal, submit a job
curl -X POST "http://localhost:8000/submit" \
  -F "owner_email=you@example.com" \
  -F "input_file=@../examples/input.json"

# 3. Check the job status
curl "http://localhost:8000/status/{job_id}"
```
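A client typically polls `/status/{job_id}` until the job reaches a terminal state. A sketch with the HTTP call injected as a callable so the loop itself is testable; the `status` field values here are assumptions, not the server's documented schema:

```python
import time
from typing import Callable

def wait_for_job(job_id: str, fetch_status: Callable,
                 poll_seconds: float = 2.0, max_polls: int = 30) -> dict:
    """Poll fetch_status(job_id) until a terminal state or the poll budget runs out."""
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

In practice `fetch_status` would wrap an HTTP GET to `http://localhost:8000/status/{job_id}` and decode the JSON body.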
```bash
# Show all available commands
make help

# Backend development
make dev              # Start backend services
make backend-test     # Run backend tests
make backend-lint     # Run backend linting

# Frontend development
make frontend-dev     # Start frontend development
make frontend-test    # Run frontend tests
make frontend-lint    # Run frontend linting

# Full-stack development
make fullstack-dev    # Start both backend and frontend

# Production
make deploy           # Deploy to production
make stop             # Stop all services
make clean            # Clean up containers

# Utilities
make status           # Check service status
make logs             # View service logs
make backup           # Create a backup
make health           # Health check
```
Copy the example environment file and configure it:

```bash
cp examples/env.example .env
```
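If you prefer not to depend on a dotenv library, a `.env` file in simple `KEY=value` form can be parsed in a few lines. A minimal sketch, assuming the file follows the plain format shown in this README (no quoting or `export` support):

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```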
Key variables:

- `OPENAI_API_KEY` - OpenAI API key for LLM features
- `APIFY_TOKEN` - Apify token for proxy and platform features
- `SUPABASE_JWT_SECRET` - JWT secret for authentication

The system supports multiple input formats:
For Excel batch processing, use the provided templates:

- `examples/linkedin_template.xlsx` - Empty template with the correct structure
- `examples/sample_input.xlsx` - Example data showing the proper format

Excel files must have tabs named:

- `Company_Profiles` - For LinkedIn company pages
- `Individual_Profiles` - For LinkedIn personal profiles

See `docs/EXCEL_FORMAT.md` for detailed format requirements.
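Before submitting a workbook, you can check that the required tabs are present. A sketch of that check given a list of sheet names (e.g. from openpyxl's `wb.sheetnames`); the tab names come from the list above, and the assumption that both tabs must exist is mine:

```python
REQUIRED_TABS = {"Company_Profiles", "Individual_Profiles"}

def missing_tabs(sheet_names: list) -> set:
    """Return the required Excel tab names absent from a workbook."""
    return REQUIRED_TABS - set(sheet_names)
```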
See the `examples/` directory for sample inputs.
The LinkedIn Agent supports multiple execution modes depending on your needs:
```bash
# CLI mode
cd backend
python -m src.cli ../examples/input.json

# API server (requires Redis running)
cd backend
python src/server.py

# Background worker (requires Redis running)
cd backend
python src/worker.py
```
```bash
cd backend
python simple_main.py ../examples/input.json
```
### REST API
Examples

```bash
curl -F "owner_email=user@example.com" \
     -F "input_file=@examples/input.csv" \
     http://localhost:8000/submit
curl http://localhost:8000/status/1
curl -OJ http://localhost:8000/result/1
```
### Batch Processing
```bash
cd backend

# Enhanced processor with multi-tab support and URL validation
./src/batch_scrape_excel_enhanced.sh ../examples/sample_input.xlsx ../examples/input.json

# Legacy processor (single sheet only)
./src/batch_scrape_excel.sh ../examples/sample_input.xlsx ../examples/input.json
```
- `/health` - Full system health check
- `/health/simple` - Simple health check
- `/health/ready` - Kubernetes readiness probe
- `/health/live` - Kubernetes liveness probe

```bash
# Development
make setup-dev
make dev

# Production
make setup-prod
make deploy
```
```bash
# Development
cd infrastructure/docker
docker-compose up -d

# Production
docker-compose -f docker-compose.prod.yml up -d
```
```bash
# Set up the development environment (FIRST TIME ONLY)
# This installs all dependencies and creates the .env file
make setup-dev

# Edit .env with your API keys (REQUIRED)
nano .env

# Start development services
make dev

# Make changes in backend/src/ or frontend/src/

# Test your changes
make backend-test
make frontend-test

# Format code
make backend-lint
make frontend-lint

# Start the frontend development server (optional, in a new terminal)
make frontend-dev

# Commit and push
git add .
git commit -m "feat: your feature description"
git push origin feature/your-feature
```
MIT License - see LICENSE file for details.
- `docs/` directory
- `examples/` directory for usage examples

🎉 Welcome to the clean, organized LinkedIn Agent!
This restructured project makes development, deployment, and maintenance much easier. The separation of concerns and clear documentation will help you get up and running quickly.