Troubleshooting¶
Startup Issues¶
Docker Not Working¶
Symptom: Aspire cannot start containers
Solution:
- Check if Docker Desktop is running
- Check Docker resources (min. 4GB RAM)
- Restart Docker Desktop
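The first two checks can be scripted. The following is a minimal sketch, assuming the `docker` CLI is on the PATH; the function name `check_docker` is hypothetical, and the 4 GiB figure comes from the list above.

```shell
# Hypothetical helper: report whether the Docker daemon is reachable
# and how much memory it has been given.
check_docker() {
  # `docker info` exits non-zero when the daemon is not running.
  if ! docker info > /dev/null 2>&1; then
    echo "Docker daemon is not reachable; start Docker Desktop"
    return 1
  fi
  # MemTotal is reported in bytes; convert it to whole GiB.
  local bytes
  bytes=$(docker info --format '{{.MemTotal}}')
  echo "Docker memory: $((bytes / 1024 / 1024 / 1024)) GiB (4 GiB minimum recommended)"
}
```

Run `check_docker` in the same shell; a non-zero exit status means the daemon could not be reached.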
Ports Already in Use¶
Symptom: Address already in use
Solution:
# Windows - find process
netstat -ano | findstr :5000
taskkill /PID <pid> /F
# Linux/Mac
lsof -i :5000
kill -9 <pid>
Keycloak Not Starting¶
Symptom: Keycloak health check fails
Solution:
- Check logs in Aspire Dashboard
- Keycloak needs a few minutes on first start
- Check if port 8080 is free
Database Issues¶
Migrations Not Applying¶
Symptom: relation does not exist
Solution:
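One possible way to apply pending migrations is sketched below. It assumes the API uses EF Core migrations and that the `dotnet-ef` tool is installed; neither is confirmed by this page. The function name and the project path are hypothetical.

```shell
# Assumption: the schema is managed with EF Core migrations.
# Argument: path to the project that contains the migrations.
apply_migrations() {
  local project="$1"
  # List the known migrations first, then apply them to the database.
  dotnet ef migrations list --project "$project" &&
    dotnet ef database update --project "$project"
}
# Usage (hypothetical path): apply_migrations src/Api
```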
Connection Refused¶
Symptom: Npgsql.NpgsqlException: Connection refused
Solution:
- Check if PostgreSQL container is running
- Check connection string
- Check network connectivity
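A minimal sketch of the first and third checks follows. It assumes the container name contains `postgres`, that the default port 5432 is used, and that the PostgreSQL client tools (which provide `pg_isready`) are installed on the host. The function name is hypothetical.

```shell
# Hypothetical helper. Arguments: host (default localhost), port (default 5432).
check_postgres() {
  # Is a PostgreSQL container running? (the name filter is an assumption)
  docker ps --filter name=postgres --format '{{.Names}}: {{.Status}}'
  # Does the server accept connections on the given host and port?
  pg_isready -h "${1:-localhost}" -p "${2:-5432}"
}
```

If `pg_isready` reports that the server is accepting connections but the API still fails, the connection string is the more likely culprit.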
Authentication Issues¶
401 Unauthorized¶
Symptom: API returns 401
Checklist:
- Is the token valid? (check expiration)
- Is the token being sent in the header?
- Is Keycloak accessible?
- Is the audience in the token correct?
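To check the expiration and audience, the token payload can be decoded locally. This is a sketch only: the function name `jwt_payload` is hypothetical, it does not verify the signature, and it assumes a `base64` that accepts `-d` (GNU coreutils and recent macOS).

```shell
# Print the JSON payload (second segment) of a JWT.
# This only decodes the token; it does not verify the signature.
jwt_payload() {
  local payload
  # Take the middle segment and convert base64url to standard base64.
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # Restore the '=' padding that base64url omits.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}
```

For example, `jwt_payload "$TOKEN"` prints the claims. Compare `exp` (seconds since the Unix epoch) with `date +%s`, and check that `aud` contains the audience the API expects.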
403 Forbidden¶
Symptom: API returns 403
Checklist:
- Does the user have the required roles?
- Does the endpoint require specific permissions?
Token Expired¶
Symptom: Frequent 401 after a few minutes
Solution:
- Check refresh token implementation
- Increase token lifespan in Keycloak (dev only)
Frontend Issues¶
CORS Errors¶
Symptom: Access-Control-Allow-Origin error
Solution:
- Check CORS configuration in API
- Make sure the origin is allowed
- Check if credentials are handled correctly
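A preflight request sent with `curl` shows which CORS headers the API actually returns. The sketch below assumes you supply the API URL and the frontend origin yourself; the function name is hypothetical.

```shell
# Send a CORS preflight request and print only the CORS response headers.
# Arguments: API URL, frontend origin (both are values you supply).
cors_preflight() {
  curl -s -o /dev/null -D - -X OPTIONS \
    -H "Origin: $2" \
    -H "Access-Control-Request-Method: GET" \
    "$1" | grep -i '^access-control-'
}
# Usage: cors_preflight <api-url> <frontend-origin>
```

If the frontend sends credentials, the response must echo the exact origin in Access-Control-Allow-Origin (a wildcard is rejected by browsers) and include Access-Control-Allow-Credentials: true.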
Query Not Refreshing Data¶
Symptom: Stale data after mutation
Solution:
// Invalidate in onSettled so the query refetches whether the mutation succeeds or fails
onSettled: () => {
  void queryClient.invalidateQueries({ queryKey: ['questions'] });
}
Logs and Debugging¶
Aspire Dashboard¶
URL: https://localhost:17014
- Traces - request traces across services
- Logs - logs from all services
- Metrics - metrics reported by the services
API Logs¶
Frontend Logs¶
- Open DevTools (F12)
- Console tab - JS errors
- Network tab - HTTP requests
Production Deployment Issues¶
Traefik Not Running¶
Symptom: All sites return ERR_CONNECTION_REFUSED or ERR_CONNECTION_TIMED_OUT
Diagnose:
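A minimal sketch of the checks, reusing commands from the deploy checklist elsewhere on this page. The function name is hypothetical, and the container is assumed to be named `traefik`.

```shell
# Hypothetical helper that gathers the basic Traefik diagnostics.
diagnose_traefik() {
  # Is the Traefik container running? (-a also lists stopped containers)
  docker ps -a --filter name=traefik
  # Most recent Traefik log lines
  docker logs traefik --tail 20
  # Which processes are listening on ports 80 and 443?
  ss -tlnp | grep -E ':(80|443) '
}
```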
Common causes:
Traefik container not started¶
Port conflict (another process on 80/443)¶
If another process holds the port, stop it before starting Traefik.
acme.json permissions wrong¶
Traefik requires acme.json to have mode 600:
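The permission fix can be sketched as follows. The function name is hypothetical, and the path in the usage comment is an assumption based on the /opt/traefik directory used elsewhere on this page.

```shell
# Restrict acme.json to owner read/write and print the resulting mode line.
fix_acme_permissions() {
  chmod 600 "$1" && ls -l "$1"
}
# Usage (assumed path): fix_acme_permissions /opt/traefik/acme.json
```

The listing should start with `-rw-------`. Restart Traefik afterwards so it picks up the file.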
SSL Certificate Not Generated¶
Symptom: Browser shows "Your connection is not private" / NET::ERR_CERT_AUTHORITY_INVALID
Diagnose:
Common causes:
- DNS not propagated: the domain does not yet resolve to the VPS IP
- Ports 80/443 blocked by a firewall
- Let's Encrypt rate limit: if you've requested too many certificates for the same domain in a short period, wait 1 hour and try again. See the Let's Encrypt rate limits documentation.
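The DNS cause can be checked by comparing what the domain resolves to with the VPS address. This is a sketch that assumes `dig` is installed; the function name is hypothetical, and the domain and IP are values you supply.

```shell
# Compare the A record of a domain with the expected VPS IP.
check_dns() {
  local domain="$1" expected_ip="$2" resolved
  # With +short, any CNAME chain is printed first; the last line is the address.
  resolved=$(dig +short "$domain" A | tail -n 1)
  if [ "$resolved" = "$expected_ip" ]; then
    echo "OK: $domain -> $resolved"
  else
    echo "MISMATCH: $domain -> ${resolved:-no answer} (expected $expected_ip)"
    return 1
  fi
}
# Usage: check_dns <domain> <vps-ip>
```

For the firewall cause, `ufw status` on the VPS should list ports 80 and 443 as ALLOW.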
Solution: Fix the underlying cause, then restart Traefik (docker restart traefik) so it retries the certificate request.
Container Not Discovered by Traefik¶
Symptom: DNS resolves correctly, Traefik is running, but a specific service returns 404 page not found
Diagnose:
# Check if the container is running and has Traefik labels
docker inspect <container_name> | grep -A 5 "traefik"
# Check Traefik sees the router
docker logs traefik 2>&1 | grep "<router_name>"
Common causes:
- Container not on the proxy network: all Traefik-discovered services must be on the shared proxy network
- Missing traefik.enable=true label: check that docker-compose.traefik.yml has the label
- Wrong port in loadbalancer label: the traefik.http.services.*.loadbalancer.server.port value must match the container's internal port
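The network cause can be checked, and temporarily worked around, as sketched below. The function names are hypothetical. `docker network connect` only changes the running container, so the lasting fix still belongs in the compose file.

```shell
# List the networks a container is attached to, separated by spaces.
container_networks() {
  docker inspect --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}' "$1"
}

# Attach the container to the shared proxy network if it is not on it yet.
ensure_on_proxy() {
  case " $(container_networks "$1") " in
    *" proxy "*) echo "$1 is already on proxy" ;;
    *) docker network connect proxy "$1" ;;
  esac
}
# Usage: ensure_on_proxy <container_name>
```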
Docker Compose: Service Has No Image¶
Symptom:
The "SOME_IMAGE" variable is not set. Defaulting to a blank string.
service "service-name" has neither an image nor a build context specified
Solution: The .env file is missing a required image variable. Check that .env defines every image variable referenced in docker-compose.yaml.
The variable names must match what Aspire generates in docker-compose.yaml. Aspire derives variable names from resource names by uppercasing and replacing - with _, then appending _IMAGE. For example, resource recron-web becomes RECRON_WEB_IMAGE.
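The naming rule above can be sketched in shell and used to check a .env file. The function names are hypothetical, and the .env location in the usage comment is an assumption based on the /opt/recron directory used elsewhere on this page.

```shell
# Derive the image variable name from an Aspire resource name:
# uppercase, '-' replaced by '_', then '_IMAGE' appended.
image_var_name() {
  printf '%s_IMAGE\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# Report whether the given .env file defines the variable for a resource.
check_image_var() {
  local var
  var=$(image_var_name "$1")
  if grep -q "^${var}=" "$2"; then
    echo "$var is set"
  else
    echo "$var is missing"
    return 1
  fi
}
# Usage (assumed path): check_image_var recron-web /opt/recron/.env
```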
Site Not Loading After Deploy¶
Checklist:
- Check all containers are running: docker compose ps
- Check Traefik is running: docker ps --filter name=traefik
- Check Traefik logs for errors: docker logs traefik --tail 20
- Check if ports 80/443 are bound: ss -tlnp | grep -E ':80|:443'
- Test locally on the VPS: curl -k https://localhost
- Check firewall: ufw status (ports 80, 443 must be ALLOW)
- Check the proxy network exists: docker network ls | grep proxy
Recreating Containers¶
If a container is stuck or has cached config:
cd /opt/recron
# Recreate a specific service
docker compose -f docker-compose.yaml -f docker-compose.traefik.yml up -d --force-recreate <service-name>
# Recreate all Recron services
docker compose -f docker-compose.yaml -f docker-compose.traefik.yml up -d --force-recreate
# Restart Traefik (if routing issues)
docker restart traefik
Full VPS Reset¶
When the VPS Docker environment is corrupted (broken image layers, failed extracts, stale containers) and individual fixes don't help, a full reset is the fastest path forward.
This destroys all data
This procedure removes all containers, images, volumes, and deployment files from the VPS. Only use this when you have no important data on the server, or when all state can be recreated by re-running pipelines.
Step 1: Stop everything and clean up¶
# Stop all project stacks (with volumes)
cd /opt/keycloak && docker compose down -v 2>/dev/null || true
cd /opt/traefik && docker compose down -v 2>/dev/null || true
cd /opt/recron && docker compose -f docker-compose.yaml -f docker-compose.traefik.yml down -v 2>/dev/null || true
cd /opt/velvet && docker compose down -v 2>/dev/null || true
# Remove all Docker artifacts (images, build cache, orphan networks)
docker system prune --all -f
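# Remove orphan volumes: compose down -v only removes the volumes of the stacks above,
# so volumes left behind by previous runs may remain. On Docker 23+ this prunes
# anonymous volumes only; add --all to also remove unused named volumes.
docker volume prune -f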
# Remove deployment directories
rm -rf /opt/keycloak /opt/traefik /opt/recron /opt/velvet
# Clear leftover containerd overlayfs snapshots (Docker must be stopped first)
systemctl stop docker
rm -rf /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*
systemctl start docker
# Recreate the shared proxy network
docker network create proxy
Step 2: Verify clean state¶
docker ps -a # should be empty
docker images # should be empty
docker volume ls # should be empty
docker network ls # should show only: bridge, host, none, proxy
ls /opt/ # should not contain keycloak, traefik, recron, velvet
Step 3: Re-deploy via pipelines¶
Re-run the GitHub Actions workflows in this order (later deployments depend on earlier ones):
| Order | Repository | Workflow | Reason for order |
|---|---|---|---|
| 1 | Traefik | Deploy Traefik | Reverse proxy must be up for routing |
| 2 | Keycloak | Deploy Keycloak | Auth provider must be up for API auth |
| 3 | Recron | Build and Deploy | Depends on Traefik + Keycloak |
| 4 | VelvetUi | Deploy Demo | Independent, but needs Traefik |
To trigger each pipeline, either push to main or manually re-run the workflow from the GitHub Actions tab.
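The manual re-run can also be done from a terminal. The sketch below assumes the GitHub CLI (`gh`) is installed and authenticated; the function name is hypothetical, and the repository slug is a value you supply.

```shell
# Re-run the most recent run of a workflow.
# Arguments: repository as OWNER/REPO, workflow name or file name.
rerun_latest() {
  local repo="$1" workflow="$2" run_id
  # Look up the ID of the latest run of the workflow.
  run_id=$(gh run list --repo "$repo" --workflow "$workflow" --limit 1 \
    --json databaseId --jq '.[0].databaseId')
  gh run rerun "$run_id" --repo "$repo"
}
# Usage: rerun_latest <owner>/<repo> "Deploy Traefik"
```

Wait for each workflow to finish before starting the next one in the table.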
Step 4: Verify deployment¶
# Check all containers are running
docker ps
# Check endpoints
curl -sf https://app.bluebraces.online/ > /dev/null && echo "Frontend: OK"
curl -sf https://api.bluebraces.online/health > /dev/null && echo "API: OK"
curl -sf https://auth.bluebraces.online/realms/master > /dev/null && echo "Keycloak: OK"
curl -sf https://docs.bluebraces.online/ > /dev/null && echo "Docs: OK"
curl -sf https://velvet.bluebraces.online/ > /dev/null && echo "Velvet: OK"
SSL certificates
After a full reset, Traefik will re-request SSL certificates from Let's Encrypt. This happens automatically on the first HTTPS request to each domain. If you hit rate limits, wait 1 hour and try again.
Contact¶
In case of issues:
- Check logs in Aspire Dashboard
- Search issues in the repository
- Create a new issue with:
- Problem description
- Steps to reproduce
- Error logs
- System version