Troubleshooting

Startup Issues

Docker Not Working

Symptom: Aspire cannot start containers

Solution:

  1. Check if Docker Desktop is running
  2. Check that Docker has enough resources allocated (at least 4 GB RAM)
  3. Restart Docker Desktop

Ports Already in Use

Symptom: Address already in use

Solution:

# Windows - find process
netstat -ano | findstr :5000
taskkill /PID <pid> /F

# Linux/Mac
lsof -i :5000
kill -9 <pid>

Keycloak Not Starting

Symptom: Keycloak health check fails

Solution:

  1. Check logs in Aspire Dashboard
  2. Give it time: Keycloak can take a few minutes on its first start
  3. Check if port 8080 is free

Database Issues

Migrations Not Applying

Symptom: relation does not exist

Solution:

cd Questions.Api
dotnet ef database update

Connection Refused

Symptom: Npgsql.NpgsqlException: Connection refused

Solution:

  1. Check if PostgreSQL container is running
  2. Check connection string
  3. Check network connectivity
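
To narrow down steps 1 and 3, one option is to test whether anything accepts TCP connections on the database port. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command. The function name port_open is hypothetical, and the host localhost and port 5432 (the PostgreSQL default) in the example are assumptions, since the container may be mapped to a different host port:

```bash
# Succeeds (exit 0) if a TCP connection to host:port opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash, not plain sh.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (assumed host and port; take the real values from your connection string):
# port_open localhost 5432 && echo "reachable" || echo "refused or timed out"
```

A refused or timed-out connection points at the container or port mapping; a successful one suggests the problem is in the connection string or credentials instead.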

Authentication Issues

401 Unauthorized

Symptom: API returns 401

Checklist:

  1. Is the token valid? (check expiration)
  2. Is the token being sent in the header?
  3. Is Keycloak accessible?
  4. Is the audience in the token correct?
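
Items 1 and 4 can be checked by decoding the token payload locally. The following is a minimal sketch, assuming bash and a base64 that accepts -d (GNU coreutils). It does not verify the signature, the function names are hypothetical, and the claim lookup is a naive text match rather than a JSON parser:

```bash
# Print the decoded payload (second segment) of a JWT. No signature check.
jwt_payload() {
  local seg
  # Take the middle segment and convert base64url to standard base64.
  seg="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # Restore the '=' padding that base64url encoding strips.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
  printf '%s' "$seg" | base64 -d
}

# Print the first numeric value of a claim such as exp (naive text match).
jwt_numeric_claim() {
  jwt_payload "$1" \
    | grep -o "\"$2\"[[:space:]]*:[[:space:]]*[0-9][0-9]*" \
    | grep -o '[0-9][0-9]*$' \
    | head -n 1
}

# Example: TOKEN holds the raw access token, without the "Bearer " prefix.
# jwt_payload "$TOKEN"; echo    # inspect aud, iss, roles
# echo "expires in $(( $(jwt_numeric_claim "$TOKEN" exp) - $(date +%s) )) seconds"
```

A negative number of seconds means the token has already expired; the aud claim in the printed payload should match the audience the API is configured to accept.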

403 Forbidden

Symptom: API returns 403

Checklist:

  1. Does the user have the required roles?
  2. Does the endpoint require specific permissions?

Token Expired

Symptom: Requests start returning 401 a few minutes after login

Solution:

  1. Check refresh token implementation
  2. Increase token lifespan in Keycloak (dev only)

Frontend Issues

CORS Errors

Symptom: Access-Control-Allow-Origin error

Solution:

  1. Check CORS configuration in API
  2. Make sure the origin is allowed
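  3. If requests are sent with credentials (for example cookies), check that the API allows credentials and names the origin explicitly; browsers reject a wildcard origin in that case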

Query Not Refreshing Data

Symptom: Stale data after mutation

Solution:

// Invalidate in onSettled (it runs after both success and error) so the list is refetched
onSettled: () => {
  void queryClient.invalidateQueries({ queryKey: ['questions'] });
}

Logs and Debugging

Aspire Dashboard

URL: https://localhost:17014

  • Traces - distributed traces for each request
  • Logs - logs from all services
  • Metrics - metrics reported by each service

API Logs

# Read them in the Aspire console, or straight from the container:
docker logs <container-name>

Frontend Logs

  1. Open DevTools (F12)
  2. Console tab - JS errors
  3. Network tab - HTTP requests

Production Deployment Issues

Traefik Not Running

Symptom: All sites return ERR_CONNECTION_REFUSED or ERR_CONNECTION_TIMED_OUT

Diagnose:

docker ps --filter name=traefik
docker logs traefik --tail 20

Common causes:

Traefik container not started

cd /opt/traefik
docker compose up -d

Port conflict (another process on 80/443)

ss -tlnp | grep -E ':80|:443'

If another process holds the port, stop it before starting Traefik.

acme.json permissions wrong

Traefik requires acme.json to have mode 600:

chmod 600 /opt/traefik/acme.json
docker restart traefik
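
To avoid restarting Traefik when the permissions were already correct, the mode can be checked first. This is a minimal sketch, assuming GNU stat (the -c '%a' format, as found on a typical Linux VPS); the function name ensure_mode_600 is hypothetical:

```bash
# Check a file's permission bits and set them to 600 only if they differ.
ensure_mode_600() {
  local file="$1" mode
  mode="$(stat -c '%a' "$file")" || return 1
  if [ "$mode" != "600" ]; then
    chmod 600 "$file"
    echo "fixed: $file was $mode"
  else
    echo "ok: $file is 600"
  fi
}

# Example, using the path from this guide:
# ensure_mode_600 /opt/traefik/acme.json
```

Only a "fixed" result calls for docker restart traefik; an "ok" result means the cause lies elsewhere.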

SSL Certificate Not Generated

Symptom: Browser shows "Your connection is not private" / NET::ERR_CERT_AUTHORITY_INVALID

Diagnose:

docker logs traefik 2>&1 | grep -i "acme\|certificate\|challenge"

Common causes:

  1. DNS not propagated — the domain does not resolve to the VPS IP yet:
    nslookup app.bluebraces.online
    
  2. Ports 80/443 blocked by firewall:
    ufw status
    # Ensure 80 and 443 are ALLOW
    
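  3. Let's Encrypt rate limit: repeated failed validations or too many certificate requests for the same domain in a short period are throttled. The failed-validation limit clears within about an hour; limits on issued certificates use longer windows. Check the Let's Encrypt rate limits documentation for the limit you hit.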

Solution: Fix the underlying cause, then restart Traefik to retry:

docker restart traefik

Container Not Discovered by Traefik

Symptom: DNS resolves correctly, Traefik is running, but a specific service returns 404 page not found

Diagnose:

# Check if the container is running and has Traefik labels
docker inspect <container_name> | grep -A 5 "traefik"

# Check Traefik sees the router
docker logs traefik 2>&1 | grep "<router_name>"

Common causes:

  1. Container not on the proxy network — all Traefik-discovered services must be on the shared proxy network:
    docker network inspect proxy | grep <container_name>
    
  2. Missing traefik.enable=true label — check docker-compose.traefik.yml has the label
  3. Wrong port in loadbalancer label — the traefik.http.services.*.loadbalancer.server.port must match the container's internal port

Docker Compose: Service Has No Image

Symptom:

The "SOME_IMAGE" variable is not set. Defaulting to a blank string.
service "service-name" has neither an image nor a build context specified

Solution: The .env file is missing a required image variable. Check:

cat /opt/recron/.env

The variable names must match what Aspire generates in docker-compose.yaml. Aspire derives variable names from resource names by uppercasing and replacing - with _, then appending _IMAGE. For example, resource recron-web becomes RECRON_WEB_IMAGE.
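
The naming rule can be applied mechanically to find out which variable is missing. The sketch below implements only the rule as stated above (uppercase, - to _, append _IMAGE); the function names are hypothetical, and how Aspire treats other characters in resource names is not covered:

```bash
# Derive the variable name: uppercase, '-' becomes '_', then append _IMAGE.
image_var_name() {
  printf '%s_IMAGE\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# Report whether the .env file ($2) assigns a non-empty value for resource $1.
env_has_image() {
  local var
  var="$(image_var_name "$1")"
  if grep -q "^${var}=." "$2"; then
    echo "set: $var"
  else
    echo "missing: $var"
    return 1
  fi
}

# Example, using the path and resource name from this guide:
# env_has_image recron-web /opt/recron/.env
```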

Site Not Loading After Deploy

Checklist:

  1. Check all containers are running: docker compose ps
  2. Check Traefik is running: docker ps --filter name=traefik
  3. Check Traefik logs for errors: docker logs traefik --tail 20
  4. Check if ports 80/443 are bound: ss -tlnp | grep -E ':80|:443'
  5. Test locally on the VPS: curl -k https://localhost
  6. Check firewall: ufw status (ports 80, 443 must be ALLOW)
  7. Check the proxy network exists: docker network ls | grep proxy
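
Several of these checks can be run in one pass. The following is a sketch rather than a complete health check: it covers items 2, 4, 5 and 7 only, the function names check and run_deploy_checks are hypothetical, and the port check passes if either 80 or 443 is bound:

```bash
# Run one check quietly and print PASS or FAIL with its label.
check() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    return 1
  fi
}

# Run on the VPS. Returns non-zero if any check fails.
run_deploy_checks() {
  local failed=0
  check "Traefik container running" \
    sh -c '[ -n "$(docker ps -q --filter name=traefik)" ]' || failed=1
  check "port 80 or 443 bound" \
    sh -c "ss -tln | grep -qE ':(80|443) '" || failed=1
  check "HTTPS answers on localhost" \
    curl -ks -o /dev/null https://localhost || failed=1
  check "proxy network exists" \
    docker network inspect proxy || failed=1
  return "$failed"
}
```

Calling run_deploy_checks prints one line per check, so the first FAIL line shows where to start looking.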

Recreating Containers

If a container is stuck or is still running with stale configuration:

cd /opt/recron

# Recreate a specific service
docker compose -f docker-compose.yaml -f docker-compose.traefik.yml up -d --force-recreate <service-name>

# Recreate all Recron services
docker compose -f docker-compose.yaml -f docker-compose.traefik.yml up -d --force-recreate

# Restart Traefik (if routing issues)
docker restart traefik

Full VPS Reset

When the VPS Docker environment is corrupted (broken image layers, failed extracts, stale containers) and individual fixes don't help, a full reset is the fastest path forward.

This destroys all data

This procedure removes all containers, images, volumes, and deployment files from the VPS. Only use this when you have no important data on the server, or when all state can be recreated by re-running pipelines.

Step 1: Stop everything and clean up

ssh root@YOUR_VPS_IP
# Stop all project stacks (with volumes)
cd /opt/keycloak && docker compose down -v 2>/dev/null || true
cd /opt/traefik && docker compose down -v 2>/dev/null || true
cd /opt/recron && docker compose -f docker-compose.yaml -f docker-compose.traefik.yml down -v 2>/dev/null || true
cd /opt/velvet && docker compose down -v 2>/dev/null || true

# Remove all Docker artifacts (images, build cache, orphan networks)
docker system prune --all -f
# Remove leftover volumes. On Docker 23.0 and later, prune removes only anonymous
# volumes by default; add --all if unused named volumes remain.
docker volume prune -f

# Remove deployment directories
rm -rf /opt/keycloak /opt/traefik /opt/recron /opt/velvet

# Clear containerd overlayfs snapshots (Docker must be stopped first)
systemctl stop docker
rm -rf /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*
systemctl start docker

# Recreate the shared proxy network
docker network create proxy

Step 2: Verify clean state

docker ps -a          # should be empty
docker images         # should be empty
docker volume ls      # should be empty
docker network ls     # should show only: bridge, host, none, proxy
ls /opt/              # should not contain keycloak, traefik, recron, velvet

Step 3: Re-deploy via pipelines

Re-run GitHub Actions workflows in this order, so that each stack's dependencies are already running when it deploys:

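| Order | Repository | Workflow         | Why this order                                |
|-------|------------|------------------|-----------------------------------------------|
| 1     | Traefik    | Deploy Traefik   | Reverse proxy must be up for routing          |
| 2     | Keycloak   | Deploy Keycloak  | Auth provider must be up for API auth         |
| 3     | Recron     | Build and Deploy | Depends on Traefik + Keycloak                 |
| 4     | VelvetUi   | Deploy Demo      | Independent, but needs Traefik                |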

To trigger each pipeline, either push to main or manually re-run the workflow from the GitHub Actions tab.

Step 4: Verify deployment

# Check all containers are running
docker ps

# Check endpoints (prints OK or FAILED for each)
curl -sf https://app.bluebraces.online/ > /dev/null && echo "Frontend: OK" || echo "Frontend: FAILED"
curl -sf https://api.bluebraces.online/health > /dev/null && echo "API: OK" || echo "API: FAILED"
curl -sf https://auth.bluebraces.online/realms/master > /dev/null && echo "Keycloak: OK" || echo "Keycloak: FAILED"
curl -sf https://docs.bluebraces.online/ > /dev/null && echo "Docs: OK" || echo "Docs: FAILED"
curl -sf https://velvet.bluebraces.online/ > /dev/null && echo "Velvet: OK" || echo "Velvet: FAILED"

SSL certificates

After a full reset, Traefik will re-request SSL certificates from Let's Encrypt. This happens automatically on the first HTTPS request to each domain. If issuance fails because of a rate limit, wait before retrying; how long depends on which Let's Encrypt limit was hit.

Contact

If the problem persists:

  1. Check logs in Aspire Dashboard
  2. Search issues in the repository
  3. Create a new issue with:
     • Problem description
     • Steps to reproduce
     • Error logs
     • System version