Troubleshooting¶

Startup Issues¶

Docker Not Working¶

Symptom: Aspire cannot start containers

Solution:

Check if Docker Desktop is running
Check Docker's resource allocation (at least 4 GB RAM)
Restart Docker Desktop

Ports Already in Use¶

Symptom: Address already in use

Solution:

# Windows: find the PID that owns the port, then force-stop it
netstat -ano | findstr :5000
taskkill /PID <pid> /F

# Linux/Mac: same two steps
lsof -i :5000
kill -9 <pid>

Keycloak Not Starting¶

Symptom: Keycloak health check fails

Solution:

Check logs in Aspire Dashboard
Wait a few minutes: Keycloak is slow on its first start
Check if port 8080 is free

Database Issues¶

Migrations Not Applying¶

Symptom: relation does not exist

Solution:

cd Questions.Api
dotnet ef database update

Connection Refused¶

Symptom: Npgsql.NpgsqlException: Connection refused

Solution:

Check if PostgreSQL container is running
Check connection string
Check network connectivity

Authentication Issues¶

401 Unauthorized¶

Symptom: API returns 401

Checklist:

Is the token valid? (check expiration)
Is the token being sent in the header?
Is Keycloak accessible?
Is the audience in the token correct?
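
To answer the expiration and audience questions above, it can help to look inside the token. The following is a minimal sketch, assuming the API receives JWT access tokens; `jwt_payload` is a hypothetical helper name, and it only decodes the claims without verifying the signature, so treat it as a debugging aid.

```shell
# jwt_payload TOKEN
# Prints the decoded JSON payload (claims) of a JWT.
# Debugging aid only: the signature is NOT verified.
jwt_payload() (
  # The payload is the second dot-separated segment, base64url-encoded.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the trailing '=' padding; restore it before decoding.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
)
```

Calling `jwt_payload "$ACCESS_TOKEN"` (where `ACCESS_TOKEN` is a placeholder variable holding the raw token) prints the claims. Compare `exp` with the output of `date +%s`, and check that `aud` contains the audience the API expects.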

403 Forbidden¶

Symptom: API returns 403

Checklist:

Does the user have the required roles?
Does the endpoint require specific permissions?

Token Expired¶

Symptom: Requests start failing with 401 a few minutes after login

Solution:

Check refresh token implementation
Increase token lifespan in Keycloak (dev only)

Frontend Issues¶

CORS Errors¶

Symptom: Access-Control-Allow-Origin error

Solution:

Check CORS configuration in API
Make sure the origin is allowed
Check if credentials are handled correctly

Query Not Refreshing Data¶

Symptom: Stale data after mutation

Solution:

// Invalidate in onSettled so the query refetches whether the mutation succeeded or failed
onSettled: () => {
  void queryClient.invalidateQueries({ queryKey: ['questions'] });
}

Logs and Debugging¶

Aspire Dashboard¶

URL: https://localhost:17014

Traces - distributed request traces across services
Logs - logs from all services
Metrics - metrics reported by each service

API Logs¶

# View the logs in the Aspire console, or read them straight from the container:
docker logs <container-name>

Frontend Logs¶

Open DevTools (F12)
Console tab - JS errors
Network tab - HTTP requests

Production Deployment Issues¶

Traefik Not Running¶

Symptom: All sites return ERR_CONNECTION_REFUSED or ERR_CONNECTION_TIMED_OUT

Diagnose:

docker ps -a --filter name=traefik   # -a also lists a stopped or exited container
docker logs traefik --tail 20

Common causes:

Traefik container not started¶

cd /opt/traefik
docker compose up -d

Port conflict (another process on 80/443)¶

ss -tlnp | grep -E ':(80|443)[[:space:]]'

If another process holds the port, stop it before starting Traefik.
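
A plain substring match on `:80` would also match ports such as 8080, so filtering on the exact local port can avoid false alarms. The sketch below assumes the usual `ss -tlnp` column layout, where the fourth column is the local address and port; `listeners_on_port` is a hypothetical helper name.

```shell
# listeners_on_port PORT
# Reads `ss -tlnp` output on stdin and prints only the rows whose local
# address (column 4) ends in exactly :PORT, so 80 does not match 8080.
listeners_on_port() {
  awk -v port="$1" '$4 ~ (":" port "$")'
}

# Example:
#   ss -tlnp | listeners_on_port 80
#   ss -tlnp | listeners_on_port 443
```

An empty result means nothing is listening on that port; otherwise the last column of each row names the owning process.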

acme.json permissions wrong¶

Traefik requires acme.json to have mode 600:

chmod 600 /opt/traefik/acme.json
docker restart traefik
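
To confirm the mode before and after the fix, a small check can be used. This is a sketch that assumes GNU `stat` (as found on a typical Linux VPS); `has_mode_600` is a hypothetical helper name.

```shell
# has_mode_600 FILE
# Succeeds only when FILE exists and its permission bits are exactly 600.
has_mode_600() (
  # GNU stat prints the octal permission bits with -c %a.
  mode=$(stat -c '%a' "$1" 2>/dev/null) || exit 1
  [ "$mode" = "600" ]
)

# Example:
#   has_mode_600 /opt/traefik/acme.json || echo "acme.json has the wrong mode"
```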

SSL Certificate Not Generated¶

Symptom: Browser shows "Your connection is not private" / NET::ERR_CERT_AUTHORITY_INVALID

Diagnose:

docker logs traefik 2>&1 | grep -i "acme\|certificate\|challenge"

Common causes:

DNS not propagated — the domain does not resolve to the VPS IP yet:
```
nslookup app.bluebraces.online
```

Ports 80/443 blocked by firewall:

ufw status
# Ensure 80 and 443 are ALLOW

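Let's Encrypt rate limit: if you have requested too many certificates for the same domain in a short period, further requests are rejected. The limit on failed validations is counted per hour, so wait 1 hour and try again; other limits, such as those on issued certificates, use longer windows. Check Let's Encrypt rate limits.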

Solution: Fix the underlying cause, then restart Traefik to retry:

docker restart traefik

Container Not Discovered by Traefik¶

Symptom: DNS resolves correctly, Traefik is running, but a specific service returns 404 page not found

Diagnose:

# Check if the container is running and has Traefik labels
docker inspect <container_name> | grep -A 5 "traefik"

# Check Traefik sees the router
docker logs traefik 2>&1 | grep "<router_name>"

Common causes:

Container not on the proxy network — all Traefik-discovered services must be on the shared proxy network:
```
docker network inspect proxy | grep <container_name>
```
Missing traefik.enable=true label — check docker-compose.traefik.yml has the label
Wrong port in loadbalancer label — the traefik.http.services.*.loadbalancer.server.port must match the container's internal port
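
The two label checks above can be scripted once the container's labels are printed one per line. This is a sketch under that assumption; `traefik_enabled` and `service_port` are hypothetical helper names, and the `docker inspect` format string in the comment is one way to produce `KEY=VALUE` lines.

```shell
# traefik_enabled
# Reads KEY=VALUE label lines on stdin; succeeds if the exact line
# traefik.enable=true is present.
traefik_enabled() {
  grep -qxF 'traefik.enable=true'
}

# service_port
# Reads the same label lines and prints the value of any
# traefik.http.services.<name>.loadbalancer.server.port label.
service_port() {
  sed -n 's/^traefik\.http\.services\.[^.]*\.loadbalancer\.server\.port=//p'
}

# Example:
#   labels=$(docker inspect --format \
#     '{{range $k, $v := .Config.Labels}}{{$k}}={{$v}}{{println}}{{end}}' <container_name>)
#   printf '%s\n' "$labels" | traefik_enabled || echo "traefik.enable=true is missing"
#   printf '%s\n' "$labels" | service_port
```

Compare the printed port with the port the application actually listens on inside the container.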

Docker Compose: Service Has No Image¶

Symptom:

The "SOME_IMAGE" variable is not set. Defaulting to a blank string.
service "service-name" has neither an image nor a build context specified

Solution: The .env file is missing a required image variable. Check:

cat /opt/recron/.env

The variable names must match what Aspire generates in docker-compose.yaml. Aspire derives variable names from resource names by uppercasing and replacing - with _, then appending _IMAGE. For example, resource recron-web becomes RECRON_WEB_IMAGE.
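
The naming rule can be applied mechanically to spot a missing entry. The sketch below follows the rule as described above; `aspire_image_var` and `missing_image_vars` are hypothetical helper names, and a variable that is present but empty is treated as missing.

```shell
# aspire_image_var RESOURCE
# Applies the rule above: uppercase, replace '-' with '_', append "_IMAGE".
aspire_image_var() {
  printf '%s_IMAGE\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# missing_image_vars ENV_FILE RESOURCE...
# Prints the expected variable name for every resource without a non-empty
# NAME=value line in ENV_FILE; returns non-zero if any are missing.
missing_image_vars() (
  env_file=$1
  shift
  status=0
  for resource in "$@"; do
    var=$(aspire_image_var "$resource")
    if ! grep -q "^${var}=." "$env_file"; then
      printf '%s\n' "$var"
      status=1
    fi
  done
  exit "$status"
)

# Example:
#   missing_image_vars /opt/recron/.env recron-web
```

Pass every resource name from the generated docker-compose.yaml after the file path; any name that is printed needs to be added to the .env file.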

Site Not Loading After Deploy¶

Checklist:

Check all containers are running: docker compose ps
Check Traefik is running: docker ps --filter name=traefik
Check Traefik logs for errors: docker logs traefik --tail 20
Check if ports 80/443 are bound: ss -tlnp | grep -E ':(80|443)[[:space:]]'
Test locally on the VPS: curl -k https://localhost
Check firewall: ufw status (ports 80, 443 must be ALLOW)
Check the proxy network exists: docker network ls | grep proxy
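
When the checklist is run often, a small wrapper that prints one PASS or FAIL line per check makes the first failing step easy to spot. This is only a sketch; `run_check` is a hypothetical helper name, and the example commands are taken from the checklist above.

```shell
# run_check LABEL COMMAND [ARGS...]
# Runs COMMAND with its output discarded and prints "PASS  LABEL" or
# "FAIL  LABEL". Returns non-zero when the command fails.
run_check() (
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    exit 1
  fi
)

# Example:
#   run_check "Traefik running" sh -c 'docker ps --filter name=traefik -q | grep -q .'
#   run_check "HTTPS answers locally" curl -k https://localhost
#   run_check "proxy network exists" sh -c 'docker network ls | grep -q proxy'
```

Checks that need a pipeline are wrapped in `sh -c` so that the whole pipeline counts as one command.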

Recreating Containers¶

If a container is stuck or has cached config:

cd /opt/recron

# Recreate a specific service
docker compose -f docker-compose.yaml -f docker-compose.traefik.yml up -d --force-recreate <service-name>

# Recreate all Recron services
docker compose -f docker-compose.yaml -f docker-compose.traefik.yml up -d --force-recreate

# Restart Traefik (if routing issues)
docker restart traefik

Full VPS Reset¶

When the VPS Docker environment is corrupted (broken image layers, failed extracts, stale containers) and individual fixes don't help, a full reset is the fastest path forward.

This destroys all data

This procedure removes all containers, images, volumes, and deployment files from the VPS. Only use this when you have no important data on the server, or when all state can be recreated by re-running pipelines.

Step 1: Stop everything and clean up¶

ssh root@YOUR_VPS_IP

# Stop all project stacks (with volumes)
cd /opt/keycloak && docker compose down -v 2>/dev/null || true
cd /opt/traefik && docker compose down -v 2>/dev/null || true
cd /opt/recron && docker compose -f docker-compose.yaml -f docker-compose.traefik.yml down -v 2>/dev/null || true
cd /opt/velvet && docker compose down -v 2>/dev/null || true

# Remove all Docker artifacts (images, build cache, orphan networks)
docker system prune --all -f
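# Remove orphan volumes (compose down -v only covers the volumes of that compose
# project; anonymous volumes from previous runs may remain).
# Note: on Docker 23.0+, plain `volume prune` removes anonymous volumes only;
# add --all if unused named volumes must go too.
docker volume prune -f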

# Remove deployment directories
rm -rf /opt/keycloak /opt/traefik /opt/recron /opt/velvet

# Clean up containerd overlayfs snapshots (Docker is stopped while they are removed)
systemctl stop docker
rm -rf /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*
systemctl start docker

# Recreate the shared proxy network
docker network create proxy

Step 2: Verify clean state¶

docker ps -a          # should be empty
docker images         # should be empty
docker volume ls      # should be empty
docker network ls     # should show only: bridge, host, none, proxy
ls /opt/              # should not contain keycloak, traefik, recron, velvet
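
The "should be empty" checks can be automated with the quiet (`-q`) variants of the same commands, which print only IDs or names. A minimal sketch, where `expect_empty` is a hypothetical helper name:

```shell
# expect_empty LABEL COMMAND [ARGS...]
# Succeeds when COMMAND prints nothing; otherwise lists what is left over
# and returns non-zero.
expect_empty() (
  label=$1
  shift
  leftover=$("$@" 2>/dev/null)
  if [ -z "$leftover" ]; then
    printf 'clean     %s\n' "$label"
  else
    printf 'leftover  %s:\n%s\n' "$label" "$leftover"
    exit 1
  fi
)

# Example:
#   expect_empty "containers" docker ps -aq
#   expect_empty "images"     docker images -q
#   expect_empty "volumes"    docker volume ls -q
```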

Step 3: Re-deploy via pipelines¶

Re-run the GitHub Actions workflows in this order (later deployments depend on the earlier ones):

| Order | Repository | Workflow | Reason for position |
| --- | --- | --- | --- |
| 1 | Traefik | Deploy Traefik | Reverse proxy must be up for routing |
| 2 | Keycloak | Deploy Keycloak | Auth provider must be up for API auth |
| 3 | Recron | Build and Deploy | Depends on Traefik + Keycloak |
| 4 | VelvetUi | Deploy Demo | Independent, but needs Traefik |

To trigger each pipeline, either push to main or manually re-run the workflow from the GitHub Actions tab.

Step 4: Verify deployment¶

# Check all containers are running
docker ps

# Check endpoints
curl -sf https://app.bluebraces.online/ > /dev/null && echo "Frontend: OK"
curl -sf https://api.bluebraces.online/health > /dev/null && echo "API: OK"
curl -sf https://auth.bluebraces.online/realms/master > /dev/null && echo "Keycloak: OK"
curl -sf https://docs.bluebraces.online/ > /dev/null && echo "Docs: OK"
curl -sf https://velvet.bluebraces.online/ > /dev/null && echo "Velvet: OK"
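
The one-liners above print nothing when an endpoint is down. A variant that also reports failures might look like the sketch below; `check_endpoint` is a hypothetical helper name and the 10-second timeout is an arbitrary choice.

```shell
# check_endpoint NAME URL
# Prints "NAME: OK" when URL answers with a success status, otherwise
# "NAME: FAILED" and a non-zero return code.
check_endpoint() {
  if curl -sf --max-time 10 "$2" >/dev/null; then
    printf '%s: OK\n' "$1"
  else
    printf '%s: FAILED\n' "$1"
    return 1
  fi
}

# Example:
#   check_endpoint "Frontend" https://app.bluebraces.online/
#   check_endpoint "API"      https://api.bluebraces.online/health
```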

SSL certificates

After a full reset, Traefik will re-request SSL certificates from Let's Encrypt. This happens automatically on the first HTTPS request to each domain. If you hit rate limits, wait 1 hour and try again.

Contact¶

In case of issues:

Check logs in Aspire Dashboard
Search issues in the repository
Create a new issue that includes the problem description, steps to reproduce, error logs, and system version