Version: 3.17

Phrasea Technical Prerequisites

Objective

Present the technical architecture of the Phrasea software solution.

This page describes the different modules that compose it, as well as the software dependencies and external services required for its implementation.

It is intended for:

  • IT departments wishing to deploy Phrasea on their infrastructure
  • Application architects seeking to integrate Phrasea
  • Technical architects seeking to integrate Phrasea

Phrasea Application Source

The Phrasea application source code is available on GitHub:

Deployment Orchestration

Phrasea can be deployed using Docker/Docker Compose or on a Kubernetes cluster. Each technology offers scalability features and has its own advantages and disadvantages. Depending on your scalability needs and infrastructure management requirements, you can select the deployment method that best suits your project.

Docker Compose

Docker Compose enables rapid deployment of Phrasea in development or local environments. The stack includes external services such as PostgreSQL, Redis, Elasticsearch, and MinIO. With Docker Compose profiles, you can select which services to start, which gives flexibility depending on the needs of each environment. However, a Docker Compose stack can only scale vertically.
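As a rough illustration of profile selection, the sketch below builds a `docker compose` command line from a list of profile names. The profile names used here (`db`, `search`) are hypothetical placeholders and are not taken from the Phrasea repository; only the `--profile` option itself is standard Docker Compose.

```python
import shlex


def compose_up_command(profiles: list[str], detach: bool = True) -> list[str]:
    """Build a `docker compose up` command that enables the given profiles."""
    cmd = ["docker", "compose"]
    for name in profiles:
        # --profile is a top-level option, so it goes before the subcommand.
        cmd += ["--profile", name]
    cmd.append("up")
    if detach:
        cmd.append("-d")
    return cmd


if __name__ == "__main__":
    # Hypothetical profile names, for illustration only.
    print(shlex.join(compose_up_command(["db", "search"])))
```

Returning the command as a list rather than a string keeps it safe to pass to `subprocess.run` without shell quoting issues.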

Operating System: Linux Ubuntu 24.04 LTS, Debian 11

Minimum required versions:

  • Docker: 28.5.2
  • Docker Compose: 2.40.3
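A pre-flight check against these minimum versions might look like the sketch below. It is only an illustration: the helper names are made up for this example, and the input strings are assumed to contain a dotted numeric version somewhere in the text. Versions are compared as integer tuples because a plain string comparison would wrongly rank `2.9.0` above `2.40.3`.

```python
import re

# Minimum versions listed above.
MINIMUM = {"docker": "28.5.2", "docker compose": "2.40.3"}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted numeric version from text, e.g. 'v2.40.3' -> (2, 40, 3)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Example input strings; replace with the output of your own tooling.
    print(meets_minimum("Docker version 28.5.2, build 0000000", MINIMUM["docker"]))
    print(meets_minimum("v2.9.0", MINIMUM["docker compose"]))
```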

The Docker Compose file is included in the Phrasea GitHub repository to facilitate deployment in development environments:

You can also refer to the environment variable description page for information about available environment variables.

Kubernetes

Kubernetes is ideal for large-scale needs. Phrasea can be deployed on several nodes for horizontal scaling. For Kubernetes deployments, use the available Helm charts:

Minimum required versions:

  • Kubernetes: 1.29
  • Helm: v3.17.1

Note on Docker and Kubernetes deployments

The containers provided in the Docker Compose files and Helm charts for Phrasea are primarily designed for development and testing. Making them suitable for production requires further customization: optimizing configurations, securing access, managing resources, and implementing resilience and monitoring practices appropriate for production environments.

It is strongly recommended to delegate primary datastores, such as PostgreSQL and S3-compatible object storage (served by MinIO in the development stack), to external managed services for improved performance, reliability, and scalability.

Phrasea Docker Images

External Images

Images referenced in the Docker Compose and Helm stack are not Phrasea images generated by Alchemy, but are necessary for development or test deployments. As previously explained, these images are intended to be replaced by external services.

For the exact deployed version in the Docker Compose deployment context, refer to the environment value. For the Kubernetes context, refer to values.yaml.

Network

  • Traefik – reverse proxy and load balancer, with Let's Encrypt certificate issuance for some providers.

Primary Datastores

Search Engine, Cache Server, and Application Bus

Additional tools useful for development and stack testing:

  • MariaDB – relational database server used for local Matomo stack deployment.
  • PgAdmin – management tool for PostgreSQL.
  • phpMyAdmin – management tool for MySQL/MariaDB.
  • Mailhog – SMTP server and mail client interface for capturing emails in development.
  • Elasticsearch-HQ – interface for querying Elasticsearch.
  • k6 – performance load testing tool.
  • InfluxDB – time series database used by k6.
  • Grafana – data visualization platform for k6 test results.
  • Mendhak – HTTP/HTTPS echo service for testing.
  • Jwilder – Dockerize, utility for orchestrating Docker services.
  • MinIO-MC – MinIO Client, management utility for MinIO.
  • Novu – Novu Notification framework backend infrastructure.

Technologies Used

Back-end

  • Programming Language:
    • Phrasea is developed in PHP with the Symfony Framework.
    • Node.js for certain modules (indexer)

Front-End

Clients consuming the Phrasea API are written with:

  • React / TypeScript
  • Material UI

Database

  • PostgreSQL is the relational database that stores and/or references the various Phrasea objects. It is a primary datastore; the containers used in the Docker Compose or Kubernetes stack are only for development and testing. An external service is recommended in production.
    The database server contains one database per service:
    • Configurator Database
    • Keycloak Database
    • Databox Database
    • Uploader Database
    • Expose Database
    • Report Database
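The one-database-per-service layout can be pictured with the sketch below, which builds one PostgreSQL connection URL per service on a shared server. The database names (lowercase service names), host, and credentials are assumptions made for this example; the actual names are defined by the stack configuration.

```python
from urllib.parse import quote

# Services listed above; using the lowercase name as the database name is an assumption.
SERVICES = ["configurator", "keycloak", "databox", "uploader", "expose", "report"]


def database_url(service: str, host: str, user: str, password: str, port: int = 5432) -> str:
    """Build a PostgreSQL URL for one service database, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{service}"
    )


if __name__ == "__main__":
    for name in SERVICES:
        # Hypothetical host and credentials, for illustration only.
        print(database_url(name, "db.example.internal", "phrasea", "change-me"))
```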

Search Engine

  • Elasticsearch:
    • Phrasea uses Elasticsearch to make search and indexing of assets and collections fast and scalable.
    • Objects stored in the DBMS are denormalized into separate indexes, which allows full-text search on the relevant objects.
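To make the idea of denormalization concrete, the sketch below flattens a relational asset row and its parent collection into a single search document. The field names and structure are hypothetical and do not reflect Phrasea's actual index mappings; the point is only that the related collection data is copied into the document so a search needs no join.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    id: str
    title: str


@dataclass
class Asset:
    id: str
    title: str
    collection_id: str  # foreign key to Collection in the relational model
    tags: list[str]


def to_search_document(asset: Asset, collections: dict[str, Collection]) -> dict:
    """Embed the parent collection in the asset document (denormalization)."""
    collection = collections[asset.collection_id]
    return {
        "id": asset.id,
        "title": asset.title,
        "tags": list(asset.tags),
        "collection": {"id": collection.id, "title": collection.title},
    }


if __name__ == "__main__":
    collections = {"c1": Collection(id="c1", title="Press photos")}
    asset = Asset(id="a1", title="Launch event", collection_id="c1", tags=["event"])
    print(to_search_document(asset, collections))
```

The trade-off of this approach is that renaming a collection requires reindexing every asset document that embeds it.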

File Management and Storage

  • Object Storage:
    • S3-type object storage is used to store assets and renditions. It is a primary datastore; MinIO used in the Docker or Kubernetes stack is only for development and testing. An external service and backup solution are recommended in production.
  • Block Storage:
    • Local block storage is used for temporary files.
    • Network Attached Storage, NFS, or other shared storage solutions are used for sharing data between containers.

User Management and Security

  • Identity and Access Management (IAM)

Keycloak is integrated in Phrasea as the identity and access management system.

Encryption

  • Data in transit is encrypted via TLS/SSL to secure communications between users and the applications.
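From the point of view of a client calling a TLS-protected endpoint, this can be enforced with a strict TLS context, as in the sketch below. The TLS 1.2 floor is an assumption chosen for the example, not a documented Phrasea requirement.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Return a client-side TLS context that verifies certificates and refuses TLS < 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumed policy for this example: reject anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = strict_client_context()
    print(ctx.minimum_version, ctx.verify_mode, ctx.check_hostname)
```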

Architecture Diagram

Technical Information

Infrastructure Capacity Prerequisites

Capacity requirements depend on your platform’s purpose and expected workload. The recommended CPU cores and RAM are indicative baselines and must be adjusted based on your specific use case. Phrasea is designed to scale horizontally, allowing you to start with modest resources and expand as needed for production workloads.

Metrics to consider:

  • Phrasea service(s) deployed
  • Number of concurrent writes
  • Number of concurrent reads
  • Storage volume required for documents
  • Volume of document indexes
  • Number of renditions to generate
  • Expected number of versions per document
  • Number of parallel processes deployed
  • Desired SLA (Service Level Agreement)
  • Log retention period
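Several of these metrics can be combined into a first-order storage estimate. The formula below is an assumption made for illustration, not a sizing rule from Phrasea: it multiplies the number of assets by versions and average size, then adds renditions as a fraction of the original size. It ignores indexes, logs, and backups.

```python
def estimate_storage_gb(
    assets: int,
    avg_asset_mb: float,
    versions_per_asset: int,
    renditions_per_version: int,
    rendition_size_ratio: float,
) -> float:
    """Rough object storage estimate in GB (1 GB = 1024 MB)."""
    originals_mb = assets * versions_per_asset * avg_asset_mb
    # Each rendition is assumed to be a fixed fraction of the original size.
    renditions_mb = originals_mb * renditions_per_version * rendition_size_ratio
    return (originals_mb + renditions_mb) / 1024


if __name__ == "__main__":
    # 1024 assets of 10 MB, 2 versions each, 3 renditions at 25% of the original size.
    print(estimate_storage_gb(1024, 10, 2, 3, 0.25))  # 35.0
```

In the example, the originals take 20480 MB and the renditions 15360 MB, for a total of 35 GB.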

Phrasea images deployed with Docker Compose and Kubernetes

CPU: 6 Cores
RAM: 16 GB
Disk capacity for Docker image management: 100 GB
Disk capacity for temporary volumes: 100 GB

Managed Service

  • PostgreSQL:

    • CPU: 2 Cores
    • RAM: 8 GB
    • Disk size: depends on needs
  • Object Storage:

    • CPU: 2 Cores
    • RAM: 8 GB
    • Disk size: depends on needs

Production Considerations

  • Security: Use TLS for all external traffic (configure Traefik with Let’s Encrypt or your own certificates).
  • Monitoring: Collect container standard output and metrics with a logging and monitoring stack (e.g., Fluent Bit, Prometheus, Grafana).
  • APM: You can use the Sentry SaaS service for capturing stack errors; both front-end and back-end agents are installed in Phrasea's images. Set up your Sentry credentials here.
  • Backups: Implement regular backups for PostgreSQL and object storage.
  • Elasticsearch: Elasticsearch is not a primary datastore, but reindexing can take a long time when the number of assets is large. Backing up Elasticsearch indexes allows for quick recovery after a failure. It is also advisable to use an external Elasticsearch cluster.
  • RabbitMQ: RabbitMQ data must be persisted so that it survives a failure or a container restart.