Our partner is a leading financial services organization operating a large-scale enterprise IT environment with millions of users. Their focus is on building stable, scalable, and modern digital platforms while continuously improving reliability and user experience. As part of a major technology transformation, they are strengthening their observability and reliability engineering capabilities.
Tasks:
- Contribute to the design and implementation of observability solutions, including high-level and low-level design (HLD, LLD)
- Build and operate logging, metrics, and distributed tracing systems
- Design and maintain monitoring dashboards and alerting strategies
- Support incident analysis and root cause investigations
- Drive improvements to system reliability using SRE principles
- Define and implement observability standards and best practices
- Automate monitoring and operational workflows
- Collaborate with infrastructure and application teams to improve system visibility and operability
Requirements:
- Degree in Computer Science or a related field
- 2–3+ years of experience with modern observability tools (e.g. Prometheus, Grafana, ELK, Dynatrace, Splunk, OpenTelemetry)
- 2–3+ years of experience in infrastructure or cloud operations (on-prem and/or cloud)
- Hands-on experience with containerized and cloud environments (Kubernetes, AWS, Azure)
- Strong understanding of SRE principles and a proactive approach to problem-solving
- Ability to analyze complex systems and identify patterns across logs, metrics, and traces
- Intermediate English, sufficient for technical communication
- Structured thinking, strong communication skills, and a collaborative mindset
Nice to have:
- Experience in financial or enterprise environments
- Familiarity with Agile methodologies
- Knowledge of large-scale integration architectures
- Experience applying ML/AI in observability use cases
Benefits:
- Competitive compensation and comprehensive benefits package
- Hybrid working model with home office flexibility
- Support for professional development and continuous learning
- Access to health and sports programs
- Opportunity to shape observability strategy in a large-scale environment
- Collaborative, knowledge-sharing engineering culture
- Long-term stability with ongoing digital transformation projects