Responsibilities:
**Infrastructure Architecture & Scalability:**
* Lead the design, implementation, and management of highly available, fault-tolerant, and **auto-scalable infrastructure** on public cloud platforms including **AWS, Google Cloud Platform (GCP), and Azure**.
* Develop and implement advanced strategies for **scaling solutions** (e.g., microservices architecture, serverless) to handle increasing traffic, data loads, and global distribution efficiently.
* Serve as a subject matter expert for container orchestration platforms, primarily **Kubernetes**, including cluster design, advanced deployment tooling (e.g., Helm, Kustomize), service mesh integration, and complex troubleshooting.
* Champion **Docker** containerization best practices, image optimization, and artifact management in enterprise registries.
* Evaluate and integrate new cloud services and technologies to enhance infrastructure capabilities.
**Automation & CI/CD Excellence:**
* Drive end-to-end **automation** across the entire software development and operations lifecycle, from infrastructure provisioning to application deployment, testing, and monitoring.
* Design, implement, and maintain robust, high-performance **CI/CD pipelines** (e.g., Jenkins, GitLab CI/CD, GitHub Actions, Azure DevOps) that support rapid, reliable, and secure software releases at scale.
* Implement and enforce Infrastructure as Code (IaC) principles using advanced configurations with tools like Terraform, CloudFormation, or Ansible for consistent, repeatable, and auditable infrastructure provisioning.
* Develop custom automation scripts and tools to streamline complex operational workflows and reduce manual intervention.
**Networking, Security & Compliance:**
* Design, implement, and manage secure and optimized network architectures, including advanced configurations of **VPNs, VPCs, Subnets**, routing, and peering across multi-cloud environments.
* Implement and enforce stringent **secure infrastructure** practices, including advanced identity and access management (IAM) policies, network security groups, firewalls, and Web Application Firewalls (WAFs), in line with security best practices for cloud environments.
* Conduct regular security audits and vulnerability assessments, and remediate findings from penetration tests.
* Ensure infrastructure and processes adhere to relevant industry compliance standards (e.g., SOC2, ISO 27001, GDPR, HIPAA) as required.
* Manage and automate SSL/TLS certificate lifecycles and complex DNS configurations.
**Observability (Monitoring, Logging & Alerting):**
* Design and implement comprehensive observability solutions utilizing the **ELK Stack (Elasticsearch, Logstash, Kibana)** for centralized logging, data ingestion, analysis, and visualization.
* Set up robust monitoring and alerting systems (e.g., Prometheus, Grafana, CloudWatch, Google Cloud Monitoring (formerly Stackdriver)) to proactively identify, diagnose, and resolve issues across applications and infrastructure.
* Develop custom dashboards, metrics, and sophisticated alert thresholds to track system performance, application health, and user experience, enabling data-driven operational decisions.
* Implement distributed tracing and anomaly detection for deeper insights.
**Version Control & Collaboration:**
* Lead and enforce best practices for **Git/GitHub version control** and advanced branching strategies (e.g., GitFlow, Trunk-Based Development), ensuring code integrity and efficient team collaboration.
* Foster a strong DevOps culture, promoting seamless collaboration, shared ownership, and communication between development, QA, and operations teams.
* Conduct technical reviews of infrastructure code and deployment strategies.
**Troubleshooting & Performance Optimization:**
* Serve as a primary escalation point for complex infrastructure and application issues, performing deep-dive root cause analysis and implementing robust preventative measures.
* Proactively identify and resolve performance bottlenecks across the stack (application, database, network, infrastructure).
* Optimize resource utilization and implement advanced cloud cost management strategies to ensure efficiency without compromising performance or reliability.
**Documentation & Knowledge Sharing:**
* Create and maintain comprehensive, high-quality documentation for all infrastructure, deployment processes, operational runbooks, and architectural decisions.
* Lead knowledge-sharing sessions and mentor junior DevOps engineers, elevating the team's overall technical capabilities.
Qualifications:
* **5+ years of hands-on, demonstrable experience as a Senior DevOps Engineer** or in a similar lead role, with significant contributions to large-scale, production-grade systems.
* **Proven expertise in designing, implementing, and optimizing highly scalable, resilient, and secure solutions.**
* **Expert-level proficiency with Kubernetes** for container orchestration, including advanced deployment patterns, cluster management, and troubleshooting.
* **Deep expertise in Docker** for containerization, image optimization, and managing container lifecycles.
* **Extensive, hands-on experience with at least two major cloud providers (AWS, GCP, Azure)**, including in-depth knowledge of their core compute, networking, storage, security, and managed services.
* **Mastery in designing and implementing robust CI/CD pipelines** (e.g., Jenkins, GitLab CI/CD, GitHub Actions, Azure DevOps).
* **Expertise in automation using scripting languages** (e.g., Python, Bash) and advanced Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation, Ansible).
* **Strong, practical experience with the ELK Stack (Elasticsearch, Logstash, Kibana)** for centralized logging, analysis, and visualization.
* **Advanced knowledge of networking concepts** (TCP/IP, HTTP/S, DNS, VPN, VPC, Subnets, Routing) and experience in designing and securing complex cloud network architectures.
* **Demonstrable experience implementing and maintaining secure infrastructure**, including IAM, network security, and compliance frameworks.
* Proficiency with **Git/GitHub version control** and advanced collaborative development workflows.
* Experience with other monitoring/alerting tools (e.g., Prometheus, Grafana).
* Exceptional analytical and complex problem-solving skills, with a proven ability to debug and resolve critical production issues quickly.
* Excellent communication, interpersonal, and presentation skills, with the ability to effectively articulate complex technical concepts to both technical and non-technical audiences.
* Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field, or equivalent extensive practical experience.
If interested, please send your updated resume to hr5@tasolutions.in or contact us directly at 9056679449. You are also welcome to refer friends and colleagues.