Job description
Data Collection: Extraction of data from various sources, whether it be databases, files, or real-time data streams.
Data Cleaning and Transformation: Cleaning, filtering, enriching, and transforming data to prepare it for analysis. This may include handling missing data, normalization, format conversion, etc.
Data Pipeline Design: Creation of data pipelines to automate data flow, including managing dependencies between different pipeline stages.
Data Storage: Selection of appropriate storage solutions, whether it be Google Cloud Storage, Bigtable, BigQuery, or other GCP services.
Data Integration: Integrating data into data warehouses, columnar data stores, NoSQL databases, or data lakes.
Data Quality Management: Implementation of data quality controls to ensure data integrity and quality.
Data Security: Implementation of security measures to protect sensitive data, including data access, identity and access management, encryption, etc.
Performance Optimization: Monitoring and optimizing the performance of data pipelines to ensure quick response to queries and efficient resource utilization.
Documentation: Documenting data pipelines, data schemas, and processes to facilitate understanding and collaboration.
Automation: Automating ETL (Extract, Transform, Load) processes to minimize manual intervention.
Collaboration: Collaborating with data scientists, analysts, and other team members to understand their needs and ensure data readiness for analysis.
Monitoring: Continuous monitoring of data pipelines to detect and resolve issues before they affect downstream consumers.
Scalability: Designing scalable data pipelines capable of handling growing data volumes.
This list of tasks is not exhaustive and is subject to change.
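To make the data collection, cleaning, and loading responsibilities above concrete, here is a minimal ETL sketch in Python. It is illustrative only: the function names (extract, transform, load) and the sample data are hypothetical stand-ins, not part of the role's actual stack.

```python
# Minimal ETL sketch: extract rows, drop missing values, min-max
# normalize a numeric field, then "load". All names are hypothetical.

def extract():
    # Stand-in for reading from a database, file, or stream.
    return [
        {"user": "a", "score": "10"},
        {"user": "b", "score": None},   # missing value to handle
        {"user": "c", "score": "30"},
    ]

def transform(raw):
    # Drop rows with missing scores, convert types, min-max normalize.
    cleaned = [dict(r, score=float(r["score"]))
               for r in raw if r["score"] is not None]
    lo = min(r["score"] for r in cleaned)
    hi = max(r["score"] for r in cleaned)
    for r in cleaned:
        r["score"] = (r["score"] - lo) / (hi - lo) if hi > lo else 0.0
    return cleaned

def load(rows):
    # Stand-in for writing to BigQuery, Cloud Storage, etc.
    return len(rows)

rows = transform(extract())
loaded = load(rows)
```

In practice each stage would target the GCP services named above (e.g. extraction from Pub/Sub, loading into BigQuery), but the extract/transform/load separation stays the same.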
Profile sought
GCP Mastery: A deep understanding of GCP services and tools is essential for designing and implementing data engineering solutions.
Real-time Data Processing: Ability to design and implement real-time data pipelines using services like Dataflow or Pub/Sub.
Batch Data Processing: Competence in creating batch data processing workflows with tools like Dataprep and BigQuery.
Programming Languages: Proficiency in programming languages such as Python, Java, or Go for script and application development.
Databases: Knowledge of both NoSQL databases (Cloud Bigtable, Firestore) and SQL databases (BigQuery, Cloud SQL) for data storage and retrieval.
Data Security: Understanding of data security best practices, including authorization management, encryption, and compliance.
Orchestration Tools: Ability to use orchestration tools such as Cloud Composer or Cloud Dataflow to manage data pipelines.
Problem-solving: Aptitude to solve complex problems related to data collection, processing, and storage, as well as optimizing the performance of data pipelines.
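The orchestration skills listed above center on managing dependencies between pipeline stages. A hedged sketch of that idea, using only the Python standard library (the stage names here are hypothetical, not Cloud Composer's API):

```python
# Dependency ordering between pipeline stages, in the spirit of a
# Cloud Composer / Airflow DAG. Stage names are illustrative.
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on.
dag = {
    "extract": set(),
    "clean": {"extract"},
    "enrich": {"extract"},
    "load": {"clean", "enrich"},
}

# A valid execution order: every stage runs after its dependencies.
order = list(TopologicalSorter(dag).static_order())
```

Orchestrators such as Cloud Composer express the same structure declaratively and add scheduling, retries, and monitoring on top of it.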