Job Title: Enabling Areas - Data Scientist - NLP/Gen AI - DM - Information Technology
Job Title: Data Scientist (NLP)
Key Responsibilities:
1. NLP Model Development: Design, develop, and implement state-of-the-art NLP algorithms and models to extract insights and patterns from large volumes of textual data. Leverage machine learning and deep learning techniques to tackle NLP tasks such as text classification, entity extraction, and document summarization.
2. Data Preprocessing and Feature Engineering: Preprocess and clean textual data to prepare it for analysis and modeling. Perform feature engineering to extract relevant features and representations from raw text, including word embeddings, n-grams, and syntactic features.
3. Model Evaluation and Validation: Evaluate the performance of NLP models using appropriate metrics and statistical techniques. Conduct rigorous testing and validation to ensure the accuracy, robustness, and generalization of NLP models across different datasets and domains.
4. NLP Integration: Collaborate with cross-functional teams, including software engineers and product managers, to integrate NLP solutions into production systems and applications. Ensure that NLP components interoperate seamlessly and perform efficiently within larger AI systems.
5. Research and Innovation: Stay up to date with the latest research in NLP and Gen AI, and apply cutting-edge methodologies to real-world problems. Explore new tools and approaches to improve the effectiveness and efficiency of NLP solutions. Contribute to research publications and thought leadership in the field of NLP.
6. Data Analysis and Visualization: Analyze and interpret results from NLP models to extract meaningful insights and actionable recommendations. Visualize findings using appropriate techniques and tools to communicate complex information effectively to stakeholders.
7. Documentation and Reporting: Document methodologies, findings, and insights from NLP projects in clear and concise reports. Communicate technical concepts and results to both technical and non-technical audiences, including management and business stakeholders.
Qualifications:
•Master's or Ph.D. degree in Computer Science, Engineering, Mathematics, Statistics, or a related field.
•Minimum of 5 years of proven experience as a Data Scientist with a focus on NLP, and 2 years of experience with Gen AI, preferably in industry or research settings.
•Strong knowledge of NLP techniques and algorithms, including text preprocessing, feature extraction, and supervised and unsupervised learning methods.
•Proficiency in programming languages such as Python or R, and libraries such as NLTK, spaCy, scikit-learn, and TensorFlow/PyTorch for NLP.
•Familiarity with data visualization tools such as Matplotlib, Seaborn, or Plotly.
•Experience with containerization and orchestration technologies such as Docker and Kubernetes.
•Basic understanding of system design and MLOps practices.
•Excellent problem-solving skills and analytical thinking.
•Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.
•Proven ability to manage multiple projects simultaneously and meet deadlines in a fast-paced environment.
Preferred Qualifications:
•Experience with deep learning architectures for NLP, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and transformer models (e.g., BERT, GPT).
•Knowledge of domain-specific NLP tasks and applications, such as chatbots, document understanding, or information extraction.
•Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.