Amazon Comprehend
Amazon Comprehend is a fully managed natural language processing (NLP) service that uses machine learning to analyze and understand natural language text. It is a regional service and is not available in every AWS Region, so check regional support before designing around it. It scales automatically to handle large volumes of text, and no ML expertise is needed to use it.
1. Key Features
- Entity Recognition: Identifies entities such as people, places, organizations, and dates.
- Sentiment Analysis: Classifies text as Positive, Negative, Neutral, or Mixed.
- Key Phrase Extraction: Identifies key phrases or concepts in a body of text.
- Language Detection: Detects the dominant language of a document.
- PII Detection: Identifies and redacts personally identifiable information.
- Topic Modeling: Automatically organizes a collection of documents by topics.
- Custom Classification: Train your own model to categorize documents based on your data.
- Custom Entity Recognition: Train Comprehend to detect domain-specific entities.
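As a rough illustration of the built-in features listed above, the sketch below calls three of the Comprehend operations exposed by boto3 (`detect_sentiment`, `detect_entities`, `detect_key_phrases`) and collects the results. The helper names `analyze_text` and `make_client`, the default region, and the shape of the returned dictionary are assumptions made for this example, not part of the notes.

```python
def analyze_text(client, text, language_code="en"):
    """Run sentiment, entity, and key phrase detection on one piece of text.

    `client` is expected to behave like boto3.client("comprehend").
    """
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    return {
        # One of POSITIVE, NEGATIVE, NEUTRAL, MIXED
        "sentiment": sentiment["Sentiment"],
        # (entity text, entity type) pairs, e.g. ("Seattle", "LOCATION")
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
    }


def make_client(region_name="us-east-1"):
    """Hypothetical helper: build a real Comprehend client (needs AWS credentials)."""
    import boto3  # imported here so analyze_text can be tried with a stub client
    return boto3.client("comprehend", region_name=region_name)
```

With credentials configured, `analyze_text(make_client(), "Great service in Seattle.")` would return the detected sentiment, entities, and key phrases for that sentence.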
2. Integration
- Commonly invoked via AWS Lambda inside serverless apps.
- Often integrated with Amazon S3, API Gateway, and Step Functions for workflows.
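The Lambda integration mentioned above might look roughly like the following. This is a minimal sketch that assumes an API Gateway proxy-style event with a JSON `body` containing a `text` field; the `build_handler` factory and the request/response field names are hypothetical and chosen only for illustration.

```python
import json


def build_handler(comprehend):
    """Return a Lambda-style handler that uses the given Comprehend client."""

    def handler(event, context):
        # API Gateway proxy integration delivers the request body as a string.
        body = json.loads(event.get("body") or "{}")
        text = body.get("text", "")
        if not text:
            return {
                "statusCode": 400,
                "body": json.dumps({"error": "text is required"}),
            }
        result = comprehend.detect_sentiment(
            Text=text, LanguageCode=body.get("language", "en")
        )
        return {
            "statusCode": 200,
            "body": json.dumps({"sentiment": result["Sentiment"]}),
        }

    return handler


# In a deployed function the wiring might look like:
#   import boto3
#   handler = build_handler(boto3.client("comprehend"))
```

Passing the client in rather than creating it inside the handler keeps the function easy to exercise locally with a stub.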
3. Use Cases
- Customer feedback analysis (sentiment, themes).
- Document classification and tagging.
- Data redaction for compliance (GDPR, HIPAA).
- Social media monitoring.
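For the data redaction use case, one possible approach is to take the offsets returned by the `detect_pii_entities` operation and replace each span with its entity type. The sketch below assumes each entity dictionary carries `Type`, `BeginOffset`, and `EndOffset`, that the offsets index characters of the Python string, and that spans do not overlap; `redact_pii` is a hypothetical helper name.

```python
def redact_pii(text, pii_entities):
    """Replace each detected PII span with a [TYPE] placeholder.

    `pii_entities` is expected to look like the "Entities" list returned by
    Comprehend's detect_pii_entities call.
    """
    # Work from the end of the text so earlier offsets stay valid
    # as the string changes length.
    ordered = sorted(pii_entities, key=lambda e: e["BeginOffset"], reverse=True)
    for entity in ordered:
        start, end = entity["BeginOffset"], entity["EndOffset"]
        text = text[:start] + "[" + entity["Type"] + "]" + text[end:]
    return text
```

With a real client this could be driven by `client.detect_pii_entities(Text=text, LanguageCode="en")["Entities"]`.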
4. Question on Comprehend - 2025
A company is developing a highly available natural language processing (NLP) application. The application handles large volumes of concurrent requests. The application performs NLP tasks such as entity recognition, sentiment analysis, and key phrase extraction on text data.
The company needs to store data that the application processes in a highly available and scalable database.
- Create an Amazon API Gateway REST API endpoint to handle incoming requests. Configure the REST API to invoke an AWS Lambda function for each request. Configure the Lambda function to call Amazon Comprehend to perform NLP tasks on the text data. Store the processed data in Amazon DynamoDB. (Correct Ans)
- Create an Amazon API Gateway HTTP API endpoint to handle incoming requests. Configure the HTTP API to invoke an AWS Lambda function for each request. Configure the Lambda function to call Amazon Translate to perform NLP tasks on the text data. Store the processed data in Amazon ElastiCache.
- Create an Amazon SQS queue to buffer incoming requests. Deploy the NLP application on Amazon EC2 instances in an Auto Scaling group. Use Amazon Comprehend to perform NLP tasks. Store the processed data in an Amazon RDS database.
- Create an Amazon API Gateway WebSocket API endpoint to handle incoming requests. Configure the WebSocket API to invoke an AWS Lambda function for each request. Configure the Lambda function to call Amazon Textract to perform NLP tasks on the text data. Store the processed data in Amazon ElastiCache.
Explanation:
The question emphasizes high availability, scalability, and correct NLP functionality (entity recognition, sentiment, etc.). Amazon Comprehend is the correct AWS service for NLP tasks like:
- Entity recognition
- Sentiment analysis
- Key phrase extraction
API Gateway + Lambda is highly scalable and serverless — perfect for handling high concurrency.
Amazon DynamoDB is highly available and scales automatically — a great match for storing processed data with minimal management overhead.
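The storage step in the correct answer could be sketched as below, using the low-level DynamoDB `put_item` call with typed attribute values. The table name, the attribute names (`id`, `created_at`, `text`, `sentiment`, `entities`), and the helpers `to_item` and `store_result` are all assumptions for illustration; a real table would need a key schema that matches.

```python
import time
import uuid


def to_item(text, sentiment, entities):
    """Build a DynamoDB item in the low-level attribute-value format."""
    return {
        "id": {"S": str(uuid.uuid4())},              # assumed partition key
        "created_at": {"N": str(int(time.time()))},  # numbers are sent as strings
        "text": {"S": text},
        "sentiment": {"S": sentiment},
        "entities": {"L": [{"S": name} for name in entities]},
    }


def store_result(dynamodb, table_name, text, sentiment, entities):
    """Write one analysis result; `dynamodb` behaves like boto3.client("dynamodb")."""
    item = to_item(text, sentiment, entities)
    dynamodb.put_item(TableName=table_name, Item=item)
    return item["id"]["S"]
```

Inside the Lambda function, this would typically run right after the Comprehend call, for example `store_result(boto3.client("dynamodb"), "NlpResults", text, sentiment, names)`, where `NlpResults` is a made-up table name.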
Why NOT the others?
- Amazon Translate is used for language translation, not for text analysis tasks like sentiment analysis or entity recognition.
- ElastiCache is an in-memory cache, not a suitable primary database for persistent processed data.
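- Amazon SQS + EC2 Auto Scaling works for high throughput but involves more operational overhead than the serverless API Gateway + Lambda option, because the EC2 instances have to be managed.
- RDS can scale to an extent, but it takes more operational effort than DynamoDB to reach the same level of scalability and high availability.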
- Amazon Textract is for extracting text from documents/images — not general-purpose NLP.
- WebSocket API is used for bidirectional communication (e.g., chat apps) and is not suited to REST-style text analysis requests.