CDC using AWS DMS
AWS DMS (Database Migration Service)
managed service for migrating databases; also a powerful tool for implementing change data capture (CDC)
a DMS migration task is needed for:
initial data load (full load): before CDC starts, copies a snapshot of the source db to the target db so both start in sync
ongoing replication: the task continuously captures changes from the source db and applies them to the target db
flexible config: the replication process can be customized, e.g. filtering tables (selection rules), transforming data (transformation rules), error handling (task settings)
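The table filtering and transformation mentioned above are expressed in a JSON table-mapping document attached to the task. A minimal sketch, assuming a source schema `sales`, a table `audit_log` to skip, and a target schema name `sales_replica` (all hypothetical), that builds selection and transformation rules:

```python
import json


def build_table_mappings(schema: str, exclude_table: str, target_schema: str) -> str:
    """Return a DMS table-mapping document as a JSON string."""
    rules = [
        {
            # selection rule: include every table in the schema ("%" is the wildcard)
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-schema",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        },
        {
            # selection rule: filter one table out of the replication
            "rule-type": "selection",
            "rule-id": "2",
            "rule-name": "exclude-table",
            "object-locator": {"schema-name": schema, "table-name": exclude_table},
            "rule-action": "exclude",
        },
        {
            # transformation rule: rename the schema on the target side
            "rule-type": "transformation",
            "rule-id": "3",
            "rule-name": "rename-schema",
            "rule-target": "schema",
            "object-locator": {"schema-name": schema},
            "rule-action": "rename",
            "value": target_schema,
        },
    ]
    return json.dumps({"rules": rules}, indent=2)


# Hypothetical names, for illustration only.
print(build_table_mappings("sales", "audit_log", "sales_replica"))
```

Rules are evaluated per table, so the `exclude` rule carves one table out of the broader `include` rule for the whole schema.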
key components of AWS DMS
replication instance (the construction worker): a managed EC2 instance that powers the migration process. connects to the source and target databases, applies transformations, and routes the data to the target; its security groups and network access (VPC, subnets) determine which databases it can reach
endpoint: connection details for a specific database: db engine type, hostname, port, username, password (each endpoint is either a source or a target)
DMS task / replication task (the blueprint): config defining a specific migration/replication job. specifies the source and target endpoints, the replication instance, the migration type (full load, CDC, or full load + CDC), table mappings, etc.
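To make the endpoint and task components concrete, the sketch below builds the keyword arguments for the DMS `create_endpoint` and `create_replication_task` calls as plain dicts, so it runs without AWS access. The identifiers, host, credentials, and ARNs are hypothetical placeholders; only the parameter names and the three `MigrationType` values come from the DMS API.

```python
import json

# MigrationType values accepted by the DMS CreateReplicationTask API.
VALID_MIGRATION_TYPES = {"full-load", "cdc", "full-load-and-cdc"}


def endpoint_request(identifier: str, endpoint_type: str, engine: str, host: str,
                     port: int, user: str, password: str, database: str) -> dict:
    """Keyword arguments for create_endpoint (connection details for one db)."""
    if endpoint_type not in ("source", "target"):
        raise ValueError(f"endpoint type must be 'source' or 'target', got {endpoint_type!r}")
    return {
        "EndpointIdentifier": identifier,
        "EndpointType": endpoint_type,
        "EngineName": engine,
        "ServerName": host,
        "Port": port,
        "Username": user,
        "Password": password,  # placeholder; avoid hard-coding real credentials
        "DatabaseName": database,
    }


def task_request(identifier: str, instance_arn: str, source_arn: str,
                 target_arn: str, migration_type: str, table_mappings: dict) -> dict:
    """Keyword arguments for create_replication_task (the 'blueprint')."""
    if migration_type not in VALID_MIGRATION_TYPES:
        raise ValueError(f"unsupported MigrationType: {migration_type!r}")
    return {
        "ReplicationTaskIdentifier": identifier,
        "ReplicationInstanceArn": instance_arn,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "MigrationType": migration_type,
        # the API takes table mappings as a JSON string, not a dict
        "TableMappings": json.dumps(table_mappings),
    }


# Hypothetical identifiers, host, and ARNs (placeholders, not real resources).
source = endpoint_request("orders-src", "source", "mysql", "db.example.internal",
                          3306, "dms_user", "change-me", "orders")
mappings = {"rules": [{
    "rule-type": "selection", "rule-id": "1", "rule-name": "1",
    "object-locator": {"schema-name": "orders", "table-name": "%"},
    "rule-action": "include",
}]}
task = task_request(
    "orders-cdc-task",
    "arn:aws:dms:us-east-1:111122223333:rep:EXAMPLEINSTANCE",
    "arn:aws:dms:us-east-1:111122223333:endpoint:EXAMPLESOURCE",
    "arn:aws:dms:us-east-1:111122223333:endpoint:EXAMPLETARGET",
    "full-load-and-cdc",
    mappings,
)
```

With boto3 installed and credentials configured, these dicts could be passed as `boto3.client("dms").create_endpoint(**source)` and `create_replication_task(**task)`; in a real setup the ARNs would come from the responses of the earlier create calls. `full-load-and-cdc` matches the notes above: initial snapshot first, then ongoing replication.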
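process flow (example: Kinesis Data Streams as the DMS target)
a change (insert/update/delete) occurs in the source db
the DMS replication instance continuously reads the source db's transaction log (e.g. MySQL binlog, PostgreSQL WAL), extracts the changes, and sends them to a Kinesis data stream
the Kinesis data stream acts like a pipeline: it receives the change records and stores them in shards (each record is routed to a shard by its partition key)
application code (e.g. a Lambda function or another consumer) reads the records from Kinesis, transforms them, and writes the processed data directly to the target db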
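The flow above can be simulated in memory to show how change records move from the stream to the target. This is a sketch under stated assumptions, not the Kinesis or DMS API: the stream is a plain list of shards, each record uses roughly the JSON shape DMS writes to Kinesis (`data` plus `metadata` with an `operation` such as `load`, `insert`, `update`, `delete`), the partition key is assumed to be table name plus `id`, and the transformation (uppercasing `name`) is hypothetical.

```python
import hashlib
import json


def shard_for(partition_key: str, shard_count: int) -> int:
    # Kinesis MD5-hashes the partition key into a 128-bit integer; with evenly
    # split shards, each shard owns an equal slice of that hash range.
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return h * shard_count // 2**128


def put_records(records: list, shard_count: int) -> list:
    # Simulated stream: each shard is an ordered list of serialized records.
    shards = [[] for _ in range(shard_count)]
    for rec in records:
        key = f'{rec["metadata"]["table-name"]}.{rec["data"]["id"]}'  # assumed key
        shards[shard_for(key, shard_count)].append(json.dumps(rec))
    return shards


def apply_change(target: dict, raw: str) -> None:
    # Consumer step: parse, transform, and apply one change to the target "db".
    rec = json.loads(raw)
    op = rec["metadata"]["operation"]
    row = rec["data"]
    table = target.setdefault(rec["metadata"]["table-name"], {})
    if op in ("load", "insert", "update"):
        table[row["id"]] = dict(row, name=row["name"].upper())  # hypothetical transform
    elif op == "delete":
        table.pop(row["id"], None)


def consume(shards: list, target: dict) -> None:
    for shard in shards:
        for raw in shard:
            apply_change(target, raw)


# Hypothetical change events for a "customers" table.
events = [
    {"data": {"id": 1, "name": "alice"}, "metadata": {"operation": "load", "table-name": "customers"}},
    {"data": {"id": 2, "name": "bob"}, "metadata": {"operation": "insert", "table-name": "customers"}},
    {"data": {"id": 1, "name": "alicia"}, "metadata": {"operation": "update", "table-name": "customers"}},
    {"data": {"id": 2, "name": "bob"}, "metadata": {"operation": "delete", "table-name": "customers"}},
]
shards = put_records(events, shard_count=2)
target = {}
consume(shards, target)
print(target)  # {'customers': {1: {'id': 1, 'name': 'ALICIA'}}}
```

Because every change to the same row hashes to the same shard, per-row ordering is preserved even though shards are consumed independently. A real consumer would read with Kinesis `GetRecords` or a Lambda event source mapping instead of iterating Python lists.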