##########
# ERRORS #
##########
# When data starts flowing between the two machines (".....data flowing....."), this script
# basically steps back and gets out of the way. It does a "wait" until the unloads+nzloads
# are finished. Usually, they are successful. I'm not expecting any errors. But if
# something untoward happens, here are some sample outputs for your reference.
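#
# The flow described above can be sketched like this -- a named pipe joining
# one backgrounded unload and one backgrounded load, followed by a "wait" for
# both. Generic commands (printf, wc) stand in for the real nzsql
# "CREATE EXTERNAL TABLE" unload and the nzload job, and the file names are
# illustrative, not the ones nz_migrate actually generates.

```shell
#!/bin/sh
PIPE=/tmp/nz_migrate_demo.pipe
COUNT=/tmp/nz_migrate_demo.count
mkfifo "$PIPE"

# "Unload" side: stands in for nzsql running CREATE EXTERNAL TABLE ... AS SELECT
printf 'row1\nrow2\nrow3\n' > "$PIPE" &
UNLOAD_PID=$!

# "Load" side: stands in for nzload reading the pipe on its way to the target
wc -l < "$PIPE" > "$COUNT" &
LOAD_PID=$!

# The script now steps back ... and waits for both jobs to finish
wait "$UNLOAD_PID" "$LOAD_PID"
cat "$COUNT"          # number of rows that made it through the pipe
rm -f "$PIPE"
```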
#
# ----------------------------------------------------------------------
#
# If the nzload job dies (because of an 'nzsession abort') you would see
# messages such as these. (The 'Broken pipe' is a red herring -- the unload
# died because the load died and closed the pipe prematurely.)
#
# ERROR: The unload to an external table (thread # 1 ) reported the following problem @ 2008-05-27 14:50:38
# ERROR: /tmp/nz_migrate_20080527_2411/nz_migrate_20080527_2411_1.pipe : Broken pipe
#
# ERROR: The nzload operation ( thread # 1 ) reported the following problem @ 2008-05-27 14:50:38
# Error: ERROR: Transaction rolled back by client
# Error: ERROR:Communication link failure
# Error: Load Failed, records not inserted.
#
# ----------------------------------------------------------------------
#
# If the nzload job dies (because of a 'kill') you would see
# messages such as these. Again, the 'Broken pipe' is a red herring.
#
# ERROR: The unload to an external table (thread # 1 ) reported the following problem @ 2008-05-27 14:55:53
# ERROR: /tmp/nz_migrate_20080527_8057/nz_migrate_20080527_8057_1.pipe : Broken pipe
#
# ERROR: The nzload operation ( thread # 1 ) reported the following problem @ 2008-05-27 14:55:53
# Error: ERROR: Transaction rolled back by client
# Error: load is aborted due to user cancellation.
#
# ----------------------------------------------------------------------
#
# If the unload (create external table) dies (because of an 'nzsession abort')
# you would see messages such as this.
#
# ERROR: The unload to an external table (thread # 1 ) reported the following problem @ 2008-05-27 14:53:58
# pqFlush() -- connection not open
#
# ----------------------------------------------------------------------
#
# I paused my system (which was being used as both the source and the target),
# which resulted in these types of messages.
#
# ERROR: The unload to an external table (thread # 1 ) reported the following problem @ 2008-05-27 14:57:47
# ERROR: Transaction rolled back due to restart or failover
#
# ERROR: The nzload operation ( thread # 1 ) reported the following problem @ 2008-05-27 14:57:47
# Error: ERROR: Transaction rolled back due to restart or failover
# Error: Load Failed, records not inserted.
#
# ----------------------------------------------------------------------
#
# If someone messes with the pipe (as in this case), it could throw errors for
# both the unload and the load.
#
# ERROR: The unload to an external table (thread # 1 ) reported the following problem @ 2008-05-27 16:36:47
# ERROR: /tmp/nz_migrate_20080527_11217/nz_migrate_20080527_11217_1.pipe : Broken pipe
#
# ERROR: The nzload operation ( thread # 1 ) reported the following problem @ 2008-05-27 16:36:47
# Error: ERROR: External Table : count of bad input rows reached maxerrors limit, see nzlog file
# Error: Load Failed, records not inserted.
#
# ----------------------------------------------------------------------
#
# This was the error of interest that was being thrown. It is actually
# generated by nzload (and is simply reported back up the stack by nz_migrate).
#
# ERROR: The nzload operation ( thread # 3 ) reported the following problem @ 2009-09-04 03:29:21
# Error: Communication link failure
# Error: ERROR:Communication link failure
# Error: Load Failed, records not inserted.
#
# The following error was also reported ... but is simply a byproduct of the above error.
#
# ERROR: The unload to an external table (thread # 3 ) reported the following problem @ 2009-09-04 03:29:21
# ERROR: /tmp/nz_migrate_20090904_26809/nz_migrate_20090904_26809_3.pipe : Broken pipe
#
# This turned out to be a network problem ... which affected the data transfer (e.g., nzload) between the
# two boxes (nzload runs on the SOURCE host ... loading data across the network ... into the TARGET host)
#
# An FTP test worked ... but ran very slowly.
#
# Things to test and look at
# Get FTP working
# Get FTP working fast
# Make sure there are no network errors being reported ( /sbin/ifconfig )
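#
# A quick way to eyeball those interface counters. This runs against canned
# ifconfig-style output so the snippet is self-contained; on a live host you
# would pipe the output of /sbin/ifconfig straight into the same greps.

```shell
#!/bin/sh
# Sample counters of the kind /sbin/ifconfig reports (values are made up)
ifconfig_output='eth0      Link encap:Ethernet
          RX packets:1000 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1000 errors:7 dropped:0 overruns:0 carrier:0'

# Pull out every errors/dropped/overruns counter, then flag the non-zero ones
echo "$ifconfig_output" |
    grep -oE '(errors|dropped|overruns):[0-9]+' |
    grep -v ':0$'
```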
#
# This problem turned out to be an incorrect configuration of the network cards
# on one of the database hosts. The NICs were configured as round-robin instead
# of master-backup. Due to round-robin, the network interface kept switching
# between different switches, impacting the performance of moving data across
# the network.
#
# ----------------------------------------------------------------------
#
# If you see ANYTHING that is flagged as an error in the output, you
# should follow up on it.
#########
# Notes #
#########
# nz_migrate is, for the most part, universal in nature. It can be used on any of
# these software versions.
#
# 3.0, 3.1, 4.0, 4.5, 4.6, 5.0, yada yada
#
# If you are migrating FROM 3.0, you will need to use the "-tbin <dirname>" option.
#
# nz_migrate is, for the most part, self-contained. No other scripts/executables
# should be needed ... unless you include one of these options:
#
# -format binary
# -cksum slow
# -cksum fast
#
# I have tested ASCII and BINARY migrations between the following revs ...
# everything seems to check out ok. And, because nzload is based on external
# tables (as of 3.1), there is generally no issue with cross-version
# compatibility -- and thus no need to have a client toolkit installed on
# the source host.
#
# 3.0-->3.1 You WILL need to have the client toolkit
# 3.0-->4.0 installed (i.e., "-tbin <dirname>" ) if your
# 3.0-->4.5 source system is rev 3.0.
#
# 3.1-->4.0
# 3.1-->4.5
#
# 4.0-->3.1
# 4.0-->4.5
#
# 4.5-->3.1
# 4.5-->4.0
#
# I've tried to make this script case insensitive, but that gets tricky at
# times. Pre-3.1 systems were all lower case. Post-3.1 systems can be
# either upper case or lower case. To say nothing of the fact that you
# can explicitly QUOTE a database/table name -- to preserve its case
# sensitivity -- in any version. Plus, the very nature of this script
# means it is oftentimes used to migrate data from one machine (running
# one version of the software) to another machine (running a different
# version of the software).
#
# At this time ... the script should be mostly case insensitive ... on the
# assumption that working across mixed environments (where one system might
# be uppercase and another system might be lowercase) is what matters most.
#
# I first try to access the table (on both the source and the target machines)
# using an unquoted tablename (which means case insensitive). If that doesn't
# find a match, then I try to access the table using a quoted tablename (which
# means it would have to be an exact match). Once I get a match, I then always
# quote it to make sure the box does what I want it to. This should allow me
# to migrate data from an UPPERCASE system to a lowercase system, and vice versa.
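#
# That probing order can be sketched as follows. The "table_exists" function
# is a purely hypothetical stand-in for issuing a test query through nzsql;
# only the resolution logic itself mirrors what was just described.

```shell
#!/bin/sh
CATALOG_TABLE='CUSTOMER'   # the one table on this pretend system

table_exists () {          # $1 = name, $2 = unquoted|quoted
    if [ "$2" = "unquoted" ]; then
        # Unquoted identifiers are case insensitive: compare folded to upper
        [ "$(printf '%s' "$1" | tr a-z A-Z)" = "$(printf '%s' "$CATALOG_TABLE" | tr a-z A-Z)" ]
    else
        # Quoted identifiers must match exactly
        [ "$1" = "$CATALOG_TABLE" ]
    fi
}

resolve_table () {
    if table_exists "$1" unquoted; then
        echo "\"$CATALOG_TABLE\""     # found: always emit the name quoted
    elif table_exists "$1" quoted; then
        echo "\"$1\""
    else
        echo "no such table: $1" >&2
        return 1
    fi
}

resolve_table customer     # a lowercase request still resolves to "CUSTOMER"
```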
#
# Running 2 (or more) threads at a time can result in a measurable increase
# in the overall throughput -- by making better use of the multiple processors
# on the SMP host, as well as the house network. While one stream might be
# pended on I/O or a system call, the other stream(s) can forge ahead.
#
# The biggest performance gain is obtained with the 2nd thread, lesser so with
# the 3rd thread, and then only fractionally better thereafter.
#
# This script is set up to use 1 thread for small tables (less than 1M rows),
# since there is an overhead associated with starting up the threads.
# For larger tables, 4 threads will be used -- which seemed a reasonable
# number (and a reasonable compromise, all things considered).
# You can go higher (I've seen continued small improvements with 10 threads).
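#
# The thread-count heuristic just described, as a sketch (the 1M-row cutoff
# and the default of 4 come straight from the text above):

```shell
#!/bin/sh
# Pick a default thread count from the table's row count
pick_threads () {
    if [ "$1" -lt 1000000 ]; then
        echo 1      # small table: thread startup overhead isn't worth it
    else
        echo 4      # larger table: a reasonable compromise default
    fi
}

pick_threads 50000        # prints 1
pick_threads 78800000     # prints 4
```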
#
# Use 1 thread if you're interested in simplicity -- one external table
# unload job and one nzload job.
#
# Machine-->Machine migration speed will vary based on MANY factors (your
# data, the data skew, the class of NPS box your source database is on, the
# class of NPS box your target database is on, the size of the table, the
# number of columns in the table, the datatypes of each column, etc...)
#
# You should expect numbers in the 100 GB/hour range at the low end --
# and in the 300+ GB/hour range (near physics speeds!) at the high end.
# (The latter numbers were measured going between two 8250-HA class
# machines.)
#########
# SPEED #
#########
# Recent performance tests
#
# 8250 --> 8150, 52.9 GB, 46 Tables, 78.8 M Rows
# took ~ 0:08:00
# which works out to be about ~400 GB/hour
# -- as measured by the size of the BINARY DATA ON DISK
# -- for the right table ... I've seen this number spike up to ~450 GB/hour
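#
# The arithmetic behind that figure, for reference (52.9 GB in roughly
# 8 minutes of wall-clock time):

```shell
#!/bin/sh
# GB/hour = GB / (minutes / 60); 52.9 / (8 / 60) comes out just under 400
awk 'BEGIN { printf "%.0f GB/hour\n", 52.9 / (8 / 60) }'
```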
#
# Even more recent performance tests
#
# 10400 --> 10400 using 4.0, and migrating ~10GB of ascii data
#
# -format ascii -threads 1 3m40s
# -format ascii -threads 2 2m37s
# -format ascii -threads 4 2m16s
#
# -format binary -threads 1 1m05s
# -format binary -threads 2 0m47s
# -format binary -threads 4 0m46s
#
# The tests below ... what do they tell me?
# Binary format is (usually, but not always) fastest.
# Multiple threads are important.
# If you are going to a different-sized machine, ascii is
# roughly as fast as binary, because the decompression
# takes place on the host instead of the SPUs. That also
# makes the use of multiple threads even more important,
# so that more processors (on the SMP host) can be doing
# the decompression in parallel.
#
# 10400 --> 10100 -format ascii -format binary
# -threads 1 336 356
# -threads 4 205 184
# -threads 8 200 196
#
#
# 10100 --> 10100 -format ascii -format binary
# -threads 1 341 143
# -threads 4 223 106
# -threads 8 205 102
#
# Generic nzload timing note ...
# Ran load tests, averaging 470 GB/hour from the StoragePad ... with the load files
# split into 4 pieces, and running multiple nzloads simultaneously.