Tuesday, May 20, 2014

nz_migrate errors

##########
# ERRORS #
##########

# When data starts flowing between the two machines (".....data flowing....."), this script
# basically steps back and gets out of the way.  It does a "wait" until the unloads and nzloads
# are finished.  Usually, they are successful.  I'm not expecting any errors.  But if
# something untoward happens, here are some sample outputs for your reference.
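#
# The "wait" step can be pictured with the minimal bash sketch below.  It is
# an illustration of the pattern, not nz_migrate's actual code.  It starts one
# background job per thread, then waits on each job in turn and reports any
# thread that exited with a non-zero status.  run_stream and run_all_streams
# are hypothetical names, and run_stream is only a placeholder for one
# unload+nzload pair.

```bash
#!/usr/bin/env bash
# Sketch only: run_stream is a HYPOTHETICAL stand-in for one unload+nzload
# pair.  nz_migrate's real per-thread work is not shown here.
run_stream() {
    sleep 0.1   # placeholder for the real unload/load work on thread "$1"
}

# run_all_streams N: start N streams in the background, wait for every one,
# print an ERROR line for each thread that exited non-zero, and return the
# number of failed threads.
run_all_streams() {
    local nthreads="$1" i rc failures=0
    local -a pids=()
    for ((i = 1; i <= nthreads; i++)); do
        run_stream "$i" &
        pids[i]=$!
    done
    for ((i = 1; i <= nthreads; i++)); do
        wait "${pids[i]}"      # blocks until this thread's job finishes
        rc=$?                  # wait returns that job's exit status
        if ((rc != 0)); then
            echo "ERROR:  thread # $i exited with status $rc"
            failures=$((failures + 1))
        fi
    done
    return "$failures"
}
```

# Because each thread is waited on individually, a failure in one stream
# does not hide the exit status of the others.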
#
# ----------------------------------------------------------------------
#
# If the nzload job dies (because of an 'nzsession abort'), you would see
# messages such as these.  The 'Broken pipe' is a red herring: the unload
# died because the load died and closed the pipe prematurely.
#
# ERROR:  The unload to an external table (thread # 1 ) reported the following problem @ 2008-05-27 14:50:38
# ERROR:  /tmp/nz_migrate_20080527_2411/nz_migrate_20080527_2411_1.pipe : Broken pipe
#
# ERROR:  The nzload operation ( thread # 1 ) reported the following problem @ 2008-05-27 14:50:38
# Error: ERROR:  Transaction rolled back by client
# Error: ERROR:Communication link failure
# Error: Load Failed, records not inserted.
#
# ----------------------------------------------------------------------
#
# If the nzload job dies (because of a 'kill'), you would see
# messages such as these.  Again, the 'Broken pipe' is a red herring.
#
# ERROR:  The unload to an external table (thread # 1 ) reported the following problem @ 2008-05-27 14:55:53
# ERROR:  /tmp/nz_migrate_20080527_8057/nz_migrate_20080527_8057_1.pipe : Broken pipe
#
# ERROR:  The nzload operation ( thread # 1 ) reported the following problem @ 2008-05-27 14:55:53
# Error: ERROR:  Transaction rolled back by client
# Error: load is aborted due to user cancellation.
#
# ----------------------------------------------------------------------
#
# If the unload (create external table) dies (because of an 'nzsession abort'),
# you would see a message such as this.
#
# ERROR:  The unload to an external table (thread # 1 ) reported the following problem @ 2008-05-27 14:53:58
# pqFlush() -- connection not open
#
# ----------------------------------------------------------------------
#
# I paused my system (which was being used as both the source and the target),
# which resulted in these types of messages.
#
# ERROR:  The unload to an external table (thread # 1 ) reported the following problem @ 2008-05-27 14:57:47
# ERROR:  Transaction rolled back due to restart or failover
#
# ERROR:  The nzload operation ( thread # 1 ) reported the following problem @ 2008-05-27 14:57:47
# Error: ERROR:  Transaction rolled back due to restart or failover
# Error: Load Failed, records not inserted.
#
# ----------------------------------------------------------------------
#
# If someone tampers with the pipe (as in this case), both the unload and the
# load can report errors.
#
# ERROR:  The unload to an external table (thread # 1 ) reported the following problem @ 2008-05-27 16:36:47
# ERROR:  /tmp/nz_migrate_20080527_11217/nz_migrate_20080527_11217_1.pipe : Broken pipe
#
# ERROR:  The nzload operation ( thread # 1 ) reported the following problem @ 2008-05-27 16:36:47
# Error: ERROR:  External Table : count of bad input rows reached maxerrors limit, see nzlog file
# Error: Load Failed, records not inserted.
#
# ----------------------------------------------------------------------
#
# This was the error of interest that was being thrown.
# It is actually generated by nzload; nz_migrate simply reports it back up the stack.
#
# ERROR: The nzload operation ( thread # 3 ) reported the following problem @ 2009-09-04 03:29:21
# Error: Communication link failure
# Error: ERROR:Communication link failure
# Error: Load Failed, records not inserted.
#
# The following error was also reported ... but is simply a byproduct of the above error.
#
# ERROR: The unload to an external table (thread # 3 ) reported the following problem @ 2009-09-04 03:29:21
# ERROR: /tmp/nz_migrate_20090904_26809/nz_migrate_20090904_26809_3.pipe : Broken pipe
#
# This turned out to be a network problem, which affected the data transfer (i.e., the nzload) between the
# two boxes.  nzload runs on the SOURCE host and loads the data across the network into the TARGET host.
#
# An ftp test worked ... but ran very slowly.
#
# Things to test and look at
#      Get FTP working
#      Get FTP working fast
#      Make sure there are no network errors being reported ( /sbin/ifconfig )
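#
# The ifconfig check can also be scripted.  The sketch below assumes a Linux
# host, where the kernel exposes the same per-interface error counters that
# ifconfig prints in /proc/net/dev.  The function name report_nic_errors is
# hypothetical.  Any interface it lists is worth a closer look.

```bash
#!/usr/bin/env bash
# Minimal sketch (assumes Linux): list interfaces whose receive or transmit
# error counters are non-zero, read from a /proc/net/dev-style file.
# After the "iface:" colon is turned into a space, field 4 is the receive
# "errs" counter and field 12 is the transmit "errs" counter.
report_nic_errors() {
    local statfile="${1:-/proc/net/dev}"
    # Skip the two header lines and print only interfaces with errors.
    sed 's/:/ /' "$statfile" | awk 'NR > 2 && ($4 > 0 || $12 > 0) {
        printf "%s rx_errs=%s tx_errs=%s\n", $1, $4, $12
    }'
}
```

# Run it with no argument on the live host.  Counters that keep climbing
# between two runs matter more than a small, static total.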
#
# This problem turned out to be an incorrect configuration of the network cards on one of
# the database hosts.  The NICs were configured as round-robin instead of master-backup.
# Because of round-robin, the network interface kept switching between different switches,
# which hurt the performance of moving data across the network.
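#
# The sketch below checks for the same misconfiguration.  It assumes a Linux
# host whose NICs are bonded with the kernel bonding driver, which the post
# does not actually say.  That driver reports its mode on a "Bonding Mode:"
# line in /proc/net/bonding/<bond>.  Round-robin shows up as
# "load balancing (round-robin)".  The master-backup arrangement, which the
# driver calls active-backup, shows up as "fault-tolerance (active-backup)".
# check_bond_mode and the default name bond0 are hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch (assumes the Linux bonding driver): warn when a bonded
# interface is running in round-robin mode.  "bond0" is only an example name.
check_bond_mode() {
    local statfile="${1:-/proc/net/bonding/bond0}"
    local mode
    # Extract the text after "Bonding Mode: ".
    mode=$(sed -n 's/^Bonding Mode: //p' "$statfile")
    case "$mode" in
        *round-robin*)   echo "WARNING: $statfile is round-robin ($mode)"; return 1 ;;
        *active-backup*) echo "OK: $statfile is active-backup" ;;
        *)               echo "NOTE: $statfile mode is '$mode'" ;;
    esac
}
```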
#
# ----------------------------------------------------------------------
#

# If you see ANYTHING that is flagged as an error in the output, you
# should follow up on it.
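
# Following that advice is easier if you keep the output.  The sketch below
# assumes you saved nz_migrate's output to a file.  It groups the ERROR
# headers by thread and applies the rule of thumb from the examples above:
# when the unload reports a 'Broken pipe' and the nzload on the same thread
# also failed, chase the nzload error first.  The function name triage_errors
# is hypothetical, and the matching relies on the message wording shown above.

```bash
#!/usr/bin/env bash
# Minimal triage sketch for a saved copy of nz_migrate's output.
# For each thread, report whether the unload, the nzload, or both complained.
triage_errors() {
    awk '
        # Pull the thread number out of "... thread # 3 ) reported ..."
        function thread_of(s) { sub(/.*thread # */, "", s); sub(/[^0-9].*/, "", s); return s }
        /ERROR:.*The unload to an external table/ { cur = "unload"; th = thread_of($0); seen[th] = 1; unload[th] = 1; next }
        /ERROR:.*The nzload operation/            { cur = "nzload"; th = thread_of($0); seen[th] = 1; nzload[th] = 1; next }
        cur == "unload" && /Broken pipe/          { pipe[th] = 1 }
        END {
            for (t in seen) {
                if (unload[t] && nzload[t] && pipe[t])
                    msg = "unload Broken pipe is a side effect; investigate the nzload error"
                else if (unload[t] && nzload[t])
                    msg = "both the unload and the nzload reported errors"
                else if (nzload[t])
                    msg = "nzload error only"
                else
                    msg = "unload error only"
                printf "thread %s: %s\n", t, msg
            }
        }
    ' "$1" | sort -k2,2n
}
```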





#########
# Notes #
#########

# nz_migrate is, for the most part, universal in nature.  It can be used on any of
# these software versions.
#
#      3.0, 3.1, 4.0, 4.5, 4.6, 5.0, yada yada
#
# If you are migrating FROM 3.0, you will need to use the "-tbin <dirname>" option.

# nz_migrate is, for the most part, self-contained.  No other scripts/executables
# should be needed ... except if you include one of these options:
#
#      -format binary
#      -cksum slow
#      -cksum fast

# I have tested ASCII and BINARY migrations between the following revs ...
# everything seems to check out ok.  And, because nzload is based on
# external tables (as of 3.1), there seems to be no issue with cross-version
# compatibility, and thus (in general) no need to have a client toolkit
# installed on the source host.
#
#      3.0-->3.1     You WILL need to have the client toolkit
#      3.0-->4.0     installed (i.e., "-tbin <dirname>" ) if your
#      3.0-->4.5     source system is rev 3.0.
#
#      3.1-->4.0
#      3.1-->4.5
#
#      4.0-->3.1
#      4.0-->4.5
#
#      4.5-->3.1
#      4.5-->4.0

# I've tried to make this script case insensitive.  But that gets tricky at
# times.  Pre 3.1 systems were all lower case.  Post 3.1 systems can be
# either upper case or lower case.  To say nothing of the fact that you
# can explicitly QUOTE a database/table name (to preserve its case
# sensitivity) in any version.  Plus, the very nature of this script
# means it is often used to migrate data from one machine (running
# one version of the software) to another machine (running a different
# version of the software).
#
# At this time, the script should be mostly case insensitive, on the assumption
# that the most important environment to support is one where one system is
# uppercase and the other system is lowercase.
#
# I first try to access the table (on both the source and the target machines)
# using an unquoted table name (which means case insensitive).  If that doesn't
# find a match, then I try to access the table using a quoted table name (which
# means it would have to be an exact match).  Once I get a match, I always
# quote it to make sure the box does what I want it to.  This should allow me
# to migrate data from an UPPERCASE system to a lowercase system, and vice versa.
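
# Here is a minimal bash sketch of that two-step lookup, under two stated
# assumptions.  First, the list of existing table names is passed in as a
# newline-separated string; how you fetch it from the catalog is not shown.
# Second, an unquoted name folds to the system's default case before matching.
# resolve_table is a hypothetical name, not part of nz_migrate.

```bash
#!/usr/bin/env bash
# resolve_table NAME DEFAULT_CASE CATALOG
#   NAME          table name as the user typed it
#   DEFAULT_CASE  "upper" or "lower": what an unquoted identifier folds to (ASSUMPTION)
#   CATALOG       newline-separated list of table names that actually exist
# Prints the matching name, always quoted; returns 1 if nothing matches.
resolve_table() {
    local name="$1" default_case="$2" catalog="$3" folded
    if [ "$default_case" = "upper" ]; then
        folded=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
    else
        folded=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    fi
    # 1st try: behave like an unquoted (case-insensitive) reference.
    if printf '%s\n' "$catalog" | grep -Fxq -- "$folded"; then
        printf '"%s"\n' "$folded"; return 0
    fi
    # 2nd try: behave like a quoted reference, which must match exactly.
    if printf '%s\n' "$catalog" | grep -Fxq -- "$name"; then
        printf '"%s"\n' "$name"; return 0
    fi
    return 1
}
```

# For example, with the tables CUSTOMER and MixedCase on an uppercase system,
# "customer" resolves to "CUSTOMER" on the first try.  "MixedCase" only
# resolves on the second, exact-match try.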

# Running 2 (or more) threads at a time can result in a measurable increase
# in the overall throughput, by making better use of the multiple processors
# on the SMP host, as well as the house network.  While one stream might be
# blocked on I/O or a system call, the other stream(s) can forge ahead.
#
# The biggest performance gain is obtained with the 2nd thread, lesser so with
# the 3rd thread, and then only fractionally better thereafter.
#
# This script is set up to use 1 thread for small tables (less than 1M rows),
# since there is an overhead associated with starting up the threads.
# For larger tables, 4 threads will be used, which seemed a reasonable
# number (and a reasonable compromise, all things considered).
# You can go higher (I've seen continued small improvements with 10 threads).
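#
# The default thread rule fits in a few lines.  This is a sketch, not the
# script's own code.  The 1M-row threshold and the default of 4 come from the
# paragraph above, and 1M is read as 1,000,000.  choose_threads is a
# hypothetical name.  Its optional second argument stands in for choosing a
# higher count, such as 10, for large tables.

```bash
#!/usr/bin/env bash
# choose_threads ROWS [LARGE_TABLE_THREADS]
# Small tables (< 1,000,000 rows) get 1 thread, because thread start-up
# overhead would outweigh the gain.  Larger tables get LARGE_TABLE_THREADS
# (default 4).
choose_threads() {
    local rows="$1" large="${2:-4}"
    if [ "$rows" -lt 1000000 ]; then
        echo 1
    else
        echo "$large"
    fi
}
```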
#
# Use 1 thread if you're interested in simplicity: one external table
# unload job and one nzload job.
#
# Machine-->Machine migration speed will vary based on MANY factors (your
# data, the data skew, the class of NPS box your source database is on, the
# class of NPS box your target database is on, the size of the table, the
# number of columns in the table, the datatypes of each column, etc...)
#
# You should expect numbers in the 100 GB/hour range at the low end,
# and in the 300+ GB/hour range (near physics speeds!) at the high end.
# (The latter numbers were measured between two 8250-HA class
# machines.)





#########
# SPEED #
#########

# Recent performance tests
#
#     8250 --> 8150, 52.9 GB, 46 Tables, 78.8 M Rows
#              took ~0:08:00
#              which works out to about 400 GB/hour
#              -- as measured by the size of the BINARY DATA ON DISK
#              -- for the right table ... I've seen this number spike up to ~450 GB/hour
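
# The ~400 GB/hour figure is plain arithmetic: GB moved, divided by elapsed
# seconds, times 3600.  The small helpers below make it easy to repeat for
# your own runs.  The function names are hypothetical.

```bash
#!/usr/bin/env bash
# hms_to_secs H:MM:SS -> total seconds
hms_to_secs() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# gb_per_hour GB SECONDS -> throughput in GB/hour, rounded to a whole number
gb_per_hour() {
    awk -v gb="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", gb / secs * 3600 }'
}
```

# For example, gb_per_hour 52.9 "$(hms_to_secs 0:08:00)" prints 397,
# which matches the ~400 GB/hour quoted above.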

# Even more recent performance tests
#
#  10400 --> 10400 using 4.0, and migrating ~10GB of ascii data
#
#       -format ascii -threads 1     3m40s
#       -format ascii -threads 2     2m37s
#       -format ascii -threads 4     2m16s
#
#       -format binary -threads 1    1m05s
#       -format binary -threads 2    0m47s
#       -format binary -threads 4    0m46s
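#
# The diminishing returns from extra threads show up clearly if each run is
# compared with its 1-thread baseline.  A minimal sketch follows.  The
# function names are hypothetical.  It converts the "XmYYs" times above to
# seconds and prints the speedup.

```bash
#!/usr/bin/env bash
# ms_to_secs XmYYs -> total seconds (e.g. 3m40s -> 220)
ms_to_secs() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ print $1 * 60 + $2 }'
}

# speedup BASELINE OTHER -> BASELINE / OTHER, two decimals
speedup() {
    awk -v a="$(ms_to_secs "$1")" -v b="$(ms_to_secs "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}
```

# For the runs above, this gives 1.40 and 1.62 for ascii (2 and 4 threads).
# It gives 1.38 and 1.41 for binary.  Most of the gain arrives with the 2nd
# thread.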
#

# The tests below ... what do they tell me?
#      Binary format is (usually, but not always) fastest.
#      Multiple threads are important.
#      If you are going to a differently sized machine, ascii is
#           roughly as fast as binary because the decompression
#           takes place on the host instead of the SPUs.  And it
#           makes the use of multiple threads even more important,
#           so more processors (on the SMP host) can be doing the
#           decompression in parallel.
#
# 10400 --> 10100     -format ascii     -format binary
#      -threads 1           336              356
#      -threads 4           205              184
#      -threads 8           200              196
#
#
# 10100 --> 10100     -format ascii     -format binary
#      -threads 1           341              143
#      -threads 4           223              106
#      -threads 8           205              102
#


# Generic nzload timing note ...
# Ran load tests, averaging 470 GB/hour from the StoragePad, with the load files
# split into 4 pieces and multiple nzloads running simultaneously.
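
# The file-splitting half of that setup can be sketched with GNU split
# (a reasonably recent coreutils is needed for "-n l/N").  That option cuts a
# file into N pieces without breaking any line.  The loader step is left out
# on purpose.  Each piece would go to its own nzload running in the
# background, followed by a single "wait".  The exact nzload options depend
# on your target.  split_load_file is a hypothetical name.

```bash
#!/usr/bin/env bash
# split_load_file FILE [PIECES] [PREFIX]
# Writes PREFIX00, PREFIX01, ... (numeric suffixes from -d), each holding
# whole lines only, so every piece is a valid load file on its own.
split_load_file() {
    local file="$1" pieces="${2:-4}" prefix="${3:-${1}.part}"
    split -n "l/$pieces" -d "$file" "$prefix"
}
```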
