Running Many Batch Statements in Parallel

When designing highly scalable architectures for modern machines, you will often need to do some form of manual parallelism control. Managing this is not always easy, but in this blog post I will give you one piece of my toolbox to help you.

Let us walk through an example together, a tiny case study. This is a problem which many of you will be familiar with.

Let us say you have 16 files that you want to load into the same table in your database in an automated manner. The naïve approach will do something like this:

BULK INSERT MyTarget FROM 'C:\temp\MyFile1' WITH
(FIELDTERMINATOR = ';', ROWTERMINATOR = '\n')

BULK INSERT MyTarget FROM 'C:\temp\MyFile2' WITH
(FIELDTERMINATOR = ';', ROWTERMINATOR = '\n')

… etc…

BULK INSERT MyTarget FROM 'C:\temp\MyFile16' WITH
(FIELDTERMINATOR = ';', ROWTERMINATOR = '\n')

Now here is the problem with this approach: it executes one statement at a time. Sequential execution is BAD; you need to stop thinking about the world like that if you want to scale on a modern architecture.
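A back-of-the-envelope model makes the cost concrete. Run the loads one after another and the total elapsed time is the sum of the individual load times. Run them all at once, with enough CPU and I/O that they do not slow each other down (a big assumption), and it is roughly the time of the slowest single load. The sketch below is illustrative Python; the function names and the 60-second durations are made up, not measurements.

```python
# Illustrative elapsed-time model: sequential vs. fully parallel loading.
# Assumes no contention: every load gets all the CPU and I/O it needs.


def sequential_elapsed(durations: list[float]) -> float:
    """One statement at a time: each load waits for the previous one."""
    return sum(durations)


def parallel_elapsed(durations: list[float]) -> float:
    """All statements at once: we are done when the slowest load is done."""
    return max(durations, default=0.0)


if __name__ == "__main__":
    # Made-up example: 16 files, each taking 60 seconds to load.
    loads = [60.0] * 16
    print(sequential_elapsed(loads))  # 960.0
    print(parallel_elapsed(loads))    # 60.0
```

In practice the parallel figure is a lower bound; shared disks and CPUs will push it up, which is why the hardware assumption matters.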

Let's assume we have enough hardware resources (in this case, it would take a blade server and a decent I/O system). What we really want is to run every one of these statements in parallel. Unfortunately, SQL Server does not have a command to start up a new connection from inside T-SQL… what to do?

Getting to the Command Line

Because you cannot execute more than one command on a single connection at a time, we need multiple connections to SQL Server, and this means going back to the command line. Let us start by creating a little batch file, Worker.Cmd, with this content:

REM Worker.Cmd File

CALL SQLCMD -S.\MyServer -Q"BULK INSERT MyTarget FROM 'C:\Temp\MyFile%1' …"
EXIT

This allows us to invoke a bulk load for the first file by executing: Worker.Cmd 1
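The only thing that changes from one worker to the next is the file number that %1 substitutes into the path. The Python sketch below builds (but does not run) the equivalent sqlcmd argument list for a given thread number, to make that substitution explicit. The server name .\MyServer and the C:\Temp\MyFile<n> naming come from the worker above. Because the rest of the worker's WITH clause is elided there, the sketch borrows FIELDTERMINATOR and ROWTERMINATOR from the naive example; that choice, and the helper name worker_command, are assumptions. It uses -Q, which runs the query and then exits sqlcmd (-q would leave sqlcmd sitting at its prompt).

```python
# Hypothetical helper mirroring Worker.Cmd: build (but do not run) the sqlcmd
# command line that loads file number `thread_no`.


def worker_command(thread_no: int, server: str = r".\MyServer") -> list[str]:
    # WITH options borrowed from the naive BULK INSERT example (an assumption;
    # the worker's own WITH clause is elided in the text).
    query = (
        f"BULK INSERT MyTarget FROM 'C:\\Temp\\MyFile{thread_no}' "
        "WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\\n')"
    )
    # -S selects the server instance; -Q runs the query and then exits sqlcmd.
    return ["sqlcmd", "-S", server, "-Q", query]


if __name__ == "__main__":
    print(worker_command(1))
```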

Unfortunately, we still cannot start multiple connections without manually firing up a lot of command prompts. The coders in the audience may at this point reach for their favorite programming language to write a little utility that can spawn multiple copies of an executable.

However, there is a problem with such a homemade executable: you cannot generally rely on a server having the necessary runtime libraries. Typical comments might be:

“No, we don’t have .NET 4.0 here, this is not yet certified by our infrastructure department. Could you recompile it for 1.1 please?”.

“Power Shell is much too fancy for us, what is wrong with running this on Windows 2000?”

Perhaps this customer is just skeptical about letting you run your executable on a server. This may sound silly, but I have seen this happen too many times to make assumptions.

Start to the Rescue

There is a very nice little utility for the good old command prompt that allows you to fire up new processes: START (on modern versions of Windows it is a command built into cmd.exe rather than a separate START.EXE). It is available on every version of Windows, it takes any command-line executable as input, fires it up in a new process with its own window, and returns control to the caller immediately.

Using START, we can write a batch script that fires up multiple copies of the same executable. It looks like this:

@ECHO OFF
REM SpawnMany.cmd

REM Author: Thomas Kejser
REM Purpose: Spawns many copies of the same executable. Useful for running many things in parallel
REM Usage: SpawnMany.cmd <command> <number of copies>

ECHO Spawning %2 copies of %1

FOR /L %%i IN (1, 1, %2) DO (
  ECHO Spawning thread %%i
  REM Pass each copy its thread number and the total number of threads
  START "Worker%%i" /Min %1 %%i %2
)

Each new process is started in a minimized window and we pass the thread number and the total number of threads to it. Using this little batch script, we can now do this:

SpawnMany.cmd Worker.Cmd 16

This starts 16 workers, each with their own thread number assigned. Very useful for running stuff in parallel in a quick and dirty way. For example, I use this to run the TPC-H data generator dbgen.exe highly parallelized.
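Passing the total number of threads as well as the thread number lets a worker work out its own share when there are more pieces of work than workers. One simple scheme, an assumption here rather than something Worker.Cmd above does, is round-robin striping: worker i of n takes items i, i+n, i+2n and so on. A Python sketch, with the hypothetical function name files_for_worker:

```python
# Hypothetical round-robin split: worker `thread_no` of `total_threads`
# takes every total_threads-th file, starting at its own number.


def files_for_worker(thread_no: int, total_threads: int, file_count: int) -> list[int]:
    return list(range(thread_no, file_count + 1, total_threads))


if __name__ == "__main__":
    # 16 files shared by 4 workers: each worker loads 4 files.
    for t in range(1, 5):
        print(t, files_for_worker(t, 4, 16))
    # 1 [1, 5, 9, 13]
    # 2 [2, 6, 10, 14]
    # 3 [3, 7, 11, 15]
    # 4 [4, 8, 12, 16]
```

With 16 workers and 16 files, as in the example above, every worker gets exactly one file; every file is assigned to exactly one worker whatever the thread count.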

Notice that I added the EXIT command at the end of Worker.Cmd. When START launches a batch file, the new command window stays open after the batch finishes, so the EXIT makes sure that the window closes itself when done executing.
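For comparison, and only where a Python interpreter is actually allowed on the server (which, as the customer comments above show, is often not the case), the same fan-out pattern might be sketched with the standard library's subprocess.Popen. The function name spawn_many and the stand-in worker are hypothetical. As with START, each Popen call returns as soon as the child has been launched, so all copies run concurrently; unlike START, it does not give each copy its own minimized window.

```python
# Sketch of SpawnMany.cmd in Python: start `total` copies of a command, passing
# each copy its thread number (1..total) and the total, as the batch script does.
import subprocess
import sys


def spawn_many(command: list[str], total: int) -> list[subprocess.Popen]:
    print(f"Spawning {total} copies of {command[0]}")
    procs = []
    for i in range(1, total + 1):
        print(f"Spawning thread {i}")
        # Popen returns once the child has started; it does not wait for it.
        procs.append(subprocess.Popen([*command, str(i), str(total)]))
    return procs


if __name__ == "__main__":
    # Stand-in worker: a tiny Python program that reports its two arguments.
    worker = [sys.executable, "-c",
              "import sys; print('worker', sys.argv[1], 'of', sys.argv[2])"]
    children = spawn_many(worker, 4)
    # Unlike the batch script, we can optionally wait for every child.
    print([p.wait() for p in children])
```

Keeping the Popen handles makes it easy to wait for all workers and collect their exit codes, something the batch version does not attempt.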

Summary

In this blog post, I have shown you how to write a little batch script that fires up multiple worker processes from the command line, each doing its own work in parallel. The script is “zero dependency”, which makes it ideal for server use and for hacking together quick and dirty parallelism for test scenarios.

I mentioned that SQL Server does not have a way to start up new connections from T-SQL. This is not strictly true. Sorry for leading you astray, but I wanted you to see how to do this from the command line first (and go through the pain). There is a way to hack SQL Server and implement a stored procedure I like to call sp_executesql_async. This will be the subject of a future blog post, but since I am heading into a lab for a few weeks, you will have to wait for it.


The post Running Many Batch Statements in Parallel appeared first on Fighting Bad Data Modeling.
