excel - Exporting from MS Excel to MS Access with intermediate processing

Question

I have an application that produces reports in Excel (.XLS) format. I need to append the data from these reports to an existing table in an MS Access 2010 database. A typical record is:

INC000000004154 Closed  Cbeebies    BBC Childrens   HQ6 monitor wall dropping out.  HQ6 P3  3/7/2013 7:03:01 PM 3/7/2013 7:03:01 PM 3/7/2013 7:14:15 PM The root cause of the problem was the power supply to the PC which was feeding the monitor. HQ6 Monitor wall dropping out. BBC Third Party Contractor supply this equipment.

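The complication is that I need to do some limited processing on the data before it goes into the table.

Specifically, I need to perform a couple of lookups that convert names to numbers, and parse the date strings (for some reason the report writes dates to the spreadsheet as text rather than as date values).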

I could do this in Python using xlrd/xlwt, but I would much prefer to do it in Excel or Access. Does anyone have advice on a good way to approach this? I would rather not use VBA, so could I record an MS Excel macro and then run that macro on each newly created XLS file?

score 2 · Accepted Answer

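You can import some Excel data directly into MS Access, but if you need to process it along the way, I don't see how you can do that without one of the following:

An ETL application such as Pentaho or Talend. That is certainly like using a hammer to crush an ant, though.
Some other external data-processing pipeline, in Python or another programming language.
VBA (whether through macros or hand-coded). VBA has been very good at this sort of thing in Access for literally decades.

Since you are using Excel and Access, staying within that realm looks like the best way to solve your problem.

Just use queries:
Import the data, untransformed, into a table whose sole purpose is to hold the data from Excel. Then build queries on that raw data to add the missing information and massage the data before appending the result to the final destination table.
This solution has the advantage that you can build a simple step-by-step process in Access that is easy to record with macros.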
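The staging-table idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for Access so the example is self-contained, and the table names (`staging`, `facilities`, `incidents`), the column names, and the function `run_staging_demo` are all hypothetical. In Access the same shape would be an append query over the imported table, and the text dates could presumably be converted in that query with a function such as `CDate`.

```python
import sqlite3


def run_staging_demo():
    """Load raw rows into a staging table, then append transformed rows.

    SQLite is only a stand-in for Access; all table/column names are
    hypothetical and chosen for illustration.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Staging table: holds the spreadsheet rows exactly as imported (all text).
    cur.execute("CREATE TABLE staging (ref TEXT, priority TEXT, facility TEXT)")
    # Lookup table: maps a facility name to its numeric ID.
    cur.execute("CREATE TABLE facilities (id INTEGER PRIMARY KEY, facility TEXT)")
    # Final destination table: stores numbers instead of names.
    cur.execute(
        "CREATE TABLE incidents (ref TEXT, priority INTEGER, facility_id INTEGER)"
    )

    cur.execute("INSERT INTO facilities (id, facility) VALUES (1, 'HQ6')")
    cur.execute("INSERT INTO staging VALUES ('INC000000004154', 'P3', 'HQ6')")

    # "Append query": the join resolves the name-to-number lookup, and the
    # priority text 'P3' is converted to the integer 3.
    cur.execute(
        """
        INSERT INTO incidents (ref, priority, facility_id)
        SELECT s.ref, CAST(substr(s.priority, 2) AS INTEGER), f.id
        FROM staging AS s
        JOIN facilities AS f ON f.facility = s.facility
        """
    )
    conn.commit()

    rows = cur.execute(
        "SELECT ref, priority, facility_id FROM incidents"
    ).fetchall()
    conn.close()
    return rows
```

Keeping the raw import separate from the transformation means a bad lookup or an unparseable value can be inspected in the staging table before anything reaches the destination table.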

score 1 · Accepted Answer

I asked this question a while ago and decided it was easier to do in Python. Gord asked me to share it.

"""
Routine to migrate the S7 data from MySQL to the new Access
database.

We're using the pyodbc libraries to connect to Microsoft Access
Note that there are 32- and 64-bit versions of these libraries
available but in order to work the word-length for pyodbc and by
implication Python and all its associated compiled libraries must
match that of MS Access. Which is an arse as I've just had to
delete my 64-bit installation of Python and replace it and all
the libraries with the 32-bit version. 

Tim Greening-Jackson 08 May 2013 (timATgreening-jackson.com)
"""

import pyodbc
import re
import datetime
import tkFileDialog
from Tkinter import *

class S7Incident:
    """
    Class containing the records downloaded from the S7.INCIDENTS table 
    """
    def __init__(self, id_incident, priority, begin, acknowledge,
                 diagnose, workaround, fix, handoff, lro, nlro,
                 facility, ctas, summary, raised, code):
        self.id_incident = unicode(id_incident)
        # Map the textual priority ("P1".."P5") to its numeric value
        self.priority = {u'P1': 1, u'P2': 2, u'P3': 3, u'P4': 4, u'P5': 5}[unicode(priority.upper())]
        self.begin = begin
        self.acknowledge = acknowledge
        self.diagnose = diagnose
        self.workaround = workaround
        self.fix = fix
        self.handoff = bool(handoff)
        self.lro = bool(lro)
        self.nlro = bool(nlro)
        self.facility = unicode(facility)
        self.ctas = ctas
        # Single quotes are stripped because the SQL is built by string formatting
        self.summary = "** NONE ***" if summary is None else summary.replace("'", "")
        self.raised = raised.replace("'", "")
        self.code = 0 if code is None else code
        self.production = None  # production ID, set in ProcessIncident
        self.dbid = None        # Access autonumber ID, set after the insert

    def __repr__(self):
        return "[{}] ID:{} P{} Prod:{} Begin:{} A:{} D:+{}s W:+{}s F:+{}s\nH/O:{} LRO:{} NLRO:{} Facility={} CTAS={}\nSummary:'{}',Raised:'{}',Code:{}".format(
            self.id_incident, self.dbid, self.priority, self.production, self.begin,
            self.acknowledge, self.diagnose, self.workaround, self.fix,
            self.handoff, self.lro, self.nlro, self.facility, self.ctas,
            self.summary, self.raised, self.code)

    def ProcessIncident(self, cursor, facilities, productions):
        """
        Produces the SQL necessary to insert the incident in to the Access
        database, executes it and then gets the autonumber ID (dbid) of the newly
        created incident (this is used so LRO, NLRO, CTAS and AD1 can refer to
        their parent incident).

        If the incident is classed as LRO, NLRO, CTAS then the appropriate
        record is created. Returns the dbid.
        """
        # Look up the production ID by name (keys are upper case); 0 if unknown
        if self.raised.upper() in productions:
            self.production = productions[self.raised.upper()]
        else:
            self.production = 0

        sql = """INSERT INTO INCIDENTS (ID_INCIDENT, PRIORITY, FACILITY, BEGIN,
        ACKNOWLEDGE, DIAGNOSE, WORKAROUND, FIX, HANDOFF, SUMMARY, RAISED, CODE, PRODUCTION)
        VALUES ('{}', {}, {}, #{}#, {}, {}, {}, {}, {}, '{}', '{}', {}, {})
        """.format(self.id_incident, self.priority, facilities[self.facility], self.begin,
                   self.acknowledge, self.diagnose, self.workaround, self.fix,
                   self.handoff, self.summary, self.raised, self.code, self.production)
        cursor.execute(sql)
        # Fetch the autonumber ID that Access assigned to the new row
        cursor.execute("SELECT @@IDENTITY")
        self.dbid = cursor.fetchone()[0]

        if self.lro:
            self.ProcessLRO(cursor, facilities[self.facility])

        if self.nlro:
            self.ProcessNLRO(cursor, facilities[self.facility])

        if self.ctas:
            self.ProcessCTAS(cursor, facilities[self.facility], self.ctas)

        return self.dbid

    def ProcessLRO(self, cursor, facility):
        sql = "INSERT INTO LRO (PID, DURATION, FACILITY) VALUES ({}, {}, {})"\
              .format(self.dbid, self.workaround, facility)
        cursor.execute(sql)

    def ProcessNLRO(self, cursor, facility):
        sql = "INSERT INTO NLRO (PID, DURATION, FACILITY) VALUES ({}, {}, {})"\
              .format(self.dbid, self.workaround, facility)
        cursor.execute(sql)

    def ProcessCTAS(self, cursor, facility, code):
        sql = "INSERT INTO CTAS (PID, DURATION, FACILITY, CODE) VALUES ({}, {}, {}, {})"\
              .format(self.dbid, self.workaround, facility, code)
        cursor.execute(sql)


class S7AD1:
    """
    S7.AD1 records.
    """
    def __init__(self, id_ad1, date, ref, commentary, adjustment):
        self.id_ad1 = id_ad1
        self.date = date
        self.ref = unicode(ref)
        self.commentary = unicode(commentary)
        self.adjustment = float(adjustment)
        self.pid = 0         # dbid of the parent incident, 0 if unmatched
        self.production = 0  # production ID of the parent incident

    def __repr__(self):
        return "[{}] Date:{} Parent:{} PID:{} Amount:{} Commentary: {} "\
               .format(self.id_ad1, self.date.strftime("%d/%m/%y"), self.ref, self.pid, self.adjustment, self.commentary)

    def SetPID(self, pid):
        self.pid = pid

    def SetProduction(self, p):
        self.production = p

    def Process(self, cursor):
        # Use an ISO date in the #...# literal: Access reads ambiguous
        # dd/mm/yy literals as month-first, which would swap day and month.
        sql = "INSERT INTO AD1 (pid, begin, commentary, production, adjustment) VALUES ({}, #{}#, '{}', {}, {})"\
              .format(self.pid, self.date.strftime("%Y-%m-%d"), self.commentary, self.production, self.adjustment)
        cursor.execute(sql)

class S7Financial:
    """
    S7 monthly financial summary of income and penalties from S7.FINANCIALS table.
    These are identical in the new database
    """
    def __init__(self, month, year, gco, cta, support, sc1, sc2, sc3, ad1):
        # Each record covers one month; store it as the first day of that month
        self.begin = datetime.date(year, month, 1)
        self.gco = float(gco)
        self.cta = float(cta)
        self.support = float(support)
        self.sc1 = float(sc1)
        self.sc2 = float(sc2)
        self.sc3 = float(sc3)
        self.ad1 = float(ad1)

    def __repr__(self):
        return "Period: {} GCO:{:.2f} CTA:{:.2f} SUP:{:.2f} SC1:{:.2f} SC2:{:.2f} SC3:{:.2f} AD1:{:.2f}"\
               .format(self.begin.strftime("%m/%y"), self.gco, self.cta, self.support, self.sc1, self.sc2, self.sc3, self.ad1)

    def Process(self, cursor):
        """
        Insert in to FINANCIALS table
        """
        sql = "INSERT INTO FINANCIALS (BEGIN, GCO, CTA, SUPPORT, SC1, SC2, SC3, AD1) VALUES (#{}#, {}, {}, {}, {}, {}, {},{})"\
              .format(self.begin, self.gco, self.cta, self.support, self.sc1, self.sc2, self.sc3, self.ad1)
        cursor.execute(sql)

class S7SC3:
    """
    Miscellaneous S7 SC3 stuff. The new table is identical to the old one. 
    """
    def __init__(self, begin, month, year, p1ot, p2ot, totchg, succchg, chgwithinc, fldchg, egychg):
        # month and year are accepted to match the SELECT column list but are unused
        self.begin = begin
        self.p1ot = p1ot
        self.p2ot = p2ot
        self.changes = totchg
        self.successful = succchg
        self.incidents = chgwithinc
        self.failed = fldchg
        self.emergency = egychg

    def __repr__(self):
        return "{} P1:{} P2:{} CHG:{} SUC:{} INC:{} FLD:{} EGY:{}"\
               .format(self.begin.strftime("%m/%y"), self.p1ot, self.p2ot, self.changes, self.successful, self.incidents, self.failed, self.emergency)

    def Process(self, cursor):
        """
        Inserts a record in to the Access database
        """
        sql = ("INSERT INTO SC3 (BEGIN, P1OT, P2OT, CHANGES, SUCCESSFUL, INCIDENTS, FAILED, EMERGENCY) "
               "VALUES (#{}#, {}, {}, {}, {}, {}, {}, {})")\
              .format(self.begin, self.p1ot, self.p2ot, self.changes, self.successful, self.incidents, self.failed, self.emergency)
        cursor.execute(sql)

def ConnectToAccessFile():
    """
    Prompts the user for an Access database file, connects, creates a cursor,
    cleans out the tables which are to be replaced, gets a hash of the facilities
    table keyed on facility name returning facility id
    """
    # Prompts the user to select which Access DB file he wants to use and then attempts to connect
    root = Tk()
    dbname = tkFileDialog.askopenfilename(parent=root, title="Select output database", filetypes=[('Access 2010', '*.accdb')])
    root.destroy()
    # Connect to the Access (new) database and clean its existing incidents etc. tables out as
    # these will be replaced with the new data
    dbcxn = pyodbc.connect("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ="+dbname+";")
    dbcursor=dbcxn.cursor()
    print("Connected to {}".format(dbname))
    for table in ["INCIDENTS", "AD1", "LRO", "NLRO", "CTAS", "SC3", "PRODUCTIONS", "FINANCIALS"]:
        print("Clearing table {}...".format(table))
        dbcursor.execute("DELETE * FROM {}".format(table))
    # Get the list of facilities from the Access database...
    dbcursor.execute("SELECT id, facility FROM facilities")
    rows = dbcursor.fetchall()
    dbfacilities = {unicode(row[1]):row[0] for row in rows}
    return dbcxn, dbcursor, dbfacilities

# Entry point

incre = re.compile(r"INC\d{12}[A-Z]?") # Regex that matches incident references

try:
    dbcxn, dbcursor, dbfacilities = ConnectToAccessFile()
    # Connect to the MySQL S7 (old) database and read the incidents and ad1 tables
    s7cxn = pyodbc.connect("DRIVER={MySQL ODBC 3.51 Driver}; SERVER=localhost;DATABASE=s7; UID=root; PASSWORD=********; OPTION=3")
    print("Connected to MySQL S7 database")
    s7cursor = s7cxn.cursor()
    s7cursor.execute("""
    SELECT id_incident, priority, begin, acknowledge,
    diagnose, workaround, fix, handoff, lro, nlro,
    facility, ctas, summary, raised, code FROM INCIDENTS""")
    rows = s7cursor.fetchall()
    # Discard any incidents which don't have a reference of the form INC... as they are ancient
    print("Fetching incidents")
    s7incidents = {unicode(row[0]):S7Incident(*row) for row in rows if incre.match(row[0])}

    # Get the list of productions from the S7 database to replace the one we've just deleted ...

    print("Fetching productions")    
    s7cursor.execute("SELECT DISTINCT RAISED FROM INCIDENTS")
    rows = s7cursor.fetchall()
    s7productions = [r[0] for r in rows]

    # ... now get the AD1s ...

    print("Fetching AD1s")    
    s7cursor.execute("SELECT id_ad1, date, ref, commentary, adjustment from AD1")
    rows = s7cursor.fetchall()
    s7ad1s = [S7AD1(*row) for row in rows]

    # ... and the financial records ...

    print("Fetching Financials")    
    s7cursor.execute("SELECT month, year, gco, cta, support, sc1, sc2, sc3, ad1 FROM Financials")
    rows = s7cursor.fetchall()
    s7financials = [S7Financial(*row) for row in rows]
    print("Writing financials ({})".format(len(s7financials)))
    for p in s7financials:
        p.Process(dbcursor)

    # ... and the SC3s.

    print("Fetching SC3s")    
    s7cursor.execute("SELECT begin, month, year, p1ot, p2ot, totchg, succhg, chgwithinc, fldchg, egcychg from SC3")
    rows = s7cursor.fetchall()
    s7sc3s = [S7SC3(*row) for row in rows]
    print("Writing SC3s ({})".format(len(s7sc3s)))
    for p in s7sc3s:
        p.Process(dbcursor)

    # Re-create the productions table in the new database. Note we refer to production
    # by number in the incidents table so need to do the SELECT @@IDENTITY to give us the
    # autonumber index. To make sure everything is case-insensitive convert the
    # hash keys to UPPERCASE.

    dbproductions = {}
    print("Writing productions ({})".format(len(s7productions)))
    for p in sorted(s7productions):
        dbcursor.execute("INSERT INTO PRODUCTIONS (PRODUCTION) VALUES ('{}')".format(p))
        dbcursor.execute("SELECT @@IDENTITY")
        dbproductions[p.upper()] = dbcursor.fetchone()[0]


    # Now process the incidents etc. that we have retrieved from the S7 database

    print("Writing incidents ({})".format(len(s7incidents)))
    for k in sorted(s7incidents):
        s7incidents[k].ProcessIncident(dbcursor, dbfacilities, dbproductions)

    # Match the new parent incident IDs in the AD1s and then write to the new table. Some
    # really old AD1s don't have the parent incident reference in the REF field, it is just
    # mentioned SOMEWHERE in the commentary. So if the REF field doesn't match then do a
    # re.search (not re.match!) for it. It isn't essential to match these older AD1s with
    # their parent incident, but it is quite useful (and tidy).

    print("Matching and writing AD1s ({})".format(len(s7ad1s)))
    for a in s7ad1s:
        if a.ref in s7incidents:
            a.SetPID(s7incidents[a.ref].dbid)
            a.SetProduction(s7incidents[a.ref].production)
        else:
            # Fall back to searching the commentary for an incident reference
            z = incre.search(a.commentary)
            if z and z.group() in s7incidents:
                a.SetPID(s7incidents[z.group()].dbid)
                a.SetProduction(s7incidents[z.group()].production)

        a.Process(dbcursor)

    print("Committing changes")
    dbcursor.commit()
finally:
    print("Closing databases")
    dbcxn.close()
    s7cxn.close()
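
The script above builds its SQL with `str.format` and strips single quotes from text fields so the statements stay valid. A possible alternative, sketched here rather than taken from the original answer, is to pass the values as query parameters using `?` placeholders, which pyodbc also accepts. The sketch also covers the two transformations the question asks for: the priority lookup and parsing the report's text dates. To keep it runnable anywhere, `sqlite3` stands in for the Access connection. The table layout, the `PRIORITIES` mapping, and the functions `parse_report_date` and `insert_incident` are hypothetical, and the month-first date order is an assumption about the report.

```python
import sqlite3
from datetime import datetime

# Hypothetical lookup from the report's priority text to a number.
PRIORITIES = {"P1": 1, "P2": 2, "P3": 3, "P4": 4, "P5": 5}


def parse_report_date(text):
    """Parse a text date such as '3/7/2013 7:03:01 PM'.

    Assumes month/day/year order; swap %m and %d if the report is day-first.
    """
    return datetime.strptime(text.strip(), "%m/%d/%Y %I:%M:%S %p")


def insert_incident(cursor, ref, priority, begin_text, summary):
    """Insert one transformed record using '?' placeholders.

    The driver handles quoting, so apostrophes in the summary are kept.
    The date is stored as ISO text for SQLite; with pyodbc the datetime
    object itself could be passed as the parameter instead.
    """
    cursor.execute(
        "INSERT INTO incidents (ref, priority, begin, summary) VALUES (?, ?, ?, ?)",
        (
            ref,
            PRIORITIES[priority.upper()],
            parse_report_date(begin_text).isoformat(" "),
            summary,
        ),
    )


def demo():
    conn = sqlite3.connect(":memory:")  # stand-in for the Access connection
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE incidents (ref TEXT, priority INTEGER, begin TEXT, summary TEXT)"
    )
    insert_incident(
        cur, "INC000000004154", "p3", "3/7/2013 7:03:01 PM",
        "HQ6 monitor's wall dropping out.",
    )
    conn.commit()
    row = cur.execute(
        "SELECT ref, priority, begin, summary FROM incidents"
    ).fetchone()
    conn.close()
    return row
```

With parameters the text reaches the database unchanged, so the `replace("'", "")` calls would no longer be needed to keep the statements valid.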
