Formatted input in Python
I have a file that has the following:
  a b c d
1 2 3 4 5
2   2 4
3 1 3 4
Note that the 4 on line 2 is followed by a new line.
I want to make a dictionary that looks like this:
d['a']['1'] = 2, d['b']['1'] = 3, ..., d['d']['1'] = 5, d['b']['2'] = 2, etc.
The blanks should not appear in the dictionary.
What's the best way to do this in Python?

The data is all single digits, right? And the values line up with the column headers? In that case, you can do this:
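
it = iter(datafile)
# column labels sit at offsets 2, 4, 6, ... of the header line
cols = list(next(it)[2::2])
d = {}
for row in it:
    for col, val in zip(cols, row[2::2]):
        if val != ' ':
            # row[0] is the row label; blanks are skipped
            d.setdefault(col, {})[row[0]] = int(val)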
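
To try the snippet above without a file, here is a minimal self-contained sketch. The function name parse_table and the inline sample are assumptions for illustration; in particular, the exact spacing of the sample rows is a guess at the question's layout (one character per field, fields separated by single spaces).

```python
def parse_table(lines):
    """Build {column: {row: value}} from single-character, space-separated fields."""
    it = iter(lines)
    # Column labels sit at offsets 2, 4, 6, ... of the header line.
    cols = list(next(it).rstrip("\n")[2::2])
    table = {}
    for row in it:
        row = row.rstrip("\n")
        if not row:
            continue  # skip empty lines
        for col, val in zip(cols, row[2::2]):
            if val != " ":  # blanks do not appear in the result
                table.setdefault(col, {})[row[0]] = int(val)
    return table


# Assumed sample: row 2 has no value under 'a' and stops after the 4.
sample = [
    "  a b c d",
    "1 2 3 4 5",
    "2   2 4",
    "3 1 3 4",
]
d = parse_table(sample)
print(d["b"]["2"])     # 2
print("2" in d["a"])   # False
```

Since zip stops at the shorter sequence, a row that ends early simply leaves the remaining columns unset.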

Based on the author's data and the code added, the above code isn't enough. If the format of the document is 31 pairs of data for 12 months in groups of 6, you can handle it in many ways. Here is what I wrote. It's not as elegant or as efficient as it could be, but it gets the job done. This is one of the reasons why you would index by row first, then column.
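
def process(data):
    import re
    # six month names, six rise/set pairs, and a day number plus six 4+4 character pairs
    hre = re.compile(r' +([a-z]+)'*6)
    sre = re.compile(r' +([a-z]+) ([a-z]+)'*6)
    dre = re.compile(r'(\d{1,2}) ' + r'(.{4}) (.{4}) {,4}'*6)
    it = iter(data)
    headers = None
    result = {}
    for line in it:
        if not line:
            continue
        if not headers:
            # find first header
            hmatch = hre.match(line)
            if hmatch:
                subs = iter(sre.match(next(it)).groups())
                # combine month and rise/set, e.g. 'jan' + 'rise'
                headers = [h + next(subs) for h in hmatch.groups() for _ in range(2)]
                count = 0
        else:
            # fill in data; pad to the full row width (3 + 6 * 10) so that
            # blank trailing columns still match if trailing spaces were stripped
            dmatch = dre.match(line.ljust(63))
            row = dmatch.group(1)
            for col, d in zip(headers, dmatch.groups()[1:]):
                if d.strip():
                    result.setdefault(col, {})[row] = int(d)
            count += 1
            if count == 31:
                # a block of 31 days is done; look for the next header
                headers = None
    return result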

data = """
times of sunrise and sunset
(for ideal horizon & meteorological conditions)
year 2012
make corrections daylight saving time necessary.
------------------------------------------------------------------------------
   jan       feb       mar       apr       may       jun
   rise set  rise set  rise set  rise set  rise set  rise set
1 0513 1925 0541 1918 0606 1851 0628 1812 0648 1738 0708 1720
2 0514 1925 0541 1918 0606 1850 0628 1811 0649 1737 0709 1719
3 0515 1925 0542 1917 0607 1849 0629 1810 0649 1736 0709 1719
4 0515 1926 0543 1916 0608 1847 0630 1808 0650 1736 0710 1719
5 0516 1926 0544 1915 0609 1846 0630 1807 0651 1735 0710 1719
6 0517 1926 0545 1915 0609 1845 0631 1806 0651 1734 0711 1719
7 0518 1926 0546 1914 0610 1844 0632 1805 0652 1733 0711 1719
8 0519 1926 0547 1913 0611 1843 0632 1803 0653 1732 0712 1719
9 0519 1926 0548 1912 0612 1841 0633 1802 0653 1731 0712 1718
10 0520 1926 0549 1911 0612 1840 0634 1801 0654 1731 0712 1718
11 0521 1926 0550 1911 0613 1839 0634 1800 0655 1730 0713 1718
12 0522 1926 0551 1910 0614 1838 0635 1759 0655 1729 0713 1718
13 0523 1926 0551 1909 0615 1836 0636 1757 0656 1729 0714 1719
14 0524 1926 0552 1908 0615 1835 0636 1756 0657 1728 0714 1719
15 0525 1925 0553 1907 0616 1834 0637 1755 0657 1727 0714 1719
16 0526 1925 0554 1906 0617 1832 0638 1754 0658 1727 0715 1719
17 0527 1925 0555 1905 0617 1831 0638 1753 0659 1726 0715 1719
18 0527 1925 0556 1904 0618 1830 0639 1752 0659 1725 0715 1719
19 0528 1924 0557 1903 0619 1829 0640 1751 0700 1725 0716 1719
20 0529 1924 0558 1902 0619 1827 0640 1749 0701 1724 0716 1719
21 0530 1924 0558 1901 0620 1826 0641 1748 0701 1724 0716 1720
22 0531 1923 0559 1900 0621 1825 0642 1747 0702 1723 0716 1720
23 0532 1923 0600 1859 0621 1824 0642 1746 0703 1723 0716 1720
24 0533 1923 0601 1858 0622 1822 0643 1745 0703 1722 0717 1720
25 0534 1922 0602 1857 0623 1821 0644 1744 0704 1722 0717 1721
26 0535 1922 0602 1855 0624 1820 0644 1743 0705 1722 0717 1721
27 0536 1921 0603 1854 0624 1818 0645 1742 0705 1721 0717 1721
28 0537 1921 0604 1853 0625 1817 0646 1741 0706 1721 0717 1722
29 0538 1920 0605 1852 0626 1816 0646 1740 0706 1720 0717 1722
30 0539 1920           0626 1815 0647 1739 0707 1720 0717 1722
31 0540 1919           0627 1813           0707 1720

   jul       aug       sep       oct       nov       dec
   rise set  rise set  rise set  rise set  rise set  rise set
1 0717 1723 0705 1740 0632 1759 0553 1818 0518 1841 0503 1907
2 0717 1723 0704 1741 0631 1800 0552 1819 0517 1842 0503 1908
3 0717 1724 0703 1741 0630 1801 0551 1819 0517 1843 0503 1909
4 0717 1724 0702 1742 0629 1801 0550 1820 0516 1843 0503 1910
5 0717 1724 0701 1743 0627 1802 0548 1821 0515 1844 0503 1911
6 0717 1725 0700 1743 0626 1802 0547 1821 0514 1845 0503 1911
7 0716 1725 0700 1744 0625 1803 0546 1822 0513 1846 0503 1912
8 0716 1726 0659 1745 0624 1804 0545 1823 0513 1847 0503 1913
9 0716 1726 0658 1745 0622 1804 0543 1823 0512 1848 0503 1914
10 0716 1727 0657 1746 0621 1805 0542 1824 0511 1849 0503 1914
11 0716 1727 0656 1746 0620 1805 0541 1825 0511 1850 0503 1915
12 0715 1728 0655 1747 0618 1806 0540 1825 0510 1850 0504 1916
13 0715 1729 0654 1748 0617 1807 0538 1826 0509 1851 0504 1916
14 0715 1729 0653 1748 0616 1807 0537 1827 0509 1852 0504 1917
15 0714 1730 0652 1749 0614 1808 0536 1827 0508 1853 0505 1918
16 0714 1730 0651 1750 0613 1809 0535 1828 0508 1854 0505 1918
17 0713 1731 0650 1750 0612 1809 0534 1829 0507 1855 0505 1919
18 0713 1731 0649 1751 0610 1810 0533 1830 0507 1856 0506 1920
19 0713 1732 0648 1751 0609 1810 0531 1830 0506 1857 0506 1920
20 0712 1733 0647 1752 0608 1811 0530 1831 0506 1858 0507 1921
21 0712 1733 0645 1753 0607 1812 0529 1832 0505 1859 0507 1921
22 0711 1734 0644 1753 0605 1812 0528 1833 0505 1859 0508 1922
23 0711 1734 0643 1754 0604 1813 0527 1834 0505 1900 0508 1922
24 0710 1735 0642 1755 0603 1813 0526 1834 0504 1901 0509 1923
25 0709 1736 0641 1755 0601 1814 0525 1835 0504 1902 0509 1923
26 0709 1736 0640 1756 0600 1815 0524 1836 0504 1903 0510 1923
27 0708 1737 0638 1756 0559 1815 0523 1837 0503 1904 0510 1924
28 0707 1738 0637 1757 0557 1816 0522 1838 0503 1905 0511 1924
29 0707 1738 0636 1758 0556 1817 0521 1838 0503 1906 0512 1924
30 0706 1739 0635 1758 0555 1817 0520 1839 0503 1906 0512 1925
31 0705 1739 0634 1759           0519 1840           0513 1925
""".split('\n')

>>> d = process(data)
>>> d['decrise']['8']
503
>>> d
{'augset': {'24': 1755, '25': 1755, '26': 1756, '27': 1756, '20': 1752...
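
The regular expressions above depend on the exact spacing of each row. A different technique, slicing fixed-width columns by offset, may be easier to adapt when the layout is known. The sketch below is not the answer's method: the name parse_fixed_width, the 10-character column pitch, the two-character day field, and the three-month sample are all assumptions for illustration.

```python
def parse_fixed_width(rows, headers, start=3, pitch=10):
    """Parse rows laid out as 'DD rrrr ssss  rrrr ssss ...' by column offset.

    Assumes the day occupies the first two characters and each rise/set
    pair starts `pitch` characters after the previous one.
    """
    result = {}
    for line in rows:
        day = line[:2].strip()
        if not day.isdigit():
            continue  # not a data row
        for i in range(0, len(headers), 2):
            offset = start + pitch * (i // 2)
            # Slices past the end of a short line are just empty strings.
            fields = (line[offset:offset + 4], line[offset + 5:offset + 9])
            for col, text in zip(headers[i:i + 2], fields):
                if text.strip():  # skip blank cells
                    result.setdefault(col, {})[day] = int(text)
    return result


headers = ["janrise", "janset", "febrise", "febset", "marrise", "marset"]
rows = [
    "29 0538 1920 0605 1852 0626 1816",
    # February has no day 30 or 31: 11 spaces cover the blank pair.
    "30 0539 1920" + " " * 11 + "0626 1815",
    "31 0540 1919" + " " * 11 + "0627 1813",
]
result = parse_fixed_width(rows, headers)
print(result["febrise"])        # {'29': 605}
print(result["marset"]["31"])   # 1813
```

Slicing never fails on a short line, so rows whose trailing columns are blank need no padding.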