mirror of https://github.com/kellyjonbrazil/jc.git synced 2025-07-15 01:24:29 +02:00

refine streaming parsers

This commit is contained in:
Kelly Brazil
2022-02-07 06:29:17 -08:00
parent d1e0ee6123
commit 2f3f78e8d3
16 changed files with 194 additions and 185 deletions

@ -85,6 +85,9 @@ pydoc-markdown -m jc.lib "${toc_config}" > ../docs/lib.md
echo Building docs for: utils echo Building docs for: utils
pydoc-markdown -m jc.utils "${toc_config}" > ../docs/utils.md pydoc-markdown -m jc.utils "${toc_config}" > ../docs/utils.md
echo Building docs for: streaming
pydoc-markdown -m jc.streaming "${toc_config}" > ../docs/streaming.md
echo Building docs for: universal parser echo Building docs for: universal parser
pydoc-markdown -m jc.parsers.universal "${toc_config}" > ../docs/parsers/universal.md pydoc-markdown -m jc.parsers.universal "${toc_config}" > ../docs/parsers/universal.md

@ -122,4 +122,4 @@ Returns:
### Parser Information ### Parser Information
Compatibility: linux, darwin, freebsd Compatibility: linux, darwin, freebsd
Version 1.1 by Kelly Brazil (kellyjonbrazil@gmail.com) Version 1.0 by Kelly Brazil (kellyjonbrazil@gmail.com)

@ -14,6 +14,8 @@ and file-types to dictionaries and lists of dictionaries.
>>> help('jc') >>> help('jc')
>>> help('jc.lib') >>> help('jc.lib')
>>> help('jc.utils') >>> help('jc.utils')
>>> help('jc.streaming')
>>> help('jc.parsers.universal')
>>> jc.get_help('parser_module_name') >>> jc.get_help('parser_module_name')
## Online Documentation ## Online Documentation

docs/streaming.md Normal file

@ -0,0 +1,114 @@
# Table of Contents
* [jc.streaming](#jc.streaming)
* [streaming\_input\_type\_check](#jc.streaming.streaming_input_type_check)
* [streaming\_line\_input\_type\_check](#jc.streaming.streaming_line_input_type_check)
* [stream\_success](#jc.streaming.stream_success)
* [stream\_error](#jc.streaming.stream_error)
* [add\_jc\_meta](#jc.streaming.add_jc_meta)
* [raise\_or\_yield](#jc.streaming.raise_or_yield)
<a id="jc.streaming"></a>
# jc.streaming
jc - JSON CLI output utility streaming utils
<a id="jc.streaming.streaming_input_type_check"></a>
### streaming\_input\_type\_check
```python
def streaming_input_type_check(data: Iterable) -> None
```
Ensure input data is an iterable, but not a string or bytes. Raises
`TypeError` if not.
<a id="jc.streaming.streaming_line_input_type_check"></a>
### streaming\_line\_input\_type\_check
```python
def streaming_line_input_type_check(line: str) -> None
```
Ensure each line is a string. Raises `TypeError` if not.
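The two input checks can be exercised on their own. The sketch below copies the function bodies added to `jc/streaming.py` in this commit; `check_all` is a hypothetical helper that exists only for this illustration and is not part of `jc`.

```python
from typing import Iterable


def streaming_input_type_check(data: Iterable) -> None:
    """Reject anything that is not a non-string iterable."""
    if not hasattr(data, '__iter__') or isinstance(data, (str, bytes)):
        raise TypeError("Input data must be a non-string iterable object.")


def streaming_line_input_type_check(line: str) -> None:
    """Reject any line that is not a str."""
    if not isinstance(line, str):
        raise TypeError("Input line must be a 'str' object.")


def check_all(data):
    """Hypothetical helper: run both checks and report the outcome."""
    try:
        # The whole input is checked once, before iteration starts.
        streaming_input_type_check(data)
        # Each line is then checked individually.
        for line in data:
            streaming_line_input_type_check(line)
    except TypeError as e:
        return f'rejected: {e}'
    return 'accepted'
```

A list of strings passes both checks. A bare string or an integer fails the first check, and a list containing `bytes` fails the per-line check.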
<a id="jc.streaming.stream_success"></a>
### stream\_success
```python
def stream_success(output_line: Dict, ignore_exceptions: bool) -> Dict
```
Add `_jc_meta` object to output line if `ignore_exceptions=True`
<a id="jc.streaming.stream_error"></a>
### stream\_error
```python
def stream_error(e: BaseException, line: str) -> Dict
```
Return an error `_jc_meta` field.
<a id="jc.streaming.add_jc_meta"></a>
### add\_jc\_meta
```python
def add_jc_meta(func)
```
Decorator for streaming parsers to add stream_success and stream_error
objects. This simplifies the yield lines in the streaming parsers.

With the decorator on parse():

    # successfully parsed line:
    yield output_line if raw else _process(output_line)

    # unsuccessfully parsed line:
    except Exception as e:
        yield raise_or_yield(ignore_exceptions, e, line)

Without the decorator on parse():

    # successfully parsed line:
    if raw:
        yield stream_success(output_line, ignore_exceptions)
    else:
        yield stream_success(_process(output_line), ignore_exceptions)

    # unsuccessfully parsed line:
    except Exception as e:
        yield stream_error(*raise_or_yield(ignore_exceptions, e, line))

In all cases above:

    output_line:        (Dict) successfully parsed line yielded as a dict

    e:                  (BaseException) exception object as the first value
                        of the tuple if the line was not successfully parsed.

    line:               (str) string of the original line that did not
                        successfully parse.

    ignore_exceptions:  (bool) continue processing lines and ignore
                        exceptions if True.
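
The decorator's body is not shown in this diff, so the following is only a minimal sketch of how such a wrapper could work, assuming that a yielded `dict` means success and a yielded `(exception, line)` tuple means failure. The `_jc_meta` field names (`success`, `error`, `line`) and the toy `parse()` function are assumptions made for illustration.

```python
from functools import wraps


def stream_success(output_line, ignore_exceptions):
    # Assumed metadata shape; the diff does not show the field names.
    if ignore_exceptions:
        output_line['_jc_meta'] = {'success': True}
    return output_line


def stream_error(e, line):
    # Assumed shape of the error record.
    return {'_jc_meta': {'success': False,
                         'error': f'{type(e).__name__}: {e}',
                         'line': line.strip()}}


def add_jc_meta(func):
    """Wrap a generator parser: dicts are successes, tuples are errors."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        ignore_exceptions = kwargs.get('ignore_exceptions', False)
        for value in func(*args, **kwargs):
            if isinstance(value, dict):
                yield stream_success(value, ignore_exceptions)
            else:
                e, line = value
                yield stream_error(e, line)
    return wrapper


@add_jc_meta
def parse(data, ignore_exceptions=False):
    """Toy parser (hypothetical): each line must look like 'key=value'."""
    for line in data:
        try:
            key, value = line.strip().split('=')
            yield {key: value}
        except Exception as e:
            if not ignore_exceptions:
                raise
            yield e, line


results = list(parse(['a=1\n', 'bad line\n'], ignore_exceptions=True))
```

With this arrangement the parser body only yields plain dicts or `(exception, line)` tuples, and the metadata is attached in one place.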
<a id="jc.streaming.raise_or_yield"></a>
### raise\_or\_yield
```python
def raise_or_yield(ignore_exceptions: bool, e: BaseException, line: str) -> tuple
```
Return the exception object and line string if ignore_exceptions is
True. Otherwise, re-raise the exception from the exception object with
an annotation.
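
The behavior described above can be sketched as follows. The message string is taken from this commit; the rest of the body is inferred from the `except` blocks that the commit removes from the individual parsers, so treat it as an assumption rather than the exact implementation.

```python
def raise_or_yield(ignore_exceptions, e, line):
    ignore_exceptions_msg = (
        '... Use the ignore_exceptions option (-qq) '
        'to ignore streaming parser errors.'
    )

    if not ignore_exceptions:
        # Annotate the original exception and re-raise it.
        e.args = (str(e) + ignore_exceptions_msg,)
        raise e

    # Hand the exception and the offending line back to the caller.
    return e, line


# ignore_exceptions=True: the exception comes back as a value.
err = ValueError('bad input')
returned = raise_or_yield(True, err, 'some line\n')

# ignore_exceptions=False: the exception is re-raised with the hint appended.
try:
    raise_or_yield(False, ValueError('bad input'), 'some line\n')
except ValueError as raised:
    annotated = str(raised)
```

Returning a tuple rather than raising lets a parser write a single `yield raise_or_yield(...)` line in each `except` block, which is the pattern the parser changes in this commit adopt.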

@ -8,12 +8,7 @@
* [convert\_to\_int](#jc.utils.convert_to_int) * [convert\_to\_int](#jc.utils.convert_to_int)
* [convert\_to\_float](#jc.utils.convert_to_float) * [convert\_to\_float](#jc.utils.convert_to_float)
* [convert\_to\_bool](#jc.utils.convert_to_bool) * [convert\_to\_bool](#jc.utils.convert_to_bool)
* [stream\_success](#jc.utils.stream_success)
* [stream\_error](#jc.utils.stream_error)
* [add\_jc\_meta](#jc.utils.add_jc_meta)
* [input\_type\_check](#jc.utils.input_type_check) * [input\_type\_check](#jc.utils.input_type_check)
* [streaming\_input\_type\_check](#jc.utils.streaming_input_type_check)
* [streaming\_line\_input\_type\_check](#jc.utils.streaming_line_input_type_check)
* [timestamp](#jc.utils.timestamp) * [timestamp](#jc.utils.timestamp)
* [\_\_init\_\_](#jc.utils.timestamp.__init__) * [\_\_init\_\_](#jc.utils.timestamp.__init__)
@ -166,73 +161,6 @@ Returns:
True/False False unless a 'truthy' number or string is found True/False False unless a 'truthy' number or string is found
('y', 'yes', 'true', '1', 1, -1, etc.) ('y', 'yes', 'true', '1', 1, -1, etc.)
<a id="jc.utils.stream_success"></a>
### stream\_success
```python
def stream_success(output_line: Dict, ignore_exceptions: bool) -> Dict
```
Add `_jc_meta` object to output line if `ignore_exceptions=True`
<a id="jc.utils.stream_error"></a>
### stream\_error
```python
def stream_error(e: BaseException, line: str) -> Dict
```
Return an error `_jc_meta` field.
<a id="jc.utils.add_jc_meta"></a>
### add\_jc\_meta
```python
def add_jc_meta(func)
```
Decorator for streaming parsers to add stream_success and stream_error
objects. This simplifies the yield lines in the streaming parsers.
With the decorator on parse():
# successfully parsed line:
yield output_line if raw else _process(output_line)
# unsuccessfully parsed line:
except Exception as e:
if not ignore_exceptions:
e.args = (str(e) + ignore_exceptions_msg,)
raise e
yield e, line
Without the decorator on parse():
# successfully parsed line:
yield stream_success(output_line, ignore_exceptions) if raw else stream_success(_process(output_line), ignore_exceptions)
# unsuccessfully parsed line:
except Exception as e:
if not ignore_exceptions:
e.args = (str(e) + ignore_exceptions_msg,)
raise e
yield stream_error(e, line)
In all cases above:
output_line: (Dict): successfully parsed line yielded as a dict
e: (BaseException): exception object as the first value
of the tuple if the line was not successfully parsed.
line: (str): string of the original line that did not
successfully parse.
<a id="jc.utils.input_type_check"></a> <a id="jc.utils.input_type_check"></a>
### input\_type\_check ### input\_type\_check
@ -243,27 +171,6 @@ def input_type_check(data: str) -> None
Ensure input data is a string. Raises `TypeError` if not. Ensure input data is a string. Raises `TypeError` if not.
<a id="jc.utils.streaming_input_type_check"></a>
### streaming\_input\_type\_check
```python
def streaming_input_type_check(data: Iterable) -> None
```
Ensure input data is an iterable, but not a string or bytes. Raises
`TypeError` if not.
<a id="jc.utils.streaming_line_input_type_check"></a>
### streaming\_line\_input\_type\_check
```python
def streaming_line_input_type_check(line: str) -> None
```
Ensure each line is a string. Raises `TypeError` if not.
<a id="jc.utils.timestamp"></a> <a id="jc.utils.timestamp"></a>
### timestamp Objects ### timestamp Objects

@ -10,6 +10,8 @@ and file-types to dictionaries and lists of dictionaries.
>>> help('jc') >>> help('jc')
>>> help('jc.lib') >>> help('jc.lib')
>>> help('jc.utils') >>> help('jc.utils')
>>> help('jc.streaming')
>>> help('jc.parsers.universal')
>>> jc.get_help('parser_module_name') >>> jc.get_help('parser_module_name')
## Online Documentation ## Online Documentation

@ -66,7 +66,7 @@ Examples:
import itertools import itertools
import csv import csv
import jc.utils import jc.utils
from jc.utils import ignore_exceptions_msg, add_jc_meta from jc.streaming import streaming_input_type_check, add_jc_meta, raise_or_yield
from jc.exceptions import ParseError from jc.exceptions import ParseError
@ -124,7 +124,7 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
Iterator object Iterator object
""" """
jc.utils.compatibility(__name__, info.compatible, quiet) jc.utils.compatibility(__name__, info.compatible, quiet)
jc.utils.streaming_input_type_check(data) streaming_input_type_check(data)
# convert data to an iterable in case a sequence like a list is used as input. # convert data to an iterable in case a sequence like a list is used as input.
# this allows the exhaustion of the input so we don't double-process later. # this allows the exhaustion of the input so we don't double-process later.
@ -158,8 +158,4 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
try: try:
yield row if raw else _process(row) yield row if raw else _process(row)
except Exception as e: except Exception as e:
if not ignore_exceptions: yield raise_or_yield(ignore_exceptions, e, str(row))
e.args = (str(e) + ignore_exceptions_msg,)
raise e
yield e, str(row)

@ -51,7 +51,9 @@ Examples:
""" """
from typing import Dict, Iterable, Union from typing import Dict, Iterable, Union
import jc.utils import jc.utils
from jc.utils import ignore_exceptions_msg, add_jc_meta from jc.streaming import (
add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
)
from jc.exceptions import ParseError from jc.exceptions import ParseError
@ -119,12 +121,12 @@ def parse(
Iterator object Iterator object
""" """
jc.utils.compatibility(__name__, info.compatible, quiet) jc.utils.compatibility(__name__, info.compatible, quiet)
jc.utils.streaming_input_type_check(data) streaming_input_type_check(data)
for line in data: for line in data:
try: try:
streaming_line_input_type_check(line)
output_line: Dict = {} output_line: Dict = {}
jc.utils.streaming_line_input_type_check(line)
# parse the content here # parse the content here
# check out helper functions in jc.utils # check out helper functions in jc.utils
@ -136,8 +138,4 @@ def parse(
raise ParseError('Not foo data') raise ParseError('Not foo data')
except Exception as e: except Exception as e:
if not ignore_exceptions: yield raise_or_yield(ignore_exceptions, e, line)
e.args = (str(e) + ignore_exceptions_msg,)
raise e
yield e, line

@ -101,7 +101,9 @@ Examples:
... ...
""" """
import jc.utils import jc.utils
from jc.utils import ignore_exceptions_msg, add_jc_meta from jc.streaming import (
add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
)
from jc.exceptions import ParseError from jc.exceptions import ParseError
import jc.parsers.universal import jc.parsers.universal
@ -183,7 +185,7 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
Iterator object Iterator object
""" """
jc.utils.compatibility(__name__, info.compatible, quiet) jc.utils.compatibility(__name__, info.compatible, quiet)
jc.utils.streaming_input_type_check(data) streaming_input_type_check(data)
section = '' # either 'cpu' or 'device' section = '' # either 'cpu' or 'device'
headers = '' headers = ''
@ -192,10 +194,9 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
for line in data: for line in data:
try: try:
jc.utils.streaming_line_input_type_check(line) streaming_line_input_type_check(line)
output_line = {} output_line = {}
# ignore blank lines and header line # ignore blank lines and header line
if line == '\n' or line == '' or line.startswith('Linux'): if line == '\n' or line == '' or line.startswith('Linux'):
continue continue
@ -231,8 +232,4 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
raise ParseError('Not iostat data') raise ParseError('Not iostat data')
except Exception as e: except Exception as e:
if not ignore_exceptions: yield raise_or_yield(ignore_exceptions, e, line)
e.args = (str(e) + ignore_exceptions_msg,)
raise e
yield e, line

@ -79,7 +79,9 @@ Examples:
""" """
import re import re
import jc.utils import jc.utils
from jc.utils import ignore_exceptions_msg, add_jc_meta from jc.streaming import (
add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
)
from jc.exceptions import ParseError from jc.exceptions import ParseError
@ -146,13 +148,13 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
Iterator object Iterator object
""" """
jc.utils.compatibility(__name__, info.compatible, quiet) jc.utils.compatibility(__name__, info.compatible, quiet)
jc.utils.streaming_input_type_check(data) streaming_input_type_check(data)
parent = '' parent = ''
for line in data: for line in data:
try: try:
jc.utils.streaming_line_input_type_check(line) streaming_line_input_type_check(line)
# skip line if it starts with 'total 1234' # skip line if it starts with 'total 1234'
if re.match(r'total [0-9]+', line): if re.match(r'total [0-9]+', line):
@ -200,8 +202,4 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
yield output_line if raw else _process(output_line) yield output_line if raw else _process(output_line)
except Exception as e: except Exception as e:
if not ignore_exceptions: yield raise_or_yield(ignore_exceptions, e, line)
e.args = (str(e) + ignore_exceptions_msg,)
raise e
yield e, line

@ -86,10 +86,10 @@ Examples:
import string import string
import ipaddress import ipaddress
import jc.utils import jc.utils
from jc.exceptions import ParseError
from jc.streaming import ( from jc.streaming import (
add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
) )
from jc.exceptions import ParseError
class info(): class info():
@ -500,8 +500,8 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
for line in data: for line in data:
try: try:
output_line = {}
streaming_line_input_type_check(line) streaming_line_input_type_check(line)
output_line = {}
# skip blank lines # skip blank lines
if line.strip() == '': if line.strip() == '':

@ -89,13 +89,13 @@ Examples:
import re import re
from typing import Dict, Iterable, Union from typing import Dict, Iterable, Union
import jc.utils import jc.utils
from jc.utils import ignore_exceptions_msg, add_jc_meta from jc.streaming import (
from jc.exceptions import ParseError add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
)
class info(): class info():
"""Provides parser metadata (version, author, etc.)""" """Provides parser metadata (version, author, etc.)"""
version = '1.1' version = '1.0'
description = '`rsync` command streaming parser' description = '`rsync` command streaming parser'
author = 'Kelly Brazil' author = 'Kelly Brazil'
author_email = 'kellyjonbrazil@gmail.com' author_email = 'kellyjonbrazil@gmail.com'
@ -168,7 +168,7 @@ def parse(
Iterator object Iterator object
""" """
jc.utils.compatibility(__name__, info.compatible, quiet) jc.utils.compatibility(__name__, info.compatible, quiet)
jc.utils.streaming_input_type_check(data) streaming_input_type_check(data)
summary: Dict = {} summary: Dict = {}
process: str = '' process: str = ''
@ -272,7 +272,7 @@ def parse(
for line in data: for line in data:
try: try:
jc.utils.streaming_line_input_type_check(line) streaming_line_input_type_check(line)
output_line: Dict = {} output_line: Dict = {}
# ignore blank lines # ignore blank lines
@ -452,19 +452,12 @@ def parse(
continue continue
except Exception as e: except Exception as e:
if not ignore_exceptions: yield raise_or_yield(ignore_exceptions, e, line)
e.args = (str(e) + ignore_exceptions_msg,)
raise e
yield e, line
# gather final item
try: try:
if summary: if summary:
yield summary if raw else _process(summary) yield summary if raw else _process(summary)
except Exception as e: except Exception as e:
if not ignore_exceptions: yield raise_or_yield(ignore_exceptions, e, '')
e.args = (str(e) + ignore_exceptions_msg,)
raise e
yield e, line

@ -83,7 +83,9 @@ Examples:
""" """
import shlex import shlex
import jc.utils import jc.utils
from jc.utils import ignore_exceptions_msg, add_jc_meta from jc.streaming import (
add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
)
from jc.exceptions import ParseError from jc.exceptions import ParseError
@ -154,14 +156,14 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
Iterator object Iterator object
""" """
jc.utils.compatibility(__name__, info.compatible, quiet) jc.utils.compatibility(__name__, info.compatible, quiet)
jc.utils.streaming_input_type_check(data) streaming_input_type_check(data)
output_line = {} output_line = {}
os_type = '' os_type = ''
for line in data: for line in data:
try: try:
jc.utils.streaming_line_input_type_check(line) streaming_line_input_type_check(line)
line = line.rstrip() line = line.rstrip()
# ignore blank lines # ignore blank lines
@ -287,20 +289,12 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
output_line = {} output_line = {}
except Exception as e: except Exception as e:
if not ignore_exceptions: yield raise_or_yield(ignore_exceptions, e, line)
e.args = (str(e) + ignore_exceptions_msg,)
raise e
yield e, line
try:
# gather final item # gather final item
try:
if output_line: if output_line:
yield output_line if raw else _process(output_line) yield output_line if raw else _process(output_line)
except Exception as e: except Exception as e:
if not ignore_exceptions: yield raise_or_yield(ignore_exceptions, e, '')
e.args = (str(e) + ignore_exceptions_msg,)
raise e
yield e, line

@ -101,7 +101,9 @@ Examples:
... ...
""" """
import jc.utils import jc.utils
from jc.utils import ignore_exceptions_msg, add_jc_meta from jc.streaming import (
add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
)
from jc.exceptions import ParseError from jc.exceptions import ParseError
@ -173,7 +175,7 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
Iterator object Iterator object
""" """
jc.utils.compatibility(__name__, info.compatible, quiet) jc.utils.compatibility(__name__, info.compatible, quiet)
jc.utils.streaming_input_type_check(data) streaming_input_type_check(data)
procs = None procs = None
buff_cache = None buff_cache = None
@ -182,9 +184,9 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
tz = None tz = None
for line in data: for line in data:
output_line = {}
try: try:
jc.utils.streaming_line_input_type_check(line) streaming_line_input_type_check(line)
output_line = {}
# skip blank lines # skip blank lines
if line.strip() == '': if line.strip() == '':
@ -272,8 +274,4 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
raise ParseError('Not vmstat data') raise ParseError('Not vmstat data')
except Exception as e: except Exception as e:
if not ignore_exceptions: yield raise_or_yield(ignore_exceptions, e, line)
e.args = (str(e) + ignore_exceptions_msg,)
raise e
yield e, line

@ -4,6 +4,21 @@ from functools import wraps
from typing import Dict, Iterable from typing import Dict, Iterable
def streaming_input_type_check(data: Iterable) -> None:
"""
Ensure input data is an iterable, but not a string or bytes. Raises
`TypeError` if not.
"""
if not hasattr(data, '__iter__') or isinstance(data, (str, bytes)):
raise TypeError("Input data must be a non-string iterable object.")
def streaming_line_input_type_check(line: str) -> None:
"""Ensure each line is a string. Raises `TypeError` if not."""
if not isinstance(line, str):
raise TypeError("Input line must be a 'str' object.")
def stream_success(output_line: Dict, ignore_exceptions: bool) -> Dict: def stream_success(output_line: Dict, ignore_exceptions: bool) -> Dict:
"""Add `_jc_meta` object to output line if `ignore_exceptions=True`""" """Add `_jc_meta` object to output line if `ignore_exceptions=True`"""
if ignore_exceptions: if ignore_exceptions:
@ -43,8 +58,10 @@ def add_jc_meta(func):
Without the decorator on parse(): Without the decorator on parse():
# successfully parsed line: # successfully parsed line:
yield stream_success(output_line, ignore_exceptions) if raw \\ if raw:
else stream_success(_process(output_line), ignore_exceptions) yield stream_success(output_line, ignore_exceptions)
else:
stream_success(_process(output_line), ignore_exceptions)
# unsuccessfully parsed line: # unsuccessfully parsed line:
except Exception as e: except Exception as e:
@ -82,26 +99,16 @@ def add_jc_meta(func):
return wrapper return wrapper
def streaming_input_type_check(data: Iterable) -> None:
"""
Ensure input data is an iterable, but not a string or bytes. Raises
`TypeError` if not.
"""
if not hasattr(data, '__iter__') or isinstance(data, (str, bytes)):
raise TypeError("Input data must be a non-string iterable object.")
def streaming_line_input_type_check(line: str) -> None:
"""Ensure each line is a string. Raises `TypeError` if not."""
if not isinstance(line, str):
raise TypeError("Input line must be a 'str' object.")
def raise_or_yield( def raise_or_yield(
ignore_exceptions: bool, ignore_exceptions: bool,
e: BaseException, e: BaseException,
line: str line: str
) -> tuple: ) -> tuple:
"""
Return the exception object and line string if ignore_exceptions is
True. Otherwise, re-raise the exception from the exception object with
an annotation.
"""
ignore_exceptions_msg = '... Use the ignore_exceptions option (-qq) to ignore streaming parser errors.' ignore_exceptions_msg = '... Use the ignore_exceptions option (-qq) to ignore streaming parser errors.'
if not ignore_exceptions: if not ignore_exceptions:

@ -1,4 +1,4 @@
.TH jc 1 2022-02-04 1.18.3 "JSON CLI output utility" .TH jc 1 2022-02-07 1.18.3 "JSON CLI output utility"
.SH NAME .SH NAME
jc \- JSONifies the output of many CLI tools and file-types jc \- JSONifies the output of many CLI tools and file-types
.SH SYNOPSIS .SH SYNOPSIS