mirror of https://github.com/kellyjonbrazil/jc.git synced 2025-07-13 01:20:24 +02:00

refine streaming parsers

Kelly Brazil
2022-02-07 06:29:17 -08:00
parent d1e0ee6123
commit 2f3f78e8d3
16 changed files with 194 additions and 185 deletions


@@ -85,6 +85,9 @@ pydoc-markdown -m jc.lib "${toc_config}" > ../docs/lib.md
 echo Building docs for: utils
 pydoc-markdown -m jc.utils "${toc_config}" > ../docs/utils.md
+echo Building docs for: streaming
+pydoc-markdown -m jc.streaming "${toc_config}" > ../docs/streaming.md
 echo Building docs for: universal parser
 pydoc-markdown -m jc.parsers.universal "${toc_config}" > ../docs/parsers/universal.md


@@ -122,4 +122,4 @@ Returns:
 ### Parser Information
 Compatibility: linux, darwin, freebsd
-Version 1.1 by Kelly Brazil (kellyjonbrazil@gmail.com)
+Version 1.0 by Kelly Brazil (kellyjonbrazil@gmail.com)


@@ -14,6 +14,8 @@ and file-types to dictionaries and lists of dictionaries.
 >>> help('jc')
 >>> help('jc.lib')
 >>> help('jc.utils')
+>>> help('jc.streaming')
+>>> help('jc.parsers.universal')
 >>> jc.get_help('parser_module_name')
 ## Online Documentation

docs/streaming.md (new file)

@@ -0,0 +1,114 @@
+# Table of Contents
+* [jc.streaming](#jc.streaming)
+  * [streaming\_input\_type\_check](#jc.streaming.streaming_input_type_check)
+  * [streaming\_line\_input\_type\_check](#jc.streaming.streaming_line_input_type_check)
+  * [stream\_success](#jc.streaming.stream_success)
+  * [stream\_error](#jc.streaming.stream_error)
+  * [add\_jc\_meta](#jc.streaming.add_jc_meta)
+  * [raise\_or\_yield](#jc.streaming.raise_or_yield)
+<a id="jc.streaming"></a>
+# jc.streaming
+jc - JSON CLI output utility streaming utils
+<a id="jc.streaming.streaming_input_type_check"></a>
+### streaming\_input\_type\_check
+```python
+def streaming_input_type_check(data: Iterable) -> None
+```
+Ensure input data is an iterable, but not a string or bytes. Raises
+`TypeError` if not.
+<a id="jc.streaming.streaming_line_input_type_check"></a>
+### streaming\_line\_input\_type\_check
+```python
+def streaming_line_input_type_check(line: str) -> None
+```
+Ensure each line is a string. Raises `TypeError` if not.
+<a id="jc.streaming.stream_success"></a>
+### stream\_success
+```python
+def stream_success(output_line: Dict, ignore_exceptions: bool) -> Dict
+```
+Add `_jc_meta` object to output line if `ignore_exceptions=True`
+<a id="jc.streaming.stream_error"></a>
+### stream\_error
+```python
+def stream_error(e: BaseException, line: str) -> Dict
+```
+Return an error `_jc_meta` field.
+<a id="jc.streaming.add_jc_meta"></a>
+### add\_jc\_meta
+```python
+def add_jc_meta(func)
+```
+Decorator for streaming parsers to add stream_success and stream_error
+objects. This simplifies the yield lines in the streaming parsers.
+With the decorator on parse():
+    # successfully parsed line:
+    yield output_line if raw else _process(output_line)
+    # unsuccessfully parsed line:
+    except Exception as e:
+        yield raise_or_yield(ignore_exceptions, e, line)
+Without the decorator on parse():
+    # successfully parsed line:
+    if raw:
+        yield stream_success(output_line, ignore_exceptions)
+    else:
+        stream_success(_process(output_line), ignore_exceptions)
+    # unsuccessfully parsed line:
+    except Exception as e:
+        yield stream_error(raise_or_yield(ignore_exceptions, e, line))
+In all cases above:
+    output_line: (Dict) successfully parsed line yielded as a dict
+    e: (BaseException) exception object as the first value
+        of the tuple if the line was not successfully parsed.
+    line: (str) string of the original line that did not
+        successfully parse.
+    ignore_exceptions: (bool) continue processing lines and ignore
+        exceptions if True.
+<a id="jc.streaming.raise_or_yield"></a>
+### raise\_or\_yield
+```python
+def raise_or_yield(ignore_exceptions: bool, e: BaseException, line: str) -> tuple
+```
+Return the exception object and line string if ignore_exceptions is
+True. Otherwise, re-raise the exception from the exception object with
+an annotation.
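Both type checks documented above guard against an easy mistake: a `str` is itself iterable, so passing one blob of text instead of a sequence of lines would silently iterate character by character. The standalone sketch below copies the two check bodies from the `jc/streaming.py` hunk in this commit; `count_lines` is a hypothetical consumer added only to exercise them.

```python
from typing import Iterable


def streaming_input_type_check(data: Iterable) -> None:
    """Ensure input data is an iterable, but not a string or bytes."""
    if not hasattr(data, '__iter__') or isinstance(data, (str, bytes)):
        raise TypeError("Input data must be a non-string iterable object.")


def streaming_line_input_type_check(line: str) -> None:
    """Ensure each line is a string."""
    if not isinstance(line, str):
        raise TypeError("Input line must be a 'str' object.")


def count_lines(data: Iterable) -> int:
    """Hypothetical consumer: validate the container once, then every line."""
    streaming_input_type_check(data)
    count = 0
    for line in data:
        streaming_line_input_type_check(line)
        count += 1
    return count


if __name__ == '__main__':
    text = 'first\nsecond\n'
    print(count_lines(text.splitlines()))  # 2
    for bad in (text, ['ok', b'bytes']):
        try:
            count_lines(bad)
        except TypeError as err:
            print(err)
```

The container check runs once up front, while the per-line check runs on every item, so a generator that produces a bad item halfway through is still caught at that item.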


@@ -8,12 +8,7 @@
   * [convert\_to\_int](#jc.utils.convert_to_int)
   * [convert\_to\_float](#jc.utils.convert_to_float)
   * [convert\_to\_bool](#jc.utils.convert_to_bool)
-  * [stream\_success](#jc.utils.stream_success)
-  * [stream\_error](#jc.utils.stream_error)
-  * [add\_jc\_meta](#jc.utils.add_jc_meta)
   * [input\_type\_check](#jc.utils.input_type_check)
-  * [streaming\_input\_type\_check](#jc.utils.streaming_input_type_check)
-  * [streaming\_line\_input\_type\_check](#jc.utils.streaming_line_input_type_check)
   * [timestamp](#jc.utils.timestamp)
     * [\_\_init\_\_](#jc.utils.timestamp.__init__)
@@ -166,73 +161,6 @@ Returns:
 True/False False unless a 'truthy' number or string is found
 ('y', 'yes', 'true', '1', 1, -1, etc.)
-<a id="jc.utils.stream_success"></a>
-### stream\_success
-```python
-def stream_success(output_line: Dict, ignore_exceptions: bool) -> Dict
-```
-Add `_jc_meta` object to output line if `ignore_exceptions=True`
-<a id="jc.utils.stream_error"></a>
-### stream\_error
-```python
-def stream_error(e: BaseException, line: str) -> Dict
-```
-Return an error `_jc_meta` field.
-<a id="jc.utils.add_jc_meta"></a>
-### add\_jc\_meta
-```python
-def add_jc_meta(func)
-```
-Decorator for streaming parsers to add stream_success and stream_error
-objects. This simplifies the yield lines in the streaming parsers.
-With the decorator on parse():
-    # successfully parsed line:
-    yield output_line if raw else _process(output_line)
-    # unsuccessfully parsed line:
-    except Exception as e:
-        if not ignore_exceptions:
-            e.args = (str(e) + ignore_exceptions_msg,)
-            raise e
-        yield e, line
-Without the decorator on parse():
-    # successfully parsed line:
-    yield stream_success(output_line, ignore_exceptions) if raw else stream_success(_process(output_line), ignore_exceptions)
-    # unsuccessfully parsed line:
-    except Exception as e:
-        if not ignore_exceptions:
-            e.args = (str(e) + ignore_exceptions_msg,)
-            raise e
-        yield stream_error(e, line)
-In all cases above:
-    output_line: (Dict): successfully parsed line yielded as a dict
-    e: (BaseException): exception object as the first value
-        of the tuple if the line was not successfully parsed.
-    line: (str): string of the original line that did not
-        successfully parse.
 <a id="jc.utils.input_type_check"></a>
 ### input\_type\_check
@@ -243,27 +171,6 @@ def input_type_check(data: str) -> None
 Ensure input data is a string. Raises `TypeError` if not.
-<a id="jc.utils.streaming_input_type_check"></a>
-### streaming\_input\_type\_check
-```python
-def streaming_input_type_check(data: Iterable) -> None
-```
-Ensure input data is an iterable, but not a string or bytes. Raises
-`TypeError` if not.
-<a id="jc.utils.streaming_line_input_type_check"></a>
-### streaming\_line\_input\_type\_check
-```python
-def streaming_line_input_type_check(line: str) -> None
-```
-Ensure each line is a string. Raises `TypeError` if not.
 <a id="jc.utils.timestamp"></a>
 ### timestamp Objects


@@ -10,6 +10,8 @@ and file-types to dictionaries and lists of dictionaries.
 >>> help('jc')
 >>> help('jc.lib')
 >>> help('jc.utils')
+>>> help('jc.streaming')
+>>> help('jc.parsers.universal')
 >>> jc.get_help('parser_module_name')
 ## Online Documentation


@@ -66,7 +66,7 @@ Examples:
 import itertools
 import csv
 import jc.utils
-from jc.utils import ignore_exceptions_msg, add_jc_meta
+from jc.streaming import streaming_input_type_check, add_jc_meta, raise_or_yield
 from jc.exceptions import ParseError
@@ -124,7 +124,7 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
         Iterator object
     """
     jc.utils.compatibility(__name__, info.compatible, quiet)
-    jc.utils.streaming_input_type_check(data)
+    streaming_input_type_check(data)
     # convert data to an iterable in case a sequence like a list is used as input.
     # this allows the exhaustion of the input so we don't double-process later.
@@ -158,8 +158,4 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
         try:
             yield row if raw else _process(row)
         except Exception as e:
-            if not ignore_exceptions:
-                e.args = (str(e) + ignore_exceptions_msg,)
-                raise e
-            yield e, str(row)
+            yield raise_or_yield(ignore_exceptions, e, str(row))
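The csv_s hunk above replaces a four-line `if not ignore_exceptions` block with a single `raise_or_yield()` call (csv_s passes `str(row)`, presumably because `row` is the parsed CSV record rather than a raw text line). The sketch below checks that the two forms behave the same. `raise_or_yield` here is reconstructed from its docstring and from the `ignore_exceptions_msg` line visible in the `jc/streaming.py` hunk of this commit, so treat it as an approximation; `old_style` and `new_style` are hypothetical toy parsers that read one integer per line.

```python
from typing import Iterable, Iterator

ignore_exceptions_msg = '... Use the ignore_exceptions option (-qq) to ignore streaming parser errors.'


def raise_or_yield(ignore_exceptions: bool, e: BaseException, line: str) -> tuple:
    """Re-raise with an annotation, or return (exception, line) for the caller to yield."""
    if not ignore_exceptions:
        e.args = (str(e) + ignore_exceptions_msg,)
        raise e
    return e, line


def old_style(data: Iterable[str], ignore_exceptions: bool = False) -> Iterator:
    """Toy parser using the inline error handling removed in this commit."""
    for line in data:
        try:
            yield int(line)
        except Exception as e:
            if not ignore_exceptions:
                e.args = (str(e) + ignore_exceptions_msg,)
                raise e
            yield e, line


def new_style(data: Iterable[str], ignore_exceptions: bool = False) -> Iterator:
    """Same toy parser, delegating to raise_or_yield()."""
    for line in data:
        try:
            yield int(line)
        except Exception as e:
            yield raise_or_yield(ignore_exceptions, e, line)


def summarize(results) -> list:
    """Replace (exception, line) tuples with (type name, line) so two runs can be compared."""
    return [(type(r[0]).__name__, r[1]) if isinstance(r, tuple) else r for r in results]


if __name__ == '__main__':
    lines = ['1', 'x', '3']
    print(summarize(old_style(lines, ignore_exceptions=True)))  # [1, ('ValueError', 'x'), 3]
    print(summarize(new_style(lines, ignore_exceptions=True)))  # [1, ('ValueError', 'x'), 3]
```

Centralizing the branch means the hint text appended to the exception lives in one place instead of being imported into every parser, which is why the `ignore_exceptions_msg` imports disappear from each parser in this commit.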


@@ -51,7 +51,9 @@ Examples:
 """
 from typing import Dict, Iterable, Union
 import jc.utils
-from jc.utils import ignore_exceptions_msg, add_jc_meta
+from jc.streaming import (
+    add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
+)
 from jc.exceptions import ParseError
@@ -119,12 +121,12 @@ def parse(
         Iterator object
     """
     jc.utils.compatibility(__name__, info.compatible, quiet)
-    jc.utils.streaming_input_type_check(data)
+    streaming_input_type_check(data)
     for line in data:
         try:
+            streaming_line_input_type_check(line)
             output_line: Dict = {}
-            jc.utils.streaming_line_input_type_check(line)
             # parse the content here
             # check out helper functions in jc.utils
@@ -136,8 +138,4 @@ def parse(
                 raise ParseError('Not foo data')
         except Exception as e:
-            if not ignore_exceptions:
-                e.args = (str(e) + ignore_exceptions_msg,)
-                raise e
-            yield e, line
+            yield raise_or_yield(ignore_exceptions, e, line)
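With this change the template's `parse()` yields either a plain dict or the `(exception, line)` tuple returned by `raise_or_yield()`, and leaves the `_jc_meta` bookkeeping to `@add_jc_meta`. The decorator's body is not part of this diff, so the sketch below is an assumption throughout: it dispatches on the yielded type (dict means success, tuple means error), reads `ignore_exceptions` from the keyword arguments, and uses `success`, `error` and `line` as the `_jc_meta` field names. `ParseError` is a local stand-in for `jc.exceptions.ParseError`, and the `key=value` input format and `_process` step are made up for illustration.

```python
from functools import wraps
from typing import Dict, Iterable, Iterator


class ParseError(Exception):
    """Local stand-in for jc.exceptions.ParseError."""


def stream_success(output_line: Dict, ignore_exceptions: bool) -> Dict:
    # assumption: success metadata is attached only when exceptions are ignored
    if ignore_exceptions:
        output_line.update({'_jc_meta': {'success': True}})
    return output_line


def stream_error(e: BaseException, line: str) -> Dict:
    # assumption: field names of the error record
    return {'_jc_meta': {'success': False,
                         'error': f'{e.__class__.__name__}: {e}',
                         'line': line.strip()}}


def raise_or_yield(ignore_exceptions: bool, e: BaseException, line: str) -> tuple:
    # re-raise with a hint unless exceptions are ignored, else hand back (e, line)
    if not ignore_exceptions:
        e.args = (str(e) + '... Use the ignore_exceptions option (-qq) to ignore streaming parser errors.',)
        raise e
    return e, line


def add_jc_meta(func):
    """Assumed behavior: dicts are successes, (exception, line) tuples are errors."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        ignore_exceptions = kwargs.get('ignore_exceptions', False)
        for value in func(*args, **kwargs):
            if isinstance(value, dict):
                yield stream_success(value, ignore_exceptions)
            else:
                e, line = value
                yield stream_error(e, line)
    return wrapper


def _process(proc_data: Dict) -> Dict:
    """Hypothetical post-processing: convert the value to int."""
    proc_data['value'] = int(proc_data['value'])
    return proc_data


@add_jc_meta
def parse(data: Iterable[str], raw: bool = False, ignore_exceptions: bool = False) -> Iterator[Dict]:
    """Hypothetical 'foo' parser for lines of the form key=value."""
    for line in data:
        try:
            if '=' not in line:
                raise ParseError('Not foo data')
            key, value = line.rstrip('\n').split('=', 1)
            output_line: Dict = {'key': key, 'value': value}
            yield output_line if raw else _process(output_line)
        except Exception as e:
            yield raise_or_yield(ignore_exceptions, e, line)
```

For example, `list(parse(['a=1\n', 'junk\n'], ignore_exceptions=True))` gives a success record for `a` followed by an error record for `junk`. Because this sketch's wrapper reads `ignore_exceptions` from `kwargs`, callers must pass it by keyword.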


@@ -101,7 +101,9 @@ Examples:
     ...
 """
 import jc.utils
-from jc.utils import ignore_exceptions_msg, add_jc_meta
+from jc.streaming import (
+    add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
+)
 from jc.exceptions import ParseError
 import jc.parsers.universal
@@ -183,7 +185,7 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
         Iterator object
     """
     jc.utils.compatibility(__name__, info.compatible, quiet)
-    jc.utils.streaming_input_type_check(data)
+    streaming_input_type_check(data)
     section = '' # either 'cpu' or 'device'
     headers = ''
@@ -192,10 +194,9 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
     for line in data:
         try:
-            jc.utils.streaming_line_input_type_check(line)
+            streaming_line_input_type_check(line)
             output_line = {}
             # ignore blank lines and header line
             if line == '\n' or line == '' or line.startswith('Linux'):
                 continue
@@ -231,8 +232,4 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
                 raise ParseError('Not iostat data')
         except Exception as e:
-            if not ignore_exceptions:
-                e.args = (str(e) + ignore_exceptions_msg,)
-                raise e
-            yield e, line
+            yield raise_or_yield(ignore_exceptions, e, line)


@@ -79,7 +79,9 @@ Examples:
 """
 import re
 import jc.utils
-from jc.utils import ignore_exceptions_msg, add_jc_meta
+from jc.streaming import (
+    add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
+)
 from jc.exceptions import ParseError
@@ -146,13 +148,13 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
         Iterator object
     """
     jc.utils.compatibility(__name__, info.compatible, quiet)
-    jc.utils.streaming_input_type_check(data)
+    streaming_input_type_check(data)
     parent = ''
     for line in data:
         try:
-            jc.utils.streaming_line_input_type_check(line)
+            streaming_line_input_type_check(line)
             # skip line if it starts with 'total 1234'
             if re.match(r'total [0-9]+', line):
@@ -200,8 +202,4 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
             yield output_line if raw else _process(output_line)
         except Exception as e:
-            if not ignore_exceptions:
-                e.args = (str(e) + ignore_exceptions_msg,)
-                raise e
-            yield e, line
+            yield raise_or_yield(ignore_exceptions, e, line)


@@ -86,10 +86,10 @@ Examples:
 import string
 import ipaddress
 import jc.utils
-from jc.exceptions import ParseError
 from jc.streaming import (
     add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
 )
+from jc.exceptions import ParseError
 class info():
@@ -500,8 +500,8 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
     for line in data:
         try:
-            output_line = {}
             streaming_line_input_type_check(line)
+            output_line = {}
             # skip blank lines
             if line.strip() == '':
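In `ping_s` (and `vmstat_s` below) the per-line type check now runs first inside the `try`, ahead of `output_line = {}`. Because the check sits inside the `try`, a non-`str` item is routed through the same `except` clause as any other parse failure: with `ignore_exceptions=True` it becomes an error entry and the stream keeps going; otherwise the annotated `TypeError` stops it. The sketch below shows that flow with a hypothetical one-record-per-line parser; `raise_or_yield` is reconstructed from its docstring in this commit, not copied from the library.

```python
from typing import Dict, Iterable, Iterator


def streaming_line_input_type_check(line: str) -> None:
    if not isinstance(line, str):
        raise TypeError("Input line must be a 'str' object.")


def raise_or_yield(ignore_exceptions: bool, e: BaseException, line) -> tuple:
    # re-raise with a hint unless exceptions are ignored, else hand back (e, line)
    if not ignore_exceptions:
        e.args = (str(e) + '... Use the ignore_exceptions option (-qq) to ignore streaming parser errors.',)
        raise e
    return e, line


def parse(data: Iterable, ignore_exceptions: bool = False) -> Iterator:
    """Hypothetical parser: one record per non-blank line."""
    for line in data:
        try:
            streaming_line_input_type_check(line)  # runs inside the try
            output_line: Dict = {}
            # skip blank lines
            if line.strip() == '':
                continue
            output_line['text'] = line.strip()
            yield output_line
        except Exception as e:
            yield raise_or_yield(ignore_exceptions, e, line)
```

Creating `output_line` only after the check passes also means the dict is never built for a line that is about to be rejected.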


@@ -89,13 +89,13 @@ Examples:
 import re
 from typing import Dict, Iterable, Union
 import jc.utils
-from jc.utils import ignore_exceptions_msg, add_jc_meta
 from jc.exceptions import ParseError
+from jc.streaming import (
+    add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
+)
 class info():
     """Provides parser metadata (version, author, etc.)"""
-    version = '1.1'
+    version = '1.0'
     description = '`rsync` command streaming parser'
     author = 'Kelly Brazil'
     author_email = 'kellyjonbrazil@gmail.com'
@@ -168,7 +168,7 @@ def parse(
         Iterator object
     """
     jc.utils.compatibility(__name__, info.compatible, quiet)
-    jc.utils.streaming_input_type_check(data)
+    streaming_input_type_check(data)
     summary: Dict = {}
     process: str = ''
@@ -272,7 +272,7 @@ def parse(
     for line in data:
         try:
-            jc.utils.streaming_line_input_type_check(line)
+            streaming_line_input_type_check(line)
             output_line: Dict = {}
             # ignore blank lines
@@ -452,19 +452,12 @@ def parse(
                 continue
         except Exception as e:
-            if not ignore_exceptions:
-                e.args = (str(e) + ignore_exceptions_msg,)
-                raise e
-            yield e, line
+            yield raise_or_yield(ignore_exceptions, e, line)
     # gather final item
     try:
         if summary:
             yield summary if raw else _process(summary)
     except Exception as e:
-        if not ignore_exceptions:
-            e.args = (str(e) + ignore_exceptions_msg,)
-            raise e
-        yield e, line
+        yield raise_or_yield(ignore_exceptions, e, '')
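In `rsync_s` (and `stat_s` below) the last record is only complete once the input is exhausted, so it is emitted from a second `try` block after the loop. The old code reported a failure there with the loop variable `line`, i.e. whatever line happened to be read last; the new code passes an empty string, presumably because no single input line is responsible. The sketch below shows the pattern with a hypothetical `file`/`files`/`bytes` input format; `raise_or_yield` is reconstructed from its docstring in this commit.

```python
from typing import Dict, Iterable, Iterator


def raise_or_yield(ignore_exceptions: bool, e: BaseException, line: str) -> tuple:
    # re-raise with a hint unless exceptions are ignored, else hand back (e, line)
    if not ignore_exceptions:
        e.args = (str(e) + '... Use the ignore_exceptions option (-qq) to ignore streaming parser errors.',)
        raise e
    return e, line


def parse(data: Iterable[str], ignore_exceptions: bool = False) -> Iterator:
    """Hypothetical parser: 'file NAME' lines stream out, 'files N'/'bytes N' build a summary."""
    summary: Dict = {}
    for line in data:
        try:
            kind, _, value = line.strip().partition(' ')
            if kind == 'file':
                yield {'file': value}
            elif kind in ('files', 'bytes'):
                summary[kind] = int(value)
            else:
                raise ValueError(f'unexpected line: {line.strip()}')
        except Exception as e:
            yield raise_or_yield(ignore_exceptions, e, line)

    # gather final item: the summary is only complete after the input is exhausted
    try:
        if summary:
            summary['avg_bytes'] = summary['bytes'] / summary['files']
            yield summary
    except Exception as e:
        # no single input line is to blame here, so report an empty line
        yield raise_or_yield(ignore_exceptions, e, '')
```

With `files 0` in the input, the division in the final block fails and the error entry carries `''` rather than the unrelated `bytes ...` line that the loop read last.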


@@ -83,7 +83,9 @@ Examples:
 """
 import shlex
 import jc.utils
-from jc.utils import ignore_exceptions_msg, add_jc_meta
+from jc.streaming import (
+    add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
+)
 from jc.exceptions import ParseError
@@ -154,14 +156,14 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
         Iterator object
     """
     jc.utils.compatibility(__name__, info.compatible, quiet)
-    jc.utils.streaming_input_type_check(data)
+    streaming_input_type_check(data)
     output_line = {}
     os_type = ''
     for line in data:
         try:
-            jc.utils.streaming_line_input_type_check(line)
+            streaming_line_input_type_check(line)
             line = line.rstrip()
             # ignore blank lines
@@ -287,20 +289,12 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
                 output_line = {}
         except Exception as e:
-            if not ignore_exceptions:
-                e.args = (str(e) + ignore_exceptions_msg,)
-                raise e
+            yield raise_or_yield(ignore_exceptions, e, line)
-            yield e, line
-    try:
     # gather final item
+    try:
         if output_line:
             yield output_line if raw else _process(output_line)
     except Exception as e:
-        if not ignore_exceptions:
-            e.args = (str(e) + ignore_exceptions_msg,)
-            raise e
-        yield e, line
+        yield raise_or_yield(ignore_exceptions, e, '')


@@ -101,7 +101,9 @@ Examples:
     ...
 """
 import jc.utils
-from jc.utils import ignore_exceptions_msg, add_jc_meta
+from jc.streaming import (
+    add_jc_meta, streaming_input_type_check, streaming_line_input_type_check, raise_or_yield
+)
 from jc.exceptions import ParseError
@@ -173,7 +175,7 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
         Iterator object
     """
     jc.utils.compatibility(__name__, info.compatible, quiet)
-    jc.utils.streaming_input_type_check(data)
+    streaming_input_type_check(data)
     procs = None
     buff_cache = None
@@ -182,9 +184,9 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
     tz = None
     for line in data:
-        output_line = {}
         try:
-            jc.utils.streaming_line_input_type_check(line)
+            streaming_line_input_type_check(line)
+            output_line = {}
             # skip blank lines
             if line.strip() == '':
@@ -272,8 +274,4 @@ def parse(data, raw=False, quiet=False, ignore_exceptions=False):
                 raise ParseError('Not vmstat data')
         except Exception as e:
-            if not ignore_exceptions:
-                e.args = (str(e) + ignore_exceptions_msg,)
-                raise e
-            yield e, line
+            yield raise_or_yield(ignore_exceptions, e, line)


@@ -4,6 +4,21 @@ from functools import wraps
 from typing import Dict, Iterable
+def streaming_input_type_check(data: Iterable) -> None:
+    """
+    Ensure input data is an iterable, but not a string or bytes. Raises
+    `TypeError` if not.
+    """
+    if not hasattr(data, '__iter__') or isinstance(data, (str, bytes)):
+        raise TypeError("Input data must be a non-string iterable object.")
+def streaming_line_input_type_check(line: str) -> None:
+    """Ensure each line is a string. Raises `TypeError` if not."""
+    if not isinstance(line, str):
+        raise TypeError("Input line must be a 'str' object.")
 def stream_success(output_line: Dict, ignore_exceptions: bool) -> Dict:
     """Add `_jc_meta` object to output line if `ignore_exceptions=True`"""
     if ignore_exceptions:
@@ -43,8 +58,10 @@ def add_jc_meta(func):
     Without the decorator on parse():
         # successfully parsed line:
-        yield stream_success(output_line, ignore_exceptions) if raw \\
-            else stream_success(_process(output_line), ignore_exceptions)
+        if raw:
+            yield stream_success(output_line, ignore_exceptions)
+        else:
+            stream_success(_process(output_line), ignore_exceptions)
         # unsuccessfully parsed line:
         except Exception as e:
@@ -82,26 +99,16 @@ def add_jc_meta(func):
     return wrapper
-def streaming_input_type_check(data: Iterable) -> None:
-    """
-    Ensure input data is an iterable, but not a string or bytes. Raises
-    `TypeError` if not.
-    """
-    if not hasattr(data, '__iter__') or isinstance(data, (str, bytes)):
-        raise TypeError("Input data must be a non-string iterable object.")
-def streaming_line_input_type_check(line: str) -> None:
-    """Ensure each line is a string. Raises `TypeError` if not."""
-    if not isinstance(line, str):
-        raise TypeError("Input line must be a 'str' object.")
 def raise_or_yield(
     ignore_exceptions: bool,
     e: BaseException,
     line: str
 ) -> tuple:
+    """
+    Return the exception object and line string if ignore_exceptions is
+    True. Otherwise, re-raise the exception from the exception object with
+    an annotation.
+    """
     ignore_exceptions_msg = '... Use the ignore_exceptions option (-qq) to ignore streaming parser errors.'
     if not ignore_exceptions:
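As committed, the "Without the decorator" snippet in the `add_jc_meta` docstring drops the `yield` in its `else` branch and passes the tuple returned by `raise_or_yield()` to `stream_error()` as a single argument, although `stream_error` is documented as taking `(e, line)`. Below is a runnable sketch of the undecorated pattern with both points adjusted: a `yield` in each branch, and the tuple unpacked with `*`. The bodies of `stream_success` and `stream_error` are not shown in this diff, so the `_jc_meta` fields used here are assumptions, and `_process` is a hypothetical post-processing step.

```python
from typing import Dict, Iterable, Iterator


def stream_success(output_line: Dict, ignore_exceptions: bool) -> Dict:
    """Assumed behavior: tag the record only when exceptions are being ignored."""
    if ignore_exceptions:
        output_line.update({'_jc_meta': {'success': True}})
    return output_line


def stream_error(e: BaseException, line: str) -> Dict:
    """Assumed shape of the error record."""
    return {
        '_jc_meta': {
            'success': False,
            'error': f'{e.__class__.__name__}: {e}',
            'line': line.strip(),
        }
    }


def raise_or_yield(ignore_exceptions: bool, e: BaseException, line: str) -> tuple:
    """Re-raise with an annotation, or return (exception, line)."""
    if not ignore_exceptions:
        e.args = (str(e) + '... Use the ignore_exceptions option (-qq) to ignore streaming parser errors.',)
        raise e
    return e, line


def _process(proc_data: Dict) -> Dict:
    """Hypothetical post-processing: convert the 'n' field to int."""
    proc_data['n'] = int(proc_data['n'])
    return proc_data


def parse(data: Iterable[str], raw: bool = False, ignore_exceptions: bool = False) -> Iterator[Dict]:
    """Undecorated toy parser that wraps every record itself."""
    for line in data:
        try:
            output_line = {'n': line.strip()}
            if raw:
                yield stream_success(output_line, ignore_exceptions)
            else:
                yield stream_success(_process(output_line), ignore_exceptions)
        except Exception as e:
            # unpack (exception, line) into stream_error's two parameters
            yield stream_error(*raise_or_yield(ignore_exceptions, e, line))
```

The decorated form avoids this boilerplate entirely, which is the motivation the docstring gives for `add_jc_meta`.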


@@ -1,4 +1,4 @@
-.TH jc 1 2022-02-04 1.18.3 "JSON CLI output utility"
+.TH jc 1 2022-02-07 1.18.3 "JSON CLI output utility"
 .SH NAME
 jc \- JSONifies the output of many CLI tools and file-types
 .SH SYNOPSIS