

dssTools

dssTools is a Python wrapper around NetworkX that simplifies network analysis for our use case in the Digital Social Sciences.

NOT_IMPLEMENTED = NotImplementedError('This feature is yet to be implemented.') module-attribute

NxElementView = TypeVar('NxElementView', NodeDataView, EdgeDataView, OutEdgeDataView, InEdgeDataView, Iterator[tuple]) module-attribute

logger = get_logger('WDCAPI') module-attribute

Category

Bases: LowercaseStrEnum

Enum for categorizing internal codes.

The following values are accepted:
  • TEXT: Used for terms in text. This code represents a term found in a node's text.
  • MANUAL: Used for codes created manually in dssCode (or other sources).

MANUAL = auto() class-attribute instance-attribute

TEXT = auto() class-attribute instance-attribute

Code

Bases: NamedTuple

Helper tuple class for passing complex arguments as node attributes.

category: Category instance-attribute

term: str instance-attribute

__str__()

to_str()

Convert the code to a string.
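The two types above can be sketched with the standard library alone. This assumes that LowercaseStrEnum makes auto() produce the lowercase member name, as the standard library's StrEnum does; the LowercaseStrEnum class below is a hypothetical stand-in, not the dsstools implementation, and to_str() is left out because its output format is not documented here.

```python
from enum import Enum, auto
from typing import NamedTuple


class LowercaseStrEnum(str, Enum):
    """Hypothetical stand-in: members created with auto() get their lowercase name."""

    @staticmethod
    def _generate_next_value_(name, start, count, last_values):
        return name.lower()


class Category(LowercaseStrEnum):
    TEXT = auto()    # term found in a node text
    MANUAL = auto()  # manually created code, e.g. in dssCode


class Code(NamedTuple):
    category: Category
    term: str


code = Code(Category.TEXT, "migration")
print(code.category.value, code.term)  # text migration
```

Because Category mixes in str, its members compare equal to plain strings such as "text".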

Description()

Class containing description drawing preferences.

alpha = 0.5 instance-attribute

size = 8 instance-attribute

text = '' instance-attribute

set_text(text)

Set the description text.

Parameters:

Name Type Description Default
text str

Text to set as description

required

Edges(labels=None)

Bases: GraphElement

alphas = fixed(1) instance-attribute

arrow_size = 2 instance-attribute

colors = fixed('lightgrey') instance-attribute

labels = labels instance-attribute

sizes = fixed(0.5) instance-attribute

set_alphas(arg)

Set alpha based on argument.

Parameters:

Name Type Description Default
arg GenericMapping | float

The transparency as a value or mapping between 0 and 1.

required

Returns: self

set_colors(arg)

Set the colors of the displayed edges.

Parameters:

Name Type Description Default
arg GenericMapping | str

Colors to set. String values will be mapped onto all edges.

required

Returns:

Type Description

self

set_sizes(arg)

Set size of labels for nodes as pt sizes.

Parameters:

Name Type Description Default
arg GenericMapping | int | float

Font size for text labels

required

Returns: self

set_transparency(arg)

Set transparency based on argument.

This is the same as set_alphas(). Alpha is the value for the transparency of a color.

Parameters:

Name Type Description Default
arg GenericMapping | float

The transparency as a value or mapping between 0 and 1.

required

Returns: self

EgoNetwork(ego)

Bases: Supplier

Class for matching graph elements to their role in an ego-network and supplying EgoMapping.

Parameters:

Name Type Description Default
ego str | int

name of the node at the center of the ego-network

required

ego = ego instance-attribute

fallback = None instance-attribute

ElementAttribute(keyword)

Bases: Supplier

Class for graph element values that are already set in the graph.

Parameters:

Name Type Description Default
keyword str

the key of the inherent attribute

required

fallback = None instance-attribute

keyword: str = keyword class-attribute instance-attribute

__repr__()

__str__()

ForceAtlas2Layouter

Bases: Layouter

Create layouts using ForceAtlas 2 as backend.

Note: Getting good results out of this layout engine can be tricky. Please make sure you at least read the corresponding entry in the NetworkX documentation: https://networkx.org/documentation/stable/reference/generated/networkx.drawing.layout.forceatlas2_layout.html

__str__()

create_layout(graph, seed=None, pos=None, **kwargs)

name() staticmethod

read_from_file(filename, **kwargs)

Reads positions from the JSON file at filename.

The following structure for the JSON is expected, where each key contains an array of length 2 containing the coordinates. Coordinates should be in the range [-1,1]:

{
    "domain1": [-0.1467271130230262, 0.25512246449304427],
    "domain2": [-0.3683594304205127, 0.34942480334119136]
}

This structure is generated through dsstools.Layouter().write_to_file().
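A minimal round trip for this file format can be written with the json module alone. write_positions and read_positions are hypothetical helpers, not the dsstools API, and the range check simply enforces the documented expectation. Because JSON object keys are always strings, integer node IDs would come back as strings in this sketch.

```python
import json
import tempfile
from pathlib import Path


def write_positions(positions: dict, path: Path) -> None:
    # Each node maps to a two-element [x, y] array, as in the example above.
    data = {str(node): [float(x), float(y)] for node, (x, y) in positions.items()}
    path.write_text(json.dumps(data, indent=4), encoding="utf-8")


def read_positions(path: Path) -> dict:
    raw = json.loads(path.read_text(encoding="utf-8"))
    positions = {}
    for node, coords in raw.items():
        if len(coords) != 2:
            raise ValueError(f"{node!r}: expected 2 coordinates, got {len(coords)}")
        if not all(-1 <= c <= 1 for c in coords):
            raise ValueError(f"{node!r}: coordinates outside [-1, 1]")
        positions[node] = (coords[0], coords[1])
    return positions


with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "positions.json"
    write_positions({"domain1": (-0.1467, 0.2551), "domain2": (-0.3684, 0.3494)}, file)
    loaded = read_positions(file)
```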

Parameters:

Name Type Description Default
filename Union[Path]

Path to file to be read.

required
**kwargs
{}

Returns:

Type Description
dict

Dictionary of nodes and positions.

read_from_graph(graph, pos_name=('x', 'y'))

Read positions from node attributes in the graph.

This is relevant when importing from Pajek or GEXF files where the positions are already set with another tool. Imported values are normalized onto [-1,1] in all directions.

Parameters:

Name Type Description Default
graph Graph

Graph object including the node attributes.

required
pos_name tuple

Node attribute names to look for. These depend on the imported file format.

('x', 'y')

Returns:

Type Description
dict

Dictionary of positions per Node.
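One way to normalize imported coordinates onto [-1,1] is per-axis min-max scaling, sketched below. This is an assumption about the approach; dsstools may, for example, preserve the aspect ratio instead of scaling each axis independently. normalize_positions is a hypothetical helper.

```python
def normalize_positions(raw: dict) -> dict:
    """Min-max scale each axis of {node: (x, y)} onto [-1, 1]."""
    xs = [x for x, _ in raw.values()]
    ys = [y for _, y in raw.values()]

    def scale(value, lo, hi):
        # A degenerate axis (all values equal) is centred at 0.
        if hi == lo:
            return 0.0
        return 2 * (value - lo) / (hi - lo) - 1

    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    return {
        node: (scale(x, x_lo, x_hi), scale(y, y_lo, y_hi))
        for node, (x, y) in raw.items()
    }


# Positions as they might arrive from the x/y node attributes of a GEXF or Pajek file.
imported = {"a": (0.0, 100.0), "b": (50.0, 300.0), "c": (100.0, 200.0)}
print(normalize_positions(imported))
# {'a': (-1.0, -1.0), 'b': (0.0, 1.0), 'c': (1.0, 0.0)}
```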

read_or_create_layout(filepath, graph, seed=None, overwrite=False, **kwargs)

Read positions from file. If the file does not exist, create the positions and write them to the file.

Parameters:

Name Type Description Default
filepath Union[str, Path]

Filename to read positions from

required
graph Graph

Graph object to update

required
seed Optional[int]

Seed to use for the layout.

None
overwrite bool

Overwrite existing file (default False)

False

Returns:

Type Description
dict

Dictionary of positions per Node. Will return an empty dict if creation failed.
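The read-or-create behaviour amounts to a file cache around the layout computation. The sketch below is hypothetical (read_or_create and fake_layout are not part of dsstools) and assumes that a failing layout computation raises an exception, which is turned into the documented empty dict. A seed would be passed by closing over it in the create callable.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def read_or_create(filepath: Path, create: Callable[[], dict], overwrite: bool = False) -> dict:
    """Return positions cached in filepath, or compute them and write the cache."""
    if filepath.exists() and not overwrite:
        cached = json.loads(filepath.read_text(encoding="utf-8"))
        return {node: tuple(xy) for node, xy in cached.items()}
    try:
        positions = create()
    except Exception:
        # Assumption: any failure in the layout step yields an empty dict.
        return {}
    filepath.write_text(json.dumps(positions), encoding="utf-8")
    return positions


calls = []


def fake_layout() -> dict:
    # Hypothetical stand-in for an expensive layout computation.
    calls.append(1)
    return {"a": (0.0, 0.5), "b": (-0.5, 1.0)}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "layout.json"
    first = read_or_create(path, fake_layout)   # computes and writes the file
    second = read_or_create(path, fake_layout)  # reads the cached file instead
```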

write_to_file(positions, path)

GenericMapping()

Bases: ABC

Generic interface for mapping visual attributes to graph element values.

fallback = None instance-attribute

supplier = None instance-attribute

__repr__()

get(graph_element, graph) abstractmethod

Parameters:

Name Type Description Default
graph_element NxElementView

NxElementView

required
graph Graph

nx.Graph

required

Returns: A dictionary with graph_element as the keys and the value as visual mapping.

GraphDescriptor(graph, include_defaults=True, round_floats_to=4, max_level=None) dataclass

This class provides a dataframe (~table) view of the given graph.

Every metric you add is its own column and every node its own row. It allows you to add custom metrics for more detailed analysis and save the dataframe as either a CSV or an XLSX document.

The naming hierarchy is as follows:
  • if activated, default metrics are always set first
  • if a custom metric is equal to a default metric, the values will be replaced
  • if a node attribute name equals a default or custom metric in the dataframe, the node attribute gets the number of duplicates as a suffix
  • if two nodes have the same attribute, the attribute will be considered equal and their individual values will be in the same column
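The renaming rule for attributes that clash with a metric can be sketched as follows. The _1, _2, ... suffix format is an assumption, as is the warning text; ensure_unique is a hypothetical helper mirroring the private __ensure_uniqueness method.

```python
import warnings


def ensure_unique(col_name: str, existing: set) -> str:
    """Return col_name, or col_name plus a numeric suffix if it clashes with a metric."""
    if col_name not in existing:
        return col_name
    n = 1
    while f"{col_name}_{n}" in existing:
        n += 1
    unique = f"{col_name}_{n}"
    warnings.warn(f"Node attribute {col_name!r} clashes with a metric; renamed to {unique!r}.")
    return unique


metric_columns = {"degree", "betweenness"}
print(ensure_unique("degree", metric_columns))  # degree_1 (plus a warning)
```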

Parameters:

Name Type Description Default
graph Graph

The graph you want to save/analyse

required
include_defaults bool

The class adds betweenness, degree and centrality as default metrics for all nodes. You can deactivate this behaviour by setting this to False (default True)

True
round_floats_to int

The class rounds every float to 4 decimal places by default. This keeps cells from growing too large, which would make the data hard to analyse. Increase this value for more detail.

4
max_level int | None

If your nodes hold some nested structure (dict of dicts) this value defines how 'deep' the level of unpacking goes. The unpacked values will become their own columns. If set to None, all values will be unpacked (default = None)

None

custom_calculations: dict[str, pd.Series] = field(init=False, default_factory=dict) class-attribute instance-attribute

dataframe: pd.DataFrame = field(init=False) class-attribute instance-attribute

graph: nx.Graph instance-attribute

include_defaults: bool = True class-attribute instance-attribute

max_level: int = None class-attribute instance-attribute

round_floats_to: int = 4 class-attribute instance-attribute

__create_dataframe()

Creates a dataframe view of a graph.

Every Node has its own row (index) and every attribute its own column.

If not all Nodes have the same attributes, 'None' will be set as placeholder value.

__ensure_uniqueness(col_name)

Ensures that no node attribute overrides a metric column.

Warns the user if an attribute is named the same as a metric.

Parameters:

Name Type Description Default
col_name str

Essentially the node attribute that needs to be checked.

required

Returns:

Type Description
str

A unique name for the attribute.

__flatten_dict(flat_data, parent_key='', sep='.', level=0)

Flattens a nested dictionary up to a specified max depth.

If a dictionary is encountered at max depth, it is replaced with "PLACEHOLDER".

Parameters:

Name Type Description Default
flat_data dict

The dictionary to flatten.

required
parent_key str

The base key for nested keys.

''
sep str

Separator used for flattened keys.

'.'
level int

Current recursion depth.

0

Returns:

Type Description

A flattened dictionary.
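Depth-limited flattening with the documented defaults (parent_key '', separator '.', placeholder "PLACEHOLDER") can be sketched as below. How levels are counted against max_level is an assumption here: with max_level=0 nothing nested is unpacked. flatten_dict is a hypothetical helper, not the private method itself.

```python
def flatten_dict(data: dict, max_level=None, parent_key: str = "", sep: str = ".", level: int = 0) -> dict:
    """Flatten nested dicts into sep-joined keys, stopping at max_level."""
    items = {}
    for key, value in data.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            if max_level is not None and level >= max_level:
                # A dict found at the depth limit is not unpacked further.
                items[new_key] = "PLACEHOLDER"
            else:
                items.update(flatten_dict(value, max_level, new_key, sep, level + 1))
        else:
            items[new_key] = value
    return items


node_attrs = {"name": "a", "meta": {"lang": "de", "stats": {"in": 3, "out": 5}}}
print(flatten_dict(node_attrs, max_level=1))
# {'name': 'a', 'meta.lang': 'de', 'meta.stats': 'PLACEHOLDER'}
```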

__post_init__()

add_custom_metrics(custom_metrics)

Allows you to add custom graph metrics by passing a dictionary of metric names and functions that operate on the graph.

Custom metrics will override default metrics if they are named the same.

Examples:

import networkx as nx
from dsstools import GraphDescriptor

mygraph = nx.karate_club_graph()  # any NetworkX graph works here

def calculate_clustering(graph):
    return nx.clustering(graph)

# Note how some values must be wrapped in a dictionary first,
# else pandas will read them as NaN
def calculate_shortest_path_length(graph):
    return dict(nx.shortest_path_length(graph))

custom_metrics = {
    'Clustering': calculate_clustering,
    'Shortest path length': calculate_shortest_path_length,
    'Closeness': lambda graph: nx.closeness_centrality(graph)
}

GraphDescriptor(graph=mygraph).add_custom_metrics(custom_metrics)

Parameters:

Name Type Description Default
custom_metrics dict[str, callable]

A dictionary where keys are metric names and values are functions that accept a NetworkX graph and return a dictionary of node-based metric values (otherwise values in the dataframe might be NaN).

required

Returns:

Type Description
GraphDescriptor

self

get_dataframe()

write_file(save_path, *, excel_engine='openpyxl')

Saves the dataframe at the given location in the provided format.

The saving format is determined dynamically based on the path suffix.

Parameters:

Name Type Description Default
save_path str | Path

the saving location (and format)

required
excel_engine str

the engine to use for saving the file in XLSX format. Defaults to 'openpyxl', which must be installed for this to work

'openpyxl'

Returns:

Type Description
GraphDescriptor

self
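Suffix-based dispatch could look like the following. write_dataframe is a hypothetical helper that accepts any object with pandas-style to_csv and to_excel methods; rejecting other suffixes with a ValueError is an assumption, not documented behaviour.

```python
from pathlib import Path


def write_dataframe(df, save_path, *, excel_engine="openpyxl"):
    """Write df as CSV or XLSX depending on the suffix of save_path."""
    path = Path(save_path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        df.to_csv(path)
    elif suffix == ".xlsx":
        # pandas needs the chosen engine (e.g. openpyxl) to be installed.
        df.to_excel(path, engine=excel_engine)
    else:
        raise ValueError(f"Unsupported file format {suffix!r}; use .csv or .xlsx")
    return path
```

With a real pandas DataFrame, write_dataframe(df, "metrics.xlsx") would call df.to_excel with engine="openpyxl".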

GraphElement

set_alphas(arg)

Set alpha based on argument.

Parameters:

Name Type Description Default
arg GenericMapping | float

The transparency as a value or mapping between 0 and 1.

required

Returns: self

set_colors(arg)

Set the colors of the displayed nodes.

Parameters:

Name Type Description Default
arg GenericMapping | str

Colors to set. String values will be mapped onto all nodes.

required

Returns:

Type Description

self

set_sizes(arg)

Set size of labels for nodes as pt sizes.

Parameters:

Name Type Description Default
arg GenericMapping | int | float

Font size for text labels

required

Returns: self

set_transparency(arg)

Set transparency based on argument.

This is the same as set_alphas(). Alpha is the value for the transparency of a color.

Parameters:

Name Type Description Default
arg GenericMapping | float

The transparency as a value or mapping between 0 and 1.

required

Returns: self

GraphImporter(identifier=None, *, token=None, api='https://dss-wdc.wiso.uni-hamburg.de/api', insecure=False, timeout=60, params=None)

Bases: WDC

Class enabling graph imports from the WDC server.

Parameters:

Name Type Description Default
identifier int | None

Identifier of the network data. This is a numeric ID identifying the graph. Unintuitively, it differs from the ID of the corresponding text search data, in case you need both data types.

None
token str | None

Token for authorization.

None
api str

API address to send requests to. Leave this as is.

'https://dss-wdc.wiso.uni-hamburg.de/api'
insecure bool

Hide the warning about missing HTTPS.

False
timeout int

Timeout for requests to the server. Increase this if you request large networks.

60
params dict | None

These are additional keyword arguments passed onto the API endpoint. See https://dss-wdc.wiso.uni-hamburg.de/#_complex_datatypes_for_the_api_requests for further assistance.

None

api = api[:-1] if api.endswith('/') else api instance-attribute

endpoint = 'domaingraph' class-attribute instance-attribute

identifier = identifier instance-attribute

params = params if params else {} instance-attribute

session = requests.Session() instance-attribute

timeout = timeout instance-attribute

token property writable

Get the password token.

get_edges()

Get the edges of the selected graph.


Returns:

Type Description

List of edges, each a 3-tuple of the structure (from, to, weight).

get_graph(graph_type=nx.DiGraph)

Get the full graph containing nodes and edges.

Parameters:

Name Type Description Default
graph_type

Type of graph to create. (Default value = nx.DiGraph)

DiGraph

Returns:

Type Description

Graph from API

get_nodes()

Get the nodes of the selected graph.


Returns:

Type Description

List of nodes, each a 2-tuple of the structure (ID, {additional_data_dict}).

list_available_graphs()

List available graphs for the current token.

Returns:

Type Description
list[dict]

List of dicts containing the graphs with metadata.

GraphKey(mapping, label=None, graph_element=None)

Class to create graph key objects like colorbars or legends.

label = label instance-attribute

mapping = mapping instance-attribute

shape = None instance-attribute

create_legend()

GraphKeyGenerator()

Base class for generating graph keys.

Graph keys contain both colorbars and legends as a Matplotlib figure.

colorbars: list = [] instance-attribute

keys: list = [] instance-attribute

legends: list = [] instance-attribute

add_graph_key(graph_key)

draw_keys()

Determine the figure axis and draw the graph keys.

Each colorbar gets its own axis, while up to 3 legends are drawn on a single axis.

place_colorbar(colorbar, ax)

Generates and then places the given colorbar on the given axis.

The colorbar is generated based on its mapping and the fallback is appended to the bottom of the generated colorbar axis.

Parameters:

Name Type Description Default
colorbar GraphKey

colorbar to be placed

required
ax _AxesBase

column axis in which to place the colorbar

required

place_legend(legend, ax, index)

Places the legend on the given axis based on the index. Each axis holds at most 3 legends and is filled from top to bottom.

Parameters:

Name Type Description Default
legend GraphKey

Legend to be placed

required
ax _AxesBase

column axis in which to place the legend

required
index int

index of the legend for determining placement within the given ax

required
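The placement rule (one axis per colorbar, legends sharing axes three at a time, filled from the top) reduces to integer arithmetic. legend_slot and axes_needed are hypothetical helpers; the y_anchor value only illustrates top-to-bottom filling, for instance as a bbox_to_anchor height in axis coordinates, and is not taken from dsstools.

```python
LEGENDS_PER_AXIS = 3


def legend_slot(index: int) -> tuple:
    """Map a legend index to (axis number, vertical anchor from the top)."""
    axis_number, slot = divmod(index, LEGENDS_PER_AXIS)
    # Slot 0 sits at the top of its axis, the last slot at the bottom.
    y_anchor = 1 - slot / LEGENDS_PER_AXIS
    return axis_number, y_anchor


def axes_needed(n_colorbars: int, n_legends: int) -> int:
    # One axis per colorbar plus ceil(n_legends / 3) axes for the legends.
    return n_colorbars + -(-n_legends // LEGENDS_PER_AXIS)


print([legend_slot(i)[0] for i in range(7)])  # [0, 0, 0, 1, 1, 1, 2]
```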

sort_keys()

Add the graph keys to either self.colorbars or self.legends based on the attributes of the graph keys.

GraphvizLayouter

Bases: Layouter

Create layouts using graphviz as backend.

Installing the required dependencies for this backend is rather complicated. Not for the faint of heart.

__str__()

create_layout(graph, seed=None, pos=None, prog='fdp', additional_args='', **kwargs)

name() staticmethod

read_from_file(filename, **kwargs)

Reads positions from the JSON file at filename.

The following structure for the JSON is expected, where each key contains an array of length 2 containing the coordinates. Coordinates should be in the range [-1,1]:

{
    "domain1": [-0.1467271130230262, 0.25512246449304427],
    "domain2": [-0.3683594304205127, 0.34942480334119136]
}

This structure is generated through dsstools.Layouter().write_to_file().

Parameters:

Name Type Description Default
filename Union[Path]

Path to file to be read.

required
**kwargs
{}

Returns:

Type Description
dict

Dictionary of nodes and positions.

read_from_graph(graph, pos_name=('x', 'y'))

Read positions from node attributes in the graph.

This is relevant when importing from Pajek or GEXF files where the positions are already set with another tool. Imported values are normalized onto [-1,1] in all directions.

Parameters:

Name Type Description Default
graph Graph

Graph object including the node attributes.

required
pos_name tuple

Node attribute names to look for. These depend on the imported file format.

('x', 'y')

Returns:

Type Description
dict

Dictionary of positions per Node.

read_or_create_layout(filepath, graph, seed=None, overwrite=False, **kwargs)

Read positions from file. If the file does not exist, create the positions and write them to the file.

Parameters:

Name Type Description Default
filepath Union[str, Path]

Filename to read positions from

required
graph Graph

Graph object to update

required
seed Optional[int]

Seed to use for the layout.

None
overwrite bool

Overwrite existing file (default False)

False

Returns:

Type Description
dict

Dictionary of positions per Node. Will return an empty dict if creation failed.

write_to_file(positions, path)

ImageCollection(iterable=None)

Bases: list

Class for exporting multiple ImageGenerators in one go.

__setitem__(id, item)

append(item)

Add new ImageGenerator to ImageCollection.

Parameters:

Name Type Description Default
item ImageGenerator

Item to append to list.

required

Returns:

create_flipbook(path, **kwargs)

Creates a flipbook as PPTX or PDF depending on the file ending.

For the valid keyword arguments, see create_flipbook_pdf or create_flipbook_pptx, which this method wraps.

Parameters:

Name Type Description Default
path Path | str

Path to save the flipbook to. The file ending determines the internal file format.

required

Returns:

Type Description

Either a PDF or PPTX object.

create_flipbook_pdf(path)

Create PDF containing all ImageGenerators.

Parameters:

Name Type Description Default
path Path | str

Path to save PDF to.

required

Returns:

Type Description
PdfPages

Generated PDF object.

create_flipbook_pptx(path, titles=None, left=Cm(4), top=Cm(-5.3), height=Cm(25))

Create PPTX containing all ImageGenerators.

Parameters:

Name Type Description Default
path Path | str

Path to save file to.

required
titles Optional[list]

Titles to give each slide. (Default value = None)

None
left Length

Left offset of the image on the slide, starting from upper left. (Default value = Cm(4))

Cm(4)
top Length

Top offset of the image on the slide, starting from upper left. (Default value = Cm(-5.3))

Cm(-5.3)
height Length

Height of the image. By default uses a sensible default. If you change this, you might have to adapt the left and top arguments as well. (Default value = Cm(25))

Cm(25)

Returns:

Type Description
Presentation

Generated PPTX object.

create_multiple_in_one(fig, path, dpi=200)

extend(other)

Extend existing ImageCollection with another one.

Parameters:

Name Type Description Default
other Iterable[ImageGenerator]

Another ImageCollection to extend with.

required

Returns:

insert(id, item)

Insert an item at a specific spot.

Parameters:

Name Type Description Default
id

Spot to insert at.

required
item ImageGenerator

The item to insert.

required

Returns:

Type Description

The updated ImageCollection.

ImageGenerator(graph)

Base class for setting up image generation.

axis = None instance-attribute

axlimit = 1.05 instance-attribute

canvas_height = 10 instance-attribute

canvas_right = None instance-attribute

canvas_width = 10 instance-attribute

continous_cmap = mpl.colormaps['viridis'] instance-attribute

description = Description() instance-attribute

dpi = 200 instance-attribute

edges = Edges() instance-attribute

graph = graph instance-attribute

graph_keys = GraphKeyGenerator() instance-attribute

img_dir = Path('.') instance-attribute

nodes = Nodes(Labels()) instance-attribute

qualitative_cmap = mpl.colormaps['tab10'] instance-attribute

change_graph(graph)

Set the graph attribute.

Parameters:

Name Type Description Default
graph

A NetworkX graph object.

required

Returns:

Type Description

self

deepcopy()

Create deep copy of the object.

This is the same as calling copy.deepcopy() on the object.

draw()

draw_description()

Draw description below the image according to the settings.

draw_edges()

Draw edges according to the settings.

draw_labels()

Draw labels based on values.

draw_nodes()

Draw nodes according to the settings.

set_axis(axis)

Set an existing Matplotlib axis object as the axis of the ImageGenerator.

Parameters:

Name Type Description Default
axis

Matplotlib axis

required

Returns:

Type Description

self

set_graph_key(mapping, graph_element=None, label=None)

Set a graph key.

Graph keys consist of legends and colorbars.

Parameters:

Name Type Description Default
mapping Sequential | Qualitative

mapping strategy for the basis of the graph key

required
graph_element Literal['edges', 'nodes', None]

the graph element to be modeled, necessary for generating text box legends.

None
label str | None

optional label to be applied to the graph key

None

Returns:

Type Description

self

set_graph_keys(graph_keys)

Set the graph_keys setting.

Graph keys consist of legends and colorbars.

Parameters:

Name Type Description Default
graph_keys GraphKeyGenerator

GraphKeyGenerator object which contains all legends and/or colorbars to be drawn

required

Returns:

Type Description

self

write_file(path)

Write file to disk on the given path.

Will also close the internal figure object.

Parameters:

Name Type Description Default
path str | Path

Path to write the file to.

required

Returns:

Type Description

self

write_json(path)

Write the graph data to a json file.

This follows the nx.node_link_data format as shown here: https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.json_graph.node_link_data.html

Parameters:

Name Type Description Default
path str | Path

saving location and name for the json-file

required

Returns:

Type Description
ImageGenerator

self

IncorrectKeyword

Bases: Exception

KamadaKawaiLayouter

Bases: Layouter

Create layouts using Kamada-Kawai as backend.

__str__()

create_layout(graph, seed=None, pos=None, **kwargs)

name() staticmethod

read_from_file(filename, **kwargs)

Reads positions from the JSON file at filename.

The following structure for the JSON is expected, where each key contains an array of length 2 containing the coordinates. Coordinates should be in the range [-1,1]:

{
    "domain1": [-0.1467271130230262, 0.25512246449304427],
    "domain2": [-0.3683594304205127, 0.34942480334119136]
}

This structure is generated through dsstools.Layouter().write_to_file().

Parameters:

Name Type Description Default
filename Union[Path]

Path to file to be read.

required
**kwargs
{}

Returns:

Type Description
dict

Dictionary of nodes and positions.

read_from_graph(graph, pos_name=('x', 'y'))

Read positions from node attributes in the graph.

This is relevant when importing from Pajek or GEXF files where the positions are already set with another tool. Imported values are normalized onto [-1,1] in all directions.

Parameters:

Name Type Description Default
graph Graph

Graph object including the node attributes.

required
pos_name tuple

Node attribute names to look for. These depend on the imported file format.

('x', 'y')

Returns:

Type Description
dict

Dictionary of positions per Node.

read_or_create_layout(filepath, graph, seed=None, overwrite=False, **kwargs)

Read positions from file. If the file does not exist, create the positions and write them to the file.

Parameters:

Name Type Description Default
filepath Union[str, Path]

Filename to read positions from

required
graph Graph

Graph object to update

required
seed Optional[int]

Seed to use for the layout.

None
overwrite bool

Overwrite existing file (default False)

False

Returns:

Type Description
dict

Dictionary of positions per Node. Will return an empty dict if creation failed.

write_to_file(positions, path)

Labels()

Bases: GraphElement

alphas = None instance-attribute

colors = None instance-attribute

font_families = None instance-attribute

labels = [] instance-attribute

show_labels = False instance-attribute

sizes = None instance-attribute

set_alphas(arg)

Set alpha based on argument.

Parameters:

Name Type Description Default
arg GenericMapping | float

The text transparency as mapping between 0 and 1.

required

Returns: self

set_colors(arg)

Set the colors of the displayed nodes.

Parameters:

Name Type Description Default
arg GenericMapping | str

Colors to set. String values will be mapped onto all nodes.

required

Returns:

Type Description

self

set_font_families(arg)

Set the font family for all labels if a single font is passed.

If an array of fonts is passed, multiple fonts can be set. If a dictionary is passed, fonts can be set individually for each node's label.

Parameters:

Name Type Description Default
arg GenericMapping | str

Font family

required

set_labels(arg)

Set labels for nodes based on arguments.

Parameters:

Name Type Description Default
arg dict

Dictionary with the node identifier (integer) as key and the label (string) as value.

required

set_sizes(arg)

Set size of labels for nodes as pt sizes.

Parameters:

Name Type Description Default
arg GenericMapping | int | float

Font size for text labels

required

Returns: self

set_transparency(arg)

Set transparency based on argument.

This is the same as set_alphas(). Alpha is the value for the transparency of a color.

Parameters:

Name Type Description Default
arg GenericMapping | float

The text transparency as mapping between 0 and 1.

required

Returns: self

Layouter

__str__()

create_layout(graph, seed=None, pos=None, **kwargs)

Create a position dictionary according to the set layout engine. The default layout is Spring.

Parameters:

Name Type Description Default
graph Graph

Graph object

required
seed Optional[int]

Set a default seed (default None)

None
pos Optional[dict]

Pre-populated positions

None

Returns:

Type Description
dict

Dictionary of node and positions.

name() staticmethod

read_from_file(filename, **kwargs)

Reads positions from the JSON file at filename.

The following structure for the JSON is expected, where each key contains an array of length 2 containing the coordinates. Coordinates should be in the range [-1,1]:

{
    "domain1": [-0.1467271130230262, 0.25512246449304427],
    "domain2": [-0.3683594304205127, 0.34942480334119136]
}

This structure is generated through dsstools.Layouter().write_to_file().

Parameters:

Name Type Description Default
filename Union[Path]

Path to file to be read.

required
**kwargs
{}

Returns:

Type Description
dict

Dictionary of nodes and positions.

read_from_graph(graph, pos_name=('x', 'y'))

Read positions from node attributes in the graph.

This is relevant when importing from Pajek or GEXF files where the positions are already set with another tool. Imported values are normalized onto [-1,1] in all directions.

Parameters:

Name Type Description Default
graph Graph

Graph object including the node attributes.

required
pos_name tuple

Node attribute names to look for. These depend on the imported file format.

('x', 'y')

Returns:

Type Description
dict

Dictionary of positions per Node.

read_or_create_layout(filepath, graph, seed=None, overwrite=False, **kwargs)

Read positions from file. If the file does not exist, create the positions and write them to the file.

Parameters:

Name Type Description Default
filepath Union[str, Path]

Filename to read positions from

required
graph Graph

Graph object to update

required
seed Optional[int]

Seed to use for the layout.

None
overwrite bool

Overwrite existing file (default False)

False

Returns:

Type Description
dict

Dictionary of positions per Node. Will return an empty dict if creation failed.

write_to_file(positions, path)

Nodes(labels=None)

Bases: GraphElement

alphas = fixed(1) instance-attribute

colors = fixed('lightgrey') instance-attribute

contour_colors = fixed('white') instance-attribute

contour_sizes = fixed(0.5) instance-attribute

labels = labels instance-attribute

positions = None instance-attribute

sizes = fixed(50) instance-attribute

set_alphas(arg)

Set alpha based on argument.

Parameters:

Name Type Description Default
arg GenericMapping | float

The transparency as a value or mapping between 0 and 1.

required

Returns: self

set_colors(arg)

Set the colors of the displayed nodes.

Parameters:

Name Type Description Default
arg GenericMapping | str

Colors to set. String values will be mapped onto all nodes.

required

Returns:

Type Description

self

set_contour_colors(arg)

Set the contour color of the displayed nodes.

Contour means the outer border of a node.

Parameters:

Name Type Description Default
arg GenericMapping | str

Colors to set. String values will be mapped onto all node contours. Additional options contain "node" and "edge" to automatically select the corresponding color.

required

Returns:

Type Description

self

set_contour_sizes(arg)

Set the contour sizes of the displayed nodes.

Contour means the outer border of a node.

Parameters:

Name Type Description Default
arg GenericMapping | float | int

Sizes to set. Integer values will be mapped onto all node contours. String values will get mapped onto the corresponding data arrays or closeness values per node.

required

Returns:

Type Description

self

set_positions(pos)

Set the node positions as a dict or list.

When using a file, use set_position_file() instead.

Parameters:

Name Type Description Default
pos dict | list | Path | str

Array of positions. Dicts should be keyed by node ID.

required

Returns:

Type Description

self

set_sizes(arg)

Set the sizes of the displayed nodes.

Parameters:

Name Type Description Default
arg GenericMapping | float | int

Sizes to set. Scalar values will be mapped onto all nodes. String values

required

Returns:

Type Description

self

set_transparency(arg)

Set transparency based on argument.

This is the same as set_alphas(). Alpha is the value for the transparency of a color.

Parameters:

Name Type Description Default
arg GenericMapping | float

The transparency as a value or mapping between 0 and 1.

required

Returns: self

NumpyEncoder

Bases: JSONEncoder

JSON encoder for NumPy arrays.

default(o)

Percentile(supplier)

Bases: Supplier

Class for filtering an existing supplier by the percentile.

Parameters:

Name Type Description Default
supplier Supplier

Supplier whose values will be evaluated based on the percentile range, must contain numeric values

required

fallback = None instance-attribute

percentile_method = 'linear' instance-attribute

percentile_range = None instance-attribute

supplier = supplier instance-attribute
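Percentile filtering with linear interpolation (the same interpolation NumPy uses for its "linear" method) can be sketched as below. What "filtering" means for elements outside the range is an assumption: here they are simply dropped, whereas dsstools might hand them to the fallback instead. Both helper functions are hypothetical.

```python
import math


def percentile_linear(sorted_values: list, q: float) -> float:
    """q-th percentile (0-100) of sorted data, interpolating linearly between ranks."""
    h = (len(sorted_values) - 1) * q / 100
    lower = math.floor(h)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (h - lower) * (sorted_values[upper] - sorted_values[lower])


def filter_by_percentile(values: dict, percentile_range=(0, 100)) -> dict:
    """Keep only elements whose value lies within the given percentile range."""
    if not values:
        return {}
    ordered = sorted(values.values())
    low = percentile_linear(ordered, percentile_range[0])
    high = percentile_linear(ordered, percentile_range[1])
    return {element: v for element, v in values.items() if low <= v <= high}


degrees = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 100}
print(filter_by_percentile(degrees, (0, 75)))  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```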

PositionKeyCoder()

Provides methods to consistently encode and decode position data for nx.Graphs.

float_prefix = '__float__' instance-attribute

int_prefix = '__int__' instance-attribute

str_prefix = '' instance-attribute

decode_typed_keys(dct)

object_hook for json.load() that recognises the prefixes set by the encoder.

Parameters:

Name Type Description Default
dct dict

A dictionary from a json file.

required

Returns:

Type Description
dict

A decoded version of the dictionary, i.e. the node position data

encode_typed_keys(obj)

Recursively walks the JSON structures we use for saving positions and encodes their keys.

Use this to prepare a positions file for json.dumps. This ensures that integers can be used as keys, i.e. as node IDs.

Parameters:

Name Type Description Default
obj any

The json-content that needs to be encoded.

required

Returns:

Type Description
dict | list

A valid format for the json.dump()-function.
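The prefix scheme can be sketched with the documented prefixes: integer and float keys get "__int__" and "__float__" prepended, plain strings get the empty prefix. How the number is formatted after the prefix is an assumption, and the helper names mirror the documented methods without claiming to match their code. With an empty string prefix, a string key that happens to start with "__int__" would be decoded as an integer.

```python
import json

INT_PREFIX = "__int__"
FLOAT_PREFIX = "__float__"
STR_PREFIX = ""


def encode_key(key):
    # bool keys are not handled specially in this sketch.
    if isinstance(key, int):
        return f"{INT_PREFIX}{key}"
    if isinstance(key, float):
        return f"{FLOAT_PREFIX}{key!r}"
    return f"{STR_PREFIX}{key}"


def encode_typed_keys(obj):
    """Recursively prefix dict keys so json.dumps keeps their type information."""
    if isinstance(obj, dict):
        return {encode_key(k): encode_typed_keys(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [encode_typed_keys(v) for v in obj]
    return obj


def decode_typed_keys(dct):
    """object_hook for json.loads that turns prefixed keys back into int/float."""
    decoded = {}
    for key, value in dct.items():
        if key.startswith(INT_PREFIX):
            decoded[int(key[len(INT_PREFIX):])] = value
        elif key.startswith(FLOAT_PREFIX):
            decoded[float(key[len(FLOAT_PREFIX):])] = value
        else:
            decoded[key[len(STR_PREFIX):]] = value
    return decoded


positions = {1: [0.5, -0.25], "domain": [0.0, 1.0], 2.5: [-1.0, 0.0]}
text = json.dumps(encode_typed_keys(positions))
print(text)
# {"__int__1": [0.5, -0.25], "domain": [0.0, 1.0], "__float__2.5": [-1.0, 0.0]}
restored = json.loads(text, object_hook=decode_typed_keys)
```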

Qualitative(supplier, mapping=None, *, cmap=None)

Bases: GenericMapping

Class for assigning a value based on attributes in graph elements.

Assign a value to the items in the graph element attributes.

Parameters:

Name Type Description Default
supplier Supplier

The raw graph attributes on which to base the mapping.

required
mapping Mapping | None

The mapping with the keys as the values of the graph element attribute and the values as the desired visual values.

None

QUALITATIVE_COLORMAPS = ['Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'Set1', 'Set2', 'Set3', 'tab10', 'tab20', 'tab20b', 'tab20c'] class-attribute instance-attribute

colormap = cmap instance-attribute

fallback = mapping[None] instance-attribute

mapping = mapping instance-attribute

supplier = supplier instance-attribute

__repr__()

__str__()

get(graph_element, graph)

Get the values in the graph element based on the attribute mapping.

Parameters:

Name Type Description Default
graph_element NxElementView

The node or edge view whose items are mapped.

required
graph Graph

The graph the element view belongs to.

required

Returns:

Type Description
dict

Dict with the indices of the graph element as keys and the desired visual values, based on the attribute, as values.

RawDictionary(dictionary)

Bases: Supplier

Assign a dictionary to the graph elements.

The dictionary may cover a subgraph or the whole graph. Its keys must match at least one of the node IDs or edge tuples. For example, the dictionaries returned by NetworkX calculations on graphs (such as nx.degree_centrality()) are suitable.

Parameters:

Name Type Description Default
dictionary dict

Keys are graph elements (node IDs or edge tuples); values are the values to supply.

required

dictionary = dictionary instance-attribute

fallback = None instance-attribute
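To show what a suitable dictionary looks like, the sketch below computes degree centrality in plain Python using the NetworkX definition (degree divided by n - 1), so it runs without NetworkX; with NetworkX installed, nx.degree_centrality(G) returns the same kind of dict. The supply() helper is hypothetical and only illustrates how keys might be matched against graph elements, with None as fallback (matching fallback = None above).

```python
def degree_centrality(nodes, edges):
    """Degree centrality as NetworkX defines it: degree / (n - 1)."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        # NetworkX assigns 1 to every node of a graph with at most one node.
        return {node: 1 for node in nodes}
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    scale = 1 / (len(nodes) - 1)
    return {node: d * scale for node, d in degree.items()}


def supply(dictionary, elements, fallback=None):
    """Hypothetical lookup: one value per graph element, fallback if missing."""
    return {element: dictionary.get(element, fallback) for element in elements}


centrality = degree_centrality(["a", "b", "c"], [("a", "b"), ("a", "c")])
values = supply(centrality, ["a", "b", "c", "d"])
print(values)  # {'a': 1.0, 'b': 0.5, 'c': 0.5, 'd': None}
```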

Sequential(supplier, scale='lin', *, out_range, in_range=None, fallback=None, post_processor=lambda x: x, cmap=None)

Bases: GenericMapping, ABC

Abstract class for returning visual attributes on various scales.

Parameters:

Name Type Description Default
in_range tuple | None

tuple of the min and max values of the set before normalization

None
out_range tuple[Numeric, Numeric]

tuple of the min and max values of the final scale

required
fallback StrNumeric | None

the visual value for None values

None
supplier Supplier

The raw graph attributes on which to base the mapping; must have numeric values

required
post_processor Callable

function applied to the values after scaling (e.g. colormapping, conversion)

lambda x: x
cmap Colormap

optional colormap, used where the raw colormap is needed instead of the post_processor

None

colormap = cmap instance-attribute

fallback = parse_color(fallback) instance-attribute

in_range = in_range instance-attribute

out_range = out_range instance-attribute

post_processor: Callable[[float], object] = post_processor instance-attribute

scale = lambda x: x instance-attribute

supplier = supplier instance-attribute

__repr__()

__set_in_range(values)

Calculate in_range by determining the minimum and maximum of the graph element values.

The value is returned after applying the appropriate scalable attribute strategy.

Parameters:

Name Type Description Default
values

list of values from the graph element

required

__str__()

also(preprocessor)

Helper for applying a preprocessor to the values before they are scaled.

Parameters:

Name Type Description Default
preprocessor Callable[[float], float]

function to call on the values before scaling

required

Returns:

Type Description

Sequential

get(graph_element, graph)

Get the values by normalizing supplier values and applying a scale.

If all values are the same, the average of the out_range will be applied to all values.

Parameters:

Name Type Description Default
graph_element NxElementView

The node or edge view whose items are mapped.

required
graph Graph

The graph the element view belongs to.

required

Returns:

Type Description
dict

Dict with the indices of the graph element as keys and the desired visual values, based on the scale, as values.
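The get() description above can be sketched for the linear case: values are normalized against in_range (derived from their minimum and maximum when not given), mapped onto out_range, and passed through post_processor; None values receive the fallback, and if all values are equal each one receives the midpoint of out_range. This is a minimal sketch under assumptions: linear_scale is a hypothetical name, only the 'lin' scale is covered, and values outside a user-supplied in_range are not clamped here.

```python
def linear_scale(values, out_range, in_range=None, fallback=None,
                 post_processor=lambda x: x):
    """Map numeric values linearly from in_range onto out_range."""
    numeric = [v for v in values.values() if v is not None]
    if in_range is None:
        in_range = (min(numeric), max(numeric)) if numeric else (0, 0)
    in_lo, in_hi = in_range
    out_lo, out_hi = out_range
    result = {}
    for key, value in values.items():
        if value is None:
            # Missing values get the fallback, without post-processing.
            result[key] = fallback
            continue
        if in_hi == in_lo:
            # All values are the same: use the middle of out_range.
            scaled = (out_lo + out_hi) / 2
        else:
            share = (value - in_lo) / (in_hi - in_lo)
            scaled = out_lo + share * (out_hi - out_lo)
        result[key] = post_processor(scaled)
    return result


scaled = linear_scale({"a": 3, "b": 7}, out_range=(12, 36))
print(scaled)  # {'a': 12.0, 'b': 36.0}
```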

StructuralAttribute(keyword=None, *, reverse=False, alt_nx_calculation=None)

Bases: Supplier

Class for providing structural graph element values.

These are attributes based on the graph structure and need to be calculated.

Parameters:

Name Type Description Default
keyword Optional[str]

the keyword selecting the calculation (one of ATTRIBUTES, case-insensitive)

None
reverse bool

if True, the calculation uses inverted edge values

False

ATTRIBUTES = ['indegree', 'outdegree', 'degree', 'centrality', 'betweenness', 'closeness'] class-attribute instance-attribute

__keyword: str | None = keyword.lower() if keyword is not None else None instance-attribute

alt_nx_calculation = alt_nx_calculation if keyword is None else None instance-attribute

fallback = None instance-attribute

reverse = reverse instance-attribute
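The degree keywords in ATTRIBUTES can be computed directly from a directed edge list. The sketch below is hypothetical (degree_values is not part of dsstools): it covers only indegree, outdegree and degree, lowercases the keyword as the class does, and raises ValueError for unknown keywords, which is an assumption about error handling. It does not model reverse or alt_nx_calculation; the centrality keywords would need the corresponding NetworkX algorithms.

```python
ATTRIBUTES = ["indegree", "outdegree", "degree",
              "centrality", "betweenness", "closeness"]


def degree_values(keyword, nodes, edges):
    """Compute degree-type values for a directed edge list."""
    keyword = keyword.lower()  # the class lowercases its keyword, too
    if keyword not in ATTRIBUTES:
        raise ValueError(f"Unknown structural attribute: {keyword!r}")
    indeg = {node: 0 for node in nodes}
    outdeg = {node: 0 for node in nodes}
    for source, target in edges:
        outdeg[source] += 1
        indeg[target] += 1
    if keyword == "indegree":
        return indeg
    if keyword == "outdegree":
        return outdeg
    if keyword == "degree":
        return {node: indeg[node] + outdeg[node] for node in nodes}
    raise NotImplementedError(f"{keyword!r} is not covered by this sketch")


nodes = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("c", "a")]
print(degree_values("OutDegree", nodes, edges))  # {'a': 2, 'b': 0, 'c': 1}
```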

Supplier()

Bases: ABC

Basic interface for supplying graph element values.

fallback = None instance-attribute

TextSearch(identifier=None, *, token=None, api='https://dss-wdc.wiso.uni-hamburg.de/api', insecure=False, timeout=60, params=None)

Bases: WDC

Class for searching keywords via the WDC API.

Parameters:

Name Type Description Default
identifier str | None

Identifier of the network data. For the text search this is normally in the form 20121227_intermediaries (a date string with a short text appended).

None
token str | None

Token for authorization.

None
api str

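API address to send requests to. Leave this as is.

'https://dss-wdc.wiso.uni-hamburg.de/api'
insecure bool

Hide the warning about missing HTTPS.

False
timeout int

Timeout for requests to the server, in seconds. Increase this if you request large networks.

60
params dict | None

Additional keyword arguments passed on to the API endpoint. See https://dss-wdc.wiso.uni-hamburg.de/#_complex_datatypes_for_the_api_requests for further assistance.

None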

Returns:

Type Description

Instance of TextSearch

api = api[:-1] if api.endswith('/') else api instance-attribute

endpoint = 'snapshot' class-attribute instance-attribute

identifier = identifier instance-attribute

params = params if params else {} instance-attribute

session = requests.Session() instance-attribute

timeout = timeout instance-attribute

token property writable

Get the password token.

_(domains, terms)

__query_domains(domains, query_term, missing_domains=None, key=None)

get_missing(domains)

Compare given domains and hits on the API and return the difference.

Parameters:

Name Type Description Default
domains Iterable

Domains to compare against

required

Returns:

Type Description
set

Difference of domains

get_snapshots(name_tag='')

List available snapshots by name.

Parameters:

Name Type Description Default
name_tag

Filter for the name tag.

''

Returns:

Type Description
set

Available snapshot ids.

search(domains, terms)

Searches the given keywords across a Graph or iterator.

To use a complex, already existing Solr query, it is recommended to pass it as a dict such as {"some-key": "your-query OR some-other-query"} (see the description of the terms parameter).

Parameters:

Name Type Description Default
domains Graph | List

Set of identifiers to search in. Both graphs and lists are allowed.

required
terms List[str] | List[List[str]] | dict[str, str] | dict[str, List[str]] | Series | DataFrame

Terms to search for. Various structures are allowed. A list of lists combines the response values of each inner list into one response, e.g. [[A,B],[C,D]] means the counts of A and B are combined into one value (and likewise for C and D). This is helpful for synonyms. In legends, the first value of each inner list serves as the "key". A dict[str, List[str]] combines the values in each list the same way but labels the result with the given key.

required

Returns:

Type Description

Updated graph or dict containing the responses.
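The combination rules for terms can be illustrated without calling the API. In this sketch, combine_counts is a hypothetical helper and counts is example per-term hit data for a single domain; the real search() sends the queries to the WDC API and returns an updated graph or dict. pandas Series and DataFrame inputs are not covered.

```python
def combine_counts(counts, terms):
    """Combine per-term hit counts following the terms rules of search().

    counts: hit count per single term (or per Solr query string).
    terms:  list[str], list[list[str]], dict[str, str] or dict[str, list[str]].
    """
    if isinstance(terms, dict):
        # The dict key labels the combined result.
        groups = {key: [value] if isinstance(value, str) else list(value)
                  for key, value in terms.items()}
    else:
        groups = {}
        for item in terms:
            group = [item] if isinstance(item, str) else list(item)
            groups[group[0]] = group  # the first synonym becomes the key
    return {key: sum(counts.get(term, 0) for term in group)
            for key, group in groups.items()}


counts = {"A": 2, "B": 3, "C": 1, "D": 0}
print(combine_counts(counts, [["A", "B"], ["C", "D"]]))  # {'A': 5, 'C': 1}
```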

WDC(*, token=None, api='https://dss-wdc.wiso.uni-hamburg.de/api', insecure=False, timeout=120, params=None)

Internal class for interacting with the WDC API.

Parameters:

Name Type Description Default
token str | None

Token for authorization.

None
api str

API address to send requests to. Leave this as is.

'https://dss-wdc.wiso.uni-hamburg.de/api'
insecure bool

Hide the warning about missing HTTPS.

False
timeout int

Timeout for requests to the server, in seconds. Increase this if you request large networks.

120
params dict | None

Additional keyword arguments passed on to the API endpoint. See https://dss-wdc.wiso.uni-hamburg.de/#_complex_datatypes_for_the_api_requests for further assistance.

None

api = api[:-1] if api.endswith('/') else api instance-attribute

endpoint = 'UNSET' class-attribute instance-attribute

identifier = None instance-attribute

params = params if params else {} instance-attribute

session = requests.Session() instance-attribute

timeout = timeout instance-attribute

token property writable

Get the password token.

WDCGeneric(identifier=None, *, token=None, api='https://dss-wdc.wiso.uni-hamburg.de/api', insecure=False, timeout=60, params=None)

Bases: WDC

Public class for calling the WDC API directly.

To interact with the WDC API directly, use this WDCGeneric class, since it allows you to pass an identifier for snapshots (or use subclasses of WDC such as TextSearch or GraphImporter directly). This is intended for advanced users only.

Note: In v0.10.0, WDCGeneric was split into WDC and WDCGeneric. WDCGeneric exists to ensure compatibility with older versions. This also helps with type hinting across different versions.

Parameters:

Name Type Description Default
identifier str | int

Identifier of the network data.

None
token str | None

Token for authorization.

None
api str

API address to send requests to. Leave this as is.

'https://dss-wdc.wiso.uni-hamburg.de/api'
insecure bool

Hide the warning about missing HTTPS.

False
timeout int

Timeout for requests to the server, in seconds. Increase this if you request large networks.

60
params dict | None

Additional keyword arguments passed on to the API endpoint. See https://dss-wdc.wiso.uni-hamburg.de/#_complex_datatypes_for_the_api_requests for further assistance.

None

api = api[:-1] if api.endswith('/') else api instance-attribute

endpoint = 'UNSET' class-attribute instance-attribute

identifier = identifier instance-attribute

params = params if params else {} instance-attribute

session = requests.Session() instance-attribute

timeout = timeout instance-attribute

token property writable

Get the password token.

clean_graph_data_attributes(graph)

Replace empty strings in data attributes with np.nan.
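A minimal sketch of this cleaning step, assuming it walks over attribute dicts and replaces exact empty strings in place. clean_attributes is a hypothetical name, and whether dsstools also cleans edge attributes is not stated here. math.nan is the same IEEE float NaN that np.nan denotes, so the sketch needs no NumPy.

```python
import math


def clean_attributes(attribute_dicts):
    """Replace empty strings with NaN in every attribute dict, in place."""
    for data in attribute_dicts:
        for key, value in data.items():
            # Only exact empty strings are replaced; other values stay as-is.
            if isinstance(value, str) and value == "":
                data[key] = math.nan


nodes = {"a": {"label": "A", "url": ""}, "b": {"label": ""}}
clean_attributes(nodes.values())
```

With NetworkX, the node attribute dicts could be passed as clean_attributes(data for _, data in G.nodes(data=True)), since the view yields the graph's own attribute dicts.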

comfort_fixed(attribute)

comfort_numeric_fixed(attribute)

comfort_str_fixed(attribute)

ensure_file_format(path, *, default_format, format_filter=None)

Ensures that the provided path has a saving format and its parents exist.

If the saving format is not in the defined format_filter, a TypeError will be raised.

Parameters:

Name Type Description Default
path str | Path

the path that needs to be validated

required
default_format str

the format used as default if the path has no format; a leading period can be included

required
format_filter set | None

set of accepted formats; leading periods can be included. Adding default_format is not mandatory, since it is added dynamically, but it is best practice.

None

Returns:

Type Description
tuple[Path, str]

the file path and the format (without leading dot) as a 2-tuple
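The described behavior maps naturally onto pathlib. This is a sketch that mirrors the documented contract, not the dsstools source: it assumes format_filter=None accepts any format, that format comparison is case-sensitive, and that the TypeError is raised before any directory is created.

```python
import tempfile
from pathlib import Path


def ensure_file_format(path, *, default_format, format_filter=None):
    """Sketch: add a default suffix, validate it, and create parent dirs."""
    path = Path(path)
    default_format = default_format.lstrip(".")
    if not path.suffix:
        path = path.with_suffix(f".{default_format}")
    file_format = path.suffix.lstrip(".")
    if format_filter is not None:
        # The default format is always accepted, as the docstring describes.
        allowed = {fmt.lstrip(".") for fmt in format_filter} | {default_format}
        if file_format not in allowed:
            raise TypeError(f"Format {file_format!r} not in {sorted(allowed)}")
    path.parent.mkdir(parents=True, exist_ok=True)
    return path, file_format


with tempfile.TemporaryDirectory() as tmp:
    target, fmt = ensure_file_format(Path(tmp) / "plots" / "graph",
                                     default_format=".png")
    parent_created = target.parent.is_dir()
```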

filter_ego_network(base, ego_network=None, attr=None, ego=None, incoming=None, outgoing=None, mutuals=None)

Create a mapping for the ego network around one selected node.

Filters an ego network so that the base mapping can be applied to all graph elements outside the ego network and the ego_network mapping to all graph elements that are part of it.

Parameters:

Name Type Description Default
base GenericMapping | str | int | float

mapping that is to be applied to graphelements not in the ego-network

required
ego_network GenericMapping | str | int | float | None

mapping to be applied to graph elements inside the ego network; a dictionary mapping must have "ego" and either "neighbors" (for nodes) or "ego_edge" (for edges) as keys

None
attr EgoNetwork | str | int | None

either the name of the node around which ego-network is constructed or an EgoNetwork Supplier

None
ego GenericMapping | str | int | float | None

mapping only applied to the ego

None
incoming GenericMapping | str | int | float | None

mapping applied only to incoming elements; overrides ego_network

None
outgoing GenericMapping | str | int | float | None

mapping applied only to outgoing elements; overrides ego_network

None
mutuals GenericMapping | str | int | float | None

mapping applied only to elements that are both incoming and outgoing; overrides ego_network

None
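The node-level logic can be sketched on a directed edge list. Everything here is an assumption for illustration: ego_mapping is a hypothetical helper working on plain values instead of GenericMappings, "incoming" is taken to mean nodes with an edge pointing at the ego, and a mutual neighbor falls back to the neighbor value when mutuals is not given (the real precedence may differ).

```python
def ego_mapping(edges, nodes, ego, base, ego_value, neighbor_value,
                incoming=None, outgoing=None, mutuals=None):
    """Hypothetical node-level ego-network mapping for a directed edge list."""
    predecessors = {u for u, v in edges if v == ego and u != ego}
    successors = {v for u, v in edges if u == ego and v != ego}
    result = {}
    for node in nodes:
        if node == ego:
            result[node] = ego_value
        elif node in predecessors and node in successors:
            result[node] = mutuals if mutuals is not None else neighbor_value
        elif node in predecessors:
            result[node] = incoming if incoming is not None else neighbor_value
        elif node in successors:
            result[node] = outgoing if outgoing is not None else neighbor_value
        else:
            # Outside the ego network: keep the base value.
            result[node] = base
    return result


edges = [("a", "ego"), ("ego", "b"), ("c", "ego"), ("ego", "c"), ("d", "e")]
colors = ego_mapping(edges, ["ego", "a", "b", "c", "d", "e"], "ego",
                     base="lightgrey", ego_value="red",
                     neighbor_value="orange", incoming="blue")
```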

filtering(base, new_values, attr, predicate)

Filter the visual values of the graph element based on the attr values.

If the predicate is True the value of the new Mapping is applied, keyed by node. If the predicate is False the value of the base Mapping is applied, keyed by node. If the base mapping contains a fallback, the nodes with the fallback value will retain that value.

Parameters:

Name Type Description Default
base str | int | float | GenericMapping

the original mapping whose values are assigned where the predicate returns False

required
new_values str | int | float | GenericMapping

the new mapping whose values are assigned where the predicate returns True

required
attr str | Supplier

attribute that provides the values to be evaluated by the predicate

required
predicate

the expression to evaluate the attr values, must return a boolean

required

Returns:

Type Description

Filter Mapping that assigns new_values where the predicate applied to attr returns True and base values where it returns False.

Examples:

G.add_node("a", rating=3, school_type="uni")
G.add_node("b", rating=7, school_type="college")

rating = sequential("rating", out_range=(12, 36))

new_mapping = fixed(1)

image.nodes.set_sizes(filtering(rating, new_mapping, "school_type", lambda x: x == "uni"))

# resulting sizes: {"a": 1, "b": 36}
# node "a" is given the value 1 because its "school_type" is "uni"
# node "b" keeps the value from the rating mapping
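The selection rule of filtering() can be sketched with plain dicts standing in for the already evaluated mappings and attribute values; filter_values and base_fallback are hypothetical names, and the real function accepts GenericMappings and Suppliers. The example reproduces the sizes above.

```python
def filter_values(base_values, new_values, attr_values, predicate,
                  base_fallback=None):
    """Pick new_values where predicate(attr) holds, else base_values.

    Nodes whose base value equals the base fallback keep that fallback.
    """
    result = {}
    for node, base_value in base_values.items():
        if base_fallback is not None and base_value == base_fallback:
            result[node] = base_value
        elif predicate(attr_values.get(node)):
            result[node] = new_values[node]
        else:
            result[node] = base_value
    return result


sizes = filter_values({"a": 12.0, "b": 36.0}, {"a": 1, "b": 1},
                      {"a": "uni", "b": "college"}, lambda x: x == "uni")
print(sizes)  # {'a': 1, 'b': 36.0}
```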

fixed(value)

Set a fixed value that is constant across all items of the chosen graph element.

Parameters:

Name Type Description Default
value

the value assigned to all items

required

Returns:

Type Description

FixedValue

Examples:

image.nodes.set_sizes(fixed(75))
# the size of all nodes is now 75

image.edges.set_colors("green")
# the color of all edges is now green

from_node(node_mapping, source, *, fallback=None)

Assign a value from a node to the edges.

Parameters:

Name Type Description Default
node_mapping GenericMapping

the node mapping whose values the edges should take

required
source Literal['incoming', 'outgoing', 'matching']

"incoming" uses the value from the incoming node, "outgoing" the value from the outgoing node, and "matching" a value shared by both nodes

required
fallback

the fallback value assigned when the values of the incoming and outgoing nodes do not match

None

get_logger(name)

Return the logger with the given name.

import_from_data_forge(slug, snapshot, token, domain='dss-graph.wiso.uni-hamburg.de', cache=True, remove_selfloops=True, contract_redirects=False, explicit_include=False)

Import Graph object from dssCode.

Parameters:

Name Type Description Default
slug str

Name slug of the project (see dssCode-Interface)

required
snapshot str

Snapshot hash

required
token str

Token for authorization.

required
domain str

The domain for the API call

'dss-graph.wiso.uni-hamburg.de'
cache (bool, Path, str)

Cache directory to use. Defaults to a temporary directory.

True
remove_selfloops bool

Remove edge selfloops.

True
contract_redirects bool

Contract redirecting nodes into one.

False
explicit_include bool

Include only explicitly marked nodes in the graph.

False

Returns:

Type Description
DiGraph

nx.DiGraph: Graph with the imported data.

list_wdcapi_graphs(token, timeout=60)

List accessible graph identifiers for the given token.

Convenience wrapper around GraphImporter().list_available_graphs(), intended to mirror the common structure of NetworkX.

Parameters:

Name Type Description Default
token str

Token to authenticate with.

required
timeout int

Timeout in seconds after which the request is cancelled. Leave at default.

60

Returns:

percentile(base, new_values, attr, perc_range=None, method='linear')

Create a mapping that modifies an existing mapping based on percentiles.

Parameters:

Name Type Description Default
base str | int | float | GenericMapping

the mapping whose values are assigned to a node if its attr value lies inside the perc_range

required
new_values str | int | float | GenericMapping

the mapping whose values are assigned to a node if its attr value lies outside the perc_range

required
attr str | Supplier

the attribute with which the percentile is calculated, must contain numeric values

required
perc_range tuple

tuple of min and max range for the percentile calculation, must be between 0 and 100

None
method str

Method used to estimate the percentile (optional, default "linear"); the options are the same as for the method parameter of numpy.percentile. There are many different methods, some unique to NumPy; see the notes of numpy.percentile for an explanation. The options, sorted by their R type as summarized by Hyndman and Fan (1996), are:

  1. 'inverted_cdf'
  2. 'averaged_inverted_cdf'
  3. 'closest_observation'
  4. 'interpolated_inverted_cdf'
  5. 'hazen'
  6. 'weibull'
  7. 'linear' (default)
  8. 'median_unbiased'
  9. 'normal_unbiased'

The first three methods are discontinuous. NumPy further defines the following discontinuous variations of the default 'linear' (7.) option:

  • 'lower'
  • 'higher'
  • 'midpoint'
  • 'nearest'
'linear'

Returns:

Type Description

Filter Mapping that assigns base values to elements whose attr value lies inside perc_range and new_values to all others.

Examples:

G.add_node("a", rating=3)
G.add_node("b", rating=7)

base_mapping = sequential("rating", fallback="orange", cmap="viridis")
new_mapping = fixed("blue")

image.nodes.set_colors(percentile(base_mapping, new_mapping, "degree", perc_range=(0, 50)))
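The percentile logic can be sketched in plain Python. linear_percentile follows NumPy's default 'linear' method (the value at fractional rank (n - 1) * q / 100, interpolated between neighbors), so np.percentile(values, q) should give the same numbers. percentile_values is a hypothetical helper on plain dicts; treating both range bounds as inclusive is an assumption.

```python
import math


def linear_percentile(values, q):
    """q-th percentile (0-100) with linear interpolation (NumPy's 'linear')."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * q / 100
    lower = math.floor(h)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower])


def percentile_values(base_values, new_values, attr_values, perc_range):
    """Keep base values inside perc_range of attr_values, else new_values."""
    low = linear_percentile(attr_values.values(), perc_range[0])
    high = linear_percentile(attr_values.values(), perc_range[1])
    return {node: base_values[node] if low <= attr_values[node] <= high
            else new_values[node]
            for node in base_values}


ratings = {"a": 3, "b": 7, "c": 5, "d": 9}
colors = percentile_values(dict.fromkeys(ratings, "base"),
                           dict.fromkeys(ratings, "blue"), ratings, (0, 50))
print(colors)  # {'a': 'base', 'b': 'blue', 'c': 'base', 'd': 'blue'}
```

Here the 0th and 50th percentiles of the ratings are 3 and 6, so only nodes "a" and "c" keep their base values.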

qualitative(attr, mapping=None, *, cmap=None)

Assign visual values based on a categorical attribute, using either a mapping or a colormap.

Parameters:

Name Type Description Default
attr str | Code

name of the categorical attribute

required
mapping dict | None

dict of category values as the key and desired values as the values

None
cmap

name of a valid colormap, or a colormap object

None

Returns:

Type Description

Nominal

Examples:

G.add_node("a", pet="dog")
G.add_node("b", pet="cat")

image.nodes.set_colors(qualitative("pet", {"cat": "red", "dog": "green"}))
# color for node "a" is now "green" and "red" for node "b"

image.nodes.set_colors(qualitative("pet", cmap="Pastel1"))
# color for node "a" is now the first color in the Pastel1 colormap and the
# second color for node "b"
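Both variants of the example can be sketched with plain dicts. qualitative_values is a hypothetical helper, and the palette list stands in for a qualitative colormap such as Pastel1. Assigning palette entries in order of first appearance and cycling when there are more categories than colors are assumptions; treating mapping[None] as the fallback echoes the fallback = mapping[None] attribute of Qualitative.

```python
def qualitative_values(attr_values, mapping=None, palette=None):
    """Map categorical attribute values to visual values."""
    if mapping is None:
        if palette is None:
            raise ValueError("Provide either a mapping or a palette")
        mapping = {}
        for category in attr_values.values():
            if category is not None and category not in mapping:
                # Next palette entry, cycling if categories outnumber colors.
                mapping[category] = palette[len(mapping) % len(palette)]
    fallback = mapping.get(None)
    return {node: mapping.get(category, fallback)
            for node, category in attr_values.items()}


pets = {"a": "dog", "b": "cat", "c": None}
print(qualitative_values(pets, {"cat": "red", "dog": "green", None: "grey"}))
# {'a': 'green', 'b': 'red', 'c': 'grey'}
```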

read_from_pickle(folder='', timestamp='')

Read cached graph from directory.

Automatically selects the newest instance unless a timestamp is given.

Parameters:

Name Type Description Default
folder str | Path

Path to the directory to search for pickles. If empty, defaults to the temporary directory.

''
timestamp str

Timestamp of the instance to select explicitly.

''

Returns:

Type Description
DiGraph

nx.DiGraph: Graph with the imported data.
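The selection rule ("newest unless a timestamp is given") can be sketched with pathlib and pickle. The file pattern *.pickle, ranking by modification time, and matching the timestamp as a substring of the file name are all assumptions, since the cache layout is not described here; read_newest_pickle is a hypothetical name, and the example pickles plain strings instead of graphs.

```python
import os
import pickle
import tempfile
from pathlib import Path


def read_newest_pickle(folder, timestamp=""):
    """Load the newest *.pickle in folder, or one whose name has timestamp."""
    candidates = sorted(Path(folder).glob("*.pickle"),
                        key=lambda p: p.stat().st_mtime)
    if timestamp:
        candidates = [p for p in candidates if timestamp in p.name]
    if not candidates:
        raise FileNotFoundError(f"No matching pickle in {folder!r}")
    with candidates[-1].open("rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    for name, graph, mtime in [("graph_20230101.pickle", "older graph", 1_000_000),
                               ("graph_20240101.pickle", "newer graph", 2_000_000)]:
        path = Path(tmp) / name
        path.write_bytes(pickle.dumps(graph))
        os.utime(path, (mtime, mtime))  # fix modification times for the demo
    newest = read_newest_pickle(tmp)
    selected = read_newest_pickle(tmp, timestamp="20230101")
```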

read_wdcapi(identifier, token, timeout=60, graph_type=nx.DiGraph)

Import a graph from the WDC API.

Use an identifier returned by list_wdcapi_graphs().

Convenience wrapper around GraphImporter().get_graph(), intended to mirror the common structure of NetworkX.

Parameters:

Name Type Description Default
identifier str

Identifier of the graph.

required
token str

Token to authenticate with.

required
timeout int

Timeout in seconds after which the request is cancelled. For very large graphs, this should be increased.

60
graph_type

Type of graph to return. For crawled graphs, nx.DiGraph is recommended.

DiGraph

Returns:

Type Description

The imported graph.

sequential(attr, scale='lin', out_range=None, in_range=None, fallback=None, cmap=None)

Create a scaled mapping based on a graph element attribute or a structural graph attribute.

Parameters:

Name Type Description Default
attr str | Supplier

name of the graph element attribute or structural graph attribute to be scaled

required
scale str | Callable

scale on which the values should be assigned

'lin'
out_range

tuple of the min and max values of the final scale

None
in_range

tuple of the min and max values of the set before normalization

None
fallback

a color value or numeric value for None values

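None
cmap str | Colormap | None

name of a valid colormap or a colormap object, used to map the scaled values to colors

None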

Returns:

Type Description

Sequential

Examples:

G.add_node("a", rating=3)
G.add_node("b", rating=7)

image.nodes.set_sizes(sequential("degree", "log", out_range=(12, 36), fallback=5))

image.nodes.set_sizes(sequential("rating", "lin", out_range=(12, 36), fallback=5))

image.nodes.set_colors(sequential("rating", fallback="orange", cmap="viridis"))