Programming Problem: swap_node
Problem Restatement
Problem Statement:
Given a linked list, swap the two nodes present at position i and j. The positions are based on 0-based indexing.
Note: You have to swap the nodes and not just the values.
Example:
* linked_list = 3 4 5 2 6 1 9
* positions = 3 4
* output = 3 4 5 6 2 1 9
Explanation:
* The node at position 3 has the value 2
* The node at position 4 has the value 6
* Swapping these nodes will result in a final order of nodes of 3 4 5 6 2 1 9
Python Implementation
...
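The original implementation is elided in this excerpt. As a minimal sketch of one possible approach (not necessarily the author's), the nodes can be swapped by relinking their predecessors. The names `Node`, `swap_nodes`, `from_list` and `to_list` are assumptions introduced for illustration, and out-of-range positions are assumed to leave the list unchanged.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def from_list(values):
    """Hypothetical helper: build a linked list from a Python list."""
    head = tail = None
    for value in values:
        node = Node(value)
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head


def to_list(head):
    """Hypothetical helper: collect node values into a Python list."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


def swap_nodes(head, i, j):
    """Swap the nodes at 0-based positions i and j by relinking, not by copying values."""
    if head is None or i == j:
        return head
    if i > j:
        i, j = j, i

    # Locate both nodes and their predecessors in a single pass.
    prev_i = node_i = prev_j = node_j = None
    prev, current, position = None, head, 0
    while current is not None and position <= j:
        if position == i:
            prev_i, node_i = prev, current
        if position == j:
            prev_j, node_j = prev, current
        prev, current = current, current.next
        position += 1

    if node_i is None or node_j is None:
        return head  # assumption: out-of-range positions leave the list unchanged

    # Put node_j where node_i used to be.
    if prev_i is None:
        head = node_j
    else:
        prev_i.next = node_j

    if node_i.next is node_j:
        # Adjacent nodes: node_i simply moves behind node_j.
        node_i.next = node_j.next
        node_j.next = node_i
    else:
        prev_j.next = node_i
        node_i.next, node_j.next = node_j.next, node_i.next
    return head
```

Usage: `to_list(swap_nodes(from_list([3, 4, 5, 2, 6, 1, 9]), 3, 4))` reproduces the example output from the problem statement.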
Programming Problem: Even_After_Odd_Nodes
Problem Description
Problem Statement:
Given a singly linked list with integer data, arrange the elements in such a manner that all nodes with even numbers are placed after odd numbers. Do not create any new nodes and avoid using any other data structure. The relative order of even and odd elements must not change.
Example:
* linked list = 1 2 3 4 5 6
* output = 1 3 5 2 4 6
Python Implementation
class Node(object):
    def __init__(self, value):
        self.value ...
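The listing above is cut off in this excerpt. A minimal sketch of one way to meet the constraints (no new nodes, no auxiliary data structure) is to thread the existing nodes onto an odd chain and an even chain and then join them. The function name `even_after_odd` and the helpers are assumptions for illustration, not the author's code.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def from_list(values):
    """Hypothetical helper: build a linked list from a Python list."""
    head = tail = None
    for value in values:
        node = Node(value)
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head


def to_list(head):
    """Hypothetical helper: collect node values into a Python list."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


def even_after_odd(head):
    """Relink existing nodes so that odd values come first, keeping relative order."""
    odd_head = odd_tail = None
    even_head = even_tail = None
    current = head
    while current is not None:
        next_node = current.next
        current.next = None  # detach before appending to a chain
        if current.value % 2 != 0:
            if odd_head is None:
                odd_head = odd_tail = current
            else:
                odd_tail.next = current
                odd_tail = current
        else:
            if even_head is None:
                even_head = even_tail = current
            else:
                even_tail.next = current
                even_tail = current
        current = next_node

    if odd_head is None:
        return even_head
    odd_tail.next = even_head  # join the odd chain to the even chain
    return odd_head
```

Only a constant number of pointers is kept, so the extra space is O(1) and the list is traversed once.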
Programming Problem: Pascal's Triangle
Problem Description
Problem Statement:
Find and return the nth row of Pascal's triangle in the form of a list. n is 0-based.
For example, if n = 4, then output = [1, 4, 6, 4, 1].
To know more about Pascal's triangle: https://www.mathsisfun.com/pascals-triangle.html
Python Implementation
def next_row(current_row):
    row = [1]
    for i in range(len(current_row)):
        if i == len(current_row)-1:
            break
        row.append(current_row[i] + current_row[i+1])
    row.append(1)
    return ...
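The excerpt stops before showing how the rows are chained together. As a hedged sketch, the same row-by-row idea can be written compactly: each new row is 1, then the pairwise sums of the previous row, then 1. The name `nth_row_pascal` is an assumption, not taken from the original post.

```python
def nth_row_pascal(n):
    """Return the n-th (0-based) row of Pascal's triangle, built row by row."""
    row = [1]
    for _ in range(n):
        # Interior entries are sums of adjacent entries in the previous row.
        row = [1] + [row[k] + row[k + 1] for k in range(len(row) - 1)] + [1]
    return row
```

For n = 4 the rows generated are [1], [1, 1], [1, 2, 1], [1, 3, 3, 1] and finally [1, 4, 6, 4, 1].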
Programming Problem: skip_i_delete_j
Problem Restatement
Problem Statement:
You are given the head of a linked list and two integers, i and j. You have to retain the first i nodes and then delete the next j nodes. Continue doing so until the end of the linked list.
Example:
* linked-list = 1 2 3 4 5 6 7 8 9 10 11 12
* i = 2
* j = 3
* Output = 1 2 6 7 11 12
Python Implementation
# LinkedList Node class for your reference
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def skip_i ...
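The function body is elided above. One possible sketch, under the assumptions that i and j are non-negative, that i = 0 means every node is deleted, and that j = 0 leaves the list unchanged, is shown below. The helper names `from_list` and `to_list` are hypothetical.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def from_list(values):
    """Hypothetical helper: build a linked list from a Python list."""
    head = tail = None
    for value in values:
        node = Node(value)
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head


def to_list(head):
    """Hypothetical helper: collect node data into a Python list."""
    values = []
    while head is not None:
        values.append(head.data)
        head = head.next
    return values


def skip_i_delete_j(head, i, j):
    """Keep i nodes, unlink the next j nodes, and repeat until the list ends."""
    if i <= 0:
        return None  # assumption: retaining zero nodes deletes the whole list
    if j <= 0:
        return head  # assumption: nothing to delete

    current = head
    while current is not None:
        # Advance to the last node of the retained group of i nodes.
        for _ in range(i - 1):
            current = current.next
            if current is None:
                return head
        # Walk past the next j nodes (or to the end of the list).
        survivor = current.next
        for _ in range(j):
            if survivor is None:
                break
            survivor = survivor.next
        current.next = survivor
        current = survivor
    return head
```

With the list 1..12, i = 2 and j = 3, the retained groups are (1, 2), (6, 7) and (11, 12), matching the example.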
Programming Problem: Maximum Sum Subarray
Problem Description
Problem Statement:
You have been given an array containing numbers. Find and return the largest sum in a contiguous subarray within the input array.
Example 1:
* arr = [1, 2, 3, -4, 6]
* The largest sum is 8, which is the sum of all elements of the array.
Example 2:
* arr = [1, 2, -5, -4, 1, 6]
* The largest sum is 7, which is the sum of the last two elements of the array.
Python Implementation
def find_first_positive(input_list):
    for i in range(len(input_l ...
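The original solution is truncated and appears to start from the first positive element. The sketch below uses a different, well-known technique, Kadane's algorithm, which tracks the best sum of a subarray ending at each position. The name `max_sum_subarray` and the choice to reject an empty array are assumptions.

```python
def max_sum_subarray(arr):
    """Kadane's algorithm: largest sum over all non-empty contiguous subarrays."""
    if not arr:
        raise ValueError("arr must contain at least one number")  # assumption
    best = current = arr[0]
    for value in arr[1:]:
        # Either extend the running subarray or start a new one at this value.
        current = max(value, current + value)
        best = max(best, current)
    return best
```

This runs in O(n) time with O(1) extra space and also handles arrays made up entirely of negative numbers.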
Programming Problem: Duplicate Number
Problem Description
Problem Statement:
You have been given an array of length n. The array contains the integers from 0 to n - 2. Each number is present exactly once, except for one number which is present twice. Find and return this duplicate number.
Example:
* arr = [0, 2, 3, 1, 4, 5, 3]
* output = 3 (because 3 is present twice)
The expected time complexity for this problem is O(n) and the expected space-complexity is O(1).
Python Implementation
def duplicate_number(arr): ...
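The body of `duplicate_number` is elided above. One sketch that meets the stated O(n) time and O(1) space bounds relies on arithmetic: the array's sum exceeds the sum of 0..n-2 by exactly the duplicated value. This is an illustration under the problem's assumptions (valid input), not necessarily the author's solution.

```python
def duplicate_number(arr):
    """Return the value that appears twice in an array holding 0..n-2 plus one repeat."""
    n = len(arr)
    # Sum of 0, 1, ..., n-2 using the arithmetic series formula.
    expected_sum = (n - 2) * (n - 1) // 2
    return sum(arr) - expected_sum
```

For arr = [0, 2, 3, 1, 4, 5, 3], n is 7, the expected sum of 0..5 is 15 and the actual sum is 18, so the duplicate is 3.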
Programming Problem: Add One
Problem Description
Problem Statement:
You are given a non-negative number in the form of list elements. For example, the number 123 would be provided as arr = [1, 2, 3]. Add one to the number and return the output in the form of a new list.
Example 1:
* input = [1, 2, 3]
* output = [1, 2, 4]
Example 2:
* input = [9, 9, 9]
* output = [1, 0, 0, 0]
Challenge:
One way to solve this problem is to convert the input array into a number and then add one to it. For example, if we have input = [1, 2, 3], you coul ...
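As an alternative to converting the list into a number, the addition can be done digit by digit with a carry. The following is a minimal sketch; the name `add_one` is an assumption, and the input is assumed to hold single decimal digits with the most significant digit first.

```python
def add_one(arr):
    """Add one to a number stored as a list of decimal digits; returns a new list."""
    result = list(arr)  # copy so the input list is not modified
    for k in range(len(result) - 1, -1, -1):
        if result[k] < 9:
            result[k] += 1  # no carry needed, so we are done
            return result
        result[k] = 0  # 9 + 1 wraps to 0 and carries to the next digit
    # Every digit was 9, so the carry creates a new leading digit.
    return [1] + result
```

The loop stops at the first digit that does not overflow, so most inputs touch only the last digit.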
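Tianchi News Recommendation: Ranking Models and Model Fusion
Ranking Models
In a recommender system, the step after recall is to rank the recalled candidate set.
Recall reduces the size of the problem: for each user, N articles are selected as the candidate set. On top of these candidates we build features related to the user's history, the user's own attributes, the article's own attributes, and user-article interaction features.
The ranking stage can use machine learning: a model learns from the constructed features and then predicts on the test set, giving the click probability for each user-candidate pair. The Top-k articles with the highest click probability are returned as the final result.
Choosing the Ranking Models
Three representative ranking models are used in the ranking stage:
LGB ranking model
LGB classification model
DIN, a deep learning classification model
Once the ranking models have produced their results, model ensembling is used to select the better result.
Classic Ensembling Methods
Weighted fusion of the output results
Stacking (feed the models' outputs into another simple model for the final prediction)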
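To make the weighted-fusion idea concrete, here is a minimal pure-Python sketch. It assumes each model's output is a dictionary mapping (user_id, article_id) to a score, normalizes each model's scores to [0, 1], and sums them with per-model weights. The function names, the data layout and the weights are all hypothetical and are not taken from the original notebook.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale one model's scores to [0, 1] so that models are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def weighted_fusion(model_scores, weights, topk=5):
    """Fuse per-model scores with a weighted sum and return Top-k articles per user.

    model_scores: {model_name: {(user_id, article_id): score}}  (assumed layout)
    weights:      {model_name: weight}                          (assumed values)
    """
    fused = defaultdict(float)
    for name, scores in model_scores.items():
        for key, value in min_max_normalize(scores).items():
            fused[key] += weights[name] * value

    per_user = defaultdict(list)
    for (user_id, article_id), score in fused.items():
        per_user[user_id].append((article_id, score))

    return {
        user_id: [a for a, _ in sorted(items, key=lambda p: p[1], reverse=True)[:topk]]
        for user_id, items in per_user.items()
    }
```

Stacking differs in that the per-model scores become input features of a second, simpler model instead of being combined with fixed weights.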
Importing Packages
import numpy as np
import pandas as pd
import pickle
from tqdm import tqdm
import gc, os
import time
fr ...
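Tianchi News Recommendation: Feature Engineering
Feature Engineering
Direct features from the raw data
Article features: category_id is the article's category; created_at_ts is the article's creation time, which determines how timely the article is; words_count is the article's word count. Readers generally avoid clicking very long articles, although some prefer long reads.
Article content embedding features: these were already used during recall. They are optional here, and other embeddings such as W2V can be tried instead.
User device features
These features can be used as they are: once feature engineering is done, they can be joined in by article_id or user_id. First, however, we need to build features from the recall results and then create labels to form a supervised learning dataset.
Approach to building the supervised dataset
The recall results give us a dictionary of the form {user_id: [list of articles the user may click]}. For each user and each candidate article we can then construct one row of the supervised dataset. For example, if the recall list for user1 is {user1: [item1, item2, item3]}, we get three rows of data (u ...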
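The sentence above is cut off before the labeling rule is stated. The sketch below shows one plausible construction under an explicit assumption: a candidate gets label 1 if it is the article the user actually clicked last, and 0 otherwise. The function name `build_labeled_rows` and the `last_click` dictionary are hypothetical.

```python
def build_labeled_rows(recall_dict, last_click):
    """Expand {user_id: [article_id, ...]} into (user_id, article_id, label) rows.

    Assumption: label is 1 when the candidate equals the user's true last click.
    """
    rows = []
    for user_id, candidates in recall_dict.items():
        for article_id in candidates:
            label = 1 if last_click.get(user_id) == article_id else 0
            rows.append((user_id, article_id, label))
    return rows
```

For `{"user1": ["item1", "item2", "item3"]}` this yields three rows, one per candidate, which can later be joined with the article and user features by article_id or user_id.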
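Albert in Practice
Sentence pair classification
Downloading the Albert pretrained model
Download the pretrained weights and the Albert source code from GitHub.
Reading the data
Replace the DataProcessor in run_classifier.py in the source code with a data-processing class for your own task.
Data format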
| label | sentence1 | sentence2 |
| --- | --- | --- |
| 0 or 1 | sentence 1 | sentence 2 |
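Modifying the output
In run_classifier.py, in the do_predict part of the main function: the original output is a probability for each class; for binary classification, change it to 0/1.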
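The conversion itself is small. As a hedged illustration that does not reproduce the code in run_classifier.py, and assuming each prediction is a list of per-class probabilities, the label is the index of the largest probability. The function name `probabilities_to_label` is hypothetical.

```python
def probabilities_to_label(probabilities):
    """Return the index of the largest class probability (0 or 1 for binary tasks)."""
    return max(range(len(probabilities)), key=lambda k: probabilities[k])


def convert_predictions(all_probabilities):
    """Map a list of per-example probability lists to a list of 0/1 labels."""
    return [probabilities_to_label(p) for p in all_probabilities]
```

On a tie the lower class index is returned, because `max` keeps the first maximum it encounters.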
Sentence (and sentence-pair) classification tasks
Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR. Next, download the BERT-Base checkpo ...